Wednesday, March 10, 2010

BLAST in Ubuntu

BLAST
http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/unix_setup.html

http://www.ncbi.nlm.nih.gov/staff/tao/URLAPI/blastdb.html

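FASTA -> BLAST DB (formatdb)
For nucleotide: formatdb -i input_db -p F -o T
For protein: formatdb -i input_db -p T -o T
(-i is the input FASTA file; -p T means protein and -p F means nucleotide; -o T parses the sequence IDs and builds the indexes.)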
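
The only difference between the two commands above is the -p value, so it can be guessed from the file itself. A minimal sketch, assuming a plain FASTA file; guess_formatdb_p is a made-up helper name, not part of BLAST, and the check is crude: nucleotide files that use IUPAC ambiguity codes other than N would be reported as protein.

```shell
#!/bin/sh
# Hypothetical helper (not part of BLAST): guess the formatdb -p value
# for a FASTA file by looking at its residue letters.
guess_formatdb_p() {
    # Skip the ">" header lines, then search the sequence lines for any
    # character that is not A, C, G, T, U, N or whitespace (case-insensitive).
    if grep -v '^>' "$1" | grep -qi '[^ACGTUN[:space:]]'; then
        echo T    # other letters found: treat the file as protein
    else
        echo F    # only A/C/G/T/U/N: treat the file as nucleotide
    fi
}
```

It could then be used as: formatdb -i input_db -p "$(guess_formatdb_p input_db)" -o T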

----refp_db-----
>gi|113722133|ref|NP_055861.3| probable helicase senataxin [Homo sapiens]
MSTCCWCTPGGASTIDFLKRYASNTPSGEFQTADEDLCYCLECVAEYHKARDELPFLHEVLWELETLRLI
NHFEKSMKAEIGDDDELYIVDNNGEMPLFDITGQDFENKLRVPLLEILKYPYLLLHERVNELCVEALCRM
EQANCSFQVFDKHPGIYLFLVHPNEMVRRWAILTARNLGKVDRDDYYDLQEVLLCLFKVIELGLLESPDI
YTSSVLEKGKLILLPSHMYDTTNYKSYWLGICMLLTILEEQAMDSLLLGSDKQNDFMQSILHTMEREADD
DSVDPFWPALHCFMVILDRLGSKVWGQLMDPIVAFQTIINNASYNREIRHIRNSSVRTKLEPESYLDDMV
TCSQIVYNYNPEKTKKDSGWRTAICPDYCPNMYEEMETLASVLQSDIGQDMRVHNSTFLWFIPFVQSLMD
LKDLGVAYIAQVVNHLYSEVKEVLNQTDAVCDKVTEFFLLILVSVIELHRNKKCLHLLWVSSQQWVEAVV
KCAKLPTTAFTRSSEKSSGNCSKGTAMISSLSLHSMPSNSVQLAYVQLIRSLLKEGYQLGQQSLCKRFWD
KLNLFLRGNLSLGWQLTSQETHELQSCLKQIIRNIKFKAPPCNTFVDLTSACKISPASYNKEESEQMGKT
SRKDMHCLEASSPTFSKEPMKVQDSVLIKADNTIEGDNNEQNYIKDVKLEDHLLAGSCLKQSSKNIFTER
AEDQIKISTRKQKSVKEISSYTPKDCTSRNGPERGCDRGIIVSTRLLTDSSTDALEKVSTSNEDFSLKDD
ALAKTSKRKTKVQKDEICAKLSHVIKKQHRKSTLVDNTINLDENLTVSNIESFYSRKDTGVQKGDGFIHN
LSLDPSGVLDDKNGEQKSQNNVLPKEKQLKNEELVIFSFHENNCKIQEFHVDGKELIPFTEMTNASEKKS
SPFKDLMTVPESRDEEMSNSTSVIYSNLTREQAPDISPKSDTLTDSQIDRDLHKLSLLAQASVITFPSDS
PQNSSQLQRKVKEDKRCFTANQNNVGDTSRGQVIIISDSDDDDDERILSLEKLTKQDKICLEREHPEQHV
STVNSKEEKNPVKEEKTETLFQFEESDSQCFEFESSSEVFSVWQDHPDDNNSVQDGEKKCLAPIANTTNG
QGCTDYVSEVVKKGAEGIEEHTRPRSISVEEFCEIEVKKPKRKRSEKPMAEDPVRPSSSVRNEGQSDTNK
RDLVGNDFKSIDRRTSTPNSRIQRATTVSQKKSSKLCTCTEPIRKVPVSKTPKKTHSDAKKGQNRSSNYL
SCRTTPAIVPPKKFRQCPEPTSTAEKLGLKKGPRKAYELSQRSLDYVAQLRDHGKTVGVVDTRKKTKLIS
PQNLSVRNNKKLLTSQELQMQRQIRPKSQKNRRRLSDCESTDVKRAGSHTAQNSDIFVPESDRSDYNCTG
GTEVLANSNRKQLIKCMPSEPETIKAKHGSPATDDACPLNQCDSVVLNGTVPTNEVIVSTSEDPLGGGDP
TARHIEMAALKEGEPDSSSDAEEDNLFLTQNDPEDMDLCSQMENDNYKLIELIHGKDTVEVEEDSVSRPQ
LESLSGTKCKYKDCLETTKNQGEYCPKHSEVKAADEDVFRKPGLPPPASKPLRPTTKIFSSKSTSRIAGL
SKSLETSSALSPSLKNKSKGIQSILKVPQPVPLIAQKPVGEMKNSCNVLHPQSPNNSNRQGCKVPFGESK
YFPSSSPVNILLSSQSVSDTFVKEVLKWKYEMFLNFGQCGPPASLCQSISRPVPVRFHNYGDYFNVFFPL
MVLNTFETVAQEWLNSPNRENFYQLQVRKFPADYIKYWEFAVYLEECELAKQLYPKENDLVFLAPERINE
EKKDTERNDIQDLHEYHSGYVHKFRRTSVMRNGKTECYLSIQTQENFPANLNELVNCIVISSLVTTQRKL
KAMSLLGSRNQLARAVLNPNPMDFCTKDLLTTTSERIIAYLRDFNEDQKKAIETAYAMVKHSPSVAKICL
IHGPPGTGKSKTIVGLLYRLLTENQRKGHSDENSNAKIKQNRVLVCAPSNAAVDELMKKIILEFKEKCKD
KKNPLGNCGDINLVRLGPEKSINSEVLKFSLDSQVNHRMKKELPSHVQAMHKRKEFLDYQLDELSRQRAL
CRGGREIQRQELDENISKVSKERQELASKIKEVQGRPQKTQSIIILESHIICCTLSTSGGLLLESAFRGQ
GGVPFSCVIVDEAGQSCEIETLTPLIHRCNKLILVGDPKQLPPTVISMKAQEYGYDQSMMARFCRLLEEN
VEHNMISRLPILQLTVQYRMHPDICLFPSNYVYNRNLKTNRQTEAIRCSSDWPFQPYLVFDVGDGSERRD
NDSYINVQEIKLVMEIIKLIKDKRKDVSFRNIGIITHYKAQKTMIQKDLDKEFDRKGPAEVDTVDAFQGR
QKDCVIVTCVRANSIQGSIGFLASLQRLNVTITRAKYSLFILGHLRTLMENQHWNQLIQDAQKRGAIIKT
CDKNYRHDAVKILKLKPVLQRSLTHPPTIAPEGSRPQGGLPSSKLDSGFAKTSVAASLYHTPSDSKEITL
TVTSKDPERPPVHDQLQDPRLLKRMGIEVKGGIFLWDPQPSSPQHPGATPPTGEPGFPVVHQDLSHIQQP
AAVVAALSSHKPPVRGEPPAASPEASTCQSKCDDPEEELCHRREARAFSEGEQEKCGSETHHTRRNSRWD
KRTLEQEDSSSKKRKLL
>gi|187233964|gb|ACD01221.1| TP53 [Homo sapiens]
RAMAIYKQSQHMTEVVRRCPTNERCSDSDGLAPPQHLIR
>gi|119395734|ref|NP_000050.2| breast cancer type 2 susceptibility protein [Homo sapiens]
MPIGSKERPTFFEIFKTRCNKADLGPISLNWFEELSSEAPPYNSEPAEESEHKNNNYEPNLFKTPQRKPS
YNQLASTPIIFKEQGLTLPLYQSPVKELDKFKLDLGRNVPNSRHKSLRTVKTKMDQADDVSCPLLNSCLS
ESPVVLQCTHVTPQRDKSVVCGSLFHTPKFVKGRQTPKHISESLGAEVDPDMSWSSSLATPPTLSSTVLI
VRNEEASETVFPHDTTANVKSYFSNHDESLKKNDRFIASVTDSENTNQREAASHGFGKTSGNSFKVNSCK
DHIGKSMPNVLEDEVYETVVDTSEEDSFSLCFSKCRTKNLQKVRTSKTRKKIFHEANADECEKSKNQVKE
KYSFVSEVEPNDTDPLDSNVANQKPFESGSDKISKEVVPSLACEWSQLTLSGLNGAQMEKIPLLHISSCD
QNISEKDLLDTENKRKKDFLTSENSLPRISSLPKSEKPLNEETVVNKRDEEQHLESHTDCILAVKQAISG
TSPVASSFQGIKKSIFRIRESPKETFNASFSGHMTDPNFKKETEASESGLEIHTVCSQKEDSLCPNLIDN
GSWPATTTQNSVALKNAGLISTLKKKTNKFIYAIHDETSYKGKKIPKDQKSELINCSAQFEANAFEAPLT
FANADSGLLHSSVKRSCSQNDSEEPTLSLTSSFGTILRKCSRNETCSNNTVISQDLDYKEAKCNKEKLQL
FITPEADSLSCLQEGQCENDPKSKKVSDIKEEVLAAACHPVQHSKVEYSDTDFQSQKSLLYDHENASTLI
LTPTSKDVLSNLVMISRGKESYKMSDKLKGNNYESDVELTKNIPMEKNQDVCALNENYKNVELLPPEKYM
RVASPSRKVQFNQNTNLRVIQKNQEETTSISKITVNPDSEELFSDNENNFVFQVANERNNLALGNTKELH
ETDLTCVNEPIFKNSTMVLYGDTGDKQATQVSIKKDLVYVLAEENKNSVKQHIKMTLGQDLKSDISLNID
KIPEKNNDYMNKWAGLLGPISNHSFGGSFRTASNKEIKLSEHNIKKSKMFFKDIEEQYPTSLACVEIVNT
LALDNQKKLSKPQSINTVSAHLQSSVVVSDCKNSHITPQMLFSKQDFNSNHNLTPSQKAEITELSTILEE
SGSQFEFTQFRKPSYILQKSTFEVPENQMTILKTTSEECRDADLHVIMNAPSIGQVDSSKQFEGTVEIKR
KFAGLLKNDCNKSASGYLTDENEVGFRGFYSAHGTKLNVSTEALQKAVKLFSDIENISEETSAEVHPISL
SSSKCHDSVVSMFKIENHNDKTVSEKNNKCQLILQNNIEMTTGTFVEEITENYKRNTENEDNKYTAASRN
SHNLEFDGSDSSKNDTVCIHKDETDLLFTDQHNICLKLSGQFMKEGNTQIKEDLSDLTFLEVAKAQEACH
GNTSNKEQLTATKTEQNIKDFETSDTFFQTASGKNISVAKESFNKIVNFFDQKPEELHNFSLNSELHSDI
RKNKMDILSYEETDIVKHKILKESVPVGTGNQLVTFQGQPERDEKIKEPTLLGFHTASGKKVKIAKESLD
KVKNLFDEKEQGTSEITSFSHQWAKTLKYREACKDLELACETIEITAAPKCKEMQNSLNNDKNLVSIETV
VPPKLLSDNLCRQTENLKTSKSIFLKVKVHENVEKETAKSPATCYTNQSPYSVIENSALAFYTSCSRKTS
VSQTSLLEAKKWLREGIFDGQPERINTADYVGNYLYENNSNSTIAENDKNHLSEKQDTYLSNSSMSNSYS
YHSDEVYNDSGYLSKNKLDSGIEPVLKNVEDQKNTSFSKVISNVKDANAYPQTVNEDICVEELVTSSSPC
KNKNAAIKLSISNSNNFEVGPPAFRIASGKIVCVSHETIKKVKDIFTDSFSKVIKENNENKSKICQTKIM
AGCYEALDDSEDILHNSLDNDECSTHSHKVFADIQSEEILQHNQNMSGLEKVSKISPCDVSLETSDICKC
SIGKLHKSVSSANTCGIFSTASGKSVQVSDASLQNARQVFSEIEDSTKQVFSKVLFKSNEHSDQLTREEN
TAIRTPEHLISQKGFSYNVVNSSAFSGFSTASGKQVSILESSLHKVKGVLEEFDLIRTEHSLHYSPTSRQ
NVSKILPRVDKRNPEHCVNSEMEKTCSKEFKLSNNLNVEGGSSENNHSIKVSPYLSQFQQDKQQLVLGTK
VSLVENIHVLGKEQASPKNVKMEIGKTETFSDVPVKTNIEVCSTYSKDSENYFETEAVEIAKAFMEDDEL
TDSKLPSHATHSLFTCPENEEMVLSNSRIGKRRGEPLILVGEPSIKRNLLNEFDRIIENQEKSLKASKST
PDGTIKDRRLFMHHVSLEPITCVPFRTTKERQEIQNPNFTAPGQEFLSKSHLYEHLTLEKSSSNLAVSGH
PFYQVSATRNEKMRHLITTGRPTKVFVPPFKTKSHFHRVEQCVRNINLEENRQKQNIDGHGSDDSKNKIN
DNEIHQFNKNNSNQAAAVTFTKCEEEPLDLITSLQNARDIQDMRIKKKQRQRVFPQPGSLYLAKTSTLPR
ISLKAAVGGQVPSACSHKQLYTYGVSKHCIKINSKNAESFQFHTEDYFGKESLWTGKGIQLADGGWLIPS
NDGKAGKEEFYRALCDTPGVDPKLISRIWVYNHYRWIIWKLAAMECAFPKEFANRCLSPERVLLQLKYRY
DTEIDRSRRSAIKKIMERDDTAAKTLVLCVSDIISLSANISETSSNKTSSADTQKVAIIELTDGWYAVKA
QLDPPLLAVLKNGRLTVGQKIILHGAELVGSPDACTPLEAPESLMLKISANSTRPARWYTKLGFFPDPRP
FPLPLSSLFSDGGNVGCVDVIIQRAYPIQWMEKTSSGLYIFRNEREEEKEAAKYVEAQQKRLEALFTKIQ
EEFEEHEENTTKPYLPSRALTRQQVRALQDGAELYEAVKNAADPAYLEGYFSEEQLRALNNHRQMLNDKK
QAQIQLEIRKAMESAEQKEQGLSRDVTTVWKLRIVSYSKKEKDSVILSIWRPSSDLYSLLTEGKRYRIYH
LATSKSKSKSERANIQLAATKKTQYQQLPVSDEILFQIYQPREPLHFSKFLDPDFQPSCSEVDLIGFVVS
VVKKTGLAPFVYLSDECYNLLAIKFWIDLNEDIIKPHMLIAASNLQWRPESKSGLLTLFAGDFSVFSASP
KEGHFQETFNKMKNTVENIDILCNEAENKLMHILHANDPKWSTPTKDCTSGPYTAQIIPGTGNKLLMSSP
NCEIYYQSPLSLCMAKRKSVSTPVSAQMTSKSCKGEKEIDDQKNCKKRRALDFLSRLPLPPPVSPICTFV
SPAAQKAFQPPRSCGTKYETPIKKKELNSPQMTPFKKFNEISLLESNSIADEELALINTQALLSGSTGEK
QFISVSESTRTAPTSSEDYLRLKRRCTTSLIKEQESSQASTEECEKNKQDTITTKKYI
----refp_db-----

$ formatdb -i refp_db -p T -o T
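
For a protein database, formatdb should leave at least refp_db.phr, refp_db.pin and refp_db.psq next to the input file (with -o T there are extra index files as well, plus a formatdb.log). A small sketch to confirm the three core files are there before searching; check_protein_db is a made-up helper name, not a BLAST tool.

```shell
#!/bin/sh
# Hypothetical helper (not part of BLAST): check that the three core files
# of a formatdb protein database exist for the given database name.
check_protein_db() {
    for ext in phr pin psq; do
        if [ ! -f "$1.$ext" ]; then
            echo "missing: $1.$ext" >&2   # report the first missing file
            return 1
        fi
    done
    echo "ok: $1"
}
```

Usage: check_protein_db refp_db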

----tp53.fa-----
>gi|187233964|gb|ACD01221.1| TP53 [Homo sapiens]
RAMAIYKQSQHMTEVVRRCPTNERCSDSDGLAPPQHLIR
----tp53.fa-----
$ blastall -p blastp -i tp53.fa -d refp_db
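
The command above prints the full pairwise report to the terminal. blastall also takes -m 8 for tab-separated output and -o for an output file, which is easier to post-process. A rough sketch for pulling the top hit out of such a file, assuming the usual 12-column -m 8 layout (subject ID in column 2, bit score in column 12); best_hit and tp53.tsv are made-up names.

```shell
#!/bin/sh
# Assumed way to produce the input file (needs legacy BLAST installed):
#   blastall -p blastp -i tp53.fa -d refp_db -m 8 -o tp53.tsv

# Hypothetical helper: print the subject ID and bit score of the row with
# the highest bit score in a tab-separated (-m 8) BLAST output file.
best_hit() {
    awk -F'\t' '
        NR == 1 || $12 + 0 > best { best = $12 + 0; id = $2; score = $12 }
        END { if (NR > 0) print id "\t" score }
    ' "$1"
}
```

Usage: best_hit tp53.tsv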
