Python: extracting the accession number and ORIGIN sequence from a gbk file

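# Read a GenBank file and write the accession number and sequence as FASTA.
with open("sequence.gbk") as file_r, open("sequence.fasta", "w") as file_w:
    flag = 0  # becomes 1 once the ORIGIN line has been seen
    for line in file_r:
        if line[0:9] == 'ACCESSION':
            # split() breaks the line on whitespace; [1] is the first token
            # after the keyword, [2] would be the one after that
            AC = line.split()[1]
            file_w.write('>' + AC + '\n')
        elif line[0:6] == 'ORIGIN':
            flag = 1
        elif line[0:2] == '//':
            # End of record: reset so a following record's header lines
            # are not written out as sequence
            flag = 0
        elif flag == 1:
            fields = line.split()
            if fields != []:  # skip blank lines
                # Drop the first element (index 0, the position number)
                # and join the remaining blocks of bases
                seq = ''.join(fields[1:])
                # upper() converts lowercase letters to uppercase
                file_w.write(seq.upper() + '\n')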
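The same logic can also be wrapped in a function that takes any iterable of lines, which makes it easier to try on a small in-memory example or on a file with several records. This is only a sketch: the name `gbk_to_fasta_records` is hypothetical and not part of the script above, the sample text is a shortened excerpt rather than the full record, and it assumes every record has an ACCESSION line before ORIGIN and ends with `//`.

```python
def gbk_to_fasta_records(lines):
    """Yield (accession, sequence) pairs from GenBank-formatted lines."""
    accession = None
    in_origin = False
    chunks = []
    for line in lines:
        if line.startswith("ACCESSION"):
            parts = line.split()
            accession = parts[1] if len(parts) > 1 else None
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            # End of record: emit what was collected and reset the state
            if accession is not None:
                yield accession, "".join(chunks).upper()
            accession, in_origin, chunks = None, False, []
        elif in_origin:
            # Skip the leading position number, keep the blocks of bases
            chunks.extend(line.split()[1:])


# Shortened, illustrative excerpt (not the full record)
sample = """ACCESSION   DQ199648
ORIGIN
        1 atgaaatttg catatgtggt
       21 tgtggtgatg
//
""".splitlines()

for acc, seq in gbk_to_fasta_records(sample):
    print(">" + acc)
    print(seq)
```

Unlike the script, this version returns each sequence as a single string, so the caller decides how to wrap the lines when writing the FASTA file.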

The gbk file:

LOCUS       DQ199648                 399 bp    mRNA    linear   PLN 01-OCT-2006
DEFINITION  Astragalus sinicus isolate AsE246 LTP-like protein 1 mRNA, complete
            cds.
ACCESSION   DQ199648
VERSION     DQ199648.1
KEYWORDS    .
SOURCE      Astragalus sinicus
  ORGANISM  Astragalus sinicus
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliopsida; eudicotyledons; Gunneridae;
            Pentapetalae; rosids; fabids; Fabales; Fabaceae; Papilionoideae; 50
            kb inversion clade; NPAAA clade; Hologalegina; IRL clade; Galegeae;
            Astragalus.
REFERENCE   1  (bases 1 to 399)
  AUTHORS   Chou,M.-X., Wei,X.-Y. and Zhou,J.-C.
  TITLE     Identification of new nodulin cDNAs from Astragalus sinicus by SSH
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 399)
  AUTHORS   Chou,M.-X., Wei,X.-Y. and Zhou,J.-C.
  TITLE     Direct Submission
  JOURNAL   Submitted (11-SEP-2005) National Key Laboratory of Agricultural
            Microbiology, Huazhong Agricultural University, Wuhan, Hubei
            430070, China
FEATURES             Location/Qualifiers
     source          1..399
                     /organism="Astragalus sinicus"
                     /mol_type="mRNA"
                     /isolate="AsE246"
                     /db_xref="taxon:47065"
     CDS             1..399
                     /codon_start=1
                     /product="LTP-like protein 1"
                     /protein_id="ABB13623.1"
                     /translation="MKFAYVVVVMCIMVVLNPSMTEAETISCREVVVTLTPCFPYLLSGYGPSQSCCEAIKSFKIVFKNKINGQIACNCMKKAAFFGLSNANAEALPEKCNVKMHYKINTSFDCTSIQDLKNVNVEKIQILQTLLV"
ORIGIN
        1 atgaaatttg catatgtggt tgtggtgatg tgcatcatgg tagtgttgaa tccatccatg
       61 actgaggcag aaacaattag ttgccgtgaa gtggtggtga cgctcactcc ttgcttccca
      121 tatttgctta gtggttatgg tccatcccaa tcttgttgtg aagcaattaa gagtttcaaa
      181 attgtcttta aaaacaaaat taacggtcaa atcgcctgta attgtatgaa aaaagcagcg
      241 ttttttgggt tgagcaacgc taatgctgaa gcactccctg aaaaatgcaa tgtcaaaatg
      301 cactacaaga tcaacacatc cttcgactgt accagcatac aagatctaaa gaacgtgaat
      361 gtggagaaga ttcagatact tcaaactttg ttggtctag
//
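To see what the script does with one sequence line of the ORIGIN block, the line beginning at position 61 can be run through the same `split()` and `join()` steps on its own. This is a small illustration, not part of the original script; the names `line`, `position` and `seq` are chosen only for this example.

```python
# One sequence line as it appears in the ORIGIN block of the record
line = "       61 actgaggcag aaacaattag ttgccgtgaa gtggtggtga cgctcactcc ttgcttccca\n"

fields = line.split()               # whitespace-separated tokens
position = fields[0]                # the leading position number
seq = "".join(fields[1:]).upper()   # blocks of bases joined and uppercased

print(position)
print(seq)

# The record terminator has no tokens after the first one,
# so the same steps give an empty string for it
terminator_seq = "".join("//".split()[1:])
print(repr(terminator_seq))
```

The last step shows why a `//` line that reaches the sequence branch would add an empty line to the FASTA file.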

The output file:

>DQ199648
ATGAAATTTGCATATGTGGTTGTGGTGATGTGCATCATGGTAGTGTTGAATCCATCCATG
ACTGAGGCAGAAACAATTAGTTGCCGTGAAGTGGTGGTGACGCTCACTCCTTGCTTCCCA
TATTTGCTTAGTGGTTATGGTCCATCCCAATCTTGTTGTGAAGCAATTAAGAGTTTCAAA
ATTGTCTTTAAAAACAAAATTAACGGTCAAATCGCCTGTAATTGTATGAAAAAAGCAGCG
TTTTTTGGGTTGAGCAACGCTAATGCTGAAGCACTCCCTGAAAAATGCAATGTCAAAATG
CACTACAAGATCAACACATCCTTCGACTGTACCAGCATACAAGATCTAAAGAACGTGAAT
GTGGAGAAGATTCAGATACTTCAAACTTTGTTGGTCTAG

