Python: extracting the accession number and ORIGIN sequence from a gbk file
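# Read a GenBank file and write each record's accession number and sequence as FASTA.
with open("sequence.gbk") as file_r, open("sequence.fasta", "w") as file_w:
    flag = 0  # 1 while we are inside an ORIGIN block
    for line in file_r:
        if line[0:9] == 'ACCESSION':
            # split() breaks the line on whitespace: [1] is the second field
            # (the accession number); [2] would be the third field
            AC = line.split()[1]
            file_w.write('>' + AC + '\n')
        elif line[0:6] == 'ORIGIN':
            flag = 1
        elif line[0:2] == '//':
            # '//' ends the record; stop collecting so that the terminator
            # and any following record header are not written as sequence
            flag = 0
        elif flag == 1:
            fields = line.split()
            if fields != []:  # skip blank lines
                # drop the first element (index 0, the position number)
                # and join the remaining 10-base blocks
                seq = ''.join(fields[1:])
                # upper() converts lowercase letters to uppercase
                file_w.write(seq.upper() + '\n')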
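To see why the script takes element [1] on the ACCESSION line and drops element [0] on the sequence lines, it may help to look at what split() returns for each kind of line. The following is only a small sketch: the two sample lines are shortened copies of lines from the record shown in this post, and the helper name bases_from_origin_line is made up for the illustration.

```python
def bases_from_origin_line(line):
    """Return the bases on one ORIGIN line, upper-cased, without the position number."""
    fields = line.split()                 # split on any run of whitespace
    return "".join(fields[1:]).upper()    # fields[0] is the position, e.g. "61"


accession_line = "ACCESSION   DQ199648\n"
origin_line = "       61 actgaggcag aaacaattag ttgccgtgaa\n"

print(accession_line.split())               # ['ACCESSION', 'DQ199648']
print(bases_from_origin_line(origin_line))  # ACTGAGGCAGAAACAATTAGTTGCCGTGAA
```

Because split() with no argument ignores leading spaces and repeated spaces, the same indexing works no matter how the GenBank columns are padded.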
The gbk file:
LOCUS       DQ199648                 399 bp    mRNA    linear   PLN 01-OCT-2006
DEFINITION  Astragalus sinicus isolate AsE246 LTP-like protein 1 mRNA, complete
            cds.
ACCESSION   DQ199648
VERSION     DQ199648.1
KEYWORDS    .
SOURCE      Astragalus sinicus
  ORGANISM  Astragalus sinicus
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliopsida; eudicotyledons; Gunneridae;
            Pentapetalae; rosids; fabids; Fabales; Fabaceae; Papilionoideae;
            50 kb inversion clade; NPAAA clade; Hologalegina; IRL clade;
            Galegeae; Astragalus.
REFERENCE   1  (bases 1 to 399)
  AUTHORS   Chou,M.-X., Wei,X.-Y. and Zhou,J.-C.
  TITLE     Identification of new nodulin cDNAs from Astragalus sinicus by SSH
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 399)
  AUTHORS   Chou,M.-X., Wei,X.-Y. and Zhou,J.-C.
  TITLE     Direct Submission
  JOURNAL   Submitted (11-SEP-2005) National Key Laboratory of Agricultural
            Microbiology, Huazhong Agricultural University, Wuhan, Hubei
            430070, China
FEATURES             Location/Qualifiers
     source          1..399
                     /organism="Astragalus sinicus"
                     /mol_type="mRNA"
                     /isolate="AsE246"
                     /db_xref="taxon:47065"
     CDS             1..399
                     /codon_start=1
                     /product="LTP-like protein 1"
                     /protein_id="ABB13623.1"
                     /translation="MKFAYVVVVMCIMVVLNPSMTEAETISCREVVVTLTPCFPYLLSGYGPSQSCCEAIKSFKIVFKNKINGQIACNCMKKAAFFGLSNANAEALPEKCNVKMHYKINTSFDCTSIQDLKNVNVEKIQILQTLLV"
ORIGIN
        1 atgaaatttg catatgtggt tgtggtgatg tgcatcatgg tagtgttgaa tccatccatg
       61 actgaggcag aaacaattag ttgccgtgaa gtggtggtga cgctcactcc ttgcttccca
      121 tatttgctta gtggttatgg tccatcccaa tcttgttgtg aagcaattaa gagtttcaaa
      181 attgtcttta aaaacaaaat taacggtcaa atcgcctgta attgtatgaa aaaagcagcg
      241 ttttttgggt tgagcaacgc taatgctgaa gcactccctg aaaaatgcaa tgtcaaaatg
      301 cactacaaga tcaacacatc cttcgactgt accagcatac aagatctaaa gaacgtgaat
      361 gtggagaaga ttcagatact tcaaactttg ttggtctag
//
The result file:
>DQ199648
ATGAAATTTGCATATGTGGTTGTGGTGATGTGCATCATGGTAGTGTTGAATCCATCCATG
ACTGAGGCAGAAACAATTAGTTGCCGTGAAGTGGTGGTGACGCTCACTCCTTGCTTCCCA
TATTTGCTTAGTGGTTATGGTCCATCCCAATCTTGTTGTGAAGCAATTAAGAGTTTCAAA
ATTGTCTTTAAAAACAAAATTAACGGTCAAATCGCCTGTAATTGTATGAAAAAAGCAGCG
TTTTTTGGGTTGAGCAACGCTAATGCTGAAGCACTCCCTGAAAAATGCAATGTCAAAATG
CACTACAAGATCAACACATCCTTCGACTGTACCAGCATACAAGATCTAAAGAACGTGAAT
GTGGAGAAGATTCAGATACTTCAAACTTTGTTGGTCTAG
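One quick sanity check on a result like this is to compare the total sequence length with the length given on the LOCUS line (399 bp for this record). The following is a minimal sketch, assuming the simple FASTA layout shown above; fasta_lengths is a hypothetical helper and not part of the original script, and the sample lines are a shortened stand-in for the real output.

```python
def fasta_lengths(lines):
    """Map each FASTA header (without '>') to the total length of its sequence."""
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # ignore blank lines
        if line.startswith(">"):
            current = line[1:]            # start counting a new record
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


# Shortened sample: two 10-base lines under one header.
sample = [">DQ199648\n", "ATGAAATTTG\n", "CATATGTGGT\n"]
print(fasta_lengths(sample))              # {'DQ199648': 20}
```

Passing an open file handle for sequence.fasta instead of the sample list should give a length for DQ199648 that matches the 399 bp on the LOCUS line.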
