Python|将CDS序列转为PEP序列
有蛋白编码基因的核苷酸序列,想要转化成对应的氨基酸序列,可以利用Python的Biopython Module来实现。
- 提取核苷酸序列信息
原始的保存着核苷酸序列的fasta文件里还有着序列的id等说明信息,我们只需要核苷酸序列,可以用for循环遍历每一行,将偶数行输出到一个新的文本文档中。
# opening the file
file1 = open('D:/.../PCGs/cytb/cytb.fas', 'r')# creating another file to store even lines
file2 = open('D:/.../PCGs/cytb/cytb_no_label.fas', 'w')# reading content of the files and writing even lines to another file
lines = file1.readlines()for i in range(0, len(lines)):if (i % 2 != 0):file2.write(lines[i])# closing the files
file1.close()
file2.close()
- 利用Biopython将CDS转为PEP
#importing the Biopython package
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC#opening the file
file1 = open('D:/.../PCGs/atp6/atp6_no_label.fas')
file2 = open('D:/.../PCGs/atp6/atp6_aa_no_label.fas', 'w')#creating a list to store the nucleotide sequence of each row
dataMat = []
for line in file1.readlines():curLine = line.strip().split(" ")dataMat.append(curLine[:])for i in dataMat[0:]:#list to stringj = "".join(i)coding_dna = Seq(j, IUPAC.unambiguous_dna)pep = coding_dna.translate(table="Invertebrate Mitochondrial")pep2 = str(pep)print(pep2)file2.write(pep2)file2.write("\n")
file2.close()
- 将输出的PEP序列添加每条序列原始信息
之前输出的PEP序列保存只有氨基酸排列信息,没有最初的序列id等说明信息了,可以再用下面的脚本添加上。
file1=open('D:/.../PCGs/nad6/nad6_aa_no_label.fas','r')
lines=[]
for line in file1:lines.append(line)
file1.close()file1=open('D:/.../PCGs/nad6/nad6_aa_no_label.fas','w')
lines.insert(0,'>td')
lines.insert(2,'>tj')
lines.insert(4,'>to')
lines.insert(6,'>tchi')
lines.insert(8,'>tcae')
lines.insert(10,'>tp')
lines.insert(12,'>ma')
s = '\n'.join(lines)
file1.write(s)
file1.close()
本文来自互联网用户投稿,文章观点仅代表作者本人,不代表本站立场,不承担相关法律责任。如若转载,请注明出处。 如若内容造成侵权/违法违规/事实不符,请点击【内容举报】进行投诉反馈!
