From CDS to pep

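Since several people have asked how to batch-convert CDS sequences into protein (pep) sequences, I have pulled a piece of my own code out of another script. I deliberately split it into two subroutines so that each can be used on its own. For example, if the input is in PHYLIP (.phy) format rather than plain FASTA, you only need to adjust the cds2pep subroutine; the code subroutine (the codon table) needs no changes.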
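A minimal sketch of the PHYLIP adaptation mentioned above, under stated assumptions: the input is relaxed sequential PHYLIP (a first line with the taxon and site counts, then one `name sequence` line per taxon, where the sequence may be split into space-separated blocks). Interleaved files and sequences wrapped over several lines are not handled. `read_phylip` and `translate` are hypothetical names, and the small `%mini` hash only stands in for the full `code()->{"standard"}` table.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical reader for relaxed sequential PHYLIP (assumption, not the original script):
#   line 1: "<ntaxa> <nsites>", then one "<name> <sequence>" line per taxon.
sub read_phylip {
    my ($fh) = @_;
    my $first = <$fh>;
    die "empty PHYLIP input\n" unless defined $first;
    my ($ntaxa) = $first =~ /^\s*(\d+)\s+\d+/ or die "bad PHYLIP header\n";
    my @records;
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;                 # skip blank lines
        my ($name, @chunks) = split ' ', $line;    # name, then sequence blocks
        push @records, [$name, join('', @chunks)];
        last if @records == $ntaxa;
    }
    return @records;
}

# Translate one CDS with a codon table (hash ref); unknown, gapped or partial codons -> "X".
sub translate {
    my ($seq, $table) = @_;
    $seq = uc $seq;
    my $pep = '';
    for (my $i = 0; $i < length $seq; $i += 3) {
        my $codon = substr($seq, $i, 3);
        $pep .= exists $table->{$codon} ? $table->{$codon} : 'X';
    }
    $pep =~ s/U$//;    # drop a terminal stop ("U"), as cds2pep does
    return $pep;
}

# Stand-in for code()->{"standard"}: only the codons used in this demo.
my %mini = (ATG => 'M', GCC => 'A', TGG => 'W', TAA => 'U', TAG => 'U', TGA => 'U');

my $phy = "2 9\nsp1 ATGGCCTAA\nsp2 ATGTGG---\n";
open(my $fh, '<', \$phy) or die $!;
for my $rec (read_phylip($fh)) {
    my ($name, $seq) = @$rec;
    print ">$name\n", translate($seq, \%mini), "\n";
}
```

Run as is, it prints `>sp1` / `MA` and `>sp2` / `MWX`: the terminal TAA is dropped and the gap codon `---` becomes X. Only the reading step differs from the FASTA version; the codon lookup stays the same, which is the point of keeping the table in its own subroutine.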

The script expects its input file to be in FASTA format.
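To show what "FASTA" means to this script, here is a hedged, standalone sketch of the same record-separator trick cds2pep relies on: with `$/` set to ">", each read returns one whole record, so multi-line sequences are easy to join. `read_fasta` is a hypothetical helper, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse FASTA into [header, sequence] pairs.
# With $/ = ">", every read returns the text up to (and including) the next ">".
sub read_fasta {
    my ($fh) = @_;
    local $/ = ">";
    <$fh>;                                       # discard everything up to the first ">"
    my @records;
    while (my $rec = <$fh>) {
        chomp $rec;                              # remove the ">" that starts the next record
        next unless $rec =~ /\S/;                # skip empty records
        my ($head, @lines) = split /\r?\n/, $rec;
        my $seq = join '', @lines;
        $seq =~ s/\s+//g;                        # drop any stray whitespace
        push @records, [$head, $seq];
    }
    return @records;
}

my $fasta = ">gene1 sample\nATGGCC\nTAA\n>gene2\nATG\n";
open(my $fh, '<', \$fasta) or die $!;
for my $r (read_fasta($fh)) {
    printf "%s\t%s\n", @$r;
}
```

It prints `gene1 sample` with `ATGGCCTAA` and `gene2` with `ATG`, tab-separated: the whole header line is kept and sequence lines are concatenated.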

Usage: perl cds2pep.pl input.cds.fa out.pep.fa

#! /usr/bin/perl -w
use strict;
die "Usage: perl $0 input.cds.fa out.pep.fa\n" unless @ARGV==2;
my $incds=shift;
my $outpep=shift;
### output the pep sequences ###
cds2pep($incds,$outpep);
#####################################
##############subroutine#############
#####################################
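sub cds2pep{
    my ($infile,$outfile)=@_;
    # "open IN,'<',$infile||die" never dies because || binds to $infile; use "or die"
    open(my $in,'<',$infile) or die "Cannot open $infile: $!\n";
    open(my $out_fh,'>',$outfile) or die "Cannot write $outfile: $!\n";
    my $p=code();
    local $/=">";                 # restored automatically when the sub returns
    <$in>;                        # discard everything up to the first ">"
    $/="\n";
    while(my $head=<$in>){
        $head=~s/\s+$//;          # header line without the leading ">" (also strips \r)
        $/=">";
        chomp(my $seq=<$in> // '');   # sequence lines up to the next ">", minus that ">"
        $/="\n";
        $seq=~s/\s+//g;           # join lines; drop \r and blanks
        $seq=uc $seq;             # accept lower-case input
        my $out='';
        for(my $i=0;$i<length $seq;$i+=3){
            my $codon=substr($seq,$i,3);
            # unknown codons (N, gaps, a trailing partial codon) become "X"
            $out.=exists $p->{"standard"}{$codon} ? $p->{"standard"}{$codon} : "X";
        }
        $out=~s/U$//;             # drop the terminal stop ("U" in code())
        my $len=length $out;
        $out=~s/([A-Z]{50})/$1\n/g;   # wrap at 50 residues per line
        chop $out unless $len % 50;   # no empty line when the length is a multiple of 50
        print $out_fh ">$head [translate_table: standard]\n$out\n";
    }
    close $in;
    close $out_fh;
}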
#####################################
sub code{
    # codon => one-letter amino acid; this table writes stop codons as "U"
    my $p={
        "standard" => {
            'GCA' => 'A', 'GCC' => 'A', 'GCG' => 'A', 'GCT' => 'A',                               # Alanine
            'TGC' => 'C', 'TGT' => 'C',                                                           # Cysteine
            'GAC' => 'D', 'GAT' => 'D',                                                           # Aspartic acid
            'GAA' => 'E', 'GAG' => 'E',                                                           # Glutamic acid
            'TTC' => 'F', 'TTT' => 'F',                                                           # Phenylalanine
            'GGA' => 'G', 'GGC' => 'G', 'GGG' => 'G', 'GGT' => 'G',                               # Glycine
            'CAC' => 'H', 'CAT' => 'H',                                                           # Histidine
            'ATA' => 'I', 'ATC' => 'I', 'ATT' => 'I',                                             # Isoleucine
            'AAA' => 'K', 'AAG' => 'K',                                                           # Lysine
            'CTA' => 'L', 'CTC' => 'L', 'CTG' => 'L', 'CTT' => 'L', 'TTA' => 'L', 'TTG' => 'L',   # Leucine
            'ATG' => 'M',                                                                         # Methionine
            'AAC' => 'N', 'AAT' => 'N',                                                           # Asparagine
            'CCA' => 'P', 'CCC' => 'P', 'CCG' => 'P', 'CCT' => 'P',                               # Proline
            'CAA' => 'Q', 'CAG' => 'Q',                                                           # Glutamine
            'CGA' => 'R', 'CGC' => 'R', 'CGG' => 'R', 'CGT' => 'R', 'AGA' => 'R', 'AGG' => 'R',   # Arginine
            'TCA' => 'S', 'TCC' => 'S', 'TCG' => 'S', 'TCT' => 'S', 'AGC' => 'S', 'AGT' => 'S',   # Serine
            'ACA' => 'T', 'ACC' => 'T', 'ACG' => 'T', 'ACT' => 'T',                               # Threonine
            'GTA' => 'V', 'GTC' => 'V', 'GTG' => 'V', 'GTT' => 'V',                               # Valine
            'TGG' => 'W',                                                                         # Tryptophan
            'TAC' => 'Y', 'TAT' => 'Y',                                                           # Tyrosine
            'TAA' => 'U', 'TAG' => 'U', 'TGA' => 'U',                                             # Stop
        },
        ## more translation tables could be added here in future
    };
    return $p;
}
__END__
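The comment in `code()` leaves room for more translation tables. As a hedged sketch (not part of the original script), the vertebrate mitochondrial code (NCBI translation table 2) can be derived from the standard table by overriding the four codons that differ: ATA → M, TGA → W, and AGA/AGG → stop (written "U" here to match `code()`). The helper `table_from_string` and the key `vertebrate_mito` are hypothetical, and cds2pep as written always uses `"standard"`, so it would also need a table-name argument to pick this one up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a codon => amino-acid hash from a 64-letter string
# listed in TCAG order (the compact layout NCBI uses for its genetic-code tables).
# '*' (stop) is turned into 'U' to match the stop symbol used by code().
sub table_from_string {
    my ($aas) = @_;
    my @b = qw(T C A G);
    my %t;
    my $k = 0;
    for my $x (@b) {
        for my $y (@b) {
            for my $z (@b) {
                (my $aa = substr($aas, $k++, 1)) =~ tr/*/U/;
                $t{"$x$y$z"} = $aa;
            }
        }
    }
    return \%t;
}

my $p = {
    standard => table_from_string('FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'),
};

# Hypothetical extra table: vertebrate mitochondrial code (NCBI table 2),
# i.e. a copy of the standard code with four codons reassigned.
$p->{vertebrate_mito} = { %{ $p->{standard} }, ATA => 'M', TGA => 'W', AGA => 'U', AGG => 'U' };

for my $codon (qw(ATA TGA AGA)) {
    print "$codon: standard=$p->{standard}{$codon} vertebrate_mito=$p->{vertebrate_mito}{$codon}\n";
}
```

It prints `ATA: standard=I vertebrate_mito=M`, `TGA: standard=U vertebrate_mito=W` and `AGA: standard=R vertebrate_mito=U`. Copying the standard hash and overriding only the reassigned codons keeps each extra table short and makes the differences easy to audit.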