GATK4 download and installation
1. Download the installation package
Click to download from the GATK4 release page; the version used here is gatk-4.4.0.0.zip.
wget https://github.com/broadinstitute/gatk/releases/download/4.4.0.0/gatk-4.4.0.0.zip  # download the release archive
unzip gatk-4.4.0.0.zip  # unpack the archive
cd gatk-4.4.0.0/
ls -hlt  # inspect the package contents
echo 'export PATH=$PATH:/path/to/gatk-4.4.0.0' >> ~/.bashrc  # replace /path/to with the actual unpack location
source ~/.bashrc
gatk -h  # check that gatk runs correctly
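The PATH step above can be sanity-checked without restarting the shell. This is a minimal sketch; the `GATK_HOME` location is a made-up example standing in for wherever you actually unpacked the zip:

```shell
# Hypothetical install prefix -- adjust to your real unpack location.
GATK_HOME="$HOME/tools/gatk-4.4.0.0"
mkdir -p "$GATK_HOME"            # stands in for the unpacked release directory
export PATH="$PATH:$GATK_HOME"   # same effect as the ~/.bashrc line above
case ":$PATH:" in
  *":$GATK_HOME:"*) echo "gatk directory is on PATH" ;;
  *)                echo "PATH update failed" ;;
esac
```

If the directory is on PATH, the shell will find the `gatk` wrapper script inside it.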
2. Release notes
4.4.0.0
Download release: gatk-4.4.0.0.zip
Docker image: https://hub.docker.com/r/broadinstitute/gatk/
Highlights of the 4.4.0.0 release:
- We've moved to Java 17, the latest long-term support (LTS) Java release, for building and running GATK! Previously we required Java 8, which is now end-of-life.
  - Newer non-LTS Java releases such as Java 18 or Java 19 may work as well, but since they are untested by us we only officially support running with Java 17.
- Significant enhancements to SelectVariants, including arguments to enable GVCF filtering support and to work with genotype fields more easily.
- A new tool, SVConcordance, that calculates SV genotype concordance between an "evaluation" VCF and a "truth" VCF.
- Bug fixes and enhancements to the support for the Ultima Genomics flow-based sequencing platform introduced in GATK 4.3.0.0.
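Since GATK 4.4 requires Java 17, it is worth checking the local JVM before running anything. A hedged sketch of such a check; the sample version string is a stand-in for real `java -version` output (which you would capture with `java -version 2>&1 | head -n 1`):

```shell
# Extract the major version from a 'java -version'-style string.
# Sample string used here so the snippet runs without a JVM installed.
ver_line='openjdk version "17.0.8" 2023-07-18'
major=$(printf '%s\n' "$ver_line" | sed -E 's/.*"([0-9]+)[."].*/\1/')
if [ "$major" -ge 17 ]; then
  echo "Java $major: OK for GATK 4.4"
else
  echo "Java $major: too old, GATK 4.4 needs Java 17"
fi
```

The `[."]` in the pattern also handles legacy `"1.8.0_..."` strings, which would report major version 1.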
Full list of changes:
- Flow-based Variant Calling
  - FlowFeatureMapper: added surrounding-median-quality-size feature (#8222)
  - Removed hardcoded limit on max homopolymer call (#8088)
  - Fixed bug in dynamic read disqualification (#8171)
  - Fixed a bug in the parsing of the T0 tag (#8185)
  - Updated flow-based calling Mutect2 parameters to make them consistent with the HaplotypeCaller parameters (#8186)
- SelectVariants
  - Enabled GVCF type filtering support in SelectVariants (#7193)
    - Added an optional argument --ignore-non-ref-in-types to support correct handling of VariantContexts that contain a NON_REF allele. This is necessary because every variant in a GVCF file would otherwise be assigned the type MIXED, which makes it impossible to filter for e.g. SNPs.
    - Note that this only enables correct handling of GVCF input. The filtered output files are VCF (not GVCF) files, since reference blocks are not extended when a variant is filtered out.
  - SelectVariants: added new arguments for controlling genotype JEXL filtering (#8092)
    - --select-genotype: with this new genotype-specific JEXL argument, we support easily filtering by genotype fields with expressions like 'GQ > 0', where the behavior in the multi-sample case is 'GQ > 0' in at least one sample. It's still possible to manually access genotype fields using the old -select argument and expressions such as vc.getGenotype('NA12878').getGQ() > 0.
    - --apply-jexl-filters-first: This flag is provided to allow the user to do JEXL filtering before subsetting the format fields, in particular the case where the filtering is done on INFO fields only, which may improve speed when working with a large cohort VCF that contains genotypes for thousands of samples.
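The "at least one sample" semantics of the new genotype filtering can be illustrated with a toy stand-in (the real tool invocation would be along the lines of `gatk SelectVariants --select-genotype 'GQ > 20' ...`; the two-sample VCF body below is invented for illustration):

```shell
# Keep records where at least one sample has GQ > 20, mimicking the
# multi-sample 'any' semantics of --select-genotype. FORMAT is GT:GQ,
# so the GQ value is the second colon-separated subfield.
printf '%s\n' \
  'chr1 100 . A G 50 PASS . GT:GQ 0/1:35 0/0:10' \
  'chr1 200 . C T 50 PASS . GT:GQ 0/1:5 0/1:8' > toy_body.txt
kept=$(awk '{ keep = 0
              for (i = 10; i <= NF; i++) { split($i, f, ":"); if (f[2] + 0 > 20) keep = 1 }
              if (keep) print }' toy_body.txt)
echo "$kept"
```

The first record survives (sample 1 has GQ 35); the second is dropped because no sample exceeds the threshold.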
- SV Calling
  - Added a new tool, SVConcordance, that calculates SV genotype concordance between an "evaluation" VCF and a "truth" VCF (#7977)
  - Recognize MEI DELs with ALT format DEL:ME in SVAnnotate (#8125)
  - Don't sort rejected reads output from AnalyzeSaturationMutagenesis (#8053)
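The metric SVConcordance reports can be sketched in miniature: the fraction of shared sites where the "eval" genotype matches the "truth" genotype (the real tool does much more, including SV matching by position and type; the site IDs and genotypes below are invented):

```shell
# Toy genotype-concordance calculation between a truth set and an eval set.
printf '%s\n' 'site1 0/1' 'site2 1/1' 'site3 0/0' > truth.txt
printf '%s\n' 'site1 0/1' 'site2 0/1' 'site3 0/0' > eval.txt
# join pairs records by site ID; awk counts matching genotypes.
conc=$(join truth.txt eval.txt \
  | awk '{ n++; if ($2 == $3) m++ } END { printf "%.2f", m / n }')
echo "concordance=$conc"
```

Here two of three genotypes agree, so the reported concordance is 0.67.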
- Notable Enhancements
  - GenotypeGVCFs: added a --keep-specific-combined-raw-annotation argument to keep specified raw annotations (#7996)
  - VariantAnnotator now warns instead of fails when the variant contains too many alleles (#8075)
  - Read filters now output total reads processed in addition to the number of reads filtered (#7947)
  - Added GenomicsDB arguments to the CreateSomaticPanelOfNormals tool (#6746)
  - Added a DeprecatedFeature annotation and a process for officially marking GATK tools as deprecated (#8100)
  - Prevent tool close() methods from hiding underlying errors (#7764)
- Bug Fixes
  - Fixed issue causing VariantRecalibrator to sometimes fail if user provided duplicate -an options (#8227)
  - ReblockGVCF: remove A, R, and G length attributes when ReblockGVCF subsets an allele (#8209)
    - Previously, if an input gVCF had allele-length, reference-length, or genotype-length annotations in the FORMAT field, ReblockGVCF would not remove all of them at sites where an allele was dropped. This makes the output gVCF invalid, since the annotation length no longer matches the length described in the header at those sites. Now we fix up F1R2, F2R1, and AF annotations and remove any other annotations, not already handled, that are defined as A, R, or G length in the header.
  - Fixed a gCNV bug that breaks the inference when only 2 intervals are provided (#8180)
  - Fixed NPE from uninitialized logger in GenotypingEngine (#8159)
  - Fixed asynchronous Python exception propagation in StreamingPythonExecutor / CNNScoreVariants (#7402)
  - Fixed issue in ShiftFasta where the interval list output was never written (#8070)
  - Bugfix for the type of some output files in the somatic CNV WDL (#6735) (#8130)
  - MergeAnnotatedRegions now requires a reference as asserted in its documentation (#8067)
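The ReblockGVCF fix above concerns FORMAT fields declared with Number=A, R, or G, whose value counts must track the allele list. A small sketch of the idea, scanning a few made-up header lines for the keys that would need fixing when an allele is dropped:

```shell
# List FORMAT keys whose Number is A (alt alleles), R (all alleles),
# or G (genotypes); these must be subset or removed when alleles change.
printf '%s\n' \
  '##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths">' \
  '##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">' \
  '##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Genotype likelihoods">' \
  '##FORMAT=<ID=AF,Number=A,Type=Float,Description="Allele fraction">' > hdr.txt
allele_sized=$(sed -E -e 's/.*ID=([^,]+),Number=([ARG]),.*/\1/' -e 't' -e 'd' hdr.txt)
echo "$allele_sized"
```

GQ has Number=1 and is unaffected; AD, PL, and AF are the allele-sized fields.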
- Miscellaneous Changes
  - Deprecated an untested VariantRecalibrator argument and an old ReblockGVCF argument that produced invalid GVCFs (#8140)
  - Removed old GnarlyGenotyper code with a diploid assumption to prepare for adding haploid support to GnarlyGenotyper (#8140)
  - ReblockGVCF: add error message for when tree-score-threshold is set but the TREE_SCORE annotation is not present (#8218)
  - TransferReadTags: allow empty unaligned bams as input (#8198)
  - Refactored JointVcfFiltering WDL and expanded tests (#8074)
  - Updated the carrot github action workflow to the most recent version, which supports using #carrot_pr to trigger branch vs master comparison runs (#8084)
  - Replaced uses of File.createTempFile() with IOUtils.createTempFile() to ensure that temp files are deleted on shutdown (#6780)
  - Don't require python just to instantiate the CNNScoreVariants tool classes (#8128)
  - Made several Funcotator methods and fields protected so it is easier to extend the tool (#8124) (#8166)
  - Test for presence of ack result message and simplify ProcessControllerAckResult API (#7816)
  - Fixed the path reported by the gatkbot when there are test failures (#8069)
  - Fixed incorrect boolean value in DirichletAlleleDepthAndFractionIntegrationTest (#7963)
  - Removed two ancient and unused HaplotypeCaller test files that are no longer needed (#7634)
  - Added scattered gCNV case WDL to dockstore file (#8217)
- Documentation
  - Updated instructions for installing Java in the README (#8089)
  - Added documentation on OMP_NUM_THREADS and MKL_NUM_THREADS to GermlineCNVCaller and DetermineGermlineContigPloidy (#8223)
  - Improvements to PileupDetectionArgumentCollection documentation (#8050)
  - Fixed typo in documentation for VariantAnnotator (#8145)
- Dependencies
  - Moved to Java 17, the latest LTS Java release, for building/running GATK (#8035)
  - Updated Gradle to 7.5.1 (#8098)
  - Updated the GATK base docker image to 3.0.0 (#8228)
  - Updated HTSJDK to 3.0.5 (#8035)
  - Updated Picard to 3.0.0 (#8035)
  - Updated Barclay to 5.0.0 (#8035)
  - Updated GenomicsDB to 1.4.4 (#7978)
  - Updated Spark to 3.3.1 (#8035)
  - Updated Hadoop to 3.3.1 (#8102)
  - Require commons-text 1.10.0 to fix a security vulnerability (#8071)
3. Help documentation
[hgzhong@head01 gatk-4.4.0.0]$ gatk -h
Usage template for all tools (uses --spark-runner LOCAL when used with a Spark tool)
    gatk AnyTool toolArgs
Usage template for Spark tools (will NOT work on non-Spark tools)
    gatk SparkTool toolArgs [ -- --spark-runner <LOCAL|SPARK|GCS> sparkArgs ]
Getting help
    gatk --list       Print the list of available tools
    gatk Tool --help  Print help on a particular tool
Configuration File Specification
    --gatk-config-file PATH/TO/GATK/PROPERTIES/FILE
gatk forwards commands to GATK and adds some sugar for submitting spark jobs
    --spark-runner <target> controls how spark tools are run
        valid targets are:
        LOCAL: run using the in-memory spark runner
        SPARK: run using spark-submit on an existing cluster
            --spark-master must be specified
            --spark-submit-command may be specified to control the Spark submit command
            arguments to spark-submit may optionally be specified after --
        GCS: run using Google cloud dataproc
            commands after the -- will be passed to dataproc
            --cluster <your-cluster> must be specified after the --
            spark properties and some common spark-submit parameters will be translated to dataproc equivalents
    --dry-run may be specified to output the generated command line without running it
    --java-options 'OPTION1[ OPTION2=Y ... ]' optional - pass the given string of options to the java JVM at runtime.
        Java options MUST be passed inside a single string with space-separated values.
    --debug-port sets up a Java VM debug agent to listen to debugger connections on a
        particular port number. This in turn will add the necessary java VM arguments
        so that you don't need to explicitly indicate these using --java-options.
    --debug-suspend sets the Java VM debug agent up so that the run gets immediately suspended
        waiting for a debugger to connect. By default the port number is 5005 but
        can be customized using --debug-port
4. Install the required packages
conda install -c bioconda fastqc fastp multiqc trimmomatic bwa samtools sambamba bcftools vcftools gffread  # all of these packages live in bioconda, so the channel is set to bioconda
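After the install, a quick loop can confirm which of these tools are actually visible on PATH (the tool list simply mirrors the conda command above; missing tools are reported rather than treated as fatal):

```shell
# Check each pipeline tool with command -v; collect any that are missing.
checked=0; missing=""
for tool in fastqc fastp multiqc trimmomatic bwa samtools sambamba bcftools vcftools gffread; do
  checked=$((checked + 1))
  command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
done
if [ -n "$missing" ]; then
  echo "missing:$missing"
else
  echo "all $checked tools found"
fi
```

If anything is reported missing, re-activate the conda environment or rerun the install for those packages.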
