VariantAnnotator
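Overview
Purpose: Annotate variant calls with context information
Category: Variant manipulation tools
Summary: Annotates variant calls based on their context (as opposed to functional annotation). Many annotation modules are available (see the Annotation modules section).
Input
A VCF file to annotate and, optionally, a BAM file
Output
An annotated VCF file
Usage example
Add per-sample depth and dbSNP ID information to the output of HaplotypeCaller or UnifiedGenotyper.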
java -jar GenomeAnalysisTK.jar \
    -R reference.fasta \
    -T VariantAnnotator \
    -I input.bam \
    -V input.vcf \
    -o output.vcf \
    -A Coverage \
    --dbsnp dbsnp.vcf
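Parameters:
-R/--reference_sequence: the reference genome
-T/--analysis_type: the tool to run
-I/--input_file: the BAM file corresponding to the VCF
-o: the output file
-V/--variant: the input VCF file
-A/--annotation: the annotation(s) to add
--dbsnp: a dbSNP VCF of known variants, used to fill in dbSNP IDs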
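Because -A can be given more than once, a command that requests several annotation modules is easy to assemble in a loop. The following is a minimal sketch, not an official wrapper: the function name `build_annotator_cmd` and all file names are illustrative assumptions, and it only prints the command (a dry run) so it can be inspected before being executed.

```shell
#!/bin/sh
# Dry-run helper: print a GATK 3 VariantAnnotator command that requests
# several annotation modules. The function name and the file names are
# hypothetical; paths containing spaces are not handled by this sketch.
build_annotator_cmd() {
    in_vcf=$1
    in_bam=$2
    out_vcf=$3
    shift 3
    cmd="java -jar GenomeAnalysisTK.jar -T VariantAnnotator -R reference.fasta"
    cmd="$cmd -I $in_bam -V $in_vcf -o $out_vcf"
    for module in "$@"; do
        cmd="$cmd -A $module"    # one -A flag per annotation module
    done
    printf '%s\n' "$cmd"
}

# Print the command for two modules taken from the list of annotations.
build_annotator_cmd input.vcf input.bam output.vcf Coverage FisherStrand
```

Once the printed command looks right, it can be copied to the terminal or passed to `sh -c`.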
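Note: HaplotypeCaller and MuTect2 also accept the -A option, and some annotation modules can only be computed by HaplotypeCaller and MuTect2, for example StrandAlleleCountsBySample.
The values accepted by -A are listed below: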
Standard annotations in the list below are marked with a '*'.

Available annotations for the VCF INFO field:
  AS_BaseQualityRankSumTest
  AS_FisherStrand
  AS_InbreedingCoeff
  AS_InsertSizeRankSum
  AS_MQMateRankSumTest
  AS_MappingQualityRankSumTest
  AS_QualByDepth
  AS_RMSMappingQuality
  AS_ReadPosRankSumTest
  AS_StrandOddsRatio
  AlleleBalance
  BaseCounts
  *BaseQualityRankSumTest
  *ChromosomeCounts
  ClippingRankSumTest
  ClusteredReadPosition
  *Coverage
  *ExcessHet
  *FisherStrand
  FractionInformativeReads
  GCContent
  GenotypeSummaries
  *HaplotypeScore
  HardyWeinberg
  HomopolymerRun
  *InbreedingCoeff
  LikelihoodRankSumTest
  LowMQ
  MVLikelihoodRatio
  *MappingQualityRankSumTest
  MappingQualityZero
  NBaseCount
  PossibleDeNovo
  *QualByDepth
  *RMSMappingQuality
  *ReadPosRankSumTest
  SampleList
  SnpEff
  SpanningDeletions
  *StrandOddsRatio
  TandemRepeatAnnotator
  TransmissionDisequilibriumTest
  VariantType

Available annotations for the VCF FORMAT field:
  AlleleBalanceBySample
  AlleleCountBySample
  BaseCountsBySample
  BaseQualitySumPerAlleleBySample
  *DepthPerAlleleBySample
  DepthPerSampleHC
  MappingQualityZeroBySample
  OxoGReadCounts
  StrandAlleleCountsBySample
  StrandBiasBySample

Available classes/groups of annotations:
  AS_RMSAnnotation
  AS_RankSumTest
  AS_StandardAnnotation
  AS_StrandBiasTest
  ActiveRegionBasedAnnotation
  BetaTestingAnnotation
  ExperimentalAnnotation
  RMSAnnotation
  RankSumTest
  ReducibleAnnotation
  RodRequiringAnnotation
  StandardAnnotation
  StandardHCAnnotation
  StandardSomaticAnnotation
  StandardUGAnnotation
  StrandBiasTest
  WorkInProgressAnnotation
Annotation modules
The annotation modules listed in the official documentation:
| Name | Summary | 
|---|---|
| AS_BaseQualityRankSumTest | Allele-specific Rank Sum Test of REF versus ALT base quality scores | 
| AS_FisherStrand | Allele-specific strand bias estimated using Fisher's Exact Test | 
| AS_InbreedingCoeff | Allele-specific likelihood-based test for the inbreeding among samples | 
| AS_InsertSizeRankSum | Allele-specific Rank Sum Test for insert sizes of REF versus ALT reads | 
| AS_MQMateRankSumTest | Allele-specific Rank Sum Test for mate's mapping qualities of REF versus ALT reads | 
| AS_MappingQualityRankSumTest | Allele-specific Rank Sum Test for mapping qualities of REF versus ALT reads | 
| AS_QualByDepth | Allele-specific call confidence normalized by depth of sample reads supporting the allele | 
| AS_RMSMappingQuality | Allele-specific Root Mean Square of the mapping quality of reads across all samples. | 
| AS_ReadPosRankSumTest | Allele-specific Rank Sum Test for relative positioning of REF versus ALT allele within reads | 
| AS_StrandOddsRatio | Allele-specific strand bias estimated by the Symmetric Odds Ratio test | 
| AlleleBalance | Allele balance across all samples | 
| AlleleBalanceBySample | Allele balance per sample | 
| AlleleCountBySample | Allele count and frequency expectation per sample | 
| BaseCounts | Count of A, C, G, T bases across all samples | 
| BaseCountsBySample | Count of A, C, G, T bases for each sample | 
| BaseQualityRankSumTest | Rank Sum Test of REF versus ALT base quality scores | 
| BaseQualitySumPerAlleleBySample | Sum of evidence in reads supporting each allele for each sample | 
| ChromosomeCounts | Counts and frequency of alleles in called genotypes | 
| ClippingRankSumTest | Rank Sum Test for hard-clipped bases on REF versus ALT reads | 
| ClusteredReadPosition | Detect clustering of variants near the ends of reads | 
| Coverage | Total depth of coverage per sample and over all samples. | 
| DepthPerAlleleBySample | Depth of coverage of each allele per sample | 
| DepthPerSampleHC | Depth of informative coverage for each sample. | 
| ExcessHet | Phred-scaled p-value for exact test of excess heterozygosity | 
| FisherStrand | Strand bias estimated using Fisher's Exact Test | 
| FractionInformativeReads | The fraction of reads deemed informative over the entire cohort | 
| GCContent | GC content of the reference around the given site | 
| GenotypeSummaries | Summarize genotype statistics from all samples at the site level | 
| HaplotypeScore | Consistency of the site with strictly two segregating haplotypes | 
| HardyWeinberg | Hardy-Weinberg test for transmission disequilibrium | 
| HomopolymerRun | Largest contiguous homopolymer run of the variant allele | 
| InbreedingCoeff | Likelihood-based test for the inbreeding among samples | 
| LikelihoodRankSumTest | Rank Sum Test of per-read likelihoods of REF versus ALT reads | 
| LowMQ | Proportion of low quality reads | 
| MVLikelihoodRatio | Likelihood of being a Mendelian Violation | 
| MappingQualityRankSumTest | Rank Sum Test for mapping qualities of REF versus ALT reads | 
| MappingQualityZero | Count of all reads with MAPQ = 0 across all samples | 
| MappingQualityZeroBySample | Count of reads with mapping quality zero for each sample | 
| NBaseCount | Percentage of N bases | 
| OxoGReadCounts | Count of read pairs in the F1R2 and F2R1 configurations supporting the reference and alternate alleles | 
| PossibleDeNovo | Existence of a de novo mutation in at least one of the given families | 
| QualByDepth | Variant call confidence normalized by depth of sample reads supporting a variant | 
| RMSMappingQuality | Root Mean Square of the mapping quality of reads across all samples. | 
| ReadPosRankSumTest | Rank Sum Test for relative positioning of REF versus ALT alleles within reads | 
| SampleList | List samples that are non-reference at a given site | 
| SnpEff | Top effect from SnpEff functional predictions | 
| SpanningDeletions | Fraction of reads containing spanning deletions | 
| StrandAlleleCountsBySample | Number of forward and reverse reads that support each allele | 
| StrandBiasBySample | Number of forward and reverse reads that support REF and ALT alleles | 
| StrandOddsRatio | Strand bias estimated by the Symmetric Odds Ratio test | 
| TandemRepeatAnnotator | Tandem repeat unit composition and counts per allele | 
| TransmissionDisequilibriumTest | Wittkowski transmission disequilibrium test | 
| VariantType | General category of variant |
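
After annotating, one quick sanity check is to confirm that the output VCF header declares the expected INFO key. Module names and VCF keys differ: Coverage writes DP, FisherStrand writes FS, and QualByDepth writes QD. The sketch below uses a hypothetical helper `has_info_key` and a tiny made-up header in place of real tool output.

```shell
#!/bin/sh
# Return success if the VCF header declares the given INFO key.
# The helper name and the sample header are illustrative assumptions.
has_info_key() {
    grep -q "^##INFO=<ID=$2," "$1"
}

# Demonstration on a minimal, hand-written header.
vcf=$(mktemp)
cat > "$vcf" <<'EOF'
##fileformat=VCFv4.2
##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth">
EOF
for key in DP FS; do
    if has_info_key "$vcf" "$key"; then
        echo "$key: declared"
    else
        echo "$key: missing"
    fi
done
rm -f "$vcf"
# prints "DP: declared" then "FS: missing"
```

A key that is reported missing suggests the corresponding module was not requested with -A or could not be computed from the inputs provided.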