PURPLE Summary
PURPLE outputs a QC status along with a summary for the inferred purity and ploidy of the somatic sample. The ‘QC Status’ field reflects how we have determined the purity of the sample. A ‘FAIL’ or ‘WARN’ QC status can be attributed to several factors:
FAIL_CONTAMINATION
: measured contamination in the tumor (by AMBER) is >10%.
FAIL_NO_TUMOR
: no evidence of tumor is found in the sample, when all following criteria are satisfied:
WARN_DELETED_GENES
: more than 280 homozygously deleted genes. This sometimes occurs in samples with very high MB-scale GC bias, and particularly affects high GC content regions such as chromosome 19. It may also indicate a poor fit.
WARN_HIGH_COPY_NUMBER_NOISE
: more than 220 copy number segments not supported at either end by SV breakpoints. Indicates samples with extreme GC bias, with differences in depth of >= 10x between high and low GC regions. GC normalisation is unreliable when the corrections are this extreme, so it is recommended to fail the sample (deletions or amplifications may be miscalled, and sensitivity may be poor in high GC regions).
WARN_GENDER_MISMATCH
: if the AMBER and COBALT inferred genders are inconsistent then the COBALT one is used but the sample is failed.
WARN_LOW_PURITY
: fitted purity < 20%.

Variable | Value | Details |
---|---|---|
QC_Status | WARN_DELETED_GENES | See 'Description'. |
Purity | 0.26 (0.25-0.4) | Purity of tumor in the sample (and min-max with score within 10% of best). |
Ploidy | 2.6 (2.52-2.94) | Average ploidy of tumor sample after adjusting for purity (and min-max with score within 10% of best). |
Gender | MALE | Gender as inferred by AMBER/COBALT. |
WGD | FALSE | Whole genome duplication (more than 10 autosomes have average major allele ploidy > 1.5). |
PolyclonalProp | 0.68 | Proportion of CN regions that are more than 0.25 from a whole CN |
DiploidyProp | 0 (0-0.2) | Proportion of CN regions that have 1 (+- 0.2) minor and major allele. |
TMB | NAN (LOW) | Tumor mutational burden (# PASS variants per Megabase) (Status: 'HIGH' (>10 PASS per Mb), 'LOW' or 'UNKNOWN'). |
TML | 0 (LOW) | Tumor mutational load (# of missense variants) (Status: 'HIGH', 'LOW' or 'UNKNOWN'). |
TMB-SV | 0 | # of non inferred, non single passing SVs. |
Method | NORMAL | Fit method (NORMAL, HIGHLY_DIPLOID, SOMATIC or NO_TUMOR). |
CopyNumberSegments | 47 (Unsupported: 0) | # of CN segments. |
DeletedGenes | 1814 | # of homozygously deleted genes. |
Contamination | 0.0 | Rate of contamination in tumor sample as determined by AMBER. |
GermlineAberrations | NONE | Can be one or more of: KLINEFELTER, TRISOMY_X/21/13/18/15, XYY, MOSAIC_X. |
AmberMeanDepth | 66 | Mean depth as determined by AMBER. |
CNVs (Somatic) | Min: -1.6; Max: 9; N: 47 | |
Genome | b37 | Human genome assembly used. |
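The threshold-based QC flags described above can be sketched as a simple set of comparisons. This is a minimal illustration, not PURPLE's implementation: the `PurpleSummary` class and `qc_flags` function are hypothetical names, and `FAIL_NO_TUMOR` and `WARN_GENDER_MISMATCH` are left out because they depend on inputs not listed in the summary table.

```python
from dataclasses import dataclass


@dataclass
class PurpleSummary:
    """Hypothetical container for the summary values used by the QC checks."""
    contamination: float        # fraction of contamination reported by AMBER
    deleted_genes: int          # number of homozygously deleted genes
    unsupported_segments: int   # CN segments without SV support
    purity: float               # fitted purity as a fraction


def qc_flags(summary):
    """Return the QC flags implied by the thresholds quoted in the report."""
    flags = []
    if summary.contamination > 0.10:
        flags.append("FAIL_CONTAMINATION")
    if summary.deleted_genes > 280:
        flags.append("WARN_DELETED_GENES")
    if summary.unsupported_segments > 220:
        flags.append("WARN_HIGH_COPY_NUMBER_NOISE")
    if summary.purity < 0.20:
        flags.append("WARN_LOW_PURITY")
    return flags


# Values taken from the summary table above.
sample = PurpleSummary(contamination=0.0, deleted_genes=1814,
                       unsupported_segments=0, purity=0.26)
print(qc_flags(sample))
```

With the values in the table, only the deleted-genes threshold is exceeded (1814 > 280), which agrees with the reported `WARN_DELETED_GENES` status.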
Summarised below are the allele frequencies (AFs) for somatic variants detected genome-wide (Global) vs. within the coding sequence of ~1,100 cancer genes (Key Genes CDS). AFs range from 0 to 1, or 0%-100% (we filter out all novel variants with AF < 10%).
The following post-processing steps occur:
somatic_vcf_annotate
: annotate VCF against databases of known hotspots, germline variants, low mappability regions, UMCCR panel of normals
somatic_vcf_filter
: filter VCF to remove germline variants and artefacts, but keep known hotspots
subset_to_giab
: keep variants in ‘high confidence’ regions as determined by the Genome in a Bottle consortium
afs
: grab only the INFO/TUMOR_AF field and output to final txt file
afs_keygenes
: grab the CHROM, POS, ID, REF, ALT and INFO/TUMOR_AF for variants in the UMCCR cancer gene BED file, and output to final txt file
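The `afs_keygenes` step can be illustrated with a small plain-Python sketch. It is not the pipeline's actual code: the function names, the in-memory VCF lines, and the toy BED interval are assumptions made for illustration, and BED intervals are taken to be 0-based, half-open while VCF positions are 1-based.

```python
def parse_info(info):
    """Turn 'A=1;B=2;FLAG' into a dict; flag-only entries map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def in_regions(chrom, pos, regions):
    """True if a 1-based position lies in any 0-based, half-open BED interval."""
    return any(c == chrom and start < pos <= end for c, start, end in regions)


def afs_keygenes(vcf_lines, regions):
    """Collect CHROM, POS, ID, REF, ALT and TUMOR_AF for variants inside regions."""
    rows = []
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, vid, ref, alt = fields[:5]
        info = parse_info(fields[7])
        if "TUMOR_AF" not in info or not in_regions(chrom, int(pos), regions):
            continue
        rows.append((chrom, int(pos), vid, ref, alt, float(info["TUMOR_AF"])))
    return rows


# Toy input: coordinates and values are made up for illustration.
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tT\t.\tPASS\tTUMOR_AF=0.35;TUMOR_DP=80",
    "1\t500\trs1\tG\tC\t.\tPASS\tTUMOR_AF=0.12",
]
bed = [("1", 50, 150)]
for row in afs_keygenes(vcf, bed):
    print("\t".join(str(x) for x in row))
```

The `afs` step would be the same loop without the region check, keeping only the `TUMOR_AF` value.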
Circos plots are generated by PURPLE. The first BAF plot is based on PURPLE data and configuration files.
Structural variants often involve deletions/insertions encompassing large genomic regions, which can involve hundreds or even thousands of genes. The annotations for these genes/transcripts are included in the original SV VCF, but their size makes them extremely hard to display and browse inside an HTML table.
Example SV with -only- 3 annotations:
SIMPLE_ANN=DEL|TFBS_ablation||MA0114&MA0139&MA0281|unprioritized|4,DEL|downstream_gene_variant|HFE2|ENST00000336751_exon_1/4|unprioritized|4,DEL|intergenic_region|TXNIP-HFE2|ENSG00000265972-ENSG00000168509|unprioritized|4
Reshaped:
svtype | effect | genes | transcript | detail | tier
DEL | TFBS_ablation | | MA0114&MA0139&MA0281 | unprioritized | 4
DEL | downstream_gene_variant | HFE2 | ENST00000336751_exon_1/4 | unprioritized | 4
DEL | intergenic_region | TXNIP-HFE2 | ENSG00000265972-ENSG00000168509 | unprioritized | 4
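The reshaping shown above can be sketched in a few lines. This assumes, as in the example, that annotations are separated by commas, fields by `|`, and that no field contains either character. The function name `reshape_simple_ann` is hypothetical.

```python
COLUMNS = ("svtype", "effect", "genes", "transcript", "detail", "tier")


def reshape_simple_ann(value):
    """Split a SIMPLE_ANN string into one dict per annotation."""
    prefix = "SIMPLE_ANN="
    if value.startswith(prefix):
        value = value[len(prefix):]
    rows = []
    for annotation in value.split(","):
        parts = annotation.split("|")
        if len(parts) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(parts)}: {annotation!r}")
        rows.append(dict(zip(COLUMNS, parts)))
    return rows


example = (
    "SIMPLE_ANN=DEL|TFBS_ablation||MA0114&MA0139&MA0281|unprioritized|4,"
    "DEL|downstream_gene_variant|HFE2|ENST00000336751_exon_1/4|unprioritized|4,"
    "DEL|intergenic_region|TXNIP-HFE2|ENSG00000265972-ENSG00000168509|unprioritized|4"
)
for row in reshape_simple_ann(example):
    print(" | ".join(row[c] for c in COLUMNS))
```

Each annotation becomes one row, so an SV with thousands of annotations expands into thousands of rows that can be filtered or paged rather than displayed in a single table cell.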
PURPLE copy number alterations. Description is from https://github.com/hartwigmedical/hmftools/blob/master/purity-ploidy-estimator/README.md#gene-copy-number-file
Column | Description |
---|---|
gene | Name of gene |
minCN/maxCN | Min/Max copy number found in gene exons |
chrom/start/end | Chromosome/start/end location of gene transcript |
chrBand | Chromosome band of the gene |
onco_or_ts | oncogene (‘oncogene’), tumor suppressor (‘tsgene’), or both (‘onco+ts’), as reported by Cancermine |
transcriptID | Ensembl transcript ID (dot version) |
minMinorAlleleCN | Minimum allele ploidy found over the gene exons - useful for identifying LOH events |
somReg (somaticRegions) | Count of somatic copy number regions this gene spans |
minReg (minRegions) | Number of somatic regions inside the gene that share the min copy number |
minRegStartEnd | Start/End base of the copy number region overlapping the gene with the minimum copy number |
minRegSupportStartEndMethod | Start/end support of the CN region overlapping the gene with the min CN (plus determination method) |
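As a rough illustration of how the `minCN` and `minMinorAlleleCN` columns might be used to shortlist deletions and LOH events, the sketch below labels each gene row. The 0.5 cut-off, the label names, and the toy gene rows are assumptions for illustration only and are not taken from this report.

```python
def classify_gene(row, threshold=0.5):
    """Label a gene row as HOM_DEL, LOH or NONE using an assumed CN threshold."""
    if row["minCN"] < threshold:
        return "HOM_DEL"   # total copy number is close to zero somewhere in the gene
    if row["minMinorAlleleCN"] < threshold:
        return "LOH"       # one allele is lost while total copy number is retained
    return "NONE"


# Toy rows with made-up gene names and values.
genes = [
    {"gene": "GENE_A", "minCN": 0.1, "maxCN": 0.3, "minMinorAlleleCN": 0.0},
    {"gene": "GENE_B", "minCN": 1.0, "maxCN": 1.1, "minMinorAlleleCN": 0.02},
    {"gene": "GENE_C", "minCN": 2.1, "maxCN": 2.2, "minMinorAlleleCN": 1.0},
]
for g in genes:
    print(g["gene"], classify_gene(g))
```

The homozygous-deletion check comes first because a gene with near-zero total copy number necessarily has a near-zero minor allele copy number as well.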
PURPLE outputs a file with the copy number profile of all contiguous segments of the tumor sample. Description is from https://github.com/hartwigmedical/hmftools/blob/master/purity-ploidy-estimator/README.md#copy-number-file
Column | Description |
---|---|
Chr/Start/End | Coordinates of copy number segment |
CN | Fitted absolute copy number of segment adjusted for purity and ploidy |
CN Min+Maj | CopyNumber of minor + major allele adjusted for purity |
Start/End SegSupport | Type of SV support for the CN breakpoint at start/end of region. Allowed values: CENTROMERE, TELOMERE, INV, DEL, DUP, BND (translocation), SGL (single breakend SV support), NONE (no SV support for CN breakpoint), MULT (multiple SV support at exact breakpoint) |
Method | Method used to determine the CN of the region. Allowed values: BAF_WEIGHTED (avg of all depth windows for the region), STRUCTURAL_VARIANT (inferred using ploidy of flanking SVs), LONG_ARM (inferred from the long arm), GERMLINE_AMPLIFICATION (inferred using special logic to handle regions of germline amplification) |
BAF (count) | Tumor BAF after adjusting for purity and ploidy (Count of AMBER BAF points covered by this segment) |
GC (windowCount) | Proportion of segment that is G or C (Count of COBALT windows covered by this segment) |
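Two of the summary metrics can be approximated directly from this segment table. The sketch below is a simplified reading, not PURPLE's own calculation: the dictionary keys are hypothetical, "unsupported" is interpreted as `NONE` support at both ends, and the polyclonal proportion is computed as an unweighted fraction of segments.

```python
def count_unsupported(segments):
    """Count segments whose start and end breakpoints both lack SV support."""
    return sum(
        1 for s in segments
        if s["start_support"] == "NONE" and s["end_support"] == "NONE"
    )


def polyclonal_proportion(segments, tolerance=0.25):
    """Fraction of segments whose CN is more than `tolerance` from a whole number."""
    if not segments:
        return 0.0
    off_integer = sum(
        1 for s in segments if abs(s["cn"] - round(s["cn"])) > tolerance
    )
    return off_integer / len(segments)


# Toy segments with made-up values.
segments = [
    {"cn": 2.0, "start_support": "TELOMERE", "end_support": "NONE"},
    {"cn": 2.4, "start_support": "NONE", "end_support": "NONE"},
    {"cn": 3.1, "start_support": "DEL", "end_support": "DEL"},
    {"cn": 1.6, "start_support": "NONE", "end_support": "CENTROMERE"},
]
print(count_unsupported(segments), polyclonal_proportion(segments))
```

A count above 220 from the first function would correspond to the `WARN_HIGH_COPY_NUMBER_NOISE` condition described in the QC section.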
PURPLE generates charts for summarising tumor sample characteristics. Description is from the PURPLE docs.
The following ‘sunrise’ chart shows the range of scores of all examined solutions of purity and ploidy. Crosshairs identify the best purity / ploidy solution.
The following figures show the AMBER BAF count weighted distribution of copy number and minor allele ploidy throughout the fitted segments. Copy numbers are broken down by colour into their respective minor allele ploidy (MAP) while the minor allele ploidy figure is broken down by copy number.
If a somatic variant VCF has been supplied, a figure will be produced showing the somatic variant ploidy broken down by copy number.
The following diagram illustrates the clonality model of a typical sample.
The top figure shows the histogram of somatic ploidy for all SNVs and INDELs in blue. Superimposed are peaks in different colours fitted from the sample as described in the docs, while the black line shows the overall fitted ploidy distribution. Red filled peaks are below the 0.85 subclonal threshold.
We can determine the likelihood of a variant being subclonal at any given ploidy as shown in the bottom half of the figure.
The contribution of each fitted segment to the final score of the best fit is shown in the following figure. Each segment is divided into its major and minor allele ploidy. The area of each circle shows the weight (AMBER baf count) of each segment.
ASCAT generates charts for summarising tumor sample characteristics. Description is from the ASCAT docs.
The following ‘sunrise’ chart shows the range of scores of all examined solutions of purity and ploidy. Crosshairs identify the best purity / ploidy solution.
The following figures show the segment results of ASCAT.
key | value |
---|---|
bcftools_stats | |
conda_list | |
dragen_hrd | |
oncokb_genes | |
title | Clindet …. |
somatic_snv_vcf | |
somatic_snv_summary | |
somatic_sv_tsv | |
somatic_sv_vcf | |
batch_name | WES |
af_keygenes | /public/…. |
genome_version | b37 |
ascat_res_dir | /public/…. |
purple_res_dir | /public/…. |
purple_som_gene_cnv | /public/…. |
purple_som_cnv | /public/…. |
purple_purity | /public/…. |
purple_qc | /public/…. |
purple_som_snv_vcf | /public/…. |
virusbreakend_tsv | |
virusbreakend_vcf | |
result_outdir | /public/…. |
tumor_name | CGGA_P438 |
## No conda package list provided.