Summary

Description

PURPLE Summary

PURPLE outputs a QC status along with a summary of the inferred purity and ploidy of the somatic sample. The ‘QC Status’ field indicates whether the purity/ploidy fit passed quality control. A ‘FAIL’ or ‘WARN’ QC status can be attributed to several factors:

  • FAIL_CONTAMINATION: contamination measured in the tumor (by AMBER) is >10%
  • FAIL_NO_TUMOR: no evidence of tumor is found in the sample, i.e. none of the following criteria is satisfied:
    • tumor has one or more HOTSPOT SV or point mutation
    • SNV sum(allele read count) > 5000
    • SV sum(startTumorVariantFragmentSupport) > 1000 (excluding SGL breakends)
  • WARN_DELETED_GENES: more than 280 homozygously deleted genes. This sometimes occurs in samples with very high megabase-scale GC bias, and particularly affects high-GC regions such as chromosome 19. It may also indicate a poor fit.
  • WARN_HIGH_COPY_NUMBER_NOISE: more than 220 copy number segments are not supported at either end by an SV breakpoint. This indicates extreme GC bias, with depth differing by 10x or more between high- and low-GC regions. GC normalisation is unreliable when the corrections are this extreme, so failing the sample is recommended: deletions or amplifications may be miscalled, and sensitivity in high-GC regions may be poor.
  • WARN_GENDER_MISMATCH: if the AMBER- and COBALT-inferred genders are inconsistent, the COBALT gender is used but the sample is failed.
  • WARN_LOW_PURITY: fitted purity < 20%
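The QC rules above can be expressed as a small decision function. This is a minimal sketch, not PURPLE's implementation: the `PurpleQcInputs` container and its field names are hypothetical, every triggered flag is reported in a fixed order, and "PASS" is assumed to be the label when no rule triggers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PurpleQcInputs:
    # Hypothetical container; field names are illustrative, not PURPLE's own.
    contamination: float          # AMBER contamination rate (0-1)
    has_hotspot: bool             # any HOTSPOT SV or point mutation
    snv_allele_read_sum: int      # SNV sum(allele read count)
    sv_fragment_support_sum: int  # SV sum(startTumorVariantFragmentSupport), SGL excluded
    deleted_genes: int            # homozygously deleted genes
    unsupported_segments: int     # CN segments without SV support at either end
    amber_gender: str
    cobalt_gender: str
    purity: float                 # fitted purity (0-1)


def qc_status(m: PurpleQcInputs) -> List[str]:
    """Return every QC flag triggered by the thresholds listed above."""
    flags: List[str] = []
    if m.contamination > 0.10:
        flags.append("FAIL_CONTAMINATION")
    # Any one of these counts as evidence of tumor; NO_TUMOR needs none of them.
    tumor_evidence = (
        m.has_hotspot
        or m.snv_allele_read_sum > 5000
        or m.sv_fragment_support_sum > 1000
    )
    if not tumor_evidence:
        flags.append("FAIL_NO_TUMOR")
    if m.deleted_genes > 280:
        flags.append("WARN_DELETED_GENES")
    if m.unsupported_segments > 220:
        flags.append("WARN_HIGH_COPY_NUMBER_NOISE")
    if m.amber_gender != m.cobalt_gender:
        flags.append("WARN_GENDER_MISMATCH")
    if m.purity < 0.20:
        flags.append("WARN_LOW_PURITY")
    return flags or ["PASS"]
```

For example, placeholder inputs with 1814 deleted genes, no unsupported segments, zero contamination, matching genders, purity 0.26 and ample tumor evidence yield only `WARN_DELETED_GENES`.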
Field | Value | Description
QC_Status | WARN_DELETED_GENES | See 'Description'.
Purity | 0.26 (0.25-0.4) | Purity of tumor in the sample (and min-max with score within 10% of best).
Ploidy | 2.6 (2.52-2.94) | Average ploidy of tumor sample after adjusting for purity (and min-max with score within 10% of best).
Gender | MALE | Gender as inferred by AMBER/COBALT.
WGD | FALSE | Whole genome duplication (more than 10 autosomes have average major allele ploidy > 1.5).
PolyclonalProp | 0.68 | Proportion of CN regions that are more than 0.25 from a whole-number CN.
DiploidyProp | 0 (0-0.2) | Proportion of CN regions whose minor and major allele copy numbers are both 1 (± 0.2).
TMB | NAN (LOW) | Tumor mutational burden (# PASS variants per megabase) (Status: 'HIGH' (>10 PASS per Mb), 'LOW' or 'UNKNOWN').
TML | 0 (LOW) | Tumor mutational load (# of missense variants) (Status: 'HIGH', 'LOW' or 'UNKNOWN').
TMB-SV | 0 | # of passing SVs, excluding inferred and single-breakend SVs.
Method | NORMAL | Fit method (NORMAL, HIGHLY_DIPLOID, SOMATIC or NO_TUMOR).
CopyNumberSegments | 47 (Unsupported: 0) | # of CN segments (and # not supported at either end by an SV breakpoint).
DeletedGenes | 1814 | # of homozygously deleted genes.
Contamination | 0.0 | Rate of contamination in tumor sample as determined by AMBER.
GermlineAberrations | NONE | Can be one or more of: KLINEFELTER, TRISOMY_X/21/13/18/15, XYY, MOSAIC_X.
AmberMeanDepth | 66 | Mean depth as determined by AMBER.
CNVs (Somatic) | Min: -1.6; Max: 9; N: 47 |
Genome | b37 | Human genome assembly used.
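The WGD, PolyclonalProp and DiploidyProp definitions in the table can be sketched as follows. The function names are hypothetical, and the proportions are simple unweighted counts over regions; whether PURPLE weights regions (e.g. by size or BAF count) is not stated here, so treat this as an illustration of the definitions only.

```python
from typing import Dict, List, Tuple


def is_wgd(mean_major_ploidy_by_autosome: Dict[str, float]) -> bool:
    """WGD if more than 10 autosomes have average major allele ploidy > 1.5."""
    n_high = sum(1 for v in mean_major_ploidy_by_autosome.values() if v > 1.5)
    return n_high > 10


def polyclonal_prop(copy_numbers: List[float]) -> float:
    """Fraction of CN regions more than 0.25 away from the nearest whole number."""
    if not copy_numbers:
        return 0.0
    off_integer = sum(1 for cn in copy_numbers if abs(cn - round(cn)) > 0.25)
    return off_integer / len(copy_numbers)


def diploid_prop(alleles: List[Tuple[float, float]]) -> float:
    """Fraction of regions whose (minor, major) allele CNs are both 1 +/- 0.2."""
    if not alleles:
        return 0.0
    diploid = sum(
        1 for minor, major in alleles
        if abs(minor - 1.0) <= 0.2 and abs(major - 1.0) <= 0.2
    )
    return diploid / len(alleles)
```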

 

Somatic Mutation Profiles

 

Allelic Frequencies

Summarised below are the allele frequencies (AFs) of somatic variants detected genome-wide (Global) and within the coding sequence of ~1,100 cancer genes (Key Genes CDS). AFs range from 0 to 1 (0% to 100%); all novel variants with AF < 10% are filtered out.

Details

The following post-processing steps occur:

  1. somatic_vcf_annotate: annotate the VCF against databases of known hotspots, germline variants, low-mappability regions and the UMCCR panel of normals
  2. somatic_vcf_filter: filter the VCF to remove germline variants and artefacts, while keeping known hotspots
  3. As preparation for the allele frequency plots:
    • subset_to_giab: keep variants in ‘high confidence’ regions as determined by the Genome in a Bottle consortium
    • keep only variants with AF ≥ 10%
  4. Allele frequencies for global and key genes:
    • afs: extract only the INFO/TUMOR_AF field and write it to the final text file
    • afs_keygenes: extract CHROM, POS, ID, REF, ALT and INFO/TUMOR_AF for variants in the UMCCR cancer gene BED file, and write them to the final text file
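The AF filter (step 3) and the `afs`/`afs_keygenes` extraction (step 4) can be sketched with the Python standard library on plain-text VCF lines. This is an illustration, not the pipeline's code: the function names are hypothetical, TUMOR_AF is assumed to be a single numeric INFO value, the gene BED is assumed to use the standard 0-based half-open coordinates, and the GIAB subsetting step is not reproduced.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Bed = Dict[str, List[Tuple[int, int]]]  # chrom -> [(start, end)], 0-based half-open


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into key -> value (flags map to "")."""
    out: Dict[str, str] = {}
    for item in info.split(";"):
        if item:
            key, _, value = item.partition("=")
            out[key] = value
    return out


def af_filtered(records: Iterable[str], min_af: float = 0.10) -> Iterator[Tuple[List[str], float]]:
    """Yield (columns, TUMOR_AF) for data lines with TUMOR_AF >= min_af."""
    for line in records:
        if line.startswith("#"):
            continue  # skip VCF header lines
        cols = line.rstrip("\n").split("\t")
        raw = parse_info(cols[7]).get("TUMOR_AF", "")
        if raw in ("", "."):
            continue  # no tumour AF recorded for this record
        af = float(raw)
        if af >= min_af:
            yield cols, af


def global_afs(records: Iterable[str]) -> List[float]:
    """Counterpart of 'afs': keep only the TUMOR_AF values."""
    return [af for _, af in af_filtered(records)]


def keygene_afs(records: Iterable[str], bed: Bed) -> List[tuple]:
    """Counterpart of 'afs_keygenes': CHROM, POS, ID, REF, ALT, TUMOR_AF in key genes."""
    rows = []
    for cols, af in af_filtered(records):
        chrom, pos = cols[0], int(cols[1])
        # VCF POS is 1-based, so it lies in a BED interval iff start < pos <= end.
        if any(start < pos <= end for start, end in bed.get(chrom, [])):
            rows.append((chrom, pos, cols[2], cols[3], cols[4], af))
    return rows
```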

 

 

SNV Signatures

Circos Plots

Circos plots are generated by PURPLE. The first BAF plot is based on PURPLE data and configuration files.

BAF, Total/Minor CN, SVs

Description
  • Track1: Chromosomes. Darker shaded areas: gaps in reference genome (centromeres, heterochromatin & missing short arms)
  • Track2: Beta Allele Frequency. Because BAF points are the allele frequencies of heterozygous sites at SNPs that are common in the germline, none are expected on chromosome Y (or on chromosome X in a male sample).
  • Track3: Total copy number changes adjusted for tumor purity, including focal and chromosomal somatic events. Red = Loss; Green = Gain. Scaled from 0 (complete loss) to 6 (high-level gain); values > 6 are shown as 6 with a green dot on the outermost green gridline.
  • Track4: Minor allele copy numbers, ranging from 0 to 3. The expected normal minor allele copy number is 1; anything below 1 is shown as a loss (Orange), representing an LOH event. Minor allele copy numbers above 1 (Blue) indicate gains of both A and B alleles.
  • Track5 (Inner circle): Observed structural variants within or between the chromosomes.
    • Blue = Translocations
    • Red = Deletions
    • Yellow = Insertions
    • Green = Tandem duplications
    • Black = Inversions
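The scaling and colour rules for Track3 and Track4 can be summarised in code. This is an illustrative sketch with hypothetical function names: it assumes gains and losses in Track3 are judged against a baseline of 2 copies (the description above does not state the baseline), clamps negative fitted values to 0, and applies the "below 1"/"above 1" minor allele rules literally, with no tolerance band.

```python
from typing import Optional, Tuple

TOTAL_CN_CAP = 6.0        # Track3 upper limit
MINOR_CN_CAP = 3.0        # Track4 upper limit
BASELINE_TOTAL_CN = 2.0   # assumption: diploid baseline for gain/loss colouring


def total_cn_track(cn: float) -> Tuple[float, Optional[str], bool]:
    """Return (plotted value, colour, capped) for a Track3 segment."""
    colour = None
    if cn < BASELINE_TOTAL_CN:
        colour = "red"    # loss
    elif cn > BASELINE_TOTAL_CN:
        colour = "green"  # gain
    capped = cn > TOTAL_CN_CAP  # drawn at 6 with a green dot on the outer gridline
    return max(0.0, min(cn, TOTAL_CN_CAP)), colour, capped


def minor_cn_track(minor_cn: float) -> Tuple[float, Optional[str]]:
    """Return (plotted value, colour) for a Track4 segment."""
    colour = None
    if minor_cn < 1.0:
        colour = "orange"  # minor allele lost: LOH
    elif minor_cn > 1.0:
        colour = "blue"    # gain of both A and B alleles
    return max(0.0, min(minor_cn, MINOR_CN_CAP)), colour
```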

SNVs/Indels, Total/Minor CN, SVs

Description
  • Track1: Chromosomes. Darker shaded areas: gaps in reference genome (centromeres, heterochromatin & missing short arms)
  • Track2: Somatic variants (incl. exon, intron and intergenic regions).
    • outer ring: SNP allele frequencies, corrected for tumor purity and scaled from 0 to 100%. Each dot represents a single somatic variant, coloured according to the type of base change (e.g. C>T/G>A in red).
    • inner ring: short insertion (yellow) and deletion (red) locations.
  • Track3: Observed total copy number changes adjusted for tumor purity, including focal and chromosomal somatic events. Red = Loss; Green = Gain. Scaled from 0 (complete loss) to 6 (high-level gain); values > 6 are shown as 6 with a green dot on the outermost green gridline.
  • Track4: Observed minor allele copy numbers, ranging from 0 to 3. The expected normal minor allele copy number is 1; anything below 1 is shown as a loss (Orange), representing an LOH event. Minor allele copy numbers above 1 (Blue) indicate gains of both A and B alleles.
  • Track5 (Inner circle): Observed structural variants within or between the chromosomes.
    • Blue = Translocations
    • Red = Deletions
    • Yellow = Insertions
    • Green = Tandem duplications
    • Black = Inversions

Allele Ratios, BAF

Description
  • Track1: Chromosomes. Darker shaded areas: gaps in reference genome (centromeres, heterochromatin & missing short arms)
  • Track2: Tumor and Normal Allele Ratios
  • Track3: Beta Allele Frequency. Because BAF points are the allele frequencies of heterozygous sites at SNPs that are common in the germline, none are expected on chromosome Y (or on chromosome X in a male sample).

 

Structural Variants

What do the SV tables show?

Structural variants often involve deletions or insertions encompassing large genomic regions, which can contain hundreds or even thousands of genes. The annotations for these genes and transcripts are included in the original SV VCF, but because of their size they are extremely hard to display and browse in an HTML table.

Example SV with only three annotations:

SIMPLE_ANN=DEL|TFBS_ablation||MA0114&MA0139&MA0281|unprioritized|4,DEL|downstream_gene_variant|HFE2|ENST00000336751_exon_1/4|unprioritized|4,DEL|intergenic_region|TXNIP-HFE2|ENSG00000265972-ENSG00000168509|unprioritized|4

Reshaped:

svtype | effect                  | genes      | transcript                      | detail        | tier
DEL    | TFBS_ablation           |            | MA0114&MA0139&MA0281            | unprioritized | 4
DEL    | downstream_gene_variant | HFE2       | ENST00000336751_exon_1/4        | unprioritized | 4
DEL    | intergenic_region       | TXNIP-HFE2 | ENSG00000265972-ENSG00000168509 | unprioritized | 4
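The reshaping above amounts to splitting SIMPLE_ANN on "," into annotations and each annotation on "|" into six fields. A minimal sketch, assuming no field contains a literal comma or pipe; the column names are taken from the reshaped table, and the function names are hypothetical.

```python
from typing import Dict, List

FIELDS = ["svtype", "effect", "genes", "transcript", "detail", "tier"]


def reshape_simple_ann(value: str) -> List[Dict[str, str]]:
    """Turn a SIMPLE_ANN INFO value into one dict per annotation."""
    if value.startswith("SIMPLE_ANN="):
        value = value[len("SIMPLE_ANN="):]
    rows = []
    for ann in value.split(","):
        parts = ann.split("|")
        if len(parts) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {ann}")
        rows.append(dict(zip(FIELDS, parts)))
    return rows


def to_pipe_table(rows: List[Dict[str, str]]) -> List[str]:
    """Render rows as 'a | b | c' lines, header first."""
    return [" | ".join(FIELDS)] + [" | ".join(r[f] for f in FIELDS) for r in rows]
```

Applied to the example string, this yields the same three rows as the reshaped table (without the column alignment).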
## Error: object 'sv' not found

Summary

Plots

## Error: object 'sv' not found
## Error: object 'sv' not found
## Error: object 'p1' not found

Counts

 
 

SV Map

## Error: object 'sv.dt' not found

 
 

Breakpoints and Breakends

Prioritised

## Error: object 'sv.dt' not found

 
 

Filtered

## Error: object 'sv.dt' not found

 
 

Many Transcripts

## Error: object 'sv.dt' not found

 
 

Copy number variants

## Error: object 'cnv' not found

Prioritised

## Error: object 'cnv.dt' not found

 
 

Filtered

## Error: object 'cnv.dt' not found

 
 

Many Genes

## Error: object 'cnv.dt' not found

 
 

Many Transcripts

## Error: object 'cnv.dt' not found

 
 

PURPLE Gene Somatic CNV Calls

Description

PURPLE gene-level copy number alterations. Column descriptions are taken from https://github.com/hartwigmedical/hmftools/blob/master/purity-ploidy-estimator/README.md#gene-copy-number-file

Column | Description
gene | Name of gene
minCN/maxCN | Min/Max copy number found in gene exons
chrom/start/end | Chromosome/start/end location of gene transcript
chrBand | Chromosome band of the gene
onco_or_ts | Oncogene (‘oncogene’), tumor suppressor (‘tsgene’), or both (‘onco+ts’), as reported by CancerMine
transcriptID | Ensembl transcript ID (dot version)
minMinorAlleleCN | Minimum minor allele copy number found over the gene exons; useful for identifying LOH events
somReg (somaticRegions) | Count of somatic copy number regions this gene spans
minReg (minRegions) | Number of somatic regions inside the gene that share the min copy number
minRegStartEnd | Start/End base of the copy number region overlapping the gene with the minimum copy number
minRegSupportStartEndMethod | Start/end support of the CN region overlapping the gene with the min CN (plus determination method)
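How minCN/maxCN, somReg and minReg relate to the underlying segments can be sketched as below. This is an illustration of the column definitions only, not PURPLE's code: `CnSegment` and `gene_cn_summary` are hypothetical names, and coordinates are assumed to be 1-based inclusive on a single chromosome.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class CnSegment:
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    cn: float    # fitted copy number of the segment


def _overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    return a_start <= b_end and b_start <= a_end


def gene_cn_summary(segments: List[CnSegment], gene_start: int, gene_end: int,
                    exons: List[Tuple[int, int]]) -> Dict[str, float]:
    """Summarise segment copy numbers over one gene and its exons."""
    # minCN/maxCN: copy numbers of segments touching any exon
    exon_cns = [s.cn for s in segments
                if any(_overlaps(s.start, s.end, es, ee) for es, ee in exons)]
    if not exon_cns:
        raise ValueError("no segment overlaps the gene exons")
    min_cn, max_cn = min(exon_cns), max(exon_cns)
    # somReg: segments spanned by the whole gene
    spanned = [s for s in segments if _overlaps(s.start, s.end, gene_start, gene_end)]
    return {
        "minCN": min_cn,
        "maxCN": max_cn,
        "somReg": len(spanned),
        # minReg: spanned segments sharing the minimum copy number
        "minReg": sum(1 for s in spanned if s.cn == min_cn),
    }
```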

 
 

Genome-wide Somatic CNV Segments

Description

PURPLE outputs a file with the copy number profile of all contiguous segments of the tumor sample. Column descriptions are taken from https://github.com/hartwigmedical/hmftools/blob/master/purity-ploidy-estimator/README.md#copy-number-file

Column | Description
Chr/Start/End | Coordinates of copy number segment
CN | Fitted absolute copy number of segment adjusted for purity and ploidy
CN Min+Maj | Copy number of minor + major allele adjusted for purity
Start/End SegSupport | Type of SV support for the CN breakpoint at start/end of region. Allowed values: CENTROMERE, TELOMERE, INV, DEL, DUP, BND (translocation), SGL (single breakend SV support), NONE (no SV support for CN breakpoint), MULT (multiple SV support at exact breakpoint)
Method | Method used to determine the CN of the region. Allowed values: BAF_WEIGHTED (avg of all depth windows for the region), STRUCTURAL_VARIANT (inferred using ploidy of flanking SVs), LONG_ARM (inferred from the long arm), GERMLINE_AMPLIFICATION (inferred using special logic to handle regions of germline amplification)
BAF (count) | Tumor BAF after adjustment for purity and ploidy (count of AMBER BAF points covered by this segment)
GC (windowCount) | Proportion of segment that is G or C (count of COBALT windows covered by this segment)
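The CN Min+Maj and BAF columns are linked: if the purity-adjusted BAF is taken as the fraction of copies carried by the major allele, the minor and major allele copy numbers follow directly from CN and BAF. A minimal sketch under that assumption (BAF in [0.5, 1] as the major allele fraction); the function name is hypothetical.

```python
from typing import Tuple


def allele_copy_numbers(copy_number: float, baf: float) -> Tuple[float, float]:
    """Split a segment's copy number into (minor, major) allele copy numbers.

    Assumes baf is the purity-adjusted major allele fraction (0.5 to 1).
    """
    if not 0.0 <= baf <= 1.0:
        raise ValueError("BAF must lie between 0 and 1")
    major = baf * copy_number          # copies carried by the major allele
    minor = (1.0 - baf) * copy_number  # remaining copies: minor allele
    return minor, major
```

For instance, a segment with CN 3 and BAF 2/3 splits into roughly 1 minor and 2 major copies, while a BAF of 1 gives a minor allele copy number of 0, the LOH pattern.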

 
 

PURPLE Charts

PURPLE generates charts for summarising tumor sample characteristics. Description is from the PURPLE docs.

Purity/ploidy

The following ‘sunrise’ chart shows the range of scores of all examined solutions of purity and ploidy. Crosshairs identify the best purity / ploidy solution.

 
 

Copy number / Minor allele ploidy

The following figures show the AMBER BAF-count-weighted distribution of copy number and minor allele ploidy across the fitted segments. Copy numbers are broken down by colour into their respective minor allele ploidy (MAP), while the minor allele ploidy figure is broken down by copy number.

Ploidy by CN

If a somatic variant VCF has been supplied, a figure will be produced showing the somatic variant ploidy broken down by copy number.
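Somatic variant ploidy is usually derived from the observed allele frequency by correcting for purity and the local copy number. The sketch below uses the commonly applied adjustment that assumes the contaminating normal cells are diploid at the locus; the report does not spell out PURPLE's exact formula, and the function name is hypothetical.

```python
def variant_ploidy(af: float, local_cn: float, purity: float,
                   normal_cn: float = 2.0) -> float:
    """Purity-adjusted number of chromosome copies carrying a somatic variant."""
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    # Total copies at the locus in the mixed sample (tumour plus normal cells)
    mixed_cn = purity * local_cn + (1.0 - purity) * normal_cn
    # Variant-supporting reads come only from the tumour fraction
    return af * mixed_cn / purity
```

At purity 0.5 and local CN 2, an AF of 0.25 corresponds to a ploidy of 1, i.e. a clonal heterozygous variant.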

Clonality

The following diagram illustrates the clonality model of a typical sample.

The top figure shows the histogram of somatic ploidy for all SNVs and INDELs in blue. Superimposed are peaks in different colours fitted from the sample as described in the docs while the black line shows the overall fitted ploidy distribution. Red filled peaks are below the 0.85 subclonal threshold.

We can determine the likelihood of a variant being subclonal at any given ploidy as shown in the bottom half of the figure.
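One way to read the bottom panel: at a given ploidy, the subclonal likelihood is the share of the fitted density contributed by peaks below the 0.85 threshold. The sketch below assumes Gaussian peaks described by (mean ploidy, weight, standard deviation); PURPLE's actual peak model may differ, and the function names are hypothetical.

```python
import math
from typing import List, Tuple

SUBCLONAL_THRESHOLD = 0.85  # peaks with mean ploidy below this are subclonal

Peak = Tuple[float, float, float]  # (mean ploidy, weight, standard deviation)


def normal_pdf(x: float, mu: float, sd: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def subclonal_likelihood(ploidy: float, peaks: List[Peak]) -> float:
    """Fraction of the fitted density at `ploidy` that comes from subclonal peaks."""
    total = 0.0
    subclonal = 0.0
    for mu, weight, sd in peaks:
        density = weight * normal_pdf(ploidy, mu, sd)
        total += density
        if mu < SUBCLONAL_THRESHOLD:
            subclonal += density
    # Far from every peak the density can underflow to zero
    return subclonal / total if total > 0.0 else 0.0
```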

Segment

The contribution of each fitted segment to the final score of the best fit is shown in the following figure. Each segment is divided into its major and minor allele ploidy. The area of each circle shows the weight (AMBER baf count) of each segment.

ASCAT Charts

ASCAT generates charts for summarising tumor sample characteristics. Description is from the ASCAT docs.

Purity/ploidy

The following ‘sunrise’ chart shows the range of scores of all examined solutions of purity and ploidy. Crosshairs identify the best purity / ploidy solution.

 
 

Copy number / Minor allele ploidy

The following figures show the segment results of ASCAT.

Report Inputs

key | value
bcftools_stats |
conda_list |
dragen_hrd |
oncokb_genes |
title | Clindet ….
somatic_snv_vcf |
somatic_snv_summary |
somatic_sv_tsv |
somatic_sv_vcf |
batch_name | WES
af_keygenes | /public/….
genome_version | b37
ascat_res_dir | /public/….
purple_res_dir | /public/….
purple_som_gene_cnv | /public/….
purple_som_cnv | /public/….
purple_purity | /public/….
purple_qc | /public/….
purple_som_snv_vcf | /public/….
virusbreakend_tsv |
virusbreakend_vcf |
result_outdir | /public/….
tumor_name | CGGA_P438

Conda Pkgs All

## No conda package list provided.