Use case III: Mutational signatures analysis for C.elegans

4. Use case III: Mutational signatures analysis for C.elegans#

4.1. Background#

In this example, we will re-analysis whole genome sequencing data from Volkova et,al..

In the original paper, 54 gentypes C.elegans were treated by 12 genotoxins with 2-3 different doses, generated 2717 total mutagenesis experiments and whole genome sequencing data.

here, we select ten samples which:

xpc-1 gene knockout with UV treat.

xpc-1 gene predicted to enable damaged DNA binding activity and single-stranded DNA binding activity. Involved in response to UV. Predicted to be located in nucleus. Predicted to be part of XPC complex and nucleotide-excision repair factor 2 complex. Predicted to be active in cytoplasm. Is expressed in germline precursor cell; intestine; and nervous system. Used to study xeroderma pigmentosum. Human ortholog(s) of this gene implicated in pancreatic cancer; serous cystadenocarcinoma; xeroderma pigmentosum; and xeroderma pigmentosum group C. Orthologous to human XPC (XPC complex subunit, DNA damage recognition and repair factor).

mlh-1 gene knockout.

mlh-1 gene predicted to enable ATP hydrolysis activity. Predicted to be involved in mismatch repair. Predicted to be located in nucleus. Predicted to be part of MutLalpha complex. Human ortholog(s) of this gene implicated in several diseases, including Lynch syndrome (multiple); carcinoma (multiple); and cervix uteri carcinoma in situ. Is an ortholog of human MLH1 (mutL homolog 1). Curator: Ranjana Kishore; Valerio Arnaboldi

mrt-2 gene knockout.

mrt-2 gene predicted to enable damaged DNA binding activity. Involved in DNA metabolic process and intracellular signal transduction. Predicted to be located in nucleus. Predicted to be part of checkpoint clamp complex. Orthologous to human RAD1 (RAD1 checkpoint DNA exonuclease).

表4.1 re-analysis samples info.#
Sample	Genotype	Generation	Replicate	Mutagen
CD0009b	mrt-2	0	0
CD0009f	mrt-2	20	3
CD0001b	N2	0	0
CD0134a	mlh-1	20	2
CD0134c	mlh-1	20	3
CD0134d	mlh-1	20	4
CD0392a	xpc-1	0	0
CD0842b	xpc-1	1	1	UV
CD0842c	xpc-1	1	2	UV
CD0842d	xpc-1	1	3	UV

4.2. Download data#

4.3. Download and confing C.elegans genome file#

4.4. write Snakemake file#

For this project, we need change the sample sheet info.

Tip

import pandas as pd
samples_info = pd.read_csv('',index_col='Sample_name') # set sample sheet path

unpaired_samples = samples_info.loc[pd.isna(samples_info['Normal_R1_file_path'])].index.tolist()
paired_samples = samples_info.loc[~pd.isna(samples_info['Normal_R1_file_path'])].index.tolist()

configfile: "" # set config file path

project = samples_info["Project"].unique().tolist()[0]
genome_version = 'WBcel235' # set genome version 

import os
if not os.path.exists("logs/slurm"):
    os.makedirs("logs/slurm")

pre_pon_db = False

if not os.path.exists('analysis/pindel_normal/log'):
    os.makedirs('analysis/pindel_normal/log')

groups = ['NC','T']

germ_caller_list = ['caveman']
caller_list = ['strelkasomaticmanta','caveman','muse','cgppindel_filter']

recall_pon =  False
recall_pon_pindel =  False

recal = False
rule all:
    input:
        ## paired sample
        expand([
            "{project}/{genome_version}/results/dedup/paired/{sample}-{group}.sorted.bam",
            "{project}/{genome_version}/results/sv/paired/DELLY/{sample}/SV_delly_{sample}_filter.vcf",
            "{project}/{genome_version}/results/vcf/paired/{sample}/strelkasomaticmanta.vcf",
            "{project}/{genome_version}/results/vcf/paired/{sample}/muse.vcf",
            '{project}/{genome_version}/logs/paired/caveman_{sample}.log',
        ],
        project = project,
        genome_version = genome_version,
        sample = paired_samples,
	    group = groups,
        caller = caller_list)

include: "workflow/WGS/Snakefile"  # the relative path of clindet workflow WGS subfolder snakefile  

4.5. Run clindet#

nohup snakemake --profile workflow/config_slurm \
-j 30 --printshellcmds -s snake_wgs_worm.smk \
--use-singularity \
--singularity-args "--bind /public/home/:/public/home/,/public/ClinicalExam:/public/ClinicalExam" \
--latency-wait 300 --use-conda --conda-frontend conda  -k > worm.out &

4.6. Results#

4.6.1. 突变检测#

4.6.2. #

在本例中我们选取了两种加