Use case II: Fusion genes detection from multiple myeloma patient RNA-seq

3. Use case II: Fusion genes detection from multiple myeloma patient RNA-seq#

3.1. Background#

Clinical Applications of RNA-Seq in Diagnostic Testing

RNA sequencing (RNA-Seq) is a high-throughput transcriptome profiling technology that enables comprehensive analysis of gene expression, splicing variants, fusion events, and novel transcripts. In clinical diagnostics, it serves as a powerful tool for:

∙ Cancer Subtyping: Identifying tumor-specific gene expression signatures, fusion genes (e.g., BCR-ABL1), and aberrant splicing events to guide targeted therapies.

∙ Rare Disease Diagnosis: Detecting dysregulated pathways and aberrant expression in Mendelian disorders where DNA-based tests are inconclusive.

∙ Infectious Disease Characterization: Profiling host-pathogen interactions and pathogen expression in complex infections.

∙ Biomarker Discovery: Validating expression-based biomarkers for disease monitoring and treatment response.

3.2. Setup a project folder#

Note

Before starting the analysis, please ensure that you have set up the analysis environment using the build_conda_env.sh script.

Create a folder named project/CGGA_WES in your home directory and activate the Clindet conda environment.

mkdir -p ~/projects/MM_RNA
cd ~/projects/MM_RNA
conda activate clindet

3.3. Download data and#

Download Multiple myeloma and COLO829 cellline RNA-seq data from the SRA database using wget and prepare the sample information file, make sure fastq-dump are in in $PATH (if don’t install it first)

cd ~/projects/MM_RNA
mkdir -p data && cd data
## Methods one multiple myeloma RNA-seq data
wget -q -c -O A26.11 https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR12099713/SRR12099713
wget -q -c -O A27.19 https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR12099714/SRR12099714
wget -q -c -O A28.15 https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR12099715/SRR12099715

fastq-dump --gzip -O /public/ClinicalExam/lj_sih/projects/project_clindet/data/GSE153380 --split-3 ./A26.11
fastq-dump --gzip -O /public/ClinicalExam/lj_sih/projects/project_clindet/data/GSE153380 --split-3 ./A27.19
fastq-dump --gzip -O /public/ClinicalExam/lj_sih/projects/project_clindet/data/GSE153380 --split-3 ./A28.15

Next, create a CSV file named pipe_rna.csv in the ~/projects/MM_RNA directory with the following content:

Tumor_R1_file_path,Tumor_R2_file_path,Normal_R1_file_path,Normal_R2_file_path,Sample_name,Target_file_bed,Project
~/projects/MM_RNA/data/A26.11_1.fastq.gz,~/projects/MM_RNA/data/A26.11_2.fastq.gz,MF1
~/projects/MM_RNA/data/A27.19_1.fastq.gz,~/projects/MM_RNA/data/A27.19_2.fastq.gz,MS3
~/projects/MM_RNA/data/A28.15_1.fastq.gz,~/projects/MM_RNA/data/A28.15_2.fastq.gz,CD1

3.4. Write an Snakemake file from template#

For this project, modify the sample sheet and create a new Snakemake file named snake_rna.smk (see below). Set the following parameters in the Snakemake file:

configfile (str): config file for softwares and resource parameters.
stage (list): analysis steps. avaiable options:['RSEM','arriba','TRUST4','samlom','kallisto']

3.5. write Snakemake file#

For this project, we need change the sample sheet info.

Tip

import pandas as pd
samples_info = pd.read_csv('./pipe_rna.csv',index_col='Sample_name')
unpaired_samples = samples_info.loc[pd.isna(samples_info['R2_file_path'])].index.tolist()
paired_samples = samples_info.loc[~pd.isna(samples_info['R1_file_path'])].index.tolist()

configfile: "/public/ClinicalExam/lj_sih/projects/project_clindet/build_log/config.yaml"

import os
if not os.path.exists("logs/slurm"):
    os.makedirs("logs/slurm")

groups = ['NC','T']
stages = ['RSEM','arriba','TRUST4','samlom','kallisto']
caller_list = ['sentieon_anno_rnaedit','Mutect2_filter']
project = 'RNA'
genome_version = 'b37'

rna_res_list = [
    ##### for isoform expression ######
    "{project}/{genome_version}/results/summary/RSEM/{sample}/{sample}.genes.results" if 'RSEM'          in stages else None,
    ##### ka
    "{project}/{genome_version}/results/summary/kallisto/{sample}/abundance.tsv" if 'kallisto'          in stages else None,
    ##### for fusion gene detection #####
    "{project}/{genome_version}/results/fusion/{sample}_arriba_fusion.tsv" if 'arriba'          in stages else None,
    ##### for TRUST4 immu analysis #####
    "{project}/{genome_version}/results/IG/TRUST4/{sample}_report.tsv" if 'TRUST4'          in stages else None,
    #### Case report #####
]
rna_res_list = list(filter(None, rna_res_list))
rule all:
    input:
        ## paired sample
        expand(rna_res_list,
        # sample = paired_samples,
        sample = ['CD1','COLO829'],
        project = project,
        genome_version = genome_version
        ),
        
##### Modules #####
include: "workflow/RNA/Snakefile"

3.6. Run clindet#

There is two way you can run clindet

run on a local server
submit to HPC through slurm

3.6.1. Run on local node#

nohup snakemake -j 30 --printshellcmds -s snake_rna.smk \
--use-singularity --singularity-args "--bind /your/home/path:/your/home/path" \
--latency-wait 300 --use-conda >> rna.log

3.6.2. Submit to HPC use slurm#

we provide a slurm config.yaml under clindet/workflow/config_slurm folder.

nohup snakemake --profile workflow/config_slurm \
-j 30 --printshellcmds -s snake_rna.smk --use-singularity \
--singularity-args "--bind /your/home/path:/your/home/path" \
--latency-wait 300 --use-conda >> rna.log

3.6.3. Output#

3.7. Results#