3. Use case II: Fusion gene detection from multiple myeloma patient RNA-seq#

3.1. Background#

Clinical Applications of RNA-Seq in Diagnostic Testing

RNA sequencing (RNA-Seq) is a high-throughput transcriptome profiling technology that enables comprehensive analysis of gene expression, splicing variants, fusion events, and novel transcripts. In clinical diagnostics, it serves as a powerful tool for:

∙ Cancer Subtyping: Identifying tumor-specific gene expression signatures, fusion genes (e.g., BCR-ABL1), and aberrant splicing events to guide targeted therapies.

∙ Rare Disease Diagnosis: Detecting dysregulated pathways and aberrant expression in Mendelian disorders where DNA-based tests are inconclusive.

∙ Infectious Disease Characterization: Profiling host-pathogen interactions and pathogen expression in complex infections.

∙ Biomarker Discovery: Validating expression-based biomarkers for disease monitoring and treatment response.

3.2. Set up a project folder#

Note

Before starting the analysis, please ensure that you have set up the analysis environment using the build_conda_env.sh script.

Create a folder named projects/MM_RNA in your home directory and activate the clindet conda environment.

mkdir -p ~/projects/MM_RNA
cd ~/projects/MM_RNA
conda activate clindet

3.3. Download data and prepare the sample sheet#

Download the multiple myeloma RNA-seq data from the SRA database using wget, convert it to FASTQ, and prepare the sample information file. Make sure fastq-dump is in your $PATH (install it first if it is not).

cd ~/projects/MM_RNA
mkdir -p data && cd data
## Multiple myeloma RNA-seq data
wget -q -c -O A26.11 https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR12099713/SRR12099713
wget -q -c -O A27.19 https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR12099714/SRR12099714
wget -q -c -O A28.15 https://sra-pub-run-odp.s3.amazonaws.com/sra/SRR12099715/SRR12099715

## Convert to gzipped paired-end FASTQ in the data folder referenced by the sample sheet
fastq-dump --gzip -O ~/projects/MM_RNA/data --split-3 ./A26.11
fastq-dump --gzip -O ~/projects/MM_RNA/data --split-3 ./A27.19
fastq-dump --gzip -O ~/projects/MM_RNA/data --split-3 ./A28.15
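The download and conversion steps repeat the same two commands per sample and assume that wget and fastq-dump are installed. A minimal sketch of how they could be wrapped is shown here; the helper names require_tools, sra_url, and fetch_sample are hypothetical and not part of clindet, and the URL pattern is simply the one used in the wget commands above.

```shell
# Hypothetical helpers (not part of clindet).

# Report every required command-line tool that is not found on $PATH.
# Returns 0 when all tools are present, 1 otherwise.
require_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing tool: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Build the download URL for an SRA run accession, following the
# URL pattern of the wget commands in this section.
sra_url() {
    printf 'https://sra-pub-run-odp.s3.amazonaws.com/sra/%s/%s\n' "$1" "$1"
}

# Download one run under a sample name, then convert it to gzipped
# paired-end FASTQ files in the given output directory.
fetch_sample() {
    local sample="$1"
    local run="$2"
    local outdir="$3"
    wget -q -c -O "$sample" "$(sra_url "$run")" &&
        fastq-dump --gzip -O "$outdir" --split-3 "./$sample"
}
```

With these defined, a call such as `require_tools wget fastq-dump && fetch_sample A26.11 SRR12099713 ~/projects/MM_RNA/data` stops early if a tool is missing instead of failing halfway through the downloads.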

Next, create a CSV file named pipe_rna.csv in the ~/projects/MM_RNA directory with the following content. Each row has the same seven fields as the header; columns with no value for these samples (normal FASTQ paths, target BED file, project) are left empty so that the sample name falls under Sample_name:

Tumor_R1_file_path,Tumor_R2_file_path,Normal_R1_file_path,Normal_R2_file_path,Sample_name,Target_file_bed,Project
~/projects/MM_RNA/data/A26.11_1.fastq.gz,~/projects/MM_RNA/data/A26.11_2.fastq.gz,,,MF1,,
~/projects/MM_RNA/data/A27.19_1.fastq.gz,~/projects/MM_RNA/data/A27.19_2.fastq.gz,,,MS3,,
~/projects/MM_RNA/data/A28.15_1.fastq.gz,~/projects/MM_RNA/data/A28.15_2.fastq.gz,,,CD1,,
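Because the sample sheet is written by hand, it can help to confirm that every row has as many comma-separated fields as the header before starting the pipeline. The following is a minimal sketch of such a check; check_sample_sheet is a hypothetical helper, not part of clindet, and it assumes that no field contains a quoted comma.

```shell
# Hypothetical check (not part of clindet): print every row of a CSV
# whose field count differs from the header's, and return non-zero
# if any such row is found. Blank lines are ignored.
check_sample_sheet() {
    awk -F',' '
        NF == 0 { next }                   # skip blank lines
        !seen   { n = NF; seen = 1; next } # remember the header width
        NF != n {                          # report mismatching rows
            printf "line %d: expected %d fields, found %d\n", NR, n, NF
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

Running `check_sample_sheet ~/projects/MM_RNA/pipe_rna.csv` prints nothing and returns 0 when the sheet is consistent; otherwise it lists the offending line numbers.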

3.4. Write a Snakemake file from the template#

For this project, modify the sample sheet and create a new Snakemake file named snake_rna.smk (see below). Set the following parameters in the Snakemake file:

  1. configfile (str): config file for software and resource parameters.

  2. stage (list): analysis steps. Available options: ['RSEM', 'arriba', 'TRUST4', 'salmon', 'kallisto']

3.5. Write the Snakemake file#

For this project, we need to change the sample sheet information.

3.6. Run clindet#

There are two ways to run clindet:

  1. Run on a local server

  2. Submit to an HPC cluster through Slurm

3.6.1. Run on local node#

nohup snakemake -j 30 --printshellcmds -s snake_rna.smk \
--use-singularity --singularity-args "--bind /your/home/path:/your/home/path" \
--latency-wait 300 --use-conda >> rna.log 2>&1 &
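Before launching a full run, a dry run can confirm that Snakemake resolves the sample sheet and the selected stages into jobs. The sketch below uses Snakemake's standard -n (--dry-run) flag; the wrapper name clindet_dry_run is hypothetical and only exists so that extra options can be passed through.

```shell
# Hypothetical wrapper (not part of clindet): list the jobs and shell
# commands Snakemake would run for snake_rna.smk without executing them.
# Any additional arguments are forwarded to snakemake unchanged.
clindet_dry_run() {
    snakemake -n --printshellcmds -s snake_rna.smk "$@"
}
```

Calling `clindet_dry_run --use-conda --use-singularity` from ~/projects/MM_RNA prints the planned jobs; if the output looks right, the real command can be started with the same options minus -n.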

3.6.2. Submit to HPC using Slurm#

We provide a Slurm config.yaml in the clindet/workflow/config_slurm folder.

nohup snakemake --profile workflow/config_slurm \
-j 30 --printshellcmds -s snake_rna.smk --use-singularity \
--singularity-args "--bind /your/home/path:/your/home/path" \
--latency-wait 300 --use-conda >> rna.log 2>&1 &

3.6.3. Output#

3.7. Results#
