Introduction
Sillymix is a bioinformatics analysis pipeline used to perform in-silico mixtures of tumoral and healthy fastq files at specified sequencing depth (in number of reads) and tumoral fractions.
The pipeline is built using Nextflow (>=23.10.1) a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker / Singularity containers making installation trivial and results highly reproducible.
Usage
NOTE If you are new to Nextflow and nf-core, please refer to this page on how to set-up Nextflow.
Tumoral fractions
First, prepare a .txt file with your tumoral fractions:
tum_fractions.txt:
0.1
0.2
0.5
Sequencing depth (number of reads)
Prepare a .txt file with the sequencing depth values:
num_reads.txt
100000
500000
555000
Run the pipeline
Now you can run the NextFlow (>=23.10.1) pipeline. You need to allocate at least 12 CPUs and 72 GB of RAM to make the pipeline running.
You can run the pipeline using docker profile:
nextflow run main.nf --healthy_dir data/healthy tumor_dir data/nbl --tumor_fractions data/tum_fractions.txt --seq_depths data/num_reads.txt --outdir results -profile docker
Or using singularity profile:
nextflow run main.nf --healthy_dir data/healthy tumor_dir data/nbl --tumor_fractions data/tum_fractions.txt --seq_depths data/num_reads.txt --outdir results --fasta data/hg19.fa -profile singularity
The params.yaml file looks like the following:
params.yaml:
tumor_fractions: data/tum_fractions.txt
seq_depths: data/num_reads.txt
healthy_dir: data/healthy
tumor_dir: data/nbl
nrepl: 3
outdir: results
The healthy and tumor directories must contain the .bam files that will be used to perform the mixtures. All the parameters specified in the file can also be specified in the command line using the corresponding flags.
Credits
The scripts and containers have been written and built by Edoardo Giuili (@edogiuili) who is also the maintainers.
Citations
If you use this pipeline for your analysis, please cite it using the following doi: (âĻ). You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.