git clone https://github.com/animesh/rnaseq_nextflow
cd rnaseq_nextflow/
wget http://genomedata.org/rnaseq-tutorial/HBR_UHR_ERCC_ds_5pc.tar
tar -xvf HBR_UHR_ERCC_ds_5pc.tar
echo "sample,fastq_1,fastq_2,strandedness" > samples.csv
ls -1 *1.fastq.gz | awk -F "_" '{print $1 $2}' > c0
ls -1 $PWD/*1.fastq.gz > c1
ls -1 $PWD/*2.fastq.gz > c2
ls -1 *1.fastq.gz | sed 's/.*/auto/' > c3
paste -d "," c? >> samples.csv
cat samples.csv
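Before launching the pipeline it can help to sanity-check the sheet, since a wrong path or a missing column only surfaces after Nextflow has started. A minimal sketch, assuming the four-column layout written above; `check_samplesheet` is a made-up helper name, not part of nf-core/rnaseq:

```shell
# check_samplesheet FILE: verify the header, the field count, and that both
# FASTQ paths on every row exist. Hypothetical helper for illustration only.
check_samplesheet() {
    local sheet="$1" rc=0 line_no=0
    local sample fq1 fq2 strand extra fq
    # "|| [ -n ... ]" also handles a last line without a trailing newline.
    while IFS=, read -r sample fq1 fq2 strand extra || [ -n "$sample" ]; do
        line_no=$((line_no + 1))
        if [ "$line_no" -eq 1 ]; then
            if [ "$sample,$fq1,$fq2,$strand" != "sample,fastq_1,fastq_2,strandedness" ]; then
                echo "line 1: unexpected header" >&2; rc=1
            fi
            continue
        fi
        if [ -z "$sample" ] || [ -z "$strand" ] || [ -n "$extra" ]; then
            echo "line $line_no: expected exactly 4 non-empty fields" >&2; rc=1
        fi
        for fq in "$fq1" "$fq2"; do
            [ -f "$fq" ] || { echo "line $line_no: missing file '$fq'" >&2; rc=1; }
        done
    done < "$sheet"
    return "$rc"
}
```

Run it as `check_samplesheet samples.csv && echo ok`; any problems are printed to stderr with their line number.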
wget http://genomedata.org/rnaseq-tutorial/fasta/GRCh38/chr22_with_ERCC92.fa
wget http://genomedata.org/rnaseq-tutorial/annotations/GRCh38/chr22_with_ERCC92.gtf
sed 's/exon_number "1";$/exon_number "1"; gene_biotype "protein_coding";/' chr22_with_ERCC92.gtf > chr22_with_ERCC92.biotype.gtf
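To confirm that the sed step really added the attribute, one can count the GTF records carrying it before and after. This is only a sketch: `count_gtf_attr` is a hypothetical helper, and it assumes the usual nine tab-separated GTF columns with attributes in the last one.

```shell
# count_gtf_attr FILE KEY: count non-comment GTF records whose attribute
# column (column 9) contains the attribute KEY.
count_gtf_attr() {
    awk -F '\t' -v key="$2" '
        $0 !~ /^#/ && $9 ~ ("(^|; *)" key " ") { n++ }
        END { print n + 0 }
    ' "$1"
}

# Example usage (paths as in the commands above):
#   count_gtf_attr chr22_with_ERCC92.gtf gene_biotype
#   count_gtf_attr chr22_with_ERCC92.biotype.gtf gene_biotype
```

If the second count is not larger than the first, the substitution pattern did not match any line and the pipeline would see an unchanged annotation.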
Prerequisites: curl, Java, and Docker or Singularity on Linux. Windows users can run everything via WSL.
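A quick way to see which of these tools are already on the PATH is sketched below. The helper names `have_cmd` and `check_prereqs` are made up for this example, and the check only looks for the executables; it does not verify versions or whether the Docker daemon is running.

```shell
# have_cmd NAME: succeed if NAME resolves to something runnable on PATH.
have_cmd() { command -v "$1" >/dev/null 2>&1; }

# check_prereqs: report missing tools. Needs curl, java, and at least one
# container engine (docker or singularity).
check_prereqs() {
    local rc=0 tool
    for tool in curl java; do
        have_cmd "$tool" || { echo "missing: $tool" >&2; rc=1; }
    done
    if ! have_cmd docker && ! have_cmd singularity; then
        echo "missing: docker or singularity" >&2; rc=1
    fi
    return "$rc"
}
```

Calling `check_prereqs && echo "all set"` prints the missing tools, if any, to stderr.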
curl -s https://get.nextflow.io | bash
N E X T F L O W
version 24.04.4 build 5917
created 01-08-2024 07:05 UTC (09:05 CEST)
cite doi:10.1038/nbt.3820
http://nextflow.io
Nextflow installation completed. Please note:
- the executable file `nextflow` has been created in the folder: /home/ash022/rnaseq_nextflow
- you may complete the installation by moving it to a directory in your $PATH
./nextflow run hello
N E X T F L O W ~ version 24.04.4
Launching `https://github.com/nextflow-io/hello` [trusting_cori] DSL2 - revision: afff16a9b4 [master]
executor > local (4)
[4d/db3896] sayHello (1) [100%] 4 of 4 ✔
Ciao world!
Hello world!
Hola world!
Bonjour world!
./nextflow run nf-core/rnaseq -profile docker,test --outdir test
./nextflow run nf-core/rnaseq --max_memory '16.GB' --max_cpus 6 --input samples.csv --outdir results --gtf chr22_with_ERCC92.biotype.gtf --fasta chr22_with_ERCC92.fa -profile docker
./nextflow run nf-core/rnaseq --max_memory '16.GB' --max_cpus 6 --input samples.csv --outdir results --gtf chr22_with_ERCC92.biotype.gtf --fasta chr22_with_ERCC92.fa -profile docker -resume
If the run still fails after resuming with -resume, search or open an issue at https://github.com/nf-core/rnaseq/issues 🤞
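Before searching the issue tracker it is usually worth pulling the actual error lines out of the run log. Nextflow writes `.nextflow.log` into the launch directory; the sketch below assumes its lines carry the log level as a separate ` ERROR ` token, and `show_nf_errors` is a hypothetical helper name.

```shell
# show_nf_errors [LOG] [N]: print the last N lines (default 20) marked ERROR
# in a Nextflow log, prefixed with their line numbers.
show_nf_errors() {
    local log="${1:-.nextflow.log}" n="${2:-20}"
    [ -r "$log" ] || { echo "cannot read $log" >&2; return 2; }
    grep -n ' ERROR ' "$log" | tail -n "$n"
}
```

Running `show_nf_errors` in the launch directory gives a compact excerpt that can be pasted into an issue report.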
wget https://rnabio.org/assets/module_2/multiqc.png
This reference plot looks similar to the multiqc_report produced by the Nextflow run.
Compare the original gene read counts table
wget http://genomedata.org/rnaseq-tutorial/results/cshl2022/rnaseq/gene_read_counts_table_all_final.tsv
with the resulting Nextflow gene counts: the Spearman rank correlation is ~0.99 (calculated using Perseus; shown in blue in the scatter plot, and the Euclidean distance clusters group accordingly). More on this in the blog post https://fuzzylife.substack.com/p/rna-seq-analysis-with-nextflow
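To get both count sets side by side for such a comparison, the two tables can be joined on the gene ID first. The sketch below is an assumption-laden helper, not the method used for the numbers above: `join_counts` is a made-up name, both inputs are assumed to be tab-separated with the gene ID in column 1, and the count columns to compare are passed in by number.

```shell
# join_counts A.tsv COL_A B.tsv COL_B: for every ID (column 1) present in both
# tab-separated tables, print "id<TAB>value_from_A<TAB>value_from_B".
# Rows follow the order of the second file; header rows are dropped unless
# both files use the same ID header.
join_counts() {
    awk -F '\t' -v OFS='\t' -v ca="$2" -v cb="$4" '
        NR == FNR { a[$1] = $ca; next }     # first file: remember column ca by ID
        $1 in a   { print $1, a[$1], $cb }  # second file: emit shared IDs only
    ' "$1" "$3"
}
```

The three-column output can then be loaded into Perseus or any other statistics tool to compute the rank correlation.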
wget http://ftp.ensembl.org/pub/release-112/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz
wget http://ftp.ensembl.org/pub/release-112/gtf/homo_sapiens/Homo_sapiens.GRCh38.112.gtf.gz
./nextflow run nf-core/rnaseq --max_memory '62.GB' --max_cpus 16 --input samples.csv --outdir results --gtf Homo_sapiens.GRCh38.112.gtf.gz --fasta Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz -profile docker
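The `--max_memory` and `--max_cpus` values in these commands are specific to the machine they were run on. A rough, Linux-only sketch for deriving values from the current host follows; `suggest_limits` is a hypothetical helper, and the 2 GB of headroom it leaves is an arbitrary choice.

```shell
# suggest_limits: print --max_cpus / --max_memory arguments sized to this host.
# Reads the CPU count from nproc and total RAM from /proc/meminfo (Linux only),
# keeping about 2 GB of memory free for the rest of the system.
suggest_limits() {
    local cpus mem_gb
    cpus=$(nproc)
    mem_gb=$(awk '/^MemTotal:/ { gb = int($2 / 1024 / 1024) - 2; print (gb < 1 ? 1 : gb) }' /proc/meminfo)
    echo "--max_cpus $cpus --max_memory '${mem_gb}.GB'"
}
```

The printed arguments can be pasted into the run command in place of the hard-coded values.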
rm -rf work .nextflow c? HBR_UHR_ERCC_ds_5pc.tar
sudo touch nohup.out
sudo nohup dockerd &
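Because dockerd is started in the background, the daemon may not be ready yet when the next command runs. One way to wait for it is to poll `docker info`, as sketched below; `wait_for_docker` is a made-up helper and the default of 30 one-second attempts is an arbitrary choice.

```shell
# wait_for_docker [TRIES]: poll `docker info` once per second until the daemon
# answers, giving up after TRIES attempts (default 30).
wait_for_docker() {
    local tries="${1:-30}" i=0
    while [ "$i" -lt "$tries" ]; do
        docker info >/dev/null 2>&1 && return 0
        i=$((i + 1))
        sleep 1
    done
    echo "docker daemon not reachable after $tries attempts" >&2
    return 1
}
```

Chaining it as `wait_for_docker && ./nextflow run ...` avoids launching the pipeline against a daemon that is still starting. If Docker was started with sudo, `docker info` may also need sudo unless the user is in the docker group.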
./nextflow run nf-core/rnaseq --max_memory '92.GB' --max_cpus 16 --input samples.csv --outdir results --gtf Homo_sapiens.GRCh38.112.gtf.gz --fasta Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz -profile docker