The goal of DEE2 is to make large-scale gene expression data sets accessible to bioinformaticians, biologists and students alike. We use open-source bioinformatics tools and computational resources provided by our academic partners to process many thousands of public RNA-seq data sets from a variety of organisms and make them freely accessible under the GNU General Public License v3.0.
This compendium is maintained by Dr Mark Ziemann (Deakin University) and Antony Kaspi (WEHI). We value your feedback, so feel free to contact us by email (mark.ziemann[at]gmail.com) or raise an issue on our GitHub Repo.
A project of this size cannot be done by a single person or team, so we acknowledge support from the following:
Epigenetics in Human Health and Disease Laboratory
Nectar Research Cloud, a collaborative Australian research platform supported by the National Collaborative Research Infrastructure Strategy (NCRIS).
The Multi-modal Australian ScienceS Imaging and Visualisation Environment (MASSIVE).
Deakin eResearch, Monash eResearch Centre and the IT department of Baker Heart and Diabetes Institute.
This research would not have been possible without facilities and help from NCBI. In particular, we acknowledge support from SRA and GEO for curating and hosting these data.
10th Sep 2024 - We have just completed an operating system upgrade, so we should be good for another 2 years.
14th March 2022 - Data has been transferred and DEE2 is operational again. You will notice that we have https enabled. We confirm that the R package getDEE2 is also functioning properly. On the other hand, transferring analyses to Degust is not working and we will be addressing this next.
9th March 2022 - We now have control of the webserver and we are transferring data to a new 4TB volume. We estimate that DEE2 will be fully operational on Monday 14th March.
1st March 2022 - DEE2 was taken down by the cloud provider due to a cybersecurity issue. Apparently the server was found to be sending spam emails. The DEE2 team will restore the service once allowed by the cloud provider.
11th May 2021 - We have just completed a migration of the DEE2 webserver from an older Intel Xeon system to a new AMD Epyc system.
Visit the news archive.
Our data processing procedure entails:
More information regarding the data processing method is available at the GitHub repo. Below are the versions and major parameters used in the pipeline.
Software versions and parameters used in the pipeline (SE: single-end; PE: paired-end). Where only one command is listed, it is shared by both.
| Software, version | Purpose | Parameter (SE) | Parameter (PE) |
|---|---|---|---|
| Aspera client, v3.5.4 | Rapid download of sequence data | ascp -l 500m -O 33001 -T -i $ID $URL . | |
| SRA toolkit, v2.8.2 | Validate downloaded SRA files | vdb-validate $SRA | |
| | Diagnose single or paired end | fastq-dump -X 4000 --split-files $SRA | |
| | Dump fastq | (see parallel-fastq-dump below) | |
| FastQC, v0.11.5 | Diagnose basespace / colorspace, quality encoding, read length from 4000 reads | fastqc $FQ1 | fastqc $FQ2 |
| parallel-fastq-dump, v0.6.3 | Rapid decompression of sequence data from .sra files | parallel-fastq-dump --threads $THREADS --outdir . --split-files --defline-qual + -s ${SRR}.sra | |
| Skewer, v0.2.2 | 3’ quality trimming | skewer -l 18 -q 10 -k inf -t $THREADS -o $SRR $FQ1 | skewer -l 18 -q 10 -k inf -t $THREADS -o $SRR $FQ1 $FQ2 |
| | Adapter clipping | skewer -l 18 -t $THREADS -x $ADAPTER -o $SRR $FQ1 | skewer -l 18 -t $THREADS -x $ADAPTER1 -y $ADAPTER2 -o $SRR $FQ1 $FQ2 |
| | 5’ trimming | skewer -m ap --cut $CLIP_NUM,$CLIP_NUM -l 18 -k inf -t $THREADS $FQ1 | skewer -m ap --cut $R1_CLIP_NUM,$R2_CLIP_NUM -l 18 -k inf -t $THREADS $FQ1 $FQ2 |
| Minion, v13-100 | 3’ adapter detection | minion search-adapter -i $FQ1 | minion search-adapter -i $FQ2 |
| Bowtie2, v2.3.2 | Adapter contamination detection | bowtie2 -f -x $BT2_REF -S /dev/stdout $ADAPTER | |
| FASTX-Toolkit, v0.0.14 | Progressive 5’ trimming | fastx_trimmer -f {5,9,13,21} -m 18 -Q 33 -i $FQ1 | fastx_trimmer -f {5,9,13,21} -m 18 -Q 33 -i $FQ2 |
| STAR v020201 | Gene-level mapping, diagnose strandedness | STAR --runThreadN $THREADS --quantMode GeneCounts --genomeLoad LoadAndKeep --outSAMtype None --genomeDir $STAR_DIR --readFilesIn=$FQ1 | STAR --runThreadN $THREADS --quantMode GeneCounts --genomeLoad LoadAndKeep --outSAMtype None --genomeDir $STAR_DIR --readFilesIn=$FQ1 $FQ2 |
| Kallisto, v0.43.1 | Transcript-level mapping | kallisto quant $KALLISTO_STRAND_PARAMETER --single -l 100 -s 20 -t $THREADS -o . -i $KAL_REF $FQ1 | kallisto quant $KALLISTO_STRAND_PARAMETER -t $THREADS -o . -i $KAL_REF $FQ1 $FQ2 |
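
The table lists fastq-dump -X 4000 --split-files $SRA as the step that diagnoses single or paired end. As a rough sketch of how such a check could be scripted (not necessarily how the DEE2 pipeline itself does it), the function below counts the FASTQ files written for a run. The function name detect_layout and the ${SRR}_N.fastq file naming are assumptions made for illustration.

```shell
# Hypothetical helper (not part of DEE2): classify a run as single-end (SE)
# or paired-end (PE) by counting the FASTQ files found for it in a directory.
# Assumption: the split files are named ${SRR}_1.fastq, ${SRR}_2.fastq, ...
detect_layout() {
    srr="$1"          # run accession, e.g. the value of $SRR
    dir="${2:-.}"     # directory holding the dumped FASTQ files
    n=0
    for f in "$dir/${srr}"_[0-9]*.fastq; do
        # When nothing matches, the glob stays literal, so test for existence.
        [ -e "$f" ] && n=$((n + 1))
    done
    case "$n" in
        0) echo "NONE" ;;      # no FASTQ files found for this run
        1) echo "SE" ;;        # one file: treat as single-end
        2) echo "PE" ;;        # two files: treat as paired-end
        *) echo "UNKNOWN" ;;   # more than two files: needs manual inspection
    esac
}
```

After dumping the first 4000 reads, the result of detect_layout "$SRR" could be used to pick between the SE and PE columns of the table.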
The compendium relies on reference genome sequence and annotation information provided by Ensembl Genomes.
A description of each of the quality metrics is provided on the GitHub page here.
We update the compendium fortnightly. Upon release of an updated genome build, we intend to update the data for that organism within a year, keeping a previously archived version for bulk download only. Gene annotation sets will not be updated independently of the genome build.