About Digital Expression Explorer

The compendium is designed to bring biologists closer to large-scale gene expression data sets. We have processed thousands of public RNA-seq data sets from a variety of organisms with open-source bioinformatics tools and made them freely accessible.


About us

The compendium is brought to you by Mark Ziemann (Deakin University) with support from the Epigenetics in Human Health and Disease Laboratory and Monash eResearch Centre. We value your feedback, so feel free to contact us by email (mark.ziemann[at]gmail.com) or raise an issue on our GitHub repo.


Acknowledgements

A project of this size cannot be done by a single person or team, so we acknowledge support from the following:

Nectar Research Cloud, a collaborative Australian research platform supported by the National Collaborative Research Infrastructure Strategy (NCRIS).

The Multi-modal Australian ScienceS Imaging and Visualisation Environment (MASSIVE).

Deakin eResearch, Monash eResearch Centre and the IT department of Baker Heart and Diabetes Institute.

This research would not have been possible without facilities and help from NCBI. In particular, we acknowledge support from SRA and GEO for curating and hosting these data.


Latest News

28th November 2018 - DEE2 was presented at the ABACBS2018 conference in Melbourne. See the poster.

28th November 2018 - A guide to loading bulk dumps has been added to my blog.

12th November 2018 - Ever wondered how accurate DEE2 data actually is? We have just undertaken a simulation study and a comparison with GEO-deposited data to demonstrate the quality of DEE2 data. More details here.

26th October 2018 - Recently there have been many projects with ≥500 SRA runs, which means they have not been readily available from the webserver as a single file. To address this, I've packaged all SRA projects with ≥200 SRA runs into zip files. Only projects in which EVERY run was processed successfully by DEE2 are included. These are now available here.

7th October 2018 - QC information is now linked in the search results and can be viewed by hovering the mouse over the QC link. Note that all datasets are currently classified as "PASS", but this will be changing soon. We have also moved to another Nectar server with storage, which required some updates to the webserver CGI scripts. Finally, the keyword search is now case-insensitive.

More news
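The bulk downloads mentioned in the news items above are distributed as zip archives. As a hedged illustration only, the Python sketch below assumes a hypothetical archive member holding gene counts in long format (one tab-separated run accession, gene ID and count per line, no header); the real file names and layout may differ, so inspect the archive contents before adapting it.

```python
import csv
import io
import zipfile
from collections import defaultdict
from typing import BinaryIO, Dict, Union


def load_long_counts(source: Union[str, BinaryIO], member: str) -> Dict[str, Dict[str, int]]:
    """Read a long-format (run, gene, count) TSV from a zip archive.

    Returns a nested dict: gene ID -> run accession -> read count.
    The three-column, header-free layout is an assumption for illustration.
    """
    matrix: Dict[str, Dict[str, int]] = defaultdict(dict)
    with zipfile.ZipFile(source) as zf, zf.open(member) as raw:
        text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
        for row in csv.reader(text, delimiter="\t"):
            if not row:
                continue  # skip blank lines
            run, gene, count = row
            matrix[gene][run] = int(count)
    return dict(matrix)


# Demo with a small in-memory archive; the member name and accessions are made up.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr(
        "example_gene_counts.tsv",
        "SRR0000001\tGENE0001\t10\n"
        "SRR0000001\tGENE0002\t0\n"
        "SRR0000002\tGENE0001\t7\n",
    )
matrix = load_long_counts(buf, "example_gene_counts.tsv")
print(matrix)
```

For a downloaded archive, a file path can be passed as source instead of an in-memory buffer.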

Data processing

Our data processing procedure entails:

  1. Download from NCBI SRA

  2. Diagnose sequence format

  3. Sequence quality trimming and adapter clipping

  4. Alignment to genome and transcriptome

  5. Assignment of reads to genes and transcripts
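Steps 2 and 3 can be illustrated with a short, self-contained Python sketch. It is not the DEE2 pipeline code (the GitHub repo documents the actual method). It assumes a simple heuristic for diagnosing the FASTQ quality encoding (Phred+33 if any quality character falls below ASCII 64), an illustrative quality cut-off of 20, a minimum read length of 18, and the 13-base prefix shared by Illumina TruSeq adapters, AGATCGGAAGAGC.

```python
from typing import Iterator, List, Optional, Tuple

# Assumed adapter: the 13-base prefix shared by Illumina TruSeq adapters.
ADAPTER = "AGATCGGAAGAGC"


def read_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from plain four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i].rstrip(), lines[i + 1].rstrip(), lines[i + 3].rstrip()


def guess_phred_offset(quals: List[str]) -> int:
    """Heuristic format diagnosis: Phred+33 if any quality char is below '@' (ASCII 64)."""
    lowest = min(ord(c) for q in quals for c in q)
    return 33 if lowest < 64 else 64


def clip_adapter(seq: str, adapter: str = ADAPTER, min_overlap: int = 3) -> int:
    """Return how many 5' bases to keep after removing an exact adapter match."""
    pos = seq.find(adapter)
    if pos != -1:
        return pos
    # Otherwise look for a partial adapter hanging off the 3' end of the read.
    for k in range(min(len(adapter), len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return len(seq) - k
    return len(seq)


def quality_trim_end(qual: str, offset: int, min_q: int = 20) -> int:
    """Return how many bases to keep after dropping low-quality 3' bases."""
    end = len(qual)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return end


def trim_read(seq: str, qual: str, offset: int,
              min_q: int = 20, min_len: int = 18) -> Optional[Tuple[str, str]]:
    """Clip the adapter, then quality-trim; discard reads shorter than min_len."""
    keep = clip_adapter(seq)
    keep = quality_trim_end(qual[:keep], offset, min_q)
    if keep < min_len:
        return None
    return seq[:keep], qual[:keep]


# Two made-up reads: one with an adapter, one with a low-quality 3' tail.
fastq = [
    "@read1", "ACGTACGTACGTACGTACGTAGATCGGAAGAGCACAC", "+", "I" * 37,
    "@read2", "TTTTGGGGCCCCAAAATTTTGG", "+", "I" * 18 + "####",
]
records = list(read_fastq(fastq))
offset = guess_phred_offset([qual for _, _, qual in records])
trimmed = [trim_read(seq, qual, offset) for _, seq, qual in records]
print(offset, trimmed)
```

Dedicated trimmers tolerate mismatches and indels in the adapter and use more robust quality-trimming rules; the encoding heuristic can also misclassify a Phred+33 file in which every base has high quality.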

More information regarding the data processing method is available at the GitHub repo.
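Step 5 can be pictured as interval-overlap counting. The Python sketch below makes simplifying assumptions (each read is a single unspliced interval, strand is ignored, and genes are given as hypothetical lists of exon intervals) and counts a read only when it overlaps the exons of exactly one gene, similar in spirit to the default "union" mode of HTSeq-count. It does not reproduce the tools or settings DEE2 actually uses.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (chromosome, start, end) with 1-based, inclusive coordinates as in GTF files.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def count_reads(reads: List[Interval],
                exons: Dict[str, List[Interval]]) -> Counter:
    """Count reads per gene, setting aside reads that hit zero or several genes."""
    counts: Counter = Counter()
    for read in reads:
        hits = {gene for gene, gene_exons in exons.items()
                if any(overlaps(read, exon) for exon in gene_exons)}
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif hits:
            counts["__ambiguous"] += 1
        else:
            counts["__no_feature"] += 1
    return counts


# Hypothetical genes and aligned reads on chromosome "1".
exons = {
    "geneA": [("1", 100, 200), ("1", 300, 400)],
    "geneB": [("1", 380, 500)],
}
reads = [("1", 150, 199), ("1", 310, 360), ("1", 390, 420), ("1", 600, 650)]
counts = count_reads(reads, exons)
print(counts)
```

The linear scan over all genes keeps the logic easy to follow; real counting tools index exons by position so that millions of reads can be assigned quickly.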


Reference genome information

The compendium relies on reference genome sequence and annotation information provided by Ensembl and Ensembl Genomes.

Species                   Genome reference     Sequence and annotation files
Arabidopsis thaliana      Ensembl release 36   Genome sequence (FASTA); gene annotation set (GTF); cDNA sequences (FASTA)
Caenorhabditis elegans    Ensembl release 90   Genome sequence (FASTA); gene annotation set (GTF); cDNA sequences (FASTA)
Drosophila melanogaster   Ensembl release 90   Genome sequence (FASTA); gene annotation set (GTF); cDNA sequences (FASTA)
Danio rerio               Ensembl release 90   Genome sequence (FASTA); gene annotation set (GTF); cDNA sequences (FASTA)
Escherichia coli          Ensembl release 36   Genome sequence (FASTA); gene annotation set (GTF); cDNA sequences (FASTA)
Homo sapiens              Ensembl release 90   Genome sequence (FASTA); gene annotation set (GTF); cDNA sequences (FASTA)
Mus musculus              Ensembl release 90   Genome sequence (FASTA); gene annotation set (GTF); cDNA sequences (FASTA)
Rattus norvegicus         Ensembl release 90   Genome sequence (FASTA); gene annotation set (GTF); cDNA sequences (FASTA)
Saccharomyces cerevisiae  Ensembl release 36   Genome sequence (FASTA); gene annotation set (GTF); cDNA sequences (FASTA)
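The GTF annotation files listed above use a nine-column, tab-separated layout in which the last column holds key "value"; attribute pairs. A minimal Python sketch for building a gene ID to gene name lookup from such a file might look like the following. The demo records and IDs are made up, and the simple attribute parser assumes that attribute values never contain semicolons.

```python
from typing import Dict, Iterable


def parse_attributes(field: str) -> Dict[str, str]:
    """Parse a GTF attribute column such as 'gene_id "X"; gene_name "Y";'."""
    attrs: Dict[str, str] = {}
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def gene_names(lines: Iterable[str]) -> Dict[str, str]:
    """Map gene_id to gene_name using 'gene' records; fall back to the ID itself."""
    names: Dict[str, str] = {}
    for line in lines:
        if line.startswith("#"):
            continue  # header lines such as '#!genome-build'
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "gene":
            continue
        attrs = parse_attributes(cols[8])
        names[attrs["gene_id"]] = attrs.get("gene_name", attrs["gene_id"])
    return names


# Made-up records in GTF layout (not real Ensembl annotation).
demo_gtf = [
    "#!genome-build demo\n",
    '1\tdemo\tgene\t100\t500\t.\t+\t.\tgene_id "GENE0001"; gene_name "alpha";\n',
    '1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id "GENE0001"; transcript_id "TX0001";\n',
    '1\tdemo\tgene\t800\t900\t.\t-\t.\tgene_id "GENE0002";\n',
]
names = gene_names(demo_gtf)
print(names)
```

A gzip-compressed GTF could be streamed with gzip.open(path, "rt") and passed straight to gene_names. Attributes that repeat within one record (such as several tag entries on a transcript line) keep only their last value in this simple version.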

Update schedule

New datasets deposited in SRA will be incorporated into the compendium fortnightly. When an updated genome build is released, we intend to update the data for that organism within a year, keeping the previous version archived and available for bulk download only. Gene annotation sets will not be updated independently of the genome build.