About Digital Expression Explorer

The compendium is designed to bring biologists closer to large-scale gene expression data sets. We have processed thousands of public RNA-seq data sets from a variety of organisms with open-source bioinformatics tools and made them freely accessible.

About us

The compendium is brought to you by Mark Ziemann (Deakin University) with support from the Epigenetics in Human Health and Disease Laboratory and Monash eResearch Centre. We value your feedback, so feel free to contact us by email (mark.ziemann[at]gmail.com) or raise an issue on our GitHub Repo.


A project of this size cannot be done by a single person or team, and so I would like to acknowledge the following:

This research was supported by use of the Nectar Research Cloud, a collaborative Australian research platform supported by the National Collaborative Research Infrastructure Strategy (NCRIS).

This work was supported by the Multi-modal Australian ScienceS Imaging and Visualisation Environment (MASSIVE).

This research was supported by Deakin eResearch, Monash eResearch Centre and the IT department of Baker Heart and Diabetes Institute.

This research would not have been possible without facilities and help from NCBI. In particular, we acknowledge support from SRA and GEO for curating and hosting these data.

Latest News

7th October 2018 - QC information is now linked in the search results and can be viewed by hovering the mouse over the QC link. Note that all the datasets are currently classified as "PASS", but this will be changing soon. We also moved to another Nectar server with more storage, so there have been some updates to the webserver CGI scripts. The keyword search is now case insensitive.

22nd September 2018 - The bulk data dumps are now available via HTTP. Dat turned out to be too slow and unreliable for files of this size.

21st September 2018 - Several small bits of news. First, we have a new domain name server that is working much better than the last one. We are still unable to get HTTPS working, so this is unlikely to happen while we are using Nectar servers. Second, the domain name server change broke the Docker image, so it was modified and rebuilt. Third, data processing is progressing steadily through the torrent of new datasets that became visible upon integration with the newest SRAdbV2. Lastly, the R interface has undergone several improvements and should be more robust now. New documentation has been added (link).

14th March 2018 - Search functionality has been restored, so feel free to test it out.

1st March 2018 - We hope you like the new mobile-friendly webpage.

1st January 2018 - DEE2 data pipeline is finalised and processing is underway. For more information, see the recent blog post.

18th October 2017 - As of a month ago, the original DEE site has been down and we've been working towards DEE version 2, which will be hosted by Monash Uni. This app is still under construction, but stay tuned for the official release at the end of 2017.


Data processing

Our data processing procedure entails:

  1. Download from NCBI SRA

  2. Diagnose sequence format

  3. Sequence quality trimming and adapter clipping

  4. Alignment to genome and transcriptome

  5. Assignment of reads to genes and transcripts

More information regarding the data processing method is available at the GitHub repo.
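
As a rough illustration of what steps 2 and 3 involve, the sketch below guesses the FASTQ quality encoding from the quality characters and then trims low-quality bases from the 3' end of a read. It is not the compendium's own code: the function names, the offset heuristic and the quality threshold of 20 are all assumptions made for this example, and the tools actually used are documented at the GitHub repo.

```python
def guess_phred_offset(quality_strings):
    """Guess the FASTQ quality offset (33 or 64) from observed characters.

    Heuristic (an assumption for this sketch): characters below ';'
    (ASCII 59) cannot occur in +64 encodings, so seeing one implies +33.
    Reads made only of high-quality characters are ambiguous and fall
    through to 64, so a real pipeline should sample many reads.
    """
    lowest = min(ord(ch) for qual in quality_strings for ch in qual)
    return 33 if lowest < 59 else 64


def trim_3prime(seq, qual, offset=33, min_q=20):
    """Remove bases from the 3' end while their quality is below min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


if __name__ == "__main__":
    # Hypothetical read: four good bases followed by two poor ones.
    seq, qual = "ACGTAC", "IIII##"
    offset = guess_phred_offset([qual])
    print(offset, trim_3prime(seq, qual, offset=offset))
```

Dedicated trimming tools use more careful algorithms, such as sliding windows and adapter matching, so this only conveys the general idea.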

Reference genome information

The compendium relies on reference genome sequence and annotation information provided by Ensembl Genomes.

Species | Genome reference sequence | Annotation
Arabidopsis thaliana | Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa.gz | Ensembl release 36
Caenorhabditis elegans | Caenorhabditis_elegans.WBcel235.dna_sm.toplevel.fa.gz | Ensembl release 90
Drosophila melanogaster | Drosophila_melanogaster.BDGP6.dna_sm.toplevel.fa.gz | Ensembl release 90
Danio rerio | Danio_rerio.GRCz10.dna_sm.toplevel.fa.gz | Ensembl release 90
Escherichia coli | Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.dna_sm.chromosome.Chromosome.fa.gz | Ensembl release 36
Homo sapiens | Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz | Ensembl release 90
Mus musculus | Mus_musculus.GRCm38.dna_sm.primary_assembly.fa.gz | Ensembl release 90
Rattus norvegicus | Rattus_norvegicus.Rnor_6.0.dna_sm.toplevel.fa.gz | Ensembl release 90
Saccharomyces cerevisiae | Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz | Ensembl release 36
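
The file names listed above follow Ensembl's FASTA naming pattern of species, assembly, sequence type and region, where "dna_sm" denotes a soft-masked genome sequence. For readers who want to check which assembly a data set was aligned to, the sketch below splits such a name into its parts. The function name and the regular expression are assumptions written for this example, not part of the compendium's code.

```python
import re

# Pattern for names such as Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz.
# The assembly field is matched lazily because it may itself contain dots
# (for example "Rnor_6.0").
ENSEMBL_FASTA = re.compile(
    r"(?P<species>[^.]+)\."
    r"(?P<assembly>.+?)\."
    r"(?P<seq_type>dna(?:_sm|_rm)?)\."
    r"(?P<region>.+)\.fa\.gz"
)


def parse_reference_name(filename):
    """Split an Ensembl genome FASTA file name into its named fields."""
    match = ENSEMBL_FASTA.fullmatch(filename)
    if match is None:
        raise ValueError(f"not an Ensembl genome FASTA name: {filename!r}")
    fields = match.groupdict()
    # Species names use underscores in file names; show them with spaces.
    fields["species"] = fields["species"].replace("_", " ")
    return fields


if __name__ == "__main__":
    print(parse_reference_name("Rattus_norvegicus.Rnor_6.0.dna_sm.toplevel.fa.gz"))
```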

Update schedule

New datasets deposited to SRA will be incorporated into the compendium fortnightly. When an updated genome build is released, we intend to update the data for that organism within a year and keep the previous version archived for bulk download only. Gene annotation sets will not be updated independently of the genome build.