About Digital Expression Explorer

The compendium is designed to bring biologists closer to large-scale gene expression data sets. We have processed thousands of public RNA-seq data sets from a variety of organisms with open-source bioinformatics tools and made them freely accessible. We would like to thank the folks at SRA for hosting these raw data sets.


About us

The compendium is brought to you by the Epigenetics in Human Health and Disease Laboratory with support from Monash eResearch Centre. We value your feedback, so feel free to contact us by email (mark.ziemann[at]monash.edu) or raise an issue on our GitHub repo.


Latest News

14th March 2018 - Search functionality has been restored, so feel free to test it out.

1st March 2018 - Hope you like the new mobile-friendly webpage.

1st January 2018 - DEE2 data pipeline is finalised and processing is underway. For more information, see the recent blog post.

18th October 2017 - As of a month ago, the original DEE site has been down and we've been working towards DEE version 2, which will be hosted by Monash Uni. This app is still under construction, but stay tuned for the official release at the end of 2017.

15th December 2016 - Our lab is moving to Monash Central Clinical School. Stay tuned for a revamped DEE in 2017 with new data and features. This URL could go dead anytime next year so please finalise your work.

13th August 2016 - Looking for Illumina bodymap2 count data? Try searching ERP000546 for human.


Data processing

Our data processing procedure entails:

  1. Download from NCBI SRA
  2. Diagnose sequence format
  3. Sequence quality trimming and adapter clipping
  4. Alignment to genome and transcriptome
  5. Assignment of reads to genes and transcripts

More information regarding the data processing method is available at the GitHub repo.
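
As a rough illustration of steps 2 and 3 above, the following sketch guesses the quality encoding of FASTQ reads and trims low-quality bases from the 3' end. It is not the code used by the compendium: the function names, the example reads, and the quality threshold of 20 are all assumptions made for this example, and "diagnose sequence format" is interpreted here narrowly as detecting the quality score offset. The detection rule is only a heuristic, since uniformly high-quality Phred+33 data contains no low characters and would be misread as +64.

```python
def guess_phred_offset(quality_strings):
    """Guess the FASTQ quality offset (33 or 64) from quality strings.

    Heuristic: characters below ';' (ASCII 59) do not occur in the
    +64 encodings, so seeing one implies an offset of 33.
    """
    lowest = min(ord(ch) for q in quality_strings for ch in q)
    return 33 if lowest < 59 else 64


def trim_3prime(sequence, quality, offset=33, min_quality=20):
    """Remove trailing bases whose quality score is below min_quality."""
    end = len(sequence)
    while end > 0 and ord(quality[end - 1]) - offset < min_quality:
        end -= 1
    return sequence[:end], quality[:end]


# Hypothetical reads as (sequence, quality) pairs, for illustration only.
reads = [("ACGTACGT", "IIIIII#!"), ("TTGCA", "IIIII")]

offset = guess_phred_offset(q for _, q in reads)
trimmed = [trim_3prime(seq, qual, offset) for seq, qual in reads]
print(offset, trimmed)
```

Real pipelines typically delegate these steps to dedicated tools, which also handle adapter clipping and paired-end reads; the sketch only shows the underlying idea.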


Reference genome information

The compendium relies on reference genome sequence and annotation information provided by Ensembl Genomes.

Species                   Genome Reference Sequence and Annotation
Arabidopsis thaliana      Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa.gz (Ensembl release 36)
Caenorhabditis elegans    Caenorhabditis_elegans.WBcel235.dna_sm.toplevel.fa.gz (Ensembl release 90)
Drosophila melanogaster   Drosophila_melanogaster.BDGP6.dna_sm.toplevel.fa.gz (Ensembl release 90)
Danio rerio               Danio_rerio.GRCz10.dna_sm.toplevel.fa.gz (Ensembl release 90)
Escherichia coli          Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.dna_sm.chromosome.Chromosome.fa.gz (Ensembl release 36)
Homo sapiens              Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz (Ensembl release 90)
Mus musculus              Mus_musculus.GRCm38.dna_sm.primary_assembly.fa.gz (Ensembl release 90)
Rattus norvegicus         Rattus_norvegicus.Rnor_6.0.dna_sm.toplevel.fa.gz (Ensembl release 90)
Saccharomyces cerevisiae  Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz (Ensembl release 36)
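
The file names above follow the Ensembl naming pattern of species, assembly, sequence type and region, where "dna_sm" denotes a soft-masked genome (repeats in lower case). The sketch below splits such a name into its parts; the function name and the dictionary keys are labels chosen for this example and are not part of the compendium. It splits around the ".dna_sm." marker rather than on every dot, because some assembly names (for example Rnor_6.0) contain a dot themselves.

```python
def parse_genome_filename(filename):
    """Split a soft-masked Ensembl genome file name into its components."""
    marker = ".dna_sm."
    suffix = ".fa.gz"
    if marker not in filename or not filename.endswith(suffix):
        raise ValueError(f"unexpected genome file name: {filename}")
    # Everything before the marker is "<species>.<assembly>";
    # everything after it (minus the suffix) is the region label.
    prefix, region = filename[: -len(suffix)].split(marker, 1)
    species, assembly = prefix.split(".", 1)
    return {
        "species": species.replace("_", " "),
        "assembly": assembly,
        "region": region,
    }


rat = parse_genome_filename("Rattus_norvegicus.Rnor_6.0.dna_sm.toplevel.fa.gz")
human = parse_genome_filename("Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz")
print(rat)
print(human)
```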

Update schedule

New datasets deposited to SRA will be incorporated into the compendium fortnightly. Upon release of an updated genome build, we intend to update the data for that organism within a year, keeping the previous version archived for bulk download only. Gene annotation sets will not be updated independently of the genome build.