About Digital Expression Explorer

The compendium is designed to bring biologists closer to large-scale gene expression data sets. We have processed thousands of public RNA-seq data sets from a variety of organisms with open-source bioinformatics tools and made them freely accessible.


About us

The compendium is brought to you by Mark Ziemann (Deakin University) with support from the Epigenetics in Human Health and Disease Laboratory and Monash eResearch Centre. We value your feedback, so feel free to contact us by email (mark.ziemann[at]gmail.com) or raise an issue on our GitHub repo.


Acknowledgements

A project of this size cannot be done by a single person or team, so we acknowledge support from the following:

Nectar Research Cloud, a collaborative Australian research platform supported by the National Collaborative Research Infrastructure Strategy (NCRIS).

The Multi-modal Australian ScienceS Imaging and Visualisation Environment (MASSIVE).

Deakin eResearch, Monash eResearch Centre and the IT department of Baker Heart and Diabetes Institute.

This research would not have been possible without facilities and help from NCBI. In particular, we acknowledge support from SRA and GEO for curating and hosting these data.


Latest News

12th November 2018 - Ever wondered how accurate DEE2 data actually is? We have just undertaken a simulation study and a comparison to GEO-deposited data to demonstrate the quality of DEE2 data. More details here.

26th October 2018 - Recently there have been many projects with ≥500 SRA runs, meaning that they have not been readily available from the webserver as one file. To address this, I've packaged SRA projects with ≥200 SRA run numbers into zip files. Only projects with EVERY run processed successfully by DEE2 are included. These are now available here.

7th October 2018 - QC information is now linked in the search results and available by hovering the mouse over the QC link. Note that all the datasets are currently classified as "PASS" but this will be changing soon. We also moved to another Nectar server with storage, so there have been some updates to the webserver CGI scripts. The keyword search is now case-insensitive.

22nd September 2018 - The bulk data dumps are now available via HTTP. Dat turned out to be too slow and unreliable for files of this size.

21st September 2018 - Several small bits of news. Firstly, there is a new domain name server that is working much better than the last one. We are still unable to get HTTPS working, so this is unlikely to happen while using Nectar servers. Secondly, the domain name server change broke the Docker image, so it was modified and rebuilt. Thirdly, the data processing is progressing steadily with the torrent of new datasets that became visible upon integration with the newest SRAdbV2. Lastly, the R interface has undergone several improvements and should be more robust now. The new documentation has been added (link).

More news

Data processing

Our data processing procedure entails:

  1. Download from NCBI SRA

  2. Diagnose sequence format

  3. Sequence quality trimming and adapter clipping

  4. Alignment to genome and transcriptome

  5. Assignment of reads to genes and transcripts

More information regarding the data processing method is available at the GitHub repo.
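
The five steps above run in a fixed order, and a run only counts as processed if every step completes. The Python sketch below illustrates that control flow only. The stage names, the RunRecord class, the process_run function, and the accession string are all hypothetical and are not taken from the DEE2 code base; the real tools and logic are described at the GitHub repo.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical stage names mirroring the five steps listed above, in order.
STAGES = [
    "download",
    "diagnose_format",
    "trim_and_clip",
    "align",
    "assign_reads",
]


@dataclass
class RunRecord:
    """Tracks how far a single SRA run got through the pipeline."""
    accession: str
    completed: List[str] = field(default_factory=list)
    failed_at: Optional[str] = None

    @property
    def ok(self) -> bool:
        # A run is successful only if every stage finished.
        return self.failed_at is None and self.completed == STAGES


def process_run(accession: str,
                handlers: Dict[str, Callable[[RunRecord], None]]) -> RunRecord:
    """Apply each stage handler in order, stopping at the first failure."""
    record = RunRecord(accession)
    for stage in STAGES:
        try:
            handlers[stage](record)
        except Exception:
            record.failed_at = stage
            break
        record.completed.append(stage)
    return record


if __name__ == "__main__":
    # Placeholder handlers that do nothing; real ones would call external tools.
    noop_handlers = {stage: (lambda record: None) for stage in STAGES}
    result = process_run("SRR0000000", noop_handlers)  # made-up accession
    print(result.accession, result.ok, result.completed)
```

Stopping at the first failed stage keeps later steps from running on incomplete input, and recording where the failure happened makes it straightforward to retry or report a run.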


Reference genome information

The compendium relies on reference genome sequence and annotation information provided by Ensembl Genomes.

Species | Genome reference sequence | Annotation
Arabidopsis thaliana | Arabidopsis_thaliana.TAIR10.dna_sm.toplevel.fa.gz | Ensembl release 36
Caenorhabditis elegans | Caenorhabditis_elegans.WBcel235.dna_sm.toplevel.fa.gz | Ensembl release 90
Drosophila melanogaster | Drosophila_melanogaster.BDGP6.dna_sm.toplevel.fa.gz | Ensembl release 90
Danio rerio | Danio_rerio.GRCz10.dna_sm.toplevel.fa.gz | Ensembl release 90
Escherichia coli | Escherichia_coli_str_k_12_substr_mg1655.ASM584v2.dna_sm.chromosome.Chromosome.fa.gz | Ensembl release 36
Homo sapiens | Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz | Ensembl release 90
Mus musculus | Mus_musculus.GRCm38.dna_sm.primary_assembly.fa.gz | Ensembl release 90
Rattus norvegicus | Rattus_norvegicus.Rnor_6.0.dna_sm.toplevel.fa.gz | Ensembl release 90
Saccharomyces cerevisiae | Saccharomyces_cerevisiae.R64-1-1.dna_sm.toplevel.fa.gz | Ensembl release 36
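
The reference file names above follow the Ensembl naming pattern of species, assembly, sequence type (dna_sm, the soft-masked genomic sequence) and region. The Python sketch below shows one way to pull the species and assembly out of such a file name. The GenomeFile type and the parse_genome_filename function are hypothetical helpers written for this illustration and are not part of DEE2.

```python
import re
from typing import NamedTuple


class GenomeFile(NamedTuple):
    species: str   # e.g. "Homo sapiens"
    assembly: str  # e.g. "GRCh38"
    region: str    # e.g. "primary_assembly" or "toplevel"


# species . assembly . dna_sm . region . fa.gz
# The assembly may itself contain dots (e.g. "Rnor_6.0"), so it is matched
# as everything between the first dot and the ".dna_sm." marker.
_PATTERN = re.compile(
    r"^(?P<species>[^.]+)\.(?P<assembly>.+)\.dna_sm\.(?P<region>.+)\.fa\.gz$"
)


def parse_genome_filename(name: str) -> GenomeFile:
    """Split a soft-masked Ensembl genome FASTA file name into its parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised genome file name: {name}")
    return GenomeFile(
        species=match["species"].replace("_", " "),
        assembly=match["assembly"],
        region=match["region"],
    )


if __name__ == "__main__":
    for fname in (
        "Homo_sapiens.GRCh38.dna_sm.primary_assembly.fa.gz",
        "Rattus_norvegicus.Rnor_6.0.dna_sm.toplevel.fa.gz",
    ):
        print(parse_genome_filename(fname))
```

Anchoring on the ".dna_sm." marker rather than splitting on every dot is what lets assembly names such as Rnor_6.0 survive intact.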

Update schedule

New datasets deposited to SRA will be incorporated into the compendium fortnightly. Upon release of an updated genome build, we intend to update the data for that organism within a year, keeping the previous version archived for bulk download only. Gene annotation sets will not be updated independently of the genome build.