About Digital Expression Explorer
The compendium is designed to bring biologists closer to large scale gene expression data sets.
We have processed thousands of public RNA-seq data sets from a variety of organisms with open-source bioinformatics tools and make them available.
The compendium is brought to you by Mark Ziemann (Deakin University) with support from the
Epigenetics in Human Health and Disease Laboratory and
Monash eResearch Centre.
We value your feedback, so feel free to contact us by email (mark.ziemann[at]gmail.com) or raise an
issue on our GitHub Repo.
A project of this size cannot be done by a single person or team, so we acknowledge support from the following:
Nectar Research Cloud, a collaborative
Australian research platform supported by the National Collaborative Research Infrastructure Strategy (NCRIS).
The Multi-modal Australian ScienceS Imaging and Visualisation Environment
Monash eResearch Centre
and the IT department of Baker Heart and Diabetes Institute.
This research would not have been possible without facilities and help from NCBI. In particular, we acknowledge
support from SRA and
GEO for curating and hosting these data.
28th November 2018 - DEE2 was presented at the ABACBS2018 conference in Melbourne. See the Poster.
28th November 2018 - A guide to loading in bulk dumps has been added to my Blog.
12th November 2018 - Ever wondered how accurate DEE2 data actually is? Well, we have just undertaken a simulation
study and comparison to GEO-deposited data to demonstrate the quality of DEE2 data. More details
26th October 2018 - Recently there have been many projects with ≥500 SRA runs, meaning that they have not
been readily available from the webserver as one file. In order to address this, I've packaged these SRA projects with
≥200 SRA run numbers into zip files. Only projects with EVERY run processed successfully by DEE2 are included.
These are now available here.
7th October 2018 - QC information is now linked in the search results and available by hovering the
mouse over the QC link. Note that all the datasets are currently classified as "PASS" but this will be
changing soon. We have also moved to another Nectar server with storage, so there have been some updates to
the webserver CGI scripts. The keyword search is also now case-insensitive.
Our data processing procedure entails:
- Download from NCBI SRA
- Diagnose sequence format
- Sequence quality trimming and adapter clipping
- Alignment to genome and transcriptome
- Assignment of reads to genes and transcripts
More information regarding the data processing method is available at the GitHub repo.
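To give a feel for what the "diagnose sequence format" and "quality trimming" steps involve, here is a minimal Python sketch. It is an illustration only and not the code DEE2 runs: the function names (`parse_fastq`, `guess_phred_offset`, `trim_3prime`), the quality threshold of 20 and the example read are all assumptions made for this example, and the offset check is a common heuristic rather than a guaranteed test.

```python
from typing import Iterable, List, Tuple

# (read name, sequence, quality string); a hypothetical record layout for this sketch
Record = Tuple[str, str, str]


def parse_fastq(text: str) -> List[Record]:
    """Parse simple four-line FASTQ records from a string."""
    lines = text.strip().splitlines()
    records = []
    for i in range(0, len(lines) - 3, 4):
        name, seq, _plus, qual = lines[i:i + 4]
        records.append((name[1:], seq, qual))  # drop the leading '@'
    return records


def guess_phred_offset(records: Iterable[Record]) -> int:
    """Guess the quality offset from the lowest quality character seen.

    Heuristic: characters below ASCII 59 cannot occur in +64 encodings,
    so their presence implies Phred+33. A file containing only very high
    quality Phred+33 reads could be misclassified.
    """
    lowest = min(ord(ch) for _, _, qual in records for ch in qual)
    return 33 if lowest < 59 else 64


def trim_3prime(seq: str, qual: str, offset: int, min_q: int = 20) -> Tuple[str, str]:
    """Remove bases from the 3' end until one has quality >= min_q."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


# A made-up read: six high-quality bases followed by two low-quality ones.
EXAMPLE = "@read1\nACGTACGT\n+\nIIIIII##\n"

if __name__ == "__main__":
    reads = parse_fastq(EXAMPLE)
    offset = guess_phred_offset(reads)
    for name, seq, qual in reads:
        print(name, offset, trim_3prime(seq, qual, offset))
```

In a real pipeline these steps are handled by dedicated, well-tested tools; the GitHub repo documents which ones DEE2 uses.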
Reference genome information
The compendium relies on reference genome sequence and annotation information provided by
Ensembl Genomes.
New datasets deposited to SRA will be incorporated into the compendium fortnightly. Upon release
of an updated genome build, we intend to update the data for that organism within a year, keeping a
previously archived version for bulk download only. Gene annotation sets will not be updated
independently of the genome build.