Bulk data@DEE2

Bulk data dumps for each species are available via HTTP here.

The data are in 'long format' tables with the columns 'dataset', 'gene' and 'count'. Long tables are preferred over the wide matrix format for loading into databases, because each row is a single record and new datasets can be appended without changing the table schema. STAR counts have the suffix 'se.tsv.bz2', while kallisto estimated counts have the suffix 'ke.tsv.bz2'.
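As an illustration of why the long layout suits databases, the following Python sketch (standard library only) streams a compressed count file into an SQLite table and retrieves one dataset's counts. It assumes each line holds exactly three tab-separated fields; the function names `load_long_counts` and `counts_for_dataset`, the table name `counts` and the `skip_header` option are hypothetical, not part of DEE2.

```python
import bz2
import csv
import sqlite3


def load_long_counts(source, conn, skip_header=False):
    """Stream a long-format (dataset, gene, count) table from a .tsv.bz2
    path or binary file object into an SQLite table named 'counts'
    (hypothetical schema).

    Assumes three tab-separated fields per line; pass skip_header=True
    if the file turns out to start with a header row.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS counts (dataset TEXT, gene TEXT, count REAL)"
    )
    with bz2.open(source, "rt", newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        if skip_header:
            next(reader, None)
        # REAL holds both STAR integer counts and kallisto fractional estimates;
        # empty rows (blank lines) are skipped.
        rows = ((r[0], r[1], float(r[2])) for r in reader if r)
        conn.executemany("INSERT INTO counts VALUES (?, ?, ?)", rows)
    # Index the dataset column so per-run lookups avoid a full table scan.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_counts_dataset ON counts (dataset)")
    conn.commit()


def counts_for_dataset(conn, dataset):
    """Return {gene: count} for one dataset (one column of the wide matrix)."""
    cur = conn.execute(
        "SELECT gene, count FROM counts WHERE dataset = ?", (dataset,)
    )
    return dict(cur.fetchall())
```

Loading a whole species file into an on-disk database (for example `sqlite3.connect("dee2.db")`) keeps memory use flat, since rows are inserted as they are decompressed rather than held in a matrix.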

QC metrics are also available in long table format with the columns 'dataset', 'QC metric type' and 'QC metric result'.
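A similar sketch for the QC table groups the long rows into one dictionary per dataset, which is convenient for checking a run's metrics before using its counts. The function name `qc_by_dataset` is hypothetical, and because the individual metric names and value types are not listed here, results are kept as strings.

```python
import csv
from collections import defaultdict


def qc_by_dataset(lines):
    """Group long-format QC rows (dataset, QC metric type, QC metric result)
    into {dataset: {metric_type: result}}.

    Results stay as strings because a QC table may mix numeric and
    categorical values (an assumption about the file contents).
    """
    table = defaultdict(dict)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        dataset, metric, result = row
        table[dataset][metric] = result
    return dict(table)
```

The function accepts any iterable of text lines, so an opened file can be passed directly, e.g. `qc_by_dataset(bz2.open(path, "rt"))` if the QC table is distributed bzip2-compressed like the count files.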


Corresponding metadata can be obtained via HTTP here. The files with the ".tsv" suffix are obtained from SRA and describe each run. The files with the ".tsv.cut" suffix contain reduced metadata: just the corresponding accession numbers and a QC summary.


Data processing is still underway, so some datasets may be missing. Contact us if you would like specific data added, or alternatively run your own Docker container by following the guide.


Data for large-scale studies with ≥50 runs that have been fully processed by DEE2 are packaged as zip files and available here.