# DEE2 Folder contents
In this folder, you will see several files:

## GeneCountMatrix.tsv
A tab-separated file of gene-level expression counts generated by STAR.
These counts are unnormalised, so you can apply whichever normalisation suits your statistical analysis or visualisation.
Each column represents an SRA run accession and each row represents an Ensembl gene.
For information about the genome build and transcriptome annotation, see https://dee2.io/pipeline.
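
As a minimal sketch of loading this matrix, the Python below reads a tab-separated count table into a nested dictionary using only the standard library. The function name `read_count_matrix` and the example accessions are hypothetical, and the exact header layout of the real file is an assumption, so the code accepts a header with or without a label above the gene ID column.

```python
import csv
import io


def read_count_matrix(handle):
    """Read a tab-separated count matrix into {gene_id: {run_accession: count}}."""
    rows = list(csv.reader(handle, delimiter="\t"))
    header, body = rows[0], rows[1:]
    # Assumption: R-style tables may omit the header cell above the row names,
    # in which case the header is one cell shorter than the data rows.
    if body and len(header) == len(body[0]) - 1:
        runs = header
    else:
        runs = header[1:]
    matrix = {}
    for row in body:
        gene_id, values = row[0], row[1:]
        matrix[gene_id] = {run: int(v) for run, v in zip(runs, values)}
    return matrix


# Made-up data standing in for GeneCountMatrix.tsv
example = (
    "GeneID\tSRR0000001\tSRR0000002\n"
    "ENSG00000000001\t10\t0\n"
    "ENSG00000000002\t5\t7\n"
)
matrix = read_count_matrix(io.StringIO(example))
print(matrix["ENSG00000000002"]["SRR0000002"])
```

For a real file, pass `open("GeneCountMatrix.tsv", newline="")` instead of the `io.StringIO` object.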

## GeneInfo.tsv
A tab-separated file mapping each Ensembl gene to its gene symbol and gene length, as calculated by GTFtools.
This file is useful if you need to convert Ensembl gene IDs to symbols for downstream pathway analysis,
or if you need to calculate FPKM for genes. Each row represents one Ensembl gene.
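
The FPKM calculation mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the DEE2 method: the function name `fpkm` is hypothetical, the library size is taken to be the sum of the gene counts for the run, and gene lengths are in base pairs.

```python
def fpkm(counts, gene_lengths):
    """Return {gene_id: FPKM} for a single run.

    counts: {gene_id: raw count} for one SRA run.
    gene_lengths: {gene_id: length in base pairs}.
    Assumption: library size is the sum of all gene counts in the run.
    """
    total = sum(counts.values())
    if total == 0:
        # Avoid division by zero for an empty run
        return {gene: 0.0 for gene in counts}
    return {
        gene: count * 1e9 / (gene_lengths[gene] * total)
        for gene, count in counts.items()
    }


# Made-up values for illustration only
run_counts = {"geneA": 100, "geneB": 300}
lengths = {"geneA": 1000, "geneB": 2000}
result = fpkm(run_counts, lengths)
print(result)
```

Here `geneA` has 100 of 400 counts over 1000 bp, giving 100 × 10^9 / (1000 × 400) = 250000 FPKM.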

## logs/
A folder of logs, with one file per SRA run accession number. Logs include FastQC stats, Skewer trimming stats, STAR mapping stats and Kallisto mapping stats.

## MetadataSummary.tsv
A tab-separated file containing brief metadata about each SRA run. It links each SRA run to its sample, experiment, project and GEO accession numbers. One column gives the quality control summary; the meanings of the warnings are explained [here](https://github.com/markziemann/dee2/blob/master/qc/qc_metrics.md). Another column, "experiment_title", can be useful for quickly sorting samples into treatment groups. Each row represents one SRA run.
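
As a rough sketch of using this file to pick runs for a treatment group, the function below returns the accessions of runs that passed QC and whose experiment title contains a keyword. The function name `runs_matching` is hypothetical. The column name `SRR_accession` and the idea that passing runs have a QC summary starting with `PASS` are assumptions, so check them against your copy of the file.

```python
import csv
import io


def runs_matching(handle, keyword, run_col="SRR_accession",
                  qc_col="QC_summary", title_col="experiment_title"):
    """Return accessions of QC-passing runs whose title contains keyword."""
    selected = []
    for row in csv.DictReader(handle, delimiter="\t"):
        # Assumption: a passing run has a QC summary beginning with "PASS"
        passed = row[qc_col].startswith("PASS")
        if passed and keyword.lower() in row[title_col].lower():
            selected.append(row[run_col])
    return selected


# Made-up rows standing in for MetadataSummary.tsv
example = (
    "SRR_accession\tQC_summary\texperiment_title\n"
    "SRR0000001\tPASS\tuntreated rep1\n"
    "SRR0000002\tPASS\tdrug rep1\n"
    "SRR0000003\tFAIL(1)\tuntreated rep2\n"
)
untreated = runs_matching(io.StringIO(example), "untreated")
print(untreated)
```

Titles are free text, so inspect the matches by eye before treating them as a group.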

## MetadataFull.tsv
A tab-separated file containing extensive metadata about each SRA run, including information from the corresponding samples, projects, etc. This file is very wide, so it is best loaded into R or a spreadsheet. Each row represents an SRA run.

## QC_Matrix.tsv
A tab-separated file containing quality control information collected during data processing. These metrics are used to determine the QC_summary shown in the MetadataSummary.tsv file. Each column represents an SRA run and each row represents a different quality metric.

## TxCountMatrix.tsv
A tab-separated file containing transcript-level expression counts generated by Kallisto. Each column represents an SRA run and each row represents an Ensembl transcript. These counts can be fractional (i.e. not integers), so be aware that some analyses (e.g. DESeq2) require the values to be rounded to integers.
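
A minimal sketch of the rounding step, assuming the transcript counts have already been loaded into a nested dictionary of the form `{transcript_id: {run_accession: count}}`. The function name `round_counts` and the example identifiers are hypothetical.

```python
def round_counts(matrix):
    """Round fractional counts to the nearest integer.

    matrix: {transcript_id: {run_accession: fractional count}}.
    Note: Python's round() sends exact halves to the nearest even integer.
    """
    return {
        tx: {run: int(round(value)) for run, value in runs.items()}
        for tx, runs in matrix.items()
    }


# Made-up values for illustration only
tx_counts = {
    "ENST00000000001": {"SRR0000001": 12.3, "SRR0000002": 0.7},
    "ENST00000000002": {"SRR0000001": 0.0, "SRR0000002": 103.9},
}
rounded = round_counts(tx_counts)
print(rounded)
```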

## TxInfo.tsv
A tab-separated file of Ensembl transcripts and their corresponding Ensembl gene IDs, gene symbols and transcript lengths. This file will be helpful for downstream analysis such as pathway analysis and calculating FPKM.

## Need help?
Contact us by email: mark.ziemann[at]gmail.com
