Dataset - Release 9 Primary Data¶
The Release 9 primary data compendium contains uniformly pre-processed and mapped data from multiple profiling experiments (technical and biological replicates from multiple individuals and/or datasets from multiple centers). All datasets were uniformly pre-processed by mapping reads onto the hg19 assembly of the human genome using the Pash 3.0 read mapper. Complete metadata associated with each dataset in this collection is archived at the Gene Expression Omnibus and describes the samples, assays, data processing details and quality metrics collected for each profiling experiment.
Dataset – Release 9 Uniformly Processed Data¶
To reduce redundancy, improve data quality and achieve the uniformity required for integrative analyses in the Roadmap Epigenomics Consortium paper, experiments were subjected to additional processing to obtain comprehensive data for 111 consolidated epigenomes. Numeric epigenome identifiers (EIDs; for example, E001) and mnemonics for epigenome names were assigned to each of the consolidated epigenomes. For additional details about the consolidated epigenome IDs, see Supplementary Table 1, Epigenome Class Summary sheet, in the Roadmap Epigenomics Consortium paper. Data sets corresponding to 16 cell lines from the ENCODE project (with epigenome IDs ranging from E114 to E129) were also included in the uniformly processed dataset. To avoid artificial differences due to mappability, for each consolidated data set the raw mapped reads were uniformly truncated to 36 bp and then refiltered using a custom 36-bp mappability track to retain only reads that map to uniquely mappable positions. Reads were also randomly subsampled to 30 million reads to ensure uniform sequencing depth. Uniformly processed data sets were then merged across technical/biological replicates where necessary to obtain a single consolidated sample for every histone mark or DNase-seq data set in each standardized epigenome.
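The uniform processing steps described above (truncation to 36 bp, mappability filtering, random subsampling, and merging of replicates) can be illustrated with a minimal Python sketch. This is not the consortium's pipeline: the `Read` record, the function names, and the representation of the mappability track as a set of `(chromosome, start)` positions are assumptions made for illustration, and the sketch subsamples each data set before merging simply because that is the order in which the steps are listed.

```python
import random
from typing import Iterable, List, NamedTuple, Set, Tuple

READ_LENGTH = 36          # truncation length stated in the text
MAX_READS = 30_000_000    # subsampling depth stated in the text


class Read(NamedTuple):
    """Hypothetical minimal record for one mapped read."""
    chrom: str
    start: int   # 0-based leftmost coordinate of the alignment
    end: int     # exclusive end coordinate
    strand: str  # "+" or "-"


def truncate(read: Read, length: int = READ_LENGTH) -> Read:
    """Keep only the first `length` bases counted from the read's 5' end."""
    if read.end - read.start <= length:
        return read
    if read.strand == "+":
        return read._replace(end=read.start + length)
    # On the minus strand the 5' end is the rightmost coordinate.
    return read._replace(start=read.end - length)


def process_dataset(
    reads: Iterable[Read],
    mappable: Set[Tuple[str, int]],
    max_reads: int = MAX_READS,
    seed: int = 0,
) -> List[Read]:
    """Truncate, filter by mappability, then subsample one data set.

    `mappable` is an assumed stand-in for the custom mappability track:
    the set of (chrom, start) positions whose 36-mer is treated as unique.
    """
    truncated = [truncate(r) for r in reads]
    kept = [r for r in truncated if (r.chrom, r.start) in mappable]
    if len(kept) > max_reads:
        kept = random.Random(seed).sample(kept, max_reads)
    return kept


def merge_replicates(datasets: Iterable[List[Read]]) -> List[Read]:
    """Pool processed replicates into a single consolidated sample."""
    return [read for dataset in datasets for read in dataset]


# Toy usage with three reads and a two-position mappability set.
toy_reads = [
    Read("chr1", 100, 150, "+"),
    Read("chr1", 200, 250, "-"),
    Read("chr1", 300, 336, "+"),
]
toy_mappable = {("chr1", 100), ("chr1", 214)}
toy_processed = process_dataset(toy_reads, toy_mappable)
```

In the toy usage, the first two reads are shortened to 36 bp and retained, while the third is dropped because its position is absent from the toy mappability set.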
In the paper we note that our analyses are an instance of a general methodological template applicable to any class of non-coding RNAs or regulatory elements. To support the broadest application of this template and to enable the scientific community to reproduce and extend the analyses reported here, we provide a set of integrated online tools. For the reader's convenience, the analyses reported in the paper can be reproduced using the data and these tools, which were integrated into the Epigenomic Toolset within the Genboree Workbench. Below we demonstrate the reproducibility of our results and provide step-by-step instructions for performing analyses similar to those described in the paper using the Clustering, LIMMA, Spark, and Enrichment tools.
Relationships of cell-types and lineages¶
Use case - Mapping ontogenetic pathways of cellular differentiation using the Roadmap Epigenomics Project data and the Epigenomic Toolset within the Genboree Workbench¶
Regulatory elements acquire cell- and tissue-specific histone marks upon cellular differentiation, so we reasoned that histone marks over regulatory elements may provide sufficient information to discriminate major cell and tissue types. To test this hypothesis, in Amin et al., we clustered 99 distinct cell types represented within the 111 reference epigenomes from Release 9 of the Human Epigenome Atlas. The clustering was performed using average signals of five core histone marks over eight types of regulatory elements, including enhancers, promoters, and lincRNA TSSs.
We observe that the histone marks at regulatory elements, and particularly the H3K4me1 mark at lincRNA TSSs, provide sufficient information to discriminate major cell and tissue types and indicate a high degree of epigenetic regulation of cellular identity.
Such informative regulatory regions can be used to discriminate cell types and track cellular identity. To demonstrate this, here we used data from the Roadmap Epigenomics Project and the Epigenomic Toolset integrated within the Genboree Workbench to show the grouping of epigenomes that display similar ontogenetic development.
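The idea of grouping epigenomes by their average histone mark signal over regulatory elements can be sketched as follows. The text does not state which distance or linkage the Clustering tool uses, so the Pearson correlation distance and average linkage below are assumptions chosen for illustration, and the sample names and signal values are toy data rather than values from the Human Epigenome Atlas.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence


def pearson_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Return 1 - Pearson correlation between two signal profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 1.0  # a flat profile carries no correlation information
    return 1.0 - cov / (norm_x * norm_y)


def average_linkage(
    profiles: Dict[str, Sequence[float]], n_clusters: int
) -> List[List[str]]:
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters: List[List[str]] = [[name] for name in profiles]

    def cluster_distance(a: List[str], b: List[str]) -> float:
        # Average linkage: mean pairwise distance between cluster members.
        pairs = [(i, j) for i in a for j in b]
        total = sum(pearson_distance(profiles[i], profiles[j]) for i, j in pairs)
        return total / len(pairs)

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
    return clusters


# Toy profiles: each vector stands for the average signal of one mark over
# four hypothetical regulatory element sets in one epigenome.
toy_profiles = {
    "blood_1": [5.0, 1.0, 4.0, 1.0],
    "blood_2": [6.0, 1.0, 5.0, 2.0],
    "brain_1": [1.0, 5.0, 1.0, 4.0],
    "brain_2": [2.0, 6.0, 1.0, 5.0],
}
toy_clusters = average_linkage(toy_profiles, n_clusters=2)
```

With these toy profiles the two hypothetical blood samples end up in one cluster and the two brain samples in the other, which mirrors the grouping of epigenomes by lineage that the use case aims to show.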
Epigenomic regulation of regulatory regions during differentiation¶
Use case - Assessing cellular differentiation by cluster analysis of functional genetic elements identified by epigenetic mapping¶
In Amin et al., we analyzed the epigenomic programming of lincRNAs upon differentiation. We examined dynamic epigenomic footprints within the mesodermal germ lineage, using CD8+ T-cells as a representative of the lineage. The focus was on a list of variable lincRNA TSSs that showed changes in at least one histone mark along the T-cell subtree. The following three stages of cellular differentiation were analyzed: (1) H1 embryonic stem cells; (2) CD34+ hematopoietic stem cells; and (3) fully differentiated CD8+ T-cells. By combining chromatin marks at the three stages into a single Spark analysis, we aimed to identify groups of lincRNA TSSs that show similar trajectories of epigenetic programming, each trajectory consisting of a distinct pattern of coordinated changes in histone marks as cells transition between the three stages.
We further demonstrate this type of analysis with a use case exploring the epigenomic programming of enhancers during myeloid differentiation.
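The trajectory grouping described above can be illustrated with a small sketch: the mark signals at the three stages are concatenated into one vector per region, and regions with similar vectors are grouped. The k-means routine below is a plain textbook version written for illustration and is not the Spark tool's implementation. The region names, the two unnamed mark values per stage, and all numbers are toy assumptions.

```python
import random
from typing import Dict, List, Sequence

STAGES = ["H1", "CD34", "CD8"]  # the three differentiation stages in the text


def concat_stages(
    signal_by_stage: Dict[str, Dict[str, Sequence[float]]],
    stages: Sequence[str] = STAGES,
) -> Dict[str, List[float]]:
    """Join per-stage mark vectors into one trajectory vector per region."""
    regions = signal_by_stage[stages[0]].keys()
    return {
        region: [v for stage in stages for v in signal_by_stage[stage][region]]
        for region in regions
    }


def squared_distance(x: Sequence[float], y: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans(
    vectors: Dict[str, Sequence[float]], k: int, n_iter: int = 50, seed: int = 0
) -> Dict[str, int]:
    """Plain k-means; returns a cluster label for every region name."""
    names = sorted(vectors)
    rng = random.Random(seed)
    centers = [list(vectors[name]) for name in rng.sample(names, k)]
    assignment: Dict[str, int] = {}
    for _ in range(n_iter):
        # Assign each region to its nearest center.
        new_assignment = {
            name: min(
                range(k),
                key=lambda c: squared_distance(vectors[name], centers[c]),
            )
            for name in names
        }
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
        # Move each center to the mean of its members.
        for c in range(k):
            members = [vectors[n] for n in names if assignment[n] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Toy data: two mark values per stage for four hypothetical lincRNA TSSs.
toy_signal = {
    "H1":   {"tss_a": [0, 5], "tss_b": [0, 6], "tss_c": [6, 0], "tss_d": [5, 0]},
    "CD34": {"tss_a": [2, 3], "tss_b": [2, 3], "tss_c": [3, 2], "tss_d": [3, 2]},
    "CD8":  {"tss_a": [5, 0], "tss_b": [6, 0], "tss_c": [0, 6], "tss_d": [0, 5]},
}
toy_trajectories = concat_stages(toy_signal)
toy_labels = kmeans(toy_trajectories, k=2)
```

In this toy example, `tss_a` and `tss_b` gain the first mark and lose the second across the three stages, while `tss_c` and `tss_d` do the opposite, so the two pairs fall into separate clusters. Clustering the concatenated vectors, rather than each stage separately, is what lets regions be grouped by their whole trajectory.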