Batch Download of Atlas Files

Coming soon!


Overview

Introduction

The exRNA Atlas provides several tools for analyzing Atlas RNA-seq data.

Below, we demonstrate how to run these tools on Atlas data and view your analysis results in the Atlas.

Overview of Analysis Tools

Before we begin describing how to use the analysis tools, we'll go over what each tool does in more detail.
Currently, all analysis tools work solely with RNA-seq profiles.

  * DESeq2
  * Dimensionality Reduction Plotting Tool
  * Generate Summary Report

Viewing Public Analysis Results

Before running your own analyses, you may be interested in viewing the Atlas' public analysis results.

To view the Atlas' public analysis results, you can click the Analysis Results button in the Atlas navigation bar and then click the Public Analysis Results button.
You will then be taken to a page where you can click between different tabs, each corresponding to a different tool.

When you click a given tab, you will see the public analysis results associated with that tool:

You can see an example of the public analysis results page below:

To better understand the output for a given tool, please see the "Understanding Your DESeq2 Results", "Understanding Your Dimensionality Reduction Plotting Tool Results", and "Understanding Your Generate Summary Report Results" sections below.

Running Your Own Analyses

Step 1: Selecting Your Samples of Interest

The first step to running an analysis is selecting your samples of interest.
We recommend selecting your samples via the faceted charts or by choosing a dataset from the Datasets page (not all tools may be available for other types of grids).

Below, you can see an example of how one would select samples via the faceted charts:

And here is an example of how one would select a set of samples via the Datasets page:

After you have generated your grid, you will need to select the specific samples you want to analyze.

Below, you can see an example where I've selected 4 samples in my samples grid:

Step 2: Selecting and Running an Analysis Tool

After you've selected your samples, you'll need to pick out a tool to run on those samples.
You can click the "Analyze Selected Samples" button to see available tools.

After choosing a tool, you will be prompted to log into your Genboree account (unless you are already logged in).

After you've logged in, you'll be prompted to provide settings for your analysis run.
  1. First, you'll need to select a Group and Database in which to store your output files.
    Each Genboree account starts with a Group (named after your username), and we will offer to create a Database for you (named "Exrna-atlas Output") if you don't have one.
  2. Next, you'll need to provide an Analysis Name for your analysis run - this name will be used to organize your analysis results, so picking an informative name is a good idea!
  3. Finally, some tools will require additional settings - for example, DESeq2 will require you to put in a factor name and two factor levels of interest.

When you're ready to submit your analysis, click the Submit Analysis button.
After a moment, you will be provided an analysis job ID. You will receive an email when your analysis run is complete.

Step 3: Viewing Your Analysis Results

To view your analysis results, you can click the Analysis Results button in the Atlas navigation bar and then click the My Analysis Results button.
You will then be taken to a page where you can click between different tabs, each corresponding to a different tool.

When you click a given tab, you will see any analysis results associated with that tool:

You can see an example of an analysis results page below:

To better understand the output for a given tool, please see the "Understanding Your DESeq2 Results", "Understanding Your Dimensionality Reduction Plotting Tool Results", and "Understanding Your Generate Summary Report Results" sections below.

Understanding Your Results

Understanding Your DESeq2 Results

When you click to view your DESeq2 results, a new page will open up containing differentially expressed miRNAs for the selected Atlas data.
Each row corresponds to a given miRNA, and each column is explained below:

[1] Love, M. I., Anders, S., Kim, V., & Huber, W. (2017, Aug 9). RNA-seq workflow: gene-level exploratory analysis and differential expression.
Retrieved from http://www.bioconductor.org/help/workflows/rnaseqGene/

By default, the table is sorted by adjusted p-value, but you can sort by any of the columns.
In addition, you can perform downstream analysis on selected miRNAs of interest by clicking the Analyze Selected miRNAs button (highlighted in red below) above the table.

See descriptions of all available downstream analysis tools below.

Pathway Finder

You can see what the Pathway Finder interface looks like below:

Understanding Your Dimensionality Reduction Plotting Tool Results

When you click to view your Dimensionality Reduction Plotting Tool results, a new page will open up containing an interface for visualizing the expression of different ncRNAs in the selected Atlas data.
On the left side of the screen, you will see the Control Panel and Filtering Panel that allow you to configure your visualization.

Within the Control Panel, you will see the following settings:

Within the Filtering Panel, you will see the following settings:

After you've selected your settings, you can click the Make New Plot button on the right side of the screen to generate a new visualization based on your current Control Panel and Filtering Panel settings.
You can then download a PDF of your current visualization by clicking the Download Plot button.

Understanding Your Generate Summary Report Results

When you click to view your Generate Summary Report results, you will download an archive containing a variety of summary files describing the selected Atlas data.
Descriptions of the summary files can be found below:

File Name | Description of File

QC Data
[analysisName]_exceRpt_DiagnosticPlots.pdf | All diagnostic plots automatically generated by the tool
[analysisName]_exceRpt_readMappingSummary.txt | Read-alignment summary including total counts for each library
[analysisName]_exceRpt_ReadLengths.txt | Read-lengths (after 3' adapters/barcodes are removed)
[analysisName]_exceRpt_QCresults.txt | QC statistics for all samples

Raw Transcriptome Quantifications
[analysisName]_exceRpt_miRNA_ReadCounts.txt | miRNA read-counts quantifications
[analysisName]_exceRpt_tRNA_ReadCounts.txt | tRNA read-counts quantifications
[analysisName]_exceRpt_piRNA_ReadCounts.txt | piRNA read-counts quantifications
[analysisName]_exceRpt_gencode_ReadCounts.txt | gencode read-counts quantifications
[analysisName]_exceRpt_circularRNA_ReadCounts.txt | circularRNA read-count quantifications
[analysisName]_exceRpt_biotypeCounts.txt | biotype read-count quantifications
[analysisName]_exceRpt_exogenous_miRNA_ReadCounts.txt | exogenous miRNA read-counts quantifications

Normalized Transcriptome Quantifications
[analysisName]_exceRpt_miRNA_ReadsPerMillion.txt | miRNA RPM quantifications
[analysisName]_exceRpt_tRNA_ReadsPerMillion.txt | tRNA RPM quantifications
[analysisName]_exceRpt_piRNA_ReadsPerMillion.txt | piRNA RPM quantifications
[analysisName]_exceRpt_gencode_ReadsPerMillion.txt | gencode RPM quantifications
[analysisName]_exceRpt_circularRNA_ReadsPerMillion.txt | circularRNA RPM quantifications
[analysisName]_exceRpt_exogenous_miRNA_ReadsPerMillion.txt | exogenous miRNA RPM quantifications

Exogenous Genomic Taxonomies
[analysisName]_exceRpt_exogenousGenomes_taxonomyCumulative_ReadCounts.txt | cumulative taxonomy read-count quantifications
[analysisName]_exceRpt_exogenousGenomes_taxonomyCumulative_ReadsPerMillion.txt | cumulative taxonomy RPM quantifications
[analysisName]_exceRpt_exogenousGenomes_taxonomySpecific_ReadCounts.txt | specific taxonomy read-count quantifications
[analysisName]_exceRpt_exogenousGenomes_taxonomySpecific_ReadsPerMillion.txt | specific taxonomy RPM quantifications
[analysisName]_exceRpt_exogenousGenomes_TaxonomyTrees_aggregateSamples.pdf | visualized taxonomy tree for samples, aggregated
[analysisName]_exceRpt_exogenousGenomes_TaxonomyTrees_perSample.pdf | visualized taxonomy trees for each sample

Exogenous rRNA Taxonomies
[analysisName]_exceRpt_exogenousRibosomal_taxonomyCumulative_ReadCounts.txt | cumulative taxonomy read-count quantifications
[analysisName]_exceRpt_exogenousRibosomal_taxonomyCumulative_ReadsPerMillion.txt | cumulative taxonomy RPM quantifications
[analysisName]_exceRpt_exogenousRibosomal_taxonomySpecific_ReadCounts.txt | specific taxonomy read-count quantifications
[analysisName]_exceRpt_exogenousRibosomal_taxonomySpecific_ReadsPerMillion.txt | specific taxonomy RPM quantifications
[analysisName]_exceRpt_exogenousRibosomal_TaxonomyTrees_aggregateSamples.pdf | visualized taxonomy tree for samples, aggregated
[analysisName]_exceRpt_exogenousRibosomal_TaxonomyTrees_perSample.pdf | visualized taxonomy trees for each sample

R Objects
[analysisName]_exceRpt_smallRNAQuants_ReadCounts.RData | All raw data (binary R object)
[analysisName]_exceRpt_smallRNAQuants_ReadsPerMillion.RData | All normalized data (binary R object)

Other
[analysisName]_exceRpt_sampleGroupDefinitions.txt | Information about sample groups (not used by Atlas)

Below, you can see some example plots from the Diagnostic Plots PDF referenced above.


Overview

Comparative and Downstream Analysis of Samples Using the Genboree Workbench

Step 1: Selecting Your Samples of Interest

Step 2: Selecting Your Tool

Step 3: Running Your Tool


Creating an Archive

Using GUI-based programs

Using Command Line (Terminal)

cd C:/Users/John/Desktop/Submission
cd /home/myHome/myDir/DataFiles/

Creating a .zip Archive

zip -X test_data.zip *.fq.gz
zip -X test_data.zip *.fq.gz mySpikeInFile.fasta

Creating a .tar.gz Archive

tar -cvzf test_data.tar.gz *.fq.gz mySpikeInFile.fasta
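
Before uploading, it can help to confirm that the archive holds the files you expect. The sketch below defines a hypothetical helper, list_archive (the name is ours and is not part of the submission pipeline), that lists the contents of a .zip or .tar.gz archive without extracting it. It assumes the standard unzip and tar programs are installed.

```shell
# list_archive is a hypothetical helper, not part of the submission pipeline.
# It prints the contents of a .zip or .tar.gz archive without extracting it.
list_archive() {
  case "$1" in
    *.zip)          unzip -l "$1" ;;
    *.tar.gz|*.tgz) tar -tzf "$1" ;;
    *)              echo "unsupported archive type: $1" >&2; return 1 ;;
  esac
}
```

For example, list_archive test_data.tar.gz should print one line per archived file, such as each .fq.gz file and mySpikeInFile.fasta.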



Creating Your FTP Account

Step 1. Create Your Genboree Account


Step 2. Contact the exRNA Team to Get an FTP Account

Summary




Common Fund exRNA Communication Consortium (ERCC) Data Sharing and Access Policy

Revised December, 2015

The ERCC. The ERCC is a community resource project designed to catalyze exRNA research activities in the scientific community. Thus, data are shared with the scientific community PRIOR to publication. In pre-publication data sharing, the desire to share data widely with the scientific community must be balanced with the desire for the data generators to have a protected period of time to analyze and publish the data they have produced.

ERCC Data Sharing Policy. The following policy has been developed to address this balance. By accessing pre-publication ERCC data, users agree to adhere to these policies and to follow appropriate scientific etiquette regarding collaboration, publication, and authorship.

The entity responsible for ERCC data deposition is the ERCC Data Management and Resource Repository (DMRR). All data are date stamped by the DMRR upon receipt from the data producers. The DMRR processes all ERCC data through consortium-approved analysis pipelines to ensure that the data are processed in a uniform fashion.
ERCC Pre-publication Data Sharing.
Users of the pre-publication ERCC data agree to a protected period (embargo) of 12 months AFTER the DMRR date stamp.

By requesting and accepting any released ERCC dataset, the user:

Researchers wishing to publish on datasets prior to the expiration of the embargo should discuss their plans with the data generator(s) and must obtain their consent prior to using the unpublished data in their individual publications or grant submissions.

Following expiration of the embargo period, any investigator may submit manuscripts or make presentations without restriction, including integrated analyses using multiple unrestricted datasets.

Proper Citation of the Datasets Used. Researchers who use ERCC datasets in oral presentations or publications are expected to cite the Consortium in all of the following ways:

Data Quality Metrics. The consortium is still developing consensus data quality metrics for different assay types so that data users will have a sense of the relative quality of a given dataset. We encourage the scientific community to use these pre-publication datasets; however, users should be aware that final determinations concerning the quality of a given dataset might not become clear until the consortium performs an integrative analysis of all the data produced by the ERCC.

Unrestricted-Access and Controlled-Access Datasets. The ERCC will generate both unrestricted-access (e.g. GEO) and controlled-access datasets (e.g. dbGaP). Currently only unrestricted-access datasets are available. Once controlled-access ERCC datasets become available, we will update this link and describe in more detail how they can be accessed through dbGaP (http://www.ncbi.nlm.nih.gov/gap).

Questions? Please contact the exRNA Team (brl-exrna at bcm dot edu).



Introduction to the ERCC Data Coordination Center

The Data Coordination Center (DCC) for the Extracellular RNA Communication Consortium (ERCC) is led by Prof. Aleksandar Milosavljevic
at the Bioinformatics Research Laboratory, Baylor College of Medicine, Houston, TX, USA.

These are some of the key functions of the DCC:

DCC Services

Genboree Account

If you are a new user, please follow the steps below to obtain a Genboree account and access to all associated services.

  1. Sign up for a Genboree Account: You can sign up for a new Genboree account at http://www.genboree.org/. Click the Login/Register button in the top right corner and then select New Account from the dialog. Fill out the registration form with your details and hit Submit. You'll get an email asking you to confirm (typical signup/verification process).
  2. Log into the Genboree Commons and GenboreeKB: Next, you will need to sign in once to the Genboree Commons (used for exRNA related communications) and GenboreeKB (used for navigating exRNA metadata). You should use the username and password obtained from Step 1. Signing in once allows our system to recognize you so we can add you to the appropriate projects/sub-projects. Sign into the Genboree Commons at http://genboree.org/theCommons/login and the GenboreeKB at http://genboree.org/genboreeKB/login.
  3. Email the BRL exRNA Team: Finally, you will need to email BRL to gain access to the appropriate projects/sub-projects on the Genboree Commons and GenboreeKB. We will also provide a dedicated, shared directory for your lab on our FTP server so that your lab can upload submissions for the DMRR data and metadata processing pipeline. Please include your Genboree username and PI when you email us.

What Can I Do with exRNA Profiling Data?


The exRNA Atlas

The exRNA Atlas is the data repository of the ERCC. It includes exRNA profiles derived from various biofluids and conditions and currently stores data profiled from small RNA sequencing assays and RT-qPCR assays.
To learn more about the Atlas, you can read our tutorials:

Submitting Your Data to the Atlas

You can also learn more about submitting your own data to the Atlas via our Data Submission to DCC using FTP Wiki page.

Information About Atlas Metadata

All Atlas metadata is stored in the Genboree KnowledgeBase, a MongoDB-backed database curation service.
Our metadata models follow the exRNA Metadata Standards developed by the Metadata and Data Standards (MADS) Working Group of the ERCC.

Analyzing Your Own exRNA Data

If you'd like to analyze your own data using the tools developed by the ERCC, you can use the Genboree Workbench to do so.
The Genboree Workbench is a web-based platform for performing data analysis. You can upload your data and perform various analyses using a "drag and drop" user interface.
To get started using the Genboree Workbench, you can view our collection of introductory materials.

exRNA Tools

Once you understand the basics of using the Workbench, you can start using the different ERCC tools to analyze your exRNA data:

DMRR/DCC Demos at Meetings

Contact Us - Members of the DCC

Prof. Aleksandar Milosavljevic - Principal Investigator
BRL Team - Point Person


Data Submission to dbGaP

The database of Genotypes and Phenotypes (dbGaP) was developed to archive and distribute the data and results from studies that have investigated the interaction of genotype and phenotype in humans.
The ERCC Data Coordination Center developed this wiki to guide ERCC members on how to submit their data to dbGaP or GEO, after they have submitted their data to the exRNA Atlas.

To submit your data to dbGaP, follow these six steps:
1. Register the study
2. Fill out study config
3. Create phenotype data
4. Create sequence metadata file
5. Upload sequence file
6. Confirm and release the study
Please contact the ERCC DCC at brl-exrna@bcm.edu if you need any assistance; we can help with steps 4-6.
To do so, we will need to be assigned as submitter for the study (the PI will have the option to do this after the study has been registered), and your submission to the exRNA Atlas must be complete.

Full Submission Guide From dbGaP

Full submission guide

Understanding the Process of Data Submission to dbGaP

Submission overview

Register Your Study

Finding the Genomic Program Administrator (GPA) and registering the study.

Fill Out the Study Config

What is the Study Config?
Here is a study config file with required areas highlighted in yellow.

Fill Out the Phenotype Data

Subject Consent Files
Sample Mapping Files
Pedigree Files
Subject Phenotypes Files
Sample Attributes Files

Fill Out the Sequence Metadata File

Once the previous files have been validated by dbGaP, the dbGaP curator will reach out and provide the sequence metadata file to be filled out and returned.

Upload Sequence File

The sequence metadata file will have to be validated by dbGaP first and then the dbGaP curator will send the information on where to submit the sequence files.

Confirm and Release the Study

The dbGaP curator will provide a preview of the study to make sure everything is correct before releasing it on dbGaP.


Overview of Data & Metadata Submission to the DCC (via FTP Pipeline)

This Wiki page includes instructions on how to submit your data (with accompanying metadata) to the Data Coordination Center (DCC)
using the Genboree FTP Data Submission Pipeline.

If you're submitting small RNA-seq data, please follow the steps in the "Small RNA-seq Data Submission Pipeline" section.
If you're submitting long RNA-seq data, please follow the steps in the "Long RNA-seq Data Submission Pipeline" section.
If you're submitting qPCR data, please follow the steps in the "qPCR Data Submission" section.

Please contact us at brl-exrna@bcm.edu for guidance if you have a large dataset (> 100 GB).

Prior to Your Submission

This tutorial will walk you through the entire process of creating an FTP account, formatting and submitting your data and metadata properly,
and then seeing your dataset on the Atlas.

Step 0: Create an FTP Account on the Genboree FTP Server

Creating Your FTP Account

Small RNA-seq Data Submission Pipeline

All submitted samples will be processed through the exceRpt Small RNA-seq Pipeline for exRNA Profiling
and exceRpt Small RNA-seq Post-processing tools.

Files Needed for Data Submission

Your submission will consist of three different files:

IMPORTANT NOTE
All three files must have the same file name prefix ("samples" is the prefix in "samples_data"). Note that the data archive file name ends in _data, the metadata archive file name ends in _metadata, and the manifest file name ends in .manifest.json.
In this illustrative example, the submission files will be named like this:

In this example, "samples" was chosen as the file name prefix. You should give your actual submission files a more descriptive name ("gastricCancerOct2015_data.zip", for example).
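
The naming rule above can be checked before upload. The following is a minimal sketch, assuming the three files sit together in one directory. The helper name check_submission_names is hypothetical, and the archive extension is matched with a wildcard because the rule fixes only the _data and _metadata endings.

```shell
# check_submission_names is a hypothetical helper, not part of the pipeline.
# Usage: check_submission_names DIRECTORY PREFIX
# Assumes PREFIX contains no spaces or wildcard characters.
check_submission_names() {
  dir=$1
  prefix=$2
  status=0
  # The data and metadata archives may be .zip or .tar.gz, so their
  # extension is matched with a wildcard; the manifest name is exact.
  for pattern in "${prefix}_data.*" "${prefix}_metadata.*" "${prefix}.manifest.json"; do
    # $pattern is unquoted on purpose so the shell expands the wildcard.
    set -- "$dir"/$pattern
    if [ -e "$1" ]; then
      echo "found:   ${1##*/}"
    else
      echo "missing: $pattern"
      status=1
    fi
  done
  return $status
}
```

Running check_submission_names . samples in the submission directory reports each expected file as found or missing and returns a non-zero status if any is missing.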

Step 1: Preparing Your Data Archive

Prepare Your Data Archive

Step 2: Preparing Your Metadata Archive

Prepare Your Metadata Archive

Step 3: Preparing Your Manifest File

Prepare Your Manifest File

Step 4: Uploading Your Submission to the FTP Server for Processing

Upload Submission to the DCC using FTP Server

Step 5: Processing Your Files

Processing Your Files

Long RNA-seq Data Submission Pipeline

Files Needed for longRNAseq Data Submission

Your submission will consist of three different files:

IMPORTANT NOTE
All three files must have the same file name prefix ("samples" is the prefix in "samples_longRNAseq_data"). Note that the data archive file name ends in _longRNAseq_data, the metadata archive file name ends in _longRNAseq_metadata, and the manifest file name ends in _longRNAseq.manifest.json.
In this illustrative example, the submission files will be named like this:

In this example, "samples" was chosen as the file name prefix. You should give your actual submission files a more descriptive name ("gastricCancerOct2015_longRNAseq_data.zip", for example).

Step 1: Preparing Your longRNAseq Data Archive

Prepare Your longRNAseq Data Archive

Step 2: Preparing Your longRNAseq Metadata Archive

Prepare Your longRNAseq Metadata Archive

Step 3: Preparing Your longRNAseq Manifest File

Prepare Your longRNAseq Manifest File

Step 4: Uploading longRNAseq Submission to the FTP Server for Processing

Upload longRNAseq Submission to the DCC using FTP Server

Step 5: Processing Your longRNAseq Files

Processing Your longRNAseq Files

qPCR Data Submission

Files Needed for qPCR Data Submission

Your submission will consist of two or three different files:

IMPORTANT NOTE
All files must have the same file name prefix ("samples" is the prefix in "samples_qPCR_data"). Note that the data archive file name ends in _qPCR_data, the metadata archive file name ends in _qPCR_metadata, and the manifest file name ends in .manifest.json.
In this illustrative example, the submission files will be named like this:

In this example, "samples" was chosen as the file name prefix. You should give your actual submission files a more descriptive name ("gastricCancerOct2015_qPCR_data.zip", for example).

Step 1: Preparing Your qPCR Data Archive

Prepare Your qPCR Data Archive

Step 2: Preparing Your qPCR Metadata Archive

Prepare Your qPCR Metadata Archive

Step 3: Preparing Your qPCR Manifest File

Prepare Your qPCR Manifest File

Step 4: Uploading qPCR Submission to the FTP Server for Processing

Upload qPCR Submission to the DCC using FTP Server

Step 5: Processing Your qPCR Files

Processing Your qPCR Files

Submission to a Public Repository

Controlled-access data repository:
Data Submission to dbGaP
Public-access data repository:
Data Submission to GEO

Miscellaneous Tips and Tricks

Below, you'll find some useful tips and tricks for creating your submission for the FTP Pipeline.

Creating an Archive

Creating an Archive

Learning How to Use the Terminal

If you need help navigating the terminal (and want to learn some basic Linux/OSX commands), the following link will be useful:


Gene Expression Omnibus (GEO) is a public-access functional genomics data repository that supports MIAME-compliant data submissions. Array- and sequence-based data are accepted.
The ERCC Data Coordination Center developed this wiki to guide ERCC members on how to submit their data to dbGaP or GEO, after they have submitted their data to the exRNA Atlas.

GEO submission requires filling out the metadata sheet for the submission.
Please follow the instructions from the full submission guide below for small/long RNAseq or qPCR.
The ERCC DCC can also facilitate the submission; please email us at brl-exrna@bcm.edu.
We will require the following:

Data Submission to GEO for Small/Long RNAseq

Full Submission Guide for Small/Long RNAseq From GEO

GEO Submission Guide for Small/Long RNA

Submission Requirements

Submit to GEO via FTP

Data Submission to GEO for qPCR

Full Submission Guide for qPCR From GEO

GEO Submission Guide for qPCR

Submission Requirements

Make sure the number of samples is the same in the metadata sheet and in the two matrices.

Submit to GEO via Webform


Description of Domains

Within each template, the domain column gives you information about what kinds of values can be provided for each property.
Below, we describe what each of these domains means.

autoID

The autoID domain indicates that our server can automatically generate a value for the associated property.
However, in our case, we'll go ahead and provide our own values instead of letting the server generate the values for us.
You can just follow the directions in the metadata submission guide to learn more.

bioportalTerm and bioportalTerms

The bioportalTerm and bioportalTerms domains indicate that your value will be validated against the ontology (or ontologies) listed in the domain.
Generally, the value won't be validated against the entire ontology - it'll be validated against a subset (subtree) of the ontology.
The best way to validate your value is to use the GenboreeKB templates provided for each metadata type.
You will learn more about this process when creating your individual metadata files.

boolean

The boolean domain indicates that your value must either be true or false. Note that true and false are case-sensitive - you can't put TRUE, trUe, falSE, etc.
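
The case-sensitivity rule can be illustrated with a small check. This is only a sketch of the rule as stated above. The helper name is_kb_boolean is hypothetical and is not how the server itself validates values.

```shell
# is_kb_boolean is a hypothetical helper. It succeeds only for the exact,
# case-sensitive strings "true" and "false".
is_kb_boolean() {
  case "$1" in
    true|false) return 0 ;;
    *)          return 1 ;;
  esac
}
```

With this helper, is_kb_boolean true succeeds, while is_kb_boolean TRUE and is_kb_boolean trUe both fail.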

date

The date domain indicates that you must insert a date. This date should follow a particular format: YYYY/MM/DD. Example values include:
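
A value can be checked against the YYYY/MM/DD format before submission. The sketch below uses a hypothetical helper, is_kb_date. It checks only the shape of the value and that the month and day fall in the ranges 1-12 and 1-31; it does not check the number of days in a particular month, and the server's own validation may differ.

```shell
# is_kb_date is a hypothetical helper that checks the YYYY/MM/DD shape.
is_kb_date() {
  # Four digits, a slash, two digits, a slash, two digits.
  printf '%s\n' "$1" | grep -Eq '^[0-9]{4}/[0-9]{2}/[0-9]{2}$' || return 1
  month=${1#*/}; month=${month%/*}   # text between the two slashes
  day=${1##*/}                       # text after the last slash
  # Drop one leading zero so "05" is compared as 5.
  month=${month#0}; day=${day#0}
  [ "$month" -ge 1 ] && [ "$month" -le 12 ] && [ "$day" -ge 1 ] && [ "$day" -le 31 ]
}
```

For instance, is_kb_date 2015/10/05 succeeds, while is_kb_date 2015-10-05 and is_kb_date 2015/13/01 fail.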

enum

The enum domain indicates a group of possible values for that property. For example, the domain might look like:

The values inside the parentheses are the possible values for that property. If a property has enum(Experimental, Control) as its domain, for example,
then you must write Experimental or Control - any other value will be invalid. Note that the values ARE case-sensitive - you can't write experimental, conTrol, etc.
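
The enum rule amounts to an exact, case-sensitive match against the listed values. The following sketch illustrates it with a hypothetical helper, is_kb_enum, that takes a value and the domain string as written in the template. It assumes the listed values are separated by commas and contain no commas themselves.

```shell
# is_kb_enum is a hypothetical helper.
# Usage: is_kb_enum VALUE "enum(Experimental, Control)"
is_kb_enum() {
  value=$1
  # Keep only the text between the parentheses.
  allowed=${2#"enum("}
  allowed=${allowed%")"}
  # Split on commas, trim surrounding spaces, then require an exact,
  # case-sensitive, whole-line match.
  printf '%s\n' "$allowed" | tr ',' '\n' | sed 's/^ *//; s/ *$//' | grep -Fxq -- "$value"
}
```

Here is_kb_enum Control "enum(Experimental, Control)" succeeds, while is_kb_enum conTrol "enum(Experimental, Control)" fails.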

fileUrl

The fileUrl domain indicates that the provided value must be a URL directly pointing to a file of some kind. This URL must be complete. Example values include:

For any required properties, our metadata submission guide will give specific directions on how to fill out values for properties with this domain.

float

The float domain indicates that you must insert a float (integer / decimal) value for that property. Example values include:

floatRange

The floatRange domain specifies an (inclusive) float (decimal / integer) range within which your value must fall. For example, the domain might look like:
  * floatRange(-5, 9)
  * floatRange(-5.93,5.92)
  * floatRange(0, 100.01)

So, if my domain is floatRange(-5,9), I can put any value between -5 and 9 (inclusive). This could be -5, -1.2, 0, 8.59, 9, or many other values.
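
The inclusive bounds can be illustrated with a short check. This sketch uses a hypothetical helper, in_float_range, which takes the value and the two bounds as separate arguments and delegates the decimal comparison to awk. Accepting only plain digits with an optional minus sign and decimal part is our assumption about what counts as a float here.

```shell
# in_float_range is a hypothetical helper.
# Usage: in_float_range VALUE MIN MAX   (both bounds are inclusive)
in_float_range() {
  # Accept integers or decimals with an optional leading minus sign.
  printf '%s\n' "$1" | grep -Eq '^-?[0-9]+(\.[0-9]+)?$' || return 1
  # awk exits with status 0 when the value lies within the bounds.
  awk -v v="$1" -v lo="$2" -v hi="$3" \
    'BEGIN { exit !(v + 0 >= lo + 0 && v + 0 <= hi + 0) }'
}
```

For floatRange(-5,9), in_float_range 8.59 -5 9 and in_float_range -5 -5 9 succeed, while in_float_range 9.01 -5 9 fails.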

gbAccount

The gbAccount domain indicates that the provided value should be a Genboree account name.
We will then automatically use that account name to fill in associated information.

int

The int domain indicates that you must insert an integer value for that property. Example values include:

intRange

The intRange domain specifies an (inclusive) integer range within which your value must fall. For example, the domain might look like:
  * intRange(5, 9)
  * intRange(-5,5)
  * intRange(0, 100)

So, if my domain is intRange(5,9), that means my value must be 5, 6, 7, 8, or 9.
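
A value can also be checked directly against the domain string from the template. The sketch below uses a hypothetical helper, check_int_range, that reads the two bounds out of a string such as intRange(5, 9) and then tests the value. It returns 2 when the domain string does not have that shape.

```shell
# check_int_range is a hypothetical helper.
# Usage: check_int_range VALUE "intRange(5, 9)"
check_int_range() {
  value=$1
  # Pull the two bounds out of the domain string,
  # for example "intRange(-5,5)" becomes "-5 5".
  bounds=$(printf '%s\n' "$2" |
    sed -n -E 's/^intRange\( *(-?[0-9]+) *, *(-?[0-9]+) *\)$/\1 \2/p')
  [ -n "$bounds" ] || return 2     # domain string not understood
  # The value itself must be a whole number.
  printf '%s\n' "$value" | grep -Eq '^-?[0-9]+$' || return 1
  lo=${bounds% *}
  hi=${bounds#* }
  [ "$value" -ge "$lo" ] && [ "$value" -le "$hi" ]
}
```

For example, check_int_range 9 "intRange(5, 9)" succeeds because the bounds are inclusive, while check_int_range 10 "intRange(5, 9)" and check_int_range 6.5 "intRange(5, 9)" fail.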

labelUrl

The labelUrl domain specifies a label and then a URL associated with that label. The formatting looks like: label|URL. Your URL can be relative or complete. Some example values include:

This domain can be useful because it supplies information to us about how a given website should be labeled.
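
The label|URL layout can be illustrated by splitting a value at the first "|" character. The helper below, split_label_url, is hypothetical, and the example value uses the reserved example.org domain rather than a real Atlas link.

```shell
# split_label_url is a hypothetical helper.
# It splits "label|URL" at the first "|" and prints both parts.
split_label_url() {
  case "$1" in
    ?*\|?*) ;;            # require text on both sides of a "|"
    *) return 1 ;;
  esac
  # %%"|"* removes everything from the first "|" onward;
  # #*"|"  removes everything up to and including the first "|".
  printf 'label=%s\nurl=%s\n' "${1%%"|"*}" "${1#*"|"}"
}
```

Calling split_label_url "My Lab|http://example.org/lab" prints label=My Lab on one line and url=http://example.org/lab on the next.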

measurement

The measurement domain indicates that you must insert a number followed by a valid measurement unit. For example, the domain might look like:

For a given measurement, we accept the listed unit (years) as well as any comparable (inter-convertible) units, like days, months, hours, etc.
Thus, if a property has measurement(years) as its domain, then you could write 10 years, 5 days, 3 months, 2 hours, etc. It should be a specific number and not a range.
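
The "specific number, not a range" rule can be sketched as a simple pattern check. The hypothetical helper below, is_time_measurement, covers only measurement(years) and only the four units named above (years, months, days, hours). The full set of accepted units and whether decimal numbers are accepted are assumptions here.

```shell
# is_time_measurement is a hypothetical helper for measurement(years).
# It accepts one number, one space, and one of the units named above.
is_time_measurement() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9]+(\.[0-9]+)? (years|months|days|hours)$'
}
```

With this check, is_time_measurement "10 years" and is_time_measurement "3 months" succeed, while is_time_measurement "5-10 years" fails because it is a range.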

numItems

The numItems domain indicates that the associated property is an item list. The value for the property will be the number of items in the item list.
For example, imagine I have a property, * Authors, which is an item list, and it has 5 items (*- Author Name). This means the value for the * Authors property will be 5.
We actually automatically update the value for any property with the numItems domain, so you can leave the value blank if you want.

negFloat

The negFloat domain indicates that you must insert a negative float (integer / decimal) value for that property. You can also put 0. Example values include:

negInt

The negInt domain indicates that you must insert a negative integer value (or 0) for that property. Example values include:

omim

The omim domain indicates that the value must be an ID from the OMIM database at http://omim.org/.
We will then automatically use that ID to fill in associated information for that reference.

pmid

The pmid domain indicates that the value must be an ID from the PubMed database at http://www.ncbi.nlm.nih.gov/pubmed.
We will then automatically use that ID to fill in associated information for that publication.

posFloat

The posFloat domain indicates that you must insert a positive float (integer / decimal) value for that property. You can also put 0. Example values include:

posInt

The posInt domain indicates that you must insert a positive integer value (or 0) for that property. Example values include:

regexp

The regexp domain indicates that any value for the domain must meet the specified regular expression. Example domains include:

These domains might look complicated, but our metadata submission guide will give specific directions on how to fill out values for required properties with this domain.

string

The string domain indicates that any text is acceptable (letters, numbers, etc.). Example values include:

As you can see, you can pretty much put anything!

timestamp

The timestamp domain indicates that you must insert a timestamp. This timestamp should follow a particular format: YYYY/MM/DD XX:XX AM/PM. Example values include:
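
A timestamp in this layout can be produced with the standard date command. The sketch below assumes that XX:XX means a zero-padded 12-hour clock, which the format description does not state explicitly. The helper name kb_timestamp_now is hypothetical.

```shell
# kb_timestamp_now is a hypothetical helper that prints the current time
# as YYYY/MM/DD HH:MM AM/PM.
kb_timestamp_now() {
  # %I is the zero-padded 12-hour clock and %p is the AM/PM marker.
  # LC_ALL=C keeps the marker as "AM" or "PM" whatever the system language.
  LC_ALL=C date '+%Y/%m/%d %I:%M %p'
}
```

The output can be pasted directly into a property that uses the timestamp domain.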

url

The url domain indicates that some kind of URL must be provided as a value. This URL can either be complete or relative. Example values include:

The first example above is a relative URL, while the second example is a complete URL.
For any required properties, our metadata submission guide will give specific directions on how to fill out values for properties with this domain.

[valueless]

The [valueless] domain indicates that you cannot insert a value for that property (it must remain blank).
These kinds of properties are used as section headers, for the most part.
The property name describes the content of the subproperties nested below - thus, it's not necessary to provide a value for the property.


Downloading Datasets from the exRNA Atlas

There are several different options for downloading datasets from the exRNA Atlas.
You can either download the datasets individually (on a per-sample basis), or you can download the datasets in bulk.

Downloading Individual Core Result Archives

Take a look at the following faceted search grid (certain metadata columns are hidden for this example):

You can click the icon for any given sample to download its core results archive.
This core results archive contains all of the most important files generated by the exceRpt pipeline, including all of the read mapping documents to various libraries.

Downloading Individual Raw FASTQ Data Files

Alternatively, if you want to download the raw FASTQ data file associated with a given sample, take a look at the following faceted search grid:

You can see three different icons in the highlighted column:

Downloading Datasets in Bulk

If you want to download result files in bulk for a given search, you can click the Download Samples button at the top of the grid, as seen below:

You can then choose between four different options.

The Download All Core Result Files link will download a tab-delimited file that contains information on how to download the processed core results archives for each sample.

The Download All Result Files link will download a tab-delimited file that contains information on how to download the full results archives for each sample.
These archives can be very large (gigabytes), so we recommend that you start by downloading the core results archives (which are usually around 3-5 MB).

The Download All Raw Data Files link will download a tab-delimited file that contains information on how to download all available raw sequencing data files in FASTQ format.
These FASTQ files are only available for samples that are open access.
You can tell which samples have available FASTQ files by looking for the icon in the Download Data column.

These tab-delimited files will contain two separate columns:

There are several ways of downloading the files in your tab-delimited list:

  1. You can copy and paste each URL into your browser and hit Enter to download each file in this list.
  2. For more advanced users, you can use a command line program like wget to download these files.

In order to download one of these tab-delimited files, you must agree to the ERC Consortium Data Access Policy, which pops up in a new window.
This same policy can also be found at the top of each tab-delimited file.
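For users who would rather script their downloads, here is a minimal Python sketch of the same idea as the wget option above. Since the exact column layout is not reproduced here, the sketch does not rely on column positions: it simply collects every tab-separated field that looks like a URL, which also skips the policy text at the top of the file. The function names, the output folder name, and the sample listing are all hypothetical.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlsplit


def extract_urls(listing_text):
    """Return every tab-separated field that looks like a URL, in file order."""
    urls = []
    for line in listing_text.splitlines():
        for field in line.split("\t"):
            field = field.strip()
            if field.startswith(("http://", "https://", "ftp://")):
                urls.append(field)
    return urls


def download_all(urls, dest_dir="atlas_downloads"):
    """Download each URL into dest_dir, naming files after the URL path."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    for url in urls:
        name = Path(urlsplit(url).path).name or "download"
        urllib.request.urlretrieve(url, str(dest / name))


# Hypothetical listing: one policy line followed by two sample rows.
sample_listing = (
    "# Data access policy text (placeholder)\n"
    "sample_1\thttps://example.org/files/sample_1_CORE_RESULTS.tgz\n"
    "sample_2\thttps://example.org/files/sample_2_CORE_RESULTS.tgz\n"
)
print(extract_urls(sample_listing))
```

To use this on a real listing, read the downloaded file with `Path("your_listing.txt").read_text()` and pass the resulting URLs to `download_all`. Remember that the full results archives can be very large.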

Downloading Metadata

The Download Metadata link in the Download Samples menu will download the biosample, donor, and experiment metadata documents associated with a single sample.
All metadata documents will be placed in a single text file.
Before downloading your metadata, you must select a single sample by using the checkboxes to the left of each sample in the grid.
Multiple sample selection is currently not allowed.


Overview

Introduction to the exRNA Atlas

The exRNA Atlas is the data repository of the Extracellular RNA Communication Consortium (ERCC), which includes small RNA sequencing and qPCR-derived exRNA profiles from human and mouse biofluids.
All RNA-seq datasets are processed using version 4 of the exceRpt small RNA-seq pipeline, and ERCC-developed quality metrics are uniformly applied to these datasets.

There are two different versions of the exRNA Atlas:

If you are interested in submitting data to the Atlas, visit the Data & Metadata Processing Guide page to learn more about the submission process.

Selecting Profiles

Using the ncRNA Search Bar

Faceted Charts

Viewing Selected Biosamples in Grid via Faceted Charts

Biosample Partition Grids

Viewing Biosamples in Biosample Partition Grid

Drill-down Sub-setting of Biosamples via Linear Tree

Viewing Selected Biosamples in Grid via Linear Tree

Downloading Profile Data and Metadata from the exRNA Atlas

Downloading Data and Metadata from the exRNA Atlas

Viewing exRNA Profiling Datasets

Viewing exRNA Profiling Datasets

Viewing Atlas Statistics

Viewing Atlas Statistics

Running Analyses and Viewing Analysis Results Using the exRNA Atlas

Running Analyses and Viewing Analysis Results Using the exRNA Atlas

BedGraphs

BedGraphs are publicly accessible, base pair level coverage maps of the genome and are present for every sample in the exRNA Atlas. You can find them inside the CORE_RESULTS archives for any sample within a study (studies are defined by an accession such as EXR-TEST1-AN). There will be 3 bedGraph files you can use:

Tools

Data Slicing

You can select regions of interest across the genome and samples of interest across any study present in the Atlas, and then perform "data slicing" to retrieve a matrix with the coverage of your regions (rows) per sample (columns). To do this, use the downloadable exRNA Data Slicer tool found here.
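To give a sense of what such a matrix looks like, here is a small Python sketch that builds one from bedGraph text. This is not the exRNA Data Slicer tool. It assumes the standard four-column bedGraph layout (chromosome, start, end, value) with half-open coordinates, and it defines "coverage" as the mean per-base value over each region, which may differ from the tool's own definition. All names and sample data are hypothetical.

```python
def parse_bedgraph(text):
    """Parse bedGraph text into (chrom, start, end, value) tuples."""
    intervals = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and track/browser/comment lines.
        if not line or line.startswith(("track", "browser", "#")):
            continue
        chrom, start, end, value = line.split()[:4]
        intervals.append((chrom, int(start), int(end), float(value)))
    return intervals


def mean_coverage(intervals, chrom, start, end):
    """Mean per-base coverage over the half-open region [start, end)."""
    covered = 0.0
    for i_chrom, i_start, i_end, value in intervals:
        if i_chrom != chrom:
            continue
        overlap = min(end, i_end) - max(start, i_start)
        if overlap > 0:
            covered += overlap * value
    return covered / (end - start)


def slice_matrix(samples, regions):
    """Rows are regions, columns are samples (in dict insertion order)."""
    parsed = {name: parse_bedgraph(text) for name, text in samples.items()}
    return [[mean_coverage(parsed[name], *region) for name in parsed]
            for region in regions]


# Hypothetical data for two samples and two regions.
samples = {
    "sample_A": "chr1\t0\t10\t2\nchr1\t10\t20\t4\n",
    "sample_B": "chr1\t0\t20\t1\n",
}
regions = [("chr1", 5, 15), ("chr1", 0, 10)]
print(slice_matrix(samples, regions))  # [[3.0, 1.0], [2.0, 1.0]]
```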

Genome browser

You can view which regions are detected in the Atlas using the UCSC genome browser. These coverage files have been split by biofluid and library preparation kit. For example, you can see regions of the genome where at least one plasma sample processed by the TruSeq library preparation kit has reads. We provide two coverage cutoffs: 1 read and 5 reads. Files can also be downloaded here.

RNA binding proteins (RBPs)

exRBPs

Data for the exRBPs that have been intersected with the Atlas data is available in two forms:

1) For each study, you can view reads that fall into a given RBP's binding sites across samples. You can find these in the postProcessedResults files. Through the Atlas Datasets page, you can download All Summary Files using the download icon in the bottom right of each dataset card, or you can access them through the FTP. There is a folder named _intersect_individual_RBP.combined_samples.tgz which houses the RBP coverage files for that study.

2) For each sample, you can look at coverage of reads that fall into all 150 RBPs. On the Atlas, you can select samples in the sample viewer and download the Core Results Archives. Inside the fastq folder there will be an endogenousAlignments_genome_Aligned_intersect_individual_RBP.tgz folder which houses the 96 files for each sample. These regions have been intersected, so if RBP A binds to chromosome 1, 1:10 and RBP B binds to chromosome 1, 5:15, then three regions will be created: 1:5, 5:10, and 10:15. In these files, the rows are the overlapping regions and the columns are for each RBP.
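The way binding sites are intersected in the second form can be sketched in a few lines of Python. This is only an illustration of the splitting rule described above, assuming all sites are on the same chromosome; it is not the code used to produce the Atlas files, and the function name is hypothetical.

```python
def split_regions(binding_sites):
    """Split overlapping binding sites into non-overlapping regions.

    binding_sites maps an RBP name to a list of (start, end) pairs on one
    chromosome. Returns (start, end, [RBP names]) for each resulting region.
    """
    # Every start or end coordinate is a potential region boundary.
    breakpoints = sorted({point
                          for sites in binding_sites.values()
                          for site in sites
                          for point in site})
    regions = []
    for left, right in zip(breakpoints, breakpoints[1:]):
        members = [name for name, sites in binding_sites.items()
                   if any(start <= left and right <= end for start, end in sites)]
        if members:  # drop gaps that no RBP covers
            regions.append((left, right, members))
    return regions


# The example from the text: RBP A binds 1:10 and RBP B binds 5:15.
for region in split_regions({"A": [(1, 10)], "B": [(5, 15)]}):
    print(region)
# (1, 5, ['A'])
# (5, 10, ['A', 'B'])
# (10, 15, ['B'])
```

Each printed region corresponds to a row in the per-sample files, and the list of RBP names shows which columns that region belongs to.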

Explorer Tool

The exRNA Atlas Explorer tool allows you to visualize the RBPs across any dataset or set of datasets in the Atlas. The tool is available here.


Learn More About the exceRpt small RNA-seq Data Analysis Pipeline

exceRpt Homepage

Genboree Tutorial for Using exceRpt

Understanding Your exceRpt Results

exceRpt Version Updates


Overview

Introduction to exRNA Metadata Standards

The infographic below will give you a better sense of how the different documents in the exRNA GenboreeKB relate to one another.

As an example, we see that any document in the "Study" collection will have a connection to a Submission document in its "Related Submissions" item list.
In other words, if you have a "Study" document, you must have a related "Submission" that the "Study" document falls under. Connections between collections
are made apparent through the use of red arrows and the red text within each collection's attributes ("Related Submission" for the "Study" collection, for example).
Note that the attribute list given in the infographic is merely a summary - you can look at the respective schema / templates for each collection below
to get a full list of the different properties that a given document within that collection will contain.

Finally, the box in the lower right corner of the infographic gives some information about how each document is named.
More details about how individual documents are named can be found in the exRNA Metadata Documents Accession section below.

Preparing Metadata Documents

Refer to the Prepare your Metadata Archive Wiki for more details.


GenboreeKB exRNA Metadata Tracking System

If you want to learn more about how the exRNA GenboreeKB works, you should check out the introductory materials here.

Below, you'll see some key features of our exRNA GenboreeKB Metadata Tracking System:

GenboreeKB = Multiple Collections of Documents

GenboreeKB exRNA Metadata Tracking System - Navigating the Metadata UI

Overview

To learn the basics of GenboreeKB, view the documentation found here.
In brief, we use GenboreeKB to store the metadata documents associated with samples present in the exRNA Atlas.
The GenboreeKB UI allows you to view those documents. It also allows you to edit documents, find ontology terms for properties, and
experiment with different documents while assembling your metadata submission for the FTP submission pipeline.

Each GenboreeKB is associated with a different group of metadata documents.
There are three different relevant KBs:
  1. Public Atlas KB
  2. Private Atlas KB
  3. "Testing Ground" Scratch KB

Step-by-step Instructions to Navigate to the Relevant GenboreeKB

In order to better understand the collections you will be browsing, refer to the Wiki page exRNA Metadata Standards.

1. Login

2. Navigate to the Relevant KB

Each Atlas (public and private) has its own GenboreeKB Project.
In order to navigate to the public Atlas, click the 'Extracellular RNA Atlas' project.

In order to navigate to the private Atlas (if you're an ERCC member), expand the 'exRNA Metadata Standards' project
and select the 'Extracellular RNA Atlas - Consortium' subproject. You can also select the "Testing Ground" Scratch KB
by selecting the 'exRNA Metadata - Templates' subproject.

Regardless of which KB you choose, click the 'GenboreeKB' button at the top of the page to navigate to the GenboreeKB UI.

3. View General Stats About the Current KB

When you enter a given KB, you will see a summary page consisting of several charts and graphs.
These diagrams will contain general statistics about that KB, such as number of docs per collection,
total number of docs over time, and number of doc edits over time.

4. Select a Metadata Collection

At the top of the KB UI, there will be a Collection menu that will allow you to choose between the different collections for that KB.
Each collection has its own unique document model and set of documents.
We can see an example of the available collections for the private Atlas (as of 6/16/16) in the picture below:

For example, all biosample documents can be found in the Biosamples collection.
After we select a collection (Biosamples, for example), we'll be given statistics on that collection, as seen below:

After you have selected your collection of interest, your next action will depend on what you want to accomplish.
Do you want to browse the existing documents, or edit an existing document, or add a new document?
We will explain how to complete these tasks below.

Creating a New Metadata Document

Once you've selected your metadata collection, you might want to create a new document.
You should only create a new metadata document using the Testing Ground Scratch KB.
You should not create any new metadata documents in the private Atlas or public Atlas.

Each document you create will have its own, unique document identifier (doc ID).
You can either create your own doc ID, following a collection-specific format described below,
or you can allow the GenboreeKB UI to automatically generate your doc ID for you.

If you want to create your own doc ID, follow the directions in the Creating a Valid Document Identifier section.

Please note that if the KB UI automatically generates your doc ID, that ID will not contain your PI ID (a necessary part of any doc ID that goes into the Atlas).
However, the FTP Pipeline will automatically insert this PI ID for you when processing your documents, so the final version that ends up in the private or public Atlas
will contain the PI ID. In other words, don't worry about the fact that your auto-generated doc ID doesn't include your PI ID!

Creating a Valid Document Identifier

If you would prefer to have the GenboreeKB UI automatically generate your doc ID, you can ignore this section.
All identifiers must begin with EXR-, regardless of collection.
Then, you should provide your PI ID followed by 6 alphanumeric characters (numbers and capital/lowercase letters).
Your PI ID can be found in a couple of different ways:

Finally, you will need to write another dash (-) followed by the collection suffix associated with your collection.
A table containing collection types, suffixes, and example identifiers can be found below:

Examples
Type        Suffix  Example Accession
Biosample   BS      EXR-KJENS12P3L78-BS
Donor       DO      EXR-KJENS12P3L78-DO
Experiment  EX      EXR-KJENS12P3L78-EX
Analysis    AN      EXR-KJENS12P3L78-AN
Submission  SU      EXR-KJENS12P3W78-SU
Run         RU      EXR-KJENS12P3W78-RU
Study       ST      EXR-KJENS12P3L78-ST
File        FL      EXR-KJENS12P3L78-FL

Your identifier must also be unique - no other document in that collection can have the same identifier.
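If you are creating many doc IDs by hand, a small script can help you avoid formatting mistakes. The Python sketch below builds and checks identifiers according to the rules in this section. It assumes that the example accession EXR-KJENS12P3L78-BS splits into the PI ID KJENS1 and the 6 characters 2P3L78; that split, and the function names, are our own assumptions. The sketch cannot check uniqueness, since that depends on the documents already in the collection.

```python
import re

# Collection suffixes taken from the table above.
COLLECTION_SUFFIXES = {
    "Biosample": "BS", "Donor": "DO", "Experiment": "EX", "Analysis": "AN",
    "Submission": "SU", "Run": "RU", "Study": "ST", "File": "FL",
}


def build_doc_id(pi_id, unique_part, collection):
    """Assemble EXR-<PI ID><6 alphanumeric characters>-<suffix>."""
    if not re.fullmatch(r"[A-Za-z0-9]{6}", unique_part):
        raise ValueError("unique part must be exactly 6 alphanumeric characters")
    return f"EXR-{pi_id}{unique_part}-{COLLECTION_SUFFIXES[collection]}"


def is_valid_doc_id(doc_id, pi_id, collection):
    """Check a doc ID against the format for a given PI ID and collection."""
    suffix = COLLECTION_SUFFIXES[collection]
    pattern = rf"EXR-{re.escape(pi_id)}[A-Za-z0-9]{{6}}-{suffix}"
    return re.fullmatch(pattern, doc_id) is not None


print(build_doc_id("KJENS1", "2P3L78", "Biosample"))              # EXR-KJENS12P3L78-BS
print(is_valid_doc_id("EXR-KJENS12P3L78-BS", "KJENS1", "Biosample"))  # True
print(is_valid_doc_id("EXR-KJENS12P3L78-BS", "KJENS1", "Donor"))      # False
```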

Creating a New Document Through the UI

There are three different options for creating a new document through the UI. They can be seen below:

The most basic option is to create your metadata document without a template or questionnaire.
When you select this option, you will be prompted to provide a doc ID.
You can either provide your own doc ID (explained above) or leave the entry box blank and click OK.
If you leave the entry box blank, the doc ID will be automatically generated for you once you save the document.
When you create a document using the most basic option, only required properties will be present in the document initially.
You can always add other, optional properties though!

You can also use a template to create your document (if the collection has templates available).
Select the second option highlighted in the red box above and then choose the template you want to follow.
The template will contain all required properties as well as any recommended optional properties.

Finally, you can use a questionnaire to create your document (if the collection has questionnaires available).
Select the third option highlighted in the red box above and then choose the questionnaire you want to use.
By answering the series of questions presented, you will fill out the required fields in your document.
You will then only have to fill out any optional fields you want to include.

Uploading a New Metadata Document

You don't need to use the UI to create a new metadata document - you can also upload a new, previously-made document.
Click the "Upload Documents" button near the top of the GenboreeKB panel.
You will then find the document you want to upload by clicking "Select File...".
If you are using the templates and other materials provided on this Wiki for creating documents, you should choose
the "TABBED - Compact Property Names" format.
Click "Upload" and then wait until you receive an email informing you that your document was successfully uploaded.
If the document fails validation, you will receive information in your email telling you how to fix your document.

Finding an Existing Metadata Document

If you want to find an existing metadata document (instead of creating a new one),
you can either use the search toolbar in the top right corner of the UI window, or you can
query the collection.

Using the Search Toolbar

The most straightforward way of finding a document is to use the search toolbar.

If you know the doc ID of the document you're looking for, you can simply type it into
the search bar. You can also type part of the ID, and all matching results will show up.
For example, if I was interested in documents from the PI ID AMILO1, I could type
AMILO1 into the search bar and see a list of documents from AMILO1 in that collection.

Clicking the downward arrow to the right of the search bar will bring up your list of results
in case you search a given term and then click elsewhere, thus minimizing the list.
If the search bar is blank and you click this arrow, a list of random documents will be
displayed. This is useful if you don't know what you want to search for or don't understand
the doc ID format for a particular collection.

Please note that if there are many documents that match your search term, not all will be
listed. Thus, you'll need to use a different search feature (like the query described below)
in order to view a list of all matching documents.

Querying the Collection

Another way of finding a document of interest is using the query functionality found here:

There will be a number of different options in the dialog window:

For the Query option, you can choose between Document ID and Indexed Properties. For the Mode option, you can choose between Exact, Full, Keyword, and Prefix. For the View option, you can choose between different views that have been created by the DCC administrators for that collection.

For the Term option, you should write your search term.

When you click Submit, you can choose to see your search results in the current tab or in a new tab.

Viewing a Metadata Document

Once you've selected a metadata document, you'll be able to see its contents in the GenboreeKB UI window.
In particular, each document starts off "minimized", with only the root property and its immediate sub-properties displayed.
In order to see all of the sub-properties in a given document, right click on the root property ("Biosample" in the example below)
and click "Fully Expand". You can also right click a sub-property and click "Fully Expand" if you only want to expand that sub-property.
You can also click "Fully Collapse" if you want to minimize a given sub-property (or the doc as a whole).

Here, we see a document that has not been fully expanded:

Now, the document has been fully expanded:

Editing a Metadata Document

Now that you're viewing a metadata document, you might want to edit some properties, add new properties, etc.
The first thing you need to do is select the Edit option for the document, shown below:

In order to edit an existing property, all you need to do is double click the value for that property.
The possible values for a property depend upon that property's domain.
For example, if a property has a domain of string, you can pretty much write anything.
If a property has a domain of enum(a, b, c), you will only be able to pick a, b, or c.
Finally, if a property has a domain of bioportalTerm(...) or bioportalTerms(...), your value will be restricted to terms from the ontologies listed in the domain.
To learn more about this feature, see the Dynamic Retrieval of Bioportal Ontology Terms section below.

You can view the domain for a given property by viewing the document model.
You can learn more about document models below.

Adding a new property is also easy.
Each property in a given metadata document is a child property (or subproperty) of another, parent property.
The only exception is the root property, which is the document identifier.
For example, in my biosample document, "Species" is a subproperty of "Biological Sample Elements", and "Scientific Name" is a subproperty of "Species".

You can add a new subproperty by right clicking on a given property and then clicking the "Add" button:

You are then presented with a list of valid subproperties that aren't already present in your document.
Choose the subproperty you want to add (I chose "Common Name") and then click "Update" to add the subproperty.

In order to see all of the different subproperties (so that you can properly build your document), you'll need to look at the document model.

Dynamic Retrieval of Bioportal Ontology Terms

While editing your document(s), you will most likely come across properties with a domain of "bioportalTerm" and/or "bioportalTerms".
These properties use a look ahead search field to dynamically retrieve ontology terms from Bioportal.
The search is performed on both the inputted term as well as synonyms for that term.
When entering a value for these properties, enter at least three characters to begin your search within the ontologies mentioned in the property's domain.
Once you see an appropriate value, select it and then confirm your choice by clicking the "Update" button.

Saving a Metadata Document

Once you're done editing your document, you can save it by clicking the "Save" button in the upper left corner of the GenboreeKB panel.

Before we finish saving your document, we will validate it to make sure that all required properties are present and all values are valid.
If you receive an error message when you try to save your document, follow the directions in that error message to correct your document.
Otherwise, if your document is valid, you will receive confirmation that the document was saved successfully.

Downloading Metadata Document(s)

There are three different ways to download docs in the GenboreeKB UI.
First, you can download an entire collection of docs at once. For example, if you want to download all of the docs in the Biosamples collection, you would use this option.
Second, you can download a single doc that you've opened in the UI. If you just want to grab one doc (maybe a single Biosample doc), you would use this option.

You can see both of these options in the image below:

After you click either of the buttons, you'll have to select the format in which you'd like to receive your docs.
We recommend "Tabbed - Compact Property Names", since that's the format the FTP Pipeline accepts as valid input.
You could also pick the "Tabbed (Multi) - Compact Property Names" option if you are downloading an entire collection.
Currently, the FTP Pipeline only accepts this format for Biosample docs.
If you'd like to use this format for your own submission to the Atlas, downloading a collection in this format can be instructive for learning what the format looks like.
That way, you can construct your own Biosample submission in the proper way.

The third way to download docs is through the query feature highlighted above.
Simply perform a query and then click the green download icon in the toolbar to download all of the docs that are included in that query.

Viewing a Metadata Model

Each collection has its own document model.
This document model dictates the structure of the documents inside the collection.
Each document must conform to the rules set in the model.
For example, if the model states that a certain property is required, a document will not be valid unless it contains that property.
When we're building documents, the model is valuable because it tells us all of the different possible properties available for a document in the associated collection.
This will help us figure out which properties we need to add to our own document.

In order to see the document model associated with a given collection, click the "View Model" button as indicated below:

You can download a currently selected document model by clicking the green download icon highlighted in the above picture.
To learn more about what the different columns in the document model represent, you can check out the Data Model Schema page.
To see a full list of the different possible domains in GenboreeKB, click here.
To see a smaller list that contains explanations of some of the less intuitive domains, click here.


TABLE 1: List of Units supported by GenboreeKB

This table provides a list of all units that are currently supported by GenboreeKB.

Unit Name Display Name Aliases Kind Scalar Value Definition
<gee> xG ["gee", "standard-gravitation", "xG", "xg"] acceleration 196133/20000 ["<meter>"]/["<second>", "<second>"]
<katal> kat ["kat", "katal"] activity 1 ["<mole>"]/["<second>"]
<unit> U ["U", "enzUnit", "units", "unit"] activity 1/60000000 ["<mole>"]/["<second>"]
<degree> deg ["deg", "degree", "degrees"] angle 0.0174532925199433 ["<radian>"]/["<1>"]
<grad> grad ["grad", "gradian", "grads"] angle 0.015707963267949 ["<radian>"]/["<1>"]
<radian> rad ["rad", "radian", "radians"] angle 1 ["<radian>"]/["<1>"]
<rotation> rotation ["rotation"] angle 6.28318530717959 ["<radian>"]/["<1>"]
<rpm> rpm ["rpm"] angular_velocity 0.10471975511966 ["<radian>"]/["<second>"]
<acre> acre ["acre", "acres"] area 316160658/78125 ["<meter>", "<meter>"]/["<1>"]
<hectare> hectare ["hectare"] area 10000 ["<meter>", "<meter>"]/["<1>"]
<sqft> sqft ["sqft"] area 145161/1562500 ["<meter>", "<meter>"]/["<1>"]
<sqin> sqin ["sqin"] area 16129/25000000 ["<meter>", "<meter>"]/["<1>"]
<farad> F ["F", "farad", "farads"] capacitance 1 ["<ampere>", "<ampere>", "<second>", "<second>", "<second>", "<second>"]/["<kilogram>", "<meter>", "<meter>"]
<coulomb> C ["C", "coulomb", "coulombs"] charge 1 ["<ampere>", "<second>"]/["<1>"]
<siemens> S ["S", "siemens"] conductance 1 ["<ampere>", "<ampere>", "<second>", "<second>", "<second>"]/["<kilogram>", "<meter>", "<meter>"]
<base-pair> bp ["bp", "base-pair"] counting 1 ["<each>"]/["<1>"]
<cell> cells ["cells", "cell"] counting 1 ["<each>"]/["<1>"]
<count> count ["count"] counting 1 ["<each>"]/["<1>"]
<dot> dot ["dot", "dots"] counting 1 ["<each>"]/["<1>"]
<dozen> doz ["doz", "dz", "dozen"] counting 12 ["<each>"]/["<1>"]
<each> each ["each"] counting 1 ["<each>"]/["<1>"]
<gross> gr ["gr", "gross"] counting 144 ["<each>"]/["<1>"]
<molecule> molecule ["molecule", "molecules"] counting 1 ["<each>"]/["<1>"]
<nucleotide> nt ["nt", "nucleotide"] counting 1 ["<each>"]/["<1>"]
<pixel> px ["px", "pixel", "pixels"] counting 1 ["<each>"]/["<1>"]
<cents> cents ["cents"] currency 1/100 ["<dollar>"]/["<1>"]
<dollar> USD ["USD", "dollar"] currency 1 ["<dollar>"]/["<1>"]
<ampere> A ["A", "ampere", "amperes", "amp", "amps"] current 1 ["<ampere>"]/["<1>"]
<btu> Btu ["Btu", "btu", "Btus", "btus"] energy 2320092679909671/2199023255552 ["<kilogram>", "<meter>", "<meter>"]/["<second>", "<second>"]
<Calorie> Cal ["Cal", "Calorie", "Calories"] energy 4184.0 ["<kilogram>", "<meter>", "<meter>"]/["<second>", "<second>"]
<calorie> cal ["cal", "calorie", "calories"] energy 4.184 ["<kilogram>", "<meter>", "<meter>"]/["<second>", "<second>"]
<erg> erg ["erg", "ergs"] energy 1/10000000 ["<kilogram>", "<meter>", "<meter>"]/["<second>", "<second>"]
<joule> J ["J", "joule", "joules"] energy 1 ["<kilogram>", "<meter>", "<meter>"]/["<second>", "<second>"]
<therm> thm ["thm", "therm", "therms", "Therm"] energy 105505600.0 ["<kilogram>", "<meter>", "<meter>"]/["<second>", "<second>"]
<dyne> dyn ["dyn", "dyne"] force 1/100000 ["<kilogram>", "<meter>"]/["<second>", "<second>"]
<newton> N ["N", "newton", "newtons"] force 1 ["<kilogram>", "<meter>"]/["<second>", "<second>"]
<poundal> pdl ["pdl", "poundal", "poundals"] force 17281869297/125000000000 ["<kilogram>", "<meter>"]/["<second>", "<second>"]
<pound-force> lbf ["lbf", "pound-force"] force 8896443230521/2000000000000 ["<kilogram>", "<meter>"]/["<second>", "<second>"]
<becquerel> Bq ["Bq", "becquerel", "becquerels"] frequency 1 ["<1>"]/["<second>"]
<bpm> bpm ["bpm"] frequency 1/60 ["<each>"]/["<second>"]
<cpm> cpm ["cpm"] frequency 1/60 ["<each>"]/["<second>"]
<curie> Ci ["Ci", "curie", "curies"] frequency 37000000000.0 ["<1>"]/["<second>"]
<dpm> dpm ["dpm"] frequency 1/60 ["<each>"]/["<second>"]
<hertz> Hz ["Hz", "hertz"] frequency 1 ["<1>"]/["<second>"]
<lux> lux ["lux"] illuminance 1 ["<candela>", "<steradian>"]/["<meter>", "<meter>"]
<henry> H ["H", "henry", "henries"] inductance 1 ["<kilogram>", "<meter>", "<meter>"]/["<ampere>", "<ampere>", "<second>", "<second>"]
<bit> b ["b", "bit"] information 1/8 ["<byte>"]/["<1>"]
<byte> B ["B", "byte", "bytes"] information 1 ["<byte>"]/["<1>"]
<angstrom> ang ["ang", "angstrom", "angstroms"] length 1/10000000000 ["<meter>"]/["<1>"]
<AU> AU ["AU", "astronomical-unit"] length 149597870700 ["<meter>"]/["<1>"]
<fathom> fathom ["fathom", "fathoms"] length 1143/625 ["<meter>"]/["<1>"]
<foot> ft ["ft", "foot", "feet", "'"] length 381/1250 ["<meter>"]/["<1>"]
<furlong> fur ["fur", "furlong", "furlongs"] length 25146/125 ["<meter>"]/["<1>"]
<inch> in ["in", "inch", "inches", "\""] length 127/5000 ["<meter>"]/["<1>"]
<league> league ["league", "leagues"] length 603504/125 ["<meter>"]/["<1>"]
<light-minute> lmin ["lmin", "light-minute"] length 17987547480 ["<meter>"]/["<1>"]
<light-second> ls ["ls", "lsec", "light-second"] length 299792458 ["<meter>"]/["<1>"]
<light-year> ly ["ly", "light-year"] length 9460528412464108 ["<meter>"]/["<1>"]
<meter> m ["m", "meter", "meters", "metre", "metres"] length 1 ["<meter>"]/["<1>"]
<mile> mi ["mi", "mile", "miles"] length 201168/125 ["<meter>"]/["<1>"]
<mil> mil ["mil", "mils"] length 127/5000000 ["<meter>"]/["<1>"]
<naut-league> nleague ["nleague", "nleagues", "naut-league"] length 5556 ["<meter>"]/["<1>"]
<naut-mile> nmi ["nmi", "M", "NM", "naut-mile"] length 1852 ["<meter>"]/["<1>"]
<parsec> pc ["pc", "parsec", "parsecs"] length 3.08568025088532e+16 ["<meter>"]/["<1>"]
<pica> P ["P", "pica", "picas"] length 127/30000 ["<meter>"]/["<1>"]
<point> point ["point", "points"] length 127/360000 ["<meter>"]/["<1>"]
<redshift> z ["z", "red-shift", "redshift"] length 130277299999999992243683328 ["<meter>"]/["<1>"]
<rod> rd ["rd", "rod", "rods"] length 12573/2500 ["<meter>"]/["<1>"]
<survey-foot> sft ["sft", "sfoot", "sfeet", "survey-foot"] length 1200/3937 ["<meter>"]/["<1>"]
<yard> yd ["yd", "yard", "yards"] length 1143/1250 ["<meter>"]/["<1>"]
<decibel> dB ["dB", "decibel", "decibels"] logarithmic 1 ["<decibel>"]/["<1>"]
<candela> cd ["cd", "candela"] luminosity 1 ["<candela>"]/["<1>"]
<lumen> lm ["lm", "lumen"] luminous_power 1 ["<candela>", "<steradian>"]/["<1>"]
<gauss> G ["G", "gauss"] magnetism 1/10000 ["<kilogram>"]/["<ampere>", "<second>", "<second>"]
<maxwell> Mx ["Mx", "maxwell", "maxwells"] magnetism 1/100000000 ["<kilogram>", "<meter>", "<meter>"]/["<ampere>", "<second>", "<second>"]
<oersted> Oe ["Oe", "oersted", "oersteds"] magnetism 79.5774715459477 ["<ampere>"]/["<meter>"]
<tesla> T ["T", "tesla", "teslas"] magnetism 1 ["<kilogram>"]/["<ampere>", "<second>", "<second>"]
<weber> Wb ["Wb", "weber", "webers"] magnetism 1 ["<kilogram>", "<meter>", "<meter>"]/["<ampere>", "<second>", "<second>"]
<AMU> u ["u", "AMU", "amu"] mass 1/602214128999999968641024 ["<kilogram>"]/["<1>"]
<carat> ct ["ct", "carat", "carats"] mass 1/5000 ["<kilogram>"]/["<1>"]
<dalton> Da ["Da", "dalton", "daltons"] mass 1/602214128999999968641024 ["<kilogram>"]/["<1>"]
<gram> g ["g", "gram", "grams", "gramme", "grammes"] mass 1/1000 ["<kilogram>"]/["<1>"]
<kilogram> kg ["kg", "kilogram", "kilograms"] mass 1 ["<kilogram>"]/["<1>"]
<metric-ton> tonne ["tonne", "metric-ton"] mass 1000 ["<kilogram>"]/["<1>"]
<ounce> oz ["oz", "ounce", "ounces"] mass 45359237/1600000000 ["<kilogram>"]/["<1>"]
<pound> lbs ["lbs", "lb", "lbm", "pound-mass", "pound", "pounds", "#"] mass 45359237/100000000 ["<kilogram>"]/["<1>"]
<short-ton> tn ["tn", "ton", "tons", "short-tons", "short-ton"] mass 45359237/50000 ["<kilogram>"]/["<1>"]
<slug> slug ["slug", "slugs"] mass 8896443230521/609600000000 ["<kilogram>"]/["<1>"]
<molar> M ["M", "molar"] molar_concentration 1000 ["<mole>"]/["<meter>", "<meter>", "<meter>"]
<volt> V ["V", "volt", "volts"] potential 1 ["<kilogram>", "<meter>", "<meter>"]/["<ampere>", "<second>", "<second>", "<second>"]
<horsepower> hp ["hp", "horsepower"] power 37284993579113511/50000000000000 ["<kilogram>", "<meter>", "<meter>"]/["<second>", "<second>", "<second>"]
<watt> W ["W", "Watt", "watt", "watts"] power 1 ["<kilogram>", "<meter>", "<meter>"]/["<second>", "<second>", "<second>"]
<atm> atm ["atm", "ATM", "atmosphere", "atmospheres"] pressure 101325 ["<kilogram>"]/["<second>", "<second>", "<meter>"]
<bar> bar ["bar", "bars"] pressure 100000.0 ["<kilogram>"]/["<second>", "<second>", "<meter>"]
<cmh2o> cmH2O ["cmH2O", "cmh2o", "cmAq"] pressure 196133/2000 ["<kilogram>"]/["<second>", "<second>", "<meter>"]
<inh2o> inH2O ["inH2O", "inh2o", "inAq"] pressure 24908891/100000 ["<kilogram>"]/["<second>", "<second>", "<meter>"]
<inHg> inHg ["inHg"] pressure 190636732734642608180389/56294995342131200000 ["<kilogram>"]/["<second>", "<second>", "<meter>"]
<mmHg> mmHg ["mmHg"] pressure 1501076635705847308507/11258999068426240000 ["<kilogram>"]/["<second>", "<second>", "<meter>"]
<pascal> Pa ["Pa", "pascal", "pascals"] pressure 1 ["<kilogram>"]/["<meter>", "<second>", "<second>"]
<psi> psi ["psi"] pressure 8896443230521/1290320000 ["<kilogram>"]/["<second>", "<second>", "<meter>"]
<torr> Torr ["Torr", "torr"] pressure 20265/152 ["<kilogram>"]/["<second>", "<second>", "<meter>"]
<gray> Gy ["Gy", "gray", "grays"] radiation 1 ["<meter>", "<meter>"]/["<second>", "<second>"]
<sievert> Sv ["Sv", "sievert", "sieverts"] radiation 1 ["<meter>", "<meter>"]/["<second>", "<second>"]
<roentgen> R ["R", "roentgen"] radiation_exposure 0.000258 ["<ampere>", "<second>"]/["<kilogram>"]
<ohm> Ohm ["Ohm", "ohm", "ohms"] resistance 1 ["<kilogram>", "<meter>", "<meter>"]/["<ampere>", "<ampere>", "<second>", "<second>", "<second>"]
<steradian> sr ["sr", "steradian", "steradians"] solid_angle 1 ["<steradian>"]/["<1>"]
<fps> fps ["fps"] speed 381/1250 ["<meter>"]/["<second>"]
<knot> kt ["kt", "kn", "kts", "knot", "knots"] speed 463/900 ["<meter>"]/["<second>"]
<kph> kph ["kph"] speed 0.277777777777778 ["<meter>"]/["<second>"]
<mph> mph ["mph"] speed 1397/3125 ["<meter>"]/["<second>"]
<mole> mol ["mol", "mole"] substance 1 ["<mole>"]/["<1>"]
<celsius> degC ["degC", "celsius", "centigrade"] temperature 1 ["<kelvin>"]/["<1>"]
<fahrenheit> degF ["degF", "fahrenheit"] temperature 2501999792983609/4503599627370496 ["<kelvin>"]/["<1>"]
<kelvin> degK ["degK", "kelvin"] temperature 1 ["<kelvin>"]/["<1>"]
<rankine> degR ["degR", "rankine"] temperature 2501999792983609/4503599627370496 ["<kelvin>"]/["<1>"]
<tempC> tempC ["tempC"] temperature 1 ["<tempK>"]/["<1>"]
<tempF> tempF ["tempF"] temperature 2501999792983609/4503599627370496 ["<tempK>"]/["<1>"]
<tempK> tempK ["tempK"] temperature 1 ["<tempK>"]/["<1>"]
<tempR> tempR ["tempR"] temperature 255.927777777778 ["<tempK>"]/["<1>"]
<century> century ["century", "centuries"] time 3155692600 ["<second>"]/["<1>"]
<day> d ["d", "day", "days"] time 86400 ["<second>"]/["<1>"]
<decade> decade ["decade", "decades"] time 315569260 ["<second>"]/["<1>"]
<fortnight> fortnight ["fortnight", "fortnights"] time 1209600 ["<second>"]/["<1>"]
<hour> h ["h", "hr", "hrs", "hour", "hours"] time 3600 ["<second>"]/["<1>"]
<minute> min ["min", "minute", "minutes"] time 60 ["<second>"]/["<1>"]
<month> Month ["month", "mon", "months", "mons", "mo"] time 2629743.83333333 ["<second>"]/["<1>"]
<second> s ["s", "sec", "second", "seconds"] time 1 ["<second>"]/["<1>"]
<week> wk ["wk", "week", "weeks"] time 604800 ["<second>"]/["<1>"]
<year> y ["y", "yr", "year", "years", "annum"] time 31556926 ["<second>"]/["<1>"]
<percent> % ["%", "percent"] unitless 1/100
<ppb> ppb ["ppb"] unitless 1/1000000000
<ppm> ppm ["ppm"] unitless 1/1000000
<poise> P ["P", "poise"] viscosity 1/10 ["<kilogram>"]/["<second>", "<meter>"]
<stokes> St ["St", "stokes"] viscosity 1/10000 ["<meter>", "<meter>"]/["<second>"]
<cup> cu ["cu", "cup", "cups"] volume 473176473/2000000000000 ["<meter>", "<meter>", "<meter>"]/["<1>"]
<fluid-ounce> floz ["floz", "fluid-ounce", "fluid-ounces"] volume 473176473/16000000000000 ["<meter>", "<meter>", "<meter>"]/["<1>"]
<gallon> gal ["gal", "gallon", "gallons"] volume 473176473/125000000000 ["<meter>", "<meter>", "<meter>"]/["<1>"]
<liter> l ["l", "L", "liter", "liters", "litre", "litres"] volume 1/1000 ["<meter>", "<meter>", "<meter>"]/["<1>"]
<pint> pt ["pt", "pint", "pints"] volume 473176473/1000000000000 ["<meter>", "<meter>", "<meter>"]/["<1>"]
<quart> qt ["qt", "quart", "quarts"] volume 473176473/500000000000 ["<meter>", "<meter>", "<meter>"]/["<1>"]
<tablespoon> tbs ["tbs", "tbsp", "tablespoon", "tablespoons"] volume 473176473/32000000000000 ["<meter>", "<meter>", "<meter>"]/["<1>"]
<teaspoon> tsp ["tsp", "teaspoon", "teaspoons"] volume 157725491/32000000000000 ["<meter>", "<meter>", "<meter>"]/["<1>"]
<cfm> cfm ["cfm", "CFM", "CFPM"] volumetric_flow 18435447/39062500000 ["<meter>", "<meter>", "<meter>"]/["<second>"]
<dpi> dpi ["dpi"] wavenumber 5000/127 ["<each>"]/["<meter>"]
<ppi> ppi ["ppi"] wavenumber 5000/127 ["<each>"]/["<meter>"]

TABLE 2: Scales of Units

Below is a list of acceptable prefixes to the units provided in Table 1.
You can use a combination of the prefix from Table 2 and the actual unit name from Table 1
when you define units for measurement domain properties.

EXAMPLE:
Prefix Name Display Name Aliases Kind Scalar Value
<1> 1 ["1"] prefix 1
<atto> a ["a", "Atto", "atto"] prefix 1/1000000000000000000
<centi> c ["c", "Centi", "centi"] prefix 1/100
<deca> da ["da", "Deca", "deca", "deka"] prefix 10.0
<deci> d ["d", "Deci", "deci"] prefix 1/10
<exa> E ["E", "Exa", "exa"] prefix 1.0e+18
<exi> Ei ["Ei", "Exi", "exi"] prefix 1152921504606846976
<femto> f ["f", "Femto", "femto"] prefix 1/1000000000000000
<gibi> Gi ["Gi", "Gibi", "gibi"] prefix 1073741824
<giga> G ["G", "Giga", "giga"] prefix 1000000000.0
<googol> googol ["googol"] prefix 1.0e+100
<hecto> h ["h", "Hecto", "hecto"] prefix 100.0
<kibi> Ki ["Ki", "Kibi", "kibi"] prefix 1024
<kilo> k ["k", "kilo"] prefix 1000.0
<mebi> Mi ["Mi", "Mebi", "mebi"] prefix 1048576
<mega> M ["M", "Mega", "mega"] prefix 1000000.0
<micro> u ["u", "Micro", "micro", "mc"] prefix 1/1000000
<milli> m ["m", "Milli", "milli"] prefix 1/1000
<nano> n ["n", "Nano", "nano"] prefix 1/1000000000
<pebi> Pi ["Pi", "Pebi", "pebi"] prefix 1125899906842624
<peta> P ["P", "Peta", "peta"] prefix 1.0e+15
<pico> p ["p", "Pico", "pico"] prefix 1/1000000000000
<tebi> Ti ["Ti", "Tebi", "tebi"] prefix 1099511627776
<tera> T ["T", "Tera", "tera"] prefix 1000000000000.0
<yebi> Yi ["Yi", "Yebi", "yebi"] prefix 1208925819614629174706176
<yocto> y ["y", "Yocto", "yocto"] prefix 1/999999999999999983222784
<yotta> Y ["Y", "Yotta", "yotta"] prefix 1.0e+24
<zebi> Zi ["Zi", "Zebi", "zebi"] prefix 1180591620717411303424
<zepto> z ["z", "Zepto", "zepto"] prefix 1/1000000000000000000000
<zetta> Z ["Z", "Zetta", "zetta"] prefix 1.0e+21
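
As a rough illustration of how a Table 2 prefix combines with a Table 1 unit, the sketch below multiplies the prefix's scalar by the unit's scalar to express a prefixed unit in base units. It is written in Python and copies only a handful of rows from the two tables. The function names and the simple string-splitting rule are assumptions made for this example, not a description of how GenboreeKB parses units internally.

```python
from fractions import Fraction

# A few scalar values copied from Table 1 (units) and Table 2 (prefixes).
UNIT_SCALARS = {
    "meter": Fraction(1),
    "second": Fraction(1),
    "gram": Fraction(1, 1000),   # expressed in kilograms
    "liter": Fraction(1, 1000),  # expressed in cubic meters
}
PREFIX_SCALARS = {
    "kilo": Fraction(1000),
    "milli": Fraction(1, 1000),
    "micro": Fraction(1, 1000000),
    "nano": Fraction(1, 1000000000),
}


def split_prefixed_unit(name):
    """Split e.g. 'milliliter' into ('milli', 'liter'); the prefix may be None."""
    if name in UNIT_SCALARS:
        return None, name
    for prefix in PREFIX_SCALARS:
        unit = name[len(prefix):]
        if name.startswith(prefix) and unit in UNIT_SCALARS:
            return prefix, unit
    raise ValueError(f"unrecognized unit: {name!r}")


def scalar_in_base_units(name):
    """Multiply the prefix scalar (if any) by the unit scalar."""
    prefix, unit = split_prefixed_unit(name)
    factor = PREFIX_SCALARS[prefix] if prefix else Fraction(1)
    return factor * UNIT_SCALARS[unit]
```

For instance, scalar_in_base_units("milliliter") evaluates to Fraction(1, 1000000), i.e. one millionth of a cubic meter.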

GenboreeKB exRNA Metadata Tracking System - Navigating the Metadata UI

Overview

Step-by-step Instructions to Use GenboreeKB

In order to see TEMPLATES and EXAMPLES for the various collections you'll be browsing, refer to exRNA Metadata Standards.

Login

GenboreeKB Basics

Select the Project "exRNA Metadata Standards"

Accessing exRNA Metadata GenboreeKB

Select Metadata Collection

Create New Metadata Documents

Add Sub-properties

Saving Document

Browse Existing Metadata

Search and Browse Existing Documents

Edit Existing Documents


Dynamic Retrieval of Bioportal Ontology Terms


Upload and Download Metadata

Bulk Upload of Docs


Download entire collection or a single document


Data Models

View Models


Opening and Saving Metadata Files

Opening Metadata Files in Microsoft Excel

Given below are the instructions to ensure your metadata file is formatted
and opened correctly in Microsoft Excel.

  1. Open Microsoft Excel.
  2. Click on File >> Open, then navigate to the folder on your computer that contains the saved metadata file.
  3. Select the metadata file (with .tsv extension).
  4. Choose the file type as "Delimited" and click Next.
  5. Check the box next to Tab Delimiter and click Next.
  6. IMPORTANT STEP: Select the radio button next to Text under Column data format and click Finish.

IMPORTANT: Make sure that you open the file through File >> Open in Excel as opposed to right-clicking the file and then clicking open with Excel. The latter method may bypass the text import wizard and result in issues with your metadata file.



Saving Metadata Files

Microsoft Excel in Windows

Select "Save As" from the menubar.
Navigate to the folder where you would like to save your metadata document.
Provide a file name for your document. Remember, file names end with .metadata.tsv.
Select the option "Text (Tab delimited)" from the pull down menu for "Save as type" and press OK.

Microsoft Excel in Mac

To save your metadata documents as a properly formatted tab-separated value file, click "Save" and
select the option to save as "Windows Formatted Text".
This option saves the file as a tab-separated value file without any special characters.

LibreOffice Calc

Select "Save As", choose "All Format", and then choose "Text CSV (.csv)".
You will see a dialog box titled "Export Text File".
Select {Tab} from the pull down menu for "Field delimiter" and select OK.

Your document will be saved as a tab-delimited text file.

Sanity Check the TSV file

To ensure there are no special characters in your metadata document after saving it with one of the methods above,
open the document in any plain-text editor.

Check that the document is properly formatted, i.e. columns are separated by a tab character and
the document does not contain stray characters such as ^M.
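
The same sanity check can also be scripted. The sketch below, in Python, looks for carriage returns (which many editors display as ^M) and compares the number of tab-separated columns on each line with the first line. The function name is hypothetical, and the rule that every non-empty line should have the same number of columns is an assumption of this example rather than a documented requirement.

```python
def check_tsv(path):
    """Return a list of problems found in a tab-separated metadata file."""
    with open(path, "rb") as handle:
        raw = handle.read()

    problems = []
    # Carriage returns are what many editors display as ^M.
    if b"\r" in raw:
        problems.append("contains carriage returns (often displayed as ^M)")

    # Normalize line endings only so the column check below can split lines.
    normalized = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

    expected = None
    for number, line in enumerate(normalized.split(b"\n"), start=1):
        if not line:
            continue  # ignore blank lines, including the one after a final newline
        columns = line.count(b"\t") + 1
        if expected is None:
            expected = columns
            if columns == 1:
                problems.append("first line contains no tab characters")
        elif columns != expected:
            problems.append(f"line {number}: {columns} columns, expected {expected}")

    if expected is None:
        problems.append("file is empty")
    return problems
```

An empty list means none of these problems were detected. Anything else is worth inspecting in a text editor before you submit.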


Prepare Your Analyses Metadata File


Prepare Your Biosamples Metadata File

Here are some specific instructions for filling out a Biosamples metadata file:


Prepare Your Data Archive

Step 1. Gather All of Your Data Files in the Same Directory

Step 2. Compress Data Files into One Archive

Summary

  1. Gather all of your data files in the same directory (including spike-in file, if necessary)
  2. Compress data files into a single archive
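
The two steps above can be sketched in Python as follows. This is only one possible approach: the function name is hypothetical, the use of the .zip format follows the "samples_data.zip" example used elsewhere in this documentation, and storing every file at the top level of the archive is an assumption of this example.

```python
import zipfile
from pathlib import Path


def make_archive(data_dir, archive_path):
    """Zip every regular file directly inside data_dir into archive_path."""
    data_dir = Path(data_dir)
    archive_path = Path(archive_path)
    added = []
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for item in sorted(data_dir.iterdir()):
            # Skip sub-directories and the archive itself if it lives in data_dir.
            if item.is_file() and item.resolve() != archive_path.resolve():
                archive.write(item, arcname=item.name)
                added.append(item.name)
    return added
```

The returned list shows which files ended up in the archive, which makes it easy to confirm that a spike-in file (if you have one) was included.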

Prepare Your Donors Metadata File

Here are some specific instructions for filling out a Donors metadata file:


Prepare Your Experiments Metadata File

Here are some specific instructions for filling out an Experiments metadata file:


Prepare Your longRNAseq Data Archive

Step 1. Gather All of Your Data Files in the Same Directory

Step 2. Compress Data Files into One Archive

Summary

  1. Gather all of your data files in the same directory (including spike-in file, if necessary)
  2. Compress data files into a single archive

Prepare Your longRNAseq Experiments Metadata File

Here are some specific instructions for filling out an Experiments metadata file:


Prepare Your longRNAseq Manifest File

After you have finished preparing your data archive and metadata archive, you have to complete the third and final part of your submission: the manifest file.
The manifest file is the "glue" that links together all of your metadata and data. It also provides some important, additional information required to process your submission.

Your manifest file name will have the same prefix as your other files (data archive, metadata file) and will end in "_longRNAseq.manifest.json".
For example, if my data archive was named "samples_longRNAseq_data.zip", then my manifest file would be named "samples_longRNAseq.manifest.json".
As you work on your manifest file, make sure that you save regularly so you don't lose your progress!
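
The naming rule can be expressed as a small Python helper. This is a sketch only: the function name is hypothetical, and it assumes the data archive name ends in "_data.zip" as in the example above.

```python
def manifest_name_for(data_archive_name):
    """Derive the manifest file name from the data archive name."""
    suffix = "_data.zip"
    if not data_archive_name.endswith(suffix):
        raise ValueError(f"expected a name ending in {suffix!r}: {data_archive_name!r}")
    prefix = data_archive_name[: -len(suffix)]
    return prefix + ".manifest.json"
```

With the example above, manifest_name_for("samples_longRNAseq_data.zip") returns "samples_longRNAseq.manifest.json".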

Step 1. Download Template Manifest File

First, you will want to download a template of the manifest file.
You can find that template here.
You will complete your manifest file by filling in values between the quotation marks for each property.

Below, you can see what the template looks like:

{
  "studyName": "",
  "userLogin": "",
  "md5CheckSum": "",
  "runMetadataFileName": "",
  "submissionMetadataFileName": "",
  "studyMetadataFileName": "",
  "experimentMetadataFileName": "",
  "biosampleMetadataFileName": "",
  "donorMetadataFileName": "",
  "manifest":
  [
    {
      "dataFileNameRead1": "",
      "dataFileNameRead2": "",
      "sampleName": ""
    }
  ],
  "settings":
  {
    "adapterSequence": "",
    "analysisName": ""
  }
}

Step 2. Open Your Manifest File

Next, you will need to open your manifest file in your favorite text editor.
You can find some recommendations below:

Step 3. Compute the MD5 Checksum of your Data Archive
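
If you do not have a dedicated checksum tool at hand, one way to compute the MD5 checksum is with Python's standard hashlib module. The sketch below reads the archive in chunks so that large files do not need to fit in memory. The function name and chunk size are choices made for this example.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the MD5 checksum of a file as a 32-character hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Run it on your data archive (not on your manifest file or metadata archive) and paste the resulting string into the md5CheckSum property.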

Step 4. Fill Out the Top Section of Your Manifest

The top section of your manifest contains information that applies to all samples in your submission.
Below, we'll go through each property and tell you how to fill them all out.

So far, our template should look something like this:

{
  "studyName": "CSF vs. Serum Parkinson's June 2017",
  "userLogin": "william_thistle",
  "md5CheckSum": "b9355772f35516837a06666f7c56afdd",
  "runMetadataFileName": "testRun.metadata.tsv",
  "submissionMetadataFileName": "testSubmissions.metadata.tsv",
  "studyMetadataFileName": "testStudies.metadata.tsv",
  "experimentMetadataFileName": "testExperiments.metadata.tsv",
  "biosampleMetadataFileName": "testBiosamples.metadata.tsv",
  "donorMetadataFileName": "testDonors.metadata.tsv",
  "manifest":
  [
    {
      "dataFileNameRead1": "",
      "dataFileNameRead2": "",
      "sampleName": ""
    }
  ],
  "settings":
  {
    "adapterSequence": "",
    "analysisName": ""
  }
}

Step 5. Fill Out the Sample-Specific Section of Your Manifest

Next, we'll tackle the part of the manifest file that deals with your individual samples.
For each sample, you will need to fill out a dataFileNameRead1, dataFileNameRead2, and sampleName.
Currently, the template only has space to fill out information about one sample.
To add more samples, all you need to do is copy-paste the existing set of dataFileNameRead1, dataFileNameRead2, and sampleName properties.
For example, this is what the (relevant part of the) template currently looks like:

{
  "manifest":
  [
    {
      "dataFileNameRead1": "",
      "dataFileNameRead2": "",
      "sampleName": ""
    }
  ]
}

If I had three samples, it would look like this:

{
  "manifest":
  [
    {
      "dataFileNameRead1": "",
      "dataFileNameRead2": "",
      "sampleName": ""
    },
    {
      "dataFileNameRead1": "",
      "dataFileNameRead2": "",
      "sampleName": ""
    },
    {
      "dataFileNameRead1": "",
      "dataFileNameRead2": "",
      "sampleName": ""
    }
  ]
}

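IMPORTANT NOTE: I added a comma after the closing brace of each sample's block except the last one, so that each set of dataFileNameRead1, dataFileNameRead2, and sampleName properties is separated from the next. This is required (or else your file will not be valid JSON).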

Next, we'll go over how to fill out the dataFileNameRead1, dataFileNameRead2, and sampleName for each sample.
It might be easiest to first see how this section will look when properly filled out:

{
  "manifest":
  [
    {
      "dataFileNameRead1": "test1.R1.fastq.gz",
      "dataFileNameRead2": "test1.R2.fastq.gz",
      "sampleName": "Test 1"
    },
    {
      "dataFileNameRead1": "test2.R1.fastq.gz",
      "dataFileNameRead2": "test2.R2.fastq.gz",
      "sampleName": "Test 2"
    },
    {
      "dataFileNameRead1": "test3.R1.fastq.gz",
      "dataFileNameRead2": "test3.R2.fastq.gz",
      "sampleName": "Test 3"
    }
  ]
}

The dataFileNameRead1 and dataFileNameRead2 properties refer to the names of a given sample's two data files in the data archive. Next, we'll explain the sampleName property.

Now, our manifest file looks like the following:

{
  "studyName": "CSF vs. Serum Parkinson's June 2017",
  "userLogin": "william_thistle",
  "md5CheckSum": "b9355772f35516837a06666f7c56afdd",
  "runMetadataFileName": "testRun.metadata.tsv",
  "submissionMetadataFileName": "testSubmissions.metadata.tsv",
  "studyMetadataFileName": "testStudies.metadata.tsv",
  "experimentMetadataFileName": "testExperiments.metadata.tsv",
  "biosampleMetadataFileName": "testBiosamples.metadata.tsv",
  "donorMetadataFileName": "testDonors.metadata.tsv",
  "manifest":
  [
    {
      "dataFileNameRead1": "test1.R1.fastq.gz",
      "dataFileNameRead2": "test1.R2.fastq.gz",
      "sampleName": "Test 1"
    },
    {
      "dataFileNameRead1": "test2.R1.fastq.gz",
      "dataFileNameRead2": "test2.R2.fastq.gz",
      "sampleName": "Test 2"
    },
    {
      "dataFileNameRead1": "test3.R1.fastq.gz",
      "dataFileNameRead2": "test3.R2.fastq.gz",
      "sampleName": "Test 3"
    }
  ],
  "settings":
  {
    "adapterSequence": "",
    "analysisName": ""
  }
}

Here is a manifest file filler helper that could help you create all of the sampleName, dataFileNameRead1, and dataFileNameRead2 in JSON format.
Make sure you are in the longRNAseq tab and remember to remove the final comma "," after the last sampleName, dataFileNameRead1, and dataFileNameRead2 entry in the JSON file.
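
If you prefer to script this step, the sample entries can also be generated with Python's standard json module, which takes care of the commas for you. The sketch below is an illustration only: the helper name and the shape of the input dictionary are assumptions, while the property names come from the template above.

```python
import json


def build_sample_entries(reads_by_sample):
    """Turn {sampleName: (read1 file, read2 file)} into manifest entries."""
    entries = []
    for sample_name, (read1, read2) in reads_by_sample.items():
        entries.append({
            "dataFileNameRead1": read1,
            "dataFileNameRead2": read2,
            "sampleName": sample_name,
        })
    return entries


samples = {
    "Test 1": ("test1.R1.fastq.gz", "test1.R2.fastq.gz"),
    "Test 2": ("test2.R1.fastq.gz", "test2.R2.fastq.gz"),
    "Test 3": ("test3.R1.fastq.gz", "test3.R2.fastq.gz"),
}

# json.dumps never emits a trailing comma, so the result is valid JSON.
manifest_section = json.dumps({"manifest": build_sample_entries(samples)}, indent=2)
print(manifest_section)
```

You can then copy the printed "manifest" list into your manifest file in place of the empty one from the template.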

Step 6. Fill Out the Settings Section of Your Manifest

The "settings" section at the bottom of the manifest file provides some ability to customize how your submission is processed.
Below, we'll go over the different options and describe briefly what they do.

Setting Name: Description and Possible Values
adapterSequence: Value of the 3' adapter sequence. Default is "autoDetect" (will try to auto-detect the adapter sequence). Other possible values include "none" (adapter sequence already clipped) and the actual adapter sequence (for example, "AGATCGGAAGAGCACACGTCT"). Note that you can provide a different 3' adapter sequence for each sample by including the adapterSequence field with each sample's information (dataFileName / sampleName). If you do so, don't include the adapterSequence field in the general settings section.
randomBarcodeLength: Length of the random barcodes used in your samples. Default is "0" (no random barcodes).
randomBarcodeLocation: Location of the random barcodes. Default is "-5p -3p". Other possible values include "-5p" and "-3p".
randomBarcodeStats: Whether to compute frequency and enrichment statistics for samples with random barcodes (useful for identifying ligation/amplification biases in some cases). Default is "false" (recommended). Other possible values include "true".
analysisName: Analysis name, used for naming the job-specific folder on Genboree and for naming certain files in your results. The default uses a timestamp to indicate when the job was submitted (this is a good idea!).
genomeVersion: Genome version of your output database / your data. Default is hg19. The other supported genome is mm10.
useLibrary: Whether you are using a spike-in library. Default is "noOligo", which means no spike-in library. The other possible value is "uploadNewLibrary" (you included a FASTA file in your data archive).
suppressRunExceRptEmails: Whether to suppress all runExceRpt emails sent by successfully processed samples. Note that failure emails will be sent regardless. This setting will significantly reduce the number of emails you receive. Default is false. Other possible values include "true".

IMPORTANT NOTES

You must specify an analysisName in your manifest file, as this setting provides valuable information for organizing your submission.
We recommend that you structure your analysisName in the following way:

Make sure that you include "useLibrary": "uploadNewLibrary" if you are providing a spike-in library with your data files.

Make sure that you specify "genomeVersion": "mm10" if your samples use the mm10 reference genome (hg19 is the default).

Make sure that you specify randomBarcodeLength and randomBarcodeLocation if your samples have random barcodes (we recommend not using randomBarcodeStats).


Now, our (completed) manifest file looks like the following:

{
  "studyName": "CSF vs. Serum Parkinson's June 2017",
  "userLogin": "william_thistle",
  "md5CheckSum": "b9355772f35516837a06666f7c56afdd",
  "runMetadataFileName": "testRun.metadata.tsv",
  "submissionMetadataFileName": "testSubmissions.metadata.tsv",
  "studyMetadataFileName": "testStudies.metadata.tsv",
  "experimentMetadataFileName": "testExperiments.metadata.tsv",
  "biosampleMetadataFileName": "testBiosamples.metadata.tsv",
  "donorMetadataFileName": "testDonors.metadata.tsv",
  "manifest":
  [
    {
      "dataFileNameRead1": "test1.R1.fastq.gz",
      "dataFileNameRead2": "test1.R2.fastq.gz",
      "sampleName": "Test 1"
    },
    {
      "dataFileNameRead1": "test2.R1.fastq.gz",
      "dataFileNameRead2": "test2.R2.fastq.gz",
      "sampleName": "Test 2"
    },
    {
      "dataFileNameRead1": "test3.R1.fastq.gz",
      "dataFileNameRead2": "test3.R2.fastq.gz",
      "sampleName": "Test 3"
    }
  ],
  "settings":
  {
    "adapterSequence": "AGATCGGAAGAGCACACGTCT",
    "analysisName": "AMILO1-Serum_vs_Plasma_Controls-2017-06-01"
  }
}

If you remove or add a setting, make sure that your terms are still separated sensibly by commas.
For example, if I removed analysisName above, I would delete the comma after adapterSequence (because adapterSequence is now the final property).
Likewise, if I added another property like genomeVersion after analysisName, I would put a comma after analysisName (but no comma after genomeVersion).
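
If hand-editing commas feels error-prone, settings can also be added or removed programmatically. The Python sketch below parses the manifest, changes the "settings" object, and writes the JSON back out, so the commas are always placed correctly. The function name and its arguments are hypothetical; only the setting names come from the table above.

```python
import json


def update_settings(manifest_text, add=None, remove=()):
    """Return manifest_text with settings added and/or removed."""
    manifest = json.loads(manifest_text)
    settings = manifest.setdefault("settings", {})
    for key in remove:
        settings.pop(key, None)  # ignore settings that are not present
    settings.update(add or {})
    return json.dumps(manifest, indent=2)


example = """
{
  "settings":
  {
    "adapterSequence": "AGATCGGAAGAGCACACGTCT",
    "analysisName": "AMILO1-Serum_vs_Plasma_Controls-2017-06-01"
  }
}
"""
print(update_settings(example, add={"genomeVersion": "mm10"}))
```

The printed output has a comma after analysisName and none after genomeVersion, matching the rule described above.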

You can download a completed example manifest file here

Step 7. Validate and Save Your Manifest File

After you've finished working on your manifest file, you should make sure that the file is formatted correctly by using a JSON validator like JSONLint.
Simply copy-paste your manifest content into the text box and then click "Validate" to see if there are any errors in your manifest file.
If there are any errors, use the error messages provided by the JSON validator to fix your manifest file.
You're now done with creating your manifest file! Save it a final time and you're ready to upload your submission for processing.
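
As an alternative to an online validator, the same check can be done locally with Python's standard json module. The sketch below reports where the JSON is malformed and then looks for the properties that appear in the template. Treating every template property as required is an assumption of this example, and the function name is hypothetical.

```python
import json

# Top-level properties that appear in the template manifest.
TEMPLATE_PROPERTIES = [
    "studyName", "userLogin", "md5CheckSum",
    "runMetadataFileName", "submissionMetadataFileName",
    "studyMetadataFileName", "experimentMetadataFileName",
    "biosampleMetadataFileName", "donorMetadataFileName",
    "manifest", "settings",
]


def validate_manifest_text(text):
    """Return a list of problems; an empty list means no problem was found."""
    try:
        manifest = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON at line {err.lineno}, column {err.colno}: {err.msg}"]
    if not isinstance(manifest, dict):
        return ["the top level of the manifest must be a JSON object"]

    problems = [f"missing property: {key}"
                for key in TEMPLATE_PROPERTIES if key not in manifest]
    settings = manifest.get("settings")
    # analysisName must be specified, per the notes on the settings section.
    if not isinstance(settings, dict) or not settings.get("analysisName"):
        problems.append("settings.analysisName must be specified")
    return problems
```

A stray trailing comma, for example, is reported as an "invalid JSON" problem together with the line and column where the parser gave up.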

Summary

  1. Download template manifest file
  2. Open your manifest file
  3. Compute the MD5 checksum of your data archive (not your manifest file, not your metadata archive)
  4. Fill out the top section of your manifest
  5. Fill out the sample-specific section of your manifest
  6. Fill out the settings section of your manifest
  7. Validate and save your manifest file

Prepare Your longRNAseq Metadata Archive

'Metadata' refers to descriptive information and protocols for the overall study, the experiments performed, and the individual samples that are part of your submission.
This information is supplied by completing one file for each type of metadata and then archiving those files in your metadata archive.
Submitting your metadata is very important for:

Your metadata archive will contain six different files:

We will go step-by-step below to create these files.

Step 1. Open Your Reference Materials (Introduction)

Step 2. Prepare Your Submissions Metadata File

Step 3. Prepare Your longRNAseq Studies Metadata File

Step 4. Prepare Your longRNAseq Runs Metadata File

Step 5. Prepare Your longRNAseq Experiments Metadata File

Step 6. Prepare Your Donors Metadata File

Step 7. Prepare Your Biosamples Metadata File

Step 8. Move All Metadata Files to Same Directory

Step 9. Validate the Metadata Files

Step 10. Create Metadata Archive

Summary

  1. Open your reference materials
  2. Complete each metadata file type in turn (a total of six different metadata file types)
  3. Move all completed metadata files to the same directory
  4. Compress all metadata files into one archive (with _metadata suffix and with same prefix as the data archive you created earlier)
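
Before compressing, it can help to confirm that the directory really contains all six metadata files. The Python sketch below counts the files whose names end in .metadata.tsv, the file name ending given in the saving instructions. The function name is hypothetical, and the assumption that the directory holds exactly one file per metadata type is specific to this example.

```python
from pathlib import Path

# One file each for submissions, studies, runs, experiments, donors, biosamples.
EXPECTED_METADATA_FILES = 6


def list_metadata_files(metadata_dir):
    """Return the sorted .metadata.tsv file names, or raise if the count is off."""
    names = sorted(
        item.name
        for item in Path(metadata_dir).iterdir()
        if item.is_file() and item.name.endswith(".metadata.tsv")
    )
    if len(names) != EXPECTED_METADATA_FILES:
        raise ValueError(
            f"expected {EXPECTED_METADATA_FILES} metadata files, found {len(names)}: {names}"
        )
    return names
```

If the function raises an error, the message lists the files it did find, which makes the missing metadata type easy to spot.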

Prepare Your longRNAseq Runs Metadata File


Prepare Your longRNAseq Studies Metadata File


Prepare your Manifest File

After you have finished preparing your data archive and metadata archive, you have to complete the third and final part of your submission: the manifest file.
The manifest file is the "glue" that links together all of your metadata and data. It also provides some important, additional information required to process your submission.

Your manifest file name will have the same prefix as your other files (data archive, metadata file) and will end in ".manifest.json".
For example, if my data archive was named "samples_data.zip", then my manifest file would be named "samples.manifest.json".
As you work on your manifest file, make sure that you save regularly so you don't lose your progress!

Step 1. Download Template Manifest File

First, you will want to download a template of the manifest file.
You can find that template here.
You will complete your manifest file by filling in values between the quotation marks for each property.

Below, you can see what the template looks like:

{
  "studyName": "",
  "userLogin": "",
  "md5CheckSum": "",
  "runMetadataFileName": "",
  "submissionMetadataFileName": "",
  "studyMetadataFileName": "",
  "experimentMetadataFileName": "",
  "biosampleMetadataFileName": "",
  "donorMetadataFileName": "",
  "manifest":
  [
    {
      "dataFileName": "",
      "sampleName": ""
    }
  ],
  "settings":
  {
    "adapterSequence": "",
    "analysisName": ""
  }
}

Step 2. Open Your Manifest File

Next, you will need to open your manifest file in your favorite text editor.
You can find some recommendations below:

Step 3. Compute the MD5 Checksum of your Data Archive
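
If you have Python installed, one way to compute the checksum is sketched below. The helper name md5_of_file is hypothetical; the sketch uses only Python's standard hashlib module and reads the archive in chunks so that large archives don't need to fit in memory. Remember to run it on your data archive (not your manifest file, not your metadata archive).

```python
import hashlib
import sys


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read 1 MiB at a time until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Example: python md5_archive.py samples_data.zip
    for archive in sys.argv[1:]:
        print(md5_of_file(archive), archive)
```

The 32-character value that is printed is what goes between the quotation marks of the md5CheckSum property.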

Step 4. Fill Out the Top Section of Your Manifest

The top section of your manifest contains information that applies to all samples in your submission.
Below, we'll go through each property and tell you how to fill them all out.

So far, our template should look something like this:

{
  "studyName": "CSF vs. Serum Parkinson's June 2017",
  "userLogin": "william_thistle",
  "md5CheckSum": "b9355772f35516837a06666f7c56afdd",
  "runMetadataFileName": "testRun.metadata.tsv",
  "submissionMetadataFileName": "testSubmissions.metadata.tsv",
  "studyMetadataFileName": "testStudies.metadata.tsv",
  "experimentMetadataFileName": "testExperiments.metadata.tsv",
  "biosampleMetadataFileName": "testBiosamples.metadata.tsv",
  "donorMetadataFileName": "testDonors.metadata.tsv",
  "manifest":
  [
    {
      "dataFileName": "",
      "sampleName": ""
    }
  ],
  "settings":
  {
    "adapterSequence": "",
    "analysisName": ""
  }
}

Step 5. Fill Out the Sample-Specific Section of Your Manifest

Next, we'll tackle the part of the manifest file that deals with your individual samples.
For each sample, you will need to fill out a dataFileName and a sampleName.
Currently, the template only has space to fill out information about one sample.
To add more samples, all you need to do is copy-paste the existing set of dataFileName and sampleName properties.
For example, this is what the (relevant part of the) template currently looks like:

{
  "manifest":
  [
    {
      "dataFileName": "",
      "sampleName": ""
    }
  ]
}

If I had five samples, it would look like this:

{
  "manifest":
  [
    {
      "dataFileName": "",
      "sampleName": ""
    },
    {
      "dataFileName": "",
      "sampleName": ""
    },
    {
      "dataFileName": "",
      "sampleName": ""
    },
    {
      "dataFileName": "",
      "sampleName": ""
    },
    {
      "dataFileName": "",
      "sampleName": ""
    }
  ]
}

IMPORTANT NOTE: I added a comma between each pair of dataFileName / sampleName properties. This is required (or else your file will not be valid JSON).

Next, we'll go over how to fill out the dataFileName and sampleName for each sample.
It might be easiest to first see how this section will look when properly filled out:

{
  "manifest":
  [
    {
      "dataFileName": "test1.fastq.gz",
      "sampleName": "Test 1"
    },
    {
      "dataFileName": "test2.fastq.gz",
      "sampleName": "Test 2"
    },
    {
      "dataFileName": "test3.fastq.gz",
      "sampleName": "Test 3"
    },
    {
      "dataFileName": "test4.fastq.gz",
      "sampleName": "Test 4"
    },
    {
      "dataFileName": "test5.fastq.gz",
      "sampleName": "Test 5"
    }
  ]
}

The dataFileName property refers to a given sample's data file name in the data archive. Next, we'll explain the sampleName property.

Now, our manifest file looks like the following:

{
  "studyName": "CSF vs. Serum Parkinson's June 2017",
  "userLogin": "william_thistle",
  "md5CheckSum": "b9355772f35516837a06666f7c56afdd",
  "runMetadataFileName": "testRun.metadata.tsv",
  "submissionMetadataFileName": "testSubmissions.metadata.tsv",
  "studyMetadataFileName": "testStudies.metadata.tsv",
  "experimentMetadataFileName": "testExperiments.metadata.tsv",
  "biosampleMetadataFileName": "testBiosamples.metadata.tsv",
  "donorMetadataFileName": "testDonors.metadata.tsv",
  "manifest":
  [
    {
      "dataFileName": "test1.fastq.gz",
      "sampleName": "Test 1"
    },
    {
      "dataFileName": "test2.fastq.gz",
      "sampleName": "Test 2"
    },
    {
      "dataFileName": "test3.fastq.gz",
      "sampleName": "Test 3"
    },
    {
      "dataFileName": "test4.fastq.gz",
      "sampleName": "Test 4"
    },
    {
      "dataFileName": "test5.fastq.gz",
      "sampleName": "Test 5"
    }
  ],
  "settings":
  {
    "adapterSequence": "",
    "analysisName": ""
  }
}

Here is a manifest file filler helper that could help you create all of the sampleName and dataFileName pairs in JSON format.
Make sure you are in the smRNAseq tab and remember to remove the final comma "," after the last sampleName, dataFileName pair in the JSON file.
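
If you would rather generate the pairs yourself, here is a minimal Python sketch of the same idea. It is not the helper linked above: the function name build_manifest_entries is hypothetical, and the file and sample names are simply the test values used in this tutorial. Because the standard json module writes the output, the commas between pairs are always placed correctly.

```python
import json


def build_manifest_entries(samples):
    """Turn (data file name, sample name) pairs into manifest entries."""
    return [
        {"dataFileName": data_file, "sampleName": sample_name}
        for data_file, sample_name in samples
    ]


# The five example samples used in this tutorial.
samples = [(f"test{i}.fastq.gz", f"Test {i}") for i in range(1, 6)]
manifest_section = {"manifest": build_manifest_entries(samples)}

# Print the section as JSON, ready to paste into the manifest file.
print(json.dumps(manifest_section, indent=2))
```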

Step 6. Fill Out the Settings Section of Your Manifest

The "settings" section at the bottom of the manifest file provides some ability to customize how your submission is processed.
Below, we'll go over the different options and describe briefly what they do.

Setting Name | Description and Possible Values
adapterSequence | Value of the 3' adapter sequence. Default is "autoDetect" (will try to auto-detect the adapter sequence). Other possible values include "none" (adapter sequence already clipped) and the actual adapter sequence (for example, "AGATCGGAAGAGCACACGTCT"). Note that you can provide a different 3' adapter sequence for each sample by including the adapterSequence field with each sample's information (dataFileName / sampleName). If you do so, don't include the adapterSequence field in the general settings section.
randomBarcodeLength | Length of the random barcodes used in your samples. Default is "0" (no random barcodes).
randomBarcodeLocation | Location of the random barcodes. Default is "-5p -3p". Other possible values include "-5p" and "-3p".
randomBarcodeStats | Sets whether frequency and enrichment statistics are computed for samples with random barcodes (useful for identifying ligation/amplification biases in some cases). Default is "false" (recommended). The other possible value is "true".
analysisName | Analysis name, used for naming the job-specific folder on Genboree and for naming certain files in your results. The default uses a timestamp to indicate when the job was submitted (this is a good idea!).
genomeVersion | Genome version of your output database / your data. Default is hg19. The other supported genome is mm10.
useLibrary | Indicates whether you are using a spike-in library. Default is "noOligo", which means no spike-in library. Other possible values include "uploadNewLibrary" (you included a FASTA file in your data archive).
suppressRunExceRptEmails | Indicates whether you want to suppress all runExceRpt emails sent for successfully processed samples. Note that failure emails will be sent regardless. This setting will significantly reduce the number of emails you receive. Default is "false". The other possible value is "true".

IMPORTANT NOTES

You MUST specify an analysisName in your manifest file, as this setting provides valuable information for organizing your submission.
We recommend that you structure your analysisName in the following way:

Make sure that you include "useLibrary": "uploadNewLibrary" if you are providing a spike-in library with your data files.

Make sure that you specify "genomeVersion": "mm10" if your samples use this alternative reference genome (hg19 is the default).

Make sure that you specify randomBarcodeLength and randomBarcodeLocation if your samples have random barcodes (we recommend not using randomBarcodeStats).
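
To catch typos in the settings section before you upload, you could run a quick check like the Python sketch below. It is not part of the submission pipeline: the function name check_settings is hypothetical, the allowed values are copied from the table above, and two points are assumptions of this sketch (that the true/false settings are written as quoted strings, and that an adapter sequence uses only the letters A, C, G, T and N).

```python
import re

# Allowed values as listed in the settings table above.
# Assumption: the true/false settings are written as JSON strings.
ALLOWED_VALUES = {
    "randomBarcodeLocation": {"-5p -3p", "-5p", "-3p"},
    "randomBarcodeStats": {"true", "false"},
    "genomeVersion": {"hg19", "mm10"},
    "useLibrary": {"noOligo", "uploadNewLibrary"},
    "suppressRunExceRptEmails": {"true", "false"},
}


def check_settings(settings):
    """Return a list of human-readable problems found in a settings dict."""
    problems = []
    if not settings.get("analysisName"):
        problems.append("analysisName is required")
    for name, allowed in ALLOWED_VALUES.items():
        if name in settings and settings[name] not in allowed:
            problems.append(f"{name}: unexpected value {settings[name]!r}")
    adapter = settings.get("adapterSequence")
    # Assumption: a literal adapter is spelled with the letters A, C, G, T, N.
    if adapter is not None and adapter not in ("autoDetect", "none"):
        if not re.fullmatch(r"[ACGTN]+", adapter):
            problems.append(f"adapterSequence: unexpected value {adapter!r}")
    return problems


settings = {
    "adapterSequence": "AGATCGGAAGAGCACACGTCT",
    "analysisName": "AMILO1-Serum_vs_Plasma_Controls-2017-06-01",
}
print(check_settings(settings))  # prints []
```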


Now, our (completed) manifest file looks like the following:

{
  "studyName": "CSF vs. Serum Parkinson's June 2017",
  "userLogin": "william_thistle",
  "md5CheckSum": "b9355772f35516837a06666f7c56afdd",
  "runMetadataFileName": "testRun.metadata.tsv",
  "submissionMetadataFileName": "testSubmissions.metadata.tsv",
  "studyMetadataFileName": "testStudies.metadata.tsv",
  "experimentMetadataFileName": "testExperiments.metadata.tsv",
  "biosampleMetadataFileName": "testBiosamples.metadata.tsv",
  "donorMetadataFileName": "testDonors.metadata.tsv",
  "manifest":
  [
    {
      "dataFileName": "test1.fastq.gz",
      "sampleName": "Test 1"
    },
    {
      "dataFileName": "test2.fastq.gz",
      "sampleName": "Test 2"
    },
    {
      "dataFileName": "test3.fastq.gz",
      "sampleName": "Test 3"
    },
    {
      "dataFileName": "test4.fastq.gz",
      "sampleName": "Test 4"
    },
    {
      "dataFileName": "test5.fastq.gz",
      "sampleName": "Test 5"
    }
  ],
  "settings":
  {
    "adapterSequence": "AGATCGGAAGAGCACACGTCT",
    "analysisName": "AMILO1-Serum_vs_Plasma_Controls-2017-06-01"
  }
}

If you remove or add a setting, make sure that your terms are still separated sensibly by commas.
For example, if I removed analysisName above, I would delete the comma after adapterSequence (because adapterSequence is now the final property).
Likewise, if I added another property like genomeVersion after analysisName, I would put a comma after analysisName (but no comma after genomeVersion).
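
If keeping track of commas by hand feels error-prone, one alternative is to make the change programmatically and let a JSON library place the commas for you. The Python sketch below is only an illustration using the standard json module. It works on a shortened copy of the settings section rather than a real file, and the variable names are hypothetical.

```python
import json

# A shortened manifest containing only the settings section.
manifest_text = """
{
  "settings": {
    "adapterSequence": "AGATCGGAAGAGCACACGTCT",
    "analysisName": "AMILO1-Serum_vs_Plasma_Controls-2017-06-01"
  }
}
"""

manifest = json.loads(manifest_text)
manifest["settings"]["genomeVersion"] = "mm10"  # add a setting
del manifest["settings"]["adapterSequence"]     # remove a setting

# json.dumps writes the separating commas, so none are missing or left over.
updated_text = json.dumps(manifest, indent=2)
print(updated_text)
```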

You can download this example manifest file here.

Step 7. Validate and Save Your Manifest File

After you've finished working on your manifest file, you should make sure that the file is formatted correctly by using a JSON validator like JSONLint.
Simply copy-paste your manifest content into the text box and then click "Validate" to see if there are any errors in your manifest file.
If there are any errors, use the error messages provided by the JSON validator to fix your manifest file.
You're now done with creating your manifest file! Save it a final time and you're ready to upload your submission for processing.
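
If you prefer to check your manifest on your own computer instead of pasting it into a website, the same kind of syntax check can be sketched with Python's standard json module. The function name validate_manifest_text is hypothetical, and this only checks that the text is valid JSON. It does not check that the property names or values are the ones the DCC expects.

```python
import json


def validate_manifest_text(text):
    """Return None if `text` is valid JSON, else a short error description."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# A manifest fragment with a stray comma after the closing bracket.
broken = '{\n  "manifest": [\n    {"dataFileName": "a.fastq.gz", "sampleName": "A"}\n  ],\n}'
print(validate_manifest_text(broken))
```

For the broken example, the message points at the line where the parser gave up, which is usually at or just after the stray or missing comma.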

Summary

  1. Download template manifest file
  2. Open your manifest file
  3. Compute the MD5 checksum of your data archive (not your manifest file, not your metadata archive)
  4. Fill out the top section of your manifest
    1. Make sure each file name is typed exactly as the file is named, including the file extension.
  5. Fill out the sample-specific section of your manifest
  6. Fill out the settings section of your manifest
  7. Validate and save your manifest file

Metadata Submission to the DCC

'Metadata' refers to descriptive information and protocols for the overall study, the experiments performed, and the individual samples that are part of your submission.
This information is supplied by completing one file for each type of metadata and then archiving those files in your metadata archive.
Submitting your metadata is very important for:

Your metadata archive will contain six different files:

We will go step-by-step below to create these files.

Step 1. Open Your Reference Materials (Introduction)

Step 2. Prepare Your Submissions Metadata File

Step 3. Prepare Your Studies Metadata File

Step 4. Prepare Your Runs Metadata File

Step 5. Prepare Your Experiments Metadata File

Step 6. Prepare Your Donors Metadata File

Step 7. Prepare Your Biosamples Metadata File

Step 8. Move All Metadata Files to Same Directory

Step 9. Validate the Metadata Files

Step 10. Create Metadata Archive

Summary

  1. Open your reference materials
  2. Complete each metadata file type in turn (a total of six different metadata file types)
  3. Move all completed metadata files to the same directory
  4. Compress all metadata files into one archive (with _metadata suffix and with same prefix as the data archive you created earlier)
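
The last two summary steps can also be scripted. The Python sketch below is only one possible approach and rests on a few assumptions that are not requirements stated in this tutorial: that a .zip archive is used, that your metadata files end in ".metadata.tsv" (as in the examples in this Wiki), and that the files sit at the top level of the archive. The function name make_metadata_archive is hypothetical.

```python
import zipfile
from pathlib import Path


def make_metadata_archive(metadata_dir, prefix, out_dir="."):
    """Zip every *.metadata.tsv file in metadata_dir into <prefix>_metadata.zip."""
    files = sorted(Path(metadata_dir).glob("*.metadata.tsv"))
    if not files:
        raise FileNotFoundError(f"no *.metadata.tsv files in {metadata_dir}")
    archive_path = Path(out_dir) / f"{prefix}_metadata.zip"
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in files:
            # arcname drops the directory part so files sit at the archive root.
            archive.write(path, arcname=path.name)
    return archive_path


# Example (hypothetical directory): make_metadata_archive("my_metadata", "samples")
# would write samples_metadata.zip, matching a data archive named samples_data.zip.
```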

Prepare Your qPCR Data Archive

qPCR Data Files

Format of qPCR Data Archive

If you need help creating an archive, please visit the Creating an Archive page.

IMPORTANT NOTES

Prepare Your Experiments Metadata File

Here are some specific instructions for filling out an Experiments metadata file:


Prepare Your qPCR Manifest File

After you have finished preparing your data archive and metadata archive, you have to complete the third and final part of your submission: the manifest file.
The manifest file is the "glue" that links together all of your metadata and data. It also provides some important, additional information required to process your submission.

Your manifest file name will have the same prefix as your other files (data archive, metadata file) and will end in "_qPCR.manifest.json".
For example, if my data archive was named "samples_qPCR_data.zip", then my manifest file would be named "samples_qPCR.manifest.json".
As you work on your manifest file, make sure that you save regularly so you don't lose your progress!

Step 1. Download Template Manifest File

First, you will want to download a template of the manifest file.
You can find that template here.
You will complete your manifest file by filling in values between the quotation marks for each property.

Below, you can see what the template looks like:

{
  "studyName": "",
  "userLogin": "",
  "md5CheckSum": "",
  "runMetadataFileName": "",
  "submissionMetadataFileName": "",
  "studyMetadataFileName": "",
  "experimentMetadataFileName": "",
  "biosampleMetadataFileName": "",
  "donorMetadataFileName": "",
  "qPCRTargetsMetadataFileName": "",
  "settings":
  {
    "analysisName": ""
  }
}

Step 2. Open Your Manifest File

Next, you will need to open your manifest file in your favorite text editor.
You can find some recommendations below:

Step 3. Compute the MD5 Checksum of your Data Archive

Step 4. Fill Out the Top Section of Your Manifest

The top section of your manifest contains information that applies to all samples in your submission.
Below, we'll go through each property and tell you how to fill them all out.

So far, our template should look something like this:

{
  "studyName": "CSF vs. Serum Parkinson's June 2017",
  "userLogin": "william_thistle",
  "md5CheckSum": "b9355772f35516837a06666f7c56afdd",
  "runMetadataFileName": "testRun.metadata.tsv",
  "submissionMetadataFileName": "testSubmissions.metadata.tsv",
  "studyMetadataFileName": "testStudies.metadata.tsv",
  "experimentMetadataFileName": "testExperiments.metadata.tsv",
  "biosampleMetadataFileName": "testBiosamples.metadata.tsv",
  "donorMetadataFileName": "testDonors.metadata.tsv",
  "qPCRTargetsMetadataFileName": "testqPCRTargets.metadata.tsv",
  "settings":
  {
    "analysisName": ""
  }
}

Step 5. Fill Out the Settings Section of Your Manifest

The "settings" section at the bottom of the manifest file provides some ability to customize how your submission is processed.
Below, we'll go over the different options and describe briefly what they do.

Setting Name | Description and Possible Values
analysisName | Analysis name, used for naming the job-specific folder on Genboree and for naming certain files in your results. The default uses a timestamp to indicate when the job was submitted (this is a good idea!).
genomeVersion | Genome version of your output database / your data. Default is hg19. Other supported genomes are hg38 and mm10.

IMPORTANT NOTES

You need to specify an analysisName in your manifest file, as this setting provides valuable information for organizing your submission.
We recommend that you structure your analysisName in the following way:

Make sure that you specify "genomeVersion": "mm10" or "genomeVersion": "hg38" if your samples use one of these alternative reference genomes (hg19 is the default).


Now, our (completed) manifest file looks like the following:

{
  "studyName": "CSF vs. Serum Parkinson's June 2017",
  "userLogin": "william_thistle",
  "md5CheckSum": "b9355772f35516837a06666f7c56afdd",
  "runMetadataFileName": "testRun.metadata.tsv",
  "submissionMetadataFileName": "testSubmissions.metadata.tsv",
  "studyMetadataFileName": "testStudies.metadata.tsv",
  "experimentMetadataFileName": "testExperiments.metadata.tsv",
  "biosampleMetadataFileName": "testBiosamples.metadata.tsv",
  "donorMetadataFileName": "testDonors.metadata.tsv",
  "qPCRTargetsMetadataFileName": "testqPCRTargets.metadata.tsv",
  "settings":
  {
    "analysisName": "AMILO1-Serum_vs_Plasma_Controls-2017-06-01"
  }
}

If you remove or add a setting, make sure that your terms are still separated sensibly by commas.
For example, if I added another property like genomeVersion after analysisName, I would put a comma after analysisName (but no comma after genomeVersion).

You can download this example manifest file here.

Step 6. Validate and Save Your Manifest File

After you've finished working on your manifest file, you should make sure that the file is formatted correctly by using a JSON validator like JSONLint.
Simply copy-paste your manifest content into the text box and then click "Validate" to see if there are any errors in your manifest file.
If there are any errors, use the error messages provided by the JSON validator to fix your manifest file.
You're now done with creating your manifest file! Save it a final time and you're ready to upload your submission for processing.
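
A JSON validator only checks syntax, so it can still be worth confirming that you didn't leave any property blank. The Python sketch below is a hypothetical helper (missing_fields is not a DCC tool). The property names are taken from the qPCR template above, and treating all of them as required is an assumption of this sketch. md5CheckSum is left out on purpose because it is only needed if you submit a data archive.

```python
# Property names from the qPCR manifest template in Step 1.
REQUIRED_FIELDS = [
    "studyName",
    "userLogin",
    "runMetadataFileName",
    "submissionMetadataFileName",
    "studyMetadataFileName",
    "experimentMetadataFileName",
    "biosampleMetadataFileName",
    "donorMetadataFileName",
    "qPCRTargetsMetadataFileName",
]


def missing_fields(manifest):
    """Return the names of properties that are absent or left empty."""
    missing = [name for name in REQUIRED_FIELDS if not manifest.get(name)]
    if not manifest.get("settings", {}).get("analysisName"):
        missing.append("settings.analysisName")
    return missing


# A partly filled manifest, as it might look halfway through Step 4.
draft = {
    "studyName": "CSF vs. Serum Parkinson's June 2017",
    "userLogin": "william_thistle",
    "settings": {"analysisName": ""},
}
print(missing_fields(draft))
```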

Summary

  1. Download template manifest file
  2. Open your manifest file
  3. Compute the MD5 checksum of your data archive (not your manifest file, not your metadata archive) if necessary
  4. Fill out the top section of your manifest
  5. Fill out the settings section of your manifest
  6. Validate and save your manifest file

Prepare Your qPCR Metadata Archive

'Metadata' refers to descriptive information and protocols for the overall study, the experiments performed, and the individual samples that are part of your submission.
This information is supplied by completing one file for each type of metadata and then archiving those files in your metadata archive.
Submitting your metadata is very important for:

Your metadata archive will contain seven different files, with one optional file:

We will go step-by-step below to create these files.

Step 1. Open Your Reference Materials (Introduction)

Step 2. Prepare Your Submissions Metadata File

Step 3. Prepare Your Studies Metadata File

Step 4. Prepare Your Experiments Metadata File

Step 5. Prepare Your Donors Metadata File

Step 6. Prepare Your Biosamples Metadata File

Step 7. Prepare Your qPCR Runs Metadata File

Step 8. Prepare Your qPCR Targets Metadata File

Step 9. Move All Metadata Files to Same Directory

Step 10. Validate the Metadata Files

Step 11. Create Metadata Archive

Summary

  1. Open your reference materials
  2. Complete each metadata file type in turn (a total of seven different metadata file types)
  3. Move all completed metadata files to the same directory
  4. Compress all metadata files into one archive (with qPCR_metadata suffix and with same prefix as the data archive you created earlier)

Prepare Your Runs Metadata File


Prepare Your qPCR Targets Metadata File


Prepare Your Runs Metadata File


Prepare Your Studies Metadata File


Prepare Your Submissions Metadata File


Processing Your Files

After you upload your three files (manifest file, metadata archive, data archive) to our FTP server, we will begin processing your files automatically.

Troubleshooting a Failed Submission

Locating Your Finished Submission on the exRNA Atlas


Processing Your longRNAseq Files

After you upload your two or three files (manifest file, metadata archive, and data archive (optional!)) to our FTP server, we will begin processing your files automatically.

Troubleshooting a Failed Submission

Locating Your Finished Submission on the exRNA Atlas


Processing Your qPCR Files

After you upload your two or three files (manifest file, metadata archive, and data archive (optional!)) to our FTP server, we will begin processing your files automatically.

Troubleshooting a Failed Submission

Locating Your Finished Submission on the exRNA Atlas


qPCR Data Submission

Data Format

The qPCR data file should be in tab-separated value format, with the ID_REF value column followed by a number of Sample columns.

ID_REF column: Must contain unique identifiers
SAMPLE columns: Should report non-normalized data, i.e., raw Ct target values.

IMPORTANT NOTE
SAMPLE column header names must match the Sample name column in the Biosample Metadata document.

EXAMPLE:

ID_REF SAMPLE1 SAMPLE2
A01 35 35
A02 29.35 28.19
B01 29.58 28.79
B02 28.04 25.92
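
The format rules above can be checked before you submit. The Python sketch below is a hypothetical helper (check_qpcr_table is not a DCC tool) that reads tab-separated text and reports duplicate ID_REF values and sample columns that don't appear in your biosample metadata. Requiring every value to be numeric is an assumption of this sketch, so adjust it if your raw Ct export marks undetermined wells with text.

```python
import csv
import io


def check_qpcr_table(text, biosample_names):
    """Return a list of problems found in tab-separated qPCR data."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    if not rows:
        return ["file is empty"]
    header, body = rows[0], rows[1:]
    problems = []
    if header[0] != "ID_REF":
        problems.append("first column must be ID_REF")
    for name in header[1:]:
        if name not in biosample_names:
            problems.append(f"sample column {name!r} not in biosample metadata")
    seen = set()
    for row in body:
        if row[0] in seen:
            problems.append(f"duplicate ID_REF {row[0]!r}")
        seen.add(row[0])
        if len(row) != len(header):
            problems.append(f"{row[0]}: expected {len(header)} columns")
        for value in row[1:]:
            try:
                float(value)  # assumption: every Ct value is numeric
            except ValueError:
                problems.append(f"{row[0]}: non-numeric value {value!r}")
    return problems


# The example table from this page, with tabs between columns.
example = (
    "ID_REF\tSAMPLE1\tSAMPLE2\n"
    "A01\t35\t35\n"
    "A02\t29.35\t28.19\n"
    "B01\t29.58\t28.79\n"
    "B02\t28.04\t25.92\n"
)
print(check_qpcr_table(example, {"SAMPLE1", "SAMPLE2"}))  # prints []
```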

Metadata Format

All metadata documents should follow the guidelines provided in this Wiki.


Small exRNA Sequencing Data and Metadata Submission Guidelines

These are the steps involved in submitting your small exRNA-seq data and metadata to the DCC.

0. Creating an FTP Account

1. Prepare Your Data Archive

2. Prepare Your Metadata Archive

Download metadata template or example documents to prepare your own metadata documents

3. Prepare Your Manifest File

4. Upload Submission to the DCC using FTP Server


FTP Server Details

Files Needed for Data Submission

exRNA Metadata Standards

exRNA Metadata Documents


RT-qPCR Data Submission to DCC

Quantitative PCR with reverse transcription (RT-qPCR) is a commonly used assay, in addition to RNA sequencing, for characterizing extracellular RNAs.
This Wiki page includes instructions on how to submit your RT-qPCR data with accompanying metadata to the Data Coordination Center (DCC).

This tutorial will walk you through the entire process of creating an FTP account, formatting and submitting your data and metadata properly,
and then viewing your data in the exRNA Atlas. All submitted samples will be manually curated by the DCC Staff. This is a temporary curation/validation step,
until the FTP Data/Metadata Submission pipeline for qPCR data is made available.

Step 0: Getting an FTP Account on the Genboree FTP Server

Creating Your FTP Account

Files Needed for Data Submission

Your submission will consist of two different files:

IMPORTANT NOTE
Both files must have the same basic file name, other than the data archive file name ending in _data and the metadata archive file name ending in _metadata.
This will be explained in more detail below, but your files will look something like this:

Here, I've chosen the name "qPCR_samples" for my submission. This is just an example - you should give a more descriptive name in your actual submission ("alzheimersDiseaseMay2016-UH2_data.zip", for example).

Step 1: Preparing Your Data Archive

Prepare Your qPCR Data Archive

Step 2: Preparing Your Metadata Archive

Prepare Your Metadata Archive
You can follow the instructions given in the above link to prepare your metadata documents. Ensure that your metadata contains information relevant to the qPCR assay, i.e., all relevant qPCR metadata fields in each collection should be filled out.

Download Metadata Models, Document Templates and Example Metadata Documents

This section provides templates for each document type that will allow you to easily and quickly fill out your TSV files using Microsoft Excel or any simple word processor.
LAST UPDATED: June 22nd, 2016

Schema | Description | Doc Template For Editing in Excel | User Submitted qPCR Metadata Examples | Template in GenboreeKB UI
Submissions TABBED Model | Information about PI / submitter associated with submission. | Submission Template | EXR-JSAUG1UH2001-SU | Submission KB Template
Studies TABBED Model | A study groups together experiments or analyses for public data release purposes. | Study Template | EXR-JSAUG1UH2001-ST | Study KB Template
Runs TABBED Model | A run contains sequencing reads submitted in data files. | Run Template | EXR-JSAUG1UH2001-RU | qPCR Run KB Template
Experiments TABBED Model | An experiment contains instrument and library preparation information and groups together one or more runs. | Experiment Template | EXR-JSAUG1UH2001-EX | qPCR Assay KB Template
Donors TABBED Model | Information about each individual donor who contributed biosamples. | Donor Template | Donor Multi-tabbed | Donor KB Template
Biosamples TABBED Model | Detailed information about the sequenced sample, biofluid source, etc. Samples can be used in any number of experiments. | Biosample Template ; Multi-tabbed Format | Biosamples Multi-tabbed Format | Biosample Biofluid KB Template ; Biosample Cell Culture Supernatant KB Template
Analyses TABBED Model | An analysis contains secondary analysis results. | Analysis Template | EXR-JSAUG1UH2001-AN | qPCR Analysis KB Template
qPCR Targets TABBED Model | The qPCR Targets document contains the list of all targets and the corresponding Cq values for each biosample. | qPCR Targets Template | qPCR Targets Multi-tabbed | qPCR Targets KB Template

Step 3: Uploading Your Submission to the FTP Server for Validation

Upload Submission to the DCC


Step 4: Viewing Your Results

Viewing Your qPCR Data in the exRNA Atlas


Miscellaneous Tips and Tricks

Below, you'll find some useful tips and tricks for creating your submission.

Creating an Archive

Creating an Archive

Learning How to Use the Terminal

If you need help navigating the terminal (and want to learn some basic Linux/OSX commands), the following links will be useful:


Overview

Introduction

The exRNA Atlas contains a number of different analysis tools for analyzing Atlas RNA-seq data:

Below, we will demonstrate how to use these tools on Atlas data and see your analysis results in the Atlas.

Overview of Analysis Tools

Before we begin describing how to use the analysis tools, we'll go over what each tool does in more detail.
Currently, all analysis tools work solely with RNA-seq profiles.

XDec | DESeq2 | Dimensionality Reduction Plotting Tool | Generate Summary Report

Viewing Public Analysis Results

Before running your own analyses, you may be interested in viewing the Atlas' public analysis results.

To view the Atlas' public analysis results, you can click the Analysis Results button in the Atlas navigation bar and then click the Public Analysis Results button.
You will then be taken to a page where you can click between different tabs, each corresponding to a different tool.

When you click a given tab, you will see the public analysis results associated with that tool:

You can see an example of the public analysis results page below:

To better understand the output for a given tool, please see the "Understanding Your DESeq2 Results", "Understanding Your Dimensionality Reduction Plotting Tool Results", and "Understanding Your Generate Summary Report Results" sections below.

Running Your Own Analyses

Step 1: Selecting Your Samples of Interest

The first step to running an analysis is selecting your samples of interest.
We recommend using the faceted charts or selecting a dataset from the Datasets page to select your samples (all tools may not be available for other types of grids).

Below, you can see an example of how one would select samples via the faceted charts:

And here is an example of how one would select a set of samples via the Datasets page:

After you have generated your grid, you will need to select the specific samples you want to analyze.

Below, you can see an example where I've selected 4 samples in my samples grid:

Step 2: Selecting and Running an Analysis Tool

After you've selected your samples, you'll need to pick out a tool to run on those samples.
You can click the "Analyze Selected Samples" button to see available tools.

After choosing a tool, you will be prompted to log into your Genboree account (unless you are already logged in).

After you've logged in, you'll be prompted to provide settings for your analysis run.
  1. First, you'll need to select a Group and Database in which to store your output files.
    Each Genboree account starts with a Group (named after your username), and we will offer to create a Database for you (named "Exrna-atlas Output") if you don't have one.
  2. Next, you'll need to provide an Analysis Name for your analysis run - this name will be used to organize your analysis results, so picking an informative name is a good idea!
  3. Finally, some tools will require additional settings - for example, DESeq2 will require you to put in a factor name and two factor levels of interest.

When you're ready to submit your analysis, click the Submit Analysis button.
After a moment, you will be provided an analysis job ID. You will receive an email when your analysis run is complete.

Step 3: Viewing Your Analysis Results

To view your analysis results, you can click the Analysis Results button in the Atlas navigation bar and then click the My Analysis Results button.
You will then be taken to a page where you can click between different tabs, each corresponding to a different tool.

When you click a given tab, you will see any analysis results associated with that tool:

You can see an example of an analysis results page below:

To better understand the output for a given tool, please see the "Understanding Your DESeq2 Results" and "Understanding Your Generate Summary Report Results" sections below.

Understanding Your Results

Understanding Your XDec Results

Output from XDec includes:

To learn more about XDec and how to interpret your results, read the Cell paper "ExRNA Atlas Analysis Reveals Distinct Extracellular RNA Cargo Types and Their Carriers Present Across Human Biofluids" (Murillo et al., 2019).

Understanding Your DESeq2 Results

When you click to view your DESeq2 results, a new page will open up containing differentially expressed miRNAs for the selected Atlas data.
Each row corresponds to a given miRNA, and each column is explained below:

[1] Love, M. I., Anders, S., Kim, V., & Huber, W. (2017, Aug 9). RNA-seq workflow: gene-level exploratory analysis and differential expression.
Retrieved from http://www.bioconductor.org/help/workflows/rnaseqGene/

By default, the table is sorted by adjusted p-value, but you can sort by any of the columns.
In addition, you can perform downstream analysis on selected miRNAs of interest by clicking the Analyze Selected miRNAs button (highlighted in red below) above the table.

See descriptions of all available downstream analysis tools below.

Pathway Finder

You can see what the Pathway Finder interface looks like below:

Understanding Your Dimensionality Reduction Plotting Tool Results

When you click to view your Dimensionality Reduction Plotting Tool results, a new page will open up containing an interface for visualizing the expression of different ncRNAs in the selected Atlas data.
On the left side of the screen, you will see the Control Panel and Filtering Panel that allow you to configure your visualization.

Within the Control Panel, you will see the following settings:

Within the Filtering Panel, you will see the following settings:

After you've selected your settings, you can click the Make New Plot button on the right side of the screen to generate a new visualization based on your current Control Panel and Filtering Panel settings.
You can then download a PDF of your current visualization by clicking the Download Plot button.

Understanding Your Generate Summary Report Results

When you click to view your Generate Summary Report results, you will download an archive containing a variety of summary files describing the selected Atlas data.
Descriptions of the summary files can be found below:

File Name Description of File
QC Data
[analysisName]_exceRpt_DiagnosticPlots.pdf All diagnostic plots automatically generated by the tool
[analysisName]_exceRpt_readMappingSummary.txt Read-alignment summary including total counts for each library
[analysisName]_exceRpt_ReadLengths.txt Read-lengths (after 3' adapters/barcodes are removed)
[analysisName]_exceRpt_QCresults.txt QC statistics for all samples
Raw Transcriptome Quantifications
[analysisName]_exceRpt_miRNA_ReadCounts.txt miRNA read-counts quantifications
[analysisName]_exceRpt_tRNA_ReadCounts.txt tRNA read-counts quantifications
[analysisName]_exceRpt_piRNA_ReadCounts.txt piRNA read-counts quantifications
[analysisName]_exceRpt_gencode_ReadCounts.txt gencode read-counts quantifications
[analysisName]_exceRpt_circularRNA_ReadCounts.txt circularRNA read-count quantifications
[analysisName]_exceRpt_biotypeCounts.txt biotype read-count quantifications
[analysisName]_exceRpt_exogenous_miRNA_ReadCounts.txt exogenous miRNA read-counts quantifications
Normalized Transcriptome Quantifications
[analysisName]_exceRpt_miRNA_ReadsPerMillion.txt miRNA RPM quantifications
[analysisName]_exceRpt_tRNA_ReadsPerMillion.txt tRNA RPM quantifications
[analysisName]_exceRpt_piRNA_ReadsPerMillion.txt piRNA RPM quantifications
[analysisName]_exceRpt_gencode_ReadsPerMillion.txt gencode RPM quantifications
[analysisName]_exceRpt_circularRNA_ReadsPerMillion.txt circularRNA RPM quantifications
[analysisName]_exceRpt_exogenous_miRNA_ReadsPerMillion.txt exogenous miRNA RPM quantifications
Exogenous Genomic Taxonomies
[analysisName]_exceRpt_exogenousGenomes_taxonomyCumulative_ReadCounts.txt cumulative taxonomy read-count quantifications
[analysisName]_exceRpt_exogenousGenomes_taxonomyCumulative_ReadsPerMillion.txt cumulative taxonomy RPM quantifications
[analysisName]_exceRpt_exogenousGenomes_taxonomySpecific_ReadCounts.txt specific taxonomy read-count quantifications
[analysisName]_exceRpt_exogenousGenomes_taxonomySpecific_ReadsPerMillion.txt specific taxonomy RPM quantifications
[analysisName]_exceRpt_exogenousGenomes_TaxonomyTrees_aggregateSamples.pdf visualized taxonomy tree for samples, aggregated
[analysisName]_exceRpt_exogenousGenomes_TaxonomyTrees_perSample.pdf visualized taxonomy trees for each sample
Exogenous rRNA Taxonomies
[analysisName]_exceRpt_exogenousRibosomal_taxonomyCumulative_ReadCounts.txt cumulative taxonomy read-count quantifications
[analysisName]_exceRpt_exogenousRibosomal_taxonomyCumulative_ReadsPerMillion.txt cumulative taxonomy RPM quantifications
[analysisName]_exceRpt_exogenousRibosomal_taxonomySpecific_ReadCounts.txt specific taxonomy read-count quantifications
[analysisName]_exceRpt_exogenousRibosomal_taxonomySpecific_ReadsPerMillion.txt specific taxonomy RPM quantifications
[analysisName]_exceRpt_exogenousRibosomal_TaxonomyTrees_aggregateSamples.pdf visualized taxonomy tree for samples, aggregated
[analysisName]_exceRpt_exogenousRibosomal_TaxonomyTrees_perSample.pdf visualized taxonomy trees for each sample
R Objects
[analysisName]_exceRpt_smallRNAQuants_ReadCounts.RData All raw data (binary R object)
[analysisName]_exceRpt_smallRNAQuants_ReadsPerMillion.RData All normalized data (binary R object)
Other
[analysisName]_exceRpt_sampleGroupDefinitions.txt Information about sample groups (not used by Atlas)
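
The normalized files above report reads per million (RPM). To give a feel for what that normalization means, here is a minimal sketch that scales a set of raw read counts so that they sum to one million. The function name is hypothetical, and using the sum of the supplied counts as the denominator is an assumption; the pipeline that produced the Atlas files may normalize against a different total.

```python
def reads_per_million(counts):
    """Scale raw read counts so that they sum to one million.

    Assumption: the denominator is the sum of the counts passed in.
    The pipeline behind the Atlas files may use a different total.
    """
    total = sum(counts.values())
    if total == 0:
        # Avoid dividing by zero for an empty library.
        return {name: 0.0 for name in counts}
    return {name: count * 1_000_000 / total for name, count in counts.items()}


# Made-up counts for three mature miRNAs, for illustration only.
raw_counts = {"hsa-miR-320a": 10, "hsa-miR-100-5p": 30, "hsa-let-7b-3p": 60}
rpm = reads_per_million(raw_counts)
```

Because every value is divided by the same total, RPM values from libraries of different sizes can be compared more directly than raw counts.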

Below, you can see some example plots from the Diagnostic Plots PDF referenced above.


Saving metadata documents

Microsoft Excel in Windows

Select "Save As" from the menubar.
Navigate to the folder where you would like to save your metadata document.
Provide a file name for your document. Remember, file names end with .metadata.tsv.
Select the option "Text (Tab delimited)" from the pull down menu for "Save as type" and press OK.

Microsoft Excel in Mac

To save your metadata documents as a properly formatted tab-separated value file, click "Save" and
select the option to save as "Windows Formatted Text".
This option saves the file as a tab-separated value file without any special characters.

LibreOffice Calc

Select "Save As", choose "All Formats", and then choose "Text CSV (.csv)".
You will see a dialog box titled "Export Text File".
Select {Tab} from the pull down menu for "Field delimiter" and select OK.

Your document will be saved as a tab-delimited text file.

Sanity Check the TSV file

To ensure there are no special characters in your metadata document after saving it with one of the
methods above, open the document in any text editor.

Check that the document is properly formatted: columns should be separated by a tab character, and
the document should not contain stray characters such as ^M (carriage returns).
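
If you would rather not inspect the file by eye, the same check can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and treating every non-empty line without a tab as suspicious is an assumption that may be too strict for some metadata documents.

```python
def check_tsv(raw):
    """Return a list of problems found in the raw bytes of a TSV file."""
    problems = []
    # ^M in a text editor is a carriage return (byte 0x0D).
    if b"\r" in raw:
        problems.append("carriage return (^M) characters found")
    lines = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n").split(b"\n")
    for number, line in enumerate(lines, start=1):
        # Assumption: every non-empty line should hold at least two columns.
        if line and b"\t" not in line:
            problems.append(f"line {number} contains no tab character")
    return problems


# In-memory examples; for a real file use open(path, "rb").read().
clean = b"#property\tvalue\n- Name\tserum\n"
broken = b"#property\tvalue\r\n- Name serum\r\n"
```

An empty list means neither problem was detected. Reading the file in binary mode matters here, because text mode would silently convert the line endings you are trying to detect.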


Troubleshooting


Understanding the Nested Tabbed Format

The Symbol -

#property value
-- Biological Fluid
--- Biofluid Name serum
--- Collection Details
---- Sample Collection Method venipuncture

The Symbol *

#property value
* Authors
*- Author Name NAME1
*- Author Name NAME2
*- Author Name NAME3
*- Author Name NAME4
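
The two examples above suggest that the number of leading - characters gives the nesting depth of a property and that a leading * marks a property holding a list of items. Under that reading (my interpretation of the examples, not a format specification), a single line could be taken apart as in the sketch below. The function name is hypothetical, and the property and value columns are assumed to be separated by a tab.

```python
def parse_property_line(line):
    """Split one line into (is_item_list, depth, name, value).

    Assumptions: columns are tab-separated, the count of '-' characters
    is the nesting depth, and a leading '*' marks a list of items.
    """
    prop, _, value = line.partition("\t")
    # Everything before the property name made of '*' and '-' is the prefix.
    prefix_len = len(prop) - len(prop.lstrip("*-"))
    prefix = prop[:prefix_len]
    name = prop[prefix_len:].strip()
    return ("*" in prefix, prefix.count("-"), name, value.strip())


example = parse_property_line("--- Biofluid Name\tserum")
```

The header line that begins with # is not a property and would need to be skipped before calling the function.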

Upload longRNAseq Submission to the DCC using FTP Server

Below, we give two different ways of uploading your files:

Please contact us at brl-exrna@bcm.edu if your data archive is larger than 100 GB.

Uploading Submission via the LFTP Command Line Client (Linux / Unix / Mac)

Step 1. Setup

Step 2. Uploading Your Files

Example

Imagine that I had the following set of three files:

Furthermore, all three files are stored at the following location on my local computer: /home/myHome/myDataDir/longRNASeqData.
I would perform the following commands to upload all three files to the FTP server (replacing PICODE with whatever my PI code is):

cd /home/myHome/myDataDir/longRNASeqData
lftp ftps://ftps.genboree.org -u username
# enter password
cd exrna-PICODE/
cd inbox/
put test_longRNAseq.manifest.json test_longRNAseq_metadata.zip test_longRNAseq_data.zip
ls
exit

Please note that any lines that begin with # are comments and are not actual commands that you should type!
For example, you shouldn't actually type "# enter password" - that's just me informing you that
you'll need to enter your password after the "lftp ftps://ftps.genboree.org -u <user name>" command.

Uploading Submission via the FileZilla FTP Client

Step 1. Setup

Step 2. Uploading Your Files

Resuming File Upload (If Upload Fails)

If your transfer fails before it completes, you will need to resume it from the point where it failed.

(If the file transfer completes after being resumed and the MD5 of the uploaded file does not match the one you provided, please remove the file and start from step 2 again.)
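
To know which MD5 value to compare against, you can compute the checksum of your local file before uploading. The sketch below uses Python's standard hashlib module and reads the file in chunks so that large archives do not need to fit in memory; the function name and chunk size are just illustrative choices.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, md5_of_file("test_longRNAseq_data.zip") would return a 32-character hexadecimal string for the data archive named in the example above.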

Send an email to notify us

Sending the data via a hard drive

Please coordinate with us at brl-exrna@bcm.edu and provide the following information prior to sending the hard drive:

Copy the data archive, metadata, and manifest onto the external hard drive.

Notify us that you are sending the hard drive by emailing us at brl-exrna@bcm.edu with the tracking number and the return information.


Upload qPCR Submission to the DCC using FTP Server

Below, we give two different ways of uploading your files:

Uploading Submission via the FileZilla FTP Client

Step 1. Setup

Step 2. Uploading Your Files

Resuming File Upload (If Upload Fails)

If your transfer fails before it completes, you can easily resume it from the point where it failed.

Uploading Submission via the FTP Command Line Client (Linux / Unix / Mac)

Step 1. Setup

Step 2. Uploading Your Files

Example

Imagine that I had the following set of three files:

Furthermore, all three files are stored at the following location on my local computer: /home/myHome/myDataDir/qPCRData.
I would perform the following commands to upload all three files to the FTP server (replacing PICODE with whatever my PI code is):

cd /home/myHome/myDataDir/qPCRData
ftp ftp.genboree.org
# enter login name and password
bin
cd exrna-PICODE/
cd inbox/
prompt
mput test_qPCR.manifest.json test_qPCR_metadata.zip test_qPCR_data.zip
dir
bye

Please note that any lines that begin with # are comments and are not actual commands that you should type!
For example, you shouldn't actually type "# enter login name and password" - that's just me informing you that
you'll need to enter your login name and password after the "ftp ftp.genboree.org" command.

Resuming File Uploads (If Upload Fails)

If your upload fails and you want to resume it, you will need to reconnect to the FTP server and navigate back to your
upload directory (remember to type "bin" and "prompt" just like before!).

ftp ftp.genboree.org
# enter login name and password
bin
cd exrna-PICODE/
cd inbox/
prompt
dir
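# to restart uploading a partially transferred file, give restart the number of bytes already on the server (1000 in this example)
restart 1000
# put will ask for the local file name and then the remote file name
put
FILENAME
FILENAME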
dir
bye
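
The same resume logic can be sketched with Python's standard ftplib module. This is an illustration rather than a supported upload method: the function name is hypothetical, and it assumes the server reports the size of the partial file and accepts restart offsets.

```python
import os
from ftplib import error_perm


def resume_upload(ftp, local_path, remote_name):
    """Upload the part of local_path that the server does not have yet.

    ftp is a connected and logged-in ftplib.FTP object.
    Returns the number of bytes sent in this call.
    """
    ftp.voidcmd("TYPE I")  # binary mode, like typing "bin"
    try:
        offset = ftp.size(remote_name) or 0
    except error_perm:
        offset = 0  # assume the file is not on the server yet
    local_size = os.path.getsize(local_path)
    if offset >= local_size:
        return 0  # nothing left to send
    with open(local_path, "rb") as handle:
        handle.seek(offset)  # skip the bytes already uploaded
        # rest=offset plays the role of "restart 1000" in the session above.
        ftp.storbinary(f"STOR {remote_name}", handle, rest=offset)
    return local_size - offset
```

You would create the connection with ftplib.FTP("ftp.genboree.org"), log in, and change into your exrna-PICODE/inbox directory before calling the function.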

Send an email to notify us


Below, we give two different ways of uploading your files:

Please contact us at brl-exrna@bcm.edu if your data archive is larger than 100 GB.

Uploading Submission via the LFTP Command Line Client (Linux / Unix / Mac)

Step 1. Setup

Step 2. Uploading Your Files

Example

Imagine that I had the following set of three files:

Furthermore, all 3 files are stored at the following location on my local computer: /home/myHome/myDataDir/smallRNASeqData.
I would perform the following commands to upload all three files to the FTP server (replacing PICODE with whatever my PI code is):

cd /home/myHome/myDataDir/smallRNASeqData
lftp ftps://ftps.genboree.org -u username
# enter password
cd exrna-PICODE/
cd inbox/
put test.manifest.json test_metadata.zip test_data.zip
ls
exit

Please note that any lines that begin with # are comments and are not actual commands that you should type!
For example, you shouldn't actually type "# enter password" - that's just me informing you that
you'll need to enter your password after the "lftp ftps://ftps.genboree.org -u <user name>" command.

Uploading Submission via the FileZilla FTP Client

Step 1. Setup

Step 2. Uploading Your Files

Resuming File Upload (If Upload Fails)

If your transfer fails before it completes, you can easily resume it from the point where it failed.

(If the file transfer completes after being resumed and the MD5 of the uploaded file does not match the one you provided, please remove the file and start from step 2 again.)

Send an email to notify us

Sending the data via a hard drive

Please coordinate with us at brl-exrna@bcm.edu and provide the following information prior to sending the hard drive:

Copy the data archive, metadata, and manifest onto the external hard drive.

Notify us that you are sending the hard drive by emailing us at brl-exrna@bcm.edu with the tracking number and the return information.


Introduction to the ncRNA Search Bar

The ncRNA search bar is designed to let you drill down into the Atlas data at the level of individual ncRNAs.
For example, imagine I was very interested in the mature miRNAs hsa-miR-320a and hsa-miR-100-5p.
It would be nice if I could learn more about those mature miRNAs in the context of the Atlas.
Below, we'll learn exactly how to do that.

You can find the ncRNA search bar near the top of the Atlas home page. There are many ways to reach it:

Below, you can see a picture of the ncRNA search bar (boxed in red):

Currently, the ncRNA search bar supports mature miRNAs, tRNAs, and piRNAs.
We recommend the following steps when learning how to use the search bar:
  1. Click the options icon directly to the right of the text box.
  2. Once you've selected your type of ncRNA, you can type or paste your identifiers of interest into the text box.
  3. Once you've written your identifiers of interest, you can click the magnifying glass (or hit enter) to perform your search.

Below, we can see that I've typed three mature miRNA IDs into the search bar:

Two of these mature miRNA IDs are valid (hsa-let-7b-3p and hsa-miR-101-5p), while one is invalid (test).

When we click search, we'll see a page like this:

You can see that the page presents some useful information that will help us format our search correctly.
You can use this information to fix your incorrect identifiers, or, if preferred, just directly submit a search with your correct identifiers.
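
If you have a long list of identifiers, it can help to screen them before pasting them into the search bar. The sketch below separates identifiers that look like mature miRNA names from those that do not. The regular expression is a simplified approximation of common mature miRNA naming and is not the validation the Atlas itself performs; the function name and the assumption that identifiers are separated by whitespace or commas are also mine.

```python
import re

# Simplified pattern: species prefix, miR or let, a number, an optional
# letter, an optional copy number, and an optional -5p / -3p arm suffix.
MATURE_MIRNA = re.compile(r"^[a-z]{3,4}-(?:miR|let)-\d+[a-z]?(?:-\d+)?(?:-[35]p)?$")


def split_identifiers(text):
    """Return (plausible, suspicious) lists of identifiers found in text."""
    identifiers = text.replace(",", " ").split()
    plausible = [i for i in identifiers if MATURE_MIRNA.match(i)]
    suspicious = [i for i in identifiers if not MATURE_MIRNA.match(i)]
    return plausible, suspicious


plausible, suspicious = split_identifiers("hsa-let-7b-3p hsa-miR-101-5p test")
```

With the three identifiers from the example above, the first two land in the plausible list and test lands in the suspicious list. A name that passes this pattern can still be unknown to the Atlas, so the results page remains the final word.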

Tools in the ncRNA Search Bar

Once you've submitted a properly formatted request, a results page will be displayed.
We will break down the results pages for the different databases below.

Atlas Census

Introduction

When you perform a search using the Atlas Census database, your results will consist of a table that summarizes the frequency of your selected ncRNAs in the exRNA Atlas data.

The parameters listed below will normally be displayed above the table. However, if your browser window isn't large enough to fit the parameters,
a hamburger menu will be made available in the upper right corner. Simply click the hamburger icon to reveal the different parameters.

Parameters for Adjusting Stringency for Detection

There are two parameters for adjusting stringency for detection of your ncRNAs:

Parameters for Adjusting Sample Subsets

You can also pick different subsets of the Atlas data for your table by using the Sample Type option.

Downstream Analysis (for Mature miRNAs)

Finally, if you searched for mature miRNAs (as opposed to tRNAs or piRNAs), you can perform downstream analysis on those mature miRNAs.
First, select your miRNAs of interest (via the checkboxes on the left side of the table).
You can then click the Analyze Selected miRNAs button above the table to see the different downstream analysis tools.


Viewing All Biosamples in Biosample Partition Grid

As an alternative to the facet search, you can also view all biosamples in one of our biosample partition grids.
We have two different biosample partition grids available: Biofluid vs Condition and Biofluid vs Assay Type.
You can access these grids in two different ways:

First, you can click Select Profiles in the navigation bar and then click Biofluid vs Condition Grid or Biofluid vs Assay Type Grid.

Second, you can use the links on the front page in the Browse exRNA Profiles - Alternative Options panel:

For example, see the Biofluid vs. Condition grid below:

Each cell in this grid indicates the total number of biosamples collected and profiled for exRNAs from a biofluid-condition combination.
If you click the number in a given cell, you will be able to see key metadata about all the biosamples that meet the biofluid-condition criteria given for that cell.
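
To make the idea of a partition grid concrete, here is a minimal sketch of how the cell counts could be derived from a list of biosample records. The field names biofluid and condition and the sample records are hypothetical; they are not the Atlas' actual metadata schema.

```python
from collections import Counter


def partition_grid(samples, row_key="biofluid", column_key="condition"):
    """Count biosamples for every (row, column) combination."""
    return Counter((s[row_key], s[column_key]) for s in samples)


# Made-up records, for illustration only.
samples = [
    {"name": "S1", "biofluid": "Plasma", "condition": "Healthy"},
    {"name": "S2", "biofluid": "Plasma", "condition": "Healthy"},
    {"name": "S3", "biofluid": "Serum", "condition": "Alzheimer's"},
]
grid = partition_grid(samples)
```

Each key of the result corresponds to one cell of the grid, and its value is the number you would click on. Passing a different column_key (for example, an assay type field) would give the counts for the other grid.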

The Biofluid vs. Assay Type grid is very similar except its columns are assay types instead of conditions.

Once you click the number in a given cell, a new grid will be displayed that contains information about associated samples.

In the first picture above (which displays the first half of the grid), we see each biosample's name as well as some key metadata properties
of each biosample (Condition, Anatomical Location, Biofluid Name, and exRNA Source).

In the second picture above (which displays the second half of the grid), we see the following information and links:

ERCC Quality Standards?

Download Data

Download Advanced Results

Download Metadata

Actions

There are three additional buttons at the top of the grid.
First, the Back to Home Page button will take you back to the exRNA Atlas landing page.
Second, the Download All Samples button will allow you to download result files in bulk.
To learn more about this option, view this tutorial.
Third, the Analyze Selected Samples button will allow you to take samples from the exRNA Atlas and feed them into downstream and comparative analysis tools
present in the Genboree Workbench. To learn more about this option, view this tutorial.


Viewing Atlas Statistics

You can find various Atlas statistics in the Atlas Statistics panel near the bottom of the Atlas homepage. You can reach the Atlas homepage by:

On the left side of the panel, you can see various bar charts that describe the data in the Atlas.

On the right side of the panel, you can see a breakdown of how much data has been deposited into the Atlas over various time frames.


Viewing Biosamples in Biosample Partition Grid

As an alternative to the facet search, you can also view all biosamples in one of our biosample partition grids.
We have two different biosample partition grids available: Biofluid vs Condition and Biofluid vs Assay Type.
You can access these grids in two different ways:

First, you can click Select Profiles in the navigation bar and then click Biofluid vs Condition Grid or Biofluid vs Assay Type Grid.

Second, you can use the links on the front page in the Browse exRNA Profiles - Alternative Options panel:

For example, see the Biofluid vs. Condition grid below:

Each cell in this grid indicates the total number of biosamples collected and profiled for exRNAs from a biofluid-condition combination.
If you click the number in a given cell, you will be able to see key metadata about all the biosamples that meet the biofluid-condition criteria given for that cell.

The Biofluid vs. Assay Type grid is very similar except its columns are assay types instead of conditions.

Once you click the number in a given cell, a new grid will be displayed that contains information about associated samples.

In the first picture above (which displays the first half of the grid), we see each biosample's name as well as some key metadata properties
of each biosample (Condition, Anatomical Location, Biofluid Name, and exRNA Source).

In the second picture above (which displays the second half of the grid), we see the following information and links:

ERCC Quality Standards?

Download Data

Download Advanced Results

Download Metadata

RNA Profile

External References

There are three additional buttons at the top of the grid.
First, the Back to Home Page button will take you back to the exRNA Atlas landing page.
Second, the Download All Samples button will allow you to download result files in bulk.
To learn more about this option, view this tutorial.
Third, the Analyze Selected Samples button will allow you to take samples from the exRNA Atlas and feed them into downstream and comparative analysis tools.
To learn more about this option, view this tutorial.


Viewing exRNA Profiling Datasets

All profiles that are submitted to the exRNA Atlas are part of a dataset.
Each dataset is associated with a given study that focuses on some topic (detection of biomarkers associated with gastric cancer, for example).
There are two different ways of viewing datasets on the exRNA Atlas.

Dataset Submissions Table

First, on the Atlas home page, you can find the Dataset Submissions table.
This table provides a summary-level description for each dataset submission to the Atlas.

The table, by default, is organized by PI (last) name, but you can sort (ascending or descending) by most of the columns.
Clicking the analysis ID for a given dataset in the Study Title column will take you to its card on the stand-alone Datasets page (described below).
Clicking the green check mark for a given dataset in the Published? column will open the publication associated with that dataset.
Clicking the name of an external database (dbGaP, GEO, SRA) for a given dataset in the Other Databases column will open the associated page for that dataset in the external database.
You can click Load More to load an additional 5 datasets, or click Load All to load all datasets at once.
If you want the table to return to default, you can then click the Return to Default button (only available once you've loaded additional datasets).

Datasets Page

If you want to view datasets in more detail, you can visit the stand-alone Datasets page.
You can reach this page in three different ways:
  1. Click the Datasets button in the navigation bar at the top of any Atlas page
  2. Click the exRNA Profiling Datasets link in the Browse exRNA Profiles - Alternative Options panel near the bottom of the Atlas home page
  3. Click the analysis ID associated with a given dataset in the Dataset Submissions table

Each card in the layout above contains information about a dataset in the exRNA Atlas:

Note that not all options will be available for each card.

RNA Profile Grid

By clicking the Analysis ID associated with a given dataset, you can pull up a grid that contains read counts for that dataset.
The grid will also contain various downloads for each sample in the dataset.

In the first picture above, we see the read counts associated with different exceRpt mapping stages for each sample.

In the second picture above, we see the following information and links:

Download Data

Download Advanced Results

Download Metadata

RNA Profile

External References

There are three additional buttons at the top of the grid.
First, the Back to Home Page button will take you back to the exRNA Atlas landing page.
Second, the Download All Samples button will allow you to download result files in bulk.
To learn more about this option, view this tutorial.
Third, the Analyze Selected Samples button will allow you to take samples from the exRNA Atlas and feed them into downstream and comparative analysis tools.
To learn more about this option, view this tutorial.

Sample Metadata Grid

By clicking the Samples badge associated with a given dataset, you can pull up a grid that contains sample metadata for that dataset.

In the first picture above (which displays the first half of the grid), we see each biosample's name as well as some key metadata properties
of each biosample (Condition, Anatomical Location, Biofluid Name, and exRNA Source).

In the second picture above (which displays the second half of the grid), we see the following information and links:

ERCC Quality Standards?

Download Data

Download Advanced Results

Download Metadata

RNA Profile

External References

There are three additional buttons at the top of the grid.
First, the Back to Home Page button will take you back to the exRNA Atlas landing page.
Second, the Download All Samples button will allow you to download result files in bulk.
To learn more about this option, view this tutorial.
Third, the Analyze Selected Samples button will allow you to take samples from the exRNA Atlas and feed them into downstream and comparative analysis tools.
To learn more about this option, view this tutorial.


Viewing Selected Biosamples in Grid via Faceted Search

It is easy to search for specific types of biosamples via our chart search. There are three different categories which you can use for your search:

You can select exRNA profiles by clicking the slices or names of facets in the charts above.

For example, if I wanted to search for biosamples that were either plasma or serum and were tagged as Alzheimer's disease, I would click the "Plasma", "Serum", and "Alzheimer's" facets.
Then, in order to complete the search, I would click the Search icon in the floating menubar.

This search will create a grid that looks like the following:

This search summary results grid will display key metadata about the relevant biosamples. Tips and tricks:

Viewing Selected Biosamples in Grid via Faceted Charts

You can find the faceted search on the Atlas home page. There are many ways to reach it:

It is easy to select specific types of biosamples via our faceted donut charts. There are four different categories which you can use for your selection:

You can select exRNA profiles by clicking the slices or names of facets in the charts.

Example: If I wanted to select biosamples that were either plasma or serum and were tagged as Alzheimer's disease, I would click the "Alzheimer's", "Plasma", and "Serum" facets.
Because 52 samples (as of July 28th, 2016) qualify for these facets, (52 selected) will be displayed in yellow above the faceted charts.
Then, in order to generate my grid, I would click the icon in the floating menubar.
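
The example above implies that facets chosen within one chart are combined with OR (plasma or serum), while different charts are combined with AND (and tagged as Alzheimer's). A minimal sketch of that selection logic, using hypothetical field names and made-up sample records, could look like this:

```python
def select_samples(samples, chosen):
    """Return samples matching every category that has selected facets.

    chosen maps a category name to the set of facet values clicked in
    that category; categories with no selection do not restrict the result.
    """
    return [
        sample
        for sample in samples
        if all(
            sample.get(category) in values
            for category, values in chosen.items()
            if values
        )
    ]


# Made-up records, for illustration only.
samples = [
    {"name": "S1", "biofluid": "Plasma", "condition": "Alzheimer's"},
    {"name": "S2", "biofluid": "Serum", "condition": "Healthy"},
    {"name": "S3", "biofluid": "Serum", "condition": "Alzheimer's"},
    {"name": "S4", "biofluid": "Urine", "condition": "Alzheimer's"},
]
chosen = {"biofluid": {"Plasma", "Serum"}, "condition": {"Alzheimer's"}}
selected = select_samples(samples, chosen)
```

The length of the returned list plays the role of the (52 selected) counter shown above the charts.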

Clicking this icon will create a grid that looks like the following (split up into two separate pictures, each depicting half of the grid):

In the first picture above (which displays the first half of the grid), we see each biosample's name as well as some key metadata properties
of each biosample (Condition, Anatomical Location, Biofluid Name, and exRNA Source).

In the second picture above (which displays the second half of the grid), we see the following information and links:

ERCC Quality Standards?

Download Data

Download Advanced Results

Download Metadata

RNA Profile

External References

There are three additional buttons at the top of the grid.
First, the Back to Home Page button will take you back to the exRNA Atlas landing page.
Second, the Download All Samples button will allow you to download result files in bulk.
To learn more about this option, view this tutorial.
Third, the Analyze Selected Samples button will allow you to take samples from the exRNA Atlas and feed them into downstream and comparative analysis tools.
To learn more about this option, view this tutorial.


Viewing Selected Biosamples in Grid via Faceted Search

You can find the faceted search on the Atlas home page. There are many ways to reach it:

It is easy to select specific types of biosamples via our faceted donut charts. There are four different categories which you can use for your selection:

You can select exRNA profiles by clicking the slices or names of facets in the charts.

Example: If I wanted to select biosamples that were either plasma or serum and were tagged as Alzheimer's disease, I would click the "Alzheimer's", "Plasma", and "Serum" facets.
Because 52 samples (as of July 28th, 2016) qualify for these facets, (52 selected) will be displayed in yellow above the faceted charts.
Then, in order to generate my grid, I would click the icon in the floating menubar.

Clicking this icon will create a grid that looks like the following (split up into two separate pictures, each depicting half of the grid):

In the first picture above (which displays the first half of the grid), we see each biosample's name as well as some key metadata properties
of each biosample (Condition, Anatomical Location, Biofluid Name, and exRNA Source).

In the second picture above (which displays the second half of the grid), we see the following information and links:

ERCC Quality Standards?

Download Data

Download Advanced Results

Download Metadata

Actions

There are three additional buttons at the top of the grid.
First, the Back to Home Page button will take you back to the exRNA Atlas landing page.
Second, the Download All Samples button will allow you to download result files in bulk.
To learn more about this option, view this tutorial.
Third, the Analyze Selected Samples button will allow you to take samples from the exRNA Atlas and feed them into downstream and comparative analysis tools
present in the Genboree Workbench. To learn more about this option, view this tutorial.


Viewing Selected Biosamples in Grid via Linear Tree

You can use our dendrogram-like partition diagram ("linear tree") to interactively drill down into different subsets of biosamples.
There are two ways of reaching the linear tree page:

After you open the linear tree drill-down page, you will see a diagram like the following:

Click on a collapsed node to "drill down" along its path in the Anatomical Locations » Biofluids » Conditions facet sequence.
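
The drill-down follows a fixed sequence of facets, which can be pictured as a nested structure with one level per facet. The sketch below builds such a structure from a list of sample records and stores sample counts at the deepest level. The field names and records are hypothetical and only mirror the Anatomical Locations, Biofluids, Conditions order described above.

```python
def build_tree(samples, levels=("anatomical_location", "biofluid", "condition")):
    """Nest samples by each level in turn; leaves hold sample counts."""
    tree = {}
    for sample in samples:
        node = tree
        # Walk (and create) one dictionary per level except the last.
        for level in levels[:-1]:
            node = node.setdefault(sample[level], {})
        leaf = sample[levels[-1]]
        node[leaf] = node.get(leaf, 0) + 1
    return tree


# Made-up records, for illustration only.
samples = [
    {"anatomical_location": "Blood", "biofluid": "Plasma", "condition": "Alzheimer's"},
    {"anatomical_location": "Blood", "biofluid": "Plasma", "condition": "Alzheimer's"},
    {"anatomical_location": "Blood", "biofluid": "Serum", "condition": "Healthy"},
]
tree = build_tree(samples)
```

Following one key at each level of the result corresponds to clicking one node per level in the diagram.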

Your selected path is always clearly highlighted:

Clicking the icon in the floating menubar will open the search results for your particular drill-down path (split up into two separate pictures, each depicting half of the grid):

In the first picture above (which displays the first half of the grid), we see each biosample's name as well as some key metadata properties
of each biosample (Condition, Anatomical Location, Biofluid Name, and exRNA Source).

In the second picture above (which displays the second half of the grid), we see the following information and links:

ERCC Quality Standards?

Download Data

Download Advanced Results

Download Metadata

RNA Profile

External References

There are three additional buttons at the top of the grid.
First, the Back to Home Page button will take you back to the exRNA Atlas landing page.
Second, the Download All Samples button will allow you to download result files in bulk.
To learn more about this option, view this tutorial.
Third, the Analyze Selected Samples button will allow you to take samples from the exRNA Atlas and feed them into downstream and comparative analysis tools.
To learn more about this option, view this tutorial.


Viewing Summary Barcharts of exRNA Profiling Datasets

On the main Atlas landing page, there are several different barcharts in the Atlas Statistics section that summarize the exRNA profiling datasets held within the Atlas.
Different summary metrics include:

An example barchart can be found below:

Hovering over any of the bars will display the percentage (y-axis) associated with that bar:


Viewing Summary Grid of DCC Submissions

The DCC Submission Summary table displays usage of exRNA profiling data analysis tools by both ERC consortium members and other members of the scientific community.
In order to view the grid, click the relevant thumbnail on the main Atlas page:

When you click this thumbnail, you will see a grid like the following:

This grid, by default, groups submissions by submission month / year.
However, if you want to group submissions by RFA Title, you can click the Group: RFA Title tab at the top of the grid.


Viewing exRNA Profiling Datasets

All profiles that are submitted to the exRNA Atlas are part of a dataset.
Each dataset is associated with a given study that focuses on some topic (detection of biomarkers associated with gastric cancer, for example).
There are two different ways of viewing datasets on the exRNA Atlas.

Dataset Submissions Table

First, on the Atlas home page, you can find the Dataset Submissions table.
This table provides a summary-level description for each dataset submission to the Atlas.

The table, by default, is organized by PI (last) name, but you can sort (ascending or descending) by most of the columns.
Clicking the analysis ID for a given dataset in the Study Title column will take you to its card on the stand-alone Datasets page (described below).
Clicking the green check mark for a given dataset in the Published? column will open the publication associated with that dataset.
Clicking the name of an external database (dbGaP, GEO, SRA) for a given dataset in the Other Databases column will open the associated page for that dataset in the external database.
You can click Load More to load an additional 5 datasets, or click Load All to load all datasets at once.
If you want the table to return to default, you can then click the Return to Default button (only available once you've loaded additional datasets).

Datasets Page

If you want to view datasets in more detail, you can visit the stand-alone Datasets page.
You can reach this page in three different ways:
  1. Click the Datasets button in the navigation bar at the top of any Atlas page
  2. Click the exRNA Profiling Datasets link in the Browse exRNA Profiles - Alternative Options panel near the bottom of the Atlas home page
  3. Click the analysis ID associated with a given dataset in the Dataset Submissions table

Each card in the layout above contains information about a dataset in the exRNA Atlas:

Note that not all options will be available for each card.

RNA Profile Grid

By clicking the analysis ID associated with a given dataset, you can pull up a grid that contains read counts for that dataset.
The grid also contains various download links for each sample in the dataset.

In the first picture above, we see the read counts associated with different exceRpt mapping stages for each sample.

In the second picture above, we see the following information and links:

Download Data

Download Advanced Results

Download Metadata

RNA Profile

External References

There are three additional buttons at the top of the grid.
First, the Back to Home Page button will take you back to the exRNA Atlas landing page.
Second, the Download All Samples button will allow you to download result files in bulk.
To learn more about this option, view this tutorial.
Third, the Analyze Selected Samples button will allow you to take samples from the exRNA Atlas and feed them into downstream and comparative analysis tools.
To learn more about this option, view this tutorial.

Sample Metadata Grid

By clicking the Samples badge associated with a given dataset, you can pull up a grid that contains sample metadata for that dataset.

In the first picture above (which displays the first half of the grid), we see each biosample's name as well as some key metadata properties of each biosample (Condition, Anatomical Location, Biofluid Name, and exRNA Source).

In the second picture above (which displays the second half of the grid), we see the following information and links:

ERCC Quality Standards?

Download Data

Download Advanced Results

Download Metadata

RNA Profile

External References

There are three additional buttons at the top of the grid.
First, the Back to Home Page button will take you back to the exRNA Atlas landing page.
Second, the Download All Samples button will allow you to download result files in bulk.
To learn more about this option, view this tutorial.
Third, the Analyze Selected Samples button will allow you to take samples from the exRNA Atlas and feed them into downstream and comparative analysis tools.
To learn more about this option, view this tutorial.


Viewing Your Results

After you upload your files to our FTP server, we will process them automatically.

Locating Your Data Results on the Genboree Workbench

Important Notes:

Locating Your Data Results on the FTP Server

Preliminary Steps for New Users

Locating Your Result Files

Locating Your Original Submission


Understanding Your Data Results


Locating Your Metadata Results on the exRNA GenboreeKB


Copying Your Submission to the Public Atlas