Overview of Data & Metadata Submission to the DCC (via FTP Pipeline)

Prior to Your Submission
Step 0: Create an FTP Account on the Genboree FTP Server
Small RNA-seq Data Submission Pipeline
Files Needed for Data Submission
Step 1: Preparing Your Data Archive
Step 2: Preparing Your Metadata Archive
Step 3: Preparing Your Manifest File
Step 4: Uploading Your Submission to the FTP Server for Processing
Step 5: Processing Your Files
Long RNA-seq Data Submission Pipeline
Files Needed for longRNAseq Data Submission
Step 1: Preparing Your longRNAseq Data Archive
Step 2: Preparing Your longRNAseq Metadata Archive
Step 3: Preparing Your longRNAseq Manifest File
Step 4: Uploading longRNAseq Submission to the FTP Server for Processing
Step 5: Processing Your longRNAseq Files
qPCR Data Submission
Files Needed for qPCR Data Submission
Step 1: Preparing Your qPCR Data Archive
Step 2: Preparing Your qPCR Metadata Archive
Step 3: Preparing Your qPCR Manifest File
Step 4: Uploading qPCR Submission to the FTP Server for Processing
Step 5: Processing Your qPCR Files
Submission to a Public Repository
Miscellaneous Tips and Tricks
Creating an Archive
Learning How to Use the Terminal

This Wiki page includes instructions on how to submit your data (with accompanying metadata) to the Data Coordination Center (DCC)
using the Genboree FTP Data Submission Pipeline.

If the dataset you are submitting is part of a new grant (e.g., 4UH3TR000906-03), please email the grant number to the DCC at brl-exrna@bcm.edu.

If you're submitting small RNA-seq data, please follow the steps in the "Small RNA-seq Data Submission Pipeline" section.
If you're submitting long RNA-seq data, please follow the steps in the "Long RNA-seq Data Submission Pipeline" section.
If you're submitting qPCR data, please follow the steps in the "qPCR Data Submission" section.

Please contact us at brl-exrna@bcm.edu for guidance if you have a large data set (> 100 GB).

Prior to Your Submission¶

This tutorial will walk you through the entire process of creating an FTP account, formatting and submitting your data and metadata properly,
and then seeing your dataset on the Atlas.

Step 0: Create an FTP Account on the Genboree FTP Server¶

Creating Your FTP Account

Small RNA-seq Data Submission Pipeline¶

All submitted samples will be processed through the exceRpt Small RNA-seq Pipeline for exRNA Profiling
and exceRpt Small RNA-seq Post-processing tools.

Files Needed for Data Submission¶

Your submission will consist of three different files:

a data archive: The data archive will contain all of your different data files (FASTQ / SRA) as well as an optional spike-in file (FASTA) for those inputs.
a metadata archive: The metadata archive will contain various metadata documents relating to your data submission.
a manifest file: The manifest file will link together your data and metadata files, and it will also provide other valuable information for verifying that your submission is complete.

IMPORTANT NOTE
All three files must have the same file name prefix ("samples" is the prefix in "samples_data"). Note that the data archive file name ends in _data, the metadata archive file name ends in _metadata, and the manifest file name ends in .manifest.json.
In this illustrative example, the submission files will be named like this:

samples_data.zip
samples_metadata.zip
samples.manifest.json

In this example, "samples" was chosen as the file name prefix. You should give your actual submission files a more descriptive prefix ("gastricCancerOct2015_data.zip", for example).
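The naming rule above can be sketched in a few lines of Python. This is only an illustration: the function name small_rnaseq_submission_names is hypothetical, and the ".zip" extension is taken from the example file names above rather than from a stated requirement.

```python
def small_rnaseq_submission_names(prefix, archive_ext=".zip"):
    """Build the three small RNA-seq submission file names from one prefix.

    Hypothetical helper: the suffixes (_data, _metadata, .manifest.json) come
    from the naming rule above; the ".zip" default is an assumption based on
    the example file names.
    """
    if not prefix:
        raise ValueError("prefix must not be empty")
    return {
        "data": f"{prefix}_data{archive_ext}",
        "metadata": f"{prefix}_metadata{archive_ext}",
        "manifest": f"{prefix}.manifest.json",
    }


# Print the file names for a more descriptive prefix.
for role, name in small_rnaseq_submission_names("gastricCancerOct2015").items():
    print(f"{role}: {name}")
```

Generating all three names from a single prefix makes it hard for them to drift apart by a typo.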

Step 1: Preparing Your Data Archive¶

Prepare Your Data Archive

Step 2: Preparing Your Metadata Archive¶

Prepare Your Metadata Archive

Step 3: Preparing Your Manifest File¶

Prepare Your Manifest File

Step 4: Uploading Your Submission to the FTP Server for Processing¶

Upload Submission to the DCC using FTP Server

Step 5: Processing Your Files¶

Processing Your Files

Long RNA-seq Data Submission Pipeline¶

Files Needed for longRNAseq Data Submission¶

Your submission will consist of three different files:

a data archive: The data archive will contain all of your different paired-end reads FASTQ data files.
a metadata archive: The metadata archive will contain various metadata documents relating to your data submission.
a manifest file: The manifest file will link together your data and metadata files, and it will also provide other valuable information for verifying that your submission is complete.

IMPORTANT NOTE
All three files must have the same file name prefix ("samples" is the prefix in "samples_longRNAseq_data"). Note that the data archive file name ends in _longRNAseq_data, the metadata archive file name ends in _longRNAseq_metadata, and the manifest file name ends in _longRNAseq.manifest.json.
In this illustrative example, the submission files will be named like this:

samples_longRNAseq_data.zip
samples_longRNAseq_metadata.zip
samples_longRNAseq.manifest.json

In this example, "samples" was chosen as the file name prefix. You should give your actual submission files a more descriptive prefix ("gastricCancerOct2015_longRNAseq_data.zip", for example).
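Before uploading, you may want to confirm that your three longRNAseq file names really do share one prefix. The sketch below is one possible local check, not part of the DCC pipeline. The function name shared_long_rnaseq_prefix is hypothetical, and the patterns assume ".zip" archives as in the example names above.

```python
import re

# Patterns derived from the example file names above (".zip" is an assumption).
LONG_RNASEQ_PATTERNS = {
    "data": re.compile(r"^(?P<prefix>.+)_longRNAseq_data\.zip$"),
    "metadata": re.compile(r"^(?P<prefix>.+)_longRNAseq_metadata\.zip$"),
    "manifest": re.compile(r"^(?P<prefix>.+)_longRNAseq\.manifest\.json$"),
}


def shared_long_rnaseq_prefix(data_name, metadata_name, manifest_name):
    """Return the common prefix, or raise ValueError if the names disagree."""
    names = {"data": data_name, "metadata": metadata_name, "manifest": manifest_name}
    prefixes = set()
    for role, name in names.items():
        match = LONG_RNASEQ_PATTERNS[role].match(name)
        if match is None:
            raise ValueError(f"{name!r} does not look like a longRNAseq {role} file name")
        prefixes.add(match.group("prefix"))
    if len(prefixes) != 1:
        raise ValueError(f"file name prefixes differ: {sorted(prefixes)}")
    return prefixes.pop()


print(shared_long_rnaseq_prefix(
    "samples_longRNAseq_data.zip",
    "samples_longRNAseq_metadata.zip",
    "samples_longRNAseq.manifest.json",
))
```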

Step 1: Preparing Your longRNAseq Data Archive¶

Prepare Your longRNAseq Data Archive

Step 2: Preparing Your longRNAseq Metadata Archive¶

Prepare Your longRNAseq Metadata Archive

Step 3: Preparing Your longRNAseq Manifest File¶

Prepare Your longRNAseq Manifest File

Step 4: Uploading longRNAseq Submission to the FTP Server for Processing¶

Upload longRNAseq Submission to the DCC using FTP Server

Step 5: Processing Your longRNAseq Files¶

Processing Your longRNAseq Files

qPCR Data Submission¶

Files Needed for qPCR Data Submission¶

Your submission will consist of two or three different files:

a data archive: The data archive is OPTIONAL. It will contain all of your different data files (RDML format or any other custom format provided by the qPCR instrument).
a metadata archive: The metadata archive will contain various metadata documents relating to your data submission.
a manifest file: The manifest file will provide valuable information about your submission.

IMPORTANT NOTE
All of your submission files must have the same file name prefix ("samples" is the prefix in "samples_qPCR_data"). Note that the data archive file name (if you include one) ends in _qPCR_data, the metadata archive file name ends in _qPCR_metadata, and the manifest file name ends in _qPCR.manifest.json.
In this illustrative example, the submission files will be named like this:

samples_qPCR_data.zip
samples_qPCR_metadata.zip
samples_qPCR.manifest.json

In this example, "samples" was chosen as the file name prefix. You should give your actual submission files a more descriptive prefix ("gastricCancerOct2015_qPCR_data.zip", for example).
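Because the qPCR data archive is optional, a completeness check has to treat it differently from the other two files. The following sketch shows one way to do that locally. The function name check_qpcr_submission is hypothetical, and the file names follow the example listing above, including the assumed ".zip" extension.

```python
def check_qpcr_submission(prefix, present_files):
    """Report missing required qPCR files and whether a data archive is present.

    Hypothetical helper: the metadata archive and manifest are treated as
    required and the data archive as optional, as described above.
    Returns (missing_required_names, has_data_archive).
    """
    present = set(present_files)
    required = [f"{prefix}_qPCR_metadata.zip", f"{prefix}_qPCR.manifest.json"]
    optional_data = f"{prefix}_qPCR_data.zip"
    missing = [name for name in required if name not in present]
    return missing, optional_data in present


missing, has_data = check_qpcr_submission(
    "samples",
    ["samples_qPCR_metadata.zip", "samples_qPCR.manifest.json"],
)
print("missing required files:", missing)
print("data archive included:", has_data)
```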

Step 1: Preparing Your qPCR Data Archive¶

Prepare Your qPCR Data Archive

Step 2: Preparing Your qPCR Metadata Archive¶

Prepare Your qPCR Metadata Archive

Step 3: Preparing Your qPCR Manifest File¶

Prepare Your qPCR Manifest File

Step 4: Uploading qPCR Submission to the FTP Server for Processing¶

Upload qPCR Submission to the DCC using FTP Server

Step 5: Processing Your qPCR Files¶

Processing Your qPCR Files

Submission to a Public Repository¶

Controlled-access data repository:
Data Submission to dbGaP
Public-access data repository:
Data Submission to GEO

Miscellaneous Tips and Tricks¶

Below, you'll find some useful tips and tricks for creating your submission for the FTP Pipeline.

Creating an Archive¶

Creating an Archive
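The linked page is the authoritative guide. As a rough illustration only, a .zip archive can also be built with Python's standard zipfile module. The function name create_zip_archive is hypothetical, and storing each file at the top level of the archive (without its directory path) is an assumption, not a documented DCC requirement.

```python
import zipfile
from pathlib import Path


def create_zip_archive(archive_path, input_files):
    """Write input_files into a new .zip archive and return the archive path.

    Assumption: each file is stored under its base name only, so the archive
    has a flat layout. Check the linked page for the layout the DCC expects.
    """
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for input_file in input_files:
            input_file = Path(input_file)
            archive.write(input_file, arcname=input_file.name)
    return archive_path


# Example call (the FASTQ file names are placeholders):
# create_zip_archive("samples_data.zip", ["sample1.fastq", "sample2.fastq"])
```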

Learning How to Use the Terminal¶

If you need help navigating the terminal (and want to learn some basic Linux/OSX commands), the following link will be useful:

http://www.ee.surrey.ac.uk/Teaching/Unix/
