Genboree Workbench Metagenomics Tools Toolset - Getting Started

Genboree Microbiome Toolset - Home

Before you get started

Genboree Workbench Help and Quick Notes

Links / Downloadable Copy of the Tutorial steps

Publicly Available Tutorial Data

5 Stool vs. 5 Throat 16S rRNA Sequence Data

We will be going through a metagenomics tutorial on the Genboree Workbench with publicly available data:

  • Sample Meta Data (Note: If you are attending a Genboree Workshop for the microbiome tools this download is for reference only. You will need to know how to create this file from scratch.)
  • Sequence Files

5 Soil vs. 5 Fermentation 16S rRNA Sequence Data

16S rRNA Soil vs Fermentation Tutorial

Create Sample Meta Data File

The first step towards completing work on the Genboree Microbiome Workbench is to produce the sample meta data file. The sample meta data reflects the attributes of each sample (e.g. health, body site, BMI, etc.) as well as the necessary information required to extract the sequence data from the original SFF or SRA sequence file. You will need barcode and primer information for all samples.

Requirements:
  • Tab-delimited text file (no program-specific formats, e.g. xls, xlsx, ods, doc)
  • The first line of the file contains the column headers, as a comment-line. It must start with a '#'. It is most common to have the first row, first column be '#name'.
  • One of the fields MUST be 'name' which should be unique for all Sample records.
  • All records MUST have the same number of fields/columns.
  • Fields:
    • name - [Required] Unique name associated with the Sample.
    • barcode - [Required] The Sample-specific sequence used to barcode the sequences in multiplex sequencing. Will be used to identify which sequence records go with which Samples.
    • Note: 'region', or both 'proximal' and 'distal', must be provided unless your sequences were amplified for V3V5 and sequenced in the 3' to 5' direction.
    • region - [optional] The name of the 16S region amplified. Defaults to V3V5 if no 'region', 'proximal', or 'distal' primer is included. The proximal and distal primer pair should amplify the region mentioned here.
    • proximal - [optional] The upstream primer used to amplify the microbial 16S rRNA region. If not provided, then a standard primer pair will be looked up based on the 'region' column. For example, if the user does not know the proximal primer, they can list V3V5 in the 'region' column and the stored primer used to amplify the V3V5 region is assumed; the upstream primer in that case is CCGTCAATTCMTTTRAGT. If the sequences do not match the assumed primer they will fail QC and be discarded due to the mismatch.
    • distal - [optional] The downstream primer used to amplify the microbial 16S rRNA region. If not provided, then a standard primer pair will be looked up based on the 'region' column. For example, if the user does not know the distal primer, they can list V3V5 in the 'region' column and the stored primer used to amplify the V3V5 region is assumed; the downstream primer in that case is CTGCTGCCTCCCGTAGG. When distal primer trimming is selected, sequences that do not match the assumed primer (more than 3 mismatches) will fail QC and be discarded.
  • In all table entries, use only letters (a-z, A-Z), digits (0-9), and underscores. Dashes, spaces, and other special characters are not permitted.
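
The barcode and primer rules above can be illustrated with a minimal Python sketch. It is not the Workbench's implementation. It assumes each read starts with the Sample barcode followed directly by the proximal primer, and it treats M and R in the primer as IUPAC ambiguity codes. The function names and the example read are hypothetical.

```python
# IUPAC nucleotide codes: each primer symbol maps to the bases it accepts.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(primer, segment):
    """Count positions where the read base is not allowed by the primer symbol."""
    if len(segment) < len(primer):
        return len(primer)  # too short to contain the whole primer
    return sum(1 for p, b in zip(primer, segment.upper()) if b not in IUPAC[p])


def assign_sample(read, barcodes):
    """Return (sample_name, remainder) for the barcode that prefixes the read, else None."""
    for name, barcode in barcodes.items():
        if read.startswith(barcode):
            return name, read[len(barcode):]
    return None


# Two barcodes and the V3V5 proximal primer from the tutorial metadata.
barcodes = {"S_700033665": "CCGTTCCTC", "T_700095872": "CGGACCGGAAC"}
proximal = "CCGTCAATTCMTTTRAGT"

# Made-up read: barcode + primer (M read as A, R read as G) + a few bases.
read = "CCGTTCCTC" + "CCGTCAATTCATTTGAGT" + "ACGTACGT"
name, rest = assign_sample(read, barcodes)
mismatches = primer_mismatches(proximal, rest[:len(proximal)])
print(name, mismatches)
```

A read whose primer segment exceeds the allowed number of mismatches would be discarded at QC, which is why the barcode and primer columns must match the sequencing run.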

Sample Metadata for Tutorial
  • 10 samples
  • 10 barcodes
  • 2 body sites
    • Stool
    • Throat
  • 1 primer region
    • V3V5

#name barcode proximal distal region body_site
S_700033665 CCGTTCCTC CCGTCAATTCMTTTRAGT CTGCTGCCTCCCGTAGG V3V5 Stool
S_700035861 ACCGGCGTTC CCGTCAATTCMTTTRAGT CTGCTGCCTCCCGTAGG V3V5 Stool
S_700095543 ACGAATTAAC CCGTCAATTCMTTTRAGT CTGCTGCCTCCCGTAGG V3V5 Stool
S_700095850 AACCGGATAC CCGTCAATTCMTTTRAGT CTGCTGCCTCCCGTAGG V3V5 Stool
S_700101600 AACGGAACGC CCGTCAATTCMTTTRAGT CTGCTGCCTCCCGTAGG V3V5 Stool
T_700016994 AATAACCGTC CCGTCAATTCMTTTRAGT CTGCTGCCTCCCGTAGG V3V5 Throat
T_700095565 TTAATGGAAC CCGTCAATTCMTTTRAGT CTGCTGCCTCCCGTAGG V3V5 Throat
T_700095872 CGGACCGGAAC CCGTCAATTCMTTTRAGT CTGCTGCCTCCCGTAGG V3V5 Throat
T_700101388 CCGAACGAC CCGTCAATTCMTTTRAGT CTGCTGCCTCCCGTAGG V3V5 Throat
T_700101622 TTCGTTCTTC CCGTCAATTCMTTTRAGT CTGCTGCCTCCCGTAGG V3V5 Throat

  • Select the data above and Copy.
  • Paste into Excel or an open source spreadsheet program. Be sure all entries are free of spaces and special characters and that all samples have the same number of columns. Avoid the column titles "state" and "type".
  • Save As and select tab-delimited.
  • Name your file in a clear and consistent manner.
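
Before uploading, it can help to check the saved file against the requirements listed earlier. The following Python sketch is one possible local check, not part of the Workbench. The function name `validate_metadata` and the example strings are assumptions for illustration.

```python
import re

ALLOWED = re.compile(r"[A-Za-z0-9_]+")
DISCOURAGED_COLUMNS = {"state", "type"}


def validate_metadata(text):
    """Return a list of problems found in tab-delimited sample metadata text."""
    problems = []
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith("#"):
        return ["first line must be a header starting with '#'"]
    header = lines[0][1:].split("\t")
    for required in ("name", "barcode"):
        if required not in header:
            problems.append(f"missing required column '{required}'")
    for column in header:
        if column in DISCOURAGED_COLUMNS:
            problems.append(f"avoid the column title '{column}'")
    seen = set()
    for number, line in enumerate(lines[1:], start=2):
        fields = line.split("\t")
        if len(fields) != len(header):
            problems.append(
                f"line {number}: expected {len(header)} fields, found {len(fields)}")
            continue
        for value in fields:
            if not ALLOWED.fullmatch(value):
                problems.append(f"line {number}: invalid characters in '{value}'")
        record = dict(zip(header, fields))
        if "name" in record:
            if record["name"] in seen:
                problems.append(f"line {number}: duplicate name '{record['name']}'")
            seen.add(record["name"])
    return problems


good = "#name\tbarcode\tregion\tbody_site\nS_700033665\tCCGTTCCTC\tV3V5\tStool\n"
bad = "#name\tbarcode\ttype\nS-1\tACGT\tx\nS-1\tACGT\n"
print(validate_metadata(good))
print(validate_metadata(bad))
```

The second example is deliberately broken: it uses a discouraged column title, a dash in a name, and a row with a missing field.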

Create Group

What is a Group? A "Group" contains Databases and Projects and controls access to all content within. You control access to your Group(s) and who is a member. You can also belong to multiple Groups (e.g. those of your collaborators).

Create Database

What is a Database? A Database contains Tracks, Lists, Sample Sets, Samples, and Files. Each database can be associated with a reference genome.

Create Project

What is a Project? A Project holds files that contain analysis results belonging to a Group.

Upload Files

  • Clean Input Data and Output Targets windows
  • Drag your database from the Data Selector window to the Output Targets window
  • Select Data » Files » Transfer File
  • Verify the Output Folder (destination) is correct
  • Browse to the file on your local machine
  • Choose Unpack/extract if your files are compressed
  • Choose Convert to Unix if your file is not compressed and was created on a Mac or PC
  • Create a new folder by entering a name into the Create in Sub-Folder field (e.g. "tutorial_meta_data")
  • Describe the file in the File Description prompt [Optional]
  • Click 'Submit'
  • Click OK
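
The 'Convert to Unix' option concerns line endings. If you prefer to normalize a file locally before uploading, a minimal Python sketch of the same idea follows. It is an assumption about what the conversion amounts to, not the Workbench's own code.

```python
def to_unix_newlines(data: bytes) -> bytes:
    """Convert Windows (CRLF) and classic Mac (CR) line endings to Unix (LF)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


# Example: a metadata snippet saved with Windows line endings.
windows_text = b"#name\tbarcode\r\nS_700033665\tCCGTTCCTC\r\n"
print(to_unix_newlines(windows_text))
```

CRLF is replaced first so that each Windows line ending becomes a single LF rather than two.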

View Uploaded Files

  • Click the Refresh icon or you can double click the expansion icon next to the Files Folder
  • Expand your Group, Databases, Database, Files directory to view your uploaded files

Import Samples

What is a Sample? A "Sample" is a database record corresponding to attributes that describe the Sample. For example, the breast cancer biopsy was 2 mm in diameter (attribute).
What is a Sample Set? A Sample Set is a list of Samples. Dragging a Sample Set is much more convenient than dragging a large number of Samples.

  • Clean Input Data and Output Targets windows
  • Drag tutorial_meta_data.tsv file from the Data Selector window to the Input Data window
  • Drag your database from the Data Selector window to the Output Targets window
  • Select Data » Samples & Samples Sets » Samples » Import Samples
    • Click the Help icon if you have additional questions about this tool
  • Enter a new sample set name (e.g. "tutorial_sample_set") into the 'Assign Samples to new Sample Set'
  • Select your import behavior (you can leave the default settings for this tutorial) (click the help icon for more information about these settings)
  • Click 'Submit'
  • Click OK

View Imported Samples

  • Click the Refresh icon or you can double click the expansion icon next to the Samples Folder to see the changes
  • Expand the Samples Folder to see the newly imported Samples
  • Expand the Sample Set Folder to see the newly created Sample Set

Link Samples To Sequence Files

  • Clean Input Data and Output Targets windows
  • Drag tutorial_sequence_file.sff.gz file from the Data Selector window to the Input Data window
  • Drag tutorial_sample_set from the Data Selector window to the Input Data window BELOW the tutorial_sequence_file.sff.gz entry
    • Note: Make sure that the sequence file is always BEFORE the sample, sample set, or sample folder that is to be linked. You can do this for multiple data sets, just make sure it is always sequence file followed by sample(s), sequence file followed by sample(s), etc.
  • Select Data » Samples & Samples Sets » Samples » Sample - File Linker
    • Click the Help icon if you have additional questions about this tool
  • Verify that you have sequence data followed by sample(s) to link, sequence data followed by sample(s) to link, etc.
  • Click Submit
  • Click OK
  • Wait for confirmation email or check the Job Status
    • Select System/Network » Jobs » Job Summary
    • Select start date, end date, sort order, group by, etc. or leave the defaults
    • Click Generate Report
    • Verify that job has finished by observing the Status column for the status of 'completed'. You can also click the Refresh button to update the job statuses.
    • Click OK
  • Click the Refresh icon or you can double click the expansion icon next to the Samples Folder to see the changes
  • (Optional) Verify that the Attribute 'fileLocation' has correctly linked the file to the Sample
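
The ordering rule for the Input Data window (sequence file first, then the Samples it applies to) can be sketched in Python as follows. This only illustrates the rule. The `(kind, name)` entry format and the function name are hypothetical and do not reflect how the Sample - File Linker is implemented.

```python
def pair_files_with_samples(entries):
    """Group an ordered list of (kind, name) entries into {file: [targets]}.

    kind is "file" for a sequence file; any other kind is treated as a
    sample, sample set, or sample folder linked to the preceding file.
    """
    links = {}
    current_file = None
    for kind, name in entries:
        if kind == "file":
            current_file = name
            links.setdefault(name, [])
        elif current_file is None:
            raise ValueError(f"'{name}' appears before any sequence file")
        else:
            links[current_file].append(name)
    empty = [f for f, targets in links.items() if not targets]
    if empty:
        raise ValueError(f"no samples follow sequence file(s): {empty}")
    return links


# The tutorial's Input Data window, top to bottom.
entries = [
    ("file", "tutorial_sequence_file.sff.gz"),
    ("sample_set", "tutorial_sample_set"),
]
links = pair_files_with_samples(entries)
print(links)
```

Placing a Sample Set above its sequence file raises an error in this sketch, mirroring the reason the tutorial asks you to verify the order before submitting.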

Confirmation Email

Hello First-Time Genboree-User,

Your Sample File Linker job has completed successfully.

JOB SUMMARY:
  JobID          : wbJob-sampleFileLinker-HtQIPo-2992

The following file(s) and samples(s) has been linked:
 tutorial_sequence_file.sff.gz(File) -> S_700033665(Sample)
 tutorial_sequence_file.sff.gz(File) -> S_700035861(Sample)
 tutorial_sequence_file.sff.gz(File) -> S_700095543(Sample)
 tutorial_sequence_file.sff.gz(File) -> S_700095850(Sample)
 tutorial_sequence_file.sff.gz(File) -> S_700101600(Sample)
 tutorial_sequence_file.sff.gz(File) -> T_700016994(Sample)
 tutorial_sequence_file.sff.gz(File) -> T_700095565(Sample)
 tutorial_sequence_file.sff.gz(File) -> T_700095872(Sample)
 tutorial_sequence_file.sff.gz(File) -> T_700101388(Sample)
 tutorial_sequence_file.sff.gz(File) -> T_700101622(Sample)

The Genboree Team

Import Sequences

  • Clean Input Data and Output Targets windows
  • Drag the Sample Set tutorial_sample_set from the Data Selector window into the Input Data window
    • Note: You can drag over multiple Samples, Sample Sets, or Sample Folders (that have been properly linked) into the Input Data window. This allows users to combine interesting data sets without having to import samples, link samples with files, etc. multiple times.
  • Drag your database from the Data Selector window to the Output Targets window
  • Select Metagenome » Data Initialization » Import 16S rRNA Sequences
    • Click the Help icon if you have additional questions about this tool
  • Select your options for sequence import
    • At this time you can sub-select a set of sequences that you wish to import in the 'Select Samples' window.
      • The default action is to select all samples.
      • If you want to select a subset of samples you have two options:
        • If you want to exclude only a few samples (< 50%), you can ctrl + click (hold the control (Ctrl) key and click the left mouse button) the samples you wish to exclude
        • If you want to exclude many samples (> 50%), you can click the Clear All button and ctrl + click the samples you wish to include
    • Set a custom folder output name ('Sample Set Name') or leave the default (which includes a helpful time stamp)
    • Options to choose
      • Trim At Distal Primer
        • Trim at distal primer location (up to 3 mismatches).
      • Trim at N/n
        • Trim reads at the first location of N/n if it occurs before distal primer location.
      • Remove sequences which contain an N
        • Ignore any reads that contain an N/n.
      • Set the minimum read length
        • Set the minimum read length to filter the FASTQ file.
      • Set the minimum average quality
        • Ignore sequences that do not meet the minimum average quality.
      • Set the minimum sequence count
        • Ignore samples that do not meet the minimum sequence count.
  • Click 'Submit'
  • Click OK
  • Wait for confirmation email or check the Job Status

Hello First-time Genboree-user

Your Sequence Import job completed successfully.

Job Summary:
   JobID                  : wbJob-seqImport-KAtdCp-2612
   Analysis Name          : Sequence-Import-2014-03-04-14:24:35

Settings:
   minAvgQuality           : 20
   minSeqCount             : 1000
   minSeqLength            : 200
   blastDistalPrimer       : true
   cutAtEnd                : false
   trimLowQualityRun       : false
   removeNSequences        : false

Result File Location in the Genboree Workbench:
   Group : New__Genboree__Group
   DataBase : New__Genboree__Database
   Path to File:
      Files
      * MicrobiomeData
         * Sequence-Import-2014-03-04-14:24:35

The Genboree Team
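
To make the settings in the email above concrete, here is a minimal Python sketch of what read-level and sample-level filters of this kind could look like. It is an illustration under assumptions, not the import pipeline itself: the function names are hypothetical, thresholds are treated as inclusive minimums, and the sample "tiny_sample" is made up.

```python
def passes_read_filters(sequence, qualities, min_length=200,
                        min_avg_quality=20, remove_n=False):
    """Return True if one read survives the length, quality, and N filters."""
    if len(sequence) < min_length:
        return False
    if not qualities or sum(qualities) / len(qualities) < min_avg_quality:
        return False
    if remove_n and "N" in sequence.upper():
        return False
    return True


def trim_at_first_n(sequence, qualities):
    """Cut the read and its quality scores at the first N/n, if any."""
    position = sequence.upper().find("N")
    if position == -1:
        return sequence, qualities
    return sequence[:position], qualities[:position]


def samples_with_enough_reads(reads_per_sample, min_seq_count=1000):
    """Keep only samples whose surviving read count reaches min_seq_count."""
    return {s: n for s, n in reads_per_sample.items() if n >= min_seq_count}


print(passes_read_filters("ACGT" * 60, [30] * 240))
print(trim_at_first_n("ACGNAC", [30, 30, 30, 2, 30, 30]))
print(samples_with_enough_reads({"T_700095872": 2543, "tiny_sample": 12}))
```

The defaults mirror minSeqLength 200, minAvgQuality 20, and minSeqCount 1000 from the email.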

View Imported Sequences

  • Click the Refresh icon or you can double click the expansion icon next to the Files Folder to see the changes
  • Observe imported sequences output
    • fastq
      • fastq files for each uploaded SFF/SRA file
      • fastq is a file format that represents the combination of the fasta and quality score files
    • sample.metadata
      • Sample meta data file representing all samples used for analysis (appended with sequence import parameters, flags, etc. that are used for the pipeline)
    • settings.json
      • Settings in json format for sequence import pipeline
    • fasta.result.tar.gz
      • fasta file for each uploaded SFF/SRA file
    • filtered_fasta.result.tar.gz
      • Final quality filtered fasta file for each sample
    • stats.result.tar.gz
      • Sequence metrics for each sample
    • jobFile.json
      • See settings.json
    • sequences_metrics_summary.xls
      • Sequence metrics broken down into individual samples, summary for all samples, and each meta data label.
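
As noted above, a fastq file combines the fasta sequence and the quality scores. The Python sketch below shows one way to split a four-line FASTQ record back into those two parts. It assumes Phred+33 quality encoding and unwrapped (single-line) sequences. The function names and the example record are hypothetical.

```python
def parse_fastq(text, offset=33):
    """Yield (read_id, sequence, qualities) from four-line FASTQ records."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    for i in range(0, len(lines) - 3, 4):
        header, sequence, plus, quality = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record at non-blank line {i + 1}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        # Each quality character encodes one Phred score (ASCII code minus offset).
        yield header[1:], sequence, [ord(c) - offset for c in quality]


def to_fasta(read_id, sequence):
    """Format one read as a FASTA entry."""
    return f">{read_id}\n{sequence}\n"


record = "@read1\nACGT\n+\nIIII\n"
reads = list(parse_fastq(record))
for read_id, sequence, qualities in reads:
    print(to_fasta(read_id, sequence), qualities)
```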

Download and Open sequences_metrics_summary.xls to confirm that you have correctly imported your sequences

Notes: Sequence Metrics Summary

  • Sample Order
    • The order of samples may be different than the order shown below. It is important to verify that you get the same values for each sample, but the order of the samples listed may (correctly) vary.
  • Columns
    • Your output will show 13 columns, but for the sake of brevity, we have shown a subset (4) below.
  • Worksheets
    • You should observe at least 3 sheets:
      • sampleSheet
        • Sequence metrics and metadata (user provided) for all samples imported in a given run (useful for troubleshooting and analyzing individual samples to ensure that you can safely proceed forward with all necessary Samples)
      • allSheet
        • Combined sequence metrics for all imported Samples (useful for summary metrics for a study summary, presentation, manuscript, etc.)
      • 'META DATA COLUMN NAME'Sheet
        • Sequence metrics broken down by metadata category (useful for determining potential sequence depth and sequence quality biases among metadata categories, metrics for quality control of sequencing runs / presentation / manuscript, etc.)

sampleSheet (subset of columns):

sampleName   Average_read_length  total_sequence_counts_after_filter  body_site
S_700033665  505                  7008                                Stool
S_700101600  506                  6716                                Stool
T_700101622  515                  4658                                Throat
T_700016994  512                  6794                                Throat
S_700035861  511                  6819                                Stool
S_700095850  500                  5879                                Stool
T_700095872  516                  2543                                Throat
S_700095543  503                  6191                                Stool
T_700101388  510                  7527                                Throat
T_700095565  516                  6294                                Throat

allSheet:

Average Sequence Length  Total Sequences
509.4                    60429
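
As a quick cross-check, the summary values can be recomputed from the per-sample rows shown above. In this Python sketch the unweighted mean of the per-sample average read lengths reproduces 509.4. Whether the Workbench computes the summary exactly this way is an assumption.

```python
# (sampleName, Average_read_length, total_sequence_counts_after_filter, body_site)
rows = [
    ("S_700033665", 505, 7008, "Stool"),
    ("S_700101600", 506, 6716, "Stool"),
    ("T_700101622", 515, 4658, "Throat"),
    ("T_700016994", 512, 6794, "Throat"),
    ("S_700035861", 511, 6819, "Stool"),
    ("S_700095850", 500, 5879, "Stool"),
    ("T_700095872", 516, 2543, "Throat"),
    ("S_700095543", 503, 6191, "Stool"),
    ("T_700101388", 510, 7527, "Throat"),
    ("T_700095565", 516, 6294, "Throat"),
]

total_sequences = sum(count for _, _, count, _ in rows)
mean_length = sum(length for _, length, _, _ in rows) / len(rows)

# Sequence depth per metadata category, useful for spotting depth biases.
per_site = {}
for _, _, count, site in rows:
    per_site[site] = per_site.get(site, 0) + count

print(total_sequences, round(mean_length, 1), per_site)
```

Comparing your own download against these values is independent of row order, which matters because the order of samples in the spreadsheet may vary.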

RDP - Taxonomic Abundance Pipeline

  • Clean Input Data and Output Targets windows
  • Drag Sequence Import folder from the Data Selector window to the Input Data window
  • Drag database into the Output Targets window
  • Drag project into the Output Targets window
  • Select Metagenome » Data Analysis » Taxonomic Classification(RDP)
  • (Optional) Enter a 'Study Name' to organize your analyses. We will use 'Tutorial' here.
  • Click Submit
  • Click OK
  • Wait for confirmation email or check the Job Status

    Hello First-time Genboree-user

    Your RDP job completed successfully.

    Job Summary:

    JobID                  : wbJob-rdp-3roCaD-8680
    Study Name             : Tutorial
    Job Name               : RDP-Job-2014-03-04-14:36:23

    Settings:
    rdpVersion: 2.2
    rdpBootstrapCutoff: 0.8

    Result File Location in the Genboree Workbench:
    Group : New__Genboree__Group
    DataBase : New__Genboree__Database
    Path to File:
    Files
    * MicrobiomeWorkBench
      * Tutorial
        *RDP
          *RDP-Job-2014-03-04-14:36:23

Plots URL (click or paste in browser to access file):
    Prj: New__Genboree__Project
    URL:
http://www.genboree.org/java-bin/project.jsp?projectName=New__Genboree__Project
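
The rdpBootstrapCutoff setting of 0.8 in the email above is a confidence threshold on taxonomic assignments. As a rough illustration, the Python sketch below truncates a lineage at the first rank whose confidence falls below the cutoff. How the Workbench reports ranks below the cutoff is not stated in this tutorial, so this is an assumption. The function name and the confidence values are made up.

```python
def apply_bootstrap_cutoff(lineage, cutoff=0.8):
    """Keep ranks from the top down until one falls below the cutoff."""
    kept = []
    for rank, taxon, confidence in lineage:
        if confidence < cutoff:
            break  # this rank and everything below it is treated as unclassified
        kept.append((rank, taxon))
    return kept


# Hypothetical classification of a single read: (rank, taxon, confidence).
lineage = [
    ("domain", "Bacteria", 1.00),
    ("phylum", "Firmicutes", 0.98),
    ("class", "Clostridia", 0.95),
    ("order", "Clostridiales", 0.93),
    ("family", "Lachnospiraceae", 0.85),
    ("genus", "Roseburia", 0.42),
]
kept = apply_bootstrap_cutoff(lineage)
print(kept[-1])
```

In this example the read would count toward the family-level abundance but not toward any genus.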

RDP Results

  • Click the Refresh icon or you can double click the expansion icon next to the Files Folder to see the changes
  • Observe and/or download files from RDP output
  • Click on project and Click 'Link to Project' in Details window
  • Click 'Link to result plots' under RDP job output
  • Observe RDP Output
  • Click 'Family_weighted-normalized' link to see heatmap of the family RDP output for the 10 Samples
  • Click 'Family_weighted.meta.body_site' to see stacked bar chart of the family RDP output for the 10 samples binned by body_site attribute

QIIME Pipeline - OTU Table, Phylogenetic Tree, and Beta Diversity

  • Clean Input Data and Output Targets windows
  • Drag Sequence Import folder from the Data Selector window to the Input Data window
  • Drag database into the Output Targets window
  • Drag project into the Output Targets window
  • Select Metagenome » Data Analysis » QIIME
  • (Optional) Enter a 'Study Name' to organize your analyses. We will use 'Tutorial' here.
  • (Optional) Click 'Remove Chimeras?' if you wish to remove chimeras via Chimera Slayer
  • Click Submit
  • Click OK
  • Wait for confirmation email or check the Job Status

Hello First-time Genboree-user

Your QIIME job completed successfully.

Job Summary:
   JobID                  : wbJob-qiime-DxQxJo-0630
   Study Name             : Tutorial
   Job Name               : Qiime-Job-2014-03-04-14:46:24

Result File Location in the Genboree Workbench:
   Group : New__Genboree__Group
   DataBase : New__Genboree__Database
   Path to File:
      Files
      * MicrobiomeWorkBench
         * Tutorial
            *QIIME
               *Qiime-Job-2014-03-04-14:46:24

Plots URL (click or paste in browser to access file):
    Prj: New__Genboree__Project
    URL:
http://www.genboree.org/java-bin/project.jsp?projectName=New__Genboree__Project

QIIME Results

  • Click the Refresh icon or you can double click the expansion icon next to the Files Folder to see the changes
  • Observe and/or download files from QIIME output
  • Click on project and Click 'Link to Project' in Details window
  • Click 'Link to cdhit-results' or 'Link to cdhit-normalized results' under QIIME job output
  • Click 'Unweighted Unifrac 2D' link
  • Observe separate clustering of stool and throat Samples
  • Click 'Unweighted Unifrac 3D' link
  • Observe separate clustering of stool and throat Samples
    • Note: If you experience security issues with loading the King applet, the following steps may help for Windows
      • Programs -> Java -> Configure Java
        • In the dialog that appears, click the security tab.
        • Restore/keep the security level marker at "High"
        • Click "[Edit Site List...]"
        • In the dialog that appears, click "[Add]"
        • In the prompt put the URL, including protocol & host
        • Other setups just require a host (and not the protocol as well)
        • Read the Security Warning about HTTP & FTP being insecure
        • Click "[Continue]" to continue despite Warning
        • Click "[Ok]" to finish with the Site List dialog
        • Click "[Ok]" to close the Configure Java dialog
        • Close & reopen browser (to get JVM instance w/ new settings)

Alpha Diversity

  • Clean Input Data and Output Targets windows
  • Drag QIIME output folder from the Data Selector window to the Input Data window
  • Drag database into the Output Targets window
  • Drag project into the Output Targets window
  • Select Metagenome » Data Analysis » Alpha Diversity
  • (Optional) Enter a 'Study Name' to organize your analyses. We will use 'Tutorial' here.
  • Select one or many metadata attributes that you wish to analyze (multi-select can be activated with ctrl + click)
  • (Optional) Click Remove singletons? (checked by default)
    • Singletons are entries in the OTU tables that only exist once in all samples. These elements can falsely raise diversity and have been known to impact alpha diversity curves.
  • Click 'Submit'
  • Click OK
  • Wait for confirmation email or check the Job Status
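
The 'Remove singletons?' option can be pictured with a small Python sketch. It assumes a singleton is an OTU whose total count across all samples is exactly 1, and it uses a made-up OTU table. The function names are hypothetical and this is not the pipeline's code.

```python
def remove_singletons(otu_table):
    """Drop OTUs whose total count across all samples is exactly 1."""
    return {otu: counts for otu, counts in otu_table.items()
            if sum(counts.values()) != 1}


def richness(otu_table, sample):
    """Number of OTUs observed at least once in the given sample."""
    return sum(1 for counts in otu_table.values() if counts.get(sample, 0) > 0)


# Made-up OTU table: {otu_id: {sample_name: count}}.
otu_table = {
    "OTU_1": {"S_1": 10, "T_1": 3},
    "OTU_2": {"S_1": 1},           # seen once in the whole data set
    "OTU_3": {"T_1": 5},
}
filtered = remove_singletons(otu_table)
print(richness(otu_table, "S_1"), richness(filtered, "S_1"))
```

The drop in richness for sample S_1 after filtering shows how singletons can inflate diversity estimates.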

Hello First-time Genboree-user

Your Alpha Diversity job completed successfully.

Job Summary:
   JobID                  : wbJob-alphaDiversity-DVtpVp-9378
   Study Name             : Tutorial
   Job Name               : AD-Job-2014-03-04-14_58_35

Result File Location in the Genboree Workbench:
   Group : New__Genboree__Group
   DataBase : New__Genboree__Database
   Path to File:
      Files
      * MicrobiomeData
         * Tutorial
            *AlphaDiversity
               *AD-Job-2014-03-04-14:58:35

Plots URL (click or paste in browser to access file):
    Prj: New__Genboree__Project
    URL:
http://www.genboree.org/java-bin/project.jsp?projectName=New__Genboree__Project

The Genboree Team

Alpha Diversity Results

  • Click the Refresh icon or you can double click the expansion icon next to the Files Folder to see the changes
  • Observe and/or download files from alpha diversity output
  • Click on project and Click 'Link to Project' in Details window
  • Click 'Link to result plots' under alpha diversity job output
  • Click body_site - richness - rarefaction link
  • Observe higher richness in the 5 stool Samples
  • Click body_site - renyi profile - renyi link
  • Observe the intersecting lines, which indicate that there is no consensus on higher / lower diversity for the 5 stool Samples vs. the 5 throat Samples
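
To see why intersecting Renyi profiles mean there is no consistent diversity ranking, consider this Python sketch of the Renyi entropy, H_a = ln(sum of p_i^a) / (1 - a), with the Shannon index as the a = 1 limit. The two communities are made up, and the choice of orders is arbitrary. This is not the Workbench's plotting code.

```python
import math


def renyi_entropy(counts, alpha):
    """Renyi entropy of order alpha for a list of OTU counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    if alpha == 1:
        # Limit as alpha approaches 1 is the Shannon index.
        return -sum(p * math.log(p) for p in proportions)
    return math.log(sum(p ** alpha for p in proportions)) / (1 - alpha)


def renyi_profile(counts, alphas=(0, 0.5, 1, 2, 4)):
    """Renyi entropy evaluated at several orders."""
    return [renyi_entropy(counts, a) for a in alphas]


# Made-up communities: A has more OTUs but is dominated by one of them,
# B has fewer OTUs that are perfectly even.
community_a = [90, 2, 2, 2, 2, 2]
community_b = [25, 25, 25, 25]
print(renyi_profile(community_a))
print(renyi_profile(community_b))
```

At order 0 (log richness) community A is higher, while at order 2 community B is higher, so the two profiles cross and neither community can be called more diverse overall.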

Machine Learning

  • Clean Input Data and Output Targets windows
  • Drag QIIME output folder from the Data Selector window to the Input Data window
  • Drag database into the Output Targets window
  • Drag project into the Output Targets window
  • Select Metagenome » Data Analysis » Machine Learning » QIIME generated OTU table
  • (Optional) Enter a 'Study Name' to organize your analyses. We will use 'Tutorial' here.
  • Select one or many metadata attributes that you wish to analyze (multi-select can be activated with ctrl + click)
  • Click 'Submit'
  • Click OK
  • Wait for confirmation email or check the Job Status

Hello First-time Genboree-user

Your Machine Learning -> QIIME generated OTU table job completed successfully.

Job Summary:
   JobID                  : wbJob-machineLearning-AABUFp-9018
   Study Name             : Tutorial
   Job Name               : ML-Job-2014-03-04-14_59_46

Result File Location in the Genboree Workbench:
   Group : New__Genboree__Group
   DataBase : New__Genboree__Database
   Path to File:
      Files
      * MicrobiomeData
         * Tutorial
            *MachineLearning
               *ML-Job-2014-03-04-14:59:46

Plots URL (click or paste in browser to access file):
    Prj: New__Genboree__Project
    URL:
http://www.genboree.org/java-bin/project.jsp?projectName=New__Genboree__Project

The Genboree Team

Machine Learning Results

  • Click the Refresh icon or you can double click the expansion icon next to the Files Folder to see the changes
  • Observe and/or download files from machine learning output
  • Click 'RF_Summary.xls' in the Data Selector window and then 'Click to Download File' in the Details window
  • Observe 0% estimated error for all 4 OTU row count cutoffs using randomForest classification
  • Click on project and Click 'Link to Project' in Details window
  • Click 'Link to result plots' under machine learning job output
  • Click Body_site-500 link
  • Observe significant Boruta selected features (classified down to the Genus taxonomic depth) that discriminate the 5 stool Samples vs. the 5 throat Samples

Estimated error in RF_Summary.xls for each OTU row count cutoff:

           5    25   100  500
body_site  0.0  0.0  0.0  0.0
