Understanding the Nested Tabbed Format

  • In each metadata file, you will have a "#property" column and at least one "value" column.
  • The "#property" column contains different metadata properties, and the "value" column contains values for those metadata properties.
  • For each entry in the "#property" column, you'll notice that different properties have different numbers of dashes and stars preceding the actual property names.
  • These "-" and "*" symbols serve as nesting prefixes.
    • When one property is nested underneath another, the nested property is a subproperty of the property above it.
    • The subproperty usually provides more detail about the parent property in some way.
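
To make the prefix idea concrete, here is a minimal Python sketch that separates the nesting prefix from the property name in a "#property" cell. The function name split_property is hypothetical, and the sketch assumes that the prefix consists only of leading "-" and "*" characters followed by a space. It is an illustration, not part of any official tooling.

```python
def split_property(cell):
    """Split a "#property" cell into (prefix, name).

    Assumption: the prefix is the run of leading "-" and "*"
    characters; everything after it is the property name.
    """
    stripped = cell.strip()
    i = 0
    while i < len(stripped) and stripped[i] in "-*":
        i += 1
    return stripped[:i], stripped[i:].strip()


examples = ["-- Biological Fluid", "*- Author Name", "* Authors"]
parsed = [split_property(cell) for cell in examples]
for prefix, name in parsed:
    print(repr(prefix), repr(name))
```

The length of the returned prefix can then be used as the nesting level of the property.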

The Symbol -

  • The symbol "-" indicates an additional basic level of nesting for a given property. For example, see the table below:
#property                        value
-- Biological Fluid
--- Biofluid Name                serum
--- Collection Details
---- Sample Collection Method    venipuncture
  • Here, --- Biofluid Name and --- Collection Details are nested under -- Biological Fluid, and ---- Sample Collection Method is nested under --- Collection Details.
  • The Biofluid Name and Collection Details properties provide more information about the Biological Fluid property, and the Sample Collection Method property provides more information about the Collection Details property.
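
The nesting described above can be worked out mechanically. The following Python sketch is a hedged illustration: it assumes that the nesting level equals the number of leading "-" and "*" characters, and that a property's parent is the closest preceding row with a smaller level. The helper names nesting_depth and find_parents are hypothetical.

```python
def nesting_depth(cell):
    """Count the leading "-" and "*" characters of a "#property" cell."""
    depth = 0
    for ch in cell.strip():
        if ch in "-*":
            depth += 1
        else:
            break
    return depth


def find_parents(cells):
    """Map each property cell to the cell it is nested under (or None)."""
    parents = {}
    stack = []  # (depth, cell) pairs for the currently open ancestors
    for cell in cells:
        depth = nesting_depth(cell)
        # Close every property that is at the same level or deeper.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        parents[cell] = stack[-1][1] if stack else None
        stack.append((depth, cell))
    return parents


cells = [
    "-- Biological Fluid",
    "--- Biofluid Name",
    "--- Collection Details",
    "---- Sample Collection Method",
]
parents = find_parents(cells)
for child, parent in parents.items():
    print(child, "->", parent)
```

Because the result is keyed by the cell text, this sketch only suits files where each property appears once; repeated rows such as item-list entries would need a row index as the key instead.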

The Symbol *

  • The symbol "*" indicates that the property contains an item list.
  • This list can be as long as you like, and each property name will be the same within the list.
  • For example: Imagine that you have 4 authors associated with your study. There is a property named * Authors in your Studies metadata file.
    Below this property, there will be 1 row for the *- Author Name property. This property is an item in the * Authors item list.
    If you want to add 3 more authors, simply add another 3 rows of *- Author Name, like so:
#property         value
* Authors
*- Author Name    NAME1
*- Author Name    NAME2
*- Author Name    NAME3
*- Author Name    NAME4
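
If you generate metadata files with a script rather than by hand, the same item list can be produced programmatically. The Python sketch below is only an illustration: the function name author_rows is hypothetical, and it assumes that the "#property" and "value" columns are separated by a tab character.

```python
def author_rows(names):
    """Build the rows of the * Authors item list, one item per author."""
    rows = [("* Authors", "")]  # the list property itself has no value
    for name in names:
        # Every item in the list uses the same property name.
        rows.append(("*- Author Name", name))
    return rows


rows = author_rows(["NAME1", "NAME2", "NAME3", "NAME4"])
lines = ["\t".join(row) for row in rows]  # assumed tab-delimited layout
print("\n".join(lines))
```

Adding or removing an author only changes the list of names passed in; the property name on each row stays the same.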

One Nuance of Multiple Value Columns

  • For some of your metadata files (Experiments, Donors, Biosamples), it is likely that you will have multiple value columns.
    • For example, if your submission contains 20 different biosamples, you would need 20 value columns in your Biosamples metadata file.
  • In general, it is very simple to add a new value column to your metadata file (via Excel, for example) and then add content for that new column.
  • However, what if your new value column provides values for a different set of properties compared to your old value column?
  • For example, there could be some optional information that's available for your new value column, but isn't available for your old value column.
  • In most cases, you can just leave the value blank for the old value column (while providing the suitable value in the new value column).
  • However, if the domain of the property allows for a blank or empty value, then we need some way of specifying that the old value column
    is missing this property (instead of just providing a blank value for that property).
  • You can provide this information by writing #MISSING# as a value for the relevant property in the old value column.
  • You will need to write #MISSING# if the domain is the following:
    • autoID, fileUrl, numItems, regexp, string, url, [valueless]
  • Please note that if you mark a parent property as #MISSING#, you don't need to fill in #MISSING# for any nested subproperties (although it won't break anything if you do).
    • For example, if you mark "--- Collection Details" as #MISSING# in the example above, you don't need to mark "---- Sample Collection Method" as #MISSING#.
  • You likely won't need to use the #MISSING# feature. It is most likely to come up when you're working on your Experiments metadata file, since different experiments
    are more likely to use different metadata properties (versus biosamples and donors).
  • You can download an example of an Experiments metadata file that uses the #MISSING# feature here.
    In this metadata file, #MISSING# is used twice:
    • It is used for the "----- Low Speed Centrifugation" property in the second value column.
      By using #MISSING#, we are saying that our second value column doesn't contain any information about "----- Low Speed Centrifugation" or the subproperty "------ Centrifugation Parameters".
      • We had to use #MISSING# because "----- Low Speed Centrifugation" has a domain of string.
      • Note that we only put #MISSING# for "----- Low Speed Centrifugation" and not "------ Centrifugation Parameters"
        (because if "----- Low Speed Centrifugation" is missing, then "------ Centrifugation Parameters", a subproperty, certainly is).
    • It is used for the "- exRNA Sample Preparation Protocol" property in the second value column.
      By using #MISSING#, we are saying that our second value column doesn't contain any information about "- exRNA Sample Preparation Protocol".
      • We had to use #MISSING# because "- exRNA Sample Preparation Protocol" has a domain of [valueless].
      • Note that we only put #MISSING# for "- exRNA Sample Preparation Protocol" and none of its subproperties.
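
The rules above for when to write #MISSING# can be summarized in a short Python sketch. This is an illustration under stated assumptions, not an official validator: the function name cell_for_absent_property is hypothetical, the set of domains is copied from the list above, and "enum" is used only as a stand-in for a domain that is not on that list.

```python
MISSING = "#MISSING#"

# Domains that accept a blank value, as listed above. For these, a
# blank cell cannot signal that the property is absent.
BLANK_ALLOWED_DOMAINS = {
    "autoID", "fileUrl", "numItems", "regexp", "string", "url", "[valueless]",
}


def cell_for_absent_property(domain, parent_is_missing=False):
    """Return what to write in a value column that lacks this property."""
    if parent_is_missing:
        # The parent is already marked #MISSING#, so subproperties
        # can simply be left blank.
        return ""
    if domain in BLANK_ALLOWED_DOMAINS:
        return MISSING
    # For any other domain, a blank value is enough.
    return ""


print(cell_for_absent_property("string"))       # Low Speed Centrifugation case
print(cell_for_absent_property("[valueless]"))  # exRNA Sample Preparation Protocol case
print(repr(cell_for_absent_property("enum")))   # hypothetical domain not on the list
print(repr(cell_for_absent_property("string", parent_is_missing=True)))
```

Writing #MISSING# for a subproperty of a missing parent would also be accepted, as noted above; the sketch just picks the shorter option.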
