Domain Specifier Keywords

You use the domain field in a property definition to provide information about the type (domain) of a property's value.
  • The domain field takes a keyword string or a specifically formatted key-phrase string.
  • By default, a property's value contains text ("string") type data, possibly empty. This provides limited validation opportunities and limited query operators.

Summary of Generic Domain Keywords & Keyphrases

[
  # Text-like
  "string",                      # Value is free form text of "any" length (16MB max).
  "regexp(%REGEXP%)",            # Value must be a string that matches the %REGEXP% pattern indicated.
  "enum(%CSV_LIST%)",            # Value must be one from a fixed list. (%CSV_LIST% lists possible values; each
                                 # value treated as a string).
  "url",                         # Value must be a URL.

  # Numeric
  "int",                         # Value must be an integer.
  "posInt",                      # Value must be a positive integer. Includes 0.
  "negInt",                      # Value must be a negative integer. Includes 0.
  "intRange(%START%, %END%)",    # Value must be an integer within a range. Ranges are inclusive.
  "float",                       # Value must be a floating-point.
  "posFloat",                    # Value must be a positive float. Includes 0.0
  "negFloat",                    # Value must be negative float value. Includes 0.0
  "floatRange(%START%, %END%)",  # Value must be a number within a range. Ranges are inclusive.

  # Date/Time
  "date",                        # Value must be a string indicating a calendar date/day. (Unambiguous
                                 # date formats only).
  "timestamp",                   # Value must be a string indicating a specific time in history (or future).
                                 # 1 second resolution. RFC822 formatted string or similar recommended.

  # Other
  "boolean",                     # Value must be a boolean and only accepts true or false literals.
  "[valueless]"                  # Property has NO value. Document representations can provide null (JSON)
                                 # or empty-string for this property's value.
]

Details of Generic Domains

Text-Like

string

The property's value is a string of characters. Free form text of "any" length (16MB max).
  • Empty string is allowed. If not appropriate, consider using regexp(\S+) or similar.

regexp(%REGEXP%)

The property's value must be a string that matches the %REGEXP% pattern indicated.
  • If the whole string must match the regexp, make sure to use ^ and $ anchors!
  • Tip: Can be used to indicate "non-empty, non-whitespace-only string", i.e. "regexp(\S)".
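
The anchoring advice above can be illustrated with a short Python sketch. It assumes the validator applies the pattern with "search" semantics (a match anywhere in the value is enough), which is what the advice about ^ and $ implies. Python's re module stands in for whatever regexp engine GenboreeKB actually uses, and the helper name matches_domain is hypothetical.

```python
import re

def matches_domain(pattern: str, value: str) -> bool:
    """Return True if `pattern` matches anywhere in `value` (assumed search semantics)."""
    return re.search(pattern, value) is not None

# Unanchored: any value that merely contains three digits passes.
print(matches_domain(r"\d{3}", "abc123def"))    # True
# Anchored: the whole value must be exactly three digits.
print(matches_domain(r"^\d{3}$", "abc123def"))  # False
print(matches_domain(r"^\d{3}$", "123"))        # True
# The "non-empty, non-whitespace-only" tip: a whitespace-only value fails.
print(matches_domain(r"\S", "   "))             # False
```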

enum(%CSV_LIST%)

The property's value must be one from a fixed list. (Must list all possible values.)
  • Individual values are always stored and treated as strings. Only string-based searching, etc. will be available. Not numeric.

url

The property's value is a URL string typically with protocol & host; possibly also with path, anchor, and query components.

Relative URLs are acceptable but may have limited use:
  • There are specific rules for relative URLs within a KB, to use as shorthand for: a different doc in the same collection, a doc in a different collection, a doc in a different KB, a different collection, etc.
  • Some apps & interfaces know how to treat these KB-relative URLs sensibly. For example, the GenboreeKB UI will construct the correct link and you can open the doc indicated by the relative URL!
  • If you want to write a relative URL to another KB document from within your current KB document and have it work automatically in such UIs, the basic rule is: start the relative URL at the first type keyword (kb, coll, doc) whose value differs between the two docs. For example:
    • Say you have a document http://host.com/REST/v1/grp/MyGrp/kb/MyKB/coll/MyColl1/doc/DocID78:
      • doc/DocID87 - is a KB-relative URL to a different doc in same collection.
      • coll/MyOtherColl/doc/doc_idA - is a KB-relative URL to a doc in different collection of the same KB.
      • kb/YourKB/coll/YourKB/doc/12345 - is a KB-relative URL to doc in a different KB altogether.
      • coll/MyOtherColl - is a KB-relative URL to a collection in the same KB.
      • etc
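
The rule above can be sketched as a small resolver. This is only an illustration of how a UI might expand a KB-relative URL, not GenboreeKB's actual implementation; the function name resolve_kb_relative is hypothetical, and the sketch assumes the keyword that starts the relative URL also appears as a path component of the current document's URL.

```python
def resolve_kb_relative(base_doc_url: str, relative: str) -> str:
    """Expand a KB-relative URL against the URL of the current document.

    Assumes `relative` starts with a type keyword ("doc", "coll", "kb")
    and that "/<keyword>/" occurs in `base_doc_url`.
    """
    keyword = relative.split("/", 1)[0]
    marker = "/" + keyword + "/"
    cut = base_doc_url.find(marker)
    if cut == -1:
        raise ValueError(f"base URL has no {marker!r} component")
    # Keep everything up to and including the slash before the keyword,
    # then append the relative URL in its place.
    return base_doc_url[:cut + 1] + relative

base = "http://host.com/REST/v1/grp/MyGrp/kb/MyKB/coll/MyColl1/doc/DocID78"
print(resolve_kb_relative(base, "doc/DocID87"))
print(resolve_kb_relative(base, "coll/MyOtherColl/doc/doc_idA"))
print(resolve_kb_relative(base, "coll/MyOtherColl"))
```

With the example document above, the three calls expand to URLs ending in .../coll/MyColl1/doc/DocID87, .../kb/MyKB/coll/MyOtherColl/doc/doc_idA, and .../kb/MyKB/coll/MyOtherColl respectively.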

Numeric

int

The property's value is an integer value.

posInt

The property's value is a positive integer value. Includes 0.

negInt

The property's value is a negative integer value. Includes 0.

intRange(%START%, %END%)

The property's value falls within an integer range. Value provided must be an integer in the range. Ranges are inclusive.
  • Advanced: Open ended ranges are supported like they are in regular expressions:
    • (5,) is a range whose minimum value is 5 and "no" maximum. (Actually, maximum is largest 64-bit integer: 9,223,372,036,854,775,807.)
    • (,255) is a range with "no" minimum value (i.e. -inf) and whose maximum value is 255. (Actually, minimum is the smallest 64-bit integer: -9,223,372,036,854,775,808.)
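
As a rough illustration of the open-ended range behaviour described above, the following Python sketch parses an intRange key-phrase and substitutes the 64-bit limits for a missing bound. The function names and the exact whitespace tolerance are assumptions made for this example; only the inclusive bounds and the 64-bit defaults come from the text.

```python
import re
from typing import Tuple

INT64_MIN = -2**63      # -9,223,372,036,854,775,808
INT64_MAX = 2**63 - 1   #  9,223,372,036,854,775,807

def parse_int_range(domain: str) -> Tuple[int, int]:
    """Parse "intRange(START, END)"; a missing bound falls back to the 64-bit limit."""
    m = re.fullmatch(r"intRange\(\s*(-?\d+)?\s*,\s*(-?\d+)?\s*\)", domain)
    if m is None:
        raise ValueError(f"not an intRange domain: {domain!r}")
    start = int(m.group(1)) if m.group(1) is not None else INT64_MIN
    end = int(m.group(2)) if m.group(2) is not None else INT64_MAX
    return start, end

def in_int_range(domain: str, value: int) -> bool:
    """Check `value` against the range; both ends are inclusive."""
    start, end = parse_int_range(domain)
    return start <= value <= end

print(parse_int_range("intRange(5,)"))      # (5, 9223372036854775807)
print(in_int_range("intRange(,255)", 255))  # True
print(in_int_range("intRange(1, 10)", 11))  # False
```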

float

The property's value is a floating-point value.

posFloat

The property's value is a positive float value. Includes 0.0.

negFloat

The property's value is a negative float value. Includes 0.0.

floatRange(%START%, %END%)

The property's value falls within a real number range. Value provided must be a number within the range. Ranges are inclusive.
  • Advanced: Open ended ranges are supported like they are in regular expressions (note the commas please):
    • (50.5,) is a range whose minimum value is 50.5 and "no" maximum. (Actually maximum is the largest positive 64-bit IEEE 754 floating point number: 1.79769313486232e+308.)
    • (,1.0) is a range with "no" minimum value (i.e. -inf) and whose maximum value is 1.0. (Actually, minimum is the most negative finite 64-bit IEEE 754 floating point number: -1.79769313486232e+308.)

Date/Time

date

The property's value is a calendar date/day. Document representations provide this date as a string.
  • Value must be an unambiguous date string. Example formats: "YYYY/MM/DD" or "YYYY-MM-DD" or "Dec 06 2013" or "05 Dec 2013".
  • Ambiguous formats are not supported and are likely to be actively rejected during validation, in order to protect the integrity of your data. This matters especially when there are multiple collaborators, institutions, countries, or raw data sources which may or may not be following the same assumptions regarding date string formats.
    • Thus MM-DD-YYYY and DD-MM-YYYY formats are not acceptable because they are ambiguous [with each other] and can lead to data corruption.
      • So "5/10/2015" is not supported. It is ambiguous. Nor is "11-5-11" supported, nor "11/5/5".
    • In most cases, such ambiguous formats will be rejected to help protect against common incorrect conversions and mismatches of assumptions (and formats!) used by the various contributors.
    • But beyond such rejections, best-effort parsing of your date string is made.
  • Y2K-type "bad practices": While not rejected outright, ambiguous formats like 5 Dec 5 or Dec 5 10 do carry Y2K type risk and you really should have stopped using such formats over a decade ago.
    • But these will be accepted, and will probably be parsed how you intended: they will be interpreted contextually with respect to when they were submitted (currently, a "year" field of 5 => 2005 and a "year" field of 10 => 2010).
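
To make the distinction between unambiguous and ambiguous date strings concrete, here is a deliberately strict Python sketch. It accepts only the four example formats listed above and rejects everything else. The real validator does best-effort parsing and is more lenient (for instance, this sketch also rejects two-digit years); the function name and the format list are assumptions made for illustration.

```python
from datetime import date, datetime

# The example formats named above: YYYY/MM/DD, YYYY-MM-DD, "Dec 06 2013", "05 Dec 2013".
UNAMBIGUOUS_FORMATS = ("%Y/%m/%d", "%Y-%m-%d", "%b %d %Y", "%d %b %Y")

def parse_unambiguous_date(text: str) -> date:
    """Parse `text` using only the unambiguous formats; raise ValueError otherwise."""
    for fmt in UNAMBIGUOUS_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"ambiguous or unsupported date string: {text!r}")

print(parse_unambiguous_date("2013-12-06"))   # 2013-12-06
print(parse_unambiguous_date("05 Dec 2013"))  # 2013-12-05

for bad in ("5/10/2015", "11-5-11", "11/5/5"):
    try:
        parse_unambiguous_date(bad)
    except ValueError as err:
        print("rejected:", err)
```

Note that the month-name formats rely on English month abbreviations, which is what Python's strptime expects under its default locale.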

timestamp

The property's value is a string that indicates a specific time in history (or future), usually providing hours, minutes, year, month, and day information at least.
  • RFC822 formatted strings, or similar, are recommended.
  • Some sensible examples: "2013/12/12 10:10 pm", same as "2013-Dec-12 22:10", same as "22:10 2013-Dec-12".
  • Odd, novel, creative format warning:
    • "Creative" timestamp formats may be accepted but not interpreted as you expect.
      • e.g. 10.00 a.m will be accepted, but is completely silly; it has "." for what is probably meant to be a ":", and it has "a.m" for what is probably "a.m." or "am".
      • Because of the use of "." not ":" the parser assumes it to mean "today, at time NOW, with seconds portion = 10 and fractions of a second = 0".
      • i.e. Recently this string was interpreted as: Thu Feb 05 17:03:10 -0600 2015...nothing to do with 10:00 am. Best not to use "creative" timestamp formats.
  • Ambiguous formats are not supported. See "Ambiguous formats" section for the date domain above.

Other

boolean

The property's value is boolean and only accepts true or false literals (in JSON representations) as values; in non-JSON formats the strings "true", "false", "yes", and "no" are accepted.

[valueless] domain

Properties whose domain is [valueless] have no valid value. They are present to help organize and categorize their descendant properties, or act as flags by their mere presence or absence.
  • JSON representations will have a value of null for these properties (although empty string, "", is acceptable); other representation formats will have empty-string for such properties.

GenboreeKB specific domains

These are generally more advanced domains that support very particular usage, and which have a number of additional rules or complexity in order to employ them.

[
  # Misc.
  "measurement(unit)",             # Value is a string with a numeric and units component.
  "fileUrl",                       # Value is a URL pointing to a downloadable file.         

  # Ontology-Related
  "bioportalTerm(%URL%)",          # Value must be term from specific part of ontology exposed at bioontology.org

  # Auto-Filling
  "autoID(prefix, mode, suffix)",  # Value is an ID string which, if missing, will be generated automatically.
  "pmid",                          # Value is a PubMed ID; sub-properties may be added in for you.
  "omim",                          # Value is an OMIM ID; sub-properties may be added in for you.
  "gbAccount"                      # Value is a Genboree login; sub-properties may be added in for you.
]

Misc.

measurement(unit)

This is a special kind of number which includes a unit component. This is provided as a string in the format "{number} {unit}" where {number} is an integer or float and {unit} is the unit for the measurement, such as kg, lbs, sec, cm, ml, or oz.
  • The model declares the official, agreed-upon canonical unit for the measurement and will determine what is stored. However, uploaded docs may use any compatible units in their representations.
    • For example, if the domain is "measurement(kg)" then the value may be "1.5 kg", "0.67 lbs", "58 g", etc.
  • Prior to storage, the non-canonical value will be converted to the canonical units. Thus, this provides an auto-normalization functionality, helping better manage, process, and mine measurements whose original data sources used varying units.
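
The auto-normalization described above can be sketched in a few lines of Python. This is only an illustration for mass units: the conversion table, the function name normalize_mass, and the choice to return a (number, unit) pair are assumptions, and GenboreeKB's own unit handling may differ.

```python
from typing import Tuple

# Conversion factors to kilograms (illustrative subset).
# 1 lb is defined as exactly 0.45359237 kg.
TO_KG = {"kg": 1.0, "g": 0.001, "lbs": 0.45359237}

def normalize_mass(value: str, canonical: str = "kg") -> Tuple[float, str]:
    """Convert a "{number} {unit}" string to the model's canonical unit."""
    number_text, unit = value.split()
    if unit not in TO_KG or canonical not in TO_KG:
        raise ValueError(f"incompatible or unknown unit in {value!r}")
    kilograms = float(number_text) * TO_KG[unit]
    # Convert from kilograms to the canonical unit declared by the model.
    return kilograms / TO_KG[canonical], canonical

# With a model domain of "measurement(kg)", each upload is stored in kg.
for uploaded in ("1.5 kg", "0.67 lbs", "58 g"):
    number, unit = normalize_mass(uploaded, "kg")
    print(f"{uploaded!r} -> {number:.4f} {unit}")
```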

fileUrl

This is similar to url above but the URL should point to a file which can be downloaded simply by clicking/using the URL. Relative URLs are not valid here and the URL is expected to include the protocol and host, and typically also a path.
  • UIs, scripts, etc, are expected to be able to use this URL to obtain the file's data directly, using the URL. Even via wget or curl (libcurl) etc.
  • Genboree-stored files: Genboree provides a service for storing raw data files; obviously, these are available via REST API URLs. But Genboree exposes both file metadata and the file contents. When using such a URL as a fileUrl, it is expected you are pointing to the file contents, not the metadata.

Ontology-Related

bioportalTerm(%URL%)

The value for the property must be a term from an ontology exposed by Bioontology.org . Specifically, a term within the ontology sub-tree indicated by %URL%.

Auto-Filling

autoID(prefix, mode, suffix)

The value for this property can be provided or left blank. If provided it must match according to the prefix, mode, and suffix components present in the model; if left blank, those same components will be used to automatically generate a value.
  • The components indicate what the ID string looks like: {prefix}-{variable}-{suffix}
    • prefix - a static string which all values will have as their prefix.
    • mode - A keyword indicating how to generate the variable middle portion of the ID string.
      • uniqAlphaNum indicates the variable portion is a mix of numbers and upper- and lower-case letters.
      • uniqNum indicates the variable portion consists of digits only.
    • suffix - a static string which all values will have as their suffix.
  • For example, our domain might be autoID(Gene, uniqAlphaNum, NGD) and a generated (or provided) ID might be Gene-123aBc-NGD.
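
The {prefix}-{variable}-{suffix} layout above can be sketched as a generator plus a validator. This is a minimal illustration under stated assumptions: the six-character length of the variable portion is taken from the example ID only, the function names are hypothetical, and the sketch does not check existing documents, so it does not by itself guarantee uniqueness.

```python
import re
import secrets
import string

# Characters allowed in the variable portion for each mode, with a matching regexp class.
MODES = {
    "uniqAlphaNum": (string.ascii_letters + string.digits, "[A-Za-z0-9]"),
    "uniqNum": (string.digits, "[0-9]"),
}

def generate_auto_id(prefix: str, mode: str, suffix: str, length: int = 6) -> str:
    """Build an ID like {prefix}-{variable}-{suffix}; `length` is an assumption."""
    alphabet, _ = MODES[mode]
    variable = "".join(secrets.choice(alphabet) for _ in range(length))
    return f"{prefix}-{variable}-{suffix}"

def is_valid_auto_id(value: str, prefix: str, mode: str, suffix: str) -> bool:
    """Check that a provided ID matches the prefix, mode, and suffix components."""
    _, char_class = MODES[mode]
    pattern = f"{re.escape(prefix)}-{char_class}+-{re.escape(suffix)}"
    return re.fullmatch(pattern, value) is not None

print(is_valid_auto_id("Gene-123aBc-NGD", "Gene", "uniqAlphaNum", "NGD"))  # True
print(is_valid_auto_id("Gene-123aBc-NGD", "Gene", "uniqNum", "NGD"))       # False
print(generate_auto_id("Gene", "uniqAlphaNum", "NGD"))  # random, e.g. Gene-x7Qp2L-NGD
```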

pmid

The property value must be an ID from the PubMed database at http://www.ncbi.nlm.nih.gov/pubmed . GenboreeKB can automatically fill in certain PubMed information as sub-properties. Which PubMed information sub-properties are added is dictated by your model--if your model has a recognized sub-property under this pmid property, it can be automatically filled in for you.
  • For example, the value 25759985 refers to an article entitled "Ancestral experience as a game changer in stress vulnerability and disease outcomes." GenboreeKB will automatically query PubMed about this article and then add sub-properties about that article to your document.
  • Recognized sub-properties and aliases:
    • pmid - PMID, pmid
    • citationStr - Citation, citation
    • authorsStr - Authors, authors
    • journalStr - Journal Ref, journal ref, journalRef
    • title - Article Title, Title, article title, title, articleTitle
    • journalTitle - Journal Title, Journal Name, journal title, journal name, journalTitle, journalName
    • journalAbbrv - Journal Abbrv, Journal, journal abbrv, journal, journalAbbrv
    • publicationDate - Publication Date, Pub Date, publication date, publicationDate, pub date, pubDate
    • volumeNumber - Volume, Vol, volume, vol
    • issueNumber - Issue, Iss, issue, iss
    • pages - Pages, Pgs, pages, pgs
    • locationId - Location Id, location id, locationId
    • abstract - Abstract, abstract

omim

The property value must be an ID from the OMIM database at http://omim.org/ . GenboreeKB can automatically fill in certain OMIM information as sub-properties. Which OMIM information sub-properties are added is dictated by your model--if your model has a recognized sub-property under this omim property, it can be automatically filled in for you.
  • For example, the value may be 105400, which is the OMIM identifier for Amyotrophic Lateral Sclerosis (ALS). GenboreeKB will automatically query the OMIM database and then add sub-properties about ALS from OMIM.
    • For example, the "Description", "Clinical Features", "Inheritance", and other aspects of the disease can be stored as sub-properties as they appear at http://omim.org/entry/105400 . Again, which aspects are retrieved and stored for you is dictated by your model.
  • Recognized sub-properties and aliases:
    • title - Title, title
    • otherTitles - Other Titles, Other titles, otherTitles, other titles
    • references - References, references
    • animalModel - Animal Model, Animal model, animalModel, animal model
    • biochemicalFeatures - Biochemical Features
    • clinicalFeatures - Clinical Features
    • clinicalManagement - Clinical Management
    • cloning - Cloning
    • cytogenetics - Cytogenetics
    • description - Description
    • diagnosis - Diagnosis
    • evolution - Evolution
    • geneFamily - Gene Family
    • geneFunction - Gene Function
    • geneStructure - Gene Structure
    • geneTherapy - Gene Therapy
    • geneticVariability - Genetic Variability
    • genotype - Genotype
    • genotypePhenotypeCorrelations - Genotype/Phenotype Correlations
    • heterogeneity - Heterogeneity
    • history - History
    • inheritance - Inheritance
    • mapping - Mapping
    • molecularGenetics - Molecular Genetics
    • nomenclature - Nomenclature
    • otherFeatures - Other Features
    • pathogenesis - Pathogenesis
    • phenotype - Phenotype
    • populationGenetics - Population Genetics
    • text - Text

gbAccount

The property's value must be a Genboree user's account (login) name at that host. GenboreeKB can automatically fill in certain user information as sub-properties. Which user information sub-properties are added is dictated by your model--if your model has a recognized sub-property under this gbAccount property, it can be automatically filled in for you.
  • Recognized sub-properties and aliases:
    • firstName - First Name, First name, first Name, first name, firstName, first_name
    • lastName - Last Name, Last name, last Name, last name, lastName, last_name
    • institution - Institution, institution
    • email - Email, email, Email Address, Email address, email address, email Address, emailAddress, email_address
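
The alias lists above amount to a lookup from the sub-property name used in your model to a canonical field. A minimal Python sketch of that lookup for gbAccount follows. The data structure and the function name are assumptions made for illustration, and exact, case-sensitive matching against the listed aliases is also an assumption.

```python
# Canonical field -> aliases, copied from the gbAccount list above.
GB_ACCOUNT_ALIASES = {
    "firstName": ["First Name", "First name", "first Name", "first name",
                  "firstName", "first_name"],
    "lastName": ["Last Name", "Last name", "last Name", "last name",
                 "lastName", "last_name"],
    "institution": ["Institution", "institution"],
    "email": ["Email", "email", "Email Address", "Email address",
              "email address", "email Address", "emailAddress", "email_address"],
}

# Invert the table once: alias -> canonical field.
ALIAS_TO_CANONICAL = {
    alias: canonical
    for canonical, aliases in GB_ACCOUNT_ALIASES.items()
    for alias in aliases
}

def recognized_sub_properties(model_names):
    """Map each recognized sub-property name in the model to its canonical field."""
    return {name: ALIAS_TO_CANONICAL[name]
            for name in model_names if name in ALIAS_TO_CANONICAL}

# "Phone" is not a recognized alias, so it would not be auto-filled.
print(recognized_sub_properties(["Email Address", "Phone", "last_name"]))
# {'Email Address': 'email', 'last_name': 'lastName'}
```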