Data Model Schema

Preamble

A data model describes what a valid stored document will look like.
  • A data model is required for all GenboreeKB collections.
    • A GenboreeKB can have multiple document collections, and each will have its own model.
  • The model defines key things like:
    • What are the valid property names? (case sensitive!)
    • What is the domain or "type" information for the value of a property?
    • Does a given property have any sub-properties? If so, define them.
    • Or does a given property have a homogeneous list/array/set of sub-properties instead?
    • Do you need to index the property to enable faster searching?
  • For a given property, the data model defines things like:
    • Is the property required to be present?
    • Is the property the record identifier? (exactly 1 property must be tagged as the record identifier)
    • Should the value be unique? (no other documents in the collection shall have the same value for this property)
    • Is the property's value fixed/static? (i.e. can't be altered; good for fixed "categories" and "headings")

The Property Definition

Each property will have a definition that describes valid use of the property in a document. A model thus contains a set of property definitions.

Summary of Key Fields in a Property Definition

Illustrated below are some of the key fields that can be present in a property definition.
  • The second column indicates the DEFAULT value for the field if you don't provide it. Often, the default is appropriate!
{
    // ---- Commonly used fields ----
    "name"        : NO_DEFAULT,  // [Required] Contains the name of the property.
    "domain"      : "string",    // Keyword indicating domain/type-information for the property's value.
    "identifier"  : false,       // Is this property the "document identifier"? Exactly 1 property must have "identifier"==true.
    "required"    : false,       // Is this property required in a valid document, or can it be left out?

    // ---- Slightly more advanced fields ----
    "category"    : false,       // Are you using this property more as a category or header, for organizing documents nicely? 
    "fixed"       : false,       // Is the value fixed/static? i.e. defined in the model? (Often used with categories)
    "default"     : NO_DEFAULT,  // If there are any default values to be used for that property, define them here.
    "unique"      : false,       // If the property value should be unique in the entire collection, specify it here.
    "description" : "",          // Provide a useful description about the property. This description will be shown as a tool tip in the UI.
    "index"       : false,       // Would you like to index this property, so searches can be faster?

    // ---- Mutually exclusive fields regarding sub-properties [if any] ----
    "properties"  : null,        // If the property has its own sub-properties, define them here. 
    "items"       : null         // If the property has an open list of 0+ sub-properties, define them here.
                                 // Property definitions in "items" must be SINGLY-ROOTED, but of course can be nested.
}
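
As a rough illustration of how the defaults in the summary above combine with what you provide, here is a minimal Python sketch. The names FIELD_DEFAULTS and with_defaults are hypothetical and not part of GenboreeKB; the default values are copied from the summary, and the handling of "identifier" follows the rule that it implies "unique" and "required".

```python
# Defaults copied from the summary above.
# "name" and "default" have no default, so they are not listed here.
FIELD_DEFAULTS = {
    "domain": "string",
    "identifier": False,
    "required": False,
    "category": False,
    "fixed": False,
    "unique": False,
    "description": "",
    "index": False,
    "properties": None,
    "items": None,
}


def with_defaults(prop_def):
    """Return a copy of prop_def with every omitted field set to its default.

    Hypothetical helper for illustration only.
    """
    if "name" not in prop_def:
        raise ValueError('"name" is required and has no default')
    filled = {**FIELD_DEFAULTS, **prop_def}
    if filled["identifier"]:
        # An identifier implies unique and required when they are not given.
        for implied in ("unique", "required"):
            if implied not in prop_def:
                filled[implied] = True
    return filled


minimal = with_defaults({"name": "subPropName1"})
print(minimal["domain"], minimal["required"])  # prints: string False
```

In other words, a definition that only provides "name" describes an optional, non-unique, non-indexed string property with no sub-properties.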

Detailed Descriptions of Model-Related Property Metadata

Key Fields

(core/key fields you should be aware of and use/consider often)
  • "name" - [String] The property name. Can have spaces, special chars (if escaped where needed). Case sensitive.
    • Predefined in the data model (!!)
    • New properties/attributes cannot be provided at data entry/submission time. All properties must be in the model.
  • "domain" - [Keyword string] The domain or type-information for the property's value.
    • Must be one of the known domain specifier Strings (see Domain Specifier Keywords).
    • You often want to provide this. If not provided, the default domain "string" will be used.
  • "default" - [(various)] The default value, if any.
    • Must be either a value from the value domain or the null value (not text "null") for no default.
    • Documents won't be accepted as valid if they fail to provide values for properties with null defaults. It just means no suggested value is provided by default.
  • "identifier" - [Boolean] Is the property the document identifier?
    • ALL MODELS MUST HAVE EXACTLY 1 doc identifier PROPERTY i.e. the unique name by which you refer to the document.
    • Implies "unique"=true. There is no need to provide "unique" for the property; providing "unique"=false for an identifier property is an error.
    • Implies "required"=true. There is no need to provide "required" for the property; providing "required"=false for an identifier property is an error.
  • "required" - [Boolean] Whether the property is required to be present.
    • i.e. must be filled in or provided when data is submitted.
    • Note that you can have required sub-properties under a non-required property. This just means "IF this property is provided, it MUST ALSO have certain sub-properties".
  • "properties" - [Array] An Array of sub-property definitions for this property, if any.
    • i.e. the properties-of-this-property. This is where you define the content-related property metadata for this property.
    • Mutually-exclusive [currently] with "items".
  • "items" - [Array] For properties which have a list of subordinate properties, this is an Array of sub-property definitions for the items stored in the list.
    • All items (i.e. properties; i.e. ~sub-documents) in the items list MUST BE SINGLY ROOTED PROPERTY DEFINITIONS, i.e. there is one top-level property, and it may have any number of sub-properties as usual (or none).
    • The list is homogeneous, and this is where you define the kind of "sub-document" (or kind of "property") the list contains.
    • Mutually-exclusive [currently] with "properties".
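
The structural rules in the list above can be checked mechanically. The following Python sketch is one possible reading of them, not the GenboreeKB validator: check_definition, count_identifiers and check_model are hypothetical names, the model is assumed to be a top-level array of property definitions, and "singly rooted" is interpreted as "the items array holds exactly one definition".

```python
def check_definition(prop_def, errors, path=""):
    """Append rule violations for one property definition and its descendants."""
    name = prop_def.get("name")
    here = f"{path}/{name}"
    if not isinstance(name, str) or not name:
        errors.append(f'{here}: "name" is required')
    if prop_def.get("identifier"):
        # An identifier implies unique and required; an explicit false is an error.
        for field in ("unique", "required"):
            if prop_def.get(field) is False:
                errors.append(f'{here}: identifier cannot have "{field}" = false')
    props, items = prop_def.get("properties"), prop_def.get("items")
    if props is not None and items is not None:
        errors.append(f'{here}: "properties" and "items" are mutually exclusive')
    if items is not None and len(items) != 1:
        # Assumption: singly rooted means exactly one top-level definition.
        errors.append(f'{here}: "items" must hold one singly-rooted definition')
    for child in (props or []) + (items or []):
        check_definition(child, errors, here)


def count_identifiers(prop_defs):
    """Count definitions tagged as identifier anywhere in the model."""
    total = 0
    for prop_def in prop_defs:
        total += 1 if prop_def.get("identifier") else 0
        total += count_identifiers(prop_def.get("properties") or [])
        total += count_identifiers(prop_def.get("items") or [])
    return total


def check_model(model):
    """Return a list of problems; an empty list means no rule was broken."""
    errors = []
    for prop_def in model:
        check_definition(prop_def, errors)
    if count_identifiers(model) != 1:
        errors.append("model must have exactly 1 identifier property")
    return errors


# Hypothetical models, for illustration only.
good = [{"name": "accession", "identifier": True,
         "properties": [{"name": "samples",
                         "items": [{"name": "sampleId", "unique": True}]}]}]
bad = [{"name": "accession", "identifier": True, "required": False,
        "properties": [], "items": [{"name": "a"}, {"name": "b"}]}]

print(check_model(good))
for problem in check_model(bad):
    print(problem)
```

The "good" model passes, while the "bad" one is reported for an identifier with "required" = false, for providing both "properties" and "items", and for an "items" array with two roots.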

Ancillary Fields

(more specialized fields supporting specific cases or providing more advanced information)
  • "description" - [String] Description of the property.
    • For documentation, best practices, informative models, reminders to self (8 months later), communication with others, etc.
    • Strongly recommended best-practice.
    • But completely optional.
    • Note: may be used for tooltips/popups in UIs.
  • "unique" - [Boolean] Whether the value for this property is unique.
    • For "unique" properties not in an "items" list, no other document in the collection can have the same value for this property.
    • For "unique" properties which are within an "items" list, the scope is restricted to the list; i.e. no other sub-document in the list can have the same value for the property. Very useful for properties which act as item "identifiers" in the list!
    • Search and get-by-unique property functionality of UIs, APIs, etc. can leverage unique fields (otherwise, saving and using formal Queries will be needed, with much more overhead).
  • "category" - [Boolean] Whether the property is the special case of a category.
    • Category properties are a specific case that is expected to be handled differently than normal in other representations.
      • Mundane example: Category property names & values may be rendered in bold when presented in UIs and as HTML, etc.
      • Information example: Category properties are actually tag/categorization attributes for the properties immediately underneath the category. The "tag" analogy is good here. Some data exports (e.g. RDF) may not keep the category as a hierarchical "header" but rather as subordinate statements about the property(ies) within the category.
    • Most commonly used together with "fixed=true", but not strictly required to be.
  • "fixed" - [Boolean] Is the property's value fixed/static/unmodifiable? i.e. defined and fixed by the model?
    • Most commonly used together with "category", but not strictly required to be.
  • "index" - [Boolean] If appropriate, should an index be built on this property to speed up common (!!) searches?
    • This is actually a "hint" for the infrastructure and may or may not result in an actual index in the underlying storage engine.
    • Best impact will be when used together with "unique"=true. Not needed when "identifier"=true, because a property definition with that field (and associated value) is indexed by default.
    • Don't over-index! If you do, storage space will be consumed very quickly and insert times will become very long. Be judicious.
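
The two scopes of "unique" described above can be illustrated with a small Python sketch. The documents below are deliberately simplified plain dictionaries, not the GenboreeKB document format, and the names duplicates, accession, samples and sampleId are hypothetical.

```python
from collections import Counter


def duplicates(values):
    """Return the values that occur more than once, in first-seen order."""
    counts = Counter(values)
    return [value for value in counts if counts[value] > 1]


# Simplified stand-ins for two documents in one collection.
docs = [
    {"accession": "DOC-1", "samples": [{"sampleId": "s1"}, {"sampleId": "s2"}]},
    {"accession": "DOC-2", "samples": [{"sampleId": "s1"}, {"sampleId": "s1"}]},
]

# Collection scope: a unique property outside any "items" list
# is compared across all documents in the collection.
collection_clashes = duplicates(doc["accession"] for doc in docs)

# List scope: a unique property inside an "items" list
# is compared only against the other items in the same list.
list_clashes = {
    doc["accession"]: duplicates(item["sampleId"] for item in doc["samples"])
    for doc in docs
}

print(collection_clashes)  # prints: []
print(list_clashes)        # prints: {'DOC-1': [], 'DOC-2': ['s1']}
```

Note that "s1" appearing in both DOC-1 and DOC-2 is not a clash, because the scope of a unique property inside a list is that list only; the repeated "s1" within DOC-2 is.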

A Data Model Is a Singly Rooted Nested Collection of Property Definitions

We've seen property definitions on their own above. A document will likely have many properties & sub-properties, all needing definitions.
  • The full data model document will contain all of these various property definitions.
  • Each document model has a single root property, which is the document identifier.

Illustrative Example/Template

// DOCUMENT MODEL
// * At the top level, it's an array/list of the top-level properties
[
  // Definition of the top-level document identifier property
  {
    "name"        : "nameOfIdentifierProperty",
    "domain"      : "string",
    "identifier"  : true,
    "properties" :
    [
      {
        "name"    : "subPropName1 - an optional String"
      },
      {
        "name"    : "subPropName2 - an optional Boolean",
        "domain"  : "boolean"
      }
      // ...
    ]
  }
]
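
To see the nesting of the template above at a glance, the following Python sketch parses a copy of it and prints an indented outline. It is an illustration under a few assumptions: the // comments are removed because standard JSON parsers do not accept them, the sub-property names are shortened, and the outline function is a hypothetical helper, not part of GenboreeKB.

```python
import json

# The template above, with comments removed and names shortened.
MODEL_JSON = """
[
  {
    "name": "nameOfIdentifierProperty",
    "domain": "string",
    "identifier": true,
    "properties": [
      {"name": "subPropName1"},
      {"name": "subPropName2", "domain": "boolean"}
    ]
  }
]
"""


def outline(prop_defs, depth=0):
    """Return one indented line per property: name, domain, identifier flag."""
    lines = []
    for prop_def in prop_defs:
        # "string" is the default domain when none is provided.
        domain = prop_def.get("domain", "string")
        flag = " [identifier]" if prop_def.get("identifier") else ""
        lines.append(f'{"  " * depth}{prop_def["name"]} ({domain}){flag}')
        # "properties" and "items" are mutually exclusive, so at most one is set.
        children = prop_def.get("properties") or prop_def.get("items") or []
        lines.extend(outline(children, depth + 1))
    return lines


model = json.loads(MODEL_JSON)
print("\n".join(outline(model)))
```

The output shows the single root (the document identifier) with its two optional sub-properties indented beneath it.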