YAML data model specification
The caosadvancedtools library features the possibility to create and update CaosDB models using a simplified definition in YAML format.
Let’s start with an example taken from model.yml in the library sources.
Project:
  obligatory_properties:
    projectId:
      datatype: INTEGER
      description: 'UID of this project'
Person:
  recommended_properties:
    firstName:
      datatype: TEXT
      description: 'first name'
    lastName:
      datatype: TEXT
      description: 'last name'
LabbookEntry:
  recommended_properties:
    Project:
    entryId:
      datatype: INTEGER
      description: 'UID of this entry'
    responsible:
      datatype: Person
      description: 'the person responsible for these notes'
    textElement:
      datatype: TEXT
      description: 'a text element of a labbook recording'
    associatedFile:
      datatype: FILE
      description: 'A file associated with this recording'
    table:
      datatype: FILE
      description: 'A table document associated with this recording'
extern:
  - Textfile
This example defines 3 RecordTypes:
- A Project with one obligatory property, projectId
- A Person with a firstName and a lastName (as recommended properties)
- A LabbookEntry with multiple recommended properties of different data types
It is assumed that the server knows a RecordType or Property with the name Textfile.
One major advantage of using this interface (in contrast to the standard python interface) is that properties can be defined and added to record types “on-the-fly”. E.g. the three lines for firstName as sub entries of Person have two effects on CaosDB:
- A new property with name firstName, datatype TEXT and description “first name” is inserted (or updated, if already present) into CaosDB.
- The new property is added as a recommended property to record type Person.
Any further occurrences of firstName in the yaml file will reuse the definition provided for Person.
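The two effects described above can be pictured with a small Python sketch. It is not the caosadvancedtools implementation: it assumes the YAML file has already been loaded into nested dictionaries, and the names `SECTION_IMPORTANCE` and `expand` are hypothetical helpers introduced only for illustration.

```python
# Illustrative sketch only, not the library's parser.
# Maps each property section keyword to the importance it assigns.
SECTION_IMPORTANCE = {
    "obligatory_properties": "obligatory",
    "recommended_properties": "recommended",
    "suggested_properties": "suggested",
}


def expand(model):
    """Split a loaded model into property definitions and memberships."""
    definitions = {}  # property name -> attributes of its first definition
    memberships = []  # (record type, property name, importance)
    for record_type, body in model.items():
        if not isinstance(body, dict):
            continue  # skip non-mapping entries such as the 'extern' list
        for section, importance in SECTION_IMPORTANCE.items():
            for name, attributes in (body.get(section) or {}).items():
                # Effect 1: the first occurrence with attributes defines the property.
                if attributes and name not in definitions:
                    definitions[name] = dict(attributes)
                # Effect 2: the property is attached to the record type.
                memberships.append((record_type, name, importance))
    return definitions, memberships


model = {
    "Person": {
        "recommended_properties": {
            "firstName": {"datatype": "TEXT", "description": "first name"},
        }
    }
}
definitions, memberships = expand(model)
print(definitions)
print(memberships)
```

Here the single firstName entry yields one property definition and one membership of Person, mirroring the two effects listed above.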
Note the difference between the following three property declarations:
- Project: This record type is added directly as a property of LabbookEntry. Therefore it does not specify any further attributes. Compare to the original declaration of record type Project.
- responsible: This defines and adds a property with name “responsible” to LabbookEntry, which has the datatype Person. Person is defined above.
- firstName: This defines and adds a property with the standard data type TEXT to record type Person.
If the data model depends on record types or properties which already exist in LinkAhead, those can be added using the extern keyword: extern takes a list of previously defined names of Properties and/or RecordTypes. Note that if you happen to use an already existing REFERENCE property that has an already existing RecordType as datatype, you also need to add that RecordType’s name to the extern list, e.g.,
extern:
  # Let's assume the following is a reference property with datatype Person
  - Author
  # We need Person (since it's the datatype of Author) even though we might
  # not use it explicitly
  - Person
Dataset:
  recommended_properties:
    Author:
Reusing Properties
Properties defined once (either as a property of a RecordType or as a separate Property) can be reused later in the yaml file. Every occurrence after the first must be listed without attributes; otherwise the reuse would conflict with the original definition.
Example:
Project:
  obligatory_properties:
    projectId:
      datatype: INTEGER
      description: 'UID of this project'
    date:
      datatype: DATETIME
      description: Date of a project or an experiment
Experiment:
  obligatory_properties:
    experimentId:
      datatype: INTEGER
      description: 'UID of this experiment'
    date: # no further attributes here, since property was defined above in 'Project'!
The above example defines two RecordTypes: Project and Experiment.
The property date is defined upon its first occurrence as a property of Project.
Later, the same property is also added to Experiment, where no additional attributes may be specified.
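The reuse rule can be checked mechanically. The following Python sketch is an assumption-laden illustration, not part of caosadvancedtools: it presumes the yaml file has been loaded into nested dictionaries, and `find_redefinitions` is a hypothetical helper that reports properties which carry attributes on a later occurrence.

```python
# Illustrative sketch only, not the library's own validation.
PROPERTY_SECTIONS = (
    "obligatory_properties",
    "recommended_properties",
    "suggested_properties",
)


def find_redefinitions(model):
    """Return (record type, property) pairs that repeat a property with attributes."""
    seen = set()
    conflicts = []
    for record_type, body in model.items():
        if not isinstance(body, dict):
            continue  # skip non-mapping entries such as the 'extern' list
        for section in PROPERTY_SECTIONS:
            for name, attributes in (body.get(section) or {}).items():
                # A later occurrence must be empty (None) to count as a reuse.
                if name in seen and attributes:
                    conflicts.append((record_type, name))
                seen.add(name)
    return conflicts


valid_model = {
    "Project": {"obligatory_properties": {
        "projectId": {"datatype": "INTEGER"},
        "date": {"datatype": "DATETIME"},
    }},
    "Experiment": {"obligatory_properties": {
        "experimentId": {"datatype": "INTEGER"},
        "date": None,  # reused, no attributes
    }},
}
conflicting_model = {
    "Project": {"obligatory_properties": {"date": {"datatype": "DATETIME"}}},
    "Experiment": {"obligatory_properties": {"date": {"datatype": "TEXT"}}},
}
print(find_redefinitions(valid_model))
print(find_redefinitions(conflicting_model))
```

The first model mirrors the example above and produces no conflicts; the second repeats date with a different datatype and is flagged.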
Datatypes
You can use any data type understood by CaosDB as datatype attribute in the yaml model.
List attributes are a bit special:
datatype: LIST<DOUBLE>
would declare a list datatype of DOUBLE elements.
datatype: LIST<Project>
would declare a list of elements with datatype Project.
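To make the list notation concrete, here is a minimal Python sketch that separates a datatype string into a list flag and an element type. The function `split_datatype` and the regular expression are hypothetical and only mirror the `LIST<...>` notation shown above; they are not how caosadvancedtools itself interprets the attribute.

```python
import re

# Matches the LIST<...> notation with a single, non-nested element type.
LIST_PATTERN = re.compile(r"^LIST<(?P<element>[^<>]+)>$")


def split_datatype(datatype):
    """Return (is_list, element_type) for a datatype string from the yaml model."""
    text = datatype.strip()
    match = LIST_PATTERN.match(text)
    if match:
        return True, match.group("element").strip()
    return False, text


for value in ("LIST<DOUBLE>", "LIST<Project>", "TEXT"):
    print(value, split_datatype(value))
```

The element type may be a standard datatype such as DOUBLE or the name of a RecordType such as Project, exactly as in the two examples above.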
Keywords
importance: Importance of this entity. Possible values: “recommended”, “obligatory”, “suggested”
datatype: The datatype of this property, e.g. TEXT, INTEGER or Project.
unit: The unit of the property, e.g. “m/s”.
description: A description for this entity.
recommended_properties: Add properties to this entity with importance “recommended”.
obligatory_properties: Add properties to this entity with importance “obligatory”.
suggested_properties: Add properties to this entity with importance “suggested”.
inherit_from_XXX: This keyword accepts a list of other RecordTypes. Those RecordTypes are added as parents, and all Properties with at least the importance XXX are inherited. For example, inherit_from_recommended will inherit all Properties of importance recommended and obligatory, but not suggested.
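The “at least the importance XXX” rule can be sketched in Python as follows. This is only an illustration of the ordering described above, not the library's code; the helper names and the simplified parent representation (a mapping from property name to importance) are assumptions.

```python
# Illustrative sketch only. Importance levels ordered from weakest to strongest.
IMPORTANCE_ORDER = ["suggested", "recommended", "obligatory"]


def inherited_importances(threshold):
    """Return all importance levels that are at least as strong as `threshold`."""
    start = IMPORTANCE_ORDER.index(threshold)
    return IMPORTANCE_ORDER[start:]


def inherited_properties(parent_properties, threshold):
    """Select the parent's property names that would be inherited.

    `parent_properties` maps property name -> importance (hypothetical,
    simplified stand-in for a parent RecordType).
    """
    allowed = set(inherited_importances(threshold))
    return [name for name, importance in parent_properties.items()
            if importance in allowed]


parent = {
    "projectId": "obligatory",
    "date": "recommended",
    "comment": "suggested",
}
print(inherited_properties(parent, "recommended"))
```

With the threshold “recommended”, the obligatory and recommended properties are selected while the suggested one is left out, matching the inherit_from_recommended example.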
Usage
You can use the yaml parser directly in python as follows:
from caosadvancedtools.models import parser
model = parser.parse_model_from_yaml("model.yml")
This creates a DataModel object containing all entities defined in the yaml file.
If the parsed data model shall be appended to a pre-existing data model, the optional existing_model argument can be used:
new_model = parser.parse_model_from_yaml("model.yml", existing_model=old_model)
You can now use the functions from DataModel to synchronize the model with a CaosDB instance, e.g.:
model.sync_data_model()