Tutorial: Parameter File
========================

Our data
--------

In the "HelloWorld" example, the Record that was synchronized with the server
was created "manually" using the Python client. Now we want to have a look at
how the crawler can be told to do this for us. The crawler needs instructions
on what kind of Records it should create given the data that it sees. This is
done using so-called "CFood" YAML files.

Let's once again start with something simple. A common scenario is that we
want to insert the contents of a parameter file. Suppose the parameter file is
named ``params_2022-02-02.json`` and looks like the following:

.. code-block:: json
   :caption: params_2022-02-02.json

   {
     "frequency": 0.5,
     "resolution": 0.01
   }

Suppose these are two Properties of an Experiment and the date in the file
name is the date of the Experiment. Thus, the data model could be described in
a ``model.yml`` like this:

.. code-block:: yaml
   :caption: model.yml

   Experiment:
     recommended_properties:
       frequency:
         datatype: DOUBLE
       resolution:
         datatype: DOUBLE
       date:
         datatype: DATETIME

We will identify Experiments solely by their date, so the ``identifiable.yml``
is:

.. code-block:: yaml
   :caption: identifiable.yml

   Experiment:
     - date

Getting started with the CFood
------------------------------

CFoods (crawler configurations) can be stored in YAML files. The following
section in a ``cfood.yml`` tells the crawler that the key-value pair
``frequency: 0.5`` shall be used to set the Property "frequency" of an
"Experiment" Record:

.. code:: yaml

   ...
   my_frequency:                  # just the name of this section
     type: FloatElement           # it is a float value
     match_name: ^frequency$      # regular expression: match the 'frequency' key from the JSON data
     match_value: ^(?P<value>.*)$ # regular expression: we match any value of that key
     records:
       Experiment:
         frequency: $value
   ...
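The matching performed by this section can be tried out in isolation with
Python's ``re`` module. This is a standalone sketch for illustration only,
not the crawler's actual code; the key and value stand in for the data
element the crawler encounters:

```python
import re

# The same patterns as in the CFood section above.
match_name = re.compile(r"^frequency$")
match_value = re.compile(r"^(?P<value>.*)$")

key, value = "frequency", "0.5"

if match_name.match(key):
    m = match_value.match(value)
    # The named group "value" is what $value refers to in the CFood.
    print(m.group("value"))  # prints: 0.5
```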
The first part of this section defines which kind of data element shall be
handled (here: a key-value pair with key "frequency" and a float value), and
then we use this to set the "frequency" Property.

How does the value actually get assigned? Let's look at what the regular
expressions do:

- ``^frequency$`` ensures that the key is exactly "frequency". "^" matches
  the beginning of the string and "$" matches the end.
- ``^(?P<value>.*)$`` creates a *named match group* with the name "value";
  the pattern of this group is ".*". The dot matches any character and the
  star means that it can occur zero, one, or multiple times. Thus, this
  regular expression matches anything and puts it in a group with the name
  ``value``.

We can use the groups from the regular expressions that are used for
matching. In our example, we use the "value" group to assign the "frequency"
value to the "Experiment".

A fully grown CFood
-------------------

Since we will not pass this key-value pair on its own to the crawler, we need
to embed it into its context. The full CFood file ``cfood.yml`` for this
example might look like the following:

.. code-block:: yaml
   :caption: cfood.yml

   ---
   metadata:
     crawler-version: 0.5.0
   ---
   directory:                # corresponds to the directory given to the crawler
     type: Directory
     match: .*               # we do not care how it is named here
     subtree:
       parameterfile:        # corresponds to our parameter file
         type: JSONFile
         match: params_(?P<date>\d+-\d+-\d+)\.json  # extract the date from the parameter file
         records:
           Experiment:       # one Experiment is associated with the file
             date: $date     # the date is taken from the file name
         subtree:
           dict:             # the JSON contains a dictionary
             type: Dict
             match: .*       # the dictionary does not have a meaningful name
             subtree:
               my_frequency: # here we parse the frequency ...
                 type: FloatElement
                 match_name: frequency
                 match_value: (?P<val>.*)
                 records:
                   Experiment:
                     frequency: $val
               resolution:   # ... and here the resolution
                 type: FloatElement
                 match_name: resolution
                 match_value: (?P<val>.*)
                 records:
                   Experiment:
                     resolution: $val

You do not need to understand every aspect of this right now; we will cover
it later in greater depth. You might think: "Ohh, this is lengthy." Well,
yes, BUT this is a very generic approach that allows data integration from
ANY hierarchical data structure (directory trees, JSON, YAML, HDF5, DICOM,
... and combinations of those!), and as you will see in later chapters,
there are ways to write this in a more condensed form.

For now, we want to see it running!

The crawler can now be run with the following command (assuming that the
CFood file is in the current working directory):

.. code:: sh

   caosdb-crawler -s update -i identifiable.yml cfood.yml .
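Before running the crawler, the file-name pattern from the ``JSONFile``
section can likewise be checked in isolation with Python's ``re`` module.
Again, this is only an illustrative sketch of what the named group captures;
the real matching is done internally by the crawler:

```python
import re

# The file-name pattern from the CFood; the named group "date"
# is what $date refers to in the Experiment record.
pattern = re.compile(r"params_(?P<date>\d+-\d+-\d+)\.json")

m = pattern.match("params_2022-02-02.json")
print(m.group("date"))  # prints: 2022-02-02
```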