CFood-Definition
================

The crawler specification is called CFood-definition. It is stored in a yaml file or, more precisely, in one or two yaml documents inside a yaml file.

The specification consists of three separate parts:

#. Metadata and macro definitions
#. Custom converter registrations
#. The converter tree specification

In the simplest case, there is just one yaml file with a single document that includes at least the converter tree specification (see :ref:`example 1<example_1>`). Additionally, the custom converter part may also be included in this single document (for historical reasons, see :ref:`example 2<example_2>`), but it is recommended to include it in a separate document together with the metadata and :doc:`macro` definitions (see :ref:`below<example_4>`).

If metadata and macro definitions are provided, there **must** be a second document preceding the converter tree specification that contains these definitions.

It is highly recommended to specify the version of the CaosDB crawler for which the cfood is written in the metadata section, see :ref:`below<example_4>`.

Examples
++++++++

A single document with a converter tree specification:

.. _example_1:
.. code-block:: yaml

   extroot:
     type: Directory
     match: ^extroot$
     subtree:
       DataAnalysis:
         type: Directory
         match: DataAnalysis
         # (...)

A single document with a converter tree specification, but also including a custom converters section:

.. _example_2:
.. code-block:: yaml

   Converters:
     CustomConverter_1:
       package: mypackage.converters
       converter: CustomConverter1
     CustomConverter_2:
       package: mypackage.converters
       converter: CustomConverter2

   extroot:
     type: Directory
     match: ^extroot$
     subtree:
       DataAnalysis:
         type: Directory
         match: DataAnalysis
         # (...)

A yaml multi-document, defining metadata and some macros in the first document and declaring two custom converters in the second document (**not recommended**, see the recommended version :ref:`below<example_4>`).
Note that two separate yaml documents can be defined using the ``---`` syntax:

.. _example_3:
.. code-block:: yaml

   ---
   metadata:
     name: Datascience CFood
     description: CFood for data from the local data science work group
     crawler-version: 0.2.1
     macros:
     - !defmacro
       name: SimulationDatasetFile
       params:
         match: null
         recordtype: null
         nodename: null
       definition:
         # (...)
   ---
   Converters:
     CustomConverter_1:
       package: mypackage.converters
       converter: CustomConverter1
     CustomConverter_2:
       package: mypackage.converters
       converter: CustomConverter2

   extroot:
     type: Directory
     match: ^extroot$
     subtree:
       DataAnalysis:
         type: Directory
         match: DataAnalysis
         # (...)

The **recommended way** of defining metadata, custom converters, macros and the main cfood specification is shown in the following code example:

.. _example_4:
.. code-block:: yaml

   ---
   metadata:
     name: Datascience CFood
     description: CFood for data from the local data science work group
     crawler-version: 0.2.1
     macros:
     - !defmacro
       name: SimulationDatasetFile
       params:
         match: null
         recordtype: null
         nodename: null
       definition:
         # (...)
     Converters:
       CustomConverter_1:
         package: mypackage.converters
         converter: CustomConverter1
       CustomConverter_2:
         package: mypackage.converters
         converter: CustomConverter2
   ---
   extroot:
     type: Directory
     match: ^extroot$
     subtree:
       DataAnalysis:
         type: Directory
         match: DataAnalysis
         # (...)

List Mode
---------

When specifying property values, two special characters can be used to automatically create lists or multi properties instead of single values:

.. code-block:: yaml

   Experiment1:
     Measurement: +Measurement  # Element in List (list is cleared before run)
                  *Measurement  # Multi Property (properties are removed before run)
                  Measurement   # Overwrite

File Entities
-------------

In order to use File Entities, you must set the appropriate ``role: File``. Additionally, the path and file keys have to be given, with values that set the paths remotely and locally, respectively.
You can use the variable ``<converter name>_path`` that is automatically created by converters that deal with file system related StructureElements. The file object itself is stored in a variable with the same name (as is the case for other Records).

.. code-block:: yaml

   somefile:
     type: SimpleFile
     match: ^params.*$  # match any file that starts with "params"
     records:
       fileEntity:
         role: File           # necessary to create a File Entity
         path: somefile.path  # defines the path in CaosDB
         file: somefile.path  # path where the file is found locally
       SomeRecord:
         ParameterFile: $fileEntity  # creates a reference to the file

Transform Functions
-------------------

You can use transform functions to alter variable values that the crawler consumes (e.g. a string that was matched with a regular expression). See :doc:`Converter Documentation`.

You can define your own transform functions by adding them the same way you add custom converters:

.. code-block:: yaml

   Transformers:
     transform_foo:
       package: some.package
       function: some_foo

Automatically generated keys
++++++++++++++++++++++++++++

Some variable names are automatically generated and can be referenced with the ``$`` syntax. Those include:

- ``<converter name>``: access the path of converter names to the current converter
- ``<converter name>.path``: the file system path to the structure element (file system related converters only; you need curly brackets to use them: ``${<converter name>.path}``)
- ``<Record key>``: all entities that are created in the ``records`` section are available under the same key
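
As a rough illustration of the keys listed above, the following fragment is a minimal sketch and not taken from a real cfood. The record names ``Analysis`` and ``Report`` and the property names ``folder`` and ``analysis`` are hypothetical; only the ``$`` and ``${...}`` reference forms come from the description above.

.. code-block:: yaml

   DataAnalysis:
     type: Directory
     match: DataAnalysis
     records:
       Analysis:                        # hypothetical record
         folder: ${DataAnalysis.path}   # hypothetical property; ".path" keys need curly brackets
       Report:                          # hypothetical record
         analysis: $Analysis            # hypothetical property; refers to the record by its key

Here ``${DataAnalysis.path}`` uses the converter name ``DataAnalysis`` with the ``.path`` suffix, while ``$Analysis`` refers to the record defined under the same key in the ``records`` section.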