How to upgrade

0.8.x to 0.9.0

If you were using the optional HDF5 converter classes, you need to change the package path in your cfood definition from the old paths:

Converters:
  H5Dataset:
    converter: H5DatasetConverter
    package: caoscrawler.hdf5_converter
  H5File:
    converter: H5FileConverter
    package: caoscrawler.hdf5_converter
  H5Group:
    converter: H5GroupConverter
    package: caoscrawler.hdf5_converter
  H5Ndarray:
    converter: H5NdarrayConverter
    package: caoscrawler.hdf5_converter

to the new paths:

Converters:
  H5Dataset:
    converter: H5DatasetConverter
    package: caoscrawler.converters.hdf5_converter
  H5File:
    converter: H5FileConverter
    package: caoscrawler.converters.hdf5_converter
  H5Group:
    converter: H5GroupConverter
    package: caoscrawler.converters.hdf5_converter
  H5Ndarray:
    converter: H5NdarrayConverter
    package: caoscrawler.converters.hdf5_converter

0.6.x to 0.7.0

If you added Parents to Records at multiple places in the CFood, you must now set them at a single location, because the parents key now overwrites previously set parents instead of adding to them.
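
As an illustration, a Record definition that sets all of its parents in one place might look like the following sketch. The node name DataDir and the names Experiment and Measurement are hypothetical placeholders, not part of any shipped cfood:

    DataDir:                    # hypothetical node name
      type: Directory
      match: data
      records:
        Experiment:             # hypothetical Record
          parents:              # list every parent here, in one place
            - Experiment
            - Measurement

Since a later parents entry for the same Record overwrites the earlier one, splitting this list across several nodes would lose parents.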

0.5.x to 0.6.0

#41 was fixed. This means that if you previously used the name of Entities as an identifying property without adding it to the identifiable definition, you now need to add ‘name’ explicitly.
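
For example, assuming the identifiable definition is a YAML file that maps each RecordType to a list of identifying properties, an entry that relies on the name could look like this sketch. The RecordType Person and the property date_of_birth are hypothetical; only the explicit name entry matters here:

    Person:                 # hypothetical RecordType
      - name                # must now be listed explicitly
      - date_of_birth       # hypothetical additional property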

0.4.x to 0.5.0

The crawler was split into two modules: the scanner and the crawler. The scanner creates a Record structure from the data and the crawler synchronizes this with the server. Due to this change you should:

  • Remove the debug argument from the Crawler constructor. For debugging, supply a DebugTree as an argument to the scanner functions instead.

  • Remove the generalStore argument from the Crawler constructor. A store can no longer be provided to the crawler.

  • load_definition and initialize_converters are now part of the scanner module.

  • crawl_directory is replaced by scan_directory of the scanner module.

  • start_crawling is replaced by scan_structure_elements of the scanner module.

0.2.x to 0.3.0

DictElementConverter (old: DictConverter) can now use “match” keywords. If none are in the definition, the behavior is as before. If you had “match”, “match_name” or “match_value” in the definition of a DictConverter (StructureElement: Dict) before, you probably want to remove those. They were ignored before and are now used.
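
As a sketch of what this means in practice, a DictElement definition that should keep its pre-0.3.0 behavior simply carries no match keys. The node name settings and the Record Settings are hypothetical:

    settings:                   # hypothetical node name
      type: DictElement
      # no match, match_name or match_value here, so the behavior is as before
      records:
        Settings:               # hypothetical Record
          parents:
            - Settings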

TextElement used the ‘match’ keyword before, which was applied to the value. In the future, ‘match’ will be applied to the key instead, and using it is now forbidden. If you used the ‘match’ keyword in the definition of a TextElementConverter (StructureElement: TextElement) before, you need to change the key from “match” to “match_value” in order to preserve the behavior.

The JSONFileConverter was changed such that it creates StructureElements as children corresponding to the content. That is, if there is an Object (dict in Python) in the file, the children will be a list with one DictElement. Before, only JSON files with one Object were accepted, and the key-value pairs of the dict were directly transformed to children. This means that all existing JSONFileConverter definitions need to introduce a new level for the DictElement:

    json:
      type: JSONFile
      match: metadata.json
      validate: schema/dataset.schema.json
      subtree:
        jsondict:                               # new
          type: DictElement                     # new
          match: .*                             # new
          subtree:                              # new