How to upgrade

0.4.0 to 0.5.0

The crawler was split into two modules: the scanner and the crawler. The scanner creates a Record structure from the data and the crawler synchronizes this with the server. Due to this change you should:

  • Remove the debug argument from the Crawler constructor. For debugging, supply a DebugTree as an argument to the functions of the scanner module instead.

  • Remove the generalStore argument from the Crawler constructor. A store can no longer be provided to the crawler.

  • Use load_definition and initialize_converters from the scanner module; they are no longer part of the crawler.

  • Replace crawl_directory with scan_directory of the scanner module.

  • Replace start_crawling with scan_structure_elements of the scanner module.

0.2.x to 0.3.0

DictElementConverter (old: DictConverter) can now use “match” keywords. If none are in the definition, the behavior is as before. If you had “match”, “match_name” or “match_value” in the definition of a DictConverter (StructureElement: Dict) before, you probably want to remove those. They were ignored before and are now used.
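
As a rough illustration of such a cleanup, consider the sketch below. The node name and the pattern are hypothetical and not taken from a real definition; only the keywords themselves come from the text above. The match keyword is commented out so that the converter keeps matching every dict, as it did before 0.3.0:

    metadata:                      # hypothetical node name
      type: DictElement
      # match_name: ^info$         # hypothetical pattern; ignored before 0.3.0, applied since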

TextElement used the ‘match’ keyword before, which was applied to the value. In the future, this keyword will be applied to the key instead; for now, using it is forbidden. If you used the ‘match’ keyword in the definition of a TextElementConverter (StructureElement: TextElement) before, you need to change the key from “match” to “match_name” in order to preserve the behavior.
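
A minimal sketch of this rename, following the rule stated above. The node name and the pattern are hypothetical; only the keywords and the type name come from this document:

    title:                         # hypothetical node name
      type: TextElement
      # match: ^Title$             # 0.2.x spelling, now forbidden
      match_name: ^Title$          # 0.3.0 spelling, renamed as described above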

The JSONFileConverter was changed such that it creates StructureElements as children corresponding to the content. That is, if there is an Object (dict in Python) in the file, the children will be a list with one DictElement. Before, only JSON files with one Object were accepted, and the key-value pairs of the dict were directly transformed to children. This means that all previously used JSONFileConverters need to introduce a new level for the DictElement:

    json:
      type: JSONFile
      match: metadata.json
      validate: schema/dataset.schema.json
      subtree:
        jsondict:                               # new
          type: DictElement                     # new
          match: .*                             # new
          subtree:                              # new