How to upgrade
0.8.x to 0.9.0
If you were using the optional HDF5 converter classes, you need to change the package path in your cfood definition from the old paths:
Converters:
  H5Dataset:
    converter: H5DatasetConverter
    package: caoscrawler.hdf5_converter
  H5File:
    converter: H5FileConverter
    package: caoscrawler.hdf5_converter
  H5Group:
    converter: H5GroupConverter
    package: caoscrawler.hdf5_converter
  H5Ndarray:
    converter: H5NdarrayConverter
    package: caoscrawler.hdf5_converter
to the new paths:
Converters:
  H5Dataset:
    converter: H5DatasetConverter
    package: caoscrawler.converters.hdf5_converter
  H5File:
    converter: H5FileConverter
    package: caoscrawler.converters.hdf5_converter
  H5Group:
    converter: H5GroupConverter
    package: caoscrawler.converters.hdf5_converter
  H5Ndarray:
    converter: H5NdarrayConverter
    package: caoscrawler.converters.hdf5_converter
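If many cfood files need this change, the replacement can be scripted. The following is a minimal sketch using only the Python standard library; the function name update_hdf5_package_paths is hypothetical and not part of the crawler package. It assumes that each package entry sits on its own line, as in the definitions above.

```python
import re

OLD_PACKAGE = "caoscrawler.hdf5_converter"
NEW_PACKAGE = "caoscrawler.converters.hdf5_converter"

# Match the old path only when it is the complete value of a "package:" key,
# optionally followed by trailing spaces or a YAML comment.
_PACKAGE_LINE = re.compile(
    r"^([ \t]*package:[ \t]*)" + re.escape(OLD_PACKAGE) + r"([ \t]*(?:#.*)?)$",
    re.MULTILINE,
)


def update_hdf5_package_paths(cfood_text: str) -> str:
    """Return the cfood text with old HDF5 converter package paths replaced.

    Hypothetical helper for illustration; it works on plain text and does
    not parse the YAML, so indentation and comments are left untouched.
    """
    return _PACKAGE_LINE.sub(
        lambda m: m.group(1) + NEW_PACKAGE + m.group(2), cfood_text
    )
```

Applying the function to text that already uses the new paths leaves it unchanged, so it should be safe to run more than once on the same file.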
0.6.x to 0.7.0
If you added Parents to Records at multiple places in the CFood, you must now do this at a single location because this key now overwrites previously set parents.
0.5.x to 0.6.0
#41 was fixed. This means that if you previously used the name of Entities as an identifying property without adding it to the identifiable definition, you now need to add ‘name’ explicitly.
0.4.x to 0.5.0
The crawler was split into two modules: the scanner and the crawler. The scanner creates a Record structure from the data and the crawler synchronizes this with the server. Due to this change you should:
- Remove the debug argument from the Crawler constructor. For debugging, supply a DebugTree as an argument to the functions of the scanner module.
- Remove the generalStore argument from the Crawler constructor. A store can no longer be provided to the crawler.
- load_definition and initialize_converters are now part of the scanner module.
- crawl_directory is replaced by scan_directory of the scanner module.
- start_crawling is replaced by scan_structure_elements of the scanner module.
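To find the places in your own scripts that are affected by these changes, a simple text search may be enough. The sketch below is one possible way to do this with the Python standard library. The function find_outdated_usage is hypothetical and not part of the crawler package, and it only inspects Crawler constructor calls that fit on a single line.

```python
import re

# Renames from the list above: old name -> new name in the scanner module
RENAMED = {
    "crawl_directory": "scan_directory",
    "start_crawling": "scan_structure_elements",
}
# Arguments that were removed from the Crawler constructor
REMOVED_ARGS = ("debug", "generalStore")


def find_outdated_usage(source: str) -> list[str]:
    """Return human-readable hints for pre-0.5.0 usage found in source code.

    Hypothetical helper for illustration; it is a plain text scan and
    does not parse the Python code.
    """
    hints = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for old, new in RENAMED.items():
            if re.search(rf"\b{old}\b", line):
                hints.append(
                    f"line {lineno}: replace {old} with {new} from the scanner module"
                )
        if "Crawler(" in line:
            for arg in REMOVED_ARGS:
                # "debug_tree=" is not matched because "=" must follow the name
                if re.search(rf"\b{arg}\s*=", line):
                    hints.append(
                        f"line {lineno}: remove the {arg} argument from the Crawler constructor"
                    )
    return hints
```

The hints only point to candidate lines; each one still needs to be checked and adapted by hand.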
0.2.x to 0.3.0
DictElementConverter (old: DictConverter) can now use “match” keywords. If none are in the definition, the behavior is as before. If you had “match”, “match_name” or “match_value” in the definition of a DictConverter (StructureElement: Dict) before, you probably want to remove those. They were ignored before and are now used.
TextElement used the ‘match’ keyword before, which was applied to the value. In the future, it will be applied to the key instead, and for now its use is forbidden. If you used the ‘match’ keyword in the definition of a TextElementConverter (StructureElement: TextElement) before, you need to change the key from “match” to “match_name” in order to preserve the behavior.
The JSONFileConverter was changed such that it creates StructureElements as children corresponding to the content. For example, if there is an Object (dict in Python) in the file, the children will be a list with one DictElement. Before, only JSON files with one Object were accepted, and the key-value pairs of the dict were directly transformed to children. This means that all previously used JSONFileConverters need to introduce a new level for the DictElement:
json:
  type: JSONFile
  match: metadata.json
  validate: schema/dataset.schema.json
  subtree:
    jsondict:  # new
      type: DictElement  # new
      match: .*  # new
      subtree:  # new
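The same restructuring can be expressed on a converter definition that has already been loaded into a Python dict. The following is a minimal sketch under that assumption; the function add_dict_element_level and the default key jsondict are hypothetical names chosen to mirror the example above and are not part of the crawler package.

```python
import copy


def add_dict_element_level(converter_def: dict, name: str = "jsondict") -> dict:
    """Return a copy of a JSONFile converter definition in which the old
    subtree is nested under a new DictElement level.

    Hypothetical helper for illustration; the input is left unmodified.
    """
    updated = copy.deepcopy(converter_def)
    old_subtree = updated.get("subtree") or {}
    updated["subtree"] = {
        name: {
            "type": "DictElement",
            "match": ".*",
            # The children that were previously attached to the JSONFile
            # converter directly now belong to the DictElement.
            "subtree": old_subtree,
        }
    }
    return updated
```

The function should be applied only once per definition, since every call adds another DictElement level.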