caoscrawler.validator module

This module contains functions to validate the output of a scanner run against a JSON schema.

caoscrawler.validator.load_json_schema_from_datamodel_yaml(filename: str) → dict[str, dict]

Load a data model YAML file (using caosadvancedtools) and convert each record type into a JSON schema using the json_schema_exporter module.

Parameters:

filename (str) – The filename of the yaml file to load.

Returns:

A dict of JSON schema objects. The keys are the record types for which the schemas are generated.
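To make the return value concrete, the following sketch shows the general shape of such a dictionary. The record type name `Experiment`, its properties, and the exact schema keys are assumptions chosen for illustration; the schemas actually produced by json_schema_exporter may contain different or additional fields.

```python
# Hypothetical shape of the returned dict: record type name -> JSON schema.
# "Experiment" and its properties are invented for illustration only.
schemas = {
    "Experiment": {
        "type": "object",
        "properties": {
            "date": {"type": "string"},
            "temperature": {"type": "number"},
        },
        "required": ["date"],
    },
}

# Look up the schema for one record type by its name.
experiment_schema = schemas["Experiment"]
print(sorted(experiment_schema["properties"]))
```

Because the keys are record type names, the schema for a given record can be found from the name of its parent.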

caoscrawler.validator.representer_ordereddict(dumper, data)

Helper function to be able to represent the converted JSON schema objects correctly as YAML. This representer essentially replaces OrderedDict objects with simple dict objects.

Since Python 3.7, dicts preserve insertion order by default, so the replacement does not lose the key order; see e.g.: https://softwaremaniacs.org/blog/2020/02/05/dicts-ordered/en/

Example of how to use the representer: `yaml.add_representer(OrderedDict, caoscrawler.validator.representer_ordereddict)`
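The ordering argument can be checked with the standard library alone. The sketch below is not the representer itself (which runs inside the YAML dumper); `to_plain_dict` is a hypothetical helper and the data is invented. It only illustrates that replacing OrderedDict objects with plain dicts keeps the key order.

```python
from collections import OrderedDict

# A nested structure similar in spirit to a converted schema (contents invented).
ordered = OrderedDict(
    [("type", "object"), ("properties", OrderedDict([("b", 1), ("a", 2)]))]
)


def to_plain_dict(value):
    """Recursively replace OrderedDict (or any dict) instances with plain dicts."""
    if isinstance(value, dict):
        return {key: to_plain_dict(item) for key, item in value.items()}
    if isinstance(value, list):
        return [to_plain_dict(item) for item in value]
    return value


plain = to_plain_dict(ordered)

# Insertion order survives the conversion on Python 3.7 and later.
print(list(plain))                        # ['type', 'properties']
print(list(plain["properties"]))          # ['b', 'a']
print(type(plain["properties"]) is dict)  # True
```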

caoscrawler.validator.convert_record(record: Record)

Convert a record into a form suitable for validation with jsonschema.

Uses high_level_api.convert_to_python_object. Afterwards, _apply_schema_patches is called recursively to restructure the dictionary so that it matches the current form of the JSON schema.

Arguments:

record: db.Record

The record that is supposed to be converted.
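The idea of recursively patching a converted dictionary can be sketched as follows. The concrete patches applied by _apply_schema_patches are not described here, so the rule used below (dropping entries whose value is None) is purely a placeholder, and `apply_patches` and the sample data are hypothetical.

```python
def apply_patches(value):
    """Hypothetical stand-in: recursively drop dict entries whose value is None."""
    if isinstance(value, dict):
        return {
            key: apply_patches(item)
            for key, item in value.items()
            if item is not None
        }
    if isinstance(value, list):
        return [apply_patches(item) for item in value]
    return value


# A nested dictionary such as might result from converting a record (invented).
converted = {
    "name": "exp_01",
    "description": None,
    "sample": {"name": "s1", "id": None},
    "tags": [{"label": "a", "note": None}],
}

patched = apply_patches(converted)
print(patched)
```

The recursion visits nested dictionaries and lists, so the same rule is applied at every level of the converted record.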

caoscrawler.validator.validate(records: list[Record], schemas: dict[str, dict]) → list[tuple]

Validate a list of records against a dictionary of schemas. The keys of the dictionary are record types and the corresponding values are json schemata associated with that record type. The current implementation assumes that each record that is checked has exactly one parent and raises an error if that is not the case. The schema belonging to a record is identified using the name of the first (and only) parent of the record.

Arguments:

records: list[db.Record]

List of records that will be validated.

schemas: dict[str, dict]

A dictionary of JSON schemas generated using load_json_schema_from_datamodel_yaml.

Returns:

A list of tuples, one element for each record:

  • Index 0: A boolean that indicates whether the schema belonging to the record type of the record matched.

  • Index 1: A validation error if the schema did not match or None otherwise.
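The documented behaviour (one parent per record, schema lookup by parent name, one result tuple per record) can be sketched without the real libraries. Everything below is an assumption for illustration: `FakeRecord` stands in for db.Record, `check_required` is a toy check that replaces real JSON schema validation, and the exception types are arbitrary choices rather than those of caoscrawler.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRecord:
    """Hypothetical stand-in for a db.Record: parent names plus property values."""
    parents: list
    properties: dict = field(default_factory=dict)


def check_required(data, schema):
    """Toy replacement for a JSON schema validator: only checks 'required' keys."""
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"missing required properties: {missing}")


def validate_sketch(records, schemas):
    results = []
    for record in records:
        # Documented behaviour: exactly one parent, otherwise an error is raised.
        if len(record.parents) != 1:
            raise RuntimeError("each record must have exactly one parent")
        # The schema is identified by the name of the single parent.
        schema = schemas[record.parents[0]]
        try:
            check_required(record.properties, schema)
            results.append((True, None))
        except ValueError as err:
            results.append((False, err))
    return results


schemas = {"Experiment": {"type": "object", "required": ["date"]}}
records = [
    FakeRecord(["Experiment"], {"date": "2024-01-01"}),
    FakeRecord(["Experiment"], {"temperature": 21.5}),
]

for index, (ok, error) in enumerate(validate_sketch(records, schemas)):
    print(index, "valid" if ok else f"invalid: {error}")
```

Returning one tuple per record, in input order, lets the caller match each result to its record by index and collect all failures in a single pass.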