caoscrawler.converters.xml_converter module

Converters take structure elements and create Records and new structure elements from them.

class caoscrawler.converters.xml_converter.XMLFileConverter(definition: dict, name: str, converter_registry: dict)

Bases: SimpleFileConverter

Convert XML files. See https://gitlab.indiscale.com/caosdb/src/caosdb-crawler/-/issues/145 for the currently proposed specification.

create_children(generalStore: GeneralStore, element: StructureElement)

class caoscrawler.converters.xml_converter.XMLTagConverter(definition: dict, name: str, converter_registry: dict)

Bases: Converter

create_children(generalStore: GeneralStore, element: StructureElement)

The children generated by this function are the results of the XPath query given in the YAML property xpath. If xpath is not given, it defaults to child::*, which selects the direct children of the current XML node. The XPath expression must return XML tags only, not attributes or text, so the attribute:: axis and the text() function must not be used.
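As an illustration of what the default query selects, here is a minimal sketch using the standard library's xml.etree.ElementTree instead of the crawler itself. ElementTree only supports a subset of XPath and has no axis syntax, so the sketch uses *, the abbreviated XPath form of child::*. The sample document and its tag names are invented for this example.

```python
import xml.etree.ElementTree as ET

# Invented sample document, not taken from the crawler's test data.
XML = """
<experiment id="42">
  <sample name="A">first</sample>
  <sample name="B">second</sample>
  <note>checked</note>
</experiment>
""".strip()

root = ET.fromstring(XML)

# "*" is the abbreviated XPath form of "child::*": all direct child elements.
default_children = [child.tag for child in root.findall("*")]

# A narrower query still returns elements (tags), never attributes or text.
samples = [child.tag for child in root.findall("sample")]

print(default_children)
print(samples)
```

Both queries return element objects. The id attribute of the root and the text inside the tags are not part of either result, which mirrors the restriction described above.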

The following yaml properties can be used to generate other types of nodes (text nodes and attribute nodes) as subtree structure elements:

# _*_ marks the default:
attribs_as_children: true  # true / _false_
text_as_children: true  # true / _false_
tags_as_children: true  # _true_ / false

By default, only the tags matched by the xpath expression are generated.

  • When text_as_children is set to true, text nodes are generated that hold the text content of the matched tags.

  • When attribs_as_children is set to true, attribute nodes will be generated from the attributes of the matched tags.
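The effect of the three switches can be sketched roughly as follows. This is not the converter's implementation: collect_children is a hypothetical helper built on xml.etree.ElementTree, the (kind, name, value) tuples stand in for the real structure elements, and skipping empty text is an assumption made for this example.

```python
import xml.etree.ElementTree as ET


def collect_children(node, xpath="*", tags_as_children=True,
                     text_as_children=False, attribs_as_children=False):
    """Hypothetical helper: list (kind, name, value) tuples for matched tags."""
    children = []
    for tag in node.findall(xpath):
        if tags_as_children:
            # The matched tag itself becomes a child.
            children.append(("tag", tag.tag, None))
        if text_as_children and tag.text and tag.text.strip():
            # The text contained in the matched tag becomes a text node.
            children.append(("text", tag.tag, tag.text.strip()))
        if attribs_as_children:
            # Each attribute of the matched tag becomes an attribute node.
            for key, value in tag.attrib.items():
                children.append(("attribute", key, value))
    return children


root = ET.fromstring('<run><step id="1" mode="fast">warm up</step></run>')

tags_only = collect_children(root)
everything = collect_children(root, text_as_children=True,
                              attribs_as_children=True)
print(tags_only)
print(everything)
```

With the defaults only the step tag is listed. Enabling the other two switches adds one text entry and one entry per attribute of that tag.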

Notes

By default, the namespace map of the current node is used in xpath queries. Because xpath cannot handle default namespaces (an unprefixed name in a query refers to no namespace), the default namespace can be remapped using the key default_namespace. The key nsmap can be used to define additional namespace map entries.
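The namespace problem can be reproduced with the standard library alone. The sketch below uses xml.etree.ElementTree rather than the crawler, and the namespace URIs and the prefix "default" are invented for the example. It shows that an unprefixed query finds nothing in a document with a default namespace, and that binding that namespace to an explicit prefix makes the tags reachable, which is the idea behind default_namespace and nsmap.

```python
import xml.etree.ElementTree as ET

# Invented sample document with a default namespace and one prefixed namespace.
XML = """
<data xmlns="http://example.org/default" xmlns:m="http://example.org/meta">
  <item>1</item>
  <item>2</item>
  <m:info>extra</m:info>
</data>
""".strip()

root = ET.fromstring(XML)

# An unprefixed name in the query means "no namespace", so nothing matches.
unprefixed = root.findall("item")

# Bind the default namespace to an explicit (hypothetical) prefix and add a
# further entry, analogous to the default_namespace and nsmap keys.
nsmap = {
    "default": "http://example.org/default",
    "m": "http://example.org/meta",
}
items = root.findall("default:item", nsmap)
info = root.findall("m:info", nsmap)

print(len(unprefixed), len(items), info[0].text)
```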

typecheck(element: StructureElement)

Check whether the current structure element can be converted using this converter.

match(element: StructureElement) → dict | None

This method implements detailed checks of whether the current structure element is compatible with this converter.

The return value is a dictionary of the variables matched from the structure element's information, or None if the element does not match.
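The dict | None contract can be pictured with a small stand-alone sketch. It assumes, purely for illustration, that matching is done with a regular expression whose named groups become the matched variables. match_tag, the pattern, and the tag names are hypothetical and not part of the crawler's API.

```python
import re
from typing import Optional


def match_tag(tag_name: str, pattern: str) -> Optional[dict]:
    """Hypothetical matcher: return named groups on a full match, else None."""
    m = re.fullmatch(pattern, tag_name)
    if m is None:
        # No match: the element is not handled by this converter.
        return None
    # Match: the named groups are the variables made available downstream.
    return m.groupdict()


matched = match_tag("sample_17", r"sample_(?P<number>\d+)")
unmatched = match_tag("note", r"sample_(?P<number>\d+)")
print(matched)
print(unmatched)
```

Returning None rather than an empty dictionary keeps "no match" distinguishable from "matched, but no variables were captured".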

class caoscrawler.converters.xml_converter.XMLTextNodeConverter(definition: dict, name: str, converter_registry: dict)

Bases: Converter

create_children(generalStore: GeneralStore, element: StructureElement)

This converter does not create children.

typecheck(element: StructureElement)

Check whether the current structure element can be converted using this converter.

match(element: StructureElement) → dict | None

This method implements detailed checks of whether the current structure element is compatible with this converter.

The return value is a dictionary of the variables matched from the structure element's information, or None if the element does not match.

class caoscrawler.converters.xml_converter.XMLAttributeNodeConverter(definition: dict, name: str, converter_registry: dict)

Bases: Converter

create_children(generalStore: GeneralStore, element: StructureElement)

This converter does not create children.

typecheck(element: StructureElement)

Check whether the current structure element can be converted using this converter.

match(element: StructureElement) → dict | None

This method implements detailed checks of whether the current structure element is compatible with this converter.

The return value is a dictionary of the variables matched from the structure element's information, or None if the element does not match.