caoscrawler.converters.xml_converter module
Converters take structure elements and create Records and new structure elements from them.
- class caoscrawler.converters.xml_converter.XMLFileConverter(definition: dict, name: str, converter_registry: dict)
Bases: SimpleFileConverter
Convert XML files. See https://gitlab.indiscale.com/caosdb/src/caosdb-crawler/-/issues/145 for the current suggestion for the specification.
- create_children(generalStore: GeneralStore, element: StructureElement)
- class caoscrawler.converters.xml_converter.XMLTagConverter(definition: dict, name: str, converter_registry: dict)
Bases: Converter
- create_children(generalStore: GeneralStore, element: StructureElement)
Children generated by this function are the result of the xpath query given in the yaml property xpath. Its default (when not given) is child::*, which selects the direct children of the current xml node. The xpath expression must be designed so that it returns xml tags (and no attributes or texts). This means that the axis attribute:: and the function text() must not be used.
The following yaml properties can be used to generate other types of nodes (text nodes and attribute nodes) as subtree structure elements:
# _*_ marks the default:
attribs_as_children: true  # true / _false_
text_as_children: true  # true / _false_
tags_as_children: true  # _true_ / false
The default is to generate the tags matched by the xpath expression only.
When text_as_children is set to true, text nodes will be generated that contain the text contained in the matched tags.
When attribs_as_children is set to true, attribute nodes will be generated from the attributes of the matched tags.
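The effect of the three options can be sketched with the standard library alone. This is an illustration under stated assumptions, not the crawler's implementation: describe_children is a hypothetical helper, the sample document is made up, and the order in which the real converter emits nodes is not specified here. Note that xml.etree.ElementTree supports only a subset of XPath without axis syntax, so the sketch uses * where full XPath would use child::*.

```python
import xml.etree.ElementTree as ET

# Made-up sample document for illustration only.
XML = """<experiment id="42">
  <sample name="a">first</sample>
  <sample name="b">second</sample>
</experiment>"""

root = ET.fromstring(XML)

# ElementTree has no axis syntax; "*" selects the direct child elements,
# which is what "child::*" selects in full XPath.
matched_tags = root.findall("*")


def describe_children(tags, tags_as_children=True,
                      text_as_children=False, attribs_as_children=False):
    """Hypothetical helper: list the node kinds each option would add."""
    children = []
    for tag in tags:
        if tags_as_children:
            children.append(("tag", tag.tag))
        if text_as_children and tag.text is not None:
            children.append(("text", tag.text))
        if attribs_as_children:
            for key, value in tag.attrib.items():
                children.append(("attribute", f"{key}={value}"))
    return children


# Default behaviour: only the matched tags become children.
print(describe_children(matched_tags))
# Text nodes only.
print(describe_children(matched_tags, tags_as_children=False,
                        text_as_children=True))
```

The flags are independent, so enabling all three would yield a tag entry, a text entry, and one attribute entry per attribute for every matched tag.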
Notes
The default is to take the namespace map from the current node and use it in xpath queries. Because default namespaces cannot be handled by xpath, it is possible to remap the default namespace using the key default_namespace. The key nsmap can be used to define additional nsmap entries.
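Why the remapping is needed can be shown with a small sketch. It assumes a namespace map in which the default namespace is stored under the key None (the convention lxml uses for its nsmap attribute); the map is written out by hand here because xml.etree.ElementTree does not expose one. The function remap_default_namespace, the prefix "default", and the example URIs are all hypothetical and do not reflect the crawler's internals.

```python
import xml.etree.ElementTree as ET

# Made-up document with a default namespace and one prefixed namespace.
XML = """<root xmlns="http://example.com/default"
      xmlns:x="http://example.com/extra">
  <item>one</item>
  <x:item>two</x:item>
</root>"""

root = ET.fromstring(XML)

# Namespace map as it might be read from the node; the default namespace
# has no prefix, represented here by the key None (an assumption).
nsmap = {None: "http://example.com/default", "x": "http://example.com/extra"}


def remap_default_namespace(nsmap, default_namespace):
    """Hypothetical helper: give the default namespace a usable prefix."""
    remapped = {p: uri for p, uri in nsmap.items() if p is not None}
    if None in nsmap:
        remapped[default_namespace] = nsmap[None]
    return remapped


namespaces = remap_default_namespace(nsmap, "default")

# An unprefixed name does not match elements in the default namespace.
unprefixed = root.findall("item")
# With the remapped prefix the elements become addressable.
default_items = root.findall("default:item", namespaces)
extra_items = root.findall("x:item", namespaces)
```

After the remapping, a query must use the chosen prefix for every element that lives in the document's default namespace.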
- typecheck(element: StructureElement)
Check whether the current structure element can be converted using this converter.
- match(element: StructureElement) → dict | None
This method is used to implement detailed checks for matching compatibility of the current structure element with this converter.
The return value is a dictionary providing possible matched variables from the structure element's information.
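The division of labour between typecheck and match can be sketched as follows. TagMatcherSketch is a hypothetical stand-in and not part of caoscrawler: it assumes the element is a plain xml.etree.ElementTree element and that the matched variables come from named groups of a regular expression applied to the tag name. The real converter may derive its variables differently.

```python
import re
import xml.etree.ElementTree as ET
from typing import Optional


class TagMatcherSketch:
    """Hypothetical stand-in illustrating the typecheck/match contract."""

    def __init__(self, tag_pattern: str):
        self.tag_pattern = re.compile(tag_pattern)

    def typecheck(self, element) -> bool:
        # Cheap check: is this kind of element handled at all?
        return isinstance(element, ET.Element)

    def match(self, element) -> Optional[dict]:
        # Detailed check: None means "no match", a dict carries variables.
        if not self.typecheck(element):
            return None
        found = self.tag_pattern.fullmatch(element.tag)
        if found is None:
            return None
        return found.groupdict()


matcher = TagMatcherSketch(r"(?P<kind>sample|probe)")
matched = matcher.match(ET.fromstring("<sample/>"))
not_matched = matcher.match(ET.fromstring("<other/>"))
wrong_type = matcher.match("plain text")
```

Returning None rather than an empty dictionary keeps "did not match" distinct from "matched without capturing any variables".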
- class caoscrawler.converters.xml_converter.XMLTextNodeConverter(definition: dict, name: str, converter_registry: dict)
Bases: Converter
- create_children(generalStore: GeneralStore, element: StructureElement)
This converter does not create children.
- typecheck(element: StructureElement)
Check whether the current structure element can be converted using this converter.
- match(element: StructureElement) → dict | None
This method is used to implement detailed checks for matching compatibility of the current structure element with this converter.
The return value is a dictionary providing possible matched variables from the structure element's information.
- class caoscrawler.converters.xml_converter.XMLAttributeNodeConverter(definition: dict, name: str, converter_registry: dict)
Bases: Converter
- create_children(generalStore: GeneralStore, element: StructureElement)
This converter does not create children.
- typecheck(element: StructureElement)
Check whether the current structure element can be converted using this converter.
- match(element: StructureElement) → dict | None
This method is used to implement detailed checks for matching compatibility of the current structure element with this converter.
The return value is a dictionary providing possible matched variables from the structure element's information.