caosadvancedtools.table_json_conversion package
Submodules
caosadvancedtools.table_json_conversion.convert module
Convert XLSX files to JSON dictionaries.
- exception caosadvancedtools.table_json_conversion.convert.ForeignError(*args, definitions: list, message: str = '')
Bases:
KeyError
- class caosadvancedtools.table_json_conversion.convert.XLSXConverter(xlsx: str | BinaryIO, schema: dict | str | TextIO, strict: bool = False)
Bases:
objectClass for conversion from XLSX to JSON.
For a detailed description of the required formatting of the XLSX files, see
specs.mdin the documentation.- PARSER: dict[str, Callable] = {'boolean': <function _strict_bool>, 'integer': <class 'int'>, 'number': <class 'float'>, 'string': <class 'str'>}
- to_dict(validate: bool = False, collect_errors: bool = True) dict
Convert the xlsx contents to a dict.
- Parameters:
validate (bool, default=False) – If True, validate the result against the schema.
collect_errors (bool, default=True) – If True, do not fail at the first error, but try to collect as many errors as possible. After an exception is raised, the errors can be collected with
get_errors()and printed withget_error_str().
- Returns:
out – A dict representing the JSON with the extracted data.
- Return type:
- caosadvancedtools.table_json_conversion.convert.deep_record_from_json(recdata: dict, schema: dict, parent: str, container: Container | None = None) Container
Create a single Record (and linked references recursivley) from json data.
- Parameters:
recdata (dict) – The data dictionary for the single Record.
schema (dict) – The json schema for this data.
parent (str) – The name of the RecordType for this Record.
container (db.Container, optional) – If given, use this to collect the generates entities. Not generally necessary when calling this function from the outside.
- Returns:
out – The Record and all its references (recursively). The first element is guaranteed to be the top-level element described by
data.- Return type:
db.Container
- caosadvancedtools.table_json_conversion.convert.records_from_json(data: dict, schema: dict | str | TextIO) Container
Create Records from json data.
- caosadvancedtools.table_json_conversion.convert.to_dict(xlsx: str | BinaryIO, schema: dict | str | TextIO, validate: bool = False, strict: bool = False, collect_errors: bool = True) dict
Convert the xlsx contents to a dict, it must follow a schema.
- Parameters:
xlsx (Union[str, BinaryIO]) – Path to the XLSX file or opened file object.
schema (Union[dict, str, TextIO]) – Schema for validation of XLSX content.
validate (bool, default=False) – If True, validate the result against the schema.
strict (bool, default=False) – If True, fail faster.
collect_errors (bool, default=True) – See
XLSXConverter.to_dict()for details.
- Returns:
out – A dict representing the JSON with the extracted data.
- Return type:
caosadvancedtools.table_json_conversion.export_import_xlsx module
Utilities for automatically exporting and importing data to and from xlsx.
- caosadvancedtools.table_json_conversion.export_import_xlsx.export_container_to_xlsx(records: Container, xlsx_data_filepath: str | Path, include_referenced_entities: bool = False, jsonschema_filepath: str | Path | None = None, jsondata_filepath: str | Path | None = None, xlsx_template_filepath: str | Path | None = None)
Export the data of the given records to an xlsx file.
- Parameters:
records (Container, Iterable) – List of records to export.
xlsx_data_filepath (str, Path) – Write the resulting xlsx file to the file at this location.
include_referenced_entities (bool) – If set to true, any records referenced by properties of those given in ‘records’ will also be exported. Optional, default False
jsonschema_filepath (str, Path) – If given, write the jsonschema to this file. Optional, default None
jsondata_filepath (str, Path) – If given, write the json data to this file. Optional, default None
xlsx_template_filepath (str, Path) – If given, write the xlsx template to this file. Optional, default None
Limitations
-----------
references (This function drops any versioning information from versioned)
reduced (references are)
references. (to unversioned)
caosadvancedtools.table_json_conversion.fill_xlsx module
Class and function to fill an XLSX template from actual data.
- class caosadvancedtools.table_json_conversion.fill_xlsx.TemplateFiller(workbook: Workbook, graceful: bool = False)
Bases:
objectClass to fill XLSX templates. Has an index for all relevant columns.
- class Context(current_path: list[str] | None = None, props: dict[str, Any] | None = None)
Bases:
objectContext for an entry: simple properties of all ancestors, organized in a dict.
This is similar to a dictionary with all scalar element properties at the tree nodes up to the root. Siblings in lists and dicts are ignored. Additionally the context knows where its current position is.
Lookup of elements can easily be achieved by giving the path (as
list[str]or stringified path).
- property workbook
Return the workbook of this TemplateFiller.
- caosadvancedtools.table_json_conversion.fill_xlsx.fill_template(data: dict | str | TextIO, template: str, result: str | Path, validation_schema: dict | str | TextIO | None = None) None
Insert json data into an xlsx file, according to a template.
This function fills the json data into the template stored at
templateand stores the result asresult.- Parameters:
data (Union[dict, str, TextIO]) – The data, given as Python dict, path to a file or a file-like object.
template (str) – Path to the XLSX template.
result (str) – Path for the result XLSX.
validation_schema (dict, optional) – If given, validate the date against this schema first. This raises an exception if the validation fails. If no validation schema is given, try to ignore more errors in the data when filling the XLSX template.
development (Possible future)
---------------------------
option (- Add an allow_missing_foreign)
props.get(key). (which would replace props[key] by)
caosadvancedtools.table_json_conversion.table_generator module
This module allows to generate template tables from JSON schemas.
- exception caosadvancedtools.table_json_conversion.table_generator.EmptySchemaException
Bases:
RuntimeErrorException for when a schema is empty, so that no meaningful table can be generated.
- class caosadvancedtools.table_json_conversion.table_generator.TableTemplateGenerator
Bases:
ABCBase class for generating tables from json schema
- abstractmethod generate(schema: dict, foreign_keys: dict, filepath: str, use_ids_as_foreign: bool = False)
Generate a sheet definition from a given JSON schema.
Parameters:
- schema: dict
Given JSON schema.
- foreign_keys: dict
A tree-like configuration (nested dict) that defines which attributes shall be used to create additional columns when a list of references exists. The nested dict is structured like the data model, its innermost elements are leaves of the path trees within the JSON, they define the required keys.
Suppose we want to distinguish Persons that are referenced by Trainings, thenforeign_keysmust at least contain the following:{"Training": {"Person": ["name", "email"]}}.Values within the dicts can be either a list representing the keys (as in the example above) or a dict that allows to set additional foreign keys at higher depths. In the latter case (dict instead of list) if foreign keys exist at that level (e.g. in the above example there might be further levels below “Person”), then the foreign keys can be set using the special
__this__key.Example:
{"Training": {"__this__": ["date"], "Person": ["name", "email"]}}Here,dateis the sole foreign key for Training.It probably is worth extending the first example, with a case where a “Training” shall be distiguished by the “name” and “email” of a “Person” which it references. The foreign keys for this example are specified like this:{"Training": {"__this__": [["Person", "name"], ["Person", "email"]]}}- use_ids_as_foreign: bool, optional
If True, use the id (a property named “id”) as foreign key, if the key does not exist in the dict. Default is False.
- class caosadvancedtools.table_json_conversion.table_generator.XLSXTemplateGenerator
Bases:
TableTemplateGeneratorClass for generating XLSX tables from json schema definitions.
- generate(schema: dict, foreign_keys: dict, filepath: str | Path, use_ids_as_foreign: bool = False) None
Generate a sheet definition from a given JSON schema.
Parameters:
- schema: dict
Given JSON schema
- foreign_keys: dict
A configuration that defines which attributes shall be used to create additional columns when a list of references exists. See
foreign_keysargument ofTableTemplateGenerator.generate().- filepath: Union[str, Path]
The XLSX file will be stored under this path.
- static normalize_sheet_titles(sheets: dict[str, Any]) dict[str, Any]
Shorten title with more than 31 characters. Return normalized dict.
Specification:
Short titles are left unchanged.
Titles are unique.
Long titles are shortened like this:
Titles are split into parts at the dot character (
.)Parts are reused from the end to front, until at most 26 characters are used up. These parts form the tail (joined by
.characters).A head is added, and also joined by the
.character. The head is exactly 4 characters long and consists of these parts:The first character of the first part.
An underscore character (
_).The lowest two-digit number starting from
01(so one of 01, 02, …, 98, 99), that leads to no collision with existing titles. If all options would lead to a collision, a OverflowError is thrown.
caosadvancedtools.table_json_conversion.xlsx_utils module
General utilities to work with XLSX files with (hidden) column and row annotations and typing.
The most prominent functions are:
p2s: Path to string:["some", "path"] -> "some.path"read_or_dict: Load JSON object from path, file or dict.
This module also defines these enums:
ColumnType
RowType
- class caosadvancedtools.table_json_conversion.xlsx_utils.ColumnType(*values)
Bases:
Enumcolumn types enum
- FOREIGN = 3
- IGNORE = 5
- LIST = 2
- MULTIPLE_CHOICE = 4
- SCALAR = 1
- class caosadvancedtools.table_json_conversion.xlsx_utils.RowType(*values)
Bases:
Enumrow types enum
- COL_TYPE = 1
- IGNORE = 3
- PATH = 2
- caosadvancedtools.table_json_conversion.xlsx_utils.array_schema_from_model_schema(model_schema: dict) dict
Convert a data model schema to a data array schema.
Practically, this means that the top level properties are converted into lists. In a simplified notation, this can be expressed as:
array_schema = { elem: [elem typed data...] for elem in model_schema }- Parameters:
model_schema (dict) – The schema description of the data model. Must be a json schema object, with a number of object typed properties.
- Returns:
array_schema – A corresponding json schema, where the properties are arrays with the types of the input’s top-level properties.
- Return type:
- caosadvancedtools.table_json_conversion.xlsx_utils.get_column_type_row_index(sheet: Worksheet)
Return the row index (0-indexed) of the row which defines the column types.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_data_columns(sheet: Worksheet) dict[str, SimpleNamespace]
Return the data paths of the worksheet.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_defining_paths(workbook: Workbook) dict[str, list[list[str]]]
For all sheets in
workbook, list the paths which they define.A sheet is said to define a path, if it has data columns for properties inside that path. For example, consider the following worksheet:
COL_TYPE
SCALAR
SCALAR
LIST
SCALAR
PATH
Training
Training
Training
Training
PATH
url
date
subjects
supervisor
PATH
email
example.com/mp
2024-02-27
Math;Physics
example.com/m
2024-02-27
Math
This worksheet defines properties for the paths
["Training"]and["Training", "supervisor"], and thus these two path lists would be returned for the key with this sheet’s sheetname.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_foreign_key_columns(sheet: Worksheet) dict[str, SimpleNamespace]
Return the foreign keys of the worksheet.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_path_position(sheet: Worksheet) tuple[list[str], str]
Return a path which represents the parent element, and the sheet’s “proper name”.
For top-level sheets / entries (those without foreign columns), the path is an empty list.
A sheet’s “proper name” is detected from the data column paths: it is the first component after the parent components.
- Returns:
parent (list[str]) – Path to the parent element. Note that there may be list elements on the path which are not represented in this return value.
proper_name (str) – The “proper name” of this sheet. This defines an array where all the data lives, relative to the parent path.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_path_rows(sheet: Worksheet)
Return the 0-based indices of the rows which represent paths.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_row_type_column_index(sheet: Worksheet)
Return the column index (0-indexed) of the column which defines the row types.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_subschema(path: list[str], schema: dict) dict
Return the sub schema at
path.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_worksheet_for_path(path: list[str], defining_path_index: dict[str, list[list[str]]]) str
Find the sheet name which corresponds to the given path.
- caosadvancedtools.table_json_conversion.xlsx_utils.is_exploded_sheet(sheet: Worksheet) bool
Return True if this is a an “exploded” sheet.
An exploded sheet is a sheet whose data entries are LIST valued properties of entries in another sheet. A sheet is detected as exploded if and only if it has FOREIGN columns.
- caosadvancedtools.table_json_conversion.xlsx_utils.is_recursively_none(obj: list | dict | None = None)
Test if
objis None or recursively consists only of None-like objects.
- caosadvancedtools.table_json_conversion.xlsx_utils.next_row_index(sheet: Worksheet) int
Return the index for the next data row.
This is defined as the first row without any content.
- caosadvancedtools.table_json_conversion.xlsx_utils.p2s(path: list[str]) str
Path to string: dot-separated.
Special treatment for jumping levels
Starting form the last element, if any element starts with one or more dots “.”, one preceding element will be removed for each dot. For example,
["A", "B1.B2", ".C", "..D.E"]will result in"A.D.E"(the elementsB1.B2and.Care removed by the two leading dots in..D.E).
- caosadvancedtools.table_json_conversion.xlsx_utils.parse_multiple_choice(value: Any) bool
Interpret
valueas a multiple choice input.Truthy values are: - The boolean
True. - The number “1”. - The (case-insensitive) stringstrue,wahr,x,√,yes,ja,y,j.Falsy values are: - The boolean
False. -None, empty strings, lists, dicts. - The number “0”. - The (case-insensitive) stringsfalse,falsch,-,no,nein,n. - Everything else.- Returns:
out – The interpretation result of
value.- Return type:
- caosadvancedtools.table_json_conversion.xlsx_utils.prune_none_objects(obj: list | dict) list | dict
Recursively replace elements of
obj, that are empty or recursively None, by None.- Parameters:
obj (Union[list, dict]) – The object to be tested, a json like dict or array.
- Returns:
out (Union[list, dict]) – A deep copy, with the None objects and empty sequences replaced by None.
Detailed explanation
——————–
- Elements of lists that are recursively None, shall be removed from the list.
- Elements that are empty lists / dicts are replaced by None.