caosadvancedtools.table_json_conversion package

Submodules

caosadvancedtools.table_json_conversion.convert module

Convert XLSX files to JSON dictionaries.

exception caosadvancedtools.table_json_conversion.convert.ForeignError(*args, definitions: list, message: str = '')

Bases: KeyError

class caosadvancedtools.table_json_conversion.convert.XLSXConverter(xlsx: str | BinaryIO, schema: dict | str | TextIO, strict: bool = False)

Bases: object

Class for conversion from XLSX to JSON.

For a detailed description of the required formatting of the XLSX files, see specs.md in the documentation.

PARSER: dict[str, Callable] = {'boolean': <function _strict_bool>, 'integer': <class 'int'>, 'number': <class 'float'>, 'string': <class 'str'>}
get_error_str() str

Return a beautiful string with the collected errors.

get_errors() dict

Return a dict with collected errors.

to_dict(validate: bool = False, collect_errors: bool = True) dict

Convert the xlsx contents to a dict.

Parameters:
  • validate (bool, default=False) – If True, validate the result against the schema.

  • collect_errors (bool, default=True) – If True, do not fail at the first error, but try to collect as many errors as possible. After an exception is raised, the errors can be collected with get_errors() and printed with get_error_str().

Returns:

out – A dict representing the JSON with the extracted data.

Return type:

dict

caosadvancedtools.table_json_conversion.convert.deep_record_from_json(recdata: dict, schema: dict, parent: str, container: Container | None = None) Container

Create a single Record (and linked references recursivley) from json data.

Parameters:
  • recdata (dict) – The data dictionary for the single Record.

  • schema (dict) – The json schema for this data.

  • parent (str) – The name of the RecordType for this Record.

  • container (db.Container, optional) – If given, use this to collect the generates entities. Not generally necessary when calling this function from the outside.

Returns:

out – The Record and all its references (recursively). The first element is guaranteed to be the top-level element described by data.

Return type:

db.Container

caosadvancedtools.table_json_conversion.convert.records_from_json(data: dict, schema: dict | str | TextIO) Container

Create Records from json data.

Parameters:
  • data (dict) – The data dictionary, as returned by to_dict.

  • schema (dict) – The json schema for this data.

caosadvancedtools.table_json_conversion.convert.to_dict(xlsx: str | BinaryIO, schema: dict | str | TextIO, validate: bool = False, strict: bool = False, collect_errors: bool = True) dict

Convert the xlsx contents to a dict, it must follow a schema.

Parameters:
  • xlsx (Union[str, BinaryIO]) – Path to the XLSX file or opened file object.

  • schema (Union[dict, str, TextIO]) – Schema for validation of XLSX content.

  • validate (bool, default=False) – If True, validate the result against the schema.

  • strict (bool, default=False) – If True, fail faster.

  • collect_errors (bool, default=True) – See XLSXConverter.to_dict() for details.

Returns:

out – A dict representing the JSON with the extracted data.

Return type:

dict

caosadvancedtools.table_json_conversion.export_import_xlsx module

Utilities for automatically exporting and importing data to and from xlsx.

caosadvancedtools.table_json_conversion.export_import_xlsx.export_container_to_xlsx(records: Container, xlsx_data_filepath: str | Path, include_referenced_entities: bool = False, jsonschema_filepath: str | Path | None = None, jsondata_filepath: str | Path | None = None, xlsx_template_filepath: str | Path | None = None)

Export the data of the given records to an xlsx file.

Parameters:
  • records (Container, Iterable) – List of records to export.

  • xlsx_data_filepath (str, Path) – Write the resulting xlsx file to the file at this location.

  • include_referenced_entities (bool) – If set to true, any records referenced by properties of those given in ‘records’ will also be exported. Optional, default False

  • jsonschema_filepath (str, Path) – If given, write the jsonschema to this file. Optional, default None

  • jsondata_filepath (str, Path) – If given, write the json data to this file. Optional, default None

  • xlsx_template_filepath (str, Path) – If given, write the xlsx template to this file. Optional, default None

  • Limitations

  • -----------

  • references (This function drops any versioning information from versioned)

  • reduced (references are)

  • references. (to unversioned)

caosadvancedtools.table_json_conversion.fill_xlsx module

Class and function to fill an XLSX template from actual data.

class caosadvancedtools.table_json_conversion.fill_xlsx.TemplateFiller(workbook: Workbook, graceful: bool = False)

Bases: object

Class to fill XLSX templates. Has an index for all relevant columns.

class Context(current_path: list[str] | None = None, props: dict[str, Any] | None = None)

Bases: object

Context for an entry: simple properties of all ancestors, organized in a dict.

This is similar to a dictionary with all scalar element properties at the tree nodes up to the root. Siblings in lists and dicts are ignored. Additionally the context knows where its current position is.

Lookup of elements can easily be achieved by giving the path (as list[str] or stringified path).

copy() Context

Deep copy.

fill_from_data(data: dict[str, Any])

Fill current level with all scalar elements of data.

next_level(next_level: str) Context

Return a copy of this Context, with the path appended by next_level.

fill_data(data: dict)

Fill the data into the workbook.

property workbook

Return the workbook of this TemplateFiller.

caosadvancedtools.table_json_conversion.fill_xlsx.fill_template(data: dict | str | TextIO, template: str, result: str | Path, validation_schema: dict | str | TextIO | None = None) None

Insert json data into an xlsx file, according to a template.

This function fills the json data into the template stored at template and stores the result as result.

Parameters:
  • data (Union[dict, str, TextIO]) – The data, given as Python dict, path to a file or a file-like object.

  • template (str) – Path to the XLSX template.

  • result (str) – Path for the result XLSX.

  • validation_schema (dict, optional) – If given, validate the date against this schema first. This raises an exception if the validation fails. If no validation schema is given, try to ignore more errors in the data when filling the XLSX template.

  • development (Possible future)

  • ---------------------------

  • option (- Add an allow_missing_foreign)

  • props.get(key). (which would replace props[key] by)

caosadvancedtools.table_json_conversion.table_generator module

This module allows to generate template tables from JSON schemas.

exception caosadvancedtools.table_json_conversion.table_generator.EmptySchemaException

Bases: RuntimeError

Exception for when a schema is empty, so that no meaningful table can be generated.

class caosadvancedtools.table_json_conversion.table_generator.TableTemplateGenerator

Bases: ABC

Base class for generating tables from json schema

abstractmethod generate(schema: dict, foreign_keys: dict, filepath: str, use_ids_as_foreign: bool = False)

Generate a sheet definition from a given JSON schema.

Parameters:

schema: dict

Given JSON schema.

foreign_keys: dict

A tree-like configuration (nested dict) that defines which attributes shall be used to create additional columns when a list of references exists. The nested dict is structured like the data model, its innermost elements are leaves of the path trees within the JSON, they define the required keys.

Suppose we want to distinguish Persons that are referenced by Trainings, then foreign_keys must at least contain the following:
{"Training": {"Person": ["name", "email"]}}.

Values within the dicts can be either a list representing the keys (as in the example above) or a dict that allows to set additional foreign keys at higher depths. In the latter case (dict instead of list) if foreign keys exist at that level (e.g. in the above example there might be further levels below “Person”), then the foreign keys can be set using the special __this__ key.

Example: {"Training": {"__this__": ["date"], "Person": ["name", "email"]}} Here, date is the sole foreign key for Training.

It probably is worth extending the first example, with a case where a “Training” shall be distiguished by the “name” and “email” of a “Person” which it references. The foreign keys for this example are specified like this:
{"Training": {"__this__": [["Person", "name"], ["Person", "email"]]}}
use_ids_as_foreign: bool, optional

If True, use the id (a property named “id”) as foreign key, if the key does not exist in the dict. Default is False.

class caosadvancedtools.table_json_conversion.table_generator.XLSXTemplateGenerator

Bases: TableTemplateGenerator

Class for generating XLSX tables from json schema definitions.

generate(schema: dict, foreign_keys: dict, filepath: str | Path, use_ids_as_foreign: bool = False) None

Generate a sheet definition from a given JSON schema.

Parameters:

schema: dict

Given JSON schema

foreign_keys: dict

A configuration that defines which attributes shall be used to create additional columns when a list of references exists. See foreign_keys argument of TableTemplateGenerator.generate() .

filepath: Union[str, Path]

The XLSX file will be stored under this path.

static normalize_sheet_titles(sheets: dict[str, Any]) dict[str, Any]

Shorten title with more than 31 characters. Return normalized dict.

Specification:

  • Short titles are left unchanged.

  • Titles are unique.

  • Long titles are shortened like this:

    1. Titles are split into parts at the dot character (.)

    2. Parts are reused from the end to front, until at most 26 characters are used up. These parts form the tail (joined by . characters).

    3. A head is added, and also joined by the . character. The head is exactly 4 characters long and consists of these parts:

      • The first character of the first part.

      • An underscore character (_).

      • The lowest two-digit number starting from 01 (so one of 01, 02, …, 98, 99), that leads to no collision with existing titles. If all options would lead to a collision, a OverflowError is thrown.

Parameters:

sheets (dict[str, Any]) – The title -> content dict of sheets.

Returns:

out – Same as the input sheets, but long titles (keys) are shortened in a sensible way.

Return type:

dict[str, Any]

caosadvancedtools.table_json_conversion.xlsx_utils module

General utilities to work with XLSX files with (hidden) column and row annotations and typing.

The most prominent functions are:

  • p2s: Path to string: ["some", "path"] -> "some.path"

  • read_or_dict: Load JSON object from path, file or dict.

This module also defines these enums:

  • ColumnType

  • RowType

class caosadvancedtools.table_json_conversion.xlsx_utils.ColumnType(*values)

Bases: Enum

column types enum

FOREIGN = 3
IGNORE = 5
LIST = 2
MULTIPLE_CHOICE = 4
SCALAR = 1
class caosadvancedtools.table_json_conversion.xlsx_utils.RowType(*values)

Bases: Enum

row types enum

COL_TYPE = 1
IGNORE = 3
PATH = 2
caosadvancedtools.table_json_conversion.xlsx_utils.array_schema_from_model_schema(model_schema: dict) dict

Convert a data model schema to a data array schema.

Practically, this means that the top level properties are converted into lists. In a simplified notation, this can be expressed as:

array_schema = { elem: [elem typed data...] for elem in model_schema }

Parameters:

model_schema (dict) – The schema description of the data model. Must be a json schema object, with a number of object typed properties.

Returns:

array_schema – A corresponding json schema, where the properties are arrays with the types of the input’s top-level properties.

Return type:

dict

caosadvancedtools.table_json_conversion.xlsx_utils.get_column_type_row_index(sheet: Worksheet)

Return the row index (0-indexed) of the row which defines the column types.

caosadvancedtools.table_json_conversion.xlsx_utils.get_data_columns(sheet: Worksheet) dict[str, SimpleNamespace]

Return the data paths of the worksheet.

Returns:

out – The keys are the stringified paths. The values are SimpleNamespace objects with index, path and column attributes.

Return type:

dict[str, SimpleNamespace]

caosadvancedtools.table_json_conversion.xlsx_utils.get_defining_paths(workbook: Workbook) dict[str, list[list[str]]]

For all sheets in workbook, list the paths which they define.

A sheet is said to define a path, if it has data columns for properties inside that path. For example, consider the following worksheet:

COL_TYPE

SCALAR

SCALAR

LIST

SCALAR

PATH

Training

Training

Training

Training

PATH

url

date

subjects

supervisor

PATH

email

example.com/mp

2024-02-27

Math;Physics

steve@example.com

example.com/m

2024-02-27

Math

stella@example.com

This worksheet defines properties for the paths ["Training"] and ["Training", "supervisor"], and thus these two path lists would be returned for the key with this sheet’s sheetname.

Parameters:

workbook (Workbook) – The workbook to analyze.

Returns:

out – A dict with worksheet names as keys and lists of paths (represented as string lists) as values.

Return type:

dict[str, list[list[str]]

caosadvancedtools.table_json_conversion.xlsx_utils.get_foreign_key_columns(sheet: Worksheet) dict[str, SimpleNamespace]

Return the foreign keys of the worksheet.

Returns:

out – The keys are the stringified paths. The values are SimpleNamespace objects with index, path and column attributes.

Return type:

dict[str, SimpleNamespace]

caosadvancedtools.table_json_conversion.xlsx_utils.get_path_position(sheet: Worksheet) tuple[list[str], str]

Return a path which represents the parent element, and the sheet’s “proper name”.

For top-level sheets / entries (those without foreign columns), the path is an empty list.

A sheet’s “proper name” is detected from the data column paths: it is the first component after the parent components.

Returns:

  • parent (list[str]) – Path to the parent element. Note that there may be list elements on the path which are not represented in this return value.

  • proper_name (str) – The “proper name” of this sheet. This defines an array where all the data lives, relative to the parent path.

caosadvancedtools.table_json_conversion.xlsx_utils.get_path_rows(sheet: Worksheet)

Return the 0-based indices of the rows which represent paths.

caosadvancedtools.table_json_conversion.xlsx_utils.get_row_type_column_index(sheet: Worksheet)

Return the column index (0-indexed) of the column which defines the row types.

caosadvancedtools.table_json_conversion.xlsx_utils.get_subschema(path: list[str], schema: dict) dict

Return the sub schema at path.

caosadvancedtools.table_json_conversion.xlsx_utils.get_worksheet_for_path(path: list[str], defining_path_index: dict[str, list[list[str]]]) str

Find the sheet name which corresponds to the given path.

caosadvancedtools.table_json_conversion.xlsx_utils.is_exploded_sheet(sheet: Worksheet) bool

Return True if this is a an “exploded” sheet.

An exploded sheet is a sheet whose data entries are LIST valued properties of entries in another sheet. A sheet is detected as exploded if and only if it has FOREIGN columns.

caosadvancedtools.table_json_conversion.xlsx_utils.is_recursively_none(obj: list | dict | None = None)

Test if obj is None or recursively consists only of None-like objects.

Parameters:

obj (Union[list, dict, None]) – The object to be tested, a json like dict or array, or None.

caosadvancedtools.table_json_conversion.xlsx_utils.next_row_index(sheet: Worksheet) int

Return the index for the next data row.

This is defined as the first row without any content.

caosadvancedtools.table_json_conversion.xlsx_utils.p2s(path: list[str]) str

Path to string: dot-separated.

Special treatment for jumping levels

Starting form the last element, if any element starts with one or more dots “.”, one preceding element will be removed for each dot. For example, ["A", "B1.B2", ".C", "..D.E"] will result in "A.D.E" (the elements B1.B2 and .C are removed by the two leading dots in ..D.E).

caosadvancedtools.table_json_conversion.xlsx_utils.parse_multiple_choice(value: Any) bool

Interpret value as a multiple choice input.

Truthy values are: - The boolean True. - The number “1”. - The (case-insensitive) strings true, wahr, x, , yes, ja, y, j.

Falsy values are: - The boolean False. - None, empty strings, lists, dicts. - The number “0”. - The (case-insensitive) strings false, falsch, -, no, nein, n. - Everything else.

Returns:

out – The interpretation result of value.

Return type:

bool

caosadvancedtools.table_json_conversion.xlsx_utils.prune_none_objects(obj: list | dict) list | dict

Recursively replace elements of obj, that are empty or recursively None, by None.

Parameters:

obj (Union[list, dict]) – The object to be tested, a json like dict or array.

Returns:

  • out (Union[list, dict]) – A deep copy, with the None objects and empty sequences replaced by None.

  • Detailed explanation

  • ——————–

  • - Elements of lists that are recursively None, shall be removed from the list.

  • - Elements that are empty lists / dicts are replaced by None.

caosadvancedtools.table_json_conversion.xlsx_utils.read_or_dict(data: dict | str | TextIO) dict

If data is a json file name or input stream, read data from there. If it is a dict already, just return it.

Module contents