caosadvancedtools.table_json_conversion package

Submodules

caosadvancedtools.table_json_conversion.convert module

Convert XLSX files to JSON dictionaries.

exception caosadvancedtools.table_json_conversion.convert.ForeignError(*args, definitions: list, message: str = '')

Bases: KeyError

class caosadvancedtools.table_json_conversion.convert.XLSXConverter(xlsx: str | BinaryIO, schema: dict | str | TextIO, strict: bool = False)

Bases: object

Class for conversion from XLSX to JSON.

For a detailed description of the required formatting of the XLSX files, see specs.md in the documentation.

PARSER: dict[str, Callable] = {'boolean': <function _strict_bool>, 'integer': <class 'int'>, 'number': <class 'float'>, 'string': <class 'str'>}
get_error_str() str

Return a beautiful string with the collected errors.

get_errors() dict

Return a dict with collected errors.

to_dict(validate: bool = False, collect_errors: bool = True) dict

Convert the xlsx contents to a dict.

Parameters:
  • validate (bool, optional) – If True, validate the result against the schema.

  • collect_errors (bool, optional) – If True, do not fail at the first error, but try to collect as many errors as possible. After an Exception is raised, the errors can be collected with get_errors() and printed with get_error_str().

Returns:

out – A dict representing the JSON with the extracted data.

Return type:

dict

caosadvancedtools.table_json_conversion.convert.to_dict(xlsx: str | BinaryIO, schema: dict | str | TextIO, validate: bool | None = None, strict: bool = False) dict

Convert the xlsx contents to a dict, it must follow a schema.

Parameters:
  • xlsx (Union[str, BinaryIO]) – Path to the XLSX file or opened file object.

  • schema (Union[dict, str, TextIO]) – Schema for validation of XLSX content.

  • validate (bool, optional) – If True, validate the result against the schema.

  • strict (bool, optional) – If True, fail faster.

Returns:

out – A dict representing the JSON with the extracted data.

Return type:

dict

caosadvancedtools.table_json_conversion.fill_xlsx module

Class and function to fill an XLSX template from actual data.

class caosadvancedtools.table_json_conversion.fill_xlsx.TemplateFiller(workbook: Workbook, graceful: bool = False)

Bases: object

Class to fill XLSX templates. Has an index for all relevant columns.

class Context(current_path: list[str] | None = None, props: dict[str, Any] | None = None)

Bases: object

Context for an entry: simple properties of all ancestors, organized in a dict.

This is similar to a dictionary with all scalar element properties at the tree nodes up to the root. Siblings in lists and dicts are ignored. Additionally the context knows where its current position is.

Lookup of elements can easily be achieved by giving the path (as list[str] or stringified path).

copy() Context

Deep copy.

fill_from_data(data: dict[str, Any])

Fill current level with all scalar elements of data.

next_level(next_level: str) Context

Return a copy of this Context, with the path appended by next_level.

fill_data(data: dict)

Fill the data into the workbook.

property workbook

Return the workbook of this TemplateFiller.

caosadvancedtools.table_json_conversion.fill_xlsx.fill_template(data: dict | str | TextIO, template: str, result: str, validation_schema: dict | str | TextIO | None = None) None

Insert json data into an xlsx file, according to a template.

This function fills the json data into the template stored at template and stores the result as result.

Parameters:
  • data (Union[dict, str, TextIO]) – The data, given as Python dict, path to a file or a file-like object.

  • template (str) – Path to the XLSX template.

  • result (str) – Path for the result XLSX.

  • validation_schema (dict, optional) – If given, validate the date against this schema first. This raises an exception if the validation fails. If no validation schema is given, try to ignore more errors in the data when filling the XLSX template.

caosadvancedtools.table_json_conversion.table_generator module

This module allows to generate template tables from JSON schemas.

class caosadvancedtools.table_json_conversion.table_generator.TableTemplateGenerator

Bases: ABC

base class for generating tables from json schema

abstract generate(schema: dict, foreign_keys: dict, filepath: str)

Generate a sheet definition from a given JSON schema.

Parameters:

schema: dict

Given JSON schema.

foreign_keys: dict

A tree-like configuration (nested dict) that defines which attributes shall be used to create additional columns when a list of references exists. The nested dict is structured like the data model, its innermost elements are leaves of the path trees within the JSON, they define the required keys.

Suppose we want to distinguish Persons that are referenced by Trainings, then foreign_keys must at least contain the following:
{"Training": {"Person": ["name", "email"]}}.

Values within the dicts can be either a list representing the keys (as in the example above) or a dict that allows to set additional foreign keys at higher depths. In the latter case (dict instead of list) if foreign keys exist at that level (e.g. in the above example there might be further levels below “Person”), then the foreign keys can be set using the special __this__ key.

Example: {"Training": {"__this__": ["date"], "Person": ["name", "email"]}} Here, date is the sole foreign key for Training.

It probably is worth extending the first example, with a case where a “Training” shall be distiguished by the “name” and “email” of a “Person” which it references. The foreign keys for this example are specified like this:
{"Training": {"__this__": [["Person", "name"], ["Person", "email"]]}}
class caosadvancedtools.table_json_conversion.table_generator.XLSXTemplateGenerator

Bases: TableTemplateGenerator

Class for generating XLSX tables from json schema definitions.

generate(schema: dict, foreign_keys: dict, filepath: str) None

Generate a sheet definition from a given JSON schema.

Parameters:

schema: dict

Given JSON schema

foreign_keys: dict

A configuration that defines which attributes shall be used to create additional columns when a list of references exists. See foreign_keys argument of TableTemplateGenerator.generate() .

filepath: str

The XLSX file will be stored under this path.

caosadvancedtools.table_json_conversion.xlsx_utils module

General utilities to work with XLSX files with (hidden) column and row annotations and typing.

The most prominent functions are:

  • p2s: Path to string: ["some", "path"] -> "some.path"

  • read_or_dict: Load JSON object from path, file or dict.

This module also defines these enums:

  • ColumnType

  • RowType

class caosadvancedtools.table_json_conversion.xlsx_utils.ColumnType(value)

Bases: Enum

column types enum

FOREIGN = 3
IGNORE = 5
LIST = 2
MULTIPLE_CHOICE = 4
SCALAR = 1
class caosadvancedtools.table_json_conversion.xlsx_utils.RowType(value)

Bases: Enum

row types enum

COL_TYPE = 1
IGNORE = 3
PATH = 2
caosadvancedtools.table_json_conversion.xlsx_utils.array_schema_from_model_schema(model_schema: dict) dict

Convert a data model schema to a data array schema.

Practically, this means that the top level properties are converted into lists. In a simplified notation, this can be expressed as:

array_schema = { elem: [elem typed data...] for elem in model_schema }

Parameters:

model_schema (dict) – The schema description of the data model. Must be a json schema object, with a number of object typed properties.

Returns:

array_schema – A corresponding json schema, where the properties are arrays with the types of the input’s top-level properties.

Return type:

dict

caosadvancedtools.table_json_conversion.xlsx_utils.get_data_columns(sheet: Worksheet) dict[str, SimpleNamespace]

Return the data paths of the worksheet.

Returns:

out – The keys are the stringified paths. The values are SimpleNamespace objects with index, path and column attributes.

Return type:

dict[str, SimpleNamespace]

caosadvancedtools.table_json_conversion.xlsx_utils.get_defining_paths(workbook: Workbook) dict[str, list[list[str]]]

For all sheets in workbook, list the paths which they define.

A sheet is said to define a path, if it has data columns for properties inside that path. For example, consider the following worksheet:

COL_TYPE | SCALAR | SCALAR | LIST | SCALAR |
PATH | Training | Training | Training | Training |
PATH | url | date | subjects | supervisor |
PATH | | | | email |

|------------|—————-|---------------|————–|--------------------| | | example.com/mp | 2024-02-27 | Math;Physics | steve@example.com | | | example.com/m | 2024-02-27 | Math | stella@example.com |

This worksheet defines properties for the paths ["Training"] and ["Training", "supervisor"], and thus these two path lists would be returned for the key with this sheet’s sheetname.

Parameters:

workbook (Workbook) – The workbook to analyze.

Returns:

out – A dict with worksheet names as keys and lists of paths (represented as string lists) as values.

Return type:

dict[str, list[list[str]]

caosadvancedtools.table_json_conversion.xlsx_utils.get_foreign_key_columns(sheet: Worksheet) dict[str, SimpleNamespace]

Return the foreign keys of the worksheet.

Returns:

out – The keys are the stringified paths. The values are SimpleNamespace objects with index, path and column attributes.

Return type:

dict[str, SimpleNamespace]

caosadvancedtools.table_json_conversion.xlsx_utils.get_path_position(sheet: Worksheet) tuple[list[str], str]

Return a path which represents the parent element, and the sheet’s “proper name”.

For top-level sheets / entries (those without foreign columns), the path is an empty list.

A sheet’s “proper name” is detected from the data column paths: it is the first component after the parent components.

Returns:

  • parent (list[str]) – Path to the parent element. Note that there may be list elements on the path which are not represented in this return value.

  • proper_name (str) – The “proper name” of this sheet. This defines an array where all the data lives, relative to the parent path.

caosadvancedtools.table_json_conversion.xlsx_utils.get_path_rows(sheet: Worksheet)

Return the 0-based indices of the rows which represent paths.

caosadvancedtools.table_json_conversion.xlsx_utils.get_row_type_column_index(sheet: Worksheet)

Return the column index (0-indexed) of the column which defines the row types.

caosadvancedtools.table_json_conversion.xlsx_utils.get_subschema(path: list[str], schema: dict) dict

Return the sub schema at path.

caosadvancedtools.table_json_conversion.xlsx_utils.get_worksheet_for_path(path: list[str], defining_path_index: dict[str, list[list[str]]]) str

Find the sheet name which corresponds to the given path.

caosadvancedtools.table_json_conversion.xlsx_utils.is_exploded_sheet(sheet: Worksheet) bool

Return True if this is a an “exploded” sheet.

An exploded sheet is a sheet whose data entries are LIST valued properties of entries in another sheet. A sheet is detected as exploded iff it has FOREIGN columns.

caosadvancedtools.table_json_conversion.xlsx_utils.next_row_index(sheet: Worksheet) int

Return the index for the next data row.

This is defined as the first row without any content.

caosadvancedtools.table_json_conversion.xlsx_utils.p2s(path: list[str]) str

Path to string: dot-separated.

caosadvancedtools.table_json_conversion.xlsx_utils.parse_multiple_choice(value: Any) bool

Interpret value as a multiple choice input.

Truthy values are: - The boolean True. - The number “1”. - The (case-insensitive) strings true, wahr, x, , yes, ja, y, j.

Falsy values are: - The boolean False. - None, empty strings, lists, dicts. - The number “0”. - The (case-insensitive) strings false, falsch, -, no, nein, n. - Everything else.

Returns:

out – The interpretation result of value.

Return type:

bool

caosadvancedtools.table_json_conversion.xlsx_utils.read_or_dict(data: dict | str | TextIO) dict

If data is a json file name or input stream, read data from there. If it is a dict already, just return it.

Module contents