caosadvancedtools.table_json_conversion package
Submodules
caosadvancedtools.table_json_conversion.convert module
Convert XLSX files to JSON dictionaries.
- exception caosadvancedtools.table_json_conversion.convert.ForeignError(*args, definitions: list, message: str = '')
Bases:
KeyError
- class caosadvancedtools.table_json_conversion.convert.XLSXConverter(xlsx: str | BinaryIO, schema: dict | str | TextIO, strict: bool = False)
Bases:
object
Class for conversion from XLSX to JSON.
For a detailed description of the required formatting of the XLSX files, see
specs.md
in the documentation.- PARSER: dict[str, Callable] = {'boolean': <function _strict_bool>, 'integer': <class 'int'>, 'number': <class 'float'>, 'string': <class 'str'>}
- to_dict(validate: bool = False, collect_errors: bool = True) dict
Convert the xlsx contents to a dict.
- Parameters:
validate (bool, optional) – If True, validate the result against the schema.
collect_errors (bool, optional) – If True, do not fail at the first error, but try to collect as many errors as possible. After an Exception is raised, the errors can be collected with
get_errors()
and printed withget_error_str()
.
- Returns:
out – A dict representing the JSON with the extracted data.
- Return type:
- caosadvancedtools.table_json_conversion.convert.to_dict(xlsx: str | BinaryIO, schema: dict | str | TextIO, validate: bool | None = None, strict: bool = False) dict
Convert the xlsx contents to a dict, it must follow a schema.
- Parameters:
- Returns:
out – A dict representing the JSON with the extracted data.
- Return type:
caosadvancedtools.table_json_conversion.fill_xlsx module
Class and function to fill an XLSX template from actual data.
- class caosadvancedtools.table_json_conversion.fill_xlsx.TemplateFiller(workbook: Workbook, graceful: bool = False)
Bases:
object
Class to fill XLSX templates. Has an index for all relevant columns.
- class Context(current_path: list[str] = None, props: dict[str, Any] = None)
Bases:
object
Context for an entry: simple properties of all ancestors, organized in a dict.
This is similar to a dictionary with all scalar element properties at the tree nodes up to the root. Siblings in lists and dicts are ignored. Additionally the context knows where its current position is.
Lookup of elements can easily be achieved by giving the path (as
list[str]
or stringified path).
- property workbook
Return the workbook of this TemplateFiller.
- caosadvancedtools.table_json_conversion.fill_xlsx.fill_template(data: dict | str | TextIO, template: str, result: str, validation_schema: dict | str | TextIO = None) None
Insert json data into an xlsx file, according to a template.
This function fills the json data into the template stored at
template
and stores the result asresult
.- Parameters:
data (Union[dict, str, TextIO]) – The data, given as Python dict, path to a file or a file-like object.
template (str) – Path to the XLSX template.
result (str) – Path for the result XLSX.
validation_schema (dict, optional) – If given, validate the date against this schema first. This raises an exception if the validation fails. If no validation schema is given, try to ignore more errors in the data when filling the XLSX template.
caosadvancedtools.table_json_conversion.table_generator module
This module allows to generate template tables from JSON schemas.
- class caosadvancedtools.table_json_conversion.table_generator.TableTemplateGenerator
Bases:
ABC
base class for generating tables from json schema
- abstract generate(schema: dict, foreign_keys: dict, filepath: str)
Generate a sheet definition from a given JSON schema.
Parameters:
- schema: dict
Given JSON schema.
- foreign_keys: dict
A tree-like configuration (nested dict) that defines which attributes shall be used to create additional columns when a list of references exists. The nested dict is structured like the data model, its innermost elements are leaves of the path trees within the JSON, they define the required keys.
Suppose we want to distinguish Persons that are referenced by Trainings, thenforeign_keys
must at least contain the following:{"Training": {"Person": ["name", "email"]}}
.Values within the dicts can be either a list representing the keys (as in the example above) or a dict that allows to set additional foreign keys at higher depths. In the latter case (dict instead of list) if foreign keys exist at that level (e.g. in the above example there might be further levels below “Person”), then the foreign keys can be set using the special
__this__
key.Example:
{"Training": {"__this__": ["date"], "Person": ["name", "email"]}}
Here,date
is the sole foreign key for Training.It probably is worth extending the first example, with a case where a “Training” shall be distiguished by the “name” and “email” of a “Person” which it references. The foreign keys for this example are specified like this:{"Training": {"__this__": [["Person", "name"], ["Person", "email"]]}}
- class caosadvancedtools.table_json_conversion.table_generator.XLSXTemplateGenerator
Bases:
TableTemplateGenerator
Class for generating XLSX tables from json schema definitions.
- generate(schema: dict, foreign_keys: dict, filepath: str) None
Generate a sheet definition from a given JSON schema.
Parameters:
- schema: dict
Given JSON schema
- foreign_keys: dict
A configuration that defines which attributes shall be used to create additional columns when a list of references exists. See
foreign_keys
argument ofTableTemplateGenerator.generate()
.- filepath: str
The XLSX file will be stored under this path.
caosadvancedtools.table_json_conversion.xlsx_utils module
General utilities to work with XLSX files with (hidden) column and row annotations and typing.
The most prominent functions are:
p2s
: Path to string:["some", "path"] -> "some.path"
read_or_dict
: Load JSON object from path, file or dict.
This module also defines these enums:
ColumnType
RowType
- class caosadvancedtools.table_json_conversion.xlsx_utils.ColumnType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
column types enum
- FOREIGN = 3
- IGNORE = 5
- LIST = 2
- MULTIPLE_CHOICE = 4
- SCALAR = 1
- class caosadvancedtools.table_json_conversion.xlsx_utils.RowType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases:
Enum
row types enum
- COL_TYPE = 1
- IGNORE = 3
- PATH = 2
- caosadvancedtools.table_json_conversion.xlsx_utils.array_schema_from_model_schema(model_schema: dict) dict
Convert a data model schema to a data array schema.
Practically, this means that the top level properties are converted into lists. In a simplified notation, this can be expressed as:
array_schema = { elem: [elem typed data...] for elem in model_schema }
- Parameters:
model_schema (dict) – The schema description of the data model. Must be a json schema object, with a number of object typed properties.
- Returns:
array_schema – A corresponding json schema, where the properties are arrays with the types of the input’s top-level properties.
- Return type:
- caosadvancedtools.table_json_conversion.xlsx_utils.get_column_type_row_index(sheet: Worksheet)
Return the row index (0-indexed) of the row which defines the column types.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_data_columns(sheet: Worksheet) dict[str, SimpleNamespace]
Return the data paths of the worksheet.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_defining_paths(workbook: Workbook) dict[str, list[list[str]]]
For all sheets in
workbook
, list the paths which they define.A sheet is said to define a path, if it has data columns for properties inside that path. For example, consider the following worksheet:
COL_TYPE
|SCALAR
|SCALAR
|LIST
|SCALAR
|PATH
|Training
|Training
|Training
|Training
|PATH
|url
|date
|subjects
|supervisor
|PATH
| | | |email
||------------|—————-|---------------|————–|--------------------| | | example.com/mp | 2024-02-27 | Math;Physics | steve@example.com | | | example.com/m | 2024-02-27 | Math | stella@example.com |
This worksheet defines properties for the paths
["Training"]
and["Training", "supervisor"]
, and thus these two path lists would be returned for the key with this sheet’s sheetname.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_foreign_key_columns(sheet: Worksheet) dict[str, SimpleNamespace]
Return the foreign keys of the worksheet.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_path_position(sheet: Worksheet) tuple[list[str], str]
Return a path which represents the parent element, and the sheet’s “proper name”.
For top-level sheets / entries (those without foreign columns), the path is an empty list.
A sheet’s “proper name” is detected from the data column paths: it is the first component after the parent components.
- Returns:
parent (list[str]) – Path to the parent element. Note that there may be list elements on the path which are not represented in this return value.
proper_name (str) – The “proper name” of this sheet. This defines an array where all the data lives, relative to the parent path.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_path_rows(sheet: Worksheet)
Return the 0-based indices of the rows which represent paths.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_row_type_column_index(sheet: Worksheet)
Return the column index (0-indexed) of the column which defines the row types.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_subschema(path: list[str], schema: dict) dict
Return the sub schema at
path
.
- caosadvancedtools.table_json_conversion.xlsx_utils.get_worksheet_for_path(path: list[str], defining_path_index: dict[str, list[list[str]]]) str
Find the sheet name which corresponds to the given path.
- caosadvancedtools.table_json_conversion.xlsx_utils.is_exploded_sheet(sheet: Worksheet) bool
Return True if this is a an “exploded” sheet.
An exploded sheet is a sheet whose data entries are LIST valued properties of entries in another sheet. A sheet is detected as exploded if and only if it has FOREIGN columns.
- caosadvancedtools.table_json_conversion.xlsx_utils.next_row_index(sheet: Worksheet) int
Return the index for the next data row.
This is defined as the first row without any content.
- caosadvancedtools.table_json_conversion.xlsx_utils.p2s(path: list[str]) str
Path to string: dot-separated.
- caosadvancedtools.table_json_conversion.xlsx_utils.parse_multiple_choice(value: Any) bool
Interpret
value
as a multiple choice input.Truthy values are: - The boolean
True
. - The number “1”. - The (case-insensitive) stringstrue
,wahr
,x
,√
,yes
,ja
,y
,j
.Falsy values are: - The boolean
False
. -None
, empty strings, lists, dicts. - The number “0”. - The (case-insensitive) stringsfalse
,falsch
,-
,no
,nein
,n
. - Everything else.- Returns:
out – The interpretation result of
value
.- Return type: