Conversion between LinkAhead data models, JSON schema, and XLSX (and vice versa)

This file describes the conversion between JSON schema files and XLSX templates, and between JSON data files following a given schema and XLSX files with data. This conversion is handled by the Python modules in the table_json_conversion library.

Requirements: When converting from a json schema, the top level of the json schema must be a dict. The keys of the dict are RecordType names.

Data models in JSON Schema and JSON data

The data model in LinkAhead defines the types of records present in a LinkAhead instance and their structure. This data model can also be represented in a JSON Schema, which defines the structure of JSON files containing records pertaining to the data model.

For example, the following JSON can describe a “Person” Record:

{
    "Person": {
        "family_name": "Steve",
        "given_name": "Stevie"
    }
}

A JSON Schema specifies a concrete structure, and the associated JSON files can be used to represent data for specific record structures. For instance, one could create a JSON Schema allowing the storage of “Training” Records containing information about conducted trainings. This is particularly valuable for data import and export. One could generate web forms from the JSON Schema or use it to export objects stored in LinkAhead as JSON.

From JSON to XLSX: Data Representation

The following describes how JSON files representing LinkAhead records are converted into XLSX files, or how JSON files with records are created from XLSX files.

The attribute name (e.g., “Person” above) determines the RecordType, and the value of this attribute can either be an object or a list. If it is an object (as in the example above), a single record is represented. In the case of a list, multiple records sharing the same RecordType as the parent are represented.

The Properties of the record (e.g., family_name and given_name above) become columns in the XLSX file. These properties have an attribute name and a value. The value can be:

a. A primitive (text, number, boolean, …) b. A record c. A list of primitive types d. A list of records

In cases a. and c., a cell is created in the column corresponding to the property in the XLSX file. In case b., columns are created for the Properties of the record, where for each of the Properties the cases a. - d. are considered recursively.

For case d. however, the two-dimensional structure of an XLSX sheet is not sufficient. Therefore, for such cases, new XLSX sheets/tables are created.

In these sheets/tables, the referenced records are treated as described above (new columns for the Properties). However, there are now additional columns that indicate from which “external” record these records are referenced.

Let’s now consider these four cases in detail and with examples:

a. Properties with Primitive Data Types

{
    "Training": {
        "date": "2023-01-01",
        "url": "www.indiscale.com",
        "duration": 1.0,
        "participants": 1,
        "remote": false
    }
}

This entry is represented in an XLSX sheet with the following content: date url duration participants remote 2023-01-01 www.indiscale.com 1.0 1 false

a. Properties mit primitiven Datentypen

{
    "Training": {
        "date": "2023-01-01",
        "url": "www.indiscale.com",
        "duration": 1.0,
        "participants": 1,
        "remote": false
    }
}

This entry will be represented in an XLSX sheet with the following content:

| date | url | duration | participants | remote | |————|——————-|———-|————–|——–| | 2023-01-01 | www.indiscale.com | 1.0 | 1 | false |

b. Property referencing a record

{
    "Training": {
        "date": "2023-01-01",
        "supervisor": {
            "family_name": "Stevenson",
            "given_name": "Stevie",
        }
    }
}

This entry will be represented in an XLSX sheet with the following content:

| date | supervisor.family_name | supervisor.given_name | |————|————————–|————————-| | 2023-01-01 | Stevenson | Stevie |

Note that column names may be renamed. The mapping of columns to properties of records is ensured through the content of hidden rows. (See below for the definition of hidden rows.)

c. Properties containing lists of primitive data types

{
    "Training": {
        "url": "www.indiscale.com",
        "subjects": ["Math", "Physics"],
    }
}

This entry would be represented in an XLSX sheet with the following content:

| url | subjects | |——————-|————–| | www.indiscale.com | Math;Physics |

The list elements are written into the cell separated by ; (semicolon). If the elements contain the separator ;, it is escaped with \\.

d. Properties containing lists with references

{
    "Training": {
        "date": "2023-01-01",
        "coach": [
            {
              "family_name": "Sky",
              "given_name": "Max",
            },
            {
              "family_name": "Sky",
              "given_name": "Min",
            }
        ]
    }
}

Since the two coaches cannot be represented properly in a single cell, another worksheet is needed to contain the properties of the coaches.

The sheet for the Trainings in this example only contains the “date” column

| date | |————| | 2023-01-01 |

Additionally, there is another sheet where the coaches are stored. Here, it is crucial to define how the correct element is chosen from potentially multiple “Trainings”. In this case, it means that the “date” must be unique.

Note: This uniqueness requirement is not strictly checked right now, it is your responsibility as a user that such “foreign properties” are truly unique.

The second sheet looks like this:

| date | coach.family_name | coach.given_name | |————|———————|——————–| | 2023-01-01 | Sky | Max | | 2023-01-01 | Sky | Min |

Data in XLSX: Hidden automation logic

First column: Marker for row types

The first column in each sheet will be hidden and it will contain an entry in each row that needs special treatment. The following values are used:

  • IGNORE: This row is ignored. It can be used for explanatory texts or layout.

  • COL_TYPE: Typically the first row that is not IGNORE. It indicates the row that defines the type of columns (FOREIGN, SCALAR, LIST, IGNORE). This row may occur only once.

  • PATH: Indicates that the row is used to define the path within the JSON. These rows are typically hidden for users.

An example table could look like this:

| IGNORE | | Welcome | to this | file! | | | IGNORE | | Please | enter your | data here: | | | COL_TYPE | IGNORE | SCALAR | SCALAR | LIST | SCALAR | | PATH | | Training | Training | Training | Training | | PATH | | url | date | subjects | supervisor | | PATH | | | | | email | | IGNORE | Please enter one training per line. | Training URL | Training date | Subjects | Supervisor’s email | |————|————————————-|—————-|—————|————–|——————–| | | | example.com/mp | 2024-02-27 | Math;Physics | steve@example.com | | | | example.com/m | 2024-02-27 | Math | stella@example.com |

Parsing XLSX data

To extract the value of a given cell, we traverse all path elements (in PATH rows) from top to bottom. The final element of the path is the name of the Property to which the value belongs. In the example above, steve@example.com is the value of the email Property in the path ["Training", "supervisor", "email"].

The path elements are sufficient to identify the object within a JSON, at least if the corresponding JSON element is a single object. If the JSON element is an array, the appropriate object within the array needs to be selected.

For this selection additional FOREIGN columns are used. The paths in these columns must all have the same base and one additional unique key component. For example, two FOREIGN columns could be ["Training", "date"] and ["Training", "url"], where ["Training"] is the base path and "date" and "url" are the unique keys.

The base path defines the table (or recordtype) to which the entries belong, and the values of the unique keys define the actual rows to which data belongs.

For example, this table defines three coaches for the two trainings from the last table:

| COL_TYPE | FOREIGN | FOREIGN | SCALAR | | PATH | Training | Training | Training | | PATH | date | url | coach | | PATH | | | given_name | | IGNORE | Date of training | URL of training | The coach’s given name | | IGNORE | from sheet ‘Training’ | from sheet ‘Training’ | | |————|———————–|———————–|————————| | | 2024-02-27 | example.com/mp | Ada | | | 2024-02-27 | example.com/mp | Berta | | | 2024-02-27 | example.com/m | Chris |

Current limitations

The current implementation still lacks the following:

  • Lists of enum references are not yet implemented as columns where matching cell can simply be ticked/crossed.

  • Files handling is not implemented yet.