caosadvancedtools package

Subpackages

Submodules

caosadvancedtools.cache module

class caosadvancedtools.cache.AbstractCache(db_file=None, force_creation=False)

Bases: ABC

check_cache()

Check whether the cache in db file self.db_file exists and conforms to the latest database schema.

If it does not exist, it will be created using the newest database schema.

If it exists, but the schema is outdated, an exception will be raised.
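The three cases above can be sketched with the standard sqlite3 module. This is a minimal illustration, not the library's implementation: the function name check_cache_sketch, the constant SCHEMA_VERSION and the choice of RuntimeError are assumptions. Only the table version with its column schema is taken from this documentation.

```python
import os
import sqlite3
import tempfile

SCHEMA_VERSION = 2  # hypothetical "latest" schema version


def check_cache_sketch(db_file, latest=SCHEMA_VERSION):
    """Create the cache if it is missing; raise if its schema is outdated."""
    if not os.path.exists(db_file):
        conn = sqlite3.connect(db_file)
        with conn:  # commits the transaction on success
            conn.execute("CREATE TABLE version (schema INTEGER)")
            conn.execute("INSERT INTO version VALUES (?)", (latest,))
        conn.close()
        return latest
    conn = sqlite3.connect(db_file)
    try:
        (found,) = conn.execute("SELECT schema FROM version").fetchone()
    finally:
        conn.close()
    if found < latest:
        raise RuntimeError(f"Cache schema {found} is older than {latest}.")
    return found


# Usage: the first call creates the file, the second finds it up to date.
cache_file = os.path.join(tempfile.mkdtemp(), "cache.db")
check_cache_sketch(cache_file)
check_cache_sketch(cache_file)
```

Raising on an outdated schema, instead of migrating silently, leaves the decision about discarding or converting the old cache to the caller.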

abstract create_cache(): Provide an overloaded function here that creates the cache in the most recent version.

abstract get_cache_schema_version()

A method that has to be overloaded that sets the version of the SQLITE database schema. The schema is saved in table version column schema.

Increase this variable, when changes to the cache tables are made.

get_cache_version(): Return the version of the cache stored in self.db_file. The version is stored as the only entry in colum schema of table version.

abstract get_default_file_name(): Supply a default file name for the cache here.

run_sql_commands(commands, fetchall: bool = False)

Run a list of SQL commands on self.db_file.

Parameters:

commands – List of sql commands (tuples) to execute
fetchall (bool, optional) – When True, run fetchall as last command and return the results. Otherwise nothing is returned.
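As an illustration of the parameter format, the sketch below treats each command as a tuple holding an SQL statement and, optionally, its parameters. It is a stand-alone function written for this example (run_sql_commands_sketch is a hypothetical name), assuming that the tuples are passed on to sqlite3's cursor.execute.

```python
import sqlite3


def run_sql_commands_sketch(db_file, commands, fetchall=False):
    """Execute each (sql, params) tuple; optionally return fetchall()."""
    conn = sqlite3.connect(db_file)
    try:
        cursor = conn.cursor()
        for command in commands:
            # A tuple holds the statement and, optionally, its parameters.
            cursor.execute(*command)
        results = cursor.fetchall() if fetchall else None
        conn.commit()
    finally:
        conn.close()
    return results


# Usage with an in-memory database (assumption: any sqlite path works).
rows = run_sql_commands_sketch(
    ":memory:",
    [
        ("CREATE TABLE version (schema INTEGER)",),
        ("INSERT INTO version VALUES (?)", (3,)),
        ("SELECT schema FROM version",),
    ],
    fetchall=True,
)
```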

class caosadvancedtools.cache.Cache(*args, **kwargs): Bases: IdentifiableCache

class caosadvancedtools.cache.IdentifiableCache(db_file=None, force_creation=False)

Bases: AbstractCache

Stores identifiables (as a hash of xml) and their respective ID.

This makes it possible to retrieve the Record corresponding to an identifiable without querying.

check_existing(ent_hash)

Check the cache for a hash.

ent_hash: The hash to search for.

Return the ID and the version ID of the hashed entity. Return None if no entity with that hash is in the cache.

create_cache()

Create a new SQLITE cache file in self.db_file.

Two tables will be created:

- identifiables is the actual cache.
- version is a table with version information about the cache.

get_cache_schema_version()

A method that has to be overloaded that sets the version of the SQLITE database schema. The schema is saved in table version column schema.

Increase this variable, when changes to the cache tables are made.

get_default_file_name(): Supply a default file name for the cache here.

static hash_entity(ent): Format an entity as “pretty” XML and return the SHA256 hash.
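The idea of hashing a pretty-printed XML representation can be sketched with the standard library alone. The function hash_pretty_xml below is hypothetical: it takes a plain XML string in place of an entity and does not claim to reproduce the exact formatting used by hash_entity.

```python
import hashlib
import xml.etree.ElementTree as ET


def hash_pretty_xml(xml_string):
    """Pretty-print an XML string and return its SHA256 hex digest."""
    root = ET.fromstring(xml_string)
    ET.indent(root)  # requires Python 3.9 or newer
    pretty = ET.tostring(root, encoding="unicode")
    return hashlib.sha256(pretty.encode("utf-8")).hexdigest()


digest = hash_pretty_xml(
    '<Record name="exp1"><Property name="date">2020-01-01</Property></Record>'
)
```

Normalising the layout before hashing is what makes the digest usable as a cache key: the same content should always produce the same hash.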

insert(ent_hash, ent_id, ent_version)

Insert a new cache entry.

ent_hash: Hash of the entity. Should be generated with Cache.hash_entity.
ent_id: ID of the entity.
ent_version: Version string of the entity.

insert_list(hashes, entities)

Insert the ids of entities into the cache

The hashes must correspond to the entities in the list

update_ids_from_cache(entities)

Sets the ids of those entities that are in the cache.

A list of hashes corresponding to the entities is returned.

validate_cache(entities=None)

Runs through all entities stored in the cache and checks whether the version still matches the most recent version. Non-matching entities will be removed from the cache.

entities: When set to a db.Container or a list of Entities: the IDs from the cache will not be retrieved from the CaosDB database, but the versions from the cache will be checked against the versions contained in that collection. Only entries in the cache that have a corresponding version in the collection will be checked, all others will be ignored. Useful for testing.

Return a list of invalidated entries or an empty list if no elements have been invalidated.
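The comparison against a supplied collection can be pictured with plain dictionaries. In this sketch the cache and the collection are modelled as dicts, and the function name invalidate_outdated and the returned list of ids are assumptions made for illustration only.

```python
def invalidate_outdated(cache, current_versions):
    """Remove entries whose cached version differs from the current one.

    cache: dict mapping hash -> (entity id, cached version string)
    current_versions: dict mapping entity id -> current version string
    Entries whose id is missing from current_versions are ignored.
    """
    invalidated = []
    for digest, (ent_id, cached_version) in list(cache.items()):
        if ent_id not in current_versions:
            continue  # no corresponding version: leave the entry alone
        if current_versions[ent_id] != cached_version:
            invalidated.append(ent_id)
            del cache[digest]
    return invalidated


cache = {"h1": (101, "v1"), "h2": (102, "v1"), "h3": (103, "v7")}
removed = invalidate_outdated(cache, {101: "v1", 102: "v2"})
```

Here entity 102 is removed because its version changed, while entity 103 stays in the cache because the collection contains no version for it.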

class caosadvancedtools.cache.UpdateCache(db_file=None, force_creation=False)

Bases: AbstractCache

stores unauthorized inserts and updates

If the Guard is set to a mode that does not allow an insert or update, the insert or update can be stored in this cache such that it can be authorized and performed later.

create_cache(): initialize the cache

get(run_id, querystring): returns the pending updates for a given run id

Parameters:

run_id: the id of the crawler run
querystring: the sql query

get_cache_schema_version()

A method that has to be overloaded that sets the version of the SQLITE database schema. The schema is saved in table version column schema.

Increase this variable, when changes to the cache tables are made.

get_default_file_name(): Supply a default file name for the cache here.

get_inserts(run_id): returns the pending updates for a given run id

Parameters:

run_id: the id of the crawler run

static get_previous_version(cont): Retrieve the current, unchanged version of the entities that shall be updated, i.e. the version before the update

get_updates(run_id): returns the pending updates for a given run id

Parameters:

run_id: the id of the crawler run

insert(cont, run_id, insert=False)

Insert a pending, unauthorized insert or update

Parameters:

cont (Container) – Container with the records to be inserted or updated, containing the desired version, i.e. the state after the update.
run_id (int) – The id of the crawler run
insert (bool) – Whether the entities in the container shall be inserted or updated.

caosadvancedtools.cache.cleanXML(xml)

caosadvancedtools.cache.get_pretty_xml(cont)

caosadvancedtools.cache.put_in_container(stuff)

caosadvancedtools.cfood module

Defines how something that shall be inserted into LinkAhead is treated.

LinkAhead can automatically be filled with Records based on some structure, a file structure, a table or similar.

The Crawler will iterate over the respective items and test for each item whether a CFood class exists that matches the file path, i.e. whether a CFood class wants to treat that particular item. If one does, it is instantiated to treat the match. This occurs in basically three steps:

1. Create a list of identifiables, i.e. unique representations of LinkAhead Records (such as an experiment belonging to a project and a date/time).
2. The identifiables are either found in LinkAhead or they are created.
3. The identifiables are updated based on the data in the file structure.

class caosadvancedtools.cfood.AbstractCFood(item)

Bases: object

Abstract base class for Crawler food (CFood).

attach(item)

collect_information()

The CFood collects information for further processing.

Often CFoods need information from files or even from the database in order to make processing decisions. It is intended that this function is called after match. Thus match can be used without connecting to the database.

To be overwritten by subclasses

abstract create_identifiables(): should set the instance variable Container with the identifiables

looking_for(item)

returns True if item can be added to this CFood.

Typically a CFood exists for a file and defines how to deal with the file. However, sometimes additional files “belong” to a CFood. E.g. an experiment CFood might match against a README file, but labnotes.txt shall also be treated by that cfood (and not by a special cfood created for labnotes.txt). This function can be used to define which files shall be ‘attached’.

To be overwritten by subclasses

classmethod match_item(item)

Matches an item found by the crawler against this class. Returns True if the item shall be treated by this class, i.e. if this class matches the item.

Parameters:

item (object) – An item iterated by the crawler.

To be overwritten by subclasses!

static remove_property(entity, prop)

static set_parents(entity, names)

static set_property(entity, prop, value, datatype=None)

abstract update_identifiables(): Changes the identifiables as needed and adds changed identifiables to self.to_be_updated

class caosadvancedtools.cfood.AbstractFileCFood(crawled_path, *args, **kwargs)

Bases: AbstractCFood

property crawled_file

classmethod get_re()

Returns the regular expression used to identify files that shall be processed

This function shall be implemented by subclasses.

looking_for(crawled_file)

returns True if crawled_file can be added to this CFood.

Typically a CFood exists for a file and defines how to deal with the file. However, sometimes additional files “belong” to a CFood. E.g. an experiment CFood might match against a README file, but labnotes.txt shall also be treated by that cfood (and not by a special cfood created for labnotes.txt). This function can be used to define which files shall be ‘attached’.

classmethod match_item(path)

Matches the regular expression of this class against file names

Parameters: path (str) – The path of the file that shall be matched.

static re_from_extensions(extensions)

Return a regular expression which matches the given file extensions.

Useful for inheriting classes.

Parameters: extensions (iterable<str>) – An iterable with the allowed extensions.
Returns: out – The regular expression, starting with .*\. and ending with the EOL dollar character. The actual extension will be accessible in the pattern group name ext.
Return type: str
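A regular expression of the described shape can be built as in the following sketch. The function name re_from_extensions_sketch and the use of re.escape are choices made for this example and may differ from the library's implementation.

```python
import re


def re_from_extensions_sketch(extensions):
    """Build a regex matching paths that end in one of the extensions."""
    alternatives = "|".join(re.escape(ext) for ext in extensions)
    # Starts with .*\. and ends with $; the extension is captured as "ext".
    return r".*\.(?P<ext>" + alternatives + r")$"


pattern = re_from_extensions_sketch(["csv", "tsv"])
match = re.match(pattern, "/data/2020/table.csv")
extension = match.group("ext")
```

A subclass could return such a pattern from get_re, so that match_item accepts only files with the listed extensions.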

class caosadvancedtools.cfood.CMeal

Bases: object

CMeal groups equivalent items and allows their collected insertion.

Sometimes there is no one item that can be used to trigger the creation of some Record. E.g. if a collection of image files shall be referenced from one Record that groups them, it is unclear which image should trigger the creation of the Record.

CMeals are grouped based on the groups in the used regular expression. If, in the above example, all the images reside in one folder, all groups of the regular expression match should be equal, except the one for the file name. The groups that shall match need to be listed in the matching_groups class property. Subclasses will overwrite this property.

This allows using has_suitable_cfood in the match_item function of a CFood to check whether the necessary CFood was already created. In order to allow this, all instances of a CFood class are tracked in the existing_instances class member.

Subclasses must have a cls.get_re function and a match member variable (see AbstractFileCFood)
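The grouping rule can be illustrated with two regular expression matches that are compared only on the listed groups. The pattern, the group names folder and name, and the helper all_groups_equal_sketch are hypothetical; the sketch only mirrors the idea behind matching_groups and all_groups_equal.

```python
import re

# Hypothetical pattern: one group for the folder, one for the file name.
PATTERN = re.compile(r"(?P<folder>.*)/(?P<name>[^/]+)\.png$")
MATCHING_GROUPS = ["folder"]  # only the folder has to agree


def all_groups_equal_sketch(m1, m2, groups=MATCHING_GROUPS):
    """True if both matches agree on every group listed in `groups`."""
    return all(m1.group(g) == m2.group(g) for g in groups)


a = PATTERN.match("/exp/2020/img_001.png")
b = PATTERN.match("/exp/2020/img_002.png")
c = PATTERN.match("/exp/2021/img_001.png")
same_meal = all_groups_equal_sketch(a, b)
other_meal = all_groups_equal_sketch(a, c)
```

The first two images share a folder and would belong to the same meal; the third one lies in a different folder and would start a new one.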

classmethod all_groups_equal(m1, m2)

belongs_to_meal(item)

existing_instances = []

static get_re()

classmethod has_suitable_cfood(item)

checks whether the required cfood object already exists.

item : the crawled item

matching_groups = []

class caosadvancedtools.cfood.FileGuide

Bases: object

access(path)

Should be replaced by a function that adds a prefix to paths so that LinkAhead files can be accessed locally.

This default just returns the unchanged path.
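Replacing the default might look like the sketch below. FileGuideSketch is a stand-in class written for this example, and the mount point /mnt/linkahead is a made-up prefix; how the replacement is installed in a real setup is not specified here.

```python
class FileGuideSketch:
    """Stand-in for FileGuide; the default returns the path unchanged."""

    def access(self, path):
        return path


def prefixed_access(path, prefix="/mnt/linkahead"):  # hypothetical mount point
    """Map a LinkAhead file path to a location on the local file system."""
    return prefix.rstrip("/") + "/" + path.lstrip("/")


guide = FileGuideSketch()
default_path = guide.access("/ExperimentalData/readme.md")

guide.access = prefixed_access  # replace the default on this instance
local_path = guide.access("/ExperimentalData/readme.md")
```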

class caosadvancedtools.cfood.RowCFood(item, unique_cols, recordtype, **kwargs)

Bases: AbstractCFood

create_identifiables(): should set the instance variable Container with the identifiables

update_identifiables(): Changes the identifiables as needed and adds changed identifiables to self.to_be_updated

caosadvancedtools.cfood.add_files(filemap): add to the file cache

caosadvancedtools.cfood.assure_has_description(entity, description, to_be_updated=None, force=False)

Checks whether entity has the description that is passed.

If this is the case this function ends. Otherwise the entity is assigned a new description. If the list to_be_updated is supplied, the entity is added to the list in order to indicate that the entity entity should be updated. Otherwise it is directly updated.

caosadvancedtools.cfood.assure_has_parent(entity, parent, to_be_updated=None, force=False, unique=True)

Checks whether entity has a parent with name parent.

If this is the case this function ends. Otherwise the entity is assigned a new parent. If the list to_be_updated is supplied, the entity is added to the list in order to indicate that the entity entity should be updated. Otherwise it is directly updated.

caosadvancedtools.cfood.assure_has_property(entity, name, value, to_be_updated=None, datatype=None, setproperty=False)

Checks whether entity has a property name with the value value.

If this is the case this function ends. Otherwise the entity is assigned a new property.

Note that property matching occurs based on names.

If the list to_be_updated is supplied, the entity is added to the list in order to indicate, that the entity entity should be updated. Otherwise it is directly updated

setproperty: boolean, if True, overwrite existing properties.
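The pattern shared by the assure_* functions (do nothing if the state is already as desired, otherwise change the entity and either queue it or update it directly) can be sketched with a toy entity. ToyEntity and assure_has_property_sketch are hypothetical names; the sketch ignores datatype and setproperty and always stores a single value per property name.

```python
class ToyEntity:
    """Hypothetical stand-in for an entity with named properties."""

    def __init__(self, **properties):
        self.properties = dict(properties)
        self.update_calls = 0

    def update(self):
        # A real entity would push its state to the server here.
        self.update_calls += 1


def assure_has_property_sketch(entity, name, value, to_be_updated=None):
    """Set the property if needed, then queue or perform the update."""
    if entity.properties.get(name) == value:
        return  # already as desired, nothing to do
    entity.properties[name] = value
    if to_be_updated is None:
        entity.update()  # no list supplied: update directly
    else:
        to_be_updated.append(entity)  # defer the update to the caller


pending = []
record = ToyEntity(temperature=20)
assure_has_property_sketch(record, "temperature", 20, to_be_updated=pending)
assure_has_property_sketch(record, "pressure", 1.0, to_be_updated=pending)
assure_has_property_sketch(record, "humidity", 0.4)
```

Collecting entities in to_be_updated lets a caller push many changes in one transaction instead of one request per change.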

caosadvancedtools.cfood.assure_name_is(entity, name, to_be_updated=None, force=False)

Checks whether entity has the name that is passed.

If this is the case this function ends. Otherwise the entity is assigned a new name. If the list to_be_updated is supplied, the entity is added to the list in order to indicate that the entity entity should be updated. Otherwise it is directly updated.

caosadvancedtools.cfood.assure_object_is_in_list(obj, containing_object, property_name, to_be_updated=None, datatype=None)

Checks whether obj is one of the values in the list property property_name of the supplied entity containing_object.

If this is the case this function returns. Otherwise the entity is added to the property property_name and the entity containing_object is added to the supplied list to_be_updated in order to indicate, that the entity containing_object should be updated. If none is submitted the update will be conducted in-place.

If the property is missing, it is added first and then the entity is added/updated.

If obj is a list, every element is added

caosadvancedtools.cfood.assure_parents_are(entity, parents, to_be_updated=None, force=False, unique=True)

Checks whether entity has the provided parents (and only those).

If this is the case this function ends. Otherwise the entity is assigned the new parents and the old ones are discarded.

Note that parent matching occurs based on names. If a parent does not have a name, a ValueError is raised.

If the list to_be_updated is supplied, the entity is added to the list in order to indicate, that the entity entity should be updated. Otherwise it is directly updated

parents: single string or list of strings

caosadvancedtools.cfood.assure_property_is(entity, name, value, datatype=None, to_be_updated=None, force=False)

Checks whether entity has a Property name with the given value.

If this is the case this function ends. Otherwise the entity is assigned a new property or an existing one is updated.

If the list to_be_updated is supplied, the entity is added to the list in order to indicate, that the entity entity should be updated. Otherwise it is directly updated

caosadvancedtools.cfood.assure_special_is(entity, value, kind, to_be_updated=None, force=False)

Checks whether entity has the name or description that is passed.

If this is the case this function ends. Otherwise the entity is assigned a new name or description. If the list to_be_updated is supplied, the entity is added to the list in order to indicate that the entity entity should be updated. Otherwise it is directly updated.

caosadvancedtools.cfood.get_entity(name)

Returns the entity with a given name, preferably from a local cache.

If the local cache does not contain the entity, retrieve it from LinkAhead.

caosadvancedtools.cfood.get_entity_for_path(path)

caosadvancedtools.cfood.get_ids_for_entities_with_names(entities)

caosadvancedtools.cfood.get_property(name)

Returns the property with a given name, preferably from a local cache.

If the local cache does not contain the property, try to retrieve it from LinkAhead. If it does not exist, see whether it could be a record type used as a property.

caosadvancedtools.cfood.get_record(name)

Returns the record with a given name, preferably from a local cache.

If the local cache does not contain the record, try to retrieve it from LinkAhead.

caosadvancedtools.cfood.get_recordtype(name)

Returns the record type with a given name, preferably from a local cache.

If the local cache does not contain the record type, try to retrieve it from LinkAhead. If it does not exist, add it to the data model problems

caosadvancedtools.cfood.insert_id_based_on_name(entity)

caosadvancedtools.collect_datamodel module

caosadvancedtools.crawler module

Crawls a file structure and inserts Records into LinkAhead based on what is found.

LinkAhead can automatically be filled with Records based on some file structure. The Crawler will iterate over the files and test for each file whether a CFood exists that matches the file path. If one does, it is instantiated to treat the match. This occurs in basically three steps:

1. Create a list of identifiables, i.e. unique representations of LinkAhead Records (such as an experiment belonging to a project and a date/time).
2. The identifiables are either found in LinkAhead or they are created.
3. The identifiables are updated based on the data in the file structure.

class caosadvancedtools.crawler.Crawler(cfood_types, use_cache=False, abort_on_exception=True, interactive=True, hideKnown=False, debug_file=None, cache_file=None)

Bases: object

check_matches(matches)

collect_cfoods()

This is the first phase of the crawl. It collects all cfoods that shall be processed. The second phase is iterating over cfoods and updating LinkAhead. This separate first step is necessary in order to allow a single cfood being influenced by multiple crawled items. E.g. the FileCrawler can have a single cfood treat multiple files.

This is a very basic implementation and this function should be overwritten by subclasses.

The basic structure of this function should be that whatever is being processed is iterated over and, for each cfood, it is checked whether the item ‘matches’. If it does, a cfood is instantiated, passing the item as an argument. The match can depend on the cfoods already created, i.e. a file might no longer match because it is already treated by an earlier cfood.

Should return cfoods, tbs and errors_occured. # TODO do this via logging?

tbs: text returned from traceback
errors_occured: True if at least one error occurred
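The structure described for this first phase could look like the following sketch. ReadmeFood and BrokenFood are hypothetical CFood classes, and collect_cfoods_sketch is a free function written for this example; it does not model the dependence on already created cfoods.

```python
import traceback


class ReadmeFood:
    """Hypothetical CFood that treats README files."""

    def __init__(self, item):
        self.item = item

    @classmethod
    def match_item(cls, item):
        return item.endswith("README.md")


class BrokenFood:
    """Hypothetical CFood whose matching always fails."""

    def __init__(self, item):
        self.item = item

    @classmethod
    def match_item(cls, item):
        raise ValueError("cannot inspect " + item)


def collect_cfoods_sketch(items, cfood_types):
    """Return (cfoods, tbs, errors_occured) for the given items."""
    cfoods, tbs, errors_occured = [], [], False
    for item in items:
        for cfood_type in cfood_types:
            try:
                if cfood_type.match_item(item):
                    cfoods.append(cfood_type(item))
            except Exception:
                # Keep going, but remember the traceback text.
                tbs.append(traceback.format_exc())
                errors_occured = True
    return cfoods, tbs, errors_occured


cfoods, tbs, errors_occured = collect_cfoods_sketch(
    ["/data/README.md", "/data/plot.png"], [ReadmeFood, BrokenFood]
)
```

Catching exceptions per item means that one faulty cfood does not abort the collection of all others.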

crawl(security_level=0, path=None)

static create_query_for_identifiable(ident): uses the properties of ident to create a query that can determine whether the required record already exists.

static find_existing(entity)

searches for an entity that matches the identifiable in LinkAhead

Characteristics of the identifiable such as properties, name or id are used for the match.

static find_or_insert_identifiables(identifiables): Sets the ids of identifiables (that do not already have an id from the cache) based on searching LinkAhead and retrieves those entities. The remaining entities (those which cannot be retrieved) have no correspondence in LinkAhead and are thus inserted.

iteritems(): generates items to be crawled with an index

static save_form(changes, path, run_id)

Saves an html website to a file that contains a form with a button to authorize the given changes.

The button will call the crawler with the same path that was used for the current run and with a parameter to authorize the changes of the current run.

Parameters:

changes: The LinkAhead entities in the version after the update.
path: the path defining the subtree that is crawled

static send_mail(changes, filename): calls sendmail in order to send a mail to the curator about pending changes

Parameters:

changes: The LinkAhead entities in the version after the update.
filename: path to the html site that allows the authorization

static update_authorized_changes(run_id)

execute the pending updates of a specific run id.

This should be called if the updates of a certain run were authorized.

Parameters:

run_id: the id of the crawler run

class caosadvancedtools.crawler.FileCrawler(files, **kwargs)

Bases: Crawler

iteritems(): generates items to be crawled with an index

static query_files(path)

class caosadvancedtools.crawler.TableCrawler(table, unique_cols, recordtype, **kwargs)

Bases: Crawler

iteritems(): generates items to be crawled with an index

caosadvancedtools.crawler.apply_list_of_updates(to_be_updated, update_flags=None, update_cache=None, run_id=None)

Updates the to_be_updated Container, i.e., pushes the changes to LinkAhead after removing possible duplicates. If a cache is provided, unauthorized updates can be cached for further authorization.

Parameters:

to_be_updated (db.Container) – Container with the entities that will be updated.
update_flags (dict, optional) – Dictionary of LinkAhead server flags that will be used for the update. Default is an empty dict.
update_cache (UpdateCache or None, optional) – Cache in which the intended updates will be stored so they can be authorized afterwards. Default is None.
run_id (String or None, optional) – Id with which the pending updates are cached. Only meaningful if update_cache is provided. Default is None.

caosadvancedtools.crawler.get_value(prop)

Returns the value of a Property

Parameters: prop – The property of which the value shall be returned.
Returns: out – The value of the property; if the value is an entity, its ID.

caosadvancedtools.crawler.separated(text)

caosadvancedtools.datainconsistency module

Implements an error to be used when there is a problem with the data to be read. I.e. something that users of CaosDB need to fix.

exception caosadvancedtools.datainconsistency.DataInconsistencyError: Bases: ValueError

caosadvancedtools.datamodel_problems module

Implements a class for finding and storing entities, either record types or properties, that are missing in a data model. They can be inserted by hand or guessed from possible exceptions when inserting or updating entities with missing parents and/or properties.

class caosadvancedtools.datamodel_problems.DataModelProblems

Bases: object

Collect and store missing RecordTypes and Properties.

static add(ent): Add a missing record type or property.

static evaluate_exception(e)

Take a TransactionError, see whether it was caused by datamodel problems, and update missing parents and/or properties if this was the case. Afterwards, raise the exception.

Parameters: e (TransactionError) – TransactionError, the children of which are checked for possible datamodel problems.

missing = {}

caosadvancedtools.example_cfood module

class caosadvancedtools.example_cfood.ExampleCFood(crawled_path, *args, **kwargs)

Bases: AbstractFileCFood

create_identifiables(): should set the instance variable Container with the identifiables

classmethod get_re()

Returns the regular expression used to identify files that shall be processed

This function shall be implemented by subclasses.

update_identifiables(): Changes the identifiables as needed and adds changed identifiables to self.to_be_updated

caosadvancedtools.export_related module

This file allows to create an xml representation of a complete dataset. Using the given entity all related entities are collected and saved in a way that the data can be imported in another LinkAhead instance.

Files that are smaller than 1MB are saved in a downloads folder and can be imported along with the entities themselves.

caosadvancedtools.export_related.defineParser()

caosadvancedtools.export_related.export(cont, directory='.')

caosadvancedtools.export_related.export_related_to(rec_id, directory='.')

caosadvancedtools.export_related.get_ids_of_related_entities(entity)

Returns a list of ids of entities that are related to the given one.

Related means in this context, that it is kind of necessary for the representation of this entity: ids of properties and parents as well as the ids of referenced entities.

caosadvancedtools.export_related.invert_ids(entities)

caosadvancedtools.export_related.main()

caosadvancedtools.export_related.recursively_collect_related(entity): collects all related entities. Starting from a single entity, the related entities are retrieved (see get_ids_of_related_entities), then the related entities of those are retrieved, and so forth. This is useful for creating a collection of a related dataset.
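The recursive collection amounts to a graph traversal that visits every entity once. The sketch below works on a plain dict of ids instead of LinkAhead entities; collect_related_sketch and the example graph are assumptions for illustration.

```python
def collect_related_sketch(start_id, related_ids):
    """Collect all ids reachable from start_id.

    related_ids: dict mapping an entity id to the ids it directly refers
    to (parents, properties, referenced entities).
    """
    collected = []
    seen = set()
    todo = [start_id]
    while todo:
        current = todo.pop()
        if current in seen:
            continue  # already handled; this also breaks reference cycles
        seen.add(current)
        collected.append(current)
        todo.extend(related_ids.get(current, []))
    return collected


graph = {1: [2, 3], 2: [4], 3: [2], 4: [1]}
all_ids = collect_related_sketch(1, graph)
```

Tracking the ids that were already seen is what keeps the traversal finite when entities reference each other in a cycle, as entities 1 and 4 do here.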

caosadvancedtools.guard module

class caosadvancedtools.guard.Guard(level=0)

Bases: object

safe_insert(obj, *args, **kwargs)

safe_update(obj, *args, **kwargs)

set_level(level)

exception caosadvancedtools.guard.ProhibitedException: Bases: Exception

caosadvancedtools.import_from_xml module

This file allows importing a dataset stored in an xml representation together with the corresponding files.

The export should have been done with export_related.py

caosadvancedtools.import_from_xml.create_dummy_file(text='Please ask the administrator for this file.')

caosadvancedtools.import_from_xml.defineParser()

caosadvancedtools.import_from_xml.import_xml(filename, rerun=False, interactive=True)

filename: path to the xml file with the data to be inserted
rerun: boolean; if true, files are not inserted as paths would conflict.

caosadvancedtools.import_from_xml.main()

caosadvancedtools.json_schema_exporter module

Convert a data model into a json schema.

Sometimes you may want to have a json schema which describes a LinkAhead data model, for example for the automatic generation of user interfaces with third-party tools like rjsf. Then this is the right module for you!

The json_schema_exporter module has one main class, JsonSchemaExporter, and a few utility and wrapper functions.

For easy usage, you may simply import recordtype_to_json_schema and use it on a fully referenced RecordType like this:

import caosadvancedtools.models.parser as parser
import caosadvancedtools.json_schema_exporter as jsex

model = parser.parse_model_from_yaml("my_model.yml")

# get the data model schema for the "Journey" recordtype
schema, ui_schema = jsex.recordtype_to_json_schema(
    rt=model.get_deep("Journey"),
    do_not_create=["Continent"],         # only choose from existing Records
    multiple_choice=["visited_cities"],
    rjsf=True                            # also create a UI schema
)

For more details on how to use this wrapper, read the function documentation.

Other useful functions are make_array, which creates an array out of a single schema, and merge_schemas, which, as the name suggests, combines multiple schema definitions into a single schema.

class caosadvancedtools.json_schema_exporter.JsonSchemaExporter(additional_properties: bool = True, name_property_for_new_records: bool = False, description_property_for_new_records: bool = False, additional_options_for_text_props: dict = None, additional_json_schema: Dict[str, dict] = None, additional_ui_schema: Dict[str, dict] = None, units_in_description: bool = True, do_not_create: List[str] = None, do_not_retrieve: List[str] = None, no_remote: bool = False, use_rt_pool: DataModel = None, multiple_choice: List[str] = None, wrap_files_in_objects: bool = False)

Bases: object

A class which collects everything needed for the conversion.

recordtype_to_json_schema(rt: RecordType, rjsf: bool = False) → dict | Tuple[dict, dict]

Create a jsonschema from a given RecordType that can be used, e.g., to validate a json specifying a record of the given type.

Parameters:

rt (RecordType) – The RecordType from which a json schema will be created.
rjsf (bool, optional) – If True, uiSchema definitions for react-jsonschema-forms will be output as the second return value. Default is False

Returns:

schema (dict) – A dict containing the json schema created from the given RecordType’s properties.
ui_schema (dict, optional) – A ui schema. Only if a parameter asks for it (e.g. rjsf).

caosadvancedtools.json_schema_exporter.make_array(schema: dict, rjsf_uischema: dict = None) → dict | Tuple[dict, dict]

Create an array of the given schema.

The result will look like this:

{ "type": "array",
  "items": {
      // the schema
    }
}

Parameters:

schema (dict) – The JSON schema which shall be packed into an array.
rjsf_uischema (dict, optional) – A react-jsonschema-forms ui schema that shall be wrapped as well.

Returns:

schema (dict) – A JSON schema dict with a top-level array which contains instances of the given schema.
ui_schema (dict, optional) – The wrapped ui schema. Only returned if rjsf_uischema is given as parameter.
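The wrapping shown above can be reproduced with plain dictionaries. make_array_sketch is a hypothetical re-implementation for illustration; in particular, placing the ui schema under an "items" key follows the react-jsonschema-form convention for arrays and is an assumption about what the library returns.

```python
def make_array_sketch(schema, rjsf_uischema=None):
    """Wrap a JSON schema (and optionally its ui schema) into an array."""
    result = {"type": "array", "items": schema}
    if rjsf_uischema is None:
        return result
    # Assumption: array ui schemata describe their elements under "items".
    return result, {"items": rjsf_uischema}


person = {"type": "object", "properties": {"name": {"type": "string"}}}
people = make_array_sketch(person)
people_with_ui = make_array_sketch(person, {"name": {"ui:autofocus": True}})
```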

caosadvancedtools.json_schema_exporter.merge_schemas(schemas: Dict[str, dict] | Iterable[dict], rjsf_uischemas: Dict[str, dict] | Sequence[dict] = None) → dict | Tuple[dict, dict]

Merge the given schemata into a single schema.

The result will look like this:

{
  "type": "object",
  "properties": {
    // A, B, C
  },
  "required": [
    // "A", "B", "C"
  ],
  "additionalProperties": false
}

Parameters:

schemas (dict[str, dict] | Iterable[dict]) – A dict or iterable of schemata which shall be merged together. If this is a dict, the keys will be used as property names, otherwise the titles of the submitted schemata. If they have no title, numbers will be used as a fallback. Note that even with a dict, the original schema’s “title” is not changed.
rjsf_uischemas (dict[str, dict] | Iterable[dict], optional) – If given, also merge the react-jsonschema-forms from this argument and return as the second return value. If schemas is a dict, this parameter must also be a dict; if schemas is only an iterable, this parameter must support numerical indexing.

Returns:

schema (dict) – A JSON schema dict with a top-level object which contains the given schemata as properties.
uischema (dict) – If rjsf_uischemas was given, this contains the merged UI schemata.
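The naming rules for the merged properties (dict keys, otherwise titles, otherwise numbers) can be sketched as follows. merge_schemas_sketch is a hypothetical simplification without ui schema support, and the use of the position in the iterable as fallback number is an assumption.

```python
def merge_schemas_sketch(schemas):
    """Merge schemata into one object schema with one property each."""
    if isinstance(schemas, dict):
        named = dict(schemas)  # keys become the property names
    else:
        # Fall back to the title, then to the position in the iterable.
        named = {
            schema.get("title", str(index)): schema
            for index, schema in enumerate(schemas)
        }
    return {
        "type": "object",
        "properties": named,
        "required": list(named),
        "additionalProperties": False,
    }


merged = merge_schemas_sketch([
    {"title": "A", "type": "string"},
    {"type": "integer"},
])
```

The untitled second schema ends up under the property name "1" in this sketch, while the first one keeps its title "A".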

caosadvancedtools.json_schema_exporter.recordtype_to_json_schema(rt: RecordType, additional_properties: bool = True, name_property_for_new_records: bool = False, description_property_for_new_records: bool = False, additional_options_for_text_props: dict | None = None, additional_json_schema: Dict[str, dict] = None, additional_ui_schema: Dict[str, dict] = None, units_in_description: bool = True, do_not_create: List[str] = None, do_not_retrieve: List[str] = None, no_remote: bool = False, use_rt_pool: DataModel = None, multiple_choice: List[str] = None, rjsf: bool = False, wrap_files_in_objects: bool = False) → dict | Tuple[dict, dict]

Create a jsonschema from a given RecordType that can be used, e.g., to validate a json specifying a record of the given type.

This is a standalone function which works without manually creating a JsonSchemaExporter object.

Parameters:

rt (RecordType) – The RecordType from which a json schema will be created.
additional_properties (bool, optional) – Whether additional properties will be admitted in the resulting schema. Optional, default is True.
name_property_for_new_records (bool, optional) – Whether objects shall generally have a name property in the generated schema. Optional, default is False.
description_property_for_new_records (bool, optional) – Whether objects shall generally have a description property in the generated schema. Optional, default is False.
additional_options_for_text_props (dict, optional) – Dictionary containing additional “pattern” or “format” options for string-typed properties. Optional, default is empty.
additional_json_schema (dict[str, dict], optional) – Additional schema content for elements of the given names.
additional_ui_schema (dict[str, dict], optional) – Additional ui schema content for elements of the given names.
units_in_description (bool, optional) – Whether to add the unit of a LinkAhead property (if it has any) to the description of the corresponding schema entry. If set to false, an additional unit key is added to the schema itself which is purely annotational and ignored, e.g., in validation. Default is True.
do_not_create (list[str], optional) – A list of reference Property names, for which there should be no option to create them. Instead, only the choice of existing elements should be given.
do_not_retrieve (list[str], optional) – A list of RecordType names, for which no Records shall be retrieved. Instead, only an object description should be given. If this list overlaps with the do_not_create parameter, the behavior is undefined.
no_remote (bool, optional) – If True, do not attempt to connect to a LinkAhead server at all. Default is False.
use_rt_pool (models.data_model.DataModel, optional) – If given, do not attempt to retrieve RecordType information remotely but from this parameter instead.
multiple_choice (list[str], optional) – A list of reference Property names which shall be denoted as multiple choice properties. This means that each option in this property may be selected at most once. This is not implemented yet if the Property is not in do_not_create as well.
rjsf (bool, optional) – If True, uiSchema definitions for react-jsonschema-forms will be output as the second return value. Default is False.
wrap_files_in_objects (bool, optional) – Whether (lists of) files should be wrapped into an array of objects that have a file property. The sole purpose of this wrapping is to provide a workaround for a react-jsonschema-form bug so only set this to True if you’re using the exported schema with react-json-form and you are experiencing the bug. Default is False.

Returns:

schema (dict) – A dict containing the json schema created from the given RecordType’s properties.
ui_schema (dict, optional) – A ui schema. Only if a parameter asks for it (e.g. rjsf).

caosadvancedtools.loadFiles module

Utilities to make the LinkAhead server aware of files.

Installation of caosadvancedtools also creates an executable script linkahead-loadfiles which calls the loadpath function. Get the full help with linkahead-loadfiles --help. In short, that script tells the LinkAhead server to create FILE entities for existing files in one branch of the directory tree. This directory must already be visible to the server (for example because it is defined as extroot in the LinkAhead profile).

caosadvancedtools.loadFiles.combine_ignore_files(caosdbignore: str, localignore: str, dirname=None) → str

Append the contents of localignore to caosdbignore, save the result, and return the name.

Parameters:

caosdbignore (str) – Path to parent level caosdbignore file
localignore (str) – Path to current working directory’s local caosdbignore.
dirname (str, optional) – The path of the directory to which the temporary combined file is written. If None is given, NamedTemporaryFile’s default is used. Default is None.

Returns:

name – Name of the temporary combined caosdbignore file.

Return type:

str
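The behaviour described above can be approximated with the standard library alone. The following is a minimal sketch, not the library's implementation: the function name combine_ignore_files_sketch is hypothetical, and the handling of a missing trailing newline is an assumption.

```python
import tempfile


def combine_ignore_files_sketch(parent_ignore: str, local_ignore: str, dirname=None) -> str:
    """Concatenate two ignore files into a new temporary file and return its name."""
    with open(parent_ignore, encoding="utf-8") as infile:
        combined = infile.read()
    # Assumption: keep the last parent rule and the first local rule on separate lines.
    if combined and not combined.endswith("\n"):
        combined += "\n"
    with open(local_ignore, encoding="utf-8") as infile:
        combined += infile.read()
    # delete=False keeps the file on disk after the handle is closed;
    # dir=None lets NamedTemporaryFile pick its default directory.
    with tempfile.NamedTemporaryFile(
        mode="w", delete=False, dir=dirname, encoding="utf-8"
    ) as outfile:
        outfile.write(combined)
        return outfile.name
```

Because the temporary file is not deleted automatically, the caller is responsible for removing it once it is no longer needed.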

caosadvancedtools.loadFiles.compile_file_list(caosdbignore: str, localpath: str) → list[str]

Create a list that contains all files under localpath except those excluded by caosdbignore.

Parameters:

caosdbignore (str) – Path of caosdbignore file
localpath (str) – Path of the directory from which the file list will be compiled.

Returns:

file_list – List of files in localpath after applying the ignore rules from caosdbignore.

Return type:

list[str]

caosadvancedtools.loadFiles.convert_size(size: int): Convert size from bytes to a human-readable file size in KB, MB, …
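As an illustration of such a conversion, here is a minimal sketch. The function name convert_size_sketch, the base of 1024, the list of units, and the two-decimal output format are all assumptions and need not match what convert_size actually returns.

```python
def convert_size_sketch(size: int) -> str:
    """Format a byte count with a binary (1024-based) unit prefix."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(size)
    for unit in units[:-1]:
        if value < 1024:
            return f"{value:.2f} {unit}"
        value /= 1024
    # Everything that is still too large is reported in the largest unit.
    return f"{value:.2f} {units[-1]}"
```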

caosadvancedtools.loadFiles.create_re_for_file_list(files: list[str], localroot: str, remoteroot: str) → str

Create a regular expression that matches file paths contained in the files argument and all parent directories. The prefix localroot is replaced by the prefix remoteroot.

Parameters:

files (list[str]) – List of file paths to be converted to a regular expression.
localroot (str) – Prefix (of the local directory root) to be removed from the paths in files.
remoteroot (str) – Prefix (of the LinkAhead filesystem’s directory root) to be prepended to the file paths after the removal of the localroot prefix.

Returns:

regexp – Regular expression that matches all file paths from files adapted for the remote directory root.

Return type:

str
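One way to build such an expression is sketched below. This is an illustration under assumptions, not the library's code: the name file_list_regex_sketch is hypothetical, POSIX-style paths are assumed, and the exact shape of the resulting pattern (an anchored alternation) is a choice made for this example.

```python
import re
from pathlib import PurePosixPath


def file_list_regex_sketch(files, localroot, remoteroot):
    """Return a regex matching the remote paths of files and their parent directories."""
    root = PurePosixPath(remoteroot)
    remote_paths = set()
    for path in files:
        if not path.startswith(localroot):
            raise ValueError(f"{path!r} does not start with {localroot!r}")
        # Remove the local prefix and prepend the remote one.
        relative = path[len(localroot):].lstrip("/")
        remote = root / relative
        remote_paths.add(remote)
        # Add every parent directory down to (and including) remoteroot.
        for parent in remote.parents:
            if parent == root or root in parent.parents:
                remote_paths.add(parent)
    alternatives = "|".join(re.escape(str(p)) for p in sorted(remote_paths))
    return f"^({alternatives})$"
```

For a single file /data/a/b.txt with localroot /data and remoteroot /extroot, the pattern matches /extroot, /extroot/a and /extroot/a/b.txt, but not sibling files such as /extroot/a/c.txt.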

caosadvancedtools.loadFiles.loadpath(path: str, include: str | None, exclude: str | None, prefix: str, dryrun: bool, forceAllowSymlinks: bool, caosdbignore: str | None = None, localpath: str | None = None)

Make all files in path available to the LinkAhead server as FILE entities.

Notes

Run linkahead-loadfiles --help for more information and examples.

Parameters:

path (str) – Path to the directory whose files are to be made available, as seen by the LinkAhead server (i.e., the path from within the Docker container in a typical LinkAhead Control setup).
include (str or None) – Regular expression matching the files that will be included. If None, all files are matched. This is ignored if a caosdbignore is provided.
exclude (str or None) – Regular expression matching files that are to be excluded.
prefix (str) – The prefix under which the files are to be inserted into LinkAhead’s file system.
dryrun (bool) – Whether a dryrun should be performed.
forceAllowSymlinks (bool) – Whether symlinks in the path to be inserted should be processed.
caosdbignore (str, optional) – Path to a caosdbignore file that defines which files shall be included and which shall not. The syntax is the same as in a gitignore file. You must also provide the localpath option since the check is done locally. If this is given, any include is ignored.
localpath (str, optional) – Path of path on the local machine. Only needed in combination with a caosdbignore file since that is processed locally.

caosadvancedtools.loadFiles.main(argv=None): Run loadpath with the arguments specified on the command line, extended by the optional argv parameter. See --help for more information.

caosadvancedtools.pandoc_header_tools module

exception caosadvancedtools.pandoc_header_tools.MetadataFileMissing(filename, *args, **kwargs): Bases: Exception

exception caosadvancedtools.pandoc_header_tools.NoValidHeader(filename, *args, **kwargs): Bases: Exception

exception caosadvancedtools.pandoc_header_tools.ParseErrorsInHeader(filename, reason, *args, **kwargs): Bases: Exception

caosadvancedtools.pandoc_header_tools.add_header(filename, header_dict=None)

Add a header to an md file.

If the file does not exist it will be created.

If header_dict is a dictionary and not None the header will be created based on the keys and values of that dictionary.

caosadvancedtools.pandoc_header_tools.clean_header(header)

caosadvancedtools.pandoc_header_tools.get_header(filename, add_header_to_file=False)

Open an md file identified by filename and read out the yaml header.

filename can also be a folder. In this case folder/README.md will be used for getting the header.

If a header is found a tuple is returned: (first yaml header line index, last+1 yaml header line index, header)

Otherwise, if add_header_to_file is True, a header is added and the function is called again.

The header is normalized in the following way:

If the value to a key is a string, a list with that string as only element is returned.

From https://pandoc.org/MANUAL.html:

A YAML metadata block is a valid YAML object, delimited by a line of three hyphens (---) at the top and a line of three hyphens (---) or three dots (...) at the bottom. A YAML metadata block may occur anywhere in the document, but if it is not at the beginning, it must be preceded by a blank line.
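The delimiter rules quoted above can be turned into a small search routine. The following sketch only locates the block and does not parse the YAML. The name find_yaml_block is hypothetical, and the choice to return indices that exclude the delimiter lines is an assumption that may differ from the index convention used by get_header.

```python
def find_yaml_block(lines):
    """Return (start, stop) so that lines[start:stop] is the header body, or None."""
    for i, line in enumerate(lines):
        is_opening = line.rstrip() == "---"
        # A block that is not at the top must be preceded by a blank line.
        preceded_by_blank = i == 0 or lines[i - 1].strip() == ""
        if is_opening and preceded_by_blank:
            for j in range(i + 1, len(lines)):
                if lines[j].rstrip() in ("---", "..."):
                    return i + 1, j
            return None  # opening delimiter without a closing one
    return None
```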

caosadvancedtools.pandoc_header_tools.kw_present(header, kw): Check whether keywords are present in the header.

caosadvancedtools.pandoc_header_tools.save_header(filename, header_data)

Save a header identified by the tuple header_data to the file identified by filename.

filename can also be a folder. In this case folder/README.md will be used for getting the header.

caosadvancedtools.read_md_header module

caosadvancedtools.read_md_header.get_header(fn)

caosadvancedtools.structure_mapping module

class caosadvancedtools.structure_mapping.EntityMapping

Bases: object

map local entities to entities on the server

The dict to_existing maps the _cuid property to entity objects. The dict to_target maps the id property to entity objects.

add(target, existing)

caosadvancedtools.structure_mapping.collect_existing_structure(target_structure, existing_root, em)

recursively collects existing entities

The collected entities are those that correspond to the ones in target_structure.

em: EntityMapping

caosadvancedtools.structure_mapping.update_matched_entity(em, updating, target_record, existing_record): update the Record existing in the server according to the Record supplied as target_record

caosadvancedtools.structure_mapping.update_structure(em, updating: Container, target_structure: Record)

compare the existing records with the target record tree created from the h5 object

Parameters:

existing_structure – retrieved entity; e.g. the top level identifiable
target_structure (db.Record) – A record which may have references to other records. Must be a DAG.

caosadvancedtools.suppressKnown module

class caosadvancedtools.suppressKnown.SuppressKnown(db_file=None)

Bases: Filter

This filter suppresses log messages that were shown before.

The python logging module can be used as normal. This Filter needs to be added to the appropriate Logger and logging calls (e.g. to warning, info etc.) need to have an additional extra argument. This argument should be a dict that contains an identifier and a category.

Example:

extra={"identifier": "<Record>something</Record>", "category": "entities"}

The identifier is used to check whether a message was shown before and should be a string. The category can be used to remove a specific group of messages from memory, so that the logger shows those messages again even though they were shown before.

create_cache()

filter(record)

Return whether the record shall be logged.

If either identifier or category is missing, 1 is returned (logging enabled). If the record has both attributes, it is checked whether the combination was shown before (was_tagged). If so, 0 is returned. Otherwise the combination is saved and 1 is returned.

hash(txt, identifier)

reset(category)

tag_msg(txt, identifier, category)

was_tagged(digest)
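The filtering logic described for this class can be illustrated with the standard logging module alone. The sketch below is a simplified stand-in, not the SuppressKnown class itself: it keeps the digests in memory instead of in a cache file, and the class name InMemorySuppressKnown and the use of SHA-256 are assumptions.

```python
import hashlib
import logging


class InMemorySuppressKnown(logging.Filter):
    """Let each (message, identifier) combination pass only once per category."""

    def __init__(self):
        super().__init__()
        self._seen = {}  # category -> set of digests

    def filter(self, record):
        identifier = getattr(record, "identifier", None)
        category = getattr(record, "category", None)
        if identifier is None or category is None:
            return True  # nothing to compare against: always log
        text = record.getMessage() + identifier
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        known = self._seen.setdefault(category, set())
        if digest in known:
            return False  # shown before: suppress
        known.add(digest)
        return True

    def reset(self, category):
        """Forget all messages of one category so that they are shown again."""
        self._seen.pop(category, None)
```

The filter is attached with logger.addFilter(...), and each logging call passes extra={"identifier": ..., "category": ...} as described above.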

caosadvancedtools.table_converter module

caosadvancedtools.table_converter.from_table(spreadsheet, recordtype): parses a pandas DataFrame to a list of records

caosadvancedtools.table_converter.from_tsv(filename, recordtype): parses a tsv file to a list of records

caosadvancedtools.table_converter.generate_property_name(prop)

caosadvancedtools.table_converter.main()

caosadvancedtools.table_converter.to_table(container): Create a table from the records in a container.

caosadvancedtools.table_converter.to_tsv(filename, container)

caosadvancedtools.table_export module

Collect optional and mandatory data from LinkAhead records and prepare them for an export as a table, e.g., for the export to metadata repositories.

class caosadvancedtools.table_export.BaseTableExporter(export_dict, record=None, raise_error_if_missing=False)

Bases: object

Base exporter class from which all actual implementations inherit. It contains the basic structure with a dictionary for optional and mandatory keys, and the error handling. The actual logic for finding the values to the entries has to be implemented elsewhere. The final results are stored in the info dict.

collect_information(): Use the items of export_dict to collect the information for the export.

prepare_csv_export(delimiter=',', print_header=False, skip_empty_optionals=False)

Return the values in self.info as a single-line string, separated by the delimiter. If print_header is true, a header line with the names of the entries, separated by the same delimiter, is added. Header and body are separated by a newline character.

Parameters:

delimiter (string, optional) – symbol that separates two consecutive entries, e.g. ‘,’ for .csv or ‘\t’ for .tsv. Default is ‘,’.
print_header (bool, optional) – specify whether a header line with all entry names separated by the delimiter precedes the body. Default is False.
skip_empty_optionals (bool, optional) – if this is true, optional entries without value will be skipped in the output string. Otherwise an empty field will be attached. Default is False.

Raises:

TableExportError – if mandatory entries are missing a value

Returns:

a single string, either only the body line, or header and body separated by a newline character if print_header is True.

Return type:

string
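The effect of the three parameters can be seen in a small standalone sketch. It operates on a plain dict instead of an exporter object. The function name csv_line_sketch and the optional_keys argument are hypothetical, and no quoting or escaping of the delimiter is attempted.

```python
def csv_line_sketch(info, optional_keys=(), delimiter=",",
                    print_header=False, skip_empty_optionals=False):
    """Join the values of info into one line, optionally preceded by a header line."""
    keys = [
        key for key in info
        # Drop optional entries without a value only when asked to.
        if not (skip_empty_optionals and key in optional_keys and info[key] is None)
    ]
    body = delimiter.join("" if info[key] is None else str(info[key]) for key in keys)
    if print_header:
        return delimiter.join(keys) + "\n" + body
    return body
```

With info = {"title": "Run 1", "doi": None} and "doi" marked as optional, the default call keeps an empty field for the missing value, while skip_empty_optionals=True drops that column.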

exception caosadvancedtools.table_export.TableExportError(msg)

Bases: LinkAheadException

Error that is raised in case of failing export, e.g., because of missing mandatory entries.

caosadvancedtools.table_importer module

This module reads table files such as tsv and xls. They are converted to a Pandas DataFrame, which is then checked against the rules provided. For example, a list of column names that have to exist can be provided.

This module also implements some converters that can be applied to cell entries.

Those converters can also be used to apply checks on the entries.

class caosadvancedtools.table_importer.CSVImporter(converters, obligatory_columns=None, unique_keys=None, datatypes=None, existing_columns=None, convert_int_to_nullable_int=True)

Bases: TableImporter

read_file(filename, sep=',', **kwargs)

class caosadvancedtools.table_importer.TSVImporter(converters, obligatory_columns=None, unique_keys=None, datatypes=None, existing_columns=None, convert_int_to_nullable_int=True)

Bases: CSVImporter

read_file(filename, **kwargs)

class caosadvancedtools.table_importer.TableImporter(converters, obligatory_columns=None, unique_keys=None, datatypes=None, existing_columns=None, convert_int_to_nullable_int=True)

Bases: object

Abstract base class for importing data from tables.

check_columns(df, filename=None)

Check whether all required columns exist.

Required columns are columns for which converters are defined.

Raises: DataInconsistencyError

check_dataframe(df, filename=None, strict=False)

Check if the dataframe conforms to the restrictions.

Checked restrictions are: Columns, data types, uniqueness requirements.

Parameters:

df (pandas.DataFrame) – The dataframe to be checked.
filename (string, optional) – The file name, only used for output in case of problems.
strict (boolean, optional) – If False (the default), try to convert columns, otherwise raise an error.

check_datatype(df, filename=None, strict=False)

Check for each column whether non-null fields have the correct datatype.

Note

If columns are integer but should be float, this method converts the respective columns in place. The same applies to columns that should have string values but have numeric values.

Parameters: strict (boolean, optional) – If False (the default), try to convert columns, otherwise raise an error.

check_missing(df, filename=None)

Check in each row whether obligatory fields are empty or null.

Rows that have missing values are removed.

Returns: out – The input DataFrame with incomplete rows removed.
Return type: pandas.DataFrame

check_unique(df, filename=None)

Check whether value combinations that shall be unique for each row are unique.

If a second row is found that uses the same combination of values as a previous one, the second one is removed.

read_file(filename, **kwargs)

class caosadvancedtools.table_importer.XLSImporter(converters, obligatory_columns=None, unique_keys=None, datatypes=None, existing_columns=None, convert_int_to_nullable_int=True)

Bases: TableImporter

read_file(filename, **kwargs)

read_xls(filename, **kwargs)

Convert an xls file into a Pandas DataFrame.

The converters of the XLSImporter object are used.

Raises: DataInconsistencyError

caosadvancedtools.table_importer.assure_name_format(name): checks whether a string can be interpreted as ‘LastName, FirstName’

caosadvancedtools.table_importer.check_reference_field(ent_id, recordtype)

caosadvancedtools.table_importer.date_converter(val, fmt='%Y-%m-%d'): If the value is already a datetime, it is returned; otherwise it is converted using the format string.

caosadvancedtools.table_importer.datetime_converter(val, fmt='%Y-%m-%d %H:%M:%S'): If the value is already a datetime, it is returned; otherwise it is converted using the format string.
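A converter of this kind can be sketched with the standard datetime module. The name datetime_converter_sketch is hypothetical, and details such as the handling of None or the exact return type of the library's converters are not covered here.

```python
import datetime


def datetime_converter_sketch(val, fmt="%Y-%m-%d %H:%M:%S"):
    """Return val unchanged if it is a datetime, otherwise parse it with fmt."""
    if isinstance(val, datetime.datetime):
        return val
    # strptime raises a ValueError if val does not match fmt.
    return datetime.datetime.strptime(val, fmt)
```

Because strptime raises a ValueError for values that do not match the format, such a converter also acts as a check on the cell entries.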

caosadvancedtools.table_importer.incomplete_date_converter(val, fmts=None)

If the value is already a datetime, it is returned; otherwise it is converted using one of the given format strings.

Parameters:

val (str) – Candidate value for one of the possible date formats.
fmts (dict, optional) – Dictionary containing the possible (incomplete) date formats: keys are the formats into which the input value is tried to be converted, values are the possible input formats.

caosadvancedtools.table_importer.string_in_list(val, options, ignore_case=True)

Return the given value if it is contained in options, raise an error otherwise.

Parameters:

val (str) – String value to be checked.
options (list<str>) – List of possible values that val may obtain
ignore_case (bool, optional) – Specify whether the comparison of val and the possible options should ignore capitalization. Default is True.

Returns:

val – The original value if it is contained in options

Return type:

str

Raises:

ValueError – If val is not contained in options.
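The documented contract (return the value or raise a ValueError) can be reproduced in a few lines. This is a minimal sketch with the hypothetical name string_in_list_sketch, and the wording of the error message is an assumption.

```python
def string_in_list_sketch(val, options, ignore_case=True):
    """Return val if it is one of options, raise a ValueError otherwise."""
    if ignore_case:
        found = val.lower() in [option.lower() for option in options]
    else:
        found = val in options
    if not found:
        raise ValueError(f"Field value is '{val}', but it should be one of {options}.")
    return val
```

Note that the original value is returned, not the matching option, so "YES" stays "YES" even if the option list contains "yes".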

caosadvancedtools.table_importer.win_path_converter(val): checks whether the value looks like a windows path and converts it to posix

caosadvancedtools.table_importer.win_path_list_converter(val): checks whether the value looks like a list of windows paths and converts it to posix paths
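For a single path, the conversion from Windows to POSIX notation can be sketched with pathlib, which works on any operating system. The name win_path_to_posix_sketch is hypothetical, and the sketch leaves out the plausibility check that the library's converters perform.

```python
from pathlib import PureWindowsPath


def win_path_to_posix_sketch(val: str) -> str:
    """Interpret val as a Windows path and return it with forward slashes."""
    return PureWindowsPath(val).as_posix()
```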

caosadvancedtools.table_importer.yes_no_converter(val)

converts a string to True or False if possible.

Allowed field values are yes and no.

caosadvancedtools.utils module

caosadvancedtools.utils.check_win_path(path: str, filename: str = None)

Check whether ‘/’ is in the path but no ‘\’.

If that is the case, it is likely that the path is not a Windows path.

Parameters:

path (str) – Path to be checked.
filename (str) – If the path is located in a file, this parameter can be used to direct the user to the file where the path is located.

caosadvancedtools.utils.create_entity_link(entity: Entity, base_url: str = '')

creates a string that contains the code for an html link to the provided entity.

The text of the link is the entity name if one exists and the id otherwise.

Parameters:

entity (db.Entity) – the entity object to which the link will point
base_url (str) – optional, by default, the url starts with ‘/Entity’ and thus is relative. You can provide a base url that will be prefixed.

Returns:

the string containing the html code

Return type:

str
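The link construction can be illustrated without a server connection by using a stand-in for the entity. In this sketch, FakeEntity and entity_link_sketch are hypothetical names, and the exact HTML produced by create_entity_link (quoting style, escaping) may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeEntity:
    """Stand-in for db.Entity with only the attributes used here."""
    id: int
    name: Optional[str] = None


def entity_link_sketch(entity, base_url: str = "") -> str:
    """Build an HTML link to entity, labelled with its name or, if missing, its id."""
    text = entity.name if entity.name is not None else entity.id
    return f'<a href="{base_url}/Entity/{entity.id}">{text}</a>'
```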

caosadvancedtools.utils.find_records_that_reference_ids(referenced_ids, rt='', step_size=50)

Return a list with the ids of records that reference entities with the supplied ids.

Sometimes a file or folder is referenced in a README.md (e.g. in an Analysis), although the reference should point not to those files but to the corresponding object (e.g. the Experiment). Thus the ids of all Records (of a suitable type) that reference one or more of the supplied ids are collected. This is done in chunks because the ids are passed in the header of the HTTP request.
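The chunking mentioned above can be sketched independently of the actual query. The helper name chunked is hypothetical. It only shows how a list of ids would be split into pieces of at most step_size elements, and it says nothing about how the library builds its requests.

```python
def chunked(ids, step_size=50):
    """Yield consecutive slices of ids with at most step_size elements each."""
    ids = list(ids)
    for start in range(0, len(ids), step_size):
        yield ids[start:start + step_size]
```

With 120 ids and the default step size, this yields three chunks of 50, 50 and 20 ids, so no single request has to carry all ids at once.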

caosadvancedtools.utils.get_referenced_files(glob: str, prefix: str = None, filename: str = None, location: str = None)

queries the database for files referenced by the provided glob

Parameters:

glob (str) – the glob referencing the file(s)
prefix (str, optional) – the glob can be relative to some path, in that case that path needs to be given as prefix
filename (str, optional) – the file in which the glob is given (used for error messages)
location (str, optional) – the location in the file in which the glob is given (used for error messages)

caosadvancedtools.utils.read_field_as_list(field): In yaml headers, for example, entries can be single values or lists. To simplify working with those values, this function puts single values in a list.
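A minimal sketch of this normalisation, under the assumption that lists are passed through unchanged (the name read_field_as_list_sketch is hypothetical):

```python
def read_field_as_list_sketch(field):
    """Return field if it already is a list, otherwise wrap it in a list."""
    if isinstance(field, list):
        return field
    return [field]
```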

caosadvancedtools.utils.replace_path_prefix(path, old_prefix, new_prefix)

Replaces the prefix old_prefix in path with new_prefix.

Raises a RuntimeError when the path does not start with old_prefix.
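The documented behaviour can be approximated with plain string operations. In this sketch the name replace_path_prefix_sketch is hypothetical, and the prefixes are treated as plain strings without any path normalisation, which may differ from the library function.

```python
def replace_path_prefix_sketch(path: str, old_prefix: str, new_prefix: str) -> str:
    """Replace old_prefix at the start of path with new_prefix."""
    if not path.startswith(old_prefix):
        raise RuntimeError(f"Path does not start with old_prefix: {path} vs {old_prefix}")
    return new_prefix + path[len(old_prefix):]
```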

caosadvancedtools.utils.return_field_or_property(value, prop=None)

Return the value itself or a property of it.

In yaml headers, a field typically contains a single value in some cases and a dict in others. This function returns either the single value or (if the value is a dict) one value of that dict.
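A minimal sketch of this lookup follows. The name return_field_or_property_sketch is hypothetical, and returning the dict unchanged when prop is not one of its keys is an assumption.

```python
def return_field_or_property_sketch(value, prop=None):
    """Return value[prop] if value is a dict containing prop, otherwise value."""
    if isinstance(value, dict) and prop in value:
        return value[prop]
    return value
```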

caosadvancedtools.utils.set_log_level(level=10)

caosadvancedtools.utils.string_to_person(person)

Creates a Person Record from a string.

The following formats are supported: - <Firstname> - <Lastname(s)>,

The part after the name can be used for an affiliation for example.

caosadvancedtools.webui_formatter module

class caosadvancedtools.webui_formatter.WebUI_Formatter(*args, full_file=None, **kwargs)

Bases: Formatter

Formats log messages so that they are displayed nicely in the WebUI.

You can enable this as follows:

logger = logging.getLogger("<LoggerName>")
formatter = WebUI_Formatter(full_file="path/to/file")
handler = logging.Handler()
handler.setFormatter(formatter)
logger.addHandler(handler)

format(record)

Return the HTML formatted log record for display on a website.

This essentially wraps the text formatted by the parent class in html.

Parameters: record
Raises: RuntimeError – If the log level of the record is not supported. Supported log levels include logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, and logging.CRITICAL.
Returns: The formatted log record.
Return type: str
