caosadvancedtools package
Subpackages
- caosadvancedtools.cfoods package
- caosadvancedtools.models package
- Submodules
- caosadvancedtools.models.data_model module
- caosadvancedtools.models.parser module
- Module contents
- caosadvancedtools.scifolder package
- Submodules
- caosadvancedtools.scifolder.analysis_cfood module
- caosadvancedtools.scifolder.experiment_cfood module
- caosadvancedtools.scifolder.generic_pattern module
- caosadvancedtools.scifolder.publication_cfood module
- caosadvancedtools.scifolder.result_table_cfood module
- caosadvancedtools.scifolder.simulation_cfood module
- caosadvancedtools.scifolder.software_cfood module
- caosadvancedtools.scifolder.utils module
- caosadvancedtools.scifolder.withreadme module
- Module contents
- caosadvancedtools.serverside package
- Submodules
- caosadvancedtools.serverside.generic_analysis module
- caosadvancedtools.serverside.helper module
- caosadvancedtools.serverside.logging module
- caosadvancedtools.serverside.sync module
- Module contents
- caosadvancedtools.table_json_conversion package
- Submodules
- caosadvancedtools.table_json_conversion.convert module
- caosadvancedtools.table_json_conversion.fill_xlsx module
- caosadvancedtools.table_json_conversion.table_generator module
- caosadvancedtools.table_json_conversion.xlsx_utils module
ColumnType
RowType
array_schema_from_model_schema()
get_column_type_row_index()
get_data_columns()
get_defining_paths()
get_foreign_key_columns()
get_path_position()
get_path_rows()
get_row_type_column_index()
get_subschema()
get_worksheet_for_path()
is_exploded_sheet()
next_row_index()
p2s()
parse_multiple_choice()
read_or_dict()
- Module contents
Submodules
caosadvancedtools.cache module
- class caosadvancedtools.cache.AbstractCache(db_file=None, force_creation=False)
Bases:
ABC
- check_cache()
Check whether the cache in db file self.db_file exists and conforms to the latest database schema.
If it does not exist, it will be created using the newest database schema.
If it exists, but the schema is outdated, an exception will be raised.
- abstract create_cache()
Provide an overloaded function here that creates the cache in the most recent version.
- abstract get_cache_schema_version()
A method that has to be overloaded that sets the version of the SQLITE database schema. The schema is saved in table version column schema.
Increase this variable, when changes to the cache tables are made.
- get_cache_version()
Return the version of the cache stored in self.db_file. The version is stored as the only entry in column schema of table version.
- abstract get_default_file_name()
Supply a default file name for the cache here.
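The behaviour described for check_cache() can be pictured with plain sqlite3. This is a minimal sketch, not the library's implementation: only the version table with its schema column comes from the description above, while the helper name check_cache_sketch, the constant SCHEMA_VERSION and the choice of RuntimeError are assumptions made for illustration.

```python
import os
import sqlite3
import tempfile

SCHEMA_VERSION = 2  # hypothetical "latest" schema version


def check_cache_sketch(db_file):
    """Create the cache if it is missing; raise if its schema is outdated."""
    if not os.path.exists(db_file):
        conn = sqlite3.connect(db_file)
        conn.execute("CREATE TABLE version (schema INTEGER)")
        conn.execute("INSERT INTO version VALUES (?)", (SCHEMA_VERSION,))
        conn.commit()
        conn.close()
        return "created"
    conn = sqlite3.connect(db_file)
    try:
        # The version is the only entry in column "schema" of table "version".
        (stored,) = conn.execute("SELECT schema FROM version").fetchone()
    finally:
        conn.close()
    if stored != SCHEMA_VERSION:
        raise RuntimeError(f"cache schema {stored} is outdated")
    return "ok"


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "cache.db")
    first = check_cache_sketch(path)   # file does not exist yet
    second = check_cache_sketch(path)  # file exists with the current schema

    # Simulate an outdated cache by lowering the stored version.
    conn = sqlite3.connect(path)
    conn.execute("UPDATE version SET schema = 1")
    conn.commit()
    conn.close()
    try:
        check_cache_sketch(path)
        outdated_detected = False
    except RuntimeError:
        outdated_detected = True
```

A subclass would plug its own table definitions into the creation branch and report its own version number.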
- class caosadvancedtools.cache.Cache(*args, **kwargs)
Bases:
IdentifiableCache
- class caosadvancedtools.cache.IdentifiableCache(db_file=None, force_creation=False)
Bases:
AbstractCache
Stores identifiables (as a hash of XML) and their respective IDs.
This allows retrieving the Record corresponding to an identifiable without querying.
- check_existing(ent_hash)
Check the cache for a hash.
ent_hash: The hash to search for.
Return the ID and the version ID of the hashed entity. Return None if no entity with that hash is in the cache.
- create_cache()
Create a new SQLITE cache file in self.db_file.
Two tables will be created:
- identifiables is the actual cache.
- version is a table with version information about the cache.
- get_cache_schema_version()
A method that has to be overloaded that sets the version of the SQLITE database schema. The schema is saved in table version column schema.
Increase this variable, when changes to the cache tables are made.
- get_default_file_name()
Supply a default file name for the cache here.
- static hash_entity(ent)
Format an entity as “pretty” XML and return the SHA256 hash.
- insert(ent_hash, ent_id, ent_version)
Insert a new cache entry.
ent_hash: Hash of the entity. Should be generated with Cache.hash_entity.
ent_id: ID of the entity.
ent_version: Version string of the entity.
- insert_list(hashes, entities)
Insert the ids of entities into the cache
The hashes must correspond to the entities in the list
- update_ids_from_cache(entities)
Sets the IDs of those entities that are in the cache.
A list of hashes corresponding to the entities is returned.
- validate_cache(entities=None)
Runs through all entities stored in the cache and checks whether the version still matches the most recent version. Non-matching entities will be removed from the cache.
- entities: When set to a db.Container or a list of Entities, the IDs from the cache will not be retrieved from the CaosDB database, but the versions from the cache will be checked against the versions contained in that collection. Only entries in the cache that have a corresponding version in the collection will be checked, all others will be ignored. Useful for testing.
Return a list of invalidated entries or an empty list if no elements have been invalidated.
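The idea behind hash_entity() and check_existing() is to look entities up by a hash of their XML. The sketch below is an illustration under assumptions, not the library code: it uses xml.dom.minidom for pretty-printing and a plain dict in place of the SQLite table, and the name hash_xml_sketch and the example values are hypothetical.

```python
import hashlib
import xml.dom.minidom


def hash_xml_sketch(xml_text):
    """Pretty-print XML and return the SHA256 hex digest of the result."""
    pretty = xml.dom.minidom.parseString(xml_text).toprettyxml(indent="  ")
    return hashlib.sha256(pretty.encode("utf-8")).hexdigest()


# A dict stands in for the cache table: hash -> (entity ID, version ID).
cache = {}
record_xml = '<Record name="exp1"><Property name="date">2020-01-01</Property></Record>'
digest = hash_xml_sketch(record_xml)
cache[digest] = (1234, "abc123")  # made-up ID and version string

hit = cache.get(digest)                         # known hash
miss = cache.get(hash_xml_sketch("<Record/>"))  # unknown hash
```

As with check_existing(), a lookup for an unknown hash yields None, so the caller knows that a query is still needed.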
- class caosadvancedtools.cache.UpdateCache(db_file=None, force_creation=False)
Bases:
AbstractCache
stores unauthorized inserts and updates
If the Guard is set to a mode that does not allow an insert or update, the insert or update can be stored in this cache such that it can be authorized and performed later.
- create_cache()
initialize the cache
- get(run_id, querystring)
returns the pending updates for a given run id
Parameters:
run_id: the id of the crawler run
querystring: the sql query
- get_cache_schema_version()
A method that has to be overloaded that sets the version of the SQLITE database schema. The schema is saved in table version column schema.
Increase this variable, when changes to the cache tables are made.
- get_default_file_name()
Supply a default file name for the cache here.
- get_inserts(run_id)
Returns the pending inserts for a given run id.
Parameters:
run_id: the id of the crawler run
- static get_previous_version(cont)
Retrieve the current, unchanged version of the entities that shall be updated, i.e. the version before the update
- get_updates(run_id)
returns the pending updates for a given run id
Parameters:
run_id: the id of the crawler run
- insert(cont, run_id, insert=False)
Insert a pending, unauthorized insert or update
- caosadvancedtools.cache.cleanXML(xml)
- caosadvancedtools.cache.get_pretty_xml(cont)
- caosadvancedtools.cache.put_in_container(stuff)
caosadvancedtools.cfood module
Defines how something that shall be inserted into LinkAhead is treated.
LinkAhead can automatically be filled with Records based on some structure, a file structure, a table or similar.
The Crawler will iterate over the respective items and test for each item whether a CFood class exists that matches the file path, i.e. whether a CFood class wants to treat that particular item. If one does, it is instantiated to treat the match. This occurs in basically three steps:
1. Create a list of identifiables, i.e. unique representations of LinkAhead Records (such as an experiment belonging to a project and a date/time).
2. The identifiables are either found in LinkAhead or they are created.
3. The identifiables are updated based on the data in the file structure.
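The matching that precedes these steps can be sketched with toy classes. Everything below is hypothetical (ToyCFood, ReadmeCFood, CsvCFood, collect) and deliberately simplified: it takes the first matching class per item and does not talk to a server, whereas the real Crawler has its own rules for handling matches.

```python
import re


class ToyCFood:
    """Hypothetical stand-in for a CFood: matches items by regular expression."""

    pattern = None

    def __init__(self, item):
        self.item = item

    @classmethod
    def match_item(cls, item):
        return re.match(cls.pattern, item) is not None


class ReadmeCFood(ToyCFood):
    pattern = r".*/README\.md$"


class CsvCFood(ToyCFood):
    pattern = r".*\.csv$"


def collect(items, cfood_types):
    """Instantiate the first matching CFood class for every item."""
    cfoods = []
    for item in items:
        for cls in cfood_types:
            if cls.match_item(item):
                cfoods.append(cls(item))
                break
    return cfoods


items = ["/data/exp1/README.md", "/data/exp1/table.csv", "/data/exp1/notes.txt"]
cfoods = collect(items, [ReadmeCFood, CsvCFood])
matched = [(type(c).__name__, c.item) for c in cfoods]
```

Items that no class matches, such as notes.txt here, are simply not treated.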
- class caosadvancedtools.cfood.AbstractCFood(item)
Bases:
object
Abstract base class for Crawler food (CFood).
- attach(item)
- collect_information()
The CFood collects information for further processing.
Often CFoods need information from files or even from the database in order to make processing decisions. It is intended that this function is called after match. Thus match can be used without connecting to the database.
To be overwritten by subclasses
- abstract create_identifiables()
should set the instance variable Container with the identifiables
- looking_for(item)
returns True if item can be added to this CFood.
Typically a CFood exists for a file and defines how to deal with the file. However, sometimes additional files “belong” to a CFood. E.g. an experiment CFood might match against a README file but labnotes.txt also shall be treated by the cfood (and not a special cfood created for labnotes.txt) This function can be used to define what files shall be ‘attached’.
To be overwritten by subclasses
- classmethod match_item(item)
Matches an item found by the crawler against this class. Returns True if the item shall be treated by this class, i.e. if this class matches the item.
- Parameters:
item (object) – iterated by the crawler
To be overwritten by subclasses!
- static remove_property(entity, prop)
- static set_parents(entity, names)
- static set_property(entity, prop, value, datatype=None)
- abstract update_identifiables()
Changes the identifiables as needed and adds changed identifiables to self.to_be_updated
- class caosadvancedtools.cfood.AbstractFileCFood(crawled_path, *args, **kwargs)
Bases:
AbstractCFood
- property crawled_file
- classmethod get_re()
Returns the regular expression used to identify files that shall be processed
This function shall be implemented by subclasses.
- looking_for(crawled_file)
returns True if crawled_file can be added to this CFood.
Typically a CFood exists for a file and defines how to deal with the file. However, sometimes additional files “belong” to a CFood. E.g. an experiment CFood might match against a README file but labnotes.txt also shall be treated by the cfood (and not a special cfood created for labnotes.txt) This function can be used to define what files shall be ‘attached’.
- classmethod match_item(path)
Matches the regular expression of this class against file names
- Parameters:
path (str) – The path of the file that shall be matched.
- static re_from_extensions(extensions)
Return a regular expression which matches the given file extensions.
Useful for inheriting classes.
- Parameters:
extensions (iterable<str>) – An iterable with the allowed extensions.
- Returns:
out – The regular expression, starting with .*\. and ending with the EOL dollar character. The actual extension will be accessible in the pattern group name ext.
- Return type:
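A pattern of the shape described here can be built as follows. This is a sketch under assumptions rather than the library's code: the function name re_from_extensions_sketch is hypothetical, and joining the extensions with "|" after escaping them is one plausible way to satisfy the description.

```python
import re


def re_from_extensions_sketch(extensions):
    """Build a pattern that ends in a named group 'ext' for the extension."""
    alternatives = "|".join(re.escape(ext) for ext in extensions)
    return r".*\.(?P<ext>" + alternatives + r")$"


pattern = re_from_extensions_sketch(["csv", "tsv"])
match = re.match(pattern, "/data/table.tsv")
ext = match.group("ext")
no_match = re.match(pattern, "/data/table.xlsx")
```

A subclass could return such a pattern from get_re() and read the extension from the ext group of the match.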
- class caosadvancedtools.cfood.CMeal
Bases:
object
CMeal groups equivalent items and allows their collected insertion.
Sometimes there is no one item that can be used to trigger the creation of some Record. E.g. if a collection of image files shall be referenced from one Record that groups them, it is unclear which image should trigger the creation of the Record.
CMeals are grouped based on the groups in the used regular expression. If, in the above example, all the images reside in one folder, all groups of the filename match except that for the file name should match. The groups that shall match need to be listed in the matching_groups class property. Subclasses will overwrite this property.
This allows using has_suitable_cfood in the match_item function of a CFood to check whether the necessary CFood was already created. In order to allow this, all instances of a CFood class are tracked in the existing_instances class member.
Subclasses must have a cls.get_re function and a match member variable (see AbstractFileCFood)
- classmethod all_groups_equal(m1, m2)
- belongs_to_meal(item)
- existing_instances = []
- static get_re()
- classmethod has_suitable_cfood(item)
checks whether the required cfood object already exists.
item : the crawled item
- matching_groups = []
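The grouping by regular-expression groups can be illustrated with the image-folder example from the class description. The pattern, the group names and the helper all_groups_equal_sketch are assumptions for illustration; only the idea of comparing the groups listed in matching_groups comes from the text above.

```python
import re

# Hypothetical pattern: project folder, date folder, and image file name.
PATTERN = re.compile(
    r"/(?P<project>[^/]+)/(?P<date>\d{4}-\d{2}-\d{2})/(?P<file>[^/]+\.png)$"
)
MATCHING_GROUPS = ["project", "date"]  # the "file" group may differ


def all_groups_equal_sketch(m1, m2):
    """True if both matches agree on every group in MATCHING_GROUPS."""
    return all(m1.group(g) == m2.group(g) for g in MATCHING_GROUPS)


a = PATTERN.search("/proj/2020-01-01/img_001.png")
b = PATTERN.search("/proj/2020-01-01/img_002.png")
c = PATTERN.search("/proj/2020-02-15/img_001.png")

same_meal = all_groups_equal_sketch(a, b)   # same folder, different image
other_meal = all_groups_equal_sketch(a, c)  # different date folder
```

Images a and b would end up in one meal and be referenced from a single Record, while c would start a new one.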
- class caosadvancedtools.cfood.FileGuide
Bases:
object
- access(path)
Should be replaced by a function that adds a prefix to paths so that LinkAhead files can be accessed locally.
This default just returns the unchanged path.
- class caosadvancedtools.cfood.RowCFood(item, unique_cols, recordtype, **kwargs)
Bases:
AbstractCFood
- create_identifiables()
should set the instance variable Container with the identifiables
- update_identifiables()
Changes the identifiables as needed and adds changed identifiables to self.to_be_updated
- caosadvancedtools.cfood.add_files(filemap)
add to the file cache
- caosadvancedtools.cfood.assure_has_description(entity, description, to_be_updated=None, force=False)
Checks whether entity has the description that is passed.
If this is the case this function ends. Otherwise the entity is assigned a new description. If the list to_be_updated is supplied, the entity is added to the list in order to indicate that the entity should be updated. Otherwise it is directly updated.
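The assure_* helpers in this module share one pattern: compare, change if needed, then either queue the entity in to_be_updated or update it right away. The sketch below shows that pattern on a plain dict. The name assure_description_sketch and the push callback (standing in for a direct update on the server) are hypothetical, and the force parameter is left out.

```python
def assure_description_sketch(entity, description, push, to_be_updated=None):
    """Set entity['description'] if it differs; queue or push the change."""
    if entity.get("description") == description:
        return False  # already as requested, nothing to do
    entity["description"] = description
    if to_be_updated is None:
        push(entity)  # stands in for an immediate update
    else:
        to_be_updated.append(entity)  # update is deferred to the caller
    return True


pushed = []  # records "direct" updates
queue = []   # plays the role of to_be_updated
ent = {"name": "exp1", "description": "old"}

changed = assure_description_sketch(ent, "new", pushed.append, to_be_updated=queue)
unchanged = assure_description_sketch(ent, "new", pushed.append, to_be_updated=queue)
direct = assure_description_sketch({"name": "exp2"}, "new", pushed.append)
```

Collecting entities in a list lets a caller send many changes in one batch instead of one request per entity.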
- caosadvancedtools.cfood.assure_has_parent(entity, parent, to_be_updated=None, force=False, unique=True)
Checks whether entity has a parent with name parent.
If this is the case this function ends. Otherwise the entity is assigned a new parent. If the list to_be_updated is supplied, the entity is added to the list in order to indicate that the entity should be updated. Otherwise it is directly updated.
- caosadvancedtools.cfood.assure_has_property(entity, name, value, to_be_updated=None, datatype=None, setproperty=False)
Checks whether entity has a property name with the value value.
If this is the case this function ends. Otherwise the entity is assigned a new property.
Note that property matching occurs based on names.
If the list to_be_updated is supplied, the entity is added to the list in order to indicate that the entity should be updated. Otherwise it is directly updated.
setproperty: boolean, if True, overwrite existing properties.
- caosadvancedtools.cfood.assure_name_is(entity, name, to_be_updated=None, force=False)
Checks whether entity has the name that is passed.
If this is the case this function ends. Otherwise the entity is assigned a new name. If the list to_be_updated is supplied, the entity is added to the list in order to indicate that the entity should be updated. Otherwise it is directly updated.
- caosadvancedtools.cfood.assure_object_is_in_list(obj, containing_object, property_name, to_be_updated=None, datatype=None)
Checks whether obj is one of the values in the list property property_name of the supplied entity containing_object.
If this is the case this function returns. Otherwise the entity is added to the property property_name and the entity containing_object is added to the supplied list to_be_updated in order to indicate that the entity containing_object should be updated. If none is submitted the update will be conducted in-place.
If the property is missing, it is added first and then the entity is added/updated.
If obj is a list, every element is added.
- caosadvancedtools.cfood.assure_parents_are(entity, parents, to_be_updated=None, force=False, unique=True)
Checks whether entity has the provided parents (and only those).
If this is the case this function ends. Otherwise the entity is assigned the new parents and the old ones are discarded.
Note that parent matching occurs based on names. If a parent does not have a name, a ValueError is raised.
If the list to_be_updated is supplied, the entity is added to the list in order to indicate that the entity should be updated. Otherwise it is directly updated.
parents: single string or list of strings
- caosadvancedtools.cfood.assure_property_is(entity, name, value, datatype=None, to_be_updated=None, force=False)
Checks whether entity has a Property name with the given value.
If this is the case this function ends. Otherwise the entity is assigned a new property or an existing one is updated.
If the list to_be_updated is supplied, the entity is added to the list in order to indicate that the entity should be updated. Otherwise it is directly updated.
- caosadvancedtools.cfood.assure_special_is(entity, value, kind, to_be_updated=None, force=False)
Checks whether entity has the name or description that is passed.
If this is the case this function ends. Otherwise the entity is assigned a new name or description. If the list to_be_updated is supplied, the entity is added to the list in order to indicate that the entity should be updated. Otherwise it is directly updated.
- caosadvancedtools.cfood.get_entity(name)
Returns the entity with a given name, preferably from a local cache.
If the local cache does not contain the entity, retrieve it from LinkAhead.
- caosadvancedtools.cfood.get_entity_for_path(path)
- caosadvancedtools.cfood.get_ids_for_entities_with_names(entities)
- caosadvancedtools.cfood.get_property(name)
Returns the property with a given name, preferably from a local cache.
If the local cache does not contain the property, try to retrieve it from LinkAhead. If it does not exist, see whether it could be a record type used as a property.
- caosadvancedtools.cfood.get_record(name)
Returns the record with a given name, preferably from a local cache.
If the local cache does not contain the record, try to retrieve it from LinkAhead.
- caosadvancedtools.cfood.get_recordtype(name)
Returns the record type with a given name, preferably from a local cache.
If the local cache does not contain the record type, try to retrieve it from LinkAhead. If it does not exist, add it to the data model problems
- caosadvancedtools.cfood.insert_id_based_on_name(entity)
caosadvancedtools.collect_datamodel module
caosadvancedtools.crawler module
Crawls a file structure and inserts Records into LinkAhead based on what is found.
LinkAhead can automatically be filled with Records based on some file structure. The Crawler will iterate over the files and test for each file whether a CFood exists that matches the file path. If one does, it is instantiated to treat the match. This occurs in basically three steps:
1. Create a list of identifiables, i.e. unique representations of LinkAhead Records (such as an experiment belonging to a project and a date/time).
2. The identifiables are either found in LinkAhead or they are created.
3. The identifiables are updated based on the data in the file structure.
- class caosadvancedtools.crawler.Crawler(cfood_types, use_cache=False, abort_on_exception=True, interactive=True, hideKnown=False, debug_file=None, cache_file=None)
Bases:
object
- check_matches(matches)
- collect_cfoods()
This is the first phase of the crawl. It collects all cfoods that shall be processed. The second phase is iterating over cfoods and updating LinkAhead. This separate first step is necessary in order to allow a single cfood being influenced by multiple crawled items. E.g. the FileCrawler can have a single cfood treat multiple files.
This is a very basic implementation and this function should be overwritten by subclasses.
The basic structure of this function should be that whatever is being processed is iterated over and each cfood is checked for whether the item ‘matches’. If it does, a cfood is instantiated, passing the item as an argument. The match can depend on the cfoods already created, i.e. a file might no longer match because it is already treated by an earlier cfood.
Should return cfoods, tbs and errors_occured (TODO: do this via logging?):
- tbs: text returned from traceback
- errors_occured: True if at least one error occurred
- crawl(security_level=0, path=None)
- static create_query_for_identifiable(ident)
uses the properties of ident to create a query that can determine whether the required record already exists.
- static find_existing(entity)
searches for an entity that matches the identifiable in LinkAhead
Characteristics of the identifiable like, properties, name or id are used for the match.
- static find_or_insert_identifiables(identifiables)
Sets the IDs of identifiables (that do not already have an ID from the cache) based on searching LinkAhead and retrieves those entities. The remaining entities (those which cannot be retrieved) have no correspondence in LinkAhead and are thus inserted.
- iteritems()
generates items to be crawled with an index
- static save_form(changes, path, run_id)
Saves an html website to a file that contains a form with a button to authorize the given changes.
The button will call the crawler with the same path that was used for the current run and with a parameter to authorize the changes of the current run.
Parameters:
changes: The LinkAhead entities in the version after the update.
path: the path defining the subtree that is crawled
- class caosadvancedtools.crawler.FileCrawler(files, **kwargs)
Bases:
Crawler
- iteritems()
generates items to be crawled with an index
- static query_files(path)
- class caosadvancedtools.crawler.TableCrawler(table, unique_cols, recordtype, **kwargs)
Bases:
Crawler
- iteritems()
generates items to be crawled with an index
- caosadvancedtools.crawler.apply_list_of_updates(to_be_updated, update_flags=None, update_cache=None, run_id=None)
Updates the to_be_updated Container, i.e., pushes the changes to LinkAhead after removing possible duplicates. If a cache is provided, unauthorized updates can be cached for further authorization.
- Parameters:
to_be_updated (db.Container) – Container with the entities that will be updated.
update_flags (dict, optional) – Dictionary of LinkAhead server flags that will be used for the update. Default is an empty dict.
update_cache (UpdateCache or None, optional) – Cache in which the intended updates will be stored so they can be authorized afterwards. Default is None.
run_id (String or None, optional) – Id with which the pending updates are cached. Only meaningful if update_cache is provided. Default is None.
- caosadvancedtools.crawler.get_value(prop)
Returns the value of a Property.
- Parameters:
prop – The property of which the value shall be returned.
- Returns:
out – The value of the property; if the value is an entity, its ID.
- caosadvancedtools.crawler.separated(text)
caosadvancedtools.datainconsistency module
Implements an error to be used when there is a problem with the data to be read. I.e. something that users of CaosDB need to fix.
- exception caosadvancedtools.datainconsistency.DataInconsistencyError
Bases:
ValueError
caosadvancedtools.datamodel_problems module
Implements a class for finding and storing entities, either record types or properties, that are missing in a data model. They can be inserted by hand or guessed from possible exceptions when inserting or updating entities with missing parents and/or properties.
- class caosadvancedtools.datamodel_problems.DataModelProblems
Bases:
object
Collect and store missing RecordTypes and Properties.
- static add(ent)
Add a missing record type or property.
- static evaluate_exception(e)
Take a TransactionError, see whether it was caused by datamodel problems, and update missing parents and/or properties if this was the case. Afterwards, raise the exception.
- Parameters:
e (TransactionError) – TransactionError, the children of which are checked for possible datamodel problems.
- missing = {}
caosadvancedtools.example_cfood module
- class caosadvancedtools.example_cfood.ExampleCFood(crawled_path, *args, **kwargs)
Bases:
AbstractFileCFood
- create_identifiables()
should set the instance variable Container with the identifiables
- classmethod get_re()
Returns the regular expression used to identify files that shall be processed
This function shall be implemented by subclasses.
- update_identifiables()
Changes the identifiables as needed and adds changed identifiables to self.to_be_updated
caosadvancedtools.guard module
caosadvancedtools.import_from_xml module
This file allows importing a dataset stored in an XML representation together with the corresponding files.
The export should have been done with export_related.py
- caosadvancedtools.import_from_xml.create_dummy_file(text='Please ask the administrator for this file.')
- caosadvancedtools.import_from_xml.defineParser()
- caosadvancedtools.import_from_xml.import_xml(filename, rerun=False, interactive=True)
filename: path to the xml file with the data to be inserted
rerun: boolean; if true, files are not inserted as paths would conflict.
- caosadvancedtools.import_from_xml.main()
caosadvancedtools.json_schema_exporter module
Convert a data model into a json schema.
Sometimes you may want to have a json schema which describes a LinkAhead data model, for example for the automatic generation of user interfaces with third-party tools like rjsf. Then this is the right module for you!
The json_schema_exporter module has one main class, JsonSchemaExporter, and a few utility and wrapper functions.
For easy usage, you may simply import recordtype_to_json_schema and use it on a fully referenced RecordType like this:
import caosadvancedtools.models.parser as parser
import caosadvancedtools.json_schema_exporter as jsex

model = parser.parse_model_from_yaml("my_model.yml")

# get the data model schema for the "Journey" recordtype
schema, ui_schema = jsex.recordtype_to_json_schema(
    rt=model.get_deep("Journey"),
    do_not_create=["Continent"],  # only choose from existing Records
    multiple_choice=["visited_cities"],
    rjsf=True  # also create a UI schema
)
For more details on how to use this wrapper, read the function documentation.
Other useful functions are make_array, which creates an array out of a single schema, and merge_schemas, which, as the name suggests, combines multiple schema definitions into a single schema.
- class caosadvancedtools.json_schema_exporter.JsonSchemaExporter(additional_properties: bool = True, name_property_for_new_records: bool = False, description_property_for_new_records: bool = False, additional_options_for_text_props: dict = None, additional_json_schema: Dict[str, dict] = None, additional_ui_schema: Dict[str, dict] = None, units_in_description: bool = True, do_not_create: List[str] = None, do_not_retrieve: List[str] = None, no_remote: bool = False, use_rt_pool: DataModel = None, multiple_choice: List[str] = None, wrap_files_in_objects: bool = False)
Bases:
object
A class which collects everything needed for the conversion.
- recordtype_to_json_schema(rt: RecordType, rjsf: bool = False) dict | Tuple[dict, dict]
Create a jsonschema from a given RecordType that can be used, e.g., to validate a json specifying a record of the given type.
- Parameters:
rt (RecordType) – The RecordType from which a json schema will be created.
rjsf (bool, optional) – If True, uiSchema definitions for react-jsonschema-forms will be output as the second return value. Default is False
- Returns:
schema (dict) – A dict containing the json schema created from the given RecordType’s properties.
ui_schema (dict, optional) – A ui schema. Only if a parameter asks for it (e.g. rjsf).
- caosadvancedtools.json_schema_exporter.make_array(schema: dict, rjsf_uischema: dict = None) dict | Tuple[dict, dict]
Create an array of the given schema.
The result will look like this:
{
  "type": "array",
  "items": {
    // the schema
  }
}
- Parameters:
- Returns:
schema (dict) – A JSON schema dict with a top-level array which contains instances of the given schema.
ui_schema (dict, optional) – The wrapped ui schema. Only returned if rjsf_uischema is given as parameter.
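The wrapping itself is small. The following sketch reproduces only the documented result shape; the name make_array_sketch is hypothetical, and the optional UI schema handling of the real function is left out.

```python
def make_array_sketch(schema):
    """Wrap a JSON schema in a top-level array schema."""
    return {"type": "array", "items": schema}


person = {"type": "object", "properties": {"name": {"type": "string"}}}
people = make_array_sketch(person)
```

A JSON document that validates against people is then a list whose elements each validate against person.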
- caosadvancedtools.json_schema_exporter.merge_schemas(schemas: Dict[str, dict] | Iterable[dict], rjsf_uischemas: Dict[str, dict] | Sequence[dict] = None) dict | Tuple[dict, dict]
Merge the given schemata into a single schema.
The result will look like this:
{
  "type": "object",
  "properties": {
    // A, B, C
  },
  "required": [
    // "A", "B", "C"
  ],
  "additionalProperties": false
}
- Parameters:
schemas (dict[str, dict] | Iterable[dict]) – A dict or iterable of schemata which shall be merged together. If this is a dict, the keys will be used as property names, otherwise the titles of the submitted schemata. If they have no title, numbers will be used as a fallback. Note that even with a dict, the original schema’s “title” is not changed.
rjsf_uischemas (dict[str, dict] | Iterable[dict], optional) – If given, also merge the react-jsonschema-forms UI schemata from this argument and return the result as the second return value. If schemas is a dict, this parameter must also be a dict; if schemas is only an iterable, this parameter must support numerical indexing.
- Returns:
schema (dict) – A JSON schema dict with a top-level object which contains the given schemata as properties.
uischema (dict) – If rjsf_uischemas was given, this contains the merged UI schemata.
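The naming rules for the merged properties can be sketched as follows. This is an illustration, not the library's implementation: merge_schemas_sketch is a hypothetical name, UI schemata are ignored, and using the position as a string for untitled schemata is an assumption about what the numeric fallback looks like.

```python
def merge_schemas_sketch(schemas):
    """Combine schemas into one object schema with one property per input."""
    if isinstance(schemas, dict):
        named = dict(schemas)  # keys become property names
    else:
        # Use the title if present, else the position as a string (assumption).
        named = {s.get("title", str(i)): s for i, s in enumerate(schemas)}
    return {
        "type": "object",
        "properties": named,
        "required": list(named),
        "additionalProperties": False,
    }


from_list = merge_schemas_sketch(
    [{"title": "A", "type": "string"}, {"type": "integer"}]
)
from_dict = merge_schemas_sketch({"first": {"title": "A", "type": "string"}})
```

In the dict case the property is called first, while the title of the contained schema stays "A".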
- caosadvancedtools.json_schema_exporter.recordtype_to_json_schema(rt: RecordType, additional_properties: bool = True, name_property_for_new_records: bool = False, description_property_for_new_records: bool = False, additional_options_for_text_props: dict | None = None, additional_json_schema: Dict[str, dict] = None, additional_ui_schema: Dict[str, dict] = None, units_in_description: bool = True, do_not_create: List[str] = None, do_not_retrieve: List[str] = None, no_remote: bool = False, use_rt_pool: DataModel = None, multiple_choice: List[str] = None, rjsf: bool = False, wrap_files_in_objects: bool = False) dict | Tuple[dict, dict]
Create a jsonschema from a given RecordType that can be used, e.g., to validate a json specifying a record of the given type.
This is a standalone function which works without manually creating a JsonSchemaExporter object.
- Parameters:
rt (RecordType) – The RecordType from which a json schema will be created.
additional_properties (bool, optional) – Whether additional properties will be admitted in the resulting schema. Optional, default is True.
name_property_for_new_records (bool, optional) – Whether objects shall generally have a name property in the generated schema. Optional, default is False.
description_property_for_new_records (bool, optional) – Whether objects shall generally have a description property in the generated schema. Optional, default is False.
additional_options_for_text_props (dict, optional) – Dictionary containing additional “pattern” or “format” options for string-typed properties. Optional, default is empty.
additional_json_schema (dict[str, dict], optional) – Additional schema content for elements of the given names.
additional_ui_schema (dict[str, dict], optional) – Additional ui schema content for elements of the given names.
units_in_description (bool, optional) – Whether to add the unit of a LinkAhead property (if it has any) to the description of the corresponding schema entry. If set to false, an additional unit key is added to the schema itself which is purely annotational and ignored, e.g., in validation. Default is True.
do_not_create (list[str], optional) – A list of reference Property names, for which there should be no option to create them. Instead, only the choice of existing elements should be given.
do_not_retrieve (list[str], optional) – A list of RecordType names, for which no Records shall be retrieved. Instead, only an object description should be given. If this list overlaps with the do_not_create parameter, the behavior is undefined.
no_remote (bool, optional) – If True, do not attempt to connect to a LinkAhead server at all. Default is False.
use_rt_pool (models.data_model.DataModel, optional) – If given, do not attempt to retrieve RecordType information remotely but from this parameter instead.
multiple_choice (list[str], optional) – A list of reference Property names which shall be denoted as multiple choice properties. This means that each option in this property may be selected at most once. This is not implemented yet if the Property is not in do_not_create as well.
rjsf (bool, optional) – If True, uiSchema definitions for react-jsonschema-forms will be output as the second return value. Default is False.
wrap_files_in_objects (bool, optional) – Whether (lists of) files should be wrapped into an array of objects that have a file property. The sole purpose of this wrapping is to provide a workaround for a react-jsonschema-form bug, so only set this to True if you’re using the exported schema with react-jsonschema-form and you are experiencing the bug. Default is False.
- Returns:
schema (dict) – A dict containing the json schema created from the given RecordType’s properties.
ui_schema (dict, optional) – A ui schema. Only if a parameter asks for it (e.g. rjsf).
caosadvancedtools.loadFiles module
Utilities to make the LinkAhead server aware of files.
Installation of caosadvancedtools also creates an executable script linkahead-loadfiles which calls the loadpath function. Get the full help with linkahead-loadfiles --help. In short, that script tells the LinkAhead server to create FILE entities for existing files in one branch of the directory tree. It is necessary that this directory is already visible for the server (for example because it is defined as extroot in the LinkAhead profile).
- caosadvancedtools.loadFiles.combine_ignore_files(caosdbignore: str, localignore: str, dirname=None) str
Append the contents of localignore to caosdbignore, save the result, and return the name.
- Parameters:
caosdbignore (str) – Path to parent level caosdbignore file
localignore (str) – Path to current working directory’s local caosdbignore.
dirname (str, optional) – The path of the directory to which the temporary combined file is written. If None is given, NamedTemporaryFile’s default is used. Default is None.
- Returns:
name – Name of the temporary combined caosdbignore file.
- Return type:
str
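The append-and-save behaviour described for combine_ignore_files can be pictured with the standard library alone. The following is a minimal sketch, not the library's implementation; the function name combine_ignore_files_sketch and the newline inserted between the two parts are assumptions made for illustration.

```python
import tempfile


def combine_ignore_files_sketch(parent_ignore: str, local_ignore: str, dirname=None) -> str:
    """Concatenate two ignore files into a new temporary file and return its name."""
    with open(parent_ignore, encoding="utf-8") as base:
        combined = base.read()
    with open(local_ignore, encoding="utf-8") as local:
        # Assumption: a newline separates the two parts so rules cannot merge.
        combined += "\n" + local.read()
    # delete=False keeps the file on disk after the handle is closed;
    # dir=None lets NamedTemporaryFile pick its default directory.
    with tempfile.NamedTemporaryFile(mode="w", delete=False, dir=dirname,
                                     encoding="utf-8") as tmp:
        tmp.write(combined)
    return tmp.name
```

The caller is responsible for removing the temporary file once it is no longer needed.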
- caosadvancedtools.loadFiles.compile_file_list(caosdbignore: str, localpath: str) list[str]
Create a list that contains all files under localpath except those excluded by caosdbignore.
- caosadvancedtools.loadFiles.convert_size(size: int)
Convert size from bytes to a human-readable file size in KB, MB, …
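As an illustration of such a conversion, here is a minimal sketch. It assumes 1024-based units and two decimal places; the exact unit base and output format used by convert_size are not stated here, and the name convert_size_sketch is hypothetical.

```python
def convert_size_sketch(size: int) -> str:
    """Format a byte count with 1024-based units (an assumption of this sketch)."""
    units = ("B", "KB", "MB", "GB", "TB", "PB")
    value = float(size)
    index = 0
    # Divide until the value drops below 1024 or the largest unit is reached.
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    return f"{value:.2f} {units[index]}"


print(convert_size_sketch(1536))
```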
- caosadvancedtools.loadFiles.create_re_for_file_list(files: list[str], localroot: str, remoteroot: str) str
Create a regular expression that matches file paths contained in the files argument and all parent directories. The prefix localroot is replaced by the prefix remoteroot.
- Parameters:
files (list[str]) – List of file paths to be converted to a regular expression.
localroot (str) – Prefix (of the local directory root) to be removed from the paths in files.
remoteroot (str) – Prefix (of the LinkAhead filesystem’s directory root) to be prepended to the file paths after the removal of the localroot prefix.
- Returns:
regexp – Regular expression that matches all file paths from files adapted for the remote directory root.
- Return type:
str
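To make the prefix replacement and the handling of parent directories concrete, the following sketch builds such an expression with re and pathlib. It is an illustration under assumptions, not the library's code: the name build_path_regexp is hypothetical, POSIX-style paths are assumed, and the expression is anchored with ^ and $.

```python
import re
from pathlib import PurePosixPath


def build_path_regexp(files: list[str], localroot: str, remoteroot: str) -> str:
    """Return a regular expression matching the remapped files and their parent directories."""
    root = PurePosixPath(remoteroot)
    remote_paths = set()
    for path in files:
        # relative_to raises ValueError if the path is not below localroot.
        relative = PurePosixPath(path).relative_to(localroot)
        # The file itself plus every directory between it and the remote root.
        for candidate in (relative, *relative.parents):
            remote_paths.add(str(root / candidate))
    alternatives = "|".join(re.escape(p) for p in sorted(remote_paths))
    return f"^({alternatives})$"


pattern = build_path_regexp(["/local/data/a/x.txt"], "/local/data", "/remote")
print(bool(re.match(pattern, "/remote/a/x.txt")))
```

Escaping each alternative with re.escape keeps characters such as the dot in file extensions from being read as regular expression syntax.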
- caosadvancedtools.loadFiles.loadpath(path: str, include: str | None, exclude: str | None, prefix: str, dryrun: bool, forceAllowSymlinks: bool, caosdbignore: str | None = None, localpath: str | None = None)
Make all files in path available to the LinkAhead server as FILE entities.
Notes
Run linkahead-loadfiles --help for more information and examples.
- Parameters:
path (str) – Path to the directory the files of which are to be made available as seen by the LinkAhead server (i.e., the path from within the Docker container in a typical LinkAhead Control setup).
include (str or None) – Regular expression matching the files that will be included. If None, all files are matched. This is ignored if a caosdbignore is provided.
exclude (str or None) – Regular expression matching files that are to be excluded.
prefix (str) – The prefix under which the files are to be inserted into LinkAhead’s file system.
dryrun (bool) – Whether a dryrun should be performed.
forceAllowSymlinks (bool) – Whether symlinks in the path to be inserted should be processed.
caosdbignore (str, optional) – Path to a caosdbignore file that defines which files shall be included and which shall not. The syntax is the same as in a gitignore file. You must also provide the localpath option since the check is done locally. If this is given, any include is ignored.
localpath (str, optional) – Path of path on the local machine. Only needed in combination with a caosdbignore file since that is processed locally.
caosadvancedtools.pandoc_header_tools module
- exception caosadvancedtools.pandoc_header_tools.MetadataFileMissing(filename, *args, **kwargs)
Bases:
Exception
- exception caosadvancedtools.pandoc_header_tools.NoValidHeader(filename, *args, **kwargs)
Bases:
Exception
- exception caosadvancedtools.pandoc_header_tools.ParseErrorsInHeader(filename, reason, *args, **kwargs)
Bases:
Exception
- caosadvancedtools.pandoc_header_tools.add_header(filename, header_dict=None)
Add a header to an md file.
If the file does not exist it will be created.
If header_dict is a dictionary and not None the header will be created based on the keys and values of that dictionary.
- caosadvancedtools.pandoc_header_tools.clean_header(header)
- caosadvancedtools.pandoc_header_tools.get_header(filename, add_header_to_file=False)
Open an md file identified by filename and read out the yaml header.
filename can also be a folder. In this case folder/README.md will be used for getting the header.
If a header is found a tuple is returned: (first yaml header line index, last+1 yaml header line index, header)
Otherwise, if add_header_to_file is True, a header is added and the function is called again.
The header is normalized in the following way:
If the value to a key is a string, a list with that string as only element is returned.
From https://pandoc.org/MANUAL.html:
A YAML metadata block is a valid YAML object, delimited by a line of three hyphens (---) at the top and a line of three hyphens (---) or three dots (...) at the bottom. A YAML metadata block may occur anywhere in the document, but if it is not at the beginning, it must be preceded by a blank line.
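The delimiter rules quoted above and the normalization of string values can be sketched without a YAML parser. This is an illustration only: the helper names find_yaml_header and normalize_header are hypothetical, and it is an assumption of this sketch that the returned indices include the delimiter lines.

```python
from typing import Optional


def find_yaml_header(lines: list[str]) -> Optional[tuple[int, int]]:
    """Return (first header line index, last + 1 header line index), or None."""
    for start, line in enumerate(lines):
        if line.rstrip() != "---":
            continue
        # A block that is not at the top must be preceded by a blank line.
        if start > 0 and lines[start - 1].strip():
            continue
        for end in range(start + 1, len(lines)):
            if lines[end].rstrip() in ("---", "..."):
                return start, end + 1
        return None  # opening delimiter without a closing one
    return None


def normalize_header(header: dict) -> dict:
    """Wrap plain string values in a one-element list; leave other values as they are."""
    return {key: [value] if isinstance(value, str) else value
            for key, value in header.items()}


lines = ["---", "title: Test", "responsible: Jane", "...", "", "# Heading"]
print(find_yaml_header(lines))
print(normalize_header({"title": "Test", "tags": ["a", "b"]}))
```

After normalization, code that consumes the header can always iterate over the values without checking their type first.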
- caosadvancedtools.pandoc_header_tools.kw_present(header, kw)
Check whether keywords are present in the header.
- caosadvancedtools.pandoc_header_tools.save_header(filename, header_data)
Save a header identified by the tuple header_data to the file identified by filename.
filename can also be a folder. In this case folder/README.md will be used for getting the header.
caosadvancedtools.read_md_header module
- caosadvancedtools.read_md_header.get_header(fn)
caosadvancedtools.structure_mapping module
- class caosadvancedtools.structure_mapping.EntityMapping
Bases:
object
Map local entities to entities on the server.
The dict to_existing maps the _cuid property to entity objects; the dict to_target maps the id property to entity objects.
- add(target, existing)
- caosadvancedtools.structure_mapping.collect_existing_structure(target_structure, existing_root, em)
recursively collects existing entities
The collected entities are those that correspond to the ones in target_structure.
em: EntityMapping
- caosadvancedtools.structure_mapping.update_matched_entity(em, updating, target_record, existing_record)
update the Record existing in the server according to the Record supplied as target_record
- caosadvancedtools.structure_mapping.update_structure(em, updating: Container, target_structure: Record)
compare the existing records with the target record tree created from the h5 object
- Parameters:
existing_structure – retrieved entity; e.g. the top level identifiable
target_structure (db.Record) – A record which may have references to other records. Must be a DAG.
caosadvancedtools.suppressKnown module
- class caosadvancedtools.suppressKnown.SuppressKnown(db_file=None)
Bases:
Filter
This filter suppresses log messages that were shown before.
The python logging module can be used as normal. This Filter needs to be added to the appropriate Logger and logging calls (e.g. to warning, info etc.) need to have an additional extra argument. This argument should be a dict that contains an identifier and a category.
Example:
extra={"identifier": "<Record>something</Record>", "category": "entities"}
The identifier is used to check whether a message was shown before and should be a string. The category can be used to remove a specific group of messages from memory, so that the logger shows those messages again even though they were shown before.
- create_cache()
- filter(record)
Return whether the record shall be logged.
If either identifier or category is missing, 1 is returned (logging enabled). If the record has both attributes, it is checked whether the combination was shown before (was_tagged). If so, 0 is returned. Otherwise the combination is saved and 1 is returned.
- hash(txt, identifier)
- reset(category)
- tag_msg(txt, identifier, category)
- was_tagged(digest)
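The filtering logic described for this class can be illustrated with a small in-memory filter. This is a sketch under assumptions, not the SuppressKnown implementation: the class name SuppressRepeated is hypothetical, a dict replaces the database file cache, and SHA-256 over message plus identifier is an arbitrary choice of digest.

```python
import hashlib
import logging


class SuppressRepeated(logging.Filter):
    """In-memory sketch: drop records whose message and identifier were seen before."""

    def __init__(self):
        super().__init__()
        self._seen = {}  # digest -> category

    def filter(self, record):
        identifier = getattr(record, "identifier", None)
        category = getattr(record, "category", None)
        if identifier is None or category is None:
            return True  # nothing to compare against, so always log
        text = record.getMessage() + str(identifier)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in self._seen:
            return False  # shown before: suppress
        self._seen[digest] = category
        return True

    def reset(self, category):
        """Forget all messages of one category so that they are shown again."""
        self._seen = {d: c for d, c in self._seen.items() if c != category}


logger = logging.getLogger("suppress_sketch")
logger.addFilter(SuppressRepeated())
extra = {"identifier": "<Record>something</Record>", "category": "entities"}
logger.warning("Could not update entity", extra=extra)  # emitted
logger.warning("Could not update entity", extra=extra)  # suppressed
```

Keys passed via extra become attributes of the log record, which is why the filter reads them with getattr.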
caosadvancedtools.table_converter module
- caosadvancedtools.table_converter.from_table(spreadsheet, recordtype)
parses a pandas DataFrame to a list of records
- caosadvancedtools.table_converter.from_tsv(filename, recordtype)
parses a tsv file to a list of records
- caosadvancedtools.table_converter.generate_property_name(prop)
- caosadvancedtools.table_converter.main()
- caosadvancedtools.table_converter.to_table(container)
Create a table from the records in a container.
- caosadvancedtools.table_converter.to_tsv(filename, container)
caosadvancedtools.table_export module
Collect optional and mandatory data from LinkAhead records and prepare them for an export as a table, e.g., for the export to metadata repositories.
- class caosadvancedtools.table_export.BaseTableExporter(export_dict, record=None, raise_error_if_missing=False)
Bases:
object
Base exporter class from which all actual implementations inherit. It contains the basic structure with a dictionary for optional and mandatory keys, and the error handling. The actual logic for finding the values to the entries has to be implemented elsewhere. The final results are stored in the info dict.
- collect_information()
Use the items of export_dict to collect the information for the export.
- prepare_csv_export(delimiter=',', print_header=False, skip_empty_optionals=False)
Return the values in self.info as a single-line string, separated by the delimiter. If print_header is True, a header line with the names of the entries, separated by the same delimiter, is added. Header and body are separated by a newline character.
- Parameters:
delimiter (string, optional) – symbol that separates two consecutive entries, e.g. ‘,’ for .csv or ‘\t’ for .tsv. Default is ‘,’.
print_header (bool, optional) – specify whether a header line with all entry names separated by the delimiter precedes the body. Default is False.
skip_empty_optionals (bool, optional) – if this is True, optional entries without value will be skipped in the output string. Otherwise an empty field will be attached. Default is False.
- Raises:
TableExportError – if mandatory entries are missing a value
- Returns:
a single string, either only the body line, or header and body separated by a newline character if print_header is True.
- Return type:
string
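The interplay of delimiter, print_header and skip_empty_optionals can be pictured with a standalone function. This is a sketch under assumptions, not the exporter's code: join_export_line and its optional_keys parameter are hypothetical, the values are taken from a plain dict, and ValueError stands in for TableExportError.

```python
def join_export_line(info: dict, optional_keys=(), delimiter=",",
                     print_header=False, skip_empty_optionals=False) -> str:
    """Join the values of info into one line, optionally preceded by a header line."""
    names, values = [], []
    for name, value in info.items():
        empty = value is None or value == ""
        if empty and name not in optional_keys:
            # The documented exporter raises TableExportError in this case.
            raise ValueError(f"Mandatory entry {name!r} has no value")
        if empty and skip_empty_optionals:
            continue  # drop the column entirely
        names.append(name)
        values.append("" if empty else str(value))
    body = delimiter.join(values)
    if print_header:
        return delimiter.join(names) + "\n" + body
    return body


info = {"title": "Run 1", "doi": None, "year": 2020}
print(join_export_line(info, optional_keys=("doi",), print_header=True))
```

Skipping an empty optional entry removes it from both the header and the body, so the two lines always have the same number of fields.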
- exception caosadvancedtools.table_export.TableExportError(msg)
Bases:
LinkAheadException
Error that is raised in case of failing export, e.g., because of missing mandatory entries.
caosadvancedtools.table_importer module
This module reads table files such as tsv and xls. They are converted to a Pandas DataFrame, which is then checked for compliance with the rules provided. For example, a list of column names that have to exist can be provided.
This module also implements some converters that can be applied to cell entries.
Those converters can also be used to apply checks on the entries.
- class caosadvancedtools.table_importer.CSVImporter(converters, obligatory_columns=None, unique_keys=None, datatypes=None, existing_columns=None, convert_int_to_nullable_int=True)
Bases:
TableImporter
- read_file(filename, sep=',', **kwargs)
- class caosadvancedtools.table_importer.TSVImporter(converters, obligatory_columns=None, unique_keys=None, datatypes=None, existing_columns=None, convert_int_to_nullable_int=True)
Bases:
CSVImporter
- read_file(filename, **kwargs)
- class caosadvancedtools.table_importer.TableImporter(converters, obligatory_columns=None, unique_keys=None, datatypes=None, existing_columns=None, convert_int_to_nullable_int=True)
Bases:
object
Abstract base class for importing data from tables.
- check_columns(df, filename=None)
Check whether all required columns exist.
Required columns are columns for which converters are defined.
- Raises:
- check_dataframe(df, filename=None, strict=False)
Check if the dataframe conforms to the restrictions.
Checked restrictions are: Columns, data types, uniqueness requirements.
- Parameters:
df (pandas.DataFrame) – The dataframe to be checked.
filename (string, optional) – The file name, only used for output in case of problems.
strict (boolean, optional) – If False (the default), try to convert columns, otherwise raise an error.
- check_datatype(df, filename=None, strict=False)
Check for each column whether non-null fields have the correct datatype.
Note
If columns are integer, but should be float, this method converts the respective columns in place. The same for columns that should have string value but have numeric value.
- Parameters:
strict (boolean, optional) – If False (the default), try to convert columns, otherwise raise an error.
- check_missing(df, filename=None)
Check in each row whether obligatory fields are empty or null.
Rows that have missing values are removed.
- Returns:
out – The input DataFrame with incomplete rows removed.
- Return type:
pandas.DataFrame
- check_unique(df, filename=None)
Check whether value combinations that shall be unique for each row are unique.
If a second row is found that uses the same combination of values as a previous one, the second one is removed.
- read_file(filename, **kwargs)
- class caosadvancedtools.table_importer.XLSImporter(converters, obligatory_columns=None, unique_keys=None, datatypes=None, existing_columns=None, convert_int_to_nullable_int=True)
Bases:
TableImporter
- read_file(filename, **kwargs)
- read_xls(filename, **kwargs)
Convert an xls file into a Pandas DataFrame.
The converters of the XLSImporter object are used.
Raises: DataInconsistencyError
- caosadvancedtools.table_importer.assure_name_format(name)
checks whether a string can be interpreted as ‘LastName, FirstName’
- caosadvancedtools.table_importer.check_reference_field(ent_id, recordtype)
- caosadvancedtools.table_importer.date_converter(val, fmt='%Y-%m-%d')
If the value is already a datetime, it is returned; otherwise it is converted using the format string.
- caosadvancedtools.table_importer.datetime_converter(val, fmt='%Y-%m-%d %H:%M:%S')
If the value is already a datetime, it is returned; otherwise it is converted using the format string.
- caosadvancedtools.table_importer.incomplete_date_converter(val, fmts=None)
If the value is already a datetime, it is returned; otherwise it is converted using the format string.
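The pass-through-or-parse pattern shared by these converters can be sketched with datetime.strptime. This is a minimal illustration, not the library's code; the name to_datetime is hypothetical.

```python
from datetime import datetime


def to_datetime(val, fmt="%Y-%m-%d %H:%M:%S"):
    """Return val unchanged if it is already a datetime, else parse it with fmt."""
    if isinstance(val, datetime):
        return val
    # strptime raises ValueError if the string does not match the format.
    return datetime.strptime(val, fmt)


print(to_datetime("2024-01-31 12:00:00"))
print(to_datetime("2024-01-31", fmt="%Y-%m-%d"))
```

Because strptime raises ValueError on a mismatch, such a converter doubles as a check on the cell entry.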
- caosadvancedtools.table_importer.string_in_list(val, options, ignore_case=True)
Return the given value if it is contained in options, raise an error otherwise.
- Parameters:
- Returns:
val – The original value if it is contained in options
- Return type:
- Raises:
ValueError – If val is not contained in options.
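A membership check of this kind might look as follows. This is a sketch, not the library's implementation; the name string_in_list_sketch and the wording of the error message are assumptions.

```python
def string_in_list_sketch(val, options, ignore_case=True):
    """Return val if it is one of options, raise ValueError otherwise."""
    if ignore_case:
        found = str(val).lower() in [str(option).lower() for option in options]
    else:
        found = val in options
    if not found:
        raise ValueError(f"Field value is {val!r}, but it should be one of {list(options)}.")
    return val  # the original value, not a lower-cased copy


print(string_in_list_sketch("Mouse", ["mouse", "rat"]))
```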
- caosadvancedtools.table_importer.win_path_converter(val)
checks whether the value looks like a windows path and converts it to posix
- caosadvancedtools.table_importer.win_path_list_converter(val)
checks whether the value looks like a list of windows paths and converts it to posix paths
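For a single path, the conversion from Windows to POSIX notation can be sketched with pathlib. This is an illustration under assumptions, not the library's code: the name win_path_to_posix is hypothetical, and the plausibility check (forward slashes without any backslash suggest a non-Windows path) is a simple heuristic chosen for this sketch.

```python
from pathlib import PureWindowsPath


def win_path_to_posix(val: str) -> str:
    """Convert a Windows-style path string to POSIX notation."""
    # Heuristic: forward slashes but no backslash suggest this is not a Windows path.
    if "/" in val and "\\" not in val:
        raise ValueError(f"{val!r} does not look like a Windows path")
    return PureWindowsPath(val).as_posix()


print(win_path_to_posix(r"data\run1\result.csv"))
```

PureWindowsPath works on any operating system because it only manipulates the path string and never touches the file system.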
- caosadvancedtools.table_importer.yes_no_converter(val)
converts a string to True or False if possible.
Allowed field values are yes and no.
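A converter of this kind might be sketched as follows. Whether the library ignores case and surrounding whitespace is not stated here, so both are assumptions of this sketch, as is the name yes_no_to_bool.

```python
def yes_no_to_bool(val: str) -> bool:
    """Map 'yes' to True and 'no' to False; reject everything else."""
    # Assumption: case and surrounding whitespace are ignored.
    normalized = str(val).strip().lower()
    if normalized == "yes":
        return True
    if normalized == "no":
        return False
    raise ValueError(f"Field value is {val!r}, but it should be 'yes' or 'no'.")


print(yes_no_to_bool("Yes"))
```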
caosadvancedtools.utils module
- caosadvancedtools.utils.check_win_path(path: str, filename: str = None)
check whether ‘/’ are in the path but no ‘\’.
If that is the case, it is likely, that the path is not a Windows path.
- caosadvancedtools.utils.create_entity_link(entity: Entity, base_url: str = '')
creates a string that contains the code for an html link to the provided entity.
The text of the link is the entity name if one exists and the id otherwise.
- caosadvancedtools.utils.find_records_that_reference_ids(referenced_ids, rt='', step_size=50)
Returns a list with ids of records that reference entities with supplied ids
Sometimes a file or folder is referenced in a README.md (e.g. in an Analysis), but it is not those files that shall be referenced but the corresponding object (e.g. the Experiment). Thus the ids of all Records (of a suitable type) that reference one or more of the supplied ids are collected. This is done in chunks as the ids are passed in the header of the http request.
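The chunking controlled by step_size can be illustrated independently of any server query. This is a sketch; the helper name chunked is hypothetical, and how the library builds and sends the actual query for each chunk is not shown.

```python
def chunked(ids, step_size=50):
    """Yield consecutive slices of ids with at most step_size elements each."""
    ids = list(ids)
    for start in range(0, len(ids), step_size):
        yield ids[start:start + step_size]


for chunk in chunked(range(120), step_size=50):
    # One request per chunk keeps each request header short.
    print(len(chunk))
```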
- caosadvancedtools.utils.get_referenced_files(glob: str, prefix: str = None, filename: str = None, location: str = None)
queries the database for files referenced by the provided glob
- Parameters:
glob (str) – the glob referencing the file(s)
prefix (str, optional) – the glob can be relative to some path, in that case that path needs to be given as prefix
filename (str, optional) – the file in which the glob is given (used for error messages)
location (str, optional) – the location in the file in which the glob is given (used for error messages)
- caosadvancedtools.utils.read_field_as_list(field)
E.g. in yaml headers entries can be single values or list. To simplify the work with those values, this function puts single values in a list.
- caosadvancedtools.utils.replace_path_prefix(path, old_prefix, new_prefix)
Replaces the prefix old_prefix in path with new_prefix.
Raises a RuntimeError when the path does not start with old_prefix.
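The described behaviour can be sketched in a few lines. This is an illustration, not the library's code; the name replace_prefix is hypothetical, and plain string concatenation is an assumption about how the new path is assembled.

```python
def replace_prefix(path: str, old_prefix: str, new_prefix: str) -> str:
    """Swap old_prefix at the start of path for new_prefix."""
    if not path.startswith(old_prefix):
        raise RuntimeError(f"Path {path!r} does not start with {old_prefix!r}")
    return new_prefix + path[len(old_prefix):]


print(replace_prefix("/local/data/a.txt", "/local", "/remote"))
```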
- caosadvancedtools.utils.return_field_or_property(value, prop=None)
returns value itself or a property.
Typical in yaml headers is that a field might sometimes contain a single value and other times a dict itself. This function either returns the single value or (in case of dict as value) a value of the dict.
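The two yaml-header helpers, read_field_as_list and return_field_or_property, can be pictured with the following sketch. It is not the library's code; the names as_list and field_or_property are hypothetical, and falling back to the value itself when the key is missing is an assumption.

```python
def as_list(field):
    """Return field unchanged if it is a list, else wrap it in a list."""
    return field if isinstance(field, list) else [field]


def field_or_property(value, prop=None):
    """Return value[prop] if value is a dict containing prop, else value itself."""
    if isinstance(value, dict) and prop in value:
        return value[prop]
    return value


print(as_list("data/run1"))
print(field_or_property({"file": "a.txt", "description": "raw data"}, prop="file"))
print(field_or_property("a.txt", prop="file"))
```

Combining both helpers lets calling code treat every header entry as a list of simple values, regardless of how it was written in the yaml header.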
- caosadvancedtools.utils.set_log_level(level=10)
- caosadvancedtools.utils.string_to_person(person)
Creates a Person Record from a string.
The following formats are supported:
- <Firstname>
- <Lastname(s)>,
The part after the name can be used for an affiliation for example.
caosadvancedtools.webui_formatter module
- class caosadvancedtools.webui_formatter.WebUI_Formatter(*args, full_file=None, **kwargs)
Bases:
Formatter
Formats log records so that they are displayed nicely in the WebUI.
You can enable this as follows:
logger = logging.getLogger("<LoggerName>")
formatter = WebUI_Formatter(full_file="path/to/file")
handler = logging.Handler()
handler.setFormatter(formatter)
logger.addHandler(handler)
- format(record)
Return the HTML formatted log record for display on a website.
This essentially wraps the text formatted by the parent class in html.
- Parameters:
record
- Raises:
RuntimeError – If the log level of the record is not supported. Supported log levels include logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, and logging.CRITICAL.
- Returns:
The formatted log record.
- Return type: