caosadvancedtools package
Subpackages
- caosadvancedtools.bloxberg package
- caosadvancedtools.cfoods package
- caosadvancedtools.converter package
- caosadvancedtools.models package
- caosadvancedtools.scifolder package
- Submodules
- caosadvancedtools.scifolder.analysis_cfood module
- caosadvancedtools.scifolder.experiment_cfood module
- caosadvancedtools.scifolder.generic_pattern module
- caosadvancedtools.scifolder.publication_cfood module
- caosadvancedtools.scifolder.result_table_cfood module
- caosadvancedtools.scifolder.simulation_cfood module
- caosadvancedtools.scifolder.software_cfood module
- caosadvancedtools.scifolder.utils module
- caosadvancedtools.scifolder.withreadme module
- Module contents
- caosadvancedtools.serverside package
Submodules
caosadvancedtools.cache module
-
class
caosadvancedtools.cache.
AbstractCache
(db_file=None, force_creation=False) Bases:
abc.ABC
-
check_cache
() Check whether the cache in db file self.db_file exists and conforms to the latest database schema.
If it does not exist, it will be created using the newest database schema.
If it exists, but the schema is outdated, an exception will be raised.
-
abstract
create_cache
() Provide an overloaded function here that creates the cache in the most recent version.
-
abstract
get_cache_schema_version
() A method that has to be overloaded that sets the version of the SQLITE database schema. The schema is saved in table version column schema.
Increase this variable, when changes to the cache tables are made.
-
get_cache_version
() Return the version of the cache stored in self.db_file. The version is stored as the only entry in column schema of table version.
-
abstract
get_default_file_name
() Supply a default file name for the cache here.
-
run_sql_commands
(commands, fetchall=False) Run a list of SQL commands on self.db_file.
commands: list of SQL commands (tuples) to execute
fetchall: When True, run fetchall as last command and return the results. Otherwise nothing is returned.
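The cache life cycle described for AbstractCache (create if missing, raise if the schema is outdated) can be illustrated with the standard sqlite3 module. This is only a minimal sketch and not the package's implementation: the module-level functions, the SCHEMA_VERSION value and the (sql, params) tuple layout are assumptions made for the example.

```python
import os
import sqlite3
import tempfile

SCHEMA_VERSION = 2  # hypothetical value, chosen for this example


def run_sql_commands(db_file, commands, fetchall=False):
    """Run a list of (sql, params) tuples; optionally return the last result."""
    conn = sqlite3.connect(db_file)
    try:
        cursor = conn.cursor()
        for sql, params in commands:
            cursor.execute(sql, params)
        result = cursor.fetchall() if fetchall else None
        conn.commit()
        return result
    finally:
        conn.close()


def create_cache(db_file):
    """Create the version table and store the current schema version."""
    run_sql_commands(db_file, [
        ("CREATE TABLE version (schema INTEGER)", ()),
        ("INSERT INTO version VALUES (?)", (SCHEMA_VERSION,)),
    ])


def check_cache(db_file):
    """Create a missing cache; raise if an existing one is outdated."""
    if not os.path.exists(db_file):
        create_cache(db_file)
        return
    rows = run_sql_commands(
        db_file, [("SELECT schema FROM version", ())], fetchall=True)
    if rows[0][0] != SCHEMA_VERSION:
        raise RuntimeError("Cache schema is outdated.")


# Usage: the first call creates the cache, the second finds it up to date.
db_path = os.path.join(tempfile.mkdtemp(), "cache.db")
check_cache(db_path)
check_cache(db_path)
```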
-
-
class
caosadvancedtools.cache.
Cache
(*args, **kwargs)
-
class
caosadvancedtools.cache.
IdentifiableCache
(db_file=None, force_creation=False) Bases:
caosadvancedtools.cache.AbstractCache
Stores identifiables (as a hash of XML) and their respective ID.
This allows retrieving the Record corresponding to an identifiable without querying.
-
check_existing
(ent_hash) Check the cache for a hash.
ent_hash: The hash to search for.
Return the ID and the version ID of the hashed entity. Return None if no entity with that hash is in the cache.
-
create_cache
() Create a new SQLITE cache file in self.db_file.
Two tables will be created:
- identifiables is the actual cache.
- version is a table with version information about the cache.
-
get_cache_schema_version
() A method that has to be overloaded that sets the version of the SQLITE database schema. The schema is saved in table version column schema.
Increase this variable, when changes to the cache tables are made.
-
get_default_file_name
() Supply a default file name for the cache here.
-
static
hash_entity
(ent) Format an entity as “pretty” XML and return the SHA256 hash.
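The idea of hashing a pretty-printed XML representation can be sketched with the standard library alone. The function name hash_pretty_xml and the use of xml.dom.minidom are assumptions made for this illustration; the package may serialize entities differently.

```python
import hashlib
from xml.dom import minidom


def hash_pretty_xml(xml_text):
    """Return the SHA256 hex digest of a pretty-printed XML string."""
    pretty = minidom.parseString(xml_text).toprettyxml(indent="  ")
    return hashlib.sha256(pretty.encode("utf-8")).hexdigest()


# Hypothetical XML for a Record; identical input always gives the same digest.
xml_record = "<Record name='exp1'><Property name='date'>2020-01-01</Property></Record>"
digest = hash_pretty_xml(xml_record)
```

Such a digest can then serve as the lookup key that is stored next to the entity ID and version.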
-
insert
(ent_hash, ent_id, ent_version) Insert a new cache entry.
ent_hash: Hash of the entity. Should be generated with Cache.hash_entity
ent_id: ID of the entity
ent_version: Version string of the entity
-
insert_list
(hashes, entities) Insert the ids of entities into the cache
The hashes must correspond to the entities in the list
-
update_ids_from_cache
(entities) sets ids of those entities that are in cache
A list of hashes corresponding to the entities is returned
-
validate_cache
(entities=None) Runs through all entities stored in the cache and checks whether the version still matches the most recent version. Non-matching entities will be removed from the cache.
- entities: When set to a db.Container or a list of Entities
the IDs from the cache will not be retrieved from the CaosDB database, but the versions from the cache will be checked against the versions contained in that collection. Only entries in the cache that have a corresponding version in the collection will be checked, all others will be ignored. Useful for testing.
Return a list of invalidated entries or an empty list if no elements have been invalidated.
-
-
class
caosadvancedtools.cache.
UpdateCache
(db_file=None, force_creation=False) Bases:
caosadvancedtools.cache.AbstractCache
stores unauthorized inserts and updates
If the Guard is set to a mode that does not allow an insert or update, the insert or update can be stored in this cache such that it can be authorized and performed later.
-
create_cache
() initialize the cache
-
get
(run_id, querystring) returns the pending updates for a given run id
run_id: the id of the crawler run
querystring: the sql query
-
get_cache_schema_version
() A method that has to be overloaded that sets the version of the SQLITE database schema. The schema is saved in table version column schema.
Increase this variable, when changes to the cache tables are made.
-
get_default_file_name
() Supply a default file name for the cache here.
-
get_inserts
(run_id) returns the pending inserts for a given run id
run_id: the id of the crawler run
-
static
get_previous_version
(cont) Retrieve the current, unchanged version of the entities that shall be updated, i.e. the version before the update
-
get_updates
(run_id) returns the pending updates for a given run id
run_id: the id of the crawler run
-
insert
(cont, run_id, insert=False) Insert a pending, unauthorized insert or update
-
-
caosadvancedtools.cache.
cleanXML
(xml)
-
caosadvancedtools.cache.
get_pretty_xml
(cont)
-
caosadvancedtools.cache.
put_in_container
(stuff)
caosadvancedtools.cfood module
Defines how something that shall be inserted into CaosDB is treated.
CaosDB can automatically be filled with Records based on some structure, a file structure, a table or similar. The Crawler will iterate over the respective items and test for each item whether a CFood class exists that matches the file path, i.e. whether a CFood class wants to treat that particular item. If one does, it is instantiated to treat the match. This occurs in basically three steps:
1. Create a list of identifiables, i.e. unique representations of CaosDB Records (such as an experiment belonging to a project and a date/time).
2. The identifiables are either found in CaosDB or they are created.
3. The identifiables are updated based on the data in the file structure.
-
class
caosadvancedtools.cfood.
AbstractCFood
(item) Bases:
object
Abstract base class for Crawler food (CFood).
-
attach
(item)
-
collect_information
() The CFood collects information for further processing.
Often CFoods need information from files or even from the database in order to make processing decisions. It is intended that this function is called after match. Thus match can be used without connecting to the database.
To be overwritten by subclasses
-
abstract
create_identifiables
() should set the instance variable Container with the identifiables
-
looking_for
(item) returns True if item can be added to this CFood.
Typically a CFood exists for a file and defines how to deal with the file. However, sometimes additional files “belong” to a CFood. E.g. an experiment CFood might match against a README file but labnotes.txt also shall be treated by the cfood (and not a special cfood created for labnotes.txt). This function can be used to define which files shall be ‘attached’.
To be overwritten by subclasses
-
classmethod
match_item
(item) Matches an item found by the crawler against this class. Returns True if the item shall be treated by this class, i.e. if this class matches the item.
- Parameters
item (object) – The item iterated by the crawler.
To be overwritten by subclasses!
-
static
remove_property
(entity, prop)
-
static
set_parents
(entity, names)
-
static
set_property
(entity, prop, value, datatype=None)
-
abstract
update_identifiables
() Changes the identifiables as needed and adds changed identifiables to self.to_be_updated
-
-
class
caosadvancedtools.cfood.
AbstractFileCFood
(crawled_path, *args, **kwargs) Bases:
caosadvancedtools.cfood.AbstractCFood
-
property
crawled_file
-
classmethod
get_re
() Returns the regular expression used to identify files that shall be processed
This function shall be implemented by subclasses.
-
looking_for
(crawled_file) returns True if crawled_file can be added to this CFood.
Typically a CFood exists for a file and defines how to deal with the file. However, sometimes additional files “belong” to a CFood. E.g. an experiment CFood might match against a README file but labnotes.txt also shall be treated by the cfood (and not a special cfood created for labnotes.txt). This function can be used to define which files shall be ‘attached’.
-
classmethod
match_item
(path) Matches the regular expression of this class against file names
- Parameters
path (str) – The path of the file that shall be matched.
-
static
re_from_extensions
(extensions) Return a regular expression which matches the given file extensions.
Useful for inheriting classes.
- Parameters
extensions (iterable<str>) – An iterable with the allowed extensions.
- Returns
out – The regular expression, starting with .*\. and ending with the EOL dollar character. The actual extension will be accessible in the pattern group name ext.
- Return type
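A pattern of the shape described for re_from_extensions can be built as follows. This is a sketch under assumptions: the helper name extensions_to_re is hypothetical, and the exact pattern produced by the package may differ in detail.

```python
import re


def extensions_to_re(extensions):
    """Build a pattern matching any path that ends in one of the extensions."""
    alternatives = "|".join(re.escape(ext) for ext in extensions)
    # Starts with ".*\." and ends with "$"; the extension is in group "ext".
    return r".*\.(?P<ext>" + alternatives + r")$"


pattern = extensions_to_re(["tsv", "csv"])
match = re.match(pattern, "data/2020/table.csv")
```

The named group makes the matched extension available as match.group("ext") without counting parentheses.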
-
property
-
class
caosadvancedtools.cfood.
CMeal
Bases:
object
CMeal groups equivalent items and allows their collected insertion.
Sometimes there is no one item that can be used to trigger the creation of some Record. E.g. if a collection of image files shall be referenced from one Record that groups them, it is unclear which image should trigger the creation of the Record.
CMeals are grouped based on the groups in the used regular expression. If, in the above example, all the images reside in one folder, all groups of the match except the one for the file name should be equal. The groups that shall match need to be listed in the matching_groups class property. Subclasses will overwrite this property.
This allows using has_suitable_cfood in the match_item function of a CFood to check whether the necessary CFood was already created. To make this possible, all instances of a CFood class are tracked in the existing_instances class member.
Subclasses must have a cls.get_re function and a match member variable (see AbstractFileCFood)
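The grouping idea behind CMeal can be sketched without the package: items belong together when all groups listed as matching groups are equal. The pattern, the group names and the helper functions below are hypothetical and only illustrate the principle.

```python
import re

# Hypothetical pattern: images are grouped by project and date folder.
PATTERN = re.compile(
    r"(?P<project>[^/]+)/(?P<date>\d{4}-\d{2}-\d{2})/(?P<file>[^/]+\.png)$")
# The file name group is left out on purpose: it differs between images.
MATCHING_GROUPS = ["project", "date"]


def group_key(path):
    """Return the tuple of group values that identifies a meal, or None."""
    match = PATTERN.match(path)
    if match is None:
        return None
    return tuple(match.group(name) for name in MATCHING_GROUPS)


def collect_meals(paths):
    """Group paths whose matching groups are all equal."""
    meals = {}
    for path in paths:
        key = group_key(path)
        if key is not None:
            meals.setdefault(key, []).append(path)
    return meals


meals = collect_meals([
    "projA/2020-01-01/img1.png",
    "projA/2020-01-01/img2.png",
    "projA/2020-02-01/img1.png",
    "README.md",
])
```

Here the two images from the same folder end up in one group, so a single Record could reference both.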
-
classmethod
all_groups_equal
(m1, m2)
-
belongs_to_meal
(item)
-
existing_instances
= []
-
static
get_re
()
-
classmethod
has_suitable_cfood
(item) checks whether the required cfood object already exists.
item : the crawled item
-
matching_groups
= []
-
classmethod
-
class
caosadvancedtools.cfood.
FileGuide
Bases:
object
-
access
(path) should be replaced by a function that adds a prefix to paths to allow accessing CaosDB files locally
This default just returns the unchanged path.
-
-
class
caosadvancedtools.cfood.
RowCFood
(item, unique_cols, recordtype, **kwargs) Bases:
caosadvancedtools.cfood.AbstractCFood
-
create_identifiables
() should set the instance variable Container with the identifiables
-
update_identifiables
() Changes the identifiables as needed and adds changed identifiables to self.to_be_updated
-
-
caosadvancedtools.cfood.
add_files
(filemap) add to the file cache
-
caosadvancedtools.cfood.
assure_has_description
(entity, description, to_be_updated=None, force=False) Checks whether entity has the description that is passed.
If this is the case this function ends. Otherwise the entity is assigned a new description. If the list to_be_updated is supplied, the entity is added to the list to indicate that it should be updated. Otherwise it is updated directly.
-
caosadvancedtools.cfood.
assure_has_parent
(entity, parent, to_be_updated=None, force=False, unique=True) Checks whether entity has a parent with name parent.
If this is the case this function ends. Otherwise the entity is assigned a new parent. If the list to_be_updated is supplied, the entity is added to the list to indicate that it should be updated. Otherwise it is updated directly.
-
caosadvancedtools.cfood.
assure_has_property
(entity, name, value, to_be_updated=None, datatype=None, setproperty=False) Checks whether entity has a property name with the value value.
If this is the case this function ends. Otherwise the entity is assigned a new property.
Note that property matching occurs based on names.
If the list to_be_updated is supplied, the entity is added to the list to indicate that it should be updated. Otherwise it is updated directly.
setproperty: boolean, if True, overwrite existing properties.
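The common pattern of the assure_* functions (check, change if needed, schedule for update) can be illustrated with a plain Python stand-in. FakeEntity and assure_has_property_sketch are hypothetical names, not part of the CaosDB API, and the direct server update that happens when no list is supplied is left out of this sketch.

```python
class FakeEntity:
    """Hypothetical stand-in for a CaosDB entity; not the real API."""

    def __init__(self, name):
        self.name = name
        self.properties = []  # list of (name, value) pairs


def assure_has_property_sketch(entity, name, value, to_be_updated=None,
                               setproperty=False):
    """Add the property if it is missing and schedule the entity for update."""
    if (name, value) in entity.properties:
        return  # the property already has this value, nothing to do
    if setproperty:
        # Overwrite: drop existing properties with the same name first.
        entity.properties = [p for p in entity.properties if p[0] != name]
    entity.properties.append((name, value))
    if to_be_updated is not None and entity not in to_be_updated:
        to_be_updated.append(entity)


pending = []
record = FakeEntity("exp1")
assure_has_property_sketch(record, "date", "2020-01-01", to_be_updated=pending)
# The second call finds the property and leaves everything unchanged.
assure_has_property_sketch(record, "date", "2020-01-01", to_be_updated=pending)
```

Collecting changed entities in one list allows sending all updates in a single batch later.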
-
caosadvancedtools.cfood.
assure_name_is
(entity, name, to_be_updated=None, force=False) Checks whether entity has the name that is passed.
If this is the case this function ends. Otherwise the entity is assigned a new name. If the list to_be_updated is supplied, the entity is added to the list to indicate that it should be updated. Otherwise it is updated directly.
-
caosadvancedtools.cfood.
assure_object_is_in_list
(obj, containing_object, property_name, to_be_updated=None, datatype=None) Checks whether obj is one of the values in the list property property_name of the supplied entity containing_object.
If this is the case this function returns. Otherwise the entity is added to the property property_name and the entity containing_object is added to the supplied list to_be_updated to indicate that containing_object should be updated. If none is submitted, the update will be conducted in-place.
If the property is missing, it is added first and then the entity is added/updated.
If obj is a list, every element is added
-
caosadvancedtools.cfood.
assure_parents_are
(entity, parents, to_be_updated=None, force=False, unique=True) Checks whether entity has the provided parents (and only those).
If this is the case this function ends. Otherwise the entity is assigned the new parents and the old ones are discarded.
Note that parent matching occurs based on names.
If the list to_be_updated is supplied, the entity is added to the list to indicate that it should be updated. Otherwise it is updated directly.
parents: single string or list of strings
-
caosadvancedtools.cfood.
assure_property_is
(entity, name, value, datatype=None, to_be_updated=None, force=False) Checks whether entity has a Property name with the given value.
If this is the case this function ends. Otherwise the entity is assigned a new property or an existing one is updated.
If the list to_be_updated is supplied, the entity is added to the list to indicate that it should be updated. Otherwise it is updated directly.
-
caosadvancedtools.cfood.
assure_special_is
(entity, value, kind, to_be_updated=None, force=False) Checks whether entity has the name or description that is passed.
If this is the case this function ends. Otherwise the entity is assigned a new name or description. If the list to_be_updated is supplied, the entity is added to the list to indicate that it should be updated. Otherwise it is updated directly.
-
caosadvancedtools.cfood.
get_entity
(name) Returns the entity with a given name, preferably from a local cache.
If the local cache does not contain the entity, retrieve it from CaosDB.
-
caosadvancedtools.cfood.
get_entity_for_path
(path)
-
caosadvancedtools.cfood.
get_ids_for_entities_with_names
(entities)
-
caosadvancedtools.cfood.
get_property
(name) Returns the property with a given name, preferably from a local cache.
If the local cache does not contain the property, try to retrieve it from CaosDB. If it does not exist, see whether it could be a record type used as a property.
-
caosadvancedtools.cfood.
get_record
(name) Returns the record with a given name, preferably from a local cache.
If the local cache does not contain the record, try to retrieve it from CaosDB.
-
caosadvancedtools.cfood.
get_recordtype
(name) Returns the record type with a given name, preferably from a local cache.
If the local cache does not contain the record type, try to retrieve it from CaosDB. If it does not exist, add it to the data model problems
-
caosadvancedtools.cfood.
insert_id_based_on_name
(entity)
caosadvancedtools.collect_datamodel module
caosadvancedtools.crawler module
Crawls a file structure and inserts Records into CaosDB based on what is found.
CaosDB can automatically be filled with Records based on some file structure. The Crawler will iterate over the files and test for each file whether a CFood exists that matches the file path. If one does, it is instantiated to treat the match. This occurs in basically three steps:
1. Create a list of identifiables, i.e. unique representations of CaosDB Records (such as an experiment belonging to a project and a date/time).
2. The identifiables are either found in CaosDB or they are created.
3. The identifiables are updated based on the data in the file structure.
-
class
caosadvancedtools.crawler.
Crawler
(cfood_types, use_cache=False, abort_on_exception=True, interactive=True, hideKnown=False, debug_file=None, cache_file=None) Bases:
object
-
check_matches
(matches)
-
collect_cfoods
() This is the first phase of the crawl. It collects all cfoods that shall be processed. The second phase is iterating over cfoods and updating CaosDB. This separate first step is necessary in order to allow a single cfood to be influenced by multiple crawled items. E.g. the FileCrawler can have a single cfood treat multiple files.
This is a very basic implementation and this function should be overwritten by subclasses.
The basic structure of this function should be that whatever is being processed is iterated over and each cfood is checked whether the item ‘matches’. If it does, a cfood is instantiated, passing the item as an argument. The match can depend on the cfoods already created, i.e. a file might no longer match because it is already treated by an earlier cfood.
Should return cfoods, tbs and errors_occured. (TODO: do this via logging?)
tbs: text returned from traceback
errors_occured: True if at least one error occurred
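The collection phase described for collect_cfoods can be sketched as a plain loop over items and cfood classes. TxtCFood and the free function below are hypothetical and only mirror the described return values (cfoods, tbs, errors_occured); the real crawler does more, e.g. attaching items to existing cfoods.

```python
import re
import traceback


class TxtCFood:
    """Hypothetical CFood that treats text files."""

    pattern = re.compile(r".*\.txt$")

    def __init__(self, item):
        self.item = item

    @classmethod
    def match_item(cls, item):
        """Return True if this class shall treat the item."""
        return cls.pattern.match(item) is not None


def collect_cfoods(items, cfood_types):
    """Instantiate one cfood for every item that a cfood class matches."""
    cfoods, tbs, errors_occured = [], [], False
    for item in items:
        for cfood_type in cfood_types:
            try:
                if cfood_type.match_item(item):
                    cfoods.append(cfood_type(item))
            except Exception:
                # Keep the traceback text and continue with the next item.
                tbs.append(traceback.format_exc())
                errors_occured = True
    return cfoods, tbs, errors_occured


cfoods, tbs, errors_occured = collect_cfoods(
    ["a/readme.txt", "a/data.csv"], [TxtCFood])
```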
-
crawl
(security_level=0, path=None)
-
static
create_query_for_identifiable
(ident) uses the properties of ident to create a query that can determine whether the required record already exists.
-
static
find_existing
(entity) searches for an entity that matches the identifiable in CaosDB
Characteristics of the identifiable like properties, name or id are used for the match.
-
static
find_or_insert_identifiables
(identifiables) Sets the ids of identifiables (that do not already have an id from the cache) based on searching CaosDB and retrieves those entities. The remaining entities (those which cannot be retrieved) have no correspondence in CaosDB and are thus inserted.
-
iteritems
() generates items to be crawled with an index
-
static
save_form
(changes, path, run_id) Saves an html website to a file that contains a form with a button to authorize the given changes.
The button will call the crawler with the same path that was used for the current run and with a parameter to authorize the changes of the current run.
changes: The CaosDB entities in the version after the update.
path: the path defining the subtree that is crawled
-
static
send_mail
(changes, filename) calls sendmail in order to send a mail to the curator about pending changes
changes: The CaosDB entities in the version after the update.
filename: path to the html site that allows the authorization
execute the pending updates of a specific run id.
This should be called if the updates of a certain run were authorized.
run_id: the id of the crawler run
-
-
class
caosadvancedtools.crawler.
FileCrawler
(files, **kwargs) Bases:
caosadvancedtools.crawler.Crawler
-
iteritems
() generates items to be crawled with an index
-
static
query_files
(path)
-
-
class
caosadvancedtools.crawler.
TableCrawler
(table, unique_cols, recordtype, **kwargs) Bases:
caosadvancedtools.crawler.Crawler
-
iteritems
() generates items to be crawled with an index
-
-
caosadvancedtools.crawler.
apply_list_of_updates
(to_be_updated, update_flags={}, update_cache=None, run_id=None) Updates the to_be_updated Container, i.e., pushes the changes to CaosDB after removing possible duplicates. If a cache is provided, unauthorized updates can be cached for further authorization.
- to_be_updated : db.Container
Container with the entities that will be updated.
- update_flags : dict, optional
Dictionary of CaosDB server flags that will be used for the update. Default is an empty dict.
- update_cache : UpdateCache or None, optional
Cache in which the intended updates will be stored so they can be authorized afterwards. Default is None.
- run_id : String or None, optional
Id with which the pending updates are cached. Only meaningful if update_cache is provided. Default is None.
-
caosadvancedtools.crawler.
get_value
(prop) Returns the value of a Property
- Parameters
prop – The property of which the value shall be returned.
- Returns
out – The value of the property; if the value is an entity, its ID.
-
caosadvancedtools.crawler.
separated
(text)
caosadvancedtools.datainconsistency module
Implements an error to be used when there is a problem with the data to be read. I.e. something that users of CaosDB need to fix.
-
exception
caosadvancedtools.datainconsistency.
DataInconsistencyError
Bases:
ValueError
caosadvancedtools.datamodel_problems module
Implements a class for finding and storing entities, either record types or properties, that are missing in a data model. They can be inserted by hand or guessed from possible exceptions when inserting or updating entities with missing parents and/or properties.
-
class
caosadvancedtools.datamodel_problems.
DataModelProblems
Bases:
object
Collect and store missing RecordTypes and Properties.
-
static
add
(ent) Add a missing record type or property.
-
static
evaluate_exception
(e) Take a TransactionError, see whether it was caused by datamodel problems, and update missing parents and/or properties if this was the case. Afterwards, raise the exception.
- Parameters
e (TransactionError) – TransactionError, the children of which are checked for possible datamodel problems.
-
missing
= {}
-
static
caosadvancedtools.example_cfood module
-
class
caosadvancedtools.example_cfood.
ExampleCFood
(crawled_path, *args, **kwargs) Bases:
caosadvancedtools.cfood.AbstractFileCFood
-
create_identifiables
() should set the instance variable Container with the identifiables
-
classmethod
get_re
() Returns the regular expression used to identify files that shall be processed
This function shall be implemented by subclasses.
-
update_identifiables
() Changes the identifiables as needed and adds changed identifiables to self.to_be_updated
-
caosadvancedtools.guard module
caosadvancedtools.import_from_xml module
This file allows importing a dataset stored in an XML representation and corresponding files.
The export should have been done with export_related.py
-
caosadvancedtools.import_from_xml.
create_dummy_file
(text='Please ask the administrator for this file.')
-
caosadvancedtools.import_from_xml.
defineParser
()
-
caosadvancedtools.import_from_xml.
import_xml
(filename, rerun=False, interactive=True)
filename: path to the xml file with the data to be inserted
rerun: boolean; if true, files are not inserted as paths would conflict.
caosadvancedtools.loadFiles module
-
caosadvancedtools.loadFiles.
combine_ignore_files
(caosdbignore, localignore, dirname=None) appends the contents of localignore to caosdbignore, saves the result and returns the name of the saved file
-
caosadvancedtools.loadFiles.
compile_file_list
(caosdbignore, localpath) creates a list that contains all files under localpath except those excluded by caosdbignore
-
caosadvancedtools.loadFiles.
convert_size
(size)
-
caosadvancedtools.loadFiles.
create_re_for_file_list
(files, localroot, remoteroot) creates a regular expression that matches file paths contained in the files argument and all parent directories. The prefix localroot is replaced by the prefix remoteroot.
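One way to build such an expression is sketched below. The helper name re_for_file_list is hypothetical, POSIX-style absolute paths are assumed, and the sketch assumes that parent directories are collected up to and including the remote root; the package may handle these details differently.

```python
import posixpath
import re


def re_for_file_list(files, localroot, remoteroot):
    """Return a regex matching the remote paths of files and their parents."""
    remoteroot = posixpath.normpath(remoteroot)
    remote_paths = set()
    for path in files:
        # Swap the local prefix for the remote one.
        remote = posixpath.join(remoteroot, posixpath.relpath(path, localroot))
        # Add the file itself and every directory up to the remote root.
        while True:
            remote_paths.add(remote)
            parent = posixpath.dirname(remote)
            if remote == remoteroot or parent == remote:
                break
            remote = parent
    alternatives = "|".join(re.escape(p) for p in sorted(remote_paths))
    return "^(" + alternatives + ")$"


pattern = re_for_file_list(["/local/data/a/b.txt"], "/local/data", "/remote")
```

Including the parent directories lets a tree walk descend into a directory before deciding about the files inside it.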
-
caosadvancedtools.loadFiles.
loadpath
(path, include, exclude, prefix, dryrun, forceAllowSymlinks, caosdbignore=None, localpath=None)
-
caosadvancedtools.loadFiles.
main
(argv=None) Command line options.
caosadvancedtools.pandoc_header_tools module
-
exception
caosadvancedtools.pandoc_header_tools.
MetadataFileMissing
(filename, *args, **kwargs) Bases:
Exception
-
exception
caosadvancedtools.pandoc_header_tools.
NoValidHeader
(filename, *args, **kwargs) Bases:
Exception
-
exception
caosadvancedtools.pandoc_header_tools.
ParseErrorsInHeader
(filename, reason, *args, **kwargs) Bases:
Exception
-
caosadvancedtools.pandoc_header_tools.
add_header
(filename, header_dict=None) Add a header to an md file.
If the file does not exist it will be created.
If header_dict is a dictionary and not None the header will be created based on the keys and values of that dictionary.
-
caosadvancedtools.pandoc_header_tools.
clean_header
(header)
-
caosadvancedtools.pandoc_header_tools.
get_header
(filename, add_header=False) Open an md file identified by filename and read out the yaml header.
filename can also be a folder. In this case folder/README.md will be used for getting the header.
If a header is found a tuple is returned: (first yaml header line index, last+1 yaml header line index, header)
Otherwise, if add_header is True, a header is added and the function is called again.
The header is normalized in the following way:
If the value to a key is a string, a list with that string as only element is returned.
From https://pandoc.org/MANUAL.html:
A YAML metadata block is a valid YAML object, delimited by a line of three hyphens (---) at the top and a line of three hyphens (---) or three dots (...) at the bottom. A YAML metadata block may occur anywhere in the document, but if it is not at the beginning, it must be preceded by a blank line.
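Locating the header block by its delimiters and normalizing string values can be sketched with plain Python. The helper names find_yaml_header and normalize_header are hypothetical, and parsing the YAML content itself is deliberately left out of this sketch.

```python
def find_yaml_header(lines):
    """Return (first, last + 1) line indices of the YAML block, or None."""
    for start, line in enumerate(lines):
        if line.rstrip() != "---":
            continue
        # A block that is not at the top must be preceded by a blank line.
        if start > 0 and lines[start - 1].strip() != "":
            continue
        for end in range(start + 1, len(lines)):
            if lines[end].rstrip() in ("---", "..."):
                return start, end + 1
        return None  # opening delimiter without a closing one
    return None


def normalize_header(header):
    """Wrap plain string values in a list, leave other values unchanged."""
    return {key: [value] if isinstance(value, str) else value
            for key, value in header.items()}


text = "---\nresponsible: Jane Doe\n...\n\n# Title\n"
lines = text.splitlines()
header_range = find_yaml_header(lines)
normalized = normalize_header({"responsible": "Jane Doe", "tags": ["a", "b"]})
```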
-
caosadvancedtools.pandoc_header_tools.
kw_present
(header, kw) Check whether keywords are present in the header.
-
caosadvancedtools.pandoc_header_tools.
save_header
(filename, header_data) Save a header identified by the tuple header_data to the file identified by filename.
filename can also be a folder. In this case folder/README.md will be used for getting the header.
caosadvancedtools.structure_mapping module
-
class
caosadvancedtools.structure_mapping.
EntityMapping
Bases:
object
map local entities to entities on the server
The dict to_existing maps the _cuid property to entity objects.
The dict to_target maps the id property to entity objects.
-
add
(target, existing)
-
-
caosadvancedtools.structure_mapping.
collect_existing_structure
(target_structure, existing_root, em) recursively collects existing entities
The collected entities are those that correspond to the ones in target_structure.
em: EntityMapping
-
caosadvancedtools.structure_mapping.
update_matched_entity
(em, updating, target_record, existing_record) update the Record existing in the server according to the Record supplied as target_record
-
caosadvancedtools.structure_mapping.
update_structure
(em, updating: caosdb.common.models.Container, target_structure: caosdb.common.models.Record) compare the existing records with the target record tree created from the h5 object
- Parameters
existing_structure – retrieved entity; e.g. the top level identifiable
target_structure (db.Record) – A record which may have references to other records. Must be a DAG.
caosadvancedtools.suppressKnown module
-
class
caosadvancedtools.suppressKnown.
SuppressKnown
(db_file=None) Bases:
logging.Filter
This filter allows suppressing log messages that were shown before.
The python logging module can be used as normal. This Filter needs to be added to the appropriate Logger, and logging calls (e.g. to warning, info etc.) need to have an additional extra argument. This argument should be a dict that contains an identifier and a category. Example: extra={"identifier": "<Record>something</Record>", "category": "entities"}
The identifier is used to check whether a message was shown before and should be a string. The category can be used to remove a specific group of messages from memory and the logger would show those messages again even when they are known.
-
create_cache
()
-
filter
(record) Return whether the record shall be logged.
If either identifier or category is missing, 1 is returned (logging enabled). If the record has both attributes, it is checked whether the combination was shown before (was_tagged). If so, 0 is returned. Otherwise the combination is saved and 1 is returned.
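The filtering logic can be illustrated with an in-memory variant built on the standard logging module. SuppressKnownSketch and ListHandler are hypothetical names; unlike the class documented here, this sketch keeps the known messages in a dict and not in a database file.

```python
import hashlib
import logging


class SuppressKnownSketch(logging.Filter):
    """Suppress records whose message and identifier were shown before."""

    def __init__(self):
        super().__init__()
        self.seen = {}  # digest -> category

    def filter(self, record):
        identifier = getattr(record, "identifier", None)
        category = getattr(record, "category", None)
        if identifier is None or category is None:
            return 1  # logging enabled for records without both attributes
        text = record.getMessage() + str(identifier)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in self.seen:
            return 0
        self.seen[digest] = category
        return 1

    def reset(self, category):
        """Forget all messages of one category so they are shown again."""
        self.seen = {d: c for d, c in self.seen.items() if c != category}


class ListHandler(logging.Handler):
    """Collect emitted messages in a list (for demonstration only)."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


logger = logging.getLogger("suppress_known_sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)
logger.addFilter(SuppressKnownSketch())

extra = {"identifier": "<Record>something</Record>", "category": "entities"}
logger.warning("Record is missing a date", extra=extra)
logger.warning("Record is missing a date", extra=extra)  # suppressed
logger.warning("No extra given")  # always shown
```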
-
hash
(txt, identifier)
-
reset
(category)
-
tag_msg
(txt, identifier, category)
-
was_tagged
(digest)
-
caosadvancedtools.table_converter module
-
caosadvancedtools.table_converter.
from_table
(spreadsheet, recordtype) parses a pandas DataFrame to a list of records
-
caosadvancedtools.table_converter.
from_tsv
(filename, recordtype) parses a tsv file to a list of records
-
caosadvancedtools.table_converter.
generate_property_name
(prop)
-
caosadvancedtools.table_converter.
to_table
(container) creates a table from the records in a container
-
caosadvancedtools.table_converter.
to_tsv
(filename, container)
caosadvancedtools.table_export module
Collect optional and mandatory data from CaosDB records and prepare them for an export as a table, e.g., for the export to metadata repositories.
-
class
caosadvancedtools.table_export.
BaseTableExporter
(export_dict, record=None, raise_error_if_missing=False) Bases:
object
Base exporter class from which all actual implementations inherit. It contains the basic structure with a dictionary for optional and mandatory keys, and the error handling. The actual logic for finding the values to the entries has to be implemented elsewhere. The final results are stored in the info dict.
-
collect_information
() Use the items of export_dict to collect the information for the export.
-
prepare_csv_export
(delimiter=',', print_header=False, skip_empty_optionals=False) Return the values in self.info as a single-line string, separated by the delimiter. If print_header is true, a header line with the names of the entries, separated by the same delimiter, is added. Header and body are separated by a newline character.
- Parameters
delimiter (string, optional) – symbol that separates two consecutive entries, e.g. ‘,’ for .csv or a tab character for .tsv. Default is ‘,’.
print_header (bool, optional) – specify whether a header line with all entry names separated by the delimiter precedes the body. Default is False.
skip_empty_optionals (bool, optional) – if this is true, optional entries without value will be skipped in the output string. Otherwise an empty field will be attached. Default is False.
- Raises
TableExportError: – if mandatory entries are missing a value
- Returns
a single string, either only the body line, or header and body separated by a newline character if print_header is True.
- Return type
string
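The output format can be illustrated with a small stand-alone function. prepare_csv_line is a hypothetical name, and the handling of mandatory and optional entries is left out of this sketch.

```python
def prepare_csv_line(info, delimiter=",", print_header=False):
    """Join the values of info into one line, optionally with a header line."""
    body = delimiter.join(str(value) for value in info.values())
    if print_header:
        header = delimiter.join(info.keys())
        return header + "\n" + body
    return body


# Hypothetical content of an info dict after collecting the information.
info = {"title": "Experiment 1", "year": 2020}
csv_line = prepare_csv_line(info)
tsv_with_header = prepare_csv_line(info, delimiter="\t", print_header=True)
```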
-
-
exception
caosadvancedtools.table_export.
TableExportError
(msg) Bases:
caosdb.exceptions.CaosDBException
Error that is raised in case of failing export, e.g., because of missing mandatory entries.
caosadvancedtools.table_importer module
This module allows reading table files like tsv and xls. They are converted to a Pandas DataFrame and checked for compliance with the rules provided. For example, a list of column names that have to exist can be provided.
This module also implements some converters that can be applied to cell entries.
Those converters can also be used to apply checks on the entries.
-
class
caosadvancedtools.table_importer.
CSVImporter
(converters, obligatory_columns=None, unique_keys=None, datatypes=None, existing_columns=None) Bases:
caosadvancedtools.table_importer.TableImporter
-
read_file
(filename, sep=',', **kwargs)
-
-
class
caosadvancedtools.table_importer.
TSVImporter
(converters, obligatory_columns=None, unique_keys=None, datatypes=None, existing_columns=None) Bases:
caosadvancedtools.table_importer.TableImporter
-
read_file
(filename, **kwargs)
-
-
class
caosadvancedtools.table_importer.
TableImporter
(converters, obligatory_columns=None, unique_keys=None, datatypes=None, existing_columns=None) Bases:
object
Abstract base class for importing data from tables.
-
check_columns
(df, filename=None) Check whether all required columns exist.
Required columns are columns for which converters are defined.
- Raises
-
check_dataframe
(df, filename=None, strict=False) Check if the dataframe conforms to the restrictions.
Checked restrictions are: Columns, data types, uniqueness requirements.
- Parameters
df (pandas.DataFrame) – The dataframe to be checked.
filename (string, optional) – The file name, only used for output in case of problems.
strict (boolean, optional) – If False (the default), try to convert columns, otherwise raise an error.
-
check_datatype
(df, filename=None, strict=False) Check for each column whether non-null fields have the correct datatype.
Note
If columns are integer, but should be float, this method converts the respective columns in place.
- Parameters
strict (boolean, optional) – If False (the default), try to convert columns, otherwise raise an error.
-
check_missing
(df, filename=None) Check in each row whether obligatory fields are empty or null.
Rows that have missing values are removed.
- Returns
out – The input DataFrame with incomplete rows removed.
- Return type
pandas.DataFrame
-
check_unique
(df, filename=None) Check whether value combinations that shall be unique for each row are unique.
If a row is found that uses the same combination of values as a previous one, the later row is removed.
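The "keep the first, drop the later duplicate" rule can be illustrated on plain dictionaries. This is a sketch under assumptions, not the method's code: the helper name drop_repeated_rows and the representation of rows as dicts are hypothetical, and unique_keys is taken here to be one tuple of column names. On a DataFrame, pandas offers DataFrame.drop_duplicates(subset=..., keep="first") for a comparable effect.

```python
def drop_repeated_rows(rows, unique_keys):
    """Keep only the first row for each combination of values in unique_keys."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[col] for col in unique_keys)
        if key in seen:
            continue  # a previous row already used this combination
        seen.add(key)
        kept.append(row)
    return kept


rows = [
    {"date": "2020-01-01", "sample": "A", "value": 1},
    {"date": "2020-01-01", "sample": "A", "value": 2},  # repeats (date, sample)
    {"date": "2020-01-01", "sample": "B", "value": 3},
]
unique_rows = drop_repeated_rows(rows, ("date", "sample"))
```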
-
read_file
(filename, **kwargs)
-
-
class
caosadvancedtools.table_importer.
XLSImporter
(converters, obligatory_columns=None, unique_keys=None, datatypes=None, existing_columns=None) Bases:
caosadvancedtools.table_importer.TableImporter
-
read_file
(filename, **kwargs)
-
read_xls
(filename, **kwargs) Convert an xls file into a Pandas DataFrame.
The converters of the XLSImporter object are used.
Raises: DataInconsistencyError
-
-
caosadvancedtools.table_importer.
assure_name_format
(name) checks whether a string can be interpreted as ‘LastName, FirstName’
-
caosadvancedtools.table_importer.
check_reference_field
(ent_id, recordtype)
-
caosadvancedtools.table_importer.
date_converter
(val, fmt='%Y-%m-%d') If the value is already a datetime, it is returned unchanged; otherwise it is converted using the format string.
-
caosadvancedtools.table_importer.
datetime_converter
(val, fmt='%Y-%m-%d %H:%M:%S') If the value is already a datetime, it is returned unchanged; otherwise it is converted using the format string.
-
caosadvancedtools.table_importer.
incomplete_date_converter
(val, fmts={'%Y': '%Y', '%Y-%m': '%Y-%m', '%Y-%m-%d': '%Y-%m-%d'}) If the value is already a datetime, it is returned unchanged; otherwise it is converted using the format strings.
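The three date converters share one pattern: pass datetime values through and parse strings with a format. The sketch below uses only the standard library. The function names are hypothetical, and the reading of fmts as a mapping from output format to input format, tried in order, is an assumption; the documentation above does not state it.

```python
from datetime import datetime


def convert_datetime(val, fmt="%Y-%m-%d %H:%M:%S"):
    """Return val unchanged if it is a datetime, else parse it with fmt."""
    if isinstance(val, datetime):
        return val
    return datetime.strptime(val, fmt)


def convert_incomplete_date(val, fmts=None):
    """Try several (possibly incomplete) date formats in turn.

    Assumption: keys are output formats, values are accepted input formats.
    """
    if fmts is None:
        fmts = {"%Y-%m-%d": "%Y-%m-%d", "%Y-%m": "%Y-%m", "%Y": "%Y"}
    for out_fmt, in_fmt in fmts.items():
        try:
            return datetime.strptime(val, in_fmt).strftime(out_fmt)
        except ValueError:
            continue  # this format does not match, try the next one
    raise ValueError(f"{val!r} matches none of the given formats")


full = convert_datetime("2020-01-02 03:04:05")
partial = convert_incomplete_date("2020-03")
```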
-
caosadvancedtools.table_importer.
string_in_list
(val, options, ignore_case=True) Return the given value if it is contained in options, raise an error otherwise.
- Parameters
- Returns
val – The original value if it is contained in options
- Return type
- Raises
ValueError – If val is not contained in options.
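A converter of this kind doubles as a check: it returns the value untouched or raises. The following stand-alone sketch shows the idea; the name pick_from_options is hypothetical and the comparison details (lower-casing both sides) are an assumption about how ignore_case behaves.

```python
def pick_from_options(val, options, ignore_case=True):
    """Return val if it is one of options, raise ValueError otherwise."""
    if ignore_case:
        found = val.lower() in [option.lower() for option in options]
    else:
        found = val in options
    if not found:
        raise ValueError(f"{val!r} is not one of {list(options)!r}")
    return val  # the original value, not a normalized one


colour = pick_from_options("Red", ["red", "green", "blue"])
```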
-
caosadvancedtools.table_importer.
win_path_converter
(val) Checks whether the value looks like a Windows path and converts it to a POSIX path.
-
caosadvancedtools.table_importer.
win_path_list_converter
(val) Checks whether the value looks like a list of Windows paths and converts them to POSIX paths.
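Converting Windows-style separators can be done with the standard library's pathlib, as in the sketch below. This is not the code of the two converters above: the function names are hypothetical, the validity check is omitted, and the comma as list separator is an assumption.

```python
from pathlib import PureWindowsPath


def to_posix(val):
    """Rewrite a Windows-style path with forward slashes."""
    return PureWindowsPath(val).as_posix()


def list_to_posix(val, sep=","):
    """Convert a separated list of Windows paths (separator is an assumption)."""
    return [to_posix(part.strip()) for part in val.split(sep)]


single = to_posix(r"data\run1\file.txt")
several = list_to_posix(r"data\a.txt, data\sub\b.txt")
```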
-
caosadvancedtools.table_importer.
yes_no_converter
(val) converts a string to True or False if possible.
Allowed field values are yes and no.
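A yes/no converter is small enough to sketch in full. The version below is an illustration only: the name yes_no is hypothetical, and both the case-insensitive comparison and the choice of ValueError for other inputs are assumptions.

```python
def yes_no(val):
    """Map 'yes' to True and 'no' to False; reject everything else."""
    lowered = val.lower()
    if lowered == "yes":
        return True
    if lowered == "no":
        return False
    raise ValueError(f"field should be 'yes' or 'no', got {val!r}")


flag = yes_no("Yes")
```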
caosadvancedtools.utils module
-
caosadvancedtools.utils.
check_win_path
(path, filename=None) Check whether the path contains ‘/’ but no ‘\’.
If that is the case, it is likely that the path is not a Windows path.
- Parameters
path – path to be checked
filename – if the path is located in a file, this parameter can be used to direct the user to the file where the path is located.
-
caosadvancedtools.utils.
create_entity_link
(entity: caosdb.common.models.Entity, base_url: str = '') creates a string that contains the code for an html link to the provided entity.
The text of the link is the entity name if one exists and the id otherwise.
-
caosadvancedtools.utils.
find_records_that_reference_ids
(referenced_ids, rt='', step_size=50) Returns a list with the ids of records that reference entities with the supplied ids.
Sometimes a file or folder is referenced in a README.md (e.g. in an Analysis), but the reference should point to the corresponding object (e.g. the Experiment) rather than to the files themselves. Therefore, the ids of all Records (of a suitable type) that reference one or more of the supplied ids are collected. This is done in chunks because the ids are passed in the header of the HTTP request.
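The chunking mentioned above keeps each request small. A minimal sketch of splitting a list of ids into pieces of at most step_size elements follows; the helper name chunked is hypothetical and the actual query against the server is left out.

```python
def chunked(ids, step_size=50):
    """Yield consecutive slices of ids with at most step_size elements each."""
    for start in range(0, len(ids), step_size):
        yield ids[start:start + step_size]


# Each piece would be used to build one query; here they are only collected.
pieces = list(chunked(list(range(5)), step_size=2))
```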
-
caosadvancedtools.utils.
get_referenced_files
(glob, prefix=None, filename=None, location=None) Queries the database for files referenced by the provided glob.
- Parameters
glob – the glob referencing the file(s)
prefix – the glob can be relative to some path; in that case that path needs to be given as prefix
filename – the file in which the glob is given (used for error messages)
location – the location in the file in which the glob is given (used for error messages)
-
caosadvancedtools.utils.
read_field_as_list
(field) In yaml headers, for example, entries can be single values or lists. To simplify working with those values, this function puts single values in a list.
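The normalization described above can be sketched in a few lines. The name as_list is hypothetical; the sketch assumes that only list instances count as "already a list".

```python
def as_list(field):
    """Return field unchanged if it is a list, else wrap it in a list."""
    if isinstance(field, list):
        return field
    return [field]


# Callers can now iterate without checking the type first.
authors = as_list("Jane Doe")
```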
-
caosadvancedtools.utils.
replace_path_prefix
(path, old_prefix, new_prefix) Replaces the prefix old_prefix in path with new_prefix.
Raises a RuntimeError when the path does not start with old_prefix.
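As an illustration of the contract above, here is a plain-string sketch. The name swap_prefix is hypothetical, and simple concatenation is an assumption; the library function may join the path components differently.

```python
def swap_prefix(path, old_prefix, new_prefix):
    """Replace old_prefix at the start of path with new_prefix."""
    if not path.startswith(old_prefix):
        raise RuntimeError(f"{path!r} does not start with {old_prefix!r}")
    return new_prefix + path[len(old_prefix):]


moved = swap_prefix("/old/data/file.txt", "/old", "/new")
```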
-
caosadvancedtools.utils.
return_field_or_property
(value, prop=None) Returns the value itself or a property of it.
In yaml headers, a field sometimes contains a single value and sometimes a dict. This function returns either the single value or (if the value is a dict) a value of the dict.
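One way to read this description is sketched below. The name field_or_property is hypothetical, and treating prop as the dict key to look up is an assumption that the text above does not confirm.

```python
def field_or_property(value, prop=None):
    """Return value[prop] if value is a dict containing prop, else value."""
    if isinstance(value, dict) and prop in value:
        return value[prop]
    return value


plain = field_or_property("sample.dat", prop="file")
nested = field_or_property({"file": "sample.dat", "note": "raw"}, prop="file")
```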
-
caosadvancedtools.utils.
set_log_level
(level)
-
caosadvancedtools.utils.
string_to_person
(person) Creates a Person Record from a string.
The following formats are supported:
- <Firstname> <Lastname> <*>
- <Lastname(s)>,<Firstname(s)>,<*>
The part after the name can be used for an affiliation for example.
caosadvancedtools.webui_formatter module
-
class
caosadvancedtools.webui_formatter.
WebUI_Formatter
(*args, full_file=None, **kwargs) Bases:
logging.Formatter
Formats log records so that they are displayed nicely in the WebUI.
You can enable this as follows:
    logger = logging.getLogger("<LoggerName>")
    formatter = WebUI_Formatter(full_file="path/to/file")
    handler = logging.Handler()
    handler.setFormatter(formatter)
    logger.addHandler(handler)
-
format
(record) Return the HTML formatted log record for display on a website.
This essentially wraps the text formatted by the parent class in html.
- Parameters
record –
- Raises
RuntimeError – If the log level of the record is not supported. Supported log levels include logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, and logging.CRITICAL.
- Returns
The formatted log record.
- Return type
-
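The idea behind format, wrapping the parent class's text in HTML and rejecting unsupported levels, can be sketched with the standard logging module. This is not the WebUI_Formatter implementation: the class name HtmlFormatter, the div markup, and the CSS class names are all hypothetical.

```python
import html
import logging


class HtmlFormatter(logging.Formatter):
    """Wrap the text produced by logging.Formatter in a simple HTML element."""

    # Hypothetical CSS classes, one per supported log level.
    CSS = {
        logging.DEBUG: "log-debug",
        logging.INFO: "log-info",
        logging.WARNING: "log-warning",
        logging.ERROR: "log-error",
        logging.CRITICAL: "log-critical",
    }

    def format(self, record):
        text = super().format(record)  # plain text from the parent class
        if record.levelno not in self.CSS:
            raise RuntimeError(f"unsupported log level: {record.levelno}")
        return f'<div class="{self.CSS[record.levelno]}">{html.escape(text)}</div>'


record = logging.LogRecord("demo", logging.INFO, "demo.py", 1,
                           "hello <b>", None, None)
rendered = HtmlFormatter().format(record)
```

Escaping the message with html.escape keeps markup in log messages from being interpreted by the browser.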