caosadvancedtools.scifolder package

Submodules

caosadvancedtools.scifolder.analysis_cfood module

class caosadvancedtools.scifolder.analysis_cfood.AnalysisCFood(*args, **kwargs)

Bases: AbstractFileCFood, WithREADME

collect_information()

The CFood collects information for further processing.

Often CFoods need information from files or even from the database in order to make processing decisions. This function is intended to be called after match, so match can be used without connecting to the database.

To be overridden by subclasses.
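
The documented call order can be illustrated with a self-contained stand-in. Everything in this sketch is hypothetical: HypotheticalCFood, match_file, crawl and the set used as a fake database are not part of caosadvancedtools. They only mimic the contract described above: matching looks at the path alone, and database-dependent work happens later in collect_information().

```python
import re


class HypotheticalCFood:
    """Hypothetical stand-in that mimics the documented CFood call order."""

    def __init__(self, crawled_path, match):
        self.crawled_path = crawled_path
        self.match = match          # result of the path-only matching step
        self.info = {}              # filled by collect_information()
        self.identifiables = []     # stands in for the identifiables Container
        self.to_be_updated = []     # stands in for self.to_be_updated

    @staticmethod
    def get_re():
        # Made-up pattern; real CFoods each define their own.
        return r"^/Data/(?P<project>[^/]+)/README\.md$"

    @classmethod
    def match_file(cls, path):
        # Only the path is inspected, so no database connection is needed.
        return re.match(cls.get_re(), path)

    def collect_information(self, database):
        # Runs after matching and may consult files or the database.
        self.info["known"] = self.match.group("project") in database

    def create_identifiables(self):
        self.identifiables.append({"Project": self.match.group("project")})

    def update_identifiables(self):
        # Made-up rule: queue the identifiables only for unknown projects.
        if not self.info["known"]:
            self.to_be_updated.extend(self.identifiables)


def crawl(paths, database):
    """Hypothetical driver: match first, then collect, create and update."""
    cfoods = []
    for path in paths:
        match = HypotheticalCFood.match_file(path)
        if match is None:
            continue
        cfood = HypotheticalCFood(path, match)
        cfood.collect_information(database)
        cfood.create_identifiables()
        cfood.update_identifiables()
        cfoods.append(cfood)
    return cfoods


result = crawl(["/Data/SpeedOfLight/README.md", "/Data/notes.txt"], database={"Other"})
print([c.to_be_updated for c in result])  # [[{'Project': 'SpeedOfLight'}]]
```

Keeping get_re() static lets the driver test a path before any instance exists, which is what allows matching to run without a database.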

create_identifiables()

Should add the identifiables to the instance's Container.

static get_re()

Returns the regular expression used to identify files that shall be processed.

This function shall be implemented by subclasses.

static name_beautifier(name)

A function that can be used to rename the project, i.e. if the project in CaosDB shall be named differently than in the folder structure. Its use is discouraged.

update_identifiables()

Changes the identifiables as needed and adds changed identifiables to self.to_be_updated.

win_paths = []

caosadvancedtools.scifolder.experiment_cfood module

class caosadvancedtools.scifolder.experiment_cfood.ExperimentCFood(*args, **kwargs)

Bases: AbstractFileCFood, WithREADME

collect_information()

The CFood collects information for further processing.

Often CFoods need information from files or even from the database in order to make processing decisions. This function is intended to be called after match, so match can be used without connecting to the database.

To be overridden by subclasses.

static create_identifiable_experiment(match)

create_identifiables()

Should add the identifiables to the instance's Container.

static get_re()

Returns the regular expression used to identify files that shall be processed.

This function shall be implemented by subclasses.

static name_beautifier(x)

update_identifiables()

Changes the identifiables as needed and adds changed identifiables to self.to_be_updated.

win_paths = []

caosadvancedtools.scifolder.generic_pattern module

This module contains regular expressions needed for the standard file structure.
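
One way to use shared fragments is to concatenate them into a get_re() pattern with named groups. The fragment names and patterns below (DATA_DIR, PROJECT, DATE, IDENTIFIER, README) are illustrative assumptions, not the actual contents of this module, and the example path is made up.

```python
import re

# Illustrative fragments (assumed, not copied from generic_pattern).
DATA_DIR = r"^/ExperimentalData/"
PROJECT = r"(?P<project_year>\d{4})_(?P<project_identifier>[^/]+)/"
DATE = r"(?P<date>\d{4}-\d{2}-\d{2})"
IDENTIFIER = r"(?:_(?P<identifier>[^/]+))?/"   # optional suffix after the date
README = r"README\.(?:md|xlsx)$"

# A get_re()-style pattern assembled from the shared pieces.
EXPERIMENT_RE = re.compile(DATA_DIR + PROJECT + DATE + IDENTIFIER + README)

m = EXPERIMENT_RE.match(
    "/ExperimentalData/2020_SpeedOfLight/2020-01-01_TimeOfFlight/README.md")
print(m.groupdict())
# {'project_year': '2020', 'project_identifier': 'SpeedOfLight',
#  'date': '2020-01-01', 'identifier': 'TimeOfFlight'}
```

Because every fragment carries its own named groups, CFoods that reuse a fragment can read the same group names from their matches.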

caosadvancedtools.scifolder.publication_cfood module

class caosadvancedtools.scifolder.publication_cfood.PublicationCFood(*args, **kwargs)

Bases: AbstractFileCFood, WithREADME

collect_information()

The CFood collects information for further processing.

Often CFoods need information from files or even from the database in order to make processing decisions. This function is intended to be called after match, so match can be used without connecting to the database.

To be overridden by subclasses.

create_identifiables()

Should add the identifiables to the instance's Container.

static get_re()

Returns the regular expression used to identify files that shall be processed.

This function shall be implemented by subclasses.

update_identifiables()

Changes the identifiables as needed and adds changed identifiables to self.to_be_updated.

win_paths = []

caosadvancedtools.scifolder.publication_cfood.folder_to_type(name)

caosadvancedtools.scifolder.result_table_cfood module

class caosadvancedtools.scifolder.result_table_cfood.ResultTableCFood(*args, **kwargs)

Bases: AbstractFileCFood

create_identifiables()

Should add the identifiables to the instance's Container.

static get_re()

Returns the regular expression used to identify files that shall be processed.

This function shall be implemented by subclasses.

static name_beautifier(x)

property_name_re = re.compile('^(?P<pname>.+?)\\s*(\\[\\s?(?P<unit>.*?)\\s?\\] *)?$')

table_re = 'result_table_(?P<recordtype>.*).csv$'

update_identifiables()

Changes the identifiables as needed and adds changed identifiables to self.to_be_updated.

win_paths = []
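
Both ResultTableCFood attributes can be tried out directly with Python's re module. The patterns below are copied from the attribute values listed above; the column headers and the file name are made-up examples, and applying table_re with re.search to a bare file name is an assumption about how the CFood uses it.

```python
import re

# Patterns copied from the ResultTableCFood attributes documented above.
property_name_re = re.compile(r"^(?P<pname>.+?)\s*(\[\s?(?P<unit>.*?)\s?\] *)?$")
table_re = "result_table_(?P<recordtype>.*).csv$"


def split_header(column):
    """Split a column header such as 'Temperature [K]' into name and unit."""
    m = property_name_re.match(column)
    return m.group("pname"), m.group("unit")


print(split_header("Temperature [K]"))   # ('Temperature', 'K')
print(split_header("Speed [ m/s ]"))     # ('Speed', 'm/s')
print(split_header("Count"))             # ('Count', None)

m = re.search(table_re, "result_table_Measurement.csv")
print(m.group("recordtype"))             # Measurement
```

The lazy pname group together with the optional bracket group is what lets a header without a unit still match, with unit left as None. The dot before csv in table_re is not escaped, so it matches any character.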

caosadvancedtools.scifolder.simulation_cfood module

class caosadvancedtools.scifolder.simulation_cfood.SimulationCFood(*args, **kwargs)

Bases: AbstractFileCFood, WithREADME

collect_information()

The CFood collects information for further processing.

Often CFoods need information from files or even from the database in order to make processing decisions. This function is intended to be called after match, so match can be used without connecting to the database.

To be overridden by subclasses.

create_identifiables()

Should add the identifiables to the instance's Container.

static get_re()

Returns the regular expression used to identify files that shall be processed.

This function shall be implemented by subclasses.

update_identifiables()

Changes the identifiables as needed and adds changed identifiables to self.to_be_updated.

win_paths = []

caosadvancedtools.scifolder.software_cfood module

class caosadvancedtools.scifolder.software_cfood.SoftwareCFood(*args, **kwargs)

Bases: AbstractFileCFood, WithREADME

collect_information()

The CFood collects information for further processing.

Often CFoods need information from files or even from the database in order to make processing decisions. This function is intended to be called after match, so match can be used without connecting to the database.

To be overridden by subclasses.

create_identifiables()

Should add the identifiables to the instance's Container.

static get_re()

Returns the regular expression used to identify files that shall be processed.

This function shall be implemented by subclasses.

update_identifiables()

Changes the identifiables as needed and adds changed identifiables to self.to_be_updated.

win_paths = []

caosadvancedtools.scifolder.utils module

caosadvancedtools.scifolder.utils.add_value_list(header, df, name)

caosadvancedtools.scifolder.utils.create_files_list(df, ftype)

caosadvancedtools.scifolder.utils.get_entity_ids_from_include_file(prefix, file_path)

Reads version ids from an include file.

caosadvancedtools.scifolder.utils.get_files_referenced_by_field(globs, prefix='', final_glob=None)

Returns all file entities at the paths described by the given globs.

This function assumes that the supplied globs argument is a list of file names, directories or globs.

prefix should be the path of the crawled file; it supplies the context for relative paths.
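
The relative-path handling can be illustrated with a local-filesystem analogue. This sketch is hypothetical: the real function returns file entities from CaosDB, whereas resolve_globs below only expands entries on disk with the standard glob module. Treating prefix as the directory against which relative entries are joined, and expanding a directory to every file below it, are assumptions of the sketch; the final_glob parameter is not modelled.

```python
import glob
import os


def resolve_globs(globs, prefix=""):
    """Hypothetical local analogue: expand file names, directories and globs.

    Relative entries are joined onto ``prefix`` (assumed to be the directory
    of the crawled file); absolute entries are used as they are.
    """
    found = []
    for entry in globs:
        path = entry if os.path.isabs(entry) else os.path.join(prefix, entry)
        if os.path.isdir(path):
            # Assumption: a directory stands for every file below it.
            path = os.path.join(path, "**", "*")
        for match in glob.glob(path, recursive=True):
            if os.path.isfile(match):          # skip sub-directories
                found.append(os.path.normpath(match))
    return sorted(set(found))
```

Returning a sorted, de-duplicated list keeps the result stable when several globs hit the same file.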

caosadvancedtools.scifolder.utils.get_xls_header(filepath)

This function reads an xlsx file and creates a dictionary analogous to the one created from the YAML headers in README.md files read with the get_header function of caosdb-advancedtools. As xlsx files lack the hierarchical structure, the information they can provide is less complex. xlsx files can thus serve as a less powerful alternative for people who are not comfortable with README.md files.

The xlsx file has a defined set of rows. In each row a list of entries can be given. This structure is converted to a dictionary with a fixed structure.
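
A rough sketch of the row-to-dictionary step, under loudly stated assumptions: the rows have already been read from the xlsx file, each row starts with a field name followed by its entries, and the set of allowed field names is passed in. rows_to_header and its parameters are hypothetical; the actual row layout and output structure of get_xls_header are not documented here.

```python
def rows_to_header(rows, allowed_keys):
    """Hypothetical: turn spreadsheet rows into a README-like header dict.

    Assumes each row starts with a field name followed by its entries;
    empty cells (None or "") are skipped and unknown field names raise.
    """
    header = {}
    for row in rows:
        if not row or row[0] in (None, ""):
            continue                           # ignore blank rows
        key, entries = str(row[0]).strip(), row[1:]
        if key not in allowed_keys:
            raise ValueError(f"unexpected row: {key!r}")
        header[key] = [e for e in entries if e not in (None, "")]
    return header


rows = [["responsible", "Ada Lovelace", None], ["description", "Speed test"], [None]]
print(rows_to_header(rows, {"responsible", "description", "results"}))
# {'responsible': ['Ada Lovelace'], 'description': ['Speed test']}
```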

caosadvancedtools.scifolder.utils.is_filename_allowed(path, recordtype)

caosadvancedtools.scifolder.utils.parse_responsibles(header)

Extract the responsible person(s) from the YAML header.

If the field responsible is a list, every entry from that list is added as a person. Currently only the format <Firstname> <Lastname> <*> is supported. If it is a simple string, it is added as the only person.
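
The rules above can be sketched in plain Python. parse_responsibles_sketch is hypothetical: it returns (first name, last name) tuples where the real function adds persons, and rejecting single-word entries is an assumption about how unsupported formats are handled.

```python
def parse_responsibles_sketch(header):
    """Hypothetical re-implementation of the documented rules.

    Accepts a string or a list of strings under header["responsible"] and
    returns (first name, last name) tuples; anything after the second
    whitespace-separated token (the <*> part) is ignored.
    """
    value = header["responsible"]
    entries = value if isinstance(value, list) else [value]
    people = []
    for entry in entries:
        parts = entry.split()
        if len(parts) < 2:
            raise ValueError(f"expected '<Firstname> <Lastname> ...', got {entry!r}")
        people.append((parts[0], parts[1]))
    return people


print(parse_responsibles_sketch({"responsible": ["Ada Lovelace", "Alan Turing (supervisor)"]}))
# [('Ada', 'Lovelace'), ('Alan', 'Turing')]
```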

caosadvancedtools.scifolder.utils.reference_records_corresponding_to_files(record, recordtypes, globs, path, to_be_updated, property_name)

caosadvancedtools.scifolder.withreadme module

class caosadvancedtools.scifolder.withreadme.DataModel(results: str = 'results', scripts: str = 'scripts', sources: str = 'sources', date: str = 'date', Project: str = 'Project', Analysis: str = 'Analysis', identifier: str = 'identifier', responsible: str = 'responsible', revisionOf: str = 'revisionOf', Experiment: str = 'Experiment', Publication: str = 'Publication', Simulation: str = 'Simulation', binaries: str = 'binaries', sourcecode: str = 'sourceCode', description: str = 'description')

Bases: object

Analysis: str = 'Analysis'
Experiment: str = 'Experiment'
Project: str = 'Project'
Publication: str = 'Publication'
Simulation: str = 'Simulation'
binaries: str = 'binaries'
date: str = 'date'
description: str = 'description'
identifier: str = 'identifier'
responsible: str = 'responsible'
results: str = 'results'
revisionOf: str = 'revisionOf'
scripts: str = 'scripts'
sourcecode: str = 'sourceCode'
sources: str = 'sources'
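
The signature and the annotated class attributes suggest that DataModel is a dataclass holding the property and record type names the scifolder CFoods use. Assuming that, a site-specific variant could be derived with dataclasses.replace. DataModelSketch below is a hypothetical stand-in with only four of the fields; how the scifolder CFoods pick up a customized model is not documented here.

```python
from dataclasses import dataclass, replace


@dataclass
class DataModelSketch:
    """Hypothetical stand-in with a subset of DataModel's fields and defaults."""
    results: str = "results"
    responsible: str = "responsible"
    Experiment: str = "Experiment"
    sourcecode: str = "sourceCode"   # attribute name and value differ in case


default = DataModelSketch()
# Hypothetical site that calls its experiment record type "Measurement".
custom = replace(default, Experiment="Measurement")
print(default.Experiment, custom.Experiment)  # Experiment Measurement
print(custom.sourcecode)                       # sourceCode
```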

class caosadvancedtools.scifolder.withreadme.HeaderField(key, model)

Bases: object

class caosadvancedtools.scifolder.withreadme.WithREADME

Bases: object

convert_path(el)

Converts the path in el to Unix style.

el can be a dict or a string. If el is a dict, it must have a file key.

Returns: the same type as el.
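
A minimal sketch of the documented behaviour using pathlib.PureWindowsPath, which accepts both backslash and forward-slash separators. convert_path_sketch is hypothetical; copying the dict instead of modifying it in place, and keeping a drive letter such as C: in the output, are choices of this sketch, not necessarily of convert_path.

```python
from pathlib import PureWindowsPath


def convert_path_sketch(el):
    """Hypothetical version of convert_path: Windows path -> POSIX style.

    ``el`` is either a string or a dict with a "file" key; the return value
    has the same type, and a dict is copied rather than modified in place.
    """
    if isinstance(el, dict):
        converted = dict(el)
        converted["file"] = PureWindowsPath(el["file"]).as_posix()
        return converted
    if isinstance(el, str):
        return PureWindowsPath(el).as_posix()
    raise TypeError("el must be a str or a dict with a 'file' key")


print(convert_path_sketch({"file": "plots\\run1\\fig.png", "description": "Figure"}))
# {'file': 'plots/run1/fig.png', 'description': 'Figure'}
```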

convert_win_paths()

find_referenced_files(fields)

Iterates over the given fields in the header and searches for files if the field contains a glob. The file entities are attached

property header

reference_files_from_header(record)

Adds properties that reference the files collected in ref_files.

ref_files is expected to be a list of (files, description, recordtype) tuples, where files is the list of file entities, description is the description that shall be added to each file and recordtype is the record type that the files shall get as parent. files may be an empty list, and description and recordtype may be None.

The files will be grouped according to the keys used in ref_files and the record types. The record types take precedence.

reference_included_records(record, fields, to_be_updated)

Iterates over the given fields in the header and searches for files if the field contains a glob. The file entities are attached

caosadvancedtools.scifolder.withreadme.get_description(value)

caosadvancedtools.scifolder.withreadme.get_glob(field)

Takes a field, which must be a list of globs or dicts.

If an entry is a dict, it must have either an include or a file key.
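
The documented input shape can be handled as follows. get_glob_sketch is hypothetical; checking include before file, and returning a flat list of strings, are assumptions of this sketch rather than documented behaviour of get_glob.

```python
def get_glob_sketch(field):
    """Hypothetical version of get_glob: collect glob strings from a header field.

    Each entry is either a glob string or a dict with an "include" or a
    "file" key (checked in that order here, which is an assumption).
    """
    globs = []
    for entry in field:
        if isinstance(entry, str):
            globs.append(entry)
        elif isinstance(entry, dict) and "include" in entry:
            globs.append(entry["include"])
        elif isinstance(entry, dict) and "file" in entry:
            globs.append(entry["file"])
        else:
            raise ValueError(f"entry needs an 'include' or 'file' key: {entry!r}")
    return globs


print(get_glob_sketch(["*.csv", {"file": "plots/*.png", "description": "Plots"},
                       {"include": "../Other/README.md"}]))
# ['*.csv', 'plots/*.png', '../Other/README.md']
```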

caosadvancedtools.scifolder.withreadme.get_rt(value)

Module contents