caosadvancedtools.scifolder package
Submodules
caosadvancedtools.scifolder.analysis_cfood module
- class caosadvancedtools.scifolder.analysis_cfood.AnalysisCFood(*args, **kwargs)
Bases: AbstractFileCFood, WithREADME
- collect_information()
The CFood collects information for further processing.
Often CFoods need information from files or even from the database in order to make processing decisions. This function is intended to be called after match, so that match can be used without connecting to the database.
To be overwritten by subclasses.
- create_identifiables()
should set the instance variable Container with the identifiables
- static get_re()
Returns the regular expression used to identify files that shall be processed
This function shall be implemented by subclasses.
- static name_beautifier(name)
A function that can be used to rename the project, i.e. if the project in CaosDB shall be named differently than in the folder structure. Its use is discouraged.
- update_identifiables()
Changes the identifiables as needed and adds changed identifiables to self.to_be_updated
- win_paths = []
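The call order described for collect_information (match first, database access afterwards) can be illustrated with a stand-in class. This is only a sketch: it does not import caosadvancedtools, and `SketchCFood`, its `matches` method, its regular expression, the `calls` list and the `process` helper are all hypothetical names that merely mimic the method names listed above.

```python
import re


class SketchCFood:
    """Stand-in that mimics the method names above; not the real base class."""

    def __init__(self, path):
        self.path = path
        self.match = None
        self.calls = []  # records the order in which the steps ran

    @staticmethod
    def get_re():
        # Hypothetical pattern: a README.md directly below a dated folder.
        return r".*/(?P<date>\d{4}-\d{2}-\d{2})/README\.md$"

    def matches(self):
        # Pure string matching: no database connection is needed here.
        self.match = re.match(self.get_re(), self.path)
        return self.match is not None

    def collect_information(self):
        self.calls.append("collect_information")

    def create_identifiables(self):
        self.calls.append("create_identifiables")

    def update_identifiables(self):
        self.calls.append("update_identifiables")


def process(cfood):
    """Run the later steps only if the path matched."""
    if not cfood.matches():
        return False
    cfood.collect_information()
    cfood.create_identifiables()
    cfood.update_identifiables()
    return True
```

With this stand-in, `process(SketchCFood("/data/2020-01-31/README.md"))` runs all three steps in order, while a path that does not match never reaches collect_information.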
caosadvancedtools.scifolder.experiment_cfood module
- class caosadvancedtools.scifolder.experiment_cfood.ExperimentCFood(*args, **kwargs)
Bases: AbstractFileCFood, WithREADME
- collect_information()
The CFood collects information for further processing.
Often CFoods need information from files or even from the database in order to make processing decisions. This function is intended to be called after match, so that match can be used without connecting to the database.
To be overwritten by subclasses.
- static create_identifiable_experiment(match)
- create_identifiables()
should set the instance variable Container with the identifiables
- static get_re()
Returns the regular expression used to identify files that shall be processed
This function shall be implemented by subclasses.
- static name_beautifier(x)
- update_identifiables()
Changes the identifiables as needed and adds changed identifiables to self.to_be_updated
- win_paths = []
caosadvancedtools.scifolder.generic_pattern module
This module contains regular expressions needed for the standard file structure.
caosadvancedtools.scifolder.publication_cfood module
- class caosadvancedtools.scifolder.publication_cfood.PublicationCFood(*args, **kwargs)
Bases: AbstractFileCFood, WithREADME
- collect_information()
The CFood collects information for further processing.
Often CFoods need information from files or even from the database in order to make processing decisions. This function is intended to be called after match, so that match can be used without connecting to the database.
To be overwritten by subclasses.
- create_identifiables()
should set the instance variable Container with the identifiables
- static get_re()
Returns the regular expression used to identify files that shall be processed
This function shall be implemented by subclasses.
- update_identifiables()
Changes the identifiables as needed and adds changed identifiables to self.to_be_updated
- win_paths = []
- caosadvancedtools.scifolder.publication_cfood.folder_to_type(name)
caosadvancedtools.scifolder.result_table_cfood module
- class caosadvancedtools.scifolder.result_table_cfood.ResultTableCFood(*args, **kwargs)
Bases: AbstractFileCFood
- create_identifiables()
should set the instance variable Container with the identifiables
- static get_re()
Returns the regular expression used to identify files that shall be processed
This function shall be implemented by subclasses.
- static name_beautifier(x)
- property_name_re = re.compile('^(?P<pname>.+?)\\s*(\\[\\s?(?P<unit>.*?)\\s?\\] *)?$')
- table_re = 'result_table_(?P<recordtype>.*).csv$'
- update_identifiables()
Changes the identifiables as needed and adds changed identifiables to self.to_be_updated
- win_paths = []
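The two patterns listed for ResultTableCFood can be tried out with the standard `re` module alone. The sketch below copies both patterns from the attribute values above; the helper names `split_column_header` and `record_type_of` are hypothetical, and applying table_re with `re.search` is an assumption, since this page does not say how the class uses it.

```python
import re

# Both patterns are copied from the class attributes listed above.
property_name_re = re.compile(
    r"^(?P<pname>.+?)\s*(\[\s?(?P<unit>.*?)\s?\] *)?$"
)
table_re = r"result_table_(?P<recordtype>.*).csv$"


def split_column_header(header):
    """Return (property name, unit or None) for one column header."""
    m = property_name_re.match(header)
    if m is None:  # e.g. an empty header
        return None
    return m.group("pname"), m.group("unit")


def record_type_of(path):
    """Return the record type encoded in the file name, or None."""
    m = re.search(table_re, path)
    return m.group("recordtype") if m else None
```

A header such as `temperature [K]` splits into the name `temperature` and the unit `K`; a header without brackets yields `None` as unit. Note that the `.` before `csv` in table_re is not escaped, so it matches any single character rather than only a literal dot.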
caosadvancedtools.scifolder.simulation_cfood module
- class caosadvancedtools.scifolder.simulation_cfood.SimulationCFood(*args, **kwargs)
Bases: AbstractFileCFood, WithREADME
- collect_information()
The CFood collects information for further processing.
Often CFoods need information from files or even from the database in order to make processing decisions. This function is intended to be called after match, so that match can be used without connecting to the database.
To be overwritten by subclasses.
- create_identifiables()
should set the instance variable Container with the identifiables
- static get_re()
Returns the regular expression used to identify files that shall be processed
This function shall be implemented by subclasses.
- update_identifiables()
Changes the identifiables as needed and adds changed identifiables to self.to_be_updated
- win_paths = []
caosadvancedtools.scifolder.software_cfood module
- class caosadvancedtools.scifolder.software_cfood.SoftwareCFood(*args, **kwargs)
Bases: AbstractFileCFood, WithREADME
- collect_information()
The CFood collects information for further processing.
Often CFoods need information from files or even from the database in order to make processing decisions. This function is intended to be called after match, so that match can be used without connecting to the database.
To be overwritten by subclasses.
- create_identifiables()
should set the instance variable Container with the identifiables
- static get_re()
Returns the regular expression used to identify files that shall be processed
This function shall be implemented by subclasses.
- update_identifiables()
Changes the identifiables as needed and adds changed identifiables to self.to_be_updated
- win_paths = []
caosadvancedtools.scifolder.utils module
- caosadvancedtools.scifolder.utils.add_value_list(header, df, name)
- caosadvancedtools.scifolder.utils.create_files_list(df, ftype)
- caosadvancedtools.scifolder.utils.get_entity_ids_from_include_file(prefix, file_path)
reads version ids from include file
- caosadvancedtools.scifolder.utils.get_files_referenced_by_field(globs, prefix='', final_glob=None)
Returns all file entities at the paths described by the given globs.
This function assumes that globs is a list of filenames, directories or globs.
prefix should be the path of the crawled file to supply a context for relative paths.
- caosadvancedtools.scifolder.utils.get_xls_header(filepath)
This function reads an xlsx file and creates a dictionary analogous to the one created from the yaml headers in README.md files, which are read with the get_header function of caosdb-advancedtools. As xlsx files lack the hierarchical structure, the information that can be provided is less complex. The xlsx files can be seen as a less powerful alternative for people who are not comfortable with the README.md files.
The xlsx file has a defined set of rows. In each row a list of entries can be given. This structure is converted to a dictionary with a fixed structure.
- caosadvancedtools.scifolder.utils.is_filename_allowed(path, recordtype)
- caosadvancedtools.scifolder.utils.parse_responsibles(header)
Extract the responsible person(s) from the yaml header.
If the field responsible is a list, every entry from that list will be added as a person. If it is a simple string, it is added as the only person. Currently only the format <Firstname> <Lastname> <*> is supported.
- caosadvancedtools.scifolder.utils.reference_records_corresponding_to_files(record, recordtypes, globs, path, to_be_updated, property_name)
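The two cases described for parse_responsibles (a list of entries versus a single string) can be sketched as follows. This is not the library's implementation: `responsibles_as_list` and `split_name` are hypothetical helpers, the header is assumed to be a plain dict with a `responsible` key, and reading `<*>` as "any trailing text is ignored" is an assumption.

```python
def responsibles_as_list(header):
    """Normalize the 'responsible' field to a list of strings."""
    value = header["responsible"]
    if isinstance(value, list):
        return list(value)  # every entry is one person
    return [value]  # a simple string is the only person


def split_name(entry):
    """Split '<Firstname> <Lastname> <*>' into (first name, last name).

    Assumption: everything after the second word is ignored.
    """
    parts = entry.split()
    if len(parts) < 2:
        raise ValueError(f"expected '<Firstname> <Lastname>': {entry!r}")
    return parts[0], parts[1]
```

For example, a header with `responsible: Ada Lovelace` yields a one-element list, and each entry can then be split into its first and last name.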
caosadvancedtools.scifolder.withreadme module
- class caosadvancedtools.scifolder.withreadme.DataModel(results: str = 'results', scripts: str = 'scripts', sources: str = 'sources', date: str = 'date', Project: str = 'Project', Analysis: str = 'Analysis', identifier: str = 'identifier', responsible: str = 'responsible', revisionOf: str = 'revisionOf', Experiment: str = 'Experiment', Publication: str = 'Publication', Simulation: str = 'Simulation', binaries: str = 'binaries', sourcecode: str = 'sourceCode', description: str = 'description')
Bases: object
- class caosadvancedtools.scifolder.withreadme.WithREADME
Bases: object
- convert_path(el)
Converts the path in el to unix type.
el can be a dict or a string. If el is a dict, it must have a file key.
Returns: same type as el
- convert_win_paths()
- find_referenced_files(fields)
Iterates over the given fields in the header and searches for files if the field contains a glob. The file entities are attached.
- property header
- reference_files_from_header(record)
adds properties that reference the files collected in ref_files
ref_files is expected to be a list of (files, description, recordtype) tuples, where files is the list of file entities, description the description that shall be added to each and recordtype the recordtype that the files shall get as parent. files may be an empty list and description and recordtype may be None.
The files will be grouped according to the keys used in ref_files and the record types. The record types take precedence.
- reference_included_records(record, fields, to_be_updated)
Iterates over the given fields in the header and searches for files if the field contains a glob. The file entities are attached.
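The behaviour described for convert_path (string or dict with a file key in, same type out) can be sketched with the standard library. `to_unix_path` is a hypothetical stand-alone function, not the method itself; using `pathlib.PureWindowsPath` for the conversion and returning a copy of the dict instead of modifying it are assumptions.

```python
from pathlib import PureWindowsPath


def to_unix_path(el):
    """Return el with its path converted to forward slashes.

    el is either a string or a dict with a "file" key; the return
    value has the same type as the input.
    """
    if isinstance(el, dict):
        converted = dict(el)  # copy so the caller's dict is not modified
        converted["file"] = PureWindowsPath(el["file"]).as_posix()
        return converted
    return PureWindowsPath(el).as_posix()
```

A Windows-style string such as `results\plots\fig1.png` comes back as `results/plots/fig1.png`, and a dict keeps all its other keys unchanged.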
- caosadvancedtools.scifolder.withreadme.get_description(value)
- caosadvancedtools.scifolder.withreadme.get_glob(field)
takes a field which must be a list of globs or dicts.
if it is a dict, it must have either an include or a file key
- caosadvancedtools.scifolder.withreadme.get_rt(value)
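The input accepted by get_glob (a list whose entries are globs or dicts with an include or a file key) can be illustrated with a small stand-in. `collect_globs` is a hypothetical function, not the library's code; returning a flat list of glob strings and preferring include over file when both are present are assumptions.

```python
def collect_globs(field):
    """Return the glob strings found in a list of globs or dicts."""
    globs = []
    for entry in field:
        if isinstance(entry, dict):
            # A dict entry must carry its glob under "include" or "file".
            if "include" in entry:
                globs.append(entry["include"])
            elif "file" in entry:
                globs.append(entry["file"])
            else:
                raise ValueError("dict entries need an 'include' or 'file' key")
        else:
            globs.append(entry)  # a plain string is already a glob
    return globs
```

For instance, `collect_globs(["results/*.png", {"file": "data/raw.csv"}])` gives the two glob strings in their original order.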