Other utilities in LinkAhead Advanced User Tools

The table file importer

The LinkAhead Advanced User Tools provide a generic TableImporter class which reads different table file formats (at the time of writing, .xls(x), .csv, and .tsv) and converts them into pandas.DataFrame objects. It provides helper functions for converting column values (e.g., converting the string values “yes” or “no” to True or False), for checking that obligatory columns are present in a table and contain no missing values, and for checking datatypes.

The base class TableImporter provides the general verification methods, while each subclass, such as XLSXImporter or CSVImporter, implements its own read_file method that converts a given table file into a pandas.DataFrame.
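The kinds of checks described above can be illustrated with plain pandas. The following is only a minimal sketch and does not use the TableImporter API: the helper names yes_no_to_bool and check_obligatory_columns, the column names, and the sample data are hypothetical and chosen purely for illustration.

```python
import io

import pandas as pd


def yes_no_to_bool(value):
    """Convert the strings "yes"/"no" (case-insensitive) to True/False."""
    mapping = {"yes": True, "no": False}
    key = str(value).strip().lower()
    if key not in mapping:
        raise ValueError(f"Expected 'yes' or 'no', got {value!r}")
    return mapping[key]


def check_obligatory_columns(df, obligatory_columns):
    """Raise if an obligatory column is absent or contains missing values."""
    for column in obligatory_columns:
        if column not in df.columns:
            raise ValueError(f"Missing obligatory column: {column}")
        if df[column].isna().any():
            raise ValueError(f"Missing values in obligatory column: {column}")


# Hypothetical table content; a real importer would read this from a file.
csv_text = "sample,approved\nA1,yes\nB2,no\n"

# Apply the value converter while reading, then verify the obligatory columns.
df = pd.read_csv(io.StringIO(csv_text), converters={"approved": yes_no_to_bool})
check_obligatory_columns(df, ["sample", "approved"])
```

In this sketch the conversion happens while the file is read and the verification happens afterwards on the resulting DataFrame, which mirrors the split between file-specific reading and general verification described above.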

Empty fields in integer columns

Reading table files that have integer-valued columns with missing data can result in datatype contradictions (see the Pandas documentation on nullable integers), since the default value for missing fields, numpy.nan, is a float. For this reason, as of version 0.11 the TableImporter uses pandas.Int64Dtype as the default datatype for all integer columns. This datatype allows for empty fields while keeping all actual data integer-valued. The behavior can be changed by initializing the TableImporter with convert_int_to_nullable_int=False, in which case a DataInconsistencyError is raised when an empty field is encountered in a column with a non-nullable integer datatype.
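The underlying pandas behavior can be reproduced without the TableImporter. The sketch below uses only pandas; the column names and sample data are made up for illustration. It shows how an empty field turns an integer column into a float column by default, and how pandas.Int64Dtype avoids this.

```python
import io

import pandas as pd

# Hypothetical table with one empty field in the integer column "count".
csv_text = "sample,count\nA1,3\nB2,\nC3,7\n"

# Default inference: the empty field becomes numpy.nan, which is a float,
# so the whole column is read as float64.
df_default = pd.read_csv(io.StringIO(csv_text))
print(df_default["count"].dtype)

# With the nullable integer datatype, the empty field becomes pandas.NA
# and the remaining values stay integers.
df_nullable = pd.read_csv(
    io.StringIO(csv_text), dtype={"count": pd.Int64Dtype()}
)
print(df_nullable["count"].dtype)
print(df_nullable["count"].isna().tolist())
```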

The loadfiles module and executable

To make files available to the LinkAhead server as File entities (see also the server’s file server documentation), the LinkAhead Advanced User Tools provide the loadFiles module and a linkahead-loadfiles executable. Both operate on a path as seen by the LinkAhead server (i.e., a path within the Docker container in the typical LinkAhead Control setup) and can be further configured to include or exclude specific files. In the typical setup, where a directory is mounted as an extroot into the Docker container by LinkAhead Control, running

linkahead-loadfiles /opt/caosdb/mnt/extroot

makes all files available. Execute

linkahead-loadfiles --help

for more information and examples.