Other utilities in LinkAhead Advanced User Tools
The table file importer
The LinkAhead Advanced User Tools provide a generic TableImporter class which reads different table file formats (at the time of writing, .xls(x), .csv, and .tsv) and converts them into pandas.DataFrame objects. It provides helper functions for converting column values (e.g., converting the string values “yes” or “no” to True or False), for checking that obligatory columns are present in a table and have no missing values, and for datatype checks.
The base class TableImporter provides the general verification methods, while each subclass, such as XLSXImporter or CSVImporter, implements its own read_file function that converts a given table file into a pandas.DataFrame.
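The division of labor described above (shared verification in a base class, format-specific reading in subclasses) can be sketched with the standard library alone. This is an illustration of the pattern and not the actual TableImporter API: the names SketchTableImporter, SketchCSVImporter, check_and_convert, and yes_no_to_bool are hypothetical, and the sketch returns a list of dictionaries instead of a pandas.DataFrame to stay dependency-free.

```python
import csv
import io


def yes_no_to_bool(value):
    """Hypothetical converter: map "yes"/"no" (any case) to True/False."""
    mapping = {"yes": True, "no": False}
    key = value.strip().lower()
    if key not in mapping:
        raise ValueError(f"expected 'yes' or 'no', got {value!r}")
    return mapping[key]


class SketchTableImporter:
    """Hypothetical base class holding the format-independent checks."""

    def __init__(self, converters, obligatory_columns):
        self.converters = converters  # column name -> conversion function
        self.obligatory_columns = obligatory_columns

    def read_file(self, source):
        raise NotImplementedError("implemented by format-specific subclasses")

    def check_and_convert(self, rows):
        for index, row in enumerate(rows):
            # Obligatory columns must exist and must not be empty.
            for column in self.obligatory_columns:
                if column not in row:
                    raise KeyError(f"missing obligatory column {column!r}")
                if row[column] in ("", None):
                    raise ValueError(f"row {index}: no value in column {column!r}")
            # Apply the converters to non-empty fields.
            for column, convert in self.converters.items():
                if row.get(column):
                    row[column] = convert(row[column])
        return rows


class SketchCSVImporter(SketchTableImporter):
    """Hypothetical subclass that only knows how to parse CSV input."""

    def read_file(self, source):
        rows = list(csv.DictReader(source))
        return self.check_and_convert(rows)


table = io.StringIO("name,approved\nsample_a,yes\nsample_b,no\n")
importer = SketchCSVImporter(
    converters={"approved": yes_no_to_bool},
    obligatory_columns=["name"],
)
rows = importer.read_file(table)
print(rows)
```

Keeping the checks in the base class means that a further format only needs its own read_file; the verification behaves identically for every input type.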
Empty fields in integer columns
Reading table files that have integer-valued columns with missing data can result in datatype contradictions (see the Pandas documentation on nullable integers), since the default value for missing fields, numpy.nan, is a float. For this reason, from version 0.11 onward, the TableImporter uses pandas.Int64Dtype as the default datatype for all integer columns, which allows for empty fields while keeping all actual data integer-valued. This behavior can be changed by initializing the TableImporter with convert_int_to_nullable_int=False, in which case a DataInconsistencyError is raised when an empty field is encountered in a column with a non-nullable integer datatype.
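The underlying pandas behavior can be reproduced without the TableImporter. The following minimal sketch uses only pandas itself and assumes a column of two integers and one empty field; it shows how default type inference turns the column into floats, while pandas.Int64Dtype keeps the values integer-valued.

```python
import pandas as pd

# An integer column in which the third field is empty.
raw = [1, 2, None]

# Default inference: the empty field becomes numpy.nan, which is a float,
# so the whole column is stored as float64.
as_float = pd.Series(raw)
print(as_float.dtype)

# Nullable integer dtype: the empty field becomes pandas.NA and the
# remaining values stay integers.
as_nullable = pd.Series(raw, dtype=pd.Int64Dtype())
print(as_nullable.dtype)
print(as_nullable.isna().tolist())
```

With the nullable dtype, missing entries remain detectable via isna() and the actual data are never silently converted to floats.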
The loadfiles module and executable
To make files available to the LinkAhead server as File entities (see also the server’s file server documentation), the LinkAhead Advanced User Tools provide the loadFiles module and a linkahead-loadfiles executable. Both operate on a path as seen by the LinkAhead server (i.e., a path within the Docker container in the typical LinkAhead Control setup) and can be further configured to include or exclude specific files. In the typical setup, where a directory is mounted as an extroot into the Docker container by LinkAhead Control, running
linkahead-loadfiles /opt/caosdb/mnt/extroot
makes all files available. Execute
linkahead-loadfiles --help
for more information and examples.