Other utilities in LinkAhead Advanced User Tools
The table file importer
The LinkAhead Advanced User Tools provide a generic TableImporter class which reads different table file formats (at the time of writing of this documentation, .xls(x), .csv, and .tsv) and converts them into pandas.DataFrame objects. It provides helper functions for converting column values (e.g., converting the string values “yes” or “no” to True or False), checking the presence of obligatory columns in a table and whether those have missing values, and datatype checks.
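The kinds of checks described above can be illustrated with plain pandas. The following is a minimal sketch, not the TableImporter API: the helper names yes_no_to_bool, missing_obligatory_columns, and rows_with_missing_values, as well as the table content, are hypothetical and chosen only for illustration.

```python
import io

import pandas as pd

# Hypothetical table content; in practice this would come from a table file.
CSV_TEXT = """name,approved,count
sample_a,yes,3
sample_b,no,5
sample_c,yes,
"""


def yes_no_to_bool(value):
    """Convert the strings "yes"/"no" to True/False (case-insensitive)."""
    lowered = str(value).strip().lower()
    if lowered == "yes":
        return True
    if lowered == "no":
        return False
    raise ValueError(f"Expected 'yes' or 'no', got {value!r}")


def missing_obligatory_columns(df, obligatory):
    """Return the obligatory column names that are absent from the table."""
    return [col for col in obligatory if col not in df.columns]


def rows_with_missing_values(df, obligatory):
    """Return index labels of rows with an empty obligatory field."""
    present = [col for col in obligatory if col in df.columns]
    mask = df[present].isna().any(axis=1)
    return df.index[mask.to_numpy()].tolist()


# The converter is applied to every value of the "approved" column.
df = pd.read_csv(io.StringIO(CSV_TEXT), converters={"approved": yes_no_to_bool})

print(df["approved"].tolist())
print(missing_obligatory_columns(df, ["name", "approved", "date"]))
print(rows_with_missing_values(df, ["name", "count"]))
```

Here the column "date" is reported as absent, and the row with the empty "count" field is reported as incomplete. How the TableImporter reports such problems is not shown in this sketch.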
The base class TableImporter provides the general verification methods, while each subclass like XLSXImporter or CSVImporter implements its own read_file function that is used to convert a given table file into a pandas.DataFrame.
Empty fields in integer columns
Reading table files that have integer-valued columns with missing data can result in datatype contradictions (see the Pandas documentation on nullable integers), since the default value for missing fields, numpy.nan, is a float. For this reason, from version 0.11 onward, the TableImporter uses pandas.Int64Dtype as the default datatype for all integer columns, which allows for empty fields while keeping all actual data integer-valued. This behavior can be changed by initializing the TableImporter with convert_int_to_nullable_int=False, in which case a DataInconsistencyError is raised when an empty field is encountered in a column with a non-nullable integer datatype.
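The underlying pandas behavior can be reproduced without the TableImporter. The sketch below uses only pandas and a made-up two-column table. It shows how an empty field turns an integer column into a float column, how pandas.Int64Dtype avoids this, and how a cast to a non-nullable integer type fails. The ValueError in the last step is pandas' own error and only serves as an analogy to the DataInconsistencyError mentioned above.

```python
import io

import pandas as pd

# Made-up table with one empty field in the integer column "count".
CSV_TEXT = """sample_id,count
1,10
2,
3,30
"""

# Default inference: the empty field becomes numpy.nan, so the whole
# column is promoted to a float dtype.
df_default = pd.read_csv(io.StringIO(CSV_TEXT))
print(df_default["count"].dtype)

# Requesting the nullable integer dtype keeps the data integer-valued
# and represents the empty field as pandas.NA.
df_nullable = pd.read_csv(
    io.StringIO(CSV_TEXT), dtype={"count": pd.Int64Dtype()}
)
print(df_nullable["count"].dtype)
print(df_nullable["count"].tolist())

# Casting to a non-nullable integer dtype fails because of the empty field.
try:
    df_default["count"].astype("int64")
    strict_cast_failed = False
except ValueError as err:
    strict_cast_failed = True
    print(f"Cast to int64 failed: {err}")
```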