Other utilities in LinkAhead Advanced User Tools

The table file importer

The LinkAhead Advanced User Tools provide a generic TableImporter class that reads several table file formats (at the time of writing: .xls(x), .csv, and .tsv) and converts them into pandas.DataFrame objects. It provides helper functions for converting column values (e.g., converting the strings “yes” and “no” to True and False), for checking that obligatory columns are present and contain no missing values, and for checking datatypes.
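To make the kind of helper described here concrete, the following is a minimal sketch in plain pandas. It does not use the TableImporter itself; the names yes_no_to_bool and missing_obligatory_columns, as well as the sample data, are hypothetical and chosen for illustration only.

```python
import io

import pandas as pd


def yes_no_to_bool(value):
    """Hypothetical converter: map "yes"/"no" (any case) to True/False."""
    mapping = {"yes": True, "no": False}
    key = str(value).strip().lower()
    if key not in mapping:
        raise ValueError(f"Expected 'yes' or 'no', got {value!r}")
    return mapping[key]


def missing_obligatory_columns(df, obligatory):
    """Hypothetical check: return the obligatory columns absent from df."""
    return [col for col in obligatory if col not in df.columns]


# Illustrative table; the column names are assumptions.
csv_text = "name,approved\nalice,yes\nbob,No\n"

# pandas applies the converter to every raw string in the "approved" column.
df = pd.read_csv(io.StringIO(csv_text), converters={"approved": yes_no_to_bool})

approved_values = df["approved"].tolist()
missing = missing_obligatory_columns(df, ["name", "approved", "date"])

print(approved_values)
print(missing)
```

In this sketch the converter runs while the file is parsed, whereas the column check runs on the finished DataFrame. How the TableImporter orders these steps internally is not specified here.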

The base class TableImporter provides the general verification methods, while each subclass, such as XLSXImporter or CSVImporter, implements its own read_file function that converts a given table file into a pandas.DataFrame.
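The division of labor between base class and subclasses can be sketched as follows. This is an illustration of the pattern only, not the library's implementation: the classes SketchImporter, SketchCSVImporter, and SketchTSVImporter, the method check_columns, and the use of ValueError are all assumptions made for this example.

```python
import io
from abc import ABC, abstractmethod

import pandas as pd


class SketchImporter(ABC):
    """Hypothetical base class: holds the checks shared by all formats."""

    def __init__(self, obligatory_columns=None):
        self.obligatory_columns = list(obligatory_columns or [])

    @abstractmethod
    def read_file(self, source):
        """Read a table file and return it as a pandas.DataFrame."""

    def check_columns(self, df):
        # Shared verification: every obligatory column must be present.
        missing = [c for c in self.obligatory_columns if c not in df.columns]
        if missing:
            raise ValueError(f"Missing obligatory columns: {missing}")
        return df


class SketchCSVImporter(SketchImporter):
    """Hypothetical subclass: only knows how to parse comma-separated files."""

    def read_file(self, source):
        return self.check_columns(pd.read_csv(source, sep=","))


class SketchTSVImporter(SketchImporter):
    """Hypothetical subclass: only knows how to parse tab-separated files."""

    def read_file(self, source):
        return self.check_columns(pd.read_csv(source, sep="\t"))


# Both importers run the same check on differently formatted input.
csv_df = SketchCSVImporter(["a"]).read_file(io.StringIO("a,b\n1,2\n"))
tsv_df = SketchTSVImporter(["a"]).read_file(io.StringIO("a\tb\n1\t2\n"))
```

Keeping the verification in the base class means a new file format only needs a new read_file implementation.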

Empty fields in integer columns

Reading table files whose integer-valued columns contain missing data can lead to datatype conflicts (see the Pandas documentation on nullable integers), because the default value for missing fields, numpy.nan, is a float. For this reason, since version 0.11 the TableImporter uses pandas.Int64Dtype as the default datatype for all integer columns, which allows empty fields while keeping all actual data integer-valued. This behavior can be changed by initializing the TableImporter with convert_int_to_nullable_int=False, in which case a DataInconsistencyError is raised when an empty field is encountered in a column with a non-nullable integer datatype.
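The underlying pandas behavior can be reproduced without the TableImporter. The following sketch, using made-up sample data and column names, reads the same table once with pandas' default type inference and once with pandas.Int64Dtype requested explicitly for the integer column.

```python
import io

import pandas as pd

# Illustrative table: the second row has an empty "count" field.
csv_text = "sample_id,count\n1,10\n2,\n3,30\n"

# Default inference: the empty field becomes numpy.nan, which forces the
# whole column to float64.
df_default = pd.read_csv(io.StringIO(csv_text))

# Nullable integer dtype: the empty field becomes pandas.NA and the
# remaining values stay integers.
df_nullable = pd.read_csv(
    io.StringIO(csv_text), dtype={"count": pd.Int64Dtype()}
)

print(df_default["count"].dtype)
print(df_nullable["count"].dtype)
print(df_nullable["count"].isna().tolist())
```

With the default inference, the value 10 is stored as the float 10.0. With the nullable dtype it remains the integer 10 and only the empty field is marked as missing.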