This document describes the SQL tables used by CaosDB and how they represent the internal data structure. # Tables # ## entities ## All entities (RecordTypes, Records, Properties, ...) have an entry here. The columns are: - `id` :: The (unique) ID of this entity. - `name` :: Name of the entity. - `description` :: A description of the entity. - `role` :: The role of the entity (e.g. data type, RecordType, Property, Record, ...). - `acl` :: Access control ID. ## isa ## Inheritance as in "A is a B" is stored in this table. The columns: - `child` :: The child entity. - `parent` :: The parent entity. - `type` :: The inheritance type (either `INHERITANCE` or `SUBTYPING` (what's "subtyping"?)). **TODO** Why is this table empty on the default example? ## isa_cache ## Here, information about nested inheritance is cached (upon creation?). - `child` :: The child entity. - `parent` :: The parent entity. - `rpath` :: Possible intermediate steps on a path from `child` to `parent`. See the *Inheritance* section for more details. ## foo_data ## Property data is stored here, in tables named after the data type: - date_data - datetime_data - double_data - integer_data - text_data - enum_data - name_data - null_data - reference_data These tables share (generally) the same set of columns: - `domain_id` :: The domain of an item, see the *multipurpose subdomains* section. For "normal" data, this is 0. - `entity_id` :: The entity to which this property belongs. In most cases, this is a "normal" entity like a `Record` or `RecordType`. - `property_id` :: In most cases, the property to which the value belongs. - `value` :: The value of the property. - `status` :: The importance of the property (?). - `pidx` :: The property index, becomes nonzero when a property occurs more than once in an entity. ### `name_data` ### This table is used to name RecordTypoes, Records, Properties etc. The column `property_id` has the value 20, because that is the entity ID for names: ```sql > SELECT * FROM entities WHERE id=20; +----+-------------------+----------+------+ | id | description | role | acl | +----+-------------------+----------+------+ | 20 | Name of an entity | PROPERTY | 0 | +----+-------------------+----------+------+ ``` ## data_type ## The type of properties is stored here. The columns are: - `domain_id` :: Property identitification, same as for plain data? - `entity_id` :: Property identitification, same as for plain data? - `property_id` :: Property identitification, same as for plain data? - `datatype` :: The data type of the property, a reference to an entity. ## files ## - `file_id` :: ID. - `path` :: (Relative) path to the file. - `size` :: Size in bytes. - `hash` :: Hash of the file contents, as binary. **TODO** Which algorithm? - `checked_timestamp` :: Timestamp when last checked? ## `feature_config` ## +-------------------+---------+ | _key | _value | +-------------------+---------+ | ENTITY_VERSIONING | ENABLED | +-------------------+---------+ A key-value store for features. Use the `is_feature_config` procedure to check for the status of specific features. ## Versioning tables ## There are a few extra tables for implementing versioning. Also there is a special column `value_iversion` in the `reference_data` table which stores the version of the referenced entity at the time of the version. If `value_iversion` is NULL, no specific version is stored. The `_iversion` is an incremental version counter, starting implicitly at 1. Previous versions of entities are stored, along with their `_iversion`, in the `archive_*` tables. ### `transactions` ### Replacement for `transaction_log`, holds ID, user and time of transaction: - `srid` :: Server request ID, used to identify transactions - `username` :: User name - `realm` :: Realm for which the user name is valid - `seconds` :: Time of transaction: seconds - `nanos` :: Time of transaction: sub-second time resolution ### `entity_version` ### Versioning info for entities: - `entity_id` :: persistent ID of the entity - `hash` :: Hash, for future use. - `version` :: External version string, may be globally unique. Should be used by API calls. - `_iversion` :: Version ID for this entity for internal use, typically an incremental counter, starting with 1 - `_ipparent` :: Primary parent ID (internal) for this version, i.e. predecessor - `srid` :: Server request / transaction ID which created this version ### `archive_footable` ### Older (i.e. not current) data, from previous versions of an entity which were stored in `footable`. - `domain_id` :: Same as in `footable`. - `entity_id` :: Same as in `footable`. - `property_id` :: Same as in `footable`. - `value` :: The value at the given version. - `status` :: The value at the given version. - `pidx` :: Same as in `footable`. - `_iversion` :: Version index of this version. - ... :: Data type specific columns may also be there. ## TODO ## - entity_acl - groups - logging - desc_overrides - name_overrides - passwd - passwords - permissions - query_template_def :: User-defined query templates? - roles :: possible user roles - stats :: benchmarking? - transaction_log :: benchmarking? - units_lin_con :: fraction and unit conversion? - user_info :: Metadata for (local?) user accounts - user_roles :: User permissions? # Inheritance caching # The `isa_cache` table does not only store direct inheritances, but also nested parentships. This is done by adding intermediate inheritance steps in the `rpath` column: - For *direct inheritance*, the `rpath` column contains the child itself. - For all other *possible paths* between the child and parent, it contains the entities between, separated by the `>` character. ## Example ## Consider this inheritance structure and the resulting table: ``` 0 Read this from top to bottom: "1 is a 0", / \ "2 is a 0", and so on. 1 2 \ / 3 | 4 | 5 ``` |-------|--------|-------| | child | parent | rpath | |-------|--------|-------| | 1 | 0 | 1 | | 2 | 0 | 2 | | 3 | 0 | 1 | | 3 | 0 | 2 | | 3 | 1 | 3 | | 3 | 2 | 3 | | 4 | 0 | 3>1 | | 4 | 0 | 3>2 | | 4 | 1 | 3 | | 4 | 2 | 3 | | 4 | 3 | 4 | | 5 | 0 | 4>3>1 | | 5 | 0 | 4>3>2 | | 5 | 1 | 4>3 | | 5 | 2 | 4>3 | | 5 | 3 | 4 | | 5 | 4 | 5 | |-------|--------|-------| # Multipurpose subdomains # Multipurpose subdomains are generically used to work with composite properties. Currently implemented examples are: - Properties with units (if multiple Properties exist) - Lists - Name overrides ## Example ## Let's have a look at this *Record* (simplified XML): ```xml Hello World 23 V1, V2, V3, V4, ... ```