# CaosDB Query Language Examples See syntax specification in [CaosDB Query Language Syntax](query-syntax). In this chapter, the CaosDB Query Language (CQL) is presented as a means of formulating search commands, commonly referred to as queries. It is highly recommended that you experiment with the examples provided, such as those found on https://demo.indiscale.com. An interactive tour is also available on this public instance, which includes a comprehensive overview of the query language. Therefore, it is suggested that you begin there and subsequently proceed with this more detailed explanation. ## Introduction Queries typically start with the keyword `FIND`, followed by a description of what you want to find. For example, you can search for all musical instruments with `FIND MusicalInstrument`. *Note*, the CQL is case **in**sensitive. We will write keywords of CQL in all caps to illustrate what parts are part of the language. The most common way is to provide a RecordType name after `FIND` (as in the example above). However, you can also use the name of some other entity: `FIND 'My first guitar'`. *Note*, that we put the name here in quotes. Spaces are used in CQL as separator of words. Thus, if something contains quotes, like the name here, it needs to be quoted. While queries like the last one are great to get an impression of the data, often we need to be more specific. Therefore, queries can include various conditions to restrict the result set. Example: `FIND MusicalAnalysis WITH quality_factor>0.5 AND date IN 2019`. The keyword `WITH` signifies that for each Record of the type `MusicalAnalysis`, an assessment is made to determine whether it possesses a Property labelled `quality_factor` that exceeds 0.5, as well as another Property labelled `date` that may correspond to any day within the year 2019. In order to make CQL easier to learn and to remember we designed it to be close to natural spoken English language. For example, you can write `FIND Guitar WHICH HAS A PROPERTY price`. Here, "HAS A PROPERTY" is what we call syntactic sugar. It lets the query role off the tongue more easily than `FIND Guitar WHICH price` but it is actually not needed and does not change the meaning of the query. In fact, you could also write `FIND Guitar WITH price`. If you are only interested in the number of Entities that match your query, you can replace `FIND` with `COUNT` and the query will only return the number of Entities in the result set. Sometimes the list of Records that you get using a `FIND` query is not what you need; especially if you want to export a subset of the data for the analysis with some external tool. `SELECT` queries offer to represent the query result in a tabular form. If you replace the `FIND` keyword of a query with `SELECT x, y, z FROM`, then CaosDB will return the result as tabular data. For example, instead of `FIND Guitar`, try out `SELECT name, electric FROM Guitar` As you can see, those queries are design to allow very specific data requests. If you do not want/need to be as specific you can omit the first keyword (`FIND` or `SELECT`) which creates a search for anything that has a text Property with something like your expression. For example, the query "John" will search for any Records that has a text property that contains this string. With this, we conclude our introduction of CQL. You now know about the basic elements. The following will cover the various aspects in more detail and you will for example learn how you can use references among Records, or meta data like the creation time of a Record to restrict the query result set. ## What am I searching for? We already learned, that we can provide the name of a RecordType after the `FIND` keyword. Let's call this part of the query "entity expression". In general, we need to identify with the entity expression one or more entities via their name, CaosDB ID or a pattern. - `FIND Guitar` - `FIND Guit*` ('*' represents none, one or more characters) - `FIND <<[gG]ui.*>>` (a regular expression surrounded by _<<_ and '>>'. see below) - `FIND 110` The result set will contain Entities that are either identified by the entity expression directly (i.e. they have the name or the given ID) or the have such an Entity as parent. As you know, CaosDB distincts among different Entity roles: - Entity - Record - RecordType - Property - File You can provide the role directly after the `FIND` keyword and before the entity expression: `FIND RECORD Guitar`. The result set will then restricted to Entities with that role. ## Conditions / Filters ### POV - Property-Operator-Value The following queries are equivalent and will restrict the result set to records which have a property named _pname1_ that has a value _val1_. `FIND ename.pname1=val1` `FIND ename WITH pname1=val1` `FIND ename WHICH HAS A PROPERTY pname1=val1` `FIND ename WHICH HAS A pname1=val1` Again, the resultset can be restricted to any other entity role as well: `FIND RECORDTYPE ename WHICH HAS A pname1=val1` _currently known operators:_ `=, !=, <=, <, >=, >` (and cf. next paragraphes!) #### Special Operator: LIKE The _LIKE_ can be used with wildcards. The `*` is a wildcard for any (possibly empty) sequence of characters. Examples: `FIND ename WHICH HAS A pname1 LIKE va*` `FIND ename WHICH HAS A pname1 LIKE va*1` `FIND ename WHICH HAS A pname1 LIKE *al1` _Note:_ The _LIKE_ operator will only produce expectable results with text properties. #### Special Case: References In general a reference can be addressed just like a POV filter. So `FIND ename1 WITH pname1=ename2` will also return any record named _ename1_ which references the entity with name or id _ename2_ via a reference property named _pname1_. However, it will also return any entity with a text property of that name with the string value _ename2_. In order to restrict the result set to reference properties one may make use of special reference operators: _reference operators:_ `->, REFERENCES, REFERENCE TO` The query looks like this: `FIND ename1 WHICH HAS A pname1 REFERENCE TO ename2` `FIND ename1 WHICH HAS A pname1->ename2` #### Special Case: DateTime _DateTime operators:_ `=, !=, <, >, IN, NOT IN` ##### `d1=d2`: Equivalence relation. * ''True'' iff d1 and d2 are equal in every respect (same DateTime flavor, same fields are defined/undefined and all defined fields are equal respectively). * ''False'' iff they have the same DateTime flavor but have different fields defined or fields with differing values. * ''Undefined'' otherwise. Examples: * `2015-04-03=2015-04-03T00:00:00` is undefined. * `2015-04-03T00:00:00=2015-04-03T00:00:00.0` is undefined (second precision vs. nanosecond precision). * `2015-04-03T00:00:00.0=2015-04-03T00:00:00.0` is true. * `2015-04-03T00:00:00=2015-04-03T00:00:00` is true. * `2015-04=2015-05` is false. * `2015-04=2015-04` is true. ##### `d1!=d2`: Intransitive, symmetric relation. * ''True'' iff `d1=d2` is false. * ''False'' iff `d1=d2` is true. * ''Undefined'' otherwise. Examples: * `2015-04-03!=2015-04-03T00:00:00` is undefined. * `2015-04-03T00:00:00!=2015-04-03T00:00:00.0` is undefined. * `2015-04-03T00:00:00.0!=2015-04-03T00:00:00.0` is false. * `2015-04-03T00:00:00!=2015-04-03T00:00:00` is false. * `2015-04!=2015-05` is true. * `2015-04!=2015-04` is false. ##### `d1>d2`: Transitive, non-symmetric relation. Semantics depend on the flavors of d1 and d2. If both are... ###### [UTCDateTime](specification/Datatype.html#datetime) * ''True'' iff the time of d1 is after the the time of d2 according to [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time). * ''False'' otherwise. ###### [SemiCompleteDateTime](specification/Datatype.html#datetime) * ''True'' iff `d1.ILB>d2.EUB` is true or `d1.ILB=d2.EUB` is true. * ''False'' iff `d1.EUB2014` is true. * `2015-04>2014` is true. * `2015-01-01T20:15.00>2015-01-01T20:14` is true. * `2015-04>2015` is undefined. * `2015>2015-04` is undefined. * `2015-01-01T20:15>2015-01-01T20:15:15` is undefined. * `2014>2015` is false. * `2014-04>2015` is false. * `2014-01-01>2015-01-01T20:15:30` is false. ##### `d1d2.EUB}} is true or {{{d1.ILB=d2.EUB` is true. * ''Undefined'' otherwise. Examples: * `2014<2015` is true. * `2014-04<2015` is true. * `2014-01-01<2015-01-01T20:15:30` is true. * `2015-04<2015` is undefined. * `2015<2015-04` is undefined. * `2015-01-01T20:15<2015-01-01T20:15:15` is undefined. * `2015<2014` is false. * `2015-04<2014` is false. * `2015-01-01T20:15.00<2015-01-01T20:14` is false. ##### `d1 IN d2`: Transitive, non-symmetric relation. Semantics depend on the flavors of d1 and d2. If both are... ###### [SemiCompleteDateTime](specification/Datatype.html#datetime) * ''True'' iff (`d1.ILB>d2.ILB` is true or `d1.ILB=d2.ILB` is true) and (`d1.EUB val1` `FIND ename . > val1` `FIND ename.>val1` And for references... `FIND ename1 WHICH HAS A REFERENCE TO ename2` `FIND ename1 WHICH REFERENCES ename2` `FIND ename1 . -> ename2` `FIND ename1.->ename2` The following query returns records which have a _pname1_ property with any value. `FIND ename WHICH HAS A PROPERTY pname1` `FIND ename WHICH HAS A pname1` `FIND ename WITH pname1` `FIND ename WITH A pname1` `FIND ename WITH A PROPERTY pname1` `FIND ename WITH PROPERTY pname1` `FIND ename . pname1` `FIND ename.pname1` ### TransactionFilter `FIND ename WHICH (sugar|negated_sugar)? (NOT)? (CREATED|INSERTED|UPDATED) (by_clause time_clause?| time_clause by_clause?)` *Definition* sugar:: `HAS BEEN` | `HAVE BEEN` | `HAD BEEN` | `WAS` | `IS` negated_sugar:: `HAS NOT BEEN` | `HASN'T BEEN` | `WAS NOT` | `WASN'T` | `IS NOT` | `ISN'T` | `HAVN'T BEEN` | `HAVE NOT BEEN` | `HADN'T BEEN` | `HAD NOT BEEN` by_clause:: `BY (ME | username | SOMEONE ELSE (BUT ME)? | SOMEONE ELSE BUT username)` datetime:: A datetime string of the form `YYYY[-MM[-DD(T| )[hh[:mm[:ss[.nnn][(+|-)zzzz]]]]]]` or `TODAY`. time_clause:: `[AT|ON|IN|BEFORE|AFTER|UNTIL|SINCE] (datetime) ` *Examples* `FIND ename WHICH HAS BEEN CREATED BY ME ON 2014-12-24` `FIND ename WHICH HAS BEEN CREATED BY SOMEONE ELSE ON 2014-12-24` `FIND ename WHICH HAS BEEN CREATED BY erwin ON 2014-12-24` `FIND ename WHICH HAS BEEN UPDATED BY SOMEONE ELSE BUT erwin ON 2014-12-24` `FIND ename WHICH HAS BEEN INSERTED BY erwin` `FIND ename WHICH HAS BEEN INSERTED SINCE 2021-04` Note that `SINCE` and `UNTIL` are inclusive, while `BEFORE` and `AFTER` are not. ### File Location Search for file objects by their location: `FIND FILE WHICH IS STORED AT a/certain/path/` #### Wildcards _STORED AT_ can be used with wildcards similar to unix wildcards. * `*` matches any characters or none at all, but not the directory separator `/` * `**` matches any character or none at all. * A leading `*` is a shortcut for `/**` * Asterisks directly between two other asterisks are ignored: `***` is the same as `**`. * Escape character: `\` (E.g. `\\` is a literal backslash. `\*` is a literal star. But `\\*` is a literal backslash followed by a wildcard.) Examples: Find any files ending with `.acq`: `FIND FILE WHICH IS STORED AT *.acq` or `FIND FILE WHICH IS STORED AT **.acq` or `FIND FILE WHICH IS STORED AT /**.acq` Find files stored one directory below `/data/`, ending with `.acq`: `FIND FILE WHICH IS STORED AT /data/*/*.acq` Find files stored in `/data/`, ending with `.acq`: `FIND FILE WHICH IS STORED AT /data/*.acq` Find files stored in a directory at any depth in the tree below `/data/`, ending with `.acq`: `FIND FILE WHICH IS STORED AT /data/**.acq` Find any file in a directory which begins with `2016-02`: `FIND FILE WHICH IS STORED AT */2016-02*/*` ### Back References The back reference filters for entities that are referenced by another entity. The following query returns records of the type _ename1_ which are referenced by _ename2_ entities via the reference property _pname1_. * `FIND ename1 WHICH IS REFERENCED BY ename2 AS A pname1` * `FIND ename1 WITH @ ename2 / pname1` * `FIND ename1 . @ ename2 / pname1` One may omit the property specification: * `FIND ename1 WHICH IS REFERENCED BY ename2` * `FIND ename1 WHICH HAS A PROPERTY @ ename2` * `FIND ename1 WITH @ ename2` * `FIND ename1 . @ ename2` ### Nested queries, or filtering by sub-properties ### Nested queries can easily be searched by simply concatenating `WHICH` or `WITH` expressions: * `FIND ename WHICH HAS A pname WHICH HAS A subpname=val` * For example: `FIND AN experiment WHICH HAS A camera WHICH HAS A 'serial number'= 1234567890` ### Combining Filters with Propositional Logic Any result set can be filtered by logically combining POV filters or back reference filters: #### Conjunction (AND) As we saw above, we can combine conditions: `FIND MusicalAnalysis WHICH HAS quality_factor>0.5 AND date IN 2019` In general, the conjunction takes the form `FIND WHICH AND `. You can also use `&` instead of `AND` or chain more than two conditions. If you mix conjunctions with disjunctions, you need to add brackets to define the priority. For example: `FIND WHICH ( AND ) OR `. `FIND Guitar WHICH REFERENCES Manufacturer AND price` is a combination of a reference filter and a POV filter. For readability, you can also write `FIND Guitar WHICH REFERENCES Manufacturer AND WHICH HAS A price`. However, the additional "WHICH HAS A" is purely cosmetic (syntactic sugar). #### Disjunction (OR) The rules for disjunctions (`OR` or `|`) are the same as for conjunctions, see above. * `FIND ename1 WHICH HAS A PROPERTY pname1=val1 OR A PROPERTY pname2=val2 Or A PROPERTY...` * `FIND ename1 WHICH HAS A PROPERTY pname1=val1 OR A pname2=val2 OR ...` * `FIND ename1 . pname1=val1 | pname2=val2 | ...` #### Negation (NOT) You can negate any filter by prefixing the filter with `NOT` or `!`: `FIND WHICH NOT `. There are many syntactic sugar alternatives which are treated the same as "NOT": - `DOES NOT HAVE` - `ISN'T` - and many more * `FIND ename1 WHICH DOES NOT HAVE A PROPERTY pname1=val1` * `FIND ename1 WHICH DOESN'T HAVE A pname1=val1` * `FIND ename1 . NOT pname2=val2` * `FIND ename1 . !pname2=val2` #### Parentheses Basically, you can put parantheses around filter expressions and con- or disjunctions. - `FIND Guitar WHICH (REFERENCES Manufacturer AND WHICH HAS A price)`. - `FIND Guitar WHICH (REFERENCES Manufacturer) AND (WHICH HAS A price)`. For better readability, the above query can be written as: - `FIND Guitar WHICH (REFERENCES Manufacturer AND HAS A price)`. Note, that without syntactic sugar this query looks like: - `FIND Guitar WHICH (REFERENCES Manufacturer AND price)`. * `FIND ename1 WHICH HAS A pname1=val1 AND DOESN'T HAVE A pname2 FROM`: `SELECT p1, p2, p3 FROM ename` However, the `SELECT` query can also return properties of referenced entities and thereby are a means of joining entities together and return a custom view or projection: `SELECT Conductor.Last Name FROM Experiment` would return the conductor's last name, when `Conductor` is a reference property of `Experiment` and `Last Name` is a property of the `Conductor` records. ### Selectors Selectors are strings of entity names which are separated by `.` (dot). E.g. `Conductor.Last Name` or `Conductor.Address` or even `Experiment.Conductor.Last name`. Selectors in a `SELECT` queries are separated by `,` (comma). E.g. `Conductor.First Name, Conductor.Last Name`. ### Evaluation of Selectors The query will return all those properties which have the same name as specified by the selector (case-insensitive). However, `SELECT` queries are also capable of subtyping in the selectors: `SELECT Person FROM Experiment` would return all `Person` properties but all `Conductors` as well, if `Conductor` is a child of `Person`. Note: When a property `responsible` with data type `Person` exists, the above `SELECT` statement would not include records that use this property (since `responsible` is not a child of Person). ## Versioning Since Caosdb 0.2 entities are optionally version controlled. The query language will be extended to include versioning in the future. A current minimal implementation introduces the `ANY VERSION OF` modifier which can be used to return all matching versions in the results of `COUNT`, `FIND`, and `SELECT` queries. ### Example * `FIND ANY VERSION OF RECORD WITH pname=value` returns the all past and present versions of records with `pname = value`. ### Scope and current limitations * The `ANY VERSION OF` modifier currently the only expression for taking the versioning into account when using the query language. * Subproperties are not supported yet, e.g. `FIND ANY VERSION OF ENTITY WHICH IS REFERENCED BY ename WITH ...`. This applies to all cases where you specify properties of *referenced* entities or *referencing* entities. ## Configuration In CaosDB Server implementations before version 0.9.0, the `FIND ename` query would return any entity with that name and all children, regardless of the entity's role. Basically, `FIND ename` *was* equivalent to `FIND ENTITY ename`. Since 0.9.0 the default behavior has changed and now `FIND ename` is equivalent to `FIND RECORD ename`. This default is, however, configurable via the `FIND_QUERY_DEFAULT_ROLE` server property. See [Server Configuration](./administration/configuration). ## Future * Additional versioning queries: * Add `(LATEST|LAST|OLDEST|NEWEST|FIRST) VERSION OF` modifiers. * Add `(ANY|LATEST|LAST|OLDEST|NEWEST|FIRST) VERSION (BEFORE|AFTER) (||) OF` modifier. * Add support for subproperties, e.g. `FIND ANY VERSION OF ENTITY WHICH IS REFERENCED BY ename WITH ...`. * Find deleted entities: `FIND ename WHICH WAS DELETED (BY ME | ON 2014-12-24)` * *More Logic*, especially `ANY`, `ALL`, `NONE`, and `SUCH THAT` key words (and equivalents) for logical quantisation: `FIND ename1 SUCH THAT ALL ename2 WHICH HAVE A REFERENCE TO ename1 HAVE A pname=val`. This is like `FIND experiment SUCH THAT ALL person WHICH ARE REFERENCED BY THIS experiment AS conductor HAVE AN 'academic title'=professor.`