CaosDB Query Language

WIP This is going to be the specification. CQL tutorials are in the webui

Example queries

Simple FIND Query

The following query will return any entity which has the name ename and all its children. FIND ename

The following queries are equivalent and will return any entity which has the name ename and all its children, but only if they are genuin records. Of course, the returned set of entities (henceforth referred to as resultset) can also be restricted to recordtypes, properties and files.

FIND RECORD ename

FIND RECORDS ename

Wildcards use * for any characters or none at all. Wildcards for single characters (like the ‘_’ wildcard from mysql) are not implemented yet.

FIND RECORD en* returns any entity which has a name beginning with en.

Regular expressions must be surrounded by << and ‘>>’:

FIND RECORD <<e[aemn]{2,5}>>

FIND RECORD <<[cC]am_[0-9]*>>

TODO (Timm): Describe escape sequences like \\  , \*, \<< and \>>.

Currently, wildcards and regular expressions are only available for the simple-find-part of the query, i. e. no wildcards/regexps for filters.

Simple COUNT Query

This query counts entities which have certain properties.

COUNT ename will return the number of entities which have the name ename and all their children.

The syntax of the COUNT queries is equivalent to the FIND queries in any respect (this also applies to wildcards and regular expressions) but one: The prefix is to be COUNT instead of FIND.

Unlike the FIND queries, the COUNT queries do not return any entities. The result of the query is the number of entities which would be returned if the query was a FIND query.

Filters

POV - Property-Operator-Value

The following queries are equivalent and will restrict the result set to entities which have a property named pname1 that has a value val1.

FIND ename.pname1=val1

FIND ename WITH pname1=val1

FIND ename WHICH HAS A PROPERTY pname1=val1

FIND ename WHICH HAS A pname1=val1

Again, the resultset can be restricted to records:

FIND RECORD ename WHICH HAS A pname1=val1

currently known operators: =, !=, <=, <, >=, > (and cf. next paragraphes!)

Special Operator: LIKE

The LIKE can be used with wildcards. The * is a wildcard for any (possibly empty) sequence of characters. Examples:

FIND RECORD ename WHICH HAS A pname1 LIKE va*

FIND RECORD ename WHICH HAS A pname1 LIKE va*1

FIND RECORD ename WHICH HAS A pname1 LIKE *al1

Note: The LIKE operator is will only produce expectable results with text properties.

Special Case: References

In general a reference can be addressed just like a POV filter. So

FIND ename1.pname1=ename2

will also return any entity named ename1 which references the entity with name or id ename2 via a reference property named pname1. However, it will also return any entity with a text property of that name with the string value ename2. In order to restrict the result set to reference properties one may make use of special reference operators:

reference operators: ->, REFERENCES, REFERENCE TO

The query looks like this:

FIND ename1 WHICH HAS A pname1 REFERENCE TO ename2

FIND ename1 WHICH HAS A pname1->ename2

Time Special Case: DateTime

DateTime operators: =, !=, <, >, IN, NOT IN

d1=d2: Equivalence relation.
  • ‘’True’’ iff d1 and d2 are equal in every respect (same DateTime flavor, same fields are defined/undefined and all defined fields are equal respectively).
  • ‘’False’’ iff they have the same DateTime flavor but have different fields defined or fields with differing values.
  • ‘’Undefined’’ otherwise.

Examples:

  • 2015-04-03=2015-04-03T00:00:00 is undefined.
  • 2015-04-03T00:00:00=2015-04-03T00:00:00.0 is undefined (second precision vs. nanosecond precision).
  • 2015-04-03T00:00:00.0=2015-04-03T00:00:00.0 is true.
  • 2015-04-03T00:00:00=2015-04-03T00:00:00 is true.
  • 2015-04=2015-05 is false.
  • 2015-04=2015-04 is true.
d1!=d2: Intransitive, symmetric relation.
  • ‘’True’’ iff d1=d2 is false.
  • ‘’False’’ iff d1=d2 is true.
  • ‘’Undefined’’ otherwise.

Examples:

  • 2015-04-03!=2015-04-03T00:00:00 is undefined.
  • 2015-04-03T00:00:00!=2015-04-03T00:00:00.0 is undefined.
  • 2015-04-03T00:00:00.0!=2015-04-03T00:00:00.0 is false.
  • 2015-04-03T00:00:00!=2015-04-03T00:00:00 is false.
  • 2015-04!=2015-05 is true.
  • 2015-04!=2015-04 is false.
d1>d2: Transitive, non-symmetric relation.

Semantics depend on the flavors of d1 and d2. If both are…

UTCDateTime
SemiCompleteDateTime
  • ‘’True’’ iff d1.ILB>d2.EUB is true or d1.ILB=d2.EUB is true.
  • ‘’False’’ iff d1.EUB<d2.ILB}} is true or {{{d1.EUB=d2.ILB is true.
  • ‘’Undefined’’ otherwise.

Examples:

  • 2015>2014 is true.
  • 2015-04>2014 is true.
  • 2015-01-01T20:15.00>2015-01-01T20:14 is true.
  • 2015-04>2015 is undefined.
  • 2015>2015-04 is undefined.
  • 2015-01-01T20:15>2015-01-01T20:15:15 is undefined.
  • 2014>2015 is false.
  • 2014-04>2015 is false.
  • 2014-01-01>2015-01-01T20:15:30 is false.
d1<d2: Transitive, non-symmetric relation.

Semantics depend on the flavors of d1 and d2. If both are…

UTCDateTime
  • ‘’True’’ iff the time of d1 is before the the time of d2 according to UTC
  • ‘’False’’ otherwise.
SemiCompleteDateTime
  • ‘’True’’ iff d1.EUB<d2.ILB is true or d1.EUB=d2.ILB is true.
  • ‘’False’’ iff d1.ILB>d2.EUB}} is true or {{{d1.ILB=d2.EUB is true.
  • ‘’Undefined’’ otherwise.

Examples:

  • 2014<2015 is true.
  • 2014-04<2015 is true.
  • 2014-01-01<2015-01-01T20:15:30 is true.
  • 2015-04<2015 is undefined.
  • 2015<2015-04 is undefined.
  • 2015-01-01T20:15<2015-01-01T20:15:15 is undefined.
  • 2015<2014 is false.
  • 2015-04<2014 is false.
  • 2015-01-01T20:15.00<2015-01-01T20:14 is false.
d1 IN d2: Transitive, non-symmetric relation.

Semantics depend on the flavors of d1 and d2. If both are…

SemiCompleteDateTime
  • ‘’True’’ iff (d1.ILB>d2.ILB is true or d1.ILB=d2.ILB is true) and (d1.EUB<d2.EUB is true or d1.EUB=d2.EUB is true).
  • ‘’False’’ otherwise.

Examples:

  • 2015-01-01 IN 2015 is true.
  • 2015-01-01T20:15:30 IN 2015-01-01 is true.
  • 2015-01-01T20:15:30 IN 2015-01-01T20:15:30 is true.
  • 2015 IN 2015-01-01 is false.
  • 2015-01-01 IN 2015-01-01T20:15:30 is false.
d1 NOT IN d2: Non-symmetric relation.

Semantics depend on the flavors of d1 and d2. If both are…

SemiCompleteDateTime
  • ‘’True’’ iff d1.ILB IN d2.ILB is false.
  • ‘’False’’ otherwise.

Examples:

  • 2015 NOT IN 2015-01-01 is true.
  • 2015-01-01 NOT IN 2015-01-01T20:15:30 is true.
  • 2015-01-01 NOT IN 2015 is false.
  • 2015-01-01T20:15:30 NOT IN 2015-01-01 is false.
  • 2015-01-01T20:15:30 NOT IN 2015-01-01T20:15:30 is false.
Note

These semantics follow a three-valued logic with ‘’true’’, ‘’false’’ and ‘’undefined’’ as truth values. Only ‘’true’’ is truth preserving. I.e. only those expressions which evaluate to ‘’true’’ pass the POV filter. FIND ... WHICH HAS A somedate=2015-01 only returns entities for which somedate=2015-01 is true. On the other hand, FIND ... WHICH DOESN'T HAVE A somedate=2015-01 returns entities for which somedate=2015-01 is false or undefined. Shortly put, NOT d1=d2 is not equivalent to d1!=d2. The latter assertion is stronger.

Omitting the Property or the Value

One doesn’t have to specify the property or the value at all. The following query filters the result set for entities which have any property with a value greater than val1.

FIND ename WHICH HAS A PROPERTY > val1

FIND ename . > val1

FIND ename.>val1

And for references…

FIND ename1 WHICH HAS A REFERENCE TO ename2

FIND ename1 WHICH REFERENCES ename2

FIND ename1 . -> ename2

FIND ename1.->ename2

The following query returns entities which have a pname1 property with any value.

FIND ename WHICH HAS A PROPERTY pname1

FIND ename WHICH HAS A pname1

FIND ename WITH pname1

FIND ename . pname1

FIND ename.pname1

TransactionFilter

Definition

sugar:: HAS BEEN | HAVE BEEN | HAD BEEN | WAS | IS |

negated_sugar:: HAS NOT BEEN | HASN'T BEEN | WAS NOT | WASN'T | IS NOT | ISN'T | HAVN'T BEEN | HAVE NOT BEEN | HADN'T BEEN | HAD NOT BEEN

by_clause:: BY (ME | username | SOMEONE ELSE (BUT ME)? | SOMEONE ELSE BUT username)

date:: A date string of the form YYYY-MM-DD

datetime:: A datetime string of the form YYYY-MM-DD hh:mm:ss

time_clause:: ON ($date|$datetime) Here is plenty of room for more syntactic sugar, e.g. a TODAY keyword, and more funcionality, e.g. ranges.

FIND ename WHICH ($sugar|$negated_sugar)? (NOT)? (CREATED|INSERTED|UPDATED|DELETED) (by_clause time_clause?| time_clause by_clause?)

Examples

FIND ename WHICH HAS BEEN CREATED BY ME ON 2014-12-24

FIND ename WHICH HAS BEEN CREATED BY SOMEONE ELSE ON 2014-12-24

FIND ename WHICH HAS BEEN CREATED BY erwin ON 2014-12-24

FIND ename WHICH HAS BEEN CREATED BY SOMEONE ELSE BUT erwin ON 2014-12-24

FIND ename WHICH HAS BEEN CREATED BY erwin

FIND ename . CREATED BY erwin ON

File Location

Search for file objects by their location:

FIND FILE WHICH IS STORED AT a/certain/path/

Wildcards

STORED AT can be used with wildcards similar to unix wildcards.

  • * matches any characters or none at all, but not the directory separator /
  • ** matches any character or none at all.
  • A leading * is a shortcut for /**
  • Asterisks directly between two other asterisks are ignored: *** is the same as **.
  • Escape character: \ (E.g. \\ is a literal backslash. \* is a literal star. But \\* is a literal backslash followed by a wildcard.)

Examples:

Find any files ending with .acq: FIND FILE WHICH IS STORED AT *.acq or FIND FILE WHICH IS STORED AT **.acq or FIND FILE WHICH IS STORED AT /**.acq

Find files stored one directory below /data/, ending with .acq: FIND FILE WHICH IS STORED AT /data/*/*.acq

Find files stored in /data/, ending with .acq: FIND FILE WHICH IS STORED AT /data/*.acq

Find files stored in a directory at any depth in the tree below /data/, ending with .acq: FIND FILE WHICH IS STORED AT /data/**.acq

Find any file in a directory which begins with 2016-02: FIND FILE WHICH IS STORED AT */2016-02*/*

Back References

The back reference filters for entities that are referenced by another entity. The following query returns entities of the type ename1 which are referenced by ename2 entities via the reference property pname1.

  • FIND ename1 WHICH IS REFERENCED BY ename2 AS A pname1
  • FIND ename1 WITH @ ename2 / pname1
  • FIND ename1 . @ ename2 / pname1

One may omit the property specification:

  • FIND ename1 WHICH IS REFERENCED BY ename2
  • FIND ename1 WHICH HAS A PROPERTY @ ename2
  • FIND ename1 WITH @ ename2
  • FIND ename1 . @ ename2

Combining Filters with Propositional Logic

Any result set can be filtered by logically combining POV filters or back reference filters:

Conjunction (AND)

  • FIND ename1 WHICH HAS A PROPERTY pname1=val1 AND A PROPERTY pname2=val2 AND A PROPERTY...
  • FIND ename1 WHICH HAS A PROPERTY pname1=val1 AND A pname2=val2 AND ...
  • FIND ename1 . pname1=val1 & pname2=val2 & ...

Disjunction (OR)

  • FIND ename1 WHICH HAS A PROPERTY pname1=val1 OR A PROPERTY pname2=val2 Or A PROPERTY...
  • FIND ename1 WHICH HAS A PROPERTY pname1=val1 OR A pname2=val2 OR ...
  • FIND ename1 . pname1=val1 | pname2=val2 | ...

Negation (NOT)

  • FIND ename1 WHICH DOES NOT HAVE A PROPERTY pname1=val1
  • FIND ename1 WHICH DOESN'T HAVE A pname1=val1
  • FIND ename1 . NOT pname2=val2
  • FIND ename1 . !pname2=val2

… and combinations with parentheses

  • FIND ename1 WHICH HAS A pname1=val1 AND DOESN'T HAVE A pname2<val2 AND ((WHICH HAS A pname3=val3 AND A pname4=val4) OR DOES NOT HAVE A (pname5=val5 AND pname6=val6))
  • FIND ename1 . pname1=val1 & !pname2<val2 & ((pname3=val3 & pname4=val4) | !(pname5=val5 & pname6=val6))
  • FIND ename1.pname1=val1&!pname2<val2&((pname3=val3&pname4=val4)|!(pname5=val5&pname6=val6))

A Few Important Expressions

  • A:: The indistinct article. This is only syntactic suger. Equivalent expressions: A, AN
  • AND:: The logical and. Equivalent expressions: AND, &
  • FIND:: The beginning of the query.
  • NOT:: The logical negation. Equivalent expressions: NOT, DOESN'T HAVE A PROPERTY, DOES NOT HAVE A PROPERTY, DOESN'T HAVE A, DOES NOT HAVE A, DOES NOT, DOESN'T, IS NOT, ISN'T, !
  • OR:: The logical or. Equivalent expressions: OR, |
  • RECORD,RECORDTYPE,FILE,PROPERTY:: Role expression for restricting the result set to a specific role.
  • WHICH:: The marker for the beginning of the filters. Equivalent expressions: WHICH, WHICH HAS A, WHICH HAS A PROPERTY, WHERE, WITH, .
  • REFERENCE:: This one is tricky: REFERENCE TO expresses a the state of having a reference property. REFERENCED BY expresses the state of being referenced by another entity.
  • COUNT:: COUNT works like FIND but doesn’t return the entities.

Select Queries

In contrast to FIND queries, which always return the complete entity, there are SELECT queries which only return the entity with only those properties which are specified in the query. The syntax is very similar to FIND queries - just replace the FIND by SELECT <comma separated list of selectors> FROM:

SELECT p1, p2, p3 FROM Record ename

However, the SELECT query can also return properties of referenced entities and thereby are a means of joining entities together and return a custom view or projection:

SELECT Conductor.Last Name FROM Experiment

would return the conductor’s last name, when Conductor is a reference property of Experiment and Last Name is a property of the Conductor records.

Selectors

Selectors are strings of entity names which are separated by . (dot). E.g. Conductor.Last Name or Conductor.Address or even Experiment.Conductor.Last name. Selectors in a SELECT queries are separated by , (comma). E.g. Conductor.First Name, Conductor.Last Name.

Evaluation of Selectors

The query will return all those properties which have the same name as specified by the selector (case-insensitive). However, SELECT queries are also capable of subtyping in the selectors:

SELECT Person FROM Experiment would return all Person properties but all Conductors as well, if Conductor is a child of Person.

Note: When a property responsible with data type Person exists, the above SELECT statement would not include records that use this property (since responsible is not a child of Person).

Versioning

Since Caosdb 0.2 entities are optionally version controlled. The query language will be extended to include versioning in the future. A current minimal implementation introduces the ANY VERSION OF modifier which can be used to return all matching versions in the results of COUNT, FIND, and SELECT queries.

Example

  • FIND ANY VERSION OF RECORD WITH pname=value returns the all past and present versions of records with pname =  value.

Scope and current limitations

  • The ANY VERSION OF modifier currently the only expression for taking the versioning into account when using the query language.
  • Subproperties are not supported yet, e.g. FIND ANY VERSION OF ENTITY WHICH IS REFERENCED BY ename WITH .... This applies to all cases where you specify properties of referenced entities or referencing entities.

Future

  • Add (LATEST|LAST|OLDEST|NEWEST|FIRST) VERSION OF modifiers.
  • Add (ANY|LATEST|LAST|OLDEST|NEWEST|FIRST) VERSION (BEFORE|AFTER) (<timestamp>|<transaction id>|<entity@version>) OF modifier.
  • Add support for subproperties, e.g. FIND ANY VERSION OF ENTITY WHICH IS REFERENCED BY ename WITH ....

Future

  • Sub Queries (or Sub Properties): FIND ename WHICH HAS A pname WHICH HAS A subpname=val. This is like: FIND AN experiment WHICH HAS A camera WHICH HAS A 'serial number'= 1234567890
  • More Logic, especially ANY, ALL, NONE, and SUCH THAT key words (and equivalents) for logical quantisation: FIND ename1 SUCH THAT ALL ename2 WHICH HAVE A REFERENCE TO ename1 HAVE A pname=val. This is like FIND experiment SUCH THAT ALL person WHICH ARE REFERENCED BY THIS experiment AS conductor HAVE AN 'academic title'=professor.