Fileserver

Info

There are several ways to use the file server component of CaosDB. A file, or a whole folder including its subfolders, can be uploaded either via HTTP or via the drop off box. A file can be downloaded via HTTP, identified either by its ID or by its path in the internal file system. Furthermore, a file’s metadata can be retrieved via HTTP as XML.

File upload

Drop off box

The drop off box is a directory on the CaosDB server’s local file system. It is specified by the key dropoffbox in the server.conf file in the server’s basepath (something like ~/CaosDB/server/server.conf). Since the drop off box directory is writable by all users, they can place files or complete folders in it with mv or cp (cp is recommended). The server deletes files that are older than their maximum lifetime (24 hours by default, specified in server.conf). Within that lifetime, a user can prompt the server to pick up the file (or folder) from the drop off box and transfer it to the internal file system.

To do so, the user sends a pick up request as a POST to http://host:port/mpidsserver/FilesDropOff with a body of the following form:

    <Post>
      <File pickup="$path_dropoffbox" destination="$path_filesystem" description="$description" generator="$generator"/>
      ...
    </Post>

where

  • $path_dropoffbox is the actual relative path of the dropped file or folder in the DropOffBox,

  • $path_filesystem is the designated relative path of that object in the internal file system,

  • $description is a description of the file to be uploaded,

  • $generator is the tool or client used for pushing this file.
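
As a minimal sketch, the request body described above could be assembled with Python's standard library as follows. The function name build_pickup_request and the example values are hypothetical; only the element and attribute names are taken from the description above.

```python
import xml.etree.ElementTree as ET


def build_pickup_request(files):
    """Build the <Post> body of a pick up request.

    `files` is a list of dicts with the keys pickup, destination,
    description and generator (the four attributes listed above).
    """
    post = ET.Element("Post")
    for entry in files:
        ET.SubElement(post, "File", {
            "pickup": entry["pickup"],
            "destination": entry["destination"],
            "description": entry["description"],
            "generator": entry["generator"],
        })
    return ET.tostring(post, encoding="unicode")


# Hypothetical example values, for illustration only.
body = build_pickup_request([{
    "pickup": "run1/data.dat",
    "destination": "experiments/run1/data.dat",
    "description": "Raw data of run 1",
    "generator": "example-script",
}])
print(body)
```

Using an XML library rather than string formatting ensures that special characters in descriptions or paths are escaped correctly.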

After a successful pick up the server will return:

    <Response>
      <File description="$description" path="$path" id="$id" checksum="$checksum" size="$size" />
      ...
    </Response>

where

  • $id is the newly generated ID of that file and

  • $path is the path of the submitted file or folder relative to the file system’s root.
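
A client will usually want to read the new IDs and paths from this response. The following sketch shows one way to do that. The function name parse_pickup_response and the sample response are made up for illustration; they follow the form shown above but are not actual server output.

```python
import xml.etree.ElementTree as ET


def parse_pickup_response(xml_text):
    """Return one dict of attributes per <File> element in the response."""
    root = ET.fromstring(xml_text)
    return [dict(element.attrib) for element in root.iter("File")]


# Hypothetical response, following the form shown above.
sample = (
    '<Response>'
    '<File description="Raw data of run 1" path="experiments/run1/data.dat" '
    'id="1234" checksum="abc" size="13" />'
    '</Response>'
)
picked_up = parse_pickup_response(sample)
for entry in picked_up:
    print(entry["id"], entry["path"])
```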

HTTP upload stream

Files

A detailed example of a file upload using cURL is given in the curl section of this documentation.

File upload via HTTP is implemented in accordance with RFC 1867. This is a de-facto standard that defines a file upload as part of an HTML form submission. The concept is not elaborated here. Note, however, that this protocol is not designed for uploads of complete structured folders. Therefore, the CaosDB file components have to impose that structure on the upload protocol.

CaosDB’s file upload resource accepts only POST requests of MIME media type multipart/form-data. The first part of each POST body is expected to be a form-data text field containing information about the files to be uploaded. It has to meet the following requirements:

  • Content-type: text/plain; charset=UTF-8

  • Content-disposition: form-data; name="FileRepresentation"

If the content type of the first part is not text/plain; charset=UTF-8, the server will return error 418. If the body is not actually encoded in UTF-8, the server’s behaviour is undefined. If the field name of the first part is not FileRepresentation, the server will return error 419.

The body of that first part has to be an XML document of the following form:

    <Post>
      <File upload="$temporary_identifier" destination="$path_filesystem" description="$description" checksum="$checksum" size="$size"/>
      ...
    </Post>

where

  • $temporary_identifier is an arbitrary name, which is used to associate this <File> tag with an uploaded file in the other form-data parts,

  • $path_filesystem is the designated relative path of that object in the internal file system,

  • $description is a description of the file to be uploaded,

  • $size is the file’s size in bytes,

  • $checksum is a SHA-512 hash of the file.
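
The size and checksum attributes can be computed from the file content. The following is a minimal sketch; the function name file_representation and the example values are hypothetical, and the hexadecimal encoding of the SHA-512 hash is an assumption, since the text above does not state which encoding the server expects.

```python
import hashlib
import xml.etree.ElementTree as ET


def file_representation(entries):
    """Build the <Post> document for the FileRepresentation part.

    `entries` is a list of tuples
    (temporary_identifier, destination, description, content_bytes).
    """
    post = ET.Element("Post")
    for temporary_identifier, destination, description, content in entries:
        ET.SubElement(post, "File", {
            "upload": temporary_identifier,
            "destination": destination,
            "description": description,
            # Assumption: the checksum is sent as a hexadecimal string.
            "checksum": hashlib.sha512(content).hexdigest(),
            "size": str(len(content)),
        })
    return ET.tostring(post, encoding="unicode")


# Hypothetical example values, for illustration only.
representation = file_representation([
    ("tmp1", "examples/hello.txt", "A small text file", b"Hello, world!"),
])
print(representation)
```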

There must be at least one further part. These parts may have any appropriate media type; application/octet-stream is a good choice because it is the default for any uploaded file according to RFC 1867. Their field names may be any names that meet the requirements of RFC 1867 (most notably, they must be unique within this POST). In order to associate each part with its XML file representation, however, the filename parameter of the Content-disposition header has to be set to the proper $temporary_identifier. The Content-disposition type must be form-data:

  • Content-disposition: form-data; name="$any_name"; filename="$temporary_identifier"

Finally, the body of each of these parts has to contain the file, encoded in the proper Content-Transfer-Encoding.

If a file part has a filename parameter which does not occur in the XML file representation, the server will return error 420. The file will not be stored anywhere. If an XML file representation has no corresponding file to be uploaded (i.e. there is no part with the same filename), the server will return error 421. Other errors may occur if the checksum, the size, the destination, etc. are corrupted.
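
To make the layout of the request body concrete, the following sketch assembles a multipart/form-data body by hand according to the rules above. The function name encode_upload, the field name file_part_1 and the boundary are hypothetical, and the $-prefixed values are left as placeholders. In practice an HTTP client library would normally build this body.

```python
def encode_upload(representation_xml, files, boundary="AaB03x"):
    """Return (content_type, body) for a multipart/form-data upload.

    representation_xml: the <Post> document as a str.
    files: list of (field_name, temporary_identifier, content) tuples,
           where content is bytes and the field names are unique.
    """
    delimiter = b"--" + boundary.encode("ascii")
    crlf = b"\r\n"
    # First part: the FileRepresentation text field.
    chunks = [
        delimiter, crlf,
        b'Content-Disposition: form-data; name="FileRepresentation"', crlf,
        b"Content-Type: text/plain; charset=UTF-8", crlf, crlf,
        representation_xml.encode("utf-8"), crlf,
    ]
    # One further part per file; filename carries the temporary identifier.
    for field_name, temporary_identifier, content in files:
        disposition = 'Content-Disposition: form-data; name="{}"; filename="{}"'.format(
            field_name, temporary_identifier)
        chunks += [
            delimiter, crlf,
            disposition.encode("utf-8"), crlf,
            b"Content-Type: application/octet-stream", crlf, crlf,
            content, crlf,
        ]
    # Closing delimiter.
    chunks += [delimiter, b"--", crlf]
    content_type = "multipart/form-data; boundary=" + boundary
    return content_type, b"".join(chunks)


xml_document = (
    '<Post><File upload="tmp1" destination="$path_filesystem" '
    'description="$description" checksum="$checksum" size="13"/></Post>'
)
content_type, body = encode_upload(
    xml_document, [("file_part_1", "tmp1", b"Hello, world!")])
print(content_type)
print(body.decode("utf-8"))
```

The boundary must not occur inside any of the parts; a real client should generate a random one instead of using a fixed value.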

Folders

Uploading folders works in a similar way. The first part of the multipart/form-data document is to be the representation of the folders:

    <Post>
      <File upload="$temporary_identifier" destination="$path_filesystem" description="$description" checksum="$checksum" size="$size"/>
      ...
    </Post>

The root folder is represented by a part which has a header of the form:

  • Content-disposition: form-data; name="$any_name"; filename="$temporary_identifier/"

The slash at the end of the filename indicates that this is a folder, not a file. Consequently, the body of this part will be ignored and should be empty. Any file with the name $filename in the root folder is represented by a part which has a header of the form:

  • Content-disposition: form-data; name="$any_name"; filename="$temporary_identifier/$filename"

Any sub folder with the name $subfolder is represented by a part which has a header of the form:

  • Content-disposition: form-data; name="$any_name"; filename="$temporary_identifier/$subfolder/"

Likewise, a complete directory tree can be transferred by appending the structure to the filename parameter.

Example: Given the structure

    rootfolder/
    rootfolder/file1
    rootfolder/subfolder/
    rootfolder/subfolder/file2

an upload document would have the following form:

    ... (HTTP Header)
    Content-type: multipart/form-data; boundary=AaB03x
    
    --AaB03x
    content-disposition: form-data; name="FileRepresentation"
    content-type: text/plain; charset=UTF-8
    
    <Post>
      <File upload="temp1234" destination="$path_filesystem" description="$description" checksum="$checksum" size="$size"/>
    </Post>
    
    --AaB03x
    content-disposition: form-data; name="random_name1"; filename="temp1234/"
    
    --AaB03x
    content-disposition: form-data; name="random_name2"; filename="temp1234/file1"
    
    Hello, world! This is file1.
    
    --AaB03x
    content-disposition: form-data; name="random_name3"; filename="temp1234/subfolder/"
    
    --AaB03x
    content-disposition: form-data; name="random_name4"; filename="temp1234/subfolder/file2"
    
    Hello, world! This is file2.
    
    --AaB03x--
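
The filename parameters for such an upload can be derived by walking the local folder. The following is a sketch under the naming rules described above; the function name folder_part_filenames is hypothetical, and the temporary directory only recreates the example structure for demonstration.

```python
import os
import tempfile


def folder_part_filenames(local_root, temporary_identifier):
    """List the filename parameters for all parts of a folder upload.

    Folders end with a slash; each folder is listed before its content.
    """
    names = [temporary_identifier + "/"]
    for dirpath, dirnames, filenames in os.walk(local_root):
        dirnames.sort()  # make the traversal order deterministic
        relative = os.path.relpath(dirpath, local_root)
        if relative == ".":
            prefix = temporary_identifier + "/"
        else:
            prefix = temporary_identifier + "/" + relative.replace(os.sep, "/") + "/"
            names.append(prefix)
        for filename in sorted(filenames):
            names.append(prefix + filename)
    return names


# Recreate the example structure in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "subfolder"))
    for relative in ("file1", os.path.join("subfolder", "file2")):
        with open(os.path.join(root, relative), "w") as handle:
            handle.write("Hello, world!")
    names = folder_part_filenames(root, "temp1234")

for name in names:
    print(name)
```

Each resulting name is used as the filename parameter of one part; parts whose name ends with a slash have an empty body.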

(Timm 2014-06-17: to be continued)

Consistency checks

To start a consistency check on either the complete file system or a subdirectory, add the fileStorageConsistency flag to a retrieve query. In a GET request, simply append ...?fileStorageConsistency=<OPTIONS> to the URL. The following options are possible (it is currently unclear whether more than one of them can be given at a time):

  • -t <TIMEOUT> :: The timeout for the query (presumably in seconds).

  • -c <TESTCASE> :: To trigger internal test cases.

  • <PATH> :: The path in the file system where searching files should start. If omitted or \, the full file system will be checked.

One example, using curl and an existing cookie:

    curl -X GET -G -b cookie.txt -d "fileStorageConsistency=Analysis/VideoAnalysis/masks/" --insecure "https://<SERVER>/Entity/12345"
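
The same request URL can be built programmatically. The following sketch only constructs the URL and does not send the request; the function name consistency_check_url is hypothetical, and <SERVER> remains a placeholder as in the example above.

```python
from urllib.parse import urlencode


def consistency_check_url(entity_url, path=""):
    """Append the fileStorageConsistency flag to a retrieve URL."""
    # safe="/" keeps the slashes of the path readable, as in the curl example.
    query = urlencode({"fileStorageConsistency": path}, safe="/")
    return entity_url + "?" + query


url = consistency_check_url(
    "https://<SERVER>/Entity/12345", "Analysis/VideoAnalysis/masks/")
print(url)
```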