Documentation

Edit document on Github

Go directly to the API documentation or to the publications.

Introduction

Usually, database systems either overwrite data in-place or do a copy-on-write operation followed by the removal of the outdated data. The latter may be some time later from a background process. Data, however, naturally evolves. It is often of great value to keep the history of our data. We, for instance, might record the payroll of an employee on the first of March in 2019. Let’s say it’s 5000€ / month. Then as of the fifteenth of April, we notice, that the recorded payroll was wrong and correct it to 5300€. Now, what’s the answer to what the salary was on March, first in 2019? Database Systems, which only preserve the most recent version, don’t even know that the payroll wasn’t right. Our answer to this question depends on what source we consider most authoritative: the record or reality? The fact that they disagree effectively splits this payroll event into two tracks of time, rendering neither source entirely accurate. Temporal database systems such as SirixDB help answer questions such as these easily. We provide at all times the transaction time, which SirixDB sets, once a transaction commits (when is a fact valid in the database / the record). Application or valid time has to be set by the application itself (when is a fact valid in the real world/reality?).

Data Audits

Thus, one usage scenario for SirixDB is data auditing. Unlike other database systems, it is designed from the ground up to retain old data. It keeps track of past revisions in a specialized index structure for (bi)temporal data. SirixDB uses a novel sliding snapshot versioning algorithm by default to version data pages. It balances read- and write-performance while avoiding any peaks. SirixDB is very space-efficient. Depending on the versioning algorithm, it only copies changed records plus possibly a few more during writes. Thus, SirixDB, for instance, usually does not copy a whole database page if only a single record has changed. Instead, SirixDB syncs page-fragments to persistent storage during a commit. We can drop the requirement to cluster related nodes physically. Sequentially accessing physically dispersed nodes on flash-based storage will be in the same order of magnitude as accessing physically clustered nodes on a disk. SirixDB never allows to override or delete old revisions. A single read-write transaction appends data at all times. For sure, you can revert a resource to a specific revision and commit changes based on this version. All revisions in-between will be accessible for data audits. Thus, SirixDB can support answering questions such as who changed what and when.

Time Travel queries

Data audits are about how specific records have changed. Time Travel queries can answer questions like these. However, they also allow reconstructing records as they looked at a particular time or during a specific period. They also help us to analyze how the whole document changed over time. We might want to analyze the past to predict the future. Through additional temporal XPath axes and XQuery functions, SirixDB encourages you to look into how your data has evolved.

Fixing Application or Human Errors

For all of the use-cases we mentioned earlier: We can revert to a specific point in time where everything was in a known good state and commit the revision again. Or we might select a particular record, correct the error and commit a new revision.

SirixDB

SirixDB is a storage system, which brings versioning to a sub-file granular level while taking full advantage of flash-based drives as SSDs. As such, per revision as well as per page deltas are stored. Time-complexity for retrieval of records and the storage are logarithmic (O(log n)). Space complexity is linear (O(n)). Currently, we provide several APIs which are layered. A very low-level page-API, which handles the storage and retrieval of records on a per page-fragment level. A transactional cursor-based API to store and navigate through records (currently XML as well as JSON nodes) on top. A DOM-alike node layer for simple in-memory processing of these nodes, which is used by Brackit, a sophisticated XQuery processor. And last but not least a RESTful asynchronous HTTP-API. SirixDB provides

  1. The current revision of the resource or any subset thereof
  2. The full revision history of the resource or any subset thereof
  3. The full modification history of the resource or any subset thereof

SirixDB not only supports all XPath axes to query a resource in one revision but also temporal axes which facilitate navigation in time. A transactional cursor on a resource can be started either by specifying a specific revision number or by a given point in time. The latter starts a transaction on the revision number which was committed closest to the given timestamp.

sunburstview moves

You may find a quick overview about the main features useful.

API Documentation

We provide several APIs to interact with SirixDB.

  1. The transactional cursor API is a powerful low-level API.
  2. On top of this API we built a Brackit.org binding to provide the ability to use SirixDB with a more DOM-alike API with in-memory nodes and an XQuery API.
  3. We provide a powerful, asynchronous, non-blocking RESTful-API to interact with a SirixDB HTTP-server. Authorization is done via Keycloak.

Publications

Articles published on Baeldung:

Articles published on Medium:

SirixDB was forked from Treetank (which is not maintained anymore), but as a university project, it was subject to some publications.

A lot of the ideas still are based on the Ph.D. thesis of Marc Kramis: Evolutionary Tree-Structured Storage: Concepts, Interfaces, and Applications

As well as from Sebastian Graft’s work and thesis: Flexible Secure Cloud Storage

Other publications include: