Documentation
Go directly to the API documentation or to the publications.
Introduction
Usually, database systems either overwrite data in place or do a copy-on-write operation followed by removing the outdated data. The latter may be some time later from a background process. Data, however, naturally evolves. It is often of great value to keep the history of our data. For instance, we might record an employee’s payroll on the first of March in 2019. Let’s say it’s 5000€ / month. Then as of the fifteenth of April, we noticed that the recorded payroll was wrong and corrected it to 5300€. Now, what’s the answer to what the salary was on March, first in 2019? Database Systems, which only preserve the most recent version, don’t even know that the payroll wasn’t right. Our answer to this question depends on the most authoritative source: the record or reality. The fact that they disagree effectively splits this payroll event into two tracks of time, rendering neither source entirely accurate. Temporal database systems such as SirixDB help answer questions such as these easily. We provide at all times the transaction time, which SirixDB sets, once a transaction commits (when is a fact valid in the database / the record). Application or valid time has to be set by the application itself (when is a fact valid in the real world/reality?).
Data Audits
Thus, one usage scenario for SirixDB is data auditing. Unlike other database systems, it is designed from the ground up to retain old data. It keeps track of past revisions in a specialized index structure for (bi)temporal data. SirixDB uses a novel sliding snapshot versioning algorithm to version data pages by default. It balances read- and write performance while avoiding any peaks. SirixDB is very space-efficient. Depending on the versioning algorithm, it only copies changed records plus possibly a few more during writes. Thus, SirixDB, for instance, usually does not copy a whole database page if only a single record has changed. Instead, SirixDB syncs page fragments to persistent storage during a commit. We can drop the requirement to cluster related nodes physically. Sequentially accessing physically dispersed nodes on flash-based storage will be in the same order of magnitude as accessing physically clustered nodes on a disk. SirixDB never allows to override or delete old revisions. A single read-write transaction appends data at all times. You can revert a resource to a specific revision and commit changes based on this version. All revisions in-between will be accessible for data audits. Thus, SirixDB can support answering questions such as who changed what and when.
Time Travel queries
Data audits are about how specific records have changed. Time Travel queries can answer questions like these. However, they also allow reconstructing records as they looked at a particular time or during a specific period. They also help us to analyze how the whole document changed over time. We might want to analyze the past to predict the future. Through additional temporal JSONiq functions and XPath axes as well as XQuery functions, SirixDB encourages you to look into how your data has evolved.
Fixing Application or Human Errors
For all of the use cases we mentioned earlier: We can revert to a specific time when everything was in a known good state and commit the revision again. Or we might select a particular record, correct the error and commit a new revision.
SirixDB
SirixDB is a storage system which brings versioning to a sub-file granular level while taking full advantage of flash-based drives such as SSDs. As such, per revision as well as per page, deltas are stored. Time-complexity for retrieval of records and the storage are logarithmic (O(log n)
). Space complexity is linear (O(n)
). Currently, we provide several APIs which are layered. A very low-level page-API, which handles the storage and retrieval of records on a per page-fragment level. A transactional cursor-based API to store and navigate through records (currently XML and JSON nodes) on top. A DOM-alike node layer for simple in-memory processing of these nodes, which Brackit, a sophisticated XQuery processor, uses. And last but not least, a RESTful asynchronous HTTP-API. SirixDB provides
- The current revision of the resource or any subset thereof
- The full revision history of the resource or any subset thereof
- The full modification history of the resource or any subset thereof
SirixDB not only supports all XPath axes to query a resource in one revision but also temporal axes, which facilitate navigation in time. A resource’s transactional cursor can be started by specifying a specific revision number or by a given point in time. The latter starts a transaction on the revision number which was committed closest to the given timestamp.
You may find a quick overview of the main features useful.
API Documentation
We provide several APIs to interact with SirixDB.
- The transactional cursor API is a powerful low-level API.
- On top of this API, we built a Brackit.io binding to provide the ability to use SirixDB with a more DOM-alike API with in-memory nodes and an JSONiq API.
- We provide a powerful, asynchronous, non-blocking RESTful-API to interact with a SirixDB HTTP server. Authorization is done via Keycloak.
Publications
Articles published on Baeldung:
Articles published on Medium:
- Asynchronous, Temporal REST With Vert.x, Keycloak and Kotlin
- Pushing Database Versioning to Its Limits by Means of a Novel Sliding Snapshot Algorithm and Efficient Time Travel Queries
- How we built an asynchronous, temporal RESTful API based on Vert.x, Keycloak and Kotlin/Coroutines for Sirix.io (Open Source)
- Why Copy-on-Write Semantics and Node-Level-Versioning are Key to Efficient Snapshots
SirixDB was forked from Treetank (which is not maintained anymore), but it was subject to some publications as a university project.
A lot of the ideas still are based on the Ph.D. thesis of Marc Kramis: Evolutionary Tree-Structured Storage: Concepts, Interfaces, and Applications
As well as from Sebastian Graft’s work and thesis: Flexible Secure Cloud Storage
Other publications include:
- Versatile Key Management for Secure Cloud Storage (DISCCO12)
- A legal and technical perspective on secure cloud Storage (DFN Forum12)
- A Secure Cloud Gateway based upon XML and Web Services (ECOWS11, PhD Symposium)
- Treetank, Designing a Versioned XML Storage (XMLPrague11)
- Hecate, Managing Authorization with RESTful XML (WS-REST11)
- Rolling Boles, Optimal XML Structure Integrity for Updating Operations (WWW11, Poster)
- JAX-RX - Unified REST Access to XML Resources (TechReport10)
- Integrity Assurance for RESTful XML (WISM100)
- Temporal REST, How to really exploit XML (IADIS WWW/Internet08)
- Distributing XML with focus on parallel evaluation (DBISP2P08)