Architecture
SirixDB is a temporal, append-only database that never overwrites data in place. Every transaction commit creates an immutable snapshot. Copy-on-write with path copying shares unchanged data between revisions, so storage grows only with what actually changes. The storage engine is log-structured and optimized for flash drives: sequential writes only, no write-ahead log, and no compaction.
Node-Based Document Model
Unlike document databases that store JSON as opaque blobs, SirixDB decomposes each document into a tree of fine-grained nodes. Each node has a stable 64-bit nodeKey that never changes across revisions. Nodes are linked via parent, child, and sibling pointers, enabling O(1) navigation in any direction.
Field names are stored once in an in-memory dictionary and referenced by 32-bit keys, saving space when the same field appears thousands of times.
Each node kind maps directly to a JSON construct: OBJECT, ARRAY, OBJECT_KEY, STRING_VALUE, NUMBER_VALUE, BOOLEAN_VALUE, and NULL_VALUE. Because every pointer is simply another node's key, moving to a parent, child, or sibling is a single lookup rather than a scan.
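A plain Java sketch can make this layout concrete. The type and field names below are hypothetical, not SirixDB's internal classes; they only show how a stable 64-bit nodeKey, pointer fields holding neighbouring node keys, and a 32-bit nameKey into the field-name dictionary fit together.

```java
// Illustrative sketch only: hypothetical names, not SirixDB's internal classes.
import java.util.HashMap;
import java.util.Map;

enum NodeKind { OBJECT, ARRAY, OBJECT_KEY, STRING_VALUE, NUMBER_VALUE, BOOLEAN_VALUE, NULL_VALUE }

// One fine-grained node: a stable 64-bit key plus parent/child/sibling pointers,
// each stored as the node key of the neighbouring node.
record JsonNode(long nodeKey,
                NodeKind kind,
                long parentKey,
                long firstChildKey,
                long leftSiblingKey,
                long rightSiblingKey,
                int nameKey,     // key into the field-name dictionary; unused for value nodes
                String value) {  // only set for *_VALUE nodes
}

// Field names are interned once and referenced by a small integer key.
final class NameDictionary {
    private final Map<String, Integer> keyByName = new HashMap<>();
    private final Map<Integer, String> nameByKey = new HashMap<>();

    int keyFor(String name) {
        return keyByName.computeIfAbsent(name, n -> {
            int key = keyByName.size();
            nameByKey.put(key, n);
            return key;
        });
    }

    String nameFor(int key) {
        return nameByKey.get(key);
    }

    public static void main(String[] args) {
        NameDictionary dict = new NameDictionary();
        System.out.println(dict.keyFor("email")); // 0
        System.out.println(dict.keyFor("email")); // 0: stored once, referenced many times
    }
}
```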
Copy-on-Write with Path Copying
When a transaction modifies data, SirixDB doesn’t rewrite existing pages. Instead, it copies only the modified page and its ancestor path to the root. All unchanged pages are shared between the old and new revision via pointers. This is the same principle used in persistent data structures and ZFS.
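The principle can be illustrated with a minimal persistent binary tree in Java. This is a generic sketch of path copying, not SirixDB's page layer: updating a value allocates new nodes only along the path from the root to the change, and every untouched subtree is shared with the previous version by reference.

```java
// Minimal path-copying sketch for a persistent binary tree (illustrative only, not SirixDB's page layer).
final class PathCopying {

    record Node(int value, Node left, Node right) {}

    // Returns a new root. Only the nodes on the path described by `path` ('L'/'R' steps)
    // are copied; every untouched subtree is shared with the previous version.
    static Node update(Node node, String path, int newValue) {
        if (path.isEmpty()) {
            return new Node(newValue,
                            node == null ? null : node.left(),
                            node == null ? null : node.right());
        }
        int oldValue = node == null ? 0 : node.value();
        Node left = node == null ? null : node.left();
        Node right = node == null ? null : node.right();
        String rest = path.substring(1);
        return path.charAt(0) == 'L'
                ? new Node(oldValue, update(left, rest, newValue), right)
                : new Node(oldValue, left, update(right, rest, newValue));
    }

    public static void main(String[] args) {
        Node v1 = new Node(1, new Node(2, null, null), new Node(3, null, null));
        Node v2 = update(v1, "R", 42);              // copies the root and the right child only
        System.out.println(v2.left() == v1.left()); // true: the unchanged left subtree is shared
    }
}
```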
This means a revision that modifies a single record only writes the modified page plus its ancestor path — typically 3-4 pages. A 10 GB database with 1,000 revisions and 0.1% change each requires roughly 20 GB total, not 10 TB.
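A quick back-of-envelope check of those figures (a hypothetical workload, not a measurement):

```java
// Back-of-envelope check of the figures above (hypothetical workload, not a benchmark).
final class StorageEstimate {
    public static void main(String[] args) {
        double baseGb = 10.0;              // initial database size
        int revisions = 1_000;
        double changedPerRevision = 0.001; // 0.1% of the data modified per revision

        double copyOnWrite = baseGb + revisions * changedPerRevision * baseGb; // unchanged pages are shared
        double naiveFullCopies = (double) revisions * baseGb;                  // one full copy per revision

        System.out.printf("copy-on-write: ~%.0f GB, full copies: ~%.0f GB (= %.0f TB)%n",
                copyOnWrite, naiveFullCopies, naiveFullCopies / 1_000);
    }
}
```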
Physically, each resource is stored in two append-only files: a data file for page content (IndirectPages, RecordPages) and a revisions file for revision metadata (RevisionRootPages, UberPage references). The UberPage sits conceptually at the top, addressing every RevisionRootPage including the latest, and is always written last as an atomic operation. Even if a crash occurs mid-commit, the previous valid state is preserved.
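A simplified sketch of that commit ordering is shown below. The file layout and byte contents are hypothetical, not SirixDB's actual on-disk format; the point is only the sequence: page content and the RevisionRootPage are appended and flushed first, and the UberPage reference is written last, so an interrupted commit never replaces the previously committed state.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import static java.nio.file.StandardOpenOption.*;

// Simplified commit ordering (hypothetical file layout, not SirixDB's actual I/O code).
final class CommitSketch {

    static void commit(Path dataFile, Path revisionsFile,
                       byte[] newPages, byte[] revisionRoot, byte[] uberPage) throws IOException {
        try (FileChannel data = FileChannel.open(dataFile, CREATE, WRITE, APPEND);
             FileChannel revisions = FileChannel.open(revisionsFile, CREATE, WRITE, APPEND)) {

            // 1. Append the copied pages (record pages, indirect pages) to the data file.
            data.write(ByteBuffer.wrap(newPages));
            data.force(true);                       // page content becomes durable first

            // 2. Append the new RevisionRootPage to the revisions file.
            revisions.write(ByteBuffer.wrap(revisionRoot));
            revisions.force(true);

            // 3. Only now append the UberPage reference. Until this write lands,
            //    readers still see the previous revision, so a crash at any
            //    earlier point simply leaves the old state as the latest one.
            revisions.write(ByteBuffer.wrap(uberPage));
            revisions.force(true);
        }
    }

    public static void main(String[] args) throws IOException {
        commit(Path.of("resource.data"), Path.of("resource.revisions"),
               new byte[]{1, 2, 3}, new byte[]{4}, new byte[]{5});
    }
}
```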
Page Structure
Each resource is organized as a trie of pages. The RevisionRootPage is the entry point for a single revision, branching into subtrees for data, indexes, and metadata.
- UberPage — the root entry point. Written last during a commit as an atomic operation. Contains a reference to the IndirectPage tree that addresses all revisions.
- IndirectPages — fan-out nodes (512 references each) that form the trie structure. Following ZFS, checksums are stored in parent references for data integrity, and unused slots are tracked with a bitset for compact storage. The slot-offset computation is sketched after this list.
- RevisionRootPage — the entry point for a single revision. Stores the author, timestamp, and optional commit message, and branches to the data trie, path summary, name dictionary, and index pages.
- RecordPages — leaf pages storing up to 1024 nodes each. These are the pages versioned by the sliding snapshot algorithm.
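With a fan-out of 512 (2^9), the slot to follow at each IndirectPage level can be read off a page number nine bits at a time. The helper below is a hypothetical illustration of that offset computation; the real lookup also deals with variable trie heights and verifies the checksums stored in parent references.

```java
// Hypothetical helper: decompose a record page number into per-level slot offsets
// for a trie of IndirectPages with a fan-out of 512 (9 bits per level).
final class TrieAddressing {

    static final int FANOUT_BITS = 9;               // 2^9 = 512 references per IndirectPage
    static final int FANOUT_MASK = (1 << FANOUT_BITS) - 1;

    // Returns the slot to follow at each IndirectPage level, root first.
    static int[] offsets(long recordPageNumber, int levels) {
        int[] path = new int[levels];
        for (int level = levels - 1; level >= 0; level--) {
            path[level] = (int) (recordPageNumber & FANOUT_MASK);
            recordPageNumber >>>= FANOUT_BITS;
        }
        return path;
    }

    public static void main(String[] args) {
        // Record page 1_000_000 in a 3-level trie: 1_000_000 = 3*512^2 + 417*512 + 64.
        System.out.println(java.util.Arrays.toString(offsets(1_000_000, 3))); // [3, 417, 64]
    }
}
```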
Sliding Snapshot Versioning
SirixDB doesn’t just copy entire pages on every change. It versions RecordPages at a sub-page level, storing only changed records. The sliding snapshot algorithm, developed by Marc Kramis, avoids the trade-off between read performance and write amplification that plagues traditional approaches.
| Strategy | Reads to reconstruct | Write cost per revision | Write spikes? |
|---|---|---|---|
| Full | 1 page | Entire page (all records) | No |
| Incremental | All fragments since the last full dump | Only changed records | Yes (periodic full dump) |
| Differential | 2 pages | All changes since last full dump | Yes (growing deltas) |
| Sliding Snapshot | At most N fragments | Changed + expired records | No |
The sliding snapshot uses a window of N revisions (typically 3-5). Changed records are always written. A record that has not been rewritten within the last N revisions is written again (carried forward) before it slides out of the window. This guarantees that at most N page fragments need to be read to reconstruct any page, regardless of the total revision count.
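The read side can be sketched as follows (hypothetical types, not SirixDB's implementation): visit the page's fragments newest first and keep the first version of each record encountered. Because the write rule above re-emits any record about to fall out of the window, at most N fragments are ever needed.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative reconstruction of a record page from its most recent fragments
// (hypothetical types; the real algorithm also decides at write time which
// records to carry forward so that none is ever older than N fragments).
final class SlidingSnapshotRead {

    record Record(long recordKey, String payload) {}
    record PageFragment(int revision, List<Record> records) {}

    // fragments must be ordered newest first and contain at most N entries.
    static Map<Long, Record> reconstruct(List<PageFragment> fragments) {
        Map<Long, Record> page = new HashMap<>();
        for (PageFragment fragment : fragments) {
            for (Record record : fragment.records()) {
                page.putIfAbsent(record.recordKey(), record); // the newest version wins
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<PageFragment> newestFirst = List.of(
                new PageFragment(7, List.of(new Record(1, "changed in r7"))),
                new PageFragment(6, List.of(new Record(2, "changed in r6"))),
                new PageFragment(5, List.of(new Record(1, "old version from r5"),
                                            new Record(3, "carried forward in r5"))));
        Map<Long, Record> page = reconstruct(newestFirst);
        System.out.println(page.get(1L).payload()); // "changed in r7": the newest fragment wins
        System.out.println(page.get(3L).payload()); // "carried forward in r5": never older than N fragments
    }
}
```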
For details, see Marc Kramis’s thesis: Evolutionary Tree-Structured Storage: Concepts, Interfaces, and Applications.
Secondary Indexes
SirixDB supports three types of user-defined secondary indexes, all stored in the same versioned trie structure as the data. Indexes are part of the transaction and version with the data — the index at revision 42 always matches the data at revision 42.
Path Summary
Every resource maintains a compact path summary — a trie of all unique paths in the document. Each unique path gets a path class reference (PCR), a stable integer ID. Nodes in the main data tree reference their PCR, enabling efficient path-based lookups.
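A tiny interner illustrates the idea (hypothetical code, not SirixDB's path summary implementation): each distinct path gets a stable integer, and data nodes carry that integer instead of repeating the path.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a path summary: every distinct path gets a stable
// path class reference (PCR) that data nodes point to.
final class PathSummarySketch {

    private final Map<String, Integer> pcrByPath = new LinkedHashMap<>();

    int pcrFor(String path) {
        return pcrByPath.computeIfAbsent(path, p -> pcrByPath.size() + 1);
    }

    public static void main(String[] args) {
        PathSummarySketch summary = new PathSummarySketch();
        // Every user object under /users/[] contributes the same paths, so the PCRs are reused.
        summary.pcrFor("/users");
        summary.pcrFor("/users/[]");
        summary.pcrFor("/users/[]/name");
        summary.pcrFor("/users/[]/age");
        summary.pcrFor("/users/[]/name");      // second user: same path, same PCR
        System.out.println(summary.pcrByPath); // {/users=1, /users/[]=2, /users/[]/name=3, /users/[]/age=4}
    }
}
```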
Index Types
| Index | Key | Use case |
|---|---|---|
| Name | Field name hash → node keys | Find all nodes named "email" regardless of path |
| Path | PCR → node keys | Find all nodes at path /users/[]/name |
| CAS | (PCR + typed value) → node keys | Find all users where age > 30 on path /users/[]/age |
CAS (content-and-structure) indexes are the most selective — they index both the path and the typed value, enabling efficient range queries. All indexes are stored in balanced binary search trees (Red-Black trees) within the same versioned page structure.
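Because java.util.TreeMap is itself a red-black tree, a CAS-style lookup can be sketched with it; the key layout below is hypothetical, not SirixDB's on-disk index format. The composite key orders entries by PCR and then by typed value, so a predicate such as age > 30 becomes an ordered sub-map scan.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical CAS-index sketch: (PCR, typed value) -> node keys, kept in a
// red-black tree (TreeMap) so that range predicates become ordered sub-map scans.
final class CasIndexSketch {

    record CasKey(int pcr, long value) implements Comparable<CasKey> {
        public int compareTo(CasKey other) {
            int byPcr = Integer.compare(pcr, other.pcr);
            return byPcr != 0 ? byPcr : Long.compare(value, other.value);
        }
    }

    public static void main(String[] args) {
        int agePcr = 4; // PCR of /users/[]/age in the path summary
        TreeMap<CasKey, List<Long>> index = new TreeMap<>();
        index.put(new CasKey(agePcr, 25), List.of(101L));
        index.put(new CasKey(agePcr, 34), List.of(205L));
        index.put(new CasKey(agePcr, 52), List.of(318L));

        // age > 30 on path /users/[]/age: scan from (agePcr, 31) up to the end of this PCR.
        Map<CasKey, List<Long>> hits =
                index.subMap(new CasKey(agePcr, 31), true, new CasKey(agePcr, Long.MAX_VALUE), true);
        System.out.println(hits.values()); // [[205], [318]]
    }
}
```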
For the JSONiq API to create and query indexes, see the Function Reference.
Further Reading
- Evolutionary Tree-Structured Storage — Marc Kramis’s thesis describing the sliding snapshot algorithm
- SirixDB on GitHub — source code and detailed docs/ARCHITECTURE.md
- REST API documentation — HTTP interface for SirixDB
- JSONiq API — query language guide