Write Flow
This feature is experimental in v2.4.0
The SPDB Write Flow significantly improves write performance in RocksDB. It is a new approach to writing data that reduces the time spent holding the global database mutex and increases the level of parallelism for I/O writes. The new flow is built around a dedicated SPDB write thread, which handles the memtable switch, flush requests, the WAL switch, the WBM threshold limit, write delay, and WAL trimming.
In the previous write flow, each write was inserted into a matching write queue, and one write from that queue was picked as the write group leader. Other writes were then attached to this leader as group members, and work proceeded in parallel until the group limit was reached. The leader was responsible for the WAL write and for waiting for all group members to complete their work, and the version advanced only once the whole group had finished. This approach had several disadvantages: new writes could be blocked by the memtable switch status check, and the many DB mutex points forced much of the flow to execute serially.
The new SPDB Write Flow algorithm uses a thread dedicated to handling the write flow. This thread wakes up at a specified time and handles quiescing work, which includes the memtable switch, flush requests, the WAL switch, the WBM threshold limit, write delay, and WAL trimming. The thread then handles the batch writes by inserting them into two containers: one for the batch writes currently being processed and one for the new batch writes that are being inserted.
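To make the background handling more concrete, below is a minimal, self-contained C++ sketch, not Speedb's actual code, of a dedicated thread that periodically checks the triggers and takes an exclusive lock only when something actually needs to be done. All class, helper, and trigger names here (`MemtableFull()`, `SwitchMemtable()`, and so on) are illustrative assumptions; in Speedb's real implementation some of these switches are performed without taking any lock at all.

```cpp
// Illustrative sketch of a dedicated write-flow thread; the helper names
// (MemtableFull, SwitchMemtable, WalTooLarge, SwitchWal, TrimWal) are
// assumptions for this example, not Speedb APIs.
#include <atomic>
#include <chrono>
#include <shared_mutex>
#include <thread>

class WriteFlowThreadSketch {
 public:
  WriteFlowThreadSketch() : thread_([this] { Run(); }) {}
  ~WriteFlowThreadSketch() {
    stop_ = true;
    thread_.join();
  }

 private:
  void Run() {
    while (!stop_) {
      // Foreground writers would hold rwlock_ in shared mode, so checking
      // the triggers here does not block them.
      if (MemtableFull()) {
        // The exclusive (write) lock is taken only when a switch is needed.
        std::unique_lock<std::shared_mutex> lock(rwlock_);
        SwitchMemtable();  // also schedules a flush of the old memtable
      }
      if (WalTooLarge()) {
        std::unique_lock<std::shared_mutex> lock(rwlock_);
        SwitchWal();
        TrimWal();         // drop WAL files that are already fully flushed
      }
      std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
  }

  // Placeholders for the real trigger checks and actions.
  bool MemtableFull() { return false; }
  bool WalTooLarge() { return false; }
  void SwitchMemtable() {}
  void SwitchWal() {}
  void TrimWal() {}

  std::shared_mutex rwlock_;   // replaces the single global DB mutex
  std::atomic<bool> stop_{false};
  std::thread thread_;
};

int main() {
  WriteFlowThreadSketch flow;  // runs until destroyed
  std::this_thread::sleep_for(std::chrono::seconds(1));
}
```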
Each batch write is responsible for writing to the memtable (if needed) and for waiting until its batch group has completed. The batch group leader is, by default, the first batch in the group, unless new batches are inserted in parallel and the batch group size limit has not been reached. The batch leader is responsible for the merged group WAL write, with or without sync. Note that the WAL write can complete before all of the memtable writes in the container have completed, which is fine; however, the version can only progress once the memtable writes are complete. The flow handles merged writes, writes that skip the memtable, and writes that skip the WAL in the same container, allowing for a fluent flow, although WAL writes must take into account that the merged WAL batch is built with the sequence numbers.
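The batching logic described above can be illustrated with a simplified, self-contained C++ sketch. This is not Speedb's implementation: the class and helper names are invented for illustration, consecutive group WAL writes are serialized for brevity, and version progression is reduced to tracking the highest completed sequence number rather than advancing strictly in group order.

```cpp
// Simplified sketch of the batched write groups described above (illustrative
// only, not Speedb's code): the first writer in a group becomes the leader and
// performs one merged WAL write for the whole group, memtable inserts run in
// parallel with that WAL write, and the version advances only once every
// memtable write in the group has completed.
#include <condition_variable>
#include <cstdint>
#include <iostream>
#include <memory>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

struct WriteGroup {
  std::vector<std::string> batches;  // payloads merged into one WAL write
  size_t pending_memtable = 0;       // memtable inserts still in flight
  uint64_t last_seq = 0;             // highest sequence number in the group
  bool wal_done = false;
  std::condition_variable cv;        // signals "group WAL write finished"
};

class BatchedWriteSketch {
 public:
  void Write(const std::string& payload) {
    std::unique_lock<std::mutex> lock(mu_);
    std::shared_ptr<WriteGroup> group = open_group_;  // join the open group
    const bool leader = group->batches.empty();       // first writer leads
    const uint64_t seq = ++next_seq_;
    group->batches.push_back(payload);
    group->pending_memtable++;
    group->last_seq = seq;
    lock.unlock();

    // The memtable insert runs in parallel with the group's WAL write.
    ApplyToMemtable(payload, seq);

    if (leader) {
      // Consecutive group WAL writes are serialized here for simplicity;
      // Speedb writes each merged batch to a known offset so that WAL
      // writes themselves can also overlap.
      std::lock_guard<std::mutex> wal_lock(wal_mu_);
      {
        std::lock_guard<std::mutex> seal(mu_);
        // Seal the group: writers arriving from now on start the next group.
        open_group_ = std::make_shared<WriteGroup>();
      }
      WriteMergedWal(*group);  // one merged WAL write for the whole group
      std::lock_guard<std::mutex> done(mu_);
      group->wal_done = true;
      group->cv.notify_all();
    }

    std::unique_lock<std::mutex> relock(mu_);
    // Acknowledge only after both the memtable insert and the WAL write.
    group->cv.wait(relock, [&] { return group->wal_done; });
    if (--group->pending_memtable == 0 && group->last_seq > last_visible_seq_) {
      // The version progresses only once all memtable writes in the group
      // are done (a real implementation advances it strictly in order).
      last_visible_seq_ = group->last_seq;
    }
  }

  uint64_t LastVisibleSeq() {
    std::lock_guard<std::mutex> lock(mu_);
    return last_visible_seq_;
  }

 private:
  void ApplyToMemtable(const std::string&, uint64_t) {
    // Placeholder for a concurrent memtable (skip-list) insert.
  }
  void WriteMergedWal(const WriteGroup& group) {
    // Placeholder for a single WAL write containing all batches of the group.
    (void)group;
  }

  std::mutex mu_;
  std::mutex wal_mu_;
  std::shared_ptr<WriteGroup> open_group_ = std::make_shared<WriteGroup>();
  uint64_t next_seq_ = 0;
  uint64_t last_visible_seq_ = 0;
};

int main() {
  BatchedWriteSketch db;
  std::vector<std::thread> writers;
  for (int i = 0; i < 16; ++i) {
    writers.emplace_back([&db, i] { db.Write("value-" + std::to_string(i)); });
  }
  for (auto& t : writers) t.join();
  std::cout << "last visible sequence: " << db.LastVisibleSeq() << "\n";
}
```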
The table below summarizes the differences between the RocksDB write flow and the Speedb write flow:
Aspect | RocksDB | Speedb |
---|---|---|
General | DB mutex on every write batch | RW lock - write lock taken only when needed |
Writes to the WAL | Append, synchronous writes | Writes to a specific address space, parallel writes |
Checking the triggers | Part of the DB mutex on every write | Background, write lock only when needed |
Switch memtable / switch WAL / trim WAL | Part of the DB mutex on every write | Background, without any locks |
Write rollback | Not needed (the WAL is written first) | Needed when the write to the memtable fails |
To summarize, the new write flow enables parallel writes through the following changes:
1. Speedb changed the DB mutex to a read/write lock.
2. The Speedb write flow allows parallel writes to the memtable and the WAL. The ack is sent only after the data is written to both, but the writes are no longer serial.
3. The WAL in the previous write flow used append-only writes, meaning only a single write was allowed at a time. Speedb's new write flow changes the way data is written to the WAL: it now writes to a specific address in the file, which enables parallel writes to the WAL.
In the 2.4 release, the write flow feature is experimental. It currently consumes slightly more memory than usual and this will be fixed in the next release.
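The sketch below shows how the experimental flow would typically be switched on when opening a database. The option name `use_spdb_writes` is an assumption made for this example; check the exact experimental option exposed by your Speedb release. The rest of the snippet uses the standard RocksDB-compatible API that Speedb exposes.

```cpp
// Sketch of enabling the experimental write flow when opening a database.
// The option name `use_spdb_writes` is an assumption; verify the exact
// experimental option name against the Speedb release you are using.
#include <cassert>
#include <iostream>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.use_spdb_writes = true;  // assumed name of the experimental switch

  rocksdb::DB* db = nullptr;
  rocksdb::Status s =
      rocksdb::DB::Open(options, "/tmp/speedb_write_flow_demo", &db);
  if (!s.ok()) {
    std::cerr << "open failed: " << s.ToString() << std::endl;
    return 1;
  }

  s = db->Put(rocksdb::WriteOptions(), "key", "value");
  assert(s.ok());

  delete db;
  return 0;
}
```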
A comparison of RocksDB 7.7, Speedb 2.3, and Speedb 2.4 with the new write flow was performed.
This was tested with db_bench, using the configuration below:
- Number of objects: 1 billion
- Value size: 64 bytes
- Write buffer size: 268 MB
- Number of threads: 50
- Number of CPU cores: 16
The graph below illustrates the dramatic increase in write performance when using small objects:
