Sorted Hash Memtable
A new memtable type that improves seek and read performance while writes are in progress.
There are currently five memtable representation options, with the default being SkipList.
We designed a new memtable representation that allows parallel writes without requiring synchronization locks, while also improving read and seek operations.
This new memtable is available in version 2.0.0 and later.
The Speedb memtable combines a hash table with sorted vectors (which are created in a background thread).
The default hash size is 1 million buckets, and the maximum size of each vector is 10K keys (both can be tuned as part of the column family creation properties).
The goal of the hash is to achieve direct, O(1) access for read and write operations.
The goal of the sorted vector is to improve seek operations, taking into consideration that writes are still progressing.
A new write is added to the hash and to an atomic write list.
Since the default hash size is 1 million buckets, the probability of more than one write landing in the same bucket is low (overwrite operations aside). This gives us direct access without synchronization.
Elements are inserted into the write list atomically (the anchor pointer is updated with an atomic exchange). The write list is not sorted; sorting occurs in the background.
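As a rough illustration of this write path, here is a minimal, self-contained sketch (illustrative names only, not Speedb's actual classes): a write hashes the key into one of the buckets and pushes a node onto that bucket's chain, and also links the node into a global, unsorted write list by atomically exchanging the list's anchor pointer, so no mutex is taken; a read scans only the matching bucket.

```cpp
#include <atomic>
#include <functional>
#include <string>
#include <vector>

struct Node {
  std::string key;
  std::string value;
  Node* bucket_next = nullptr;            // per-bucket chain
  std::atomic<Node*> list_next{nullptr};  // global, unsorted write list
};

class HashMemtableSketch {
 public:
  explicit HashMemtableSketch(size_t bucket_count = 1000000)
      : buckets_(bucket_count) {}

  void Add(const std::string& key, const std::string& value) {
    Node* n = new Node{key, value};  // leaked for brevity; a real memtable allocates from an arena
    // Lock-free push onto the bucket chain (CAS loop).
    std::atomic<Node*>& bucket = buckets_[BucketOf(key)];
    n->bucket_next = bucket.load(std::memory_order_relaxed);
    while (!bucket.compare_exchange_weak(n->bucket_next, n,
                                         std::memory_order_release,
                                         std::memory_order_relaxed)) {
    }
    // Link into the global write list: the anchor update is a single atomic
    // exchange. (A real implementation must cope with the brief window before
    // list_next is filled in; the background sorted thread consumes this list.)
    Node* prev = write_list_anchor_.exchange(n, std::memory_order_acq_rel);
    n->list_next.store(prev, std::memory_order_release);
  }

  // Lock-free read: direct access to the matching bucket, no lock required.
  bool Get(const std::string& key, std::string* value) const {
    for (const Node* n = buckets_[BucketOf(key)].load(std::memory_order_acquire);
         n != nullptr; n = n->bucket_next) {
      if (n->key == key) {
        *value = n->value;
        return true;
      }
    }
    return false;
  }

 private:
  size_t BucketOf(const std::string& key) const {
    return std::hash<std::string>{}(key) % buckets_.size();
  }

  std::vector<std::atomic<Node*>> buckets_;
  std::atomic<Node*> write_list_anchor_{nullptr};
};
```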
In the background, the sorted thread divides the write list into sorted vectors if at least one of the following conditions occurs:
The vector size limit is reached (default 10,000)
A seek operation occurs
A vector is only valid once it has been sorted.
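As a sketch of this cut-off step (hypothetical names, not Speedb's code): the sorted thread walks the unsorted write list up to a cut-off point, copies the keys into a vector, sorts it, and only then publishes the vector for seek iterators to use.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <vector>

struct WriteNode {
  std::string key;
  WriteNode* next = nullptr;
};

// Copies the keys from `from` (inclusive) up to `until` (exclusive) and sorts
// them. The result is handed to seek iterators only after std::sort completes,
// i.e. a vector becomes valid only once it is sorted.
std::shared_ptr<const std::vector<std::string>> CutSortedVector(
    const WriteNode* from, const WriteNode* until) {
  auto vec = std::make_shared<std::vector<std::string>>();
  for (const WriteNode* n = from; n != until; n = n->next) {
    vec->push_back(n->key);
  }
  std::sort(vec->begin(), vec->end());
  return vec;
}
```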
Reads are done by direct access to the matching memtable hash bucket. No synchronization lock is required.
A seek request signals the sorted thread to cut off a new sorted vector by adding a special entry to the write list.
The seek request waits for the cut-off vector to be sorted. Each seek iterator has its own binary heap over the shared sorted vectors that existed when the seek was created, so there is no dependency on other iterators' progress.
When there are many seek operations, a situation could arise with many small sorted vectors.
In this case, it is the sorted thread's responsibility to merge small contiguous sorted vectors, so that the seek iterator is built from a small number of sorted vectors.
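To make the seek path concrete, here is an illustrative k-way merge over immutable, shared sorted vectors using a binary heap (a sketch under the assumptions above, not Speedb's actual iterator). Each iterator builds its own heap over the vectors that exist when it is created, so it does not depend on other iterators.

```cpp
#include <functional>
#include <memory>
#include <queue>
#include <string>
#include <vector>

using SortedVector = std::shared_ptr<const std::vector<std::string>>;

class SeekIteratorSketch {
 public:
  // Takes shared ownership of the sorted vectors available at creation time.
  // (A real seek would first position each cursor with std::lower_bound on
  // the target key; here iteration simply starts from the smallest key.)
  explicit SeekIteratorSketch(std::vector<SortedVector> vecs)
      : vecs_(std::move(vecs)) {
    for (size_t i = 0; i < vecs_.size(); ++i) {
      if (!vecs_[i]->empty()) heap_.push(Cursor{&(*vecs_[i])[0], i, 0});
    }
  }

  bool Valid() const { return !heap_.empty(); }
  const std::string& key() const { return *heap_.top().key; }

  // Advances to the next key in ascending order across all vectors.
  void Next() {
    Cursor c = heap_.top();
    heap_.pop();
    if (++c.pos < vecs_[c.vec]->size()) {
      c.key = &(*vecs_[c.vec])[c.pos];
      heap_.push(c);
    }
  }

 private:
  struct Cursor {
    const std::string* key;
    size_t vec;
    size_t pos;
    bool operator>(const Cursor& other) const { return *key > *other.key; }
  };

  std::vector<SortedVector> vecs_;
  // Min-heap ordered by key: the smallest current key is always on top.
  std::priority_queue<Cursor, std::vector<Cursor>, std::greater<Cursor>> heap_;
};
```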
The creation of this memtable is expensive (see the limitations below). As such, we've created a per-CF background thread that creates a standby memtable, which allows us to switch memtables without wasting time; the sketch below illustrates the idea.
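This is a conceptual sketch of the standby idea only (hypothetical names, not Speedb's implementation): a background task builds the next memtable ahead of time, so the switch itself is just a pointer move rather than an expensive construction.

```cpp
#include <future>
#include <memory>
#include <mutex>

struct Memtable {
  // Construction is expensive (e.g., allocating ~1,000,000 buckets).
  explicit Memtable(size_t bucket_count) : bucket_count(bucket_count) {}
  size_t bucket_count;
};

class MemtableSwitcher {
 public:
  explicit MemtableSwitcher(size_t bucket_count) : bucket_count_(bucket_count) {
    active_ = std::make_unique<Memtable>(bucket_count_);
    PrepareStandby();
  }

  // Called when the active memtable is full: the standby built in the
  // background is promoted, and a new standby is started right away, so the
  // switch does not pay the construction cost.
  std::unique_ptr<Memtable> Switch() {
    std::lock_guard<std::mutex> lock(mu_);
    std::unique_ptr<Memtable> old = std::move(active_);
    active_ = standby_future_.get();  // waits only if the standby is not ready yet
    PrepareStandby();
    return old;  // the old memtable goes on to be flushed
  }

 private:
  void PrepareStandby() {
    // Build the next memtable on a background thread.
    standby_future_ = std::async(std::launch::async, [bc = bucket_count_] {
      return std::make_unique<Memtable>(bc);
    });
  }

  size_t bucket_count_;
  std::mutex mu_;
  std::unique_ptr<Memtable> active_;
  std::future<std::unique_ptr<Memtable>> standby_future_;
};
```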
High performance in a heavy write workload
A write request during a seek request is not blocked.
Sorted vectors are used on each seek operation without being copied
If no write occurs between two seek operations, both seek iterators will have the same sorted vector
An empty (newly constructed) memtable is about 8 MB in size (this depends on the bucket count; roughly 8 bytes per bucket at the default of 1,000,000 buckets).
Memtable creation is expensive (allocating 1,000,000 buckets takes time).
The new sorted hash memtable was tested with db_bench, using the configuration below.
The following benchmark results compare the performance of RocksDB v7.2.2 with SkipList against Speedb with the new sorted hash memtable.
Note: You can read more details about how we test performance in the Performance Testing chapter.
Configuration:
Number of objects: 1 billion
Value size: 1KB
Write buffer size: 64MB (default)
Number of threads: 4
Max number of write buffers: 4
Number of CFs: 1 (default)
Number of CPU cores: 16
Compression mode: none
Up to 155% improvement in overwrite workload
Up to 68% improvement in 50% random read workload
Up to 15% improvement in random seek workload
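For reference, the configuration above maps to a db_bench invocation along the following lines (this is our illustrative mapping using standard db_bench flags, shown here for the overwrite workload; it is not necessarily the exact command used for the published results):

```
./db_bench --benchmarks=overwrite \
  --num=1000000000 \
  --value_size=1024 \
  --write_buffer_size=67108864 \
  --max_write_buffer_number=4 \
  --threads=4 \
  --compression_type=none \
  --memtablerep=speedb.HashSpdRepFactory
```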
./db_bench --memtablerep=speedb.HashSpdRepFactory

This will use the default bucket size of 1,000,000. In order to change the default bucket size, use the following syntax:

./db_bench --memtablerep=speedb.HashSpdRepFactory:1000

(where 1000 is the bucket size)
When using Speedb as a library, the memtable factory should be set in the DB options object:
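A minimal sketch of what that might look like in application code, assuming the Speedb build exposes a factory constructor such as NewHashSpdRepFactory (the exact function name and header may differ between releases; check your Speedb headers):

```cpp
#include <rocksdb/db.h>
#include <rocksdb/options.h>

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;

  // Hypothetical factory constructor -- verify the exact symbol and header in
  // your Speedb release. 1,000,000 is the default bucket count described above.
  options.memtable_factory.reset(rocksdb::NewHashSpdRepFactory(1000000));

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/speedb_example", &db);
  if (!s.ok()) {
    return 1;
  }
  // ... use the DB ...
  delete db;
  return 0;
}
```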