
Sorted Hash Memtable

A new memtable type that improves seek and read performance under a concurrent write workload.


Overview

RocksDB currently offers five memtable representation options, with SkipList as the default.

We designed a new memtable representation that supports parallel writes without synchronization locks, while also improving read and seek operations.

This new memtable is available in version 2.0.0 and later.

Algorithm

The Speedb memtable combines a hash table with sorted vectors (which are created by a background thread).

The default hash size is 1 million buckets, and the maximum size of each sorted vector is 10,000 keys (both can be tuned in the column family creation options).

The goal of the hash is to provide direct, O(1) access for read and write operations.

The goal of the sorted vectors is to improve seek operations while writes are still in progress.

A new write is added to both the hash and an atomic write list.

Write Operation

Hash insertion:

Since the default hash size is 1 million buckets, the probability of two writes landing in the same bucket is low (overwrites aside). This gives us direct access without synchronization.

Write list insertion:

Elements are inserted into the write list atomically (the anchor pointer is updated with an atomic exchange). The write list itself is not sorted; sorting happens in the background.
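The two insertions above can be sketched as lock-free compare-and-swap loops. This is a simplified illustration with hypothetical names (Node, SketchMemtable), not the actual Speedb implementation:

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <functional>
#include <string>

// Simplified sketch of the write path: a fixed-size bucket array for O(1)
// access plus an unsorted write list whose anchor pointer is updated
// atomically, so no mutex is taken on either insertion.
struct Node {
    std::string key, value;
    std::atomic<Node*> hash_next{nullptr};  // chain inside one hash bucket
    std::atomic<Node*> list_next{nullptr};  // position in the unsorted write list
};

constexpr std::size_t kNumBuckets = 1 << 20;  // default: ~1 million buckets

struct SketchMemtable {
    std::array<std::atomic<Node*>, kNumBuckets> buckets{};
    std::atomic<Node*> write_list_head{nullptr};

    static std::size_t BucketFor(const std::string& key) {
        return std::hash<std::string>{}(key) % kNumBuckets;
    }

    void Insert(Node* n) {
        // 1. Hash insertion: push onto the bucket chain with a CAS loop.
        //    With ~1M buckets the chain is almost always empty.
        std::atomic<Node*>& slot = buckets[BucketFor(n->key)];
        Node* head = slot.load(std::memory_order_acquire);
        do {
            n->hash_next.store(head, std::memory_order_relaxed);
        } while (!slot.compare_exchange_weak(head, n, std::memory_order_release));

        // 2. Write-list insertion: atomically swing the anchor pointer.
        Node* old = write_list_head.load(std::memory_order_acquire);
        do {
            n->list_next.store(old, std::memory_order_relaxed);
        } while (!write_list_head.compare_exchange_weak(old, n,
                                                        std::memory_order_release));
    }
};
```

Because both structures are updated with CAS loops rather than a mutex, concurrent writers never block each other; at worst a writer retries its exchange.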

Sorted thread:

In the background, the sorted thread splits the write list into sorted vectors when at least one of the following conditions is met:

  1. The vector size limit is reached (default 10,000)

  2. A seek operation occurs

The vector will only be valid if it is sorted.
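The cutoff step can be sketched as follows: the sorted thread atomically detaches the current write list (so new writes continue onto a fresh, empty list) and sorts the detached entries into a vector. Simplified, hypothetical code, not the actual Speedb source:

```cpp
#include <algorithm>
#include <atomic>
#include <string>
#include <vector>

// Minimal node type for the unsorted write list (illustration only).
struct WNode {
    std::string key;
    WNode* next;
};

constexpr std::size_t kVectorSizeLimit = 10'000;  // default per-vector key limit

// Atomically detach everything written so far and return it as a sorted
// vector. Triggered when the size limit is reached or a seek arrives.
std::vector<std::string> CutSortedVector(std::atomic<WNode*>& write_list_head) {
    // Cut off: the anchor is exchanged, so new writes go onto an empty list.
    WNode* cut = write_list_head.exchange(nullptr, std::memory_order_acq_rel);

    std::vector<std::string> keys;
    for (WNode* n = cut; n != nullptr; n = n->next)
        keys.push_back(n->key);

    // The vector only becomes valid for seeks once it is sorted.
    std::sort(keys.begin(), keys.end());
    return keys;
}
```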

Read Operation

A read goes directly to the matching memtable hash bucket. No synchronization lock is required.
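A lookup can therefore be sketched as a single bucket load plus a short chain scan. Hypothetical simplified types (RNode, BucketArray), not the actual Speedb code:

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <functional>
#include <string>

// Minimal bucket entry for illustration; chains are almost always length 0 or 1.
struct RNode {
    std::string key, value;
    RNode* hash_next;
};

constexpr std::size_t kBuckets = 1 << 20;
using BucketArray = std::array<std::atomic<RNode*>, kBuckets>;

// Lock-free read: hash the key to its bucket, acquire-load the head, scan.
const RNode* Get(const BucketArray& buckets, const std::string& key) {
    std::size_t b = std::hash<std::string>{}(key) % kBuckets;
    for (const RNode* n = buckets[b].load(std::memory_order_acquire);
         n != nullptr; n = n->hash_next)
        if (n->key == key) return n;
    return nullptr;
}
```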

Seek Operation

The seek request signals the sorted thread to divide a new sorted vector by adding a special entry to the write list.

Note: New writes are not suspended!

The seek request waits for the cutoff vector to be sorted. Each seek iterator has its own binary heap over the shared sorted vectors that existed when the seek was created, so there is no dependency on other iterators' progress.

If no write occurs, the last sorted vector is used (a new one isn't created).
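The per-iterator binary heap behaves like a multi-way merge over the sorted vectors: each vector is binary-searched for the seek key, and the heap always exposes the smallest current key. A simplified sketch with hypothetical names (keys only, not the actual Speedb iterator):

```cpp
#include <algorithm>
#include <queue>
#include <string>
#include <vector>

// Illustration of a seek iterator: a min-heap of cursors, one per sorted
// vector, yielding keys in globally sorted order.
class SeekIterator {
    struct Cursor {
        const std::vector<std::string>* vec;
        std::size_t pos;
        const std::string& key() const { return (*vec)[pos]; }
    };
    struct Greater {  // invert comparison so priority_queue acts as a min-heap
        bool operator()(const Cursor& a, const Cursor& b) const {
            return a.key() > b.key();
        }
    };
    std::priority_queue<Cursor, std::vector<Cursor>, Greater> heap_;

public:
    // Snapshot the sorted vectors that exist at creation time, so this
    // iterator does not depend on other iterators' progress.
    SeekIterator(const std::vector<std::vector<std::string>>& sorted_vectors,
                 const std::string& seek_key) {
        for (const auto& v : sorted_vectors) {
            // Binary-search each vector for the first key >= seek_key.
            std::size_t pos =
                std::lower_bound(v.begin(), v.end(), seek_key) - v.begin();
            if (pos < v.size()) heap_.push(Cursor{&v, pos});
        }
    }
    bool Valid() const { return !heap_.empty(); }
    const std::string& Key() const { return heap_.top().key(); }
    void Next() {
        Cursor c = heap_.top();
        heap_.pop();
        if (++c.pos < c.vec->size()) heap_.push(c);
    }
};
```

Note that the vectors are only read, never copied: many iterators can share the same sorted vectors.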

Merge sorted vectors:

When there are many seek operations, many small sorted vectors can accumulate.

In this case, the sorted thread merges small contiguous sorted vectors, so each seek iterator is created from only a small number of sorted vectors.
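The merge pass can be sketched as folding adjacent undersized vectors together with a standard sorted merge. Hypothetical simplified code (the size threshold and policy are assumptions for illustration):

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Merge runs of adjacent small sorted vectors so a seek iterator only has to
// heap-merge a few larger ones. Illustration only, not the Speedb source.
std::vector<std::vector<std::string>> MergeSmallVectors(
        std::vector<std::vector<std::string>> vectors, std::size_t min_size) {
    std::vector<std::vector<std::string>> out;
    for (auto& v : vectors) {
        // Fold this vector into its predecessor when both are undersized;
        // otherwise it starts a new output vector.
        if (!out.empty() && out.back().size() < min_size && v.size() < min_size) {
            std::vector<std::string> merged;
            std::merge(out.back().begin(), out.back().end(),
                       v.begin(), v.end(), std::back_inserter(merged));
            out.back() = std::move(merged);
        } else {
            out.push_back(std::move(v));
        }
    }
    return out;
}
```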

Preparing the memtable

As mentioned above, creating this memtable is expensive. We therefore added a per-column-family background thread that prepares a standby memtable, which allows switching memtables without wasting time.
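The standby idea can be sketched with a background task that pre-builds the next table so the switch itself is just a pointer swap. Hypothetical simplified types (BigTable, TableSwitcher), not the actual Speedb mechanism:

```cpp
#include <cstddef>
#include <future>
#include <memory>
#include <vector>

// Stand-in for an expensive-to-construct memtable (~1M bucket slots).
struct BigTable {
    std::vector<void*> buckets;
    BigTable() : buckets(std::size_t{1} << 20, nullptr) {}
};

class TableSwitcher {
    std::unique_ptr<BigTable> active_;
    std::future<std::unique_ptr<BigTable>> standby_;

    static std::future<std::unique_ptr<BigTable>> BuildAsync() {
        // Construct the next table on a background thread.
        return std::async(std::launch::async,
                          [] { return std::make_unique<BigTable>(); });
    }

public:
    TableSwitcher()
        : active_(std::make_unique<BigTable>()), standby_(BuildAsync()) {}

    // Swap in the pre-built standby table, then immediately start preparing
    // the next one; the write path never pays the construction cost.
    void Switch() {
        active_ = standby_.get();
        standby_ = BuildAsync();
    }
    BigTable* active() { return active_.get(); }
};
```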

Advantages

  1. High performance under heavy write workloads

  2. Write requests are not blocked during seek requests

  3. Sorted vectors are shared by seek operations without being copied

  4. If no write occurs between two seek operations, both seek iterators use the same sorted vectors

Disadvantages

  1. The memtable's constant memory overhead is about 8 MB (one 8-byte bucket pointer times the default 1,000,000 buckets)

  2. Memtable creation is expensive (allocating 1,000,000 buckets takes time)

Performance Results

The new sorted hash memtable was tested with db_bench, using the configuration below.

The following benchmark results compare RocksDB v7.2.2 with its default SkipList memtable against Speedb with the new sorted hash memtable.

Note: For more details about how we test performance, see the Performance Testing chapter.

Configuration:

  • Number of objects: 1 billion

  • Value size: 1KB

  • Write buffer size: 64 MB (default)

  • Number of Threads: 4

  • Max number of write buffers: 4

  • Number of CF: 1 (default)

  • Number of CPU cores: 16

  • Compression mode: none

Up to 155% improvement in overwrite workload

Up to 68% improvement in 50% random read workload

Up to 15% improvement in random seek workload

Usage

db_bench/db_stress

./db_bench --memtablerep=speedb.HashSpdRepFactory

This uses the default bucket size (1,000,000). To change the default bucket size, use the following syntax:

./db_bench --memtablerep=speedb.HashSpdRepFactory:1000

(where 1000 is the bucket size)

Configuring in the user’s application code

This should be set in the DB Options object:

Options options;
ConfigOptions config_options;
config_options.ignore_unknown_options = false;
config_options.ignore_unsupported_options = false;
// Install the Speedb sorted hash memtable factory in the DB options
// (ConfigOptions comes from rocksdb/convenience.h, the factory API from rocksdb/memtablerep.h):
Status s = MemTableRepFactory::CreateFromString(
    config_options, "speedb.HashSpdRepFactory", &options.memtable_factory);
assert(s.ok());

[Figures: RocksDB memtable vs. Speedb sorted hash memtable — 100% overwrite workload, 50% random read, and 100% random seek results]