
Log Parser

Overview

Speedb's Log Parser is a tool that may be used to parse and process Speedb and RocksDB log files.
The tool extracts useful information from these logs and aids users in gaining insights about their systems with ease. It is expected to be a valuable tool for novices and experts alike.
The tool is written in the Python language and consists of a set of Python scripts.
It resides in a GitHub repository (https://github.com/speedb-io/log-parser). There is a README.md file in the root folder of the repository that describes how to install the tool, contribute to its development, etc.

Terminology

The following terms and abbreviations are used throughout this document:
· CF: Column-Family
· DB / DB-Wide: Applicable to the entire DB rather than to a specific CF.

Major Capabilities

  • Parses a single Speedb or RocksDB log file (parsing multiple logs is on the roadmap)
  • Parses the log and processes the information about the following elements and entities (detailed description may be found in the sections below):
    • Metadata information about the instance that has generated this log file (e.g., library version generating the log) (General Section).
    • Speedb / RocksDB Options (Options Section):
      • All of the options with their values (db-wide and per column family).
      • Displays the difference between the options in the log and an applicable baseline.
    • Data about the size of the DB (DB-Size Section).
    • Flushes and Compactions (Flushes and Compactions sections respectively)
    • Information regarding DB read operations (Reads Section).
    • Information regarding seek DB operations (Seeks Section).
    • Warnings issued by the DB (Warnings Section).
    • Block cache statistics (Block-Cache-Stats Section)
    • Various Statistics: Counters, histograms, compaction stats, etc.
  • The tool generates multiple outputs. Details may be found in the sections that follow. The outputs include:
    • A short console output (the default output format)
    • A JSON file with detailed information.
    • CSV files containing information about the counters, counters histograms, compactions, and flushes statistics.
    • A detailed console output (the JSON file printed to the console).

Usage

udi@udi-speedb:~/log-parser$ python3 log_parser.py -h
usage: log_parser.py [-h] [-c {short,long}] [-j] [-o OUTPUT_FOLDER] [-l] log-file-path
positional arguments:
log-file-path A path to a log file to parse (default: None)
optional arguments:
-h, --help show this help message and exit
-c {short,long}, --console {short,long}
Print to console a summary (short) or a detailed (long) output (default: None)
-j, --generate-json Optionally generate a JSON file in the output folder's run-folder, with detailed information. If generated, it will be called log.json.(default: False)
-o OUTPUT_FOLDER, --output-folder OUTPUT_FOLDER
The name of the folder where output files will be stored in SUB-FOLDERS named run_dddd, where 'dddd' is the run number (default: output_files)
-l, --generate-log Generate a log file for the parser's log messages (default: False)
Notes:
- The default it to print to the console in a short format.
- It is possible to specify both json and console outputs. Both will be generated.

Installation / Getting Started / Prerequisites

Please see the README.md file in the tool’s repository for this information.

Tool’s Outputs Description

Abbreviations, Conventions and Terms used in the log parser’s outputs

· tool: The log parser
· Log file / log: A RocksDB / Speedb information log. Not the WAL (Write-Ahead-Log)
· Parsed Log: The log file given to the log parser for parsing.
· db: Database
· db options: Options that are not specific to any cf. Applicable to the entire db.
· cf options: Options that are specific to an individual cf.
· cf (cf-s): Column Family (Column Families)
· Units:
o Size Units: B (Bytes), KB (Kilobytes), MB (Megabytes), GB (Gigabytes), TB (Terabytes)
o Numbers Units: K (Thousands), M (Millions), G (Billions).

Timestamps

Log timestamps (e.g., ‘2023/01/04-08:54:59.130735’) are in local time. The resolution is microseconds. All of the information displayed by the tool is using the timestamps from the parsed log as is.
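
As a minimal sketch of how such a timestamp can be read in Python (this is not taken from the tool's source; the function name is illustrative), the standard library's strptime handles the format directly:

```python
from datetime import datetime

# Format of the timestamp at the start of a log line,
# e.g. '2023/01/04-08:54:59.130735' (microsecond resolution).
LOG_TIME_FORMAT = "%Y/%m/%d-%H:%M:%S.%f"


def parse_log_time(text):
    """Return a naive datetime; no time zone is attached, the value is used as is."""
    return datetime.strptime(text, LOG_TIME_FORMAT)


timestamp = parse_log_time("2023/01/04-08:54:59.130735")
print(timestamp.isoformat())
```

The result is deliberately a naive datetime, which mirrors the statement above that the log's timestamps are used as they appear.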

Availability of Data

Log level

The info_log_level db option controls the minimal level of issued log traces that will actually be written to the log (INFO_LEVEL by default in production library builds). Using a higher (less verbose) logging level, such as WARN_LEVEL or ERROR_LEVEL, results in a log file that contains almost no useful information. In that case, the tool (as well as the log file itself) will be of little use in practice.

Availability of the information to parse and process

The tool relies solely on the information contained in the parsed log file. Consequently, the information it displays reflects that. For example, the tool can’t display the average size of a key or a value in the entire DB, or the total number of keys in the DB, as this information is not printed to the log file.
Other common cases in which data may not be available:
· Not having statistics (a configuration option)
· Rolled logs (see “Log Rolling” below)
· The number of cf-s (see “Number of Column Families and its implications” section below)
· Lack of applicable activity. For example, no flush or compaction for a cf.
When data is not available, the output of the log parser will reflect that. For example:
· “Filter Data Not Available”: When there is no information about the filter policy that was configured for a cf.
· "Data Unavailable (No Stats Available)": When the DB was configured not to use statistics.
In the sections that follow, where applicable, there is a description of the information elements in the log file that were used to generate the associated output. For example:
· Flush events and associated flush log traces
· DB Statistics Dump
· Counters and Histogram dumps

Options

DB Options and CF Options

There are options that may be configured individually for every cf. These are called cf options. Options that are not specific to a cf are called DB (or db-wide) options. DB Options apply to the entire DB.

Defaults

Speedb and RocksDB come with defaults for every option. A user may override any of these options when opening a db. A user may also override any of the cf options when creating a new cf explicitly.

Writing the options to the log

The options are written to the log in multiple cases:
· When a db is opened, the db and cf options are written to the log. This may occur when opening a new DB, or when recovering an existing db (with its existing cf-s and potentially newly created cf-s) from persistent storage.
· When a new cf is created, its options are written to the log.
· When a log is rolled, the db and cf options are written to the new log.
However, please see below for important information regarding applicable limitations.

Log Rolling

Log rolling (also called log rotation) is the process of stopping writes to the active log file, renaming it, and opening a new log file to which logging is directed until the time comes to roll/rotate the log again. There are a few options that control this mechanism. In this document, log files that were rolled are called rolled logs.
At the time of this writing, the defaults are to use a single log file (max_log_file_size option), effectively avoiding log rolling altogether. Some more information may be found here.

Opening an existing DB (DB Recovery)

When the DB is opened as part of db recovery (opening an existing db), a new log file is created. This is unrelated to log rolling described above, but results in multiple log files nevertheless.

Number of Column Families and its implications

A user may create any number of column families (cf-s).
However, while opening a DB, only the options of the first 10 (hard coded) cf-s will be printed to the log.
The log will contain the following information for the 11th cf onwards:
2023/06/06-12:56:30.438743 322453 [/db_impl/db_impl.cc:3317] Created column family [column_family_name_000009] (ID 9)
2023/06/06-12:56:30.806215 322453 [/column_family.cc:625] (skipping printing options)
2023/06/06-12:56:30.806365 322453 [/db_impl/db_impl.cc:3317] Created column family [column_family_name_000010] (ID 10)
2023/06/06-12:56:31.191732 322453 [/column_family.cc:625] (skipping printing options)
2023/06/06-12:56:31.191836 322453 [/db_impl/db_impl.cc:3317] Created column family [column_family_name_000011] (ID 11)
2023/06/06-12:56:31.635102 322453 [/column_family.cc:625] (skipping printing options)
2023/06/06-12:56:31.635291 322453 [/db_impl/db_impl.cc:3317] Created column family [column_family_name_000012] (ID 12)
2023/06/06-12:56:32.053358 322453 [/column_family.cc:625] (skipping printing options)
2023/06/06-12:56:32.053527 322453 [/db_impl/db_impl.cc:3317] Created column family [column_family_name_000013] (ID 13)
2023/06/06-12:56:32.508987 322453 [/column_family.cc:625] (skipping printing options)
2023/06/06-12:56:32.509108 322453 [/db_impl/db_impl.cc:3317] Created column family [column_family_name_000014] (ID 14)
So, we know that there are additional cf-s, and we know their names and id-s, but we do not know their options.
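
The following is a minimal sketch of how the cf-s whose options were skipped could be collected from such traces. It is not the tool's implementation. It assumes, based on the timestamps in the excerpt above, that each "(skipping printing options)" trace is immediately followed by the "Created column family" trace of the cf it refers to; the function and pattern names are illustrative.

```python
import re

# Matches e.g. 'Created column family [column_family_name_000010] (ID 10)'
CREATED_CF = re.compile(r"Created column family \[(?P<name>[^\]]*)\] \(ID (?P<id>\d+)\)")
SKIPPED_MARKER = "(skipping printing options)"


def find_cfs_without_options(lines):
    """Return (name, id) for every created cf whose options were not printed.

    Assumption: the skip trace precedes the 'Created column family' trace
    of the same cf.
    """
    skipped = []
    pending_skip = False
    for line in lines:
        if SKIPPED_MARKER in line:
            pending_skip = True
            continue
        match = CREATED_CF.search(line)
        if match:
            if pending_skip:
                skipped.append((match.group("name"), int(match.group("id"))))
            pending_skip = False
    return skipped


sample = [
    "2023/06/06-12:56:30.438743 322453 [/db_impl/db_impl.cc:3317] Created column family [column_family_name_000009] (ID 9)",
    "2023/06/06-12:56:30.806215 322453 [/column_family.cc:625] (skipping printing options)",
    "2023/06/06-12:56:30.806365 322453 [/db_impl/db_impl.cc:3317] Created column family [column_family_name_000010] (ID 10)",
]
print(find_cfs_without_options(sample))
```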

Unavailability of CF Names in Rolled Logs (Auto-Generated CF Names)

When a new DB is opened, or when a cf is created explicitly (via the CreateColumnFamily or CreateColumnFamilies APIs), this is what the log contains (only the first lines are shown):
2023/06/06-12:56:28.376394 322453 [/column_family.cc:620] --------------- Options for column family [default]:
2023/06/06-12:56:28.376399 322453 Options.comparator: leveldb.BytewiseComparator
2023/06/06-12:56:28.376400 322453 Options.merge_operator: 0x7fdfd8068640
2023/06/06-12:56:28.376402 322453 Options.compaction_filter: None
2023/06/06-12:56:28.376403 322453 Options.compaction_filter_factory: None
The first line contains the name of the cf (“default” in this case)
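
A minimal sketch of how such a block could be associated with its cf, assuming the line layout shown in the excerpt above (the regular expressions and the function name are illustrative, not the tool's own):

```python
import re

# Header line, e.g. '--------------- Options for column family [default]:'
CF_HEADER = re.compile(r"Options for column family \[(?P<name>[^\]]*)\]:")
# Option line, e.g. 'Options.comparator: leveldb.BytewiseComparator'
OPTION_LINE = re.compile(r"\bOptions\.(?P<key>[^:\s]+)\s*:\s*(?P<value>.*)$")


def parse_cf_options(lines):
    """Map cf name -> {option name: value} for options that follow a header line."""
    options = {}
    current = None
    for line in lines:
        header = CF_HEADER.search(line)
        if header:
            current = options.setdefault(header.group("name"), {})
            continue
        option = OPTION_LINE.search(line)
        if option and current is not None:
            current[option.group("key")] = option.group("value").strip()
    return options


sample = [
    "2023/06/06-12:56:28.376394 322453 [/column_family.cc:620] --------------- Options for column family [default]:",
    "2023/06/06-12:56:28.376399 322453 Options.comparator: leveldb.BytewiseComparator",
    "2023/06/06-12:56:28.376402 322453 Options.compaction_filter: None",
]
print(parse_cf_options(sample))
```

In this sketch, option lines that appear before any header are ignored, which is exactly the situation that makes the association impossible when the header line is absent.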
However, when a new log file is created as part of log rolling, this first line is not printed to the log. As a result, the tool has no simple and safe way of knowing to which cf the options belong. As the “default” cf always exists and is the first one, the tool assumes that the first set of options belongs to “default”; it makes no such assumption for the rest.
In addition, there is no information about the skipped cf-s (those that are > 10, see above).
The rest of the log contains traces that may contain the names of the cf-s, but that is of no use in the association of options with their cf-s.
The log parser handles this by auto-generating cf names for the cf-s whose names are unknown, but for which there are options.
The auto generated names have the following format: UNKNOWN-CF-#<Number>
The following snapshot shows an example:
As a consequence of what is described above, when the tool parses a log that is created as part of log rolling and there are more than 10 cf-s, the tool doesn’t know how many cf-s there are.
This will be indicated to the user by displaying the following information in the console short output, or the json’s General object:
Num CF-s (*) : Can't be accurately determined
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Column Family | Size | Avg. Key Size | Avg. Value Size | Compaction Style | Compression | Filter-Policy |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
| default | Data Unavailable | 24 B | 100 B | kCompactionStyleLevel | Snappy | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000001 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000002 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000003 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000004 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000005 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000006 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000007 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000008 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000009 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000010 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000012 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000013 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000014 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

(*) Please see the 'Ability to determine the number of cf-s' section in the log parser's documentation for more information

The Output Folder

Every run of the log parser generates multiple output files. All of the files generated in a single run are placed under a single folder. The folder’s location and name are determined as follows:
· The user may specify a parent folder via the ‘-o’ command line parameter.
· If the user doesn’t specify an output parent folder, ‘output_files’ will be used by default.
· If the parent folder doesn’t exist, it will be created.
· Under the parent folder, the parser will create a folder named ‘run_dddd’ and place the run’s output files under that folder. ‘dddd’ are 4 digits that compose the run’s number. They are incremented every run and wrap around if reaching ‘9999’.
· If the parent folder already contains ‘run_dddd’ folders, the tool will detect the largest one (converting ‘dddd’ to its numeric equivalent - N), create a new folder ‘run_mmmm’ where int('mmmm') = N+1
An example to put it all together:
The user specified ‘-o Output’ as an argument when running the tool. There is no folder named ‘Output’ => The tool will create a new folder named ‘Output’, and the output files will be under ‘Output/run_0001/’
The user re-runs the tool, again specifying ‘-o Output’:
The run’s output files will be under ‘Output/run_0002’
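
The folder selection rules above can be sketched as follows. This is an illustration rather than the tool's code: the function name is hypothetical, and the wrap-around target (restarting at run_0001 after run_9999) is an assumption, since the text above only states that the number wraps around.

```python
import re
from pathlib import Path

RUN_FOLDER = re.compile(r"^run_(\d{4})$")


def next_run_folder(parent="output_files"):
    """Create and return the next 'run_dddd' folder under the parent folder."""
    parent_path = Path(parent)
    # The parent folder is created if it does not exist yet.
    parent_path.mkdir(parents=True, exist_ok=True)

    largest = 0
    for entry in parent_path.iterdir():
        match = RUN_FOLDER.match(entry.name)
        if entry.is_dir() and match:
            largest = max(largest, int(match.group(1)))

    # Assumption: after run_9999 the numbering restarts at run_0001.
    run_number = 1 if largest >= 9999 else largest + 1
    run_path = parent_path / f"run_{run_number:04d}"
    run_path.mkdir(exist_ok=True)
    return run_path
```

Calling next_run_folder("Output") twice on a machine without an ‘Output’ folder would yield ‘Output/run_0001’ and then ‘Output/run_0002’, matching the example above.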

Console Output

There are 2 console output flavors:
1. Short: A concise summary of the major information elements. May be used to get a basic understanding of the db’s operation according to the parsed log.
2. Detailed: The contents of the JSON file, printed to the console (please see the JSON file description section for more details). This option is intended for power users who wish to use only the console. These users will probably want to use JSON command-line filtering tools such as jq.

Console Short Output Description

Example Output

udi@udi-speedb:~/log-parser$ python3 log_parser.py test/input_files/LOG_speedb
Log file: file:///home/udi/log-parser/test/input_files/LOG_speedb
Baseline Log: file:///home/udi/log-parser/baseline_logs/LOG-speedb-2.3.0
Counters CSV Is in file:///home/udi/log-parser/output_files/run_0016/counters.csv
Human Readable Counters Histograms CSV Is in file:///home/udi/log-parser/output_files/run_0016/histograms_human_readable.csv
Compactions Stats CSV Is in file:///home/udi/log-parser/output_files/run_0016/compactions_stats.csv
Compactions CSV Is in file:///home/udi/log-parser/output_files/run_0016/compactions.csv
Flushes CSV Is in file:///home/udi/log-parser/output_files/run_0016/flushes.csv

Parsing of: /home/udi/log-parser/test/input_files/LOG_speedb
============================================================
Name : /home/udi/log-parser/test/input_files/LOG_speedb
Start Time : 2023/04/09-12:27:35.901086
End Time : 2023/04/09-12:37:36.500378
Log Time Span : 0d 00h 10m 00s
Creator : Speedb
Version : 2.4.0 [8ef7469a7f8d1d100a7a187b3c682a48777b7047]
DB Size (*) : 111.6 MB
Num Keys Written : 48.0 M
Avg. Written Key Size : 24 B
Avg. Written Value Size : 64 B
Num Warnings : 0
Error Messages : No Error Messages
Fatal Messages : No Fatal Messages
Ingest (*) : 3.9 GB
Ingest Rate : 6.76 MBps
Statistics : Available
Writes : 10.0% (48033559/480337639)
Reads : 90.0% (432304080/480337639)
Seeks : 0 (No Seek Operations)
Deleted (Flushed) Entries: 0 (No Delete Operations)
Num CF-s (**) : 3
---------------------------------------------------------------------------------------------------------------------------------------
| Column Family | Size (*) | Avg. Key Size | Avg. Value Size | Compaction Style | Compression | Filter-Policy |
---------------------------------------------------------------------------------------------------------------------------------------
| default | 40.8 MB | 24 B | 64 B | kCompactionStyleLevel | NoCompression | bloomfilter (10.0) |
| column_family_name_000001 | 38.3 MB | 24 B | 64 B | kCompactionStyleLevel | NoCompression | bloomfilter (10.0) |
| column_family_name_000002 | 32.6 MB | 24 B | 64 B | kCompactionStyleLevel | NoCompression | bloomfilter (10.0) |
---------------------------------------------------------------------------------------------------------------------------------------

(*) Data is calculated at: 2023/04/09-12:37:27.398344
(**) Please see the log parser's documentation for more information

Output Fields Description

Each entry below gives the field name and its meaning, followed by comments (where applicable) as sub-items.
· Title (“Parsing of:…”): Parsed Log full path
· Name: Parsed Log full path
· Start Time: Time of first log entry
· End Time: Time of last log entry
· Log Time Span: The time difference between the Start Time and the End Time
o The value is expressed in number of days, hours, minutes, and seconds.
· Creator: The creator of the library that generated the parsed log
o Currently may be either Speedb or RocksDB
· Version: The library’s version [Git Hash]
· DB Size (*): The total size of all of the SST files in the database
o The point in time at which the value was calculated is given below the per-cf table: “(*) Data is calculated ….”
· Num Keys Written: The total number of keys written to the DB
o The value of the rocksdb.number.keys.written counter if available; otherwise, extracted from cumulative writes information (DB Stats)
o If none of the above is available: “Data Unavailable”
· Avg. Written Key Size: The average size of a written key
o Calculated from information in table_file_creation events
o If there are no such events: “Data Unavailable”
· Avg. Written Value Size: The average size of a written value
o Calculated from information in table_file_creation events
o If there are no such events: “Data Unavailable”
· Error Messages: The messages in severity “ERROR” as they appear in the log
o “No Error Messages” if there are no errors in the log
· Fatal Messages: The messages in severity “FATAL” as they appear in the log
o “No Fatal Messages” if there are no fatal messages in the log
· Ingest (*): Total ingest
o Calculated from cumulative writes information.
o The point in time at which the value was calculated is given below the per-cf table: “(*) Data is calculated ….”
o If data is not available: “No Ingest Info Available”
· Ingest Rate: Ingest rate in megabytes per second
o Calculated from cumulative writes information.
o The point in time at which the value was calculated is given below the per-cf table: “(*) Data is calculated ….”
o If data is not available: “No Ingest Info Available”
· Statistics: Whether statistics are available or not
· Writes: The total number of write operations
o Displayed as <Percentage> (<Count> / <Total Ops>)
o Total Ops: Total number of operations (write + read + seek)
o Count: Total number of writes
o Percentage: Percentage of writes out of Total Ops
o Calculated from the rocksdb.number.keys.written, rocksdb.number.keys.read, and rocksdb.number.db.seek counters.
o If statistics are not available: “Data Unavailable (No Statistics)”
· Reads: The total number of read operations (same as for the Writes field, but for reads)
o Same as for Writes
· Seeks: The total number of seek operations (same as for the Writes field, but for seeks)
o Same as for Writes
· Deletes: <Percentage> (<Num Deletes> / <Num Entries>)
o Num Entries: The total number of flushed entries
o Num Deletes: The total number of deletes in
o Percentage: Percentage of deletes out of Num Entries
o Information is gathered from Flush events in the log.
o If no such events exist: “Data Unavailable (No Flush Started Events)”
· Num CF-s: Number of cf-s or “Can't be accurately determined”
o The log parser may not be able to know for sure the number of cf-s in the DB using the parsed log file. In that case it will show the text “Can't be accurately determined”. Please see the “Ability to determine the number of cf-s“ for more information.

Column Families Information Table
· Column Family: The name of the CF
o Will display any CF for which there was information in the log (TBD - See XXXX)
· Size: The total size of all of the SST files of the CF
o If no data is available to calculate the size: “Data Unavailable”
· Avg. Key Size: Average size of a key in newly created SST-s of the CF
o Gathered from table_file_creation events in the log
· Avg. Value Size: Average size of a value in newly created SST-s of the CF
o Gathered from table_file_creation events in the log
· Compaction Style: The compaction style used in this CF
o Taken from the options for the CF. The values are as they appear in the log.
o If not known: “UNKNOWN”
· Compression: The compression type used in this CF
o Taken from the options for the CF. The values are as they appear in the log.
o If not known: “UNKNOWN”
· Filter-Policy (<BPK>): The filter policy used in this CF (if any)
o <BPK> - The average BPK (bits per key) for the filter.
o Taken from the options for the CF. The values are as they appear in the log.
o If not known: “UNKNOWN”
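
As an illustration of how two of these fields relate to the example output above, the following sketch reproduces the “Log Time Span” and “Writes” / “Reads” values. It is not the tool's code; the function names are illustrative, and dropping fractions of a second in the time span is an assumption that happens to agree with the example.

```python
from datetime import datetime

LOG_TIME_FORMAT = "%Y/%m/%d-%H:%M:%S.%f"


def format_time_span(start, end):
    """Format the difference between two log timestamps as 'Dd HHh MMm SSs'."""
    delta = datetime.strptime(end, LOG_TIME_FORMAT) - datetime.strptime(start, LOG_TIME_FORMAT)
    total_seconds = int(delta.total_seconds())  # assumption: fractions are dropped
    days, rest = divmod(total_seconds, 24 * 3600)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{days}d {hours:02d}h {minutes:02d}m {seconds:02d}s"


def format_op_share(count, total_ops):
    """Format an operation count as '<Percentage> (<Count>/<Total Ops>)'."""
    return f"{100 * count / total_ops:.1f}% ({count}/{total_ops})"


writes, reads, seeks = 48033559, 432304080, 0
total_ops = writes + reads + seeks
print(format_time_span("2023/04/09-12:27:35.901086", "2023/04/09-12:37:36.500378"))
print(format_op_share(writes, total_ops))
print(format_op_share(reads, total_ops))
```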

JSON File

Upon the user’s request, a JSON file is generated in the output folder.
The JSON file is a text file. It is best viewed in a JSON viewer. The following screenshots were taken from the Firefox browser, which has a built-in JSON viewer.
JSON is a hierarchical format. The following sections describe the contents of the JSON file accordingly.

Top Level JSON Objects

The JSON file contains the following top-level objects (please see the following sections for details on every object):
  • General: The same information that is displayed in the short console output.
  • Options:
    • The differences between the options in the log and an applicable baseline version (if available).
    • All of the DB-Wide and per-cf options in the log
  • DB-Size: Summary information of ingest data and per-cf and level size growth.
  • Flushes: Per CF flush-related information.
  • Compactions: Per CF compactions-related information.
  • Reads: Get / Multi-Get operations related information
  • Seeks: Seek operations related information
  • Warnings: Warnings statistics.
  • Block-Cache-Stats: Statistics about the use of the block-cache.
  • CSV-s: Paths of generated CSV-s.
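
For users who prefer scripting over a viewer, the generated file can be loaded with Python's json module. The sketch below checks which of the top-level objects listed above are present. It assumes that the keys in the file are spelled exactly as in this list, which is not verified here; the function name is illustrative.

```python
import json

# Top-level object names as listed in this document (assumed to be the JSON keys).
EXPECTED_OBJECTS = [
    "General", "Options", "DB-Size", "Flushes", "Compactions",
    "Reads", "Seeks", "Warnings", "Block-Cache-Stats", "CSV-s",
]


def missing_top_level_objects(json_text):
    """Return the expected top-level objects that are absent from the JSON text."""
    data = json.loads(json_text)
    return [name for name in EXPECTED_OBJECTS if name not in data]


# Usage with a generated file (the path depends on the run folder):
#   with open("output_files/run_0001/log.json") as json_file:
#       print(missing_top_level_objects(json_file.read()))
```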

JSON Top-Level Objects Detailed Description

General

This includes the same information that is displayed in the short console output. Please see that section for more details.

Options

The options object consists of 2 sub-objects:
1. Diff
2. All Options
Notes about the display of Options
· WBM Pseudo-Options: The DB-Wide options contain pseudo-options for the Write Buffer Manager. They are not true Speedb / Rocksdb options. They are configuration parameters given to the WBM during its construction. They are displayed in the log together with the “official” options as follows:
2023/04/09-12:27:35.901291 30528 Options.db_write_buffer_size: 0
2023/04/09-12:27:35.901299 30528 Options.write_buffer_manager: 0x55ded4834000
wbm.size: 107374182
wbm.cache: 0x55deca6980f0
wbm.allow_stalls: 1
wbm.initiate_flushes: 1
2023/04/09-12:27:35.901304 30528 Options.access_hint_on_compaction_start: 1
And this is how they are displayed in the JSON (the ‘write_buffer_manager_’ prefix was added):
· Block-Based-Table-Options (CF Options sub-object)
CF Options consist of a set of top-level options. One of these options is the table_factory. The tool assumes the table_factory is of type BlockBasedTableFactory (Block-Based-Table-Factory). The options for this entity are displayed separately, under a sub-object called “Block-Based Table”.
This is an example from a log file that shows how these options are printed to the log:
2023/04/09-12:27:35.901772 30528 Options.table_factory: BlockBasedTable
2023/04/09-12:27:35.901929 30528 table_factory options: flush_block_policy_factory: FlushBlockBySizePolicyFactory (0x55deca6d9c60)
cache_index_and_filter_blocks: 1
cache_index_and_filter_blocks_with_high_priority: 1
pin_l0_filter_and_index_blocks_in_cache: 0
pin_top_level_index_and_filter: 1
metadata_cache_options:
top_level_index_pinning: 3
partition_pinning: 0
unpartitioned_pinning: 0
index_type: 0
data_block_index_type: 0
index_shortening: 1
data_block_hash_table_util_ratio: 0.750000
checksum: 1
no_block_cache: 0
block_cache: 0x55deca6980f0
block_cache_name: LRUCache
  • block_cache_options: These are displayed in the log as follows:
block_cache_name: LRUCache
block_cache_options:
capacity : 2147483648
num_shard_bits : 4
strict_capacity_limit : 0
memory_allocator : None
high_pri_pool_ratio: 0.600
low_pri_pool_ratio: 0.000
block_cache_compressed: (nil)
In the JSON, a ‘block_cache_’ prefix is used as follows:
· metadata_cache_options: These are displayed in the log as follows:
metadata_cache_options:
top_level_index_pinning: 3
partition_pinning: 0
unpartitioned_pinning: 0
index_type: 0
In the JSON, a ‘metadata_cache_’ prefix is used as follows:
· Options that are pointers
Some options are pointers (values that start with ‘0x’ and only contain hexadecimal digits). It is impossible to know from the log the real entity that the pointer points to. A pointer may or may not be initialized. Uninitialized pointers are displayed in the log in multiple ways (e.g., ‘(nil)’, ‘None’, etc.).
The value of an initialized pointer is meaningless in and of itself. Its value is the address of the associated entity in the address space of its containing process, and is unique within that process. Its only use, in the context of log files, is the ability to understand that the same entity is shared. For example, if the same block cache is shared between multiple cf-s, then all of them will have the same value for the block cache’s pointer option.
When the value of an initialized pointer is displayed in the JSON file, it will be displayed as “Pointer (<pointer value>)”.
Uninitialized pointers will be displayed as “Pointer (Uninitialise)”.
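
A minimal sketch of this display rule follows. It is illustrative only: the function name is hypothetical, the set of uninitialized spellings contains just the two examples mentioned above, and the function is meant to be applied only to options that are known to be pointers.

```python
import re

# An initialized pointer: '0x' followed by hexadecimal digits only.
POINTER = re.compile(r"^0x[0-9a-fA-F]+$")
# Spellings of uninitialized pointers mentioned above; a real log may use others.
UNINITIALIZED = {"(nil)", "None"}


def display_pointer(value):
    """Return the JSON display form for the value of a pointer option."""
    value = value.strip()
    if POINTER.match(value):
        return f"Pointer ({value})"
    if value in UNINITIALIZED:
        return "Pointer (Uninitialise)"
    return value  # not a pointer value; left unchanged


print(display_pointer("0x55deca6980f0"))
print(display_pointer("(nil)"))
```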
Options Diff
Overview
This object contains the differences between the options in the log and the default options in an applicable baseline version. The baseline version is the closest available version of the applicable creator of the library (RocksDB or Speedb).
This object contains the following sub-objects:
1. Baseline: The version that is used as the baseline with its creator in parentheses.
2. Baseline Log: The full path to the baseline log file.
3. DB: The diff in the db options.
4. CF-s: Per CF options diff. This object consists of 2 sub-objects:
1. CF-s (Common): Contains the options that are identical in all of the cf-s in the parsed log but differ from the corresponding option in the baseline.
2. CF-s (Specific): Contains the options that are not identical in all of the cf-s of the parsed log, together with the corresponding option in the baseline.
Please see the section “CF-s Options Diff - CF-s (Common) Sub-Object” below for more details.
There are 3 cases with respect to a diff between the baseline and the parsed log:
1. The option exists in the baseline but was removed in the version that generated the parsed log.
2. The option doesn’t exist in the baseline but was added to the version that generated the parsed log.
3. The option exists in both, but the values are different.
Notes
· The Options Diff object will only show options in which there is a difference.
· Every entry in the diff contains 2 lines, the first for the baseline (called “Baseline”), and the second for the parsed log (called “Parsed Log”).
· When an option doesn’t exist, “Missing” will be displayed.
· If an option was renamed in a version, it will be deemed a new option and will appear twice: first under the old name (missing in the parsed log) and then under the new name (missing in the baseline).
· Pointers handling:
o The value of initialized pointers will always be different in the baseline and the parsed log. They will be deemed equal for the purposes of the comparison.
o If both pointers are uninitialized, they will be deemed equal.
· In all other cases, the values will be displayed as they appear in the log
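
The comparison rules in the notes above can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: options are modelled as plain name-to-string dictionaries, the function names are hypothetical, and the option names and values in the example are made up.

```python
import re

POINTER = re.compile(r"^0x[0-9a-fA-F]+$")
UNINITIALIZED = {"(nil)", "None"}  # assumed spellings of uninitialized pointers
MISSING = "Missing"


def values_equal(baseline_value, parsed_value):
    """Compare two option values, applying the pointer rules."""
    if POINTER.match(baseline_value) and POINTER.match(parsed_value):
        return True  # two initialized pointers are deemed equal
    if baseline_value in UNINITIALIZED and parsed_value in UNINITIALIZED:
        return True  # two uninitialized pointers are deemed equal
    return baseline_value == parsed_value


def diff_options(baseline, parsed):
    """Return only the options that differ, each with both values."""
    diff = {}
    for name in sorted(set(baseline) | set(parsed)):
        baseline_value = baseline.get(name, MISSING)
        parsed_value = parsed.get(name, MISSING)
        differs = MISSING in (baseline_value, parsed_value) or not values_equal(
            baseline_value, parsed_value
        )
        if differs:
            diff[name] = {"Baseline": baseline_value, "Parsed Log": parsed_value}
    return diff


baseline = {"max_open_files": "-1", "write_buffer_manager": "0x1", "old_name": "1"}
parsed = {"max_open_files": "5000", "write_buffer_manager": "0x2", "new_name": "1"}
print(diff_options(baseline, parsed))
```

In the example, the renamed option shows up twice, once under each name, and the two initialized pointers do not appear in the diff at all.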
DB Options Diff
This is an example of a DB options diff:
CF-s Options Diff - CF-s (Common) Sub-Object
The following two snapshots are an example of this sub-object:
Notes:
· As noted above, the Common sub-object contains options whose value is identical in all of the cf-s of the parsed log. So, for example, the memtable_factory is "speedb.HashSpdRepFactory" in all the cf-s of the parsed log. Its value is “SkipListFactory” in the baseline.
· It consists of two sub-objects:
o CF: The cf options that are not part of the block-based-table-format options
o Block-Based Table: The block-based-table-format options.
CF-s Options Diff - CF-s (Specific) Sub-Object
Notes:
  • As noted above, this sub-object only contains options that are not identical in all of the cf-s of the parsed log. Please note, however, that all such options are included in this sub-object, even if their value is the same as the corresponding option in the baseline.
  • “Unknown-CF-#<I>”: As explained in the “Ability to determine the number of cf-s” section, these represent cf-s whose name can’t be determined, that have options at the top of the parsed log.
All Options
This object lists all of the options that appear in the log file, in the same order:
As in the options diff sub-object, there are sub-objects for the db-wide options and the cf-s options:
The sub-objects themselves, contain additional sub-objects, using the same principles described above for the options diff sub-object:
The CF-s (Common) sub-object contains all of the options that are identical in all of the cf-s of the parsed log.
The CF-s (Specific) sub-object contains all of the options that are not identical in all of the cf-s of the parsed log. For example:
As may be seen in this example, the “default” cf has options that are not the same as the corresponding options in the “column_family_name_000001” and the “column_family_name_000002” cf-s.
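
The split between the CF-s (Common) and CF-s (Specific) sub-objects can be sketched as follows. This is a hypothetical helper, not the tool's code; it treats an option that is absent from some cf as specific, which is an assumption, and the option values in the example are made up.

```python
def split_common_specific(cf_options):
    """Split {cf name: {option: value}} into (common, specific).

    common:   {option: value} for options identical in all of the cf-s
    specific: {cf name: {option: value}} for all other options
    """
    common, specific = {}, {}
    all_names = set()
    for options in cf_options.values():
        all_names.update(options)

    for name in sorted(all_names):
        present_everywhere = all(name in options for options in cf_options.values())
        values = {options[name] for options in cf_options.values() if name in options}
        if present_everywhere and len(values) == 1:
            common[name] = values.pop()
        else:
            for cf_name, options in cf_options.items():
                if name in options:
                    specific.setdefault(cf_name, {})[name] = options[name]
    return common, specific


example = {
    "default": {"compression": "Snappy", "write_buffer_size": "67108864"},
    "column_family_name_000001": {"compression": "Snappy", "write_buffer_size": "134217728"},
}
print(split_common_specific(example))
```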

DB-Size Sub-Object

This sub-object consists of the following:
  • Ingest: Ingest information taken from the last cumulative writes log trace:
    • Ingest:
    • Ingest Rate:
    • Ingest Time:
  • CF-s Growth: Per CF and level report on the difference in the size from the start of the log to its end. The information is obtained from compaction stats dumps.
    • Per CF and level, the information is displayed as <Start Size> -> <End Size> (<Difference>)
    • The “Sum” entry shows the total for the CF (sum of all levels of the CF).

Flushes Object

This Object displays per-cf information about flushes.
Per CF, there is information for all of the flushes that occurred in the CF.
The following information is displayed, per CF:
Each entry below gives the name and its meaning, followed by the source of the information (where stated).
· L0->L1 Write-Amp: The write-amplification for Level0 → Level1 for this CF.
o Source: Compaction Stats
· Per <Flush Reason> Sub-Object: Sub-object per flush reason in the flushes for this CF. All of the fields below are in the context of a CF and flush reason.
o Source: Flush Events
· Sizes Histogram: A histogram of the number of flushes per total data size range (bucket) in a flush.
o The counts cover all of the flushes for this cf and flush reason.
o For example, in the snapshot above, there were 31 flushes whose total data size was more than 31 MB.
· Num Flushes: The number of flushes
· Min Duration: The minimum duration that a single flush took to complete
· Max Duration: The maximum duration that a single flush took to complete
· Min Num Memtables: The minimum number of memtables that were part of a single flush.
· Max Num Memtables: The maximum number of memtables that were part of a single flush.
· Min Total Data Size: The minimum total data size in a single flush.
· Max Total Data Size: The maximum total data size in a single flush.
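
As a rough sketch of how such per-flush-reason fields could be aggregated for a single CF: the input record keys ('reason', 'duration_us', 'num_memtables', 'total_data_size') are hypothetical and do not reflect the tool's internal data model or the exact layout of flush events in the log.

```python
from collections import defaultdict


def summarize_flushes(flushes):
    """Aggregate flush records of one CF into a per-flush-reason summary."""
    by_reason = defaultdict(list)
    for flush in flushes:
        by_reason[flush["reason"]].append(flush)

    summary = {}
    for reason, items in by_reason.items():
        durations = [item["duration_us"] for item in items]
        memtables = [item["num_memtables"] for item in items]
        sizes = [item["total_data_size"] for item in items]
        summary[reason] = {
            "Num Flushes": len(items),
            "Min Duration": min(durations),
            "Max Duration": max(durations),
            "Min Num Memtables": min(memtables),
            "Max Num Memtables": max(memtables),
            "Min Total Data Size": min(sizes),
            "Max Total Data Size": max(sizes),
        }
    return summary


# Made-up records for illustration only.
records = [
    {"reason": "Write Buffer Full", "duration_us": 100, "num_memtables": 1, "total_data_size": 10},
    {"reason": "Write Buffer Full", "duration_us": 300, "num_memtables": 2, "total_data_size": 30},
    {"reason": "Manual Flush", "duration_us": 50, "num_memtables": 1, "total_data_size": 5},
]
print(summarize_flushes(records))
```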

Compactions Object

This Object displays per-cf information about compactions.
All of the information in this object is based on traces of compaction jobs (events and associated log traces), and compaction level stats dumps.
The sub-object consists of the following:
· Largest compaction size of all compactions in the log.
· Per CF compactions information (see below).
Per CF Information
The per-cf compactions information is based on all of the compactions that occurred in the CF.
The following information is displayed, per CF:
Each entry below gives the name and its meaning.
· Per <Flush Reason> Sub-Object: Sub-object per flush reason in the flushes for this CF. All of the fields below are in the context of a CF and flush reason.
· Num Compactions: The number of compactions
· Min Compactions BW: Minimum write rate of a compaction
· Max Compactions BW: Maximum write rate of a compaction
· Comp: The elapsed time of all compactions
· Comp Merge CPU: The elapsed CPU time of all compactions
· Per Level Write-Amp: Write amplification per level, and their total (SUM)

Reads Object

This object contains information about read operations performed by the user (Get and Multi-Get) and associated aspects.
Get Histogram
This is the last dump of the rocksdb.db.get.micros histogram.
It will be available only when statistics are enabled
Multi-Get Histogram
This is the last dump of the rocksdb.db.multiget.micros histogram.
It will be available only when statistics are enabled
Per CF Read Latency
Per CF, displays information about read performance across all of the CF’s levels. The information is obtained from “File Read Latency Histogram By Level” dumps:
· Num Reads: Total number of reads.
· Avg. Read Latency: The average read latency
· Max Read Latency: The maximum latency
· Read % of All CF-s: The percentage of reads performed on this CF relative to the total number of reads on all CF-s (the snapshot shows a single CF so the percentage is 100%).
Filter Effectiveness
Consists of:
  • CF-s: Per CF:
    • Filter-Policy: The type of filter used (if any).
    • Avg. BPK: The average effective BPK for that filter
  • Counters: Global filters counters for all filters (available only if statistics are available):
    • False-Positive Rate: The effective false positive rate of all of the filters. This is displayed as “1 in N”, as is the convention.
    • False-Positives: The number of times the filters answered “Key May Exist”, but the key wasn’t actually found.