Log Parser
- Notes about the display of Options
- Options Diff
- Overview
- DB Options Diff
- CF-s Options Diff - CF-s (Common) Sub-Object
- CF-s Options Diff - CF-s (Specific) Sub-Object
- All Options
- Get Histogram
- Multi-Get Histogram
- Per CF Read Latency
- Filter Effectiveness
- Top-level sub-objects
- Caches top-level sub-object
- DB Counters top-level sub-object
- Detailed top-level sub-object
- histograms_human_readable.csv
- histograms_tools.csv
- compactions_stats.csv
- compactions.csv
- flushes.csv
- Reporting Bugs and Proposing
- Known Bugs
- Contributing to the tool’s development
- Getting Advice or Discussing the Tool
- The Future (the Features)
Speedb's Log Parser is a tool that may be used to parse and process Speedb and RocksDB log files.
The tool extracts useful information from these logs and aids users in gaining insights about their systems with ease. It is expected to be a valuable tool for novices and experts alike.
The tool is written in the Python language and consists of a set of Python scripts.
It resides in a GitHub repository (https://github.com/speedb-io/log-parser). There is a README.md file in the root folder of the repository that describes how to install the tool, contribute to its development, etc.
The following terms and abbreviations are used throughout this document:
· CF: Column-Family
· DB / DB-Wide: Applicable to the entire DB rather than to a specific CF.
- Parses a single Speedb or RocksDB log file (parsing multiple logs is on the road-map)
- Parses the log and processes the information about the following elements and entities (detailed description may be found in the sections below):
- Metadata about the instance that generated this log file (e.g., the library version that generated the log) (General Section).
- Speedb / RocksDB Options (Options Section):
- All of the options with their values (db-wide and per column family).
- Displays the difference between the options in the log and an applicable baseline.
- Data about the size of the DB (DB-Size Section).
- Flushes and Compactions (Flushes and Compactions sections respectively)
- Information regarding DB read operations (Reads Section).
- Information regarding seek DB operations (Seeks Section).
- Warnings issued by the DB (Warnings Section).
- Block cache statistics (Block-Cache-Stats Section)
- Various Statistics: Counters, histograms, compaction stats, etc.
- The tool generates multiple outputs. Details may be found in the sections that follow. The outputs include:
- A short console output (the default output format)
- A JSON file with detailed information.
- CSV files containing information about the counters, counters histograms, compactions, and flushes statistics.
- A detailed console output (the JSON file printed to the console).
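Since the CSV outputs are plain CSV files, they can be post-processed with any standard tooling. For example, the following sketch (with a hypothetical run folder) prints just the header row of the counters CSV, since the exact columns depend on the parsed log:
import csv

with open("output_files/run_0001/counters.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)        # first row: column names
    print(header)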
udi@udi-speedb:~/log-parser$ python3 log_parser.py -h
usage: log_parser.py [-h] [-c {short,long}] [-j] [-o OUTPUT_FOLDER] [-l] log-file-path
positional arguments:
log-file-path A path to a log file to parse (default: None)
optional arguments:
-h, --help show this help message and exit
-c {short,long}, --console {short,long}
Print to console a summary (short) or a detailed (long) output (default: None)
-j, --generate-json Optionally generate a JSON file in the output folder's run-folder, with detailed information. If generated, it will be called log.json.(default: False)
-o OUTPUT_FOLDER, --output-folder OUTPUT_FOLDER
The name of the folder where output files will be stored in SUB-FOLDERS named run_dddd, where 'dddd' is the run number (default: output_files)
-l, --generate-log Generate a log file for the parser's log messages (default: False)
Notes:
- The default is to print to the console in a short format.
- It is possible to specify both JSON and console outputs; both will be generated (see the example below).
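For example, the following invocation (with a hypothetical log path and output folder name) generates both the short console summary and the JSON file, placing the run's output files under 'my_output/run_dddd':
python3 log_parser.py -c short -j -o my_output /path/to/LOG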
· tool: The log parser
· Log file / log: A RocksDB / Speedb information log. Not the WAL (Write-Ahead-Log)
· Parsed Log: The log file given to the log parser for parsing.
· db: Database
· db options: Options that are not specific to any cf. Applicable to the entire db.
· cf options: Options that are specific to an individual cf.
· cf (cf-s): Column Family (Column Families)
· Units:
o Size Units: B (Bytes), KB (Kilobytes), MB (Megabytes), GB (Gigabytes), TB (Terabytes)
o Numbers Units: K (Thousands), M (Millions), G (Billions).
Log timestamps (e.g., ‘2023/01/04-08:54:59.130735’) are in local time. The resolution is microseconds. All of the information displayed by the tool uses the timestamps from the parsed log as-is.
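If you need to process these timestamps yourself (e.g., when post-processing the tool's CSV outputs), they can be parsed with Python's standard library. A minimal sketch (not part of the tool):
import datetime

# Log timestamps are local time with microsecond resolution, e.g. '2023/01/04-08:54:59.130735'
ts = datetime.datetime.strptime("2023/01/04-08:54:59.130735", "%Y/%m/%d-%H:%M:%S.%f")
print(ts.isoformat())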
The info_log_level db option controls the minimal level of issued log traces that will actually be written to the log (INFO_LEVEL by default in production library builds). Using a higher logging level will result in a log file that has almost no useful information. In that case, the tool (as well as the log file itself) will be of little use in practice.
The tool relies solely on the information contained in the parsed log file. Consequently, the information it displays reflects that. For example, the tool can’t display the average size of a key or a value in the entire DB, or the total number of keys in the DB, as this information is not printed to the log file.
Other common cases in which data may not be available:
· Not having statistics (a configuration option)
· Rolled logs (see “Log Rolling” below)
· The number of cf-s (see “Number of Column Families and its implications” section below)
· Lack of applicable activity. For example, no flush or compaction for a cf.
When data is not available, the output of the log parser will reflect that. For example:
· “Filter Data Not Available”: When there is no information about the filter policy that was configured for a cf.
· "Data Unavailable (No Stats Available)": When the DB was configured not to use statistics.
In the sections that follow, where applicable, there is a description of the information elements in the log file that were used to generate the associated output. For example:
· Flush events and associated flush log traces
· DB Statistics Dump
· Counters and Histogram dumps
There are options that may be configured individually for every cf. These are called cf options. Options that are not specific to a cf are called DB (or db-wide) options. DB Options apply to the entire DB.
Speedb and RocksDB come with defaults for every option. A user may override any of these options when opening a db. A user may also override any of the cf options when creating a new cf explicitly.
The options are written to the log in multiple cases:
· When a db is opened, the db and cf options are written to the log. This may occur when opening a new DB, or when recovering an existing db (with its existing cf-s and potentially newly created cf-s) from persistent storage.
· When a new cf is created, its options are written to the log.
· When a log is rolled, the db and cf options are written to the new log.
However, please see below for important information regarding applicable limitations.
Log rolling (also called log rotation) is the process of stopping writes to the active log file, renaming it, and opening a new log file to which logging will be directed until the time comes to roll/rotate the log again. There are a few options that control this mechanism. In this document, log files that were rolled are called rolled logs.
At the time of this writing, the defaults are to use a single log file (see the max_log_file_size option), effectively avoiding log rolling altogether. Some more information may be found here.
When the DB is opened as part of db recovery (opening an existing db), a new log file is created. This is unrelated to the log rolling described above, but it results in multiple log files nevertheless.
A user may create any number of column families (cf-s).
However, while opening a DB, only the options of the first 10 (hard coded) cf-s will be printed to the log.
The log will contain the following information for the 11th cf onwards:
2023/06/06-12:56:30.438743 322453 [/db_impl/db_impl.cc:3317] Created column family [column_family_name_000009] (ID 9)
2023/06/06-12:56:30.806215 322453 [/column_family.cc:625] (skipping printing options)
2023/06/06-12:56:30.806365 322453 [/db_impl/db_impl.cc:3317] Created column family [column_family_name_000010] (ID 10)
2023/06/06-12:56:31.191732 322453 [/column_family.cc:625] (skipping printing options)
2023/06/06-12:56:31.191836 322453 [/db_impl/db_impl.cc:3317] Created column family [column_family_name_000011] (ID 11)
2023/06/06-12:56:31.635102 322453 [/column_family.cc:625] (skipping printing options)
2023/06/06-12:56:31.635291 322453 [/db_impl/db_impl.cc:3317] Created column family [column_family_name_000012] (ID 12)
2023/06/06-12:56:32.053358 322453 [/column_family.cc:625] (skipping printing options)
2023/06/06-12:56:32.053527 322453 [/db_impl/db_impl.cc:3317] Created column family [column_family_name_000013] (ID 13)
2023/06/06-12:56:32.508987 322453 [/column_family.cc:625] (skipping printing options)
2023/06/06-12:56:32.509108 322453 [/db_impl/db_impl.cc:3317] Created column family [column_family_name_000014] (ID 14)
So, we know that there are additional cf-s, and we know their names and IDs, but we do not know their options.
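If you need to recover the names and IDs of such cf-s yourself, the 'Created column family' traces shown above can be scanned directly. A minimal sketch (the regular expression follows the trace format shown above; the log file path is hypothetical):
import re

# Matches traces such as:
# ... [/db_impl/db_impl.cc:3317] Created column family [column_family_name_000009] (ID 9)
pattern = re.compile(r"Created column family \[(?P<name>.+?)\] \(ID (?P<id>\d+)\)")

with open("LOG") as log_file:
    for line in log_file:
        match = pattern.search(line)
        if match:
            print(match.group("id"), match.group("name"))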
When a new DB is opened, or when a cf is created explicitly (via the CreateColumnFamily or CreateColumnFamilies APIs), this is what the log contains (only the first lines are shown):
2023/06/06-12:56:28.376394 322453 [/column_family.cc:620] --------------- Options for column family [default]:
2023/06/06-12:56:28.376399 322453 Options.comparator: leveldb.BytewiseComparator
2023/06/06-12:56:28.376400 322453 Options.merge_operator: 0x7fdfd8068640
2023/06/06-12:56:28.376402 322453 Options.compaction_filter: None
2023/06/06-12:56:28.376403 322453 Options.compaction_filter_factory: None
The first line contains the name of the cf (“default” in this case).
However, when a new log file is created as part of log rolling, this first line is not printed to the log, so the tool has no (simple and safe) way of knowing to which cf the options belong. As the “default” cf always exists and is the first one, the tool assumes the first set of options belongs to “default”, but it cannot do the same for the rest.
In addition, there is no information about the skipped cf-s (those beyond the first 10, see above).
The rest of the log contains traces that may include the names of the cf-s, but that is of no use in associating options with their cf-s.
The log parser handles this by auto-generating cf names for the cf-s whose names are unknown, but for which there are options.
The auto generated names have the following format: UNKNOWN-CF-#<Number>
The following snapshot shows an example:

As a consequence of what is described above, when the tool parses a log that is created as part of log rolling and there are more than 10 cf-s, the tool doesn’t know how many cf-s there are.
This will be indicated to the user by displaying the following information in the console short output, or the json’s General object:
Num CF-s (*) : Can't be accurately determined
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
| Column Family | Size | Avg. Key Size | Avg. Value Size | Compaction Style | Compression | Filter-Policy |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
| default | Data Unavailable | 24 B | 100 B | kCompactionStyleLevel | Snappy | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000001 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000002 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000003 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000004 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000005 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000006 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000007 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000008 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000009 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000010 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000012 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000013 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
| column_family_name_000014 | Data Unavailable | 24 B | 100 B | UNKNOWN | UNKNOWN | rocksdb.internal.FastLocalBloomFilter (24.0) |
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------

(*) Please see the 'Ability to determine the number of cf-s' section in the log parser's documentation for more information

Every run of the log parser generates multiple output files. All of the files generated in a single run are placed under a single folder. The folder’s location and name are determined as follows:
· The user may specify a parent folder via the ‘-o’ command line parameter.
· If the user doesn’t specify an output parent folder, ‘output_files’ will be used by default.
· If the parent folder doesn’t exist, it will be created.
· Under the parent folder, the parser will create a folder named ‘run_dddd’ and place the run’s output files under that folder. ‘dddd’ is a 4-digit run number; it is incremented on every run and wraps around after reaching ‘9999’.
· If the parent folder already contains ‘run_dddd’ folders, the tool will detect the largest one (converting ‘dddd’ to its numeric equivalent, N) and create a new folder ‘run_mmmm’, where int('mmmm') = N+1.
An example to put it all together:
The user specified ‘-o Output' as an argument when running the tool. There is no folder named ‘Output’ => The tool will create a new folder named ‘Output’, and the output files will be under ‘Output/run_0001/’
The user re-runs the tool, again specifying ‘-o Output’:
The run’s output files will be under ‘Output/run_0002’
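The following is a minimal Python sketch of the folder-numbering scheme described above (for illustration only; it is not the tool's actual implementation, and edge cases such as the exact wrap-around behavior may differ):
import os
import re

def next_run_folder(parent_folder):
    # Create the parent folder if needed, find existing 'run_dddd' sub-folders,
    # and derive the next run number from the largest one found
    os.makedirs(parent_folder, exist_ok=True)
    run_numbers = [
        int(match.group(1))
        for name in os.listdir(parent_folder)
        if (match := re.fullmatch(r"run_(\d{4})", name))
    ]
    next_number = (max(run_numbers) + 1) if run_numbers else 1
    return os.path.join(parent_folder, f"run_{next_number % 10000:04d}")  # wrap around after 9999

print(next_run_folder("Output"))   # e.g. Output/run_0001 on the first run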
There are 2 console output flavors:
1. Short: A concise summary of the major information elements. It may be used to get a basic understanding of the db’s operation according to the parsed log.
2. Detailed: The contents of the JSON file, printed to the console (please see the JSON file description section for more details). This option is aimed at power users who wish to use only the console. To be useful, it will typically be combined with JSON command-line filtering tools such as jq (see the example below).
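For example, after generating the JSON file with -j, a power user could inspect it with jq. This is a sketch: the run folder below is hypothetical, and the exact top-level object names (such as the General object mentioned above) should be checked against your own output:
jq 'keys' output_files/run_0001/log.json
jq '.General' output_files/run_0001/log.json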
udi@udi-speedb:~/log-parser$ python3 log_parser.py test/input_files/LOG_speedb
Log file: file:///home/udi/log-parser/test/input_files/LOG_speedb
Baseline Log: file:///home/udi/log-parser/baseline_logs/LOG-speedb-2.3.0
Counters CSV Is in file:///home/udi/log-parser/output_files/run_0016/counters.csv
Human Readable Counters Histograms CSV Is in file:///home/udi/log-parser/output_files/run_0016/histograms_human_readable.csv
Compactions Stats CSV Is in file:///home/udi/log-parser/output_files/run_0016/compactions_stats.csv
Compactions CSV Is in file:///home/udi/log-parser/output_files/run_0016/compactions.csv
Flushes CSV Is in file:///home/udi/log-parser/output_files/run_0016/flushes.csv

Parsing of: /home/udi/log-parser/test/input_files/LOG_speedb
============================================================
Name : /home/udi/log-parser/test/input_files/LOG_speedb
Start Time : 2023/04/09-12:27:35.901086
End Time : 2023/04/09-12:37:36.500378
Log Time Span : 0d 00h 10m 00s
Creator : Speedb
Version : 2.4.0 [8ef7469a7f8d1d100a7a187b3c682a48777b7047]
DB Size (*) : 111.6 MB
Num Keys Written : 48.0 M
Avg. Written Key Size : 24 B
Avg. Written Value Size : 64 B
Num Warnings : 0
Error Messages : No Error Messages
Fatal Messages : No Fatal Messages
Ingest (*) : 3.9 GB
Ingest Rate : 6.76 MBps
Statistics : Available
Writes : 10.0% (48033559/480337639)
Reads : 90.0% (432304080/480337639)
Seeks : 0 (No Seek Operations)
Deleted (Flushed) Entries: 0 (No Delete Operations)
Num CF-s (**) : 3
---------------------------------------------------------------------------------------------------------------------------------------
| Column Family | Size (*) | Avg. Key Size | Avg. Value Size | Compaction Style | Compression | Filter-Policy |
---------------------------------------------------------------------------------------------------------------------------------------
| default | 40.8 MB | 24 B | 64 B | kCompactionStyleLevel | NoCompression | bloomfilter (10.0) |
| column_family_name_000001 | 38.3 MB | 24 B | 64 B | kCompactionStyleLevel | NoCompression | bloomfilter (10.0) |
| column_family_name_000002 | 32.6 MB | 24 B | 64 B | kCompactionStyleLevel | NoCompression | bloomfilter (10.0) |
---------------------------------------------------------------------------------------------------------------------------------------

(*) Data is calculated at: 2023/04/09-12:37:27.398344
(**) Please see the log parser's documentation for more information
| # | Field Name | Meaning | Comments |
| 1 | Title (“Parsing of:…”) | Parsed Log full path | |
| 2 | Name | Parsed Log full path | |
| 3 | Start Time | Time of the first log entry | |
| 4 | End Time | Time of the last log entry | |
| 5 | Log Time Span | The time difference between the Start Time and the End Time | The value is expressed in days, hours, minutes, and seconds. |
| 6 | Creator | The creator of the library that generated the parsed log | Currently either Speedb or RocksDB |
| 7 | Version | The library’s version [Git Hash] | |
| 8 | DB Size (*) | The total size of all of the SST files in the database | The point in time at which the value was calculated is given below the per-cf table: “(*) Data is calculated ….” |
| 9 | Num Keys Written | The total number of keys written to the DB | The value of the rocksdb.number.keys.written counter if available; otherwise, extracted from cumulative writes information (DB Stats). If neither is available: “Data Unavailable” |
| 10 | Avg. Written Key Size | The average size of a written key | Calculated from information in table_file_creation events. If there are no such events: “Data Unavailable” |
| 11 | Avg. Written Value Size | The average size of a written value | Calculated from information in table_file_creation events. If there are no such events: “Data Unavailable” |
| 12 | Error Messages | The messages with severity “ERROR” as they appear in the log | “No Error Messages” if there are no errors in the log |
| 13 | Fatal Messages | The messages with severity “FATAL” as they appear in the log | “No Fatal Messages” if there are no fatal messages in the log |
| 14 | Ingest (*) | Total ingest | Calculated from cumulative writes information. The point in time at which the value was calculated is given below the per-cf table: “(*) Data is calculated ….”. If data is not available: “No Ingest Info Available” |
| 15 | Ingest Rate | Ingest rate in Mega-Bytes per second | Calculated from cumulative writes information. The point in time at which the value was calculated is given below the per-cf table: “(*) Data is calculated ….”. If data is not available: “No Ingest Info Available” |
| 16 | Statistics | Whether statistics are available or not | |
| 17 | Writes | The total number of write operations, displayed as <Percentage> (<Count> / <Total Ops>). Total Ops: total number of operations (write + read + seek). Count: total number of writes. Percentage: percentage of writes out of Total Ops | Calculated from the rocksdb.number.keys.written, rocksdb.number.keys.read, and rocksdb.number.db.seek counters. If statistics are not available: “Data Unavailable (No Statistics)” |
| 18 | Reads | The total number of read operations (same as for the Writes field, but for reads) | Same as for Writes |
| 19 | Seeks | The total number of seek operations (same as for the Writes field, but for seeks) | Same as for Writes |
| 20 | Deletes | <Percentage> (<Num Deletes> / <Num Entries>). Num Entries: the total number of flushed entries. Num Deletes: the total number of deletes among the flushed entries. Percentage: percentage of deletes out of Num Entries | Information is gathered from Flush events in the log. If no such events exist: “Data Unavailable (No Flush Started Events)” |
| 21 | Num CF-s | The number of cf-s, or an indication that it can't be accurately determined | |
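As an illustration of how the Writes, Reads, and Seeks fields are derived, the following sketch reproduces the percentages from the example output above, using the counter names listed in the table (the tool's own calculation may differ in its details, e.g. in how the zero-seeks case is displayed):
# Counter values taken from the example output above
num_written = 48033559   # rocksdb.number.keys.written
num_read = 432304080     # rocksdb.number.keys.read
num_seek = 0             # rocksdb.number.db.seek

total_ops = num_written + num_read + num_seek
print(f"Writes : {100 * num_written / total_ops:.1f}% ({num_written}/{total_ops})")
print(f"Reads  : {100 * num_read / total_ops:.1f}% ({num_read}/{total_ops})")
print(f"Seeks  : {100 * num_seek / total_ops:.1f}% ({num_seek}/{total_ops})")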