Log Parser
JSON Top-Level Objects Detailed Description
Notes about the display of Options
Options Diff
Overview
DB Options Diff
CF-s Options Diff - CF-s (Common) Sub-Object
CF-s Options Diff - CF-s (Specific) Sub-Object
All Options
Get Histogram
Multi-Get Histogram
Per CF Read Latency
Filter Effectiveness
Top-level sub-objects
Caches top-level sub-object
DB Counters top-level sub-object
Detailed top-level sub-object
histograms_human_readable.csv
histograms_tools.csv
compactions_stats.csv
compactions.csv
flushes.csv
Reporting Bugs and Proposing
Known Bugs
Contributing to the tool’s development
Getting Advice or Discussing the Tool
The Future (the Features)
Overview
Speedb's Log Parser is a tool that may be used to parse and process Speedb and RocksDB log files.
The tool extracts useful information from these logs and aids users in gaining insights about their systems with ease. It is expected to be a valuable tool for novices and experts alike.
The tool is written in the Python language and consists of a set of Python scripts.
It resides in a GitHub repository (https://github.com/speedb-io/log-parser). There is a README.md file in the root folder of the repository that describes how to install the tool, contribute to its development, etc.
Terminology
The following terms and abbreviations are used throughout this document:
· CF: Column-Family
· DB / DB-Wide: Applicable to the entire DB rather than to a specific CF.
Major Capabilities
Parses a single Speedb or RocksDB log file (parsing multiple logs in in the road-map)
Parses the log and processes the information about the following elements and entities (detailed description may be found in the sections below):
Metadata information about the instance that has generated this log file (e.g., library version generating the log) (General Section).
Speedb / RocksDB Options (Options Section):
All of the options with their values (db-wide and per column family).
Displays the difference between the options in the log and an applicable baseline.
Data about the size of the DB (DB-Size Section).
Flushes and Compactions (Flushes and Compactions sections respectively)
Information regarding DB read operations (Reads Section).
Information regarding seek DB operations (Seeks Section).
Warnings issued by the DB (Warnings Section).
Block cache statistics (Block-Cache-Stats Section)
Various Statistics: Counters, histograms, compaction stats, etc.
The tool generates multiple outputs. Details may be found in the sections that follow. The outputs include:
A short console output (the default output format)
A JSON file with detailed information.
CSV files containing information about the counters, counters histograms, compactions, and flushes statistics.
A detailed console output (the JSON file printed to the console).
Usage
Installation / Getting Started / Prerequisites
Please see the README.md file in the tool’s repository for this information.
Tool’s Outputs Description
Abbreviations, Conventions and Terms used in the log parser’s outputs
· tool: The log parser
· Log file / log: A RocksDB / Speedb information log. Not the WAL (Write-Ahead-Log)
· Parsed Log: The log file given to the log parser for parsing.
· db: Database
· db options: Options that are not specific to any cf. Applicable to the entire db.
· cf options: Options that are specific to an individual cf.
· cf (cf-s): Column Family (Column Families)
· Units:
o Size Units: B (Bytes), KB (Kilobytes), MB (Megabytes), TB (Terabytes)
o Numbers Units: K (Thousands), M (Millions), G (Billions).
Timestamps
Log timestamps (e.g., ‘2023/01/04-08:54:59.130735’) are in local time. The resolution is microseconds. All of the information displayed by the tool is using the timestamps from the parsed log as is.
Availability of Data
Log level
The info_log_level
db option controls the minimal level of issued log traces that will actually be written to the log (INFO_LEVEL
by default in production library builds). Using a higher logging level will result in a log file that has almost no useful information. In that case, the tool (as well as the log file itself) will be of little use in practice.
Availability of the information to parse and process
The tool relies solely on the information contained in the parsed log file. Consequently, the information it displays reflects that. For example, the tool can’t display the average size of a key or a value in the entire DB, or the total number of keys in the DB, as this information is not printed to the log file.
Other common cases in which data may not be available:
· Not having statistics (a configuration options)
· Rolled logs (see “Log Rolling” below)
· The number of cf-s (see “Number of Column Families and its implications” section below)
· Lack of applicable activity. For example, no flush or compaction for a cf.
When data is not available, the output of the log parser will reflect that. For example:
· “Filter Data Not Available”: When there is no information about the filter policy that was configured for a cf.
· "Data Unavailable (No Stats Available)": When the DB was configured not to use statistics.
In the sections that follow, where applicable, there is a description of the information elements in the log file that were used to generate the associated output. For example:
· Flush events and associated flush log traces
· DB Statistics Dump
· Counters and Histogram dumps
Options
DB Options and CF Options
There are options that may be configured individually for every cf. These are called cf options. Options that are not specific to a cf are called DB (or db-wide) options. DB Options apply to the entire DB.
Defaults
Speedb and RocksDB come with defaults for every option. A user may override any of these options when opening a db. A user may also override any of the cf options when creating a new cf explicitly.
Writing the options to the log
The options are written to the log in multiple cases:
· When a db is opened, the db and cf options are written to the log. This may occur when opening a new DB, or when recovering an existing db (with its existing cf-s and potentially newly created cf-s) from persistent storage.
· When a new cf is created, its options are written to the log.
· When a log is rolled, the db and cf options are written to the new log.
However, please see below for important information regarding applicable limitations.
Log Rolling
Log rolling (also called log rotation) is the process used to stop writing to an active log file, renaming it, and opening a new log file to which logging will be directed until the time comes to roll/rotate the log again. There are a few options that control this mechanism. In this document, log files that were rolled, are called rolled logs.
At the time of this writing, the defaults are to use a single log file (max_log_file_size
option), effectively avoiding log rolling altogether. Some more information may be found here.
Opening an existing DB (DB Recovery)
When the DB is opened as part of db recovery (opening an existing db), a new log file is created. This is unrelated to log rolling described above, but results in multiple log files nevertheless.
Number of Column Families and its implications
A user may create any number of column families (cf-s).
However, while opening a DB, only the options of the first 10 (hard coded) cf-s will be printed to the log.
The log will contain the following information for the 11th cf onwards:
So, we know that there are additional cf-s, and we know their names and id-s, but we do not know their options.
Unavailability of CF Names in Rolled Logs (Auto-Generated CF Names)
When a new DB is opened, or when a cf is created explicitly (via the CreateColumnFamily
or CreateColumnFamilies
APIs) , this is what the log contains (Only the first lines are shows):
The first line contains the name of the cf (“default” in this case)
However, when a new log file is created as part of log rolling, this first line is not printed to the log => the tool has no (simple and safe) way of knowing to which cf the options belong. As the “default” cf always exists and is the first one, the tool assumes the first to be “default”, but not for the rest.
In addition, there is no information about the skipped cf-s (those that are > 10, see above).
The rest of the log contains traces that may contain the names of the cf-s, but that is of no use in the association of options with their cf-s.
The log parser handles this by auto-generating cf names for the cf-s whose names are unknown, but for which there are options.
The auto generated names have the following format: UNKNOWN-CF-#<Number>
The following snapshot shows an example:
As a consequence of what is described above, when the tool parses a log that is created as part of log rolling and there are more than 10 cf-s, the tool doesn’t know how many cf-s there are.
This will be indicated to the user by displaying the following information in the console short output, or the json’s General object:
The Output Folder
Every run of the log parser generates multiple output files. All of the files generated in a single run are placed under a single folder. The folder’s location and name are determined as follows:
· The user may specify a parent folder via the ‘-o’ command line parameter.
· If the user doesn’t specify an output parent folder, ‘output_files’ will be used by default.
· If the parent folder doesn’t exist, it will be created.
· Under the parent folder, the parser will create a folder named ‘run_dddd’ and place the run’s output files under that folder. ‘dddd’ are 4 digits that compose the run’s number. They are incremented every run and wrap around if reaching ‘9999’.
· If the parent folder already contains ‘run_dddd’ folders, the tool will detect the largest one (converting ‘dddd’ to its numeric equivalent - N), create a new folder ‘run_mmmm’ where int('mmmm') = N+1
An example to put it all together:
The user specified ‘-o Output' as an argument when running the tool. There is no folder named ‘Output’ => The tool will create a new folder named ‘Output’, and the output files will be under ‘Output/run_0001/’
The user re-runs the tool, again specifying ‘-o Output’:
The run’s output files will be under ‘Output/run_0002’
Console Output
There are 2 console output flavors:
1. Short: A concise summary of the major information elements. May be used to get a basic understanding of the db’s operation according to the parsed log.
2. Detailed: The contents of the JSON file, printed on the console (please see the JSON file description section for more details). This option is available for power users that wish to use only the console. To be of use, these users will probably use JSON command-line filtering tools such as JQ.
Console Short Output Description
Example Output
Output Fields Description
Field Name
Meaning
Comments
1
Title (“Parsing of:…”)
Parsed Log full path
2
Name
Parsed Log full path
3
Start Time
Time of first log entry
4
End Time
Time of last log entry
5
Log Time Span
The time difference between the Start Time and the End Time
The value is expressed in number of days, hours, minutes, and seconds.
6
Creator
The creator of the library that generated the parsed log
Currently may be either Speedb or RocksDB
7
Version
The library’s version [Git Hash]
8
DB Size (*)
The total size of all of the SST files in the database
The point in time in which the value was calculated is given below the per-cf table: “(*) Data is calculated ….”
9
Num Keys Written
The total number of keys written to the DB
The value of the rocksdb.number.keys.written
counter if available,
otherwise, extracted from cumulative writes information (DB Stats)
If none of the above is available: “Data Unavailable”
10
Avg. Written Key Size
The average size of a written key
Calculated from information in table_file_creation
events
If there are no such events, “Data Unavailable”
11
Avg. Written Value Size
The average size of a written value
Calculated from information in table_file_creation
events
If there are no such events, “Data Unavailable”
12
Error Messages
The messages in severity “ERROR” as they appear in the log
“No Error Messages” if there are no errors in the log
13
Fatal Messages
The messages in severity “FATAL” as they appear in the log
“No Fatal Messages” if there are no errors in the log
14
Ingest (*)
Total ingest
Calculated from cumulative writes information.
The point in time in which the value was calculated is given below the per-cf table: “(*) Data is calculated ….”
If data is not available: “No Ingest Info Available”
15
Ingest Rate
Ingest Rate in Mega-Bytes per second
Calculated from cumulative writes information.
The point in time in which the value was calculated is given below the per-cf table: “(*) Data is calculated ….”
If data is not available: “No Ingest Info Available”
16
Statistics
Whether statistics are available or not
17
Writes
The total number of write operations
<Percentage> (<Count> / <Total Ops>)
Total Ops: Total number of operations (write + read + seek)
Count: Total number of writes
Percentage: Percentage of writes out of Total Ops
Calculated from the rocksdb.number.keys.written
, rocksdb.number.keys.read
, and rocksdb.number.db.seek
counters.
If statistics are not available: “Data Unavailable (No Statistics)”
18
Reads
The total number of read operations
(Same as for the Writes field, but for reads)
Same as for Writes
19
Seeks
The total number of seek operations
(Same as for the Writes field, but for seeks)
Same as for Writes
20
Deletes
<Percentage> (<Num Deletes> / <Num Entries>)
Num Entries: The total number of flushed entries
Num Deletes: The total number of deletes in
Percentage: Percentage of deletes out of Num Entries
Information is gathered from Flush events in the log.
If no such events exist: “Data Unavailable (No Flush Started Events)”
21
Num CF-s
Number of cf-s or
“Can't be accurately determined”
The log parser may not be able to know for sure the number of cf-s in the DB using the parsed log file. In that case it will show the text “Can't be accurately determined”. Please see the “Ability to determine the number of cf-s“ for more information.
22
23
Column Families Information Table
24
Column Family
The name of the CF
Will display any CF for which there was information in the log (TBD - See XXXX)
25
Size
The total size of all of the SST files of the CF
If no data available to calculate the size: “Data Unavailable”
26
Avg. Key Size
Average value of a key in newly created SST-s of the CF
Gathered from table_file_creation events in the log
27
Avg. Value Size
Average value of a value in newly created SST-s of the CF
Gathered from table_file_creation events in the log
28
Compaction Style
The compaction style used in this CF
Taken from the options for the CF. The values are as they appear in the log.
If not known: “UNKNOWN”
29
Compression
The compression type used in this CF
Taken from the options for the CF. The values are as they appear in the log.
If not known: “UNKNOWN”
30
Filter-Policy (<BPK>)
The filter policy used in this CF (If any)
<BPK> - The average BPK for the filter.
Taken from the options for the CF. The values are as they appear in the log.
If not known: “UNKNOWN”
JSON File
Upon the user’s request, a JSON file is generated in the output folder.
The JSON file is a text file. It is best viewed by a JSON viewer. The following screen shots were taken from the Firefox browser that has a built-in JSON viewer.
JSON is a hierarchical format. The following sections describe the contents of the JSON file accordingly.
Top Level JSON Objects
The JSON file contains the following top-level objects (please see the following sections for details on every object):
General: The same information that is displayed in the short console output.
Options:
The differences between the options in the log and an applicable baseline version (if available).
All of the DB-Wide and per-cf options in the log
DB-Size: Summary information of ingest data and per-cf and level size growth.
Flushes: Per CF flush-related information..
Compactions: Per CF compactions-related information.
Reads: Get / Multi-Get operations related information
Seeks: Seek operations related information
Warnings: Warnings statistics.
Block-Cache-Stats: Statistics about the use of the block-cache.
CSV-s: Paths of generated CSV-s.
JSON Top-Level Objects Detailed Description
General
This includes the same information that is displayed in the short console output. Please see that section for more details.
Options
The options object consists of 2 sub-objects:
1. Diff
2. All Options
Notes about the display of Options
· WBM Pseudo-Options: The DB-Wide options contain pseudo-options for the Write Buffer Manager. They are not true Speedb / Rocksdb options. They are configuration parameters given to the WBM during its construction. They are displayed in the log together with the “official” options as follows:
And this is how they are displayed in the JSON (the ‘write_buffer_manager_’ prefix was added):
· Block-Based-Table-Options (CF Options sub-object)
CF Options consist of a set of top-level options. One of these options is the table_factory. The tool assumes the table_factory is of type BlockBasedTableFactory
(Block-Based-Table-Factory). The options for this entity are displayed separately, under a sub-object called “Block-Based Table”.
This is an example from a log file that shows how these options are printed to the log:
block_cache_options: These are displayed in the log as follows:
In the JSON, a ‘block_cache_’ prefix is used as follows:
· metadata_cache_options: These are displayed in the log as follows:
In the JSON, a ‘metadata_cache_’ prefix is used as follows:
· Options that are pointers
Some options are pointers (values that start with ‘0x’ and only contain hexadecimal digits). It is impossible to know from the log the real entity that the pointer points to. A pointer may or may not be initialized. Uninitialized pointers are displayed in the log in multiple ways (e.g., ;(nil)', ‘None’, etc.).
The value of an initialized pointer is meaningless in and of itself. Its value is the address of the associated entity in the address space of its containing process, and is unique within that process. Its only use, in the context of log files, is the ability to understand that the same entity is shared. For example, if the same block cache is shared between multiple cf-s, then all of them will have the same value for the block cache’s pointer option.
When the value of an initialized pointer is displayed in the JSON file, it will be displayed as “Pointer (<pointer value>).
Uninitialized pointers will be displayed as “Pointer (Uninitialise)”.
Options Diff
Overview
This object contains the differences between the options in the log and the default options in an applicable baseline version. The baseline version is the closest available version of the applicable creator of the library (RocksDB or Speedb).
This object contains the following sub-objects:
1. Baseline: The version that is used as the baseline with its creator in parentheses.
2. Baseline Log: The full path to the baseline log file.
3. DB: The diff in the db options.
4. CF-s: Per CF options diff. This object consists of 2 sub-objects:
1. CF-s (Common): Contains the options that are identical in all of the cf-s in the parsed log but are different than the corresponding option in the baseline.
2. CF-s (Specific): Contains the options that are not identical in the parsed log, and the corresponding option in the baseline.
Please see the section “CF-s Options Diff - CF-s (Common) Sub-Object” below for more details.
There are 3 cases with respect to a diff between the baseline and the parsed log:
1. The option exists in the baseline but was removed in the version that generated the parsed log.
2. The option doesn’t exist in the baseline but was added to the version that generated the parsed log.
3. The option exists in both, but the values are different.
Notes
· The Options Diff object will only show options in which there is a difference.
· Every entry in the diff contains 2 lines, the first for the baseline (called “Baseline”), and the second for the parsed log (called “Parsed Log”).
· When an option doesn’t exist, “Missing” will be displayed.
· If an option was renamed in a version, it will be deemed as a new option and it will appear twice, first for the old name (missing in parsed logs) and then for the new name (missing in the baseline).
· Pointers handling:
o The value of initialized pointers will always be different in the baseline and the parsed log. They will be deemed equal for the purposes of the comparison.
o If both pointers are uninitialized, they will be deemed equal.
· In all other cases, the values will be displayed as they appear in the log
DB Options Diff
This is an example of a DB options diff:
CF-s Options Diff - CF-s (Common) Sub-Object
The following two snapshots are an example of this sub-object:
Notes:
· As noted above, the Common sub-object contains options whose value is identical in all of the cf-s of the parsed log. So, for example, the memtable_factory is "speedb.HashSpdRepFactory" in all the cf-s of the parsed log. Its value is “SkipListFactory” in the baseline.
· It consists of two sub-objects:
o CF: The cf options that are not part of the block-based-table-format options
o Block-Based Table: The block-based-table-format options.
CF-s Options Diff - CF-s (Specific) Sub-Object
Notes:
As noted above, this sub-object only contains options that are not identical in all of the cf-s of the parsed log. Please note however that all such options are included in this sub-object, even if its value is the same as the corresponding option in the baseline.
“Unknown-CF-#<I>”: As explained in the “Ability to determine the number of cf-s” section, these represent cf-s whose name can’t be determined, that have options at the top of the parsed log.
All Options
This object lists all of the options that appear in the log file, in the same order:
As in the options diff sub-object, there are sub-objects for the db-wide options and the cf-s options:
The sub-objects themselves, contain additional sub-objects, using the same principles described above for the options diff sub-object:
The CF-s (Common) sub-object contains all of the options that are identical in all of the cf-s of the parsed log.
The CF-s (Specific) sub-object contains all of the options that are not identical in all of the cf-s of the parsed log. For example:
As may be seen in this example, the “default” cf has options that are not the same as the corresponding options in the “column_family_name_000001” and the “column_family_name_000002” cf-s.
DB-Size Sub-Object
This sub-object consists of the following:
Ingest: Ingest information taken from the last cumulative writes log trace:
Ingest:
Ingest Rate:
Ingest Time:
CF-s Growth: Per CF and level report on the difference in the size from the start of the log to its end. The information is obtained from compaction stats dumps.
Per CF and level, the information is displayed as <Start Size> -> <End Size> (<Difference>)
The “Sum” entry shows the total for the CF (sum of all levels of the CF).
Flushes Object
This Object displays per-cf information about flushes.
Per CF, there is information for all of the flushes that occurred in the CF.
The following information is displayed, per CF:
Name
Meaning
Source
L0->L1 Write-Amp
The write-amplification for Level0 → Level1 for this CF.
Compaction Stats
Per <Flush Reason> Sub-Object
Sub-object per flush reason in the flushes for this CF
All of the fields below are in the context of a CF and flush reason.
Flush Events
Sizes Histogram
A histogram of the number of flushes per total data size range (bucket) in a flush.
The numbers are the number of flushes of all flushes for this cf and flush reason.
For example, in the snapshot above, there were 31 flushes whose total data size was more than 31 MB).
Num Flushes
The number of flushes
Min Duration
The minimum duration that a single flush took to complete
Max Duration
The maximum duration that a single flush took to complete
Min Num Memtables
The minimum number of memtables that were part of a single flush.
Max Num Memtables
The maximum number of memtables that were part of a single flush.
Min Total Data Size
The minimum total data size in a single flush.
Max Total Data Size
The maximum total data size in a single flush.
Compactions Object
This Object displays per-cf information about compactions.
All of the information in this sub-objects is based on traces of compaction jobs (events and associatedlog traces), and compaction level stats dumps.
The sub-object consists of the following:
· Largest compaction size of all compactions in the log.
· Per CF compactions information (see below).
Per CF Information
The per-cf compactions information is based on all of the compactions that occurred in the CF.
The following information is displayed, per CF:
Name
Meaning
Per <Flush Reason> Sub-Object
Sub-object per flush reason in the flushes for this CF
All of the fields below are in the context of a CF and flush reason.
Num Compactions
The number of compactions
Min Compactions BW
Minimum write rate of a compaction
Max Compactions BW
Maximum write rate of a compaction
Comp
The elapsed time of all compactions
Comp Merge CPU
The elapsed CPU time of all compactions
Per Level Write-Amp
Write amplication per level, and their total (SUM)
Reads Object
This object contains information about read operations performed by the user (Get and Multi-Get) and associated aspects.
Get Histogram
This is the last dump of the rocksdb.db.get.micros
histogram.
It will be available only when statistics are enabled
Multi-Get Histogram
This is the last dump of the rocksdb.db.multiget.micros
histogram.
It will be available only when statistics are enabled
Per CF Read Latency
Per CF, displays information about read performance across all of the CF’s levels. The information is obtained from “File Read Latency Histogram By Level” dumps:
· Num Reads: Total number of reads.
· Avg. Read Latency: The average read latency
· Max Read Latency: The maximum latency
· Read % of All CF-s: The percentage of reads performed on this CF relative to the total number of reads on all CF-s (the snapshot shows a single CF so the percentage is 100%).
Filter Effectiveness
Consists of:
CF-s: Per CF:
Filter-Policy: The type of filter used (if any).
Avg. BPK: The average effective BPK for that filter
Counters: Global filters counters for all filters (available only if statistics are available):
False-Positive Rate: The effective false positive rate of all of the filters. This is displayed as a “1 in N” as it is the convention.
False-Positives: The number of times the filters answered “Key May Exist”, but the key wasn’t actually found.
Negatives: The number of times the filters answered “Key Doesn’t Exist”
True-Positives: The number of times the filters answered “Key May Exist” and the key was actually found.
Seeks Object
This Object shows information about user seek operations. The information is gathered from counters so it will be available only when statistics are enabled.
The object consists of the following:
· Num Seeks: The number of seek operations performed by the user [rocksdb.number.db.seek
counter]
· Num Found Seeks: The number of calls to seek (SeekToLast
, Seek
, SeekForPrev
, and SeekToFirst
) that succeeded (iterator was placed on a valid value) [rocksdb.number.db.seek.found
counter]
· Num Nexts: The number of calls to Next
on the iterator [rocksdb.number.db.next
counter]
· Num Prevs: The number of calls to Prev
on the iterator [rocksdb.number.db.prev
counter]
· Avg. Seek Range Size: (Num Nexts + Num Prevs) / Num Seeks
· Avg. Seeks Rate Per Second: The average number of seek operation per second during the log’s time span.
· Avg. Seek Latency: The average seek latency taken from the rocksdb.db.seek.micros
histogram.
Warnings Object
This object contains minimal summary information about the warnings found in the log.
The object consists of the following:
DB: Warnings that are not associated with a specific cf. This will be displayed the number of such warnings. Currently all of the DB warnings are not categorized (“Other”).
CF-s: Warning that are associated with a CF. Per CF:
A histogram of the number of warnings per category of warnings. Existing categories include:
Write-Stop: The database warns about a write stop
Write-Delay: The data base warns about a write delay
Other: Any warning not of one of the categories listed above.
Block-Cache-Stats Objects
This object contains block cache information and associated statistics found in the log (if available).
If you are unfamiliar with RocksDB’s block cache, please read RocksDB Block Cache.
Please note the following:
· The block cache is optional and is configured per cf.
· Block Cache Sharing:
o Multiple cf-s may share the same block cache.
o Multiple db-s may share the same block cache. However, there is no information in a single log file that provides any information about the sharing. RocksDB displays the information for the entire cache, so it’s impossible to know how it is used by the entities that share it (cf-s and db-s). Speedb adds per-cf information about the usage of the cache.
· Cache-Id: A block cache is uniquely identified by its id (e.g., ‘LRUCache@0x7fde57dbf300#1’ in the image below). The cache’s id consists of 3 parts - <name>@<pointer>#<process-id>
o Name: The value of the block_cache_name field in the cf’s table options ('LRUCache').
o Pointer: The value of the block_cache field in the the cf’s table options ('0x7fde57dbf300'). This value may serve as a unique identifier for a specific cache entity within a process.
o Process-id: The process id of the process that generated this log ('1'). The log parser may or may not display this part of the name.
· Cache Roles: The cache classifies its contained blocks according to their type or role. Statistics are gathered for specific roles. All other roles are grouped under the ‘Misc’ role. The role’s name indicates its type (e.g., ‘FilterBlock’ is for Filter blocks). The ‘WriteBuffer’ role is shown when the Write Buffer Manager is configured to cost its reservations in the block cache.
· The block cache is configured with a capacity. However, unless it is configured to use strict capacity (which is not done in practice), the usage may exceed the capacity.
Top-level sub-objects
There are 3 top-level sub-objects
1. Caches
2. DB Counters
3. Detailed
They are described in detail below.å
Caches top-level sub-object
This sub-object consists of the following:
Per cache-id (cache entity - “LRUCache@0x55deca6980f0”):
Capacity: The configured cache’s capacity.
Num Shards: The number of shards to which it is divided (2 ^ num_shard_bits).
Shard Size: The size of a single shared (capacity / number of shards)
CF-s:
Per-cf value of the
cache_index_and_filter_blocks
cf block based table option.
Index Block / Filter Block: Statistics for the Index and Filter blocks that belong to cf-s that are part of this DB and share this cache. The data is obtained from
table_file_creation
events, so it is independent of its containment in the cache. The following data is displayed:Avg. Size: The average size of a block of that type.
Max Size: The largest block of that type
Max Size At: At what time was the table that contained the largest block created.
DB Counters top-level sub-object
This sub-object includes block cache miss and hit counters:
· cache_add / cache_miss cache_hit: Total number of cache misses / hits / adds (for all roles)
· index_add / index_miss / index_hit: Number of index block cache misses / hits / adds.
· filter_add / filter_miss / filter_hit: Number of filter block cache misses / hits / adds.
· data_add / data_miss / data_hit: Number of filter block cache misses / hits / adds.
Detailed top-level sub-object
This sub-object includes the individual processed block cache stats dumps such as the following:
Per cache-id, the following global information:
· Capacity: The block cache capacity (as configured)
· Usage: The usage at the end of the log
This global information is followed by the individual dumps that are printed throughout the log. Per dump time, the following information is displayed, that reflects the state at that time:
· Usage: The usage
· Per role (Index / Filter / Write-Buffer / Misc):
o Count: The number of blocks of that role currently in the cache
o Size: The total size of those blocks
o Portion: Their percentage of the total cache’s capacity
CSV Files
During a run, the tool generates multiple CSV files. These files are best viewed with a spreadsheet viewing and editing software (e.g., LibreOffice Calc, Google Sheets, Microsoft Excel, etc.). These files are also an excellent input for further analysis and processing by applicable tools.
It may be that there is no applicable data in the parsed log to allow the generation of a CSV file (e.g., statistics are unavailable) . The tool reports, per CSV file, whether it was generated or not, and, if it was, the full path of the file.
The following table lists summarizes this:
CSV File name
Source
Generation Info
counters.csv
Statistics dump
statistics must be enabled
histograms_human_readable.csv / histogram_tools.csv
Statistics dump
statistics must be enabled
compaction_stats.csv
Compaction stats dumps
compactions.csv
Compaction events (compaction_started
and compaction_finished
)
flushes.csv
Flush events (flush_started
and flush_finished
)
The sections that follow provide detailed information about the individual csv files.
counters.csv
This file contains the values of the dump of the counters in the log:
This file is generated only when statistics are enabled.
The header line contains the names of the counters (e.g., ‘rocksdb.block.cache.miss’), in the order they appear in the log.
The row consists of the timestamp of the dump, followed by the values of all the counters at that time.
The dump to the log includes all of the tracked counters, even if their value is always 0. The CSV will not show counters whose values are always 0.
Counters Histograms CSV-s
These csv-s are for the dump of the counters histograms in the log. These are available when statistics are enabled
Two files are generated, one is aimed at a human reader (histograms_human_readable.csv) and the other at tools that will process the CSV (histograms_tools.csv). Both files contain the same information, just arranged differently.
histograms_human_readable.csv
histograms_tools.csv
compactions_stats.csv
This CSV includes the fields of the Compaction stats dumps:
compactions.csv
This CSV includes the fields of the compaction_started and compaction_finished events:
There is a large number of fields so only a subset is shown here. Please view an actual generated compactions.csv to see all of the fields.
flushes.csv
Contains a subset of the fields from the flush_started and flush_finished events
Testing
There are unit tests for the tool under the test folder. The tests are written in the pytest framework.
pytest installation
Running the unit tests:
The tool should pass all of the unit tests.
A successful run should like like this:
Miscellaneous
Reporting Bugs and Proposing
If you find a bug or wish to propose a new feature or an improvement in an existing feature, please open an issue in our GitHub
Known Bugs
The major issues that and known bugs is listed in the project’s GitHub.
Contributing to the tool’s development
The log parser is an Speedb open source project. Contributions to its development are very welcome. Please see the README.md in the repository for details.
Getting Advice or Discussing the Tool
Please use the following Discord Channel to get assistance, ask questions, etc.
The Future (the Features)
Requests for new features will be opened in GitHub
Last updated