Add cheaper compact strategy #2255
Labels
✏️ Feature
New feature or request
Which PikiwiDB functionalities are relevant/related to the feature request?
other
Description
Currently, pika's manual compact feature monitors operations on each key and compacts that key separately once a configured threshold is reached (this only takes effect for the four field-based data structures: hash, set, zset, and list). However, this design carries extra per-key monitoring overhead and can generate many additional compactions. The compact logic could therefore be simplified by adding a new compact type: a "longest unused file" compact type.
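To make the per-key monitoring overhead concrete, here is a minimal sketch of the kind of threshold-based monitor described above. The class and constant names (`KeyCompactMonitor`, the threshold parameter) are illustrative assumptions, not pika's actual identifiers; the point is that a counter must be kept for every monitored key.

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical sketch of per-key compact monitoring: each modification to a
// hash/set/zset/list key bumps a counter, and the key is compacted on its
// own once the configured threshold is reached.
class KeyCompactMonitor {
 public:
  explicit KeyCompactMonitor(uint64_t threshold) : threshold_(threshold) {}

  // Returns true when the key has accumulated enough modifications that the
  // caller should issue a manual compaction for just that key.
  bool RecordModify(const std::string& key) {
    uint64_t n = ++counters_[key];
    if (n >= threshold_) {
      counters_.erase(key);  // reset after scheduling a compaction
      return true;
    }
    return false;
  }

 private:
  uint64_t threshold_;
  // One entry per monitored key: this map is the "additional overhead for
  // key monitoring" that the proposal aims to remove.
  std::unordered_map<std::string, uint64_t> counters_;
};
```

Every write path has to update this map, and each key crossing the threshold triggers its own compaction, which is where the "a lot of additional compacts" come from.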
Proposed solution
The beneficial effects of manual compaction are:
clearing the tombstones generated by hash, set, zset, and list;
converting the scattered on-disk layout of hash, set, zset, and list data into a compact layout.
Drawing on the practices described in two papers, Lethe: A Tunable Delete-Aware LSM Engine and Constructing and Analyzing the LSM Compaction Design Space:
"A naïve way to choose file(s) is at random or by using a round-robin policy [32, 35]. These data movement policies do not focus on optimizing for any particular performance metric, but help in reducing space amplification. To optimize for read throughput, many production data stores [30, 34] select the 'coldest' file(s) in a level once a compaction is triggered. Another common optimization goal is to minimize write amplification. In this policy, files with the least overlap with the target level are marked for compaction [13, 27]. To reduce space amplification, some storage engines choose files with the highest number of tombstones and/or updates [30]. Another delete-aware approach introduces a tombstone-age driven file picking policy that aims to timely persist logical deletes [51]."
I suggest that pika's new compact type manually compact the SSTable that has gone the longest without being compacted. This is effectively an extension of the existing approach of monitoring tombstone counts.
Use RocksDB's GetPropertiesOfAllTables API to extract per-SSTable statistics and compute each file's age and deletion rate. When a file's age exceeds a threshold and its deletion rate is high enough, use that file's start_key and end_key as the lower and upper bounds of the compaction. If all files are young, select the file with the highest deletion rate and compact it in the same way. The benefits of doing this are avoiding per-key monitoring overhead and generating fewer additional compactions.
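The file-selection logic described above can be sketched as a pure function, decoupled from RocksDB so it is easy to test. In pika, the fields below would be filled from the table properties returned by `rocksdb::DB::GetPropertiesOfAllTables()` (or the file metadata from `GetLiveFilesMetaData()`, which also carries the key range); the struct name, function name, and thresholds here are illustrative assumptions.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Per-SSTable statistics, as would be derived from RocksDB table properties.
struct SstInfo {
  std::string start_key;
  std::string end_key;
  uint64_t age_seconds;   // now - file creation time
  double deletion_ratio;  // num_deletions / num_entries
};

// Pick the file whose [start_key, end_key] range should be compacted:
// prefer the oldest file that is both older than `max_age_seconds` and has a
// deletion ratio of at least `min_del_ratio`; if every file is younger than
// the age threshold, fall back to the file with the highest deletion ratio.
// Returns nullptr when `files` is empty.
const SstInfo* PickFileToCompact(const std::vector<SstInfo>& files,
                                 uint64_t max_age_seconds,
                                 double min_del_ratio) {
  const SstInfo* oldest = nullptr;
  for (const auto& f : files) {
    if (f.age_seconds >= max_age_seconds && f.deletion_ratio >= min_del_ratio &&
        (oldest == nullptr || f.age_seconds > oldest->age_seconds)) {
      oldest = &f;
    }
  }
  if (oldest != nullptr) return oldest;
  // All files are "young": choose the one with the most deletions.
  const SstInfo* best = nullptr;
  for (const auto& f : files) {
    if (best == nullptr || f.deletion_ratio > best->deletion_ratio) best = &f;
  }
  return best;  // caller then issues CompactRange(start_key, end_key)
}
```

The chosen file's start_key and end_key would then be passed as the bounds of a `CompactRange()` call, so only that one file's range is compacted rather than re-scanning individual keys.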
Alternatives considered
Keep the status quo.