
Add cheaper compact strategy #2255

Open
MalikHou opened this issue Dec 29, 2023 · 0 comments
Labels
✏️ Feature New feature or request

MalikHou commented Dec 29, 2023

Which PikiwiDB functionalities are relevant/related to the feature request?

other

Description

Currently, Pika's manual compact function monitors key operations and compacts each key's range separately once a configured threshold is reached (this only takes effect for the four field-bearing data structures: hash, set, zset, and list). However, this design carries extra overhead for key monitoring and can generate many additional compactions. The compact logic could therefore be simplified by adding a new compact type: compacting the file that has gone longest without compaction.

Proposed solution

The benign effects of manual compaction are:

  • Tombstone removal & reduced space amplification:
    purges the tombstones generated by hash, set, zset, and list operations;
  • Data rearrangement:
    turns the scattered on-disk layout of hash, set, zset, and list entries into a compact, contiguous one.

[figure: compact]
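
For reference, both effects above are what a manual compaction over a key range delivers today. A minimal RocksDB sketch of the full-range variant (the function name is hypothetical, not Pika's actual code):

```cpp
#include "rocksdb/db.h"

// Minimal sketch: a manual compaction over the whole key space.
// Passing nullptr for both bounds asks RocksDB to compact everything,
// which drops tombstones and rewrites the data contiguously.
rocksdb::Status FullManualCompact(rocksdb::DB* db) {
  rocksdb::CompactRangeOptions opts;
  return db->CompactRange(opts, /*begin=*/nullptr, /*end=*/nullptr);
}
```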

Drawing on the practices described in two papers, “Lethe: A Tunable Delete-Aware LSM Engine” and “Constructing and Analyzing the LSM Compaction Design Space”:
“A naïve way to choose file(s) is at random or by using a round-robin policy [32, 35]. These data movement policies do not focus on optimizing for any particular performance metric, but help in reducing space amplification. To optimize for read throughput, many production data stores [30, 34] select the ‘coldest’ file(s) in a level once a compaction is triggered. Another common optimization goal is to minimize write amplification. In this policy, files with the least overlap with the target level are marked for compaction [13, 27]. To reduce space amplification, some storage engines choose files with the highest number of tombstones and/or updates [30]. Another delete-aware approach introduces a tombstone-age driven file picking policy that aims to timely persist logical deletes [51].”

I suggest that Pika's new compact type work by manually compacting the SSTable that has stayed uncompacted the longest. This is effectively an extension of the existing approach of monitoring tombstone counts.

Use RocksDB's GetPropertiesOfAllTables to extract the metrics of every SSTable and compute each file's age and deletion rate. When a file's age exceeds a threshold and it shows a certain deletion rate, set the upper and lower compaction boundaries to that file's start_key & end_key. If all the files are still young, select the file(s) with the highest deletion rate and compact them the same way.
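
A minimal C++ sketch of this picking policy follows. It uses GetLiveFilesMetaData rather than GetPropertiesOfAllTables, since the former also exposes each file's smallest/largest key alongside its entry, deletion, and creation-time counters; the thresholds and function name are illustrative assumptions, not part of the proposal:

```cpp
#include <ctime>
#include <vector>

#include "rocksdb/db.h"
#include "rocksdb/metadata.h"

// Assumed thresholds, for illustration only.
constexpr uint64_t kMaxFileAgeSecs = 7 * 24 * 3600;  // "too old" after a week
constexpr double kMinDeleteRatio = 0.1;              // "certain deletion rate"

rocksdb::Status CompactOldestDeleteHeavyFile(rocksdb::DB* db) {
  std::vector<rocksdb::LiveFileMetaData> files;
  db->GetLiveFilesMetaData(&files);

  const uint64_t now = static_cast<uint64_t>(std::time(nullptr));
  const rocksdb::LiveFileMetaData* pick = nullptr;
  double best_ratio = 0.0;

  for (const auto& f : files) {
    if (f.being_compacted || f.num_entries == 0) continue;
    const double ratio =
        static_cast<double>(f.num_deletions) / static_cast<double>(f.num_entries);
    const uint64_t age =
        (f.file_creation_time > 0 && now > f.file_creation_time)
            ? now - f.file_creation_time
            : 0;
    // Preferred case: an old file with a non-trivial deletion rate.
    if (age > kMaxFileAgeSecs && ratio >= kMinDeleteRatio) {
      pick = &f;
      break;
    }
    // Fallback when every file is young: highest deletion rate wins.
    if (ratio > best_ratio) {
      best_ratio = ratio;
      pick = &f;
    }
  }
  if (pick == nullptr) return rocksdb::Status::OK();  // nothing worth compacting

  // The chosen file's key range becomes the manual compaction boundaries.
  rocksdb::Slice begin(pick->smallestkey);
  rocksdb::Slice end(pick->largestkey);
  return db->CompactRange(rocksdb::CompactRangeOptions(), &begin, &end);
}
```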

The benefits of doing this are:

  • No more monitoring of per-key status; compaction is executed periodically according to a state machine (a driver sketch follows this list), and the steady-state result is that the overlap at each level stays small;
  • I/O remains controllable.
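
As a hypothetical driver for the first point, a fixed-interval sweep could replace the per-key hooks; the interval and names are assumptions, and a real state machine could gate each tick on load or pending deletes:

```cpp
#include <atomic>
#include <chrono>
#include <thread>

#include "rocksdb/db.h"

// Hypothetical periodic driver (sketch): replaces per-key monitoring with a
// fixed-interval sweep. CompactOldestDeleteHeavyFile is the picker above.
void CompactionTicker(rocksdb::DB* db, std::atomic<bool>& stop) {
  while (!stop.load()) {
    CompactOldestDeleteHeavyFile(db);                       // at most one file's I/O
    std::this_thread::sleep_for(std::chrono::minutes(10));  // assumed cadence
  }
}
```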

Alternatives considered

Keep the status quo.
