
refactor(get): seek and filter tables by level #1372

Merged

merged 9 commits into main from wallace/get on Mar 31, 2022
Conversation

Little-Wallace
Contributor

@Little-Wallace Little-Wallace commented Mar 29, 2022

What's changed and what's your intention?

close #1251

For a point get request we do not need to query every level.

  • Summarize your change (mandatory)

Every time we issue a get request to state_store, we do not need to query every sstable. In most cases the key can be found in a file at a high level, with no need to query lower levels, because the key with the largest epoch always stays at the higher level.

  • How does this PR work? Need a brief introduction for the changed logic (optional)
    It reduces the computation and IO for a point get request.

  • Describe clearly one logical change and avoid lazy messages (optional)

  • Describe any limitations of the current code (optional)
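The lookup order described above can be sketched as follows. This is a minimal illustrative sketch, not the actual hummock code: `Table`, `point_get`, and the plain `HashMap` storage are hypothetical stand-ins for sstables and the real lookup path.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for an sstable: a plain key -> value map.
struct Table {
    data: HashMap<Vec<u8>, Vec<u8>>,
}

// Scan levels from newest (highest) to oldest and return on the first hit.
// Because a key written at a larger epoch always stays at a higher level,
// the first match is the newest version and all lower levels can be skipped.
fn point_get(levels: &[Vec<Table>], key: &[u8]) -> Option<Vec<u8>> {
    for level in levels {
        for table in level {
            if let Some(v) = table.data.get(key) {
                return Some(v.clone()); // newer epochs shadow older ones
            }
        }
    }
    None // key is not present in any level
}
```

The early return is where the savings come from: a hit at a high level means lower-level files are never touched at all.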

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

#1251

Signed-off-by: Little-Wallace <[email protected]>
@codecov

codecov bot commented Mar 29, 2022

Codecov Report

Merging #1372 (e9de105) into main (69e5fc0) will increase coverage by 0.27%.
The diff coverage is 69.44%.

@@             Coverage Diff              @@
##               main    #1372      +/-   ##
============================================
+ Coverage     70.22%   70.50%   +0.27%     
  Complexity     2766     2766              
============================================
  Files          1028     1028              
  Lines         90150    90192      +42     
  Branches       1790     1790              
============================================
+ Hits          63309    63586     +277     
+ Misses        25950    25715     -235     
  Partials        891      891              
Flag Coverage Δ
java 61.01% <ø> (ø)
rust 72.52% <69.44%> (+0.33%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
rust/meta/src/barrier/mod.rs 79.83% <ø> (ø)
rust/storage/src/hummock/utils.rs 96.66% <ø> (+7.30%) ⬆️
rust/storage/src/hummock/mod.rs 70.86% <60.78%> (+7.01%) ⬆️
rust/storage/src/hummock/state_store_tests.rs 56.33% <88.88%> (+1.91%) ⬆️
rust/meta/src/hummock/compaction.rs 76.87% <100.00%> (+0.20%) ⬆️
.../src/executor/managed_state/aggregation/extreme.rs 90.09% <0.00%> (-0.28%) ⬇️
rust/frontend/src/binder/expr/function.rs 89.04% <0.00%> (-0.15%) ⬇️
rust/frontend/src/binder/values.rs 88.88% <0.00%> (ø)
...st/stream/src/executor/aggregation/agg_executor.rs 91.11% <0.00%> (ø)
rust/frontend/src/expr/type_inference.rs 93.23% <0.00%> (+0.13%) ⬆️
... and 9 more

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

@Little-Wallace
Contributor Author

Little-Wallace commented Mar 30, 2022

I benchmarked the main branch and this branch ten times each with the following command. Of course, I only care about getseq.
main: QPS 138000~145000 in 8 runs and ~80000 in 2 runs
current branch: QPS 142000~160000 in 8 runs and 90549 in 2 runs
It seems to improve get performance by about 10%.

 cargo run --release --bin ss-bench -- \
 --benchmarks "writebatch,getseq"  --batch-size 10000 \
 --writes 100000 \
 --reads 100000 \
 --scans 0 \
 --deletes 0 \
 --concurrency-num 2 \
 --seed 233 \
 --statistics --store "hummock+memory"

@hzxa21 hzxa21 self-requested a review March 30, 2022 08:35
Signed-off-by: Little-Wallace <[email protected]>
@twocode twocode self-requested a review March 30, 2022 10:23
@twocode
Contributor

twocode commented Mar 30, 2022

> I benchmarked the main branch and this branch ten times each with the following command. Of course, I only care about getseq. main: QPS 138000~145000 in 8 runs and ~80000 in 2 runs; current branch: QPS 142000~160000 in 8 runs and 90549 in 2 runs. It seems to improve get performance by about 10%.

 cargo run --release --bin ss-bench -- \
 --benchmarks "writebatch,getseq"  --batch-size 10000 \
 --writes 100000 \
 --reads 100000 \
 --scans 0 \
 --deletes 0 \
 --concurrency-num 2 \
 --seed 233 \
 --statistics --store "hummock+memory"

"hummock+memory" uses the in-memory object store. The improvement might be more obvious with minio.

Comment on lines +270 to +276
tables.reverse();
for table in tables {
    table_counts += 1;
    if let Some(v) = self.get_from_table(table, &internal_key, key).await? {
        return Ok(Some(StorageValue::from(v)));
    }
}
Collaborator

Here we assume tables in the Overlapping level are stored in the hummock version in epoch order. Although this happens to be true currently, we should document this behavior in HummockManager::commit_epoch and CompactStatus::get_compact_task.

Contributor Author

Agree. Shall I add a note in the code of HummockManager::commit_epoch?
For CompactStatus::get_compact_task, I found that we only switch to a new file builder when the user key of the current key differs from the last key, so it does not matter.
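The builder-switch rule mentioned here can be sketched roughly as follows. This is an illustrative sketch only: `split_into_files` and its types are hypothetical, not the real sstable builder, but it shows why every epoch of one user key lands in the same output file when a new file is only started at a user-key boundary.

```rust
// Hypothetical sketch: when splitting sorted compaction output into files,
// only cut a new file at a user-key boundary, so every epoch (version) of
// one user key stays inside a single output file.
fn split_into_files(
    sorted_kvs: &[(Vec<u8>, u64)], // (user_key, epoch), sorted by user key
    target_file_len: usize,        // switch files once this length is reached
) -> Vec<Vec<(Vec<u8>, u64)>> {
    let mut files = Vec::new();
    let mut current: Vec<(Vec<u8>, u64)> = Vec::new();
    for kv in sorted_kvs {
        // Cut only when the current file is full AND the user key changed.
        let cut = current.len() >= target_file_len
            && current.last().map(|(k, _)| k != &kv.0).unwrap_or(false);
        if cut {
            files.push(std::mem::take(&mut current));
        }
        current.push(kv.clone());
    }
    if !current.is_empty() {
        files.push(current);
    }
    files
}
```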

Collaborator

> Shall I add a note in the code of HummockManager::commit_epoch?

Yes, feel free to do it.

Contributor

@zwang28 zwang28 Apr 22, 2022

Even now, files in L0 are not guaranteed to be in epoch order, because a new version generated by a compaction will have L0 sorted according to the key_range of its files. @soundOfDestiny
Though it doesn't affect the correctness of the code here.

Contributor

> Even now, files in L0 are not guaranteed to be in epoch order, because a new version generated by a compaction will have L0 sorted according to the key_range of its files. @soundOfDestiny Though it doesn't affect the correctness of the code here.

yes

Contributor

fixed in #2062

Collaborator

@hzxa21 hzxa21 left a comment

Rest LGTM. Thanks for the PR.

@Little-Wallace
Contributor Author

> "hummock+memory" uses the in-memory object store. The improvement might be more obvious with minio.

Yes. Maybe I should prepare a minio environment...

Contributor

@twocode twocode left a comment

LGTM!

Signed-off-by: Little-Wallace <[email protected]>
@Little-Wallace Little-Wallace merged commit 4fbd5b1 into main Mar 31, 2022
@Little-Wallace Little-Wallace deleted the wallace/get branch March 31, 2022 08:15
Development

Successfully merging this pull request may close these issues.

storage: optimize hummock get with layering lookup
5 participants