Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(optimizer): implement column pruning for value, project, agg #710

Merged
merged 10 commits into from
Mar 7, 2022

Conversation

xxchan
Copy link
Member

@xxchan xxchan commented Mar 4, 2022

What's changed and what's your intention?

Summary

Replaced the default implementation of ColPrunable for these nodes:

  • value: same as table scan
  • project
  • agg

How does this PR work?

General idea of how to prune columns (Also partly described in the doc of ColPrunable)

  • For leaf nodes (scan, value), just select corresponding columns.
  • For nodes which can control its output schema (project, agg, join)
    1. Change its output schema
    2. Calculate its needed columns.
    3. Rewrite its used column indexes in every field and prune children.
  • For other nodes whose output schema is the same as its child (filter, limit, topn).
    1. Check whether it needs more columns than required.
    2. Rewrite its used column indexes in every field and prune children.
    3. If true in step 1, insert a project above.

misc:

  • I added assert_eq to validate required_cols. Is this style OK?
  • filter's pruning is updated: insert project only when needed

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

#640

@codecov
Copy link

codecov bot commented Mar 4, 2022

Codecov Report

Merging #710 (355f511) into main (c99e302) will increase coverage by 0.18%.
The diff coverage is 93.99%.

Impacted file tree graph

@@             Coverage Diff              @@
##               main     #710      +/-   ##
============================================
+ Coverage     71.55%   71.74%   +0.18%     
  Complexity     2706     2706              
============================================
  Files           902      902              
  Lines         52071    52334     +263     
  Branches       1781     1781              
============================================
+ Hits          37261    37546     +285     
+ Misses        13995    13973      -22     
  Partials        815      815              
Flag Coverage Δ
java 58.86% <ø> (ø)
rust 77.26% <93.99%> (+0.22%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
rust/common/src/expr/agg.rs 44.44% <ø> (ø)
rust/frontend/src/expr/mod.rs 71.42% <ø> (ø)
...st/frontend/src/optimizer/plan_node/col_pruning.rs 50.00% <ø> (ø)
...frontend/src/optimizer/plan_node/logical_values.rs 38.09% <0.00%> (-8.97%) ⬇️
rust/frontend/src/utils/column_index_mapping.rs 93.22% <20.00%> (-3.28%) ⬇️
...t/frontend/src/optimizer/plan_node/logical_scan.rs 80.95% <75.00%> (-2.39%) ⬇️
...frontend/src/optimizer/plan_node/logical_filter.rs 93.91% <95.91%> (+0.95%) ⬆️
...rontend/src/optimizer/plan_node/logical_project.rs 87.38% <96.66%> (+19.53%) ⬆️
...st/frontend/src/optimizer/plan_node/logical_agg.rs 93.78% <99.36%> (+93.78%) ⬆️
rust/meta/src/hummock/compaction.rs 80.89% <0.00%> (+0.63%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c99e302...355f511. Read the comment docs.

@xxchan xxchan marked this pull request as ready for review March 4, 2022 16:39
@xxchan xxchan changed the title feat(optimizer): implement column pruning for remaining nodes feat(optimizer): implement column pruning for value, project, agg Mar 4, 2022
@fuyufjh
Copy link
Member

fuyufjh commented Mar 5, 2022

I added assert_eq to validate required_cols. Is this style OK?

Sure, +1 for this

required_cols
.ones()
.map(|id| {
let agg_call = self.agg_calls[id].clone();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems not correct? The output schema of LogicalAgg consists of 2 parts: fields of the group keys and fields for each aggregation function (AggCall), so what's meaning of agg_calls[id] ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, recommend replacing id with index to avoid ambiguity with ColumnId.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, more specifically, the output of the agg operator in query select a,b,max(c),min(c) group by a,b has 4 columns which is [a,b,max(c),min(c)]

@xxchan xxchan requested review from fuyufjh and st1page and removed request for st1page March 7, 2022 08:09
Copy link
Contributor

@st1page st1page left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@fuyufjh fuyufjh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@st1page st1page merged commit 1c0bbe0 into main Mar 7, 2022
@st1page st1page deleted the column_pruning+ branch March 7, 2022 09:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants