-
Notifications
You must be signed in to change notification settings - Fork 572
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(batch): split plan into fragments for local execution mode #3032
Conversation
Codecov Report
@@ Coverage Diff @@
## main #3032 +/- ##
==========================================
- Coverage 73.72% 73.66% -0.06%
==========================================
Files 739 739
Lines 101713 101792 +79
==========================================
+ Hits 74985 74986 +1
- Misses 26728 26806 +78
Flags with carried forward coverage won't be shown. Click here to find out more.
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
src/frontend/src/scheduler/local.rs
Outdated
} else { | ||
// We should only have one child stage of the root stage for now. | ||
assert_eq!(second_stage_id.len(), 1); | ||
let second_stage_id = second_stage_id.iter().next().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The child stages of root maybe more than 1, think about following sql:
select * from t1, t2 where t1.a = t2.a
, in this case we have plan like
HashJoin
/ \
Exchange Exchange
/ \
TableScan TableScan
You can refer to https:/singularity-data/risingwave/blob/408e9fb5249b12b1b457287adc4deba13c301f18/src/frontend/src/scheduler/distributed/stage.rs#L354 for child plan fragment and exchange id mapping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought select * from t1, t2 where t1.a = t2.a
is not considered as a point query
and thus is executed in the normal distributed mode
instead of local execution mode
. 🤔
The example from quip doc:
SELECT pk, t1.a, t1.fk, t2.b FROM t1, t2 WHERE t1.fk = t2.pk AND t1.pk = 114514
which is a point query as t1.pk = 114514
is specified and thus it uses a look up join
instead of hash join
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Choosing the appropriate plan should be the optimizer's responsibility, and the scheduler should not have such strong assumptions about the plan. The local execution mode
is optimized for point query
, but it doesn't mean the user can't execute hash join/sort-merge join.
The example SELECT pk, t1.a, t1.fk, t2.b FROM t1, t2 WHERE t1.fk = t2.pk AND t1.pk = 114514
picks lookup join only when t2.pk has index or is a primary key on it, and chooses hash join when it doesn't have an index.
5e7bfa4
to
a355dfd
Compare
Added two more tests for local mode, i.e. |
will support |
Excuse me, what's the meaning of local execution mode? |
Some operators are directly executed on the frontend node instead of the compute nodes. Only those queries of certain execution plans(relatively simple) are classified as queries executed in local execution mode. The time spent on the execution of these queries is typically dozens of milliseconds, thus reducing RPCs becomes profitable. |
What's changed and what's your intention?
Split plans into fragments that can be executed in local execution mode.
Move some test cases into
.part
so that both distributed and local modes can use them.Will add more test cases and support needed in some future PRs.
Checklist
./risedev check
(or alias,./risedev c
)Refer to a related PR or issue link (optional)
#2978