Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: send notification failed then retry #925

Merged
merged 4 commits into from
Mar 16, 2022

Conversation

HuaHuaY
Copy link
Contributor

@HuaHuaY HuaHuaY commented Mar 15, 2022

What's changed and what's your intention?

  1. add retry when notification manager sends failed.
  2. refactor notificaiton manager, use Mutex<NotificationManagerCore> instead of DashMap, which help us to control concurrency issues. Side effect is we can't call notify method in multiple threads at the same time.
  3. delete NotificationTarget.

I assume that after a worker is disconnected, before we tell notification manager that its sender should be removed, the worker will not recall subscribe RPC.
Consider we store Worker in delete_worker_node method, I do not store information in notification manager when deleting the worker's sender.

Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests

Refer to a related PR or issue link (optional)

related #361

@codecov
Copy link

codecov bot commented Mar 15, 2022

Codecov Report

Merging #925 (37f99e9) into main (5a77a9c) will increase coverage by 0.00%.
The diff coverage is 55.07%.

@@            Coverage Diff            @@
##               main     #925   +/-   ##
=========================================
  Coverage     71.80%   71.80%           
  Complexity     2766     2766           
=========================================
  Files           971      971           
  Lines         56790    56773   -17     
  Branches       1787     1787           
=========================================
- Hits          40776    40764   -12     
+ Misses        15124    15119    -5     
  Partials        890      890           
Flag Coverage Δ
java 61.23% <ø> (ø)
rust 75.86% <55.07%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
rust/meta/src/manager/catalog_v2.rs 0.00% <0.00%> (ø)
rust/meta/src/rpc/service/notification_service.rs 100.00% <ø> (ø)
rust/meta/src/cluster/mod.rs 73.49% <57.14%> (-0.80%) ⬇️
rust/meta/src/manager/notification.rs 68.42% <64.70%> (-14.92%) ⬇️
rust/meta/src/manager/catalog.rs 73.52% <100.00%> (-2.02%) ⬇️
rust/meta/src/rpc/server.rs 70.09% <100.00%> (ø)
rust/meta/src/model/cluster.rs 73.33% <0.00%> (-13.34%) ⬇️
rust/common/src/types/ordered_float.rs 25.82% <0.00%> (+0.33%) ⬆️

📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more

rust/meta/src/manager/notification.rs Show resolved Hide resolved
Comment on lines 85 to 108
loop {
// Heartbeat may delete worker.
// We assume that after a worker is disconnected, before it recalls subscribe, we
// will call notify.
while let Ok(x) = self.rx.try_recv() {
keys.insert(x);
}
if keys.contains(worker_key) {
break;
}
let result = sender
.send(Ok(SubscribeResponse {
status: None,
operation: operation as i32,
info: Some(info.clone()),
}))
.await;
if result.is_ok() {
break;
}
time::sleep(Duration::from_micros(NOTIFY_RETRY_INTERVAL)).await;
}
NotificationTarget::ComputeNode => Box::new(self.be_observers.iter()),
NotificationTarget::Frontend => Box::new(self.fe_observers.iter()),
}
self.remove_by_key(keys);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These snippets seem to be duplicated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the difference of two methods is just use the HashMap of frontend or backend. I think there are only two functions, and in the same source file, there is no need to encapsulate another layer at present.

Copy link
Contributor

@BowenXiao1999 BowenXiao1999 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@yezizp2012 yezizp2012 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@HuaHuaY HuaHuaY merged commit 58da156 into main Mar 16, 2022
@HuaHuaY HuaHuaY deleted the zehua/notification_manager_error_handling branch March 16, 2022 05:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants