Bug 1672951 - Implement a Metrics Ping Scheduler in glean-core #1599

chutten · 2021-04-28T20:53:13Z

The MPS has actually been written for a little while. I strove for as direct a translation of MetricsPingScheduler.{kt|swift} as was practicable (we can always refactor it later). The hard part has been tests.

I've tried making this testable in a few ways. The first way you can see in the second commit and is straightforward, but it can't actually observe most of the behaviour we hope to test.

The second way apes the Policy approach of gecko JSMs. (I had some struggles with the borrow checker on this approach, so it's very possible this is more ornate than it needs to be) I think this Policy approach is the better one because it allows us to inspect and test every operation of the MPS which, as history has taught us, is absolutely crucial for peace of mind.

(( I also attempted to automock Glean, but it would require some serious modifications to make the Glean struct's impl suit the mocking library (mockall), so I abandoned it. ))

I'm looking for some honest feedback about this, both on whether the approach itself is distasteful and on what patterns I've missed in pursuing things in this manner. I've taken heart from the Book itself suggesting RefCell-based interior mutability for test mocks, which I take to mean I'm not completely out to lunch here, but I might be.
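
For readers unfamiliar with the pattern, here is a minimal sketch of a Policy-style seam with a RefCell-backed test double. The trait, its method names, and the simplified `schedule` function are all hypothetical stand-ins, not the code in this PR; only the "upgrade" reason string comes from the test comments in this review.

```rust
use std::cell::RefCell;

/// Hypothetical seam between the scheduler and the outside world.
trait Policy {
    fn app_build(&self) -> String;
    fn last_sent_build(&self) -> Option<String>;
    fn submit_metrics_ping(&self, reason: &'static str);
}

/// Test double: `RefCell` lets `&self` methods record what happened.
#[derive(Default)]
struct TestPolicy {
    app_build: String,
    last_sent_build: RefCell<Option<String>>,
    submitted: RefCell<Vec<&'static str>>,
}

impl Policy for TestPolicy {
    fn app_build(&self) -> String {
        self.app_build.clone()
    }
    fn last_sent_build(&self) -> Option<String> {
        self.last_sent_build.borrow().clone()
    }
    fn submit_metrics_ping(&self, reason: &'static str) {
        self.submitted.borrow_mut().push(reason);
    }
}

/// Simplified stand-in for the scheduling decision (an assumption):
/// submit an "upgrade" ping when a previously recorded build differs.
fn schedule(policy: &impl Policy) {
    if let Some(last) = policy.last_sent_build() {
        if last != policy.app_build() {
            policy.submit_metrics_ping("upgrade");
        }
    }
}
```

A test can then construct a `TestPolicy`, call `schedule(&policy)`, and assert on `policy.submitted` without touching a real Glean instance.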

badboy

I think the general approach is sound and indeed a 1:1 mapping of the "old" implementation.
I think if we treat this implementation as "experimental" (maybe even put it behind a feature with that name?) we can consider some (breaking) changes on it as followups.

I like the testing approach, though might bikeshed on the Policy name (but don't have a better one yet).
(and of course we need to add some more tests)

I have a bunch of comments I left inline.

badboy · 2021-04-29T09:33:43Z

glean-core/src/lib.rs

@@ -129,6 +134,11 @@ pub struct Configuration {
 pub max_events: Option<usize>,
 /// Whether Glean should delay persistence of data from metrics with ping lifetime.
 pub delay_ping_lifetime_io: bool,
+ /// The application's build identifier. If this changes and use_core_mps is `true`,


Changes when?

badboy · 2021-04-29T09:39:18Z

glean-core/src/scheduler.rs

+/// Must be called before draining the preinit queue.
+/// (We're at the Language Bindings' mercy for that)
+pub fn schedule(glean: &Glean) {
+ let now = Local::now();


This call is what's currently causing some crashes in FOG, and it's why we have this awkward workaround now: https://github.com/mozilla/glean/blob/main/glean-core/src/util.rs#L53-L89

We should probably rely on the same workaround (probably possible to split local_now_with_offset into local_now and local_now_with_offset which calls local_now and turns DateTime<Local> into DateTime<FixedOffset>)

chutten · 2021-04-29

...but won't we have problems with any constructed value of DateTime<Local> no matter what? I mean, if Local's timezone is problematic, there's not really much we can do with it.

I guess I'll rewrite scheduler to use DateTime<FixedOffset> and use local_now_with_offset to get it.

badboy · 2021-04-29

If we use the wrong timezone consistently then it's still fine.
But we shouldn't crash when encountering something wrong.

Using DateTime<FixedOffset> throughout should still be correct usage.
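
The "consistently wrong is fine, crashing is not" idea can be sketched without chrono. The helper name and the strict ±24 hour bound below are assumptions for illustration, not the contents of the util.rs workaround linked above.

```rust
/// Seconds in a day; offsets at or beyond this magnitude are treated as invalid.
const DAY_SECS: i32 = 24 * 60 * 60;

/// Hypothetical helper: keep a plausible UTC offset, otherwise fall back to
/// UTC (offset 0) instead of panicking on a broken system timezone.
fn sanitize_utc_offset(offset_secs: i32) -> i32 {
    if offset_secs > -DAY_SECS && offset_secs < DAY_SECS {
        offset_secs
    } else {
        0
    }
}
```

A fixed offset obtained this way stays the same for every later computation, so the scheduler is at worst shifted by a constant amount.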

badboy · 2021-04-29T09:41:22Z

glean-core/src/scheduler.rs

+ if let Some(last_sent_build) = last_sent_build_metric.get_value(&glean, INTERNAL_STORAGE) {
+ // XXX: If `app_build` is longer than StringMetric's max length, we will always
+ // treat it as a changed build when really it isn't.
+ if &last_sent_build != glean.app_build.as_ref().unwrap() {


Are we sure we always have Some app_build here?
Also we use it below again, so maybe put that into a variable.

chutten · 2021-04-29

We have app_build iff cfg.use_core_mps. But I'm not happy with how it's only enforced by logic, not by the language. cfg.app_build is always present (often "unknown"); I was using Option to signal more than the app_build... maybe I should add use_core_mps to struct Glean?
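
One way to let the language enforce "app_build is present iff the core scheduler is on" is an enum instead of an `Option` plus a flag. This is only a sketch; the enum and function below are hypothetical and not part of the PR.

```rust
/// Hypothetical stand-in for the scheduler-related state on `Glean`.
enum MetricsScheduler {
    /// The language binding runs its own scheduler.
    Disabled,
    /// glean-core schedules "metrics" pings and needs the build id to do so.
    Core { app_build: String },
}

/// Returns true if the current build differs from the last one a ping was sent for.
fn build_changed(scheduler: &MetricsScheduler, last_sent_build: &str) -> bool {
    match scheduler {
        MetricsScheduler::Disabled => false,
        // `app_build` is bound once here, so no `unwrap()` is needed anywhere.
        MetricsScheduler::Core { app_build } => app_build.as_str() != last_sent_build,
    }
}
```

With this shape there is no state in which the scheduler is enabled but the build id is missing, and the value is bound once instead of being unwrapped twice.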

badboy · 2021-04-29T09:43:17Z

glean-core/src/scheduler.rs

+ // treat it as a changed build when really it isn't.
+ if &last_sent_build != glean.app_build.as_ref().unwrap() {
+ last_sent_build_metric.set(&glean, glean.app_build.as_ref().unwrap());
+ log::info!("Builds don't match. Sending 'metrics' ping");


Maybe something more like "App build changed. Sending 'metrics' ping"?

badboy · 2021-04-29T09:53:44Z

glean-core/src/scheduler.rs

+ Local::today().and_hms(SCHEDULED_HOUR, 0, 0)
+ };
+
+ // Other MPSes cancel outstanding tasks here. I'm not sure we need to, since schedule() is only


TODO or bug for a followup

badboy · 2021-04-29T09:59:24Z

glean-core/src/scheduler.rs

+
+const SCHEDULED_HOUR: u32 = 4;
+
+static THREAD_GENERATION: OnceCell<Arc<AtomicU32>> = OnceCell::new();


This can be a Lazy and then we don't even need the explicit init call later.
It also should have some docs on what it's for.
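
A sketch of the lazy-static shape, using `std::sync::LazyLock` (Rust 1.80+) as a stand-in for `once_cell::sync::Lazy`. The doc comment's description of what the counter is for is an assumption, and `next_generation` is a hypothetical helper.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, LazyLock};

/// Presumed purpose (an assumption): a counter bumped whenever a new scheduler
/// thread is started, so that older threads can tell they are stale.
static THREAD_GENERATION: LazyLock<Arc<AtomicU32>> =
    LazyLock::new(|| Arc::new(AtomicU32::new(0)));

/// Hypothetical helper: start a new generation and return its number.
fn next_generation() -> u32 {
    // The first access initializes the static; no explicit init call is needed.
    THREAD_GENERATION.fetch_add(1, Ordering::SeqCst) + 1
}
```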

badboy · 2021-04-29T10:06:54Z

glean-core/src/scheduler.rs

+ let mut policy = TestPolicy {
+ ..Default::default()
+ };
+ policy.app_build = "a build".to_string();
+ policy
+ .last_sent_build
+ .replace(Some("a different build".to_string()));


This can probably be simplified to:

let mut policy = TestPolicy {
    app_build: "a build".to_string(),
    last_sent_build: Some("a different build".to_string()),
    ..Default::default()
};
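
A compilable sketch of that struct-update form. The field types are assumptions: the `.replace(Some(...))` call in the diff suggests `last_sent_build` is a `RefCell<Option<String>>`, in which case the initializer needs `RefCell::new(...)`. The third field is hypothetical and stands in for the fields left to `Default`.

```rust
use std::cell::RefCell;

/// Hypothetical shape of the test policy; the real struct likely has more fields.
#[derive(Default)]
struct TestPolicy {
    app_build: String,
    last_sent_build: RefCell<Option<String>>,
    pings_submitted: RefCell<u32>,
}

/// Builds the fixture in one expression; unspecified fields come from `Default`.
fn policy_with_different_builds() -> TestPolicy {
    TestPolicy {
        app_build: "a build".to_string(),
        last_sent_build: RefCell::new(Some("a different build".to_string())),
        ..Default::default()
    }
}
```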

badboy · 2021-04-29T10:09:42Z

glean-core/src/lib.rs

+ } else {
+ // Can only kick off the "metrics" ping scheduler after we have a global Glean.
+ let glean = &GLEAN.get().unwrap().lock().unwrap();
+ scheduler::schedule(&glean);


I would probably swap around the if condition here:

if GLEAN.set(Mutex::new(glean)).is_ok() {
    // Can only kick off the "metrics" ping scheduler after we have a global Glean.
    let glean = &GLEAN.get().unwrap().lock().unwrap();
    scheduler::schedule(&glean);
} else {
    // ...
}

That is: handle the error case last

badboy · 2021-04-29T10:18:22Z

glean-core/src/scheduler.rs

+ reason: Option<&'static str>,
+ ) {
+ let thread_gen = Arc::clone(&THREAD_GENERATION.get_or_init(Default::default));
+ std::thread::Builder::new()


So I'm a bit worried about this new thread. It's long running and sleeping all the time.
I think we had the problem before that Firefox tooling wants us to ensure threads are stopped before the main thread. As it stands we don't have a way to inform this thread.

(Additionally I'd wish we would just have a way to hook into whatever-existing background-threaded timers the surrounding thing gives us :( )

chutten · 2021-04-30T20:51:52Z

Notable changes:

Split ping submission away from scheduling for clarity
Use of Condvar for scheduling, cancellation
Lotsa tests
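
For context, a cancellable wait built on a Condvar can look roughly like the sketch below. The type alias and function names are hypothetical; only the flag-plus-`notify_all` shape mirrors the `cancel()` code quoted later in this review.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Shared cancellation flag plus the condvar used to wake waiters.
type CancelSignal = Arc<(Mutex<bool>, Condvar)>;

/// Waits up to `timeout`. Returns `true` if the wait ran to completion
/// (time to submit), `false` if it was cancelled first.
fn wait_or_cancel(signal: &CancelSignal, timeout: Duration) -> bool {
    let (cancelled_lock, condvar) = &**signal;
    let guard = cancelled_lock.lock().unwrap();
    // `wait_timeout_while` re-checks the flag on every wakeup, so spurious
    // wakeups do not end the wait early.
    let (guard, _timed_out) = condvar
        .wait_timeout_while(guard, timeout, |cancelled| !*cancelled)
        .unwrap();
    !*guard
}

/// Sets the flag and wakes every waiting scheduler thread.
fn cancel(signal: &CancelSignal) {
    let (cancelled_lock, condvar) = &**signal;
    *cancelled_lock.lock().unwrap() = true;
    condvar.notify_all();
}
```

Unlike a plain `thread::sleep`, the waiting thread returns as soon as `cancel` is called, which addresses the earlier concern about a long-sleeping thread that cannot be informed.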

I should've caught all the previous comments. Lemme know if I missed any.

....and clippy immediately fails for something make lint-rust didn't catch locally? Weird. Maybe my clippy's out of date. Anyhoo. Nitfixes incoming.

chutten · 2021-04-30T21:14:07Z

Had to add a lock for scheduler tests, which makes my nose twitch. Being able to cancel the scheduler from any thread without caring if the scheduler's got work to do is a feature, not a bug, but any time I have to do this sort of workaround it makes me wonder if I'm decreasing safety.
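
A common shape for such a test lock, sketched with hypothetical names: a static mutex that each scheduler test holds for its whole body, with poisoning ignored so one failing test does not fail every later one.

```rust
use std::sync::{Mutex, MutexGuard};

/// Serializes tests that touch the scheduler's global state.
static SCHEDULER_TEST_LOCK: Mutex<()> = Mutex::new(());

/// Hypothetical helper: take the lock, recovering from poisoning left behind
/// by a test that panicked while holding it.
fn lock_scheduler_tests() -> MutexGuard<'static, ()> {
    SCHEDULER_TEST_LOCK
        .lock()
        .unwrap_or_else(|poisoned| poisoned.into_inner())
}
```

Each test would start with `let _guard = lock_scheduler_tests();`. The lock only exists in test code, so production callers can still cancel from any thread.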

badboy

Bunch of questions inline.
Overall this code looks good though and we're close to landing this.

I wonder if you already managed to test this out in use in FOG?

badboy · 2021-05-03T10:59:54Z

glean-core/src/scheduler.rs

+
+ // `When` is responsible for date math. Let's make sure it's correct.
+ #[test]
+ fn test_when() {


The test_ prefix is noise in the output IMO. Can we make the function names a bit more self-describing?

badboy · 2021-05-03T11:10:10Z

glean-core/src/lib.rs

@@ -233,6 +248,9 @@ impl Glean {
 max_events: cfg.max_events.unwrap_or(DEFAULT_MAX_EVENTS),
 is_first_run: false,
 debug: DebugOptions::new(),
+ // Subprocess doesn't use "metrics" pings so has no need for a scheduler.


This comment doesn't apply anymore I guess

badboy · 2021-05-03T11:31:51Z

glean-core/src/scheduler.rs

+ // Ensure that if we have a different build, we immediately submit an "upgrade" ping
+ // and schedule a "reschedule" ping for tomorrow.
+ #[test]
+ fn test_different_app_builds() {


In Rust one usually doesn't use the test_ prefix for these functions.
Otherwise all functions in the output just start the same, making it harder to parse at a glance.

badboy · 2021-05-03T11:39:42Z

glean-core/src/scheduler.rs

+ *cancelled_lock.lock().unwrap() = true; // Cancel the scheduler thread.
+ condvar.notify_all(); // Notify any/all listening schedulers to check whether they were cancelled.


Shouldn't this just call cancel()? It's what the outside will call and does the same inside, so we don't need to repeat it over and over again in tests.

Suggested change

- *cancelled_lock.lock().unwrap() = true; // Cancel the scheduler thread.
- condvar.notify_all(); // Notify any/all listening schedulers to check whether they were cancelled.
+ super::cancel();

badboy · 2021-05-03T11:40:56Z

glean-core/src/scheduler.rs

+ *cancelled_lock.lock().unwrap() = true; // Cancel the scheduler thread.
+ condvar.notify_all(); // Notify any/all listening schedulers to check whether they were cancelled.


As above. Gonna stop calling it out.

chutten · 2021-05-03T12:53:32Z

glean-core/src/lib.rs

@@ -233,6 +248,9 @@ impl Glean {
 max_events: cfg.max_events.unwrap_or(DEFAULT_MAX_EVENTS),
 is_first_run: false,
 debug: DebugOptions::new(),
+ // Subprocess doesn't use "metrics" pings so has no need for a scheduler.


Hm, it might, though. There shouldn't be a subprocess scheduler.

chutten · 2021-05-03T12:58:53Z

glean-core/src/scheduler.rs

+pub fn cancel() {
+ let (cancelled_lock, condvar) = &**TASK_CONDVAR; // One `*` for Lazy, the second for Arc
+ *cancelled_lock.lock().unwrap() = true; // Cancel the scheduler thread.
+ condvar.notify_all(); // Notify any/all listening schedulers to check whether they were cancelled.


In other news, I fully expect to have to store the task scheduler's joinhandle somewhere and join on it here when we start getting intermittent at-shutdown problems : |
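
If it comes to that, storing the handle and joining on cancel could look like this sketch. `SchedulerTask` and its methods are hypothetical, and the thread body is reduced to the cancellable wait.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread::JoinHandle;
use std::time::Duration;

/// Hypothetical owner of the scheduler thread and its cancellation signal.
struct SchedulerTask {
    signal: Arc<(Mutex<bool>, Condvar)>,
    handle: Option<JoinHandle<()>>,
}

impl SchedulerTask {
    fn start(wait: Duration) -> Self {
        let signal = Arc::new((Mutex::new(false), Condvar::new()));
        let thread_signal = Arc::clone(&signal);
        let handle = std::thread::spawn(move || {
            let (lock, condvar) = &*thread_signal;
            let guard = lock.lock().unwrap();
            // Sleep until the timeout elapses or `cancel_and_join` sets the flag.
            let _ = condvar.wait_timeout_while(guard, wait, |cancelled| !*cancelled);
        });
        SchedulerTask { signal, handle: Some(handle) }
    }

    /// Cancels the task, then blocks until the thread has actually exited.
    fn cancel_and_join(&mut self) {
        let (lock, condvar) = &*self.signal;
        *lock.lock().unwrap() = true;
        condvar.notify_all();
        if let Some(handle) = self.handle.take() {
            handle.join().expect("scheduler thread panicked");
        }
    }
}
```

Joining trades a possibly blocking `cancel` for the guarantee that no scheduler thread outlives shutdown.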

badboy

This looks good now! From the chat I take it you took this for a test drive. Did this require any last changes, or was that purely FOG work now?

Mostly a translation from other LBs' MPSes, but with a few changes. Most notably a split of scheduling from ping submission that makes for easier-to-proxy operations that are easier to test. Also notable is the use of a condvar for cancellable tasks.

Co-authored-by: Jan-Erik Rediger <[email protected]>

badboy · 2021-05-05T15:06:33Z

glean-core/rlb/src/lib.rs

 if !old_enabled && enabled {
+ glean.start_metrics_ping_scheduler();


huh, interestingly we don't do this on Kotlin or Swift.

It's correct though, when re-enabled we should also let the MPS do its work again

chutten requested a review from badboy April 28, 2021 20:53

chutten force-pushed the bug1672951-CoreMPS branch from c5f6156 to 5fa02a3 Compare April 28, 2021 20:55

badboy reviewed Apr 29, 2021

chutten force-pushed the bug1672951-CoreMPS branch from 5fa02a3 to 5dc05d1 Compare April 30, 2021 20:46

chutten requested a review from badboy April 30, 2021 20:47

chutten force-pushed the bug1672951-CoreMPS branch 3 times, most recently from 8377191 to c189ff9 Compare April 30, 2021 21:06

badboy requested changes May 3, 2021

chutten force-pushed the bug1672951-CoreMPS branch from c189ff9 to 4531fac Compare May 3, 2021 17:00

chutten commented May 3, 2021

chutten requested a review from badboy May 3, 2021 17:15

badboy reviewed May 5, 2021

chutten added 2 commits May 5, 2021 10:02

add new Configuration parameter, fix RLB ping tests

08ffed7

chutten force-pushed the bug1672951-CoreMPS branch from 4531fac to 08ffed7 Compare May 5, 2021 14:03

add unit to comment

26d9eb3

Co-authored-by: Jan-Erik Rediger <[email protected]>

badboy approved these changes May 5, 2021

chutten merged commit 1a8ad80 into mozilla:main May 5, 2021

chutten deleted the bug1672951-CoreMPS branch May 5, 2021 15:19
