clean up logging around on-demand downloads #4030
Approving even with 2 => 3 expansion, but I wish that was still considered. Need to upgrade log analysis dashboards.
Test results for 50d8430:
- debug build: 219 tests run: 209 passed, 0 failed, 10 (full report)
- release build: 219 tests run: 209 passed, 0 failed, 10 (full report)
3221e35 to f03a2d1
@arssher adding you / safekeepers to the review because the logging code is shared. The perf overhead of the
- Remove repeated tenant & timeline from span
- Demote logging of the path to debug level
- Log completion at info level, in the same function where we log errors
- Distinguish between layer file download success & on-demand download succeeding as a whole in the log message wording
- Assert that the span contains a tenant id and a timeline id

The assert uncovered that walreceiver_connection uses TenantTimelineId in the span. I changed that to tenant_id and timeline_id.

fixes #3945

Before:

```
INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{tenant_id=$TENANT_ID timeline_id=$TIMELINE_ID layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: download complete: /storage/pageserver/data/tenants/$TENANT_ID/timelines/$TIMELINE_ID/000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91
INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{tenant_id=$TENANT_ID timeline_id=$TIMELINE_ID layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: Rebuilt layer map. Did 9 insertions to process a batch of 1 updates.
```

After:

```
INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: layer file download finished
INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: Rebuilt layer map. Did 9 insertions to process a batch of 1 updates.
INFO compaction_loop{tenant_id=$TENANT_ID}:compact_timeline{timeline=$TIMELINE_ID}:download_remote_layer{layer=000000000000000000000000000000000000-FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF__00000000020C8A71-00000000020CAF91}: on-demand download successful
```
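To illustrate the first and last bullets of the description, here is a minimal std-only sketch of what "the span contains a tenant id and a timeline id" and "repeated tenant & timeline" can mean for a stack of nested spans. It is not the code from this PR and does not use the `tracing` crate: `SpanModel`, `missing_fields`, `repeated_fields`, and `debug_assert_has_ids` are hypothetical names invented for the example, and a span is modelled simply as a name plus a list of field names.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for one span in the current span stack.
/// This is NOT the `tracing` crate's API, only a std-only model.
pub struct SpanModel {
    pub name: &'static str,
    pub fields: Vec<&'static str>,
}

/// Required field names that no span in the stack carries.
pub fn missing_fields(stack: &[SpanModel], required: &[&'static str]) -> Vec<&'static str> {
    let present: HashSet<&'static str> =
        stack.iter().flat_map(|s| s.fields.iter().copied()).collect();
    required
        .iter()
        .copied()
        .filter(|f| !present.contains(f))
        .collect()
}

/// Field names recorded more than once across the stack,
/// listed in the order in which the repetition is first seen.
pub fn repeated_fields(stack: &[SpanModel]) -> Vec<&'static str> {
    let mut seen = HashSet::new();
    let mut repeated = Vec::new();
    for field in stack.iter().flat_map(|s| s.fields.iter().copied()) {
        // `insert` returns false when the field was already recorded.
        if !seen.insert(field) && !repeated.contains(&field) {
            repeated.push(field);
        }
    }
    repeated
}

/// Debug-only check in the spirit of the assertion described above:
/// some span in the stack must carry `tenant_id` and `timeline_id`.
pub fn debug_assert_has_ids(stack: &[SpanModel]) {
    let missing = missing_fields(stack, &["tenant_id", "timeline_id"]);
    debug_assert!(missing.is_empty(), "span stack lacks fields: {:?}", missing);
}
```

In this model, an inner span that records `tenant_id` again after an outer span already did shows up in `repeated_fields`, while a stack that records the ids under some other field name shows up in `missing_fields`.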
f03a2d1 to 6d5a7fc
Rebased to get the fix for the failed test
Is the only reason for Also seems like
Yes.
I don't understand this sentence.
True, since But, as I said,
So, I read it's not Ok for you (SKs) to have it enabled always?
I should have made it a separate paragraph, not related to So I think it is ok, I just ensured we can easily turn it off if needed.
Ok, it can't be turned off currently, so, I take it as an action item that it should be configurable and off by default?
I doubt this is important currently, my message that we try not to log a lot in perf sensitive places was supposed to be an argument why this shouldn't be important. I think we can return here only if we spot any problems. But if the whole deal is a matter of adding single
Ah, I missed that part. Then leaving as is is definitely ok.
@arssher now I already put in the work :D That takes out almost all the performance risk with this PR.
cool!
PR `build: run clippy for powerset of features (#4077)` brought us a `clippy --release` pass. It was merged after #4030, which fails under `clippy --release` with

```
error: static `TENANT_ID_EXTRACTOR` is never used
    --> pageserver/src/tenant/timeline.rs:4270:16
     |
4270 | pub static TENANT_ID_EXTRACTOR: once_cell::sync::Lazy<
     |            ^^^^^^^^^^^^^^^^^^^
     |
     = note: `-D dead-code` implied by `-D warnings`

error: static `TIMELINE_ID_EXTRACTOR` is never used
    --> pageserver/src/tenant/timeline.rs:4276:16
     |
4276 | pub static TIMELINE_ID_EXTRACTOR: once_cell::sync::Lazy<
     |            ^^^^^^^^^^^^^^^^^^^^^
```

A merge queue would have prevented this.
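For readers unfamiliar with this class of failure, the sketch below shows one common way a static ends up "never used" only in release builds, and one way to avoid it. It assumes the static is referenced solely from code compiled under `debug_assertions`; the thread does not show the call sites, so that is an assumption, and this is not necessarily how the failure was fixed in the repository. `REQUIRED_SPAN_FIELDS` and `check_span_fields` are hypothetical names.

```rust
/// Field names the debug-only check looks for. The `cfg` gate matches the
/// gate on the only function that reads it, so a release build never sees
/// an unused static.
#[cfg(debug_assertions)]
static REQUIRED_SPAN_FIELDS: [&str; 2] = ["tenant_id", "timeline_id"];

/// Debug builds: panic if a required field is absent.
#[cfg(debug_assertions)]
pub fn check_span_fields(present: &[&str]) {
    for required in REQUIRED_SPAN_FIELDS.iter() {
        assert!(
            present.contains(required),
            "missing span field: {}",
            required
        );
    }
}

/// Release builds: the check compiles to nothing, so callers do not need
/// their own `cfg` attributes.
#[cfg(not(debug_assertions))]
pub fn check_span_fields(_present: &[&str]) {}
```

Removing the `#[cfg(debug_assertions)]` attribute from the static while keeping it on the function would reproduce the situation above: with `-D warnings`, the release build would reject the unused static.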