This repository has been archived by the owner on Feb 20, 2023. It is now read-only.

Log Serializer Task Metrics Granularity #1569

Open
jkosh44 opened this issue Apr 27, 2021 · 0 comments
Labels
question/discussion Further information is requested.

Comments

jkosh44 commented Apr 27, 2021

Discussion

The following code is responsible for tracking metrics in the `LogSerializerTask`:

```cpp
uint64_t num_bytes = 0, num_records = 0, num_txns = 0;
// Initialize whether to collect metrics outside of the spin loop so as not to count each loop iteration as a
// sample (calling ComponentToRecord increments the sample count).
bool logging_metrics_enabled =
    common::thread_context.metrics_store_ != nullptr &&
    common::thread_context.metrics_store_->ComponentToRecord(metrics::MetricsComponent::LOGGING);
do {
  if (logging_metrics_enabled && !common::thread_context.resource_tracker_.IsRunning()) {
    // Start the operating unit resource tracker.
    common::thread_context.resource_tracker_.Start();
  }
  // Serializing is now on the "critical txn path" because txns wait to commit until their logs are serialized.
  // Thus, a sleep is not fast enough. We perform exponential back-off, doubling the sleep duration if we don't
  // process any buffers in our call to Process. Calls to Process will process as long as new buffers are
  // available. We only sleep as part of this exponential back-off when there are no logs to be processed, and we
  // wake up when there are new logs to be processed.
  if (empty_) {
    std::unique_lock<std::mutex> guard(flush_queue_latch_);
    sleeping_ = true;
    flush_queue_cv_.wait_for(guard, curr_sleep);
    sleeping_ = false;
  }
  // If Process did not find any new buffers, we perform exponential back-off to reduce our rate of polling for
  // new buffers. We cap the maximum back-off, since in the case of large gaps with no txns we don't want to
  // sleep unboundedly.
  std::tie(num_bytes, num_records, num_txns) = Process();
  curr_sleep = std::min(num_records > 0 ? serialization_interval_ : curr_sleep * 2, max_sleep);
  if (num_records > 0) {
    if (common::thread_context.resource_tracker_.IsRunning()) {
      // Stop the resource tracker for this operating unit.
      common::thread_context.resource_tracker_.Stop();
      auto &resource_metrics = common::thread_context.resource_tracker_.GetMetrics();
      common::thread_context.metrics_store_->RecordSerializerData(num_bytes, num_records, num_txns,
                                                                  serialization_interval_.count(), resource_metrics);
    }
    num_bytes = num_records = num_txns = 0;
    // Update whether to collect metrics only if we did work (starting a new event) so as not to count each loop
    // iteration as a sample (calling ComponentToRecord increments the sample count).
    logging_metrics_enabled =
        common::thread_context.metrics_store_ != nullptr &&
        common::thread_context.metrics_store_->ComponentToRecord(metrics::MetricsComponent::LOGGING);
  }
} while (run_task_);
```

Process() has its own loop, which continues as long as there are more log records to serialize. That means Process() doesn't return until there's a break in the log records. The two consequences of this are:

  1. If there's a steady stream of logs, you can end up with a very long interval that contains a lot of log records.
  2. While there's a steady stream of logs, you won't get any metrics until there's a break in the logs.

So if you're running a workload with a lot of small transactions, you may end up with a single interval that contains the metrics of the entire workload.

As an example, here are some metrics from running 3000 transactions, where each transaction is a single insert:
logs.csv
The last interval lasts roughly 17 minutes.

I'm not actually familiar enough with the metrics and their usages to know whether this is an issue or acceptable behavior.

Solutions

If this is an issue, then below are two potential solutions, though feel free to suggest others.

  1. There was some brief discussion earlier about removing the while(true) loop from Process(). It seems a bit unnecessary that both Process() and LogSerializerTaskLoop() loop continuously, though it's possible there's a good reason for the internal loop and I haven't looked closely enough.
  2. We can push all of the metrics code into Process() itself, so that each iteration of the loop in Process() becomes a single collection interval.
@jkosh44 jkosh44 added the question/discussion Further information is requested. label Apr 27, 2021