Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault when MetricProvider tries to log to already destructed global log handler at exit #1718

Closed
johanpel opened this issue Oct 28, 2022 · 8 comments · Fixed by #1767
Closed
Labels
bug Something isn't working

Comments

@johanpel
Copy link
Contributor

When using the SDK MeterProvider::Shutdown explicitly before std::exit without having any log messages before setting the global MeterProvider, the global log handler seems to get destructed before the MeterProvider during cleanup of static storage duration variables.

The below program is a minimal reproducible example, which you may need to run a few times (although in GDB I seem to always hit the segfault).

If I put breakpoints on the destructor of the log handler, I see it getting destructed before the MeterProvider.

Not sure what a good fix would be, but this occurs if Shutdown is called twice. If users are not intended to call the SDK MeterProvider::Shutdown function explicitly, perhaps it should not be public.

Another option may be to make the logger instance non-local to GlobalLogHandler::GetHandlerAndLevel such that it is always initialized before users call SetMeterProvider. I don't know what other static storage duration things may log during destruction, but if more instances of this problem exist, that would also fix those.

To reproduce:

#include <chrono>
#include <memory>
#include <thread>
#include <vector>

#include "opentelemetry/exporters/otlp/otlp_grpc_metric_exporter_factory.h"
#include "opentelemetry/metrics/provider.h"
#include "opentelemetry/nostd/shared_ptr.h"
#include "opentelemetry/sdk/common/global_log_handler.h"
#include "opentelemetry/sdk/metrics/export/periodic_exporting_metric_reader.h"
#include "opentelemetry/sdk/metrics/meter_provider.h"

namespace metric_sdk    = opentelemetry::sdk::metrics;
namespace nostd         = opentelemetry::nostd;
namespace common        = opentelemetry::common;
namespace metrics_api   = opentelemetry::metrics;
namespace otlp_exporter = opentelemetry::exporter::otlp;

namespace
{
class MeasurementFetcher
{
public:
  static void Fetcher(opentelemetry::metrics::ObserverResult observer_result, void * /* state */)
  {
    if (nostd::holds_alternative<nostd::shared_ptr<opentelemetry::metrics::ObserverResultT<long>>>(
            observer_result))
    {
      nostd::get<nostd::shared_ptr<opentelemetry::metrics::ObserverResultT<long>>>(observer_result)
          ->Observe(0);
    }
  }
};
}  // namespace

int main(int argc, char *argv[])
{
  // Initialize exporter.
  otlp_exporter::OtlpGrpcMetricExporterOptions exporter_options;
  auto exporter = otlp_exporter::OtlpGrpcMetricExporterFactory::Create(exporter_options);

  // Initialize reader.
  metric_sdk::PeriodicExportingMetricReaderOptions reader_options;
  reader_options.export_interval_millis = std::chrono::milliseconds(200);
  reader_options.export_timeout_millis  = std::chrono::milliseconds(100);
  std::unique_ptr<metric_sdk::MetricReader> reader{
      new metric_sdk::PeriodicExportingMetricReader(std::move(exporter), reader_options)};

  // Initialize provider.
  auto provider = std::make_shared<metric_sdk::MeterProvider>();
  provider->AddMetricReader(std::move(reader));
  // OTEL_INTERNAL_LOG_DEBUG("Logging before setting the provider prevents segfault.");
  metrics_api::Provider::SetMeterProvider(provider);

  // Create a meter.
  auto meter = provider->GetMeter("my_meter");

  // Create an instrument.
  auto instrument = meter->CreateInt64ObservableGauge("my_instrument");
  instrument->AddCallback(MeasurementFetcher::Fetcher, nullptr);

  provider->Shutdown();

  // Cleaning up the provider explicitly also prevents the segfault.
  // metrics_api::Provider::SetMeterProvider(nostd::shared_ptr<metrics_api::MeterProvider>(
  //    new opentelemetry::metrics::NoopMeterProvider()));
}
@johanpel johanpel added the bug Something isn't working label Oct 28, 2022
@johanpel
Copy link
Contributor Author

Here is a backtrace:

opentelemetry::v1::sdk::metrics::MeterContext::Shutdown meter_context.cc:86
opentelemetry::v1::sdk::metrics::MeterProvider::~MeterProvider meter_provider.cc:99
std::destroy_at<opentelemetry::v1::sdk::metrics::MeterProvider> stl_construct.h:88
std::allocator_traits<std::allocator<opentelemetry::v1::sdk::metrics::MeterProvider> >::destroy<opentelemetry::v1::sdk::metrics::MeterProvider> alloc_traits.h:537
std::_Sp_counted_ptr_inplace<opentelemetry::v1::sdk::metrics::MeterProvider, std::allocator<opentelemetry::v1::sdk::metrics::MeterProvider>, (__gnu_cxx::_Lock_policy)2>::_M_dispose shared_ptr_base.h:528
std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release shared_ptr_base.h:168
std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count shared_ptr_base.h:705
std::__shared_ptr<opentelemetry::v1::metrics::MeterProvider, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr shared_ptr_base.h:1154
std::shared_ptr<opentelemetry::v1::metrics::MeterProvider>::~shared_ptr shared_ptr.h:122
__run_exit_handlers 0x00007ffff77b4495
__GI_exit 0x00007ffff77b4610
__libc_start_call_main 0x00007ffff7798d97
__libc_start_main_impl 0x00007ffff7798e40
_start 0x000055555590ef75

@lalitb lalitb added this to the Metrics post GA release milestone Oct 28, 2022
@lalitb
Copy link
Member

lalitb commented Oct 28, 2022

Thanks for the details. I think it would be safe to silently drop the warning when the MeterProvider::Shutdown() is called twice ( the second one through it's destructor), or atleast don't rely on the global log handler to log the error -

else
{
OTEL_INTERNAL_LOG_WARN("[MeterContext::Shutdown] Shutdown can be invoked only once.");
}

@owent
Copy link
Member

owent commented Oct 31, 2022

Could you debug the codes in meter_context.cc:86 to find why GlobalLogHandler::GetLogHandler() returns a valid pointer when crash?
It should get a null LogHandler and ignore the internal log when GlobalLogHandler is destroyed before other components.

@lalitb
Copy link
Member

lalitb commented Oct 31, 2022

Could you debug the codes in meter_context.cc:86 to find why GlobalLogHandler::GetLogHandler() returns a valid pointer when crash?

GlobalLogHandler::GetLogHandler() doesn't return a valid pointer once the static default logger is destroyed. It returns a not-null pointer (pointing to the already destroyed static log handler) which is no more valid. And so when the DefaultLogHandler::Shutdown() is invoked on this pointer, it crashes. I don't think the problem is specific to Metrics, the same problem can come for Tracer too, but we don't use global log handler in tracer-provider/tracer-context/processor(s) during shutdown.

@owent
Copy link
Member

owent commented Nov 1, 2022

Could you debug the codes in meter_context.cc:86 to find why GlobalLogHandler::GetLogHandler() returns a valid pointer when crash?

GlobalLogHandler::GetLogHandler() doesn't return a valid pointer once the static default logger is destroyed. It returns a not-null pointer (pointing to the already destroyed static log handler) which is no more valid. And so when the DefaultLogHandler::Shutdown() is invoked on this pointer, it crashes. I don't think the problem is specific to Metrics, the same problem can come for Tracer too, but we don't use global log handler in tracer-provider/tracer-context/processor(s) during shutdown.

Maybe we can add a wrapper and set the guard flag to false when the global handle is detroyed? Just like singlethon implantation in boost? The object of static and global variable is not valid but the address is still reachable when it's destroyed.

@lalitb
Copy link
Member

lalitb commented Nov 4, 2022

Maybe we can add a wrapper and set the guard flag to false when the global handle is detroyed? Just like singlethon implantation in boost? The object of static and global variable is not valid but the address is still reachable when it's destroyed.

Correct me, but that would mean there would be an atomic check to this guard flag everytime the log-macros are called.

@owent
Copy link
Member

owent commented Nov 15, 2022

Maybe we can add a wrapper and set the guard flag to false when the global handle is detroyed? Just like singlethon implantation in boost? The object of static and global variable is not valid but the address is still reachable when it's destroyed.

Correct me, but that would mean there would be an atomic check to this guard flag everytime the log-macros are called.

I think we can just use a normal variable, but not a atomic. We just need to check if the memory can be access. But it may be warned by tsan.

@lalitb
Copy link
Member

lalitb commented Nov 15, 2022

I think we can just use a normal variable, but not a atomic. We just need to check if the memory can be access. But it may be warned by tsan.

Thanks @owent , I just used another approach in #1767 to control the order of construction and destruction. Though it is merged, let me know if you see any issues with that approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants