
fix: Provide all metric labels in OpenTelemetry & StatsD #2183

Merged: tomkerkhove merged 24 commits into master from otel-labels on Jul 17, 2023

Conversation

@tomkerkhove (Owner) commented Nov 29, 2022

  • Add unit tests for label scenarios (moved to backlog)

Fixes #2180
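For context, here is a minimal sketch of the behaviour this PR is after, assuming the sink records scraped values through System.Diagnostics.Metrics: every configured label and dimension travels with the measurement as a tag. The class, meter, and method names are illustrative only and are not Promitor's actual sink types.

```csharp
// Minimal sketch (not Promitor's actual sink code) of recording a scraped value with
// all of its labels attached as OpenTelemetry tags, instead of reporting the bare value.
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

public static class OpenTelemetrySinkSketch
{
    private static readonly Meter SketchMeter = new("Promitor.Scraper.Sketch");
    private static readonly ConcurrentDictionary<string, Counter<double>> Instruments = new();

    public static void ReportMetric(string metricName, double value, IReadOnlyDictionary<string, string> labels)
    {
        // A counter is used purely for illustration; a gauge-style instrument may fit better.
        var instrument = Instruments.GetOrAdd(metricName, name => SketchMeter.CreateCounter<double>(name));

        // Attach every label (resource group, dimensions such as entity_name, custom labels, ...)
        // as a tag so none of them are dropped on the OpenTelemetry side.
        var tags = new TagList();
        foreach (var label in labels)
        {
            tags.Add(label.Key, label.Value);
        }

        instrument.Add(value, tags);
    }
}
```

With something along these lines, a value scraped for a single queue would surface on the OpenTelemetry side with its entity_name tag rather than only as an aggregate series.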

@trafico-bot bot added the '🔍 Ready for Review' (Pull Request is not reviewed yet) label on Nov 29, 2022
@amirschw (Contributor)

Looking forward to seeing this fix go in; is there anything I can do to help push it? @tomkerkhove

@tomkerkhove (Owner, Author)

Ask me to stop slacking? 🤔 😁

@tomkerkhove (Owner, Author) commented Jul 14, 2023

@SagiVaknin Did you manage to find more on this?

Looks like the tests are still failing, for example:

[xUnit.net 00:34:09.97] Promitor.Tests.Integration.Services.Scraper.MetricSinks.OpenTelemetryMetricSinkTests.OpenTelemetry_Scrape_ExpectedScrapedMetricIsAvailable(expectedMetricName: "promitor_demo_servicebus_messagecount_discovered") [FAIL]
Failed Promitor.Tests.Integration.Services.Scraper.MetricSinks.OpenTelemetryMetricSinkTests.OpenTelemetry_Scrape_ExpectedScrapedMetricIsAvailable(expectedMetricName: "promitor_demo_servicebus_messagecount_discovered") [34 m 6 s]
Error Message:
Assert.NotNull() Failure
Stack Trace:
at Promitor.Tests.Integration.Services.Scraper.MetricSinks.OpenTelemetryMetricSinkTests.AssertExpectedMetricIsAvailable(String expectedMetricName) in /home/vsts/work/1/s/src/Promitor.Tests.Integration/Services/Scraper/MetricSinks/OpenTelemetryMetricSinkTests.cs:line 128
at Promitor.Tests.Integration.Services.Scraper.MetricSinks.OpenTelemetryMetricSinkTests.OpenTelemetry_Scrape_ExpectedScrapedMetricIsAvailable(String expectedMetricName) in /home/vsts/work/1/s/src/Promitor.Tests.Integration/Services/Scraper/MetricSinks/OpenTelemetryMetricSinkTests.cs:line 40
--- End of stack trace from previous location ---
Standard Output Messages:
2023-07-05T07:19:39 Information > Creating Prometheus client for http://localhost:8889/metrics with metric namespace otel
2023-07-05T07:19:39 Information > Base URL for Prometheus interaction is 'http://localhost:8889'/ to scrape on '/metrics' with metric namespace 'otel'
2023-07-05T07:19:39 Information > Starting to look for metric with name 'otel_promitor_demo_servicebus_messagecount_discovered'.

When looking at the OTEL metrics (thanks for adding this!) I see that the above metric was reported as successful, but there is no value.

So either there are no values and there is no information, or we are misreporting success/failure.

Based on scraper logs, I'm inclined to think the latter:

[07:18:29 WRN] {"DependencyId": null, "DependencyType": "Http", "DependencyName": "GET /api/v2/resources/groups/synapse-apache-spark-pools/discover", "DependencyData": null, "TargetName": "promitor.agents.resourcediscovery", "ResultCode": 500, "IsSuccessful": false, "StartTime": "2023-07-05T07:18:28.9919147 +00:00", "Duration": "00:00:00.1232588", "Context": {"TelemetryType": "Dependency"}, "$type": "DependencyLogEntry"}
[07:18:29 ERR] Failed to discover resources for group synapse-apache-spark-pools.

However, I also see logs showing that it was reported with a value of 0:

[07:19:07 INF] Scraping promitor_demo_servicebus_messagecount_discovered for resource type ServiceBusNamespace.

[07:19:08 INF] Found value 0 for metric promitor_demo_servicebus_messagecount_discovered with dimension queue-8 as part of entity_name dimension with aggregation interval 00:05:00
[07:19:08 INF] Found value 0 for metric promitor_demo_servicebus_messagecount_discovered with dimension queue-10 as part of entity_name dimension with aggregation interval 00:05:00
[07:19:08 INF] Found value 0 for metric promitor_demo_servicebus_messagecount_discovered with dimension queue-13 as part of entity_name dimension with aggregation interval 00:05:00
[07:19:08 INF] Found value 0 for metric promitor_demo_servicebus_messagecount_discovered with dimension queue-3 as part of entity_name dimension with aggregation interval 00:05:00
[07:19:08 INF] Found value 0 for metric promitor_demo_servicebus_messagecount_discovered with dimension queue-5 as part of entity_name dimension with aggregation interval 00:05:00
[07:19:08 INF] Found value 0 for metric promitor_demo_servicebus_messagecount_discovered with dimension queue-6 as part of entity_name dimension with aggregation interval 00:05:00
[07:19:08 INF] Found value 0 for metric promitor_demo_servicebus_messagecount_discovered with dimension queue-9 as part of entity_name dimension with aggregation interval 00:05:00
[07:19:08 INF] Found value 0 for metric promitor_demo_servicebus_messagecount_discovered with dimension queue-14 as part of entity_name dimension with aggregation interval 00:05:00
[07:19:08 INF] Found value 0 for metric promitor_demo_servicebus_messagecount_discovered with dimension queue-4 as part of entity_name dimension with aggregation interval 00:05:00
[07:19:08 INF] Found value 0 for metric promitor_demo_servicebus_messagecount_discovered with dimension queue-15 as part of entity_name dimension with aggregation interval 00:05:00

Likely the metric is ignored because the value is 0?
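A hypothetical illustration of the kind of guard that would produce exactly this symptom; this is not Promitor's actual code, just a sketch of the suspicion above.

```csharp
// Hypothetical sketch (not Promitor's actual code): if a sink guards on "no meaningful
// value" like this, zero-valued and null measurements never reach the exporter, which
// would explain a metric that logs "Found value 0" in the scraper but is absent from
// the /metrics endpoint.
public static class ZeroValueGuardSketch
{
    public static bool ShouldReport(double? measuredValue)
    {
        // Treating 0 the same way as "no value" silently drops valid data points.
        return measuredValue is not null && measuredValue != 0d;
    }
}
```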

@tomkerkhove (Owner, Author) commented Jul 14, 2023

OK, another problem: the test looks for promitor_demo_servicebus_messagecount_discovered instead of otel_promitor_demo_servicebus_messagecount_discovered.

@amirschw (Contributor) commented Jul 14, 2023

From what I can tell, there are three metrics in 'Show Prometheus metrics for Scraper agent' that are unavailable in 'Show Prometheus metrics for OpenTelemetry agent':

  • promitor_demo_appplan_autoscale_instances_current_discovered
  • promitor_demo_servicebus_messagecount_discovered
  • promitor_demo_servicebus_messagecount_limited

Regarding promitor_demo_appplan_autoscale_instances_current_discovered, it's probably missing because its value is always null. As for the other two, could it be somehow related to the entity_name dimension that is added to them?

@amirschw (Contributor) commented Jul 15, 2023

I took another look at the two failing Service Bus namespace metrics and found the real issue with them. We're adding the variable label app: promitor to both metrics, which is a duplicate of a constant label with the same name (and value) that is defined in the OpenTelemetry Prometheus exporter. Ref:

@tomkerkhove (Owner, Author)

> I took another look at the two failing Service Bus namespace metrics and found the real issue with them. We're adding the variable label app: promitor to both metrics, which is a duplicate of a constant label with the same name (and value) that is defined in the OpenTelemetry Prometheus exporter.

Updated the OTEL configuration; let's see what it does.
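The change referred to here was to the OpenTelemetry Collector configuration (removing the constant label that collided with the variable app label). Purely as an illustration, an application-side alternative would be to strip any tag whose key the exporter already owns as a constant label; the names below are hypothetical, not Promitor's actual code.

```csharp
// Hypothetical sketch: drop outgoing tags whose keys collide with constant labels that
// the exporter adds on its own, so no duplicate label pairs (e.g. app: promitor) are emitted.
using System.Collections.Generic;
using System.Linq;

public static class LabelConflictSketch
{
    // Label keys assumed to be emitted as constant labels by the exporter (illustrative).
    private static readonly HashSet<string> ReservedLabelKeys = new() { "app" };

    public static IDictionary<string, string> StripReservedLabels(IReadOnlyDictionary<string, string> labels)
        => labels.Where(label => !ReservedLabelKeys.Contains(label.Key))
                 .ToDictionary(label => label.Key, label => label.Value);
}
```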

> OK, another problem: the test looks for promitor_demo_servicebus_messagecount_discovered instead of otel_promitor_demo_servicebus_messagecount_discovered.

I will tweak this later this weekend.

@SagiVaknin (Contributor) commented Jul 16, 2023

@tomkerkhove I did not have the time to follow up on this since merging the PR for the OpenTelemetry logs.
Fixing the configuration issue really did the trick.

> OK, another problem: the test looks for promitor_demo_servicebus_messagecount_discovered instead of otel_promitor_demo_servicebus_messagecount_discovered.

Isn't this just the metric_name parameter in the test name, with the otel_ prefix being added afterwards?

@tomkerkhove (Owner, Author)

Interesting. I'll check the status of this PR, open follow-up issues, and then merge it.

Thanks, folks!

@tomkerkhove (Owner, Author)

> Isn't this just the metric_name parameter in the test name, with the otel_ prefix being added afterwards?

Correct, but the test is acting as a consumer, so if that prefix is added, then that is what we have to query for.
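An illustrative sketch of this point, not Promitor's actual integration test: the test scrapes the Collector's Prometheus endpoint, where the configured namespace is prepended, so the expected metric name has to carry that prefix. The endpoint and the 'otel' namespace below are taken from the log output earlier in this thread.

```csharp
// Illustrative consumer-side test sketch: query the scraped metrics with the
// namespace-prefixed name, since that is the name an actual consumer would see.
using System.Net.Http;
using System.Threading.Tasks;
using Xunit;

public class OpenTelemetryScrapeSketchTests
{
    private const string MetricNamespace = "otel";
    private const string ScrapeUri = "http://localhost:8889/metrics";

    [Theory]
    [InlineData("promitor_demo_servicebus_messagecount_discovered")]
    public async Task OpenTelemetry_Scrape_ExpectedScrapedMetricIsAvailable(string expectedMetricName)
    {
        using var httpClient = new HttpClient();
        var scrapedMetrics = await httpClient.GetStringAsync(ScrapeUri);

        // The consumer-facing name carries the namespace prefix, so that is what we query for.
        var prefixedMetricName = $"{MetricNamespace}_{expectedMetricName}";
        Assert.Contains(prefixedMetricName, scrapedMetrics);
    }
}
```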

@tomkerkhove merged commit 2ae09eb into master on Jul 17, 2023
25 checks passed
@tomkerkhove deleted the otel-labels branch on July 17, 2023 08:04
@trafico-bot bot added the '✨ Merged' (Pull Request has been merged successfully) label and removed the '🔍 Ready for Review' label on Jul 17, 2023
Labels: ✨ Merged (Pull Request has been merged successfully)
Projects: None yet
Development: Successfully merging this pull request may close "Metric labels for dimensions are missing in OpenTelemetry sink" (#2180)
3 participants