Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support metrics with dots in their names #347

Open
SylvainJuge opened this issue Sep 29, 2020 · 37 comments
Open

Support metrics with dots in their names #347

SylvainJuge opened this issue Sep 29, 2020 · 37 comments

Comments

@SylvainJuge
Copy link
Member

SylvainJuge commented Sep 29, 2020

Describe the bug

As of version 1.18.0, the java agent added instrumentation of micrometer framework, which allows to capture metrics managed with this framework. While the reported issue is specific to HikariCP connection pool, it might happen with other frameworks, or could be user-provided.

Initial investigation of the issue shows that some metric names are problematic and conflict with metrics storage.
This issue has been opened in the apm repository as the actual change required might spread over multiple repositories and the solution to fix it properly isn't defined yet.

This was reported in our public forum here : https://discuss.elastic.co/t/failed-to-transform-sample-model-sample/249962

Currently, you can't have two metrics where one is the prefix of another with . as separator, connections=3 and connections.idle=2 will conflict and at least one of them will be ignored as one expects connections to be a metric value, and the other expects it to be an object with an idle property.

To Reproduce

Follow instructions described in the forum to reproduce it with the Java agent.

Any agent that will sent two metrics with names that conflict with each other will have a similar issue.

Expected behavior

Metric names should not have any impact on our ability to capture them.

Debug logs

Server debug logs (provided in the forum)
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.idle", Value:2, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.creation.count", Value:600, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.creation.sum.us", Value:7.0216e+07, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.timeout", Value:6, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.pending", Value:0, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.creation.sum.us", Value:5.7659e+07, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.active", Value:0, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.idle", Value:2, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.creation.count", Value:511, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.usage.count", Value:128944, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.usage.sum.us", Value:1.983578e+09, Values:[]float64(nil), Counts:[]int64(nil)}
Sep 25 10:23:09 apm-server[31357]: WARN [transform] model/metricset.go:141 failed to transform sample model.Sample{Name:"hikaricp.connections.max", Value:50, Values:[]float64(nil), Counts:[]int64(nil)}
Agent logs with log_level=trace

Relevant part where the issue is:

"hikaricp.connections":{"value":3.0},"hikaricp.connections.timeout":{"value":0.0}

Full log

Sep 28 16:34:03 env[3405743]: {"metricset":{"timestamp":1601310843073000,"tags":{"pool":"HikariPool-2"},"samples":{"hikaricp.connections.max":{"value":50.0},"hikaricp.connections.usage.count":{"value":46},"hikaricp.connections.usage.sum.us":{"value":409000.0},"hikaricp.connections":{"value":3.0},"hikaricp.connections.active":{"value":0.0},"hikaricp.connections.min":{"value":2.0},"hikaricp.connections.acquire.count":{"value":46},"hikaricp.connections.acquire.sum.us":{"value":159244.109},"hikaricp.connections.timeout":{"value":0.0},"hikaricp.connections.pending":{"value":0.0},"hikaricp.connections.idle":{"value":3.0},"hikaricp.connections.creation.count":{"value":0},"hikaricp.connections.creation.sum.us":{"value":0.0}}}}
Sep 28 16:34:03 env[3405743]: {"metricset":{"timestamp":1601310843073000,"tags":{"pool":"HikariPool-1"},"samples":{"hikaricp.connections.max":{"value":50.0},"hikaricp.connections.usage.count":{"value":50},"hikaricp.connections.usage.sum.us":{"value":889000.0},"hikaricp.connections.acquire.count":{"value":50},"hikaricp.connections.acquire.sum.us":{"value":217912.23},"hikaricp.connections.min":{"value":2.0},"hikaricp.connections":{"value":3.0},"hikaricp.connections.timeout":{"value":0.0},"hikaricp.connections.pending":{"value":0.0},"hikaricp.connections.active":{"value":0.0},"hikaricp.connections.idle":{"value":3.0},"hikaricp.connections.creation.count":{"value":0},"hikaricp.connections.creation.sum.us":{"value":0.0}}}}
Sep 28 16:34:33 env[3405743]: {"metricset":{"timestamp":1601310873073000,"tags":{"pool":"HikariPool-2"},"samples":{"hikaricp.connections.max":{"value":50.0},"hikaricp.connections.usage.count":{"value":60},"hikaricp.connections.usage.sum.us":{"value":416000.0},"hikaricp.connections":{"value":3.0},"hikaricp.connections.active":{"value":0.0},"hikaricp.connections.min":{"value":2.0},"hikaricp.connections.acquire.count":{"value":60},"hikaricp.connections.acquire.sum.us":{"value":172080.99},"hikaricp.connections.timeout":{"value":0.0},"hikaricp.connections.pending":{"value":0.0},"hikaricp.connections.idle":{"value":3.0},"hikaricp.connections.creation.count":{"value":0},"hikaricp.connections.creation.sum.us":{"value":0.0}}}}
Sep 28 16:34:33 env[3405743]: {"metricset":{"timestamp":1601310873073000,"tags":{"pool":"HikariPool-1"},"samples":{"hikaricp.connections.max":{"value":50.0},"hikaricp.connections.usage.count":{"value":64},"hikaricp.connections.usage.sum.us":{"value":896000.0},"hikaricp.connections.acquire.count":{"value":64},"hikaricp.connections.acquire.sum.us":{"value":243931.7},"hikaricp.connections.min":{"value":2.0},"hikaricp.connections":{"value":3.0},"hikaricp.connections.timeout":{"value":0.0},"hikaricp.connections.pending":{"value":0.0},"hikaricp.connections.active":{"value":0.0},"hikaricp.connections.idle":{"value":3.0},"hikaricp.connections.creation.count":{"value":0},"hikaricp.connections.creation.sum.us":{"value":0.0}}}}
Sep 28 16:35:03 env[3405743]: {"metricset":{"timestamp":1601310903073000,"tags":{"pool":"HikariPool-2"},"samples":{"hikaricp.connections.max":{"value":50.0},"hikaricp.connections.usage.count":{"value":74},"hikaricp.connections.usage.sum.us":{"value":423000.0},"hikaricp.connections":{"value":2.0},"hikaricp.connections.active":{"value":0.0},"hikaricp.connections.min":{"value":2.0},"hikaricp.connections.acquire.count":{"value":74},"hikaricp.connections.acquire.sum.us":{"value":184358.326},"hikaricp.connections.timeout":{"value":0.0},"hikaricp.connections.pending":{"value":0.0},"hikaricp.connections.idle":{"value":2.0},"hikaricp.connections.creation.count":{"value":0},"hikaricp.connections.creation.sum.us":{"value":0.0}}}}
Sep 28 16:35:03 env[3405743]: {"metricset":{"timestamp":1601310903073000,"tags":{"pool":"HikariPool-1"},"samples":{"hikaricp.connections.max":{"value":50.0},"hikaricp.connections.usage.count":{"value":78},"hikaricp.connections.usage.sum.us":{"value":903000.0},"hikaricp.connections.acquire.count":{"value":78},"hikaricp.connections.acquire.sum.us":{"value":269137.9},"hikaricp.connections.min":{"value":2.0},"hikaricp.connections":{"value":2.0},"hikaricp.connections.timeout":{"value":0.0},"hikaricp.connections.pending":{"value":0.0},"hikaricp.connections.active":{"value":0.0},"hikaricp.connections.idle":{"value":2.0},"hikaricp.connections.creation.count":{"value":0},"hikaricp.connections.creation.sum.us":{"value":0.0}}}}

Possible approaches to solve this

While we can apply workarounds that are specific to HikariCP or even to our instrumentation of Micrometer, it seems that a proper fix for this needs to be applied at either agent or apm-server level.

Here is a non-definitive list of ideas that we (java agent team) have come so far:

  1. Agent could simply rename metrics on the fly, for example replace . with _. While simple to change, that would make such metrics different from others that use dots and break compatibility with existing metrics and even impact APM UI as current visualizations rely on metric names. Another downside is with custom metrics, those will have different (but close) names which could be confusing.
  2. Agent (or server) could post-process any metricset to ensure that no metrics conflict by adding an extra _value field automatically when required. This would work as long as those metrics are sent together in the same metricset, which is quite likely. In the example above all metrics related to a database connection pool will always reported together.
  3. Add a configuration option at agent-level that will allow to rename those conflicting metrics. This would assume that the frequency of such cases is small enough to not be a constant burden. Given that hikaricp is the default connection pool for spring-boot applications, which itself is a very popular framework, that's not a very appealing option for Java agent.

Currently, we think that the option 2. is the best for the long term, and having a few metrics with an extra _value attribute should be straight-forward for the end-user.

What are the other options that we could implement for this ?


Current workaround

  • with agent version 1.18.0, add micrometer to disable_instrumentations, this will completely disable micrometer integration.
  • use latest snapshot version or 1.19.0 ( not released as time of writing) and set disabled_metrics=hikaricp.connections
@axw
Copy link
Member

axw commented Sep 30, 2020

The server team discussed this just now, and we feel like option 2 is too fragile, since it only works if the metrics are all part of the same metric set. If for example an agent were to sometimes omit (delta) metrics that did not change in the most recent interval, then that could mean there is no conflict and the name would flip. That would then lead to an indexing conflict.

In the shorter term, could the Java agent rewrite the specific problematic hikaricp metric (e.g. hikaricp.connections -> hikaricp.connections.total)? It wouldn't necessarily need to be configurable. This obviously isn't perfect, but it would resolve the immediate issue.

The server team will also look into our options going forward, and raise enhancement requests with the Elasticsearch team.

@axw
Copy link
Member

axw commented Oct 2, 2020

Agent could simply rename metrics on the fly, for example replace . with _. While simple to change, that would make such metrics different from others that use dots and break compatibility with existing metrics

I learned yesterday of elastic/kibana#69908. This relies on index patterns and therefore isn't immediately useful for dynamic fields, but it may also be possible to introduce a _meta field which sets a display name for a field.

I'm not sure if/how we could dynamically set that though; this doesn't fit with the current proposal for dynamic mapping type hints: elastic/elasticsearch#61939

and even impact APM UI as current visualizations rely on metric names. Another downside is with custom metrics, those will have different (but close) names which could be confusing.

I assume you mean that we would rename things like process, runtime, etc. metrics? We could treat these well-defined metrics differently. In a way they are already treated differently; they have explicit field mappings.

@felixbarny
Copy link
Member

While option 2 is less fragile when done in the agent and not on the server, it's also not perfect. When adding a custom metric foo during the startup of the application and foo.bar at a later point in time, we'll also end up with mapping conflicts.

To fix that, users have to delete the metric index and rename the foo metric.

The only option I see to fix that in a systematic way is to disallow/replace dots for custom metrics.
Interestingly, in Prometheus, dots in metric names are disallowed:
https://prometheus.io/docs/concepts/data_model/#metric-names-and-labels

It [the metric name] may contain ASCII letters and digits, as well as underscores and colons. It must match the regex [a-zA-Z_:][a-zA-Z0-9_:]*.

@axw
Copy link
Member

axw commented Oct 13, 2020

I agree that's the best option for now. There's a new Elasticsearch enhancement request to support dotted field names at: elastic/elasticsearch#63530

@SylvainJuge
Copy link
Member Author

Since the underlying issue is with Elasticsearch, it might take a while before we can rely on a proper fix for that.
Thus in the mean time, we still need to add a work-around.

The current workaround requires to use disable_metrics:
👍 quite simple to configure once you know what value to set
👍 the original metric names expected by user are kept
👎 we remove the conflicting metric, but loose it's value
👎 the mapping conflict is still triggered server side and will require to delete existing metrics

Because option 2 (add suffix automatically) is not reliable, we won't explore it further.

Thus, we are left with three options:

  • rename metrics that contain dots in their names
    ❓ do we apply this to 1) all metrics, 2) only custom metrics set by user (includes JMX) 3) micrometer metrics
    ❓ what about existing metrics and impacts on UI if renaming existing ones
    👎 those metrics will become inconsistent with other metrics and with names that do not match what user expects.
    👎 would trigger a breaking change if we want to revert this later, only users that built things on those metrics would be impacted though.
  • add a configuration option (maybe internal) to add suffix to a list of known metrics.
    👍 slightly better than current workaround as it allows to keep metric value
    👎 same for the mapping conflict
  • wait until we have better options

Do we agree on:

  • doing the "rename option" is the best thing to do at agent level right now
  • apply this to all metrics that are under user control (Micrometer, JMX), and leave existing metrics that are used for JVM UI in Kibana as-is.

@eyalkoren
Copy link
Contributor

Since the optimal solution would be through the implementation of elastic/elasticsearch#63530, and we will eventually want to rely on it, maybe trying to get information on when this is expected can help.
I'd say if we think it is not too far ahead, we can do now with disabling metrics.
Renaming metrics is easy and robust but it means breaking now (for users who rely on such metrics) and re-breaking when we get back to report dotted names.

@lucabelluccini
Copy link

@SylvainJuge
Copy link
Member Author

Also, the latest update on the underlying issue refers to flattened fields, which is currently WIP, thus we can expect to have good news on the topic soon elastic/elasticsearch#63530 (comment) 🤞 !

@OrangeDog
Copy link

Did this get fixed and released at some point? I see some merged PRs.

@lucabelluccini
Copy link

@OrangeDog at least from the latest updates, we have an option for the APM Java Agent which will, by default, dedots the metrics.
E.g. connections.idle becomes connections_idle.
The dedotting:

  • will avoid problems unless the metric is using a different data type for the same metric (which is not common)
  • allows to ingest metrics such as connections=100, connections.idle=89 as they'll be indexed as connections and connections_idle
  • cannot prevent some problems related to float/integers (see Metrics without any explicit mapping can lead to document rejection apm-server#5682)
  • might lead to too many fields (if you have hundreds of metrics)

The solution would be to rely on Elasticsearch feature request elastic/elasticsearch#63530 (comment), which is not yet ready.

@javanna
Copy link
Member

javanna commented May 17, 2022

Hello, I am reaching out to let you know that dedotting will no longer be needed from Elasticsearch 8.3 onwards. See elastic/elasticsearch#86166 . Any object can be configured in the mappings to not hold any further subobjects in which case dots in field names will be left as-is so that connections and connections.idle can both hold values. Note that this is a different approach compared to the comment that was linked above around using flattened field for this usecase. We are rather providiing a solution that allows your solution to replace dedotting and leave dots in field names untouched.

@felixbarny
Copy link
Member

@axw is that all we need in order to support dots in names? I'm not so sure.

In contrast to the examples in elastic/elasticsearch#86166, we don't have a metrics object under which all metrics are nested. Should we set subobjects: false to the first object in a metric then? For example, for the metric foo.bar, set the mapping of the foo object to subobjects: false. However, that can still conflict if there's another metric called foo.

@javanna for TSDB, is there a plan to automatically map metric fields (those that are not dimensions) in a way so that they are never objects? Something like subobjects: false on the root level, but only for metric fields.

@axw
Copy link
Member

axw commented May 18, 2022

is that all we need in order to support dots in names?

@felixbarny I'm also not sure. If we have to add a metrics. prefix to all metrics, then that would require breaking changes because users may have alerts on their metrics, for example.

In elastic/elasticsearch#63530 (comment) it was suggested that we would be able to set flattened: true on the root object.

Something like subobjects: false on the root level, but only for metric fields.

This is what I was originally hoping for, and what I was describing in elastic/elasticsearch#63530 (comment).

@romseygeek made the point that it doesn't really matter whether they're flattened or not, unless we specifically care about the _source structure (which I don't think we do). So, if we can set subobjects: false at the root level, then I think we're good?

@felixbarny
Copy link
Member

Right, I think we'd be fine even with metadata fields, such as host.name not being mapped as an object. Being able to set subobjects: false at the root level sounds like a good idea. @javanna would that be feasible?

@javanna
Copy link
Member

javanna commented May 18, 2022

Heya, happy that I have gotten your attention, there was some confusion in the discussion in the ticket mentioned above. Can I get an example document where you need to use this feature so I can better understand what the way forward is and make recommendations?

The idea would be to find a way to not have to make breaking changes or change your documents. The solution we came up with is hopefully flexible enough to give the option to either have the whole document without subobjects, or only a specific section of it.

@felixbarny
Copy link
Member

An example document:

{
  "host": {
    "name": "myhost"
  },
  "foo": 42,
  "foo.bar": 42,
  "baz": 21,
  "baz.qux": 21
}

@axw
Copy link
Member

axw commented May 18, 2022

@javanna important to note in the above:

  • host.name would be a field defined statically, ahead of time (this is an ECS field, we define it in a component template)
  • foo, foo.bar, baz, and baz.qux would be dynamically mapped metrics

@javanna
Copy link
Member

javanna commented May 18, 2022

Ok with this document there is no doubt that you'll have to set subobjects: false to the root for foo and baz to both hold a value as well as be the prefix for foo.bar and baz.qux.

Though there is a small issue with the current implementation: host can't be specified as an object in the document if the root object is configured to not hold objects. Is it feasible to consistently provide fields with dots in their names like host.nameand have no inner objects? Or does host need to be an object in your documents?

@felixbarny
Copy link
Member

Ok with this document there is no doubt that you'll have to set subobjects: false to the root

Is that already possible?

host can't be specified as an object in the document if the root object is configured to not hold objects. Is it feasible to consistently provide fields with dots in their names like host.nameand have no inner objects? Or does host need to be an object in your documents?

Is it an issue if the source document contains nested objects? Could ES convert that into dotted fields during indexing? If not, I suppose APM Server could modify the fields to convert nesting into dotting. It doesn't sound great but probably feasible.

But whatever we do, we should make sure to closely align with TSDB. Not sure if there's a catch when mapping metric fields and/or dimension fields as subobjects: false.

@axw
Copy link
Member

axw commented May 18, 2022

Ok with this document there is no doubt that you'll have to set subobjects: false to the root for foo and baz to both hold a value as well as be the prefix for foo.bar and baz.qux.

👍

Though there is a small issue with the current implementation: host can't be specified as an object in the document if the root object is configured to not hold objects. Is it feasible to consistently provide fields with dots in their names like host.name and have no inner objects? Or does host need to be an object in your documents?

AFAIK we don't have any specific need for it to be an object, but this is not entirely within our control; Fleet creates the index & component templates. I'll get in touch with the Fleet folks to see if we can flatten the mappings.

It seems like more changes are needed in Elasticsearch though, is that right? This fails:

PUT /foo
{
  "mappings" : {
    "subobjects": false,
    "properties": {
      "host.name": {
        "type": "keyword"
      }
    }
  }
}
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Failed to parse mapping: Object [_doc] has subobjects set to false hence it does not support inner object [host]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping: Object [_doc] has subobjects set to false hence it does not support inner object [host]",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Object [_doc] has subobjects set to false hence it does not support inner object [host]"
    }
  },
  "status": 400
}

@felixbarny
Copy link
Member

Using non-object mappings for ECS fields, such as host.name doesn't sound so great.

There's an effort around making ECS mappings more native in Elasticsearch: elastic/elasticsearch#85692

We'd need to consider two versions of the mapping then - an object-based and a flat one.

@javanna
Copy link
Member

javanna commented May 18, 2022

Ok @felixbarny let me think about that, we can try and handle that in ES.

@axw for the error above, I am looking into it.

@javanna
Copy link
Member

javanna commented May 18, 2022

@axw The error you got does not look like it's returned from the above create index call, but rather from indexing a doc with a host object, or trying to map host as an object, which is the very same problem I mentioned above and I am discussion with Felix too. Can you confirm I got it right?

@axw
Copy link
Member

axw commented May 18, 2022

@javanna nope, I ran that exact PUT /foo request in dev tools -- just tried it again. I'm running the latest 8.3.0-SNAPSHOT, 0418e8a9d824b33c02d4d84ee2ffff91287c9472.

@javanna
Copy link
Member

javanna commented May 18, 2022

@axw I found that problem and opened a PR (elastic/elasticsearch#86894), thanks for spotting it.

@felixbarny I wanted to get back to what you said above "Using non-object mappings for ECS fields, such as host.name doesn't sound so great.". We need to get on the same page here: one thing is whether documents contain objects or not, for instance we could relax the current restriction and allow you to still provide host: {name: ""} in an object shape in your documents. This requires extra work on our end but it's doable. Another thing is allowing for mappings to hold objects when subobjects are disabled: this is not possible. If your root object is configured to not hold subobjects, host cannot be an object in your mappings, and in that case specifying host.name as a keyword field in your mapping will be treated differently, meaning host will not be made an object automatically. I hope, but we need to verify this, that you don't specify host as an object in your mappings.

I hope this clarifies things, but just in case, I would like to see your mappings and real sample documents. It would be great for us also to base our tests on your real usecase rather than dummy documents.

Please ping me and let's chat on a zoom if things are unclear.

@felixbarny
Copy link
Member

felixbarny commented May 19, 2022

@javanna and I had a chat on zoom.

A container object for metrics (all metrics are nested under a metrics object) would make things simpler and probably cleaner, too. Currently, there is a risk that a metric name conflicts with a metadata field. For example, if a metric is called host or host.name. In practice, it seems unlikely that there will often be conflicts, but it's a possibility. But introducing a container object now would be a breaking change.

Currently, the mapping templates we generate are nested. For example:

{
  "mappings": {
    "properties": {
      "host": {
        "properties": {
          "name": {
            "type": "keyword",
            "ignore_above": 1024
          }
        }
      }
    }
  }
}

If we flatten the mappings like this:

{
  "mappings": {
    "dynamic": "true",
    "properties": {
      "host.name": {
        "type": "keyword",
        "ignore_above": 1024
      }
    }
  }
}

They can work both for the case where subobjects is set to true and false. When set to false, Elasticsearch will automatically create nested objects. As Andrew said, that's something we'd need to change in Fleet but it seems doable at first glance.

Another thing to watch out for when mapping the root with subobjects: false is that even the metadata fields in the metric documents need to be flattened. So either APM Server or Elasticsearch needs to transform

{
  "host": {
    "name": "myhost"
  },
  "foo": 42,
}

Into

{
  "host.name": "myhost",
  "foo": 42,
}

I agreed with @javanna that for now (8.3), Elasticsearch should not automatically do the flattening. If it's not feasible to do it in APM Server, we can talk about adding a flattening feature to Elasticsearch.

Luca also explained to me that the very same limitations for dotted field names apply to TSDB.

@OrangeDog
Copy link

I think the metrics. prefix is the best idea. This will be a breaking change anyway, as any existing metrics would have to have replaced the dots somehow.

@javanna
Copy link
Member

javanna commented May 19, 2022

I don't mean to interfere here and I think that you folks need to have an internal discussion to align on the matter, but based on my preliminary chat with Felix if felt like we are trying not to make breaking changes, and we should be able to achieve that. Please ping me if you discuss this, I am glad to be involved.

@javanna
Copy link
Member

javanna commented May 23, 2022

@axw heads up: the bug you hit at your first try is now fixed in master. Thanks for your patience :)

@axw
Copy link
Member

axw commented May 24, 2022

I think the metrics. prefix is the best idea. This will be a breaking change anyway, as any existing metrics would have to have replaced the dots somehow.

@OrangeDog we only replace dots when we don't know the metric names ahead of time, e.g. for custom application metrics.

For builtin metrics (e.g. https://www.elastic.co/guide/en/apm/agent/java/current/metrics.html), we are not de-dotting. So for example, the builtin metric jvm.memory.heap.used will be dynamically mapped as "jvm": {"memory": {"heap": {"used": .... Elasticsearch already allows this to be addressed as jvm.memory.heap.used in search queries, and will return the flattened form in fields in search hits.

I do think having a prefix would be neater and simpler, but we would like to avoid breaking users for the moment.

I agreed with @javanna that for now (8.3), Elasticsearch should not automatically do the flattening. If it's not feasible to do it in APM Server, we can talk about adding a flattening feature to Elasticsearch.

Luca also explained to me that the very same limitations for dotted field names apply to TSDB.

If we need to flatten all metadata fields at ingest time, then we will need to perform some additional testing around the _source storage cost. As there would be a lot of redundancy, it may increase storage size significantly, but perhaps it will compress well enough. If size increases significantly, we might need to tie in with elastic/elasticsearch#86603.

@felixbarny
Copy link
Member

@ruflin told me that he also had a chat with @javanna. He said that for Beats the automatic flattening would be important so that older Beats can write to newer Elasticsearch versions. We were thinking that this could be done with a flatten ingest processor. Basically something like the reverse of the dot_expander.

@javanna
Copy link
Member

javanna commented May 30, 2022

Quick update on my end: 8.3 will include support for the subobjects parameter but it will require to adapt documents, as they can't hold any object field when subobjects are disabled (except for non-object fields that can parse objects natively like geo-point).

Based on the conversations I had, and more thinking I did around the need to adapt documents, I think that the next step should be for Elasticsearch to automatically handle documents containing objects despite subobjects are disabled. This will help solutions as well as users in general adopt the new provided solution for storing metrics. While adapting mappings is required, the need to adapt documents is something we should avoid.

While an ingest processor may help, I am not comfortable with effectively requiring to use an ingest processor to overcome this limitation. Also, it would not be straight-forward to come up with a complete solution in an ingest processor as some objects (think geo_points or other "leaf" fields that are able to parse objects) do need to be preserved, while some others need to be flattened. Ingest does not have knowledge of mappings which makes this not possible (unless a list of fields not to be flattened is provided as a configuration parameter). For these reasons I think we should work on a built-in solution, which will end up affecting how objects are dynamically mapped by Elasticsearch. I will follow-up on this.

@javanna
Copy link
Member

javanna commented Jun 1, 2022

By the way, I just bumped in to an sdh around APM hitting the total fields limit (2000), and it turns out that 300+ of those fields are object fields, which would go away from the mappings with the subobjects: false solution, which feels good given that those object fields don't have much use.

This is assuming that exists query are not performed against objects (a controversial feature that we would love not to have to support), and that those objects are in the mappings only to hold properties and not to declare their dynamic behaviour or even disable their parsing.

@felixbarny
Copy link
Member

felixbarny commented Jul 25, 2022

I think that the next step should be for Elasticsearch to automatically handle documents containing objects despite subobjects are disabled.

@javanna what's the state of this? Is there an issue that you can link?

Once we have that in, it seems to be safe to set subobjects: false on the root level. At least for metric data streams (probably also for logs and possibly others) this seems to be a better default than subobjects: true. Are you aware of any downsides of doing this besides not being able to use exists queries on object fields?


Edit: I've created elastic/elasticsearch#88934

@Mpdreamz
Copy link
Member

I wonder if there still another unexplored option other than setting subobjects: false at the root and switching to treating Elasticsearch documents solely as key/value maps. There is still a lot of value in being able to enforce a structure so anyone consuming _source can be sure something is an object.

e.g do I read _source.host.name or _source['host.name'].

What if we can switch to storing metrics under a dedicated field metrics that we'd treat as a key/value map. By introducing a new feature that would allow us to query that data as if it was stored at the root we would get best of both worlds e.g:

"metrics" : { 
   subobjects: false
   lensed: true
}

any metric stored under metrics could then be queried as metrics.jvm.gc.time OR jvm.gc.time.

Conflicts could be handled at ingest time e.g

metrics.host.name would be invalid because a mapping for host.name already exists.

Older producers still writing
jvm.gc.time would need to be handled by a new mapping mechanism to rewrite it to metrics.jvm.gc.time

New producers writing to metrics.jvm.gc.time would be valid because even though it already has a mapping for jvm.gc.time its explicitly routed to the new name.

That way we could move our metric mappings over to the metrics prefix in an update without introducing breaking changes.

@felixbarny
Copy link
Member

There is still a lot of value in being able to enforce a structure so anyone consuming _source can be sure something is an object.

The structure of _source would be unaffected by this. The _source can still contain nested objects but the mapping to Lucene fields could be flat key/value pairs.
Having said that, with disabled or synthetic source, it would be harder to re-construct the original source in the same way. But I don't see that as a big issue, especially for a metrics index.

@javanna
Copy link
Member

javanna commented Aug 3, 2022

Thanks for creating the issue @felixbarny . I was on holiday when you asked for status, and I am resuming now the discussion around this. I investigated what changes are needed to support flattening objects and some bits require more discussion with the team because they affect quite a bit dynamic mappings of objects as well as field types that support parsing objects like geo_points. Stay tuned, I will keep you posted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants