Be more lenient when ingesting data #4659

Closed
ruflin opened this issue Nov 16, 2022 · 4 comments

ruflin commented Nov 16, 2022

Today, when data is shipped to certain data streams, several problems can happen that lead to the data being rejected. This issue is to discuss potential workarounds and eventually standardise on their usage.

Potential issues

The following is a list of known / existing issues with proposed workarounds. Let's discuss the pros and cons of each. This list will be updated.

Keyword / object conflict

If a user ingests foo and foo.bar, an object / keyword conflict will happen and ingestion stops. This can be prevented by using subobjects: false (https://www.elastic.co/guide/en/elasticsearch/reference/master/subobjects.html). This works when the data is ingested already flattened, but the idea is that Elasticsearch can do the flattening for us in the future: elastic/elasticsearch#88934
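
For illustration, a minimal sketch of an index template that disables subobjects at the mapping level (the template name and index pattern are made up for this example):

```
PUT _index_template/logs-example
{
  "index_patterns": ["logs-example-*"],
  "data_stream": {},
  "template": {
    "mappings": {
      "subobjects": false
    }
  }
}
```

With subobjects: false, foo and foo.bar can both be ingested because dotted field names are stored as leaf fields instead of being expanded into nested objects.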

Type conflict - ignore_malformed

If a: 15 is ingested and later a: "foo", a type conflict happens and ingestion stops. Ideally data would continue to be ingested, but the "foo" value would not be indexed. To be more lenient on such fields, ignore_malformed: true needs to be used: https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-malformed.html
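
As a sketch, ignore_malformed can be set on individual fields in the mappings section of a template (the field a mirrors the example above); a document where a is not a number is still indexed, only that field value is skipped. The linked page also describes an index-level index.mapping.ignore_malformed setting that applies this to all fields that support it.

```
"mappings": {
  "properties": {
    "a": {
      "type": "long",
      "ignore_malformed": true
    }
  }
}
```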

Not indexing in the first place

Historically we indexed all fields. But with runtime fields, we can make some fields available for querying without having to index them, which also helps reduce conflicts.
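
For example, a field can be declared under runtime instead of properties so it is available for queries at search time without being indexed (the field name is made up):

```
"mappings": {
  "runtime": {
    "session.id": {
      "type": "keyword"
    }
  }
}
```

Without a script, such a runtime field simply reads the value with the same name from _source at query time.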

Use more error handling in ingest pipelines

By default, ignore most errors in ingest pipelines. This is also problematic as it makes it hard to catch actual errors.
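
To make the trade-off concrete, a processor can either swallow failures with ignore_failure: true or route them to an on_failure handler that at least records what went wrong (the pipeline and field names are made up):

```
PUT _ingest/pipeline/logs-example
{
  "processors": [
    {
      "date": {
        "field": "timestamp",
        "formats": ["ISO8601"],
        "ignore_failure": true
      }
    },
    {
      "convert": {
        "field": "http.response.status_code",
        "type": "long",
        "on_failure": [
          {
            "set": {
              "field": "error.message",
              "value": "{{{ _ingest.on_failure_message }}}"
            }
          }
        ]
      }
    }
  ]
}
```

The second variant keeps the document but leaves a trace, which makes real errors easier to find than blanket ignoring.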

Use custom ingest pipelines as escape hatch

If none of the above works, add a custom ingest pipeline to solve the problem.
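
For Fleet-managed data streams, this is typically the <type>-<dataset>@custom pipeline, which the integration's default pipeline calls if it exists. A sketch, assuming a hypothetical logs-example.access dataset and working around the type conflict above by coercing the field to a string:

```
PUT _ingest/pipeline/logs-example.access@custom
{
  "processors": [
    {
      "convert": {
        "field": "a",
        "type": "string",
        "ignore_missing": true,
        "ignore_failure": true
      }
    }
  ]
}
```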

More thoughts

For new data streams, some of these things can be applied directly. For existing data streams, it needs to be discussed whether some of them would be breaking changes. Also, some of these features are only available in newer versions of Elasticsearch. Will that mean packages are only compatible with newer versions of Elasticsearch, or will Fleet handle some of the compatibility? Could a user change these flags per data stream in Fleet?

Links

In elastic/elasticsearch#89743 there is a related discussion around having a more lenient base template.

jsoriano commented Nov 16, 2022

The flattened type can also help in some cases, but it has other limitations. Discussion about this can be found in https://github.com/elastic/obs-infraobs-team/issues/461 (internal, sorry).
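
For reference, a minimal mappings sketch that maps a whole object as flattened, so arbitrary keys under it are stored as leaf values without adding new fields to the mapping (the field name is made up):

```
"mappings": {
  "properties": {
    "labels": {
      "type": "flattened"
    }
  }
}
```

All leaf values of a flattened field are treated as keywords, which is part of the limitations mentioned above.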

botelastic bot commented Nov 16, 2023

Hi! We just realized that we haven't looked into this issue in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Nov 16, 2023

ruflin commented Nov 17, 2023

The team is actively working on this, especially in the context of logs. ignore_malformed has already been addressed; more to come.

@botelastic botelastic bot removed the Stalled label Nov 17, 2023
@ruflin ruflin self-assigned this Mar 6, 2024

ruflin commented Mar 26, 2024

All data coming into logs-*-* by default now has ECS mappings, uses ignore_malformed, and uses the new dynamic field limit. The same will eventually apply to integrations once they are upgraded to the newest version of the stack and have adopted it.

The failure store is under development. As soon as it lands, we should revisit how we treat pipeline failures. Going to close this for now.

@ruflin ruflin closed this as completed Mar 26, 2024