Be more lenient when ingesting data #4659

Closed
ruflin opened this issue Nov 16, 2022 · 4 comments

ruflin commented Nov 16, 2022

Today, when data is shipped to certain data streams, several problems can happen that lead to the data being rejected. This issue is to discuss potential workarounds and eventually standardise on their usage.

Potential issues

The following is a list of known / existing issues with proposed workarounds. Let's discuss the pros and cons of each. This list will be updated.

Keyword / object conflict

If a user ingests foo and foo.bar, an object / keyword conflict will happen and ingestion stops. This can be prevented by using subobjects: false (https://www.elastic.co/guide/en/elasticsearch/reference/master/subobjects.html). This works when the data is ingested already flattened, but the idea is that Elasticsearch can do the flattening for us in the future: elastic/elasticsearch#88934
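
For illustration, a minimal sketch of an index template that disables subobjects at the mapping level (the template name and index pattern are made up for this example):

```
PUT _index_template/logs-example
{
  "index_patterns": ["logs-example-*"],
  "data_stream": {},
  "template": {
    "mappings": {
      "subobjects": false
    }
  }
}
```

With subobjects: false, foo and foo.bar can both be ingested because dotted field names are stored as leaf fields instead of being expanded into nested objects.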

Type conflict - ignore_malformed

If a: 15 is ingested and later a: "foo", a type conflict happens and ingestion stops. Ideally data would continue to be ingested, but the "foo" value would not be indexed. To be more lenient on such fields, ignore_malformed: true needs to be used: https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-malformed.html
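
As a sketch, ignore_malformed can be set on individual fields in the mappings section of a template (the field a mirrors the example above); a document where a is not a number is still indexed, only that field value is skipped. The linked page also describes an index-level index.mapping.ignore_malformed setting that applies this to all fields that support it.

```
"mappings": {
  "properties": {
    "a": {
      "type": "long",
      "ignore_malformed": true
    }
  }
}
```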

Not indexing in the first place

Historically we indexed all fields. But with runtime fields, we can make some fields available for querying without having to index them, which also helps reduce conflicts.
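
For example, a field can be declared under runtime instead of properties so it is available for queries at search time without being indexed (the field name is made up):

```
"mappings": {
  "runtime": {
    "session.id": {
      "type": "keyword"
    }
  }
}
```

Without a script, such a runtime field simply reads the value with the same name from _source at query time.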

Use more error handling in ingest pipelines

By default, ignore most errors in ingest pipelines. This is also problematic as it makes it hard to catch actual errors.
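
To make the trade-off concrete, a processor can either swallow failures with ignore_failure: true or route them to an on_failure handler that at least records what went wrong (the pipeline and field names are made up):

```
PUT _ingest/pipeline/logs-example
{
  "processors": [
    {
      "date": {
        "field": "timestamp",
        "formats": ["ISO8601"],
        "ignore_failure": true
      }
    },
    {
      "convert": {
        "field": "http.response.status_code",
        "type": "long",
        "on_failure": [
          {
            "set": {
              "field": "error.message",
              "value": "{{{ _ingest.on_failure_message }}}"
            }
          }
        ]
      }
    }
  ]
}
```

The second variant keeps the document but leaves a trace, which makes real errors easier to find than blanket ignoring.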

Use custom ingest pipelines as escape hatch

If none of the above works, add a custom ingest pipeline to solve the problem.
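
For Fleet-managed data streams, this is typically the <type>-<dataset>@custom pipeline, which the integration's default pipeline calls if it exists. A sketch, assuming a hypothetical logs-example.access dataset and working around the type conflict above by coercing the field to a string:

```
PUT _ingest/pipeline/logs-example.access@custom
{
  "processors": [
    {
      "convert": {
        "field": "a",
        "type": "string",
        "ignore_missing": true,
        "ignore_failure": true
      }
    }
  ]
}
```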

More thoughts

For new data streams, some of these things can be applied directly. For existing data streams, it needs to be discussed whether some of them would be breaking changes. Also, some of these features are only available in newer versions of Elasticsearch. Will that mean packages are only compatible with newer versions of Elasticsearch, or will Fleet handle some of the compatibility? Could a user change these flags per data stream in Fleet?

Links

In elastic/elasticsearch#89743 there is a related discussion around having a more lenient base template.

jsoriano commented Nov 16, 2022

The flattened type can also help in some cases, but it has other limitations. Discussion about this can be found in https://github.com/elastic/obs-infraobs-team/issues/461 (internal, sorry).
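
For reference, a minimal mappings sketch that maps a whole object as flattened, so arbitrary keys under it are stored as leaf values without adding new fields to the mapping (the field name is made up):

```
"mappings": {
  "properties": {
    "labels": {
      "type": "flattened"
    }
  }
}
```

All leaf values of a flattened field are treated as keywords, which is part of the limitations mentioned above.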

botelastic bot commented Nov 16, 2023

Hi! We just realized that we haven't looked into this issue in a while. We're sorry! We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1. Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Nov 16, 2023

ruflin commented Nov 17, 2023

The team is actively working on this, especially in the context of logs. ignore_malformed has already been addressed; more to come.

@botelastic botelastic bot removed the Stalled label Nov 17, 2023
@ruflin ruflin self-assigned this Mar 6, 2024

ruflin commented Mar 26, 2024

All data coming into logs-*-* by default now has ECS mappings, uses ignore_malformed, and uses the new dynamic field limit. The same will eventually apply to integrations once they are upgraded to the newest version of the stack and have adopted it.

The failure store is under development. As soon as it lands, we should revisit how we treat pipeline failures. Going to close this for now.

@ruflin ruflin closed this as completed Mar 26, 2024