Be more lenient when ingesting data #4659
Comments
The flattened type can also help in some cases, but it has other limitations. Discussion about this can be found in https:/elastic/obs-infraobs-team/issues/461 (internal, sorry).
Hi! We just realized that we haven't looked into this issue in a while. We're sorry! We're labeling this issue as |
The team is actively working on this, especially in the context of logs.
All data coming into logs-- now has ECS mappings by default, uses ignore_malformed, and uses the new dynamic field limit. The same will eventually apply to integrations once the stack is upgraded to the newest version and the integration has adopted these settings. The failure store is under development. As soon as it lands, we should revisit how we treat pipeline failures. Going to close this for now.
Today, when data is shipped to certain data streams, several problems can occur that lead to the data being rejected. This issue is to discuss potential workarounds and eventually standardise on their usage.
Potential issues
The following is a list of known / existing issues with proposed workarounds. Let's discuss the pros / cons of each. This list will be updated.
Keyword / object conflict
If a user ingests foo and foo.bar, an object / keyword conflict happens and ingestion stops. This can be prevented by setting subobjects: false (https://www.elastic.co/guide/en/elasticsearch/reference/master/subobjects.html). This works when the data is ingested already flattened, but the idea is that Elasticsearch can do this for us in the future: elastic/elasticsearch#88934
Type conflict - ignore_malformed
If a: 15 and later a: "foo" is ingested, a type conflict happens and ingestion stops. Ideally, data continues to be ingested but the "foo" value is not indexed. To be more lenient on fields, ignore_malformed: true needs to be used: https://www.elastic.co/guide/en/elasticsearch/reference/current/ignore-malformed.html
Do not index in the first place
Historically, we indexed all fields. With runtime fields, we can make some fields available for querying without indexing them, which also helps reduce conflicts.
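The three mapping-level workarounds above (subobjects: false, ignore_malformed: true, and runtime fields instead of indexed fields) could be combined in one index template. The following is a minimal sketch of such a request body, built as a Python dict, not the template the team actually ships. The template name, index pattern, priority, and the debug_info field are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical template name, used only in the printed request line.
TEMPLATE_NAME = "logs-lenient-example"

lenient_template = {
    "index_patterns": ["logs-example-*"],  # hypothetical pattern
    "data_stream": {},
    "priority": 200,  # arbitrary value for this sketch
    "template": {
        "settings": {
            # Index-wide default: keep the document and skip the malformed
            # value instead of rejecting the whole document.
            "index.mapping.ignore_malformed": True,
        },
        "mappings": {
            # Dots in field names are kept literal, so "foo" and "foo.bar"
            # can coexist as two leaf fields.
            "subobjects": False,
            "properties": {
                "foo": {"type": "keyword"},
                "foo.bar": {"type": "keyword"},
                # With ignore_malformed, a later a: "foo" is not indexed
                # for this field, but the document is still accepted.
                "a": {"type": "long"},
            },
            # Runtime field: queryable from _source, but never indexed.
            "runtime": {
                "debug_info": {"type": "keyword"},
            },
        },
    },
}

print(f"PUT _index_template/{TEMPLATE_NAME}")
print(json.dumps(lenient_template, indent=2))
```

The printed body can be pasted into Kibana Dev Tools or sent with any HTTP client. Whether subobjects and the other settings are available depends on the Elasticsearch version, which is the compatibility question raised under "More thoughts".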
Use more error handling in ingest pipelines
By default, ignore most errors in ingest pipelines. This is also problematic as it makes it hard to catch actual errors.
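One way to soften the trade-off described above is to ignore failures only where they are harmless and record them everywhere else. The sketch below builds an ingest pipeline body as a Python dict. It relies on the ignore_failure and on_failure processor options and the _ingest.on_failure_message metadata field. The field names log.level, a, and error.message are assumptions for illustration, not taken from any specific integration.

```python
import json

# Handler that stores the failure reason on the document itself,
# so errors stay visible instead of being silently swallowed.
record_error = {
    "set": {
        "field": "error.message",
        "value": "{{ _ingest.on_failure_message }}",
    }
}

pipeline = {
    "description": "Sketch: lenient processors that still surface errors",
    "processors": [
        {
            # Cosmetic step: a failure here is simply ignored.
            "lowercase": {
                "field": "log.level",
                "ignore_missing": True,
                "ignore_failure": True,
            }
        },
        {
            # Important step: on failure, keep the document but record why.
            "convert": {
                "field": "a",
                "type": "long",
                "ignore_missing": True,
                "on_failure": [record_error],
            }
        },
    ],
    # Pipeline-level fallback for processors without their own handler.
    "on_failure": [record_error],
}

print(json.dumps(pipeline, indent=2))
```

Documents that hit a handler can then be found by querying for the existence of error.message, which keeps actual errors discoverable even though ingestion continues.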
Use custom ingest pipelines as escape hatch
If none of the above works, add a custom ingest pipeline to solve the problem.
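As an illustration of such an escape hatch, the sketch below defines a small custom pipeline that sidesteps the foo / foo.bar conflict by moving a string-valued foo to a separate field before indexing. The target field name foo_text and the pipeline description are hypothetical. How the pipeline gets attached to a data stream is left out, since that depends on the Fleet and integration setup.

```python
import json

custom_pipeline = {
    "description": "Sketch: move string-valued foo out of the way",
    "processors": [
        {
            "rename": {
                # Painless condition: only act when foo is a plain string,
                # so documents where foo is an object are left untouched.
                "if": "ctx.foo instanceof String",
                "field": "foo",
                "target_field": "foo_text",  # hypothetical field name
                "ignore_missing": True,
            }
        }
    ],
}

print(json.dumps(custom_pipeline, indent=2))
```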
More thoughts
For new data streams, some of these things can be applied directly. For existing data streams, we need to discuss whether some of them might be breaking changes. Also, some of these features are only available in newer versions of Elasticsearch. Will this mean packages are now only compatible with newer versions of Elasticsearch, or will Fleet handle some of the compatibility? Could a user change these flags per data stream in Fleet?
Links
A related discussion about having a more lenient base template is happening in elastic/elasticsearch#89743.