libbeat: Don't force an ignore_above limit on wildcard fields #30668

Merged
adriansr merged 3 commits into elastic:main from wildcard_ignore_above
Mar 7, 2022

adriansr
Contributor

@adriansr adriansr commented Mar 3, 2022

What does this PR do?

Modifies libbeat's template processor to stop hardcoding a default ignore_above limit of 1024 on wildcard fields. This behavior was inherited from keyword fields.

From a Beats user's point of view, I consider this a bugfix (Bugfix under Affecting all Beats in CHANGELOG).
From a community Beat developer's point of view, I consider this a breaking change, as someone may be relying on the previous default behavior (Breaking under Affecting all Beats in CHANGELOG-developer).
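
To make the behavior change concrete, here is a minimal sketch of the defaulting rule described above. It is not libbeat's actual code: `fieldDef`, `buildMapping`, and `defaultIgnoreAbove` are hypothetical names, the convention that `0` means "not set" is an assumption, and the field names are illustrative only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// defaultIgnoreAbove mirrors the 1024 default mentioned in the PR description.
const defaultIgnoreAbove = 1024

// fieldDef is a hypothetical, simplified field definition (not libbeat's type).
type fieldDef struct {
	Name        string
	Type        string // "keyword" or "wildcard"
	IgnoreAbove int    // assumption: 0 means "not set by the user"
}

// buildMapping returns the mapping properties for a single field.
// Keyword fields fall back to the default limit. Wildcard fields only get
// ignore_above when a positive value was set explicitly.
func buildMapping(f fieldDef) map[string]interface{} {
	m := map[string]interface{}{"type": f.Type}
	switch {
	case f.IgnoreAbove > 0:
		m["ignore_above"] = f.IgnoreAbove
	case f.Type == "keyword":
		m["ignore_above"] = defaultIgnoreAbove
	}
	return m
}

func main() {
	fields := []fieldDef{
		{Name: "host.name", Type: "keyword"},
		{Name: "process.command_line", Type: "wildcard"},
		{Name: "url.full", Type: "wildcard", IgnoreAbove: 4096},
	}
	for _, f := range fields {
		out, err := json.Marshal(buildMapping(f))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: %s\n", f.Name, out)
	}
}
```

In this sketch the keyword field still receives `ignore_above: 1024`, the plain wildcard field receives no limit, and a wildcard field with an explicit value keeps it.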

Why is it important?

Some important ECS wildcard fields are not being indexed properly. See related issue.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Related issues

Closes #30096

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Mar 3, 2022
@mergify
Contributor

mergify bot commented Mar 3, 2022

This pull request does not have a backport label. Could you fix it @adriansr? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 7./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@mergify mergify bot added the backport-skip Skip notification from the automated backport with mergify label Mar 3, 2022
@adriansr adriansr added backport-v8.1.0 Automated backport with mergify bug libbeat Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team Team:Integrations Label for the Integrations team Team:Security-External Integrations labels Mar 3, 2022
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Mar 3, 2022
@elasticmachine
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@elasticmachine
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@adriansr adriansr added backport-7.17 Automated backport to the 7.17 branch with mergify and removed backport-skip Skip notification from the automated backport with mergify labels Mar 3, 2022
@elasticmachine
Collaborator

elasticmachine commented Mar 3, 2022

💚 Flaky test report

Tests succeeded.


@cmacknz
Member

cmacknz commented Mar 3, 2022

As far as I can tell the last time this ignore_above code was touched was fabdf0b by @ph in 2018. I'm not sure he remembers the context after that long but tagging him anyway just in case.

@ph
Contributor

ph commented Mar 4, 2022

This was indeed a really long time ago. The ignore_above limit was added for keyword fields. I don't remember the exact issues here; @ruflin, maybe you know? We were really anxious about field explosion.

@ph
Contributor

ph commented Mar 4, 2022

I've read about wildcard, since the limit was added previously. I am OK with treating wildcard differently from keyword, but @adriansr, what kind of data are you indexing in that field?

@ebeahan
Member

ebeahan commented Mar 4, 2022

@ph I'll jump in and answer since I originally opened #30096.

Security-focused users have pointed out that specific shell commands, URLs, or encoded payload values can exceed the 1024-character convention. Not having these values indexed can create detection blind spots. Users can raise the ignore_above limit, but if you need to capture an immense value, you eventually hit the Lucene maximum of 32766 bytes.

ECS migrated a small number of fields to use wildcard. Fields more likely to contain very long strings, like URLs or command-line executions, were picked. These migrations made it into Beats, but wildcard strings > 1024 characters are still not indexed since the ignore_above value is set on wildcard fields in the mappings.

Wildcard fields have some other benefits, but that's outside the scope of the focus here. 😄

@adriansr adriansr merged commit 677229f into elastic:main Mar 7, 2022
@adriansr adriansr deleted the wildcard_ignore_above branch March 7, 2022 08:19
mergify bot pushed a commit that referenced this pull request Mar 7, 2022
Closes #30096

(cherry picked from commit 677229f)

# Conflicts:
#	libbeat/template/processor_test.go
mergify bot pushed a commit that referenced this pull request Mar 7, 2022
Closes #30096

(cherry picked from commit 677229f)
adriansr added a commit that referenced this pull request Mar 7, 2022
#30708)

Closes #30096

(cherry picked from commit 677229f)

Co-authored-by: Adrian Serrano <[email protected]>
adriansr added a commit that referenced this pull request Mar 7, 2022
Closes #30096

(cherry picked from commit 677229f)
Labels
backport-7.17 Automated backport to the 7.17 branch with mergify backport-v8.1.0 Automated backport with mergify bug libbeat Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team Team:Integrations Label for the Integrations team
Development

Successfully merging this pull request may close these issues.

Fields using wildcard type should not specify ignore_above param
5 participants