
Filebeat include_lines prior multiline #12562

Closed
jose-caballero opened this issue Jun 16, 2019 · 23 comments
Labels
enhancement, Filebeat, Team:Elastic-Agent, Team:Elastic-Agent-Data-Plane

Comments

@jose-caballero

jose-caballero commented Jun 16, 2019

Describe the enhancement:

Allow FileBeat to process include_lines before executing multiline patterns.

Describe a specific use case for the enhancement or feature:

Here is a real use case.

  1. Same FileBeat running on many hosts (thousands), sending data to a central LogStash host.
  2. Only around 1% of the content in the log files read by FileBeat is relevant.
  3. The lines that are relevant need to be processed by LogStash together.

Under these circumstances, the ideal solution is to let FileBeat first filter the data using include_lines and then merge the results into a single line, so LogStash can process all of it at once. Otherwise, LogStash requires fairly complicated plugin code to distinguish which line comes from which host and avoid mixing them.
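
A minimal sketch of the configuration this scenario implies (the path, patterns, and Logstash host below are placeholders, not taken from the report); today Filebeat assembles the multiline event first and only then applies include_lines, whereas the request is to filter the raw lines first:

    filebeat.inputs:
    - type: log
      paths:
        - /var/log/app/*.log               # placeholder path
      # Requested behaviour: drop irrelevant lines here, BEFORE multiline runs.
      # Current behaviour: this filter only sees the already-assembled multiline event.
      include_lines: ['^RELEVANT']         # placeholder pattern for the ~1% of useful lines
      multiline.pattern: '^RELEVANT'
      multiline.negate: true
      multiline.match: after

    output.logstash:
      hosts: ["logstash.example.com:5044"] # placeholder central LogStash host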

@kklmm

kklmm commented Jul 10, 2019

I second this.
Use case:
A huge log file where the relevant lines start with a timestamp and the non-relevant lines do not, with the relevant lines merged into a single document using multiline.

example:
include_lines: ['^\d{4}-\d{2}-\d{2}']
multiline.pattern: 'Check command Start'
multiline.negate: true
multiline.match: after
multiline.flush_pattern: 'Check command end'

My final document should be 10-15 lines long, but instead it is over 50 lines long, most of it useless data.

@botelastic

botelastic bot commented Jul 8, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added the Stalled and needs_team labels Jul 8, 2020
@jose-caballero
Author

Does this mean the suggestion won't be implemented?

@botelastic botelastic bot removed the Stalled label Jul 8, 2020
@jsoriano jsoriano added the Team:Elastic-Agent label May 10, 2021
@elasticmachine
Collaborator

Pinging @elastic/agent (Team:Agent)

@botelastic botelastic bot removed the needs_team label May 10, 2021
@ph
Contributor

ph commented May 11, 2021

@jose-caballero No, it doesn't mean that; we just close stalled issues after a few months. We haven't prioritized this enhancement yet.

@reddybhavaniprasad

reddybhavaniprasad commented Aug 3, 2021

Are there any alternatives to implement this, or do we just end up with lots of unwanted data in the multiline event?

@belimawr
Contributor

belimawr commented Apr 6, 2022

That is a very interesting issue. The current implementation applies the include_lines and exclude_lines after the multiline (actually, after the processors), so the entry can be filtered once it is ready. What you folks are asking for is to invert this order, which is an interesting use case as well (I'm just not sure how common it is). I'll try to get more input about where it stands in our priorities.

@reddybhavaniprasad a possible workaround is to use the script processor (or another processor) to clean up the event after it has been assembled by the input. It does not look like an ideal solution, but it might be doable.
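
A rough sketch of that workaround, assuming the relevant lines start with a timestamp as in the example above (the regular expression is an assumption you would adapt to your own data); it trims the message field after the input has already assembled the multiline event:

    processors:
      - script:
          lang: javascript
          id: drop_irrelevant_lines
          source: |
            function process(event) {
              var msg = event.Get("message");
              if (msg == null) {
                return;
              }
              // Keep only the lines of the assembled event that start with
              // a timestamp (assumed pattern, adjust to your log format).
              var lines = msg.split("\n");
              var kept = [];
              for (var i = 0; i < lines.length; i++) {
                if (/^\d{4}-\d{2}-\d{2}/.test(lines[i])) {
                  kept.push(lines[i]);
                }
              }
              event.Put("message", kept.join("\n"));
            }

Note this only cleans up the event after it has been assembled; it does not change which lines multiline groups together.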

@belimawr belimawr added the Team:Elastic-Agent-Data-Plane label Apr 6, 2022
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@belimawr belimawr assigned belimawr and unassigned belimawr Apr 6, 2022
@nimarezainia
Contributor

@belimawr @ph I also believe that we should consider resolving this enhancement request by using input processors on the Elastic Agent. Input processors are there when data needs to be enriched at the edge and this request fits into that.

your thoughts?

@belimawr
Contributor

@belimawr @ph I also believe that we should consider resolving this enhancement request by using input processors on the Elastic Agent. Input processors are there when data needs to be enriched at the edge and this request fits into that.

your thoughts?

I'm not sure I follow, @nimarezainia. Do you mean the current implementation of Elastic Agent or V2? If the current Filebeat does not support it, how can the Elastic Agent influence the log processing (which is all done in Filebeat)?

One idea that might work (but we need to check the feasibility) is to have include_lines and exclude_lines as processors, so a user can execute them in whichever order they want.

@ph
Contributor

ph commented Apr 19, 2022

@nimarezainia Yes, I think that would make sense too; we were doing this in Logstash as a filter. If you go down that path, there is a caveat that needs to be handled correctly.

When you move things outside of a plugin (function call), there is a risk that multiple events from multiple files are processed together, so some kind of file identity or source identity needs to exist. That is, processing needs to happen on the same stream of events, not on all events at once.

I believe our processors look like this:

read_source("A.log", "B.log", "C.log") -> processors(a_events, b_events, c_events)
read_source("A.log", "B.log", "C.log") -> a_processors(a_events), b_processors(b_events), c_processors(c_events)

The latter ensures that a stream, i.e. all events from the same file, keeps its meaning, and that include_lines or exclude_lines help you refine the events coming from that file.

@ph
Contributor

ph commented Apr 19, 2022

@nimarezainia The other problem is that you also need a periodic flush in the processors, which we do not have, especially if you want multiline support.

@nimarezainia
Contributor

@ph @belimawr I will move this back to the queue. I am assuming that addressing this in Filebeat will also make it available for the Agent to use. If that is not the case, then work needs to be prioritized for the Agent (please).

@ph
Contributor

ph commented May 19, 2022

Yes, it will become available on agent.

@MakoWish
Contributor

I am also facing this issue. The only difference is that @jose-caballero says he doesn't want to include irrelevant lines; in my case, the irrelevant data I want to exclude pushes me over an apparent line limit and the message gets truncated, so I cannot afford to include those irrelevant lines. If I had the option to process include_lines before the multiline processor, it would resolve my issue.

@botelastic

botelastic bot commented Jul 8, 2023

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Jul 8, 2023
@jose-caballero
Author

Still relevant. Thanks a lot. :)

@botelastic botelastic bot removed the Stalled label Jul 8, 2023
@MakoWish
Contributor

Yes, still want to see this one fixed.

@botelastic

botelastic bot commented Jul 9, 2024

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Jul 9, 2024
@MakoWish
Contributor

MakoWish commented Jul 9, 2024

2

@botelastic botelastic bot removed the Stalled label Jul 9, 2024
@belimawr
Contributor

belimawr commented Aug 6, 2024

For the Filestream input, it is possible to select lines before running the multiline by using the include_message parser.

Here is an example:

  1. Create a file named /tmp/flog.log

    /tmp/flog.log

    EXCLUDED 1
    EXCLUDED 2
    INCLUDED 1
    INCLUDED 2
    MULTI 1
    INCLUDED 3
    INCLUDED 4
    MULTI 2
    EXCLUDED 3
    INCLUDED 5
    EXCLUDED 4
    INCLUDED 6
    MULTI 3
    

  2. Start Filebeat with the following configuration

    filebeat.yml

    filebeat.inputs:
    - type: filestream
      id: my-filestream-id
      enabled: true
      paths:
        - /tmp/flog.log
      parsers:
        - include_message:
            patterns: ["^INCLUDED", "^MULTI"]
        - multiline:
            type: pattern
            pattern: '^MULTI'
            negate: true
            match: before
    
    output.console:
      codec.json:
        pretty: true

  3. Because parsers run in the order they are listed, include_message filters the lines before multiline groups them, so you will get the following events:

INCLUDED 1\nINCLUDED 2\nMULTI 1
INCLUDED 3\nINCLUDED 4\nMULTI 2
INCLUDED 5\nINCLUDED 6\nMULTI 3

@belimawr
Contributor

belimawr commented Aug 6, 2024

I'm closing this issue as I believe the above solves the problem described. Feel free to re-open if that's not the case.

@belimawr belimawr closed this as completed Aug 6, 2024
@belimawr
Contributor

belimawr commented Aug 7, 2024

Adding a bit more context: there was a bug in the configuration validation logic that would fail to instantiate this parser. It was fixed in v8.14.2, and the documentation will reflect that in the next release, v8.15.0.
