
Filebeat include_lines prior multiline #12562

Closed
jose-caballero opened this issue Jun 16, 2019 · 23 comments
Labels
enhancement, Filebeat, Team:Elastic-Agent, Team:Elastic-Agent-Data-Plane

Comments

@jose-caballero

jose-caballero commented Jun 16, 2019

Describe the enhancement:

Allow FileBeat to process include_lines before executing multiline patterns.

Describe a specific use case for the enhancement or feature:

Here is a real use case.

  1. Same FileBeat running on many hosts (thousands), sending data to a central LogStash host.
  2. Only around 1% of the content in the log files read by FileBeat is relevant.
  3. The lines that are relevant need to be processed by LogStash together.

Under these circumstances, the ideal solution is to let FileBeat first filter the data using include_lines and then merge the results into a single line, so LogStash can process all of it at once. Otherwise, LogStash requires fairly complicated plugin code to distinguish which line comes from which host and avoid mixing them.
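
A minimal sketch of the configuration this scenario implies (the path, patterns, and Logstash host below are placeholders, not taken from the report); today Filebeat assembles the multiline event first and only then applies include_lines, whereas the request is to filter the raw lines first:

    filebeat.inputs:
    - type: log
      paths:
        - /var/log/app/*.log               # placeholder path
      # Requested behaviour: drop irrelevant lines here, BEFORE multiline runs.
      # Current behaviour: this filter only sees the already-assembled multiline event.
      include_lines: ['^RELEVANT']         # placeholder pattern for the ~1% of useful lines
      multiline.pattern: '^RELEVANT'
      multiline.negate: true
      multiline.match: after

    output.logstash:
      hosts: ["logstash.example.com:5044"] # placeholder central LogStash host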

@kklmm

kklmm commented Jul 10, 2019

I second this.
Use case:
A huge log file where the relevant lines start with a timestamp and the non-relevant lines do not, with the relevant lines merged into a single document using multiline.

example:
include_lines: ['^\d{4}-\d{2}-\d{2}']
multiline.pattern: 'Check command Start'
multiline.negate: true
multiline.match: after
multiline.flush_pattern: 'Check command end'

My final document should be 10-15 lines long, but instead it is over 50 lines long, most of it useless data.

@botelastic

botelastic bot commented Jul 8, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added the Stalled and needs_team labels Jul 8, 2020
@jose-caballero
Author

Does this mean the suggestion won't be implemented?

@botelastic botelastic bot removed the Stalled label Jul 8, 2020
@jsoriano jsoriano added the Team:Elastic-Agent label May 10, 2021
@elasticmachine
Collaborator

Pinging @elastic/agent (Team:Agent)

@botelastic botelastic bot removed the needs_team label May 10, 2021
@ph
Contributor

ph commented May 11, 2021

@jose-caballero No, it doesn't mean that; we just close stalled issues after a few months. We haven't prioritized this enhancement yet.

@reddybhavaniprasad

reddybhavaniprasad commented Aug 3, 2021

Are there any alternatives to implement this, or do we just end up with lots of unwanted data in the multiline event?

@belimawr
Contributor

belimawr commented Apr 6, 2022

That is a very interesting issue. The current implementation applies the include_lines and exclude_lines after the multiline (actually, after the processors), so the entry can be filtered once it is ready. What you folks are asking for is to invert this order, which is an interesting use case as well (I'm just not sure how common it is). I'll try to get more input about where it stands in our priorities.

@reddybhavaniprasad a possible workaround is to use the script processor (or another processor) to clean up the event after it has been assembled by the input. It does not look like an ideal solution, but it might be doable.
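
A rough sketch of that workaround, assuming the relevant lines start with a timestamp as in the example above (the regular expression is an assumption you would adapt to your own data); it trims the message field after the input has already assembled the multiline event:

    processors:
      - script:
          lang: javascript
          id: drop_irrelevant_lines
          source: |
            function process(event) {
              var msg = event.Get("message");
              if (msg == null) {
                return;
              }
              // Keep only the lines of the assembled event that start with
              // a timestamp (assumed pattern, adjust to your log format).
              var lines = msg.split("\n");
              var kept = [];
              for (var i = 0; i < lines.length; i++) {
                if (/^\d{4}-\d{2}-\d{2}/.test(lines[i])) {
                  kept.push(lines[i]);
                }
              }
              event.Put("message", kept.join("\n"));
            }

Note this only cleans up the event after it has been assembled; it does not change which lines multiline groups together.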

@belimawr belimawr added the Team:Elastic-Agent-Data-Plane label Apr 6, 2022
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@belimawr belimawr assigned belimawr and unassigned belimawr Apr 6, 2022
@nimarezainia
Contributor

@belimawr @ph I also believe that we should consider resolving this enhancement request by using input processors on the Elastic Agent. Input processors are there when data needs to be enriched at the edge and this request fits into that.

your thoughts?

@belimawr
Contributor

@belimawr @ph I also believe that we should consider resolving this enhancement request by using input processors on the Elastic Agent. Input processors are there when data needs to be enriched at the edge and this request fits into that.

your thoughts?

I'm not sure I follow, @nimarezainia. Do you mean the current implementation of Elastic Agent or V2? If the current Filebeat does not support it, how can the Elastic Agent influence the log processing (which is all done in Filebeat)?

One idea that might work (but we need to check the feasibility) is to have include_lines and exclude_lines as processors, so a user can execute them in whichever order they want.

@ph
Contributor

ph commented Apr 19, 2022

@nimarezainia Yes, I think that would make sense too; we were doing this in Logstash as a filter. If you go down that path, there is a caveat that needs to be handled correctly.

When you move things outside of a plugin (function call), there is a risk that multiple events from multiple files are processed together, so some kind of file identity or source identity needs to exist. That is, processing needs to happen on the same stream of events, not on all events at once.

I believe our processors look like this:

read_source("A.log", "B.log", "C.log") -> processors(a_events, b_events, c_events)
read_source("A.log", "B.log", "C.log") -> a_processors(a_events), b_processors(b_events), c_processors(c_events)

The latter ensures that a stream, i.e. all events from the same file, keeps its meaning, and that include_lines or exclude_lines help you refine the events coming from that file.

@ph
Contributor

ph commented Apr 19, 2022

@nimarezainia The other problem is that you also need a periodic flush in the processors, which we do not have, especially if you want multiline support.

@nimarezainia
Contributor

@ph @belimawr I will move this back to the queue. I am assuming that addressing this in Filebeat will also make it available for the Agent to use. If that is not the case, then work needs to be prioritized for the Agent (please).

@ph
Contributor

ph commented May 19, 2022

Yes, it will become available on agent.

@MakoWish
Contributor

I am also facing this issue. The only difference is that @jose-caballero says he doesn't want to include irrelevant lines; in my case, the irrelevant data I want to exclude pushes me over an apparent line limit and the message gets truncated, so I cannot afford to include those irrelevant lines. If I had the option to process include_lines before the multiline processor, it would resolve my issue.

@botelastic

botelastic bot commented Jul 8, 2023

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Jul 8, 2023
@jose-caballero
Author

Still relevant. Thanks a lot. :)

@botelastic botelastic bot removed the Stalled label Jul 8, 2023
@MakoWish
Contributor

Yes, still want to see this one fixed.

@botelastic

botelastic bot commented Jul 9, 2024

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Jul 9, 2024
@MakoWish
Contributor

MakoWish commented Jul 9, 2024

2

@botelastic botelastic bot removed the Stalled label Jul 9, 2024
@belimawr
Contributor

belimawr commented Aug 6, 2024

For the Filestream input, it is possible to select lines before running the multiline by using the include_message parser.

Here is an example:

  1. Create a file named /tmp/flog.log

    /tmp/flog.log

    EXCLUDED 1
    EXCLUDED 2
    INCLUDED 1
    INCLUDED 2
    MULTI 1
    INCLUDED 3
    INCLUDED 4
    MULTI 2
    EXCLUDED 3
    INCLUDED 5
    EXCLUDED 4
    INCLUDED 6
    MULTI 3
    

  2. Start Filebeat with the following configuration

    filebeat.yml

    filebeat.inputs:
    - type: filestream
      id: my-filestream-id
      enabled: true
      paths:
        - /tmp/flog.log
      parsers:
        - include_message:
            patterns: ["^INCLUDED", "^MULTI"]
        - multiline:
            type: pattern
            pattern: '^MULTI'
            negate: true
            match: before
    
    output.console:
      codec.json:
        pretty: true

  3. Because parsers run in the order they are listed, include_message filters the lines before multiline groups them, so you will get the following events:

INCLUDED 1\nINCLUDED 2\nMULTI 1
INCLUDED 3\nINCLUDED 4\nMULTI 2
INCLUDED 5\nINCLUDED 6\nMULTI 3

@belimawr
Contributor

belimawr commented Aug 6, 2024

I'm closing this issue as I believe the above solves the problem described. Feel free to re-open if that's not the case.

@belimawr belimawr closed this as completed Aug 6, 2024
@belimawr
Contributor

belimawr commented Aug 7, 2024

Adding a bit more context: there was a bug in the configuration validation logic that would fail to instantiate this parser. It was fixed in v8.14.2, and the documentation will reflect that in the next release, v8.15.0.
