Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace container input with filestream input in hints' default config #34354

Closed
ChrsMark opened this issue Jan 24, 2023 · 16 comments
Closed

Replace container input with filestream input in hints' default config #34354

ChrsMark opened this issue Jan 24, 2023 · 16 comments
Assignees
Labels
Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team

Comments

@ChrsMark
Copy link
Member

In filebeat's hints based autodiscovery the raw container input is the default configuration that will be used if there is no module specified in the hints. See

.

I think we need to adjust it properly and adopt the usage of filestreams instead, which is now the recommended log input framework.

@ChrsMark ChrsMark added the Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team label Jan 24, 2023
@ChrsMark ChrsMark changed the title Replace container input with filestream input in hints' default confif Replace container input with filestream input in hints' default config Jan 24, 2023
@gizas
Copy link
Contributor

gizas commented Jan 24, 2023

We will need testing with docker and kubernetes containers (just adding it as a note)

cc @gsantoro

@ChrsMark
Copy link
Member Author

ChrsMark commented Jan 24, 2023

We will need testing with docker and kubernetes containers (just adding it as a note)

cc @gsantoro

@gizas do you mean that we need to cover the docker autodiscovery part as well? I only had the kubernetes autodiscovery in mind tbh but if we change the underline default config for logs then it should cover both providers. If you see at

// If there isn't a default template then attempt to use builders
e := d.generateHints(event)
if config := d.builders.GetConfig(e); config != nil {
configs = append(configs, config...)
the Builder that is called is the same (in k8s and docker provider) so the same default config would be returned. In this regard yes we need to verify that both providers work after this change (I wouldnt expect that one of them could not work but just to be sure).

Other than this, if the container parser is defined with auto in the format section then the parser will be capable to handle both cri and docker format. So no big deal here I think (again testing this would be helpful).

@gizas
Copy link
Contributor

gizas commented Jan 24, 2023

do you mean that we need to cover the docker autodiscovery part as well?

Yes indeed. As long as this change affects both (docker and k8s autodiscovery) that is why I would like to see tests for both scenarios, to be on safe side.

And we should update our doc examples for that. It needs to be part of the goals of this story

@gsantoro
Copy link
Contributor

Adding some context about the reason this was raised #13140 (comment)

@gsantoro
Copy link
Contributor

I have already run some quick tests about using filestream as default config with filebeat.autodiscover like the below

    filebeat.autodiscover:
     providers:
       - type: kubernetes
         node: ${NODE_NAME}
         hints.enabled: true
         hints.default_config:
           type: filestream
           paths:
             - /var/log/containers/*${data.kubernetes.container.id}.log
           parsers:
             - container:
               stream: all
               format: auto

and I got the following errors

{"log.level":"error","@timestamp":"2023-01-13T15:26:33.794Z","log.logger":"input","log.origin":{"file.name":"input-logfile/manager.go","file.line":182},"message":"filestream input with ID '' already exists, this will lead to data duplication, please use a different ID","service.name":"filebeat","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2023-01-13T15:26:37.520Z","log.logger":"input","log.origin":{"file.name":"input-logfile/manager.go","file.line":176},"message":"filestream input ID without ID might lead to data duplication, please add an ID and restart Filebeat","service.name":"filebeat","ecs.version":"1.6.0"}

If I understand correctly, @gizas was suggesting yesterday that similarly to how we have an id field at https:/elastic/integrations/blob/main/packages/kubernetes/data_stream/container_logs/agent/stream/stream.yml.hbs maybe we should have something similar in the default config. I haven't tested any of this but it is worth keeping truck of this here.

@ChrsMark
Copy link
Member Author

If I understand correctly, @gizas was suggesting yesterday that similarly to how we have an id field at https:/elastic/integrations/blob/main/packages/kubernetes/data_stream/container_logs/agent/stream/stream.yml.hbs maybe we should have something similar in the default config. I haven't tested any of this but it is worth keeping truck of this here.

Yes that's a known "issue"/"feature". We will need an id here too I guess to fix this.

@MichaelKatsoulis MichaelKatsoulis self-assigned this Feb 7, 2023
@cmacknz
Copy link
Member

cmacknz commented Feb 16, 2023

@gsantoro Yes, that is not a bug it is a misconfiguration. Every filestream input needs a unique ID field. From https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-filestream.html:

Each filestream input must have a unique ID. Omitting or changing the filestream ID may cause data duplication. Without a unique ID, filestream is unable to correctly track the state of files

Agent takes this even further and requires a unique ID for every input in the policy, not just filestream.

If we fix this problem, does it work?

@gsantoro
Copy link
Contributor

Agent takes this even further and requires a unique ID for every input in the policy, not just filestream.

If we fix this problem, does it work?

I have no idea. That's a good question for who is picking up this ticket.

@rdner
Copy link
Member

rdner commented Feb 24, 2023

Please don't forget to put take_over: true on the filestream config, otherwise customers would experience data duplication.

See #34292

@BenB196
Copy link

BenB196 commented Mar 30, 2023

FYI, in case anyone else wants to transition to filestream for auto-discovery, here is a working config:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      hints:
        enabled: true
        default_config:
          type: filestream
          id: kubernetes-container-logs-${data.kubernetes.pod.name}-${data.kubernetes.container.id}
          paths:
            - /var/log/containers/*${data.kubernetes.container.id}.log
          prospector.scanner.symlinks: true: # Optional, but most people probably use symlinks
          parsers:
          - container:
              stream: all
              format: auto

Doesn't include the take_over: true, but can be added if you're using Filebeat 8.7+

@jlind23
Copy link
Collaborator

jlind23 commented Mar 31, 2023

@rdner how much this overlaps / supersedes #34393?

@rdner
Copy link
Member

rdner commented Mar 31, 2023

@jlind23 if we migrate the container input, we don't need this issue. I'd say this is almost a duplicate of #34393.

@rdner
Copy link
Member

rdner commented Apr 3, 2023

I'm closing this issue because it's going to be solved by the container input migration.

@rdner rdner closed this as completed Apr 3, 2023
@asazallesmilner
Copy link

FYI, in case anyone else wants to transition to filestream for auto-discovery, here is a working config:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      hints:
        enabled: true
        default_config:
          type: filestream
          id: kubernetes-container-logs-${data.kubernetes.pod.name}-${data.kubernetes.container.id}
          paths:
            - /var/log/containers/*${data.kubernetes.container.id}.log
          prospector.scanner.symlinks: true: # Optional, but most people probably use symlinks
          parsers:
          - container:
              stream: all
              format: auto

Doesn't include the take_over: true, but can be added if you're using Filebeat 8.7+

Does this make "Hints annotations based autodiscover" function?

@ChrsMark
Copy link
Member Author

Reopening for now based on #34393 (comment) and the open questions we have at #34393 (comment).

@ChrsMark
Copy link
Member Author

ChrsMark commented Jul 3, 2023

Closing in honor of #35984.

@ChrsMark ChrsMark closed this as completed Jul 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team
Projects
None yet
Development

No branches or pull requests

9 participants