[libbeat] Stop publisher properly #40572

Merged on Aug 26, 2024 (15 commits)

@marc-gr (Contributor) commented Aug 21, 2024

Proposed commit message

Until now we did not close the publisher on Stop, to avoid races with the ES output, because we were not aborting its active connections. This PR changes that:

  • ES output now aborts active requests on Close()
  • Beater closes publisher on stop.
  • Agent manager also closes telemetry and the publisher on the Close callback.
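
As an illustration of the first bullet, here is a minimal sketch of one way a client can abort in-flight requests on Close(): every request is created with a context that Close cancels. This is not the code from this PR; `abortableClient`, `newAbortableClient` and `closeAbortsRequest` are hypothetical names, and the test server is only there to hold a request open.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// abortableClient is a hypothetical stand-in for an output connection.
// Every request it sends inherits a context that Close cancels.
type abortableClient struct {
	ctx    context.Context
	cancel context.CancelFunc
	http   *http.Client
}

func newAbortableClient() *abortableClient {
	ctx, cancel := context.WithCancel(context.Background())
	return &abortableClient{ctx: ctx, cancel: cancel, http: &http.Client{}}
}

// Get attaches the client context when the request is created, so an
// in-flight call returns as soon as Close is invoked.
func (c *abortableClient) Get(url string) error {
	req, err := http.NewRequestWithContext(c.ctx, http.MethodGet, url, nil)
	if err != nil {
		return err
	}
	resp, err := c.http.Do(req)
	if err != nil {
		return err
	}
	return resp.Body.Close()
}

// Close aborts active requests; calling it more than once is harmless.
func (c *abortableClient) Close() { c.cancel() }

// closeAbortsRequest sends a request to a server that holds it open,
// closes the client, and reports whether the request ended with
// context.Canceled.
func closeAbortsRequest() bool {
	started := make(chan struct{}, 1)
	release := make(chan struct{})
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		started <- struct{}{}
		select {
		case <-release:
		case <-r.Context().Done():
		}
	}))
	defer srv.Close()    // runs second: waits for the handler to return
	defer close(release) // runs first: unblocks the handler

	c := newAbortableClient()
	errCh := make(chan error, 1)
	go func() { errCh <- c.Get(srv.URL) }()

	// Wait until the request is actually in flight before closing.
	select {
	case <-started:
	case <-time.After(2 * time.Second):
		return false
	}
	c.Close()

	select {
	case err := <-errCh:
		return errors.Is(err, context.Canceled)
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("aborted on Close:", closeAbortsRequest())
}
```

Attaching the context at request creation (rather than swapping it in later) matches the "Add context at request creation to not break tracing" commit title in this PR, though the real change may be structured differently.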

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

  • ES connections are now aborted on stop, so some events might need to be re-sent. In any case, this change puts the ES output on par with how the rest of the outputs work, since, from what I observed, it was the only one leaving active connections unattended.

Related issues

@marc-gr added the bugfix and backport-8.15 (Automated backport to the 8.15 branch with mergify) labels Aug 21, 2024
@botelastic (bot) added the needs_team label (Indicates that the issue/PR needs a Team:* label) Aug 21, 2024
@marc-gr marked this pull request as ready for review August 21, 2024 07:41
@marc-gr requested a review from a team as a code owner August 21, 2024 07:41
@pierrehilbert added the Team:Elastic-Agent-Data-Plane label (Label for the Agent Data Plane team) Aug 21, 2024
@elasticmachine (Collaborator)

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic (bot) removed the needs_team label (Indicates that the issue/PR needs a Team:* label) Aug 21, 2024
@pierrehilbert requested review from faec and removed request for mauri870 August 21, 2024 07:52
@marc-gr marked this pull request as draft August 21, 2024 08:25
@marc-gr marked this pull request as ready for review August 21, 2024 09:28
@marc-gr marked this pull request as draft August 21, 2024 12:26
@marc-gr marked this pull request as ready for review August 22, 2024 09:07
@AndersonQ (Member)

Looks good, but I'll wait for the tests to pass before approving it.

Comment on lines +62 to 64
r, err := http.Get("http://localhost:5066") //nolint:noctx // fine for tests
require.NoError(t, err)
require.Equal(t, http.StatusOK, r.StatusCode, "incorrect status code")

[Blocker] cont.

You could use this instead:

Suggested change
- r, err := http.Get("http://localhost:5066") //nolint:noctx // fine for tests
- require.NoError(t, err)
- require.Equal(t, http.StatusOK, r.StatusCode, "incorrect status code")
+ buff := &bytes.Buffer{}
+ require.Eventually(t, func() bool {
+ 	buff.Reset()
+ 	r, err := http.Get("http://localhost:5066")
+ 	if err != nil {
+ 		_, _ = fmt.Fprintf(buff, "stats endpoint not available: %v", err)
+ 		return false
+ 	}
+ 	defer r.Body.Close() // avoid leaking the connection between retries
+ 	if r.StatusCode != http.StatusOK {
+ 		_, _ = fmt.Fprintf(buff, "stats endpoint: bad HTTP status: %s",
+ 			r.Status)
+ 		return false
+ 	}
+ 	return true
+ }, time.Second, 100*time.Millisecond,
+ 	"stats endpoint never became available: %s", buff)

If you want, you could even remove the WaitForLogs call. Just don't mix WaitForLogs and Eventually, because WaitForLogs uses Eventually underneath.
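
For readers unfamiliar with the polling pattern behind require.Eventually, here is a stdlib-only sketch of the idea: retry a condition at a fixed tick until it holds or a timeout expires. `waitFor`, `readyOnThirdCall` and `neverReady` are hypothetical names, and this is a simplification rather than testify's actual implementation (for one, it checks the condition immediately instead of waiting for the first tick).

```go
package main

import (
	"fmt"
	"time"
)

// waitFor is a hypothetical helper: it calls cond immediately and then once
// per tick, returning true as soon as cond does, or false after timeout.
func waitFor(cond func() bool, timeout, tick time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for {
		if cond() {
			return true
		}
		if time.Now().After(deadline) {
			return false
		}
		time.Sleep(tick)
	}
}

// readyOnThirdCall simulates an endpoint that becomes available on the
// third probe and reports the result plus the number of probes made.
func readyOnThirdCall() (bool, int) {
	calls := 0
	ok := waitFor(func() bool {
		calls++
		return calls >= 3
	}, time.Second, 10*time.Millisecond)
	return ok, calls
}

// neverReady simulates an endpoint that never comes up.
func neverReady() bool {
	return waitFor(func() bool { return false }, 50*time.Millisecond, 10*time.Millisecond)
}

func main() {
	ok, calls := readyOnThirdCall()
	fmt.Println("ready:", ok, "probes:", calls)
	fmt.Println("never ready:", neverReady())
}
```

Unlike a fixed time.Sleep, the wait ends as soon as the condition holds, and the timeout bounds the worst case.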

@@ -57,12 +57,14 @@ output.console:
mockbeat.WriteConfigFile(cfg)
mockbeat.Start()
mockbeat.WaitForLogs("Starting stats endpoint", 60*time.Second)
time.Sleep(time.Second)

[Blocker]
I see why just WaitForLogs("Starting stats endpoint", ...) isn't enough, since the message is logged before the serving goroutine starts:

s.log.Info("Starting stats endpoint")
go func(l net.Listener) {
s.log.Infof("Metrics endpoint listening on: %s (configured: %s)", l.Addr().String(), s.config.Host)
err := http.Serve(l, s.mux)

But we no longer use time.Sleep unless there is no other way, and here there is a better alternative: see the "[Blocker] cont." comment.

@AndersonQ (Member) left a comment

To avoid blocking the PR and having it miss 8.15.1, I'll approve and later open another PR to properly fix those tests. The issue is in libbeat: the mock beat does not stop properly. If it did, your PR would not have impacted those tests.

@marc-gr merged commit 4808269 into elastic:main Aug 26, 2024
123 checks passed
@marc-gr deleted the fix/publisher-not-closed branch August 26, 2024 15:33
mergify bot pushed a commit that referenced this pull request Aug 26, 2024
* Stop publisher properly

* Just call beater.Stop from manager

* Delete duplicated lines

* Make call to stopBeat idempotent

* Add context at request creation to not break tracing

* Remove unused lint

* Add default WaitClose timeout

* Adjust wait on close time

* Add delay to account for the stop of the publisher

* Fix lint issues

* Fix lint issues

* Fix lint

(cherry picked from commit 4808269)
marc-gr added a commit that referenced this pull request Aug 27, 2024
(cherry picked from commit 4808269)

Co-authored-by: Marc Guasch
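
The commit list includes "Make call to stopBeat idempotent". One common way to get that property in Go is sync.Once. The sketch below only illustrates the technique and is not the PR's implementation; `beat`, `publisher` and `stopManyTimes` are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// publisher is a hypothetical resource that must be closed exactly once.
type publisher struct {
	closeCalls int
}

func (p *publisher) Close() { p.closeCalls++ }

// beat is a hypothetical owner of the publisher. Stop may be reached from
// several paths (for example a signal handler and a manager callback).
type beat struct {
	stopOnce sync.Once
	pub      *publisher
}

// Stop closes the publisher the first time it is called and does nothing
// on later calls. sync.Once also makes concurrent callers wait until the
// first call has finished.
func (b *beat) Stop() {
	b.stopOnce.Do(func() {
		b.pub.Close()
	})
}

// stopManyTimes calls Stop from several goroutines and returns how many
// times the publisher was actually closed.
func stopManyTimes() int {
	b := &beat{pub: &publisher{}}
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			b.Stop()
		}()
	}
	wg.Wait()
	return b.pub.closeCalls
}

func main() {
	fmt.Println("publisher closed", stopManyTimes(), "time(s)")
}
```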