Add test for elasticsearch re-connection after network error & allow graceful shutdown #40794

Open
wants to merge 25 commits into main

belimawr
Contributor

@belimawr belimawr commented Sep 12, 2024

Proposed commit message

This commit reworks eslegclient.Connection to accept a context in its Connect method. This allows the caller to cancel any in-flight requests made by the connection by cancelling that context.

The libbeat outputs.Connectable interface (used by outputs.NetworkClient) had to be updated to accept the context, which required refactoring most of the outputs so they also accept a context on connect.

The worker in libbeat/publisher/pipeline/client_worker.go now uses a context instead of a channel for its cancellation,
and this context is also used when creating a connection to Elasticsearch.

An integration test is added to ensure the
ES output can always recover from network errors.

Checklist

  • [x] My code follows the style guidelines of this project
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have made corresponding changes to the documentation
  • [ ] I have made corresponding change to the default configuration files
  • [x] I have added tests that prove my fix is effective or that my feature works
  • [x] I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Disruptive User Impact

It's a bug fix; there is no disruptive user impact.

## Author's Checklist

How to test this PR locally

  1. Build Filebeat
  2. Get it sending data to ES
  3. Disconnect from the network, stop ES, or do anything else that prevents Filebeat from reaching ES
  4. Wait for network error logs
  5. Restart ES/reconnect to the network
  6. Filebeat should recover and start sending data again.

Related issues

## Use cases
## Screenshots
## Logs

@belimawr belimawr added the skip-ci Skip the build in the CI but linting label Sep 12, 2024
@belimawr belimawr self-assigned this Sep 12, 2024
@belimawr belimawr requested review from a team as code owners September 12, 2024 17:07
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Sep 12, 2024
Contributor

mergify bot commented Sep 12, 2024

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b fix-es-connection-issue upstream/fix-es-connection-issue
git merge upstream/main
git push upstream fix-es-connection-issue

Contributor

mergify bot commented Sep 12, 2024

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR, @belimawr? 🙏
To do so, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fix up this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-8./d is the label to automatically backport to the 8./d branch. /d is the digit

Contributor

mergify bot commented Sep 12, 2024

backport-8.x has been added to help with the transition to the new branch 8.x.
If you don't need it please use backport-skip label and remove the backport-8.x label.

@mergify mergify bot added the backport-8.x Automated backport to the 8.x branch with mergify label Sep 12, 2024
@belimawr belimawr added the Team:Elastic-Agent-Data-Plane Label for the Agent Data Plane team label Sep 12, 2024
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Sep 12, 2024
@belimawr belimawr added needs_team Indicates that the issue/PR needs a Team:* label and removed skip-ci Skip the build in the CI but linting labels Sep 12, 2024
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Sep 12, 2024
@belimawr belimawr added the backport-8.15 Automated backport to the 8.15 branch with mergify label Sep 12, 2024
Member

@AndersonQ AndersonQ left a comment


LGTM, but I have a question. I'll approve once it's answered

Review thread on libbeat/tests/integration/elasticsearch_test.go (outdated, resolved)
When the Elasticsearch client fails to publish events, it ends up
calling `Close` on the connection (which is reused). To cancel the
in-flight requests, the context is cancelled and a new one is created
to be used in future requests.

The callback that checks the version holds a reference to the connection
via a closure. The Elasticsearch client now holds a pointer to that
connection, so whenever Close is called, the callback can create a
request with the new, not cancelled, context.

An integration test is added to ensure the
ES output can always recover from network errors.

This commit moves the creation of the request context to the Connect
method.

There are some cases where the Connection will be used without calling
Connect, so we initialise reqsContext and cancelReqs in the
NewConnection function to avoid panics.

Connection.Connect now accepts a context to control the life cycle of
its requests.

Add a context to outputs.Connectable.Connect to correctly manage the
life cycle of the connection and its requests.
@pierrehilbert pierrehilbert added the backport-8.16 Automated backport with mergify label Oct 18, 2024
Labels
  • backport-8.x (Automated backport to the 8.x branch with mergify)
  • backport-8.15 (Automated backport to the 8.15 branch with mergify)
  • backport-8.16 (Automated backport with mergify)
  • Team:Elastic-Agent-Data-Plane (Label for the Agent Data Plane team)
  • Team:obs-ds-hosted-services (Label for the Observability Hosted Services team)
  • Team:Security-Linux Platform (Linux Platform Team in Security Solution)
  • Team:Security-Windows Platform (Windows Platform Team in Security Solution)
Projects
None yet
9 participants