Streaming logs over TLS can cause an error for the receiver upon close #120

Open
chrisberkhout opened this issue Oct 4, 2024 · 1 comment
Labels: bug, Team:Security-Service Integrations


chrisberkhout commented Oct 4, 2024

The stream log command writes its data and then closes the connection, but sometimes the receiver does not receive all of the data.

This seems to be the cause of the CI failures in elastic/integrations#11075, elastic/integrations#11224 (this one for both TLS and TCP), and elastic/integrations#10620.

Reproduce the bug

It can be reproduced as follows:

In one terminal, run this:

while true; do
  echo Running the server...
  echo
  openssl s_server -accept 4433 -naccept 1 \
    -cert ~/.elastic-package/profiles/default/certs/elastic-agent/cert.pem \
    -key ~/.elastic-package/profiles/default/certs/elastic-agent/key.pem \
    2>&1 | pv -L 100k | tee output.log
  echo
  if grep -q "ERROR" output.log; then
    echo "Error detected. Stopping."
    break
  fi
done

In another terminal, from the root of the integrations repository, repeatedly run the following, until the loop in the first terminal stops with an error:

stream log --delay=1s --addr localhost:4433 -p=tls --insecure packages/cyberarkpas/_dev/deploy/docker/sample_logs/audit/*.log

On my system this triggers an error within a few runs. If it does not on yours, try lowering the rate limit set by pv (-L 100k in the server loop) to apply additional backpressure on the server.

When the error occurs, the stream log output shows that all files were sent, but the server's output.log shows that less data was received and ends with an ERROR.

Possible fixes

Closing a connection will usually ensure that all written data is sent. For TLS connections, after writing it may be necessary to keep the read side open for longer in order for the sent data to be accepted without error.

The more frequently observed failures and the reproduction were for TLS, but it may not be a TLS-only issue.
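The idea of keeping the read side open after writing can be sketched in Go roughly as follows. This is only an illustration, not the tool's actual code: closeWriter, sendAndClose and roundTrip are hypothetical names, and the 5-second timeout is an arbitrary choice. The sketch relies on the fact that both *net.TCPConn and *tls.Conn provide a CloseWrite method, and it demonstrates the pattern over plain TCP against a throwaway local listener.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"time"
)

// closeWriter is a hypothetical helper interface; both *net.TCPConn and
// *tls.Conn provide these methods.
type closeWriter interface {
	net.Conn
	CloseWrite() error
}

// sendAndClose writes data, shuts down only the write side, and then keeps
// reading until the peer closes its end (or the timeout expires) before
// releasing the socket.
func sendAndClose(conn closeWriter, data []byte, timeout time.Duration) error {
	defer conn.Close()
	if _, err := conn.Write(data); err != nil {
		return err
	}
	if err := conn.CloseWrite(); err != nil {
		return err
	}
	if err := conn.SetReadDeadline(time.Now().Add(timeout)); err != nil {
		return err
	}
	// io.Copy returns nil once the peer has closed its side (EOF).
	_, err := io.Copy(io.Discard, conn)
	return err
}

// roundTrip sends payload to a throwaway local listener and reports how many
// bytes the receiver saw.
func roundTrip(payload []byte) (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()

	type result struct {
		n   int
		err error
	}
	done := make(chan result, 1)
	go func() {
		c, err := ln.Accept()
		if err != nil {
			done <- result{0, err}
			return
		}
		defer c.Close()
		b, err := io.ReadAll(c) // reads until the sender half-closes
		done <- result{len(b), err}
	}()

	c, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return 0, err
	}
	if err := sendAndClose(c.(*net.TCPConn), payload, 5*time.Second); err != nil {
		return 0, err
	}
	r := <-done
	return r.n, r.err
}

func main() {
	n, err := roundTrip(make([]byte, 1<<20))
	fmt.Println(n, err)
}
```

With this pattern the sender only releases the socket after the receiver has closed its own side, so a successful return suggests the receiver read up to the end of the stream. Whether a given TLS server (such as openssl s_server) closes promptly after the half-close is an assumption that would need checking.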

chrisberkhout (Author) commented

It does seem to be a general TCP issue.

This article discusses the issue specifically for the case of wanting to just send some data and close:
https://blog.netherlabs.nl/articles/2009/01/18/the-ultimate-so_linger-page-or-why-is-my-tcp-not-reliable

In Go, the write side can be shut down with TCPConn.CloseWrite, and it may be worth trying other settings for TCPConn.SetLinger.
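For experimenting with linger settings, a minimal Go sketch might look like the following. The helper names dialWithLinger and lingerDemo, and the 5-second value, are assumptions made for illustration; only net.Dial, net.Listen and TCPConn.SetLinger are real standard-library calls. The comments summarise the behaviour described in the Go documentation for SetLinger.

```go
package main

import (
	"fmt"
	"net"
)

// dialWithLinger is a hypothetical helper: it dials addr over TCP and applies
// the given SO_LINGER setting before returning the connection.
//
//	sec < 0:  default; the OS finishes sending pending data in the background
//	sec == 0: Close discards any unsent or unacknowledged data
//	sec > 0:  data is sent in the background as with sec < 0; on some systems
//	          Close may block, and unsent data may be discarded after sec seconds
func dialWithLinger(addr string, sec int) (*net.TCPConn, error) {
	c, err := net.Dial("tcp", addr)
	if err != nil {
		return nil, err
	}
	tc, ok := c.(*net.TCPConn)
	if !ok {
		c.Close()
		return nil, fmt.Errorf("not a TCP connection: %T", c)
	}
	if err := tc.SetLinger(sec); err != nil {
		tc.Close()
		return nil, err
	}
	return tc, nil
}

// lingerDemo dials a throwaway local listener with a 5-second linger and
// closes the connection again.
func lingerDemo() error {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return err
	}
	defer ln.Close()
	c, err := dialWithLinger(ln.Addr().String(), 5)
	if err != nil {
		return err
	}
	return c.Close()
}

func main() {
	fmt.Println(lingerDemo())
}
```

A linger setting only affects what Close does with data still queued locally; by itself it does not confirm that the receiving application has read everything.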

narph added the Team:Security-Service Integrations label Oct 14, 2024