[Auditbeat] Avoid having Linux wait on clearing a backlog #7157
Comments
https://bugzilla.redhat.com/show_bug.cgi?id=1437426 deep sigh - not supported even on CentOS 7 :(
@dilchenko Even the latest CentOS version 7.5 doesn't have this feature.
On Ubuntu 18.04 LTS the default value of
I’ve hit exactly this issue in a production environment today. Backpressure in Logstash caused Auditbeat to stop reading its buffer, which made the kernel on multiple machines grind to a halt. Because Auditbeat was also running on the Logstash receiver box, it caused a cascading failure: our Logstash box became unresponsive as well. Manual intervention was required to get the Logstash box up and running again, after which everything recovered.
Also:
The reason I researched the RHEL7 status is that this is the latest version of a major distribution, and it does not have support for To put it differently: we are offering a product that is guaranteed to break production systems for any customer not running a 3.14+ kernel with that setting tuned. I just want to re-emphasize the importance of this issue. Some time soon, I will get to testing this on our systems with higher limits for rate/backlog. But we won't be able to roll
RHEL/CentOS 6 is not EOL until November 2020, so we'll be stuck with kernel 2.6 for a while longer as well. Audit netlink multicast has been supported since kernel 3.16, so that's probably not in RHEL7 either.
This adds a new configuration option, "backpressure_strategy", to the auditd module in Auditbeat. It selects how Auditbeat keeps backpressure from propagating into the kernel and affecting audited processes. The possible values are:
- "kernel": Auditbeat will set the backlog_wait_time in the kernel's audit framework to 0. This causes events to be discarded in the kernel if the audit backlog queue fills to capacity. Requires a 3.14 kernel or newer.
- "userspace": Auditbeat will drop events when there is backpressure from the publishing pipeline.
- "both": The "kernel" and "userspace" strategies at the same time.
- "auto" (default): The "kernel" strategy will be used, if supported. Otherwise it will fall back to "userspace".
- "none": No backpressure mitigation measures will be enabled.
Closes elastic#7157
This adds a new configuration option, "backpressure_strategy", to the auditd module in Auditbeat. It selects how Auditbeat keeps backpressure from propagating into the kernel and affecting audited processes. The possible values are:
- "kernel": Auditbeat will set the backlog_wait_time in the kernel's audit framework to 0. This causes events to be discarded in the kernel if the audit backlog queue fills to capacity. Requires a 3.14 kernel or newer.
- "userspace": Auditbeat will drop events when there is backpressure from the publishing pipeline. If no rate_limit is set, it will set a rate limit of 5000. Users should test their setup and adjust the rate_limit option accordingly.
- "both": The "kernel" and "userspace" strategies at the same time.
- "auto" (default): The "kernel" strategy will be used, if supported. Otherwise it will fall back to "userspace".
- "none": No backpressure mitigation measures will be enabled.
Closes #7157
Other Changes:
* Increase the default `reassembler.queue_size` to 8192.
* Change the reassembler lost metric to count sequence gaps. It was renamed to `auditd.reassembler_seq_gaps`.
* Add a received metric that counts the total number of received messages. It's called `auditd.received_msgs`.
* The auditd module ignores its own syscall invocations by adding a kernel audit rule that ignores events from its own PID. This rule is added any time the user has defined audit rules.
* Make the number of stream buffer consumers configurable. Originally there was only one consumer for the auditd stream buffer. This patch makes the number of consumers configurable through the new `stream_buffer_consumers` setting in Auditd. By default it will use as many consumers as GOMAXPROCS, with a maximum of 4.
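The value semantics above can be sketched in Go as follows. This is only an illustration, not the code from the patch: `Strategy`, `resolveStrategy`, and `defaultConsumers` are hypothetical names, treating an empty value as "auto" is an assumption, and how the real module detects kernel support (or reacts when "kernel" is requested on an unsupported kernel) is not described here, so `kernelSupported` is simply passed in.

```go
package main

import (
	"fmt"
	"runtime"
)

// Strategy records which mitigations end up enabled (hypothetical type).
type Strategy struct {
	Kernel    bool // set backlog_wait_time to 0 in the kernel audit framework
	Userspace bool // drop events in Auditbeat when the pipeline pushes back
}

// resolveStrategy maps a backpressure_strategy value to the mitigations it
// enables. kernelSupported stands in for whatever support check is used.
func resolveStrategy(name string, kernelSupported bool) (Strategy, error) {
	switch name {
	case "kernel":
		return Strategy{Kernel: true}, nil
	case "userspace":
		return Strategy{Userspace: true}, nil
	case "both":
		return Strategy{Kernel: true, Userspace: true}, nil
	case "auto", "": // assumption: an unset option behaves like the default
		if kernelSupported {
			return Strategy{Kernel: true}, nil
		}
		return Strategy{Userspace: true}, nil
	case "none":
		return Strategy{}, nil
	default:
		return Strategy{}, fmt.Errorf("unknown backpressure_strategy %q", name)
	}
}

// defaultConsumers follows the description "as many consumers as GOMAXPROCS,
// with a maximum of 4".
func defaultConsumers(gomaxprocs int) int {
	if gomaxprocs < 1 {
		return 1
	}
	if gomaxprocs > 4 {
		return 4
	}
	return gomaxprocs
}

func main() {
	s, _ := resolveStrategy("auto", false)
	fmt.Printf("auto without kernel support: %+v\n", s)
	fmt.Println("default consumers:", defaultConsumers(runtime.GOMAXPROCS(0)))
}
```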
Can we get this cherry picked to 6.x as well?
Added documentation for the `backpressure_strategy` option on the auditd module.
Noting for posterity: This was released with auditbeat 6.4.0.
Just curious. Although this was closed, I wonder what the best approach is for systems that don't support audit_backlog_wait_time (basically all RHEL/CentOS 7 versions). Is dropping events in userspace the recommended approach? Also, considering how widely used RHEL/CentOS 7 are, wouldn't it be preferable if some in-memory or on-disk temporary cache was added as an option to handle this scenario?
Back-pressure from Auditbeat is propagated to the kernel via the unicast netlink socket buffer and can cause delays in the kernel. The propagation of back-pressure was implemented with the assumption that the kernel drops messages when the backlog queue is full. This assumption is true, but it has an unwanted side-effect. When the backlog queue is full, the kernel will wait for the queue to "drain a little" before providing a buffer to the waiting auditable syscall. If the queue doesn't free up the kernel will log a warning and continue with the syscall.
The waiting period is defined by the `audit_backlog_wait_time` variable. Prior to v3.14 the variable was not configurable. Then in v3.14 a commit was made to make this configurable through the audit system.
We need to make two changes for Auditbeat:
For confirmed bugs, please report:
- `auditd` module in unicast mode.
- `audit_backlog_limit` to be exceeded. (Messages will start showing up in the kernel log with "audit: backlog limit exceeded". The message is rate limited.)
- `audit_backlog_wait_time` period.
Workarounds:
- `auditd` package installed then you can manually set the `audit_backlog_wait_time` to 0 with `sudo auditctl --backlog_wait_time 0`.