Introduce spooling to disk #6581
Merged
Commits on Apr 5, 2018
Add disk based spool to libbeat
This change implements the `queue.Queue` interface, adding spool-to-disk functionality to all Beats. The queue interface requires all queues to provide 'connections' for producers and consumers, and demands that all ACKs be executed asynchronously. Once an event is ACKed, it must not be presented to a consumer (the output's worker queue) again. The new queue type is marked as a beta feature.

In the spool, events are stored in a dynamic ring buffer within one single file. The maximum file size must be configured at startup (default 100MiB). The file layer uses the available file space for event data plus additional metadata.

Events are first written into a write buffer, which is flushed once it is full or when the flush timeout triggers. No limit on event sizes is enforced: an event bigger than the write buffer is still accepted by the spool, but triggers a flush right away. A successful flush requires two fsync operations (1. for the data, 2. to update the file header). The spool blocks all producers once it cannot allocate any more space within the spool file. Writing never grows the file past the configured maximum file size.

All producers are handled by the `inBroker`. There is no direct communication between producers and consumers; all required signaling and synchronisation is provided by the file layer. Once the write buffer is flushed, a signal is returned to each individual producer, notifying it of its events being published. Filebeat and Winlogbeat use this signal to update the registry file.

Consumers are handled by the `outBroker`. Consumers request a batch of events; the broker reads up to N messages from the file and forwards them. The reading process is read-only and does not update the on-disk read pointers (only in-memory read pointers are updated). The file is memory-mapped for reading, which significantly increases the process's reported memory usage. The outputs ACK batches asynchronously.
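The write-buffer policy described above can be illustrated with a minimal sketch. This is not libbeat's actual implementation: `spoolWriter`, `publish`, and `flush` are hypothetical names, the flush timeout is omitted, and the "flush" here only drops the buffer instead of writing pages and fsyncing twice.

```go
package main

import "fmt"

// spoolWriter is a hypothetical sketch of the write-buffer policy: events
// accumulate in an in-memory buffer that is flushed when full, and an event
// larger than the buffer is still accepted but forces an immediate flush.
type spoolWriter struct {
	buf     []byte
	bufSize int
	flushes int // number of flushes performed (for illustration only)
}

// publish appends one encoded event, flushing as required.
func (w *spoolWriter) publish(event []byte) {
	if len(w.buf)+len(event) > w.bufSize {
		w.flush() // pending data would overflow: flush it first
	}
	w.buf = append(w.buf, event...)
	if len(w.buf) >= w.bufSize {
		w.flush() // buffer full (or oversized event): flush right away
	}
}

func (w *spoolWriter) flush() {
	if len(w.buf) == 0 {
		return
	}
	// A real implementation would write the buffer into the file's data
	// area and fsync twice (data, then file header). Here we just drop it.
	w.buf = w.buf[:0]
	w.flushes++
}

func main() {
	w := &spoolWriter{bufSize: 8}
	w.publish([]byte("abc"))              // small event: buffered, no flush
	w.publish([]byte("0123456789ABCDEF")) // bigger than the buffer: accepted, flushed immediately
	fmt.Println(w.flushes, len(w.buf))    // prints: 2 0
}
```

The oversized event triggers two flushes: one to drain the pending small event, and one for the oversized event itself, matching the "will still be accepted, but will trigger a flush right away" behavior.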
The ACK signals are processed by the broker's `ackLoop`. Due to load balancing or retries, ACKs can be received out of order. The broker guarantees that ACKs are sorted into the same order in which events were read from the queue. Once a contiguous run of events (starting from the last on-disk read pointer) has been ACKed, the on-disk read pointer is updated and the space occupied by the ACKed events is freed.

As free space is tracked by the file, the file metadata must be updated. If no more space for metadata updates is available, there is a chance of the file growing a few pages past the maximum size. Growing is required to guarantee progress (otherwise the spool might be stalled forever). In case the file did grow on ACK, the file layer will try to free the space with later write/ACK operations, potentially truncating the file again.

The file layer is provided by [go-txfile](https://github.com/elastic/go-txfile). The file is split into pages of equal size, and the layer provides transactional access to pages only. All writes (ACKs, flushing the write buffer) are handled concurrently, using write transactions. The reader is isolated from concurrent writes/reads, using a read transaction. The last transaction state is never overwritten in the file: if a Beat crashes during a write transaction, the most recently committed transaction is still available, so the Beat can continue from the last known state upon restart. No additional repair phase is required.

Known limitations (to be addressed in future PRs):
- The maximum file size cannot be changed once the file is generated:
  - Add support to grow the maximum size.
  - Add support to shrink the maximum size. Shrinking could be made dynamic, trying to return space once pages at the end of the file are freed.
  - Add a command to transfer the queue into a new queue file, in case fragmentation prevents dynamic shrinking or the user doesn't want to wait.
- Monitoring metrics do not report already-available events upon Beat restart (requires some more changes to the queue interface itself).
- No file-related metrics are available yet.
- If the file is too small and the write buffer too big, the queue can become stalled. A potential solution requires all of:
  - Limit the maximum event size.
  - Generously preallocate the meta area (reserve pages for metadata only on file creation).
  - Ensure the usable data area is always > 2× the write buffer -> partial write buffer flush?
  - A startup check validating the combination of max_size, data vs. meta area size, event size, and write buffer size.
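The `ackLoop` ordering guarantee can be sketched as follows. This is an illustrative toy, not libbeat's actual code: `advanceReadPointer` and the `acked` set are assumed names, standing in for the broker's internal bookkeeping. The point is that ACKs arriving out of order never move the on-disk read pointer past an unACKed event; the pointer only advances over a contiguous ACKed run starting at its current position.

```go
package main

import "fmt"

// advanceReadPointer is a hypothetical sketch of the contiguous-ACK rule:
// the on-disk read pointer only moves over a contiguous run of ACKed event
// IDs starting at the current pointer. Space behind the returned pointer
// can then be freed.
func advanceReadPointer(readPtr uint64, acked map[uint64]bool) uint64 {
	for acked[readPtr] {
		delete(acked, readPtr) // this event's space can now be freed
		readPtr++
	}
	return readPtr
}

func main() {
	acked := map[uint64]bool{}
	ptr := uint64(0)

	// Events 1 and 2 are ACKed before event 0 (e.g. due to retries or
	// load balancing across outputs):
	acked[1], acked[2] = true, true
	ptr = advanceReadPointer(ptr, acked)
	fmt.Println(ptr) // prints 0: event 0 is still outstanding

	// Once event 0 is ACKed, the pointer jumps over the whole run:
	acked[0] = true
	ptr = advanceReadPointer(ptr, acked)
	fmt.Println(ptr) // prints 3
}
```

Because the pointer (and the freed space) only ever reflects a fully ACKed prefix, a crash before the metadata update can at worst cause already-delivered events to be re-read, never the loss of an undelivered one.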
urso committed Apr 5, 2018 · commit 1d3eac6
Do not init pipeline event semaphore with values <= 0
urso committed Apr 5, 2018 · commit 3ed5ab5
Add spool to pipeline stress tests
urso committed Apr 5, 2018 · commit bad5e97
- Improve stability by increasing test duration, timeouts, and the watchdog timer.
- Add test start/stop messages when run with `-v`. This helps with Travis timing out tests after 10 minutes without any output.
- Add all active goroutine stack traces to errors.
urso committed Apr 5, 2018 · commit 55e6483