Performance-Tuning enhancement proposal.

openshift · Feb 21, 2024 · e8e62c1 · e8e62c1
1 parent 9469001
commit e8e62c1
Showing 1 changed file with 274 additions and 0 deletions.
diff --git a/enhancements/cluster-logging/performance-tuning.md b/enhancements/cluster-logging/performance-tuning.md
@@ -0,0 +1,274 @@
+---
+title: performance-tuning
+authors:
+ - "@alanconway"
+reviewers:
+ - "@jcantrill"
+approvers:
+ - "@jcantrill"
+api-approvers:
+ - "@jcantrill"
+creation-date: 2023-12-15
+last-updated: 2023-12-15
+tracking-link:
+ - https://issues.redhat.com/browse/OBSDA-549
+see-also:
+replaces:
+superseded-by:
+---
+
+# Performance tuning
+
+## Summary
+
+A _performance tuning_ API to control performance, reliability and special protocol features of an output,
+without exposing the complexity of the underlying collector configuration.
+
+**Note**
+- Only vector is be supported initial, there are no current plans to back-port to fluentd.
+- Existing `output[].limits` rate limiting feature is separate from this proposal. The implementations may interact.
+
+## Motivation
+
+Performance and reliability tuning in the underlying collector configuration is complex and error prone.
+We want to expose a sufficient set of controls for realistic use-cases,
+but we don't want to expose the full configuration surface of Vector.
+
+### User Stories
+
+#### As a cluster logging administrator I want to minimize log loss.
+
+``` yaml
+outputs:
+ new - name: minimize-log-loss
+ tuning:
+ delivery: AtLeastOnce
+```
+
+#### As a cluster logging administrator I want to maximize throughput.
+
+``` yaml
+outputs:
+ - name: maximize-throughput
+ tuning:
+ delivery: AtMostOnce
+```
+
+#### As a cluster logging administrator I want to tweak features of a specific output.
+
+``` yaml
+outputs:
+ - name: detailed-tweaks
+ tuning:
+ compression: "zlib" # Enable protocol-specific compression
+ reconnect:
+ maxDelay: 1s # Wait max of 1 second between reconnect attempts
+ batch:
+ maxBytes: 50K # Send max 50K of data in each write.
+```
+
+### Goals
+
+- Accommodate users with different performance and reliability trade-offs.
+- Allow users to tune output parameters that we cannot optimize automatically.
+- Expose a simple, general purpose, collector-neutral tuning API
+- Use vector's end-to-end acknowledgements for more efficient reliability.
+
+### Non-Goals
+
+- Not a full end-to-end delivery guarantee, we are limited by reliability of source and sink.
+- Not exposing underlying vector configuration details.
+- No rate limiting - already provided by `outputs[].limit` field.
+- No plans to support fluentd.
+
+## Proposal
+
+### Workflow Description
+
+**logging administrator** is a human user responsible for setting output tuning parameters as needed.
+The logging administrator needs to tune log collection performance and reliability to suit local requirements.
+
+### API Extensions
+
+New field `ClusterLogForwarder.output[].tuning` with the following sub-fields:
+
+#### delivery: AtMostOnce|AtLeastOnce|AtLeastOncePersistent
+
+##### `AtMostOnce`
+
+Logs may be lost if the collector restarts or other faults occur.
+Logs will not be duplicated _by the collector_.
+They may be duplicated in protocol exchanges with the source or sink.
+This mode does no persistent storage and no acknowledgement book-keeping, so has the lowest CPU and disk use.
+
+Use `AtMostOnce` when:
+- It is acceptable to lose some logs due to restarts or faults (network outage, remote store failures)
+- It is important to minimize memory, CPU, disk and network costs.
+- The logging system is expected to run near capacity and is likely to be overloaded.
+
+##### `AtLeastOnce`
+
+Logs _read by the collector_ will not be lost _by the collector_ due to restarts or faults.
+Logs may still be lost
+- Before the collector reads them: the collector cannot keep up with incoming log rate, e.g. log file rotation rate.
+- After the collector sends them: if the output protocol is unreliable, or the target store can drop logs.
+- If rate limiting is enabled: logs may be dropped by the collector to enforce rate limits.
+
+Use `AtLeastOnce` when:
+- It is important to avoid log loss due to restarts or failures.
+- The collector is properly resourced (memory, CPU, disk) for the maximum log throughput at expected peak loads.
+- Log rates are not expected to exceed the collection rate in normal operation.
+
+`AtLeastOnce` uses end-to-end acknowledgements without persistent buffering if possible.
+Acknowledgements give similar reliability to persistent buffering but are more efficient.
+A persistent buffer is used if acknowledgements are not available.
+
+##### `AtLeastOncePersistent`
+
+Like `AtLeastOnce`, but forces the use of a persistent buffer, possibly in addition to acknowledgements.
+Using buffer and acknowledgements together gives better reliability than buffering alone.
+
+Use `AtLeastOncePersistent` when:
+- You have a special situation where end-to-end acknowledgements alone do not work.
+- You want to force the use of an extra-large buffer to work around large overload spikes.
+
+**Note**: Very large buffers _cannot fix long-term overload_, they can only work-around temporary spikes.
+The _long-term_ average throughput _must_ be within the collectors capacity, otherwise the system cannot "catch up".
+Once the buffer fills, logs will be lost as if the buffer wasn't there,
+and logs that are delivered will have high latency from waiting around in the buffer.
+
+**Note**: The log collector _cannot_ guarantee fully reliable end-to-end delivery.
+It has no control over the reliability or throughput of sources and sinks.
+`AtMostOnce` takes advantage of reliability features at the source and sink, but the end-to-end result is only as good as the weakest link.
+
+#### compression: string
+
+Enable compression if supported
+- Value indicates compression type (e.g. "zlib", "gzip"), valid values depend on the output type.
+- Error if output does not support compression or does not recognize the value.
+
+#### batch: object
+
+A "batch" is the content of a single "write" or "send" operation.
+Defaults are determined by the underlying sink.
+
+- `maxRecords`: Max number of records to include in a batch.
+- `maxBytes`: Max size in bytes of a batch
+
+#### reconnect: object
+
+Controls connect and reconnect attempts.
+Defaults are determined by the underlying sink.
+
+- `minDelay`: minimum delay between attempts.
+- `maxDelay`: maximum delay between attempts.
+
+`minDelay` for the first attempt, delay increases up to `maxDelay` and then repeats.
+Back-off algorithm is determined by the underlying sink.
+
+### Implementation Details
+
+#### Causes of log loss
+
+1. *Overload*: Logs are produced faster than they can be processed.
+2. *Faults*: Collector restarts, network errors, remote store problems etc.
+
+Log loss can be avoided for _temporary_ faults or overloads.
+Persistent buffers and/or acknowledgements can store or re-send lost in-memory data.
+
+_Sustained overload_ lasting long enough to exceed buffering capacity _will_ cause data loss.
+The only remedy is to ensure that the log collector and store can keep up with the rate of log production.
+
+#### Acknowledgements
+
+[End-to-end acknowledgement](https://vector.dev/docs/about/under-the-hood/architecture/end-to-end-acknowledgements) 
+means that source acknowledgements are _delayed_ until all relevant sinks have received the data.
+After restart, the source can re-send data that did not reach all sinks - the source acts as a persistent buffer.
+
+This is more efficient than duplicating the data again in a vector disk buffer,
+but only works for sources that support acknowledgement and "at-least-once" reliable delivery.
+
+Examples of acknowledgement sources:
+- Kafka: Kafka protocol has acknowledgements and at-least-once delivery.
+- HTTP: HTTP response can be used to implement at-least-once.
+ Not all HTTP clients do this, but REST clients for data streaming usually do.
+- **File**: Vector's persistent "position" file can be used like an "acknowledgement" for file sources.
+ Reading from the persisted position after restart is equivalent to at-least-once delivery.
+
+#### Delivery policy implementation
+
+`AtLeastOnce` always enables end-to-end persistence on sources and sinks that allow it.
+If a source does not allow it, then `AtLeastOnce` implements a persistent buffer using the
+same default size as Vector's default in-memory buffer.
+
+``` pseudo-code
+for each `AtLeastOnce` output:
+ set sink.acknowledgement=true for attached sink(s).
+ for each source, of each input, of each pipeline to the output:
+ if source can participate in acknowledgement:
+ set source.acknowledgement=true
+ else
+ set output.buffer=disk
+```
+
+`AtLeastOncePersistent` always enables disk buffering on the output.
+It _also_ enables end-to-end persistence on sources and sinks that allow it.
+
+Buffering and acknowledgement is more reliable than buffering alone.
+Without acknowledgement records are dropped from the buffer as soon as they are sent,
+with acknowledgement they are held until the remote acknowledges that the are safely stored.
+
+**Note**: Rate limits set by `outputs[].limit` re still enforced with `AtLeastOnce*`, even though this
+means deliberately dropping log records from the collector.
+Review the existing limit code when implementing delivery policy so that the two work together properly.
+
+### Risks and Mitigations
+
+- Added complexity to forwarder.
+- Support cost of customers abusing or misunderstanding the parameters.
+- Increased customer demand for help "sizing" the logging stack: setting resources and predicting performance.
+
+No new security risks are expected.
+
+### Drawbacks
+None.
+## Design Details
+
+### Open Questions
+None.
+
+### Test Plan
+None yet.
+
+May need improved tests to measure throughput and loss under sustained load.
+
+### Graduation Criteria
+None.
+
+### Upgrade / Downgrade Strategy
+
+Default configuration is backwards compatible.
+
+### Version Skew Strategy
+None.
+
+### Operational Aspects of API Extensions
+
+Backwards compatible extension to `ClusterLogForwarder` CR.
+
+#### Failure Modes
+
+Should make existing failure modes more predictable and reliable.
+
+#### Support Procedures
+
+None.
+
+
+## Implementation History
+
+None.
+
+## Alternatives
+
+Expose detailed vector configuration: fails our overall mission of simplified configuration.