Allow the aggregator to consistently accept past data #958
The discovery of #956 came from replaying a huge swathe of past data into our aggregator instance and finding the result full of holes. I'd assumed this was due to early expiry of old `IntervalBuffer` objects, but after writing this fix I found that wasn't the case. However, the behaviour is still likely incorrect when data is replayed across an aggregation interval. Suppose the aggregator config is:

And the input is:
Then the aggregator will aggregate seconds 0-19 and output this (value 20) to the cache, then expire the interval as it's too old; on the next timeout it will aggregate seconds 20-59, output this to the cache, and expire the interval again. The final cached value is the aggregate of seconds 20-59, which is 40 instead of the expected 60. There is no way to control when aggregation occurs, so replayed data will be inconsistent: in this case the final value could be anywhere between 1 and 60, though if the replay is faster than real-time, 60 is the most likely.
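To make the failure mode concrete, here is a minimal Python simulation of the scenario above (it is not the project's code; `replay`, `AGG_INTERVAL`, and `ticks` are all hypothetical names). It feeds 60 one-per-second points of value 1 into a 60-second summing interval, with aggregation timeouts landing mid-replay, and models the current policy of flushing and then expiring every past interval:

```python
AGG_INTERVAL = 60  # aggregate into 60-second buckets, summing values

def replay(points, ticks):
    """Feed (timestamp, value) points in order; an aggregation timeout
    fires after the i-th point for each i in `ticks`. Current (buggy)
    policy: on every timeout, flush each interval to the cache and then
    expire it, because its time range is already in the past."""
    cache = {}    # interval start -> last flushed aggregate
    buffers = {}  # interval start -> pending values (IntervalBuffer stand-in)
    for i, (ts, value) in enumerate(points, start=1):
        start = ts - ts % AGG_INTERVAL
        buffers.setdefault(start, []).append(value)
        if i in ticks:
            for s in list(buffers):
                cache[s] = sum(buffers.pop(s))  # flush partial state, then expire
    return cache

# 60 one-per-second points, value 1 each: the complete aggregate should be 60.
points = [(ts, 1) for ts in range(60)]
print(replay(points, ticks={20, 60}))  # {0: 40}: the final flush saw only seconds 20-59
```

Moving the first timeout changes the answer (ticks at every 10 points would leave 40 as well, a single tick at the end would leave 60), which is exactly the "could be any value between 1 and 60" inconsistency.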
After these changes, we allow the system to keep the newest interval prior to the current time-frame, provided data for it has been submitted within the current time-frame. This means that if old data is submitted in order, as would be expected from a replay, each interval will always be aggregated in a complete state at least once before expiry. This is also somewhat robust to curiosities in the timestamps of the real-time submitting process (for example, if that process happens to round time down, or pushes a prediction of future data, or similar), because the number of kept intervals is based on submission time rather than interval time.
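The same simulation with the new policy shows why replayed data now converges: each flush overwrites the previous partial value in the cache, and because the newest past interval's buffer survives, the last flush sees the interval complete. This is a hedged sketch, not the actual change: the real behaviour keys retention on submission time within the current time-frame, which the sketch simplifies to "always keep the single newest past interval".

```python
AGG_INTERVAL = 60  # aggregate into 60-second buckets, summing values

def replay_fixed(points, ticks):
    """Same simulation as before, but with the retention policy from this
    change: on each timeout, flush every interval (overwriting any earlier
    partial flush) and expire all of them EXCEPT the newest, whose buffer
    is kept alive so it can be re-flushed once it is complete."""
    cache = {}    # interval start -> last flushed aggregate
    buffers = {}  # interval start -> pending values (IntervalBuffer stand-in)
    for i, (ts, value) in enumerate(points, start=1):
        start = ts - ts % AGG_INTERVAL
        buffers.setdefault(start, []).append(value)
        if i in ticks:
            newest = max(buffers, default=None)
            for s in list(buffers):
                cache[s] = sum(buffers[s])  # re-flush overwrites the partial value
                if s != newest:
                    del buffers[s]  # only older intervals expire
    return cache

points = [(ts, 1) for ts in range(60)]
print(replay_fixed(points, ticks={20, 60}))  # {0: 60}: the interval survived to a complete flush
```

With in-order submission the final flush of every interval is complete regardless of when the timeouts fire, which is the consistency property the paragraph above describes.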