Allow the aggregator to consistently accept past data #958

bucko909 · 2024-08-21T16:04:15Z

The discovery of #956 was due to replaying a huge swathe of past data into our aggregator instance and finding the data full of holes. I'd assumed it was due to early-expiry of old IntervalBuffer objects, but after writing this fix I found this wasn't the case. However, the behaviour is still likely incorrect when data is played over an aggregation interval. Suppose the aggregator config is:

a (60) = sum a

And the input is:

a 0 1
...
a 1 19
<aggregation timeout triggers here at time (MAX_AGGREGATION_INTERVALS + 1) * 60 or higher>
a 1 20
...
a 1 59

Then the aggregation will aggregate seconds 0-29 and output this (value 20) to the cache, then expire the interval as it's too old, then in the next timeout, aggregate seconds 31-60, output this to the cache and expire the interval again. The final value is the aggregate of seconds 20-59, which is 40 instead of the expected 60. There is no way to control when aggregation occurs, so replayed data will be inconsistent, and in this case could be any value between 1 and 60, though if the replay is faster than real-time, the most likely will be 60.

After these changes, we allow the system to keep the newest interval prior to the current time-frame, provided it's been submitted within the current time-frame. This means if old data is submitted in-order, as would be expected from a replay, the intervals will always be aggregated in a complete state at least once before expiry. This is somewhat robust to curiosities in the timestamps of the real-time submitting process (for example, if that process happens to round time down, or be pushing a prediction of future data or similar), as the number of kept intervals is based on submission time rather than interval time.

``` == warmup == 1 2.564000169513747e-06 1000 0.0003257179996580817 10000 0.003369195001141634 100000 0.03210452299936151 1000000 0.31846129199948336 10000000 3.215345189999425 == noop == 1 1.984999471460469e-06 1000 0.00032863300111785065 10000 0.0032671120006853016 100000 0.03219444500064128 1000000 0.32194886400066025 10000000 3.2092841119992954 == sum == 1 1.1226000424358062e-05 1000 0.000391326000681147 10000 0.0033218999997188803 100000 0.031986795000193524 1000000 0.32090956100000767 10000000 3.2212803769998573 == fake == 1 4.441999408300035e-06 1000 0.0003171700009261258 10000 0.0033069499986595474 100000 0.03180618599981244 1000000 0.3161616289999074 10000000 3.176653272999829 ```

bucko909 · 2024-08-21T16:55:10Z

Push just reset to preferred email.

deniszh · 2024-08-24T08:02:46Z

@bucko909 : linter says that current_interval is not in use, could you please check?
Ignore Deepsource, it should be removed

``` == warmup == 1 3.7179997889325023e-06 1000 0.00032810999982757494 10000 0.003592125000068336 100000 0.03169123900079285 1000000 0.31713802899867005 10000000 3.157578217000264 == noop == 1 2.0410006982274354e-06 1000 0.00042683100036811084 10000 0.003553029999238788 100000 0.031913518001601915 1000000 0.31525603199952457 10000000 3.1432045359997574 == sum == 1 9.91100023384206e-06 1000 0.00038492900057462975 10000 0.003426478999244864 100000 0.031299194999519386 1000000 0.3173040299989225 10000000 3.1426827399991453 == fake == 1 4.47000093117822e-06 1000 0.00035055999978794716 10000 0.0035677849991770927 100000 0.031690031000835006 1000000 0.31694292799875257 10000000 3.1394117309992 ``` No appreciable difference in benchmark output. If anything, it's faster.

bucko909 · 2024-08-29T15:20:10Z

Yep, I think it was just a copy/paste error. Apologies for that.

codecov-commenter · 2024-08-29T15:20:12Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 50.68%. Comparing base (fdc56f6) to head (7ea6ce9).
Report is 17 commits behind head on master.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #958      +/-   ##
==========================================
+ Coverage   50.63%   50.68%   +0.04%     
==========================================
  Files          36       36              
  Lines        3446     3453       +7     
  Branches      535      527       -8     
==========================================
+ Hits         1745     1750       +5     
- Misses       1574     1576       +2     
  Partials      127      127

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

bucko909 force-pushed the aggregator-replay-past branch from 1618009 to 21be84d Compare August 21, 2024 16:54

bucko909 force-pushed the aggregator-replay-past branch from 21be84d to 7ea6ce9 Compare August 29, 2024 15:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Allow the aggregator to consistently accept past data #958

Allow the aggregator to consistently accept past data #958

bucko909 commented Aug 21, 2024 •

edited

Loading

bucko909 commented Aug 21, 2024

deniszh commented Aug 24, 2024

bucko909 commented Aug 29, 2024

codecov-commenter commented Aug 29, 2024

Allow the aggregator to consistently accept past data #958

Are you sure you want to change the base?

Allow the aggregator to consistently accept past data #958

Conversation

bucko909 commented Aug 21, 2024 • edited Loading

bucko909 commented Aug 21, 2024

deniszh commented Aug 24, 2024

bucko909 commented Aug 29, 2024

codecov-commenter commented Aug 29, 2024

Codecov Report

bucko909 commented Aug 21, 2024 •

edited

Loading