We should switch the default for `lookback` to `1`.

Currently the `lookback` value defaults to `0`. The problem is that a `lookback` of `0` means the dataset will never have a "complete" batch. This is because when microbatch is run, the current time is used for the latest batch. We do that because we're favoring freshness over ensuring "only complete batches". Unfortunately, combined with `lookback=0`, this means that no batch is ever "complete".

For example, consider a microbatch model with a `batch_size` of `day` that is run at noon every day (12:00:00). If our `lookback` is `0`, then when the microbatch model is run today it'll get data from today 00:00:00 to 12:00:00. When the model runs tomorrow, it'll get data for tomorrow 00:00:00 to 12:00:00, but it won't go back and get the rest of "today's" data (because the `lookback` is `0`).

Thus the "default" valid behavior should be a `lookback` of `1`. This ensures that batches are complete. The only caveats are regularly late-arriving data, for which one can set a larger `lookback` value, and one-off late-arriving data, for which one can use `--event-time-start` + `--event-time-end` to backfill the specific range.
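The noon-run example above can be sketched as a small simulation. This is an illustrative model only, not dbt's implementation: the function names are hypothetical, and it assumes each run reprocesses the current daily batch plus `lookback` earlier batches, with the current batch cut off at the run time as described in this issue. It also ignores the model's `begin` setting, so the first run may reach back before the first day.

```python
from datetime import datetime, timedelta

ONE_DAY = timedelta(days=1)


def batch_windows(run_time, lookback):
    """Return the (start, end) windows a single run processes.

    Assumption: the run covers the current daily batch plus `lookback`
    previous batches, and no window extends past `run_time`.
    """
    day_start = run_time.replace(hour=0, minute=0, second=0, microsecond=0)
    windows = []
    for offset in range(lookback, -1, -1):
        start = day_start - timedelta(days=offset)
        end = min(start + ONE_DAY, run_time)
        windows.append((start, end))
    return windows


def simulate(first_run, num_runs, lookback):
    """Run once per day and record, per batch start, the end of the window
    that was loaded most recently (a later run replaces the earlier load)."""
    loaded = {}
    for i in range(num_runs):
        run_time = first_run + timedelta(days=i)
        for start, end in batch_windows(run_time, lookback):
            loaded[start] = end
    return loaded


def complete_batches(loaded):
    """Batches whose most recent load covered the full day."""
    return sorted(start for start, end in loaded.items() if end - start == ONE_DAY)


if __name__ == "__main__":
    first_run = datetime(2024, 1, 1, 12, 0, 0)  # noon on an arbitrary day
    for lookback in (0, 1):
        done = complete_batches(simulate(first_run, num_runs=3, lookback=lookback))
        print(f"lookback={lookback}: {len(done)} complete batch(es)")
```

Under these assumptions, `lookback=0` leaves every day stuck at its noon snapshot, while `lookback=1` lets each run finish the previous day's batch; only the most recent day remains partial.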