
Added property long_eta #59

Open · awiebe wants to merge 4 commits into master
Conversation

@awiebe commented Nov 7, 2018

For tasks that run for hours rather than seconds, a total-average ETA generally becomes more stable and accurate as the task progresses.

A small running average gives the jumpy "Windows file transfer" sort of estimate, where the predicted time swings wildly, whereas a long running average starts out unstable and then converges to a reasonable approximation.
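A toy illustration of the difference (the per-item durations here are made up, not from any real job):

    # made-up per-item durations with occasional spikes
    durations = [1, 1, 30, 1, 1, 1, 25, 1, 1, 1]

    window = 3
    for i in range(window, len(durations) + 1):
        recent = sum(durations[i - window:i]) / window  # short window: jumpy
        overall = sum(durations[:i]) / i                # cumulative: settles
        print('item %2d: window %5.1fs/item  total %5.1fs/item'
              % (i, recent, overall))

The windowed figure jumps every time a spike enters or leaves the window, while the cumulative figure drifts toward the true mean.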

@praiskup commented Nov 7, 2018

IMO, the bug in this project is that the moving average window is too short (sometimes only several ms) -- the higher the next() call frequency, the worse the results the progress bar shows. A correct moving average calculation should have some minimal time window to give "stable" results (see e.g. https://python-progress/python-progress, where the window is 2 seconds by default), and then decide whether you still need long_eta.

That said, a "long_eta" calculated across the whole period (hours) would rarely be useful; an upload/download process is dynamic and the speed changes dramatically, so it gives the user an incorrect estimate. The useful info is "what has the average speed been over the last few seconds".
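Something along these lines (just a sketch of the idea; the names and defaults are mine, and python-progress's actual implementation differs in details):

    import time
    from collections import deque

    class TimeWindowSpeed:
        """Average speed over at least `window` seconds of recent samples."""

        def __init__(self, window=2.0):
            self.window = window
            self.samples = deque()  # (timestamp, amount) pairs

        def add(self, amount):
            now = time.monotonic()
            self.samples.append((now, amount))
            # drop old samples, but never shrink the span below the window
            while len(self.samples) > 2 and \
                    now - self.samples[1][0] > self.window:
                self.samples.popleft()

        def speed(self):
            if len(self.samples) < 2:
                return 0.0
            span = self.samples[-1][0] - self.samples[0][0]
            done = sum(a for _, a in self.samples) - self.samples[0][1]
            return done / span if span else 0.0

Because eviction is by timestamp rather than by sample count, calling add() thousands of times per second cannot shrink the averaging period below 2 seconds.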

@awiebe (author) commented Nov 7, 2018

@praiskup
Actually, for large batch jobs the long running average produces a much better estimate than even an extended window. As you noted, the subtasks are not necessarily uniform in time, but if you take an average over a long period this smooths out.

Model the progress as a scatter plot, sampling the remaining work at intervals t. This particular method doesn't even require that the intervals be regularly spaced. What you wind up with is a trapezoid that probably overestimates most of the time, but is acceptable as a rough figure for a long-running job.
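Concretely, the extrapolation is just average-seconds-per-item so far times items remaining. With made-up numbers:

    elapsed = 13000          # seconds since the job started (made up)
    index, total = 16, 1000  # items done / items total (1.6%)
    avg = elapsed / index            # 812.5 s per item so far
    eta = avg * (total - index)      # 799500 s, about 9 days 6 hours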

Right now I have a big data job whose estimate uses this logic. It looks like this:

    | | 1.6% - 9 days, 9:59:48

It's fairly stable, and I know the job will take about 9 days 10 hours.

That being said, there are better schemes that can be built on the long estimate, which converge to a result faster. I made some logic that uses a weighted shift to prevent spikes from moving the estimate too much, but it was more involved, so I didn't put it in this particular PR.

For example, I'm using a 60/40 weighted split:

    from math import ceil

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.prev_long_avg = 3600  # seed: assume an hour per item at first

    def long_eta(self):
        # cumulative average seconds per item across the whole run
        avg = self.elapsed / (self.index + 1)
        # 60/40 blend with the previous value damps spikes
        avg = (self.prev_long_avg * 0.6) + (avg * 0.4)
        self.prev_long_avg = avg
        return int(ceil(avg * self.remaining))
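For context, here is roughly how this could plug into the Bar class from this project (an untested sketch; LongEtaBar and long_eta_td are names I made up here, relying on Bar's existing elapsed/index/remaining attributes and its suffix interpolation):

    from datetime import timedelta
    from math import ceil

    from progress.bar import Bar

    class LongEtaBar(Bar):
        # show percent done plus the whole-run estimate
        suffix = '%(percent).1f%% - %(long_eta_td)s'

        def __init__(self, *args, **kwargs):
            super().__init__(*args, **kwargs)
            self.prev_long_avg = 3600

        @property
        def long_eta(self):
            avg = self.elapsed / (self.index + 1)
            avg = (self.prev_long_avg * 0.6) + (avg * 0.4)
            self.prev_long_avg = avg
            return int(ceil(avg * self.remaining))

        @property
        def long_eta_td(self):
            return timedelta(seconds=self.long_eta)

    bar = LongEtaBar('Crunching ', max=1000)
    for _ in range(1000):
        # ... do one unit of work here ...
        bar.next()
    bar.finish()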

Note that this simple linear extrapolation is a time/memory tradeoff. Not much time or memory should be spent calculating progress, so even though a linear regression might be the way to go, the way long_eta works is acceptable for long tasks and scales O(1), unlike a regression that scales O(n) in time and space. That makes it a reasonable compromise. If this PR is accepted, I might try making a class SmartBar(Bar) which chooses an estimation method to display based on the available information.
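Sketched naively on top of the LongEtaBar above (the threshold is a placeholder, not anything measured):

    class SmartBar(LongEtaBar):
        @property
        def smart_eta(self):
            # with only a few samples the whole-run average is noise,
            # so fall back to the short moving-average eta
            if self.index < 10:
                return self.eta
            return self.long_eta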

@praiskup commented Nov 7, 2018

Ah, I thought you wanted to fix the broken "moving average" speed calculation by taking the overall average speed into account, but you have a different (pretty corner-case) use in mind, in my opinion.

Unless you are processing something with a "uniform" speed over the whole period, you never want to rely on the absolute average, otherwise it gives you very misleading info. And if you do, you probably don't want a "simple moving average" calculator/project in the first place. But I can also accept that this is a matter of taste.

I'm +1 for the additional functionality, since I see that it is useful for you. But if progress is to support both calculations now, I'd suggest you make a clear distinction between the "moving" average and the plain average in the documentation and in the property names, so it is absolutely clear which is which. Seeing "avg_long" in a "simple moving average" project isn't descriptive enough.
