What about less accurate but more efficient percentile calculation? #513
Comments
Sounds interesting. We've been talking about this for a while. I'll read the paper and try to get something out soon for you to try.
Have you had any time to look at this, @jvshahid? Percentile calculation performance is starting to give me a bit of a headache.
Just FYI: I read the t-DIGESTS paper several times during my vacation, and I think it's a great fit for InfluxDB. I'll need to take a closer look at the algorithm for a project of my own (most likely in Erlang), but I'm happy to share my notes on a possible implementation in Go. Why I think t-DIGESTS is a good fit:
@tisba, that would be great.
I'm looking for the right place in the code to implement it.
As mentioned in my post to the mailing list, we are experimenting with simplifying our open GitHub issues. This feature request has been rolled into an aggregate issue for all function requests, so that we can close this issue until we are ready to work on it. You may continue to comment here. Closing the issue does not mean we are rejecting this idea.
I was wondering what your opinion is on offering a less accurate but much more efficient method for percentile calculation.
AFAIK, InfluxDB currently calculates percentiles in a naive way. There are more efficient (and much faster) approaches, e.g. t-DIGESTS as used in Elasticsearch. If you have many events distributed across your cluster, it would be nice to give users a way to trade accuracy for speed.
I have to deal with millions of events over which I'd like to run percentile calculations (I actually compute entire percentile ranges). Performance is okay-ish right now, but not good enough for interactive use.