Date histograms can be very slow due to time_zone #18853

Closed
timroes opened this issue May 6, 2018 · 2 comments
Labels
Feature:Visualizations Generic visualization features (in case no more specific feature label is available) Meta performance

@timroes
Contributor

timroes commented May 6, 2018

What's the issue?

We often see issues reporting that date histogram aggregations are too slow. Most of the time the cause is the configured time zone. The slowdown appears whenever you use a timezone that observes daylight saving time (DST). That happens either because you explicitly set Kibana to such a timezone (like Europe/Berlin), or because you kept the default setting, which autodetects the timezone from the browser and usually ends up with a timezone that observes DST.

This makes the aggregation much slower because Elasticsearch now has to convert every aggregated timestamp using its timezone. The actual offset may differ from document to document, since some documents fall inside DST and others fall outside of it.
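To illustrate the difference, here is a minimal Python sketch. It is not Elasticsearch's actual (Java) implementation, and the helper names `local_day_bucket` and `fixed_day_bucket` are hypothetical. It rounds timestamps to local day buckets. With a DST timezone, every timestamp needs a timezone lookup, and documents on either side of the March 25th, 2018 switch get different offsets. With a fixed offset, the same rounding is plain integer arithmetic.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

DAY_MS = 24 * 60 * 60 * 1000


def local_day_bucket(ts: datetime, tz: ZoneInfo) -> datetime:
    """Start (in UTC) of the local calendar day containing ts, DST-aware."""
    local = ts.astimezone(tz)  # offset must be looked up for this exact instant
    midnight = local.replace(hour=0, minute=0, second=0, microsecond=0)
    # The offset at local midnight is looked up again; it can differ from ts's offset.
    return midnight.astimezone(timezone.utc)


def fixed_day_bucket(ts_ms: int, offset_ms: int) -> int:
    """Same rounding for a fixed offset: plain integer arithmetic, no lookups."""
    return (ts_ms + offset_ms) // DAY_MS * DAY_MS - offset_ms


berlin = ZoneInfo("Europe/Berlin")
before = datetime(2018, 3, 24, 12, tzinfo=timezone.utc)  # CET, UTC+1
after = datetime(2018, 3, 26, 12, tzinfo=timezone.utc)   # CEST, UTC+2
print(local_day_bucket(before, berlin))  # 2018-03-23 23:00:00+00:00
print(local_day_bucket(after, berlin))   # 2018-03-25 22:00:00+00:00
```

The two buckets start at different UTC hours because the offset changed in between. This per-document lookup is what the fixed offset path avoids.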

Just for clarification: this performance issue happens in Elasticsearch, when aggregating the documents, not in Kibana itself.

What's the common workaround?

A common workaround is to switch Kibana to a fixed offset timezone (like Etc/GMT-2), meaning any timezone that doesn't observe DST. That way the calculation in Elasticsearch is faster. Depending on the number of documents, this can make a noticeable performance difference.
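One detail that often trips people up: in the tz database, `Etc/GMT` names invert the usual sign, so Etc/GMT-2 means UTC+2. The following small Python sketch compares it with Europe/Berlin (`offsets_on` is a hypothetical helper). It shows that the fixed zone stays at +2 hours all year, while Berlin switches between +1 and +2.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

BERLIN = ZoneInfo("Europe/Berlin")
FIXED = ZoneInfo("Etc/GMT-2")  # tz database naming: "GMT-2" means UTC+2


def offsets_on(day: datetime) -> tuple[timedelta, timedelta]:
    """UTC offsets of Europe/Berlin and Etc/GMT-2 at the given instant."""
    return day.astimezone(BERLIN).utcoffset(), day.astimezone(FIXED).utcoffset()


winter = datetime(2018, 1, 15, 12, tzinfo=timezone.utc)
summer = datetime(2018, 7, 15, 12, tzinfo=timezone.utc)
print(offsets_on(winter))  # Berlin +1h, Etc/GMT-2 +2h
print(offsets_on(summer))  # both +2h
```

In winter the fixed zone is one hour ahead of Berlin local time. That is the source of the wrong times described in the third drawback below.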

There are a couple of issues with that approach (also mentioned in this comment):

  • It will force all users of Kibana into the same timezone, even if you might not want that.
  • It will require you to manually switch the timezone when your DST begins or ends.
  • It will show wrong times for documents outside your current DST setting (see next paragraph).

What we can't do

There is one naive solution: automatically replace the user's timezone with a fixed offset timezone before querying Elasticsearch. For example, if the user's browser is detected to be in Europe/Berlin, replace that timezone with Etc/GMT-2 or Etc/GMT-1, depending on whether the user is currently in DST or not. That would indeed improve the performance of all requests.
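A rough Python sketch of that naive replacement follows, purely to make the rejected idea concrete. `naive_fixed_zone` is a hypothetical helper, not something Kibana implements. It takes the offset the zone has *right now* and maps it to the matching `Etc/GMT` zone. Note the inverted sign, and note that zones with non-whole-hour offsets have no `Etc/GMT` equivalent.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def naive_fixed_zone(tz_name: str, now: datetime) -> str:
    """Hypothetical: replace a DST zone by the fixed Etc/GMT zone valid at `now`."""
    offset = now.astimezone(ZoneInfo(tz_name)).utcoffset()
    total = int(offset.total_seconds())
    if total % 3600:
        raise ValueError(f"{tz_name} has a non-whole-hour offset; no Etc/GMT zone exists")
    hours = total // 3600
    # Etc/GMT names invert the sign: UTC+2 is "Etc/GMT-2".
    return f"Etc/GMT{-hours:+d}"


print(naive_fixed_zone("Europe/Berlin", datetime(2018, 3, 26, 12, tzinfo=timezone.utc)))  # Etc/GMT-2
print(naive_fixed_zone("Europe/Berlin", datetime(2018, 3, 19, 12, tzinfo=timezone.utc)))  # Etc/GMT-1
```

The result depends only on the current date, not on the dates of the data being queried. That is exactly why it goes wrong for older data.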

Unfortunately, that solution would still suffer from the third issue in the list above. Even worse, it would make that issue implicit and hide it from the user. Let's look at a detailed example:

The date is March 26th, 2018 (Monday). A security engineer in Berlin, Germany (let's call him Hans) is auditing login logs from earlier that day and from the past week. Everything looks good for today, but there are some strange findings in last week's logins. To check those findings further, Hans compares them to the actual working hours of the corresponding employees. This is exactly where the implicit system would be dangerous. DST began on March 25th, 2018 in Europe/Berlin, so all times from the previous weeks would now be shown one hour off from when they actually happened.
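The following short Python sketch shows Hans' problem. The login time is a made-up example value. A login at 09:00 Berlin time on March 19th (still CET, UTC+1) is rendered with the Etc/GMT-2 zone that the naive approach would pick on March 26th.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

BERLIN = ZoneInfo("Europe/Berlin")
# Fixed zone picked on March 26th, 2018, when Berlin was already on CEST (UTC+2).
FIXED = ZoneInfo("Etc/GMT-2")

# Hypothetical login from last week at 09:00 Berlin wall-clock time (CET, UTC+1).
login = datetime(2018, 3, 19, 9, 0, tzinfo=BERLIN)

correct = login.astimezone(BERLIN)
shown = login.astimezone(FIXED)
print(correct.strftime("%H:%M"))  # 09:00
print(shown.strftime("%H:%M"))    # 10:00, one hour off
```

The underlying instant is identical, so nothing is "lost" in the data. Only the displayed wall-clock time is wrong, and with the implicit approach Hans would have no hint of that.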

For Hans' sake, and to avoid silently shifting the times of some of your data but not others, this is not a viable solution for Kibana at the moment. Of course, the same problem occurs with the workaround. In that case, though, the user has at least explicitly chosen the fixed offset timezone.

What we can do

User specific timezones

One way to solve the first issue of that workaround (forcing all users into the same timezone) could be to allow user specific timezones, e.g. via user specific setting or via allowing the timezone to be changed in the time picker.

That way you could switch to a fixed offset timezone, and every user would still be able to use their own appropriate fixed offset timezone.

See #18852

Optimizing timezones when within the same DST period

Update: The following behavior has been introduced in Elasticsearch since 6.4.0.

Another possible solution that improves performance while still producing valid output: detect whether the date range filter sent with a date histogram lies entirely within one DST period, meaning I am not viewing data that crosses a DST switch. If that is the case, we could pass the offset the timezone had during that period as a fixed offset to the date histogram aggregation. This would improve performance when you are looking at data from within one DST period. When you look at a period that contains a DST switch, it would still show valid data, with the usual decreased performance.
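The idea can be sketched on the client side in Python. This is only an illustration. Elasticsearch 6.4.0 does this internally (see below), and its actual logic may differ. `fixed_offset_if_single_period` and `date_histogram_agg` are hypothetical helpers, the `@timestamp` field name is an assumption, and the request body uses the 6.x-era `interval` parameter. The sketch checks whether the zone's UTC offset is constant over the queried range. If it is, the sketch emits that offset as an ISO 8601 `time_zone` such as `+02:00`, and otherwise it falls back to the named zone.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional
from zoneinfo import ZoneInfo


def fixed_offset_if_single_period(tz_name: str, start: datetime, end: datetime,
                                  step: timedelta = timedelta(hours=1)) -> Optional[str]:
    """Return the zone's offset as "+HH:MM" if it is constant over [start, end], else None."""
    tz = ZoneInfo(tz_name)
    t = start.astimezone(timezone.utc)  # step in UTC so wall-clock arithmetic can't skew
    first = t.astimezone(tz).utcoffset()
    # Sample every `step`; any offset change lasting at least `step` inside the
    # range is caught, and the explicit end check below catches a final switch.
    while t < end:
        if t.astimezone(tz).utcoffset() != first:
            return None
        t += step
    if end.astimezone(tz).utcoffset() != first:
        return None
    minutes = int(first.total_seconds()) // 60
    sign = "+" if minutes >= 0 else "-"
    hours, mins = divmod(abs(minutes), 60)
    return f"{sign}{hours:02d}:{mins:02d}"


def date_histogram_agg(field: str, interval: str, tz_name: str,
                       start: datetime, end: datetime) -> dict:
    """Build a date_histogram body, preferring a fixed offset when it is safe."""
    time_zone = fixed_offset_if_single_period(tz_name, start, end) or tz_name
    return {"date_histogram": {"field": field, "interval": interval, "time_zone": time_zone}}


utc = timezone.utc
print(date_histogram_agg("@timestamp", "1d", "Europe/Berlin",
                         datetime(2018, 3, 26, tzinfo=utc), datetime(2018, 3, 30, tzinfo=utc)))
```

A range from March 26th to 30th, 2018 gets `"+02:00"`. A range that includes March 25th, or a whole year whose start and end share the same offset, keeps `"Europe/Berlin"`, because the interior check finds the summer offset.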

I think that optimization should be done in Elasticsearch rather than in Kibana, since that way all date histogram aggregations would benefit from the performance improvement. It would also prevent issues in case Kibana and Elasticsearch ever disagree on the DST periods of a timezone (which hopefully should never happen).

That's why I added this suggestion as a comment on elastic/elasticsearch#28727, which tracks the date histogram timezone performance issue in Elasticsearch.

@jpountz

jpountz commented May 11, 2018

FYI we are trying to mitigate the issue on the Elasticsearch side via elastic/elasticsearch#30534.

@timroes
Contributor Author

timroes commented Jun 5, 2018

For reference: as of elastic/elasticsearch#30534, Elasticsearch 6.4.0 and later optimize timezone date histogram queries within one DST period. This should already improve performance significantly for the vast majority of use cases and users.

The remaining suggestion of allowing user specific timezones has its own ticket (#18852), so I will close this meta issue.
