API to allow queries to bypass the query cache policy #16259

Closed
rendel opened this issue Jan 27, 2016 · 5 comments
Labels
discuss, :Search/Search

Comments

rendel commented Jan 27, 2016

We currently have a performance issue with the new query cache policy. We have queries that are quite heavy to construct and compute, even on small segments. The UsageTrackingQueryCachingPolicy (which uses CacheOnLargeSegments) will always refuse to cache our queries on small segments. This leads to a significant drop in performance (5x to 10x) in our scenarios.
Another limitation of the UsageTrackingQueryCachingPolicy is that there is no easy way to indicate to it that our queries are costly to build, apart from making our queries subclasses of MultiTermQuery so that they are picked up by UsageTrackingQueryCachingPolicy#isCostly.
At the moment, the only solution we have is to configure Elasticsearch to switch back to the QueryCachingPolicy.ALWAYS_CACHE caching policy.
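
For reference, that fallback boils down to the following at the Lucene level. This is a minimal sketch assuming the Lucene 5.x API in use at the time (the index path is hypothetical): setting QueryCachingPolicy.ALWAYS_CACHE on the searcher bypasses the usage-tracking heuristics and caches on every segment, which is roughly what the index.queries.cache.everything setting mentioned later in this thread amounts to.

```java
// Minimal sketch (Lucene 5.x API): force every query to be cacheable on every
// segment by replacing the default caching policy on the IndexSearcher.
import java.nio.file.Paths;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.QueryCachingPolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class AlwaysCacheExample {
  public static void main(String[] args) throws Exception {
    Directory dir = FSDirectory.open(Paths.get("/tmp/index")); // hypothetical index path
    try (DirectoryReader reader = DirectoryReader.open(dir)) {
      IndexSearcher searcher = new IndexSearcher(reader);
      // Bypass UsageTrackingQueryCachingPolicy entirely: cache on all segments.
      searcher.setQueryCachingPolicy(QueryCachingPolicy.ALWAYS_CACHE);
      // ... run the expensive-to-build queries as usual ...
    }
  }
}
```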

rendel commented Jan 27, 2016

Related to #16031

clintongormley added the discuss, :Search/Search and :Cache labels Jan 27, 2016
clintongormley commented:

@rendel I'm curious how you figured out that your queries are heavy to construct on small segments? That seems counterintuitive. Could you provide some examples?

rendel commented Jan 27, 2016

Hi @clintongormley

We have developed a custom query which embeds a large number of terms to perform a semi-join between indexes (see the siren-join plugin). The terms are encoded in a byte array for performance reasons, and decoded lazily at query execution time. The decoding of the terms is the heavy part. We cache them using a cache key. The issue is that this decoding is always redone for small segments.
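
Purely to illustrate the kind of work being described here, an expensive decode step memoized under a cache key, the following is a hypothetical sketch. It is not code from the siren-join plugin, and all names, including the newline-separated stand-in encoding, are made up.

```java
// Hypothetical sketch of lazily decoding an encoded term set once per cache key.
// None of these names or formats come from the siren-join plugin.
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class EncodedTermsHolder {

  // Decoded term lists, memoized under an application-level cache key.
  private static final ConcurrentMap<String, List<String>> DECODED_CACHE =
      new ConcurrentHashMap<>();

  private final String cacheKey;     // identifies one encoded payload
  private final byte[] encodedTerms; // compact encoding of the join terms

  public EncodedTermsHolder(String cacheKey, byte[] encodedTerms) {
    this.cacheKey = cacheKey;
    this.encodedTerms = encodedTerms;
  }

  /** The heavy decoding step runs at most once per cache key. */
  public List<String> decodedTerms() {
    return DECODED_CACHE.computeIfAbsent(cacheKey, k -> decode(encodedTerms));
  }

  private static List<String> decode(byte[] encoded) {
    // Stand-in for the expensive, plugin-specific decoding step: here the terms
    // are simply assumed to be newline-separated UTF-8.
    return Arrays.asList(new String(encoded, StandardCharsets.UTF_8).split("\n"));
  }
}
```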

jpountz commented Jan 27, 2016

If a query is slow when it is not cached, I don't think the cache is to blame. It is something that users would hit anyway after a merge or a restart. I actually think not caching on small segments is very important as:

  • it does not affect performance with regular queries
  • it makes memory accounting more accurate (it is easier to account for the memory usage of a few large cache entries than of many tiny entries)
  • it avoids cache churn due to NRT search.

While I think there are things to improve based on the feedback that was given in #16031, I don't think we should make it possible to cache on all segments.
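
For concreteness, the segment-size heuristic being defended here looks roughly like the sketch below. This is a simplified illustration of a CacheOnLargeSegments-style check (not the actual Lucene source), assuming the Lucene 5.x QueryCachingPolicy interface with the two-argument shouldCache(Query, LeafReaderContext); the 3% ratio is illustrative rather than the real default.

```java
// Simplified, illustrative CacheOnLargeSegments-style policy: only cache on
// segments that hold a meaningful fraction of the whole index.
import java.io.IOException;
import org.apache.lucene.index.IndexReaderContext;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.ReaderUtil;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryCachingPolicy;

public class LargeSegmentsOnlyPolicy implements QueryCachingPolicy {

  private final float minSizeRatio = 0.03f; // illustrative threshold, not the Lucene default

  @Override
  public void onUse(Query query) {
    // no usage tracking in this simplified sketch
  }

  @Override
  public boolean shouldCache(Query query, LeafReaderContext context) throws IOException {
    IndexReaderContext topLevel = ReaderUtil.getTopLevelContext(context);
    float sizeRatio = (float) context.reader().maxDoc() / topLevel.reader().maxDoc();
    return sizeRatio >= minSizeRatio; // small segments are never cached
  }
}
```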

jpountz closed this as completed Jan 27, 2016

rendel commented Jan 29, 2016

@jpountz I would agree that mainstream cases - the standard Lucene queries - should not be cached on small segments, and that the new caching policy is well adapted to those kinds of queries. However, there are very legitimate cases where this policy is too restrictive. We are not asking to change the high-level API (e.g., the query DSL), but just to expose that option at a low level for advanced users who, like us, are building on top of Elasticsearch.

We are thinking of something at the Java Lucene Query API level, where people creating a new custom Lucene Query would have some control over the cache policy. Maybe this is something that should be implemented at the Lucene level instead of in Elasticsearch?

Without such control, we would have to fall back to alternative options that are not very optimal:

  • tell users to activate the index.queries.cache.everything: true setting (but this means that standard queries will no longer benefit from the cache optimisations introduced by the new caching policy)
  • add and manage a secondary query cache that will cache our custom queries (but this adds unnecessary complexity)
  • change the cache implementation of Elasticsearch to introduce our own (but this does not look possible at the moment; we would have to fork Elasticsearch)

What other fallback options would be available to us?
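
To make the request concrete, one hypothetical shape such a hook could take is sketched below. This is an illustration only: neither the marker interface nor the delegating policy exists in Lucene or Elasticsearch, and it again assumes the Lucene 5.x shouldCache(Query, LeafReaderContext) signature. A costly custom Query opts in via a marker interface, and the policy force-caches it on every segment while deferring to UsageTrackingQueryCachingPolicy for everything else.

```java
// Hypothetical sketch of a query-level caching hint; neither AlwaysCacheHint
// nor HintAwareCachingPolicy exists in Lucene or Elasticsearch.
import java.io.IOException;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryCachingPolicy;
import org.apache.lucene.search.UsageTrackingQueryCachingPolicy;

/** Marker interface a costly custom Query could implement to request caching. */
interface AlwaysCacheHint {
}

public class HintAwareCachingPolicy implements QueryCachingPolicy {

  private final UsageTrackingQueryCachingPolicy delegate =
      new UsageTrackingQueryCachingPolicy();

  @Override
  public void onUse(Query query) {
    delegate.onUse(query);
  }

  @Override
  public boolean shouldCache(Query query, LeafReaderContext context) throws IOException {
    if (query instanceof AlwaysCacheHint) {
      return true; // costly-to-build query: cache it even on small segments
    }
    return delegate.shouldCache(query, context); // default heuristics otherwise
  }
}
```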
