Add search_after to geonames and http_logs #130
As we are optimizing "search_after" performance, we would like to measure it. This adds a "search_after" operation to geonames and http_logs with a big enough value for "search_after".
1. geonames
- "population" field. As most documents have a value of 0 for this field, "search_after" was added to the asc sort.
- "geonameid" field, which has unique values for each document. "search_after" was set to 5000000, as an approximate median value.
2. http_logs
- "@timestamp" field. "search_after" was set to "1998-06-10" for the desc and asc sorts, which is a median value.
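For readers unfamiliar with "search_after", the request bodies behind such operations can be sketched roughly as below. This is a minimal illustration and not the operation definitions from this PR: the helper name `search_after_body` and the `match_all` query are assumptions, and the sort order for "geonameid" is not stated in the description. Only the field names and "search_after" values are taken from the description above.

```python
import json


def search_after_body(field, order, after):
    """Build a search body that sorts on `field` and resumes after `after`.

    Hypothetical helper for illustration only; it is not part of Rally
    or of the tracks touched by this PR.
    """
    return {
        # Assumption: the operations match every document.
        "query": {"match_all": {}},
        "sort": [{field: order}],
        # search_after takes one value per sort key, in the same order.
        "search_after": [after],
    }


# http_logs: values taken from the PR description (desc and asc sorts).
timestamp_desc = search_after_body("@timestamp", "desc", "1998-06-10")
timestamp_asc = search_after_body("@timestamp", "asc", "1998-06-10")

# geonames: the value 5000000 is from the description; "asc" is an assumption.
geonameid_after = search_after_body("geonameid", "asc", 5000000)

if __name__ == "__main__":
    print(json.dumps(timestamp_desc, indent=2))
```

Because "search_after" skips every document that sorts before the given value, choosing a value near the middle of the field's range forces the query to skip about half of the index, which is the case the benchmark is meant to exercise.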
Some results obtained on my laptop:
http_logs: Full results here
geonames: Full results here
I would also like to ask @jimczi if using a median value for
I left some comments. Can you share the entire output for the challenges? It's truncated in your comments, and I don't understand why the query is so costly. The fact that the throughput is too high shouldn't affect the service time too much.
Do you have an idea of how long the http_logs search_after query runs on a single node?
"sort" : [ | ||
{"@timestamp" : "desc"} | ||
], | ||
"search_after": ["1998-06-10"] |
same here
@@ -136,6 +136,20 @@
  "warmup-iterations": 200,
  "iterations": 100,
  "target-throughput": 2
},
{
  "name": "desc-sort-with-after-timestamp-after-force-merge-1-seg",
Would be interesting to have the number before force merge too?
http_logs/challenges/default.json
Outdated
{
  "name": "desc-sort-with-after-timestamp-after-force-merge-1-seg",
  "operation": "desc_sort_with_after_timestamp",
  "warmup-iterations": 200,
These warmup and iteration counts are too big imo. 5 to 10 should be enough to get stable results?
+1 to use a median value
@jimczi Thank you for the feedback, at first I have incorrectly reported Also I've chatted with @jimczi offline, and we have defined the following tasks to be tackled:
@jimczi I have addressed your comments and rerun the benchmarks with the updated operations. Full results here
@mayya-sharipova Thank you for the changes. I ran on our low-mem env and the target throughput looks okay to me. I will merge the request tomorrow and add the charts to https://elasticsearch-benchmarks.elastic.co later this week. Thanks!