Add search_after to geonames and http_logs #130

Merged (2 commits) on Sep 22, 2020

mayya-sharipova
Contributor

As we are optimizing "search_after" performance, we would like to
measure it. This adds "search_after" operations to the geonames
and http_logs tracks, each with a large enough "search_after" value.

  1. geonames
  • "population" field. As most documents have a value of 0 for this field,
    "search_after" was added to the asc sort.
  • "geonameid" field, which has a unique value for each document.
    "search_after" was set to 5000000, approximately the median
    value.
  2. http_logs
  • "@timestamp" field. "search_after" was set to "1998-06-10"
    for both the desc and asc sorts, which is approximately the median value.
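
For readers unfamiliar with the request shape being benchmarked, here is a minimal sketch of a sorted search body with and without "search_after". The field names and "search_after" values come from the description above; the helper name `sorted_search_body`, the `match_all` query, and the page size are assumptions for illustration and are not taken from the track files.

```python
import json


def sorted_search_body(field, order, search_after=None, size=10):
    """Build a sorted search body; add search_after only when a value is given.

    The query and size are illustrative assumptions, not values from the PR.
    """
    body = {
        "size": size,
        "query": {"match_all": {}},
        "sort": [{field: order}],
    }
    if search_after is not None:
        # search_after carries one value per sort key, in the same order as "sort".
        body["search_after"] = [search_after]
    return body


# Values from the PR description: roughly the median of each field.
geonameid_body = sorted_search_body("geonameid", "asc", 5000000)
timestamp_body = sorted_search_body("@timestamp", "desc", "1998-06-10")

print(json.dumps(timestamp_body, indent=2))
```

The plain sort and the "search_after" variant differ only in the extra key, which keeps the two benchmark operations directly comparable.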

mayya-sharipova commented Sep 7, 2020

Some results obtained on my laptop:


http_logs

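|                       Metric |                                                   Task |       Value |   Unit |
| ----------------------------:|-------------------------------------------------------:|------------:|-------:|
| 50th percentile service time |             asc-sort-timestamp-after-force-merge-1-seg |     6.83232 |     ms |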
| 90th percentile service time |             asc-sort-timestamp-after-force-merge-1-seg |     7.08107 |     ms |
| 99th percentile service time |             asc-sort-timestamp-after-force-merge-1-seg |     7.71589 |     ms |
|100th percentile service time |             asc-sort-timestamp-after-force-merge-1-seg |     9.33675 |     ms |

| 50th percentile service time |  asc-sort-with-after-timestamp-after-force-merge-1-seg |     709.761 |     ms |
| 90th percentile service time |  asc-sort-with-after-timestamp-after-force-merge-1-seg |     731.649 |     ms |
| 99th percentile service time |  asc-sort-with-after-timestamp-after-force-merge-1-seg |     752.732 |     ms |
|100th percentile service time |  asc-sort-with-after-timestamp-after-force-merge-1-seg |     767.543 |     ms |

| 50th percentile service time |            desc-sort-timestamp-after-force-merge-1-seg |     54.4112 |     ms |
| 90th percentile service time |            desc-sort-timestamp-after-force-merge-1-seg |      56.511 |     ms |
| 99th percentile service time |            desc-sort-timestamp-after-force-merge-1-seg |     61.0517 |     ms |
|100th percentile service time |            desc-sort-timestamp-after-force-merge-1-seg |     61.6246 |     ms | 
                                            
| 50th percentile service time | desc-sort-with-after-timestamp-after-force-merge-1-seg |     734.612 |     ms |
| 90th percentile service time | desc-sort-with-after-timestamp-after-force-merge-1-seg |     770.386 |     ms |
| 99th percentile service time | desc-sort-with-after-timestamp-after-force-merge-1-seg |     821.791 |     ms |
|100th percentile service time | desc-sort-with-after-timestamp-after-force-merge-1-seg |     834.089 |     ms |

Full results here


geonames

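|                       Metric |                           Task |       Value |   Unit |
| ----------------------------:|-------------------------------:|------------:|-------:|
| 50th percentile service time |            asc_sort_population |      49.872 |     ms |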
| 90th percentile service time |            asc_sort_population |     53.4872 |     ms |
| 99th percentile service time |            asc_sort_population |     58.1984 |     ms |
|100th percentile service time |            asc_sort_population |     59.2803 |     ms |

| 50th percentile service time | asc_sort_with_after_population |      72.391 |     ms |
| 90th percentile service time | asc_sort_with_after_population |     75.4028 |     ms |
| 99th percentile service time | asc_sort_with_after_population |     84.9475 |     ms |
|100th percentile service time | asc_sort_with_after_population |     85.8957 |     ms |

| 50th percentile service time |             asc_sort_geonameid |     4.17862 |     ms |
| 90th percentile service time |             asc_sort_geonameid |     4.27179 |     ms |
| 99th percentile service time |             asc_sort_geonameid |     4.34915 |     ms |
|100th percentile service time |             asc_sort_geonameid |     4.41063 |     ms |

| 50th percentile service time |  asc_sort_with_after_geonameid |     66.9177 |     ms |
| 90th percentile service time |  asc_sort_with_after_geonameid |     70.9752 |     ms |
| 99th percentile service time |  asc_sort_with_after_geonameid |     73.6624 |     ms |
|100th percentile service time |  asc_sort_with_after_geonameid |     80.3342 |     ms |

| 50th percentile service time |            desc_sort_geonameid |     6.11204 |     ms |
| 90th percentile service time |            desc_sort_geonameid |     6.86492 |     ms |
| 99th percentile service time |            desc_sort_geonameid |     7.03959 |     ms |
|100th percentile service time |            desc_sort_geonameid |     7.09217 |     ms |

| 50th percentile service time | desc_sort_with_after_geonameid |     41.6359 |     ms |
| 90th percentile service time | desc_sort_with_after_geonameid |     43.7668 |     ms |
| 99th percentile service time | desc_sort_with_after_geonameid |     46.2971 |     ms |
|100th percentile service time | desc_sort_with_after_geonameid |     47.1171 |     ms |

Full results here

@ebadyano ebadyano self-assigned this Sep 8, 2020
@mayya-sharipova
Contributor Author

I would also like to ask @jimczi whether using a median value for search_after looks good to him.

@jimczi jimczi left a comment


I left some comments. Can you share the entire output for the challenges? It's truncated in your comments, and I don't understand why the query is so costly. The fact that the throughput is too high shouldn't affect the service time too much.
Do you have an idea of how long the http_logs search_after query runs on a single node?

geonames/operations/default.json (2 resolved threads)
"sort" : [
{"@timestamp" : "desc"}
],
"search_after": ["1998-06-10"]

same here

http_logs/operations/default.json (resolved)
@@ -136,6 +136,20 @@
"warmup-iterations": 200,
"iterations": 100,
"target-throughput": 2
},
{
"name": "desc-sort-with-after-timestamp-after-force-merge-1-seg",

It would be interesting to have the numbers before the force merge too.

{
"name": "desc-sort-with-after-timestamp-after-force-merge-1-seg",
"operation": "desc_sort_with_after_timestamp",
"warmup-iterations": 200,

These warmup and iteration counts are too big imo. Shouldn't 5 to 10 be enough to get stable results?

Contributor Author

Addressed in b9f095f. I also wanted to confirm with @ebadyano that we are OK with 10 warmup-iterations.

http_logs/challenges/default.json (outdated, resolved)
jimczi commented Sep 10, 2020

if using a median value for search_after looks good to him?

+1 to use a median value

@mayya-sharipova
Contributor Author

@jimczi Thank you for the feedback. At first I incorrectly reported latency instead of service_time; latency also includes wait time, which is why the numbers were so big. I have updated the comment to report service_time, and the numbers are not so atrocious now.
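
To illustrate why latency can look so much worse than service_time, here is a minimal sketch. It assumes a single client that schedules requests at a fixed interval derived from target-throughput and can only run one request at a time; this is a simplified model, not the benchmark tool's actual scheduler. The 0.75 s service time and the throughput of 2 are illustrative values in the range seen in this thread.

```python
def simulate(service_times_s, target_throughput):
    """Return (service_time, latency) pairs for one client on a fixed schedule.

    Simplified model (an assumption): request i is scheduled at i / target_throughput
    but cannot start until the previous request has finished.
    """
    interval = 1.0 / target_throughput
    free_at = 0.0  # time at which the client finishes its current request
    results = []
    for i, service in enumerate(service_times_s):
        scheduled = i * interval
        start = max(scheduled, free_at)  # wait if the client is still busy
        end = start + service
        free_at = end
        latency = end - scheduled  # wait time + service time
        results.append((service, latency))
    return results


pairs = simulate([0.75] * 5, target_throughput=2)
for service, latency in pairs:
    print(f"service_time={service:.2f}s latency={latency:.2f}s")
```

In this model the service time stays constant while latency grows with every request, because each request is scheduled before the previous one has finished.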

I've also chatted with @jimczi offline, and we have defined the following tasks to tackle:

  1. Determine the right target-throughput for the http_logs sort with search_after. Running a manual test and checking its performance can help here. Help from the performance team would also be appreciated.
  2. Benchmark desc and asc sort with search_after before merging to a single segment.
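
As a rough aid for the first task, the sketch below estimates an upper bound on the throughput that a given service time can sustain. It assumes each client runs one request at a time; the function name `max_sustainable_throughput` is hypothetical, and the 734.612 ms input is the 50th percentile reported earlier in this thread for desc-sort-with-after-timestamp-after-force-merge-1-seg.

```python
def max_sustainable_throughput(service_time_ms, clients=1):
    """Upper bound in ops/s, assuming each client runs one request at a time."""
    if service_time_ms <= 0:
        raise ValueError("service_time_ms must be positive")
    return clients * 1000.0 / service_time_ms


# 50th percentile service time reported earlier in this thread (ms).
bound = max_sustainable_throughput(734.612)
print(f"one client can sustain at most {bound:.2f} ops/s")
```

A target-throughput above this bound cannot be met by a single client, so requests queue up and latency diverges from service time.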

mayya-sharipova commented Sep 14, 2020

@jimczi I have addressed your comments and ran the benchmarks once more with the updated operations.
Below is a summary of the results.

Full results here

|                       Metric |                                                   Task |       Value |   Unit |
| ----------------------------:|-------------------------------------------------------:|------------:|-------:|
| 50th percentile service time |                                    desc_sort_timestamp |     10.7055 |     ms |
| 90th percentile service time |                                    desc_sort_timestamp |     11.2209 |     ms |
| 99th percentile service time |                                    desc_sort_timestamp |     13.4729 |     ms |
|100th percentile service time |                                    desc_sort_timestamp |      18.994 |     ms |
|
| 50th percentile service time |                         desc_sort_with_after_timestamp |     813.787 |     ms |
| 90th percentile service time |                         desc_sort_with_after_timestamp |     921.016 |     ms |
| 99th percentile service time |                         desc_sort_with_after_timestamp |     1021.73 |     ms |
|100th percentile service time |                         desc_sort_with_after_timestamp |     1144.55 |     ms |    
|                                             
| 50th percentile service time |            desc-sort-timestamp-after-force-merge-1-seg |     53.8716 |     ms |
| 90th percentile service time |            desc-sort-timestamp-after-force-merge-1-seg |     56.3902 |     ms |
| 99th percentile service time |            desc-sort-timestamp-after-force-merge-1-seg |     59.7357 |     ms |
|100th percentile service time |            desc-sort-timestamp-after-force-merge-1-seg |     59.9086 |     ms |
|                                       
| 50th percentile service time | desc-sort-with-after-timestamp-after-force-merge-1-seg |     811.308 |     ms |
| 90th percentile service time | desc-sort-with-after-timestamp-after-force-merge-1-seg |     893.403 |     ms |
| 99th percentile service time | desc-sort-with-after-timestamp-after-force-merge-1-seg |     950.185 |     ms |
|100th percentile service time | desc-sort-with-after-timestamp-after-force-merge-1-seg |     958.059 |     ms |
|
| 50th percentile service time |                                     asc_sort_timestamp |     8.10474 |     ms |
| 90th percentile service time |                                     asc_sort_timestamp |     12.1846 |     ms |
| 99th percentile service time |                                     asc_sort_timestamp |     13.4501 |     ms |
|100th percentile service time |                                     asc_sort_timestamp |     14.9856 |     ms |                
|                                                   
| 50th percentile service time |                          asc_sort_with_after_timestamp |     754.215 |     ms |
| 90th percentile service time |                          asc_sort_with_after_timestamp |      811.58 |     ms |
| 99th percentile service time |                          asc_sort_with_after_timestamp |     883.414 |     ms |
|100th percentile service time |                          asc_sort_with_after_timestamp |     908.114 |     ms |
|                                                    
| 50th percentile service time |             asc-sort-timestamp-after-force-merge-1-seg |     3.69549 |     ms |
| 90th percentile service time |             asc-sort-timestamp-after-force-merge-1-seg |     3.86637 |     ms |
| 99th percentile service time |             asc-sort-timestamp-after-force-merge-1-seg |     4.08493 |     ms |
|100th percentile service time |             asc-sort-timestamp-after-force-merge-1-seg |     4.42581 |     ms |
|
| 50th percentile service time |  asc-sort-with-after-timestamp-after-force-merge-1-seg |      778.27 |     ms |
| 90th percentile service time |  asc-sort-with-after-timestamp-after-force-merge-1-seg |     872.711 |     ms |
| 99th percentile service time |  asc-sort-with-after-timestamp-after-force-merge-1-seg |     905.239 |     ms |
|100th percentile service time |  asc-sort-with-after-timestamp-after-force-merge-1-seg |     920.384 |     ms |
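
To make the gap in the table easier to read, the following sketch computes how much slower each "search_after" task is than its plain-sort counterpart. The 50th percentile values are copied from the table above; the helper name `slowdown` is an assumption for illustration.

```python
# 50th percentile service times in ms, copied from the summary table above.
p50_ms = {
    "desc_sort_timestamp": 10.7055,
    "desc_sort_with_after_timestamp": 813.787,
    "desc-sort-timestamp-after-force-merge-1-seg": 53.8716,
    "desc-sort-with-after-timestamp-after-force-merge-1-seg": 811.308,
    "asc_sort_timestamp": 8.10474,
    "asc_sort_with_after_timestamp": 754.215,
    "asc-sort-timestamp-after-force-merge-1-seg": 3.69549,
    "asc-sort-with-after-timestamp-after-force-merge-1-seg": 778.27,
}


def slowdown(baseline_task, after_task, table=p50_ms):
    """Ratio of the search_after task's service time to the plain sort's."""
    return table[after_task] / table[baseline_task]


pairs = [
    ("desc_sort_timestamp", "desc_sort_with_after_timestamp"),
    ("desc-sort-timestamp-after-force-merge-1-seg",
     "desc-sort-with-after-timestamp-after-force-merge-1-seg"),
    ("asc_sort_timestamp", "asc_sort_with_after_timestamp"),
    ("asc-sort-timestamp-after-force-merge-1-seg",
     "asc-sort-with-after-timestamp-after-force-merge-1-seg"),
]
for baseline, after in pairs:
    print(f"{after}: {slowdown(baseline, after):.1f}x the service time of {baseline}")
```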

@ebadyano
Contributor

@mayya-sharipova Thank you for the changes. I ran on our low-mem env and the target throughput looks okay to me. I will merge the request tomorrow and add the charts to https://elasticsearch-benchmarks.elastic.co later this week. Thanks!

@ebadyano ebadyano merged commit 6bb91d4 into elastic:master Sep 22, 2020
@mayya-sharipova mayya-sharipova deleted the search-after branch September 24, 2020 14:28
dliappis pushed a commit that referenced this pull request Nov 25, 2020