
Specify multiple search clients for easier benchmarking #614

Merged

Conversation

finnroblin
Contributor

Description

Finding the maximum search throughput for an OpenSearch cluster is an important benchmarking scenario in vector search. Currently the only way to figure out the maximum search throughput is to perform multiple benchmark runs with different search client settings (e.g. search_clients = 3, search_clients = 5, ...). Waiting for a run to conclude, changing the config, and rerunning OSB with the associated startup time is tedious. Ideally the maximum throughput could be found automatically.

This PR allows users to provide a clients_list in an operation, which runs the operation once with each client setting.

For instance, a user might specify "clients_list": [1, 5] in their parameters.
Then OSB will schedule search tasks with 1, 5, 10, and 12 clients. The final benchmark results will look something like the following:

|                                                 Min Throughput | search-only_5_clients |     1637.47 |  ops/s |
|                                                Mean Throughput | search-only_5_clients |     1637.47 |  ops/s |
|                                              Median Throughput | search-only_5_clients |     1637.47 |  ops/s |
|                                                 Max Throughput | search-only_5_clients |     1637.47 |  ops/s |
|                                        50th percentile latency | search-only_5_clients |     1.34731 |     ms |
|                                        90th percentile latency | search-only_5_clients |     1.85056 |     ms |
|                                        99th percentile latency | search-only_5_clients |     6.70892 |     ms |
|                                       100th percentile latency | search-only_5_clients |     6.79992 |     ms |
|                                   50th percentile service time | search-only_5_clients |     1.34731 |     ms |
|                                   90th percentile service time | search-only_5_clients |     1.85056 |     ms |
|                                   99th percentile service time | search-only_5_clients |     6.70892 |     ms |
|                                  100th percentile service time | search-only_5_clients |     6.79992 |     ms |
|                                                     error rate | search-only_5_clients |           0 |      % |
|                                  Num clients to max throughput |                       |           5 |        |
|                                                  Mean recall@k | search-only_1_clients |        0.92 |        |
|                                                  Mean recall@1 | search-only_1_clients |        0.99 |        |
|                                                  Mean recall@k | search-only_5_clients |        0.92 |        |
|                                                  Mean recall@1 | search-only_5_clients |        0.99 |        |
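
To illustrate how the "Num clients to max throughput" row could be derived from per-task results, here is a minimal sketch (not the PR's exact implementation; the task names and throughput values below are assumed) that picks the task with the highest mean throughput and extracts the client count from the _N_clients suffix:

import re

# Hypothetical per-task mean throughputs (ops/s), keyed by the generated task names.
mean_throughput_by_task = {
    "search-only_1_clients": 912.3,
    "search-only_5_clients": 1637.47,
}

best_task, best_throughput = max(mean_throughput_by_task.items(), key=lambda kv: kv[1])

# Derive the client count from the task-name suffix added when clients_list is expanded.
match = re.search(r"_(\d+)_clients$", best_task)
best_clients = int(match.group(1)) if match else None

print(f"Num clients to max throughput: {best_clients} ({best_throughput} ops/s)")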

Issues Resolved

Closes #613

Testing

  • New functionality includes testing

Unit tested the loader.py changes and verified with multiple OSB runs that the results publisher output looks good.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on the Developer Certificate of Origin and signing off your commits, please check here.

maybe_match_task_is_part_of_throughput_testing = re.search(throughput_pattern, task)
if maybe_match_task_is_part_of_throughput_testing:

# assumption: all units are the same and only maximizing throughput over one operation (i.e. not both ingest and search).
Contributor Author

Putting emphasis on this comment. Right now, search clients are the main use case for vector search (users want to see search throughput from an inference perspective). I think adding support for multiple clients in any operation can be deferred to a later PR. Any thoughts?

for client in op["clients_list"]:
    op["clients"] = client

    new_name = name + "_" + str(client) + "_clients"
Contributor Author

Based on testing, this renames the entire procedure, not just the operation in the schedule. To see this, run it or look at the test_parse_clients_list unit test below -- the task is renamed "default-test_procedure_1_clients", "default-test_procedure_2_clients", etc. instead of "search-one-client_1_clients", "search-one-client_2_clients", etc.

Collaborator

Based on the unittest below, it looks like this is updating only the operation names within the schedule of the test-procedure default-test_procedure, since schedule[N].name references an operation and not a test-procedure.

If it is supposed to be altering the test-procedure name, there needs to be an assert in the unittest showing that test_procedure.name is no longer default-test_procedure.

Contributor Author

You're right.

It seems like there's a name associated with each element of the schedule (the task name?). It is not necessarily the same as the name of that element's operation, but it is the name that gets changed.
I used a debugger, and the workload object's values are copied below; note that the first entry is named 'default-test-procedure_1_clients' while its operation is still named 'search'.

[name = ['default-test-procedure_1_clients'], operation = [name = ['search'], meta_data = [{}], type = ['search'], params = [{'name': 'search', 'operation-type': 'search', 'index': '_all', 'include-in-results_publishing': True}], param_source = [None]], tags = [[]], meta_data = [{}], warmup_iterations = [None], iterations = [None], warmup_time_period = [None], time_period = [None], clients = [1], completes_parent = [False], schedule = [None], params = [{'name': 'search-one-client', 'operation': 'search', 'clients': 3, 'clients_list': [1, 2, 3]}], nested = [False],
name = ['default-test-procedure_2_clients'], operation = [name = ['search'], meta_data = [{}], type = ['search'], params = [{'name': 'search', 'operation-type': 'search', 'index': '_all', 'include-in-results_publishing': True}], param_source = [None]], tags = [[]], meta_data = [{}], warmup_iterations = [None], iterations = [None], warmup_time_period = [None], time_period = [None], clients = [2], completes_parent = [False], schedule = [None], params = [{'name': 'search-one-client', 'operation': 'search', 'clients': 3, 'clients_list': [1, 2, 3]}], nested = [False],
name = ['default-test-procedure_3_clients'], operation = [name = ['search'], meta_data = [{}], type = ['search'], params = [{'name': 'search', 'operation-type': 'search', 'index': '_all', 'include-in-results_publishing': True}], param_source = [None]], tags = [[]], meta_data = [{}], warmup_iterations = [None], iterations = [None], warmup_time_period = [None], time_period = [None], clients = [3], completes_parent = [False], schedule = [None], params = [{'name': 'search-one-client', 'operation': 'search', 'clients': 3, 'clients_list': [1, 2, 3]}], nested = [False],
name = ['search-two-clients'], operation = [name = ['search'], meta_data = [{}], type = ['search'], params = [{'name': 'search', 'operation-type': 'search', 'index': '_all', 'include-in-results_publishing': True}], param_source = [None]], tags = [[]], meta_data = [{}], warmup_iterations = [None], iterations = [None], warmup_time_period = [None], time_period = [None], clients = [2], completes_parent = [False], schedule = [None], params = [{'name': 'search-two-clients', 'operation': 'search', 'clients': 2}], nested = [False]]
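
To make the task-name/operation-name distinction concrete, here is a minimal sketch with hypothetical classes (not OSB's actual workload model) showing that each schedule element carries its own name, separate from the operation it wraps:

from dataclasses import dataclass

@dataclass
class Operation:
    name: str   # e.g. "search" -- unchanged by the clients_list expansion
    type: str

@dataclass
class Task:
    name: str   # e.g. "default-test-procedure_1_clients" -- this is the name that gets rewritten
    operation: Operation
    clients: int

op = Operation(name="search", type="search")
schedule = [Task(name="default-test-procedure_%d_clients" % c, operation=op, clients=c) for c in [1, 2, 3]]
for task in schedule:
    print(task.name, "wraps operation", task.operation.name, "with", task.clients, "clients")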

for client in op["clients_list"]:
    op["clients"] = client

    new_name = name + "_" + str(client) + "_clients"
Collaborator

Although this is dependent on how the workload was written, we should standardize on either using - or _ and not both in an operation's and test procedure's name. Many users have encountered frustrating errors all because the name had a mix of - and _.

Suggestion: Before renaming, check whether the operation name uses - or _ (a sketch of this check follows the list below).

  1. If it uses -, append "-" + str(client) + "-clients".
  2. If it uses _, append "_" + str(client) + "_clients".
  3. Otherwise, if it already uses both, OSB can default to appending "_" + str(client) + "_clients" to the existing operation name (as you are currently doing). I would also log a warning to the screen that the user's test_procedure has a mix of - and _, which might cause frustrating bugs in the future.
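
A minimal sketch of that separator-aware renaming (the helper name and warning wording are hypothetical, not part of this PR):

import logging

logger = logging.getLogger(__name__)

def append_clients_suffix(name: str, client_count: int) -> str:
    """Append a per-client suffix using the same separator style as the existing name."""
    uses_hyphen = "-" in name
    uses_underscore = "_" in name
    if uses_hyphen and not uses_underscore:
        return f"{name}-{client_count}-clients"
    if uses_underscore and not uses_hyphen:
        return f"{name}_{client_count}_clients"
    # Mixed (or no) separators: fall back to the current behavior and warn the user.
    if uses_hyphen and uses_underscore:
        logger.warning("Name '%s' mixes '-' and '_', which may lead to confusing task names.", name)
    return f"{name}_{client_count}_clients"

# e.g. append_clients_suffix("search-only", 5) -> "search-only-5-clients"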

}
],
"test_procedure": {
"name": "default-test_procedure",
Collaborator

Nit: I would standardize on using hyphens or underscores here

@IanHoang
Collaborator

For instance, a user might specify "clients_list": [1, 5] in their parameters.
Then OSB will schedule search tasks with 1, 5, 10, and 12 clients.

Why would the OSB schedule search tasks with 10 and 12 clients if the parameters only specified 1 and 5?

# assumption: all units are the same and only maximizing throughput over one operation (i.e. not both ingest and search).
# To maximize throughput over multiple operations, would need a list/dictionary of maximum throughputs.
task_throughput = record["throughput"][Throughput.MEAN.value]
logger = logging.getLogger(__name__)
Collaborator

Let's move this into the __init__ method of SummaryResultsPublisher for when we would like to add logging elsewhere in the class.
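
For example, something along these lines (a minimal sketch; the constructor arguments shown are assumptions, not SummaryResultsPublisher's actual signature):

import logging

class SummaryResultsPublisher:
    def __init__(self, results, config):  # constructor arguments are assumed for illustration
        self.results = results
        self.config = config
        # Create the logger once here so any method in the class can log without re-creating it.
        self.logger = logging.getLogger(__name__)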

@finnroblin finnroblin changed the title Multiple search clients for automatic scaling Specify multiple search clients for easier benchmarking Aug 20, 2024
metrics_table.extend(self._publish_error_rate(record, task))
self.add_warnings(warnings, record, task)
maybe_match_task_is_part_of_throughput_testing = re.search(throughput_pattern, task)
if maybe_match_task_is_part_of_throughput_testing:
Collaborator

Could we simplify this to is_task_part_of_throughput_testing?


else:
    self.publish_operational_statistics(metrics_table=metrics_table, warnings=warnings, record=record, task=task)

Collaborator

Nit: It would be nice if there were another comment above this if statement stating that the following block relates to throughput testing / specifying multiple clients within the operation.

@@ -217,6 +251,10 @@ def _publish_recall(self, values, task):
self._line("Mean recall@1", task, recall_1_mean, "", lambda v: "%.2f" % v)
)

def _publish_best_client_settings(self, record, task):
    num_clients = re.search(r"_(\d+)_clients$", task).group(1)
    return self._join(self._line("Num clients that achieved maximium throughput", "", num_clients, ""))
Collaborator

@IanHoang IanHoang Aug 20, 2024

"Num clients" -> "Number of clients"
If this is being surfaced to users in the output, we should refrain from using shorthand.

@@ -145,16 +162,33 @@ def publish(self):

metrics_table.extend(self._publish_transform_stats(stats))

max_throughput = -1
Collaborator

Nit: I would also add a comment above this to specify that the following three variables are related to the clients_list parameter in test_procedures. Newcomers might not quickly understand that this is only relevant to throughput-testing clients.
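
For illustration, such a comment might look like the following sketch (only max_throughput appears in the diff above; the other two variable names are hypothetical placeholders, not the PR's actual names):

# These three variables support the clients_list feature: when a test procedure schedules the
# same operation with several client counts, track which task achieved the highest mean
# throughput so the best client count can be reported at the end of the summary.
max_throughput = -1
task_with_max_throughput = None    # hypothetical name for illustration
record_with_max_throughput = None  # hypothetical name for illustration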

if "parallel" in op:
task = self.parse_parallel(op["parallel"], ops, name)
if "clients_list" in op:
self.logger.info("Clients list specified, running multiple search tasks with %s clients.", op["clients_list"])
Collaborator

Nit: "Clients list specified, running task with multiple clients: %s" might be cleaner?

Contributor Author

I agree this is confusing. Technically we're running multiple tasks, each with multiple clients (or only 1 client).
I changed it to self.logger.info("Clients list specified: %s. Running multiple search tasks, each scheduled with the corresponding number of clients from the list.", op["clients_list"])

Collaborator

@IanHoang IanHoang left a comment

Left some comments

Signed-off-by: Finn Roblin <[email protected]>
Collaborator

@IanHoang IanHoang left a comment

@finnroblin LGTM Thanks for doing this!

@IanHoang IanHoang merged commit 5403700 into opensearch-project:main Sep 5, 2024
10 checks passed

Successfully merging this pull request may close these issues.

Support Multiple Search Client Settings in a Single OSB Run