
Specify multiple search clients for easier benchmarking #614

Merged

Conversation

finnroblin
Contributor

Description

Finding the maximum search throughput for an OpenSearch cluster is an important benchmarking scenario in vector search. Currently the only way to figure out the maximum search throughput is to perform multiple benchmark runs with different search client settings (e.g. search_clients = 3, search_clients = 5, ...). Waiting for a run to conclude, changing the config, and rerunning OSB with the associated startup time is tedious. Ideally the maximum throughput could be found automatically.

This PR allows users to provide a clients_list in an operation, which runs the operation once with each client setting.

For instance, a user might specify "clients_list": [1, 5] in their parameters.
Then OSB will schedule search tasks with 1, 5, 10, and 12 clients. The final benchmark results will look something like the following:

|                                                 Min Throughput | search-only_5_clients |     1637.47 |  ops/s |
|                                                Mean Throughput | search-only_5_clients |     1637.47 |  ops/s |
|                                              Median Throughput | search-only_5_clients |     1637.47 |  ops/s |
|                                                 Max Throughput | search-only_5_clients |     1637.47 |  ops/s |
|                                        50th percentile latency | search-only_5_clients |     1.34731 |     ms |
|                                        90th percentile latency | search-only_5_clients |     1.85056 |     ms |
|                                        99th percentile latency | search-only_5_clients |     6.70892 |     ms |
|                                       100th percentile latency | search-only_5_clients |     6.79992 |     ms |
|                                   50th percentile service time | search-only_5_clients |     1.34731 |     ms |
|                                   90th percentile service time | search-only_5_clients |     1.85056 |     ms |
|                                   99th percentile service time | search-only_5_clients |     6.70892 |     ms |
|                                  100th percentile service time | search-only_5_clients |     6.79992 |     ms |
|                                                     error rate | search-only_5_clients |           0 |      % |
|                                  Num clients to max throughput |                       |           5 |        |
|                                                  Mean recall@k | search-only_1_clients |        0.92 |        |
|                                                  Mean recall@1 | search-only_1_clients |        0.99 |        |
|                                                  Mean recall@k | search-only_5_clients |        0.92 |        |
|                                                  Mean recall@1 | search-only_5_clients |        0.99 |        |
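
To illustrate how the "Num clients to max throughput" row could be derived from per-task results, here is a minimal sketch (not the PR's exact implementation; the task names and throughput values below are assumed) that picks the task with the highest mean throughput and extracts the client count from the _N_clients suffix:

import re

# Hypothetical per-task mean throughputs (ops/s), keyed by the generated task names.
mean_throughput_by_task = {
    "search-only_1_clients": 912.3,
    "search-only_5_clients": 1637.47,
}

best_task, best_throughput = max(mean_throughput_by_task.items(), key=lambda kv: kv[1])

# Derive the client count from the task-name suffix added when clients_list is expanded.
match = re.search(r"_(\d+)_clients$", best_task)
best_clients = int(match.group(1)) if match else None

print(f"Num clients to max throughput: {best_clients} ({best_throughput} ops/s)")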

Issues Resolved

Closes #613

Testing

  • New functionality includes testing

Unit tested the loader.py changes and verified with multiple OSB runs that the results publisher output looks good.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on the Developer Certificate of Origin and signing off your commits, please check here.

maybe_match_task_is_part_of_throughput_testing = re.search(throughput_pattern, task)
if maybe_match_task_is_part_of_throughput_testing:

# assumption: all units are the same and only maximizing throughput over one operation (i.e. not both ingest and search).
Contributor Author

Putting emphasis on this comment. Right now, search clients are the main use case for vector search (users want to see search throughput from an inference perspective). I think adding support for multiple clients in any operation can be deferred to a later PR. Any thoughts?

for client in op["clients_list"]:
    op["clients"] = client

    new_name = name + "_" + str(client) + "_clients"
Contributor Author

Based on testing, this renames the entire procedure, not just the operation in the schedule. To see this, run it or look at the test_parse_clients_list unit test below -- the task is renamed "default-test_procedure_1_clients", "default-test_procedure_2_clients", etc. instead of "search-one-client_1_clients", "search-one-client_2_clients", etc.

Collaborator

Based on the unittest below, it looks like this is updating only the operation names within the schedule of the test-procedure default-test_procedure, since schedule[N].name references an operation and not a test-procedure.

If it is supposed to be altering the test-procedure name, there needs to be an assert in the unittest showing that test_procedure.name is no longer default-test_procedure.

Contributor Author

You're right.

It seems like there's a name associated with each element of the schedule (the task name?). It is not necessarily the same as the name of that element's operation, but it is the name that gets changed.
I used a debugger, and the workload object's values are copied below; note that the first entry is named 'default-test-procedure_1_clients' while its operation is still named 'search'.

[name = ['default-test-procedure_1_clients'], operation = [name = ['search'], meta_data = [{}], type = ['search'], params = [{'name': 'search', 'operation-type': 'search', 'index': '_all', 'include-in-results_publishing': True}], param_source = [None]], tags = [[]], meta_data = [{}], warmup_iterations = [None], iterations = [None], warmup_time_period = [None], time_period = [None], clients = [1], completes_parent = [False], schedule = [None], params = [{'name': 'search-one-client', 'operation': 'search', 'clients': 3, 'clients_list': [1, 2, 3]}], nested = [False],
name = ['default-test-procedure_2_clients'], operation = [name = ['search'], meta_data = [{}], type = ['search'], params = [{'name': 'search', 'operation-type': 'search', 'index': '_all', 'include-in-results_publishing': True}], param_source = [None]], tags = [[]], meta_data = [{}], warmup_iterations = [None], iterations = [None], warmup_time_period = [None], time_period = [None], clients = [2], completes_parent = [False], schedule = [None], params = [{'name': 'search-one-client', 'operation': 'search', 'clients': 3, 'clients_list': [1, 2, 3]}], nested = [False],
name = ['default-test-procedure_3_clients'], operation = [name = ['search'], meta_data = [{}], type = ['search'], params = [{'name': 'search', 'operation-type': 'search', 'index': '_all', 'include-in-results_publishing': True}], param_source = [None]], tags = [[]], meta_data = [{}], warmup_iterations = [None], iterations = [None], warmup_time_period = [None], time_period = [None], clients = [3], completes_parent = [False], schedule = [None], params = [{'name': 'search-one-client', 'operation': 'search', 'clients': 3, 'clients_list': [1, 2, 3]}], nested = [False],
name = ['search-two-clients'], operation = [name = ['search'], meta_data = [{}], type = ['search'], params = [{'name': 'search', 'operation-type': 'search', 'index': '_all', 'include-in-results_publishing': True}], param_source = [None]], tags = [[]], meta_data = [{}], warmup_iterations = [None], iterations = [None], warmup_time_period = [None], time_period = [None], clients = [2], completes_parent = [False], schedule = [None], params = [{'name': 'search-two-clients', 'operation': 'search', 'clients': 2}], nested = [False]]
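
To make the task-name/operation-name distinction concrete, here is a minimal sketch with hypothetical classes (not OSB's actual workload model) showing that each schedule element carries its own name, separate from the operation it wraps:

from dataclasses import dataclass

@dataclass
class Operation:
    name: str   # e.g. "search" -- unchanged by the clients_list expansion
    type: str

@dataclass
class Task:
    name: str   # e.g. "default-test-procedure_1_clients" -- this is the name that gets rewritten
    operation: Operation
    clients: int

op = Operation(name="search", type="search")
schedule = [Task(name="default-test-procedure_%d_clients" % c, operation=op, clients=c) for c in [1, 2, 3]]
for task in schedule:
    print(task.name, "wraps operation", task.operation.name, "with", task.clients, "clients")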

for client in op["clients_list"]:
    op["clients"] = client

    new_name = name + "_" + str(client) + "_clients"
Collaborator

Although this is dependent on how the workload was written, we should standardize on either using - or _ and not both in an operation's and test procedure's name. Many users have encountered frustrating errors all because the name had a mix of - and _.

Suggestion: Before renaming, check whether the operation name uses - or _ (a sketch of this check follows the list below).

  1. If it uses -, append "-" + str(client) + "-clients".
  2. If it uses _, append "_" + str(client) + "_clients".
  3. Otherwise, if it already uses both, OSB can default to appending "_" + str(client) + "_clients" to the existing operation name (as you are currently doing). I would also log a warning to the screen that the user's test_procedure has a mix of - and _, which might cause frustrating bugs in the future.
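
A minimal sketch of that separator-aware renaming (the helper name and warning wording are hypothetical, not part of this PR):

import logging

logger = logging.getLogger(__name__)

def append_clients_suffix(name: str, client_count: int) -> str:
    """Append a per-client suffix using the same separator style as the existing name."""
    uses_hyphen = "-" in name
    uses_underscore = "_" in name
    if uses_hyphen and not uses_underscore:
        return f"{name}-{client_count}-clients"
    if uses_underscore and not uses_hyphen:
        return f"{name}_{client_count}_clients"
    # Mixed (or no) separators: fall back to the current behavior and warn the user.
    if uses_hyphen and uses_underscore:
        logger.warning("Name '%s' mixes '-' and '_', which may lead to confusing task names.", name)
    return f"{name}_{client_count}_clients"

# e.g. append_clients_suffix("search-only", 5) -> "search-only-5-clients"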

}
],
"test_procedure": {
"name": "default-test_procedure",
Collaborator

Nit: I would standardize on using hyphens or underscores here

@IanHoang
Collaborator

For instance, a user might specify "clients_list": [1, 5] in their parameters.
Then OSB will schedule search tasks with 1, 5, 10, and 12 clients.

Why would the OSB schedule search tasks with 10 and 12 clients if the parameters only specified 1 and 5?

# assumption: all units are the same and only maximizing throughput over one operation (i.e. not both ingest and search).
# To maximize throughput over multiple operations, would need a list/dictionary of maximum throughputs.
task_throughput = record["throughput"][Throughput.MEAN.value]
logger = logging.getLogger(__name__)
Collaborator

Let's move this into the __init__ method of SummaryResultsPublisher for when we would like to add logging elsewhere in the class.
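
For example, something along these lines (a minimal sketch; the constructor arguments shown are assumptions, not SummaryResultsPublisher's actual signature):

import logging

class SummaryResultsPublisher:
    def __init__(self, results, config):  # constructor arguments are assumed for illustration
        self.results = results
        self.config = config
        # Create the logger once here so any method in the class can log without re-creating it.
        self.logger = logging.getLogger(__name__)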

@finnroblin finnroblin changed the title Multiple search clients for automatic scaling Specify multiple search clients for easier benchmarking Aug 20, 2024
metrics_table.extend(self._publish_error_rate(record, task))
self.add_warnings(warnings, record, task)
maybe_match_task_is_part_of_throughput_testing = re.search(throughput_pattern, task)
if maybe_match_task_is_part_of_throughput_testing:
Collaborator

Could we simplify this to is_task_part_of_throughput_testing?


else:
    self.publish_operational_statistics(metrics_table=metrics_table, warnings=warnings, record=record, task=task)

Collaborator

Nit: It would be nice if there were another comment above this if statement stating that the following block relates to throughput testing / specifying multiple clients within the operation.

@@ -217,6 +251,10 @@ def _publish_recall(self, values, task):
self._line("Mean recall@1", task, recall_1_mean, "", lambda v: "%.2f" % v)
)

def _publish_best_client_settings(self, record, task):
    num_clients = re.search(r"_(\d+)_clients$", task).group(1)
    return self._join(self._line("Num clients that achieved maximium throughput", "", num_clients, ""))
Collaborator

@IanHoang IanHoang Aug 20, 2024

"Num clients" -> "Number of clients"
If this is being surfaced to users in the output, we should refrain from using shorthand.

@@ -145,16 +162,33 @@ def publish(self):

metrics_table.extend(self._publish_transform_stats(stats))

max_throughput = -1
Collaborator

Nit: I would also add a comment above this to specify that the following three variables are related to the clients_list parameter in test_procedures. Newcomers might not quickly understand that this is only relevant to throughput-testing clients.
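
For illustration, such a comment might look like the following sketch (only max_throughput appears in the diff above; the other two variable names are hypothetical placeholders, not the PR's actual names):

# These three variables support the clients_list feature: when a test procedure schedules the
# same operation with several client counts, track which task achieved the highest mean
# throughput so the best client count can be reported at the end of the summary.
max_throughput = -1
task_with_max_throughput = None    # hypothetical name for illustration
record_with_max_throughput = None  # hypothetical name for illustration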

if "parallel" in op:
task = self.parse_parallel(op["parallel"], ops, name)
if "clients_list" in op:
self.logger.info("Clients list specified, running multiple search tasks with %s clients.", op["clients_list"])
Collaborator

Nit: "Clients list specified, running task with multiple clients: %s" might be cleaner?

Contributor Author

I agree this is confusing. Technically we're running multiple tasks, each with multiple clients (or only 1 client).
I changed it to self.logger.info("Clients list specified: %s. Running multiple search tasks, each scheduled with the corresponding number of clients from the list.", op["clients_list"])

Collaborator

@IanHoang IanHoang left a comment

Left some comments

Signed-off-by: Finn Roblin <[email protected]>
Collaborator

@IanHoang IanHoang left a comment

@finnroblin LGTM Thanks for doing this!

@IanHoang IanHoang merged commit 5403700 into opensearch-project:main Sep 5, 2024
10 checks passed

Successfully merging this pull request may close these issues.

Support Multiple Search Client Settings in a Single OSB Run