Converting InferenceRequest to InferInput #644

Open
ZhanqiuHu opened this issue May 10, 2024 · 0 comments

@ZhanqiuHu

I have a decoupled model (Python backend) that receives requests from a client and forwards them to a downstream model; this intermediate model processes only some of the inputs and passes the rest through to the next model.

Currently, I convert the inputs to numpy arrays first and then wrap them in InferInput:

import numpy as np
import triton_python_backend_utils as pb_utils
import tritonclient.grpc as triton_grpc

for input_name in self.input_names[1:]:
    # Deserialize the request tensor into a flat numpy array...
    data_ = pb_utils.get_input_tensor_by_name(request, input_name).as_numpy().reshape(-1)
    # ...then re-wrap and re-serialize it as a gRPC client input.
    input_ = triton_grpc.InferInput(input_name, data_.shape,
                                    "FP32" if data_.dtype == np.float32 else "INT32")
    input_.set_data_from_numpy(data_)
    inputs.append(input_)
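(As an aside, I could presumably use tritonclient.utils.np_to_triton_dtype instead of hard-coding the dtype string, which would also handle dtypes other than float32/int32:

    from tritonclient.utils import np_to_triton_dtype

    # Maps a numpy dtype to the corresponding Triton dtype string, e.g. "FP32".
    input_ = triton_grpc.InferInput(input_name, data_.shape,
                                    np_to_triton_dtype(data_.dtype))

But that doesn't address the copying itself.)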

However, I believe .as_numpy() and .set_data_from_numpy() each perform some (de)serialization, so copying most of the inputs in a for loop like this is somewhat inefficient.

Is there a way to convert InferenceRequest to InferInput more efficiently?
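For context, one alternative I have considered (assuming BLS is an option for the downstream hop, rather than going through the gRPC client) is building a pb_utils.InferenceRequest and passing the original pb_utils.Tensor objects through directly, which would skip the numpy round-trip. A rough sketch, where the model and output names are placeholders:

    import triton_python_backend_utils as pb_utils

    # Forward the request tensors as-is, without converting to numpy.
    tensors = [pb_utils.get_input_tensor_by_name(request, name)
               for name in self.input_names[1:]]
    bls_request = pb_utils.InferenceRequest(
        model_name="downstream_model",        # placeholder model name
        requested_output_names=["OUTPUT0"],   # placeholder output name
        inputs=tensors)
    # For a decoupled downstream model, exec(decoupled=True) returns an
    # iterator of responses instead of a single response.
    bls_response = bls_request.exec()

But I'm not sure whether this is the intended pattern, or whether something similar is possible when the downstream call has to go through InferInput.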

Thanks!
