Received message larger than max (4199881 vs. 4194304) #74

Closed
pselden opened this issue Oct 27, 2020 · 19 comments

@pselden commented Oct 27, 2020

I have a TFX pipeline that runs in Kubeflow on GCP. Recently one of my pipelines started failing with the following error in ResolverNode.latest_model_resolver and ResolverNode.latest_blessed_model_resolver:

Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 165, in _call_method
    response.CopyFrom(grpc_method(request))
  File "/usr/local/lib/python3.7/dist-packages/grpc/_channel.py", line 826, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/usr/local/lib/python3.7/dist-packages/grpc/_channel.py", line 729, in _end_unary_response_blocking
    raise _InactiveRpcError(state)
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
	status = StatusCode.RESOURCE_EXHAUSTED
	details = "Received message larger than max (4199881 vs. 4194304)"
	debug_error_string = "{"created":"@1603760693.874743930","description":"Received message larger than max (4199881 vs. 4194304)","file":"src/core/ext/filters/message_size/message_size_filter.cc","file_line":203,"grpc_status":8}"
>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/tfx-src/tfx/orchestration/kubeflow/container_entrypoint.py", line 360, in <module>
    main()
  File "/tfx-src/tfx/orchestration/kubeflow/container_entrypoint.py", line 353, in main
    execution_info = launcher.launch()
  File "/tfx-src/tfx/orchestration/launcher/base_component_launcher.py", line 197, in launch
    self._exec_properties)
  File "/tfx-src/tfx/orchestration/launcher/base_component_launcher.py", line 166, in _run_driver
    component_info=self._component_info)
  File "/tfx-src/tfx/components/common_nodes/resolver_node.py", line 73, in pre_execution
    source_channels=input_dict.copy())
  File "/tfx-src/tfx/dsl/experimental/latest_artifacts_resolver.py", line 56, in resolve
    output_key=c.output_key)
  File "/tfx-src/tfx/orchestration/metadata.py", line 323, in get_qualified_artifacts
    executions = self.store.get_executions_by_context(context.id)
  File "/usr/local/lib/python3.7/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 1080, in get_executions_by_context
    self._call('GetExecutionsByContext', request, response)
  File "/usr/local/lib/python3.7/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 140, in _call
    return self._call_method(method_name, request, response)
  File "/usr/local/lib/python3.7/dist-packages/ml_metadata/metadata_store/metadata_store.py", line 170, in _call_method
    raise _make_exception(e.details(), e.code().value[0])  # pytype: disable=attribute-error
ml_metadata.errors.ResourceExhaustedError: Received message larger than max (4199881 vs. 4194304)

Is there a way to fix this on my side?

@pselden (Author) commented Oct 27, 2020

I was able to fix this by deleting some items from the Associations table for the given context.

This is obviously just a band-aid. The root of the problem seems to be that, since there is no way to do filtering in ml-metadata, TFX has to load ALL executions just to pick the latest one.

@hughmiao (Contributor)

Hi @pselden, you can use the gRPC options to increase the size.

/cc @ruoyu90 on the resolver logic refactoring.

We are also working on filtering. Please stay tuned.
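
For illustration, a minimal sketch of setting those channel options when talking to the MLMD gRPC service directly; the service address and context id below are placeholders, not values from this thread:

```python
import grpc

from ml_metadata.proto import metadata_store_service_pb2
from ml_metadata.proto import metadata_store_service_pb2_grpc

# Hypothetical MLMD gRPC service address; adjust to your deployment.
channel = grpc.insecure_channel(
    "metadata-grpc-service.kubeflow:8080",
    options=[
        # Raise gRPC's default 4 MB receive limit so larger responses are accepted.
        ("grpc.max_receive_message_length", 32 * 1024 * 1024),
        ("grpc.max_send_message_length", 32 * 1024 * 1024),
    ],
)
stub = metadata_store_service_pb2_grpc.MetadataStoreServiceStub(channel)

request = metadata_store_service_pb2.GetExecutionsByContextRequest()
request.context_id = 42  # hypothetical context id
response = stub.GetExecutionsByContext(request)
print(len(response.executions))
```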

@hughmiao (Contributor)

@pselden, related to the discussion here, we will surface the pagination API to the python client.

@ConverJens

> Hi @pselden, you can use the gRPC options to increase the size.
>
> /cc @ruoyu90 on the resolver logic refactoring.
>
> We are also working on filtering. Please stay tuned.

@hughmiao I encountered the same issue when running the Evaluator component in TFX while slicing on continuous features. I'm also running MLMD in Kubeflow; is there a way to set the gRPC max message length as a command-line option, like grpc-port?

@hughmiao (Contributor)

@ConverJens, the config can be passed in the TFX pipeline MLMD config settings.

> I'm also running MLMD in Kubeflow; is there a way to set the gRPC max message length as a command-line option, like grpc-port?

For the Kubeflow deployment, + @dushyanthsc for KFP settings / command-line options.
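
For reference, a rough sketch of where the MLMD connection config is wired into a TFX pipeline targeting Kubeflow; the host/port values are placeholders and the exact proto fields may differ across TFX versions:

```python
from tfx.orchestration.kubeflow import kubeflow_dag_runner

# Start from TFX's default MLMD gRPC settings for Kubeflow and override them.
metadata_config = kubeflow_dag_runner.get_default_kubeflow_metadata_config()
metadata_config.grpc_config.grpc_service_host.value = "metadata-grpc-service"  # placeholder host
metadata_config.grpc_config.grpc_service_port.value = "8080"  # placeholder port

runner_config = kubeflow_dag_runner.KubeflowDagRunnerConfig(
    kubeflow_metadata_config=metadata_config)
# `pipeline` is your own TFX pipeline object (not defined here).
kubeflow_dag_runner.KubeflowDagRunner(config=runner_config).run(pipeline)
```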

@redramen

> ... we will surface the pagination API to the python client.

@hughmiao Do you happen to have an ETA for this? We are building on top of MLMD at Twitter, and pagination is a must-have for us.

@dushyanthsc

@redramen I should have the pagination support surfaced in the Python client by the end of this week.

@hughmiao (Contributor)

@redramen Sounds good, we will prioritize this.

Thanks, @dushyanthsc! Let's follow up in the CL.

/cc @ruoyu90 for TFX-side changes if needed.

@ConverJens

> @ConverJens, the config can be passed in the TFX pipeline MLMD config settings.
>
> > I'm also running MLMD in Kubeflow; is there a way to set the gRPC max message length as a command-line option, like grpc-port?
>
> For the Kubeflow deployment, + @dushyanthsc for KFP settings / command-line options.

@dushyanthsc Any documentation on how to specify this?

@dushyanthsc

@ConverJens We plan to use ListOperationOptions (https://github.com/google/ml-metadata/blob/master/ml_metadata/proto/metadata_store.proto#L638) in the gRPC service layer and have the Python client get executions by calling the gRPC API with a page size.

I am working on the CL and should have a point release out early next week.
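
Roughly, driving the paginated RPC directly from Python will look like the sketch below; the service address and context id are placeholders, and it assumes a server/proto version where GetExecutionsByContextRequest carries ListOperationOptions:

```python
import grpc

from ml_metadata.proto import metadata_store_service_pb2
from ml_metadata.proto import metadata_store_service_pb2_grpc

# Hypothetical MLMD gRPC service address; adjust to your deployment.
channel = grpc.insecure_channel("metadata-grpc-service.kubeflow:8080")
stub = metadata_store_service_pb2_grpc.MetadataStoreServiceStub(channel)

executions = []
next_page_token = ""
while True:
    request = metadata_store_service_pb2.GetExecutionsByContextRequest()
    request.context_id = 42  # hypothetical context id
    request.options.max_result_size = 100  # page size, keeps each response well under 4 MB
    if next_page_token:
        request.options.next_page_token = next_page_token
    response = stub.GetExecutionsByContext(request)
    executions.extend(response.executions)
    next_page_token = response.next_page_token
    if not next_page_token:
        break
```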

tfx-copybara pushed a commit that referenced this issue Nov 25, 2020
…gination and ordering results by ID, create time and last update time fields.

The change further uses the exposed options in the get_executions_by_context / get_artifacts_by_context Python APIs to address the feature request in #74.

PiperOrigin-RevId: 344194942
@dushyanthsc

@ConverJens @redramen A solution for the problem is checked in and available at HEAD.

The solution uses the pagination support to retrieve executions in pages of 100 per page and abstracts this logic behind get_executions_by_context.

If you can elaborate on how you consume MLMD, i.e. from source code, a released version, or through TFX, we can decide whether we need to cut a point release.
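
From the Python client the call itself does not change after upgrading; a minimal usage sketch (host, port and context id are placeholders):

```python
from ml_metadata import metadata_store
from ml_metadata.proto import metadata_store_pb2

# Hypothetical gRPC client config; point it at your MLMD service.
client_config = metadata_store_pb2.MetadataStoreClientConfig()
client_config.host = "metadata-grpc-service.kubeflow"
client_config.port = 8080

store = metadata_store.MetadataStore(client_config)
# With the change at HEAD, this pages through executions internally instead of
# fetching them all in one oversized response.
executions = store.get_executions_by_context(42)  # hypothetical context id
```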

@ConverJens commented Nov 26, 2020

@dushyanthsc That sounds great for the paging problem. However, my problem was that the proto message from a single Evaluator run was too large, so I would like to specify this option for MLMD in the Kubeflow installation, not just from a client. Any idea how this can be achieved? Perhaps as a command-line option, like grpc-port?

And regarding how we consume MLMD: through Jupyter using the kubeflow-metadata package and through TFX pipelines. The pagination is not a blocker for our use case at the moment.

@dushyanthsc

@ConverJens What is the underlying MLMD API call that the Evaluator makes? Can you provide the log of the error you are seeing? That way I can confirm whether the change solves your problem.

As for the config flag to increase the allowed payload size: you can set the command-line parameter [1], which for a kubeflow-metadata deployment gets passed from the deployment manifest [2].

[1] - https://github.com/google/ml-metadata/blob/master/ml_metadata/metadata_store/metadata_store_server_main.cc#L142

[2] - https://github.com/kubeflow/manifests/blob/master/metadata/base/metadata-deployment.yaml#L26

@dushyanthsc

Adding @Bobgy to comment on the current state of support for the kubeflow-metadata package, based on what I see in [1].

[1] - kubeflow/manifests#1638

@Bobgy commented Nov 28, 2020

@dushyanthsc Can I confirm the lowest MLMD version with pagination capabilities?

I think we'll need to upgrade the server too.

Regarding the kubeflow-metadata Python client package, there are no maintainers any more, so I'd suggest planning a migration to an alternative.

@redramen

> @ConverJens @redramen A solution for the problem is checked in and available at HEAD.
>
> The solution uses the pagination support to retrieve executions in pages of 100 per page and abstracts this logic behind get_executions_by_context.
>
> If you can elaborate on how you consume MLMD, i.e. from source code, a released version, or through TFX, we can decide whether we need to cut a point release.

We use the latest versioned release available on PyPI, so we'll be able to use this whenever a new version is released.

@dushyanthsc

@Bobgy The pagination support for GetArtifacts, GetExecutions and GetContexts has been available in the gRPC service since release 0.23.0.

@redramen Got it. I will have the point release started today and will update this thread when it is complete.

@hughmiao (Contributor) commented Dec 1, 2020

Thanks, @dushyanthsc. @redramen, in addition to using the 0.25.1 Python client release, you also need to use the 0.25.1 server binary, which adds pagination to the GetExecutionsByContext RPC used in TFX. /cc @Bobgy

@hughmiao (Contributor) commented Dec 8, 2020

Closing the issue, as the wheel and server are released. Please feel free to reopen.
