Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Unity][MSC] Avoid depending on trivial bindings in Relax intermediate #16349

Conversation

Lunderberg
Copy link
Contributor

The conversion from tensorflow to MSC is done by first converting from tensorflow to relay, then converting from relay to executable python code, executing that python code to generate relax, and finally converting from relax to MSC. During the relax phase of this conversion, some relax IRModule are applied, including FuseOpsByPattern.

The test cases in test_msc/test_translate_tensorflow.py rely on FuseOpsByPattern preserving trivial bindings (e.g. var_1 = var_2) in the relax IRModule. If these trivial bindings are removed by CanonicalizeBindings, then the test cases in this file fail. The presence or absence of trivial bindings FuseOpsByPattern should be considered an implementation detail, and relax passes should not be required to preserve trivial bindings.

This PR updates the relay to executable python step of the tensorflow to MSC conversion, to remove trivial bindings and output a variable name that matches the expected value in the test case. While not an ideal resolution, as other variable name changes could still reintroduce the same test failures, it ensures that FuseOpsByPattern may canonicalize bindings as an internal pre- or post-processing step without breaking these unit tests.

The conversion from tensorflow to MSC is done by first converting from
tensorflow to relay, then converting from relay to executable python
code, executing that python code to generate relax, and finally
converting from relax to MSC.  During the relax phase of this
conversion, some relax `IRModule` are applied, including
`FuseOpsByPattern`.

The test cases in `test_msc/test_translate_tensorflow.py` rely on
`FuseOpsByPattern` preserving trivial bindings (e.g. `var_1 = var_2`)
in the relax IRModule.  If these trivial bindings are removed by
`CanonicalizeBindings`, then the test cases in this file fail.  The
presence or absence of trivial bindings `FuseOpsByPattern` should be
considered an implementation detail, and relax passes should not be
required to preserve trivial bindings.

This PR updates the relay to executable python step of the tensorflow
to MSC conversion, to remove trivial bindings and output a variable
name that matches the expected value in the test case.  While not an
ideal resolution, as other variable name changes could still
reintroduce the same test failures, it ensures that `FuseOpsByPattern`
may canonicalize bindings as an internal pre- or post-processing step
without breaking these unit tests.
@@ -105,6 +105,7 @@ void RelaxCodeGen::CodeGenGraph() {
}
}
stack_.func_call("block_builder.emit_output", idx_exit).call_arg(idx_exit);
stack_.call_arg(DocUtils::ToStrDoc(e->name), "name_hint");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A PR is just merged (hours ago), which change DocUtils::ToStrDoc -> DocUtils::ToStr. I expand some DocNodes for plugin building, and the DocUtils apis are simplified. That happened to conflict with your changes, please rename this ToStrDoc ->ToStr, and I'll wait this PR merged before adding plugin builder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, and conflicts are now resolved.

In looking at the three failing unit tests in CI, I ended up preserving the existing names, and instead avoiding the existence of the trivial bindings by using non-dataflow bindings. From a cursory reading of the MSC conversion steps, the dataflow block doesn't seem required, and removing it simplifies the Relax variable naming considerably. Is my understanding correct, and is there anywhere that relies on the presence of a dataflow block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the unit tests have quite a bit of dependency on the exact functional form of the relax functions generated by the MSC to relax conversion.

  • If R.shape or R.const appears as its own binding, or as part of an expression.
  • If trivial bindings are preserved.
  • If a R.const appears in a return statement, or in a binding.

None of these impact the semantic meaning of the relax compute graph, but they are causing failures for assert_structural_equal, because, well, they aren't structurally equal. Normally, because it isn't important to preserve these details in a round-trip, I'd adjust the test cases to avoid irrelevant details. However, because the test cases are generated using tvm.relax.frontend.torch.from_fx utility, this can't be done either.

So, the design of the test cases requires that tvm.relax.frontend.torch.from_fx and tvm.contrib.msc.framework.tvm.codegen have the exact same behavior when outputting trivial variable bindings. The use of FuseOpsByPattern by tvm.contrib.msc.framework.tvm.codegen requires that FuseOpsByPattern preserve those trivial bindings. This is far tighter coupling that these components should have.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. About codegen the dataflow: I haven't test the codegen without dataflow() scope. I build the codegen of relax like: https:/apache/tvm/blob/unity/python/tvm/relax/frontend/torch/fx_translator.py#L1471-L1473. The from_fx function use a dataflow as the scope for the model, so I build the codegen likely. I think removing the dataflow() only have influence on relax codegen, thus test_translate_relax.py, test_translate_relay.py, test_runner.py and test_manager.py may be influenced.

2.Yes, the assert_structural_equal may be too strict for testing. These failures can be fix by adding a build_target, like: https:/apache/tvm/blob/unity/tests/python/contrib/test_msc/test_translate_relay.py#L164, then the test will only compare the results without using assert_structural_equal. The use of assert_structural_equal is mainly for saving test time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that was my worry, and what I was trying to avoid doing. The CI in the main branch gets really slow from time to time as too many unit tests require running the full compilation stack, rather than just running the specific functionality being tested. Probably something to circle back to at some point.

For now, the tests in test_translate_relax.py are updated to validate that the round trip of relax -> MSC -> relax has no impact on the output of the compiled module.

The potential for duplicate variable names was introduced by having
the `block_builder.emit_output` call, which is only required to export
values from a dataflow block.  The dataflow block is not used in any
later MSC conversion, and its removal avoids this re-export of
variables.

If the dataflow block is required in the future, it can be generated
using `tvm.relax.transform.ConvertToDataflowBlock`.
@Lunderberg
Copy link
Contributor Author

@Archermmt With CI now passing, can you review?

@Archermmt
Copy link
Contributor

Seems ok to me. Please ask @Hzfengsy to merge.

@Hzfengsy Hzfengsy merged commit b69d720 into apache:unity Jan 12, 2024
15 checks passed
@Lunderberg Lunderberg deleted the unity_msc_avoid_depending_on_trivial_bindings_in_test branch January 12, 2024 14:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants