This repository has been archived by the owner on Oct 13, 2021. It is now read-only.

Bidirectional Masking support #400

Merged: 6 commits merged into onnx:master from cjermain:bug/bidirectional_masking on Mar 6, 2020

Conversation

@cjermain (Contributor) commented Mar 1, 2020

This PR fixes the case of a Bidirectional LSTM used in conjunction with a Masking layer. This follows the same solution as #386. The new graph connects the masking output to a section that computes the sequence lengths, which provides proper masking for padded input. Since the GRU and SimpleRNN layers are not yet supported in the bidirectional case, this PR does not cover them.
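For context, a minimal sketch of the kind of model this PR targets (the layer sizes and mask value here are illustrative, not taken from the PR's test):

import tensorflow as tf
import keras2onnx

# Masking marks all-zero timesteps as padding; the Bidirectional LSTM should
# then ignore them in both directions.
model = tf.keras.Sequential([
    tf.keras.layers.Masking(mask_value=0.0, input_shape=(None, 4)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(8)),
    tf.keras.layers.Dense(1),
])

# Convert to ONNX; with this PR, padded timesteps are handled through the
# sequence lengths computed from the masking output.
onnx_model = keras2onnx.convert_keras(model, model.name)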

[Attached image: temp_sequential.onnx graph (1)]

@cjermain (Contributor, Author) commented Mar 1, 2020

The model appears to be failing on Python 3.5 because of the shape of the initial_h input. This does not occur on Python 3.6 or 3.7.

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running LSTM node. Name:'bi' Status Message: Input initial_h must have shape {2,2,8}. Actual:{2,1,8}

@cjermain (Contributor, Author) commented Mar 1, 2020

I fixed the issue by removing the initial_h and initial_c inputs, so they default to 0. This avoids the issue in opsets < 9, where the batch_size was being explicitly set to 1 in order to build the same zero inputs for both initial variables. This simplifies the graph and corrects that bug.

[Attached image: temp_sequential.onnx graph (2)]

oopb = OnnxOperatorBuilder(container, scope)

if container.target_opset < 9:
    # need the zero initializer to correct some engine shape inference bug.
    state_shape = (2, 1, hidden_size)
cjermain (Contributor, Author):

This section is removed because this line does not handle the batch size properly.

wenbingl (Member):

I was surprised to find that there is no CI build failure from removing the initial state support.
But we do have some user models that depend on initial states. And the current code can only support batch_size == 1; either we raise an exception for the batch_size > 1 case or we fix it.

cjermain (Contributor, Author):

Yeah, the tests are not great in this area (I've been thinking about refactoring them using pytest.mark.parametrize to get better coverage). The implementation I made does not limit the batch size; it just does not support the initial_h and initial_c inputs, which is consistent with the current code. I would suggest that we merge this PR and I can address the remainder in a separate effort. I already have a local branch targeting support for GRU and SimpleRNN in the Bidirectional wrapper. I think it is worth adding support for initial_h and initial_c in that PR instead of this one, so that I can add support for all RNN layers at once. Does that make sense to you?
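A rough sketch of what that parametrized refactor could look like (the model, sizes, and tolerances here are illustrative, not the project's actual test code):

import numpy as np
import pytest
import tensorflow as tf
import keras2onnx
import onnxruntime

@pytest.mark.parametrize("batch_size", [1, 2])
@pytest.mark.parametrize("units", [4, 8])
def test_bidirectional_lstm_masking(batch_size, units):
    model = tf.keras.Sequential([
        tf.keras.layers.Masking(mask_value=0.0, input_shape=(3, 2)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units)),
    ])
    data = np.random.rand(batch_size, 3, 2).astype(np.float32)
    data[:, -1, :] = 0.0  # padded final timestep

    onnx_model = keras2onnx.convert_keras(model, model.name)
    sess = onnxruntime.InferenceSession(onnx_model.SerializeToString())
    onnx_out = sess.run(None, {sess.get_inputs()[0].name: data})[0]
    np.testing.assert_allclose(model.predict(data), onnx_out, rtol=1e-3, atol=1e-5)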

wenbingl (Member):

Do you think the current code doesn't work with initial_c/initial_h? I only see a hard-coded batch_size when opset < 9.

Initial states are an important feature for some RNN models. What is the conflict between initial states and masking?

If you want your implementation to be simpler, you can raise an exception when both initial states and masking are enabled, so your code only has to handle the non-initial-states case safely.
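A minimal sketch of that exception-based fallback (the flag names are hypothetical, not the converter's actual variables):

def check_bidirectional_support(has_initial_states, has_masking):
    # Refuse the combination until it is properly supported, rather than
    # silently producing a wrong graph.
    if has_initial_states and has_masking:
        raise NotImplementedError(
            "Bidirectional RNN with both initial states and a Masking "
            "layer is not supported yet.")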

@cjermain (Contributor, Author) commented Mar 4, 2020

Yea, the current code does not work for initial_h or initial_c for the Bidirectional case. In the opset < 9 case, the batch size is hard coded to 1. In the opset >= 9 case the following occurs:

state_constant_shape_h = oopb.add_node('ConstantOfShape',
                                       [batch_size_tensor],
                                       operator.inputs[0].full_name + '_state_shape_constant_h')
state_constant_shape_c = oopb.add_node('ConstantOfShape',
                                       [batch_size_tensor],
                                       operator.inputs[0].full_name + '_state_shape_constant_c')

This section uses ConstantOfShape. Since the value input for this node is not specified, "it defaults to a tensor of value 0 and datatype float32". That means that it is just an elaborate way to construct a zeros tensor. In my PR I just removed these lines, because the inputs default to zero tensors -- no need to compute a zero tensor based on the input shape when the LSTM operator does that by default.
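As an illustration (a standalone sketch, not the converter's code), an ONNX LSTM node can simply leave the trailing optional initial_h/initial_c inputs off, and the runtime then uses zero tensors for them:

from onnx import helper

# Only X, W, R, B, and sequence_lens are wired up; initial_h and initial_c
# are omitted, so they default to zeros per the ONNX LSTM spec.
lstm_node = helper.make_node(
    "LSTM",
    inputs=["X", "W", "R", "B", "seq_lens"],
    outputs=["Y", "Y_h", "Y_c"],
    direction="bidirectional",
    hidden_size=8,
)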

You can see this by investigating the model produced by the test I added in adbc41c.
https://filebin.net/xuv4xj6jp2wqcz8m

I reduced this model to output the initial_h and initial_c, and can confirm that it is always producing zeros based on the shape as I describe above.

I agree that the initial states are important, but suggest that we address them in a separate PR. This PR provides masking support, independent of the current issues with the initial values. I included the changes in 86d11a8 because the builds for opset < 9 would otherwise fail, since I am testing a batch size of 2. This PR fixes the batch size limit, while otherwise producing the same results as before.

wenbingl (Member):

OK, please keep the opset < 9 code, because I recall there is a bug in some old onnx packages and WinML users may still be affected by it. So let's just keep it as it was.

For opset >= 9, we can find a better solution in the future.

cjermain (Contributor, Author):

OK, I've reverted the change. I am now skipping the test for opset < 9, since batch sizes > 1 are not supported properly there, as discussed above. I will put up a separate PR to add GRU and SimpleRNN support to the Bidirectional converter. Can you point me to more information on the opset < 9 issue?

wenbingl (Member):

The onnx package has a shape inference pass that is used by the inference runtime. However, in one important onnx release there was a bug where shape inference would fail without initial states, even when they are all zero. Let me see if I can dig out the bug details from GitHub later.
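For reference, that shape-inference pass can be run directly with the onnx package to check whether a converted model trips it (the file name here is just a placeholder):

import onnx
from onnx import shape_inference

model = onnx.load("converted_model.onnx")  # placeholder path
inferred = shape_inference.infer_shapes(model)

# Inspect the inferred intermediate shapes.
for value_info in inferred.graph.value_info:
    print(value_info.name, value_info.type)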

cjermain (Contributor, Author):

Ok, thanks for the context!


@cjermain cjermain requested a review from wenbingl March 6, 2020 02:22
@wenbingl wenbingl merged commit 4fa3b76 into onnx:master Mar 6, 2020
@cjermain cjermain deleted the bug/bidirectional_masking branch March 7, 2020 02:25
2 participants