This repository has been archived by the owner on Oct 13, 2021. It is now read-only.

Bidirectional Masking support #400

Merged: 6 commits merged into onnx:master from cjermain:bug/bidirectional_masking on Mar 6, 2020

Conversation

@cjermain (Contributor) commented Mar 1, 2020

This PR fixes the case of a Bidirectional LSTM used in conjunction with a Masking layer. This follows the same solution as #386. The new graph connects the masking output to a section that computes the sequence lengths, which provides proper masking for padded input. Since the GRU and SimpleRNN layers are not yet supported in the bidirectional case, this PR does not cover them.
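For context, a minimal sketch of the kind of model this PR targets (the layer sizes and mask value here are illustrative, not taken from the PR's test):

import tensorflow as tf
import keras2onnx

# Masking marks all-zero timesteps as padding; the Bidirectional LSTM should
# then ignore them in both directions.
model = tf.keras.Sequential([
    tf.keras.layers.Masking(mask_value=0.0, input_shape=(None, 4)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(8)),
    tf.keras.layers.Dense(1),
])

# Convert to ONNX; with this PR, padded timesteps are handled through the
# sequence lengths computed from the masking output.
onnx_model = keras2onnx.convert_keras(model, model.name)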

[Attached image: temp_sequential.onnx graph (1)]

@cjermain (Contributor, Author) commented Mar 1, 2020

The model appears to be failing on Python 3.5 because of the shape of the initial_h input. This does not occur on Python 3.6 or 3.7.

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running LSTM node. Name:'bi' Status Message: Input initial_h must have shape {2,2,8}. Actual:{2,1,8}

@cjermain (Contributor, Author) commented Mar 1, 2020

I fixed the issue by removing the initial_h and initial_c inputs, so they default to 0. This avoids the issue in opsets < 9, where the batch_size was being explicitly set to 1 in order to build the same zero inputs for both initial variables. This simplifies the graph and corrects that bug.

[Attached image: temp_sequential.onnx graph (2)]

oopb = OnnxOperatorBuilder(container, scope)

if container.target_opset < 9:
    # need the zero initializer to correct some engine shape inference bug.
    state_shape = (2, 1, hidden_size)
cjermain (Contributor, Author):

This section is removed because this line does not handle the batch size properly.

wenbingl (Member):

I was surprised to find that there is no CI build failure from removing the initial state support.
But we do have some user models that depend on initial states. And the current code can only support batch_size == 1; either we raise an exception for the batch_size > 1 case or we fix it.

cjermain (Contributor, Author):

Yeah, the tests are not great in this area (I've been thinking about refactoring them using pytest.mark.parametrize to get better coverage). The implementation I made does not limit the batch size; it just does not support the initial_h and initial_c inputs, which is consistent with the current code. I would suggest that we merge this PR and I can address the remainder in a separate effort. I already have a local branch targeting support for GRU and SimpleRNN in the Bidirectional wrapper. I think it is worth adding support for initial_h and initial_c in that PR instead of this one, so that I can add support for all RNN layers at once. Does that make sense to you?
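A rough sketch of what that parametrized refactor could look like (the model, sizes, and tolerances here are illustrative, not the project's actual test code):

import numpy as np
import pytest
import tensorflow as tf
import keras2onnx
import onnxruntime

@pytest.mark.parametrize("batch_size", [1, 2])
@pytest.mark.parametrize("units", [4, 8])
def test_bidirectional_lstm_masking(batch_size, units):
    model = tf.keras.Sequential([
        tf.keras.layers.Masking(mask_value=0.0, input_shape=(3, 2)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(units)),
    ])
    data = np.random.rand(batch_size, 3, 2).astype(np.float32)
    data[:, -1, :] = 0.0  # padded final timestep

    onnx_model = keras2onnx.convert_keras(model, model.name)
    sess = onnxruntime.InferenceSession(onnx_model.SerializeToString())
    onnx_out = sess.run(None, {sess.get_inputs()[0].name: data})[0]
    np.testing.assert_allclose(model.predict(data), onnx_out, rtol=1e-3, atol=1e-5)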

wenbingl (Member):

Do you think the current code doesn't work with initial_c/initial_h? I only see a hard-coded batch_size when opset < 9.

Initial states are an important feature for some RNN models. What is the conflict between initial states and masking?

If you want your implementation to be simpler, you can raise an exception when both initial states and masking are enabled, so your code only has to handle the non-initial-states case safely.
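A minimal sketch of that exception-based fallback (the flag names are hypothetical, not the converter's actual variables):

def check_bidirectional_support(has_initial_states, has_masking):
    # Refuse the combination until it is properly supported, rather than
    # silently producing a wrong graph.
    if has_initial_states and has_masking:
        raise NotImplementedError(
            "Bidirectional RNN with both initial states and a Masking "
            "layer is not supported yet.")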

@cjermain (Contributor, Author) commented Mar 4, 2020

Yea, the current code does not work for initial_h or initial_c for the Bidirectional case. In the opset < 9 case, the batch size is hard coded to 1. In the opset >= 9 case the following occurs:

state_constant_shape_h = oopb.add_node('ConstantOfShape',
                                       [batch_size_tensor],
                                       operator.inputs[0].full_name + '_state_shape_constant_h')
state_constant_shape_c = oopb.add_node('ConstantOfShape',
                                       [batch_size_tensor],
                                       operator.inputs[0].full_name + '_state_shape_constant_c')

This section uses ConstantOfShape. Since the value input for this node is not specified, "it defaults to a tensor of value 0 and datatype float32". That means that it is just an elaborate way to construct a zeros tensor. In my PR I just removed these lines, because the inputs default to zero tensors -- no need to compute a zero tensor based on the input shape when the LSTM operator does that by default.
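As an illustration (a standalone sketch, not the converter's code), an ONNX LSTM node can simply leave the trailing optional initial_h/initial_c inputs off, and the runtime then uses zero tensors for them:

from onnx import helper

# Only X, W, R, B, and sequence_lens are wired up; initial_h and initial_c
# are omitted, so they default to zeros per the ONNX LSTM spec.
lstm_node = helper.make_node(
    "LSTM",
    inputs=["X", "W", "R", "B", "seq_lens"],
    outputs=["Y", "Y_h", "Y_c"],
    direction="bidirectional",
    hidden_size=8,
)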

You can see this by investigating the model produced by the test I added in adbc41c.
https://filebin.net/xuv4xj6jp2wqcz8m

I reduced this model to output the initial_h and initial_c, and can confirm that it is always producing zeros based on the shape as I describe above.

I agree that the initial states are important, but suggest that we address them in a separate PR. This PR provides masking support, independent of the current issues with the initial values. I included the changes in 86d11a8 because the builds for opset < 9 would otherwise fail, since I am testing a batch size of 2. This PR fixes the batch size limit, while otherwise producing the same results as before.

wenbingl (Member):

OK, please keep the opset < 9 code, because I recall there is a bug in some old onnx packages and WinML users may still be affected by it. So let's just keep it as it was.

For opset >= 9, we can find a better solution in the future.

cjermain (Contributor, Author):

OK, I've reverted the change. I am now skipping the test for opset < 9, since batch sizes > 1 are not supported properly there, as discussed above. I will put up a separate PR to add GRU and SimpleRNN support to the Bidirectional converter. Can you point me to more information on the opset < 9 issue?

wenbingl (Member):

The onnx package has a shape inference pass that is used by the inference runtime. However, in one important onnx release there was a bug where shape inference would fail without initial states, even when they are all zero. Let me see if I can dig out the bug details from GitHub later.
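For reference, that shape-inference pass can be run directly with the onnx package to check whether a converted model trips it (the file name here is just a placeholder):

import onnx
from onnx import shape_inference

model = onnx.load("converted_model.onnx")  # placeholder path
inferred = shape_inference.infer_shapes(model)

# Inspect the inferred intermediate shapes.
for value_info in inferred.graph.value_info:
    print(value_info.name, value_info.type)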

cjermain (Contributor, Author):

Ok, thanks for the context!


@cjermain cjermain requested a review from wenbingl March 6, 2020 02:22
@wenbingl wenbingl merged commit 4fa3b76 into onnx:master Mar 6, 2020
@cjermain cjermain deleted the bug/bidirectional_masking branch March 7, 2020 02:25
2 participants