The model appears to be failing on Python 3.5 because of the shape of the
keras2onnx/ke2onnx/bidirectional.py (outdated):

```python
oopb = OnnxOperatorBuilder(container, scope)

if container.target_opset < 9:
    # need the zero initializer to correct some engine shape inference bug.
    state_shape = (2, 1, hidden_size)
```
This section is removed because this line does not handle the batch size properly.
I was surprised to find that there is no CI build failure when the initial state support is removed.
But we do have some user models that depend on initial states. And the current code can only support batch_size == 1, so either we raise an exception for the batch_size > 1 case or we fix it.
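The batch-size limitation comes from the hard-coded `state_shape = (2, 1, hidden_size)` above: per the ONNX spec, the LSTM `initial_h`/`initial_c` inputs have shape `[num_directions, batch_size, hidden_size]`, so fixing the middle dimension at 1 fixes the batch size at 1. A minimal sketch (variable names here are illustrative, not from the codebase):

```python
import numpy as np

num_directions = 2   # bidirectional LSTM
hidden_size = 8
batch_size = 1       # hard-coded in the snippet above

# ONNX LSTM expects initial_h / initial_c with shape
# [num_directions, batch_size, hidden_size].
state_shape = (num_directions, batch_size, hidden_size)
initial_h = np.zeros(state_shape, dtype=np.float32)

# Any batch larger than 1 no longer matches this initializer's shape.
assert initial_h.shape == (2, 1, 8)
```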
Yea, the tests are not great in this area (I've been thinking of refactoring the tests using pytest.mark.parametrize to get better coverage). The implementation I made does not limit the batch size; it just does not support the initial_h and initial_c inputs, which is consistent with the current code. I would suggest that we merge this PR and I can address the remainder in a separate effort. I already have a local branch targeting support for GRU and SimpleRNN in the Bidirectional wrapper. I think it is worth adding support for initial_h and initial_c in that PR, instead of the current one. That way, I can add support for all RNN layers at once. Does that make sense to you?
Do you think the current code doesn't work for initial_c/initial_h? I only see a hard-coded batch_size when opset < 9.
Initial states are an important feature for some RNN models. What is the conflict between initial states and masking?
If you want your implementation to be simpler, you can raise an exception when both initial states and masking are enabled. That way your code can safely handle only the non-initial-states case.
Yea, the current code does not work for initial_h or initial_c in the Bidirectional case. In the opset < 9 case, the batch size is hard-coded to 1. In the opset >= 9 case, the following occurs:
keras-onnx/keras2onnx/ke2onnx/bidirectional.py, lines 208 to 213 in 30d47f4:

```python
state_constant_shape_h = oopb.add_node('ConstantOfShape',
                                       [batch_size_tensor],
                                       operator.inputs[0].full_name + '_state_shape_constant_h')
state_constant_shape_c = oopb.add_node('ConstantOfShape',
                                       [batch_size_tensor],
                                       operator.inputs[0].full_name + '_state_shape_constant_c')
```
This section uses ConstantOfShape. Since the value input for this node is not specified, "it defaults to a tensor of value 0 and datatype float32". That means it is just an elaborate way to construct a zeros tensor. In my PR I simply removed these lines: the LSTM state inputs default to zero tensors, so there is no need to compute a zero tensor based on the input shape when the LSTM operator does that by default.
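For reference, a minimal numpy sketch of what ConstantOfShape computes when its optional `value` attribute is omitted (per the ONNX spec, the output is filled with float32 zeros); the helper name here is hypothetical, not part of the onnx package:

```python
import numpy as np

def constant_of_shape(shape, value=None):
    # Mimics ONNX ConstantOfShape: when the optional `value` attribute is
    # omitted, the output is a tensor of float32 zeros of the given shape.
    if value is None:
        value = np.float32(0)
    value = np.asarray(value)
    return np.full(tuple(shape), value, dtype=value.dtype)

# The graph above feeds the runtime batch-size tensor as the shape input,
# so the node just materializes a zeros tensor of the state shape.
out = constant_of_shape([2, 4, 8])
assert out.dtype == np.float32 and not out.any()
```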
You can see this by investigating the model produced by the test I added in adbc41c.
https://filebin.net/xuv4xj6jp2wqcz8m
I reduced this model to output initial_h and initial_c, and can confirm that it always produces zeros based on the shape, as I described above.
I agree that the initial states are important, but I suggest we address that in a separate PR. This PR provides masking support, independent of the current issues with the initial values. I included the changes in 86d11a8 because the builds for opset < 9 would otherwise fail, since I am testing a batch size of 2. This PR fixes the batch size limit while otherwise producing the same results as before.
OK, please keep the opset < 9 code, because I recall there is a bug in some old onnx package, and WinML users may still be affected by it. So let's keep it as it was.
For opset >= 9, we can find a better solution in the future.
Ok, I've reverted the change. I am now skipping the test for opset < 9, since batch sizes > 1 are not supported properly, as discussed above. I will put up a separate PR to add GRU and SimpleRNN support to the Bidirectional converter. Can you point me to more information on the opset < 9 issue?
The onnx package has a shape inference pass that is used by the inference runtime. However, in one important onnx release there was a bug where shape inference would fail when the initial states were absent, even though they default to all zeros. Let me see if I can dig up the bug details from GitHub later.
Ok, thanks for the context!
This PR fixes the case of a Bidirectional LSTM when used in conjunction with a Masking layer. This follows the same solution as #386. The new graph connects the masking output to a section that computes the sequence lengths, which provides proper masking for padded input. Since the GRU and SimpleRNN layers are not yet supported for the bidirectional case, this PR does not include those cases.
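The sequence-length computation described above can be sketched as follows: given a boolean mask like the one a Keras Masking layer produces (True for real time steps, False for padding), the per-sample lengths are just the row sums, which is what the ONNX LSTM op consumes through its sequence_lens input. This is an illustrative numpy sketch of the idea, not the converter's actual graph code:

```python
import numpy as np

def sequence_lengths(mask):
    # mask: (batch_size, time_steps) boolean array, True for real steps.
    # ONNX LSTM takes these lengths via its sequence_lens input (int32).
    return mask.sum(axis=1).astype(np.int32)

# Two padded sequences of lengths 3 and 1.
mask = np.array([[True, True, True, False],
                 [True, False, False, False]])
print(sequence_lengths(mask))  # [3 1]
```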