
Fix for Lower unsupported pooling sizes for the CPU to Reference backend #3177 #3468

Open
wants to merge 23 commits into develop

Conversation

aditya-167 (Author)

Hi @pfultz2, @CharlieL7,

I am working on #3177 (this is my first PR). @umangyadav suggested that you could help review my changes.

Problem Description

OneDNN has known limitations when handling specific pooling configurations involving padding, stride, and kernel size:

OneDNN fails for certain combinations such as padding = {2}, stride = {1}, and lengths = {2}. This configuration causes the pooling operation to fail.
OneDNN has a kernel size limitation: the referenced OneDNN documentation specifies a maximum dimension size of 14 for the pooling kernel (referred to as the "weights tensor"). MIGraphX does not currently enforce this limit, which can lead to failures during execution.

Solution Overview

This PR addresses these limitations by detecting such problematic pooling configurations and falling back to the reference backend (a CPU-based implementation) when OneDNN is unable to execute the pooling operation.

Condition Checking:

The pooling operator is analyzed to check whether its configuration violates OneDNN's known limitations:
Check whether the kernel size (lengths) exceeds the maximum of 14 in any dimension.
Check for combinations of padding, stride, and kernel size that are known to fail with OneDNN.
If OneDNN cannot execute the pooling operation due to these limitations, the code falls back to the reference backend (a CPU-based pooling implementation).
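The checks above can be sketched as a small standalone predicate. This is a hypothetical helper for illustration only: `needs_ref_fallback`, the per-dimension cap of 14, and the hard-coded bad combination mirror the PR description, not the PR's actual code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical predicate mirroring the checks described above; returns true
// when the pooling configuration should fall back to the reference backend.
bool needs_ref_fallback(const std::vector<std::size_t>& lengths,
                        const std::vector<std::size_t>& padding,
                        const std::vector<std::size_t>& stride)
{
    // Assumed per-dimension kernel-size cap of 14.
    const std::size_t max_kernel = 14;
    if(std::any_of(lengths.begin(), lengths.end(), [&](std::size_t len) {
           return len > max_kernel;
       }))
        return true;

    // The specific failing combination reported in the issue:
    // padding = {2}, stride = {1}, lengths = {2}.
    return padding == std::vector<std::size_t>{2} &&
           stride == std::vector<std::size_t>{1} &&
           lengths == std::vector<std::size_t>{2};
}
```

A real implementation would check each spatial dimension of padding/stride together rather than the single reported shape; this sketch only encodes the two conditions named above.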

Fallback Mechanism:

For invalid configurations, the pooling operation is replaced with a reference backend pooling operator (ref::pooling).
Valid configurations continue to use the OneDNN pooling operation (dnnl::pooling).
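The dispatch decision can be modeled in miniature like this. It is a simplified sketch, not the MIGraphX lowering API: `pooling_op_name` and its signature are illustrative assumptions, and the 14-element cap is the limit assumed in the PR description.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative dispatch: choose which pooling operator the lowering pass
// would emit, based on the kernel lengths. Oversized kernels fall back to
// the reference CPU implementation; everything else keeps OneDNN.
std::string pooling_op_name(const std::vector<std::size_t>& lengths)
{
    const bool too_large = std::any_of(lengths.begin(), lengths.end(),
                                       [](std::size_t len) { return len > 14; });
    return too_large ? "ref::pooling" : "dnnl::pooling";
}
```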

Code Changes:

https://github.com/aditya-167/AMDMIGraphX/blob/054055f93d6e65aa8b3d7c46a701156b97d15b8d/src/targets/cpu/lowering.cpp#L432C5-L453C10

Please suggest any improvements if needed (this is my first PR) :-). Also, could you let me know how to test this? I was thinking of creating a test file for it, but I was wondering whether one already exists in the repo that I could add to; I couldn't find one for this operation.

@CharlieL7 (Collaborator) left a comment

Looks good overall. Add a verify test in test/verify/ to make sure this works. Verify tests run the whole compilation process on all targets. The test should look something like this: test/verify/test_pooling_autopad.cpp

@aditya-167 (Author)

> Looks good overall. Add a verify test in test/verify/ to make sure this works. Verify tests run the whole compilation process on all targets. The test should look something like this: test/verify/test_pooling_autopad.cpp

Okay, I have added a test, test_pooling_fallback.cpp, that checks three cases: one valid, and two (kernel size > 14 and an invalid pooling configuration) that should trigger the fallback.

@CharlieL7 (Collaborator)

CharlieL7 commented Sep 24, 2024

Failing to compile, needs fix. Check CI runs for the errors.

@aditya-167 (Author)

> Failing to compile, needs fix. Check CI runs for the errors.

I have updated the tests; it seems I was not handling it the right way before. It compiles in my local repo now.

codecov bot commented Sep 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.05%. Comparing base (2c2085d) to head (4405589).
Report is 3 commits behind head on develop.

Additional details and impacted files
@@            Coverage Diff            @@
##           develop    #3468    +/-   ##
=========================================
  Coverage    92.04%   92.05%            
=========================================
  Files          506      508     +2     
  Lines        20872    21174   +302     
=========================================
+ Hits         19212    19491   +279     
- Misses        1660     1683    +23     
Flag Coverage Δ
92.05% <ø> (+<0.01%) ⬆️


@CharlieL7 (Collaborator)

I'm going to help this PR along and make some changes to pass the rest of the CI.

@aditya-167 (Author)

> I'm going to help this PR along and make some changes to pass the rest of the CI.

Thanks @CharlieL7. I reviewed the CI; it fails at cppcheck and format, and I'm not sure how to resolve that, since it compiles and all tests pass locally.

Comment on lines 444 to 445
if(std::any_of(lengths.begin(), lengths.end(), [](int len) { return len > 14; }) ||
std::any_of(padding.begin(), padding.end(), [](int pad) { return pad > 14; }))
@CharlieL7 (Collaborator) commented Sep 27, 2024
The padding condition is not tested and I'm pretty sure is not the correct limitation for OneDNN. I'm also not seeing the 14 kernel length limitation in the OneDNN spec: https://oneapi-src.github.io/oneDNN/dev_guide_pooling.html#doxid-dev-guide-pooling-1dg-pool-impl-limits.

@CharlieL7 (Collaborator) left a comment

It's not clear to me what the actual limitation for this kernel is from OneDNN. Umang did write a TODO comment, but the link provided does not mention such limitations. Need to find a source that lays out the limitation or other proof that OneDNN cannot handle these cases.

@aditya-167 (Author)

> It's not clear to me what the actual limitation for this kernel is from OneDNN. Umang did write a TODO comment, but the link provided does not mention such limitations. Need to find a source that lays out the limitation or other proof that OneDNN cannot handle these cases.

Earlier I was referring to https://oneapi-src.github.io/oneDNN/dev_guide_convolution.html#doxid-dev-guide-convolution; if you look at the OneDNN algorithm section, you will see "Weights tensor width does not exceed 14." I think I mixed up a few things here. I need to dig into this more and test a few cases where it fails.

@CharlieL7 (Collaborator)

Right, I see the limit for the direct algorithm for convolution; but this is for pooling. Technically pooling can be implemented as a convolution, but the pooling page doesn't mention anything about having the same limitations.

@aditya-167 (Author)

@pfultz2, as rightly pointed out by @CharlieL7, the OneDNN spec (https://oneapi-src.github.io/oneDNN/dev_guide_pooling.html#doxid-dev-guide-pooling-1dg-pool-impl-limits) doesn't explicitly mention kernel-length or stride limits for pooling; earlier I was referring to the convolution doc. However, in the issue comments @umangyadav mentioned two padding cases where lengths = {3} fails. My question is: do I have to test each pooling configuration, find where it first fails, and use those conditions for the fallback? I am thinking of using binary search over a range of lengths to find where it first starts to fail, and putting that condition in the reference CPU fallback.
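The binary-search idea could be sketched like this. It is a standalone illustration, not MIGraphX code: `smallest_failing_length` is a hypothetical helper, and it assumes failures are monotonic in the length (once a length fails, all larger lengths fail). The `fails` callback would wrap an actual compile-and-run probe of a pooling configuration.

```cpp
#include <cstddef>
#include <functional>

// Find the smallest kernel length in [lo, hi] for which the probe reports
// failure, assuming failures are monotonic in the length. If no length in
// the range fails, returns hi.
std::size_t smallest_failing_length(std::size_t lo, std::size_t hi,
                                    const std::function<bool(std::size_t)>& fails)
{
    while(lo < hi)
    {
        const std::size_t mid = lo + (hi - lo) / 2;
        if(fails(mid))
            hi = mid;     // mid fails: the boundary is at mid or below
        else
            lo = mid + 1; // mid works: the boundary is above mid
    }
    return lo;
}
```

If OneDNN's failures turn out not to be monotonic in the kernel length, a linear sweep over the candidate configurations would be needed instead.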

@migraphx-bot (Collaborator)

Test | Batch | Rate new (b9fe91) | Rate old (0d07c2) | Diff | Compare
torchvision-resnet50 64 3,263.47 3,260.52 0.09%
torchvision-resnet50_fp16 64 6,994.34 6,982.39 0.17%
torchvision-densenet121 32 2,432.66 2,430.89 0.07%
torchvision-densenet121_fp16 32 4,091.62 4,067.06 0.60%
torchvision-inceptionv3 32 1,638.11 1,636.77 0.08%
torchvision-inceptionv3_fp16 32 2,764.51 2,755.18 0.34%
cadene-inceptionv4 16 776.17 775.67 0.06%
cadene-resnext64x4 16 808.64 808.09 0.07%
slim-mobilenet 64 7,534.19 7,531.83 0.03%
slim-nasnetalarge 64 211.48 211.39 0.04%
slim-resnet50v2 64 3,501.30 3,495.72 0.16%
bert-mrpc-onnx 8 1,151.19 1,140.79 0.91%
bert-mrpc-tf 1 494.24 447.09 10.55% 🔆
pytorch-examples-wlang-gru 1 423.82 357.24 18.64% 🔆
pytorch-examples-wlang-lstm 1 377.64 320.71 17.75% 🔆
torchvision-resnet50_1 1 788.81 776.99 1.52%
cadene-dpn92_1 1 401.12 430.45 -6.81% 🔴
cadene-resnext101_1 1 380.29 378.42 0.49%
onnx-taau-downsample 1 342.96 341.70 0.37%
dlrm-criteoterabyte 1 33.34 32.05 4.02% 🔆
dlrm-criteoterabyte_fp16 1 52.75 50.61 4.21% 🔆
agentmodel 1 8,426.09 6,696.76 25.82% 🔆
unet_fp16 2 58.86 58.59 0.47%
resnet50v1_fp16 1 922.79 840.65 9.77% 🔆
resnet50v1_int8 1 983.99 931.26 5.66% 🔆
bert_base_cased_fp16 64 1,171.06 1,170.54 0.05%
bert_large_uncased_fp16 32 363.59 363.52 0.02%
bert_large_fp16 1 200.40 196.02 2.24%
distilgpt2_fp16 16 2,203.86 2,199.51 0.20%
yolov5s 1 541.58 536.04 1.03%
tinyllama 1 43.44 43.33 0.27%
vicuna-fastchat 1 170.68 171.81 -0.65%
whisper-tiny-encoder 1 418.42 416.70 0.41%
whisper-tiny-decoder 1 428.82 418.85 2.38%

This build is not recommended to merge 🔴

@migraphx-bot (Collaborator)


     ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

     ✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

     ✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

     ✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

     ✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

     ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

     ✅ agentmodel: PASSED: MIGraphX meets tolerance

     ✅ unet: PASSED: MIGraphX meets tolerance

     ✅ resnet50v1: PASSED: MIGraphX meets tolerance

     ✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ bert_large: PASSED: MIGraphX meets tolerance

     ✅ yolov5s: PASSED: MIGraphX meets tolerance

     ✅ tinyllama: PASSED: MIGraphX meets tolerance

     ✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

     ✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance
