
Fix for Lower unsupported pooling sizes for the CPU to Reference backend #3177 #3468

Open
wants to merge 23 commits into develop

Conversation

aditya-167 (Author)

Hi @pfultz2, @CharlieL7,

I am working on #3177 (this is my first PR). @umangyadav suggested that you could help review my changes.

Problem Description

OneDNN has known limitations when handling specific pooling configurations involving padding, stride, and kernel size:

OneDNN fails for certain combinations such as padding = {2}, stride = {1}, and lengths = {2}. This configuration causes the pooling operation to fail.
OneDNN has a kernel size limitation: the referenced OneDNN documentation specifies a maximum dimension size of 14 for the pooling kernel (referred to as the "weights tensor"). MIGraphX does not currently enforce this limit, which can lead to failures during execution.

Solution Overview

This PR addresses these limitations by detecting such problematic pooling configurations and falling back to the reference backend (a CPU-based implementation) when OneDNN is unable to execute the pooling operation.

Condition Checking:

The pooling operator is analyzed to check whether its configuration violates OneDNN's known limitations:
Check whether the kernel size (lengths) exceeds the maximum of 14 in any dimension.
Check for combinations of padding, stride, and kernel size that are known to fail with OneDNN.
If OneDNN cannot execute the pooling operation due to these limitations, the code falls back to the reference backend (a CPU-based pooling implementation).
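The checks above can be sketched as a small standalone predicate. This is a hypothetical helper for illustration only: `needs_ref_fallback`, the per-dimension cap of 14, and the hard-coded bad combination mirror the PR description, not the PR's actual code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical predicate mirroring the checks described above; returns true
// when the pooling configuration should fall back to the reference backend.
bool needs_ref_fallback(const std::vector<std::size_t>& lengths,
                        const std::vector<std::size_t>& padding,
                        const std::vector<std::size_t>& stride)
{
    // Assumed per-dimension kernel-size cap of 14.
    const std::size_t max_kernel = 14;
    if(std::any_of(lengths.begin(), lengths.end(), [&](std::size_t len) {
           return len > max_kernel;
       }))
        return true;

    // The specific failing combination reported in the issue:
    // padding = {2}, stride = {1}, lengths = {2}.
    return padding == std::vector<std::size_t>{2} &&
           stride == std::vector<std::size_t>{1} &&
           lengths == std::vector<std::size_t>{2};
}
```

A real implementation would check each spatial dimension of padding/stride together rather than the single reported shape; this sketch only encodes the two conditions named above.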

Fallback Mechanism:

For invalid configurations, the pooling operation is replaced with a reference backend pooling operator (ref::pooling).
Valid configurations continue to use the OneDNN pooling operation (dnnl::pooling).
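The dispatch decision can be modeled in miniature like this. It is a simplified sketch, not the MIGraphX lowering API: `pooling_op_name` and its signature are illustrative assumptions, and the 14-element cap is the limit assumed in the PR description.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative dispatch: choose which pooling operator the lowering pass
// would emit, based on the kernel lengths. Oversized kernels fall back to
// the reference CPU implementation; everything else keeps OneDNN.
std::string pooling_op_name(const std::vector<std::size_t>& lengths)
{
    const bool too_large = std::any_of(lengths.begin(), lengths.end(),
                                       [](std::size_t len) { return len > 14; });
    return too_large ? "ref::pooling" : "dnnl::pooling";
}
```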

Code Changes:

https://github.com/aditya-167/AMDMIGraphX/blob/054055f93d6e65aa8b3d7c46a701156b97d15b8d/src/targets/cpu/lowering.cpp#L432C5-L453C10

Please suggest any improvements if needed (this is my first PR) :-). Also, could you let me know how to test this? I was thinking of creating a test file for it, but I was wondering whether one already exists in the repo that I could add to; I couldn't find one for this operation.

@CharlieL7 (Collaborator) left a comment

Looks good overall. Add a verify test in test/verify/ to make sure this works. Verify tests run the whole compilation process on all targets. The test should look something like this: test/verify/test_pooling_autopad.cpp

@aditya-167 (Author)

> Looks good overall. Add a verify test in test/verify/ to make sure this works. Verify tests run the whole compilation process on all targets. The test should look something like this: test/verify/test_pooling_autopad.cpp

Okay, I have added a test, test_pooling_fallback.cpp, that checks three cases: one valid, and two (kernel size > 14 and an invalid pooling configuration) that should trigger the fallback.

@CharlieL7 (Collaborator)

CharlieL7 commented Sep 24, 2024

Failing to compile, needs fix. Check CI runs for the errors.

@aditya-167 (Author)

> Failing to compile, needs fix. Check CI runs for the errors.

I have updated the tests; it seems I was not handling it the right way before. It compiles in my local repo now.

codecov bot commented Sep 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.05%. Comparing base (2c2085d) to head (4405589).
Report is 3 commits behind head on develop.

Additional details and impacted files
@@            Coverage Diff            @@
##           develop    #3468    +/-   ##
=========================================
  Coverage    92.04%   92.05%            
=========================================
  Files          506      508     +2     
  Lines        20872    21174   +302     
=========================================
+ Hits         19212    19491   +279     
- Misses        1660     1683    +23     
Flag Coverage Δ
92.05% <ø> (+<0.01%) ⬆️


@CharlieL7 (Collaborator)

I'm going to help this PR along and make some changes to pass the rest of the CI.

@aditya-167 (Author)

> I'm going to help this PR along and make some changes to pass the rest of the CI.

Thanks @CharlieL7. I reviewed the CI; it fails at cppcheck and format, and I'm not sure how to resolve that, since it compiles and all tests pass locally.

Comment on lines 444 to 445
if(std::any_of(lengths.begin(), lengths.end(), [](int len) { return len > 14; }) ||
std::any_of(padding.begin(), padding.end(), [](int pad) { return pad > 14; }))
@CharlieL7 (Collaborator) commented Sep 27, 2024
The padding condition is not tested and I'm pretty sure is not the correct limitation for OneDNN. I'm also not seeing the 14 kernel length limitation in the OneDNN spec: https://oneapi-src.github.io/oneDNN/dev_guide_pooling.html#doxid-dev-guide-pooling-1dg-pool-impl-limits.

@CharlieL7 (Collaborator) left a comment

It's not clear to me what the actual limitation for this kernel is from OneDNN. Umang did write a TODO comment, but the link provided does not mention such limitations. Need to find a source that lays out the limitation or other proof that OneDNN cannot handle these cases.

@aditya-167 (Author)

> It's not clear to me what the actual limitation for this kernel is from OneDNN. Umang did write a TODO comment, but the link provided does not mention such limitations. Need to find a source that lays out the limitation or other proof that OneDNN cannot handle these cases.

Earlier I was referring to https://oneapi-src.github.io/oneDNN/dev_guide_convolution.html#doxid-dev-guide-convolution; if you look at the OneDNN algorithm section, you will see "Weights tensor width does not exceed 14." I think I mixed up a few things here. I need to dig into this more and test a few cases where it fails.

@CharlieL7 (Collaborator)

Right, I see the limit for the direct algorithm for convolution; but this is for pooling. Technically pooling can be implemented as a convolution, but the pooling page doesn't mention anything about having the same limitations.

@aditya-167 (Author)

@pfultz2, as rightly pointed out by @CharlieL7, the OneDNN spec (https://oneapi-src.github.io/oneDNN/dev_guide_pooling.html#doxid-dev-guide-pooling-1dg-pool-impl-limits) doesn't explicitly mention kernel-length or stride limits for pooling; earlier I was referring to the convolution doc. However, in the issue comments @umangyadav mentioned two padding cases where lengths = {3} fails. My question is: do I have to test each pooling configuration, find where it first fails, and use those conditions for the fallback? I am thinking of using binary search over a range of lengths to find where it first starts to fail, and putting that condition in the reference CPU fallback.
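The binary-search idea could be sketched like this. It is a standalone illustration, not MIGraphX code: `smallest_failing_length` is a hypothetical helper, and it assumes failures are monotonic in the length (once a length fails, all larger lengths fail). The `fails` callback would wrap an actual compile-and-run probe of a pooling configuration.

```cpp
#include <cstddef>
#include <functional>

// Find the smallest kernel length in [lo, hi] for which the probe reports
// failure, assuming failures are monotonic in the length. If no length in
// the range fails, returns hi.
std::size_t smallest_failing_length(std::size_t lo, std::size_t hi,
                                    const std::function<bool(std::size_t)>& fails)
{
    while(lo < hi)
    {
        const std::size_t mid = lo + (hi - lo) / 2;
        if(fails(mid))
            hi = mid;     // mid fails: the boundary is at mid or below
        else
            lo = mid + 1; // mid works: the boundary is above mid
    }
    return lo;
}
```

If OneDNN's failures turn out not to be monotonic in the kernel length, a linear sweep over the candidate configurations would be needed instead.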

@migraphx-bot (Collaborator)

Test | Batch | Rate new (b9fe91) | Rate old (0d07c2) | Diff | Compare
torchvision-resnet50 64 3,263.47 3,260.52 0.09%
torchvision-resnet50_fp16 64 6,994.34 6,982.39 0.17%
torchvision-densenet121 32 2,432.66 2,430.89 0.07%
torchvision-densenet121_fp16 32 4,091.62 4,067.06 0.60%
torchvision-inceptionv3 32 1,638.11 1,636.77 0.08%
torchvision-inceptionv3_fp16 32 2,764.51 2,755.18 0.34%
cadene-inceptionv4 16 776.17 775.67 0.06%
cadene-resnext64x4 16 808.64 808.09 0.07%
slim-mobilenet 64 7,534.19 7,531.83 0.03%
slim-nasnetalarge 64 211.48 211.39 0.04%
slim-resnet50v2 64 3,501.30 3,495.72 0.16%
bert-mrpc-onnx 8 1,151.19 1,140.79 0.91%
bert-mrpc-tf 1 494.24 447.09 10.55% 🔆
pytorch-examples-wlang-gru 1 423.82 357.24 18.64% 🔆
pytorch-examples-wlang-lstm 1 377.64 320.71 17.75% 🔆
torchvision-resnet50_1 1 788.81 776.99 1.52%
cadene-dpn92_1 1 401.12 430.45 -6.81% 🔴
cadene-resnext101_1 1 380.29 378.42 0.49%
onnx-taau-downsample 1 342.96 341.70 0.37%
dlrm-criteoterabyte 1 33.34 32.05 4.02% 🔆
dlrm-criteoterabyte_fp16 1 52.75 50.61 4.21% 🔆
agentmodel 1 8,426.09 6,696.76 25.82% 🔆
unet_fp16 2 58.86 58.59 0.47%
resnet50v1_fp16 1 922.79 840.65 9.77% 🔆
resnet50v1_int8 1 983.99 931.26 5.66% 🔆
bert_base_cased_fp16 64 1,171.06 1,170.54 0.05%
bert_large_uncased_fp16 32 363.59 363.52 0.02%
bert_large_fp16 1 200.40 196.02 2.24%
distilgpt2_fp16 16 2,203.86 2,199.51 0.20%
yolov5s 1 541.58 536.04 1.03%
tinyllama 1 43.44 43.33 0.27%
vicuna-fastchat 1 170.68 171.81 -0.65%
whisper-tiny-encoder 1 418.42 416.70 0.41%
whisper-tiny-decoder 1 428.82 418.85 2.38%

This build is not recommended to merge 🔴

@migraphx-bot (Collaborator)


     ✅ bert-mrpc-onnx: PASSED: MIGraphX meets tolerance

     ✅ bert-mrpc-tf: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-gru: PASSED: MIGraphX meets tolerance

     ✅ pytorch-examples-wlang-lstm: PASSED: MIGraphX meets tolerance

     ✅ torchvision-resnet50_1: PASSED: MIGraphX meets tolerance

     ✅ cadene-dpn92_1: PASSED: MIGraphX meets tolerance

     ✅ cadene-resnext101_1: PASSED: MIGraphX meets tolerance

     ✅ dlrm-criteoterabyte: PASSED: MIGraphX meets tolerance

     ✅ agentmodel: PASSED: MIGraphX meets tolerance

     ✅ unet: PASSED: MIGraphX meets tolerance

     ✅ resnet50v1: PASSED: MIGraphX meets tolerance

     ✅ bert_base_cased_fp16: PASSED: MIGraphX meets tolerance

🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output


     ✅ bert_large: PASSED: MIGraphX meets tolerance

     ✅ yolov5s: PASSED: MIGraphX meets tolerance

     ✅ tinyllama: PASSED: MIGraphX meets tolerance

     ✅ vicuna-fastchat: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-encoder: PASSED: MIGraphX meets tolerance

     ✅ whisper-tiny-decoder: PASSED: MIGraphX meets tolerance

     ✅ distilgpt2_fp16: PASSED: MIGraphX meets tolerance
