-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for Lower unsupported pooling sizes for the CPU to Reference backend #3177 #3468
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. Add a verify test in the test/verify/
to make sure this works. Verify tests run the whole compilation process on all targets. The test should look something like this: test/verify/test_pooling_autopad.cpp`
okay, I have added a test: test_pooling_fallback.cpp that checks for 3 cases, one for valid other 2 for kernel size >14 & invalid pooling that should trigger fallback |
Failing to compile, needs fix. Check CI runs for the errors. |
I have updated the tests, seems like I was handling it in not the the right way. compiles in my local repo.. |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #3468 +/- ##
=========================================
Coverage 92.04% 92.05%
=========================================
Files 506 508 +2
Lines 20872 21174 +302
=========================================
+ Hits 19212 19491 +279
- Misses 1660 1683 +23
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
I'm going to help this PR along and make some changes to pass the rest of the CI. |
thanks @CharlieL7 .. I reviewed the CI.. it fails at cppcheck and format.. not sure how to resolve this, it compiles and tests completely in my local.. |
src/targets/cpu/lowering.cpp
Outdated
if(std::any_of(lengths.begin(), lengths.end(), [](int len) { return len > 14; }) || | ||
std::any_of(padding.begin(), padding.end(), [](int pad) { return pad > 14; })) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The padding condition is not tested and I'm pretty sure is not the correct limitation for OneDNN. I'm also not seeing the 14 kernel length limitation in the OneDNN spec: https://oneapi-src.github.io/oneDNN/dev_guide_pooling.html#doxid-dev-guide-pooling-1dg-pool-impl-limits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me what the actual limitation for this kernel is from OneDNN. Umang did write a TODO comment, but the link provided does not mention such limitations. Need to find a source that lays out the limitation or other proof that OneDNN cannot handle these cases.
Earlier I was referring to https://oneapi-src.github.io/oneDNN/dev_guide_convolution.html#doxid-dev-guide-convolution if you take a look at oneDNN algorithm section, there you may see "Weights tensor width does not exceed 14." I think I mixed up few things here.. I need more digging into this and test few cases where it fails.. |
Right, I see the limit for the direct algorithm for convolution; but this is for pooling. Technically pooling can be implemented as a convolution, but the pooling page doesn't mention anything about having the same limitations. |
@pfultz2 as rightly pointed by @CharlieL7, the doc OneDNN spec: https://oneapi-src.github.io/oneDNN/dev_guide_pooling.html#doxid-dev-guide-pooling-1dg-pool-impl-limits doesnt explicitly mention about length size and stride for pooling, earlier I was referring to conv.. however in the comments @umangyadav mentioned about two cases on padding, when length = {3} fails. Now my question is do I have to test each configuration for pooling, and check when it first fails, and put on those condition to fallback on cpu? I am thinking of using binary search for a range of lengths, where first it starts to fail and put the condition for refernce fallback cpu. |
This build is not recommended to merge 🔴 |
🔴bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output |
Hi @pfultz2,@CharlieL7
I am working on : #3177 (this is my first PR) as suggested by @umangyadav that you would help and review my changes.
Problem Description
OneDNN has a known limitation when handling specific pooling configurations related to padding, stride, and kernel
size.
Solution Overview
OneDNN fails for certain combinations such as padding = {2}, stride = {1}, and lengths = {2}. This configuration leads to a failure in the pooling operation.
OneDNN has a kernel size limitation: The referenced OneDNN documentation specifies a maximum dimension size of 14 for the pooling kernel (referred to as the "weights tensor"). This limitation is not currently enforced by MIGraphX, which can lead to failures during execution.
The goal of this PR is to address these limitations by detecting such problematic pooling configurations and falling back to the reference backend (a CPU-based implementation) when OneDNN is unable to execute the pooling operation.
Condition Checking:
The pooling operator is analyzed to check if the pooling configuration violates OneDNN's known limitations:
Check if the kernel size (lengths) exceeds the maximum size of 14.
Check for combinations of padding, stride, and kernel size that are known to fail with OneDNN.
If OneDNN cannot execute the pooling operation due to these limitations, the code falls back to the reference backend (a CPU-based pooling implementation).
Fallback Mechanism:
For invalid configurations, the pooling operation is replaced with a reference backend pooling operator (ref::pooling).
Valid configurations continue to use the OneDNN pooling operation (dnnl::pooling).
Code Changes:
https:/aditya-167/AMDMIGraphX/blob/054055f93d6e65aa8b3d7c46a701156b97d15b8d/src/targets/cpu/lowering.cpp#L432C5-L453C10
Please suggest if any any improvment needed (this is my first PR) :-) , also let me know I how to test this as I was thinking of creating a test file for the same however I was wondering if there exists already one in the repo that I can add to ? but I couldnt find one for this operation.