Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix potential CNMEM_NOT_INITIALIZED errors #26

Merged
merged 1 commit into from
Aug 31, 2015

Conversation

slayton58
Copy link

dummy alloc / free in MemoryHandlerActivator to ensure that
the memory pool has been set up before any potential operations

@lukeyeager lukeyeager added the bug label Aug 28, 2015
@@ -229,6 +229,9 @@ class MemoryHandlerActivator {
using_pool_ = true;
MemoryHandler::usePool();
MemoryHandler::setGPUs(gpus);
void* temp;
MemoryHandler::mallocGPU(&temp, 4);
MemoryHandler::freeGPU(temp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably warrants an explanatory comment.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be wrapped in a #ifndef CPU_ONLY (or #ifdef USE_CNMEM) for the CPU-only build.

@lukeyeager
Copy link
Member

This fixed my HDF5Data layer runtime error - thanks!

But now the tests are failing hard (https://travis-ci.org/NVIDIA/caffe/builds/77675932), and they even fail for me when I build with cuDNN and CNMeM (with 6,000+ lines of stderr output!). Can you verify this? It's not clear to me why such a small change would affect the tests so dramatically.

@lukeyeager
Copy link
Member

It looks like the TravisCI builds are failing to build Caffe, whereas on my machine I'm failing to build the tests.

@slayton58
Copy link
Author

The tests are completely unchanged -- they don't use the memory pool at all.

Only time I've seen something similar it ended up being a bug in gcc -- moving to something newer (in my case from gcc 4.6 -> 4.9) fixed the issue for me

@lukeyeager
Copy link
Member

The tests are completely unchanged -- they don't use the memory pool at all.

Well that seems like a giant test hole then!

Looking into the gcc version now ...

@lukeyeager
Copy link
Member

I have gcc version 4.8.4 - I don't think it's a gcc problem. But it does seem to be a CMake problem (make works fine), and I'm seeing it with v0.13.1 as well. I'll investigate further and create a new issue.

For this pull reqeuest, can you add the #ifdef fix I suggested above so I can merge? Thanks!

dummy alloc / free in MemoryHandlerActivator to ensure that
the memory pool has been set up before any potential operations
@slayton58
Copy link
Author

Added the #ifndef guard and squashed commits

@lukeyeager
Copy link
Member

Ok great, that fixed the Travis tests. Thanks!

lukeyeager added a commit that referenced this pull request Aug 31, 2015
Fix potential CNMEM_NOT_INITIALIZED errors
@lukeyeager lukeyeager merged commit b0df955 into NVIDIA:caffe-0.13 Aug 31, 2015
@lukeyeager
Copy link
Member

Tagged as v0.13.2 and merged into master. In the future, let's create pull requests against master and I'll cherry-pick the fixes back into the version branches.

@slayton58
Copy link
Author

Ok, sounds good.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants