
Question about performance of SSDLite with Core ML #1

Open
minhohihi opened this issue Dec 28, 2018 · 1 comment
Comments

@minhohihi

minhohihi commented Dec 28, 2018

First of all, thank you as always for your blog.

I have a question about performance.

I have two versions of SSDLite (ssd_mobilenet_v1_coco) with the same backbone.
One version implements all components (feature extractor, bbox decoder, and NMS) separately: the feature extractor runs through Core ML, while the bbox decoder and NMS are a bit of Swift code running on the CPU. The other version combines every module into a single MLModel, as you explained.
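The CPU post-processing in the first version looks roughly like the sketch below (the names, anchor layout, and the usual SSD variances of 0.1/0.2 are illustrative, not the exact code):

```swift
import CoreGraphics
import Foundation

// Illustrative types; the real code keeps anchors and scores in flat arrays.
struct Anchor { let cy, cx, h, w: Float }
struct Detection { let rect: CGRect; let score: Float; let classIndex: Int }

// Decode one raw SSD prediction (ty, tx, th, tw) against its anchor box.
func decode(_ t: [Float], anchor a: Anchor) -> CGRect {
    let cy = t[0] * 0.1 * a.h + a.cy
    let cx = t[1] * 0.1 * a.w + a.cx
    let h  = Float(exp(Double(t[2]) * 0.2)) * a.h
    let w  = Float(exp(Double(t[3]) * 0.2)) * a.w
    return CGRect(x: CGFloat(cx - w / 2), y: CGFloat(cy - h / 2),
                  width: CGFloat(w), height: CGFloat(h))
}

// Intersection-over-union of two boxes.
func iou(_ a: CGRect, _ b: CGRect) -> Float {
    let inter = a.intersection(b)
    guard !inter.isNull else { return 0 }
    let interArea = inter.width * inter.height
    let unionArea = a.width * a.height + b.width * b.height - interArea
    return unionArea > 0 ? Float(interArea / unionArea) : 0
}

// Plain greedy NMS on the CPU: one loop over score-sorted candidates,
// dropping any box that overlaps an already-kept box too much.
func nonMaxSuppression(_ detections: [Detection], iouThreshold: Float = 0.5,
                       maxResults: Int = 10) -> [Detection] {
    var kept: [Detection] = []
    for candidate in detections.sorted(by: { $0.score > $1.score }) {
        if kept.count >= maxResults { break }
        if !kept.contains(where: { iou($0.rect, candidate.rect) > iouThreshold }) {
            kept.append(candidate)
        }
    }
    return kept
}
```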

I measured the performance of the two versions on an iPhone X and got a result that is hard to understand.
The first version is about two times faster than the second one, even though the first one uses more CPU to compute the real bbox coordinates and to run NMS (it uses 100~110% of a CPU core, while the other uses 45~50%). That seems counterintuitive to me.
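For context, a typical way to measure the average prediction time looks something like this (the model and input here are placeholders, not my exact benchmark code):

```swift
import CoreML
import QuartzCore

// CACurrentMediaTime() is a monotonic clock, suitable for benchmarking.
func averagePredictionTime(model: MLModel, input: MLFeatureProvider,
                           runs: Int = 100) throws -> Double {
    _ = try model.prediction(from: input)   // warm-up run, excludes one-time setup cost
    let start = CACurrentMediaTime()
    for _ in 0..<runs {
        _ = try model.prediction(from: input)
    }
    return (CACurrentMediaTime() - start) / Double(runs)
}
```

For the first version, the time spent in the Swift decoding and NMS code is counted on top of the Core ML prediction time, so the comparison covers the full pipeline in both cases.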

Could you share any thoughts on why this might happen?

Thank you.

@minhohihi minhohihi changed the title Question about performance of SSDLite wit Core ML Question about performance of SSDLite with Core ML Dec 28, 2018
@hollance
Owner

I didn't actually do any speed tests (hehe), but a 2x speed difference is quite a bit! I wouldn't have expected that...

Doing these operations inside the model will run them on the GPU, or some on the GPU and some on the CPU. My guess is that doing this on the GPU might be slower than on the CPU, because on the CPU it's just a single loop, while on the GPU it's split up into separate layers that each do different calculations.

It might be useful to do a follow-up blog post that compares these three methods on different devices:

  1. Do the decoding and NMS on the CPU afterwards.
  2. Do everything inside the model.
  3. Do the decoding in a custom layer (so it can run on the GPU) and do NMS on the CPU afterwards (a skeleton for such a layer is sketched below).
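For reference, a skeleton for option 3 might look something like this (the class name "DecodeBoxes" and the output shapes are illustrative). The required evaluate(inputs:outputs:) is the CPU path; implementing the optional encode(commandBuffer:inputs:outputs:) with a Metal compute kernel is what lets Core ML schedule the layer on the GPU:

```swift
import CoreML
import Metal

@objc(DecodeBoxes) class DecodeBoxes: NSObject, MLCustomLayer {
    required init(parameters: [String: Any]) throws {
        super.init()
    }

    func setWeightData(_ weights: [Data]) throws {
        // Box decoding has no learned weights; anchor data could be loaded here instead.
    }

    func outputShapes(forInputShapes inputShapes: [[NSNumber]]) throws -> [[NSNumber]] {
        // Decoded boxes have the same shape as the raw box predictions.
        return inputShapes
    }

    func evaluate(inputs: [MLMultiArray], outputs: [MLMultiArray]) throws {
        // CPU fallback: loop over the anchors and decode (ty, tx, th, tw) here.
    }

    func encode(commandBuffer: MTLCommandBuffer, inputs: [MTLTexture],
                outputs: [MTLTexture]) throws {
        // GPU path: dispatch a compute kernel that performs the same decoding.
    }
}
```

NMS would then still run in Swift on the CPU afterwards, on the already-decoded boxes.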
