
Bad results - Investigate reason #11

Open · PavlosMelissinos opened this issue May 22, 2017 · 22 comments

PavlosMelissinos commented May 22, 2017

| Metric | IoU | area | maxDets | Result |
| --- | --- | --- | --- | --- |
| Average Precision | 0.50:0.95 | all | 100 | 0.001 |
| Average Precision | 0.50 | all | 100 | 0.004 |
| Average Precision | 0.75 | all | 100 | 0.000 |
| Average Precision | 0.50:0.95 | small | 100 | 0.000 |
| Average Precision | 0.50:0.95 | medium | 100 | 0.000 |
| Average Precision | 0.50:0.95 | large | 100 | 0.004 |
| Average Recall | 0.50:0.95 | all | 1 | 0.005 |
| Average Recall | 0.50:0.95 | all | 10 | 0.005 |
| Average Recall | 0.50:0.95 | all | 100 | 0.005 |
| Average Recall | 0.50:0.95 | small | 100 | 0.000 |
| Average Recall | 0.50:0.95 | medium | 100 | 0.001 |
| Average Recall | 0.50:0.95 | large | 100 | 0.019 |

These numbers come from the official MS-COCO evaluation script.

Setup: the full image is the input, and each pixel is classified with a one-hot vector of size 81 (indices 0 to 80 inclusive), which map to the actual MS-COCO category ids. More specifically, index 0 is background, ..., index 12 corresponds to category id 13 (stop sign), ..., and index 80 corresponds to category id 90 (toothbrush). The output is the full image, not a crop. A script then separates the pixels of each detected object. Categories were disabled in the evalCOCO.py script (useCats = False).
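For anyone reproducing this, here's a minimal sketch of that index-to-category-id mapping built with pycocotools (the annotation file path is an assumption; adjust for your setup):

```python
from pycocotools.coco import COCO

# Hypothetical annotation path; adjust for your setup.
coco = COCO('annotations/instances_train2014.json')

# COCO category ids are sparse (1..90 with gaps), so build a dense
# mapping where index 0 is background and 1..80 cover the categories.
cat_ids = sorted(coco.getCatIds())              # 80 ids, 1 through 90
index_to_cat_id = {0: 0}                        # 0 -> background
index_to_cat_id.update({i + 1: c for i, c in enumerate(cat_ids)})

assert index_to_cat_id[12] == 13                # stop sign
assert index_to_cat_id[80] == 90                # toothbrush
```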

These are really bad scores, and at the moment I have no idea why. I'll push the changes soon.

Which script do you use for evaluation, @ahundt? If you have a working version, maybe I should just replace mine with it. Does it work on MS-COCO?

ahundt commented May 23, 2017

Sadly, my MS-COCO results in Keras-FCN have been no good so far. The loss function went negative with full-image segmentation and one-hot encoding, and then I ran out of time to investigate the details.

PavlosMelissinos commented:

That's too bad... FCNSS doesn't report performance on MS-COCO, iirc, but the PSPNet paper mentions some IoU results on page 8 (Table 6), and they seem pretty decent.

ghost commented Aug 1, 2017

Hi guys,
So, have you been able to reproduce the ENet results from the paper?
Best,
D.

PavlosMelissinos commented Aug 1, 2017

Not really, sorry. It does converge but not well enough.

In the paper, the encoder is pretrained on ImageNet and the full pipeline is then fine-tuned on Cityscapes, CamVid, and SUN RGB-D. However, I haven't set those datasets up yet, so I've only trained the network on MS-COCO (which often gives awful results). I'd like to finish the project at some point, but I've had to move on to other things, so at the moment I unfortunately don't have the resources to do it properly. :(

ghost commented Aug 1, 2017

No worries, I'll pick it up from here and see what the problem is.
I'm not sure I'll be allowed to share code, though.

ahundt commented Aug 1, 2017

There has been a bugfix in densenet that solved some problems, so it might work better now!

https://github.com/farizrahman4u/keras-contrib/blob/master/keras_contrib/applications/densenet.py

jmtatsch commented Aug 2, 2017

@ahundt Can you elaborate on how the densenet fix would be applicable to enet-keras? It seems as if the main gradient flow and the pooling indices are connected properly, or am I missing something?

ahundt commented Aug 4, 2017

@jmtatsch Sorry, my post is totally irrelevant; I must have mixed up tabs in my browser or something.

ghost commented Aug 9, 2017

Hi guys,
It seems I've managed to successfully retrain ENet on our own dataset by loading pretrained weights from Torch and using Adadelta (Adagrad didn't work as well for me).
You can load the weights from the Torch model with torchfile; a sketch follows below.
One more minor thing, which shouldn't make much difference: following the paper, I added batch norm after the initial layer, which I think was missing from the code.
Anyway, I need to investigate the per-class accuracy a bit more and will get back to you.
Best,
D.
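A minimal sketch of the torchfile-based extraction described above, assuming a serialized nn container; the filename and the traversal are illustrative, and the real layer ordering depends on the exported model:

```python
import torchfile  # pip install torchfile

# Hypothetical path to the Torch-serialized ENet model.
net = torchfile.load('model-best.net')

def collect_arrays(module, out):
    # nn containers expose a 'modules' list; leaf layers carry the
    # actual parameter arrays as attributes.
    children = getattr(module, 'modules', None)
    if children is not None:
        for child in children:
            collect_arrays(child, out)
    else:
        for key in ('weight', 'bias', 'running_mean', 'running_var'):
            value = getattr(module, key, None)
            if value is not None:
                out.append((key, value))

arrays = []
collect_arrays(net, arrays)
print('collected %d parameter arrays' % len(arrays))
```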

jmtatsch commented Aug 10, 2017

@dkorkino The PReLU also seems to be missing, compared to https://github.com/e-lab/ENet-training/blob/master/train/models/encoder.lua#L86. Could you maybe publish the converted weights?
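For reference, here's what ENet's initial block looks like with the batch norm and PReLU in place, sketched in Keras from the paper and the Lua encoder above (an illustration, not this repo's exact code):

```python
from keras.layers import (Input, Conv2D, MaxPooling2D, concatenate,
                          BatchNormalization, PReLU)

def initial_block(inp):
    # 13 conv filters plus the 3 max-pooled input channels give the
    # 16 output channels from the paper, followed by BN and PReLU.
    conv = Conv2D(13, (3, 3), strides=(2, 2), padding='same')(inp)
    pool = MaxPooling2D(pool_size=(2, 2))(inp)
    merged = concatenate([conv, pool], axis=-1)
    merged = BatchNormalization()(merged)
    return PReLU(shared_axes=[1, 2])(merged)

inp = Input(shape=(512, 512, 3))
out = initial_block(inp)  # shape: (None, 256, 256, 16)
```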

PavlosMelissinos commented Aug 21, 2017

You're both right, @ghost and @jmtatsch. I also noticed a division bug in MaxPoolingWithArgmax2D that caused unwanted behavior on Python 3, and another in the data generator.
All three should be fixed now, but let me know about any problems you encounter.

Thanks a lot for the feedback 👍.

Sorry for taking so long to tackle the issue, but I'd been on vacation until yesterday.
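For context, the Python 3 issue in MaxPoolingWithArgmax2D is most likely the change in `/` semantics; a minimal illustration (the actual index arithmetic in the layer may differ):

```python
# Python 2: `/` between ints is floor division.
# Python 3: `/` is true division, so index math silently yields floats.
height, pool_size = 256, 2

rows = height / pool_size   # Python 3: 128.0, a float -> breaks indexing
rows = height // pool_size  # 128, an int on both versions
```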

ColdCodeCool commented:
@dkorkino @jmtatsch I am also looking forward to the release of the converted weights.

PavlosMelissinos commented Aug 23, 2017

Does anyone have any idea why it takes so long to train?

I'm getting something like 25K seconds per epoch on MS-COCO (~80K samples) on a K40 with input dimensions of 256x256.

That amounts to ~0.3 s per sample for a full training step (25,000 s / 80,000 samples ≈ 0.31 s); if the forward pass is roughly a third of that, it's about 10 fps for inference alone. That's much slower than the reported performance (135.4 fps for 640x360 on a Titan X).

I used to think it might be due to preprocessing, but that actually takes only a fraction of the time. Any thoughts?

ahundt commented Aug 24, 2017

Keras spends a lot of time with an empty GPU. There are collectively quite a few reasons, some of which are discussed in keras-team/keras#6928. Putting the data into TFRecords, applying #6928, and using the TF staging areas could help.

Alternatively, there are some ways to do it with TensorFlow proper, but there aren't great public examples aside from https://www.tensorflow.org/performance/performance_models, which is a bit convoluted.
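As a rough illustration of the TFRecord route, here's a sketch against today's tf.data API (the feature names, shapes, and file name are assumptions, not this repo's format; at the time of this thread you'd use queue runners or staging areas instead):

```python
import tensorflow as tf

def parse_example(serialized):
    # Hypothetical feature spec: a JPEG-encoded image and a PNG mask.
    features = tf.io.parse_single_example(serialized, {
        'image': tf.io.FixedLenFeature([], tf.string),
        'label': tf.io.FixedLenFeature([], tf.string),
    })
    image = tf.io.decode_jpeg(features['image'], channels=3)
    label = tf.io.decode_png(features['label'], channels=1)
    return tf.image.resize(image, (256, 256)), label

# Decoding and batching run in TF's input threads, and prefetch
# overlaps them with training, so the GPU isn't idle on Python I/O.
dataset = (tf.data.TFRecordDataset('train.tfrecord')
           .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(8)
           .prefetch(tf.data.AUTOTUNE))
```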

PavlosMelissinos commented Aug 25, 2017

That's a bummer to the extent it's true; I'd rather it were 100% my own mistake. There's definitely room for improvement in my implementation (I'm still waiting for training to finish, but judging by the progression of the loss, I don't expect the results to be much better than the current ones). However, speed is an issue that hinders prototyping and evaluation, especially when this network takes more than 10x as long as it should to train, and I'm not sure what I could do to fix it.

I've monitored GPU utilization, though, and it's not that low; maybe that's not always such a big deal?

I'll check out the available solutions when I find some time, thanks @ahundt .

ahundt commented Aug 25, 2017

It will definitely vary a lot by use case and physical hardware. For example, if you've got a Titan X but no super fast SSD, I don't think training at 135 fps will be feasible. Wouldn't that figure most likely be with 8x Titan X devices?

PavlosMelissinos commented Aug 26, 2017 via email

PavlosMelissinos commented Sep 5, 2017

@jmtatsch @ColdCodeCool @ghost @ahundt
Good news, everyone! I've pushed a new commit that adds weight transfer capabilities. All you have to do is:

1. Download the trained model and put it in the models/pretrained directory within the enet-keras project.
2. Run from_torch.py to extract the actual weights and dump them into a pickle file.
3. Train/finetune/predict as usual (the model will read the file if it exists; otherwise you'll get an "ENet has found no compatible pretrained weights! Skipping weight transfer..." message). See the sketch after this list.
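A minimal sketch of the loading side, assuming the pickle maps layer names to lists of numpy arrays; the filename and matching logic are hypothetical (from_torch.py is the authoritative conversion):

```python
import pickle

# Hypothetical pickle layout produced by step 2: {layer_name: [arrays]}.
with open('models/pretrained/enet_weights.pkl', 'rb') as f:
    pretrained = pickle.load(f)

transferred = 0
for layer in model.layers:  # `model` is your compiled ENet Keras model
    weights = pretrained.get(layer.name)
    if weights is not None and len(weights) == len(layer.get_weights()):
        layer.set_weights(weights)
        transferred += 1

if transferred == 0:
    print('ENet has found no compatible pretrained weights! '
          'Skipping weight transfer...')
```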

Any questions/comments/criticism are welcome as always :)

PavlosMelissinos commented:

Haven't tried to train the network yet but I'll let you know how it goes when I do.

ahundt commented Dec 19, 2017

@PavlosMelissinos Hey, I was looking through your latest version, and perhaps I misunderstood what I read, but have you considered changing your loss function when training from scratch?

Something like these may be necessary for segmentation:
https://github.com/theduynguyen/Keras-FCN/blob/master/loss_func.py

PavlosMelissinos commented:
The main problem is that it doesn't work well enough even with the pretrained weights.

However, crossentropy without bg seems interesting and it might be what I need, thanks. I'll check it out!
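For reference, a minimal sketch of the idea, i.e. categorical crossentropy with the background class masked out (my interpretation, not the linked loss_func.py verbatim):

```python
import keras.backend as K

def crossentropy_without_bg(y_true, y_pred, bg_index=0):
    # y_true: one-hot targets, y_pred: softmax outputs,
    # both shaped (batch, pixels, num_classes).
    y_pred = K.clip(y_pred, K.epsilon(), 1.0 - K.epsilon())
    pixel_ce = -K.sum(y_true * K.log(y_pred), axis=-1)
    mask = 1.0 - y_true[..., bg_index]  # zero out background pixels
    return K.sum(pixel_ce * mask) / (K.sum(mask) + K.epsilon())
```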

ahundt commented Dec 21, 2017

I added some segmentation metrics and losses:
keras-team/keras-contrib#197
