
reliability diagrams #4

Open
kirk86 opened this issue Sep 4, 2019 · 7 comments

Comments

kirk86 commented Sep 4, 2019

Hi Wesley (@wjmaddox),
I was wondering if you could shed some light on the calibration plots. I'm running the save calibration plots script, giving it the predictions and targets of VGG-16 trained on CIFAR-10 (5+5) as input, but when I plot the output I get something not even close to the plots in the paper. From my reading, it seems that both SGD and SWAG are under-confident. Am I doing something wrong?

[image: reliability diagram for SGD and SWAG on CIFAR-10 (5+5)]

wjmaddox (Owner) commented Sep 5, 2019

Hi Kirk,

You're reading the plot incorrectly: points beneath the blue line mean that both SGD and SWAG are overconfident in that region (confidence > accuracy). That said, I'm not sure we ever checked calibration on the CIFAR-10 (5+5) task - I'll get back to you on that.
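
As a reference for reading these plots, here is a minimal sketch of how a reliability diagram is typically computed (plain NumPy; the function name, equal-width binning, and argument layout are my assumptions, not necessarily what the repository's calibration script does). Bins whose accuracy falls below their confidence sit under the diagonal, i.e. the model is overconfident there.

```python
import numpy as np

def reliability_curve(probs, targets, n_bins=15):
    """Bin predictions by confidence and compute the empirical accuracy per bin.

    probs:   (N, C) array of predicted class probabilities
    targets: (N,) array of integer class labels
    """
    confidences = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == targets).astype(float)

    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_conf, bin_acc = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            bin_conf.append(confidences[mask].mean())
            bin_acc.append(correct[mask].mean())
    # Plot bin_acc against bin_conf: points below the y = x diagonal are overconfident.
    return np.array(bin_conf), np.array(bin_acc)
```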

kirk86 (Author) commented Sep 5, 2019

Thank you, Wesley!
Appreciate the prompt reply and clarification.

wjmaddox (Owner) commented Sep 5, 2019

Just following up: I checked, and it seems we never ran calibration on CIFAR-10 (5+5), but it's not terribly surprising that both SGD and SWAG (somewhat less so) are overconfident here as well.

kirk86 (Author) commented Sep 5, 2019

Hi Wesley,
Thanks a lot for the follow-up.
May I ask an additional question, if you could clarify it for me, please?
Why is the split on CIFAR-10 (5+5) deterministic, i.e. predefined so that split 0 gets the first half of the classes and split 1 the remaining ones (with 0 = [0, 1, 2, 8, 9] and 1 = [3, 2, 4, 8, 1] as labels)?
Have you noticed that training on split 1 yields better results than training on split 0 on the out-of-distribution data?

wjmaddox (Owner) commented Sep 6, 2019

I believe we sampled those randomly at one point, so it's a holdover from that.

No, I haven't noticed that.
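
For illustration, here is a hypothetical sketch of what sampling a random 5+5 class split once and then fixing it could look like (this is not the repository's code; the seed and variable names are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed so the split is reproducible
perm = rng.permutation(10)                # shuffle the ten CIFAR-10 class labels
in_classes, out_classes = perm[:5], perm[5:]

# Train only on the in-distribution classes; the held-out classes act as the
# out-of-distribution set at evaluation time.
# (targets is assumed to be a (N,) array of CIFAR-10 labels.)
# in_mask = np.isin(targets, in_classes)
```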

kirk86 (Author) commented Sep 6, 2019

Thank you, Wesley!

Here's an example of the difference between SGD and SWAG when training on split 1 vs. split 0. Basically, SWAG seems to perform worse than SGD when trained on split 0. The left plots are trained on split 1 and the right ones on split 0.

[image: SGD vs. SWAG plots, trained on split 1 (left) and split 0 (right)]
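
In case it helps quantify the comparison between the two splits, a minimal sketch of expected calibration error (ECE) over the saved predictions and targets (the file names and array keys in the usage comment are assumptions, not the repository's actual output format):

```python
import numpy as np

def expected_calibration_error(probs, targets, n_bins=15):
    """ECE: bin-weighted average of |accuracy - confidence| over confidence bins."""
    confidences = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == targets).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return ece

# Hypothetical usage, comparing the two runs:
# run = np.load("swag_split0.npz")
# print(expected_calibration_error(run["predictions"], run["targets"]))
```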

wjmaddox (Owner) commented Sep 6, 2019

Hmm.. I'll have to look into it. Maybe @izmailovpavel can be of help?
