
std::rand: substitute an unbiased Exp1 temporarily. #10099

Closed

Conversation

@huonw (Member) commented Oct 27, 2013

The Ziggurat implementation of Exp1 is incorrect: it appears to have an
expected value of around 0.994 rather than 1, and the density of a large
number of sample means does not match the normal distribution expected
by the central limit theorem (specifically, it's a normal distribution
centred at 0.994).

This replaces Ziggurat with a "plain" inverse-cdf implementation for
now. This implementation has the expected properties, although it is
approximately 4 times slower (56 seconds rather than 16 to generate
1 billion samples).
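
For reference, the inverse-CDF method amounts to a one-liner: the Exp(1) CDF F(x) = 1 − e^(−x) inverts to x = −ln(1 − u). A minimal sketch of the idea (a standalone function of my own, not the PR's exact code):

```rust
/// Inverse-CDF sampling for Exp(1): if `u` is uniform on (0, 1), then
/// -ln(u) has CDF F(x) = 1 - e^(-x), i.e. it is Exp(1) with mean
/// exactly 1. (-ln(u) and -ln(1 - u) are identically distributed.)
fn exp1_inverse_cdf(u: f64) -> f64 {
    // `u` must be strictly positive, otherwise -ln(0) = +inf.
    -u.ln()
}

fn main() {
    // Illustrative only; a real sampler would draw `u` from an RNG.
    println!("{}", exp1_inverse_cdf(0.5)); // ln 2 ≈ 0.6931...
}
```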

cc #10084 (this does not fix it).

Re-plot of the density estimate of 10000 sample means where each mean is over 100000 samples:

[Figure: cdf-exp-density — density of the sample means (black) against the normal reference (red)]

Clearly it matches far more closely than the old implementation did (cf. the plot in #10084; the scales are approximately the same).

Note: I'm going to investigate & fix the problem with the Ziggurat implementation; this is just a temporary measure so that the implementation is correct in the meantime.

@alexcrichton (Member) commented
So if I understand this correctly: using the ziggurat tables leads to the wrong distribution, while this slower implementation of the distribution is correct. The idea is to eventually use the ziggurat tables again for speed, and in the meantime you're investigating what's going wrong with the distribution that we're seeing.

If that's the case, I may be more inclined to hold off on ignoring ziggurat for a bit. It seems kinda odd to keep around all those tables when they're possibly incorrect and nothing is using them. If the bug with the tables can't be found soon, I'd be more in favor of removing them entirely and then bringing them back when the bugs have been found.

It would also be nice to have unit tests for this kind of thing, but I suppose it's difficult to test, seeing as you're taking a billion samples. Also, do you know what's up with that bump in the black line compared to the red line? Is that also something we should be worried about? I found it odd that it was also part of the previous graph (using the ziggurat tables)...

@huonw (Member, Author) commented Oct 27, 2013

> The idea is to eventually use the ziggurat tables again for speed, and in the meantime you're investigating what's going wrong with the distribution that we're seeing.

Your understanding is correct. FWIW, I checked the Normal (also using Ziggurat) and plain f64 generators in a little more detail, and neither of these displays a bias, so it does appear to be specific to the Exp tables. In theory I'll have a solid few hours I can devote to Rust on Wednesday, which I'll likely spend fixing this properly, so feel free to r-; I can always reopen this if I cannot fix it. (The reason for this PR is just to make sure the implementation is correct, but if we're not too concerned about temporarily being incorrect, that's ok.)

> It seems kinda odd to keep around all those tables when they're possibly incorrect and nothing is using them

There are only 6 kB of tables (3 of them with 257 f64s each, i.e. 3 × 257 × 8 bytes ≈ 6 kB) in an auto-generated non-pub module, so having them hanging around (for now) isn't a huge burden on code maintenance, the public API, or binary size.

> Also, do you know what's up with that bump in the black line compared to the red line? Is that also something we should be worried about?

It appears to just be an artefact of the randomness; re-running a few more times produces plots that all look generally similar to:

[Figure: cdf-exp-density — density plot from a re-run, similar to the one above]

> It would also be nice to have unit tests for this kind of thing, but I suppose it's difficult to test, seeing as you're taking a billion samples

It's not just the huge number of samples, since one could just run the tests with a smaller sample size: these are inherently random, so every reasonable test would have a non-zero chance of failure. Of course, if one is doing significance-level tests, one can choose the p-value such that the chance of failure is very small (when the implementation is correct); however, this comes at the cost of having less power to detect an incorrect implementation, and compensating for that by increasing the sample size makes the tests very slow.
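
To make that trade-off concrete, here is a minimal sketch of such a significance-level test; the tolerance check and the toy LCG are mine and purely illustrative, not proposed test code:

```rust
/// Sketch of a significance-level test: Exp(1) has variance 1, so by
/// the CLT the mean of n samples is approximately N(1, 1/n). Failing
/// only when the mean is more than `k` standard errors from 1 makes
/// the false-failure probability about 2*Phi(-k) (~6e-5 for k = 4).
fn mean_within_tolerance(samples: &[f64], k: f64) -> bool {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let standard_error = 1.0 / n.sqrt(); // sigma / sqrt(n), sigma = 1
    (mean - 1.0).abs() < k * standard_error
}

fn main() {
    // Toy LCG as a stand-in for a real RNG, only to keep the sketch
    // self-contained and deterministic.
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    let mut uniform = || {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((state >> 11) as f64 + 1.0) / (1u64 << 53) as f64 // in (0, 1]
    };
    let samples: Vec<f64> = (0..100_000).map(|_| -uniform().ln()).collect();
    // With k = 4 and n = 100_000 (standard error ~0.00316), a bias to
    // 0.994 is only a ~1.9-standard-error shift, so this test would
    // usually still pass on the biased implementation -- exactly the
    // loss of power described above.
    println!("mean ok: {}", mean_within_tolerance(&samples, 4.0));
}
```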

I'm planning to add Gamma (I have an implementation that is essentially blocked on this) and derived distributions next, and to land that I'll write either a pile of probabilistic tests in a separate repo (on Rust CI, so that failures don't block other PRs) or a run-by-hand file in src/test/run-pass.

@alexcrichton (Member) commented
Just wanted to say, excellent analysis of this problem! In the meantime, let's leave this open for a while and see if you manage to uncover the underlying problem (no rush at all, of course). If it takes longer than expected, we can probably merge this.

@huonw (Member, Author) commented Oct 31, 2013

I think I've tracked down the actual bug in the Ziggurat implementation; I'll submit a PR with that correction once I've checked it properly.

@huonw closed this Oct 31, 2013
@huonw deleted the paper-over-exp branch December 4, 2014
flip1995 pushed a commit to flip1995/rust that referenced this pull request Dec 29, 2022
Null fn lints

Adds lints to check for code that assumes a nullable `fn()`.

### Lint examples:

`transmute_null_to_fn`:
```rust
error: transmuting a known null pointer into a function pointer
  --> $DIR/transmute_null_to_fn.rs:9:23
   |
LL |         let _: fn() = std::mem::transmute(std::ptr::null::<()>());
   |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this transmute results in undefined behavior
   |
   = help: try wrapping your function pointer type in `Option<T>` instead, and using `None` as a null pointer value
```

`fn_null_check`:
```rust
error: function pointer assumed to be nullable, even though it isn't
  --> $DIR/fn_null_check.rs:13:8
   |
LL |     if (fn_ptr as *mut ()).is_null() {}
   |        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = help: try wrapping your function pointer type in `Option<T>` instead, and using `is_none` to check for null pointer value
```
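
For context, a minimal sketch of the pattern both `help` messages point at; the function names here are illustrative, not taken from the lints:

```rust
// `Option<fn()>` uses the null-pointer niche: it stays pointer-sized
// and `None` is represented as the null pointer, so no transmute from
// null and no `is_null` check on a cast are needed.
fn call_if_set(callback: Option<fn()>) {
    match callback {
        Some(f) => f(),
        None => println!("no callback installed"),
    }
}

fn greet() {
    println!("hello");
}

fn main() {
    call_if_set(Some(greet));
    call_if_set(None); // instead of transmuting std::ptr::null()
}
```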

Closes rust-lang#1644

---

changelog: Improvement: [`transmuting_null`]: Now detects `const` pointers to all types
[rust-lang#10099](rust-lang/rust-clippy#10099)
changelog: New lint: [`transmute_null_to_fn`]
[rust-lang#10099](rust-lang/rust-clippy#10099)
changelog: New lint: [`fn_null_check`]
[rust-lang#10099](rust-lang/rust-clippy#10099)