
std::rand: substitute an unbiased Exp1 temporarily. #10099

Closed

Conversation

@huonw (Member) commented Oct 27, 2013

The Ziggurat implementation of Exp1 is incorrect: it appears to have an
expected value of around 0.994 rather than 1, and the density of a large
number of sample means does not match the normal distribution expected
by the central limit theorem (specifically, it's a normal distribution
centred at 0.994).

This replaces Ziggurat with a "plain" inverse-cdf implementation for
now. This implementation has the expected properties, although it is
approximately 4 times slower (56 seconds rather than 16 to generate
1 billion samples).
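
For reference, the inverse-CDF method amounts to a one-liner: the Exp(1) CDF F(x) = 1 − e^(−x) inverts to x = −ln(1 − u). A minimal sketch of the idea (a standalone function of my own, not the PR's exact code):

```rust
/// Inverse-CDF sampling for Exp(1): if `u` is uniform on (0, 1), then
/// -ln(u) has CDF F(x) = 1 - e^(-x), i.e. it is Exp(1) with mean
/// exactly 1. (-ln(u) and -ln(1 - u) are identically distributed.)
fn exp1_inverse_cdf(u: f64) -> f64 {
    // `u` must be strictly positive, otherwise -ln(0) = +inf.
    -u.ln()
}

fn main() {
    // Illustrative only; a real sampler would draw `u` from an RNG.
    println!("{}", exp1_inverse_cdf(0.5)); // ln 2 ≈ 0.6931...
}
```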

cc #10084 (this does not fix it).

Re-plot of the density estimate of 10000 sample means where each mean is over 100000 samples:

[Figure: cdf-exp-density — density of the sample means (black) against the normal reference (red)]

Clearly it matches far more closely than the old implementation did (cf. the plot in #10084; the scales are approximately the same).

Note: I'm going to investigate & fix the problem with the Ziggurat implementation; this is just a temporary measure so that the implementation is correct in the meantime.

@alexcrichton (Member) commented
So if I understand this correctly: using the ziggurat tables leads to the wrong distribution, while this slower implementation of the distribution is correct. The idea is to eventually use the ziggurat tables again for speed, and in the meantime you're investigating what's going wrong with the distribution that we're seeing.

If that's the case, I may be more inclined to hold off on ignoring ziggurat for a bit. It seems kinda odd to keep around all those tables when they're possibly incorrect and nothing is using them. If the bug with the tables can't be found soon, I'd be more in favor of removing them entirely and then bringing them back when the bugs have been found.

It would also be nice to have unit tests for this kind of thing, but I suppose it's difficult to test, seeing as you're taking a billion samples. Also, do you know what's up with that bump in the black line compared to the red line? Is that also something we should be worried about? I found it odd that it was also part of the previous graph (using the ziggurat tables)...

@huonw (Member, Author) commented Oct 27, 2013

> The idea is to eventually use the ziggurat tables again for speed, and in the meantime you're investigating what's going wrong with the distribution that we're seeing.

Your understanding is correct. FWIW, I checked the Normal (also using Ziggurat) and plain f64 generators in a little more detail, and neither of these displays a bias, so it does appear to be specific to the Exp tables. In theory I'll have a solid few hours I can devote to Rust on Wednesday, which I'll likely spend fixing this properly, so feel free to r-; I can always reopen this if I cannot fix it. (The reason for this PR is just to make sure the implementation is correct, but if we're not too concerned about temporarily being incorrect, that's ok.)

> It seems kinda odd to keep around all those tables when they're possibly incorrect and nothing is using them

There are only 6 kB of tables (3 of them with 257 f64s each, i.e. 3 × 257 × 8 bytes ≈ 6 kB) in an auto-generated non-pub module, so having them hanging around (for now) isn't a huge burden on code maintenance, the public API, or binary size.

> Also, do you know what's up with that bump in the black line compared to the red line? Is that also something we should be worried about?

It appears to just be an artefact of the randomness; re-running a few more times produces plots that all look generally similar to:

[Figure: cdf-exp-density — density plot from a re-run, similar to the one above]

> It would also be nice to have unit tests for this kind of thing, but I suppose it's difficult to test, seeing as you're taking a billion samples

It's not just the huge number of samples, since one could just run the tests with a smaller sample size: these are inherently random, so every reasonable test would have a non-zero chance of failure. Of course, if one is doing significance-level tests, one can choose the p-value such that the chance of failure is very small (when the implementation is correct); however, this comes at the cost of having less power to detect an incorrect implementation, and compensating for that by increasing the sample size makes the tests very slow.
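
To make that trade-off concrete, here is a minimal sketch of such a significance-level test; the tolerance check and the toy LCG are mine and purely illustrative, not proposed test code:

```rust
/// Sketch of a significance-level test: Exp(1) has variance 1, so by
/// the CLT the mean of n samples is approximately N(1, 1/n). Failing
/// only when the mean is more than `k` standard errors from 1 makes
/// the false-failure probability about 2*Phi(-k) (~6e-5 for k = 4).
fn mean_within_tolerance(samples: &[f64], k: f64) -> bool {
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let standard_error = 1.0 / n.sqrt(); // sigma / sqrt(n), sigma = 1
    (mean - 1.0).abs() < k * standard_error
}

fn main() {
    // Toy LCG as a stand-in for a real RNG, only to keep the sketch
    // self-contained and deterministic.
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    let mut uniform = || {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        ((state >> 11) as f64 + 1.0) / (1u64 << 53) as f64 // in (0, 1]
    };
    let samples: Vec<f64> = (0..100_000).map(|_| -uniform().ln()).collect();
    // With k = 4 and n = 100_000 (standard error ~0.00316), a bias to
    // 0.994 is only a ~1.9-standard-error shift, so this test would
    // usually still pass on the biased implementation -- exactly the
    // loss of power described above.
    println!("mean ok: {}", mean_within_tolerance(&samples, 4.0));
}
```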

I'm planning to add Gamma (I have an implementation that is essentially blocked on this) and derived distributions next, and to land that I'll write either a pile of probabilistic tests in a separate repo (on Rust CI, so that failures don't block other PRs) or a run-by-hand file in src/test/run-pass.

@alexcrichton (Member) commented
Just wanted to say, excellent analysis of this problem! In the meantime, let's leave this open for a while and see if you manage to uncover the underlying problem (no rush at all, of course). If it takes longer than expected, we can probably merge this.

@huonw (Member, Author) commented Oct 31, 2013

I think I've tracked down the actual bug in the Ziggurat implementation; I'll submit a PR with that correction once I've checked it properly.

@huonw closed this Oct 31, 2013
@huonw deleted the paper-over-exp branch December 4, 2014
flip1995 pushed a commit to flip1995/rust that referenced this pull request Dec 29, 2022
Null fn lints

Adds lints to check for code that assumes a nullable `fn()`.

### Lint examples:

`transmute_null_to_fn`:
```rust
error: transmuting a known null pointer into a function pointer
  --> $DIR/transmute_null_to_fn.rs:9:23
   |
LL |         let _: fn() = std::mem::transmute(std::ptr::null::<()>());
   |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this transmute results in undefined behavior
   |
   = help: try wrapping your function pointer type in `Option<T>` instead, and using `None` as a null pointer value
```

`fn_null_check`:
```rust
error: function pointer assumed to be nullable, even though it isn't
  --> $DIR/fn_null_check.rs:13:8
   |
LL |     if (fn_ptr as *mut ()).is_null() {}
   |        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = help: try wrapping your function pointer type in `Option<T>` instead, and using `is_none` to check for null pointer value
```
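
For context, a minimal sketch of the pattern both `help` messages point at; the function names here are illustrative, not taken from the lints:

```rust
// `Option<fn()>` uses the null-pointer niche: it stays pointer-sized
// and `None` is represented as the null pointer, so no transmute from
// null and no `is_null` check on a cast are needed.
fn call_if_set(callback: Option<fn()>) {
    match callback {
        Some(f) => f(),
        None => println!("no callback installed"),
    }
}

fn greet() {
    println!("hello");
}

fn main() {
    call_if_set(Some(greet));
    call_if_set(None); // instead of transmuting std::ptr::null()
}
```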

Closes rust-lang#1644

---

changelog: Improvement: [`transmuting_null`]: Now detects `const` pointers to all types
[rust-lang#10099](rust-lang/rust-clippy#10099)
changelog: New lint: [`transmute_null_to_fn`]
[rust-lang#10099](rust-lang/rust-clippy#10099)
changelog: New lint: [`fn_null_check`]
[rust-lang#10099](rust-lang/rust-clippy#10099)