replace RandomState's per-instance seed with a per-thread one #27246

Closed
wants to merge 1 commit into from

Conversation

apasel422
Contributor

closes #27243

@apasel422
Contributor Author

r? @gankro

rust-highfive assigned Gankra and unassigned huonw Jul 23, 2015
@rust-highfive
Collaborator

r? @huonw

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Member

I'd personally like to see some more motivation for this change. For example, I know of no technical drawbacks to what we're doing right now (e.g. are there benchmarks showing this to be much faster?). The current approach also gives us a pretty nice guarantee that you literally cannot rely on the ordering of elements in hash maps: subtle bugs arising from that sort of dependence are wiped out, and this change may allow them to leak back in.

I agree that this isn't cryptographically needed or anything, but it's objectively less code to keep doing what we're doing right now and I don't think it's slower, so I'm not sure there's much of a reason to change.

@apasel422
Contributor Author

@alexcrichton I totally agree. I'll leave it to @gankro to justify it.

@Gankra
Contributor

Gankra commented Jul 23, 2015

I was assuming an implementation would evaluate the impact (particularly of global vs thread-local), honestly.

@apasel422
Contributor Author

Benches are incoming.

@apasel422
Contributor Author

Actually, it would be good if someone could help devise a non-naive benchmark for this. Both the per-thread and per-process versions will have some overhead that may only be visible with multiple contending threads.
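
For reference, a deliberately simple sketch of the kind of contended benchmark being described: several threads each construct many maps at once, so any shared seed state (a global atomic or lock) is hit concurrently. The thread and iteration counts are arbitrary, and this measures wall time including spawn overhead, so it is only a starting point.

```rust
use std::collections::HashMap;
use std::thread;
use std::time::Instant;

fn main() {
    // Arbitrary choices for illustration.
    let threads = 8;
    let iters: u32 = 100_000;

    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                for i in 0..iters {
                    // Each construction exercises the seed path under contention.
                    let mut m: HashMap<u32, u32> = HashMap::new();
                    m.insert(i, i); // touch the map so the work isn't optimized away
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("{} threads x {} maps each: {:?}", threads, iters, start.elapsed());
}
```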

@Gankra
Contributor

Gankra commented Jul 23, 2015

Agreed. Perhaps @frankmcsherry has thoughts.

@apasel422
Contributor Author

I'll be doing some out-of-tree testing at https://github.com/apasel422/random_state.

@Gankra
Contributor

Gankra commented Jul 23, 2015

@apasel422 In the issue thread, the idea of sharding work across a bunch of threads with hashmaps and then merging the results was definitely an interesting one.
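
A rough sketch of that kind of workload, assuming a word-count-style job (all names and shapes here are illustrative): each thread builds its own HashMap over a shard of the input, and the per-thread maps are merged at the end, so map construction happens concurrently across threads.

```rust
use std::collections::HashMap;
use std::thread;

// Shard the input across worker threads, build a per-thread map, then merge.
fn sharded_count(words: Vec<String>, shards: usize) -> HashMap<String, usize> {
    let mut chunks: Vec<Vec<String>> = vec![Vec::new(); shards];
    for (i, w) in words.into_iter().enumerate() {
        chunks[i % shards].push(w);
    }
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| {
            thread::spawn(move || {
                // One map per thread: its seed is generated here, concurrently
                // with the other workers' seeds.
                let mut local = HashMap::new();
                for w in chunk {
                    *local.entry(w).or_insert(0) += 1;
                }
                local
            })
        })
        .collect();
    let mut merged = HashMap::new();
    for h in handles {
        for (w, n) in h.join().unwrap() {
            *merged.entry(w).or_insert(0) += n;
        }
    }
    merged
}

fn main() {
    let words: Vec<String> = ["a", "b", "a", "c", "b", "a"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    println!("{:?}", sharded_count(words, 3));
}
```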

@frankmcsherry
Contributor

I'd be curious to see the benches for creating and then increasingly using HashMaps, with each implementation, to see how light a touch you need to have with the HashMap before the RNG overhead becomes noticeable. E.g. you may only notice it with fewer than 4 elements added, or you may see 2x overhead up to 100 elements. This would be a good sanity check to understand how important it is from a performance point of view (and, correct me if I'm wrong, performance really is the main reason here, rather than design or anything?).
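
Something along these lines, perhaps (a nightly-only #[bench] sketch; the helper name and the choice of N = 4 are arbitrary, and one would run it for several values of N):

```rust
#![feature(test)] // #[bench] requires the unstable test crate
extern crate test;

use std::collections::HashMap;
use test::{black_box, Bencher};

// Create a map and insert `n` elements. Running this for increasing `n`
// shows where the one-time seeding cost stops dominating the total time.
fn create_then_insert(b: &mut Bencher, n: u32) {
    b.iter(|| {
        let mut m = HashMap::new();
        for i in 0..n {
            m.insert(i, i);
        }
        black_box(m) // keep the map observable to the optimizer
    });
}

#[bench]
fn touch_4(b: &mut Bencher) {
    create_then_insert(b, 4)
}
```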

@Gankra
Contributor

Gankra commented Jul 23, 2015

Yes this would only be a performance thing.

@apasel422
Contributor Author

These are the results of the naive, single-threaded benchmarks in my repo that only create the random state but don't do anything else:

test bench::global       ... bench:           0 ns/iter (+/- 0)
test bench::per_instance ... bench:          67 ns/iter (+/- 2)
test bench::per_thread   ... bench:           2 ns/iter (+/- 0)

@Gankra
Contributor

Gankra commented Jul 23, 2015

Note that 0 or 2 ns is generally a sign that LLVM has no-op'd your benchmark. You may need to toss some black_box calls in there.
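
For illustration, a black_box-guarded version of the per-instance bench might look like this (RandomState was still unstable at the time; this uses today's stable path):

```rust
#![feature(test)]
extern crate test;

use std::collections::hash_map::RandomState;
use test::{black_box, Bencher};

#[bench]
fn per_instance(b: &mut Bencher) {
    b.iter(|| {
        // black_box is an opaque value barrier: LLVM can no longer prove the
        // RandomState unused and delete the loop body outright.
        black_box(RandomState::new())
    });
}
```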

@apasel422
Contributor Author

Bencher::iter already uses black_box: https://github.com/rust-lang/rust/blob/master/src/libtest/lib.rs#L1108

@alexcrichton
Member

Yeah, I've verified that none of those benchmarks are actually optimized to no-ops, but I also question how useful they are. Most applications do not spend 90% of their time creating hash maps, but rather using them. In that sense, although global or thread-local state may be faster in the micro-benchmark sense, I'd be super surprised if this ever actually showed up in a benchmark.

@apasel422
Contributor Author

@alexcrichton Right. I'm thinking this change isn't worth it. Someone else can continue investigating this if they think it's worthwhile, but feel free to close this PR.

@alexcrichton
Member

@gankro what do you think?

@Gankra
Contributor

Gankra commented Jul 23, 2015

The std collections library goes through a fair bit of effort to ensure no-op collections are as minimal as possible, and I think that's a good philosophy to have. One option would be to just defer hasher initialization to allocation time (perhaps with mem::uninitialized? -- should be safe since we don't query the hasher when we're unallocated).

@Gankra
Contributor

Gankra commented Jul 23, 2015

We could also defer to the general wisdom that the compiler itself is basically a glorified hashmap benchmark and see whether this has an impact there (although does it create a lot of hashmaps...?).

@Gankra
Contributor

Gankra commented Jul 23, 2015

Oh no, wait, that's stupid: it doesn't use this HashState.

@alexcrichton
Member

The std collections library goes through a fair bit of effort to ensure no-op collections are as minimal as possible, and I think that's a good philosophy to have.

Isn't this primarily to avoid unnecessary allocations, though? I wasn't under the impression that this was done for performance reasons.

One option would be to just defer hasher initialization to allocation time (perhaps with mem::uninitialized? -- should be safe since we don't query the hasher when we're unallocated).

I think this could work, although we'd want to be somewhat careful here. Technically the keys only need to be initialized on the first insertion, where we can modify them with &mut self; if we instead tried to initialize them on first use, then a find could require mutation, forcing our hand into using something like Cell, Mutex, or AtomicUsize.
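
A minimal sketch of that first-insertion approach, using Option rather than mem::uninitialized (LazyMap and everything in it is made up for illustration; this is not how std's HashMap is laid out):

```rust
use std::collections::hash_map::RandomState;
use std::marker::PhantomData;

// The seed is created by the first operation that already takes `&mut self`,
// so constructing the map and probing it while empty never touch the RNG.
struct LazyMap<K, V> {
    hasher: Option<RandomState>,
    // ... bucket storage elided ...
    _marker: PhantomData<(K, V)>,
}

impl<K, V> LazyMap<K, V> {
    fn new() -> Self {
        // No seed and no allocation here.
        LazyMap { hasher: None, _marker: PhantomData }
    }

    fn insert(&mut self, _key: K, _val: V) {
        // First insertion: generate the per-instance seed now, via &mut self.
        let _hasher = self.hasher.get_or_insert_with(RandomState::new);
        // ... hash the key with `_hasher`, allocate buckets, store ...
    }

    fn get(&self, _key: &K) -> Option<&V> {
        // An unseeded map is necessarily empty, so no hashing is needed.
        self.hasher.as_ref()?;
        // ... otherwise hash and probe the table ...
        None
    }
}

fn main() {
    let mut m: LazyMap<u32, u32> = LazyMap::new(); // still RNG-free
    m.insert(1, 2); // seed created here
    let _ = m.get(&1);
}
```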

@Gankra
Contributor

Gankra commented Jul 24, 2015

I assumed we only cared about unnecessary allocations for performance reasons. I suppose you could waste a fair amount of memory if you had a collection of empty collections or something...?

Anyway, the search procedure already branches immediately on zero capacity, but only after hashing. We could hoist that check up before the hashing, but that would in principle require checking twice or making search_hashed unsafe. Alternatively, we could require hashers to provide a nullary state (e.g. all 0s) for pre-initialization.
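
A sketch of what such a nullary state could look like, as a BuildHasher over std's DefaultHasher (ZeroSeed is a hypothetical name for illustration, not an actual std type):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::BuildHasher;

// A BuildHasher whose value is a fixed seed, cheap enough to use for
// pre-initialization before the table ever allocates.
#[derive(Clone, Default)]
struct ZeroSeed;

impl BuildHasher for ZeroSeed {
    type Hasher = DefaultHasher;
    fn build_hasher(&self) -> DefaultHasher {
        // DefaultHasher::new() always starts from the same fixed keys, which
        // is exactly the deterministic state pre-initialization needs.
        DefaultHasher::new()
    }
}

fn main() {
    use std::hash::{Hash, Hasher};
    let mut h = ZeroSeed.build_hasher();
    42u32.hash(&mut h);
    println!("{}", h.finish()); // same value on every run
}
```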

@alexcrichton
Member

I vaguely remember there being some analysis that some absurdly large percentage of collections created never have any elements, hence the no-allocation-on-new, but I may also be misremembering.

@apasel422
Contributor Author

Closing this until we have better motivation.

@apasel422 apasel422 closed this Jul 29, 2015