replace RandomState's per-instance seed with a per-thread one #27246

Closed
wants to merge 1 commit into from

Conversation

apasel422
Contributor

closes #27243

@apasel422
Contributor Author

r? @gankro

rust-highfive assigned Gankra and unassigned huonw Jul 23, 2015
@rust-highfive
Collaborator

r? @huonw

(rust_highfive has picked a reviewer for you, use r? to override)

@alexcrichton
Member

I'd personally like to see some more motivation for this change. For example, I know of no technical drawbacks to what we're doing right now (e.g. are there benchmarks showing this to be much faster?). The current approach also gives us a pretty nice guarantee that you literally cannot rely on the ordering of elements in hash maps: subtle bugs arising from that sort of dependence are wiped out, and this change may allow them to leak back in.

I agree that this isn't cryptographically needed or anything, but it's objectively less code to keep doing what we're doing right now and I don't think it's slower, so I'm not sure there's much of a reason to change.

@apasel422
Contributor Author

@alexcrichton I totally agree. I'll leave it to @gankro to justify it.

@Gankra
Contributor

Gankra commented Jul 23, 2015

I was assuming an implementation would evaluate the impact (particularly of global vs thread-local), honestly.

@apasel422
Contributor Author

Benches are incoming.

@apasel422
Contributor Author

Actually, it would be good if someone could help devise a non-naive benchmark for this. Both the per-thread and per-process versions will have some overhead that may only be visible with multiple contending threads.
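
For reference, a deliberately simple sketch of the kind of contended benchmark being described: several threads each construct many maps at once, so any shared seed state (a global atomic or lock) is hit concurrently. The thread and iteration counts are arbitrary, and this measures wall time including spawn overhead, so it is only a starting point.

```rust
use std::collections::HashMap;
use std::thread;
use std::time::Instant;

fn main() {
    // Arbitrary choices for illustration.
    let threads = 8;
    let iters: u32 = 100_000;

    let start = Instant::now();
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            thread::spawn(move || {
                for i in 0..iters {
                    // Each construction exercises the seed path under contention.
                    let mut m: HashMap<u32, u32> = HashMap::new();
                    m.insert(i, i); // touch the map so the work isn't optimized away
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    println!("{} threads x {} maps each: {:?}", threads, iters, start.elapsed());
}
```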

@Gankra
Contributor

Gankra commented Jul 23, 2015

Agreed. Perhaps @frankmcsherry has thoughts.

@apasel422
Contributor Author

I'll be doing some out-of-tree testing at https://github.com/apasel422/random_state.

@Gankra
Contributor

Gankra commented Jul 23, 2015

@apasel422 In the issue thread, the idea of sharding work across a bunch of threads with hashmaps and then merging the results was definitely an interesting one.
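
A rough sketch of that kind of workload, assuming a word-count-style job (all names and shapes here are illustrative): each thread builds its own HashMap over a shard of the input, and the per-thread maps are merged at the end, so map construction happens concurrently across threads.

```rust
use std::collections::HashMap;
use std::thread;

// Shard the input across worker threads, build a per-thread map, then merge.
fn sharded_count(words: Vec<String>, shards: usize) -> HashMap<String, usize> {
    let mut chunks: Vec<Vec<String>> = vec![Vec::new(); shards];
    for (i, w) in words.into_iter().enumerate() {
        chunks[i % shards].push(w);
    }
    let handles: Vec<_> = chunks
        .into_iter()
        .map(|chunk| {
            thread::spawn(move || {
                // One map per thread: its seed is generated here, concurrently
                // with the other workers' seeds.
                let mut local = HashMap::new();
                for w in chunk {
                    *local.entry(w).or_insert(0) += 1;
                }
                local
            })
        })
        .collect();
    let mut merged = HashMap::new();
    for h in handles {
        for (w, n) in h.join().unwrap() {
            *merged.entry(w).or_insert(0) += n;
        }
    }
    merged
}

fn main() {
    let words: Vec<String> = ["a", "b", "a", "c", "b", "a"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    println!("{:?}", sharded_count(words, 3));
}
```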

@frankmcsherry
Contributor

I'd be curious to see the benches for creating and then increasingly using HashMaps, with each implementation, to see how light a touch you need to have with the HashMap before the RNG overhead becomes noticeable. E.g. you may only notice it with fewer than 4 elements added, or you may see 2x overhead up to 100 elements. This would be a good sanity check to understand how important it is from a performance point of view (and, correct me if I'm wrong, performance really is the main reason here, rather than design or anything?).
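
Something along these lines, perhaps (a nightly-only #[bench] sketch; the helper name and the choice of N = 4 are arbitrary, and one would run it for several values of N):

```rust
#![feature(test)] // #[bench] requires the unstable test crate
extern crate test;

use std::collections::HashMap;
use test::{black_box, Bencher};

// Create a map and insert `n` elements. Running this for increasing `n`
// shows where the one-time seeding cost stops dominating the total time.
fn create_then_insert(b: &mut Bencher, n: u32) {
    b.iter(|| {
        let mut m = HashMap::new();
        for i in 0..n {
            m.insert(i, i);
        }
        black_box(m) // keep the map observable to the optimizer
    });
}

#[bench]
fn touch_4(b: &mut Bencher) {
    create_then_insert(b, 4)
}
```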

@Gankra
Contributor

Gankra commented Jul 23, 2015

Yes this would only be a performance thing.

@apasel422
Contributor Author

These are the results of the naive, single-threaded benchmarks in my repo that only create the random state but don't do anything else:

test bench::global       ... bench:           0 ns/iter (+/- 0)
test bench::per_instance ... bench:          67 ns/iter (+/- 2)
test bench::per_thread   ... bench:           2 ns/iter (+/- 0)

@Gankra
Contributor

Gankra commented Jul 23, 2015

Note that 0 or 2 ns is generally a sign that LLVM has no-op'd your benchmark. You may need to toss some black_box calls in there.
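
For illustration, a black_box-guarded version of the per-instance bench might look like this (RandomState was still unstable at the time; this uses today's stable path):

```rust
#![feature(test)]
extern crate test;

use std::collections::hash_map::RandomState;
use test::{black_box, Bencher};

#[bench]
fn per_instance(b: &mut Bencher) {
    b.iter(|| {
        // black_box is an opaque value barrier: LLVM can no longer prove the
        // RandomState unused and delete the loop body outright.
        black_box(RandomState::new())
    });
}
```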

@apasel422
Contributor Author

Bencher::iter already uses black_box: https://github.com/rust-lang/rust/blob/master/src/libtest/lib.rs#L1108

@alexcrichton
Member

Yeah, I've verified that none of those benchmarks are actually optimized to no-ops, but I also question how useful they are. Most applications do not spend 90% of their time creating hash maps, but rather using them. In that sense, although global or thread-local state may be faster in the micro-benchmark sense, I'd be super surprised if this ever actually showed up in a benchmark.

@apasel422
Contributor Author

@alexcrichton Right. I'm thinking this change isn't worth it. Someone else can continue investigating this if they think it's worthwhile, but feel free to close this PR.

@alexcrichton
Member

@gankro what do you think?

@Gankra
Contributor

Gankra commented Jul 23, 2015

The std collections library goes through a fair bit of effort to ensure no-op collections are as minimal as possible, and I think that's a good philosophy to have. One option would be to just defer hasher initialization to allocation time (perhaps with mem::uninitialized? -- should be safe since we don't query the hasher when we're unallocated).

@Gankra
Contributor

Gankra commented Jul 23, 2015

We could also defer to the general wisdom that the compiler itself is basically a glorified hashmap benchmark and see whether this has an impact there (although does it create a lot of hashmaps...?).

@Gankra
Contributor

Gankra commented Jul 23, 2015

Oh no, wait, that's stupid: it doesn't use this HashState.

@alexcrichton
Member

The std collections library goes through a fair bit of effort to ensure no-op collections are as minimal as possible, and I think that's a good philosophy to have.

Isn't this primarily to avoid unnecessary allocations, though? I wasn't under the impression that this was done for performance reasons.

One option would be to just defer hasher initialization to allocation time (perhaps with mem::uninitialized? -- should be safe since we don't query the hasher when we're unallocated).

I think this could work, although we'd want to be somewhat careful here. Technically the keys only need to be initialized on the first insertion, where we can modify them with &mut self; if we instead tried to initialize them on first use, then a find could require mutation, forcing our hand into using something like Cell, Mutex, or AtomicUsize.
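
A minimal sketch of that first-insertion approach, using Option rather than mem::uninitialized (LazyMap and everything in it is made up for illustration; this is not how std's HashMap is laid out):

```rust
use std::collections::hash_map::RandomState;
use std::marker::PhantomData;

// The seed is created by the first operation that already takes `&mut self`,
// so constructing the map and probing it while empty never touch the RNG.
struct LazyMap<K, V> {
    hasher: Option<RandomState>,
    // ... bucket storage elided ...
    _marker: PhantomData<(K, V)>,
}

impl<K, V> LazyMap<K, V> {
    fn new() -> Self {
        // No seed and no allocation here.
        LazyMap { hasher: None, _marker: PhantomData }
    }

    fn insert(&mut self, _key: K, _val: V) {
        // First insertion: generate the per-instance seed now, via &mut self.
        let _hasher = self.hasher.get_or_insert_with(RandomState::new);
        // ... hash the key with `_hasher`, allocate buckets, store ...
    }

    fn get(&self, _key: &K) -> Option<&V> {
        // An unseeded map is necessarily empty, so no hashing is needed.
        self.hasher.as_ref()?;
        // ... otherwise hash and probe the table ...
        None
    }
}

fn main() {
    let mut m: LazyMap<u32, u32> = LazyMap::new(); // still RNG-free
    m.insert(1, 2); // seed created here
    let _ = m.get(&1);
}
```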

@Gankra
Contributor

Gankra commented Jul 24, 2015

I assumed we only cared about unnecessary allocations for performance reasons. I suppose you could waste a fair amount of memory if you had a collection of empty collections or something...?

Anyway, the search procedure already branches immediately on zero capacity, but only after hashing. We could hoist that check up before the hashing, but that would in principle require checking twice or making search_hashed unsafe. Alternatively, we could require hashers to provide a nullary state (e.g. all 0s) for pre-initialization.
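
A sketch of what such a nullary state could look like, as a BuildHasher over std's DefaultHasher (ZeroSeed is a hypothetical name for illustration, not an actual std type):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::BuildHasher;

// A BuildHasher whose value is a fixed seed, cheap enough to use for
// pre-initialization before the table ever allocates.
#[derive(Clone, Default)]
struct ZeroSeed;

impl BuildHasher for ZeroSeed {
    type Hasher = DefaultHasher;
    fn build_hasher(&self) -> DefaultHasher {
        // DefaultHasher::new() always starts from the same fixed keys, which
        // is exactly the deterministic state pre-initialization needs.
        DefaultHasher::new()
    }
}

fn main() {
    use std::hash::{Hash, Hasher};
    let mut h = ZeroSeed.build_hasher();
    42u32.hash(&mut h);
    println!("{}", h.finish()); // same value on every run
}
```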

@alexcrichton
Member

I vaguely remember there being some analysis that some absurdly large percentage of collections created never have any elements, hence the no-allocation-on-new, but I may also be misremembering.

@apasel422
Contributor Author

Closing this until we have better motivation.

@apasel422 apasel422 closed this Jul 29, 2015