Make HashMap use a per-process or per-thread seed instead of a per-instance seed #27243

Gankra · 2015-07-23T18:57:45Z

Currently every call to HashMap::new makes a new RandomState that makes two new calls to the thread_rng to create its seed. This is, to my knowledge cryptographic overkill, as there is no clear distinction between a single HashMap with a seed and many HashMaps with the same seed, as long as that seed is not hardcoded (and therefore able to be attacked offline).

There is no argument to be had on minimizing the risk of a single compromise, as the SipHash security model is designed to make it impossible to compromise a single map. If our users disagree with this, we may consider exposing a constructor for RandomState that explicitly reseeds it.

This can be implemented via Once or through a Thread Local Static. Whichever is more efficient.

CC @alexcrichton

Gankra · 2015-07-23T18:57:55Z

Willing to mentor

apasel422 · 2015-07-23T19:15:00Z

@gankro I'd be willing to take a look at this.

Gankra · 2015-07-23T19:22:27Z

Awesome! Go for it!

alexcrichton · 2015-07-23T19:31:03Z

Note that this is done today for ease of implementation, if a global seed
is painful to add I wouldn't bother
On Thu, Jul 23, 2015 at 12:22 PM Alexis Beingessner <
[email protected]> wrote:

Awesome! Go for it!

—
Reply to this email directly or view it on GitHub
#27243 (comment).

frankmcsherry · 2015-07-23T20:57:57Z

Just a quick comment. There are definitely other reasons to value a seeded HashMap other than DoS attacks, if that is interesting. The most common one I deal with is when I partition work among worker threads, possibly using a hash function, and then hope that when they arrive they will not have rotted out some bits of their hashes and make HashMap perf all wonky. The "per-thread" version might not suffer from this much, but you could easily imagine other ways to shoot yourself in the foot if you re-use seeded hashes too often.

I need cluster-wide consistency, so I actually seed all my own stuff and personally rot out the bits, but ... Just thought I'd mention it!

Gankra · 2015-07-23T21:01:20Z

@frankmcsherry could you elaborate on this idea of bits rotting out?

frankmcsherry · 2015-07-23T21:14:41Z

Sure, sorry. It may be tricky to make a version using only HashMap, so perhaps the answer is ... "bah!" :)

Let's say you have a bunch of workers thinking they'd like to count the words in all the strings they have. So, they distribute the words among themselves using their favorite hash function. Once the words next arrive at each worker, each puts their words in a HashMap to count them up. If the hash function they used to distribute the strings is related to the one in the HashMap (e.g. the same) the low order bits will be fixed, and bad perf can happen.

So, you may say: "hey well, you don't get to use HashMap's hash function to do the distribution", and I totally don't plan on trying that, but if anything bad happens because of how the keys are laid out (say I pre-aggregate all my word counts, and ship them, using HashMap.drain()) you can get accidentally crummy performance.

Random stuff like this happens, and you beat your head against the wall trying to understand what went wrong, and then you learn about number theory and that you and the HashMap implementor chose the same prime number somewhere, and ...

So, probably don't sweat it. :)

Gankra · 2015-07-23T21:28:41Z

Ah, neat! Thanks!

This reduces how much rand is required in std, and makes creating hash-maps noticably faster. The first invocation of HashMap::new() in a thread goes from ~130_000 ns to ~1500 ns (i.e. 86x faster) and later invocations go from 10-20 ns to a more consistent 1.8-2.0ns. The mean for many invocations in a loop, on a single thread, goes from ~77ns to ~1.9ns, and the standard deviation drops from 2800ns to 33ns (the *massive* variance of the old scheme comes from the occasional reseeding of the thread rng). These new numbers are only slightly higher than creating the `RandomState` outside the loop and calling `with_state` in it (i.e. only measuring the non-random parts of hashmap initialisation): 1.3 and 18 ns respectively. This new scheme has the slight downside of being consistent on a single thread, so users may unintentionally rely on the order being fixed (e.g. two hashmaps with the same contents). Closes rust-lang#27243.

This is a rebase and extension of rust-lang#31356 where we cache the keys in thread local storage. This should give us a nice speed bost in creating hash maps along with mostly retaining the property that all maps have a nondeterministic iteration order. Closes rust-lang#27243

std: Cache HashMap keys in TLS This is a rebase and extension of #31356 where we not only cache the keys in thread local storage but we also bump each key every time a new `HashMap` is created. This should give us a nice speed bost in creating hash maps along with retaining the property that all maps have a nondeterministic iteration order. Closes #27243

Gankra added A-libs E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. labels Jul 23, 2015

Gankra self-assigned this Jul 23, 2015

apasel422 mentioned this issue Jul 23, 2015

replace RandomState's per-instance seed with a per-thread one #27246

Closed

cristicbz mentioned this issue Oct 8, 2015

Make RandomState a lazy singleton. Fixes #27243 #28916

Closed

huonw mentioned this issue Feb 2, 2016

Seed HashMaps thread-locally, straight from the OS. #31356

Closed

alexcrichton mentioned this issue May 1, 2016

std: Cache HashMap keys in TLS #33318

Merged

bors closed this as completed in #33318 May 20, 2016

sfackler mentioned this issue Sep 19, 2016

Revert "std: Cache HashMap keys in TLS" #36565

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make HashMap use a per-process or per-thread seed instead of a per-instance seed #27243

Make HashMap use a per-process or per-thread seed instead of a per-instance seed #27243

Gankra commented Jul 23, 2015

Gankra commented Jul 23, 2015

apasel422 commented Jul 23, 2015

Gankra commented Jul 23, 2015

alexcrichton commented Jul 23, 2015

frankmcsherry commented Jul 23, 2015

Gankra commented Jul 23, 2015

frankmcsherry commented Jul 23, 2015

Gankra commented Jul 23, 2015

Make HashMap use a per-process or per-thread seed instead of a per-instance seed #27243

Make HashMap use a per-process or per-thread seed instead of a per-instance seed #27243

Comments

Gankra commented Jul 23, 2015

Gankra commented Jul 23, 2015

apasel422 commented Jul 23, 2015

Gankra commented Jul 23, 2015

alexcrichton commented Jul 23, 2015

frankmcsherry commented Jul 23, 2015

Gankra commented Jul 23, 2015

frankmcsherry commented Jul 23, 2015

Gankra commented Jul 23, 2015