
Isolate random seed for cachem #29

Open · torfason opened this issue May 10, 2022 · 2 comments · May be fixed by #30

torfason commented May 10, 2022

It is widespread practice to rely on set.seed() to construct pseudo-random data for analyses in a reproducible manner.

Some functions within the cachem package modify this seed, but it is not obvious to users of the package that they do so. This can lead (and in my case did lead) to unexpected and hard-to-debug behavior when cachem is introduced into such an analysis, either directly or through intermediate packages that rely on cachem as a caching library.

The issue can be demonstrated with a simple reprex:

library(cachem)
library(testthat)
cm <- cache_mem()
cd <- cache_disk(tempdir())

set.seed(42)
expect_equal(sample(99999, 1), 61413)

set.seed(42)
cm$set("x", letters)
expect_equal(sample(99999, 1), 61413)

set.seed(42)
cd$set("x", letters)
expect_equal(sample(99999, 1), 61413)
#> Error: sample(99999, 1) not equal to 61413.
#> 1/1 mismatches
#> [1] 73236 - 61413 == 11823

The above also demonstrates that whether the seed is modified depends on the type of cache, which can further complicate debugging: something that works with a memory cache can stop working when switching to a layered or disk cache.

My suggested fix would be to isolate the usage of the random seed wherever cachem needs access to randomness, so that the global seed is not affected by calls to cachem functions.

A discussion of this issue can be found here:

And an implementation of this within the shiny package (which is the approach I would recommend) can be found here:
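Roughly, the pattern is to snapshot the global RNG state before doing anything random and restore it on exit. Below is a minimal sketch of that idea; the helper name with_isolated_seed is hypothetical, and shiny's internal implementation differs in detail:

with_isolated_seed <- function(code) {
  # .Random.seed only exists in the global environment once the RNG
  # has been initialized, so handle the "no seed yet" case as well
  if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    old_seed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
    on.exit(assign(".Random.seed", old_seed, envir = globalenv()), add = TRUE)
  } else {
    on.exit(
      if (exists(".Random.seed", envir = globalenv(), inherits = FALSE))
        rm(".Random.seed", envir = globalenv()),
      add = TRUE
    )
  }
  force(code)
}

# randomness inside the helper then leaves the global stream untouched
set.seed(42)
with_isolated_seed(runif(10))
sample(99999, 1)
#> [1] 61413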

I'd be happy to discuss the best approach to this fix/enhancement, and provide an eventual pull request if it seems like it would be well-received.

@wch linked a pull request May 11, 2022 that will close this issue

wch (Member) commented May 11, 2022

@torfason I made a quick PR. If you could test it out, I'd appreciate it. Also, if you are able to figure out how to test the Rcpp case mentioned in the PR, that would be very helpful, although I can understand if it's too esoteric; I can't remember the details myself even though I wrote all of that.

torfason (Author) commented May 12, 2022

The PR looks great to me; it fixes all the cases I had handy.

I'm afraid, though, that I won't be of much use regarding the Rcpp case. I haven't written C/C++ bindings at all, and that discussion in the shiny thread went quickly over my head :-)

For what it's worth, I did a quick-and-dirty incorporation of RNG state tests into the full test suite. With those changes (see https://github.com/torfason/cachem/tree/random-seed-tests; I didn't open a PR), 10 tests fail on the main branch (all in cache-disk), but everything passes on your random-seed branch:

# main branch
ℹ Loading cachem
ℹ Testing cachem
✓ | F  W  S  OK | Context
x | 10       41 | cache-disk [6.1s]
✓ |          65 | cache-mem [2.0s]
✓ |          13 | utils

# random-seed branch
ℹ Loading cachem
ℹ Testing cachem
✓ | F  W  S  OK | Context
✓ |          51 | cache-disk [6.1s]
✓ |          65 | cache-mem [2.0s]
✓ |           2 | random
✓ |          13 | utils

Not sure whether it is attractive to use this approach in the main repo, but if so, it could be done with minimal clutter. Apart from calling set.seed() at the start of each test and adding an expectation at the end, the changes mostly involved pulling the randomization used in the tests themselves out of the test bodies. Even that could be minimized by adding a utility function that returns a list of random vectors, so that all the random generation happens at the very start of each test. Your call whether to take it or leave it, and I'd be happy to refactor this dirty test implementation if that would be helpful.
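For reference, the shape of such a test might look something like the sketch below (illustrative only, not the actual code on that branch):

library(cachem)
library(testthat)

test_that("cache_disk() leaves the global RNG state unchanged", {
  set.seed(42)
  seed_before <- .Random.seed

  cd <- cache_disk(tempdir())
  cd$set("x", letters)
  cd$get("x")

  # fails on the current main branch, passes on the random-seed branch
  expect_identical(.Random.seed, seed_before)
})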

Also, I can continue to be on the lookout for any strangeness regarding this, although my own code is pretty straightforward R, so if the tests don't trigger any Rcpp issues, I don't think my regular code is likely to either.

Thanks for responding so quickly!
