
Isolate random seed for cachem #29

Open · torfason opened this issue May 10, 2022 · 2 comments · May be fixed by #30

torfason commented May 10, 2022

It is widespread practice to rely on set.seed() to construct pseudo-random data for analyses in a reproducible manner.

Some functions within the cachem package modify this seed, but it is not obvious to users of the package that they do so. This can lead (and in my case did lead) to unexpected and hard-to-debug behavior when cachem is introduced into such an analysis, either directly or through intermediate packages that rely on cachem as a caching library.

The issue can be demonstrated with a simple reprex:

library(cachem)
library(testthat)
cm <- cache_mem()
cd <- cache_disk(tempdir())

set.seed(42)
expect_equal(sample(99999, 1), 61413)

set.seed(42)
cm$set("x", letters)
expect_equal(sample(99999, 1), 61413)

set.seed(42)
cd$set("x", letters)
expect_equal(sample(99999, 1), 61413)
#> Error: sample(99999, 1) not equal to 61413.
#> 1/1 mismatches
#> [1] 73236 - 61413 == 11823

The above also demonstrates that whether the seed is modified depends on the type of cache, which can further complicate debugging: something that works with a memory cache can stop working when switching to a layered or disk cache.

My suggested fix would be to isolate the usage of the random seed wherever cachem needs access to randomness, so that the global seed is not affected by calls to cachem functions.

A discussion of this issue can be found here:

And an implementation of this within the shiny package (which is the approach I would recommend) can be found here:
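Roughly, the pattern is to snapshot the global RNG state before doing anything random and restore it on exit. Below is a minimal sketch of that idea; the helper name with_isolated_seed is hypothetical, and shiny's internal implementation differs in detail:

with_isolated_seed <- function(code) {
  # .Random.seed only exists in the global environment once the RNG
  # has been initialized, so handle the "no seed yet" case as well
  if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
    old_seed <- get(".Random.seed", envir = globalenv(), inherits = FALSE)
    on.exit(assign(".Random.seed", old_seed, envir = globalenv()), add = TRUE)
  } else {
    on.exit(
      if (exists(".Random.seed", envir = globalenv(), inherits = FALSE))
        rm(".Random.seed", envir = globalenv()),
      add = TRUE
    )
  }
  force(code)
}

# randomness inside the helper then leaves the global stream untouched
set.seed(42)
with_isolated_seed(runif(10))
sample(99999, 1)
#> [1] 61413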

I'd be happy to discuss the best approach to this fix/enhancement, and provide an eventual pull request if it seems like it would be well-received.

@wch linked a pull request May 11, 2022 that will close this issue

wch (Member) commented May 11, 2022

@torfason I made a quick PR. If you could test it out, I'd appreciate it. Also, if you are able to figure out how to test the Rcpp case mentioned in the PR, that would be very helpful, although I can understand if it's too esoteric; I can't remember the details myself even though I wrote all of that.

torfason (Author) commented May 12, 2022

The PR looks great to me; it fixes all the cases I had handy.

I'm afraid, though, that I won't be of much use regarding the Rcpp case. I haven't written C/C++ bindings at all, and that discussion in the shiny thread went quickly over my head :-)

For what it's worth, I did a quick-and-dirty incorporation of RNG state tests into the full test suite. With those changes (see https://github.com/torfason/cachem/tree/random-seed-tests; I didn't open a PR), 10 tests fail on the main branch (all in cache-disk), but everything passes on your random-seed branch:

# main branch
ℹ Loading cachem
ℹ Testing cachem
✓ | F  W  S  OK | Context
x | 10       41 | cache-disk [6.1s]
✓ |          65 | cache-mem [2.0s]
✓ |          13 | utils

# random-seed branch
ℹ Loading cachem
ℹ Testing cachem
✓ | F  W  S  OK | Context
✓ |          51 | cache-disk [6.1s]
✓ |          65 | cache-mem [2.0s]
✓ |           2 | random
✓ |          13 | utils

Not sure whether it is attractive to use this approach in the main repo, but if so, it could be done with minimal clutter. Apart from calling set.seed() at the start of each test and adding an expectation at the end, the changes mostly involved pulling the randomization used in the tests themselves out of the test bodies. Even that could be minimized by adding a utility function that returns a list of random vectors, so that all the random generation happens at the very start of each test. Your call whether to take it or leave it, and I'd be happy to refactor this dirty test implementation if that would be helpful.
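For reference, the shape of such a test might look something like the sketch below (illustrative only, not the actual code on that branch):

library(cachem)
library(testthat)

test_that("cache_disk() leaves the global RNG state unchanged", {
  set.seed(42)
  seed_before <- .Random.seed

  cd <- cache_disk(tempdir())
  cd$set("x", letters)
  cd$get("x")

  # fails on the current main branch, passes on the random-seed branch
  expect_identical(.Random.seed, seed_before)
})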

Also, I can continue to be on the lookout for any strangeness regarding this, although my own code is pretty straightforward R, so if the tests don't trigger any Rcpp issues, I don't think my regular code is likely to either.

Thanks for responding so quickly!
