Issues with Stochastic MuZero #60
Comments
Thanks for sharing the minimal example. I can clear up one confusion: The You can take inspiration from the bandit in the tests: I will improve the documentation for the chance_recurrent_fn. Sorry for the confusion.
@fidlej Thanks for your reply. Perhaps the argument can be renamed to
Fixes #60. PiperOrigin-RevId: 555288953
@fidlej Any idea about the
You can see that output.search_tree contains only the actions relevant to the decision nodes, so the zeros in children_rewards make sense: the reward is zero for the children of the decision nodes.
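To make the point above concrete, here is a minimal toy sketch in plain Python. It does not call mctx, and every name in it (decision_step, chance_step, expand_root, CHANCE_PROBS) is hypothetical. It assumes a two-stage transition in which the decision edge (state to afterstate) carries no reward and the chance edge (afterstate to next state) carries the reward described in the issue: always positive, with an extra 100 for action 0.

```python
# Toy illustration only: this does not use mctx, and all names are hypothetical.
NUM_ACTIONS = 3
CHANCE_PROBS = [0.5, 0.5]  # assumed uniform distribution over two chance outcomes


def decision_step(state, action):
    """Decision node -> afterstate. No reward is emitted on this edge."""
    reward = 0.0
    afterstate = (state, action)
    return reward, afterstate


def chance_step(afterstate, outcome):
    """Afterstate -> next state. The reward is emitted on this edge."""
    state, action = afterstate
    # Always positive, with an extra 100 for action 0 (as in the issue).
    reward = 1.0 + (100.0 if action == 0 else 0.0)
    return reward, state + 1


def expand_root(state):
    """Collects the rewards on the root's child edges and a one-step lookahead value."""
    children_rewards = []
    q_values = []
    for action in range(NUM_ACTIONS):
        r_decision, afterstate = decision_step(state, action)
        children_rewards.append(r_decision)
        # Expected reward one level deeper, on the chance edges.
        expected = sum(
            p * chance_step(afterstate, outcome)[0]
            for outcome, p in enumerate(CHANCE_PROBS)
        )
        q_values.append(r_decision + expected)
    return children_rewards, q_values


children_rewards, q_values = expand_root(state=0)
print(children_rewards)  # [0.0, 0.0, 0.0]
print(q_values)          # [101.0, 1.0, 1.0]
```

In this toy model the rewards on the root's child edges are all zero even though every chance transition is rewarded; the reward only shows up in the values backed up from one level deeper.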
I'm having issues with mctx.stochastic_muzero_policy. Here's an example:
The first issue is that the children_rewards are all 0, despite the fact that chance_recurrent_fn always yields a positive reward.
The second issue is that the final weight of the zeroth action (which receives an additional reward of 100) is not higher than the rest, despite a large number of simulations.
Any idea what might be causing these issues?