Need testing? #9
That would be much appreciated! Unfortunately I'm still waiting for the latest results from Danijar (fixing the bug in the RNN would improve upon the original results), and in order to check that this code is fine we'd want to compare results so that means DeepMind Control Suite. If you're interested in getting baseline results for your own work then perhaps it would be interesting to have some results on |
Hi @Kaixhin, I'm running headless, and I'm able to run symbolic mode fine, but I can't run in non-symbolic mode. I have
Are you also running headless? |
I am not running headless, as I'm using |
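For readers hitting the same headless-rendering problem: a minimal sketch of one common workaround, assuming dm_control is doing the rendering (this is general dm_control behaviour, not something specific to this repo):

```python
import os

# dm_control picks its OpenGL backend from the MUJOCO_GL environment
# variable, so selecting a display-free backend BEFORE the first
# dm_control import can avoid needing an X server on a headless machine.
os.environ["MUJOCO_GL"] = "egl"  # GPU machines; use "osmesa" for CPU-only

# from dm_control import suite   # import only after MUJOCO_GL is set
```

Whether "egl" or "osmesa" works depends on the drivers installed on the machine.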
I have a MuJoCo license now! Is there a sweep of MuJoCo environment tests you would like run? |
Great news! I've started a run on |
Will do. Other than the environment, should I use the default parser arguments of your latest commit? |
Latest commit should be same settings as PlaNet camera ready so yes - only other change is the recommended action repeat per environment, which you can see in |
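As a sketch of what the per-environment action repeat does (class and method names here are illustrative, not this repo's actual code): the chosen action is applied `repeat` times per agent step and the intermediate rewards are summed, which shortens the effective planning horizon.

```python
# Minimal action-repeat wrapper sketch (illustrative names, assuming an
# env with reset() -> obs and step(action) -> (obs, reward, done)).
class ActionRepeatEnv:
    def __init__(self, env, repeat):
        self.env = env
        self.repeat = repeat

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward, done = 0.0, False
        for _ in range(self.repeat):
            obs, reward, done = self.env.step(action)
            total_reward += reward  # accumulate reward over repeated steps
            if done:
                break
        return obs, total_reward, done
```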
Thanks, I started a run on cartpole-swingup, finger-spin, cheetah-run, and ball_in_cup-catch. |
Hi @Kaixhin, things are looking good! I will continue to train for another day and then update, but it seems that scores are meeting or approaching Danijar's. Thanks again for making this port. |
Awesome! If possible, do you mind finding a way to send me all the data once done (checkpoints and results)? Perhaps via a file sharing service, and I'll let you know once I've downloaded it all because it would take a lot of space. I'll also take the rewards and final model and make them available as a release. I've got results for |
Awesome! Do let me know if you do something else with PlaNet, but I think the results are good. I'll close this once you get me all the data. |
@SG2 if you still have some capacity would you be able to run the same environments again with the latest commit? Among various improvements I've made changes to the image processing to match what was actually done in the original (I missed some of this originally). Should only take half the time now since I've set the default number of episodes to 1000 like in the camera ready. |
Hi @Kaixhin, I'm on it! |
Would you also be able to get results for |
Here are my test results for commit ee9b996. |
Thanks a lot! Uploaded all figures for release v1.0 and v1.1. Unfortunately |
No problem, thanks again for the port! Yes, I'll work on getting all the results from v1.0 and v1.1 to you. (I also trained all six agents on v1.0, before doing v1.1.) |
Out of curiosity: are these results all comparable to the original PlaNet implementation? Can you explain why the cup-catch performance collapses during the middle of training and then recovers? |
Apart from the high variance in cup-catch, which makes it hard to tell without more seeds if it's the same or a bit worse, results with tag 1.0 seem to be comparable. 1.1, which adds the 5-bit quantisation, noise and observation normalisation/centering, and is hence closer to the original, unfortunately seems to be worse on walker-walk and cup-catch. Have now noted this with the releases. Not sure about cup-catch collapse, but one thing in terms of the task is that it either gets the ball in and gets reward, or not, so the score can vary a lot based on success on this precise task. |
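For context, the bit-reduction, centering, and noise step mentioned above can be sketched like this (a sketch of the general technique; the repo's exact function may differ):

```python
import numpy as np

def preprocess(images, bit_depth=5):
    """Quantise uint8 images down to `bit_depth` bits, centre them to
    roughly [-0.5, 0.5), and add uniform dequantisation noise."""
    images = np.asarray(images, dtype=np.float32)
    images = np.floor(images / 2 ** (8 - bit_depth))  # drop the low bits
    images = images / 2 ** bit_depth - 0.5            # centre around zero
    # Uniform noise fills the gap left by quantisation.
    images = images + np.random.uniform(0.0, 1.0 / 2 ** bit_depth,
                                        size=images.shape)
    return images
```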
@SG2 I'm very glad to know that you have successfully run the entire code. However, when I run in non-symbolic mode, it always collapses at around 300 or 750 episodes (the whole run is 1000 episodes). I don't know the reason. Have you met this problem? Thank you very much.
@longfeizhang617 I'm sorry to hear that. What environment are you training? Also did you continue to let it run? You can see in my last result plots that it collapses for cup catch and then recovers. I never saw it collapse and then stay collapsed forever. |
@SG2 Thanks for your attention. The default training environment is Pendulum-v0. I suspect that the planner caused a memory overflow and thus the error, but I'm not sure. I have decreased experience_size to 100000 (the original is 1000000), and changed batch-size/chunk-size/overshooting-distance from 50 to 30. However, the issue remains. Is my experimental setup limiting the result? My setup has 16 GB of memory and a GeForce GTX 1080 Ti GPU.
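A back-of-envelope check supports the memory suspicion (the frame size here is an assumed 64x64x3 uint8 observation, just for illustration):

```python
# Rough replay-buffer footprint for image observations alone,
# assuming 64x64x3 uint8 frames (an illustrative assumption).
obs_bytes = 64 * 64 * 3  # 12,288 bytes per frame
for size in (1_000_000, 100_000):
    gb = size * obs_bytes / 1e9
    print(f"{size:>9} transitions ~ {gb:.1f} GB of observations")
```

So a full 1,000,000-transition buffer of raw frames is already on the order of 12 GB before counting actions, rewards, model parameters, or any float conversions, which is why shrinking experience_size helps on a 16 GB machine.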
@longfeizhang617 I added support for Gym environments so people can try PlaNet without needing MuJoCo. However, the original paper only includes experiments on the DeepMind Control Suite, so you would have to tune hyperparameters for any other environment. I'll make a note in the README.
@Kaixhin Thank you sincerely. I really don't have a MuJoCo license, so I've just been trying PlaNet in Gym environments. You've made me realise it may have collapsed because of mismatched hyperparameters, so I will try to tune them. These days I've also been talking with others; at first I suspected something was wrong in the code's training iteration, but @SG2 has run it successfully, so I have to keep trying. It's a real headache.
@Kaixhin @xsamgreen Do you have approximate training-time stats for a single/multi-GPU setup on any of the symbolic environments? It'd be really helpful to have training stats for a few envs (symbolic or otherwise) in the README. By the way, thank you for the code and experiments, they're really helpful!
Hi, thank you for making this port :)
In your conversation with Danijar, I read that you're limited in your ability to test because of GPU availability. I'm interested in building on your code, and I'd be happy to help run tests for you. I have four Titan Xps I could dedicate to it for a bit. My limitation is that I don't have a MuJoCo license (I'm working on it), so testing would be limited to Gym environments.