
Need testing? #9

Closed · 0xsamgreen opened this issue May 11, 2019 · 26 comments
@0xsamgreen

Hi, thank you for making this port :)

In your conversation with Danijar, I read that you're limited in your ability to test because of GPU availability. I'm interested in building on your code, and I'd be happy to help run tests for you. I have four Titan Xps I could dedicate to it for a bit. My limitation is that I don't have a MuJoCo license (I'm working on it), so testing would be limited to Gym environments.

@Kaixhin
Owner

Kaixhin commented May 11, 2019

That would be much appreciated! Unfortunately I'm still waiting for the latest results from Danijar (fixing the bug in the RNN would improve upon the original results), and in order to check that this code is fine we'd want to compare results, which means the DeepMind Control Suite.

If you're interested in getting baseline results for your own work, then perhaps it would be worth having some results on Pendulum-v0 and MountainCarContinuous-v0 (both symbolic and visual observations) anyway? We currently have no idea what the appropriate hyperparameters are, and as someone who also doesn't have easy access to a MuJoCo license, I know it'd be nice to have some completely open-source reference results.

@0xsamgreen
Author

Hi @Kaixhin,

I'm running headless. I'm able to run symbolic mode fine, but I can't run in non-symbolic mode. I have render set to False, but I get the following error:

                          Options
                          seed: 1
                          disable_cuda: False
                          env: Pendulum-v0
                          symbolic_env: False
                          max_episode_length: 1000
                          experience_size: 1000000
                          activation_function: relu
                          embedding_size: 1024
                          hidden_size: 200
                          belief_size: 200
                          state_size: 30
                          action_repeat: 2
                          action_noise: 0.3
                          episodes: 2000
                          seed_episodes: 5
                          collect_interval: 100
                          batch_size: 50
                          chunk_size: 50
                          overshooting_distance: 50
                          overshooting_kl_beta: 1
                          overshooting_reward_scale: 1
                          global_kl_beta: 0.1
                          free_nats: 2
                          learning_rate: 0.001
                          grad_clip_norm: 1000
                          planning_horizon: 12
                          optimisation_iters: 10
                          candidates: 1000
                          top_candidates: 100
                          test_interval: 25
                          test_episodes: 10
                          checkpoint_interval: 25
                          checkpoint_experience: False
                          load_experience: False
                          load_checkpoint: 0
                          render: False
Traceback (most recent call last):
  File "main.py", line 85, in <module>
    observation, done, t = env.reset(), False, 0
  File "/home/sgreen/working/planet.pt/env.py", line 87, in reset
    return torch.tensor(cv2.resize(self._env.render(mode='rgb_array'), (64, 64), interpolation=cv2.INTER_LINEAR).transpose(2, 0, 1), dtype=torch.float32).div_(255).unsqueeze(dim=0)
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/gym/core.py", line 249, in render
    return self.env.render(mode, **kwargs)
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/gym/envs/classic_control/pendulum.py", line 61, in render
    from gym.envs.classic_control import rendering
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/gym/envs/classic_control/rendering.py", line 27, in <module>
    from pyglet.gl import *
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/gl/__init__.py", line 239, in <module>
    import pyglet.window
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/window/__init__.py", line 1896, in <module>
    gl._create_shadow_window()
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/gl/__init__.py", line 208, in _create_shadow_window
    _shadow_window = Window(width=1, height=1, visible=False)
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/window/xlib/__init__.py", line 166, in __init__
    super(XlibWindow, self).__init__(*args, **kwargs)
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/window/__init__.py", line 501, in __init__
    display = get_platform().get_default_display()
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/window/__init__.py", line 1845, in get_default_display
    return pyglet.canvas.get_display()
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/canvas/__init__.py", line 82, in get_display
    return Display()
  File "/ascldap/users/sgreen/anaconda3/envs/planet.pt/lib/python3.7/site-packages/pyglet/canvas/xlib.py", line 86, in __init__
    raise NoSuchDisplayException('Cannot connect to "%s"' % name)
pyglet.canvas.xlib.NoSuchDisplayException: Cannot connect to "None"

Are you also running headless?

@Kaixhin
Owner

Kaixhin commented May 13, 2019

I am not running headless, as I'm using gym's render functionality to get image-based observations. I'm afraid you'll need to run with either a real or fake display.
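
For anyone hitting the same pyglet error: one workaround on a headless machine is a virtual framebuffer, either by launching the script under xvfb-run or by starting a display from Python. A minimal sketch using the pyvirtualdisplay package (an extra dependency, not part of this repo):

```python
# Minimal sketch: start a fake X display before gym/pyglet try to connect to one.
# Assumes `pip install pyvirtualdisplay` plus the system package `xvfb`.
from pyvirtualdisplay import Display

virtual_display = Display(visible=0, size=(1024, 768))
virtual_display.start()  # exports DISPLAY so pyglet can create its shadow window

# ... now constructing the env and calling render(mode='rgb_array') should work
```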

@0xsamgreen
Author

I have a MuJoCo license now! Is there a sweep of MuJoCo environment tests you would like run?

@Kaixhin
Owner

Kaixhin commented May 17, 2019

Great news! I've started a run on walker-walk, so feel free to take any of the others.

@0xsamgreen
Author

Will do. Other than the environment, should I use the default parser arguments of your latest commit?

@Kaixhin
Owner

Kaixhin commented May 17, 2019

The latest commit should have the same settings as the PlaNet camera-ready, so yes. The only other change is the recommended action repeat per environment, which you can see in env.py.
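
For reference, a sketch of those per-environment action repeats, using the values reported in the PlaNet paper (the dict name below is illustrative; env.py in this repo is the authoritative source):

```python
# Action repeats per DeepMind Control Suite domain, as reported in the PlaNet paper.
# Illustrative only -- check env.py for the repo's actual definition.
ACTION_REPEATS = {
    'cartpole': 8,
    'reacher': 4,
    'finger': 2,
    'cheetah': 4,
    'ball_in_cup': 6,
    'walker': 2,
}
```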

@0xsamgreen
Author

Thanks, I've started runs on cartpole-swingup, finger-spin, cheetah-run, and ball_in_cup-catch.

@0xsamgreen
Author

Hi @Kaixhin, things are looking good! I will continue to train for another day and then update, but it seems that scores are meeting or approaching Danijar's. Thanks again for making this port.

[test reward plots: cartpole-swingup, cheetah-run, cup-catch, finger-spin]

@Kaixhin
Owner

Kaixhin commented May 18, 2019

Awesome! If possible, do you mind finding a way to send me all the data once done (checkpoints and results)? Perhaps via a file-sharing service; I'll let you know once I've downloaded it all, because it will take up a lot of space. I'll also take the rewards and final models and make them available as a release. I've got results for walker-walk, and have just kicked off a run for reacher-easy.

@0xsamgreen
Author

Here are the final test result plots! I'm looking into sharing the checkpoints and result logs.

[final test reward plots: cartpole-swingup, cheetah-run, cup-catch, finger-spin]

@Kaixhin
Owner

Kaixhin commented May 20, 2019

Awesome! Do let me know if you do something else with PlaNet, but I think the results are good. I'll close this once you get me all the data.

@Kaixhin
Owner

Kaixhin commented May 29, 2019

@SG2 if you still have some capacity, would you be able to run the same environments again with the latest commit? Among various improvements, I've made changes to the image processing to match what was actually done in the original (I missed some of this initially). It should only take half the time now, since I've set the default number of episodes to 1000, as in the camera-ready.

@0xsamgreen
Author

Hi @Kaixhin, I'm on it!

@Kaixhin
Owner

Kaixhin commented May 30, 2019

Would you also be able to get results for walker-walk and reacher-easy? I've got just about enough space on Google Drive to get the results for all 6 tasks, so just email me and I'll share a folder with you that you can put everything into.

@0xsamgreen
Author

Here are my test results for commit ee9b996.

[test reward plots for commit ee9b996: cartpole-swingup, cartpole-balance, cheetah-run, cup-catch, finger-spin, reacher-easy, walker-walk]

@Kaixhin
Owner

Kaixhin commented Jun 5, 2019

Thanks a lot! I've uploaded all figures for releases v1.0 and v1.1. Unfortunately walker-walk doesn't look that good either. I've added notes on the discrepancies to v1.0 - it would be good if you could pass on the data from both sets of experiments; I'll upload the final trained models for both.

@0xsamgreen
Author

No problem, thanks again for the port! Yes, I'll work on getting all the results from v1.0 and v1.1 to you. (I also trained all six agents on v1.0, before doing v1.1.)

@maximecb
Contributor

maximecb commented Jun 6, 2019

Out of curiosity: are these results all comparable to the original PlaNet implementation? Can you explain why the cup-catch performance collapses during the middle of training and then recovers?

@Kaixhin
Owner

Kaixhin commented Jun 6, 2019

Apart from the high variance in cup-catch, which makes it hard to tell without more seeds whether it's the same or a bit worse, results with tag 1.0 seem to be comparable. 1.1, which adds the 5-bit quantisation, noise and observation normalisation/centering, and is hence closer to the original, unfortunately seems to be worse on walker-walk and cup-catch. I've now noted this with the releases. I'm not sure about the cup-catch collapse, but one thing about the task is that the agent either gets the ball in the cup and receives reward, or it doesn't, so the score can vary a lot depending on success at this precise task.
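
For anyone curious, a rough sketch of that preprocessing as described in the PlaNet paper: reduce each 8-bit image to 5 bits, rescale and centre it around zero, and add uniform dequantisation noise. The function name is illustrative, not the repo's actual API:

```python
import torch

def preprocess_observation(images, bit_depth=5):
    # Quantise [0, 255] images down to 2 ** bit_depth levels, rescale to
    # [-0.5, 0.5), then add uniform noise to dequantise, per the PlaNet paper.
    images = images.div(2 ** (8 - bit_depth)).floor()  # drop the low bits
    images = images.div(2 ** bit_depth).sub(0.5)       # rescale and centre
    images = images.add(torch.rand_like(images).div(2 ** bit_depth))  # dequantise
    return images
```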

@longfeizhang617

@SG2 I'm very glad to know that you have successfully run the entire code. However, when I run in non-symbolic mode, training always collapses at around episode 300 or 750 (out of 1000 episodes in total). I don't know the reason. Have you run into this problem? Thank you very much.

@0xsamgreen
Author

@longfeizhang617 I'm sorry to hear that. What environment are you training on? Also, did you let it continue running? You can see in my last result plots that it collapses for cup-catch and then recovers; I never saw it collapse and then stay collapsed forever.

@longfeizhang617

@SG2 Thanks for your attention. The default training environment is Pendulum-v0. I suspect that the planner caused a memory overflow and hence the error, but I'm not sure. I have decreased experience_size to 100000 (the original is 1000000) and changed batch_size/chunk_size/overshooting_distance from 50 to 30, but the issue remains. Could my hardware be the limiting factor? My machine has 16 GB of memory and a GeForce GTX 1080 Ti GPU.
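
A quick back-of-the-envelope check on that memory hypothesis, using the default sizes from the options dump above (whether the buffer stores observations as uint8 or float32 depends on the implementation, so both are shown):

```python
# Rough footprint of the image replay buffer with the default settings:
# 1,000,000 transitions, each holding a 3 x 64 x 64 observation.
experience_size = 1_000_000
obs_elements = 3 * 64 * 64                       # 12,288 values per frame
uint8_gb = experience_size * obs_elements / 1e9  # ~12.3 GB stored as uint8
float32_gb = uint8_gb * 4                        # ~49 GB stored as float32
print(f"uint8: ~{uint8_gb:.1f} GB, float32: ~{float32_gb:.1f} GB")
```

Either way that is close to, or well beyond, 16 GB of RAM, so reducing experience_size is a sensible mitigation.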

@Kaixhin
Owner

Kaixhin commented Jun 18, 2019

@longfeizhang617 I added support for Gym environments so that people can try PlaNet without needing MuJoCo. However, the original paper only includes experiments on the DeepMind Control Suite, so you would have to tune hyperparameters for any other environment. I'll make a note in the README.

@longfeizhang617

@Kaixhin Thank you sincerely. I really haven't got a MuJoCo license, so I'm just trying PlaNet in Gym environments. You've inspired me: perhaps it collapsed because of mismatched hyperparameters, so I will try tuning them. These days I've also been talking with others; at first I suspected something was wrong in the code's iteration, but @SG2 has run it successfully, so I have to keep trying. It's really puzzling.

@Kaixhin Kaixhin closed this as completed Jul 3, 2019
@vballoli

@Kaixhin @0xsamgreen Do you have approximate training time stats for a single/multi-GPU setup on any of the symbolic environments? It'd be really helpful to have training stats for a few environments (symbolic or otherwise) in the README. By the way, thank you for the code and experiments, they're really helpful!
