
PPO for OpenAI Gym Cartpole

Link

WandB: https://wandb.ai/arth-shukla/PPO%20Gym%20Cart%20Pole

Papers Used

Proximal Policy Optimization Algorithms: https://arxiv.org/pdf/1707.06347.pdf

Technologies Used

Algorithms/Concepts: PPO, Experience Replay

AI Development: PyTorch (Torch, CUDA), OpenAI Gym, WandB

Evaluation and Inference

More episode videos available on WandB: https://wandb.ai/arth-shukla/PPO%20Gym%20Cart%20Pole

The PPO model currently supports only discrete action spaces (via a categorical distribution). In OpenAI Gym CartPole, the agent is able to effectively "beat" the environment by episode 136:

Episode 136 Video
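
For reference, below is a minimal sketch of what a categorical (discrete-action) PPO policy and the clipped surrogate loss look like in PyTorch. The class and argument names (`PolicyNet`, `ppo_clip_loss`, `clip_eps`) are illustrative assumptions, not taken from this repo's code:

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

class PolicyNet(nn.Module):
    """Hypothetical policy network with a categorical head over discrete actions."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> Categorical:
        # Logits parameterize a categorical distribution over discrete actions.
        return Categorical(logits=self.net(obs))

def ppo_clip_loss(policy, obs, actions, old_log_probs, advantages, clip_eps=0.2):
    # Probability ratio between the current policy and the behavior (old) policy.
    log_probs = policy(obs).log_prob(actions)
    ratio = torch.exp(log_probs - old_log_probs)
    # Clipped surrogate objective from Schulman et al. (2017), negated as a loss.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```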

Future Experiments

First, I want to implement algorithms that came before PPO (DQNs or earlier actor-critic algorithms like DDPG, etc.) to get a stronger understanding of the math. This will also give me a chance to make agents for popular environments like Mario.

I also want to tackle more challenging environments, like the DM Control Suite. To do this, I'll explore PPO for continuous action spaces (through normal distributions), other similarly effective algorithms like SAC, and models like RecurrentPPO, which offer some implementation challenges.
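
As a rough illustration of that continuous-action extension (an assumption about future work, not code from this repo), the categorical head would typically be replaced with a diagonal Gaussian policy:

```python
import torch
import torch.nn as nn
from torch.distributions import Normal

class GaussianPolicy(nn.Module):
    """Hypothetical diagonal-Gaussian policy head for continuous action spaces."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.mean_net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, act_dim),
        )
        # State-independent log standard deviation, a common choice in PPO implementations.
        self.log_std = nn.Parameter(torch.zeros(act_dim))

    def forward(self, obs: torch.Tensor) -> Normal:
        # Mean comes from the network; std is shared across states.
        return Normal(self.mean_net(obs), self.log_std.exp())
```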

Finally, there are some other options for experience replay I'd like to implement, like Prioritized ER.

About Me

Arth Shukla Site | GitHub | LinkedIn
