Modify envs to be compatible with twrl #8

Closed · wants to merge 32 commits into from

Conversation

@SeanNaren (Collaborator)

Just opening this PR to track progress based on this PR. I need to test each environment with twrl to make sure they behave correctly; if you spot any issues or have any feedback, let me know :)

It would also be nice to set the rewards similar to gym, for comparison's sake!

@Kaixhin (Owner) commented Oct 1, 2016

Looks good! Might as well copy as much of the API as we can. Could you rename actionSpec to actionSpace? I think that returning it from a function call is a bit cleaner than having it as a property, but we could change that if you feel a property is better. actionTaken from env:step(action) should definitely be replaced with the optional info, and we can leave it up to people how they want to add information to the info table. A render would be nice, but is beyond the scope of this PR.
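
For illustration, roughly what the call sites would look like (a sketch only, nothing binding):

-- sketch of the proposed gym-style calls (illustrative only; env and action assumed)
local reward, state, terminal, info = env:step(action)
-- info is an optional table; envs decide what goes in it,
-- e.g. an actionTaken field could live here instead of a dedicated return value
local actionSpace = env:actionSpace() -- renamed from actionSpec, returned from a function call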

Otherwise agreed with making the reward structure the same as gym, provided the environment dynamics are the same. Anyone who was using rlenvs seriously for benchmarking can simply use v1.

@SeanNaren (Collaborator, Author)

Could you point me to where actionTaken is? I can't seem to find it. I've updated the function names; hopefully that's what you had in mind!

I'll get to work on the reward schemes for things that cross over to gym!

@Kaixhin (Owner) commented Oct 2, 2016

Ah, sorry: that is only mentioned in the docs but isn't implemented in any of the envs provided; the context can be found in Kaixhin/Atari#56. I've notified the author of the PR that we're moving towards a more gym-compatible API, and asked them to put that on hold for now.

Sorry, but do you mind changing stateSpace to getStateSpace, etc.? Since it's implemented as a function in rlenvs, I think the getter/setter nomenclature makes it a little clearer.
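
i.e. roughly (sketch of the renames):

-- getter nomenclature for the interface methods
Env:mustHave('getStateSpace')  -- was stateSpace
Env:mustHave('getActionSpace') -- was actionSpace
Env:mustHave('getRewardSpace') -- was rewardSpace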

@SeanNaren (Collaborator, Author) commented Oct 19, 2016

I tried to change rewards for the CartPole environment to match the reward scheme for gym but ran into issues.

In gym there is +1 for every step the pole is up, and 0 once the pole is down. Just as a sanity check, should this then be reward = 1 and this be reward = 0? I'm getting strange behaviour when replacing this environment with the gym one in torch-twrl!
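
In other words, the scheme I'm aiming for (rough sketch; terminal here is the env's own termination flag):

-- gym-style CartPole reward, as described above
local reward = terminal and 0 or 1 -- +1 for every step the pole stays up, 0 once it has fallen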

Once I've resolved this, I want to add an additional function to the init that lists all available environments, if that sounds good!

EDIT: I think I found the reason for the inconsistencies: gym has a max number of steps for each of its environments (episodes terminate after that many steps to prevent them going on forever). Should we implement this in rlenvs?

@Kaixhin (Owner) commented Oct 20, 2016

Yep, the sanity check looks correct to me; let's go ahead with changing the reward to match.

I see you've already implemented the list - looks good. Remember to document in the README as well.

Is the max steps setting for all environments (including Atari)? If it's only there for the simple envs then it would be a good thing to add it into opts for these environments, with the gym defaults (if any).

FYI I'm going to add the environment from Towards Deep Symbolic Reinforcement Learning into the v1 branch in about 3 weeks. If you could merge and update it into twrl that would be appreciated :)

@SeanNaren (Collaborator, Author)

Sounds good, I'll work towards that. As for the max steps thing, here is how it's done for a few environments in openai-gym. How about I expose the step method in the Env class and reference a maxSteps variable that is set per environment, using it to return termination from the Env class? Something like this:

local classic = require 'classic'

local Env = classic.class('Env')

-- Denote interfaces
Env:mustHave('start')
Env:mustHave('act') -- change name here to prevent overload
Env:mustHave('getStateSpace')
Env:mustHave('getActionSpace')
Env:mustHave('getRewardSpace')

function Env:step(action)
    local reward, state, terminal = self:act(action)
    self.currentStep = self.currentStep == nil and 0 or self.currentStep
    if self.maxSteps and self.currentStep % self.maxSteps == 0 then -- env defines maxSteps
        terminal = true
        self.currentStep = 0
    end
    self.currentStep = self.currentStep + 1
    return reward, state, terminal -- pass the env's results through
end

return Env

@Kaixhin (Owner) commented Oct 20, 2016

Overall the solution is good, but a few suggestions. I'd go with Env:mustHave('_step') to make it pretty clear that the internal method is hidden, and to allow people to use act for whatever reason. Secondly, if you haven't already, it's worth checking exactly how gym defines the max number of timesteps: does it include the state from starting the environment? Finally, I'm not sure why you went for self.currentStep % self.maxSteps == 0 instead of just self.currentStep == self.maxSteps?
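
Roughly, with both suggestions applied (sketch only, untested):

Env:mustHave('_step') -- the leading underscore marks the internal, hidden method

function Env:step(action)
  local reward, state, terminal = self:_step(action)
  self.currentStep = (self.currentStep or 0) + 1
  if self.maxSteps and self.currentStep == self.maxSteps then -- plain equality check
    terminal = true
    self.currentStep = 0
  end
  return reward, state, terminal
end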

@SeanNaren (Collaborator, Author)

Not entirely sure tbh, had a brain fart :P I've made the changes and will now go through and add default max steps as done in openai-gym (I've set the default to 1000, matching openai-gym, but if that's an issue lemme know!).

@Kaixhin (Owner) commented Oct 21, 2016

I think the cleanest way to do this is to have Env.lua actually implement _init with self.maxSteps = opts.maxSteps or 1000, and have every env first call super._init(self, opts). This is where it should go, and it'll stop recalculating it on every step. Also, if I'm correct, this is going to run into problems unless you reset self.currentStep in start as well, which probably means handling the logic in _init, start and step using super calls.

After super._init(self, opts) I think it's worth matching the maxSteps to gym if it actually differs from 1000 according to this.
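
So the layout would be along these lines (rough sketch, untested; CartPole stands in for any concrete env, the require path is assumed, and super is the second return value of classic.class):

-- in Env.lua:
function Env:_init(opts)
  opts = opts or {}
  self.maxSteps = opts.maxSteps or 1000 -- computed once here, not on every step
end

function Env:start()
  self.currentStep = 0 -- reset the step counter at the start of every episode
end

-- in CartPole.lua (or any concrete env):
local classic = require 'classic'
local Env = require 'rlenvs.Env' -- assumed module path

local CartPole, super = classic.class('CartPole', Env)

function CartPole:_init(opts)
  super._init(self, opts) -- set the defaults first, then do env-specific setup
end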

@SeanNaren (Collaborator, Author) commented Oct 24, 2016

I've made a few changes: added an _init function to the Env.lua class and added the super._init call to all the envs. I've also added default max steps and some logic in the Env.lua class to handle the different step cases; lemme know of any feedback!

Also for rendering, shall we make it possible to render via the Env.lua class by passing in a render boolean in the opts? I can get to work on that if you think it would be good!

Env:mustHave('_step')
Env:mustHave('getStateSpace')
Env:mustHave('getActionSpace')
Env:mustHave('getRewardSpace')

function Env:_init(opts)
  if opts.timeStepLimit and opts.maxSteps then
    self.maxSteps = math.min(opts.timeStepLimit, opts.maxSteps)
@Kaixhin (Owner)

Is this how gym does things? To me it makes more sense that if the user specifies maxSteps, then this should override any defaults. And then if timeStepLimit is available, that should be used. Then 1000. Makes the logic a little bit simpler too.
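
i.e. something like (sketch):

-- user-specified maxSteps wins, then the env's timeStepLimit, then 1000
self.maxSteps = opts.maxSteps or opts.timeStepLimit or 1000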

@SeanNaren (Collaborator, Author)

I'll double-check, but I do think this is correct (i.e. the same as openai-gym). However, since this is mainly done for leaderboard purposes (as Kory said), I think we could make an exception here. What do you think?

@Kaixhin (Owner) commented Oct 24, 2016

@korymath Rendering would be nice - any suggestions for how best to integrate it with gym/twrl?

@korymath (Collaborator)

A boolean in the opts would be a good means by which to pass the option, but is perhaps not the full solution.

My inclination is that this would allow for a strictly Lua RL infrastructure, free of any connection to gym, and thus the rendering could be completely separate from gym's rendering. Having not done much graphics work in Lua, I am not sure how best to make this happen.

One potential approach would be to save the transitions, then format them in such a way that they can be loaded and replayed in the gym infrastructure (that is, if the environments are exactly the same as those in gym; but that may be limiting, and it requires offline processing between learning and display).

I would argue that, while rendering is nice, it is not critical to learning (unless there is a learning signal derived from pixels, or a human observer of the visualization, or maybe a third case?). Thus, even a basic render of the learning performance could be generated every N steps, much like how the OpenAI gym allows a video to be rendered.

Nice work on the max steps discussion; that is an interesting quirk of the OpenAI gym code, which I think is based on the online leaderboard and the desire for shorter episode lengths for direct comparison.

@SeanNaren (Collaborator, Author) commented Oct 24, 2016

I added a :render() method which checks that a :getDisplay() method exists to retrieve a displayable screen, and renders it using qt (check the Env.lua class). I've also updated the experiment.lua class; check it out!

@korymath (Collaborator)

This looks great!

Thoughts on how to handle execution on a server? Maybe similar to gym, and allow creation of a 'fake' display for output?

@SeanNaren (Collaborator, Author)

Thanks @korymath! That's an interesting idea; maybe similar to how the gym API is currently interfaced with? If this is deemed necessary I could start thinking about how to implement it (though probably out of the scope of this PR).

@korymath (Collaborator)

Sounds great!

Agree it is probably another large feature build.

function Env:render()
  if self.qt and self.getDisplay then
    self.window = self.window == nil and image.display({ image = self:getDisplay(), zoom = 10 }) or self.window
@Kaixhin (Owner)

zoom should be part of opts and handled by Env - 10 was hard-coded for Catch, but the default of 1 is appropriate for Atari.

@SeanNaren (Collaborator, Author)

Shall I set the default to 10 for Catch or leave this exposed for the user as well?

@Kaixhin (Owner) left a comment

No need to set a default zoom for Catch - just have it set as part of the opts in experiment.lua.
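
Something like this, roughly (sketch, untested; assumes the image package is already loaded under qlua):

function Env:_init(opts)
  opts = opts or {}
  self.zoom = opts.zoom or 1 -- 1 suits Atari; experiment.lua can pass zoom = 10 for Catch
end

function Env:render()
  if self.qt and self.getDisplay then
    -- reuse the window if it exists, otherwise create it with the configured zoom
    self.window = image.display({image = self:getDisplay(), zoom = self.zoom, win = self.window})
  end
end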

@SeanNaren (Collaborator, Author) commented Nov 9, 2016

Seems like the merge of the branch replayed the commits on top... Once I've finished with XOWorld I think I'll rebase the branch and open a new PR; is that alright?

Also, there is a small issue here: a comma is missing at the end :)

EDIT: XOWorld is an awesome environment!! It should be fully implemented now, which should wrap everything up. Now just thinking about unit tests to try to make sure everything works correctly!

@Kaixhin (Owner) commented Nov 9, 2016

Thanks for spotting that - now fixed on v1 and merged into master.

Yes, a rebase would clean this up; good call suggesting it :) There shouldn't be any more changes due now.

@SeanNaren (Collaborator, Author)

Thanks! In terms of unit tests, any suggestions on what functionality to verify and check?

@Kaixhin (Owner) commented Nov 9, 2016

Good question. mustHave already helps a little bit. Seeing what I just did, maybe static syntax checking would be useful? Going further, argcheck might be worth adding in.

I suppose you could test an environment by loading it, starting it, and doing at least one step. I'm not sure if anything needs to be done if someone tries to step before start. Checking that maxSteps works really depends on the environment (e.g. Cartpole will fail quickly with a random agent), unless you manually set it to a very low number.
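
A minimal version of such a test could look like this (sketch, untested; the module layout and the choice of action are assumptions):

-- per-environment smoke test: load, start, take one step, check the return types
local function smokeTest(name)
  local EnvClass = require('rlenvs.' .. name) -- assumed module layout
  local env = EnvClass({maxSteps = 10}) -- very low cap so every env terminates quickly
  local state = env:start()
  assert(state ~= nil, name .. ': start() returned nil')
  local reward, nextState, terminal = env:step(1) -- assumes 1 is a valid action
  assert(type(reward) == 'number', name .. ': reward is not a number')
  assert(type(terminal) == 'boolean', name .. ': terminal is not a boolean')
end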

I started thinking about testing Atari, but there are licensing issues with ROMs. I thought there might be a homebrew ROM that we could include in the repo, but then it'd need to be coded into ALE...

@korymath (Collaborator)

Feel free to use torch-twrl as a testing base if you want. It has some basic testing functionality built up here: https://github.com/twitter/torch-twrl/blob/master/test/test-gym.lua

It runs a few tests for different environments. The runTest code is from the gym-http-api; you can see the full extent of it here: https://github.com/openai/gym-http-api/blob/master/binding-lua/test_api.lua It is a pretty verbose test that runs through the RL environment functionality.

It is quite easy to find the ROMs if you are particularly motivated.

@SeanNaren (Collaborator, Author) commented Nov 14, 2016

Thanks for that, I can definitely try to imitate those tests for all the rlenvs environments!

EDIT: Also saw Malmo added (which is awesomesauce!). I'll pull this in and get it implemented as well.
EDIT2: I will wait!

@Kaixhin (Owner) commented Nov 14, 2016

@SeanNaren I'm trying to get Malmo working myself, so bear with me a while. New plan is to target v1, then merge into master (they're now the same, even though the v2 rockspec shouldn't be in v1, but oh well). I'll aim to get it done by the end of today, but will report back once I've got it working successfully.

@Kaixhin (Owner) commented Nov 14, 2016

OK, that was a bit of a mess, but I seem to have knocked down the requirements nicely and condensed the code. I had a quick look but couldn't get qlua working; the screen does, however, display in the running client. I also couldn't get it working with Atari because of some issue with luaposix, which is a shame, but I won't have time to investigate further.

@SeanNaren (Collaborator, Author)

Right, I think that's Minecraft handled (I still need to test that it works; will do once I'm home on my own machine), and now just the tests.

Currently I just run through every environment, but I need to add some assertions here; however, I'm not sure what I could assert on. In the torch-twrl tests we assert on a successful run-through (returning true as long as everything runs correctly).

@korymath (Collaborator)

Great work!

You can do simple assertions on the types of the values the environment returns, or assert that they fall within an expected range.

For example, runTest could return an error code via a pcall; if the return is clean, then assert on that.
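
For instance (sketch, untested; assumes runTest returns a numeric total reward, as a stand-in for whatever it actually returns):

-- wrap the run-through in pcall and assert on the outcome
local ok, totalReward = pcall(runTest, env)
assert(ok, 'run-through failed: ' .. tostring(totalReward))
assert(type(totalReward) == 'number', 'expected a numeric total reward')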

@SeanNaren (Collaborator, Author) commented Nov 25, 2016

Added assertions, check them out! I also rebased the branch into one nice commit, check here. I think the changes are done, so it would be great if you could have a quick look and let me know if anything needs changing!

What do we plan to merge into? I can then open the PR on the rebased branch.

@Kaixhin (Owner) commented Nov 25, 2016

Unfortunately I'll be away this weekend so can't look at it till next week, but the plan is to merge this into master and update the readme to show installing the v2 rockspec. It's a bit messy, but then we can look at the v1 branch and make sure that is fine (e.g. by deleting the v2 rockspec inside that branch, but hopefully that's it).

This way anyone who's still using the old API (e.g. the Atari repo) can still do so by requiring the v1 rockspec. :) No need to backport new environments unless we're requested to. twrl will of course use v2.

@SeanNaren (Collaborator, Author)

Sounds like a plan and no worries, whenever you get time!

@Kaixhin (Owner) commented Nov 28, 2016

Looks good! Firstly, can you merge the latest Minecraft stuff (both in the env and the readme)? Secondly, if you could use 2-space indentation for consistency, that'd be awesome. I'll have another look afterwards, and then we should be able to merge twrl-rebase into master.

@SeanNaren (Collaborator, Author)

Having issues merging master into the rebased branch. Can you see any commits from master not in the rebased branch?

@Kaixhin (Owner) commented Nov 28, 2016

Ah, sorry: the latest work should be in v1, which should be merged into twrl-rebase, which should then eventually be merged into master.

SeanNaren mentioned this pull request Nov 30, 2016
@SeanNaren (Collaborator, Author)

This is continued in #14

SeanNaren closed this Nov 30, 2016
SeanNaren deleted the twrl branch December 15, 2016 10:08