Modify envs to be compatible with twrl #8

Closed · wants to merge 32 commits into from

Conversation

@SeanNaren (Collaborator)

Just opening this PR to track progress based on this PR. I need to test each environment with twrl to make sure they behave correctly; if you spot any issues or have any feedback, let me know :)

It would also be nice to set the rewards similar to gym, for comparison's sake!

@Kaixhin (Owner) commented Oct 1, 2016

Looks good! Might as well copy as much of the API as we can. Could you rename actionSpec to actionSpace? I think that returning it from a function call is a bit cleaner than having it as a property, but we could change that if you feel a property is better. actionTaken from env:step(action) should definitely be replaced with the optional info, and we can leave it up to people how they want to add information to the info table. A render would be nice, but is beyond the scope of this PR.
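
For illustration, roughly what the call sites would look like (a sketch only, nothing binding):

-- sketch of the proposed gym-style calls (illustrative only; env and action assumed)
local reward, state, terminal, info = env:step(action)
-- info is an optional table; envs decide what goes in it,
-- e.g. an actionTaken field could live here instead of a dedicated return value
local actionSpace = env:actionSpace() -- renamed from actionSpec, returned from a function call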

Otherwise agreed with making the reward structure the same as gym, provided the environment dynamics are the same. Anyone who was using rlenvs seriously for benchmarking can simply use v1.

@SeanNaren (Collaborator, Author)

Could you point me to where actionTaken is? I can't seem to find it. I've updated the function names; hopefully that's what you had in mind!

I'll get to work on the reward schemes for things that cross over to gym!

@Kaixhin (Owner) commented Oct 2, 2016

Ah, sorry: that is only mentioned in the docs but isn't implemented in any of the envs provided; the context can be found in Kaixhin/Atari#56. I've notified the author of the PR that we're moving towards a more gym-compatible API, and asked them to put that on hold for now.

Sorry, but do you mind changing stateSpace to getStateSpace, etc.? Since it's implemented as a function in rlenvs, I think the getter/setter nomenclature makes it a little clearer.
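
i.e. roughly (sketch of the renames):

-- getter nomenclature for the interface methods
Env:mustHave('getStateSpace')  -- was stateSpace
Env:mustHave('getActionSpace') -- was actionSpace
Env:mustHave('getRewardSpace') -- was rewardSpace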

@SeanNaren (Collaborator, Author) commented Oct 19, 2016

I tried to change rewards for the CartPole environment to match the reward scheme for gym but ran into issues.

In gym there is +1 for every step the pole is up, and 0 once the pole is down. Just as a sanity check, should this then be reward = 1 and this be reward = 0? I'm getting strange behaviour when replacing this environment with the gym one in torch-twrl!
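
In other words, the scheme I'm aiming for (rough sketch; terminal here is the env's own termination flag):

-- gym-style CartPole reward, as described above
local reward = terminal and 0 or 1 -- +1 for every step the pole stays up, 0 once it has fallen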

Once I've resolved this, I want to add an additional function to the init that lists all available environments, if that sounds good!

EDIT: I think I found the reason for the inconsistencies: gym has a max number of steps for each of its environments (episodes terminate after that many steps to prevent them going on forever). Should we implement this in rlenvs?

@Kaixhin (Owner) commented Oct 20, 2016

Yep, the sanity check looks correct to me; let's go ahead with changing the reward to match.

I see you've already implemented the list - looks good. Remember to document in the README as well.

Is the max steps setting for all environments (including Atari)? If it's only there for the simple envs then it would be a good thing to add it into opts for these environments, with the gym defaults (if any).

FYI I'm going to add the environment from Towards Deep Symbolic Reinforcement Learning into the v1 branch in about 3 weeks. If you could merge and update it into twrl that would be appreciated :)

@SeanNaren (Collaborator, Author)

Sounds good, I'll work towards that. As for the max steps thing, here is how it's done for a few environments in openai-gym. How about I expose the step method in the Env class and reference a maxSteps variable that is set per environment, using it to return termination from the Env class? Something like this:

local classic = require 'classic'

local Env = classic.class('Env')

-- Denote interfaces
Env:mustHave('start')
Env:mustHave('act') -- change name here to prevent overload
Env:mustHave('getStateSpace')
Env:mustHave('getActionSpace')
Env:mustHave('getRewardSpace')

function Env:step(action)
    local reward, state, terminal = self:act(action)
    self.currentStep = self.currentStep == nil and 0 or self.currentStep
    if self.maxSteps and self.currentStep % self.maxSteps == 0 then -- env defines maxSteps
        terminal = true
        self.currentStep = 0
    end
    self.currentStep = self.currentStep + 1
    return reward, state, terminal -- pass the env's results through
end

return Env

@Kaixhin (Owner) commented Oct 20, 2016

Overall the solution is good, but a few suggestions. I'd go with Env:mustHave('_step') to make it pretty clear that the internal method is hidden, and to allow people to use act for whatever reason. Secondly, if you haven't already, it's worth checking exactly how gym defines the max number of timesteps: does it include the state from starting the environment? Finally, I'm not sure why you went for self.currentStep % self.maxSteps == 0 instead of just self.currentStep == self.maxSteps?
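
Roughly, with both suggestions applied (sketch only, untested):

Env:mustHave('_step') -- the leading underscore marks the internal, hidden method

function Env:step(action)
  local reward, state, terminal = self:_step(action)
  self.currentStep = (self.currentStep or 0) + 1
  if self.maxSteps and self.currentStep == self.maxSteps then -- plain equality check
    terminal = true
    self.currentStep = 0
  end
  return reward, state, terminal
end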

@SeanNaren (Collaborator, Author)

Not entirely sure tbh, had a brain fart :P I've made the changes and will now go through and add default max steps as done in openai-gym (I've set the default to 1000, matching openai-gym, but if that's an issue lemme know!).

@Kaixhin (Owner) commented Oct 21, 2016

I think the cleanest way to do this is to have Env.lua actually implement _init with self.maxSteps = opts.maxSteps or 1000, and have every env first call super._init(self, opts). This is where it should go, and it'll stop recalculating it on every step. Also, if I'm correct, this is going to run into problems unless you reset self.currentStep in start as well, which probably means handling the logic in _init, start and step using super calls.

After super._init(self, opts) I think it's worth matching the maxSteps to gym if it actually differs from 1000 according to this.
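
So the layout would be along these lines (rough sketch, untested; CartPole stands in for any concrete env, the require path is assumed, and super is the second return value of classic.class):

-- in Env.lua:
function Env:_init(opts)
  opts = opts or {}
  self.maxSteps = opts.maxSteps or 1000 -- computed once here, not on every step
end

function Env:start()
  self.currentStep = 0 -- reset the step counter at the start of every episode
end

-- in CartPole.lua (or any concrete env):
local classic = require 'classic'
local Env = require 'rlenvs.Env' -- assumed module path

local CartPole, super = classic.class('CartPole', Env)

function CartPole:_init(opts)
  super._init(self, opts) -- set the defaults first, then do env-specific setup
end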

@SeanNaren (Collaborator, Author) commented Oct 24, 2016

I've made a few changes: added an _init function to the Env.lua class and added the super._init call to all the envs. I've also added default max steps and some logic in the Env.lua class to handle the different step cases; lemme know of any feedback!

Also for rendering, shall we make it possible to render via the Env.lua class by passing in a render boolean in the opts? I can get to work on that if you think it would be good!

Env:mustHave('_step')
Env:mustHave('getStateSpace')
Env:mustHave('getActionSpace')
Env:mustHave('getRewardSpace')

function Env:_init(opts)
  if opts.timeStepLimit and opts.maxSteps then
    self.maxSteps = math.min(opts.timeStepLimit, opts.maxSteps)
@Kaixhin (Owner)

Is this how gym does things? To me it makes more sense that if the user specifies maxSteps, then this should override any defaults. And then if timeStepLimit is available, that should be used. Then 1000. Makes the logic a little bit simpler too.
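
i.e. something like (sketch):

-- user-specified maxSteps wins, then the env's timeStepLimit, then 1000
self.maxSteps = opts.maxSteps or opts.timeStepLimit or 1000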

@SeanNaren (Collaborator, Author)

I'll double-check, but I do think this is correct (i.e. the same as openai-gym). However, since this is mainly done for leaderboard purposes (as Kory said), I think we could make an exception here. What do you think?

@Kaixhin (Owner) commented Oct 24, 2016

@korymath Rendering would be nice - any suggestions for how best to integrate it with gym/twrl?

@korymath (Collaborator)

A boolean in the opts would be a good means by which to pass the option, but is perhaps not the full solution.

My inclination is that this would allow for a strictly Lua RL infrastructure, free of any connection to gym, and thus the rendering could be completely separate from gym's rendering. Having not done much graphics work in Lua, I am not sure how best to make this happen.

One potential approach would be to save the transitions, then format them in such a way that they can be loaded and replayed in the gym infrastructure (that is, if the environments are exactly the same as those in gym; but that may be limiting, and it requires offline processing between learning and display).

I would argue that, while rendering is nice, it is not critical to learning (unless there is a learning signal derived from pixels, or a human observer of the visualization, or maybe a third case?). Thus, even a basic render of the learning performance could be generated every N steps, much like how the OpenAI gym allows a video to be rendered.

Nice work on the max steps discussion; that is an interesting quirk of the OpenAI gym code, which I think is based on the online leaderboard and the desire for shorter episode lengths for direct comparison.

@SeanNaren (Collaborator, Author) commented Oct 24, 2016

I added a :render() method which checks that a :getDisplay() method exists to retrieve a displayable screen, and renders it using qt (check the Env.lua class). I've also updated the experiment.lua class; check it out!

@korymath (Collaborator)

This looks great!

Thoughts on how to handle execution on a server? Maybe similar to gym, and allow creation of a 'fake' display for output?

@SeanNaren (Collaborator, Author)

Thanks @korymath! That's an interesting idea; maybe similar to how the gym API is currently interfaced with? If this is deemed necessary I could start thinking about how to implement it (though probably out of the scope of this PR).

@korymath (Collaborator)

Sounds great!

Agree it is probably another large feature build.

function Env:render()
  if self.qt and self.getDisplay then
    self.window = self.window == nil and image.display({ image = self:getDisplay(), zoom = 10 }) or self.window
@Kaixhin (Owner)

zoom should be part of opts and handled by Env - 10 was hard-coded for Catch, but the default of 1 is appropriate for Atari.

@SeanNaren (Collaborator, Author)

Shall I set the default to 10 for Catch or leave this exposed for the user as well?

@Kaixhin (Owner) left a comment

No need to set a default zoom for Catch - just have it set as part of the opts in experiment.lua.
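
Something like this, roughly (sketch, untested; assumes the image package is already loaded under qlua):

function Env:_init(opts)
  opts = opts or {}
  self.zoom = opts.zoom or 1 -- 1 suits Atari; experiment.lua can pass zoom = 10 for Catch
end

function Env:render()
  if self.qt and self.getDisplay then
    -- reuse the window if it exists, otherwise create it with the configured zoom
    self.window = image.display({image = self:getDisplay(), zoom = self.zoom, win = self.window})
  end
end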

@SeanNaren (Collaborator, Author) commented Nov 9, 2016

Seems like the merge of the branch replayed the commits on top... Once I've finished with XOWorld I think I'll rebase the branch and open a new PR; is that alright?

Also, there is a small issue here: a comma is missing at the end :)

EDIT: XOWorld is an awesome environment!! It should be fully implemented now, which should wrap everything up. Now just thinking about unit tests to try to make sure everything works correctly!

@Kaixhin (Owner) commented Nov 9, 2016

Thanks for spotting that - now fixed on v1 and merged into master.

Yes, a rebase would clean this up; good call suggesting it :) There shouldn't be any more changes due now.

@SeanNaren (Collaborator, Author)

Thanks! In terms of unit tests, any suggestions on what functionality to verify and check?

@Kaixhin (Owner) commented Nov 9, 2016

Good question. mustHave already helps a little bit. Seeing what I just did, maybe static syntax checking would be useful? Going further, argcheck might be worth adding in.

I suppose you could test an environment by loading it, starting it, and doing at least one step. I'm not sure if anything needs to be done if someone tries to step before start. Checking that maxSteps works really depends on the environment (e.g. Cartpole will fail quickly with a random agent), unless you manually set it to a very low number.
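
A minimal version of such a test could look like this (sketch, untested; the module layout and the choice of action are assumptions):

-- per-environment smoke test: load, start, take one step, check the return types
local function smokeTest(name)
  local EnvClass = require('rlenvs.' .. name) -- assumed module layout
  local env = EnvClass({maxSteps = 10}) -- very low cap so every env terminates quickly
  local state = env:start()
  assert(state ~= nil, name .. ': start() returned nil')
  local reward, nextState, terminal = env:step(1) -- assumes 1 is a valid action
  assert(type(reward) == 'number', name .. ': reward is not a number')
  assert(type(terminal) == 'boolean', name .. ': terminal is not a boolean')
end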

I started thinking about testing Atari, but there are licensing issues with ROMs. I thought there might be a homebrew ROM that we could include in the repo, but then it'd need to be coded into ALE...

@korymath (Collaborator)

Feel free to use torch-twrl as a testing base if you want. It has some basic testing functionality built up here: https://github.com/twitter/torch-twrl/blob/master/test/test-gym.lua

It runs a few tests for different environments. The runTest code is from the gym-http-api; you can see the full extent of it here: https://github.com/openai/gym-http-api/blob/master/binding-lua/test_api.lua It is a pretty verbose test that runs through the RL environment functionality.

It is quite easy to find the ROMs if you are particularly motivated.

@SeanNaren (Collaborator, Author) commented Nov 14, 2016

Thanks for that, I can definitely try to imitate those tests for all the rlenvs environments!

EDIT: Also saw Malmo added (which is awesomesauce!). I'll pull this in and get it implemented as well.
EDIT2: I will wait!

@Kaixhin (Owner) commented Nov 14, 2016

@SeanNaren I'm trying to get Malmo working myself, so bear with me a while. New plan is to target v1, then merge into master (they're now the same, even though the v2 rockspec shouldn't be in v1, but oh well). I'll aim to get it done by the end of today, but will report back once I've got it working successfully.

@Kaixhin (Owner) commented Nov 14, 2016

OK, that was a bit of a mess, but I seem to have knocked down the requirements nicely and condensed the code. I had a quick look but couldn't get qlua working; the screen does, however, display in the running client. I also couldn't get it working with Atari because of some issue with luaposix, which is a shame, but I won't have time to investigate further.

@SeanNaren (Collaborator, Author)

Right, I think that's Minecraft handled (I still need to test that it works; will do once I'm home on my own machine), and now just the tests.

Currently I just run through every environment, but I need to add some assertions here; however, I'm not sure what I could assert on. In the torch-twrl tests we assert on a successful run-through (returning true as long as everything runs correctly).

@korymath (Collaborator)

Great work!

You can do simple assertions on the types of the values the environment returns, or assert that they fall within an expected range.

For example, runTest could return an error code via a pcall; if the return is clean, then assert on that.
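
For instance (sketch, untested; assumes runTest returns a numeric total reward, as a stand-in for whatever it actually returns):

-- wrap the run-through in pcall and assert on the outcome
local ok, totalReward = pcall(runTest, env)
assert(ok, 'run-through failed: ' .. tostring(totalReward))
assert(type(totalReward) == 'number', 'expected a numeric total reward')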

@SeanNaren (Collaborator, Author) commented Nov 25, 2016

Added assertions, check them out! I also rebased the branch into one nice commit, check here. I think the changes are done, so it would be great if you could have a quick look and let me know if anything needs changing!

What do we plan to merge into? I can then open the PR on the rebased branch.

@Kaixhin (Owner) commented Nov 25, 2016

Unfortunately I'll be away this weekend so can't look at it till next week, but the plan is to merge this into master and update the readme to show installing the v2 rockspec. It's a bit messy, but then we can look at the v1 branch and make sure that is fine (e.g. by deleting the v2 rockspec inside that branch, but hopefully that's it).

This way anyone who's still using the old API (e.g. the Atari repo) can still do so by requiring the v1 rockspec. :) No need to backport new environments unless we're requested to. twrl will of course use v2.

@SeanNaren (Collaborator, Author)

Sounds like a plan and no worries, whenever you get time!

@Kaixhin (Owner) commented Nov 28, 2016

Looks good! Firstly, can you merge the latest Minecraft stuff (both in the env and the readme)? Secondly, if you could use 2-space indentation for consistency, that'd be awesome. I'll have another look afterwards, and then we should be able to merge twrl-rebase into master.

@SeanNaren (Collaborator, Author)

Having issues merging master into the rebased branch. Can you see any commits from master not in the rebased branch?

@Kaixhin (Owner) commented Nov 28, 2016

Ah, sorry: the latest work should be in v1, which should be merged into twrl-rebase, which should then eventually be merged into master.

SeanNaren mentioned this pull request Nov 30, 2016
@SeanNaren (Collaborator, Author)

This is continued in #14

SeanNaren closed this Nov 30, 2016
SeanNaren deleted the twrl branch December 15, 2016 10:08