
Spinning Up

OpenAI Spinning Up https://spinningup.openai.com/en/latest/

So I’ve got OpenAI’s PPO working. Running it on your own envs needs a workaround: https://github.com/openai/spinningup/issues/142

But I can’t work out how to increase the exploration “factor”.

I think it comes from sampling actions from a Gaussian distribution (i.e. adding Gaussian noise to the policy’s output), which is the simple idea.
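A minimal NumPy sketch of that idea, assuming a diagonal Gaussian policy like the one Spinning Up’s PPO uses for continuous actions: the network outputs a mean `mu`, and a separate log-standard-deviation vector `log_std` sets how widely actions are spread around it. The function name is illustrative, not the library’s. As far as I can tell, Spinning Up starts `log_std` at -0.5 and then learns it, so raising it is the “exploration factor” knob.

```python
import numpy as np

def sample_action(mu, log_std, rng):
    """Sample a ~ N(mu, diag(exp(log_std)**2)), as a diagonal Gaussian policy does."""
    std = np.exp(log_std)
    return mu + std * rng.standard_normal(mu.shape)

rng = np.random.default_rng(0)
mu = np.zeros(2)  # pretend policy-network output for a 2-D action space

for log_std in (-0.5, 0.0, 0.5):
    actions = np.array([sample_action(mu, np.full(2, log_std), rng) for _ in range(10_000)])
    # Empirical spread tracks exp(log_std): roughly 0.61, 1.0 and 1.65
    print(log_std, actions.std(axis=0).round(2))
```

A larger `log_std` means noisier actions and more exploration; the mean of the samples stays at `mu`.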

It looks like clip_ratio might be what I need.

Hmm, but the PPO script doesn’t take it on the command line. The parser only defines these arguments:

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--env', type=str, default='HalfCheetah-v2')
parser.add_argument('--hid', type=int, default=64)
parser.add_argument('--l', type=int, default=2)
parser.add_argument('--gamma', type=float, default=0.99)
parser.add_argument('--seed', '-s', type=int, default=0)
parser.add_argument('--cpu', type=int, default=4)
parser.add_argument('--steps', type=int, default=4000)
parser.add_argument('--epochs', type=int, default=50)
parser.add_argument('--exp_name', type=str, default='ppo')
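The parser above has no flag for `clip_ratio`, so one workaround is to add one and pass the value through as a keyword argument. A sketch with plain `argparse`; `run_ppo` is a hypothetical stand-in for the real `ppo(...)` call (which needs an environment and Spinning Up installed), and the `--clip_ratio` flag is my own addition, not part of the original script.

```python
import argparse

def run_ppo(env_name, clip_ratio=0.2, epochs=50):
    """Hypothetical stand-in for ppo(env_fn, ..., clip_ratio=..., epochs=...)."""
    return {"env": env_name, "clip_ratio": clip_ratio, "epochs": epochs}

def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('--env', type=str, default='HalfCheetah-v2')
    parser.add_argument('--epochs', type=int, default=50)
    # New flag; the default matches ppo()'s own default of 0.2
    parser.add_argument('--clip_ratio', type=float, default=0.2)
    args = parser.parse_args(argv)
    # Forward the parsed value as a keyword argument
    return run_ppo(args.env, clip_ratio=args.clip_ratio, epochs=args.epochs)

print(main(['--clip_ratio', '0.3']))
```

In the real script the equivalent change would be one extra `add_argument` line plus `clip_ratio=args.clip_ratio` in the `ppo(...)` call.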


# clip_ratio is a keyword argument of ppo(), but the parser above has no flag for it
def ppo(env_fn, actor_critic=core.mlp_actor_critic, ac_kwargs=dict(), seed=0,
        steps_per_epoch=4000, epochs=50, gamma=0.99, clip_ratio=0.2, pi_lr=3e-4,
        vf_lr=1e-3, train_pi_iters=80, train_v_iters=80, lam=0.97, max_ep_len=1000,
        target_kl=0.01, logger_kwargs=dict(), save_freq=10):
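For reference, `clip_ratio` is the ε in PPO’s clipped surrogate objective. A NumPy sketch of the standard form (the function name is mine): the ratio between the new and old policy’s action probabilities is clipped to [1 - ε, 1 + ε], so a larger `clip_ratio` lets each update move the policy further. It bounds the size of the update rather than adding noise, so on its own it isn’t an exploration knob.

```python
import numpy as np

def clipped_surrogate(ratio, adv, clip_ratio=0.2):
    """PPO-Clip objective (to maximise): mean of min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    clipped = np.clip(ratio, 1.0 - clip_ratio, 1.0 + clip_ratio)
    return np.mean(np.minimum(ratio * adv, clipped * adv))

ratio = np.array([1.5, 0.5])  # pi_new(a|s) / pi_old(a|s) for two samples
adv = np.array([1.0, -1.0])   # advantage estimates for those samples
print(clipped_surrogate(ratio, adv, 0.2))  # (1.2 - 0.8) / 2, approximately 0.2
print(clipped_surrogate(ratio, adv, 0.4))  # (1.4 - 0.6) / 2, approximately 0.4
```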

https://github.com/openai/spinningup/issues/12

OK, so I tried the SAC algorithm too, and now I get this error:

AttributeError: 'list' object has no attribute 'reshape'

So the problem is the dimensionality (the structure) of the observations:
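The error itself is easy to reproduce: `.reshape` is a NumPy array method, and Python lists don’t have it. My guess (not checked against the SAC source) is that the agent calls something like `o.reshape(1, -1)` on each observation, so an env that returns plain lists trips it. A minimal sketch of a hypothetical wrapper `AsArrayObs` that converts observations to float32 arrays before the agent sees them; it assumes only the old Gym-style `reset()`/`step()` interface, and `ListEnv` is a toy env made up for the example.

```python
import numpy as np

class ListEnv:
    """Toy env (hypothetical) that returns observations as plain Python lists."""
    def reset(self):
        return [0.0, 1.0, 2.0]

    def step(self, action):
        return [1.0, 2.0, 3.0], 0.0, False, {}

class AsArrayObs:
    """Hypothetical wrapper: convert every observation to a float32 NumPy array."""
    def __init__(self, env):
        self.env = env

    def __getattr__(self, name):
        # Forward anything else (e.g. observation_space) to the wrapped env
        return getattr(self.env, name)

    def reset(self):
        return np.asarray(self.env.reset(), dtype=np.float32)

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return np.asarray(obs, dtype=np.float32), reward, done, info

raw = ListEnv().reset()
try:
    raw.reshape(1, -1)
except AttributeError as e:
    print(e)  # 'list' object has no attribute 'reshape'

env = AsArrayObs(ListEnv())
print(env.reset().reshape(1, -1).shape)  # (1, 3)
```

This only fixes list-valued observations; a Dict observation has to be flattened into a single vector first.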

“FetchReach environment has Dict observation space (because it packages not only arm position, but also the target location into the observation), and spinning up does not implement support for Dict observation spaces yet. One thing you can do is add a FlattenDictWrapper from gym (for example usage see, for instance,

from gym.wrappers import FlattenDictWrapper  # available in older gym releases
env = FlattenDictWrapper(env, ['observation', 'desired_goal'])

Spinning Up implementations currently only support envs with Box observation spaces (where observations are real-valued vectors). These environments have Dict observation spaces, so each obs is a dict of (key, vector) pairs. If you want to test things out in these envs, I recommend doing it as a hacking project! 🙂 “
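As far as I know, what the wrapper does boils down to concatenating the chosen entries of the observation dict, in a fixed key order, into one flat vector, which is the Box-style observation Spinning Up expects. A dependency-free sketch of that step; `flatten_obs` is my own helper name, and the arrays are toy values rather than real FetchReach observations.

```python
import numpy as np

def flatten_obs(obs, keys):
    """Concatenate obs[k] for k in keys (in that order) into one float32 vector."""
    return np.concatenate([np.asarray(obs[k], dtype=np.float32).ravel() for k in keys])

# Toy goal-based observation dict with the same keys as the Fetch envs
obs = {
    'observation': np.array([0.1, 0.2, 0.3, 0.4]),
    'achieved_goal': np.array([1.0, 1.0, 1.0]),
    'desired_goal': np.array([2.0, 2.0, 2.0]),
}

flat = flatten_obs(obs, ['observation', 'desired_goal'])
print(flat.shape)  # (7,)
```

Leaving out `achieved_goal` matches the keys in the wrapper call quoted above; the agent then sees one flat vector per step.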