We are committed to working to help make Unity the go-to platform for Artificial Intelligence (AI) research. In the past few weeks, we’ve seen research groups taking notice, with OpenAI using Unity to help train a robot hand to perform a grasping task, and a group at UC Berkeley using it to test a new Curiosity-based Learning approach. Today we are happy to share the next wave of improvements and resources to help fulfill our mission of supporting the AI research community.
This release includes a new version of the ML-Agents toolkit (v0.5) with more flexible action specification and curricula, a research paper we’ve written on ML-Agents and the Unity platform, a Gym interface for researchers to more easily integrate ML-Agents environments into their training workflows, and a new suite of learning environments which replicate some of the Continuous Control benchmarks used by many Deep Reinforcement Learning researchers.
With the growing adoption of Unity and the ML-Agents toolkit as a research platform, we have received numerous requests for a paper outlining the platform which can be referenced. We are happy to finally release a preprint of our paper “Unity: A General Platform for Intelligent Agents,” which is now available now on arXiv. Within this reference paper, we describe our vision for Unity as a simulation platform that builds on and extends the capabilities of other similar platforms and outline the ML-Agents toolkit as a research tool, discussing both the fundamental design, as well as the additional features. We also provide benchmark results on our example environments using the Proximal Policy Optimization algorithm, as well as a small cohort of Unity employees who provided a human-benchmark against which to compare. With these baselines in place, we look forward to seeing research groups outperform our results, and achieve “superhuman” performance on as many of the example environments as possible.
When we first released the ML-Agents toolkit, we provided a custom Python API for interacting with learning environments. We did this because we wanted to provide a powerful and flexible way of interacting with our environments which wasn’t limited by pre-existing conventions. This allowed us to enable scenarios involving multi-agent and multi-brain learning with complex mixed observation spaces. Many in the research community have asked about the availability of a gym wrapper for these environments as well. For those unfamiliar, gym is a standardized and popular means of interacting with simulation environments. As such, we are happy to share that we’ve created a gym interface which can be used to interact with Unity environments. If you are a researcher who has built an experimentation pipeline around using gyms, this means you will be able to easily swap out other gym environments for Unity ones. To learn more about the gym interface, see our package page.
For the past year, one of our community members, Joe Booth, has been working on re-implementing the classic set of Continuous Control benchmarks typically seen in Deep Reinforcement Learning literature as Unity environments using the ML-Agents toolkit. The environments include the Walker, Hopper, Humanoid, and Ant based on the environments available in the DeepMind Control Suite and OpenAI Gym. Collectively we are calling these the Marathon Environments since in all cases the goal is for the agents to learn to run forward as quickly and consistently as possible. We are providing these for the research community to use as an easy way to get started with benchmarking algorithms against these classic tasks.
They were made possible by the contributions of Joe Booth, a veteran in the games industry turned Machine Learning researcher. Click here to download and get started with the environments yourself. Read below to hear from Joe himself on how he made this possible.
In Joe’s own words…
“I wanted to see if the research on continuous control and locomotion from OpenAI, DeepMind and others would transfer to a modern day game engine such as Unity and PhysX. Imagine a future where a game designer could input a YouTube URL of a desired animation and the AI would mimic this while dynamically reacting to a changing environment - super cool! By creating a framework to reproduce these benchmarks within Unity one is able to take incremental steps.
When implementing a paper or novel idea, one can first test on a simple model such as Hopper and have confidence that results will scale up to more complex models such as Walker or Humanoid. You can see how I use this incremental approach in my research into mastering dynamic environments, controllers, and style transfer.
I’m excited to see how others will use Marathon Environments. With the addition of Gym we have the opportunity to bring many new bleeding edge algorithms to ML-Agents such as HER or MAML and I would gladly support or partner in these efforts.”
Expanded Discrete Action Space - We have changed the way discrete action spaces work to allow for agents using this space type to make multiple action selections at once. While the previous versions of ML-Agents only allowed agents to select a single discrete action at a time, v0.5 allows you to create action branches for your agent. Each branch can contain a different fixed number of possible actions which can be selected from. During runtime, the agent will choose one action for each branch when a decision is requested. Concretely, this means that it is now possible for an agent to both move in a chosen direction and jump, as is now the case in WallJump. We have also modified the BananaCollector environment, making it possible for the agents in that environment to move, turn, and potentially fire their laser.
Discrete Action Masking - Under certain circumstances, an agent should be disallowed from performing a specific action. For example, if an agent is in the air, it does not make sense for it to jump. In v0.5, you can now specify impossible actions for the next decision to your agents. When collecting observations, you can optionally specify one or more impossible actions for each action branch of the agent. The agent will not try to perform any of those actions at the next decision step. This is meant to make it easy to prevent agents from doing impossible actions without the need for additional code in the action methods.
We modified the GridWorld environment to mask the actions that would involve the agent attempting to walk into a wall. By doing this, the agent does not lose time when exploring the grid and learns a lot faster. See the figure below for a comparison of the learning process with and without action masking.
Meta-Curriculum - Curriculum learning is a great feature that allows you to create environments that get increasingly hard as the agent progresses. We introduced this feature in v0.2 and we have improved it for v0.5. You can now use meta-curriculums which enables you to create curriculum scenarios in environments that use multiple Brains. You can do so by specifying a curriculum for each Brain independently. You can now create a learning environment where multiple kinds of agents can learn, at their own pace. We included curriculum examples for the WallJump environment.
We look forward to seeing how the research community takes advantage of this collection of new resources and improved features. Going forward, we plan to continue to support both the research and game developer communities with our work and releases. If you have comments, feedback, or questions, feel free to reach out to us on our GitHub issues page, or email us directly at firstname.lastname@example.org. Happy training!