There are various novel training strategies that we can employ with multiple agents and/or brains in an environment, from adversarial and cooperative self-play to imitation and curriculum learning. In this tutorial, we will look at how to build multi-agent environments in Unity as well as explore adversarial self-play.
This tutorial is an excerpt taken from the book ‘Learn Unity ML-Agents – Fundamentals of Unity Machine Learning’ by Micheal Lanham.
Let’s get started!
A multi-agent environment consists of multiple interacting intelligent agents that compete against each other, which makes the game more engaging. It started out as just a fun experiment for game developers, but as it turns out, letting agents compete against themselves can really amp up training.
There are a few configurations we can set up when working with multiple agents. The BananaCollector example we will look at uses a single brain shared among multiple competing agents. Open up Unity and follow this exercise to set up the scene:
python python/learn.py python/python.exe --run-id=banana1 --train
You will notice that the mean and standard deviation of the reward accumulate quickly in this example. This is partly the result of a few changes to the reward values, but this particular example is also well-suited for multi-agent training. Depending on the game or simulation you are building, using multiple agents with a single brain could be an excellent way to train.
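To make the two statistics mentioned above concrete, here is a minimal Python sketch of how a mean and standard deviation of reward could be computed over the episodes gathered from several agents that share one brain. The function name `summarize_rewards` and the sample values are hypothetical, and the use of a population standard deviation is an assumption, not something confirmed by the trainer's source.

```python
import statistics
from typing import List, Tuple


def summarize_rewards(episode_rewards: List[float]) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of cumulative episode rewards.

    Hypothetical helper for illustration only; it is not part of ML-Agents.
    """
    mean = statistics.mean(episode_rewards)
    std = statistics.pstdev(episode_rewards)  # assumption: population std
    return mean, std


# Made-up cumulative rewards, one per finished episode, pooled across
# all agents that share the same brain.
pooled_rewards = [1.0, 3.0, 2.0, 4.0]
mean_reward, std_reward = summarize_rewards(pooled_rewards)
print(f"Mean Reward: {mean_reward:.3f}. Std of Reward: {std_reward:.3f}.")
```

Because every agent feeds experience to the same brain, pooling their episodes gives more samples per update, which is one plausible reason the numbers move quickly in this example.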
Feel free to go back and enable multiple environments in order to train multiple agents in multiple environments using multiple A3C agents. Next, we will look at another example that features adversarial self-play using multiple agents and multiple brains.
The last example we looked at is best defined as a competitive multi-agent training scenario where the agents are learning by competing against each other to collect bananas or freeze other agents out. Now, we will look at another similar form of training that pits agent against agent using an inverse reward scheme, called adversarial self-play. Inverse rewards are used to punish an opposing agent when a competing agent receives a reward. Let’s see what this looks like in the Unity ML-Agents Soccer (football) example by following this exercise:
python python/learn.py python/python.exe --run-id=soccor1 --train
The StrikerBrain is currently getting a negative reward and the GoalieBrain is getting a positive reward. Using inverse rewards allows the two brains to train toward a common goal, even though they are also competing against each other. In the next example, we are going to look at using our trained brains in Unity as internal brains.
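The inverse reward idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `inverse_rewards`, the agent names, and the reward magnitude of 1.0 are hypothetical and are not taken from the Unity Soccer example's actual code.

```python
from typing import Dict, List


def inverse_rewards(scorer: str, agents: List[str], reward: float = 1.0) -> Dict[str, float]:
    """Reward the scoring agent and punish every other agent by the same amount.

    Hypothetical helper for illustration; the reward value is an assumption.
    """
    return {name: (reward if name == scorer else -reward) for name in agents}


# One striker scores against one goalie: the striker is rewarded and the
# goalie receives the inverse (negative) reward.
step_rewards = inverse_rewards("striker", ["striker", "goalie"])
print(step_rewards)
```

With two agents the rewards sum to zero, which is what makes the scheme adversarial: one brain can only gain when the other loses.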
It can be fun to train agents in multiple scenarios, but when it comes down to it, we ultimately want to be able to use these agents in a game or proper simulation. Now that we have a training scenario already set up to entertain us, let’s enable it so that we can play soccer (football) against some agents. Follow this exercise to set the scene so that you can use an internal brain:
This is a great example and quickly shows how easily you can build agents for most game scenarios given enough training time and setup. What’s more, the decision code is embedded in a lightweight TensorFlow graph that runs circles around other Artificial Intelligence solutions. We are still not using the new brains we have trained, so we will do that next.
Here, we will use the brains we previously trained as the agents’ brains in our soccer (football) game. This will give us a good comparison of how the default Unity-trained brain performs against the one we trained in our first exercise.
We are getting to the fun stuff now and you certainly don’t want to miss the following exercise where we will be using a trained brain internally in a game we can play:
The first thing you will notice is that the agents don’t quite play as well. That could be because we didn’t use all of our training options. Now would be a good time to go back and retrain the soccer (football) brains using A3C and other options we covered thus far.
In this tutorial, we were able to play with several variations of training scenarios. We started by looking at extending our training to multi-agent environments that still used a single brain. Next, we looked at a variety of multi-agent training called adversarial self-play, which allows us to train pairs of agents using a system of inverse rewards.
If you found this post useful, and want to learn other methods such as imitation and curriculum learning, then be sure to check out the book ‘Learn Unity ML-Agents – Fundamentals of Unity Machine Learning’.