Each of these cars is controlled by an Artificial Intelligence (AI) in the racing game Trackmania. This AI is not very intelligent yet. But that’s normal: it has just started to learn. In fact, I want to use a method called Reinforcement Learning to make this AI learn by itself how to drive as fast as possible. I also want it to become intelligent enough to master various combinations of turns without ever falling off the road. And to prove this, the AI will have to pass a final challenge: to complete this giant track. But first of all, how is a simple computer program supposed to learn things? This is not the first time I’ve experimented with AI in Trackmania. To achieve this, I’m using a method called Machine Learning. First, I’m running a program that controls the car in-game to make it turn and accelerate. The AI can choose between 6 different actions. But how can it decide which action to take? The AI needs to get information about the game. It receives that in the form of numbers called inputs. Some inputs describe the state of the car, such as its current speed and acceleration. Others indicate how the car is positioned on the road section it’s currently crossing. And the last inputs indicate what’s further ahead. This is now what the AI sees when playing. But how can it interpret that? It needs to use this data in an intelligent way. To link inputs to the desired action, the AI is going to use a neural network, which basically acts like a brain. Now, all that remains is to parameterize the neural network so that it results in fast driving. And that’s where Machine Learning comes into play. As I said earlier, the objective here is that the AI learns to drive by itself. So it will have to experiment with different strategies, through trial and error, to progressively select the neural network that leads to the best driving. One way to do this would be to use a genetic algorithm.
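As a minimal sketch of the idea of a neural network mapping inputs to actions: the layer sizes, the exact input layout, and the random weights below are all assumptions for illustration, not the network the video actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input layout: speed, acceleration, position on the
# current road section, and a few values describing the road ahead.
N_INPUTS = 19
N_ACTIONS = 6  # e.g. combinations of accelerating and steering

# One hidden layer with illustrative sizes and untrained random weights.
W1 = rng.normal(0.0, 0.1, (N_INPUTS, 32))
b1 = np.zeros(32)
W2 = rng.normal(0.0, 0.1, (32, N_ACTIONS))
b2 = np.zeros(N_ACTIONS)

def action_scores(inputs):
    """Forward pass: one score per possible action."""
    h = np.maximum(0.0, inputs @ W1 + b1)  # ReLU hidden layer
    return h @ W2 + b2

state = rng.normal(size=N_INPUTS)   # stand-in for real game inputs
scores = action_scores(state)
best_action = int(np.argmax(scores))  # the action the AI would pick
```

Training is then just the search for weights `W1, b1, W2, b2` that make the highest-scoring action also the fastest one.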
I’ve already tried that in Trackmania and it works fairly well. Basically, the idea is to start with a population of several AIs, each with its own neural network. All AIs compete on the same map, and the best ones are selected and recombined through a process similar to natural selection. This can be repeated for many generations to get a better and better neural network. One problem with this method is that you only compare the different AIs based on their end result. To make an AI progress, it might be better to give it feedback on what it did well or not so well during the race. So it’s time to try something else: Reinforcement Learning. And this comes with a crucial idea: the concept of reward. This time, the AI has only one goal in mind: to get as many rewards as possible. The idea of reinforcement learning is to learn to pick the action that brings the most reward, in any situation. In fact, this is quite like a pet being trained, which will interpret pleasure or food intake as positive reinforcement. But in Trackmania, there is no food. So how can we define rewards? The AI can take 10 actions per second. Each action will be associated with a reward equal to the distance traveled up to the next action. So the faster the AI goes, the more rewards it gets. If the AI ever tries to go the wrong way, it will receive a punishment, which is actually just a negative reward. And if the AI falls off the road, it will be punished directly by a zero reward, but also indirectly by the race stopping, which means no more rewards. Now, it’s time to start training. To learn which inputs and actions lead to which reward, the AI must first gather information about the game. This is the exploration phase. The AI simply takes random actions and doesn’t use its neural network for the moment. The runs are driven one by one. And after a thousand of them, here is what the AI has explored of the map so far. Each line corresponds to one race trajectory.
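The reward scheme described above can be sketched in a few lines. The function name and its exact signature are illustrative; the video only specifies the behavior: reward equals distance gained per action step, negative when going the wrong way, and falling off the road ends the run with a zero reward.

```python
def step_reward(distance_progressed, fell_off_road):
    """Reward for one action step (the AI acts 10 times per second).

    distance_progressed: meters gained along the track since the last
    action; negative when the car goes the wrong way, which turns the
    reward into a punishment.
    """
    if fell_off_road:
        # Direct punishment: zero reward. Indirect punishment: the run
        # stops, so no further rewards can be collected.
        return 0.0, True   # (reward, episode_done)
    return distance_progressed, False
```

With this definition, faster driving covers more distance per step and therefore earns more reward, which is exactly the behavior the AI is asked to maximize.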
The AI has already collected plenty of data about the rewards it can expect to get for various sets of inputs and actions. Now, it’s time to use this data to train its neural network. This is the role of the reinforcement learning algorithm. There are many different variants of this method, and here I chose to use one called Deep Q-Learning. Basically, for a given set of inputs, the role of the neural network is to predict the expected reward for each possible action. But which reward are we talking about? Is it an immediate one? In Trackmania, although some actions may result in an immediate positive reward, they may have negative consequences in the long run. Sometimes, it may be useful to sacrifice short-term income, for example by slowing down when approaching a turn, in order to gain more long-term reward. The AI therefore needs to consider the long-term consequences of each action. To achieve this, the AI tries to estimate the cumulative reward that it’s most likely to obtain in the future. Although the long term is important, an action still has more impact in the short term. Thus, events in the immediate future are weighted more. So each time the AI gets inputs, its neural network tries to predict the expected cumulative reward for each possible action, and the AI just selects the one with the highest value. Let’s resume training where we left off. In parallel to driving, the AI is continuously trying to improve its neural network with the data it collects. But by only doing random exploration, the AI ends up not having much new to learn. Instead of just exploring, it’s time for the AI to also start exploiting the knowledge it has acquired, meaning using its neural network instead of just acting randomly. The AI is still a bit too immature, though, to rely only on its neural network. If it does too much exploitation, it will just experience the same things over and over again, which will not teach it much.
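The "cumulative reward with more weight on the near future" is the standard discounted return used by Deep Q-Learning. Here is a small sketch; the discount factor value is an assumption, since the video does not give it.

```python
GAMMA = 0.99  # discount factor (assumed value): immediate rewards weigh more

def discounted_return(rewards):
    """Cumulative reward the network tries to predict: near-term rewards
    count fully, each later step is scaled down by another factor of gamma."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (GAMMA ** t) * r
    return total

def q_learning_target(reward, done, next_q_values):
    """One-step Q-Learning target for the action that was taken:
    the immediate reward plus the discounted best predicted value
    of the next state (or just the reward if the run ended)."""
    if done:
        return reward          # falling off the road: no future rewards
    return reward + GAMMA * max(next_q_values)
```

During training, the neural network's prediction for the taken action is nudged toward `q_learning_target`, which is how long-term consequences such as braking before a turn get folded into the per-action scores.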
For now, I’m setting the proportion of exploration at 90%, and I’ll decrease it progressively during training. After more than 20,000 attempts on this map, here is the best run the AI has done so far. The AI drives quite carefully, and it’s not too bad for a start! It has definitely learned something. Going further into the map seems a bit more complicated, and the AI ends up falling. Time to get back to training! At this point, you might think that the AI hasn’t learned much, after training on the same map for so many hours. But I think it’s quite normal. Reinforcement learning is known to require a large number of iterations to work. The time displayed here is in-game time. Fortunately, training is faster in practice, since I can increase the game speed using a tool called TMInterface. This project would probably not have been possible without this tool, so a big thanks to Donadigo, its developer. The AI has made some nice progress. The driving style it learned in the first turns seems to apply well to the following ones, which shows a good capacity for generalization. The AI has now reached 5% exploration, which I will not decrease further. It seems that the AI is stuck and can no longer progress. Here is its current personal best. In the first part of the map, the AI shows very little hesitation. This first portion has a lot of turns and short straights. But then the AI arrives in a new section with mainly long straight lines. Its driving becomes a little sketchy. At one point, it even stops, as if it’s afraid to continue. After a long minute, it finally decides to continue, and dies. The AI seems to have difficulty adapting to this new type of road. Or maybe it just needs more time. To be sure, I decided to push the training a little longer. After 10,000 more attempts, the AI hasn’t made much progress. It still has a lot of trouble with long straight lines.
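The exploration/exploitation balance described here is a standard epsilon-greedy policy with a decaying exploration rate. A minimal sketch follows; the video only gives the two endpoints (90% down to a 5% floor), so the decay rate below is an assumption.

```python
import random

def choose_action(q_values, exploration_rate):
    """Epsilon-greedy: with the given probability, take a random action
    (explore); otherwise trust the neural network's predictions (exploit)."""
    if random.random() < exploration_rate:
        return random.randrange(len(q_values))                   # explore
    return max(range(len(q_values)), key=q_values.__getitem__)   # exploit

def exploration_schedule(run_index, start=0.90, floor=0.05, decay=0.9999):
    """Start at 90% exploration and decay towards a 5% floor.
    The decay rate is illustrative, not the video's actual value."""
    return max(floor, start * decay ** run_index)
```

Keeping a small exploration floor is what prevents the situation described above: with pure exploitation, the AI would replay the same trajectories over and over and stop learning anything new.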
There may be several reasons for this, but I think the main one is overfitting, which is common in machine learning. In the exploration phase, the AI practiced the same first few turns over and over again. Its neural network became a specialist in this kind of trajectory, learning it almost by heart, as if nothing else existed. But when the AI faces a new situation, the driving style it learned in the past is no longer appropriate: it needs to adapt. In a way, adapting means questioning everything it has learned in the past. If the AI tries to drastically change its strategy to adapt to these new roads, it risks breaking everything that was working for the first few turns. Where there is overfitting, there is no generalization. So what’s the solution? Maybe the AI could drive each run on a different map, to constantly learn new things. But at this point, I really don’t want to spend hours building dozens of different maps. So, I’m gonna do things differently. I’m going to restart training from the beginning. But now, each time the AI starts a new run, it will spawn at a random location on the map, with a random speed and a random orientation. This should limit overfitting, since the AI will be forced to consider many different situations from the beginning. This time, the AI is learning way faster. However, perhaps the AI managed to cover long distances just because it spawned in easy sections of the map. The real challenge is still to complete the track from start to finish. From now on, I will regularly test the AI outside of training, on a normal race. Outside of training, I remove any exploration to optimize the AI’s performance. I also increase the action frequency from 10 to 30 per second. The AI is able to drive in all sections of the map, so there is clearly less overfitting this time! Now, the AI only has to combine everything in one run. In this attempt, the AI manages to surpass its previous record, going further than ever.
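The randomized reset can be sketched as a simple sampler. The video only says the spawn location, speed, and orientation are random; the field names and all ranges below are assumptions for illustration.

```python
import random

def random_spawn(track_length_m=23000.0):
    """Anti-overfitting reset: each new training run starts at a random
    point along the track, with a random speed and orientation, so the
    network sees many different situations instead of always the same
    first few turns. All ranges here are illustrative assumptions."""
    return {
        "position_m": random.uniform(0.0, track_length_m),
        "speed_kmh": random.uniform(0.0, 400.0),
        "yaw_offset_deg": random.uniform(-45.0, 45.0),
    }
```

Each run then begins from `random_spawn()` instead of the start line, which spreads the training data over the whole map.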
But it fails within 500 meters of the finish. It has never been so close to finishing this map. And finally, a few attempts later, and after 53 hours of training, the AI gets this run. The AI was able to complete 230 turns without ever falling. Sounds good, but is the AI fast? Now, it’s my turn to drive, to compare. After a few attempts, I made a run of 4 minutes and 44 seconds. Without using the brake, of course, for a fair comparison. So yeah, the AI is not very fast. But training is not over! Now, the AI has one goal: to finish this map as fast as possible. 6 minutes and 28 seconds. After this run, I continued training, and the AI kept getting slightly faster on average, and more consistent too, but it never managed to beat its personal best. With this version of its neural network, the AI drives quite aggressively and takes most turns very sharply. It’s quite surprising to see it survive the whole race with such a driving style. But it’s the best the AI has found. Perhaps there is still a way to improve the AI’s record one last time, still with the same neural network. If I randomly force some of the AI’s actions at the beginning, the AI will have to adapt to this small perturbation. And this is the start of a completely different run. Now, I can repeat this a few hundred times to see what happens. And here is the final improvement of the AI’s record. Not a big improvement, but it was visually worth it! There is still a big gap with human performance, but I’m still very happy with the result. Trackmania is a game that requires a lot of practice, even for humans, and from my experience I’m pretty sure this AI could beat a good number of beginners. If there’s one thing the AI does well, it’s generalization. It can adapt to any new map with a similar road structure. I even tried changing the road surface to see if it could drive on grass, and the AI does quite well! Same thing on dirt, even though the AI has never experienced these surfaces during training.
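The final record-hunting trick can be sketched as follows. The video only says that a few actions at the start of the run are randomly forced; the number of forced steps and the function names are assumptions.

```python
import random

def perturbed_run(policy_action, n_steps, n_forced=20, n_actions=6):
    """Force a few random actions at the start of the run, then let the
    frozen neural network take over. Each perturbation seeds a different
    trajectory; repeating this a few hundred times can stumble on a run
    that beats the personal best. n_forced is an assumed value."""
    actions = []
    for step in range(n_steps):
        if step < n_forced:
            actions.append(random.randrange(n_actions))  # perturbation
        else:
            actions.append(policy_action(step))          # pure exploitation
    return actions
```

Note that this does not change the neural network at all: it only nudges the car into slightly different states, from which the same network may produce a different, occasionally faster, run.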
But can it still survive on a new map, with a mix of road, dirt, and grass surfaces, and a few slopes and obstacles? So yeah, of course there is room to improve this AI. But with reinforcement learning, it seems that the main limitation is always the same: training time. Even with a tool to increase game speed. That’s why I never venture into more complex maps, and that’s why I try to limit complexity in general: few inputs, no brakes, not too many actions per second, and so on. Anyway, for now, the AI has deserved some rest after those long hours of training. And maybe it will be back one day, with new surprises!

Video Information
This video, titled ‘A.I. Learns to Drive From Scratch in Trackmania’, was uploaded by Yosh on 2022-03-12 15:00:22. It has garnered 7090976 views and 106348 likes. The duration of the video is 00:16:51 or 1011 seconds.
I made an A.I. that teaches itself to drive in the racing game Trackmania, using Machine Learning. I used Deep Q-Learning, a Reinforcement Learning algorithm.
Again, a big thanks to Donadigo for TMInterface!
Contact: Discord – yosh_tm Twitter – https://twitter.com/yoshtm1