Ubisoft Builds New AI Algorithm That Uses Reinforcement Learning to Teach Itself to Drive

Game developers are attempting to capitalize on machine learning (ML), and Ubisoft has gone a long way toward popularizing ML and reinforcement learning (RL). One big advantage machine learning holds over traditional scripted games is that ML-based games can learn from the player and react in a more dynamic and realistic fashion; Ubisoft's latest example uses reinforcement learning to teach cars to drive themselves.

One primary example of machine learning in video games is Sega and Creative Assembly's Alien: Isolation. The title uses ML for the AI behind the Xenomorph that stalks the player, although the opening act relies more on scripting, since an ML agent needs a human to learn from before it can react dynamically. In this case the ML agent is the Xenomorph, and the human teaching it is the player.

Ubisoft has a history with AI-powered development tools. The company started implementing machine learning on a much larger scale with Assassin's Creed: Origins, with the goal of more accurately recreating Ptolemaic Egypt. The process was time-consuming, as developers had to teach the ML to recognize and use hieroglyphics, and many bugs needed to be ironed out. Ubisoft has also developed an AI tool called "Commit Assistant" that detects bugs and implements fixes with little oversight.

Ubisoft La Forge, Ubisoft's prototyping space, has dedicated resources to researching and developing reinforcement learning. Traditional supervised ML is notably slower for this purpose, as it requires much more training data to learn how to react to scenarios. RL teaches itself through trial and error and can apply what it has learned in a continuous world with unaccounted-for variables, whereas supervised ML only learns from the specific variables it was trained on.

La Forge is attempting to make reinforcement learning more practical to implement in video games. Most RL research focuses on environments where AI agents perform and perfect either continuous or discrete actions. A continuous action takes any value within a range, while a discrete action is one choice from a small fixed set. In the case of driving, the continuous actions would be steering and accelerating, while the discrete action would be braking.
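The distinction can be sketched with a toy action sampler for a driving agent. This is an illustration only, not code from La Forge's research; the function name and value ranges are assumptions.

```python
import random

# Hypothetical driving-agent action sampler (illustration only):
# continuous actions are real-valued choices within a range, while
# discrete actions are picked from a small fixed set.

def sample_action():
    steering = random.uniform(-1.0, 1.0)  # continuous: any steering angle in [-1, 1]
    throttle = random.uniform(0.0, 1.0)   # continuous: any acceleration in [0, 1]
    brake = random.choice([0, 1])         # discrete: brake off (0) or on (1)
    return steering, throttle, brake

steering, throttle, brake = sample_action()
```

Steering can take infinitely many values between full left and full right, which is what makes it continuous; the brake is either engaged or not, which is what makes it discrete.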

The issue with using reinforcement learning in video games is that most RL agents perform either continuous or discrete actions, but video games often need both: few games restrict the player to a single action type, and most game systems mix discrete and continuous actions together.

La Forge proposed a hybrid AI algorithm based on the Soft Actor-Critic work from the University of California, Berkeley. The new algorithm, called Hybrid Soft Actor-Critic (Hybrid SAC), could fix common issues with reinforcement learning. Soft Actor-Critic (SAC) is a model-free RL algorithm: in ML/RL, a model is a mathematical representation of how scenarios unfold, with variables given numerical values that are processed through the algorithm, and a model-free algorithm learns directly from experience without building one.

SAC was created for continuous tasks and was not designed for discrete ones. La Forge's Hybrid SAC extends the original SAC to perform continuous, discrete, and mixed actions. La Forge tested the new algorithm with a car as the agent, with the objective of driving along a given path as fast as possible. The agent was given two continuous actions, steering and accelerating, while the E-brake was the discrete action. The discrete action proved to be the key factor in keeping the agent on the path at such high speeds.
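The general idea behind a hybrid action output can be sketched in miniature: a single real-valued policy output is split into continuous controls plus logits for a categorical discrete choice. The function names, dimensions, and squashing choices below are illustrative assumptions, not Ubisoft's actual implementation.

```python
import math
import random

# Sketch of a hybrid action head (assumed structure, for illustration):
# the policy emits one real-valued vector; the first entries are used
# directly as continuous controls, and the remaining entries are treated
# as logits for a categorical (discrete) choice such as the E-brake.

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def split_hybrid_action(policy_output, n_continuous=2):
    # Squash continuous dims to [-1, 1], e.g. steering and acceleration.
    continuous = [math.tanh(x) for x in policy_output[:n_continuous]]
    # Sample the discrete action from the categorical over the remaining dims.
    probs = softmax(policy_output[n_continuous:])
    r, cumulative, discrete = random.random(), 0.0, len(probs) - 1
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            discrete = i
            break
    return continuous, discrete

raw = [0.3, -1.2, 0.5, 1.5]  # 2 continuous dims + 2 discrete logits
controls, ebrake = split_hybrid_action(raw)
```

Because everything flows through one continuous output vector, the same SAC-style machinery can train both kinds of action at once, which is the practical appeal of the hybrid approach.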

"We showed that Hybrid SAC can be successfully applied to train a car on a high-speed driving task in a commercial video game," the researchers wrote.

In La Forge's paper on practical reinforcement learning in video games, the researchers noted that their approach to Hybrid SAC allows for a wide range of possible interactions between agents and their environments. This could give game-playing AIs as much freedom as human players, who can input continuous and discrete actions at the same time, such as running and jumping. Hybrid SAC could set a new benchmark in reinforcement learning for video games, as well as other possible applications that have yet to be tested.

Researchers at La Forge offered advice to developers using Hybrid SAC. They advise starting with basic frameworks and figuring out which actions are continuous and which are discrete, then keeping the algorithm simple, with a few continuous actions that depend on a discrete action. Finally, they suggest duplicating dependent actions where possible so that each ends up close to independent.

Hybrid SAC was tested in a "commercial game," and VentureBeat speculates that The Crew or The Crew 2 uses La Forge's reinforcement learning research, though the title remains undisclosed. It has not yet been confirmed that Hybrid SAC will be implemented in any upcoming releases, but it could appear in Watch Dogs: Legion, which is expected to release on March 3rd, 2020.


Griffin Gilman: Gaming may very well be half of my personality, so it is only natural that I write about them. The best genre is RPGs, while the best game is Nier Automata. That's not an opinion but a matter of facts.