Reinforcement learning makes for shitty AI teammates in co-op games


This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.

Artificial intelligence has proven that complicated board and video games are no longer the exclusive domain of the human mind. From chess to Go to StarCraft, AI systems that use reinforcement learning algorithms have outperformed human world champions in recent years.

But despite the high individual performance of RL agents, they can become frustrating teammates when paired with human players, according to a study by AI researchers at MIT Lincoln Laboratory. The study, which involved cooperation between humans and AI agents in the card game Hanabi, shows that players prefer the classic and predictable rule-based AI systems over complex RL systems.

The findings, presented in a paper published on arXiv, highlight some of the underexplored challenges of applying reinforcement learning to real-world situations and can have important implications for the future development of AI systems that are meant to cooperate with humans.

Finding the gap in reinforcement learning

Deep reinforcement learning, the algorithm used by state-of-the-art game-playing bots, starts by providing an agent with a set of possible actions in the game, a mechanism to receive feedback from the environment, and a goal to pursue. Then, through numerous episodes of gameplay, the RL agent gradually goes from taking random actions to learning sequences of actions that can help it maximize its goal.

Early research of deep reinforcement learning relied on the agent being pretrained on gameplay data from human players. More recently, researchers have been able to develop RL agents that can learn games from scratch through pure self-play without human input.

In their study, the researchers at MIT Lincoln Laboratory were interested in finding out if a reinforcement learning program that outperforms humans could become a reliable coworker to humans.

“At a very high level, this work was inspired by the question: What technology gaps exist that prevent reinforcement learning (RL) from being applied to real-world problems, not just video games?” Dr. Ross Allen, AI researcher at Lincoln Laboratory and co-author of the paper, told TechTalks. “While many such technology gaps exist (e.g., the real world is characterized by uncertainty/partial-observability, data scarcity, ambiguous/nuanced objectives, disparate timescales of decision making, etc.), we identified the need to collaborate with humans as a key technology gap for applying RL in the real-world.”

Adversarial vs cooperative games