AI is now learning to evolve like earthly lifeforms


This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence.

Hundreds of millions of years of evolution have blessed our planet with a wide variety of lifeforms, each intelligent in its own fashion. Each species has evolved to develop innate skills, learning capacities, and a physical form that ensure its survival in its environment.

But despite taking inspiration from nature and evolution, the field of artificial intelligence has largely focused on creating the elements of intelligence separately and fusing them together after development. While this approach has yielded great results, it has also left AI agents without the flexibility in basic skills that even the simplest lifeforms possess.

In a new paper published in the scientific journal Nature Communications, AI researchers at Stanford University present a new technique that takes a step toward overcoming some of these limits. Called “Deep Evolutionary Reinforcement Learning” (DERL), the technique uses a complex virtual environment and reinforcement learning to create virtual agents that can evolve both their physical structure and their learning capacities. The findings could have important implications for the future of AI and robotics research.

Evolution is hard to simulate


In nature, body and brain evolve together. Across many generations, every animal species has gone through countless cycles of mutation to grow limbs, organs, and a nervous system to support the functions it needs in its environment. Mosquitoes have thermal vision to spot body heat. Bats have wings to fly and an echolocation apparatus to navigate dark places. Sea turtles have flippers to swim and a magnetic field detector system to travel very long distances. Humans have an upright posture that frees their arms and lets them see the far horizon, hands and nimble fingers that can manipulate objects, and a brain that makes them the best social creatures and problem solvers on the planet.

Interestingly, all these species descended from the first lifeform that appeared on Earth several billion years ago. Based on the selection pressures caused by the environment, the descendants of those first living beings evolved in many different directions.

Studying the evolution of life and intelligence is interesting. But replicating it is extremely difficult. An AI system that set out to recreate intelligent life the way evolution did would have to search a very large space of possible morphologies, which is extremely expensive computationally and requires many parallel and sequential trial-and-error cycles.

AI researchers use several shortcuts and predesigned features to overcome some of these challenges. For example, they fix the architecture or physical design of an AI or robotic system and focus on optimizing the learnable parameters. Another shortcut is the use of Lamarckian rather than Darwinian evolution, in which AI agents pass their learned parameters on to their descendants. Yet another approach is to train different AI subsystems separately (vision, locomotion, language, etc.) and then stitch them together in a final AI or robotic system. While these approaches speed up the process and reduce the costs of training and evolving AI agents, they also limit the flexibility and variety of the results that can be achieved.
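
To make the Darwinian/Lamarckian distinction concrete, here is a minimal sketch of the two inheritance schemes. The `Agent` class, its fields, and the `mutate` helper are hypothetical illustrations, not code from any specific system.

```python
import copy
import random

class Agent:
    def __init__(self, morphology, weights=None):
        self.morphology = morphology  # physical design (the genotype)
        self.weights = weights        # parameters learned during the agent's lifetime

def mutate(morphology):
    # Placeholder mutation: slightly perturb one design parameter.
    m = copy.deepcopy(morphology)
    m["limb_size"] *= random.uniform(0.9, 1.1)
    return m

def darwinian_offspring(parent):
    # Only the (mutated) physical design is inherited;
    # the child must learn its behavior from scratch.
    return Agent(mutate(parent.morphology), weights=None)

def lamarckian_offspring(parent):
    # The child also inherits what the parent learned, a shortcut
    # that speeds up training but departs from biological evolution.
    return Agent(mutate(parent.morphology),
                 weights=copy.deepcopy(parent.weights))
```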

Deep Evolutionary Reinforcement Learning

Deep Evolutionary Reinforcement Learning structure
Credit: Ben Dickson / TechTalks

In their new work, the researchers at Stanford aim to bring AI research a step closer to the real evolutionary process while keeping the costs as low as possible. “Our goal is to elucidate some principles governing relations between environmental complexity, evolved morphology, and the learnability of intelligent control,” they write in their paper.

Their framework is called Deep Evolutionary Reinforcement Learning. In DERL, each agent uses deep reinforcement learning to acquire the skills required to maximize its goals during its lifetime. DERL uses Darwinian evolution to search the morphological space for optimal solutions, which means that when a new generation of AI agents is spawned, they inherit only the physical and architectural traits of their parents (along with slight mutations). None of the learned parameters are passed on across generations.
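
A simplified, generational sketch of that outer loop might look like the following. The actual DERL implementation runs asynchronously at much larger scale; `train_with_rl` and `evaluate` are stub stand-ins for the deep reinforcement learning phase, and the `Agent` and `mutate` helpers are reused from the sketch above.

```python
import random

def train_with_rl(morphology):
    return {}  # stub: a deep RL algorithm would learn a control policy here

def evaluate(morphology, weights):
    return random.random()  # stub: average episode reward of the trained agent

def derl_generation(population, num_offspring):
    """One simplified generation of Darwinian morphology search."""
    # Lifetime learning: every agent is trained from scratch,
    # and its post-training performance becomes its fitness.
    for agent in population:
        agent.weights = train_with_rl(agent.morphology)
        agent.fitness = evaluate(agent.morphology, agent.weights)

    # Fitter morphologies are more likely to reproduce.
    ranked = sorted(population, key=lambda a: a.fitness, reverse=True)
    parents = ranked[: max(1, len(ranked) // 2)]

    # Darwinian inheritance: offspring receive a mutated morphology only;
    # none of the learned weights are passed on.
    return [Agent(mutate(random.choice(parents).morphology), weights=None)
            for _ in range(num_offspring)]
```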

“DERL opens the door to performing large-scale in silico experiments to yield scientific insights into how learning and evolution cooperatively create sophisticated relationships between environmental complexity, morphological intelligence, and the learnability of control tasks,” the researchers write.

Simulating evolution

For their framework, the researchers used MuJoCo, a virtual environment that provides highly accurate rigid-body physics simulation. Their design space is called UNIversal aniMAL (UNIMAL), in which the goal is to create morphologies that learn locomotion and object-manipulation tasks in a variety of terrains.
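
For readers who have not used MuJoCo, the snippet below loads a toy rigid-body model and steps the physics forward with the official `mujoco` Python bindings. The XML model is a stand-in for illustration; it is not a UNIMAL agent.

```python
import mujoco

# A toy model: a single free-falling box above a ground plane.
XML = """
<mujoco>
  <worldbody>
    <geom type="plane" size="1 1 0.1"/>
    <body pos="0 0 1">
      <freejoint/>
      <geom type="box" size="0.1 0.1 0.1" mass="1"/>
    </body>
  </worldbody>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)

# Step the physics forward for one simulated second.
while data.time < 1.0:
    mujoco.mj_step(model, data)

# With a free joint, qpos[2] is the box's height above the plane.
print("box height after 1s:", data.qpos[2])
```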

Each agent in the environment is composed of a genotype that defines its limbs and joints. The direct descendant of each agent inherits the parent’s genotype and goes through mutations that can create new limbs, remove existing limbs, or make small modifications to characteristics such as the degrees of freedom or the size of limbs.
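
The paper's actual genotype is a more elaborate encoding, but as a rough illustration, a limb-and-joint genotype and its mutation operators could be sketched as a tree of limbs; all names and parameters here are hypothetical.

```python
import copy
import random

def make_limb(size=0.1, dof=1):
    # Each limb has a size and a joint (with some degrees of freedom)
    # connecting it to its parent limb.
    return {"size": size, "dof": dof, "children": []}

def all_limbs(limb):
    yield limb
    for child in limb["children"]:
        yield from all_limbs(child)

def mutate_genotype(root):
    """Return a mutated copy: grow a limb, remove one, or tweak one."""
    g = copy.deepcopy(root)
    target = random.choice(list(all_limbs(g)))
    op = random.choice(["grow", "remove", "tweak"])
    if op == "grow":
        target["children"].append(make_limb())
    elif op == "remove" and target["children"]:
        target["children"].pop(random.randrange(len(target["children"])))
    else:
        # Tweak size or degrees of freedom (also the fallback
        # when removal isn't possible).
        if random.random() < 0.5:
            target["size"] *= random.uniform(0.8, 1.2)
        else:
            target["dof"] = random.choice([1, 2, 3])
    return g
```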

Each agent is trained with reinforcement learning to maximize rewards in various environments. The most basic task is locomotion, in which the agent is rewarded for the distance it travels during an episode. Agents whose physical structures are better suited to traversing the terrain learn more quickly to use their limbs for moving around.
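
A distance-based locomotion reward of this kind is typically implemented as forward displacement per timestep, so that the rewards summed over an episode equal the total distance traveled. The sketch below is a generic example, not the paper's exact reward function.

```python
def locomotion_reward(prev_x, curr_x):
    # Reward for one timestep: forward distance covered since the last step.
    return curr_x - prev_x

def episode_return(x_positions):
    # Summing per-step rewards telescopes to the net distance traveled:
    # (x1 - x0) + (x2 - x1) + ... = x_final - x_start.
    return sum(curr - prev
               for prev, curr in zip(x_positions, x_positions[1:]))
```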

To test the framework, the researchers evolved agents in three types of terrain: flat (FT), variable (VT), and variable terrain with modifiable objects (MVT). The flat terrain puts the least selection pressure on the agents’ morphology. The variable terrains, on the other hand, force the agents to develop a more versatile physical structure that can climb slopes and move around obstacles. The MVT variant has the added challenge of requiring the agents to manipulate objects to achieve their goals.
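
As a rough illustration, the three experimental settings could be captured in a configuration like the one below; the keys and flags are hypothetical, not the paper's actual setup.

```python
# Hypothetical configuration of the three evaluation environments.
ENVIRONMENTS = {
    "FT":  {"terrain": "flat",     "objects_to_manipulate": False},  # least selection pressure
    "VT":  {"terrain": "variable", "objects_to_manipulate": False},  # slopes and obstacles
    "MVT": {"terrain": "variable", "objects_to_manipulate": True},   # adds object manipulation
}
```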

The benefits of DERL
