TL;DR Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning

Posted on 2017-12-05

This is a Too Long; Didn’t Read summary of the paper Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning, inspired by the blog post on the same paper published by the Berkeley Artificial Intelligence Research (BAIR) Lab last week.

Reading Time: ~ 90 seconds


Current learning-based methods often use deep neural networks, which are powerful but not data-efficient. They learn by failing over and over again in simulation, often millions of times. This sample inefficiency is one of the main bottlenecks to deploying learning-based methods in the real world.

The approach presented in the paper learns to walk and follow a trajectory with significantly less data. This increase in sample efficiency brings learning-based methods out of the simulation world and into the realm of feasibility in the real world.

In model-based learning, the agent exploits a previously learned model to accomplish a task. In model-free learning, the agent simply relies on some trial-and-error experience for action selection.

Model-based approaches can very quickly become competent at a task, whereas model-free approaches require immense amounts of data but can ultimately become experts.

The approach taken here uses a deep neural network as the dynamics model within a model-based algorithm. The network sits inside a model predictive control (MPC) framework, whereby the system calculates a plan, executes the first step of that plan, and then replans. An intermediate planning horizon was used to avoid greedy behavior while limiting the compounding effects of an inaccurate model.
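The plan-execute-replan loop can be sketched with a simple random-shooting planner. Everything here is a toy stand-in: `dynamics_model` is a hypothetical placeholder for the paper's learned neural-network model, and the reward and dynamics are fabricated purely for illustration.

```python
import numpy as np

def dynamics_model(state, action):
    """Stand-in for the learned dynamics model: predicts the next state
    from the current state and action. (Hypothetical toy dynamics --
    the paper trains a deep neural network for this.)"""
    return state + 0.1 * action

def reward(state, action):
    """Toy reward for illustration: drive the state toward the origin."""
    return -np.sum(state ** 2)

def mpc_step(state, horizon=5, n_candidates=100, action_dim=2, rng=None):
    """One MPC step: sample candidate action sequences over an
    intermediate horizon, roll each out through the learned model,
    score them, and return only the FIRST action of the best sequence."""
    rng = rng or np.random.default_rng(0)
    candidates = rng.uniform(-1, 1, size=(n_candidates, horizon, action_dim))
    best_return, best_first_action = -np.inf, None
    for seq in candidates:
        s, total = state.copy(), 0.0
        for a in seq:
            s = dynamics_model(s, a)
            total += reward(s, a)
        if total > best_return:
            best_return, best_first_action = total, seq[0]
    return best_first_action

# Control loop: calculate a plan, implement its first step, repeat.
state = np.array([1.0, -1.0])
for _ in range(20):
    action = mpc_step(state)
    state = dynamics_model(state, action)  # environment step (here: the model itself)
```

Replanning at every step is what lets an inaccurate model still produce reasonable behavior: only the first action of each plan is ever trusted.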

Diagram of algorithm. Source: BAIR blog

The experiments were initially run in simulation, but the paper’s authors also used a small six-legged robot for trajectory following to demonstrate potential for real-world applications. Using this model-based approach, they managed to train the robot with just 17 minutes’ worth of data, and outperformed a commonly used differential-drive method for locomotion.

However, the final performance of this model-based algorithm was lower than that of a very good model-free algorithm, so a hybrid approach was suggested to combine the sample efficiency of a model-based solution with the high asymptotic performance of a model-free one.

After training a deep neural network as part of the model-based approach, it can be used to initialise a model-free learner. The hybrid would retain an orders-of-magnitude gain in sample efficiency over pure model-free approaches, yet reach the same expertise with significantly less learning.
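The hand-off can be sketched as behavior cloning: a policy is fit by supervised learning to state/action pairs gathered from the model-based MPC controller, and that policy then initialises the model-free learner. This is a minimal sketch under assumed toy data; the linear "expert" below is fabricated for illustration, standing in for actions the MPC controller would produce.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend MPC-generated data: visited states and the actions the
# model-based controller chose in them. (Here a simple linear map
# plays the role of the MPC expert, purely for illustration.)
states = rng.normal(size=(500, 4))
expert_actions = states @ np.array([[0.5], [-0.2], [0.1], [0.3]])

# One-step behavior cloning: fit a linear policy by least squares
# to imitate the model-based controller.
W, *_ = np.linalg.lstsq(states, expert_actions, rcond=None)

# The cloned policy would now serve as the initialisation for a
# model-free algorithm, which fine-tunes it with further experience.
cloned_actions = states @ W
imitation_error = np.mean((cloned_actions - expert_actions) ** 2)
```

The cloning step is cheap because it reuses data the model-based phase already produced; the expensive model-free trial-and-error then starts from a competent policy instead of from scratch.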