Daniel J. Mankowitz
Senior Research Scientist at DeepMind
I am a Senior Research Scientist at Google DeepMind. In 2018, I completed my PhD in Hierarchical Reinforcement Learning under the supervision of Professor Shie Mannor at the Technion - Israel Institute of Technology. I am a recipient of the 2016 Google PhD Fellowship.
I currently work on solving the key challenges preventing Reinforcement Learning algorithms from working on real-world applications at scale.
News and Announcements
Two papers accepted to ICLR 2021
Challenges of Real World Reinforcement Learning: Definitions, Benchmarks and Analysis has been accepted to the Special Edition Machine Learning Journal RL for Real Life
Co-organizing: Challenges of Real World RL NeurIPS workshop 2020
RL unplugged: Benchmarks for offline reinforcement learning accepted to NeurIPS 2020
Robust reinforcement learning for continuous control with model misspecification accepted to ICLR 2020
Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning has been accepted to NeurIPS 2018
Soft-Robust Actor-Critic Policy Gradient has been accepted to UAI 2018!
Transfer in deep reinforcement learning using successor features and generalised policy improvement has been accepted to ICML 2018
I recently joined Google DeepMind as a Research Scientist
March 2018: Completed my PhD!
November 2017: Learning Robust Options has been accepted to AAAI 2018!
August 2017: Shallow Updates for Deep Reinforcement Learning has been accepted to NIPS 2017!
July 2017 - October 2017: I completed an internship at Google DeepMind
May 2017: I am co-organizing a workshop entitled 'Lifelong Learning: A Reinforcement Learning Approach' at ICML 2017
November 2016: A Deep Hierarchical Approach to Lifelong Learning in Minecraft has been accepted to AAAI 2017!
October 2016: 3 papers accepted to EWRL 2016
August 2016: Adaptive Skills, Adaptive Partitions (ASAP) paper accepted to NIPS 2016!
June 2016: Successfully co-organized the Abstraction in Reinforcement Learning workshop with over 500 registered participants at ICML 2016.
March 2016: I am co-organizing an ICML Workshop entitled 'Abstraction in Reinforcement Learning' which has just been accepted to ICML 2016
Daniel J. Mankowitz and Gabriel Dulac-Arnold
Google AI Blog Post August 2020
Reinforcement Learning (RL) has proven to be effective in solving numerous complex problems ranging from Go, StarCraft and Minecraft to robot locomotion and chip design. In each of these cases, a simulator is available or the real environment is quick and inexpensive to access. Yet, there are still considerable challenges to deploying RL to real-world products and systems. For example, in physical control systems, such as robotics and autonomous driving, RL controllers are trained to solve tasks like grasping objects or driving on a highway. These controllers are susceptible to effects such as sensor noise, system delays, or normal wear-and-tear that can reduce the quality of input to the controller, leading to incorrect decision-making and potentially catastrophic failures....
Gabriel Dulac-Arnold, Daniel J. Mankowitz, Todd Hester
ICML 2019 RL4RealLife Workshop (Best paper award)
Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, much of the research advances in RL are often hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. We present a set of nine unique challenges that must be addressed to productionize RL for real-world problems. For each of these challenges, we specify the exact meaning of the challenge, present some approaches from the literature, and specify some metrics for evaluating that challenge. An approach that addresses all nine challenges would be applicable to a large number of real-world problems. We also present an example domain that has been modified to present these challenges as a testbed for practical RL research.
Gabriel Dulac-Arnold*, Nir Levine*, Daniel J. Mankowitz*, Jerry Li, Cosmin Paduraru, Sven Gowal, Todd Hester
Under Review (2020)
Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is beginning to show some successes in real-world scenarios. However, much of the research advances in RL are hard to leverage in real-world systems due to a series of assumptions that are rarely satisfied in practice. In this work, we identify and formalize a series of independent challenges that embody the difficulties that must be addressed for RL to be commonly deployed in real-world systems. For each challenge, we define it formally in the context of a Markov Decision Process, analyze the effects of the challenge on state-of-the-art learning algorithms, and present some existing attempts at tackling it. We believe that an approach that addresses our set of proposed challenges would be readily deployable in a large number of real-world problems. Our proposed challenges are implemented in a suite of continuous control environments called realworldrl-suite, which we propose as an open-source benchmark.
Chen Tessler, Daniel J. Mankowitz, Shie Mannor
Solving tasks in Reinforcement Learning is no easy feat. As the goal of the agent is to maximize the accumulated reward, it often learns to exploit loopholes and misspecifications in the reward signal, resulting in unwanted behavior. While constraints may solve this issue, there is no closed-form solution for general constraints. In this work we present a novel multi-timescale approach for constrained policy optimization, called 'Reward Constrained Policy Optimization' (RCPO), which uses an alternative penalty signal to guide the policy towards a constraint-satisfying one. We prove the convergence of our approach and provide empirical evidence of its ability to train constraint-satisfying policies.
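The multi-timescale idea can be illustrated with a toy sketch (a simplification of my own, not the paper's implementation): a penalty coefficient lambda is updated on a slow timescale by ascent on the constraint violation, while the policy trains on the penalized reward r - lambda * c.

```python
def penalized_reward(reward, cost, lam):
    """Penalty signal guiding the policy: r' = r - lambda * c."""
    return reward - lam * cost

def update_lambda(lam, avg_cost, threshold, lr=0.01):
    """Slow-timescale ascent step on the constraint violation.

    lambda grows while the average cost exceeds the allowed threshold,
    and is projected back onto [0, inf).
    """
    return max(0.0, lam + lr * (avg_cost - threshold))

# Toy rollout: the average cost (0.8) exceeds the threshold (0.5),
# so lambda steadily increases, penalizing the reward more strongly.
lam = 0.0
for _ in range(100):
    lam = update_lambda(lam, avg_cost=0.8, threshold=0.5)
print(round(lam, 2))  # lambda has grown to 0.3 after 100 steps
```

In the full algorithm the policy update, the critic update, and the lambda update each run on their own timescale; this sketch shows only the lambda dynamics.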
Daniel J. Mankowitz*, Nir Levine*, Rae Jeong, Yuanyuan Shi, Jackie Kay, Abbas Abdolmaleki, Jost Tobias Springenberg, Timothy Mann, Todd Hester, Martin Riedmiller
In Proc. ICLR 2020
We provide a framework for incorporating robustness -- to perturbations in the transition dynamics, which we refer to as model misspecification -- into continuous control Reinforcement Learning (RL) algorithms. We specifically focus on incorporating robustness into a state-of-the-art continuous control RL algorithm called Maximum a-posteriori Policy Optimization (MPO). We achieve this by learning a policy that optimizes for a worst-case expected return objective and derive a corresponding robust entropy-regularized Bellman contraction operator. In addition, we introduce a less conservative, soft-robust, entropy-regularized objective with a corresponding Bellman operator. We show that both robust and soft-robust policies outperform their non-robust counterparts in nine Mujoco domains with environment perturbations. In addition, we show improved robust performance on a high-dimensional, simulated, dexterous robotic hand. Finally, we present multiple investigative experiments that provide a deeper insight into the robustness framework. This includes an adaptation to another continuous control RL algorithm as well as learning the uncertainty set from offline data. Performance videos can be found online at this https URL.
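The worst-case objective at the heart of this framework can be sketched in a tabular toy setting (my own illustrative assumptions, not the paper's MPO implementation): a robust Bellman backup takes, per action, the minimum expected return across a finite uncertainty set of transition models, while the soft-robust variant averages over the set instead.

```python
def expected_return(P, V, s, a, reward, gamma):
    """One-step expected return under a single transition model P."""
    return reward[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))

def robust_backup(V, models, reward, gamma=0.9):
    """Robust Bellman backup: max over actions of the WORST-case
    expected return across the uncertainty set of transition models."""
    n_states, n_actions = len(V), len(reward[0])
    return [max(min(expected_return(P, V, s, a, reward, gamma) for P in models)
                for a in range(n_actions))
            for s in range(n_states)]

def soft_robust_backup(V, models, reward, gamma=0.9):
    """Soft-robust (less conservative) variant: average over the
    uncertainty set instead of taking the minimum."""
    n_states, n_actions = len(V), len(reward[0])
    return [max(sum(expected_return(P, V, s, a, reward, gamma) for P in models) / len(models)
                for a in range(n_actions))
            for s in range(n_states)]

# Two-state, two-action toy MDP with a two-model uncertainty set.
reward = [[1.0, 0.0], [0.0, 1.0]]
P_nominal   = [[[0.9, 0.1], [0.5, 0.5]], [[0.5, 0.5], [0.1, 0.9]]]
P_perturbed = [[[0.5, 0.5], [0.9, 0.1]], [[0.9, 0.1], [0.5, 0.5]]]
V = [0.0, 1.0]
robust = robust_backup(V, [P_nominal, P_perturbed], reward)
soft = soft_robust_backup(V, [P_nominal, P_perturbed], reward)
```

By construction the soft-robust values are never below the robust ones, reflecting the conservatism trade-off the abstract describes; the paper additionally entropy-regularizes these operators and proves contraction.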
Daniel J. Mankowitz, Timothy Mann, Shie Mannor
In Proc. NeurIPS 2016, Barcelona, Spain
We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them. We believe that both (1) and (2) are necessary for a truly general skill learning framework, which is a key building block needed to scale up to lifelong learning agents. The ASAP framework can also solve related new tasks simply by adapting where it applies its existing learned skills. We prove that ASAP converges to a local optimum under natural conditions. Finally, our experimental results, which include a RoboCup domain, demonstrate the ability of ASAP to learn where to reuse skills as well as solve multiple tasks with considerably less experience than solving each task from scratch.
Daniel J. Mankowitz, Augustin Žídek, André Barreto, Dan Horgan, Matteo Hessel, John Quan, Junhyuk Oh, Hado van Hasselt, David Silver, Tom Schaul
In Proc. RLDM 2019
Some real-world domains are best characterized as a single task, but for others this perspective is limiting. Instead, some tasks continually grow in complexity, in tandem with the agent's competence. In continual learning, also referred to as lifelong learning, there are no explicit task boundaries or curricula. As learning agents have become more powerful, continual learning remains one of the frontiers that have resisted quick progress. To test continual learning capabilities we consider a challenging 3D domain with an implicit sequence of tasks and sparse rewards. We propose a novel agent architecture called Unicorn, which demonstrates strong continual learning and outperforms several baseline agents on the proposed domain. The agent achieves this by jointly representing and learning multiple policies efficiently, using a parallel off-policy learning setup.
Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J. Mankowitz, Shie Mannor
In Proc. AAAI 2017, San Francisco, USA
We propose a lifelong learning system that has the ability to reuse and transfer knowledge from one task to another while efficiently retaining the previously learned knowledge base. Knowledge is transferred by learning reusable skills to solve tasks in Minecraft, a popular video game which is an unsolved and high-dimensional lifelong learning problem. These reusable skills, which we refer to as Deep Skill Networks, are then incorporated into our novel Hierarchical Deep Reinforcement Learning Network (H-DRLN) architecture using two techniques: (1) a deep skill array and (2) skill distillation, our novel variation of policy distillation (Rusu et al., 2015) for learning skills. Skill distillation enables the H-DRLN to efficiently retain knowledge and therefore scale in lifelong learning, by accumulating knowledge and encapsulating multiple reusable skills into a single distilled network. The H-DRLN exhibits superior performance and lower learning sample complexity compared to the regular Deep Q-Network (Mnih et al., 2015) in sub-domains of Minecraft.