Daniel J. Mankowitz

About Me
I am a Staff Research Scientist at DeepMind. In 2018, I completed my PhD in Hierarchical Reinforcement Learning under the supervision of Professor Shie Mannor at the Technion, Israel Institute of Technology. I am a recipient of the Google PhD Fellowship.
I currently work on the key challenges that prevent Reinforcement Learning (RL) algorithms from working on real-world applications at scale. The real-world applications I have worked on include control of physical systems such as Heating, Ventilation and Air-Conditioning (HVAC), video compression, code generation and recommender systems.
I have published a Google AI blog post describing the main challenges of real-world RL, together with a suite we open-sourced to accelerate research toward solving them.
Featured Work


Google AI



Video streaming usage has risen significantly as entertainment, education and business increasingly rely on online video. Optimizing video compression has the potential to improve both access to, and the quality of, content for users...





Reinforcement Learning (RL) has proven to be effective in solving numerous complex problems ranging from Go, StarCraft and Minecraft to robot locomotion and chip design. In each of these cases, a simulator is available or the real environment is quick and inexpensive to access. Yet, there are still considerable challenges to deploying RL to real-world products and systems....

Selected Publications
Paper | ICML 2019 RL4RealLife Workshop (2019) | Best Paper Award
We present a set of nine unique challenges that must be addressed to productionize RL for real-world problems...

Paper | Springer Special Issue on RL for Real Life (2021)
Our proposed challenges are implemented in a suite of continuous control environments called realworldrl-suite, which we propose as an open-source benchmark...

Paper | ICLR 2019
In this work we present a novel multi-timescale approach for constrained policy optimization, called 'Reward Constrained Policy Optimization' (RCPO), which uses an alternative penalty signal to guide the policy towards a constraint-satisfying one. We prove the convergence of our approach and provide empirical evidence of its ability to train constraint-satisfying policies.
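As a rough illustration of the penalty idea, here is a minimal Python sketch assuming a Lagrangian-style formulation: the policy is trained on the reward minus a multiplier λ times a constraint cost, and λ is adjusted on a slower timescale so that it grows while the average cost exceeds a threshold. The function names, the threshold `alpha` and the learning rates are hypothetical choices for illustration; this is not the paper's code.

```python
def penalized_reward(reward: float, cost: float, lam: float) -> float:
    """Penalty signal the policy would be trained on (assumed form): r - lambda * c."""
    return reward - lam * cost


def update_multiplier(lam: float, avg_cost: float, alpha: float, lr: float) -> float:
    """One projected step on the multiplier (hypothetical helper).

    Increases lambda when the average constraint cost exceeds the threshold
    alpha, decreases it otherwise, and clips it so it never goes negative.
    In a multi-timescale scheme, lr would be smaller than the policy's
    learning rate so that lambda changes slowly relative to the policy.
    """
    return max(0.0, lam + lr * (avg_cost - alpha))


if __name__ == "__main__":
    # Toy usage: a constant average cost above the threshold keeps pushing lambda up,
    # which makes the penalized reward for a high-cost transition smaller.
    lam = 0.0
    for step in range(5):
        lam = update_multiplier(lam, avg_cost=0.5, alpha=0.2, lr=0.1)
        print(step, round(lam, 3), round(penalized_reward(1.0, 0.5, lam), 3))
```

The projection `max(0.0, ...)` keeps the multiplier valid for an inequality constraint. Once the constraint is satisfied, the penalty decays toward zero and the policy is again driven mainly by the original reward.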

Paper | AAAI 2017
We propose a lifelong learning system that can reuse and transfer knowledge from one task to another while efficiently retaining the previously learned knowledge base ... The H-DRLN exhibits superior performance and lower learning sample complexity compared to the regular Deep Q-Network (Mnih et al. 2015) in sub-domains of Minecraft.

Paper | ICLR 2020
We provide a framework for incorporating robustness to perturbations in the transition dynamics, which we refer to as model misspecification, into continuous control Reinforcement Learning (RL) algorithms...

Paper | NeurIPS 2016
We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them.

Paper | RLDM 2019
We propose a novel agent architecture called Unicorn, which demonstrates strong continual learning and outperforms several baseline agents on the proposed domain. The agent achieves this by jointly representing and learning multiple policies efficiently, using a parallel off-policy learning setup.
