Daniel J. Mankowitz


Staff Research Scientist @DeepMind
Email: daniel (dot) mankowitz (at) gmail (dot) com


About Me

I am a Staff Research Scientist at DeepMind. In 2018, I completed my PhD in Hierarchical Reinforcement Learning under the supervision of Professor Shie Mannor at the Technion Israel Institute of Technology. I am a recipient of the Google PhD Fellowship.

I currently work on the key challenges that prevent Reinforcement Learning algorithms from working in real-world applications at scale. Applications I have worked on include control of physical systems, video compression, and recommender systems.


I have released a Google AI blog post detailing the various challenges of real-world RL, as well as a suite we have open-sourced to accelerate research toward solving these challenges.




Work History



Education

Featured Work

Video streaming usage has seen a significant rise as entertainment, education, and business increasingly rely on online video. Optimizing video compression has the potential to increase access and quality of content to users....


Reinforcement Learning (RL) has proven to be effective in solving numerous complex problems ranging from Go, StarCraft and Minecraft to robot locomotion and chip design. In each of these cases, a simulator is available or the real environment is quick and inexpensive to access. Yet, there are still considerable challenges to deploying RL to real-world products and systems....


Selected Publications

Paper     |     ICML 2019 RL4RealLife Workshop, Best Paper Award

We present a set of nine unique challenges that must be addressed to productionize RL to real-world problems....


Paper     |     Springer Special Issue on RL for Real Life (2021)

Our proposed challenges are implemented in a suite of continuous control environments called realworldrl-suite, which we propose as an open-source benchmark...


Paper     |     ICLR 2019

In this work we present a novel multi-timescale approach for constrained policy optimization, called 'Reward Constrained Policy Optimization' (RCPO), which uses an alternative penalty signal to guide the policy towards a constraint-satisfying one. We prove the convergence of our approach and provide empirical evidence of its ability to train constraint-satisfying policies.


Paper     |     AAAI 2017

We propose a lifelong learning system that has the ability to reuse and transfer knowledge from one task to another while efficiently retaining the previously learned knowledge-base ... The H-DRLN exhibits superior performance and lower learning sample complexity compared to the regular Deep Q Network (Mnih et al., 2015) in sub-domains of Minecraft.


Paper     |     ICLR 2020

We provide a framework for incorporating robustness (to perturbations in the transition dynamics, which we refer to as model misspecification) into continuous control Reinforcement Learning (RL) algorithms....


Paper     |     NeurIPS 2016

We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them. 


Paper     |     RLDM 2019

We propose a novel agent architecture called Unicorn, which demonstrates strong continual learning and outperforms several baseline agents on the proposed domain. The agent achieves this by jointly representing and learning multiple policies efficiently, using a parallel off-policy learning setup.
