Staff Research Scientist @DeepMind
Email: daniel (dot) mankowitz (at) gmail (dot) com
I am a Staff Research Scientist at DeepMind. In 2018, I completed my PhD in Hierarchical Reinforcement Learning under the supervision of Professor Shie Mannor at the Technion Israel Institute of Technology. I am a recipient of the Google PhD Fellowship.
I currently work on solving the key challenges that prevent Reinforcement Learning algorithms from working on real-world applications at scale. Applications I have worked on include the control of physical systems such as Heating, Ventilation and Air-Conditioning (HVAC), video compression, code generation and recommender systems.
I have released a Google AI blog post detailing the various challenges of real-world RL, as well as a suite we have open-sourced to accelerate research toward solving these challenges.
Video streaming usage has seen a significant rise as entertainment, education, and business increasingly rely on online video. Optimizing video compression has the potential to improve both access to content and its quality for users....
Competition-Level Code Generation with AlphaCode
Published in Science (2022)
Featured in the top 10 breakthroughs of the year by Science (2022)
Programming is a powerful and ubiquitous problem-solving tool. Systems that can assist programmers or even generate programs themselves could make programming more productive and accessible. Recent transformer-based neural network models show impressive code generation abilities yet still perform poorly on more complex tasks requiring problem-solving skills, such as competitive programming problems. Here, we introduce AlphaCode, a system for code generation that achieved an average ranking in the top 54.3% in simulated evaluations on recent programming competitions on the Codeforces platform. AlphaCode solves problems by generating millions of diverse programs using specially trained transformer-based networks and then filtering and clustering those programs to a maximum of just 10 submissions. This result marks the first time an artificial intelligence system has performed competitively in programming competitions.
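The filter-then-cluster step described above can be illustrated with a toy sketch. This is not AlphaCode's implementation: the names `select_submissions`, `example_tests` and `probe_inputs` are hypothetical, candidate programs are stood in for by plain Python callables, and the specific criteria (keep programs that pass the example tests, group survivors by their outputs on extra probe inputs, take one program from each of the largest groups) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical stand-in: a "program" is a function from one int to one int.
Program = Callable[[int], int]


def _safe_run(program: Program, x: int) -> Optional[int]:
    """Run a candidate, treating any exception as a failed run (None)."""
    try:
        return program(x)
    except Exception:
        return None


def select_submissions(
    candidates: Sequence[Program],
    example_tests: Sequence[Tuple[int, int]],
    probe_inputs: Sequence[int],
    max_submissions: int = 10,
) -> List[Program]:
    # Filtering: keep only candidates that pass every example test.
    survivors = [
        p for p in candidates
        if all(_safe_run(p, x) == y for x, y in example_tests)
    ]

    # Clustering: group survivors by their behavior on the probe inputs.
    clusters = defaultdict(list)
    for p in survivors:
        signature = tuple(_safe_run(p, x) for x in probe_inputs)
        clusters[signature].append(p)

    # Take one representative from each of the largest clusters.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return [cluster[0] for cluster in ranked[:max_submissions]]
```

Grouping by observed behavior means that many syntactically different but functionally identical candidates consume only one of the available submissions.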
This paper is a technical overview of DeepMind and Google's recent work on reinforcement learning for controlling commercial cooling systems... resulting in energy savings of approximately 9% and 13% respectively at the two live experiment sites.
Reinforcement Learning (RL) has proven to be effective in solving numerous complex problems ranging from Go, StarCraft and Minecraft to robot locomotion and chip design. In each of these cases, a simulator is available or the real environment is quick and inexpensive to access. Yet, there are still considerable challenges to deploying RL to real-world products and systems....
Paper | ICML 2019 RL4RealLife Workshop (2019) Best paper award
We present a set of nine unique challenges that must be addressed to productionize RL for real-world problems....
Paper | Springer Special Issue on RL for Real Life (2021)
Our proposed challenges are implemented in a suite of continuous control environments called realworldrl-suite, which we propose as an open-source benchmark...
Paper | ICLR (2019)
In this work we present a novel multi-timescale approach for constrained policy optimization, called "Reward Constrained Policy Optimization" (RCPO), which uses an alternative penalty signal to guide the policy towards a constraint-satisfying one. We prove the convergence of our approach and provide empirical evidence of its ability to train constraint-satisfying policies.
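As a rough illustration of the idea of a penalty signal combined with updates on different timescales, here is a scalar toy sketch. It is not the paper's algorithm or code: the Lagrangian-style penalty, the function names, the quadratic toy reward, the cost, the threshold and both step sizes are all assumptions chosen so the example is easy to follow. The "policy" is a single number `a` updated quickly, while the penalty weight `lam` is updated slowly.

```python
def penalized_reward(reward: float, cost: float, lam: float) -> float:
    """Assumed penalty form: subtract the weighted constraint cost."""
    return reward - lam * cost


def update_multiplier(lam: float, cost: float, threshold: float, lr: float) -> float:
    """Slow timescale: raise lam while the constraint is violated, keep it >= 0."""
    return max(0.0, lam + lr * (cost - threshold))


def run_toy(steps: int = 5000, threshold: float = 0.3,
            policy_lr: float = 0.1, multiplier_lr: float = 0.01):
    # Toy problem (assumed): reward(a) = a - 0.5 * a**2, cost(a) = a,
    # subject to cost(a) <= threshold, with a restricted to [0, 1].
    a, lam = 0.0, 0.0
    for _ in range(steps):
        # Fast timescale: gradient ascent on the penalized reward,
        # d/da [a - 0.5 * a**2 - lam * a] = 1 - a - lam.
        grad = 1.0 - a - lam
        a = min(1.0, max(0.0, a + policy_lr * grad))
        # Slow timescale: adjust the penalty weight from the observed cost.
        lam = update_multiplier(lam, cost=a, threshold=threshold, lr=multiplier_lr)
    return a, lam
```

In this toy, `run_toy()` settles with `a` near the threshold of 0.3, and `lam` near 0.7, the weight at which the penalized gradient vanishes on the constraint boundary.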
Paper | AAAI 2017
We propose a lifelong learning system that has the ability to reuse and transfer knowledge from one task to another while efficiently retaining the previously learned knowledge-base ... The H-DRLN exhibits superior performance and lower learning sample complexity compared to the regular Deep Q Network (Mnih et al. 2015) in sub-domains of Minecraft.
Paper | ICLR 2020
We provide a framework for incorporating robustness -- to perturbations in the transition dynamics which we refer to as model misspecification -- into continuous control Reinforcement Learning (RL) algorithms....
Paper | NeurIPS 2016
We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them.
Paper | RLDM 2019
We propose a novel agent architecture called Unicorn, which demonstrates strong continual learning and outperforms several baseline agents on the proposed domain. The agent achieves this by jointly representing and learning multiple policies efficiently, using a parallel off-policy learning setup.