Daniel J. Mankowitz
PhD Reinforcement Learning/Machine Learning
PhD Fellowship Recipient 2016
News and Announcements:
Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning has been accepted to NIPS 2018
Soft-Robust Actor-Critic Policy Gradient has been accepted to UAI 2018!
Transfer in deep reinforcement learning using successor features and generalised policy improvement has been accepted to ICML 2018
I recently joined Google DeepMind as a Research Scientist
March 2018: Completed my PhD!
November 2017: Learning Robust Options has been accepted to AAAI 2018!
August 2017: Shallow Updates for Deep Reinforcement Learning has been accepted to NIPS 2017!
July 2017 - October 2017: I completed an internship at Google DeepMind
May 2017: I co-organized a workshop entitled 'Lifelong Learning: A Reinforcement Learning Approach' at ICML 2017
November 2016: A Deep Hierarchical Approach to Lifelong Learning in Minecraft has been accepted to AAAI 2017!
October 2016: 3 papers accepted to EWRL 2016
August 2016: Adaptive Skills, Adaptive Partitions (ASAP) paper accepted to NIPS 2016!
June 2016: Successfully co-organized the Abstraction in Reinforcement Learning workshop with over 500 registered participants at ICML 2016.
March 2016: I am co-organizing an ICML Workshop entitled 'Abstraction in Reinforcement Learning' which has just been accepted to ICML 2016
Daniel J. Mankowitz, Timothy Mann, Shie Mannor
In Proc. NIPS 2016, Barcelona, Spain
We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them. We believe that both (1) and (2) are necessary for a truly general skill learning framework, which is a key building block needed to scale up to lifelong learning agents. The ASAP framework can also solve related new tasks simply by adapting where it applies its existing learned skills. We prove that ASAP converges to a local optimum under natural conditions. Finally, our experimental results, which include a RoboCup domain, demonstrate the ability of ASAP to learn where to reuse skills as well as solve multiple tasks with considerably less experience than solving each task from scratch.
Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J. Mankowitz, Shie Mannor
In Proc. AAAI 2017, San Francisco, USA
We propose a lifelong learning system that has the ability to reuse and transfer knowledge from one task to another while efficiently retaining the previously learned knowledge-base. Knowledge is transferred by learning reusable skills to solve tasks in Minecraft, a popular video game which is an unsolved and high-dimensional lifelong learning problem. These reusable skills, which we refer to as Deep Skill Networks, are then incorporated into our novel Hierarchical Deep Reinforcement Learning Network (H-DRLN) architecture using two techniques: (1) a deep skill array and (2) skill distillation, our novel variation of policy distillation (Rusu et al., 2015) for learning skills. Skill distillation enables the H-DRLN to efficiently retain knowledge and therefore scale in lifelong learning, by accumulating knowledge and encapsulating multiple reusable skills into a single distilled network. The H-DRLN exhibits superior performance and lower learning sample complexity compared to the regular Deep Q-Network (Mnih et al., 2015) in sub-domains of Minecraft.
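The distillation step described above can be illustrated with a minimal sketch. This is not the authors' code: it shows only the generic distillation idea of training a single student to match the softened outputs of teacher skill networks, and all names, shapes, and the temperature value are illustrative assumptions.

```python
# Hypothetical sketch of skill distillation: a single "distilled" network is
# trained to match the softened outputs of teacher skill networks, so that
# several skills can be retained in one set of weights. Names are illustrative.
import numpy as np

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_targets(teacher_logits, temperature=2.0):
    """Soft targets from a teacher skill network (softened with a temperature)."""
    return softmax(teacher_logits, temperature)

def cross_entropy(student_probs, targets):
    """Distillation loss: cross-entropy between soft targets and student output."""
    return -np.mean(np.sum(targets * np.log(student_probs + 1e-12), axis=-1))

# Toy check: a student matching the teacher achieves a lower loss than one
# whose action preferences are permuted.
teacher = np.array([[2.0, 0.5, -1.0]])
targets = distillation_targets(teacher)
loss_match = cross_entropy(distillation_targets(teacher), targets)
loss_off = cross_entropy(softmax(np.array([[-1.0, 2.0, 0.5]])), targets)
print(loss_match < loss_off)  # True
```

In the paper this loss would be minimized over the distilled network's weights for each skill in turn, accumulating all skills into one network.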
Daniel J. Mankowitz, Timothy Mann, Shie Mannor
In Proc. ICML 2014, Beijing, China
High-level skills relieve planning algorithms from low-level details. But when the skills are poorly designed for the domain, the resulting plan may be severely suboptimal. Sutton et al. (1999) made an important step towards resolving this problem by introducing a rule that automatically improves a set of skills called options. This rule terminates an option early whenever switching to another option gives a higher value than continuing with the current option. However, they only analyzed the case where the improvement rule is applied once. We show conditions where this rule converges to the optimal set of options. A new Bellman-like operator that simultaneously improves the set of options is at the core of our analysis. One problem with the update rule is that it tends to favor lower-level skills. Therefore, we introduce a regularization term that favors longer duration skills. Experimental results demonstrate that this approach can derive a good set of high-level skills even when the original set of skills cannot solve the problem.
Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor
In Proc. NIPS 2017, Long Beach, USA
Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN) have achieved state-of-the-art results in a variety of challenging, high-dimensional domains. This success is mainly attributed to the power of deep neural networks to learn rich domain representations for approximating the value function or policy. Batch reinforcement learning methods with linear representations, on the other hand, are more stable and require less hyperparameter tuning. Yet, substantial feature engineering is necessary to achieve good results. In this work we propose a hybrid approach -- the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method. We do this by periodically re-training the last hidden layer of a DRL network with a batch least squares update. Key to our approach is a Bayesian regularization term for the least squares update, which prevents over-fitting to the more recent data. We tested LS-DQN on five Atari games and demonstrated significant improvement over vanilla DQN and Double-DQN. We also investigated the reasons for the superior performance of our method. Interestingly, we found that the performance improvement can be attributed to the large batch size used by the LS method when optimizing the last layer.
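The core computation described above, re-fitting the last linear layer with a regularized batch least squares solve, can be sketched as follows. This is a minimal illustration of the generic ridge-style update, not the authors' code; the function name, the form of the regularizer (pulling toward the current network weights), and all data are assumptions for the example.

```python
# Illustrative sketch of the LS-DQN idea: given last-hidden-layer features Phi
# and regression targets y, solve a regularized least squares problem whose
# regularizer pulls the solution toward the current network weights w_current:
#   argmin_w ||Phi w - y||^2 + lam * ||w - w_current||^2
import numpy as np

def ls_update_last_layer(features, targets, w_current, lam=1.0):
    """Closed-form regularized least squares for the last linear layer."""
    d = features.shape[1]
    A = features.T @ features + lam * np.eye(d)
    b = features.T @ targets + lam * w_current
    return np.linalg.solve(A, b)

# Toy usage: with a tiny lam, the solve recovers the ordinary LS solution;
# with a large lam, the result stays close to w_current.
rng = np.random.default_rng(0)
Phi = rng.normal(size=(100, 5))
w_true = np.arange(5.0)
y = Phi @ w_true
w_new = ls_update_last_layer(Phi, y, w_current=np.zeros(5), lam=1e-8)
print(np.allclose(w_new, w_true, atol=1e-4))  # True
```

In the paper this update is applied periodically during DRL training, over the full replay batch, which is where the large-batch benefit noted in the abstract comes from.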
Daniel J. Mankowitz, Subramanian Ramamoorthy
In Proc. RoboCup Symposium, RoboCup, 2013, Eindhoven, Netherlands
We address the problem of devising vision-based feature extraction for the purpose of localisation on resource constrained robots that nonetheless require reasonably agile visual processing. We present modifications to a state-of-the-art Feature Extraction Algorithm (FEA) called Binary Robust Invariant Scalable Keypoints (BRISK). A key aspect of our contribution is the combined use of BRISK0 and U-BRISK as the FEA detector-descriptor pair for the purpose of localisation. We present a novel scoring function to find optimal parameters for this FEA. Also, we present two novel geometric matching constraints that serve to remove invalid interest point matches, which is key to keeping computations tractable. This work is evaluated using images captured on the Nao humanoid robot. In experiments, we show that the proposed procedure outperforms a previously implemented state-of-the-art vision-based FEA called 1D SURF (developed by the rUNSWift RoboCup SPL team) on the basis of accuracy and generalisation performance. Our experiments include data from indoor and outdoor environments, including a comparison on datasets based on Google Street View.
Daniel J. Mankowitz, Andrew Paverd
This paper presents a novel approach to cellular network coverage analysis and demonstrates the capabilities of a prototype system. Location-specific network measurements are obtained from consumer mobile devices within the network. Crowdsourcing is used to generate a sufficiently large dataset of measurements. By visualising these measurements in a location-based context, this system can be used to produce high accuracy network coverage maps, improve the identification of cell boundaries, observe detailed cell level measurements and analyse the dynamic characteristics of a cellular network.