Daniel J. Mankowitz

PhD in Reinforcement Learning and Machine Learning

I am a Research Scientist at Google DeepMind. In 2018, I completed my PhD in Reinforcement Learning under the supervision of Professor Shie Mannor at the Technion - Israel Institute of Technology.

PhD Fellowship Recipient 2016

News and Announcements:

  • March 2018: Completed my PhD!

  • November 2017: Learning Robust Options has been accepted to AAAI 2018!

  • July 2017 - October 2017: I completed an internship at Google DeepMind

  • October 2016: 3 papers accepted to EWRL 2016


Selected Publications

Adaptive Skills, Adaptive Partitions (ASAP)
Daniel J. Mankowitz, Timothy Mann, Shie Mannor

In Proc. NIPS 2016, Barcelona, Spain

We introduce the Adaptive Skills, Adaptive Partitions (ASAP) framework that (1) learns skills (i.e., temporally extended actions or options) as well as (2) where to apply them. We believe that both (1) and (2) are necessary for a truly general skill learning framework, which is a key building block needed to scale up to lifelong learning agents. The ASAP framework can also solve related new tasks simply by adapting where it applies its existing learned skills. We prove that ASAP converges to a local optimum under natural conditions. Finally, our experimental results, which include a RoboCup domain, demonstrate the ability of ASAP to learn where to reuse skills as well as to solve multiple tasks with considerably less experience than solving each task from scratch.
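
A minimal sketch of the core mechanism, not the paper's implementation: a soft hyperplane partition decides which skill fires in each state, and each skill is a softmax policy over primitive actions, so the partition and skill parameters form one differentiable parameter block that a policy-gradient method could update jointly. The feature map and all sizes below are illustrative placeholders.

    import numpy as np

    def features(state):
        # Placeholder feature map; each domain defines its own.
        return np.asarray(state, dtype=float)

    def softmax(logits):
        p = np.exp(logits - logits.max())
        return p / p.sum()

    class ASAPPolicy:
        # A soft hyperplane partition selects a skill; each skill is a
        # softmax policy over primitive actions. Both parameter blocks
        # would be updated jointly by policy gradient.
        def __init__(self, n_features, n_skills, n_actions, seed=0):
            rng = np.random.default_rng(seed)
            self.partition = rng.normal(size=(n_skills, n_features))
            self.skills = rng.normal(size=(n_skills, n_actions, n_features))

        def act(self, state, rng):
            phi = features(state)
            skill = rng.choice(len(self.partition),
                               p=softmax(self.partition @ phi))
            action = rng.choice(self.skills.shape[1],
                                p=softmax(self.skills[skill] @ phi))
            return skill, action

    policy = ASAPPolicy(n_features=4, n_skills=2, n_actions=3)
    print(policy.act(np.ones(4), np.random.default_rng(1)))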

A Deep Hierarchical Approach to Lifelong Learning in Minecraft
Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J. Mankowitz, Shie Mannor

In Proc. AAAI 2017, San Francisco, USA

We propose a lifelong learning system that has the ability to reuse and transfer knowledge from one task to another while efficiently retaining the previously learned knowledge base. Knowledge is transferred by learning reusable skills to solve tasks in Minecraft, a popular video game that is an unsolved, high-dimensional lifelong learning problem. These reusable skills, which we refer to as Deep Skill Networks, are then incorporated into our novel Hierarchical Deep Reinforcement Learning Network (H-DRLN) architecture using two techniques: (1) a deep skill array and (2) skill distillation, our novel variation of policy distillation (Rusu et al., 2015) for learning skills. Skill distillation enables the H-DRLN to efficiently retain knowledge and therefore scale in lifelong learning, by accumulating knowledge and encapsulating multiple reusable skills into a single distilled network. The H-DRLN exhibits superior performance and lower learning sample complexity compared to the regular Deep Q-Network (Mnih et al., 2015) in sub-domains of Minecraft.
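
As a rough illustration of the skill distillation step, here is a minimal PyTorch sketch under stated assumptions: each trained skill network ("teacher") provides a temperature-softened action distribution, and a multi-headed student is trained to match it with a KL objective, in the spirit of policy distillation. The network sizes, temperature, and stand-in teachers below are placeholders, not the paper's setup.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MultiHeadStudent(nn.Module):
        # One shared trunk with an output head per distilled skill.
        def __init__(self, n_inputs, n_actions, n_skills, hidden=64):
            super().__init__()
            self.trunk = nn.Sequential(nn.Linear(n_inputs, hidden), nn.ReLU())
            self.heads = nn.ModuleList(
                nn.Linear(hidden, n_actions) for _ in range(n_skills))

        def forward(self, x):
            h = self.trunk(x)
            return [head(h) for head in self.heads]

    def distill_step(student, teachers, states, optimizer, T=1.0):
        # One update: each head matches its teacher's softened policy via KL.
        optimizer.zero_grad()
        head_logits = student(states)
        loss = torch.zeros(())
        for logits, teacher in zip(head_logits, teachers):
            with torch.no_grad():
                target = F.softmax(teacher(states) / T, dim=-1)
            loss = loss + F.kl_div(F.log_softmax(logits / T, dim=-1),
                                   target, reduction='batchmean')
        loss.backward()
        optimizer.step()
        return loss.item()

    teachers = [nn.Linear(8, 4) for _ in range(3)]  # stand-ins for trained skill nets
    student = MultiHeadStudent(n_inputs=8, n_actions=4, n_skills=3)
    optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
    distill_step(student, teachers, torch.randn(32, 8), optimizer)

Distilling into a single multi-headed network is what lets the architecture keep many skills without keeping many networks.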

Time-Regularized Interrupting Options
Daniel J. Mankowitz, Timothy Mann, Shie Mannor

In Proc. ICML 2014, Beijing, China

High-level skills relieve planning algorithms from low-level details. But when the skills are poorly designed for the domain, the resulting plan may be severely suboptimal. Sutton et al. (1999) made an important step towards resolving this problem by introducing a rule that automatically improves a set of skills called options. This rule terminates an option early whenever switching to another option gives a higher value than continuing with the current option. However, they only analyzed the case where the improvement rule is applied once. We show conditions under which this rule converges to the optimal set of options. A new Bellman-like operator that simultaneously improves the set of options is at the core of our analysis. One problem with the update rule is that it tends to favor lower-level skills. We therefore introduce a regularization term that favors longer-duration skills. Experimental results demonstrate that this approach can derive a good set of high-level skills even when the original set of skills cannot solve the problem.
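
The interruption rule itself reduces to a one-line comparison of option values. A minimal sketch over a tabular Q-function on options, where the additive switching cost is an illustrative stand-in for the paper's regularization term, not its exact operator:

    import numpy as np

    def should_interrupt(Q, state, current_option, switch_cost=0.05):
        # Interrupt the running option only when some alternative beats
        # continuing by more than a switching cost; the cost plays the
        # role of the regularizer, biasing towards longer-duration skills.
        alternatives = [Q[state, o] for o in range(Q.shape[1])
                        if o != current_option]
        return max(alternatives) > Q[state, current_option] + switch_cost

    Q = np.random.default_rng(0).random((10, 3))  # toy table: 10 states, 3 options
    print(should_interrupt(Q, state=2, current_option=0))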

Shallow Updates for Deep Reinforcement Learning
Nir Levine, Tom Zahavy, Daniel J. Mankowitz, Aviv Tamar, Shie Mannor

In Proc. NIPS 2017, Long Beach, USA

Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN) have achieved state-of-the-art results in a variety of challenging, high-dimensional domains. This success is mainly attributed to the power of deep neural networks to learn rich domain representations for approximating the value function or policy. Batch reinforcement learning methods with linear representations, on the other hand, are more stable and require less hyperparameter tuning. Yet, substantial feature engineering is necessary to achieve good results. In this work we propose a hybrid approach, the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method. We do this by periodically re-training the last hidden layer of a DRL network with a batch least squares update. Key to our approach is a Bayesian regularization term for the least squares update, which prevents overfitting to the more recent data. We tested LS-DQN on five Atari games and demonstrated a significant improvement over vanilla DQN and Double-DQN. We also investigated the reasons for the superior performance of our method. Interestingly, we found that the performance improvement can be attributed to the large batch size used by the LS method when optimizing the last layer.
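
The last-layer retraining step amounts to a regularised least-squares solve. A minimal sketch, assuming (consistently with the abstract, though not necessarily the paper's exact formulation) a Gaussian prior centred on the network's current last-layer weights; shapes and the regularisation strength are placeholders:

    import numpy as np

    def ls_update(phi, targets, w_dqn, lam=1.0):
        # Batch least-squares retraining of one last-layer weight vector.
        # phi:     (N, d) last-hidden-layer activations over a large batch
        # targets: (N,)   bootstrapped Q-value regression targets
        # w_dqn:   (d,)   current last-layer weights, acting as the prior mean
        d = phi.shape[1]
        A = phi.T @ phi + lam * np.eye(d)
        b = phi.T @ targets + lam * w_dqn
        return np.linalg.solve(A, b)

With lam = 0 this is ordinary least squares on the batch; the prior term pulls the solution back towards the weights the DQN has already learned, which is what guards against overfitting to the most recent data.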

BRISK-Based Visual Feature Extraction for Resource Constrained Robots
Daniel J. Mankowitz, Subramanian Ramamoorthy

In Proc. RoboCup Symposium 2013, Eindhoven, Netherlands

We address the problem of devising vision-based feature extraction for the purpose of localisation on resource-constrained robots that nonetheless require reasonably agile visual processing. We present modifications to a state-of-the-art Feature Extraction Algorithm (FEA) called Binary Robust Invariant Scalable Keypoints (BRISK). A key aspect of our contribution is the combined use of BRISK0 and U-BRISK as the FEA detector-descriptor pair for the purpose of localisation. We present a novel scoring function to find optimal parameters for this FEA. Also, we present two novel geometric matching constraints that serve to remove invalid interest point matches, which is key to keeping computations tractable. This work is evaluated using images captured on the Nao humanoid robot. In experiments, we show that the proposed procedure outperforms a previously implemented state-of-the-art vision-based FEA called 1D SURF (developed by the rUNSWift RoboCup SPL team) on the basis of accuracy and generalisation performance. Our experiments include data from indoor and outdoor environments, as well as a comparison on datasets based on Google Street View.
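
A rough sketch of a comparable detect-describe-match pipeline using OpenCV's stock BRISK (the paper's BRISK0/U-BRISK detector-descriptor pair and its two specific geometric constraints are not reproduced here; the median-displacement filter below is an illustrative stand-in for geometric match pruning):

    import cv2
    import numpy as np

    def match_brisk(img1, img2, max_residual=50.0):
        # Detect and describe with BRISK, then match by Hamming distance.
        brisk = cv2.BRISK_create()
        kp1, des1 = brisk.detectAndCompute(img1, None)
        kp2, des2 = brisk.detectAndCompute(img2, None)
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = matcher.match(des1, des2)
        # Geometric filter: drop matches whose displacement strays far
        # from the median displacement of all matches.
        disp = np.array([np.subtract(kp2[m.trainIdx].pt, kp1[m.queryIdx].pt)
                         for m in matches])
        median = np.median(disp, axis=0)
        return [m for m, d in zip(matches, disp)
                if np.linalg.norm(d - median) < max_residual]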

Daniel J. Mankowitz, Andrew Paverd

In Proc. IEEE EUROCON 2011, Lisbon, Portugal

This paper presents a novel approach to cellular network coverage analysis and demonstrates the capabilities of a prototype system. Location-specific network measurements are obtained from consumer mobile devices within the network. Crowdsourcing is used to generate a sufficiently large dataset of measurements. By visualising these measurements in a location-based context, this system can be used to produce high-accuracy network coverage maps, improve the identification of cell boundaries, observe detailed cell-level measurements, and analyse the dynamic characteristics of a cellular network.
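
A minimal sketch of the aggregation step, with hypothetical inputs: crowd-sourced (latitude, longitude, signal strength) samples are binned into a spatial grid and averaged per cell to form a coverage map.

    import numpy as np

    def coverage_map(lat, lon, rssi, bins=100):
        # Mean received signal strength per grid cell; NaN where no
        # device reported a measurement.
        total, xe, ye = np.histogram2d(lat, lon, bins=bins, weights=rssi)
        count, _, _ = np.histogram2d(lat, lon, bins=[xe, ye])
        with np.errstate(invalid='ignore', divide='ignore'):
            return total / count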
