Overview

The unifying theme of my research is learning in the temporal domain. Many challenging problems cannot be cast as i.i.d., but rather involve long temporal sequences, possibly with changing dynamics of the environment and agents that change their policy while learning. In the passive setting, the goal is to learn how to map observed sequences to next-step predictions, target sequences or class labels. In the active setting, the task is generally framed as reinforcement learning, where the learning agent takes decisions that directly affect its environment, and the rewards it can obtain. Useful for both scenarios are function approximators that can handle input sequences of arbitrary length, and among the most powerful of them are recurrent neural networks.

Carrot and stick

Reinforcement learning

Reinforcement learning (RL) concerns itself with learning which actions to take in each sequential state, in an environment which provides positive or negative reward signals (i.e., the proverbial carrot and stick). In the past, I have helped develop an expectation-maximization RL algorithm[2] designed for use in conjunction with recurrent neural networks, and define a taxonomic distinction between ontogenetic and phylogenetic methods[7].

My recent focus has been on modular RL: Most reinforcement learning solutions assume a single, monolithic learning mechanism, whereas in practice, there are many reasons to decompose the learning task and spread it among many specialist components[1], that each learn a limited domain of expertise; only together can they solve the task as a whole. The novel modular architecture Mark Ring and I are developing relies on Q-error prediction to choose among its components, which enables multiple linear controllers to jointly solve highly non-linear problems[22], and permits spatially organizing RL modules by their behavior[25] (akin to the motor cortex). We developed a learning rule that produces an emerging temporal coherence, where sequential actions are grouped together [28], simultaneously with their spatial organization.

In addition, I am currently looking into methods that produce predictive representations (based on Sutton's General Value Functions, or 'forecasts'), evaluating their usefulness and studying how well they generalize[32], to determine whether they are a good candidate for a general-purpose representation in RL, and continual learning more specifically (hint: they probably are).

Optimization

Optimization problems that are too difficult or complex to model directly can be solved in a 'black box' manner[5]. Our Natural Evolution Strategies (NES[3], see the dedicated page for details), is one such family of algorithms, which iteratively update a search distribution from which search points are drawn and evaluated, which is then updated in the direction of higher expected fitness, using ascent along the natural gradient. At the BBOB-2012 workshop, NES variants were extensively benchmarked and found to achieve comparably to the state-of-the-art[W4][W5][W6][W7][W8]. Further recent developments include an extension of the approach to non-Gaussian distributions[21], a variant for multi-objective problems[19], two algorithm variants that scale linearly with problem dimension[30], and a convergence proof[27].

Large optimization problems can be solved much more efficiently whenever it is possible to obtain gradient information (e.g. when training large, but differentiable neural networks). The bread-and-butter algorithm in this case is stochastic gradient descent (SGD), for which we are currently developing algorithmic improvements, like our adaptive learning rate (schedules)[33] that rely on tracking sample gradient statistics during optimization; these are particularly helpful if for streaming or non-stationary datasets, and avoid the hassle of parameter tuning. This scheme can also be extended to apply to non-smooth loss functions, sparse gradients, and mini-batch parallelization[31].

Neural networks

Artificial neural networks are a powerful class of non-linear function approximators. When connections between neurons can be time-delayed, they are known as recurrent neural networks (RNN), which are state-of-the-art on a range of problems with temporal structure, like time series prediction or connected handwriting recognition.

One particular type of RNN uses gating units that protect an internal state for an arbitrary amount of time. These 'long short-term memory' cells (LSTM) capture long-term time dependencies, a crucial property for RL, where typical reward structures involve long and variable time lags. I have used multi-dimensional RNNs (that is with more than one temporal dimension) on games with strong spatial structure[6][8]. Training RNNs with gradient methods is harder than for feed-forward networks, such that black-box methods can provide a robust alternative, despite making use of less information.

Currently, I am investigating how to best combine recurrent architectures with deep networks, and how convolutions along the time-axis can reduce the temporal resolution at the top. In the process, we found improved stochastic optimization methods that speed up learning in such large networks.

Exploration

Artificial curiosity

Humans explore the world quite efficiently, going for areas/topics that they do not know well, but where they also expect to be able to learn more about the world. In AI this idea has been formally introduced as artificial curiosity.

We're currently aiming to improve the exploration strategies of existing algorithms based on this idea, namely in the context of costly optimization[20] (with applications to metamaterials[29]), where evaluations of the objective are too expensive to explore randomly.

Coherence progress is a restricted measure of interestingness, that only depends on compression[23][24]. Because of its applicability to any type of compressor, it allows for an easy and domain-specific implementation, but at the cost of no longer considering the agent's adaptation.

Neurons

Application: Neuroscience

Recent advances in brain implants and brain-machine interfaces have unlocked vast amounts of temporal data (like voltage readings and spike trains), that are expected to encode a wealth of information (e.g., movement intentions), even if the dependencies are highly non-linear.

Novel decoding algorithms are likely to be necessary for extracting high-dimensional sequences from this type of data. Working with Bijan Peseran's team, I am currently applying recurrent neural networks (with temporal convolutions) to this task, and using advanced stochastic optimization algorithms to train them.

Go Board

Application: Games

Games are very appropriate application domains for a variety of machine learning techniques (most notably for reinforcement learning), because they are clean, have controllable complexity and are fun to work with.

In fact, they may even lend themselves to practical measures of intelligence[13][T1][36] if a suitable game description language is used[35]. A draft implementation of such a language is py-vgdl[34], built on top of pygame.

I'm particularly interested in board games with spatial structure and patterns like Go (for which used a customized type of multi-dimensional recurrent neural network[6][8][12]), or games with complex graph structure like Sokoban, where a modular approach based on artificial economies[1] was successful.