The unifying theme of my research is learning in the temporal domain. Many challenging problems cannot be cast as i.i.d. learning problems; instead, they involve long temporal sequences, possibly with changing environment dynamics and agents that change their policy while learning.
In the passive setting, the goal is to learn how to map observed sequences to next-step predictions, target sequences, or class labels. In the active setting, the task is generally framed as reinforcement learning, where the learning agent makes decisions that directly affect its environment and the rewards it can obtain. Function approximators that can handle input sequences of arbitrary length are useful in both scenarios, and recurrent neural networks are among the most powerful of them.

Reinforcement learning (RL) concerns itself with learning which actions to take in each sequential state, in an environment which provides positive or negative reward signals (i.e., the proverbial carrot and stick).
In the past, I have helped develop an expectation-maximization RL algorithm[2] designed for use in conjunction with recurrent neural networks, and define a taxonomic distinction between ontogenetic and phylogenetic methods[7].
My current focus is on modular RL. Most reinforcement learning solutions assume a single, monolithic learning mechanism, whereas in practice there are many reasons to decompose the learning task and spread it among many specialist components[1] that each learn a limited domain of expertise; only together can they solve the task as a whole. The novel modular architecture that Mark Ring and I are developing relies on Q-error prediction to choose among its components, which enables multiple linear controllers to jointly solve highly non-linear problems[22] and permits spatially organizing RL modules by their behavior[25].
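As a rough illustration of choosing among specialist modules by predicted Q-error, the Python sketch below gives each module a linear Q-function and a linear predictor of its own absolute temporal-difference error, and hands control to the module that expects the smallest error. This is only one possible reading of the idea, not the architecture from [22]; the names `LinearModule` and `select_module`, the learning rate, and the choice of linear error predictors are all assumptions made for the example.

```python
import numpy as np


class LinearModule:
    """Hypothetical specialist: linear Q-values plus a linear predictor
    of the magnitude of its own TD error (assumed design, for illustration)."""

    def __init__(self, n_features, n_actions, rng):
        self.q_weights = rng.normal(scale=0.01, size=(n_actions, n_features))
        self.err_weights = np.zeros(n_features)

    def q_values(self, features):
        # One Q-value per action, linear in the state features.
        return self.q_weights @ features

    def predicted_error(self, features):
        # How large this module expects its own TD error to be in this state.
        return float(self.err_weights @ features)

    def update(self, features, action, td_error, lr=0.1):
        # Standard linear TD update for the action that was taken.
        self.q_weights[action] += lr * td_error * features
        # Regress the error predictor toward the observed |TD error|.
        residual = abs(td_error) - self.predicted_error(features)
        self.err_weights += lr * residual * features


def select_module(modules, features):
    """Return the index of the module with the lowest predicted error."""
    errors = [m.predicted_error(features) for m in modules]
    return int(np.argmin(errors))
```

In use, the selected module would pick the greedy action from `q_values`, and the observed TD error would then be passed to its `update`. Because each module is only trusted where its predicted error is low, several linear modules can in principle cover different regions of a non-linear problem.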
Artificial neural networks are a powerful class of non-linear function approximators. When connections between neurons can be time-delayed, they are known as recurrent neural networks (RNN), which are state-of-the-art on a range of problems with temporal structure, like time series prediction or connected handwriting recognition.
One particular type of RNN uses gating units that protect an internal state for an arbitrary amount of time. These 'long short-term memory' (LSTM) cells capture long-term time dependencies, a crucial property for RL, where typical reward structures involve long and variable time lags. I have used multi-dimensional RNNs (that is, RNNs with more than one temporal dimension) on games with strong spatial structure[6][8]. Training RNNs with gradient methods is harder than training feed-forward networks, so black-box methods can provide a robust alternative, despite making use of less information.
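The way gating units protect an internal state can be sketched with a single forward step of an LSTM cell. The NumPy code below uses the common formulation with input, forget, and output gates; the function name `lstm_step`, the stacked weight layout, and the absence of peephole connections are assumptions made for the example, not a description of any particular implementation I have used.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, b):
    """One forward step of an LSTM cell.

    x: input vector (D,), h_prev / c_prev: previous output and cell state (H,),
    W: stacked weights (4H, D + H), b: stacked biases (4H,).
    Gate order in W and b (assumed here): input, forget, output, candidate.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([x, h_prev]) + b
    i = sigmoid(z[0:H])          # input gate: how much new content to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old state to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much state to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate content
    c = f * c_prev + i * g       # protected internal state
    h = o * np.tanh(c)           # cell output
    return h, c
```

When the forget gate saturates near one and the input gate near zero, the cell state `c` is carried forward unchanged, which is what allows information to survive across long and variable time lags.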
Currently, I am investigating how to best combine recurrent architectures with deep networks, and how convolutions along the time-axis can reduce the temporal resolution at the top.
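To make the point about temporal resolution concrete, here is a minimal NumPy sketch of a strided convolution along the time axis. The function name `temporal_conv`, the `tanh` non-linearity, and the kernel layout are assumptions chosen for the example; it only shows how each layer shortens the sequence that the next layer has to process.

```python
import numpy as np


def temporal_conv(x, kernels, stride):
    """Strided convolution over time (no padding).

    x: sequence of shape (T, D); kernels: filters of shape (K, D, F),
    where K is the window length in time steps and F the number of filters.
    Returns an array of shape (T_out, F) with T_out = (T - K) // stride + 1.
    """
    T, D = x.shape
    K, D_k, F = kernels.shape
    assert D == D_k, "kernel feature dimension must match the input"
    assert T >= K, "sequence must be at least as long as the kernel"
    T_out = (T - K) // stride + 1
    out = np.empty((T_out, F))
    for t in range(T_out):
        window = x[t * stride: t * stride + K]  # (K, D) slice of the sequence
        # Contract the window with every filter over time and feature axes.
        out[t] = np.tanh(np.tensordot(window, kernels, axes=([0, 1], [0, 1])))
    return out
```

Stacking two such layers, for instance with window length and stride 5 and then 4, turns a 100-step input into a 5-step sequence, so a recurrent layer placed on top would operate at a much coarser time scale.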
Optimization problems that are too difficult or complex to model directly can be solved in a 'black box' manner[5]. Our Natural Evolution Strategies (NES[3]; see the dedicated page for details) are one such family of algorithms. They iteratively draw search points from a search distribution, evaluate them, and then update the distribution in the direction of higher expected fitness, using ascent along the natural gradient.
Recent developments include an extension of the approach to non-Gaussian distributions[21], a variant for multi-objective problems[19], two algorithm variants that scale linearly with problem dimension[*], and a convergence proof[27].
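The basic NES loop (sample, evaluate, update the search distribution) can be sketched in a few lines of Python. The example below is written in the spirit of a separable variant with a diagonal Gaussian search distribution and rank-based utilities; it is a simplified sketch rather than a reference implementation, and the function names, the fixed population size, and the learning-rate settings are assumptions made for illustration.

```python
import numpy as np


def rank_utilities(n):
    """Rank-based fitness shaping: best sample gets the largest weight,
    the worse half gets the same negative weight, and utilities sum to zero."""
    ranks = np.arange(1, n + 1)
    raw = np.maximum(0.0, np.log(n / 2 + 1) - np.log(ranks))
    return raw / raw.sum() - 1.0 / n


def snes_minimize(f, mu, sigma, n_iters=200, pop_size=20, seed=0):
    """Minimize f with a diagonal-Gaussian search distribution N(mu, sigma^2)."""
    rng = np.random.default_rng(seed)
    mu = np.asarray(mu, dtype=float).copy()
    sigma = np.asarray(sigma, dtype=float).copy()
    dim = mu.size
    eta_mu = 1.0                                          # assumed setting
    eta_sigma = (3 + np.log(dim)) / (5 * np.sqrt(dim))    # assumed setting
    utilities = rank_utilities(pop_size)
    for _ in range(n_iters):
        s = rng.standard_normal((pop_size, dim))  # samples in normalized coordinates
        z = mu + sigma * s                        # search points
        fitness = np.array([f(zi) for zi in z])
        order = np.argsort(fitness)               # lowest (best) value first
        s_sorted = s[order]
        # Natural-gradient estimates in the normalized coordinates.
        grad_mu = utilities @ s_sorted
        grad_sigma = utilities @ (s_sorted ** 2 - 1.0)
        mu += eta_mu * sigma * grad_mu
        sigma *= np.exp(0.5 * eta_sigma * grad_sigma)  # multiplicative step keeps sigma > 0
    return mu, sigma
```

A call such as `snes_minimize(lambda z: float(np.sum(z ** 2)), np.full(5, 3.0), np.ones(5))` runs the loop on a simple sphere function. Only fitness rankings enter the update, which is what makes the approach applicable to objectives that cannot be modeled or differentiated directly.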

Humans explore the world quite efficiently, going for areas/topics that they do not know well, but where they also expect to be able to learn more about the world. In AI this idea has been formally introduced as artificial curiosity.
We're currently aiming to improve the exploration strategies of existing algorithms based on this idea, in particular in the context of costly optimization[20], where evaluations of the objective are too expensive to explore randomly.
Coherence progress is a restricted measure of interestingness that depends only on compression[23][24]. Because it can be applied with any type of compressor, it allows for an easy, domain-specific implementation, but at the cost of no longer considering the agent's adaptation.

Recent advances in brain implants and brain-machine interfaces have unlocked vast amounts of temporal data (like voltage readings and spike trains) that are expected to encode a wealth of information (e.g., movement intentions), even if the dependencies are highly non-linear.
Novel decoding algorithms are likely to be necessary for extracting high-dimensional sequences from this type of data. Working with Bijan Pesaran's team, I am currently applying recurrent neural networks (with temporal convolutions) to this task.

I find games to be very appropriate application domains for a variety of machine learning techniques, because they are more controllable and clean (and fun to work with, of course). In fact, they may even lend themselves to practical measures of intelligence[*].
I'm particularly interested in board games with spatial structure and patterns, like Go (for which I used a customized type of multi-dimensional recurrent neural network[6][8][12]), or games with complex graph structure, like Sokoban, where a modular approach based on artificial economies[1] was successful.