Douglas Eck
Assistant Professor, Department of Computer Science and Operations Research
University of Montreal, Quebec, Canada

Time-Warped Hierarchical Structure in Music and Speech: A Sequence Prediction Challenge

One hallmark of music is the nested temporal structure found in meter and grouping structure. A listener experiences meter as recurring strong and weak beats (e.g., "STRONG weak weak" in the 3/4 time signature of a waltz). Metrical structure is generally hierarchical and can span multiple bars of music, so it involves very long timescales. Grouping structure is the organization of a melody into distinct phrases and motifs. Like meter, grouping structure involves long timescales. Unlike meter, however, melodic phrasing is not always governed by low-order integer relationships (e.g., 1:3 for the 3/4 waltz).

For performed music (as opposed to transcribed music), sequence prediction is further complicated because events do not occur with lockstep reliability at particular points in time. First, performers cannot reproduce perfect "metronomic" timing. Such timing "jitter" is not important, however, because it can be discarded as noise. More important are the purposeful timing effects found in a musical performance. Musicians warp metronomic time significantly to create effects such as rubato and swing. These expressive timing effects cannot be discarded as noise; in fact, they can be useful insofar as they are correlated with the underlying meter and grouping structure.

Time-warped hierarchical structure is not limited to music: phoneme structure in speech is similar in many ways. I will relate these observations to the more general problem of vanishing gradients in recurrent neural networks. I will discuss how certain simple approaches can cope with time-warping while other simple approaches can handle temporal hierarchy.
Time permitting, I will offer candidate solutions that deal with both issues at once. However, the focus of this short talk will be less on solutions and more on better understanding why time-warped hierarchical temporal structure is both challenging and pervasive.
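As a rough illustration of the vanishing-gradient problem mentioned above (a generic textbook sketch, not material from the talk), the following Python code assumes a single-unit tanh recurrent network h_t = tanh(w·h_{t-1} + x_t). The recurrent weight w = 0.9 and the constant input 0.5 are hypothetical values chosen for illustration. By the chain rule, the sensitivity of the final state to the initial state is a product of per-step factors w·(1 − h_t²). Each factor here is smaller than 1 in magnitude, so the product shrinks exponentially with the time lag.

```python
import math


def gradient_through_time(w, inputs, h0=0.0):
    """Return d h_T / d h_0 for a scalar tanh RNN h_t = tanh(w * h_{t-1} + x_t).

    Hypothetical toy model: one hidden unit, no output layer.
    By the chain rule, the result is the product over t of w * (1 - h_t**2).
    """
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # local Jacobian d h_t / d h_{t-1}
    return grad


# Assumed example values: recurrent weight 0.9, constant input 0.5.
for lag in (1, 10, 50, 100):
    g = gradient_through_time(0.9, [0.5] * lag)
    print(f"lag={lag:3d}  |dh_T/dh_0| = {abs(g):.3e}")
```

Because each factor is bounded by |w| = 0.9, the gradient decays at least as fast as 0.9^T. Under these assumptions, an event many steps back (for example, a downbeat several bars earlier) contributes almost nothing to the learning signal. This is one way to see why long metrical and phrase timescales are hard for such networks.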