Computer Vision with Fast & Deep / Recurrent Neural Nets: Best Results. By Juergen Schmidhuber. Includes adapted HAL 9000 image from Kubrick's movie based on Clarke's novel 2001. Also includes images from the German Traffic Sign Recognition Benchmark.

Fast Deep / Recurrent Neural Networks Win Many Computer Vision Contests
Jürgen Schmidhuber, 2009-2012 (compare interview on KurzweilAI)

Computer vision and pattern recognition are becoming essential for thousands of practical applications. For example, the future of search engines lies in image and video recognition as opposed to traditional text search. Autonomous robots such as driverless cars depend on it, too. It even has lifesaving impact through medical applications such as cancer detection.

Since 2009, our neural computer vision team has won 8 (eight) first prizes in important and highly competitive international visual pattern recognition contests. Our neural nets were also the first machine learning methods to reach human-competitive or even superhuman performance on important benchmarks (details in the rightmost column):

8. ICPR 2012 Contest on Mitosis Detection in Breast Cancer Histological Images
7. ISBI 2012 Image Segmentation Challenge (with superhuman pixel error rate)
6. IJCNN 2011 Traffic Sign Recognition Competition (only our method achieved superhuman results)
5. ICDAR 2011 offline Chinese Handwriting Competition
4. Online German Traffic Sign Recognition Contest
3. ICDAR 2009 Arabic Connected Handwriting Competition
2. ICDAR 2009 Handwritten Farsi/Arabic Character Recognition Competition
1. ICDAR 2009 French Connected Handwriting Competition. Compare the overview page on handwriting recognition.

Our "deep learning" methods also set records in important Machine Learning (ML) benchmarks (details in the rightmost column):

A. The NORB Object Recognition Benchmark
B. The CIFAR Image Classification Benchmark
C. The MNIST Handwritten Digits Benchmark (perhaps the most famous benchmark; we achieved the 1st human-competitive result in 2011)
D. The Weizmann & KTH Human Action Recognition Benchmarks

Remarkably, none of 1-8 & A-C above required the traditional sophisticated computer vision techniques developed over the past six decades or so. Instead, our rather biologically plausible systems are inspired by human brains, and learn to recognize objects from numerous training examples. We use deep, artificial, supervised, feedforward or recurrent (deep by nature) neural networks with many non-linear processing stages.
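For illustration only (this is a toy sketch, not our actual training code), a deep feedforward net of this kind is simply a stack of alternating linear and non-linear processing stages. The layer sizes and random weights below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)

def deep_forward(x, weights):
    """Forward pass through a deep feedforward net:
    each hidden stage is a linear map followed by a
    non-linearity (tanh), then a linear read-out layer."""
    a = x
    for W in weights[:-1]:
        a = np.tanh(W @ a)       # non-linear processing stage
    return weights[-1] @ a       # linear read-out (class scores)

# Hypothetical sizes: a net mapping 32 inputs to 10 class scores
# through four hidden stages of 64 units each.
sizes = [32, 64, 64, 64, 64, 10]
weights = [rng.standard_normal((m, n)) * 0.1
           for n, m in zip(sizes, sizes[1:])]

x = rng.standard_normal(32)
scores = deep_forward(x, weights)
print(scores.shape)  # (10,)
```

In practice such a stack is trained by gradient descent on many labeled examples; the sketch above shows only the layered, many-stage structure the text describes.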

We started work on "deep learning" over two decades ago. Back then, Sepp Hochreiter (now professor) was an undergrad student working on Schmidhuber's neural net project. His 1991 thesis (PDF) formally showed that deep networks like the above are hard to train because they suffer from the now famous "vanishing gradient" problem. Since then we have developed various techniques to overcome this obstacle (e.g., see here).
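A toy numerical illustration of the vanishing gradient effect (not Hochreiter's formal analysis): backpropagating through many saturating units multiplies the error signal by a per-layer factor; whenever that factor is below 1 in magnitude, the gradient shrinks exponentially with depth. The weight and activation values here are arbitrary:

```python
import numpy as np

def gradient_magnitude(depth, w=0.9, z=1.0):
    """Magnitude of a backpropagated error signal after `depth`
    tanh layers, each contributing a factor tanh'(z) * w.
    With |tanh'(z) * w| < 1 the signal decays exponentially."""
    g = 1.0
    for _ in range(depth):
        g *= (1.0 - np.tanh(z) ** 2) * w   # per-layer factor < 1
    return g

shallow = gradient_magnitude(3)
deep = gradient_magnitude(30)
print(shallow, deep)   # the deep gradient is vanishingly small
```

This is why naive gradient descent in very deep or recurrent nets barely updates the early layers, and why techniques to overcome the problem were needed.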

Some but not all of our nets are inspired by early hierarchical neural systems such as Fukushima's Neocognitron (1980). We sometimes (but not always) profit from sparse network connectivity and techniques such as weight sharing & convolution (LeCun et al, 1995), contrast enhancement, and max-pooling (e.g., Scherer et al, 2010).
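As a minimal illustration of the max-pooling idea (a toy sketch, not the implementation used in our contest entries): each non-overlapping 2x2 patch of a feature map is replaced by its strongest response, halving each spatial dimension and adding a degree of translation tolerance. The feature map values are made up:

```python
import numpy as np

def max_pool_2x2(x):
    """Non-overlapping 2x2 max-pooling: keep the strongest
    response in each 2x2 patch of a feature map."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

fmap = np.array([[1, 2, 0, 1],
                 [3, 4, 1, 0],
                 [0, 1, 5, 6],
                 [2, 1, 7, 8]], dtype=float)

pooled = max_pool_2x2(fmap)
print(pooled)  # [[4. 1.]
               #  [2. 8.]]
```

Weight sharing works analogously: the same small convolution kernel is slid across the whole image, so every location is analyzed by identical feature detectors.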

We use graphics cards or GPUs (mini-supercomputers for video games, see picture in 2nd column) to speed up learning by a factor of up to 50. Our committees of networks improve the results even further.
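The committee idea can be sketched as follows: several independently trained nets each output class probabilities, and the committee simply averages them, which tends to cancel the individual nets' errors. The probability values below are invented for illustration:

```python
import numpy as np

# Hypothetical class-probability outputs of three independently
# trained nets for one input image (3 classes).
preds = np.array([
    [0.6, 0.3, 0.1],   # net 1
    [0.2, 0.5, 0.3],   # net 2 (alone, it would pick class 1)
    [0.5, 0.4, 0.1],   # net 3
])

committee = preds.mean(axis=0)   # average the members' votes
decision = committee.argmax()    # committee's predicted class
print(committee, decision)       # class 0 wins the averaged vote
```

Averaging is the simplest committee scheme; members trained on differently preprocessed data disagree in different ways, which is exactly what makes the average more reliable than any single net.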

Note that the successes 1-8 & A-C above did NOT require any unsupervised pre-training, which is a bit depressing, as we have been developing unsupervised learning algorithms for 20 years. However, our new systems' feature detectors (FDs) do resemble those found by our old unsupervised methods such as Predictability Minimization (1992; more: 1996) and Low-Complexity Coding and Decoding (LOCOCODE, 1999).

Our new systems' feature detectors resemble those found by our old unsupervised methods such as Predictability Minimization (1992). Shown: a feature detector due to semi-linear PM (1992), made in 1996. (Jürgen Schmidhuber)

Reference [14] also uses fast deep nets, this time to achieve superior hand gesture recognition. And reference [16] uses them to achieve superior steel defect detection, three times better than support vector machines (SVM) trained on commonly used feature descriptors.

Not all of our pattern recognizers use neural nets though. For D above, novel supervised and unsupervised kernel-based methods were employed. Reference [5] uses credal classifiers to classify textures. And reference [2] uses a fast voting scheme to answer image-based queries, successfully tested on the ZuBud database.

NORB Dataset: Best Results as of 2011, by Fast Deep Nets on GPUs. Segmentation of neuronal structures in EM stacks: Best Results as of 2012. (Juergen Schmidhuber)

Our simple training algorithms for deep, wide, artificial neural network architectures similar to those of biological brains now win many competitions and yield best known results on many famous benchmarks for visual pattern recognition. Shown here are example images from NORB and EM stacks (left), CIFAR-10, Weizmann, KTH (below), and the Traffic Sign Competition (above / below).

We are currently experiencing a second Neural Network ReNNaissance (see IJCNN 2011 keynote - the first one happened in the 1980s and early 90s). In many applications, our deep NNs now outperform all other methods, including the theoretically less general and less powerful support vector machines (which for a long time had the upper hand, at least in practice). Check out the predictions of our RNNaissance workshop at NIPS 2003 (in hindsight, not too optimistic), and compare the RNN book preface.

Computer Vision Team (ex-)members in Schmidhuber's lab(s): Dan Ciresan, Ueli Meier, Jonathan Masci, Somayeh Danafar, Alex Graves, Davide Migliore. For medical imaging, we also work with Alessandro Giusti in the group of Luca Maria Gambardella.

Our work builds on earlier work by great neural network pioneers including Werbos, Amari, LeCun, Hinton, Williams, Rumelhart, Poggio, von der Malsburg, Kohonen, Fukushima, and others.

GPUs used for the work described in: High-Performance Neural Networks for Visual Object Classification, arXiv:1102.0183v1 [cs.AI]. (Jürgen Schmidhuber)

SELECTED PUBLICATIONS

[18] D. C. Ciresan, U. Meier, J. Masci, J. Schmidhuber. Multi-Column Deep Neural Network for Traffic Sign Classification. Neural Networks 32, p 333-338, 2012. PDF of preprint.

[17] D. C. Ciresan, U. Meier, J. Schmidhuber. Multi-column Deep Neural Networks for Image Classification. IEEE Conf. on Computer Vision and Pattern Recognition CVPR 2012, p 3642-3649, 2012. PDF. Longer preprint arXiv:1202.2745v1 [cs.CV].

[16] J. Masci, U. Meier, D. Ciresan, G. Fricout, J. Schmidhuber. Steel Defect Classification with Max-Pooling Convolutional Neural Networks. Proc. IJCNN 2012. PDF.

[15] D. Ciresan, A. Giusti, L. Gambardella, J. Schmidhuber. Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images. In Advances in Neural Information Processing Systems (NIPS 2012), Lake Tahoe, 2012. PDF. (See also ISBI EM Competition Abstracts.)

[14] J. Nagi, F. Ducatelle, G. A. Di Caro, D. Ciresan, U. Meier, A. Giusti, F. Nagi, J. Schmidhuber, L. M. Gambardella. Max-Pooling Convolutional Neural Networks for Vision-based Hand Gesture Recognition. Proc. 3rd IEEE Intl. Conf. on Signal & Image Processing and Applications (ICSIPA), Kuala Lumpur, 2011. PDF.

[13] J. Schmidhuber, D. Ciresan, U. Meier, J. Masci, A. Graves. On Fast Deep Nets for AGI Vision. In Proc. Fourth Conference on Artificial General Intelligence (AGI-11), Google, Mountain View, California, 2011. PDF. Video.

[12] D. C. Ciresan, U. Meier, L. M. Gambardella, J. Schmidhuber. Convolutional Neural Network Committees For Handwritten Character Classification. 11th International Conference on Document Analysis and Recognition (ICDAR 2011), Beijing, China, 2011. PDF.

[11] U. Meier, D. C. Ciresan, L. M. Gambardella, J. Schmidhuber. Better Digit Recognition with a Committee of Simple Neural Nets. 11th International Conference on Document Analysis and Recognition (ICDAR 2011), Beijing, China, 2011. PDF.

[10] D. C. Ciresan, U. Meier, J. Masci, J. Schmidhuber. A Committee of Neural Networks for Traffic Sign Classification. International Joint Conference on Neural Networks (IJCNN-2011, San Francisco), 2011. PDF.

[9] D. C. Ciresan, U. Meier, L. M. Gambardella, J. Schmidhuber. Handwritten Digit Recognition with a Committee of Deep Neural Nets on GPUs. ArXiv Preprint arXiv:1103.4487v1 [cs.LG], 23 Mar 2011.

[8] D. C. Ciresan, U. Meier, J. Masci, L. M. Gambardella, J. Schmidhuber. Flexible, High Performance Convolutional Neural Networks for Image Classification. International Joint Conference on Artificial Intelligence (IJCAI-2011, Barcelona), 2011. ArXiv preprint, 1 Feb 2011. Describes our special breed of max-pooling convolutional networks (MPCNN), now widely used by research labs and companies all over the world.

[7] S. Danafar, A. Giusti, J. Schmidhuber. New State-of-the-Art Recognizers of Human Actions. EURASIP Journal on Advances in Signal Processing, doi:10.1155/2010/202768, 2010. HTML.

[6] D. C. Ciresan, U. Meier, L. M. Gambardella, J. Schmidhuber. Deep Big Simple Neural Nets For Handwritten Digit Recognition. Neural Computation 22(12): 3207-3220, 2010. ArXiv Preprint arXiv:1003.0358v1 [cs.NE], 1 March 2010.

[5] G. Corani, A. Giusti, D. Migliore, J. Schmidhuber. Robust Texture Recognition Using Credal Classifiers. Proc. BMVC, p 78.1-78.10. BMVA Press, 2010. doi:10.5244/C.24.78. HTML.

[4] A. Graves, J. Schmidhuber. Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks. Advances in Neural Information Processing Systems 22, NIPS'22, p 545-552, Vancouver, MIT Press, 2009. PDF.

[3] A. Graves, S. Fernandez, J. Schmidhuber. Multi-Dimensional Recurrent Neural Networks. Intl. Conf. on Artificial Neural Networks ICANN'07, 2007. Preprint: arxiv:0705.2011. PDF.

[2] M.v.d. Giessen and J. Schmidhuber. Fast color-based object recognition independent of position and orientation. In W. Duch et al. (Eds.): Proc. Intl. Conf. on Artificial Neural Networks ICANN'05, LNCS 3696, pp. 469-474, Springer-Verlag Berlin Heidelberg, 2005. PDF.

Link to the first artificial fovea sequentially steered by a learning neural controller (1990). (Jürgen Schmidhuber)

Ongoing work on active perception. While the methods above tend to work fine in many applications, they are passive learners - they do not learn to actively search for the most informative image parts. Humans, however, use sequential gaze shifts for pattern recognition. This can be much more efficient than the fully parallel one-shot approach. That's why we want to combine the algorithms above with variants of our old method of 1990 - back then we built what to our knowledge was the first artificial fovea sequentially steered by a learning neural controller. Without a teacher, it used a variant of reinforcement learning to create saccades and find targets in a visual scene (and to track moving targets), although computers were a million times slower back then:

[1] J. Schmidhuber, R. Huber. Learning to generate artificial fovea trajectories for target detection. International Journal of Neural Systems, 2(1 & 2):135-141, 1991 (figures omitted). PDF. HTML. HTML overview with figures.

More on active learning without a teacher in the overview pages on the Formal Theory of Creativity and Curiosity.


Copyright notice (2011): Fibonacci web design by Jürgen Schmidhuber, who will be delighted if you use this web page for educational and non-commercial purposes, including articles for Wikipedia and similar sites, provided you mention the source and provide a link.

Last update 2013

CIFAR Dataset: Best Results as of 2011, by Fast Deep Nets on GPUs. (Jürgen Schmidhuber)

COMPETITION DETAILS

Links to the original datasets of competitions and benchmarks, plus more information on the world records set by our team:

14. ICPR 2012 Contest on Mitosis Detection in Breast Cancer Histological Images (MITOS Aperio images). There were 129 registered companies / institutes / universities from 40 countries, and 14 results. Our team (with Alessandro & Dan) clearly won the contest (over 20% fewer errors than the second best team).

13. ISBI 2012 Segmentation of neuronal structures in EM stacks challenge. See the TrakEM2 data sets of INI. Our team won the contest on all three evaluation metrics by a large margin, with superhuman performance in terms of pixel error (March 2012) [15]. (Ranks 2-6 for researchers at ETHZ, MIT, CMU, Harvard.)

12. IJCNN 2011 on-site Traffic Sign Recognition Competition (1st rank, 2 August 2011, 0.56% error rate, the only method better than humans, who achieved 1.16% on average; 3rd place for 1.69%) [10,18].

11. INI @ Univ. Bochum's online German Traffic Sign Recognition Benchmark, won through late night efforts of Dan & Ueli & Jonathan (1st & 2nd rank; 1.02% error rate, January 2011) [10].

10. NORB object recognition dataset, NY University, 2004. Our team set the new record on the standard set (2.53% error rate) in January 2011 [8], and achieved 2.7% on the full set [17] (best previous result by others: 5%).

9. The CIFAR-10 dataset of Univ. Toronto, 2009. Our team set the new record (19.51% error rate) on these rather challenging data in January 2011 [8], and improved this to 11.2% [17].


8. The MNIST dataset of NY University, 1998. Our team set the new record (0.35% error rate) in 2010 [6], tied it again in January 2011 [8], broke it again in March 2011 (0.31%) [9], and again (0.27%, ICDAR 2011) [12], and finally achieved the first human-competitive result: 0.23% [17] (mean of many runs; many individual runs yield better results, of course, down to 0.17% [12]).

7. The Chinese Handwriting Recognition Competition at ICDAR 2011 (offline). Our team won 1st and 2nd rank (CR(1): 92.18% correct; CR(10): 99.29% correct) in June 2011.

Three Connected Handwriting Recognition Competitions at ICDAR 2009 were won by our multi-dimensional LSTM recurrent neural networks [3,4] through the efforts of Alex:

6. Arabic Connected Handwriting Competition of Univ. Braunschweig

5. Handwritten Farsi/Arabic Character Recognition Competition

4. French Connected Handwriting Competition (PDF) based on data from the RIMES campaign

Note that 4-8 are treated in more detail in the page on handwriting recognition.

KTH and Weizmann Datasets: Best Results as of 2010, by Kernel-Based Recognizers of Human Actions. (Jürgen Schmidhuber)

3. The Weizmann Human Action Dataset of Weizmann Institute of Science, and the KTH Human Action Dataset of KTH Royal Institute of Technology. New records set in 2010 [7], thanks to Somayeh's efforts.

2. The Outex Texture Database, Univ. Oulu, 2002 [5].

1. The ZuBuD database of pictures of buildings in Zürich, ETHZ, 2003 [2].


Here is a 13 min Google Tech Talk video on fast deep / recurrent nets (only slides and voice) at AGI 2011, summarizing results as of August 2011:

Google Tech Talk video (13:05) on fast deep / recurrent neural networks for computer vision presented by Juergen Schmidhuber at AGI 2011 at Google HQ, Mountain View, CA.

Our algorithms not only were the first deep learning methods to win international competitions (since 2009) and to become human-competitive, they also have numerous immediate industrial and medical applications. Are you an industrial company that wants to solve interesting pattern recognition problems? Don't hesitate to contact JS. We already developed:

1. State-of-the-art handwriting recognition for a software services company.
2. State-of-the-art steel defect detection for the world's largest steel maker.
3. State-of-the-art low-cost pattern recognition for a leading automotive supplier.
4. Low-power variants of our methods for apps running on cell phone chips.