Fast Deep / Recurrent Neural Networks Win Many Computer Vision Contests
Jürgen Schmidhuber, 2009-2012
(compare interview on KurzweilAI)
Computer vision and pattern recognition are becoming essential for thousands of practical applications. For example,
the future of search engines lies in image and video recognition as opposed to
traditional text search. Autonomous robots such as
driverless cars depend on it, too. It even has
lifesaving impact through
medical applications such as cancer detection.
Since 2009, our neural computer vision team has won 8
(eight) first prizes in important
and highly competitive international visual pattern recognition contests.
Our neural nets were also the first machine learning methods to reach human-competitive or even superhuman performance on important benchmarks
(details under COMPETITION DETAILS below):
8. ICPR 2012 Contest on Mitosis Detection in Breast Cancer Histological Images
7. ISBI 2012 Image Segmentation Challenge (with superhuman pixel error rate)
6. IJCNN 2011 Traffic Sign Recognition Competition (only our method achieved superhuman results)
5. ICDAR 2011 offline Chinese Handwriting Competition
4. Online German Traffic Sign Recognition Contest
3. ICDAR 2009 Arabic Connected Handwriting Competition
2. ICDAR 2009 Handwritten Farsi/Arabic Character Recognition Competition
1. ICDAR 2009 French Connected Handwriting Competition.
Compare the overview page on handwriting recognition.
Our "deep learning" methods also set records in important Machine Learning (ML) benchmarks
(details under COMPETITION DETAILS below):
A. The NORB Object Recognition Benchmark
B. The CIFAR Image Classification Benchmark
C. The MNIST Handwritten Digits Benchmark (perhaps the most famous benchmark;
we achieved the 1st human-competitive result in 2011)
D. The Weizmann & KTH Human Action Recognition Benchmarks
Remarkably, none of 1-8 & A-C above required the traditional
sophisticated computer vision techniques developed over the past six decades or so.
Instead, our biologically rather plausible systems are inspired by human brains,
and learn to recognize objects from numerous training examples.
We use deep, artificial, supervised,
feedforward or recurrent (deep by nature) neural networks with many
non-linear processing stages.
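To make this concrete, here is a minimal Python/NumPy sketch of a deep feedforward net of this kind: a stack of fully connected layers, each followed by a non-linear squashing function. The layer sizes, the tanh non-linearity, and the random weights are illustrative assumptions, not the architectures that won the contests.

import numpy as np

# Minimal sketch of a deep feedforward net with many non-linear stages.
# Layer sizes and the tanh non-linearity are illustrative assumptions.
rng = np.random.default_rng(0)
layer_sizes = [784, 500, 500, 500, 10]   # deep: several hidden stages

weights = [rng.normal(0, 0.1, (n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    """Propagate input x through all non-linear processing stages."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = np.tanh(a @ W + b)           # non-linear hidden stage
    logits = a @ weights[-1] + biases[-1]
    e = np.exp(logits - logits.max())    # softmax output for classification
    return e / e.sum()

x = rng.normal(size=784)                 # e.g. a flattened 28x28 image
print(forward(x).argmax())               # predicted class index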
We started work on "deep learning" over two decades ago. Back then,
Sepp Hochreiter (now professor)
was an undergrad student working on Schmidhuber's neural net project. His 1991 thesis
(PDF)
formally showed that deep networks like the above are hard to train because they suffer
from the now famous "vanishing gradient" problem.
Since then we have developed various techniques to overcome this obstacle
(e.g., see here).
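To illustrate the problem itself, here is a small Python/NumPy demo of a vanishing gradient: an error signal backpropagated through many tanh stages shrinks roughly geometrically with depth. Width, depth, and weight scale are arbitrary illustrative choices, not taken from the thesis.

import numpy as np

# Illustrative demo of the "vanishing gradient" problem: backpropagated
# through many squashing stages, the error signal shrinks with depth.
rng = np.random.default_rng(0)
width, depth = 100, 50

# Forward pass through `depth` tanh layers, remembering local derivatives.
a = rng.normal(size=width)
layers = []
for _ in range(depth):
    W = rng.normal(0, 1.0 / np.sqrt(width), (width, width))
    z = a @ W
    layers.append((W, 1.0 - np.tanh(z) ** 2))   # local tanh derivative
    a = np.tanh(z)

# Backward pass: the gradient norm typically collapses toward zero.
grad = np.ones(width)
for d, (W, deriv) in enumerate(reversed(layers), start=1):
    grad = W @ (deriv * grad)                   # chain rule for one stage
    if d % 10 == 0:
        print(f"{d:3d} layers back: |grad| = {np.linalg.norm(grad):.3e}")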
Some but not all of our nets are inspired by early hierarchical neural systems
such as Fukushima's Neocognitron (1980).
We sometimes (but not always) profit from sparse network connectivity
and techniques such as weight sharing & convolution (LeCun et al, 1995),
contrast enhancement, and
max-pooling (e.g., Scherer et al, 2010).
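For readers unfamiliar with these operations, here is a minimal Python/NumPy sketch of 2D convolution with a single shared kernel (weight sharing) followed by 2x2 max-pooling. Image and kernel sizes are illustrative assumptions only.

import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D convolution with one shared kernel (weight sharing)."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max-pooling: keep the strongest response per block."""
    H, W = x.shape
    H, W = H - H % size, W - W % size
    x = x[:H, :W].reshape(H // size, size, W // size, size)
    return x.max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.normal(size=(28, 28))
kernel = rng.normal(size=(5, 5))           # one shared 5x5 filter
features = np.tanh(conv2d(image, kernel))  # 24x24 feature map
pooled = max_pool(features)                # 12x12 after 2x2 max-pooling
print(pooled.shape)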
We use
graphics cards (GPUs, mini-supercomputers originally designed
for video games) to speed up learning by a factor of up to 50.
Our committees of networks improve the results even further.
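A committee simply averages the outputs of several independently trained nets, as in this toy Python sketch; the trained members are stubbed out with random softmax outputs for illustration.

import numpy as np

# Toy sketch of a committee (ensemble) of nets: average the class
# probabilities of several members, then take the argmax. Real committee
# members would be independently trained deep nets.
n_members, n_classes = 5, 10

def member_predict(seed):
    # Stand-in for one trained net's softmax output on a fixed input.
    r = np.random.default_rng(seed)
    logits = r.normal(size=n_classes)
    e = np.exp(logits - logits.max())
    return e / e.sum()

probs = np.mean([member_predict(s) for s in range(n_members)], axis=0)
print("committee prediction:", int(np.argmax(probs)))

Averaging several such nets typically reduces the error below that of any single member.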
Note that the successes 1-8 & A-C above did NOT require
any unsupervised pre-training, which is a bit depressing,
as we have developed unsupervised learning algorithms for 20 years.
However, our new systems' feature detectors (FDs) do resemble
those found by our old unsupervised methods such as
Predictability Minimization (1992; more: 1996)
and Low-Complexity Coding and Decoding (LOCOCODE, 1999).
(Figure: FD due to semi-linear PM (1992), made in 1996.)
Reference [14] also uses fast deep nets, this time
to achieve superior hand gesture recognition.
And reference [16] uses them to achieve superior
steel defect detection, three times better than
support vector machines (SVM)
trained on commonly used feature descriptors.
Not all of our pattern recognizers use neural nets though.
For D above, novel supervised and unsupervised kernel-based methods were employed.
Reference [5] uses credal classifiers to classify textures.
And reference [2] uses a fast voting scheme to answer image-based queries,
successfully tested
on the ZuBud database.
Our simple training algorithms for deep, wide, artificial neural network architectures similar to those of biological brains now win many competitions and yield best known results on many famous benchmarks for visual pattern recognition. (Figure: example images from NORB, EM stacks, CIFAR-10, the Weizmann and KTH action datasets, and the Traffic Sign Competition.)
We are currently experiencing a second Neural Network
ReNNaissance (see IJCNN 2011 keynote; the first one happened in the 1980s and early 1990s).
In many applications, our deep NNs are now outperforming all other methods
including
the theoretically less general and less powerful support vector machines
(which for a long time had the upper hand, at least in practice).
Check out the (in hindsight not too optimistic)
predictions of our RNNaissance workshop at NIPS 2003,
and compare the RNN book preface.
Computer Vision Team (ex-)members in Schmidhuber's lab(s):
Dan Ciresan,
Ueli Meier,
Jonathan Masci,
Somayeh Danafar,
Alex Graves,
Davide Migliore.
For medical imaging, we also work with
Alessandro Giusti
in the group of
Luca Maria Gambardella.
Our work builds on earlier work by great neural network pioneers including Werbos, Amari, LeCun, Hinton, Williams, Rumelhart, Poggio, von der Malsburg, Kohonen, Fukushima, and others.
SELECTED PUBLICATIONS
[18]
D. C. Ciresan, U. Meier, J. Masci, J. Schmidhuber.
Multi-Column Deep Neural Network for Traffic Sign Classification.
Neural Networks 32, p 333-338, 2012.
PDF of preprint.
[17]
D. C. Ciresan, U. Meier, J. Schmidhuber.
Multi-column Deep Neural Networks for Image Classification.
IEEE Conf. on Computer Vision and Pattern Recognition CVPR 2012, p 3642-3649, 2012.
PDF.
Longer preprint
arXiv:1202.2745v1 [cs.CV].
[16]
J. Masci, U. Meier, D. Ciresan, G. Fricout, J. Schmidhuber.
Steel Defect Classification with Max-Pooling Convolutional Neural Networks.
Proc. IJCNN 2012. PDF.
[15]
D. Ciresan, A. Giusti, L. Gambardella, J. Schmidhuber.
Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images.
In Advances in Neural Information Processing Systems (NIPS 2012), Lake Tahoe,
2012. PDF. (See also
ISBI EM Competition Abstracts.)
[14]
J. Nagi, F. Ducatelle, G. A. Di Caro, D. Ciresan, U. Meier, A. Giusti, F. Nagi, J. Schmidhuber, L. M. Gambardella. Max-Pooling Convolutional Neural Networks for Vision-based Hand Gesture Recognition. Proc. 3rd IEEE Intl. Conf. on Signal & Image Processing and Applications (ICSIPA), Kuala Lumpur, 2011.
PDF.
[13]
J. Schmidhuber, D. Ciresan, U. Meier, J. Masci, A. Graves.
On Fast Deep Nets for AGI Vision.
In Proc. Fourth Conference on Artificial General Intelligence (AGI-11),
Google, Mountain View, California, 2011.
PDF.
Video.
[12]
D. C. Ciresan, U. Meier, L. M. Gambardella, J. Schmidhuber.
Convolutional Neural Network Committees For Handwritten Character Classification.
11th International Conference on Document Analysis and Recognition (ICDAR 2011),
Beijing, China, 2011.
PDF.
[11]
U. Meier, D. C. Ciresan, L. M. Gambardella, J. Schmidhuber.
Better Digit Recognition with a Committee of Simple Neural Nets.
11th International Conference on Document Analysis and Recognition (ICDAR 2011),
Beijing, China, 2011.
PDF.
[10]
D. C. Ciresan, U. Meier, J. Masci, J. Schmidhuber.
A Committee of Neural Networks for Traffic Sign Classification.
International Joint Conference on Neural Networks (IJCNN-2011, San Francisco), 2011.
PDF.
[9]
D. C. Ciresan, U. Meier, L. M. Gambardella, J. Schmidhuber.
Handwritten Digit Recognition with a Committee of Deep Neural Nets on GPUs.
ArXiv Preprint
arXiv:1103.4487v1 [cs.LG], 23 Mar 2011.
[8]
D. C. Ciresan, U. Meier, J. Masci, L. M. Gambardella, J. Schmidhuber.
Flexible, High Performance Convolutional Neural Networks for Image Classification.
International Joint Conference on Artificial Intelligence (IJCAI-2011, Barcelona), 2011.
ArXiv preprint, 1 Feb 2011.
Describes our special breed of max-pooling convolutional networks (MPCNN), now widely used by research labs and companies all over the world.
[7]
S. Danafar, A. Giusti, J. Schmidhuber. New State-of-the-Art Recognizers of Human Actions. EURASIP Journal on Advances in Signal Processing, doi:10.1155/2010/202768, 2010.
HTML.
[6]
D. C. Ciresan, U. Meier, L. M. Gambardella, J. Schmidhuber.
Deep Big Simple Neural Nets For Handwritten Digit Recognition.
Neural Computation 22(12): 3207-3220, 2010.
ArXiv Preprint
arXiv:1003.0358v1 [cs.NE], 1 March 2010.
[5] G. Corani, A. Giusti, D. Migliore, J. Schmidhuber.
Robust Texture Recognition Using Credal Classifiers.
Proc. BMVC, p 78.1-78.10. BMVA Press, 2010. doi:10.5244/C.24.78.
HTML.
[4]
A. Graves, J. Schmidhuber.
Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks.
Advances in Neural Information Processing Systems 22, NIPS'22, p 545-552,
Vancouver, MIT Press, 2009.
PDF.
[3]
A. Graves, S. Fernandez, J. Schmidhuber. Multi-Dimensional Recurrent
Neural Networks.
Intl. Conf. on Artificial Neural Networks ICANN'07,
2007.
Preprint: arxiv:0705.2011.
PDF.
[2]
M.v.d. Giessen and J. Schmidhuber.
Fast color-based object recognition independent of position and
orientation.
In W. Duch et al. (Eds.):
Proc. Intl. Conf. on Artificial Neural Networks ICANN'05,
LNCS 3696, pp. 469-474, Springer-Verlag Berlin Heidelberg, 2005.
PDF.
Ongoing work on active perception.
While the methods above tend to work fine in many applications,
they are passive learners - they do not learn to actively
search for the most informative image parts. Humans, however,
use sequential gaze shifts for pattern recognition.
This can be much more efficient than the fully parallel one-shot approach.
That's why we want to combine the algorithms above with variants of our old method of 1990 - back then
we built what to our knowledge was
the first artificial fovea sequentially steered by a learning neural controller.
Without a teacher, it used a variant of reinforcement learning
to create saccades and find targets in a visual scene (and to track moving targets), although computers were a million times slower back then:
[1]
J. Schmidhuber, R. Huber.
Learning to
generate artificial fovea trajectories for target detection.
International Journal of Neural Systems, 2(1 & 2):135-141, 1991
(figures omitted).
PDF.
HTML.
HTML overview with figures.
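For illustration only, here is a toy Python/NumPy sketch of the idea behind [1]: a learning controller steers a fovea by sequential gaze shifts and is rewarded for fixating a target. Tabular Q-learning stands in for the original method; the grid world, rewards, and hyperparameters are illustrative assumptions, not the 1990/1991 setup.

import numpy as np

# Toy sketch: a controller learns saccades that move a fovea to a target.
# Tabular Q-learning is a stand-in for the original 1990/1991 method.
rng = np.random.default_rng(0)
GRID, TARGET = 8, (6, 2)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]        # saccade directions
Q = np.zeros((GRID, GRID, len(ACTIONS)))
alpha, gamma, eps = 0.5, 0.9, 0.1

for episode in range(500):
    pos = (0, 0)                                     # initial fixation
    for step in range(30):
        a = (rng.integers(len(ACTIONS)) if rng.random() < eps
             else int(np.argmax(Q[pos])))
        dy, dx = ACTIONS[a]
        nxt = (min(max(pos[0] + dy, 0), GRID - 1),
               min(max(pos[1] + dx, 0), GRID - 1))
        r = 1.0 if nxt == TARGET else 0.0            # reward: target found
        Q[pos][a] += alpha * (r + gamma * Q[nxt].max() - Q[pos][a])
        pos = nxt
        if r > 0:
            break

print("greedy saccade from (0,0):", ACTIONS[int(np.argmax(Q[0, 0]))])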
More on active learning without a teacher in the overview
pages on the Formal Theory of Creativity
and
Curiosity.
Copyright notice (2011):
Fibonacci web design by
Jürgen Schmidhuber,
who
will be delighted if you use this web page
for educational and non-commercial purposes, including
articles for
Wikipedia and similar sites,
provided you mention the source and provide a link.
Last update 2013
COMPETITION DETAILS
Links to the original datasets of competitions and benchmarks,
plus more information on the world records set by our team:
14.
ICPR 2012 Contest on Mitosis Detection in Breast Cancer Histological Images (MITOS Aperio images).
There were 129 registered companies / institutes / universities from 40 countries, and 14 results.
Our team (with Alessandro & Dan) clearly
won the contest (over 20% fewer errors than the second best team).
13. ISBI 2012
Segmentation of neuronal structures in EM stacks challenge.
See the TrakEM2 data sets of INI.
Our team won the contest on all three evaluation metrics
by a large margin,
with superhuman performance in terms of pixel error (March 2012) [15].
(Ranks 2-6 for researchers at ETHZ, MIT, CMU, Harvard.)
12. IJCNN 2011 on-site
Traffic Sign Recognition Competition (1st rank, 2 August 2011, 0.56% error rate, the only method better than humans, who achieved 1.16% on average; 3rd place for 1.69%) [10,18].
11. INI @ Univ. Bochum's online
German Traffic Sign Recognition Benchmark, won through late night efforts of Dan & Ueli & Jonathan (1st & 2nd rank; 1.02% error rate, January 2011) [10].
10. NORB object recognition dataset, NY University, 2004.
Our team set the new record on the standard set (2.53% error rate) in January 2011 [8],
and achieved 2.7% on the full set [17] (best previous result by others: 5%).
9. The
CIFAR-10 dataset of Univ. Toronto, 2009.
Our team set the
new record (19.51% error rate) on these rather challenging data in January 2011 [8],
and improved this to 11.2% [17].
8. The MNIST dataset of NY University, 1998. Our team set the new record (0.35% error rate) in 2010 [6], tied it again
in January 2011 [8], broke it again in March 2011 (0.31%) [9], and again (0.27%, ICDAR 2011) [12],
and finally achieved the first human-competitive result: 0.23% [17] (mean of many runs; many individual runs
yield better results, of course, down to 0.17% [12]).
7. The Chinese Handwriting Recognition Competition at ICDAR 2011 (offline). Our team won 1st and 2nd rank (CR(1): 92.18% correct; CR(10): 99.29% correct) in June 2011.
Three Connected Handwriting Recognition Competitions at ICDAR 2009 were won by
our multi-dimensional LSTM recurrent neural networks [3,4] through
the efforts of Alex:
6.
Arabic Connected Handwriting Competition of Univ. Braunschweig
5. Handwritten Farsi/Arabic Character Recognition Competition
4.
French Connected Handwriting Competition (PDF) based on data from the RIMES campaign
Note that 4-8 are treated in more detail in the page on handwriting recognition.
3. The
Weizmann Human Action Dataset
of Weizmann Institute of Science, and the KTH Human Action Dataset of KTH Royal
Institute of Technology. New records set in 2010 [7], thanks to Somayeh's efforts.
2.
The Outex Texture Database, Univ. Oulu, 2002 [5].
1.
The ZuBuD database of
pictures of buildings in Zürich, ETHZ, 2003 [2].
A 12 min Google Tech Talk video on fast deep / recurrent nets (slides and voice only)
at AGI 2011 summarizes results as of August 2011.
Our algorithms not only were the first deep learning methods to win international competitions (since 2009) and to become human-competitive, they also have numerous immediate industrial and medical applications.
Are you an industrial company that wants to solve
interesting pattern recognition problems? Don't hesitate to contact
JS.
We have already developed:
1. State-of-the-art handwriting recognition for a software services company.
2. State-of-the-art steel defect detection for the world's largest steel maker.
3. State-of-the-art low-cost pattern recognition for a leading automotive supplier.
4. Low-power variants of our methods for apps running on cell phone chips.