First Superhuman Visual Pattern Recognition 2011: traffic signs

2011: First Superhuman Visual Pattern Recognition
twice as good as humans
three times as good as the closest artificial competitor
six times as good as the best non-neural method

Jürgen Schmidhuber


Our Deep Learning Neural Networks (NN) were the first methods to achieve superhuman pattern recognition in an official international competition (with a secret test set known only to the organisers) [1,2]. This was made possible through the work of my collaborators Dr. Dan Claudiu Ciresan, Dr. Ueli Meier and Jonathan Masci.

At IJCNN 2011 in San Jose (CA), our NN achieved a 0.56% error rate in the IJCNN Traffic Sign Recognition Competition of INI/RUB [14,14b]. Humans got 1.16% on average, over 2 times worse (although some individual humans do better than that). Another NN [13] got 1.69%. The best non-neural (natural or artificial) learner got 3.86%, over 6 times worse.
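Comparisons such as "over 2 times worse" and "over 6 times worse" are ratios of error rates, not of accuracies. A minimal Python sketch that recomputes them from the percentages quoted above; the names `ERROR_RATES` and `error_ratios` are hypothetical, introduced only for this illustration:

```python
# Error rates (%) on the IJCNN 2011 final test set, as quoted in the text.
ERROR_RATES = {
    "our NN (MC GPU-MPCNN)": 0.56,
    "humans (average)": 1.16,
    "other NN [13]": 1.69,
    "best non-neural learner": 3.86,
}


def error_ratios(rates, reference):
    """Return how many times more errors each entry makes than `reference`."""
    base = rates[reference]
    return {name: rate / base for name, rate in rates.items()}


if __name__ == "__main__":
    for name, ratio in error_ratios(ERROR_RATES, "our NN (MC GPU-MPCNN)").items():
        print(f"{name:28s} {ratio:4.2f}x")
```

Running it prints ratios of 2.07 for the human average, 3.02 for the other NN [13] and 6.89 for the best non-neural learner, relative to 1.00 for our NN.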

A few months earlier, our team had already won the qualifying first-stage online competition, albeit by a much smaller margin: 1.02% vs 1.03% for second place [13]. After the deadline, the organisers revealed that human performance on the test set was 1.19%. That is, the best methods already seemed human-competitive. However, during the first stage it was possible to incrementally gain information about the test set by probing it through repeated submissions. This is reflected in the steadily improving results obtained by various teams over time [14a] (the organisers eventually imposed a limit of ten resubmissions). In the final competition [14b] this was not possible.
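Why repeated submissions leak information about a secret test set can be seen in a deliberately crude, hypothetical attack that uses no training data at all: flip one prediction per resubmission and keep the flip only if the public score rises. The sketch below assumes binary labels and an exactly reported accuracy (the real benchmark had many sign classes), and all helper names are invented for illustration. It is not a description of what any participant actually did.

```python
import random


def leaderboard_score(predictions, hidden_labels):
    """Accuracy that a hypothetical public leaderboard reports on the secret test set."""
    correct = sum(p == y for p, y in zip(predictions, hidden_labels))
    return correct / len(hidden_labels)


def probe_leaderboard(hidden_labels, num_resubmissions, seed=0):
    """Improve a submission using only leaderboard feedback, with no training data.

    Starts from random binary guesses. Each resubmission flips one guess and
    keeps the flip only if the reported score went up, so every probe reveals
    one hidden label. Returns (first score, final score).
    """
    rng = random.Random(seed)
    predictions = [rng.randint(0, 1) for _ in hidden_labels]
    first = best = leaderboard_score(predictions, hidden_labels)
    for i in range(min(num_resubmissions, len(predictions))):
        predictions[i] = 1 - predictions[i]       # flip one answer
        score = leaderboard_score(predictions, hidden_labels)
        if score > best:
            best = score                          # the flip fixed a mistake: keep it
        else:
            predictions[i] = 1 - predictions[i]   # the flip broke a correct answer: undo
    return first, best


if __name__ == "__main__":
    rng = random.Random(42)
    secret = [rng.randint(0, 1) for _ in range(1000)]  # the organisers' hidden labels
    for budget in (10, 100, 1000):
        first, final = probe_leaderboard(secret, budget)
        print(f"{budget:5d} resubmissions: accuracy {first:.3f} -> {final:.3f}")
```

In this attack each probe reveals exactly one hidden label, so k resubmissions on n test items can raise the score by at most k/n. Capping resubmissions limits the effect; a final round with no feedback removes it.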

I still remember when in 1997 many thought it a big deal that the human chess world champion Kasparov was beaten by an IBM computer. But back then computers could not compete at all with little kids in visual pattern recognition, which seems much harder than chess from a computational perspective.

Kids are still much better general pattern recognisers, of course. But today our NN can already learn to rival them in important limited domains. And each decade we gain another factor of 100-1000 in raw computational power per unit cost. Deep Learning is here to stay.

Traffic sign recognisers are obviously important for self-driving cars. Twenty years ago, the first fully autonomous cars appeared in traffic (Ernst Dickmanns & Mercedes-Benz, 1994) [3], but for legal and safety reasons a human had to be on board. Superhuman pattern recognition could help make temporarily empty robot taxis acceptable.

To achieve excellent traffic sign recognition, pure supervised gradient descent (40-year-old backprop [4a,4]) was applied [12,7,8] to our GPU-based Deep and Wide Multi-Column Committees of Max-Pooling Convolutional Neural Networks (MC GPU-MPCNN) [5,6]. Each column alternates weight-sharing convolutional layers [10,12] with max-pooling layers [11,11a,7,8] and is topped by fully connected layers [4]; over two decades, LeCun's lab has introduced many improvements to such CNNs. This architecture is fairly plausible biologically, being inspired by early neuroscience-related work [9,10]. Additional tricks are described in the papers [1,2,5,6,13,15].
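The layer structure described above can be made concrete with a small forward pass. The sketch below is a toy, pure-Python illustration under several assumptions: GPU code, backprop training, input preprocessing and the real layer sizes are all omitted; the weights are random; tanh is assumed as the nonlinearity; and the committee simply averages the class probabilities of its columns. All function names are hypothetical; see [1,5,6] for the actual architectures.

```python
import math
import random


def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation: the same (shared) kernel is applied at every position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[r + a][c + b]
                 for a in range(kh) for b in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def conv_layer(maps, kernels, biases):
    """Each output map sums the convolutions of all input maps, adds a bias and applies tanh."""
    outputs = []
    for out_kernels, bias in zip(kernels, biases):
        partial = [conv2d(m, k) for m, k in zip(maps, out_kernels)]
        height, width = len(partial[0]), len(partial[0][0])
        outputs.append([[math.tanh(bias + sum(p[r][c] for p in partial))
                         for c in range(width)]
                        for r in range(height)])
    return outputs


def max_pool(fmap, size=2):
    """Non-overlapping max-pooling: keep the largest value in each size x size block."""
    return [[max(fmap[r + a][c + b] for a in range(size) for b in range(size))
             for c in range(0, len(fmap[0]) - size + 1, size)]
            for r in range(0, len(fmap) - size + 1, size)]


def dense_softmax(features, weights, biases):
    """Fully connected output layer followed by a numerically stable softmax."""
    logits = [b + sum(w * x for w, x in zip(row, features))
              for row, b in zip(weights, biases)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def column_forward(image, params):
    """One column: (convolution -> max-pooling) stages, then a fully connected layer."""
    maps = [image]
    for kernels, biases in params["conv"]:
        maps = [max_pool(m) for m in conv_layer(maps, kernels, biases)]
    features = [v for m in maps for row in m for v in row]
    weights, biases = params["fc"]
    return dense_softmax(features, weights, biases)


def committee_predict(image, columns):
    """Multi-column committee: average the class probabilities of all columns."""
    outputs = [column_forward(image, params) for params in columns]
    return [sum(per_class) / len(outputs) for per_class in zip(*outputs)]


def random_column(rng, in_size=16, maps=(4, 6), k=3, classes=5):
    """Random weights for a toy column (a real column would be trained by backprop)."""
    params, channels, size = {"conv": []}, 1, in_size
    for n_out in maps:
        kernels = [[[[rng.uniform(-0.5, 0.5) for _ in range(k)] for _ in range(k)]
                    for _ in range(channels)]
                   for _ in range(n_out)]
        params["conv"].append((kernels, [0.0] * n_out))
        channels, size = n_out, (size - k + 1) // 2  # conv shrinks, pooling halves
    n_features = channels * size * size
    params["fc"] = ([[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                     for _ in range(classes)], [0.0] * classes)
    return params


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[rng.random() for _ in range(16)] for _ in range(16)]  # stand-in for a sign image
    columns = [random_column(rng) for _ in range(3)]
    probs = committee_predict(image, columns)
    print("class probabilities:", [round(p, 3) for p in probs])
    print("predicted class:", probs.index(max(probs)))
```

Each convolutional layer reuses one small kernel at every image position (weight sharing), each max-pooling layer keeps only the strongest response in every 2x2 block, and the fully connected layer maps the pooled features to class probabilities. Averaging several columns' outputs is what turns single networks into a committee.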

Our deep and wide MC GPU-MPCNN was also the first system to achieve human-competitive performance [6], with an error rate of around 0.2% on MNIST handwritten digits [12], perhaps the most famous benchmark of Machine Learning. Most if not all leading IT companies and research labs are now using our combination of techniques, too. Compare [15].

Can you spot the visual Fibonacci pattern in the graphics above?


References

[1] D. C. Ciresan, U. Meier, J. Masci, J. Schmidhuber. Multi-Column Deep Neural Network for Traffic Sign Classification. Neural Networks 32: 333-338, 2012.

[2] D. C. Ciresan, U. Meier, J. Masci, J. Schmidhuber. A Committee of Neural Networks for Traffic Sign Classification. International Joint Conference on Neural Networks (IJCNN-2011, San Jose), 2011.

[3] J. Schmidhuber. Highlights of robot car history, 2005.

[4] P. J. Werbos. Applications of advances in nonlinear sensitivity analysis. In R. Drenick, F. Kozin (eds): System Modeling and Optimization: Proc. IFIP (1981), Springer, 1982.

[4a] S. Linnainmaa. The representation of the cumulative rounding error of an algorithm as a Taylor expansion of the local rounding errors. Master's Thesis (in Finnish), Univ. Helsinki, 1970. See chapters 6-7 and FORTRAN code on pages 58-60.

[5] D. C. Ciresan, U. Meier, J. Masci, L. M. Gambardella, J. Schmidhuber. Flexible, High Performance Convolutional Neural Networks for Image Classification. International Joint Conference on Artificial Intelligence (IJCAI-2011, Barcelona), 2011.

[6] D. C. Ciresan, U. Meier, J. Schmidhuber. Multi-column Deep Neural Networks for Image Classification. Proc. IEEE Conf. on Computer Vision and Pattern Recognition CVPR 2012, p 3642-3649, 2012. Longer TR: arXiv:1202.2745v1 [cs.CV].

[7] M. A. Ranzato, Y. LeCun. A Sparse and Locally Shift Invariant Feature Extractor Applied to Document Images. Proc. ICDAR, 2007.

[8] D. Scherer, A. Mueller, S. Behnke. Evaluation of pooling operations in convolutional architectures for object recognition. In Proc. ICANN 2010.

[9] D. H. Hubel, T. N. Wiesel. Receptive Fields, Binocular Interaction and Functional Architecture in the Cat's Visual Cortex. Journal of Physiology, 1962.

[10] K. Fukushima. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4): 193-202, 1980.

[11] J. Weng, N. Ahuja, T. S. Huang. Cresceptron: a self-organizing neural network which grows adaptively. International Joint Conference on Neural Networks (IJCNN), vol 1, p 576-581, 1992.

[11a] M. Riesenhuber, T. Poggio. Hierarchical models of object recognition in cortex. Nature Neuroscience 2(11): 1019-1025, 1999.

[12] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Computation, 1(4):541-551, 1989.

[13] P. Sermanet, Y. LeCun. Traffic sign recognition with multi-scale convolutional networks. Proc. IJCNN 2011, p 2809-2813, IEEE, 2011.

[14] INI Benchmark Website: The German Traffic Sign Recognition Benchmark

[14a] Qualifying for IJCNN 2011 competition: results of 1st stage (January 2011)

[14b] Results for IJCNN 2011 competition (2 August 2011)

[15] J. Schmidhuber. Deep Learning since 1991.