The CPU/GPU software for training and testing Deep Neural Networks (DNNs) that I have been developing since 2008 is state of the art for segmented object classification (see the publications, benchmarks and competitions on the main page).
Features:
Portable: C++ code that runs on either CPU (can be recompiled to run on any platform) or GPU (CUDA). OpenCL version planned.
Stable.
Scalable: up to thousands of classes and gigabytes of data.
Very fast: 100-20000 images/s, depending on hardware, image size and net complexity.
Adaptable: the only thing required for learning a new task is a training dataset with labeled images.
Small memory footprint: less than 2MB for the executable and from 0.1MB to 15MB for the DNN model.
Ready for deployment in real applications like:
Various handwritten character recognition tasks (digits, Latin and Chinese characters, any other symbols).
Automotive: traffic sign classification, number plate recognition.
Applications that require object classification (e.g. cell detection, face classification, etc.).
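
To put the speed and model-size figures above into perspective, here is a minimal back-of-the-envelope sketch in C++. It is not part of the software itself: the helper names are hypothetical, and it assumes that the model stores each weight as a 32-bit float with no header or compression and that 1MB means 1024 * 1024 bytes. None of these details is stated above.

```cpp
#include <cstddef>

// Assumption: each weight is stored as a 32-bit float (4 bytes).
constexpr double kBytesPerWeight = 4.0;

// Assumption: 1MB = 1024 * 1024 bytes.
constexpr double kBytesPerMB = 1024.0 * 1024.0;

// Hypothetical helper: approximate number of weights that fit in a model
// of the given size, ignoring any file header or compression.
constexpr std::size_t weights_for_model_mb(double size_mb) {
    return static_cast<std::size_t>(size_mb * kBytesPerMB / kBytesPerWeight);
}

// Hypothetical helper: seconds needed to process `images` images
// at a sustained throughput of `images_per_second`.
constexpr double seconds_for_images(double images, double images_per_second) {
    return images / images_per_second;
}
```

Under these assumptions, a 15MB model corresponds to roughly 3.9 million weights and a 0.1MB model to roughly 26 thousand. Classifying one million images would take about 50 seconds at 20000 images/s and about 10000 seconds at 100 images/s.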