The CPU/GPU software for training and testing Deep Neural Networks (DNNs) that I have been developing since 2008 is state of the art for segmented object classification (see the publications, benchmarks and competitions on the main page). Some of its details are presented in my papers; it also contains unpublished work.
Features:
Portable: C++ code that runs on either the CPU (it can be recompiled for any platform) or the GPU (CUDA). An OpenCL version is in development.
Stable.
Scalable: up to thousands of classes and gigabytes of data.
Very fast: 100 to 20,000 images/s, depending on the hardware, image size and network complexity. For even greater speed, the CPU code can optionally run on multiple cores and use hand-optimized NEON, SSE, AVX or AVX2 code.
Adaptable: the only requirement for learning a new task is a training dataset with labeled images.
Small memory footprint: less than 2MB for the executable and from 0.1MB to 15MB for the DNN model.
Ready for deployment in industrial applications such as:
Various handwritten character recognition tasks (digits, Latin and Chinese characters, or any other symbols).
Automotive: traffic sign detection and classification, number plate recognition, lane detection, pedestrian detection, car detection.
Applications that require object classification (e.g. cell detection, face classification, etc.).