Using Frameworks in Machine Learning

Published in Military Embedded Systems

An Industry Perspective from Curtiss-Wright Defense Solutions

A framework is a toolbox for creating, training, and validating deep-learning neural networks. Using a high-level programming API, it hides the complexities of the underlying algorithms to greatly simplify and speed up development. Like deep learning, frameworks are evolving rapidly. This column will focus on frameworks that work with NVIDIA's TensorRT, a tool for deploying high-performance deep neural networks.

Currently, the most popular framework is TensorFlow, an open-source software library created and supported by Google. Its primary interface is Python, with additional support for languages such as C++ and R. A highly flexible system capable of running on multiple CPUs and GPUs, it can be used on desktop computers, servers, or mobile devices without rewriting code. TensorFlow represents models as computation graphs and supports visualizing them: a matrix multiplication, for example, is represented by a node, its two incoming edges correspond to the two matrices being multiplied, and its outgoing edge carries the result. Because TensorFlow is a low-level library, creating models can be challenging and complex.
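As a minimal sketch of that graph idea, the snippet below builds a single matrix-multiplication node in TensorFlow; the two constant tensors are its incoming edges and the product is its outgoing edge (the tensor values are arbitrary illustrations):

    import tensorflow as tf

    # Two constant tensors: the incoming edges of the graph node
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
    b = tf.constant([[5.0, 6.0], [7.0, 8.0]])

    # The matmul operation is a node; its result is the outgoing edge
    c = tf.matmul(a, b)
    print(c)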

Keras, also written in Python, is a simplified interface that enables efficient neural nets to be built with minimal code. Because it runs on top of TensorFlow, MXNet, and other frameworks, Keras is surging in popularity. This user-friendly framework keeps its API small, and its modular design eases model construction. Although it is less configurable than lower-level frameworks, Keras works well for beginners who want to learn how to use various machine-learning models and quickly understand how they work.
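To illustrate how compact a Keras model can be, here is a minimal sketch of a small classifier (the layer sizes are arbitrary, and a TensorFlow back end is assumed):

    from tensorflow import keras

    # A small fully connected classifier in a few lines of code
    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(784,)),
        keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()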

PyTorch is based on Torch, a Lua-based open-source machine-learning library. Developed by Facebook's artificial-intelligence research group, PyTorch offers powerful tensor computation with strong GPU acceleration support. With basic Python knowledge, users can build deep-learning models without a steep learning curve. Its architecture simplifies the deep-modeling process and offers more transparency than Torch. It also supports both data parallelism and distributed learning. With its many pre-trained models, PyTorch makes a good choice for prototyping and small projects (a C++ interface is in beta testing).
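The sketch below, a minimal illustration rather than a recommended design, shows PyTorch's style: a small network is an ordinary Python class, and moving its tensor computation to a GPU is a one-line change (layer sizes are arbitrary):

    import torch
    import torch.nn as nn

    # A small network defined as an ordinary Python class
    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(784, 64)
            self.fc2 = nn.Linear(64, 10)

        def forward(self, x):
            return self.fc2(torch.relu(self.fc1(x)))

    net = Net()
    x = torch.randn(1, 784)  # a random input tensor
    # net = net.to("cuda"); x = x.to("cuda")  # would move the computation to a GPU
    print(net(x).shape)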

MXNet is designed for high efficiency, productivity, and flexibility. Supporting multiple popular programming languages – including Python, R, C++, and Julia – MXNet lets users train a deep-learning model without having to learn a new language. Like PyTorch, its back end is written in C++ and CUDA. Supporting recurrent neural networks (RNNs), convolutional neural networks (CNNs), and long short-term memory (LSTM) networks, MXNet is touted for its imaging, handwriting/speech recognition, and forecasting capabilities. It scales well across multiple CPUs and GPUs, making it useful for enterprise solutions. Gluon, a simplified front end for MXNet in the spirit of Keras, also supports a model zoo of predefined and pre-trained models.
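As a minimal sketch of the Gluon front end (layer sizes arbitrary, run imperatively for simplicity):

    from mxnet import nd
    from mxnet.gluon import nn

    # A small feed-forward network built from Gluon's layer blocks
    net = nn.Sequential()
    net.add(nn.Dense(64, activation="relu"),
            nn.Dense(10))
    net.initialize()  # allocate and initialize parameters

    out = net(nd.random.uniform(shape=(1, 784)))  # input shapes are inferred on first use
    print(out.shape)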

The original Caffe framework is best known for solving image-processing tasks, especially visual recognition, but does not perform well for non-vision network design, reduced math precision, or distributed computation. It provides interfaces for MATLAB, C, C++, and Python, along with a model zoo of pre-trained networks. To address these shortcomings, Facebook created Caffe2 to support its applications; currently, Caffe2 is being merged into PyTorch. NVIDIA maintains a separate fork of Caffe ("NVIDIA Caffe" or "NVCaffe") tuned for multiple-GPU configurations and mixed-precision support. It features layer-wise adaptive rate control (LARC) with an adaptive global gradient scaler for improved accuracy, especially for 16-bit floating-point training.
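For context, inference with the original Caffe typically pairs a network-definition file with trained weights. The sketch below uses the pycaffe interface; the file names are placeholders, the input blob is assumed to use the conventional name "data", and random input stands in for a real image:

    import numpy as np
    import caffe  # pycaffe, the Python interface to Caffe

    caffe.set_mode_gpu()  # or caffe.set_mode_cpu()

    # Placeholder file names: a network definition and its trained weights
    net = caffe.Net("deploy.prototxt", "weights.caffemodel", caffe.TEST)

    # Feed random data shaped like the input blob, then run a forward pass
    net.blobs["data"].data[...] = np.random.rand(*net.blobs["data"].data.shape)
    out = net.forward()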

NVIDIA TensorRT is a high-performance inference engine designed to deliver maximum throughput, low latency, and power efficiency in the deployed network. It provides APIs in C++ and Python. Trained models are optimized by first removing layers whose outputs are unused, and then fusing and aggregating the remaining layers. The model is then calibrated to run at lower precision (such as INT8 or FP16). For example, a TensorFlow CNN running on an NVIDIA V100 can process 305 images/second; optimized with TensorRT, the same CNN processes 5,700 images/second. (An NVIDIA comparison of framework training and inference performance is available at https://developer.nvidia.com/deep-learning-performance-training-inference)
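As a rough sketch of that build flow using TensorRT's Python API (exact calls vary across TensorRT versions; "model.onnx" is a placeholder for a trained model exported to ONNX):

    import tensorrt as trt

    logger = trt.Logger(trt.Logger.WARNING)
    builder = trt.Builder(logger)
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))

    # Parse the trained model ("model.onnx" is a placeholder file name)
    parser = trt.OnnxParser(network, logger)
    with open("model.onnx", "rb") as f:
        parser.parse(f.read())

    # Request reduced precision; TensorRT fuses layers and selects fast kernels
    config = builder.create_builder_config()
    config.set_flag(trt.BuilderFlag.FP16)

    engine = builder.build_serialized_network(network, config)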

All of these frameworks are open source and available on GitHub, and models built with them can be deployed using NVIDIA's TensorRT. The Caffe, TensorFlow, PyTorch, and MXNet frameworks are supported by Bright Cluster Manager in Curtiss-Wright's OpenHPEC Accelerator Suite of development tools.

Choosing the right framework depends on the type of network being developed, the preferred programming languages and tools, and the user's skill set.