


The VPX3-4936 GPU Processor Card is a powerful 3U OpenVPX™ GPU coprocessing engine developed in alignment with the SOSA technical standard. Based on the latest generation NVIDIA Ampere architecture, the VPX3-4936 is ideally suited for artificial intelligence (AI) and machine learning technologies as well as intensive signal processing in the aerospace and defense industries.
Key Features
- NVIDIA GA104 (RTX4500E) delivering 17.66 TFLOPS FP32 peak performance
- Capable of 154 GFLOPS/W
- 5888 CUDA® Cores for parallel processing
- 184 Tensor Cores for dedicated AI-accelerated compute
- 46 RT Cores for superior rendering speed
- 16 GB GDDR6 256-bit memory
- 512 GB/s maximum memory bandwidth
- Configurable PCIe® Gen4 x16 Switch
- Configurable operating power hard cap: 40-180 W
- SOSA and legacy OpenVPX variants
- 4 simultaneous video outputs, supporting DP, DVI, HDMI
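The headline figures in this list can be cross-checked with simple arithmetic. The sketch below is illustrative only: it assumes the common convention of counting one fused multiply-add as 2 FLOPs per CUDA core per clock, and the clock rate, per-pin memory data rate, and power figure it derives are implied values, not published specifications.

```python
# Figures quoted in the feature list above.
CUDA_CORES = 5888
PEAK_FP32_TFLOPS = 17.66
MEMORY_BUS_BITS = 256
PEAK_BANDWIDTH_GB_S = 512
EFFICIENCY_GFLOPS_PER_W = 154

# Assumption (not from the datasheet): each core retires one fused
# multiply-add per clock, conventionally counted as 2 FLOPs.
FLOPS_PER_CORE_PER_CLOCK = 2


def implied_clock_ghz() -> float:
    """Clock rate at which the quoted core count yields the quoted peak."""
    return PEAK_FP32_TFLOPS * 1e3 / (CUDA_CORES * FLOPS_PER_CORE_PER_CLOCK)


def implied_pin_rate_gbit_s() -> float:
    """Per-pin data rate needed to reach the quoted bandwidth on the quoted bus."""
    return PEAK_BANDWIDTH_GB_S * 8 / MEMORY_BUS_BITS


def implied_power_w() -> float:
    """Power draw at which peak FP32 throughput matches the quoted GFLOPS/W."""
    return PEAK_FP32_TFLOPS * 1e3 / EFFICIENCY_GFLOPS_PER_W


print(f"implied clock:    {implied_clock_ghz():.2f} GHz")
print(f"implied pin rate: {implied_pin_rate_gbit_s():.1f} Gbit/s")
print(f"implied power:    {implied_power_w():.0f} W")
```

Under these assumptions the implied power lands inside the 40-180 W configurable range, which suggests the quoted figures are mutually consistent.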
Applications
- ISR and EW applications requiring the highest performance processing
- SWaP-constrained deep learning inference needing 2x-4x more throughput than previous-generation GPUs
- High-performance RADAR, SIGINT, EO/IR, sensor fusion, and autonomous platforms
VPX3-4936 GPU Processor Card, Aligned with the SOSA Technical Standard
The VPX3-4936 delivers unparalleled processing power for AI and machine learning workloads. Its Ampere GPU provides 5888 CUDA cores, 184 Tensor cores, and 46 ray tracing (RT) cores for superior rendering speeds. The Ampere architecture delivers twice the throughput of previous generations, rising to as much as four times when the streaming multiprocessors can keep all of the CUDA cores busy.
The VPX3-4936 delivers:
- Next-generation Ampere GPU architecture
- Nearly twice the performance per watt (154 GFLOPS/W) of previous Turing GPUs (86 GFLOPS/W)
- An open standards approach developed in alignment with the SOSA technical standard
- 512 GB/s maximum memory bandwidth, the highest available in an embeddable GPU
- PCIe Gen4, doubling the host interface bandwidth to help eliminate data throughput bottlenecks
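Two of the comparisons in this list reduce to short calculations. The sketch below works them through, using the quoted efficiency figures and the standard PCIe signalling rates (8 GT/s per lane for Gen3, 16 GT/s for Gen4, both with 128b/130b encoding). The x16 link width is taken from the feature list. The results are theoretical per-direction maxima before protocol overhead, not measured throughput.

```python
AMPERE_GFLOPS_PER_W = 154  # quoted for the VPX3-4936
TURING_GFLOPS_PER_W = 86   # quoted for previous Turing GPUs

ENCODING_EFFICIENCY = 128 / 130  # 128b/130b line coding (Gen3 and Gen4)


def pcie_gbytes_per_s(gt_per_s: float, lanes: int) -> float:
    """Theoretical one-direction link bandwidth in GB/s."""
    return gt_per_s * lanes * ENCODING_EFFICIENCY / 8


efficiency_gain = AMPERE_GFLOPS_PER_W / TURING_GFLOPS_PER_W
gen3_x16 = pcie_gbytes_per_s(8, 16)
gen4_x16 = pcie_gbytes_per_s(16, 16)

print(f"efficiency gain over Turing: {efficiency_gain:.2f}x")
print(f"PCIe Gen3 x16: {gen3_x16:.2f} GB/s per direction")
print(f"PCIe Gen4 x16: {gen4_x16:.2f} GB/s per direction")
```

The efficiency ratio works out to roughly 1.8x, and the Gen4 link carries exactly twice the Gen3 figure because only the per-lane signalling rate changes.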
More Flexible FP32 Data Processing
The Ampere architecture of the VPX3-4936 delivers more flexible concurrent execution of floating-point and integer instruction streams. Ampere provides two data paths: one dedicated to FP32 data, and one that can execute either INT32 or FP32 operations. Applications can therefore use all the CUDA cores on the processor by processing FP32 data on both paths.
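A toy scheduling model can make the benefit of the second, flexible path concrete. The sketch below is an idealized illustration, not a description of the actual hardware scheduler: the lane count per path (16) is an assumed value, the function names are hypothetical, and the baseline with one FP32-only path and one INT32-only path is an assumed stand-in for the previous generation.

```python
from math import ceil

LANES_PER_PATH = 16  # assumed width of each data path (illustrative)


def cycles_fixed_paths(fp32_ops: int, int32_ops: int) -> int:
    """Baseline: one FP32-only path and one INT32-only path run side by side."""
    return max(ceil(fp32_ops / LANES_PER_PATH),
               ceil(int32_ops / LANES_PER_PATH))


def cycles_flexible_path(fp32_ops: int, int32_ops: int) -> int:
    """Ampere-style: one FP32-only path plus one path taking INT32 or FP32.

    INT32 work can only use the flexible path, so it sets one lower bound;
    the total work spread over both paths sets the other.
    """
    total_ops = fp32_ops + int32_ops
    return max(ceil(int32_ops / LANES_PER_PATH),
               ceil(total_ops / (2 * LANES_PER_PATH)))


for fp32_ops, int32_ops in [(1024, 0), (768, 256), (512, 512)]:
    fixed = cycles_fixed_paths(fp32_ops, int32_ops)
    flexible = cycles_flexible_path(fp32_ops, int32_ops)
    print(f"FP32={fp32_ops:4d} INT32={int32_ops:4d} "
          f"fixed={fixed} flexible={flexible} speedup={fixed / flexible:.2f}x")
```

In this model a pure FP32 workload finishes in half the cycles, while an even FP32/INT32 mix sees no change, which is consistent with the gain depending on how much of the work is FP32.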
Enhanced Tensor Cores for Faster AI Deep Learning
Previous generations of Tensor cores provided significant improvements in the speed of tensor/matrix computation used for deep learning neural network training and inference operations. The third-generation Tensor cores in the Ampere architecture of the VPX3-4936 support new features and datatypes that improve performance, efficiency, and programming flexibility. Depending on the workload, these new Tensor cores deliver two to four times more throughput than previous generations.
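The core operation a Tensor core accelerates is a small matrix multiply-accumulate, D = A x B + C, with reduced-precision inputs. The sketch below emulates that idea in plain Python using TF32 as an example input type. This is an assumption for illustration: the text above does not name which datatypes are supported, the function names are hypothetical, inputs are truncated rather than rounded, and accumulation here uses Python's 64-bit floats rather than hardware accumulators.

```python
import struct


def to_tf32(x: float) -> float:
    """Truncate a value to TF32 precision (8 exponent bits, 10 mantissa bits)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # FP32 has 23 mantissa bits; clear the low 13
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def matmul_accumulate(a, b, c):
    """Return D = A x B + C, reducing A and B to TF32 before multiplying."""
    rows, inner, cols = len(a), len(b), len(b[0])
    d = [[float(c[i][j]) for j in range(cols)] for i in range(rows)]
    for i in range(rows):
        for j in range(cols):
            for k in range(inner):
                d[i][j] += to_tf32(a[i][k]) * to_tf32(b[k][j])
    return d


a = [[1.0, 2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
ones = [[1.0, 1.0], [1.0, 1.0]]
print(matmul_accumulate(a, identity, ones))
```

Dropping mantissa bits from the inputs while keeping a wider accumulator is the general trade that lets such units raise throughput at a modest cost in precision.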