Asking the right questions about HPEC software development tools for radar, SIGINT, and EW applications
Published in Military Embedded Systems
High performance embedded computing (HPEC) system designers tasked with architecting large-scale supercomputer-class processing systems for radar, signal intelligence (SIGINT), and electronic warfare (EW) applications depend greatly on the software development tools available to them. The choice of development tools - such as debuggers, profilers, and cluster managers - becomes an intimate relationship; often that choice determines the success or failure of the system design.
How, then, does a system designer select the tools upon which so much depends? In keeping with the relationship metaphor, let's treat the selection process as an interview, weighing each tool's benefits and attributes as one might question a potential life partner. When the right questions are asked, the replies can be a revelation.
What follows is an "interview" of the sort that a system designer should undertake when considering the selection of these critical tools, using as examples Allinea's debugger and profiler and Bright Computing's Cluster Manager, software developed for use with supercomputers in the commercial High Performance Computing (HPC) market. Knowing the right questions to ask can make all the difference in speeding development and reducing program risk when designing an HPEC system. System designers should consider asking the following types of questions when selecting their suite of HPEC development tools. The answers can result in a beautiful relationship or in heartbreak, a successful outcome or a sad failure of the project.
Q: (System Designer): How large is your user base and are you proprietary?
A: (HPEC Development Tool Suite): I use open standards-based tools already proven in the realm of supercomputers. Imagine all those new engineers you can hire that will already be familiar with me.
Q: Can you handle diversity?
A: Yes, I am very flexible: I handle a broad mixture of types, from single-board computers to digital signal processors (DSPs) and graphics processing units (GPUs). I also scale easily across multiple processors and boards, and I work with both 6U and 3U OpenVPX systems.
Q: Can you speak multiple languages?
A: Of course: C, C++, CUDA, OpenCL, five dialects of MPI, and OpenMP. My other communication skills should also pique your interest, as they include InfiniBand, TCP/IP sockets (regular and encrypted), and RoCE, to name a few. I also speak CentOS and Red Hat.
Q: How about setting up and managing the system?
A: I’m a relationship manager. With Bright Computing’s Cluster Manager (Figure 1), the initial setup of all system resources – including the operating systems, boards, disks, and networks – can be achieved by answering a few simple questions. You know, all that “small talk” such as provisioning, memory size and allocation, and the like. Using image-based provisioning, software build images can be maintained for different board types, configurations, and developers. The revision-control feature enables the user to track changes to software images using standardized methods. Using the same controls, the user can add, delete, or move boards, or change network configurations, with equal ease.
Q: Can you monitor my system’s health, check my pulse, and provide status checks?
A: I can provide both a visual status of the entire system and a log. Temperature, CPU loading, and disk space for each card are just a few of the parameters that can be monitored. Boot-up messages for all of the boards, along with any errors and warnings, are captured. For any device that can be monitored, an action (either a standalone script or a built-in command) will be executed when the condition is met.
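The threshold-and-action pattern described above can be sketched generically. This is a minimal illustration of the idea only, not Bright Cluster Manager's actual API; the names `Rule` and `check_rules` are hypothetical.

```python
# Generic sketch of metric-threshold monitoring with triggered actions.
# NOT the vendor's API; all names here are hypothetical.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    metric: str                           # e.g. "temperature_c"
    threshold: float                      # trigger when value exceeds this
    action: Callable[[str, float], None]  # script or built-in command to run

def check_rules(samples: dict, rules: list) -> list:
    """Run each rule's action when its metric exceeds the threshold."""
    fired = []
    for rule in rules:
        value = samples.get(rule.metric)
        if value is not None and value > rule.threshold:
            rule.action(rule.metric, value)
            fired.append(rule.metric)
    return fired

alerts = []
rules = [Rule("temperature_c", 85.0, lambda m, v: alerts.append((m, v))),
         Rule("disk_used_pct", 90.0, lambda m, v: alerts.append((m, v)))]
fired = check_rules({"temperature_c": 91.5, "disk_used_pct": 42.0}, rules)
print(fired)  # only the temperature rule trips
```

In a real deployment the action would invoke a standalone script or built-in command, as the answer above describes.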
Q: Can you manage my power?
A: Through the use of power distribution units (PDUs), intelligent platform management interface (IPMI)-based power control, or software daemons, I can manage the power to the system’s nodes. For devices that cannot be controlled through any of the standard existing power control options, a custom power management script can be created and invoked. To conserve power during certain scenarios, I can control CPU core frequencies and power up certain nodes in a predefined sequence or by node groups. For example, while a drone is sitting on the tarmac, the group of nodes designated as “ground control” could be turned on at power-up. Once in the air, the rest of the nodes can be brought online.
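The staged power-up by node group from the drone example above might look like the following sketch. The node and group names are hypothetical; the `ipmitool` command syntax is the standard IPMI chassis power control, but the script runs here in dry-run mode.

```python
# Sketch of staged node power-up by group (hypothetical node/group names).
# Uses the real ipmitool CLI syntax for IPMI chassis power control,
# but builds the commands in dry-run mode rather than executing them.
import subprocess

NODE_GROUPS = {
    "ground_control": ["node01-bmc", "node02-bmc"],
    "mission":        ["node03-bmc", "node04-bmc", "node05-bmc"],
}

def power_on_group(group: str, dry_run: bool = True) -> list:
    """Build (and optionally run) one ipmitool power-on command per node."""
    commands = []
    for bmc in NODE_GROUPS[group]:
        cmd = ["ipmitool", "-H", bmc, "-U", "admin", "chassis", "power", "on"]
        commands.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands

# On the tarmac: bring up only the ground-control group.
tarmac_cmds = power_on_group("ground_control")
# Once airborne: bring the remaining nodes online.
airborne_cmds = power_on_group("mission")
print(len(tarmac_cmds), len(airborne_cmds))
```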
Q: Can you help me with fault tolerance?
A: Because of my monitoring and power-management capabilities, I can detect a failure and handle the switching of the hardware and the rerouting of the data flow. So far, I can tell you that your memory is up to speed, and your I/O is flowing freely.
Q: How are you at debugging?
A: HPEC designers need a true system debugger, one that can debug and control threads and processes, both individually and collectively, as defined by variable or expression values, current code location, or process state. This includes setting breakpoints, stepping, or playing individual or predefined groups of threads on a single board or across boards. My memory debugging detects dangling pointers, finds memory leaks, fixes misuse of stack and heap memory, and catches out-of-bounds data accesses. My ability to log variables and events in the background without affecting system timing helps catch nonrepeatable bugs by allowing the system to collect data overnight, or however long it takes for the problem to occur. I can diagnose deadlocks, livelocks, and message synchronization errors with both graphical and table-based message queue displays. I also have the ability to debug and profile across multiple GPUs.
Q: Can you help me verify the correctness of my data?
A: How does automatic change detection of variables, smart highlighting, and graphs of values across threads and processes sound? I can also graph any data (including multidimensional data) in the system. Imagine being able to plot your data before and after every filter and FFT, and compare it to your MATLAB results or data previously gathered from your sensor. I can also generate statistical analysis of the data structures; no longer will you have to search your data looking for NaNs ["not a number" values].
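The statistical scan described above can be illustrated generically: summarize a data buffer and flag NaNs without manually searching the array. This is plain NumPy, not the Allinea tool itself.

```python
# Generic sketch of a data-buffer summary with NaN detection (NumPy,
# not the vendor's tool).
import numpy as np

def summarize(buffer: np.ndarray) -> dict:
    """Return basic statistics plus NaN count/locations for a data buffer."""
    nan_mask = np.isnan(buffer)
    valid = buffer[~nan_mask]
    return {
        "nan_count": int(nan_mask.sum()),
        "nan_indices": np.flatnonzero(nan_mask).tolist(),
        "min": float(valid.min()),
        "max": float(valid.max()),
        "mean": float(valid.mean()),
    }

samples = np.array([0.5, 1.5, np.nan, 2.5, np.nan, 3.0])
stats = summarize(samples)
print(stats["nan_count"], stats["nan_indices"])  # 2 [2, 4]
```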
Q: Can you guide me through the optimization of my code?
A: By monitoring the retired-instruction pipeline, I am able to provide advanced profiling functionality without requiring any application code changes. Unlike the classic trace-based performance tools, I will not drown you in data. Adaptive sampling rates, combined with on-cluster merge technology, ensure that the right amount of data is recorded. I will show the functions and source code lines that consume the most time. We will discover memory bottlenecks together over time. I will help you balance the CPU processing cycles, I/O accesses, and memory fetches. We will use the CPU performance extensions to your best advantage, as well as the mapping of the threads to the cores.
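The hot-spot reporting described above can be tried on any Python code with the standard-library profiler; this is a generic illustration of finding the functions that consume the most time, not the Allinea profiler.

```python
# Generic hot-spot profiling with the Python standard library
# (illustrative only; not the vendor's profiler).
import cProfile
import io
import pstats

def filter_taps(n):
    # Deliberately busy inner loop standing in for a compute kernel.
    return sum(i * i for i in range(n))

def pipeline():
    return [filter_taps(20_000) for _ in range(50)]

profiler = cProfile.Profile()
profiler.enable()
pipeline()
profiler.disable()

stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
stats.print_stats(5)            # top five entries by cumulative time
report = stream.getvalue()
print("filter_taps" in report)  # the hot function shows up in the report
```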
Q: Can you help me with system BIT [built-in test]?
A: I can provide a toolbox to create a system-level BIT to help you customize your system. My framework provides application program interfaces (APIs), and a command line interface (CLI), to create PBIT/CBIT/IBIT by using the tests included in the board support package and other tests scripted by you. I can also analyze the results and report back to you. The framework can support custom sequences of the test, as well as custom tests using single or multiple processors.
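A custom BIT sequence of the kind described above can be sketched as ordered test functions with collected pass/fail results. All names here are hypothetical, not the vendor's actual framework API.

```python
# Minimal sketch of a BIT sequencing framework: ordered tests,
# pass/fail results, and a summary. Hypothetical names throughout.

def test_memory():
    return True          # stand-in for a BSP memory test

def test_fabric_link():
    return True          # stand-in for a fabric loopback test

def run_bit(sequence):
    """Run tests in order and collect {name: passed} results."""
    results = {}
    for test in sequence:
        try:
            results[test.__name__] = bool(test())
        except Exception:
            results[test.__name__] = False
    return results

# A custom power-up BIT (PBIT) sequence:
pbit = run_bit([test_memory, test_fabric_link])
passed = all(pbit.values())
print(pbit, passed)
```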
Q: What can you tell me about the throughput and latency of my system?
A: My Dataflow tool not only shows the processor loading and temperature, but also their relationship to latency and throughput for PCIe, InfiniBand, and Ethernet. Supported APIs include IB Verbs, RoCE, MPI, and both regular and encrypted TCP/IP sockets. In addition to helping verify your data movements, it will also be useful in your testing. The results can be displayed in a real-time graph and/or stored in CSV format. Open standards are supported, with the inclusion of both the VSIPL and FFTW APIs, while the underlying function calls have been optimized for AVX2, with support for both single-threaded and multithreaded versions.
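The CSV export mentioned above amounts to logging per-link samples in a tabular form such as the following sketch; the field names are hypothetical, and this is generic Python rather than the Dataflow tool's own output format.

```python
# Sketch of per-link throughput/latency samples logged to CSV
# (hypothetical field names; generic Python csv module).
import csv
import io

samples = [
    {"t_s": 0.0, "link": "pcie0", "throughput_gbps": 11.8, "latency_us": 3.2},
    {"t_s": 0.0, "link": "ib0",   "throughput_gbps": 54.1, "latency_us": 1.1},
    {"t_s": 1.0, "link": "pcie0", "throughput_gbps": 12.0, "latency_us": 3.1},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["t_s", "link",
                                            "throughput_gbps", "latency_us"])
writer.writeheader()
writer.writerows(samples)
csv_text = buffer.getvalue()
print(csv_text.splitlines()[0])  # header row
```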
Q: Tell me more about your GPU support.
A: So, are you into deep learning? The Bright Cluster Manager provides the provisioning and monitoring for GPUs, as well as support for all GPU programming models. Administrators can have direct access to the performance-enhancing NVIDIA GPU Boost technology. There is also automatic synchronization with the latest NVIDIA CUDA software (verified for your environment), and a fully configured modules environment for NVIDIA GPU clusters.
Q: Can I work remotely on the system or do I have to be next to the hardware in a noisy lab?
A: You can work from your desktop or laptop. And you no longer have to fight a tangle of serial cables, or juggle multiple terminal windows, while configuring, monitoring, and debugging.

System developers who make the effort to conduct this type of dialogue with their prospective commercial off-the-shelf (COTS) hardware and software development tool suppliers are more likely to end up in a productive relationship that results in a radar, SIGINT, or EW solution that meets their unique requirements, reduces program risk, and gets deployed without avoidable delays.