Zen and the Art of HPEC Software Debugging

September 18, 2017

Published in Military Embedded Systems

The debate over art versus science is an old one: some engineers take a romantic approach to debugging, while others take a more traditional one. Whether debugging is an art, a science, or a combination of both will continue to be debated, but all sides can agree that tools can make the difference between timely success and riding a metaphorical motorcycle down the road of failure.

The author of the classic novel “Zen and the Art of Motorcycle Maintenance,” Robert L. Pirsig, passed away in April 2017. His book, first published in 1974, has inspired many readers with its interweaving of serious philosophical explorations into the tale of a father and son coming to understand each other during a motorcycle road trip across the U.S. The ideas in Pirsig’s writing could encourage a view of software debugging that integrates art and science, so that the high-performance embedded computing (HPEC) system developer can successfully tap both process and imagination to solve daunting everyday problems.

There has always been a debate about whether software debugging is an art or a science. Some engineers take the romantic approach to software integration: they write the code and debug with inspiration and intuition. Living in the moment, they forgo rational analysis or repeatable processes. When confronted with an issue, they follow their gut instincts and change their code, often without even making a backup in case their assumption was wrong. They doggedly stack patches on top of patches or introduce more problems when trying to back the changes out later.

Taking a different tack, some software developers follow the traditional approach to debugging: they try to diagnose and solve the problem by rigidly following a step-by-step scientific methodology. This makes them frustrated when the real world does not function the same as the world described in the programming books. Because of their dedication to ritual, technology threatens to transform into magic, becoming unpredictable and time-consuming. The traditional developer will try the same techniques over and over again, vainly hoping for a different result.

Perhaps the best programmer is one who embraces debugging as both an art and a science. This approach takes the best from both worlds: bursts of creativity and intuition work in harmony with rational problem-solving skills, supported by debugging tools.

The Three Categories of Software Bugs

Most software bugs encountered in the development of HPEC programs fall into three broad categories. The first type is straightforward and repeatable, which makes it the easiest to find and to fix.

The next type of bug plays catch-me-if-you-can, and hides when you try to trap it. For example, every developer has tried to debug a problem by putting in “just a couple of ‘print’ statements,” and amazingly the code starts working. After a few more ‘print’ statements are added the problem re-emerges, but dressed in totally different symptoms.

The third type of bug is the monster that haunts our nightmares, the one of such complexity and obscurity that it appears to be truly random and doesn’t follow any discernible pattern. It might, for example, only happen once a week after you run mode A and then mode B a thousand times in a complex sequence, and only when the moon is full and you have left to get a drink and a slice of cold pizza.

All these bugs create a range of problems. They can cause dismal failure, incorrect results, or a hard crash; they could even cause the program to get lost in the weeds. To make matters worse, the error is usually not where the failure is observed; a large part of the debugging detective work is tracing the failure back to its root cause. Adding to the complexity, bugs in parallel programs can and will propagate across multiple threads as well as multiple processors. If all of that is not enough, the bugs can also be timing-dependent.
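The gap between where an error is made and where the failure is observed can be shown with a small sketch. The function names and the channel-map scenario below are hypothetical, invented purely for illustration. The mistake is made when the map is built, but the program only fails later, when a buffer is looked up. An invariant check placed next to the suspected root cause reports the problem at its source.

```python
def build_channel_map(n_channels):
    """Map each channel index to a buffer name (contains a deliberate bug)."""
    # Root cause: range() starts at 1, so channel 0 never gets a buffer.
    return {ch: f"buf{ch}" for ch in range(1, n_channels)}


def read_buffer(channel_map, channel):
    """Look up a channel's buffer; this is where the failure is observed."""
    return channel_map[channel]


def check_channel_map(channel_map, n_channels):
    """Invariant check: every channel index must have a buffer."""
    missing = sorted(set(range(n_channels)) - set(channel_map))
    if missing:
        raise ValueError(f"unmapped channels: {missing}")


channel_map = build_channel_map(4)

try:
    read_buffer(channel_map, 0)        # fails here, far from the real mistake
except KeyError as exc:
    print("failure observed in read_buffer:", repr(exc))

try:
    check_channel_map(channel_map, 4)  # fails at the source instead
except ValueError as exc:
    print("root cause located:", exc)
```

The traceback from the first call points at `read_buffer`, which is correct code. Only the invariant check names the channel that was never mapped.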

As Pirsig observed, “Some things you miss because they’re so tiny you overlook them. But some things you don’t see because they’re so huge.” When debugging, some basic scientific tenets must be observed. One must always use a logbook to track what has been tried. Nothing is more frustrating than seeing a similar problem to one that has already been fixed, but not remembering how it was solved. Keeping a paper logbook is a great first step, but you still have to keep track of the logbook itself!

Fixing Software with Software

Users can call upon a debugger such as Allinea DDT, which provides a digital logbook that automatically records the entire debugging session and preserves the record of their scientific inquiry. For each stop in the program’s execution, the reason and location are recorded along with the parallel stacks, variables, and tracepoints, which are a scalable alternative to “print” statements. The only exercise left for the user is to record the hypothesis, note the resulting observations, and then write down the conclusion using the annotation option. The formation of the hypothesis is part of the art of debugging. As Pirsig opined, “For every fact there is an infinity of hypotheses.”
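The logbook idea can be sketched in a few lines. This is not the Allinea DDT interface: the `Stop` and `Logbook` classes, their methods, and the `fir_filter.c:118` location are all hypothetical. The sketch only shows the shape of the record described above, with each stop keeping its reason, location, and captured variables, and the user adding a hypothesis, observation, and conclusion as notes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Stop:
    """One pause in execution: why and where it happened, plus captured state."""
    reason: str                                   # e.g. "breakpoint" or "tracepoint"
    location: str                                 # e.g. "file:line"
    variables: Dict[str, Any] = field(default_factory=dict)
    notes: List[str] = field(default_factory=list)  # user annotations


class Logbook:
    """Hypothetical in-memory logbook of a debugging session."""

    def __init__(self) -> None:
        self.stops: List[Stop] = []

    def record(self, reason: str, location: str, **variables: Any) -> Stop:
        """Capture a stop and the variable values of interest."""
        stop = Stop(reason, location, dict(variables))
        self.stops.append(stop)
        return stop

    def search(self, keyword: str) -> List[Stop]:
        """Return stops whose notes mention the keyword (case-insensitive)."""
        key = keyword.lower()
        return [s for s in self.stops if any(key in n.lower() for n in s.notes)]


log = Logbook()
stop = log.record("tracepoint", "fir_filter.c:118", tap_count=64, index=64)
stop.notes.append("Hypothesis: index runs one past the last tap")
stop.notes.append("Observation: index == tap_count at the crash")
stop.notes.append("Conclusion: loop bound should be < tap_count, not <=")

# Months later, a similar symptom appears: search the old notes.
for hit in log.search("loop bound"):
    print(hit.location, hit.variables)
```

Recording values at a tracepoint instead of editing in “print” statements leaves the source untouched, and the searchable notes address the problem of remembering how a similar bug was solved.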

Combining Art and Science for Segmentation Faults

Let’s examine the straightforward problems that happen repeatedly, such as segmentation faults, aborts, or an exit without an error code, along with the tool features that can help solve them. These common bugs are easy to fix with a debugger; without one, the task is much harder and more time-consuming. When the user opens the code in Allinea DDT, its static-analysis tool flags common mistakes.

Tammy Carter

Senior Product Manager

Tammy Carter is the Senior Product Manager for GPGPUs and software products, featuring OpenHPEC for Curtiss-Wright Defense Solutions. In addition to an M.S. in Computer Science, she has over 20 years of experience designing, developing, and integrating real-time embedded systems in the defense, communications, and medical arenas.
