Securing your system when the roof is on fire with High Availability Clusters

September 19, 2016 | BY: Tammy Carter

Fault-tolerant systems are defined by their ability to continue operating when a component fails. Essentially, they must keep processing data no matter the situation (even if the system is on fire). So how do we ensure that data processing continues?

System developers must duplicate the hardware for every critical component of a system and teach the software to re-route the data flow to the alternate hardware once a failure is detected. This comes with several challenges, including ensuring that the software reacts only when needed and that it successfully transfers operations to the duplicate hardware.
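The two challenges above (reacting only when needed, and confirming the transfer can succeed) can be illustrated with a small sketch. It assumes failure is detected through periodic heartbeat checks and that a switch happens only after several consecutive misses. The class name `FailoverMonitor`, the `miss_threshold` value, and the `"primary"`/`"standby"` labels are hypothetical choices for illustration, not part of any particular product.

```python
from dataclasses import dataclass, field


@dataclass
class FailoverMonitor:
    """Routes data to one unit and switches to the other after repeated missed heartbeats."""

    miss_threshold: int = 3   # hypothetical: consecutive misses required before reacting
    active: str = "primary"   # unit currently receiving the data flow
    misses: int = 0           # consecutive missed heartbeats seen so far
    log: list = field(default_factory=list)

    def observe(self, heartbeat_ok: bool, standby_healthy: bool = True) -> str:
        """Record one heartbeat check and return the unit that should receive data."""
        if heartbeat_ok:
            self.misses = 0   # one good heartbeat clears a transient glitch
            return self.active

        self.misses += 1
        if self.misses < self.miss_threshold:
            return self.active   # not enough evidence of failure yet, so do not react

        if not standby_healthy:
            # Switching to broken hardware would not help; stay put and record why.
            self.log.append("failover blocked: standby unhealthy")
            return self.active

        # Re-route the data flow to the duplicate hardware.
        self.active = "standby" if self.active == "primary" else "primary"
        self.misses = 0
        self.log.append(f"failed over to {self.active}")
        return self.active
```

With the default threshold, a single missed heartbeat followed by a good one leaves the data flow on the primary unit, while three misses in a row move it to the standby. A real system would tune the threshold against how quickly it must recover.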

In a High Performance Embedded Computing (HPEC) cluster, there are compute nodes and a cluster manager, also known as the head node. The head node is the connection between the HPEC cluster and the external network. It controls all other devices and eases the administration of the compute nodes. Because the cluster manager provisions the compute nodes, replacing one after a hardware failure is simpler. This decreases the risk of errors and allows for a confident node replacement even when the rest of the system may be failing.

While the head node offers us a secure and reliable solution during a hardware failure, the downside remains that the head node is a single point of failure for the entire system.

What is the solution? A high availability setup derived from the HPC world.
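As a rough illustration of what such a setup involves, the sketch below assumes an active/passive pair of head nodes in which the passive node takes over only after the failed node has been fenced, which is the idea behind STONITH ("Shoot The Other Node In The Head"). The `HeadNode` class, `take_over` function, and `power_off` fencing action are hypothetical stand-ins. A real cluster would fence through a power controller or similar out-of-band mechanism.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class HeadNode:
    name: str
    powered: bool = True
    active: bool = False   # True for the node currently managing the cluster


def take_over(failed: HeadNode, standby: HeadNode,
              fence: Callable[[HeadNode], bool]) -> bool:
    """Promote the standby head node only after the failed one is confirmed fenced."""
    if not fence(failed):
        # Without proof that the old head node is off, promoting the standby
        # risks two active head nodes managing the same cluster.
        return False
    failed.active = False
    standby.active = True
    return True


def power_off(node: HeadNode) -> bool:
    """Hypothetical fencing action: cut power and report whether it worked."""
    node.powered = False
    return not node.powered
```

Calling `take_over(head_a, head_b, power_off)` promotes `head_b` only once `head_a` is confirmed off. If the fencing call reports failure, neither node changes role, which is the conservative choice in this sketch.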
Download the white paper HPEC: High Availability by Design to learn more about:

  • High Availability clusters
  • Fault Tolerance Software
  • HPC applications for HPEC
  • Cluster Managers
  • The STONITH process

Author’s Biography

Tammy Carter

Senior Product Manager – GPGPUs & Software

Tammy Carter is the Senior Product Manager for GPGPUs and software products, featuring OpenHPEC, for Curtiss-Wright Defense Solutions. In addition to an M.S. in Computer Science, she has over 20 years of experience in designing, developing, and integrating real-time embedded systems in the Defense, Communications, and Medical arenas.
