HPEC: High Availability by Design
A fault-tolerant system must keep processing data even when a hardware failure occurs. To achieve this, every critical component of the system is duplicated, which eliminates single points of failure such as the processors and power supplies. When a component fails, the software must detect the failure, switch over to the redundant hardware, and re-route the data flow. Next-generation embedded defense systems are growing more complex, with more computing power, memory, data, and speed, and this makes effective fault-tolerant systems harder to design.

Fortunately, the High-Performance Computing (HPC) world has developed mature, proven methodologies and tools to support High Availability (HA) clusters. Availability is the level of service that an application, service, or system provides. Highly available systems keep downtime to a minimum, whether it is planned or unplanned.
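Availability is commonly quantified as the steady-state fraction of time a system is operational, computed from its mean time between failures (MTBF) and mean time to repair (MTTR). The worked example below uses that standard formula. The figures are illustrative assumptions, not measurements from any particular system, and the redundancy line assumes independent failures and instantaneous, perfect failover.

```latex
\begin{aligned}
A &= \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}},
  \qquad \mathrm{MTBF} = 10\,000\ \mathrm{h},\ \mathrm{MTTR} = 1\ \mathrm{h}
  \;\Rightarrow\; A = \frac{10\,000}{10\,001} \approx 0.9999 \\
\text{downtime per year} &= (1 - A) \times 8760\ \mathrm{h}
  = \frac{8760}{10\,001}\ \mathrm{h} \approx 0.876\ \mathrm{h} \approx 53\ \mathrm{min} \\
A_{\text{pair}} &= 1 - (1 - A_{\text{unit}})^2,
  \qquad A_{\text{unit}} = 0.999 \;\Rightarrow\; A_{\text{pair}} = 1 - 10^{-6} = 0.999999
\end{aligned}
```

The last line shows why duplicating critical components raises availability: a redundant pair of 99.9% units is down only when both units fail at the same time. Real systems fall short of this ideal because failover takes time and failures can be correlated, for example through a shared power feed.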
The white paper covers the following topics:
- Dissecting the HPEC Cluster
- The cluster manager
- The "failover"
- The heartbeat connection
- Identifying the Dead Node
- The STONITH procedure
- Managing your HA Embedded System
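The outline above names the core HA cluster mechanisms: a heartbeat connection, identification of a dead node, STONITH ("Shoot The Other Node In The Head") fencing, and failover. The following is a minimal sketch of how these mechanisms fit together in a two-node active/standby pair. Everything in it is a hypothetical assumption, not code from the white paper or the API of a real cluster manager such as Pacemaker. This includes the `HACluster` class, the `timeout` threshold, and the `fence` callback, which stands in for whatever power-control hardware actually cuts off a node.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    name: str
    last_heartbeat: float = 0.0
    alive: bool = True


@dataclass
class HACluster:
    """Hypothetical two-node active/standby cluster manager (illustrative sketch)."""
    active: str
    standby: str
    timeout: float                      # assumed: seconds of silence before a node is presumed dead
    fence: Callable[[str], bool]        # hypothetical STONITH hook: power off node, True on success
    clock: Callable[[], float] = time.monotonic
    nodes: Dict[str, Node] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Treat both nodes as freshly heard from when the cluster starts.
        now = self.clock()
        for name in (self.active, self.standby):
            self.nodes[name] = Node(name, last_heartbeat=now)

    def heartbeat(self, name: str) -> None:
        """Record a heartbeat message received over the heartbeat connection."""
        self.nodes[name].last_heartbeat = self.clock()

    def check(self) -> None:
        """Identify a dead active node, fence it, then fail over to the standby."""
        now = self.clock()
        active = self.nodes[self.active]
        if now - active.last_heartbeat <= self.timeout:
            return  # active node is still sending heartbeats
        self.log.append(f"{active.name} missed heartbeat")

        standby = self.nodes[self.standby]
        if now - standby.last_heartbeat > self.timeout:
            self.log.append(f"standby {standby.name} also silent; cannot fail over")
            return

        # STONITH: make sure the suspect node is really down before taking over,
        # so two nodes never drive the same outputs at once (split brain).
        if not self.fence(active.name):
            self.log.append(f"fencing {active.name} failed; no failover")
            return

        # Failover: promote the standby and demote the fenced node.
        active.alive = False
        self.active, self.standby = self.standby, self.active
        self.log.append(f"failover: {self.active} is now active")
```

The clock and the fence action are injected so that the logic can be exercised without real hardware or real time passing. The important ordering is that the node is fenced before the standby is promoted. If fencing fails, the sketch refuses to fail over rather than risk two nodes acting as active simultaneously.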