Cool World: A Tour of Thermal-management Approaches for Rugged Computer Systems

Published in Military Embedded Systems

What happens when a CPU gets too hot? Circuitry within the device runs slower, which can lead to poor system performance. The design of rugged mission-critical computer systems must consider thermal management as a system-level issue.

There are usually two levels of protection built into the chip to protect it from overheating. The first is a critical shutdown which, when triggered, will shut down the whole device to prevent physical damage. The second is throttling, where the processor’s clock is simply slowed down. Throttling, which is supported by Intel processors, typically occurs at a lower temperature threshold than shutdown.
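The two protection levels can be pictured as a simple threshold check. The sketch below is only an illustration: the threshold values and the names `thermal_action`, `THROTTLE_THRESHOLD_C`, and `SHUTDOWN_THRESHOLD_C` are hypothetical and do not come from any processor datasheet. Real devices implement this logic in hardware and firmware.

```python
from enum import Enum


class ThermalAction(Enum):
    NORMAL = "normal"
    THROTTLE = "throttle"
    SHUTDOWN = "shutdown"


# Hypothetical thresholds in degrees Celsius, chosen only for illustration.
# The throttle threshold sits below the shutdown threshold.
THROTTLE_THRESHOLD_C = 95.0
SHUTDOWN_THRESHOLD_C = 105.0


def thermal_action(die_temp_c: float) -> ThermalAction:
    """Return the protection level that applies at a given die temperature."""
    # Check the more severe condition first so it always takes priority.
    if die_temp_c >= SHUTDOWN_THRESHOLD_C:
        return ThermalAction.SHUTDOWN
    if die_temp_c >= THROTTLE_THRESHOLD_C:
        return ThermalAction.THROTTLE
    return ThermalAction.NORMAL


if __name__ == "__main__":
    for temp_c in (70.0, 98.0, 110.0):
        print(f"{temp_c:5.1f} C -> {thermal_action(temp_c).value}")
```

Because the throttle threshold is lower, a rising temperature trips throttling first, and shutdown acts only as the last line of defense.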

For example, Intel Core processors automatically throttle their performance based on the processor workload and their thermal environment. In theory, this is a good approach for cooling down a system that heats up under increased power draw. In a mission-critical environment, however, a throttled processor is not desirable. For defense applications such as electronic warfare (EW) and intelligence, surveillance, and reconnaissance (ISR), where consistent, deterministic performance is required, processor throttling can adversely affect mission success.

Processor throttling (also sometimes called dynamic frequency scaling) adjusts a processor’s clock frequency, and with it the number of instructions executed per unit of time. Throttling back the clock frequency causes a processor to run more slowly, do less work, use less power, and as a consequence generate less heat. As the device’s operating clock gracefully slows down, the temperature goes down, preventing timing errors.
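The link between clock frequency and power can be illustrated with the common first-order approximation for CMOS switching power, P ≈ C · V² · f. This approximation is not taken from the article, and it ignores static leakage. The capacitance, voltage, and frequency values below are hypothetical, as is the function name `dynamic_power_w`.

```python
def dynamic_power_w(capacitance_f: float, voltage_v: float, frequency_hz: float) -> float:
    """First-order CMOS switching power: P = C * V^2 * f (leakage ignored)."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


if __name__ == "__main__":
    # Hypothetical values chosen only to show the trend.
    switched_capacitance_f = 1e-9
    core_voltage_v = 1.0

    for clock_hz in (3.0e9, 1.5e9):
        power_w = dynamic_power_w(switched_capacitance_f, core_voltage_v, clock_hz)
        print(f"{clock_hz / 1e9:.1f} GHz -> {power_w:.2f} W")
```

Under this model, halving the clock at a fixed voltage halves the switching power. Lowering the voltage along with the frequency reduces power further, because voltage enters as a square.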

Keeping your cool

Thermal management for traditional servers, desktops, or laptops is fairly straightforward. Typically, system designers can come up with a combination of fans, heat sinks, heat pipe coolers, and other components that keep systems within a relatively cool operating range. Rugged military computers are different, however: The harsh conditions encountered by military platforms in the air, on the ground, or at sea preclude the use of many traditional cooling methods or require substantial changes and/or limitations.

For example, typical cooling fans work by exchanging the air inside the computer with the cooler, ambient air on the outside. But what if that ambient air is full of dust, humidity, salt fog, or smoke? All of these conditions are potentially harmful if introduced into the system. Also consider missions that must operate at high altitude, where air pressure is low: the air may be too thin to carry away enough heat.
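The altitude effect can be estimated with the International Standard Atmosphere troposphere model, which is an assumption added here and not part of the article. For a fan moving a fixed volume of air with a fixed air temperature rise, the heat carried away scales with air density (Q = ρ · V̇ · c_p · ΔT), so the density ratio gives a rough measure of the cooling lost. The function names are hypothetical.

```python
# International Standard Atmosphere constants (troposphere, 0 to 11 km).
SEA_LEVEL_TEMP_K = 288.15
SEA_LEVEL_PRESSURE_PA = 101325.0
LAPSE_RATE_K_PER_M = 0.0065
GAS_CONSTANT_AIR_J_PER_KG_K = 287.05
PRESSURE_EXPONENT = 5.25588  # g / (R * lapse rate)


def air_density_kg_m3(altitude_m: float) -> float:
    """Standard-atmosphere air density at a given altitude (0 to 11,000 m)."""
    if not 0.0 <= altitude_m <= 11000.0:
        raise ValueError("model is only valid from 0 to 11,000 m")
    temp_k = SEA_LEVEL_TEMP_K - LAPSE_RATE_K_PER_M * altitude_m
    pressure_pa = SEA_LEVEL_PRESSURE_PA * (temp_k / SEA_LEVEL_TEMP_K) ** PRESSURE_EXPONENT
    # Ideal gas law: rho = p / (R * T)
    return pressure_pa / (GAS_CONSTANT_AIR_J_PER_KG_K * temp_k)


def relative_fan_cooling(altitude_m: float) -> float:
    """Heat removal relative to sea level for the same volumetric airflow."""
    return air_density_kg_m3(altitude_m) / air_density_kg_m3(0.0)


if __name__ == "__main__":
    for altitude_m in (0.0, 3000.0, 10000.0):
        print(f"{altitude_m:7.0f} m -> {relative_fan_cooling(altitude_m):.2f} x sea-level cooling")
```

This is a rough estimate only: it assumes standard-day conditions and ignores changes in fan performance and in ambient temperature seen by the equipment.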

Each design challenge must consider the entire system, with components and solutions selected to best meet the requirements of the finished product.

For rugged applications, several thermal management techniques are often required to protect a system’s internal components.

Conduction cooling

Conduction cooling is the transfer of heat through solids. A common example is a conduction-cooled chassis mounted onto a cold plate (Figure 1). Heat generated by the electronics inside the chassis flows into the chassis’s aluminum sidewalls and down into the cold plate. Because heat flows naturally from a hotter body to a cooler one, it moves from the chips to the lower-temperature cold plate.
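A one-dimensional estimate of the temperature rise along such a conduction path follows from Fourier’s law: the thermal resistance of a uniform wall is R = L / (k · A), and the temperature difference is ΔT = Q · R. The sketch below is a simplification added for illustration. The wall dimensions and heat load are hypothetical, and the conductivity is a typical handbook figure for 6061 aluminum, not a value from the article.

```python
def conduction_resistance_k_per_w(length_m: float, conductivity_w_per_m_k: float,
                                  area_m2: float) -> float:
    """Thermal resistance of a uniform solid path: R = L / (k * A)."""
    return length_m / (conductivity_w_per_m_k * area_m2)


def temperature_rise_k(heat_w: float, resistance_k_per_w: float) -> float:
    """Temperature difference needed to drive heat_w through the path."""
    return heat_w * resistance_k_per_w


if __name__ == "__main__":
    # Assumed values for illustration only.
    aluminum_conductivity = 167.0  # W/(m K), typical for 6061 aluminum
    wall_path_length_m = 0.10      # distance from heat source to cold plate
    wall_cross_section_m2 = 0.002  # area the heat flows through
    heat_load_w = 50.0

    resistance = conduction_resistance_k_per_w(
        wall_path_length_m, aluminum_conductivity, wall_cross_section_m2
    )
    rise = temperature_rise_k(heat_load_w, resistance)
    print(f"Resistance: {resistance:.3f} K/W, temperature rise: {rise:.1f} K")
```

With these assumed numbers the sidewall runs roughly 15 K hotter at the source than at the cold plate. A shorter path, a thicker wall, or a more conductive material lowers that difference.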
