This blog article is re-published from the Power Architecture Today Blog
NXP’s (formerly Freescale) new T-series processors have brought back the AltiVec floating-point SIMD instruction set, the heart of many defense and aerospace digital signal processing (DSP) applications for the last two decades. AltiVec, a mainstay of commercial-off-the-shelf (COTS) DSP boards for years, went missing in action for one series of previous devices, a disappearing act that led embedded system users to look for an alternative solution.
They had to, because without the floating-point engine the compute performance of the Power Architecture devices wasn’t a great match for what many DSP system designers needed. For many, the alternative to AltiVec-less NXP was Intel. Intel’s 4th and 5th generation processors, with their built-in floating-point SIMD units, provided an alternative that delivered the desired performance increase. But switching to a different processing architecture isn’t always an attractive approach.
There are still a lot of users producing products with e600 cores and AltiVec. The good news for those who opted to stay with their tried-and-true algorithms, but who would also like to boost their compute performance, is that the AltiVec they know and love has returned unchanged. The T-series processors feature the very same AltiVec that was in the e600 cores found in the older 7447A and 7448 processors.
That means that designers who have invested much time and money in developing and optimizing their algorithms can now easily perform a technology insertion to boost their compute power. Even better, they don’t have to undergo the time-consuming, costly, and potentially risky process of rewriting their algorithms from AltiVec instructions for another type of SIMD processing engine.
Some designers have spent years developing their application’s particular set of algorithms. Rewriting algorithms and verifying their performance can be tricky and time-consuming. Because every engine has its own idiosyncrasies, designers usually have to go through the entire process of re-coding the algorithm and getting it to work properly. Depending on the kind of processing they’re doing, they will also have spent much effort determining the order and method in which processing should occur and what data should move back and forth in memory. All of the work they’ve already done to tweak their existing algorithms, and all the investment they’ve made in figuring out which instructions to use, is essentially wasted.
With AltiVec’s return, that valuable algorithm that the designer spent years implementing and improving can now be moved to NXP’s newer devices with relative ease. That doesn’t mean that no programming will be required. The differences from the older Power Architecture processors’ instruction sets, the way that data is readied to be pulled into the AltiVec engines, and how the algorithms are executed will all need to be examined, and will likely need to be changed a bit. But the fact that the algorithms can use the exact same AltiVec instructions gives the system developer a lot more confidence while significantly lowering the risk of moving legacy code to a new processor.
The good news is that the compute and power benefits of upgrading a legacy Power Architecture-based design to a T-series processor can be huge. Consider the processing power of a new T2080 compared to the older 7447As and 7448s with which many old VME DSP engines were built. In a DSP system, multiple 7447As and 7448s required 45-50 W of power dissipation per board for the processor cores alone, not including other circuitry and memory, and could only run at 1.0 to 1.2 GHz before hitting a power management ceiling. In comparison, the T2080 running at 1.8 GHz is a 20-22 W processor. That’s about the same power requirement as a single node on an earlier Power-based VME DSP engine.
Granted, while AltiVec has returned unchanged, it’s still not as wide an engine as the one found in Intel’s 4th and 5th generation products. Intel’s floating-point SIMD engine is twice as wide as the 128-bit AltiVec. On the T-series processors, a single AltiVec instruction operates on four single-precision values in parallel, since each single-precision value is 32 bits wide. The Intel SIMD engine is 256 bits wide, so it can operate on eight single-precision values per instruction, twice as many as AltiVec at the same clock tick.
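To make the width comparison concrete, here is a minimal Python sketch of the lane arithmetic. It is a model for illustration only, not AltiVec or Intel code: the helper names `lanes` and `simd_multiply_add` are hypothetical, and the "instruction count" simply counts how many register-wide chunks are needed to cover the data.

```python
# Illustrative model only: this is plain Python, not real SIMD code.
ALTIVEC_BITS = 128     # AltiVec register width, as stated in the text
INTEL_SIMD_BITS = 256  # Intel SIMD register width, as stated in the text
FLOAT32_BITS = 32      # width of one single-precision value


def lanes(register_bits, element_bits):
    """Number of elements that fit in one SIMD register."""
    return register_bits // element_bits


def simd_multiply_add(a, b, c, width):
    """Hypothetical helper: compute a*b + c element-wise, `width` elements
    per modeled instruction. Returns (results, instruction_count)."""
    assert len(a) == len(b) == len(c)
    results = []
    instructions = 0
    for start in range(0, len(a), width):
        chunk = slice(start, start + width)
        results.extend(x * y + z for x, y, z in zip(a[chunk], b[chunk], c[chunk]))
        instructions += 1  # one modeled SIMD instruction per chunk
    return results, instructions


altivec_lanes = lanes(ALTIVEC_BITS, FLOAT32_BITS)
intel_lanes = lanes(INTEL_SIMD_BITS, FLOAT32_BITS)

data = [float(i) for i in range(16)]
ones = [1.0] * 16
result_4, count_4 = simd_multiply_add(data, data, ones, altivec_lanes)
result_8, count_8 = simd_multiply_add(data, data, ones, intel_lanes)

print("lanes:", altivec_lanes, intel_lanes)
print("modeled instructions for 16 values:", count_4, count_8)
```

Both widths produce identical results; the wider engine just needs half as many modeled instructions for the same data, which is the sense in which it does twice the work per clock tick.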
For some applications, the headache of porting the AltiVec SIMD instructions to Intel SIMD instructions to achieve that performance boost will be worth it. But for many applications, for example when a designer wants a technology insertion and doesn’t need a terribly large increase in processing power, the task of rewriting the code will pose too high a barrier. What’s more, this type of technology insertion project doesn’t usually come with a large budget for software development. For these users, the T2080 can theoretically deliver about 50 percent more processing power than their legacy Power Architecture processor. Combine that increased performance with faster memory and lower power, and the result is a significant benefit that is also easy to implement.
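How the 50 percent figure is derived is not spelled out here. One reading, offered purely as an assumption, is that it tracks the clock-rate ratio between the T2080 and the fastest legacy parts quoted earlier, since the AltiVec width is the same on both. The short Python sketch below works through that arithmetic; the function name `clock_ratio_gain_percent` is hypothetical, and the model ignores core count, memory speed, and every other architectural difference.

```python
# Assumption: per-core AltiVec throughput scales with clock rate alone.
T2080_GHZ = 1.8                # T2080 clock rate quoted in the text
LEGACY_GHZ_RANGE = (1.0, 1.2)  # 7447A/7448 clock range quoted in the text


def clock_ratio_gain_percent(new_ghz, old_ghz):
    """Percentage gain implied purely by the ratio of two clock rates."""
    return (new_ghz / old_ghz - 1.0) * 100.0


gain_vs_slowest = clock_ratio_gain_percent(T2080_GHZ, LEGACY_GHZ_RANGE[0])
gain_vs_fastest = clock_ratio_gain_percent(T2080_GHZ, LEGACY_GHZ_RANGE[1])

print(f"gain vs 1.0 GHz legacy part: about {gain_vs_slowest:.0f}%")
print(f"gain vs 1.2 GHz legacy part: about {gain_vs_fastest:.0f}%")
```

Under this simplified model, the comparison against a 1.2 GHz legacy part lands at roughly the 50 percent mentioned above.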