Florent de Dinechin
ENS de Lyon
In the last decades embedded systems have been challenged with more and more application variety, each time more constrained. This implies an ever growing need for performances and energy efficiency in arithmetic units. This work studies solutions ranging from hardware to software to improve floating-point arithmetic support in embedded systems. Some of these solutions were integrated in Kalray's MPPA processor.
The study starts with the design of a floating-point unit (FPU) based on the classical FMA (Fused Multiply-Add) operator. The improvements we suggest, implement and evaluate include a mixed precision FMA, a 3-operand add and a 2D scalar product, each time with a single rounding and support for subnormal numbers. It then considers the implementation of division and square root. The FPU is reused and modified to optimize the software implementations of those primitives at a lower cost. Finally, the study opens up on the development of a code generator designed for the implementation of highly optimized mathematical libraries in different contexts (architecture, accuracy, latency, throughput).