Low-Cost Protection for SER Upsets and Silicon Defects

Abstract

Extreme transistor scaling trends in silicon technology are soon to reach a point where manufactured systems will suffer from limited device reliability and severely reduced lifetime, due to early transistor failures, gate oxide wear-out, manufacturing defects, and radiation-induced soft errors (SER). In this paper we present a low-cost technique to harden a microprocessor pipeline and caches against these reliability threats. Our approach utilizes online built-in self-test (BIST) and microarchitectural checkpointing to detect, diagnose, and recover computation impaired by silicon defects or SER events. The approach works by periodically testing the processor to determine whether any component is broken. If so, we reconfigure the processor to avoid using the broken component. A similar mechanism is used to detect SER faults, with the difference that recovery is implemented by re-execution. By utilizing low-cost techniques to address defects and SER, we keep protection costs significantly lower than traditional fault-tolerance approaches while providing high levels of coverage for a wide range of faults. Using detailed gate-level simulation, we find that our approach provides 95% and 99% coverage for silicon defects and SER events, respectively, with only a 14% area overhead.
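
The control flow described in the abstract (checkpoint, periodic test, then either reconfigure or re-execute) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the paper's hardware design: the `Core` class, the unit names `alu0` and `alu1`, the `run_epoch` function, and the `transient_detected` flag are all hypothetical names introduced here, and the self-test is passed in as a plain function rather than modeled at gate level.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Toy processor model (hypothetical): state plus redundant units."""
    state: int = 0
    healthy_units: set = field(default_factory=lambda: {"alu0", "alu1"})
    disabled_units: set = field(default_factory=set)


def run_epoch(core, work, bist, transient_detected):
    """Run one checkpointed epoch and report how it ended.

    work: function mapping the old state to the new state.
    bist: function returning the set of units that fail the self-test.
    transient_detected: assumed flag, True if a soft error was observed.
    """
    checkpoint = core.state            # save state before executing
    core.state = work(core.state)      # execute the epoch speculatively

    broken = bist(core) & core.healthy_units  # test at the end of the epoch
    if broken:
        core.state = checkpoint        # discard possibly corrupt results
        core.healthy_units -= broken   # reconfigure: stop using broken units
        core.disabled_units |= broken
        core.state = work(core.state)  # redo the epoch on remaining hardware
        return "reconfigured"
    if transient_detected:
        core.state = checkpoint        # soft error: hardware itself is fine
        core.state = work(core.state)  # recover by re-execution only
        return "re-executed"
    return "committed"
```

In this model the only difference between the two recovery paths is whether hardware is disabled before the epoch is repeated, which mirrors the abstract's distinction between permanent defects and SER events.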

Original document

The different versions of the original document can be found in:

http://www.eecs.umich.edu/~valeria/research/publications/DATE07BulletProof.pdf

http://xplorestaging.ieee.org/ielx5/4211748/4211749/04211959.pdf?arnumber=4211959

http://dx.doi.org/10.1109/date.2007.364449

https://dblp.uni-trier.de/db/conf/date/date2007.html#MehraraASCBA07

http://yadda.icm.edu.pl/yadda/element/bwmeta1.element.ieee-000004211959

https://doi.acm.org/10.1145/1266366.1266614

https://academic.microsoft.com/#/detail/2157843352
