Abstract

Processor designers must contend with defects that form during fabrication and operation and lead to permanent faults. Tolerating permanent (hard) faults is an important problem in computing systems: solutions can improve yield during fabrication and reduce the cost of transistor mortality over the service life of the processor. This paper presents PreFix, a method for handling hard errors that keeps a faulty core running and executing instructions correctly. Instead of turning off faulty structures, PreFix predicts early in the pipeline whether an instruction is likely to use faulty components, then refines this prediction later in the pipeline to detect whether an error has actually occurred. Instructions marked as possibly faulty in the front-end are queued for duplicate execution on a separate core. At commit, the results of the original and duplicate instructions are compared. On a mismatch, the original instruction is patched up, the pipeline is flushed, and execution continues. With PreFix, faulty components can continue to perform useful work whenever their errors do not manifest as architecturally visible state changes. This extends processor lifetime with minimal performance overhead.
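The predict, duplicate, compare and patch flow described in the abstract can be illustrated with a toy software model. The sketch below is not the authors' implementation: the fault model (an adder whose output bit 0 is stuck at 0), the names `FAULTY_UNITS`, `Instr`, `golden`, `faulty_core_execute`, `predict_possibly_faulty` and `run`, and the statistics it collects are all assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical: functional units on the main core known to contain a hard fault.
FAULTY_UNITS = {"add"}


@dataclass
class Instr:
    op: str  # "add" or "mul" in this toy model
    a: int
    b: int


def golden(instr):
    """Fault-free result; stands in for execution on the separate core."""
    if instr.op == "add":
        return instr.a + instr.b
    if instr.op == "mul":
        return instr.a * instr.b
    raise ValueError(f"unknown op: {instr.op}")


def faulty_core_execute(instr):
    """Main core: assumed fault forces bit 0 of the adder output to 0."""
    result = golden(instr)
    if instr.op in FAULTY_UNITS:
        result &= ~1  # stuck-at-0 on the least significant bit
    return result


def predict_possibly_faulty(instr):
    """Front-end prediction: does this instruction use a faulty unit?"""
    return instr.op in FAULTY_UNITS


def run(program):
    committed = []
    stats = {"duplicated": 0, "mismatches": 0, "flushes": 0}
    for instr in program:
        original = faulty_core_execute(instr)
        if predict_possibly_faulty(instr):
            # Queue for duplicate execution and compare at commit.
            stats["duplicated"] += 1
            duplicate = golden(instr)
            if duplicate != original:
                # The fault manifested: patch the result and model a flush.
                stats["mismatches"] += 1
                stats["flushes"] += 1
                original = duplicate
        committed.append(original)
    return committed, stats


program = [Instr("add", 2, 2), Instr("add", 2, 3), Instr("mul", 3, 3)]
committed, stats = run(program)
print(committed, stats)
```

In this model the first `add` uses the faulty adder but its correct result already has bit 0 clear, so the fault is masked and no flush is needed. Only the second `add` triggers a patch, which mirrors the abstract's point that faulty components can keep doing useful work when their errors do not reach architectural state.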


Original document

The different versions of the original document can be found in:

http://dx.doi.org/10.1109/dft.2017.8244459
https://dblp.uni-trier.de/db/conf/dft/dft2017.html#Soman017
http://doi.ieeecomputersociety.org/10.1109/DFT.2017.8244459
https://academic.microsoft.com/#/detail/2783888117


DOIs: 10.17863/cam.21614, 10.1109/dft.2017.8244459

Document information

Published on 01/01/2017

Volume 2017, 2017
DOI: 10.17863/cam.21614
Licence: CC BY-NC-SA license
