Measuring FLOPS Using Hardware Performance Counter Technologies on LC systems

Latest revision as of 20:55, 5 March 2021

Abstract

FLOPS (FLoating-point Operations Per Second) is a commonly used performance metric for scientific programs that rely heavily on floating-point (FP) calculations. The metric is based on the number of FP operations rather than instructions, thereby facilitating a fair comparison between different machines. A well-known use of this metric is the LINPACK benchmark that is used to generate the Top500 list. It measures how fast a computer solves a dense N by N system of linear equations Ax=b, which requires a known number of FP operations, and reports the result in millions of FP operations per second (MFLOPS). While running a benchmark with known FP workloads can provide insightful information about the efficiency of a machine's FP pipelines in relation to other machines, measuring FLOPS of an arbitrary scientific application in a platform-independent manner is nontrivial. The goal of this paper is twofold. First, we explore the FP microarchitectures of key processors that are underpinning the LC machines. Second, we present the hardware performance monitoring counter-based measurement techniques that a user can use to get the native FLOPS of his or her program, which are practical solutions readily available on LC platforms. By nature, however, these native FLOPS metrics are notmore » directly comparable across different machines mainly because FP operations are not consistent across microarchitectures. Thus, the first goal of this paper represents the base reference by which a user can interpret the measured FLOPS more judiciously.

Original document

The different versions of the original document can be found in:

http://dx.doi.org/10.2172/945513

http://www.osti.gov/scitech/servlets/purl/945513,

https://academic.microsoft.com/#/detail/2085153158

@@ Line 2: / Line 2: @@
 == Abstract ==
-FLOPS (FLoating-point Operations Per Second) is a commonly used performance metric for scientific programs that rely heavily on floating-point (FP) calculations. The metric is based on the number of FP operations rather than instructions, thereby facilitating a fair comparison between different machines. A well-known use of this metric is the LINPACK benchmark that is used to generate the Top500 list. It measures how fast a computer solves a dense N by N system of linear equations Ax=b, which requires a known number of FP operations, and reports the result in millions of FP operations per second (MFLOPS). While running a benchmark with known FP workloads can provide insightful information about the efficiency of a machine's FP pipelines in relation to other machines, measuring FLOPS of an arbitrary scientific application in a platform-independent manner is nontrivial. The goal of this paper is twofold. First, we explore the FP microarchitectures of key processors that are underpinning the LC machines. Second, we present the hardware performance monitoring counter-based measurement techniques that a user can use to get the native FLOPS of his or her program, which are practical solutions readily available on LC platforms. By nature, however, these native FLOPS metrics are notmore » directly comparable across different machines mainly because FP operations are not consistent across microarchitectures. Thus, the first goal of this paper represents the base reference by which a user can interpret the measured FLOPS more judiciously.« le
+FLOPS (FLoating-point Operations Per Second) is a commonly used performance metric for scientific programs that rely heavily on floating-point (FP) calculations. The metric is based on the number of FP operations rather than instructions, thereby facilitating a fair comparison between different machines. A well-known use of this metric is the LINPACK benchmark that is used to generate the Top500 list. It measures how fast a computer solves a dense N by N system of linear equations Ax=b, which requires a known number of FP operations, and reports the result in millions of FP operations per second (MFLOPS). While running a benchmark with known FP workloads can provide insightful information about the efficiency of a machine's FP pipelines in relation to other machines, measuring FLOPS of an arbitrary scientific application in a platform-independent manner is nontrivial. The goal of this paper is twofold. First, we explore the FP microarchitectures of key processors that are underpinning the LC machines. Second, we present the hardware performance monitoring counter-based measurement techniques that a user can use to get the native FLOPS of his or her program, which are practical solutions readily available on LC platforms. By nature, however, these native FLOPS metrics are notmore » directly comparable across different machines mainly because FP operations are not consistent across microarchitectures. Thus, the first goal of this paper represents the base reference by which a user can interpret the measured FLOPS more judiciously.
 == Original document ==

Latest revision as of 20:55, 5 March 2021

Abstract

Original document

Document information

Document Score

Share this document

claim authorship