Practical applicability of optimizations and performance models to complex stencil-based loop kernels in CFD

Abstract

This work investigates the application and interaction of optimization techniques and performance models in a computational fluid dynamics (CFD) setting. The test case is an OpenMP-parallelized, explicit, weakly compressible, finite difference–based solver for the incompressible Navier–Stokes equations that uses a five-point wide stencil. The presented loop and stencil optimizations lead to a 6.8× increase in per-core throughput. To verify that the tuned code utilizes the CPU optimally, performance models are applied to it. Three models are considered: a roofline-based model that uses purely theoretical figures, a roofline-based model enhanced by measurements, and the execution-cache-memory (ECM) model. The models provide reliable estimates for simple benchmarks, such as seven-point stencils for scalar Laplacians, but the estimate quality is significantly worse for the complex, tuned stencil. While it is possible to include even more detail in a model, doing so eventually leads to a state in which the model merely reproduces the benchmarks from which it was derived. The applied general-purpose performance models are therefore found to predict the actual performance inaccurately: for highly tuned code they overestimate the achievable performance by about 97% or more. Through further code tuning, 66% of the predicted performance could be achieved.
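As a rough illustration of the simplest of the three models, the following minimal sketch computes a roofline-style upper bound for a seven-point scalar Laplacian stencil. All numbers are assumptions chosen for illustration, not figures from this work: the peak floating-point rate, the memory bandwidth, the count of 8 floating-point operations per lattice update, and the 24 bytes of memory traffic per update (one load, one store, and one write-allocate transfer of a double-precision value). The function name `roofline_bound` is likewise hypothetical.

```python
def roofline_bound(peak_flops, mem_bandwidth, flops_per_update, bytes_per_update):
    """Return a roofline-style upper bound in lattice updates per second (LUP/s).

    The bound is the smaller of the compute limit and the memory limit.
    """
    compute_limit = peak_flops / flops_per_update    # LUP/s if purely compute bound
    memory_limit = mem_bandwidth / bytes_per_update  # LUP/s if purely memory bound
    return min(compute_limit, memory_limit)


# Hypothetical machine and kernel figures (assumptions, not from the paper).
PEAK_FLOPS = 40.0e9      # flop/s, assumed peak floating-point rate
MEM_BANDWIDTH = 48.0e9   # byte/s, assumed attainable memory bandwidth
FLOPS_PER_UPDATE = 8     # assumed flops per update of a seven-point Laplacian
BYTES_PER_UPDATE = 24    # assumed traffic: load + store + write-allocate, 8 bytes each

predicted = roofline_bound(PEAK_FLOPS, MEM_BANDWIDTH, FLOPS_PER_UPDATE, BYTES_PER_UPDATE)
print(f"predicted upper bound: {predicted / 1e9:.2f} GLUP/s")
```

With these assumed figures the memory limit is the smaller of the two, so the kernel would be classified as memory bound. A measured throughput can then be divided by `predicted` to express it as a fraction of the model's estimate.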
