Abstract

Techniques used to implement an unstructured grid solver on modern graphics hardware are described. The three-dimensional Euler equations for inviscid, compressible flow are considered. Effective memory bandwidth is improved by reducing total global memory access and overlapping redundant computation, as well as using an appropriate numbering scheme and data layout. The applicability of per-block shared memory is also considered. The performance of the solver is demonstrated on two benchmark cases: a NACA0012 wing and a missile. For a variety of mesh sizes, an average speed-up factor of roughly 9.5× is observed over the equivalent parallelized OpenMP code running on a quad-core CPU, and roughly 33× over the equivalent code running in serial.
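The abstract does not spell out the data layout it refers to. As a rough illustration only, the C++ sketch below contrasts two common ways of indexing per-element flow variables: array-of-structures (AoS) and structure-of-arrays (SoA). SoA is the layout usually favoured on graphics hardware, because threads assigned to consecutive elements then read consecutive memory addresses. The names `NVAR`, `aos_index`, `soa_index`, and `aos_to_soa` are hypothetical, and the choice of SoA here is an assumption rather than a statement of the authors' implementation.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical constant: the five conserved variables of the 3D Euler
// equations (density, three momentum components, total energy).
constexpr std::size_t NVAR = 5;

// Array-of-structures index: the variables of one element sit together,
// so the same variable in neighbouring elements is NVAR entries apart.
inline std::size_t aos_index(std::size_t elem, std::size_t var) {
    return elem * NVAR + var;
}

// Structure-of-arrays index: each variable is stored contiguously over all
// elements, so the same variable in neighbouring elements is adjacent.
inline std::size_t soa_index(std::size_t elem, std::size_t var,
                             std::size_t nelem) {
    return var * nelem + elem;
}

// Illustrative helper (not from the paper): copy an AoS buffer into SoA order.
inline std::vector<double> aos_to_soa(const std::vector<double>& aos,
                                      std::size_t nelem) {
    std::vector<double> soa(aos.size());
    for (std::size_t e = 0; e < nelem; ++e) {
        for (std::size_t v = 0; v < NVAR; ++v) {
            soa[soa_index(e, v, nelem)] = aos[aos_index(e, v)];
        }
    }
    return soa;
}
```

With the SoA indexing, the stride between the same variable in neighbouring elements is one entry instead of `NVAR`. A renumbering scheme that places frequently co-accessed elements near each other would complement such a layout.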

Document information

Published on 01/01/2011

DOI: 10.1002/fld.2254
Licence: CC BY-NC-SA license