Published in Computers & Fluids Vol. 81, pp. 134–144, 2013
The rise of GPUs in modern high-performance systems increases the interest in porting portion of codes to such hardware. The current paper aims to explore the performance of a portable state-of-the-art FE solver on GPU accelerators. Performance evaluation is done by comparing with an existing highly-optimized OpenMP version of the solver. Code portability is ensured by writing the program using the OpenCL 1.1 specifications, while performance portability is sought through an optimization step performed at the beginning of the calculations to find out the optimal parameter set for the solver. The results show that the new implementation can be several times faster than the OpenMP version.