In the ever-changing High-Performance Computing (HPC) environment, porting and tuning applications to new platforms is of paramount importance, but it is also tedious and costly in terms of human resources. Although solutions are emerging that abstract most of the underlying programming models, tuning computing kernels remains the work of highly trained specialists. An efficient computing kernel often combines insights and techniques specific to the targeted architecture. As a result, the performance portability, and sometimes even the compatibility, of these specialized versions is often poor.
One way to tackle this problem is to use meta-programming approaches that let the specialist express optimization strategies and techniques orthogonally, so that each can be enabled or combined independently of the others. Coupled with a runtime that supports most programming models found in HPC, such an approach makes it possible to generate and benchmark different versions of a kernel. The design space can then be explored to find the most suitable version for the target architecture. I will discuss how I developed and used BOAST, an autotuning framework, to address these challenges in several HPC applications ported to OpenCL or to vector architectures (AVX, NEON).
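The generate, benchmark, and select loop described above can be illustrated with a minimal sketch. This is not BOAST and does not use its API: the function names (`generate_kernel`, `compile_kernel`, `autotune`), the two tuning parameters (`unroll`, `split_accumulators`), and the sum-of-squares kernel are all assumptions chosen for illustration, and plain Python source stands in for the generated C, Fortran, or OpenCL code a real framework would emit.

```python
import itertools
import math
import timeit


def generate_kernel(unroll, split_accumulators):
    """Return source code for a sum-of-squares kernel specialised for the
    given (hypothetical) tuning parameters."""
    n_acc = unroll if split_accumulators else 1
    lines = [
        "def kernel(data):",
        "    n = len(data)",
        f"    main = n - n % {unroll}",
    ]
    lines += [f"    acc{a} = 0.0" for a in range(n_acc)]
    lines.append(f"    for i in range(0, main, {unroll}):")
    for u in range(unroll):
        target = f"acc{u}" if split_accumulators else "acc0"
        lines.append(f"        {target} += data[i + {u}] * data[i + {u}]")
    lines.append("    total = " + " + ".join(f"acc{a}" for a in range(n_acc)))
    # Remainder loop for lengths that are not a multiple of the unroll factor.
    lines.append("    for i in range(main, n):")
    lines.append("        total += data[i] * data[i]")
    lines.append("    return total")
    return "\n".join(lines)


def compile_kernel(source):
    """Turn generated source into a callable (stand-in for a real compiler)."""
    namespace = {}
    exec(source, namespace)
    return namespace["kernel"]


def autotune(data, space, repeat=3, number=5):
    """Exhaustively explore the design space; return the fastest valid
    parameter set and the timing of every valid variant."""
    reference = sum(x * x for x in data)
    names = sorted(space)
    results = {}
    for values in itertools.product(*(space[name] for name in names)):
        params = dict(zip(names, values))
        kernel = compile_kernel(generate_kernel(**params))
        # Discard variants that do not reproduce the reference result.
        if not math.isclose(kernel(data), reference, rel_tol=1e-9):
            continue
        timings = timeit.repeat(lambda: kernel(data), repeat=repeat, number=number)
        results[tuple(sorted(params.items()))] = min(timings)
    best = min(results, key=results.get)
    return best, results


if __name__ == "__main__":
    sample = [float(i % 7) for i in range(10_000)]
    best, results = autotune(
        sample, {"unroll": [1, 2, 4, 8], "split_accumulators": [False, True]}
    )
    print("fastest variant on this machine:", dict(best))
```

The two parameters are orthogonal in the sense used above: either can be changed without touching the other, and the search simply enumerates their combinations. Which variant wins depends on the machine and interpreter, which is the reason for measuring rather than guessing. An exhaustive search is used here only because the space is tiny; larger spaces generally call for sampling or a guided search.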