Research Article
Inastemp: A Novel Intrinsics-as-Template Library for Portable SIMD-Vectorization
Figure 4
Gigaflops per second (GFlop/s) achieved when computing a general square matrix-matrix product; each value is the average of three executions. Matrix dimension is in Double and in Float. (*) These executions use a simpler blocking scheme that shows better performance for the respective configurations (Xl-P8-OP, Figure 4(f)).
Panels: (a) Gcc-I3-PC; (b) Clang-I3-PC; (c) Gcc-IX-HPC; (d) Intel-IX-HPC; (e) Gcc-P8-OP; (f) Xl-P8-OP; (g) Gcc-KNL; (h) Intel-KNL.
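As a reading aid for the metric in the caption, the following is a minimal sketch of how a GFlop/s value averaged over several executions could be derived. It assumes the conventional count of 2N^3 floating-point operations for a square N x N product and averages the per-run rates; the article's exact counting and averaging conventions are not stated here. The names `gemmFlops`, `gflops`, `timeSeconds`, and `averageGflops` are hypothetical and are not part of Inastemp.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Assumed flop count for C += A * B with square N x N matrices:
// N^3 multiplications plus N^3 additions.
inline double gemmFlops(std::size_t n) {
    const double d = static_cast<double>(n);
    return 2.0 * d * d * d;
}

// Convert a flop count and an elapsed time in seconds to GFlop/s.
inline double gflops(double flops, double seconds) {
    assert(seconds > 0.0);
    return flops / seconds / 1e9;
}

// Wall-clock time of one call to `kernel`, in seconds (hypothetical helper).
template <class Kernel>
double timeSeconds(Kernel&& kernel) {
    const auto start = std::chrono::steady_clock::now();
    kernel();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}

// Mean of the per-run GFlop/s rates for a product of dimension n.
// Averaging rates (rather than times) is an assumption of this sketch.
inline double averageGflops(std::size_t n,
                            const std::vector<double>& secondsPerRun) {
    assert(!secondsPerRun.empty());
    double total = 0.0;
    for (const double s : secondsPerRun) {
        total += gflops(gemmFlops(n), s);
    }
    return total / static_cast<double>(secondsPerRun.size());
}
```

In use, one would collect three values from `timeSeconds` wrapped around the matrix product under test and pass them to `averageGflops` together with the matrix dimension.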