Research Article
Designing Deep Learning Hardware Accelerator and Efficiency Evaluation
Table 3
Comparative evaluation of the existing and proposed implementation schemes.
| Experimental platform | CPU | GPU | FPGA [6] | FPGA (proposed) |
| --- | --- | --- | --- | --- |
| Platform configuration | i5-10400F | GTX 1660Ti | V6-690T | Xilinx Kintex-7 |
| Data type | FP32 | FP32 | Fix16 | Fix16/FP32 |
| Clock frequency (MHz) | 4300 | 1845 | — | 1818 |
| Execution time (s) | 176.2 | 3.9 | — | 20.3 |
| Power consumption (W) | 65 | 120 | 25.6 | 23.3 |
| Throughput (GOPS) | 1.359 | 117.4 | 41.32 | 76.19 |
| Energy efficiency (GOPS/W) | 0.0209 | 0.978 | 1.65 | 3.27 |
| Speedup (vs. CPU) | — | 40.98 | — | 8.67 |
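The derived rows of Table 3 can be recomputed from the reported throughput, power, and execution-time rows. The Python sketch below assumes that energy efficiency is throughput divided by power draw and that speedup is the CPU execution time divided by a platform's execution time; neither definition is stated in the table itself. Under these assumptions the sketch reproduces the reported energy efficiency of the CPU, GPU, and proposed FPGA columns and the proposed FPGA's speedup of 8.67. The `Platform` class and the helper function names are hypothetical illustrations, not part of the original work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Platform:
    name: str
    throughput_gops: float               # reported throughput (GOPS)
    power_w: float                       # reported power draw (W)
    exec_time_s: Optional[float] = None  # None where Table 3 reports "—"


def energy_efficiency(p: Platform) -> float:
    """Assumed definition: GOPS per watt = throughput / power."""
    return p.throughput_gops / p.power_w


def speedup(p: Platform, baseline: Platform) -> Optional[float]:
    """Assumed definition: baseline execution time / platform execution time."""
    if p.exec_time_s is None or baseline.exec_time_s is None:
        return None
    return baseline.exec_time_s / p.exec_time_s


# Values copied from Table 3 (hypothetical data structure, reported numbers).
cpu = Platform("CPU (i5-10400F)", 1.359, 65.0, 176.2)
gpu = Platform("GPU (GTX 1660Ti)", 117.4, 120.0, 3.9)
fpga_ref = Platform("FPGA [6] (V6-690T)", 41.32, 25.6)
fpga_new = Platform("FPGA (proposed, Kintex-7)", 76.19, 23.3, 20.3)

if __name__ == "__main__":
    for p in (cpu, gpu, fpga_ref, fpga_new):
        # The CPU is the baseline, so its own speedup is not reported.
        s = speedup(p, cpu) if p is not cpu else None
        s_txt = f"{s:.2f}x" if s is not None else "n/a"
        print(f"{p.name:28s} {energy_efficiency(p):7.4f} GOPS/W  speedup {s_txt}")
```

Keeping execution time optional mirrors the table, where the reference FPGA design [6] reports no execution time, so no speedup can be derived for it.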