Research Article
CaLRS: A Critical-Aware Shared LLC Request Scheduling Algorithm on GPGPU
Table 1
Other configurations of baseline GPU.
| Parameter | Value |
| --- | --- |
| # of SMs | 30 (15 clusters of 2) |
| SM configuration | 1400 MHz, Reg #: 32K, SIMD width: 16, warp: 32 threads, and max threads per SM: 1536 |
| Branching handling | PDOM-based method [3] |
| Warp scheduling | Greedy-then-oldest (GTO) [4] |
| Private L1 caches | 16 KB $L1D, 8 KB $Const, 12 KB $Texture, and 2 KB $L1I |
| Scratchpad memory | 48 KB |
| Interconnect | Butterfly, 1400 MHz, 32 B width |
| # of LLC banks | 6 (= # of memory partitions) |
| LLC bank controller | First-in-first-out (FIFO) |
| LLC unified cache | 768 KB, 128 B line, and 8-way |
| Min. LLC latency | 120 cycles (compute core clock) |
| Memory controller | Out-of-order (FR-FCFS), max request queue length: 32 |
| GDDR5 timing (from Hynix H5GQ1H24AFR) | = 12, = 12, = 12, = 40, = 28, = 12, = 6, and = 5, = 12 |
| Min. DRAM latency | 460 cycles (compute core clock) |
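As a rough sanity check on the configuration in Table 1, the following sketch derives a few quantities that the table implies but does not state. The constants are copied from the table; the function name `derive_parameters` and the dictionary keys are hypothetical, and the even split of LLC capacity across the six banks is an assumption rather than something the table confirms.

```python
# Values copied from Table 1 of the baseline GPU configuration.
NUM_SMS = 30
MAX_THREADS_PER_SM = 1536
WARP_SIZE = 32
LLC_TOTAL_BYTES = 768 * 1024   # 768 KB unified LLC
LLC_LINE_BYTES = 128
LLC_WAYS = 8
LLC_BANKS = 6
MIN_LLC_LATENCY_CYCLES = 120
MIN_DRAM_LATENCY_CYCLES = 460


def derive_parameters():
    """Return quantities implied by Table 1 (hypothetical helper)."""
    max_warps_per_sm = MAX_THREADS_PER_SM // WARP_SIZE
    max_threads_total = NUM_SMS * MAX_THREADS_PER_SM

    # Assumption: the LLC capacity is divided evenly across the banks.
    bytes_per_bank = LLC_TOTAL_BYTES // LLC_BANKS
    lines_per_bank = bytes_per_bank // LLC_LINE_BYTES
    sets_per_bank = lines_per_bank // LLC_WAYS

    # How much more a minimum-latency DRAM access costs than an LLC hit.
    dram_to_llc_latency_ratio = MIN_DRAM_LATENCY_CYCLES / MIN_LLC_LATENCY_CYCLES

    return {
        "max_warps_per_sm": max_warps_per_sm,
        "max_threads_total": max_threads_total,
        "bytes_per_bank": bytes_per_bank,
        "lines_per_bank": lines_per_bank,
        "sets_per_bank": sets_per_bank,
        "dram_to_llc_latency_ratio": dram_to_llc_latency_ratio,
    }


if __name__ == "__main__":
    for name, value in derive_parameters().items():
        print(f"{name}: {value}")
```

Under these assumptions each SM holds at most 48 resident warps, and each LLC bank holds 128 KB organized as 128 sets of 8 ways. A minimum-latency DRAM access costs roughly 3.8 times as much as an LLC hit.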