Research Article

CaLRS: A Critical-Aware Shared LLC Request Scheduling Algorithm on GPGPU

Table 1: Other configurations of the baseline GPU.

| Parameter | Value |
| --- | --- |
| # of SMs | 30 (15 clusters of 2) |
| SM configuration | 1400 MHz, Reg #: 32K, SIMD width: 16, warp: 32 threads, and max threads per SM: 1536 |
| Branch handling | PDOM-based method [3] |
| Warp scheduling | Greedy-then-oldest (GTO) [4] |
| Private L1 caches | 16 KB L1D, 8 KB constant, 12 KB texture, and 2 KB L1I |
| Scratchpad memory | 48 KB |
| Interconnect | Butterfly, 1400 MHz, 32 B width |
| # of LLC banks | 6 (= # of memory partitions) |
| LLC bank controller | First-in-first-out (FIFO) |
| LLC unified cache | 768 KB, 128 B line, and 8-way |
| Min. LLC latency | 120 cycles (compute core clock) |
| Memory controller | Out-of-order (FR-FCFS), max request queue length: 32 |
| GDDR5 timing (from Hynix H5GQ1H24AFR) | = 12, = 12, = 12, = 40, = 28, = 12, = 6, and = 5, = 12 |
| Min. DRAM latency | 460 cycles (compute core clock) |
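
A few derived quantities help when reading Table 1. The sketch below is illustrative only and is not part of the authors' simulator setup. It assumes that the 768 KB LLC is split evenly across the 6 banks and that the "compute core clock" used for the latencies is the 1400 MHz SM clock; neither assumption is stated explicitly in the table. All constant and function names are hypothetical.

```python
# Values copied from Table 1.
SM_CLOCK_HZ = 1400e6            # SM clock; assumed to be the "compute core clock"
LLC_TOTAL_BYTES = 768 * 1024    # unified LLC capacity
LLC_BANKS = 6                   # one bank per memory partition
LLC_LINE_BYTES = 128
LLC_WAYS = 8
MIN_LLC_LATENCY_CYCLES = 120
MIN_DRAM_LATENCY_CYCLES = 460
MAX_THREADS_PER_SM = 1536
WARP_SIZE = 32


def cycles_to_ns(cycles, clock_hz=SM_CLOCK_HZ):
    """Convert a cycle count at the given clock frequency to nanoseconds."""
    return cycles / clock_hz * 1e9


def llc_geometry():
    """Return (bytes, lines, sets) per LLC bank, assuming an even split."""
    bytes_per_bank = LLC_TOTAL_BYTES // LLC_BANKS
    lines_per_bank = bytes_per_bank // LLC_LINE_BYTES
    sets_per_bank = lines_per_bank // LLC_WAYS
    return bytes_per_bank, lines_per_bank, sets_per_bank


def max_warps_per_sm():
    """Maximum number of resident warps per SM implied by the thread limit."""
    return MAX_THREADS_PER_SM // WARP_SIZE


if __name__ == "__main__":
    print(llc_geometry())        # (131072, 1024, 128)
    print(max_warps_per_sm())    # 48
    print(round(cycles_to_ns(MIN_LLC_LATENCY_CYCLES), 1))
    print(round(cycles_to_ns(MIN_DRAM_LATENCY_CYCLES), 1))
```

Under these assumptions each LLC bank holds 128 KB organized as 128 sets of 8 ways, each SM can keep up to 48 warps resident, and the minimum DRAM latency is roughly 3.8 times the minimum LLC latency.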