Research Article
Multi-GPU Support on Single Node Using Directive-Based Programming Model
Algorithm 6
The worker algorithm for multi-GPU programming in OpenACC.
    (1)  function WORKER_ROUTINE
    (2)    Create the context for the associated GPU
    (3)    pthread_mutex_lock(⋯)
    (4)    context_created++;
    (5)    while ⋯ do
    (6)      pthread_cond_wait(⋯)  ⊳ wait until all threads have created their contexts
    (7)    end while
    (8)    pthread_mutex_unlock(⋯)
    (9)    if ⋯ then
    (10)     pthread_cond_broadcast(⋯)
    (11)   end if
    (12)   Enable peer access among all devices
    (13)   while (1) do
    (14)     pthread_mutex_lock(&cur_thread->queue_lock);
    (15)     while ⋯ do
    (16)       ⋯
    (17)       if ⋯ then
    (18)         ⋯
    (19)         Synchronize the GPU context  ⊳ the context is blocked until the device has completed all preceding requested tasks
    (20)         pthread_exit(NULL)
    (21)       end if
    (22)     end while
    (23)     cur_task = cur_thread->queue_head;  ⊳ fetch the task from the queue head
    (24)     cur_thread->queue_size--;
    (25)     if cur_thread->queue_size == 0 then
    (26)       cur_thread->queue_head = NULL;
    (27)       cur_thread->queue_tail = NULL;
    (28)     else
    (29)       cur_thread->queue_head = cur_task->next;
    (30)     end if
    (31)     pthread_mutex_unlock(&cur_thread->queue_lock);
    (32)     cur_task->routine((void *)cur_task->args);  ⊳ execute the task
    (33)   end while
    (34) end function
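The control flow of Algorithm 6 can be sketched in plain C with POSIX threads. This is a minimal sketch under stated assumptions, not the authors' implementation. The GPU steps (context creation, peer access, context synchronization) are reduced to comments so that the code runs on any host. The names `worker_t`, `task_t`, `queue_ready`, `destroy`, `ctx_lock`, `ctx_cond`, `NUM_WORKERS`, `enqueue`, `add_one`, and `run_demo` are hypothetical, and the wait and branch conditions are inferred from the comments in the listing. Two details deliberately differ from the listing: the broadcast is issued while the lock is still held, and the shutdown flag is tested before waiting rather than after, which avoids missing a wake-up that arrives before the worker starts to wait.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NUM_WORKERS 2                 /* assumption: one worker thread per GPU */

typedef struct task {
    void (*routine)(void *);          /* function the worker will execute */
    void *args;
    struct task *next;
} task_t;

typedef struct {
    pthread_t tid;
    pthread_mutex_t queue_lock;
    pthread_cond_t queue_ready;       /* hypothetical: signalled on enqueue or shutdown */
    task_t *queue_head, *queue_tail;
    int queue_size;
    int destroy;                      /* hypothetical shutdown flag */
} worker_t;

static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ctx_cond = PTHREAD_COND_INITIALIZER;
static int context_created = 0;

static void *worker_routine(void *arg) {
    worker_t *cur_thread = arg;
    /* GPU context creation would happen here (omitted in this sketch). */
    pthread_mutex_lock(&ctx_lock);
    context_created++;
    if (context_created == NUM_WORKERS)
        pthread_cond_broadcast(&ctx_cond);     /* last thread releases the others */
    while (context_created < NUM_WORKERS)
        pthread_cond_wait(&ctx_cond, &ctx_lock);
    pthread_mutex_unlock(&ctx_lock);
    /* Peer access among devices would be enabled here (omitted). */
    for (;;) {
        pthread_mutex_lock(&cur_thread->queue_lock);
        while (cur_thread->queue_size == 0) {
            if (cur_thread->destroy) {         /* queue drained and shutdown requested */
                pthread_mutex_unlock(&cur_thread->queue_lock);
                /* GPU context synchronization would happen here (omitted). */
                pthread_exit(NULL);
            }
            pthread_cond_wait(&cur_thread->queue_ready, &cur_thread->queue_lock);
        }
        task_t *cur_task = cur_thread->queue_head;   /* fetch from the queue head */
        cur_thread->queue_size--;
        if (cur_thread->queue_size == 0) {
            cur_thread->queue_head = NULL;
            cur_thread->queue_tail = NULL;
        } else {
            cur_thread->queue_head = cur_task->next;
        }
        pthread_mutex_unlock(&cur_thread->queue_lock);
        cur_task->routine(cur_task->args);           /* execute outside the lock */
        free(cur_task);
    }
}

/* Hypothetical producer side: append a task and wake the worker. */
static void enqueue(worker_t *w, void (*routine)(void *), void *args) {
    task_t *t = malloc(sizeof *t);
    assert(t != NULL);                /* sketch only: no real error handling */
    t->routine = routine;
    t->args = args;
    t->next = NULL;
    pthread_mutex_lock(&w->queue_lock);
    if (w->queue_tail) w->queue_tail->next = t; else w->queue_head = t;
    w->queue_tail = t;
    w->queue_size++;
    pthread_cond_signal(&w->queue_ready);
    pthread_mutex_unlock(&w->queue_lock);
}

static void add_one(void *p) { ++*(int *)p; }   /* stand-in for a GPU task */

/* Starts the workers, gives each three tasks, shuts down, returns tasks run. */
int run_demo(void) {
    worker_t w[NUM_WORKERS];
    int counters[NUM_WORKERS] = {0};  /* each counter is touched by one worker only */
    int total = 0;
    context_created = 0;
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_mutex_init(&w[i].queue_lock, NULL);
        pthread_cond_init(&w[i].queue_ready, NULL);
        w[i].queue_head = w[i].queue_tail = NULL;
        w[i].queue_size = 0;
        w[i].destroy = 0;
        pthread_create(&w[i].tid, NULL, worker_routine, &w[i]);
    }
    for (int i = 0; i < NUM_WORKERS; i++)
        for (int k = 0; k < 3; k++)
            enqueue(&w[i], add_one, &counters[i]);
    for (int i = 0; i < NUM_WORKERS; i++) {
        pthread_mutex_lock(&w[i].queue_lock);
        w[i].destroy = 1;             /* worker exits once its queue is empty */
        pthread_cond_signal(&w[i].queue_ready);
        pthread_mutex_unlock(&w[i].queue_lock);
        pthread_join(w[i].tid, NULL);
        total += counters[i];
    }
    return total;
}
```

Calling `run_demo()` from a `main` function (built with `-pthread`) runs three tasks on each of the two workers and returns 6. Because the shutdown flag is only examined when the queue is empty, tasks that are already queued still run before the worker exits.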