Technische Universitt Mnchen
Chip Multicore Processors
Tutorial 4
S. Wallentowitz
Institute for Integrated Systems Theresienstr. 90 Building N1 www.lis.ei.tum.de
Technische Universitt Mnchen
Task 3.2: Locking
#define N 100 int buf[N]; int c=0;
mutex lock;
On the right you find a piece of source code to implement a stack. In a stack, the last written element is read as first (LIFO).
Use the functions mutex_lock(mutex*) and mutex_unlock(mutex*) to make the code thread-safe.
void write(int i) { int tmp=c; while(tmp==N) { tmp=c; } buf[c]=i; c=c+1; } int read() { int r, tmp=c; while(tmp==0) { tmp=c; } r=buf[c]; c=c-1; return r; }
Institute for Integrated Systems
Chip Multicore Processors Tutorial 3 2 S. Wallentowitz
Technische Universitt Mnchen
void write(int i) { int tmp tmp=c; while(tmp==N) { tmp=c; } buf[c]=i; c=c+1; }
Chip Multicore Processors Tutorial 3 3 S. Wallentowitz
Institute for Integrated Systems
Technische Universitt Mnchen
int read() { int r, tmp tmp = c; while(tmp==0) { tmp=c; } r=buf[c]; c=c-1; return r; }
Chip Multicore Processors Tutorial 3 4 S. Wallentowitz
Institute for Integrated Systems
Technische Universitt Mnchen
void write(int i) { int tmp
int read() { int r, tmp tmp = c;
tmp=c;
while(tmp==N) { tmp=c;
while(tmp==0) {
tmp=c; } r=buf[c]; c=c-1; return r; }
}
buf[c]=i; c=c+1; }
Chip Multicore Processors Tutorial 3 5 S. Wallentowitz
Institute for Integrated Systems
Technische Universitt Mnchen
void write(int i) { int tmp
int read() { int r, tmp tmp = c;
tmp=c;
while(tmp==N) { tmp=c;
while(tmp==0) {
tmp=c; } r=buf[c]; c=c-1; return r; }
}
buf[c]=i; c=c+1; }
Chip Multicore Processors Tutorial 3 6 S. Wallentowitz
Institute for Integrated Systems
Technische Universitt Mnchen
void write(int i) { int tmp
int read() { int r, tmp tmp = c;
tmp=c;
while(tmp==N) { tmp=c;
while(tmp==N) {
tmp=c; } r=buf[c]; c=c-1; return r; }
}
buf[c]=i; c=c+1; }
Chip Multicore Processors Tutorial 3 7 S. Wallentowitz
Institute for Integrated Systems
Technische Universitt Mnchen
void write(int i) { int tmp
int read() { int r, tmp tmp = c;
tmp=c;
while(tmp==N) { tmp=c;
while(tmp==N) {
tmp=c; } r=buf[c]; c=c-1; return r; }
}
buf[c]=i; c=c+1; }
Chip Multicore Processors Tutorial 3 8 S. Wallentowitz
Institute for Integrated Systems
Technische Universitt Mnchen
Condition Variables
Functions
wait() signal() or broadcast()
wait()
Thread 0
Thread 1
Thread 2
wait() signal()
Wait for condition
Append thread to queue of waiting threads Suspend thread
wait() signal()
wait()
broadcast()
Signal event
Pop thread from queue Re-activate thread
Broadcast
Re-activate all threads in queue
Chip Multicore Processors Tutorial 3 9 S. Wallentowitz
Institute for Integrated Systems
Technische Universitt Mnchen
Condition Variables: Example POSIX Thread
Wait
Wait for a certain condition Role of mutex Needs to hold mutex Blocking call, will still hold mutex after call While waiting for the condition: Mutex is released
pthread_cond_wait(cond_t*,mutex_t*) pthread_cond_signal(cond_t*) pthread_cond_broadcast(cond_t*)
Signal
Signal to one waiting thread that the condition holds
Broadcast
Signal to all waiting threads that the condition holds Useful for example when implementing barriers
Chip Multicore Processors Tutorial 3 10 S. Wallentowitz
Institute for Integrated Systems
Technische Universitt Mnchen
#define N 100 int buf[N]; int c=0; mutex lock; void write(int i) { int tmp tmp=c; int read() { int r, tmp tmp=c;
r=buf[c]; buf[c]=i; c=c-1; c=c+1; return r;
Chip Multicore Processors Tutorial 3 11 S. Wallentowitz
Institute for Integrated Systems
Technische Universitt Mnchen
Another Example for Conditions: Barrier Implementation
struct { int n_treads; int waiting; condition complete; mutex lock; } barr; void barrier() { mutex_lock(&barr.lock); barr.waiting++; if (barrier.waiting<barrier.n_treads) { cond_wait(&barr.complete,&barr.lock); } else { barr.waiting = 0; cond_broadcast(&barr.complete); } }
Chip Multicore Processors Tutorial 3 12 S. Wallentowitz
Institute for Integrated Systems
Technische Universitt Mnchen
Task 4.1: Counter Implementation
You want to share a counter among several threads.
Implement the counter with locks, compare-and-swap and loadlinked/store-conditional. How is each of the implementations characterized?
Chip Multicore Processors Tutorial 3 13 S. Wallentowitz
Institute for Integrated Systems
Technische Universitt Mnchen
Starting point: Counter
volatile int count; void countup() { count++; }
volatile int count; void countup() { int tmp; tmp = count; tmp = tmp+1; count = tmp; }
countup: ... R3 <- @(count); R4 <- LD(R3); R4 <- R4 + 1; R4 -> ST(R3) ...
Chip Multicore Processors Tutorial 3 14 S. Wallentowitz
Institute for Integrated Systems
Technische Universitt Mnchen
Lock Implementation
volatile int count; void countup() { int tmp; tmp = count; tmp = tmp+1; count = tmp; }
Chip Multicore Processors Tutorial 3 15 S. Wallentowitz
Institute for Integrated Systems
Technische Universitt Mnchen
Compare-and-Swap
Similar to Test-and-Set Test for data value and conditionally replace
Read value If value matches compared value: swap If value does not match: leave old value Return previous value
compare swap address
int cas(int* addr, int compare, int swap) { int tmp=*addr; if (tmp==compare) { *addr=swap; } return tmp; }
value
==
Implement non-blocking data structures
Threads always progress Ad-hoc: For specific algorithm Canned: Provided by libraries
Modify
(Bus) Interconnect
Read
data Memory
Write
Chip Multicore Processors Tutorial 3 16 S. Wallentowitz
Institute for Integrated Systems
Technische Universitt Mnchen
Compare-and-Swap Implementation
int CAS(int* addr,int compare,int swap);
volatile int count; void countup() { int tmp; tmp = count; tmp = tmp+1; count = tmp; }
Chip Multicore Processors Tutorial 3 17 S. Wallentowitz
Institute for Integrated Systems
Technische Universitt Mnchen
Load-Linked/Store-Conditional
Alternative approach: Hardware tracks changes Load-Linked (LL):
Load value and set linked bit Store value when data is unchanged Returns success of store Can also incorrectly fail (implementation dependent): Exceptions, other LL, cache line replacement
LL
value addr
SC
success addr value
Linked bit(s)
Store-Conditional (SC):
set
==1
reset write to addr
(Bus) Interconnect
Often associated to cache line
Requires cache coherency (see module 3) PowerPC, ARM, MIPS, .. Strictly RISC compared to CAS
Institute for Integrated Systems
data Memory
Implemented in most architectures
Chip Multicore Processors Tutorial 3 18 S. Wallentowitz
Technische Universitt Mnchen
LL/SC Implementation
int LL(int* addr); bool SC(int* addr,int value);
volatile int count; void countup() { int tmp; tmp = count;
tmp = tmp+1;
count = tmp; }
Chip Multicore Processors Tutorial 3 19 S. Wallentowitz
Institute for Integrated Systems
Technische Universitt Mnchen
Task 4.2: FiFo Implementation
In this task you will implement a FiFo memory in software.
a) Implement a (capacity bound) FiFo as circular buffer by using an array. b) Allow the parallel access to the FiFo by multiple threads using locks. c) Is it possible to use two locks for parallel access by reading and writing threads? d) Use a (POSIX) condition variable for synchronization. e) Can you implement this data structure in a non-blocking fashion? If not, how can you change the data structure to allow for it? f) Is it possible to implement an (unbounded) FiFo by using a linked list and without locking?
Chip Multicore Processors Tutorial 3 20 S. Wallentowitz
Institute for Integrated Systems
Technische Universitt Mnchen
FiFo Implementation
int data[N]; unsigned int start; unsigned int end; void enqueue(int d) { while(((end+1)%N)==start)) {} data[end] = d; end = (end+1)%N; } int dequeue() { int d; while(end==start) {} d=data[start]; start=(start+1)%N; return d; }
Chip Multicore Processors Tutorial 3 21 S. Wallentowitz
Institute for Integrated Systems