Cyclic Barrier

Cyclic Barrier

Basics
A synchronization primitive that allows a set of threads to all wait for each other to reach a common barrier point. Once all threads have reached the barrier, they are all permitted to continue past the barrier. The barrier is called cyclic because it can be re-used after the waiting threads are released.
One important feature of the barrier is its re usability - its ability to be used over and over again to synchronize the threads and to divide the computation into many well defined phases.

High level structure - 1)the four tasks approach the barrier; 2)all except C arrive and have to wait;3)once all tasks including C have reached the barrier,they continue past the barrier.

RAM structure
The implementation given below uses an atomic counter which is initially set to zero. An atomic counter supports two atomic operations: incrementing the counter by 1,and decrementing the counter by 1.Since 8 bit AVR MCU does not have read-modify-write instruction we will simulate atomicity through disable/enable the global interrupt flag(terrible).
One byte of RAM is needed to store the synchronization artifacts - the go flag and atomic counter.

{G} - go bit - constitutes the barrier re usability and is part of the synchronization predicate.
{C} - counter bits - up to 127 tasks could be synchronized by the barrier.

Simple Counter Barrier

Let me present a simple barrier based on the Counter algorithm. The counter is initially set to 0. As soon as a task reaches the barrier, it increments the counter by one and waits for the predicate - when the value of the counter reaches N , all N tasks has finished the current phase and have reached the barrier. The last task to increment the counter signals to other tasks that they may continue to run past the barrier and it resets to 0 the counter so that the barrier can be reused in the future.Waiting and signaling are done as follows: There is a single bit, called GO , and all the waiting tasks spin on this bit waiting for its value to be changed(the so called predicate). The last task to reach the barrier terminates the spin with a single write operation which toggles the value of the GO bit.

Implementation
What follows is a simple implementation of the Counter algorithm in AVR assembler.

1.    push temp
2.    push r17

;INIT
3.    cli
4.    lds temp,@0             ;local storage in temp
5.    mov r17,temp
6.    andi r17,0x7F           ;isolate number
7.    inc r17
8.    sbrc temp,GO_BIT
9.    sbr r17,(1<
10.  sts @0,r17
11.  sei

;NOTIFY if predicate holds
12.    andi r17,0x7F            ;isolate number
13.    cpi r17,@1
14.    brne wait_go

15.    com temp                 ;toggle go bit
16.    andi temp,(1<
17.    sts @0,temp
18.    rjmp go_to_exit

;WAIT
;temp holds the old GO bit
wait_go:
19.    andi temp,(1<
loop_go:
21.    lds r17,@0
22.    andi r17,(1<

23.       cp temp,r17
24.    breq loop_go           ;same bit -> go on looping, else leave barrier

go_to_exit:
25.    pop r17
26.    pop temp

Needless to say - registers to be used must be preserved.
Portion {3-11} is the so called INIT part - the place where each task increments the barrier counter as soon as it reaches the barrier point place.
Portion {12-14} checks if this is the last task - the incremented number is compared against the N variable passed to the barrier macro,if it is the Nth task toggle the GO bit{15-18} and exit the barrier otherwise {19-24} busy wait on the GO bit.

Cyclic Barrier could be used in each version of the kernel, provided that there is 1 byte of extra RAM for it.
The cyclic barrier will be available in the next kernel release(2.1 and higher). In the context of ACORN kernel terms thread and task are used interchangeably.