Power of a number algorithm

Another example of a recursive call is calculating the power of a number. Recursion requires that the formal parameters, local variables and return address are stored on the stack for each recursive call. Since RAM is a limited resource on AVR devices, it is understandable why recursive algorithms are not widely used in AVR assembly programming. The stack frame build-up has to be optimized or trimmed down. The following rules are used when building a recursive AVR algorithm:

• Stack frame limit: calculate the maximum stack size from the input parameters. RAM is a limited resource, so beware how deep the rabbit hole may go. The stack frame size may differ from one AVR device to another (for example, the return address takes 2 or 3 bytes depending on the size of program memory).
• Save only the minimum amount of data in the stack frame and use global data whenever possible. This reduces both code size and stack frame size.
• The CPU saves the return address on each recursive call. Apart from that, building and maintaining the frame is the responsibility of our code.

Two methods of finding the power of a number are presented. Both assume an unsigned byte-sized input and a word-sized output.

Naive solution

The most straightforward solution to the problem is to implement x^n = x·x·x·...·x, a product of n factors of x. The following implementation does not even use local stack frame variables. All calculations are done on global ones.

;************************************Power of a number***************
;* Naive slow approach stack usage N*4
;* Unsigned arithmetic
;*@INPUT: xx -> number
;*                  n -> power
;*************************************************************************
naive_power:
//EXIT CONDITION
tst n
brne nextframe
ret
nextframe:
//BUILD NEXT STACK FRAME
dec n
rcall naive_power
//CALCULATE REWINDING THE STACK
clr ah
mov al,xx
movw bh:bl,output3:output4
rcall mul16x16_32   ;no stack manipulation
ret
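
As a cross-check of the routine above, here is a minimal host-side sketch in C of the same recursion. The function name `naive_power_model` is hypothetical and not part of the original code. It assumes, like the assembly, that the running result starts at 1 and that only the low 16 bits of each product are carried into the next step.

```c
#include <stdint.h>

/* Hypothetical model of naive_power: recurse down to n == 0, then
   multiply by x once per level while the stack unwinds. */
uint16_t naive_power_model(uint8_t x, uint8_t n)
{
    if (n == 0)
        return 1;                                  /* exit condition: x^0 = 1 */

    uint16_t partial = naive_power_model(x, (uint8_t)(n - 1));

    /* 16x8 multiply; only the low word is kept, as in output3:output4 */
    return (uint16_t)((uint32_t)partial * x);
}
```

Calling `naive_power_model(8, 5)` yields 32768. Results that do not fit in a word simply wrap around, so the caller has to keep x^n below 65536.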

Optimized solution

Suppose we want to compute x^n, where x is a real number and n is any integer. It's easy if n is 0, since x^0 = 1 no matter what x is. That's a good base case. For positive n, recall that when you multiply powers of x, you add the exponents: x^a · x^b = x^(a+b) for any base x and any exponents a and b. Therefore, if n is positive and even, then x^n = x^(n/2) · x^(n/2). If we compute y = x^(n/2) recursively, then x^n = y · y. What if n is positive and odd? Then x^n = x^(n-1) · x, and n-1 is either 0 or positive and even. We just saw how to compute powers of x when the exponent is either 0 or positive and even. Therefore, we can compute x^(n-1) recursively and then use this result to compute x^n = x^(n-1) · x.

What about when n is negative? Then x^n = 1/x^(-n), and the exponent -n is positive. So we can compute x^(-n) recursively and take its reciprocal.

Putting these observations together, we get the following recursive algorithm for computing x^n.

• The base case is when n = 0, and x^0 = 1.
• If n is positive and even, recursively compute y = x^(n/2) and then x^n = y · y. Notice that you can get away with making just one recursive call in this case, computing x^(n/2) just once, and then you multiply the result of this recursive call by itself.
• If n is positive and odd, recursively compute x^(n-1), so that the exponent either is 0 or is positive and even. Then, x^n = x^(n-1) · x.
• If n is negative, recursively compute x^(-n), so that the exponent becomes positive. Then, x^n = 1/x^(-n).
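
The four cases above can be sketched directly in C. This is only an illustration of the general algorithm on real numbers, not the AVR implementation. The name `power_real` is hypothetical, and the sketch assumes n is greater than INT_MIN so that -n is representable.

```c
/* Recursive x^n for real x and integer n, one branch per case in the list.
   Assumption: n > INT_MIN, so negating n cannot overflow. */
double power_real(double x, int n)
{
    if (n == 0)
        return 1.0;                       /* base case: x^0 = 1 */

    if (n < 0)
        return 1.0 / power_real(x, -n);   /* negative: reciprocal of x^(-n) */

    if (n % 2 == 0) {
        double y = power_real(x, n / 2);  /* even: one recursive call ... */
        return y * y;                     /* ... whose result is squared */
    }

    return power_real(x, n - 1) * x;      /* odd: x^(n-1) * x */
}
```

For example, `power_real(2.0, 10)` returns 1024.0 and `power_real(2.0, -2)` returns 0.25.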

The following acorn kernel task handles non-negative exponents only. The default task stack is 20 bytes. Each stack frame keeps 1 byte (the saved n) plus the 2-byte return address. Each frame also makes an internal call to the multiply function, but only after the 1 byte has been popped off the stack, so a frame is budgeted at 4 bytes. Calculating 8^5 will require (5/2+1)*4=12 bytes, using integer division. This is less than the default task stack size of 20 bytes, so NO increase of the variable TASK_STACK_DEPTH in kernel.inc is required.

;************************************Power of a number***************
;* Optimized fast approach
;* Unsigned arithmetic
;*@INPUT: xx -> number
;*                  n -> power
;*************************************************************************
power:
//EXIT CONDITION
tst n
brne nextframeex
ret
nextframeex:
//BUILD NEXT STACK FRAME
push n
lsr n
rcall power
//CALCULATE REWINDING THE STACK
pop n
sbrc n,0   ;is n odd
rjmp odd
even:
;result*result
movw ah:al,output3:output4
movw bh:bl,output3:output4
rcall mul16x16_32
ret
odd:
;result*result
movw ah:al,output3:output4
movw bh:bl,output3:output4
rcall mul16x16_32
;x*result
clr ah
mov al,xx
movw bh:bl,output3:output4
rcall mul16x16_32
ret
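
The routine above halves n on every call, squares the returned result, and multiplies by x once more when the saved n is odd. The following host-side C sketch mirrors that control flow and also tallies how much stack the assembly version would use. The names `power_model`, `power_walk` and `note_stack` are hypothetical. The byte counts assume a 2-byte return address and the single `push n` per frame.

```c
#include <stdint.h>

#define RET_ADDR_BYTES 2u  /* assumption: device with a 16-bit program counter */
#define SAVED_N_BYTES  1u  /* the single `push n` in each non-base frame */

static unsigned stack_peak;  /* highest stack use observed, in bytes */

static void note_stack(unsigned used)
{
    if (used > stack_peak)
        stack_peak = used;
}

/* `used` is the stack in use on entry, including the return address
   pushed by the rcall that led here. */
static uint16_t power_walk(uint8_t x, uint8_t n, unsigned used)
{
    note_stack(used);
    if (n == 0)
        return 1;                          /* exit condition; result preset to 1 */

    used += SAVED_N_BYTES;                 /* push n */
    uint16_t y = power_walk(x, (uint8_t)(n >> 1),
                            used + RET_ADDR_BYTES);   /* lsr n, rcall power */
    used -= SAVED_N_BYTES;                 /* pop n */

    note_stack(used + RET_ADDR_BYTES);     /* rcall mul16x16_32 */
    uint16_t result = (uint16_t)((uint32_t)y * y);    /* result*result, low word */
    if (n & 1u)                            /* sbrc n,0 : is n odd */
        result = (uint16_t)((uint32_t)result * x);    /* x*result, low word */
    return result;
}

uint16_t power_model(uint8_t x, uint8_t n, unsigned *peak_bytes)
{
    stack_peak = 0;
    uint16_t result = power_walk(x, n, RET_ADDR_BYTES);  /* initial rcall power */
    if (peak_bytes)
        *peak_bytes = stack_peak;
    return result;
}
```

For x=8 and n=5 the model returns 32768 and reports a peak of 11 bytes, which sits inside the 12-byte estimate given earlier. Under these assumptions the peak grows by 3 bytes each time the exponent doubles.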

;****************MULTIPLY unsigned 16x16=32bit
;* INPUT= r23:r22 * r21:r20
;* USAGE=r0,r1,r2
;* STACK=0
;* OUTPUT = r19:r18
;*              r17:r16
;******************************************
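mul16x16_32:
clr r2          ; r2 = 0, used to propagate the carry
mul r23, r21 ; ah * bh
movw r19:r18, r1:r0
mul r22, r20 ; al * bl
movw r17:r16, r1:r0
mul r23, r20 ; ah * bl
add r17, r0     ; add cross product at byte 1
adc r18, r1
adc r19, r2
mul r21, r22 ; bh * al
add r17, r0     ; add cross product at byte 1
adc r18, r1
adc r19, r2
ret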
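
A 16x16 multiplication built from 8x8 `mul` instructions has to sum four partial products, with the two cross products shifted up by one byte. The following host-side C sketch shows that arithmetic. The name `mul16x16_32_model` is hypothetical and the code is an illustration of the technique, not a translation of any particular library routine.

```c
#include <stdint.h>

/* 16x16 -> 32 bit unsigned multiply from four 8x8 partial products. */
uint32_t mul16x16_32_model(uint16_t a, uint16_t b)
{
    uint8_t ah = (uint8_t)(a >> 8), al = (uint8_t)(a & 0xFFu);
    uint8_t bh = (uint8_t)(b >> 8), bl = (uint8_t)(b & 0xFFu);

    /* ah*bh fills bytes 3:2 and al*bl fills bytes 1:0; they cannot overlap */
    uint32_t result = (((uint32_t)ah * bh) << 16) | ((uint32_t)al * bl);

    /* both cross products are weighted by 256, so they land at bytes 2:1 */
    result += ((uint32_t)ah * bl) << 8;
    result += ((uint32_t)bh * al) << 8;

    return result;
}
```

Leaving out the two cross products only goes unnoticed while both operands fit in a single byte. A product such as 4096 * 8 depends on them entirely, because both the low-byte and high-byte products are zero.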

.def output1=r19   ;not used
.def output2=r18   ;not used
.def output3=r17
.def output4=r16

.def    bh=r23
.def    bl=r22

.def    ah=r21
.def    al=r20

.def    n=r24
.def    xx=r25

ldi bl,0
ldi bh,0

ldi output3,0   ;result high byte
ldi output4,1   ;result = 1 (x^0)

clr ah
clr al

ldi n,5
ldi xx,8   ;8^5 = 32768

rcall power
main16:
nop
rjmp main16
ret