Power of number algorithm

Another example for recursive call is calculating the power of a number. The recursion requires that formal params,local variables and return address are stored onto the stack for each recursive call. However, having in mind that RAM is a limited resources in AVR devices it is conceivable why recursive algorithms are not widely accepted in avr assembly programming. There is a need for optimization or twisting the stack frame build up requirement. The following rules will be used in building a recursive avr algorithm

- Stack frame limit - calculate the maximum stack size in accordance with input params. RAM is a limited resource so beware how deep the rabbit hole may go. Depending on AVR device you may have different stack frame’s size.
- Save in stack frame only the minimum amount of data, use global data whenever possible. This reduce code usage and stack frame size.
- CPU saves the return address on each recursive call so apart from this, frame build up and maintenance is our code responsibility.

Two methods of finding the power of a number are presented. Both assume unsigned byte size input - word size output.

**Naive solution**

The most straightforward solution to the problem would be to implement x^{2}=x.x.x.x...x , multiplying x n times with itself. Following implementation does not even use local stack frame variables. All calculations are done over global ones.

;************************************Power of a number***************

;* Naive slow approach stack usage N*4

;* Unsigned arithmetic

;*@INPUT: xx -> number

;* n -> power

;*************************************************************************

naive_power:

//EXIT CONDITION

tst n

brne nextframe

ret

nextframe:

//BUILD NEXT STACK FRAME

dec n

rcall naive_power

//CALCULATE REWINDING THE STACK

clr ah

mov al,xx

movw bh:bl,output3:output4

rcall mul16x16_32 ;no stack manipulation

ret

**Optimized solution**

Suppose we want to compute x^{n}, where x is a real number and n is any integer. It's easy if n is 0, since x^{0}=1 no matter what x is. That's a good base condition case. Only positive numbers are considered so when you multiply powers of x, you add the exponents: x^{a}+x^{b}=x^{a+b} for any base x and any exponents a and b. Therefore, if n is positive and even, then x^{n}=x^{n/2}.x^{n/2}. If we were to compute y=x^{n/2} recursively, then we could compute x^{n}=y.y . What if n is positive and odd? Then x^{n}=x^{n-1}.x and and n-1 either is 0 or is positive and even. We just saw how to compute powers of x when the exponent either is 0 or is positive and even. Therefore, we could compute x^{n-1} recursively , and then use this result to compute x^{n}=x^{n-1}.x

What about when n is negative? Then x^{n}=1/x^{-n} , and the exponent −n is positive. So we can compute x^{-n} recursively and take its reciprocal.

Putting these observations together, we get the following recursive algorithm for computing x^{n}.

- The base case is when n=0,and x
^{0}=1. - If n is positive and even, recursively compute y=x
^{n/2}and then x^{n}=y.y. Notice that you can get away with making just one recursive call in this case, computing x^{n/2}just once, and then you multiply the result of this recursive call by itself. - If n is positive and odd, recursively compute x
^{n-1}, so that the exponent either is 0 or is positive and even. Then, x^{n}=x^{n-1}.x. - If n is negative, recursively compute x
^{-n}, so that the exponent becomes positive. Then, x^{n}=1/x^{-n}.

The following acorn kernel task is based on positive numbers only. Default task stack is 20 and each stack frame keeps 1 byte and return call of 2 bytes. There is however an internal call to multiply function in each frame after 1 byte is popped off the stack. So total stack frame usage is 4 bytes. Calculating 8^{5 }will require (5/2+1)*4=12. The default task stack size is less than 20 bytes so NO increase of the variable TASK_STACK_DEPTH in kernel.inc is required.

;************************************Power of a number***************

;* Optimized fast approach

;* Unsigned arithmetic

;*@INPUT: xx -> number

;* n -> power

;*************************************************************************

power:

//EXIT CONDITION

tst n

brne nextframeex

ret

nextframeex:

//BUILD NEXT STACK FRAME

push n

lsr n

rcall power

//CALCULATE REWINDING THE STACK

pop n

sbrc n,0 ;is n odd

rjmp odd

even:

;result*result

movw ah:al,output3:output4

movw bh:bl,output3:output4

rcall mul16x16_32

ret

odd:

;result*result

movw ah:al,output3:output4

movw bh:bl,output3:output4

rcall mul16x16_32

;x*result

clr ah

mov al,xx

movw bh:bl,output3:output4

rcall mul16x16_32

ret

;****************MULTIPLY unsigned 16x16=32bit

;* INPUT= r23:r22 * r21:r20

;* USAGE=r0,r1,r2

;* STACK=0

;* OUTPUT = r19:r18

;* r17:r16

;******************************************

mul16x16_32:

clr r2

mul r23, r21 ; ah * bh

movw r19:r18, r1:r0

mul r22, r20 ; al * bl

movw r17:r16, r1:r0

mul r23, r20 ; ah * bl

add r17, r0

adc r18, r1

adc r19, r2

mul r21, r22 ; bh * al

add r17, r0

adc r18, r1

adc r19, r2

ret

.def output1=r19 ;not used

.def output2=r18 ;not used

.def output3=r17

.def output4=r16

.def bh=r23

.def bl=r22

.def ah=r21

.def al=r20

.def n=r24

.def xx=r25

Task_16:

ldi bl,0

ldi bh,0

ldi output4,1

clr ah

clr al

ldi n,5

ldi xx,8 ;3^6

rcall power

main16:

nop

rjmp main16

ret