Base64

Base64 is an encoding scheme which transforms binary data into ASCII character stream. In other words - binary data consists of bytes holding 8 meaningful(carrying information) bits  whereas ASCII character stream consists of bytes holding 7 meaningful(carrying information) bits. It is commonly used when there is a need to encode binary data that needs to be stored and transferred over media that is designed to deal with textual data.
The process of transformation is described by converting the binary stream “HELLO” into its Base64 encoded representation. Needless to say the stream could include bytes of any value,but I have chosen printable characters for better visual representation.
The first thing to do is splitting the stream into chunks of 3 bytes each. If the stream length is Lin, it is obvious that the whole stream consists of Lin/3 chunks if Lin is divisible by 3.
If the stream length is not divisible by 3 - which is very often the case - the padding/addition of one or two extra bytes is  done thus the following formula:

The stream “HELLO”’s length is not divisible by 3 so extra byte/s will be added to get 2 chunks of bytes to process at the end {‘HEL’,’LO?’}. Later we will see what stays in place of `?` symbol.

Base64 encoding.
The basic steps to convert a 3 byte input chunk to a 4 byte output chunk is as follows:

  1.  Get byte content (use ASCII table for characters)
  2. Join together the 3 bytes’ bits as one 24 bit octet.
  3. Split the 24 bit octet into 4 groups of 6 bits.
  4. From left to right convert each group of 6 bits to their corresponding value according to index transformation table fig.1
  5. Add the newly built base64 encoded block to the output.


Fig.2 is a visual representation of the first chunk base64 encoding, according to the above algorithm.

When the number of bytes to encode is not divisible by three (that is, if there are only one or two bytes of input for the last 24-bit block), then the padding action is performed:
Extra bytes with value zero are added so there are three bytes, and then the conversion to base64 is done.

If there was only one significant input byte, only the first two base64 digits are picked (12 bits), and if there were two significant input bytes, the first three base64 digits are picked (18 bits). '=' characters might be added to make the last block contain four base64 characters.


Padding.
The '==' sequence at the end of base64 encoded output stream indicates that the last group of the input contained only one byte(2 bytes short of the 3 byte size block), and '=' indicates that it contained only two bytes(1 byte short of the 3 byte size block).
Let us get back to our example!
The second chunk to encode of our example {‘LO’} is one byte short to the complete 3 byte size block  so an extra byte of zero will be added.
The basic steps to convert an incomplete 3 byte input chunk to a 4 byte output chunk is as follows:

  1. Fill the missing bytes with 0. If the input chunk has one byte - add 2 extra bytes with 0, if  input chunk has 2 bytes - add 1 extra byte with 0.
  2. Get the bytes content (use ASCII table for characters)
  3. Join together the 3 bytes’ bits as one 24 bit octet.
  4. Split the 24 bit octet into 4 groups of 6 bits.
  5. From left to right convert each group of 6 bits to their corresponding value according to index transformation table fig.1
  6. Add the newly built base64 encoded block to the output.


Fig.3 is a visual representation of the last chunk base64 encoding that is not divisible by 3, according to the above algorithm.

Result.
It is obvious the the input stream {HELLO} base64 encodes to {SEVMTE8=}

Output Length.
The number of base 64 encoded output bytes is calculated by the following formula, where Lin is the input stream length in bytes.

It is obvious that base64 encoding produces larger output stream in bytes compared to the input stream size.

Base64 decoding.
When decoding Base64 stream, four bytes are typically converted back to three bytes. The only exceptions are when padding characters exist. A single '=' indicates that the four bytes will decode to only two bytes, while '==' indicates that the four bytes will decode to only a single byte.
In theory, the padding character is not needed for decoding, since the number of missing bytes can be calculated from the number of Base64 digits.
I have included base64 encoding and decoding as well as length calculation applied on a stream of less than 192 bytes.

Base64 avr assembler implementation.

/*
 *  BASE64
 *  Dynamic input stream size implementation.
 *  Process byte stream of length up to ((x/3)*4)<256
 *  Author: Sergey Iliev from Vakarel
 */

;MIND RAM memory size
.dseg

input:  .byte 256  ;up to 256

input_size: .byte 1   ;input size is a parameter

block: .byte 3

.org 0x00F1
decodeoutput:  .byte 256 ;up to 256 bytes

.org 0x019F
encodeoutput: .byte 256   ;up to 256 bytes


.cseg

.MACRO ADD16
    add @0,@2 ;Add low bytes
    adc @1,@3 ;Add high bytes with carry
.ENDMACRO


Base64 encoding.

/*********************ENCODE 64***********************************************************
;@INPUT: Z input buffer to encode
        ;cnt input buffer size
        ;Y 3 byte block buffer pointer
        ;X output buffer pointer
;@USAGE: cnt,ax,bx,cx
;@OUTPUT: output buffer base64 encoded
*******************************************************************************************/
base64encodeex:
  ldi    YL,low(block)
  ldi    YH,high(block)    

  ldi    XL,low(encodeoutput)
  ldi    XH,high(encodeoutput)    

  clr bx
e64loop:
  ld ax,Z+
  st Y+,ax
  inc bx
  ;i==3?
  ;is block full with next 3 bytes to process
  cpi bx,3
  brne e64loopa

  ldi    YL,low(block)
  ldi    YH,high(block)    
 
  ;(block_3[0] & 0xfc) >> 2;
  ld ax,Y
  andi ax,0xfc
  lsr ax
  lsr ax

  ;transform ax
  rcall getcharset64
  st X+,ax
          
  ;block_4[1] = ((block_3[0] & 0x03) << 4) + ((block_3[1] & 0xf0) >> 4);
  ld ax,Y
  andi ax,0x03
  lsl ax
  lsl ax
  lsl ax
  lsl ax

  ldd cx,Y+1
  andi cx,0xf0
  lsr cx
  lsr cx
  lsr cx
  lsr cx

  or ax,cx
  ;transform ax
  rcall getcharset64
  st X+,ax
   
  ;block_4[2] = ((block_3[1] & 0x0f) << 2) + ((block_3[2] & 0xc0) >> 6);
  ldd ax,Y+1
  andi ax,0x0f
  lsl ax
  lsl ax

  ldd cx,Y+2
  andi cx,0xC0
  lsr cx
  lsr cx
  lsr cx
  lsr cx
  lsr cx
  lsr cx

  or ax,cx
  ;transform ax
  rcall getcharset64
  st X+,ax

  ;block_4[3] = block_3[2] & 0x3f;
  ldd ax,Y+2
  andi ax,0x3F
  ;transform ax
  rcall getcharset64
  st X+,ax
  ;i=0;
  clr bx
e64loopa:
  dec cnt
  tst cnt
  brne e64loop
 
  ;are there left bytes (input size not devisable by 3)
  tst bx
  breq e64exit    ;no left over
  ;fill up the missing bytes round to 3 with '0'
  mov ax,bx
  ldi cx,0
e64loopb:
  inc ax
  st Y+,cx   ;set to 0
  cpi ax,3
  brne e64loopb
 
 ;---------------convert-------------------------------------------
  ldi    YL,low(block)
  ldi    YH,high(block)    
 
  ;(block_3[0] & 0xfc) >> 2;
  ld ax,Y
  andi ax,0xfc
  lsr ax
  lsr ax

  ;transform ax
  rcall getcharset64
  st X+,ax
          
  ;block_4[1] = ((block_3[0] & 0x03) << 4) + ((block_3[1] & 0xf0) >> 4);
  ld ax,Y
  andi ax,0x03
  lsl ax
  lsl ax
  lsl ax
  lsl ax

  ldd cx,Y+1
  andi cx,0xf0
  lsr cx
  lsr cx
  lsr cx
  lsr cx

  or ax,cx
  ;transform ax
  rcall getcharset64
  st X+,ax
   
  ;block_4[2] = ((block_3[1] & 0x0f) << 2) + ((block_3[2] & 0xc0) >> 6);
  ldd ax,Y+1
  andi ax,0x0f
  lsl ax
  lsl ax

  ldd cx,Y+2
  andi cx,0xC0
  lsr cx
  lsr cx
  lsr cx
  lsr cx
  lsr cx
  lsr cx

  or ax,cx
  ;transform ax
  rcall getcharset64
  st X+,ax

  ;block_4[3] = block_3[2] & 0x3f;
  ldd ax,Y+2
  andi ax,0x3F
  ;transform ax
  rcall getcharset64
  st X+,ax

  ;pad with '='
  ldi ax,'='  
e64loopc:  
  inc bx
  st -X,ax
  cpi bx,3
  brne e64loopc

e64exit:
ret


Base64 decoding.

/*********************DECODE 64***********************************************************
;@INPUT: Z input buffer to encode
        ;Y 3 byte block buffer pointer
        ;X output buffer pointer
        ;cnt - length of the stream
;@USAGE: cnt,ax,bx,cx
;@OUTPUT: output buffer base64 decoded.
;@WARNING: there is no check that the input buffer to decode consists of base64 complient characters.
*******************************************************************************************/

base64decodeex:  
  ldi    YL,low(block)
  ldi    YH,high(block)    

  ldi    XL,low(decodeoutput)
  ldi    XH,high(decodeoutput)    

  clr bx
d64loop:
  ld ax,Z+
  //transform back
  rcall findindex64
  st Y+,ax
  inc bx
  ;i==4?
  ;is block full with next 4 bytes to process
  cpi bx,4
  brne d64loopa

  ldi    YL,low(block)
  ldi    YH,high(block)

  ;block_3[0] = (block_4[0] << 2) + ((block_4[1] & 0x30) >> 4);
  ld ax,Y
  lsl ax
  lsl ax

  ldd cx,Y+1
  andi cx,0x30
  lsr cx
  lsr cx
  lsr cx
  lsr cx

  or ax,cx
  st X+,ax
 
  ;block_3[1] = ((block_4[1] & 0xf) << 4) + ((block_4[2] & 0x3c) >> 2);   
  ldd ax,Y+1
  andi ax,0x0F
  lsl ax
  lsl ax
  lsl ax
  lsl ax

  ldd cx,Y+2
  andi cx,0x3C
  lsr cx
  lsr cx

  or ax,cx
  st X+,ax
 
  ;block_3[2] = ((block_4[2] & 0x3) << 6) + block_4[3];
  ldd ax,Y+2
  andi ax,0x03
  lsl ax
  lsl ax
  lsl ax
  lsl ax
  lsl ax
  lsl ax


  ldd cx,Y+3

  or ax,cx
  st X+,ax

  ;i=0;
  clr bx
d64loopa:
  dec cnt
  tst cnt        ;finished with input?
  breq d64loopb
  ld ax,Z
  cpi ax,'='     ;padded bytes reached
  breq d64loopb
  rjmp d64loop   
d64loopb:
   
  ;are there left bytes (input size not devisable by 4)
  tst bx
  breq d64exit    ;no left over
  ;fill up the missing bytes round to 4 with '0'
  mov ax,bx
  ldi cx,0
d64loopc:
  inc ax
  st Y+,cx   ;set to 0
  cpi ax,4
  brne d64loopc
;----convert-----
  ldi    YL,low(block)
  ldi    YH,high(block)

  ;1st byte is always present. The padded chunk has '=='.
  ;block_3[0] = (block_4[0] << 2) + ((block_4[1] & 0x30) >> 4);
  ld ax,Y
  lsl ax
  lsl ax

  ldd cx,Y+1
  andi cx,0x30
  lsr cx
  lsr cx
  lsr cx
  lsr cx

  or ax,cx
  st X+,ax
  ;2 byte could be present but could be not.The padded chunk had one '='
  ;If not present this will add 0 to the decoded stream.CAN YOU LIVE WITH IT?
  ;block_3[1] = ((block_4[1] & 0xf) << 4) + ((block_4[2] & 0x3c) >> 2);   
  ldd ax,Y+1
  andi ax,0x0F
  lsl ax
  lsl ax
  lsl ax
  lsl ax

  ldd cx,Y+2
  andi cx,0x3C
  lsr cx
  lsr cx

  or ax,cx
  st X+,ax
 
  ;3th byte is never part of the decoded stream, when it comes to decoding the padded chunk.
  ;block_3[2] = ((block_4[2] & 0x3) << 6) + block_4[3];
  ;ldd ax,Y+2
  ;andi ax,0x03
  ;lsl ax
  ;lsl ax
  ;lsl ax
  ;lsl ax
  ;lsl ax
  ;lsl ax


  ;ldd cx,Y+3

  ;or ax,cx
  ;st X+,ax


d64exit:
ret


Helper functions.

//----get char value from index using base64 transformation table---
;@INPUT ax ->from [0 to 63] index
;@USE  Z,r0
;@OUTPUT ax
getcharset64:
    push ZH
    push ZL
    ldi    ZH,high(charset64*2)
    ldi    ZL,low(charset64*2)    ;Z-pointer <- ROM table end + 1
    clr r0
    ADD16 ZL,ZH,ax,r0    
    lpm
    mov    ax,r0
    pop ZL
    pop ZH        
ret

//----get index from a char value using base64 transformation table------
;@INPUT ax
;@USE  Z,r0
;@OUTPUT ax ->from [0 to 63]
;@WARNING:input must be a charecter from the charset64.
findindex64:
    push ZH
    push ZL
    push cnt
    ldi    ZH,high(charset64*2)
    ldi    ZL,low(charset64*2)    ;Z-pointer <- ROM table end + 1
    clr cnt
fl:  
       lpm
    cp ax,r0
    breq flexit
    adiw Z,1    
    inc cnt
    cpi cnt,64    ;should limit the looping
    brne fl    
flexit:    
    mov ax,cnt    ;set index result    
    pop cnt
    pop ZL
    pop ZH        
ret

//-------------------get output size of base64 encoding----------
;we are doing a byte arithmetic so input size X must be (X/3)*4<256 which is X<192
;@INPUT:  cnt - input size
;OUTPUT:  dres8u - size of base64 encoded stream

getbase64size:
    mov dd8u,cnt
    ldi dv8u,3
    clr drem8u
    
    rcall div8u
    
    tst drem8u   ;is it devisable by 3 or not
    breq gb64size
    inc dres8u
gb64size:        ;multiply by 4
    lsl dres8u
    lsl dres8u
ret
    

;***************************************************************************
;*
;* "div8u" - 8/8 Bit Unsigned Division
;*
;* This subroutine divides the two register variables "dd8u" (dividend) and
;* "dv8u" (divisor). The result is placed in "dres8u" and the remainder in
;* "drem8u".
;*  
;* Number of words    :14
;* Number of cycles    :97
;* Low registers used    :1 (drem8u)
;* High registers used  :3 (dres8u/dd8u,dv8u,dcnt8u)
;*
;***************************************************************************

;***** Subroutine Register Variables

.def    drem8u    =r22        ;remainder
.def    dres8u    =r23        ;result
.def    dd8u    =r23        ;dividend
.def    dv8u    =r24        ;divisor
.def    dcnt8u    =r25        ;loop counter

;***** Code

div8u:    
    sub    drem8u,drem8u    ;clear remainder and carry
    ldi    dcnt8u,9    ;init loop counter
d8u_1:    rol    dd8u        ;shift left dividend
    dec    dcnt8u        ;decrement counter
    brne    d8u_2        ;if done
    ret            ;    return

d8u_2:    rol    drem8u        ;shift dividend into remainder
    sub    drem8u,dv8u    ;remainder = remainder - divisor
    brcc    d8u_3        ;if result negative
    add    drem8u,dv8u    ;    restore remainder
    clc            ;    clear carry to be shifted into result
    rjmp    d8u_1        ;else
d8u_3:    sec            ;    set carry to be shifted into result
    rjmp    d8u_1



charset64: .db "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"