Optimized inner loop of bit-at-a-time Huffman decoder is indistinguishable from magic:
106: 22 0f add r18, r18
108: 33 1f adc r19, r19
10a: 37 1b sub r19, r23
10c: 11 97 sbiw r26, 0x01 ; 1
10e: fd 01 movw r30, r26
110: 54 91 lpm r21, Z
112: 75 0f add r23, r21
114: 4f 5f subi r20, 0xFF ; 255
116: 48 30 cpi r20, 0x08 ; 8
118: 58 f4 brcc .+22 ; 0x130
11a: 37 17 cp r19, r23
11c: a0 f7 brcc .-24 ; 0x106
Tree format
A header of 8-bit codeword counts per bit length in reverse order followed by a table of fixed-length values in codebook order. The header specifies the canonical huffman code; explicit codewords are not stored. The file header points at the beginning of the value table; codeword counts are found by decrementing that pointer (hence reverse order).
Example: Given this code:
0 = 4
10 = 5
110 = 1
111 = 2
The tree would be stored as {2,1,1,4,5,1,2} with its file pointer pointing at the first value byte (4).
Decoding the tree
The bit-at-time decoder works by reconstructing the last code in the codebook for each bit length and comparing it to the decoded codeword so far. It stops when the decoded codeword is less than the last codeword for that bit length. The result is an index into the value table computed from the codeword and the number of codewords one shorter than the resulting codeword.
This can be accomplished with the following psuedocode:
codeword = 0, max_codeword = 0, num_codewords_so_far = 0, n = 0
while True:
max_codeword = max_codeword << 1
codeword = (codeword << 1) + getbit(n)
num_codewords_this_length = codewords_of_length(n)
if codeword < max_codeword + num_codewords_this_length:
return num_codewords_so_far + codeword - max_codeword
num_codewords_so_far += num_codewords_this_length
max_codeword += num_codewords_this_length
n = n + 1
This can be simplified algebraically by subtracting num_codewords_so_far from both max_codeword and codeword just before each comparison in the loop. This doesn't affect the outcome of the comparison.
adj_codeword = 0, adj_max_codeword = 0, num_codewords_so_far = 0, n = 0
while True:
num_codewords_this_length = codewords_of_length(n)
adj_max_codeword = (adj_max_codeword << 1) - num_codewords_so_far
adj_codeword = (adj_codeword << 1) + getbit(n) - num_codewords_so_far
if adj_codeword < adj_max_codeword + num_codewords_this_length:
return num_codewords_so_far + adj_codeword - adj_max_codeword
num_codewords_so_far += num_codewords_this_length
adj_max_codeword += num_codewords_this_length
n = n + 1
However after this transformation adj_max_codeword always equals num_codewords_so_far, so we can eliminate the redundant variable and greatly simplify the loop.
adj_codeword = 0, num_codewords_so_far = 0, n = 0
while True:
num_codewords_this_length = codewords_of_length(n)
adj_codeword = (adj_codeword << 1) + getbit(n) - num_codewords_this_length
num_codewords_so_far += num_codewords_this_length
if adj_codeword < num_codewords_so_far:
return adj_codeword
n = n + 1
Assembly code
r18 = bits from bitstream
r19 = adjusted codeword (index into the codebook)
r20 = number of bits shifted from r18
r21 = number of codewords at this length
r23 = sum number of codewords up to this length
r26:r27 = pointer to table of codewords per length (reverse order)
It works out to 16 cycles per bit (15 when it terminates; and some more when r18 runs out if bits).
The add/adc consumes one bit from the bitstream by shifting: r18 bit 7 is shifted into r19 bit 0. The subtraction at 10a subtract the number of total codewords in the codebook previous so that r19 is an index into the codebook. If it is less than the total number of codewords at the length, the complete codeword is decoded. Otherwise the offset points at a longer codeword and more bits are needed.
The branch at 118 is a subroutine that gets more bits from the bitstream loading them into r18 and setting r20=0. That routine jumps to 11a when it's done.
Examples
Lengths = 0, 2, 0, 3 (not reversed)
Codebook:
00
01
1000
1001
1010
Say the input bits are 1001. After each loop:
r19 = 0 * 2 + 1 - 0 = 1 not less than 0
r19 = 1 * 2 + 0 - 0 = 2 not less than 2
r19 = 2 * 2 + 0 - 2 = 2 not less than 2
r19 = 2 * 2 + 1 - 2 = 3 is less than 5
The loop terminates and r19 = 3 is the number of codewords in the codebook before 1001. This is the offset into the table of values indexed by codeword position in the codebook.
Same but for input 01:
r19 = 0 * 2 + 0 - 0 = 0 not less than 0
r19 = 0 * 2 + 1 - 0 = 1 is less than 1
After the loop exits r19 = 1 is the number of codewords in the codebook before 01.