My method was to read two bits and then check a lookup table to see whether the resulting value is valid, if so the value read from the table contains the coefficient/run of zeros, or if not, read another bit and check the lookup table again, etc.
I found that I can replace this:
; mov #0,r0
; sub r3,r0
; mov r0,r3
with: neg r3,r3
(duh) and code alignment might be able to gain something but other than that I`m short on ideas. Maybe a large unwieldy method is needed, because a data rate of 300KB/s doesn`t allow a lot of cycles for processing one bit at a time. Perhaps a jump table with a series of routines for decoding from each position within the word.