Advanced Audio Compression on the 32X

Chilly Willy

Established Member
I've been playing with audio compression for a while now. I've been looking at all sorts of compression formats, checking their speed vs compression vs quality. MP3 is not suitable for a number of reasons, a primary one being patent issues - right now, the U.S. is almost self-destructing over bad patents. A secondary issue is speed - I've yet to find an open source MP3 decoder that can do 16kHz mono on the 32X. Finally, even LAME can't encode MP3 at 24 to 32 kbps with a quality worth pursuing.

That led me to ogg-vorbis - no patent issues (knock on wood), sounds GOOD at 24 to 32 kbps, and has one of the fastest integer decoders out (Tremor). So I did some experiments - with just a few assembly optimizations I've got Tremor running fast enough for 22kHz mono on the 32X on real hardware. One issue I ran into concerned memory - my test app needs 128KB for Tremor to decode music (checked in 8KB increments from 64KB to 128KB); it also leaks memory badly enough that I have to reinit the heap between songs to keep it running.

So here's my latest Tremor test. Included is all the source and the oggs that are used (22050 Hz mono 16-bit source encoded by the latest oggenc at 24 kbps VBR, with an average bitrate of 30 to 33 kbps for most of the songs). This runs full speed in emulation AND on real hardware. The player starts playing automatically on startup and continues cycling through all the songs until you reset/power off. Press A for the previous song, or C for the next song.

TremorMonoTest32X-20110806.7z

As mentioned above, the codec leaks memory badly enough to require resetting the heap between songs, so I used MSYS - Simple Malloc & Free Functions - by malkia@digitald.com. Only, I might want to use MSYS on the Master SH2 as well for something... so I made two versions - MSYS for the Master SH2, and SSYS for the Slave SH2. They are functionally identical, but use different names for the calls and for their static vars, so that you don't run into cache coherency issues. The Tremor codec in the archive above uses SSYS since it runs entirely on the Slave SH2.
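To illustrate the "reset the heap between songs" idea, here is a minimal sketch of a bump (arena) allocator in C. This is NOT the MSYS/SSYS API; the names `arena_t`, `arena_init`, `arena_alloc` and `arena_reset` are hypothetical and exist only for this example. It is also simpler than a real malloc/free: individual blocks are never freed, so it only shows why one reset can wipe out every leak at once.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator, for illustration only (not MSYS/SSYS). */
typedef struct {
    uint8_t *base;  /* start of the memory block handed to the arena */
    size_t   size;  /* total bytes available */
    size_t   used;  /* bytes handed out so far */
} arena_t;

/* Attach the arena to a caller-supplied block.
   Assumes 'mem' is at least 8-byte aligned. */
void arena_init(arena_t *a, void *mem, size_t size)
{
    a->base = (uint8_t *)mem;
    a->size = size;
    a->used = 0;
}

/* Hand out 'n' bytes, rounded up to a multiple of 8.
   Returns NULL when the arena is exhausted. */
void *arena_alloc(arena_t *a, size_t n)
{
    size_t aligned = (n + 7u) & ~(size_t)7u;
    if (aligned < n || aligned > a->size - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* Drop every allocation at once, leaked or not.
   This is the step you would run between songs. */
void arena_reset(arena_t *a)
{
    a->used = 0;
}
```

With something like this, the player would call `arena_reset()` after closing one song and before opening the next, so whatever the decoder failed to free is reclaimed anyway. A decoder that relies heavily on free or realloc would waste memory with a pure bump allocator, which is one reason to use a real malloc/free library like MSYS instead.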

memlibs-20110806.7z

All the above code uses my current toolchain. MSYS comes with makefiles for both the 32X and m68k, while SSYS just compiles for the 32X.

To make the test, cd to libssys and make clean, make, make install. Then cd to the test dir, cd to Tremor and make clean, make, make install. Then cd .. and make clean, make. You should then have oggtest.bin. This binary is included in the main archive above for folks who don't want to build this themselves.

As a comparison to my tests on CVSD ADPCM, mono 22kHz ogg-vorbis at 24 to 32 kbps is equivalent to roughly 1 to 1.5 bits per sample ADPCM. If you compare the quality of the 30 kbps ogg samples, they sure sound a lot better than my 1 or 2 bits per sample tests... better than the 3 bits per sample TADPCM as well. Now if I can get the memory usage down a bit and maybe add a few more assembly optimizations...
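The bits-per-sample comparison is just the bitrate divided by the sample rate. A small C sketch of that arithmetic follows; the function name `bits_per_sample` is made up for this example, and it assumes 1 kbps = 1000 bits per second.

```c
/* Average number of bits spent per PCM sample.
   bitrate_bps:    encoded bitrate in bits per second (24 kbps = 24000)
   sample_rate_hz: sample rate of the decoded audio (22050 here)
   channels:       1 for mono, 2 for stereo */
double bits_per_sample(double bitrate_bps, double sample_rate_hz, int channels)
{
    return bitrate_bps / (sample_rate_hz * (double)channels);
}
```

For 22050 Hz mono this gives about 1.09 bits per sample at 24 kbps, about 1.36 at 30 kbps, and about 1.45 at 32 kbps, which is where the "1 to 1.5 bits per sample" range comes from.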
 

antime

Extra Hard Mid Boss
IIRC the memory leaks have been known for many years, but for some reason Xiph have not been interested in fixing them. There's a version of Tremor optimized for DSPs which should consume less memory, but may use more CPU. It's found in the "lowmem" branch of Xiph's SVN. Some of the changes in the branch were supposed to be folded into the mainline version, but I don't think that ever happened. You may also want to investigate if any of the Tremolo optimizations could be applied.

If you haven't completely discounted MP3s, the Helix MP3 decoder is pretty good (at least on ARM). Figuring out how to get the source code can be a pain though, IIRC you have to register on the Helix website first.
 

Chilly Willy

Established Member
antime said:
IIRC the memory leaks have been known for many years, but for some reason Xiph have not been interested in fixing them. There's a version of Tremor optimized for DSPs which should consume less memory, but may use more CPU. It's found in the "lowmem" branch of Xiph's SVN. Some of the changes in the branch were supposed to be folded into the mainline version, but I don't think that ever happened. You may also want to investigate if any of the Tremolo optimizations could be applied.

Thanks, I will probably try the lowmem branch. It should cut the memory by more than half from what I've read. As long as 22kHz mono still plays fine, that would probably be the best. Tremolo is very similar to the RockBox Tremor, but with a LOT more assembly. When I look for areas to try out SH2 asm, I look at RBT and Tremolo for a guideline based on what they've converted to ARM.

antime said:
If you haven't completely discounted MP3s, the Helix MP3 decoder is pretty good (at least on ARM). Figuring out how to get the source code can be a pain though, IIRC you have to register on the Helix website first.

I got the Helix mp3dec code and its speed is SOLELY due to having converted nearly ALL the C to assembly for ARM, and using SSE wherever possible on x86.

I may go back and work on libmad some more using what I've learned on Tremor. My libmad test was pretty close to what I got from Tremor. If I can get mad and Tremor both working at 22kHz mono in a reasonable amount of memory, that would probably satisfy most folks. It IS just the 32X, after all. I'm amazed at some of the things I've been able to get it to do. Seriously, we now have ogg-vorbis running solely on the Slave SH2 with DMA PWM audio. I never doubted my MOD player would run fine, but ogg-vorbis? The SH2 is a good CPU, but the 32X has it on a 16-bit bus with slow access to ROM (8 cycles per word best case, if I remember rightly). I knew the Saturn could handle it, but I'm mighty pleased with the 32X so far.

EDIT: Trying the lowmem branch now... sounds okay so far. All but one of my optimizations transferred over fine. Now using 48KB of RAM instead of 128KB.

EDIT 2: Here's the latest binary and source: http://www.mediafire.com/download.php?9acgq3givvi8kvd
 