PSX STR format on Saturn

There shouldn't be any technical reasons preventing it, someone just has to make a decoder.
 
Do you think the IDCT (which I think would be the main CPU-eating part) can be done on the SCU DSP, or is it not big/fast enough to handle that?
 
I think it could; if I understand correctly, it's basically a series of matrix multiplications. The matrices happen to be 8 by 8, which means they'd fit nicely in the DSP's memories. I would try to set it up so that one CPU decodes a macroblock while the DSP performs dequantization and IDCT on the previous one.

The other big time sink would be colour space conversion. I've wondered whether it would be possible to speed this up by using a separate background layer per primary colour, but I don't understand the VDP2 blending functions well enough to say whether that's possible at all.
 
I don't think it's going to work. Every non-zero coefficient is going to require 64 multiplies and adds to do the iDCT. 28MHz divided by 15fps and 1680 iDCTs per frame (320x224) leaves only 1134 cycles per 8x8 block (per CPU). And there is a bitstream to decode. I don't know much about the SCU DSP but I don't think it can do multiple ops in parallel like MMX or the MDEC can, can it? I just tried writing a decoder for .STR in FreeBASIC and it runs a bit slow on my 1.2GHz laptop, so I don't see SH2 assembly faring any better.
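The arithmetic behind that budget can be sanity-checked. Assuming the Saturn SH-2 clock of ~28.6364 MHz (the post rounds to 28 MHz) and the usual 6 blocks per 16x16 macroblock (4 luma + 2 chroma):

```c
/* Cycle budget per 8x8 block for one CPU at a given clock and frame
 * rate. 320x224 gives 20 * 14 = 280 macroblocks, times 6 blocks each
 * = 1680 IDCTs per frame. */
static long cycles_per_block(long clock_hz, int fps, int w, int h) {
    int blocks = (w / 16) * (h / 16) * 6;
    return clock_hz / fps / blocks;
}
```

`cycles_per_block(28636400L, 15, 320, 224)` comes out around 1136, in the same ballpark as the 1134 quoted, and that is before any bitstream parsing.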
 
DamageX said:
I don't know much about the SCU DSP but I don't think it can do multiple ops in parallel like MMX or the MDEC can, can it?
Not MMX-style (SIMD) parallelism, but instruction-level parallelism. Different bits of the instruction control different functional units, so you can have a single instruction that does something like "multiply current sample by current coefficient, store n-1 result to RAM1, transfer n-2 result from RAM1 to DMA engine, load immediate next coefficient, load next sample from RAM2". Probably more operations there than the SCU DSP can handle in one word, but hopefully you get the idea.
 
I was just thinking maybe if the resolution is cut down a bit, say decoding only 4x8 or 4x4 pixels in a block, then maybe the Saturn can scrape together enough power to keep up? YUV->RGB conversion might as well be done with a lookup table. Knock Y down to 6 bits and U/V to 5 bits each and pull a 15-bit RGB value out of a 128KB table. There isn't enough VRAM to do double buffering with a 32bpp screen anyway. Haven't looked at the audio yet; I wonder if the 68K could decode it.
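The 128KB figure works out: 6 + 5 + 5 = 16 index bits, two bytes of 15-bit RGB per entry, 65536 * 2 = 128KB. A hedged sketch of building such a table in C; the coefficients are generic BT.601 (the MDEC's exact matrix may differ slightly), and the bit layout of the index is my own choice:

```c
#include <stdint.h>
#include <stdlib.h>

/* Builds a YUV->RGB555 table indexed by [Y:6][U:5][V:5].
 * Y is the top 6 bits of the index, U the middle 5, V the low 5;
 * U and V are re-centred around 128 before conversion. */
static uint16_t *build_yuv_table(void) {
    uint16_t *tab = malloc(65536 * sizeof(uint16_t));
    if (!tab) return NULL;
    for (long i = 0; i < 65536; i++) {
        int y = (int)(i >> 10) << 2;             /* 0..252            */
        int u = (int)((i >> 5) & 31) * 8 - 128;  /* -128..120         */
        int v = (int)(i & 31) * 8 - 128;
        int r = y + (91881 * v) / 65536;                     /* Y + 1.402 V */
        int g = y - (22553 * u) / 65536 - (46802 * v) / 65536;
        int b = y + (116130 * u) / 65536;                    /* Y + 1.772 U */
        if (r < 0) r = 0; if (r > 255) r = 255;
        if (g < 0) g = 0; if (g > 255) g = 255;
        if (b < 0) b = 0; if (b > 255) b = 255;
        /* Saturn 15-bit colour puts red in the low 5 bits */
        tab[i] = (uint16_t)((r >> 3) | ((g >> 3) << 5) | ((b >> 3) << 10));
    }
    return tab;
}
```

At decode time the conversion is then a shift/mask to build the 16-bit index and one table read per pixel, no multiplies.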
 
DamageX said:
I'm having a hard time even decoding the bitstream fast enough without iDCTs. The PlayStation CPU manages it somehow, so maybe I just suck. But check this out, it shows video from a .STR file in RAM by using the DC coefficients only:

http://www.hyakushiki.net/str3.zip

The PSX CPU doesn't manage it somehow - it has extra hardware to decode the stream... just as the PS2 has extra hardware to decode MPEG2 data.
 
It's been a while since I've looked at it, but I'm pretty sure that the MDEC only decodes batches of macroblocks, i.e. the stream itself is still unpacked by the CPU first.
 
ExCyber said:
It's been a while since I've looked at it, but I'm pretty sure that the MDEC only decodes batches of macroblocks, i.e. the stream itself is still unpacked by the CPU first.

I haven't looked at it in a while either, but the PS2 has hardware to help unpack the bitstream, so I'm guessing the MDEC does as well. I don't remember emulators passing an unpacked stream to the MDEC functions... I THINK it got a raw stream. I could be wrong, though. A 33MHz MIPS shouldn't have any trouble decoding the bitstream, but you need the CPU doing more than just decoding video streams, so it would make sense to put hardware to at least help with that in the MDEC.
 
I don't see any stream-unpacking code in PCSX's MDEC code, and some other documents strewn about on the web suggest pretty strongly that developers were able to make up their own bitstream formats (some of which reportedly use custom Huffman codes).
 
ExCyber said:
I don't see any stream-unpacking code in PCSX's MDEC code, and some other documents strewn about on the web suggest pretty strongly that developers were able to make up their own bitstream formats (some of which reportedly use custom Huffman codes).

That would certainly imply the CPU decodes the stream. :)
 
Indeed the PlayStation MDEC chip only handled streams of 16-bit MDEC codes. There was no hardware support for parsing the bitstreams (3 distinct variations have been found in the wild and will soon all be documented).

It's hard to tell what bit parsing approach you've used in the ASM code. I've written about a couple methods that should be pretty fast.

If C/C++ compilation is possible, a C++ implementation is available in the Q-gears repo (the code has been put together into a self-contained program in the download posted here). Reevengi might also include a C version. Both implementations get the colors wrong, so you might want to tweak them for correct output.
 
Has anyone gotten C++ working on Saturn? I've heard of people building sh-elf-g++ as part of a Saturn toolchain, but I don't recall hearing of any Saturn programs successfully built with it.
 
My method was to read two bits and then check a lookup table to see whether the resulting value is valid. If so, the value read from the table contains the coefficient/run of zeros; if not, read another bit and check the lookup table again, and so on.
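That read-two-bits-then-extend scheme can be sketched in C like this. The code table below is a made-up toy prefix code, not the real STR/MPEG AC table (though that table's shortest codes are also 2 bits, which is presumably why the method starts there):

```c
#include <stdint.h>

/* MSB-first bit reader over a byte buffer */
typedef struct { const uint8_t *data; int bitpos; } BitReader;

static int get_bit(BitReader *br) {
    int bit = (br->data[br->bitpos >> 3] >> (7 - (br->bitpos & 7))) & 1;
    br->bitpos++;
    return bit;
}

typedef struct { int valid; int value; } Entry;

/* Toy prefix code: "11"=0 "10"=1 "011"=2 "010"=3
 *                  "0011"=4 "0010"=5 "0001"=6 "0000"=7
 * One table per code length, indexed by the bits read so far. */
static const Entry len2[4]  = { {0,0}, {0,0}, {1,1}, {1,0} };
static const Entry len3[8]  = { {0,0}, {0,0}, {1,3}, {1,2} };
static const Entry len4[16] = { {1,7}, {1,6}, {1,5}, {1,4} };

static int decode_symbol(BitReader *br) {
    int code = (get_bit(br) << 1) | get_bit(br);   /* first two bits */
    const Entry *e = &len2[code];
    if (e->valid) return e->value;
    code = (code << 1) | get_bit(br);              /* extend to 3 bits */
    e = &len3[code];
    if (e->valid) return e->value;
    code = (code << 1) | get_bit(br);              /* 4 bits is max here */
    return len4[code].value;
}
```

The cost is one table probe (and one extra bit read) per length step, which is where the cycles go at 300KB/s.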

I found that I can replace this:

    mov #0,r0
    sub r3,r0
    mov r0,r3

with:

    neg r3,r3

(duh) and code alignment might be able to gain something, but other than that I'm short on ideas. Maybe a large, unwieldy method is needed, because a data rate of 300KB/s doesn't allow a lot of cycles for processing one bit at a time. Perhaps a jump table with a series of routines for decoding from each position within the word.
 
ExCyber said:
Has anyone gotten C++ working on Saturn? I've heard of people building sh-elf-g++ as part of a Saturn toolchain, but I don't recall hearing of any Saturn programs successfully built with it.

I've got C++ working on the MD and 32X. I can apply much of the 32X coding towards the Saturn. I meant to make a toolkit for the Saturn anyway so I could do some homebrew.
 
Tried my hand at Approach 3 in C (just bit parsing) in case you're interested: link. It needs more memory due to the 2048-byte lookup table. I don't know of any (theoretically) faster approach, so I'm curious whether it really is fast (enough).
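For anyone following along, a 2048-entry table corresponds to an 11-bit peek, which is presumably what "decode in one lookup" buys: peek the next 11 bits of the stream, and a single table read yields both the symbol and how many bits to consume. A sketch of building such a table, using the same toy code as before (not the real STR table; note each entry here is two bytes, so this variant is 4KB unless symbol and length are packed into one byte):

```c
#include <stdint.h>
#include <string.h>

#define PEEK_BITS 11
#define TABLE_SIZE (1 << PEEK_BITS)   /* 2048 entries */

typedef struct { uint8_t symbol; uint8_t length; } Entry;
static Entry table[TABLE_SIZE];

typedef struct { const char *bits; uint8_t symbol; } Code;
static const Code codes[] = {
    {"11", 0}, {"10", 1}, {"011", 2}, {"010", 3},
    {"0011", 4}, {"0010", 5}, {"0001", 6}, {"0000", 7},
};

/* Every 11-bit value that starts with a given codeword maps to that
 * codeword's symbol and length. */
static void build_table(void) {
    for (size_t i = 0; i < sizeof codes / sizeof codes[0]; i++) {
        size_t len = strlen(codes[i].bits);
        unsigned prefix = 0;
        for (size_t b = 0; b < len; b++)
            prefix = (prefix << 1) | (unsigned)(codes[i].bits[b] - '0');
        unsigned lo = prefix << (PEEK_BITS - len);
        unsigned hi = (prefix + 1) << (PEEK_BITS - len);
        for (unsigned v = lo; v < hi; v++) {
            table[v].symbol = codes[i].symbol;
            table[v].length = (uint8_t)len;
        }
    }
}
```

A decoder then keeps an 11-bit window of the stream, indexes `table`, emits `symbol`, and advances by `length` bits; there's no per-bit loop at all, which is why it's hard to beat in theory.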
 