It's time to revise what I wrote initially about the FMV codec. I didn't really know (and still don't) how to use the correct technical terms for all this video compression stuff. In German you'd say I have "dangerous half knowledge" of video codecs, so please bear with me.
The first thing I want to revise is calling this codec MPEG. At the current stage of reverse engineering, I think all this codec has in common with MPEG is the use of a DCT. This is probably where the similarities end.
The next thing is the header. I think it's actually 16 bytes long instead of 12. The additional 4 bytes seem to be always zero, but don't seem to be padding, as the game reads each of these bytes explicitly at some point. This leaves 60 bytes for section 0, but since an offset for the start of section 1 is specified in the header, it's possible that its length is variable. I haven't seen any length other than 60 bytes yet though. I still don't know what section 0 is used for, but it's probably not for the quantization.
Now let's talk about section 1. I was correct about the structure of this section, but wrong about the content. The number per block that is written here isn't the DC coefficient of the block. Instead it contains the number of non-zero elements of the macro block and their ordering. If we look for example at the first frame of the intro movie, section 1 is simply filled with zeros. The game reads the zero and uses the following lookup table to get the number of elements.
[CODE title="Element number lookup table"]
# starts @6036C90
self._elem_num_lut = [
    1,  2,  3,  4,  5,  6,  7,  8,
    2,  4,  6,  8, 10, 12, 14, 16,
    3,  6,  9, 12, 15, 18, 21, 24,
    4,  8, 12, 16, 20, 24, 28, 32,
    5, 10, 15, 20, 25, 30, 35, 40,
    6, 12, 18, 24, 30, 36, 42, 48,
    7, 14, 21, 28, 35, 42, 49, 56,
    8, 16, 24, 32, 40, 48, 56, 64,
]
[/CODE]
This means 0 => 1 element to write. We see that some numbers are found multiple times in the table. This is because the number also specifies the order in which the elements must be written to the block. The ordering is found in another lookup table.
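A side note on the element number table: every entry equals (row + 1) * (column + 1), so it can be regenerated instead of typed in by hand. Here is a minimal sketch of that. The function and variable names are my own and not taken from the game code.

```python
# Hypothetical helper; the name is an illustration, not from the game.
def build_elem_num_lut(size=8):
    """Rebuild the element number table as a flat list, row by row."""
    return [(row + 1) * (col + 1) for row in range(size) for col in range(size)]


elem_num_lut = build_elem_num_lut()

# The first frame of the intro movie has only zeros in section 1,
# so every block looks up index 0 and gets 1 element to write.
first_frame_elem_count = elem_num_lut[0]
```

This only reproduces the element counts. The ordering still has to come from the separate order lookup table.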
[CODE title="Order lookup table"]
lut = [[0], [0, 1], [0, 1, 2], ..., [0, 1, 7, 14, 8, 2, 3, 9, 15, ...]]
[/CODE]
The table is quite long so I left out most of the elements, but I hope the idea is clear. If we look at the last element of the lookup table we can see the zig-zag ordering that is used in JPEG for example. If we go back to the example, we see that the order that corresponds to 0 is [0]. So basically write the single element to position 0 in the macro block. Simple enough.
The actual elements of each macro block are encoded in sections 2 and 3. Like I said before, section 2 works like the Grandia image compression. This means the first 16 bytes specify how to interpret the data. This is best explained with an example, so let's look again at the first frame, which is just a black image by the way. Here we find "0x00000000000000000000000000010010". Each hex digit stands for one code, and a non-zero digit gives the number of bits used to encode that code. The 1s are found at the 27th and 30th position of the hex string (counting from zero), so only two different codes occur in the encoded stream: 27 and 30. The value 1 means each of them is encoded in 1 bit. Therefore 0b0 => 27 and 0b1 => 30.
The meaning of the codes is not trivial and I don't fully understand what they mean. I've completely reverse engineered what they do though. The simple one is code "30": it means "end of block", and each macro block ends with it. The code "27" basically tells us how to decode the first element. It translates to: no zero elements before the current element; read 11 bits from section 3 and add 35; get a value from another lookup table and multiply it by -1; multiply both values to get the final element. Well, I told you it was complicated...
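Reading the 16-byte table at the start of section 2 could look roughly like the sketch below. The nibble-to-length part follows the example above. How the bit patterns are assigned when codes have different lengths is an assumption on my side (a canonical, Huffman-style assignment); the only case confirmed here is the two 1-bit codes. All names are made up for illustration.

```python
def parse_code_lengths(header):
    """Map code number -> bit length; one hex digit per code, 0 means unused."""
    hex_digits = header.hex()
    return {pos: int(d, 16) for pos, d in enumerate(hex_digits) if d != "0"}


def build_code_table(lengths):
    """ASSUMPTION: canonical assignment, shorter codes first, ties by code number."""
    table = {}
    next_pattern = 0
    prev_len = 0
    for code, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        next_pattern <<= (length - prev_len)
        table[format(next_pattern, "0{}b".format(length))] = code
        next_pattern += 1
        prev_len = length
    return table


# The 16 bytes found in the first (black) frame of the intro movie.
header = bytes.fromhex("00000000000000000000000000010010")
lengths = parse_code_lengths(header)
code_table = build_code_table(lengths)
```

For the example frame this gives the mapping described above: bit 0 for code 27 and bit 1 for code 30.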
Let's go back to the example. The beginning of section 2 (after the first 16 bytes) is "0b0111111111...". The first bit is 0, so the first code is 27. As explained above, the first step is to get 11 bits from section 3. These are 0b01111011100 or 988, and 988 + 35 = 1023. This is our first number. The value from the lookup table is 1, so 1023*1*(-1) = -1023. This is the value of the only element of the first block. Now the next bit is read from section 2. It's 1, so the code is 30 or "end of block". This means we switch to the next macro block.
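The arithmetic for code 27 can be checked with a few lines of Python. This is only a sketch of the steps as I described them: section 3 is represented as a string of '0'/'1' characters for simplicity, and the value from the other lookup table is passed in as a parameter because that table isn't shown here. The function names are my own.

```python
def read_bits(bitstring, pos, count):
    """Read `count` bits from a '0'/'1' string starting at `pos`."""
    return int(bitstring[pos:pos + count], 2), pos + count


def decode_code_27(section3_bits, pos, lut_value):
    """Code 27: take 11 bits from section 3, add 35, multiply by the negated LUT value."""
    raw, pos = read_bits(section3_bits, pos, 11)
    magnitude = raw + 35
    return magnitude * (lut_value * -1), pos


section3_bits = "01111011100"  # the 11 bits from the example frame
value, next_pos = decode_code_27(section3_bits, 0, lut_value=1)
```

With the bits from the example, `value` comes out as -1023 and the read position in section 3 advances by 11 bits.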
The next code is again "30". Since the block size is 1 but the end of block has already been reached, we write the only element as zero and switch to the next block. Since all following bits in section 2 are 1, this is done for all remaining blocks.
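The per-block handling could be sketched like this. It is heavily simplified: it assumes the codes from section 2 are already turned into code numbers, it ignores codes that insert zero elements before a value, and it leaves the actual element decoding to a callback. The names are hypothetical.

```python
END_OF_BLOCK = 30


def decode_blocks(codes, elem_counts, decode_element):
    """Fill each block until "end of block"; missing elements become zero."""
    codes = iter(codes)  # shared iterator so each block resumes where the last one stopped
    blocks = []
    for count in elem_counts:
        elements = []
        for code in codes:
            if code == END_OF_BLOCK:
                break
            elements.append(decode_element(code))
        # End of block reached early: pad the remaining elements with zeros.
        elements += [0] * (count - len(elements))
        blocks.append(elements)
    return blocks


# First three blocks of the example frame: code 27, then only "end of block".
example_blocks = decode_blocks([27, 30, 30, 30], [1, 1, 1], lambda code: -1023)
```

For the example frame the first block gets -1023 and every following block gets a single zero.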
In summary we have one macro block with -1023 as first and only non-zero element and a lot of zero macro blocks for the rest. If you're paying attention you may ask yourself how this makes sense. The picture is filled with black, so all macro blocks should have the same values. It's because the macro blocks are delta encoded. So the blocks following the first one only contain deltas for each element. Since all blocks are the same as the first one, they only contain 0s here.
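Undoing the delta encoding might look like the sketch below. I'm assuming here that each block stores the difference to the block directly before it and that the blocks have the same number of elements; neither is confirmed beyond the all-black example. The function name is made up.

```python
def undo_delta(delta_blocks):
    """ASSUMPTION: each block holds per-element differences to the previous block."""
    result = []
    previous = None
    for deltas in delta_blocks:
        if previous is None:
            current = list(deltas)  # the first block is stored as-is
        else:
            current = [p + d for p, d in zip(previous, deltas)]
        result.append(current)
        previous = current
    return result


# First block -1023, all following blocks only contain zero deltas.
restored = undo_delta([[-1023], [0], [0]])
```

For the black frame every restored block ends up with the same -1023 element, which matches a uniformly colored picture.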
The next step seems to be the iDCT. I'm currently reverse engineering the functions that are used for that and will write up some more once I'm done.