At least mathematically I understand the iDCT being used now. As I said, at its core a DCT or iDCT is just a matrix multiplication. A 2D DCT is two matrix multiplications, one for the rows and one for the columns. So if X is our image data, the DCT is Y = C X C^T and the iDCT is X = C^T Y C. This works because the scaled DCT matrix C is orthogonal, i.e. C^T == C^-1.
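As a quick sanity check of the orthogonality claim, here is a minimal sketch that builds the standard scaled 8x8 DCT matrix and round-trips a block through Y = C X C^T and X = C^T Y C. The function name `dct_matrix` and the random test block are my own placeholders, not anything taken from the game.

```python
import numpy as np

def dct_matrix(n=8):
    """Scaled DCT-II matrix C (orthogonal, so C @ C.T is the identity)."""
    j = np.arange(n).reshape(-1, 1)   # frequency index, one per row
    k = np.arange(n).reshape(1, -1)   # sample index, one per column
    c = np.cos(j * (2 * k + 1) * np.pi / (2 * n))
    c[0, :] *= np.sqrt(2) / 2         # extra scaling of the DC row
    return np.sqrt(2 / n) * c         # equals 0.5 * c for n = 8

C = dct_matrix()
rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(8, 8)).astype(float)  # made-up 8x8 block

Y = C @ X @ C.T        # forward 2D DCT
X_back = C.T @ Y @ C   # inverse 2D DCT

print(np.allclose(C @ C.T, np.eye(8)))  # orthogonality check
print(np.allclose(X_back, X))           # round-trip check
```

Both checks should print True, which is all "C is orthogonal" really means in practice.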
The codec in Grandia is altered though. All negative entries of the transposed DCT matrix are offset by +1. The non-transposed matrix however is unchanged. You can easily generate these matrices yourself with this Python code.
Python:
import numpy as np

def gen_dct_mat1():
    # Altered, transposed matrix (C_1^T in the equations below)
    res = np.zeros((8,8))
    for j in range(0,8):
        for k in range(0,8):
            if j == 0:
                en = np.sqrt(2)/2
            else:
                en = 1
            res[j,k] = en*np.cos((j*(2*k+1)*np.pi)/16)
            if res[j,k] < 0:
                # +2 here becomes the +1 offset after the 0.5 scaling below
                res[j,k] += 2
    return (0.5*res).T

def gen_dct_mat2():
    # Unaltered, non-transposed DCT matrix (C_2 in the equations below)
    res = np.zeros((8,8))
    for j in range(0,8):
        for k in range(0,8):
            if j == 0:
                en = np.sqrt(2)/2
            else:
                en = 1
            res[j,k] = en*np.cos((j*(2*k+1)*np.pi)/16)
    return 0.5*res
I think this means that technically this isn't a DCT in the normal sense, at least for the columns of the iDCT input matrix. So in Grandia the equation for the iDCT is X = C_1^T Y C_2, where C_1^T != C_1^-1 and C_1^-1 != C_2, but C_2^T == C_2^-1. C_1 is still invertible though, so the DCT should be Y = (C_1^T)^-1 X C_2^T.
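To make the two equations concrete, here is a rough sketch of the altered transform pair in floating point. It rebuilds the same two matrices in vectorized form; the helper names (`grandia_matrices`, `grandia_idct`, `grandia_dct`) and the random coefficient block are made up for illustration. The forward direction uses `np.linalg.solve` instead of forming (C_1^T)^-1 explicitly, which is just my choice and not something the game does.

```python
import numpy as np

def grandia_matrices():
    """Return (C_1^T, C_2): the altered transposed matrix and the plain one."""
    j = np.arange(8).reshape(-1, 1)
    k = np.arange(8).reshape(1, -1)
    c = np.cos(j * (2 * k + 1) * np.pi / 16)
    c[0, :] *= np.sqrt(2) / 2
    c2 = 0.5 * c                                 # unaltered DCT matrix
    c1_t = (0.5 * np.where(c < 0, c + 2, c)).T   # negatives end up offset by +1
    return c1_t, c2

def grandia_idct(Y, c1_t, c2):
    # X = C_1^T Y C_2
    return c1_t @ Y @ c2

def grandia_dct(X, c1_t, c2):
    # Y = (C_1^T)^-1 X C_2^T, computed by solving C_1^T Y = X C_2^T
    return np.linalg.solve(c1_t, X @ c2.T)

c1_t, c2 = grandia_matrices()
rng = np.random.default_rng(0)
Y = rng.integers(-64, 64, size=(8, 8)).astype(float)  # made-up coefficients

X = grandia_idct(Y, c1_t, c2)
Y_back = grandia_dct(X, c1_t, c2)

print(np.allclose(c1_t @ c1_t.T, np.eye(8)))  # False: C_1 is not orthogonal
print(np.allclose(Y_back, Y))                 # True: the pair still inverts
```

So even though C_1 is not orthogonal, the forward transform defined this way is an exact inverse of the game's iDCT up to floating point error.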
When I implement the iDCT with floating point operations and compare it with the results from the game there are some differences. Like here for example:
View attachment 5502
We can see that in the integer result the values get smaller towards the upper right, but in the floating point version they actually get slightly bigger. Since the floating point version works with 64-bit floats and I don't see any potential for numerical instability here, I must assume that the floating point version is more accurate and that the differences are caused by the fixed point math used in the integer version. So I think what I have to do is implement the DCT with high precision floating point operations and round the result to integers after the quantization step. The decoded results will probably differ from the pre-encoded data not only because of the quantization, but also because of errors in the fixed point math the game uses. As far as I know that is unavoidable. If someone knows better though, please speak up.
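Roughly, the encoder side of that plan could look like the sketch below: forward transform in float64, divide by the quantization steps, round to integers. Big caveat: `qtable` is a purely hypothetical per-coefficient step table. I am assuming plain element-wise division here, and the game's actual quantization scheme may well work differently. The decoder shown is also the idealized floating point one, not the game's fixed point version.

```python
import numpy as np

def grandia_matrices():
    """Return (C_1^T, C_2) as described above."""
    j = np.arange(8).reshape(-1, 1)
    k = np.arange(8).reshape(1, -1)
    c = np.cos(j * (2 * k + 1) * np.pi / 16)
    c[0, :] *= np.sqrt(2) / 2
    c1_t = (0.5 * np.where(c < 0, c + 2, c)).T
    c2 = 0.5 * c
    return c1_t, c2

def encode_block(X, qtable, c1_t, c2):
    # Float64 forward transform, then quantize and round to integers.
    Y = np.linalg.solve(c1_t, X @ c2.T)
    return np.rint(Y / qtable).astype(np.int64)

def decode_block_float(Q, qtable, c1_t, c2):
    # Idealized floating point decoder: dequantize, then X = C_1^T Y C_2.
    return c1_t @ (Q * qtable) @ c2

c1_t, c2 = grandia_matrices()
qtable = np.full((8, 8), 4.0)   # hypothetical uniform step size
rng = np.random.default_rng(1)
X = rng.integers(0, 256, size=(8, 8)).astype(float)  # made-up pixel block

Q = encode_block(X, qtable, c1_t, c2)
X_dec = decode_block_float(Q, qtable, c1_t, c2)
print(np.abs(X_dec - X).max())  # error from quantization alone
```

The printed value only covers the quantization error. Whatever the fixed point decoder adds on top of that would have to be measured against the game's actual output.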
I think with this I can begin to write an encoder. It will probably be pretty bare-bones and rather slow. At the moment my plan is to write most of it in Python with some C extension for stuff that would be too slow in pure Python. The most complicated part is figuring out how to determine what quantization to use per frame, but I think I can actually get around that. Since we'll be encoding pretty much the same video data as the original, I think I can just pull the quantization levels from the existing files and use those. This should hopefully yield very similar results in quality and compression to the original data. It may be necessary to bump up the quality of frames with subtitles though. DCTs are not good with the hard edges you see in text. And some of the video frames look pretty blocky. Like here for example:
View attachment 5503
I doubt that subtitles would be very readable with this strong compression level.