Just incredible work! Congratulations! Even 3 fps on a single SH-2 is amazing. Could you share an ISO so we can see how it runs on real hardware? Are you going to keep working on the code, and will you share it? Are you on the Discord channel?
I didn't try the parsing code by m35 above, but that part wasn't a huge bottleneck in my code (around 3 v-blanks to complete) compared to the other parts. IDCT decoding and YUV-to-RGB conversion take around 16-18 v-blanks to complete. Reading from CD was also taking 4, but that is because I didn't implement async reads. Note that all of this is done on a single SH-2. But even using the DSP and the second SH-2, there is a big performance gap to close to get from 3 fps to 15 fps.
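To put those numbers in perspective, here is a rough frame-budget calculation in C. It is only a sketch: it assumes a 60 Hz (NTSC) display, so 60 v-blanks per second, which is not stated above, and the helper names `fps_times_10` and `vblank_budget` are made up for illustration.

```c
/* Assumption: NTSC timing, i.e. 60 v-blanks per second. */
enum { VBLANKS_PER_SECOND = 60 };

/* Per-frame stage costs in v-blanks, taken from the figures in the post. */
enum {
    PARSE_VBLANKS       = 3,   /* bitstream parsing */
    IDCT_YUV_VBLANKS_LO = 16,  /* IDCT + YUV-to-RGB, best case */
    IDCT_YUV_VBLANKS_HI = 18,  /* IDCT + YUV-to-RGB, worst case */
    CD_READ_VBLANKS     = 4    /* blocking (non-async) CD read */
};

/* Frame rate multiplied by 10 (one decimal place in integer math)
 * for a frame that takes `vblanks` v-blanks to produce. */
static int fps_times_10(int vblanks)
{
    return (VBLANKS_PER_SECOND * 10) / vblanks;
}

/* Largest whole number of v-blanks a frame may take to reach `target_fps`. */
static int vblank_budget(int target_fps)
{
    return VBLANKS_PER_SECOND / target_fps;
}

/* Total cost of one frame when every stage runs back to back. */
static int total_vblanks(int idct_yuv_vblanks)
{
    return PARSE_VBLANKS + idct_yuv_vblanks + CD_READ_VBLANKS;
}
```

Under that assumption, the stages add up to 23-25 v-blanks per frame, which works out to roughly 2.4-2.6 fps, in the same ballpark as the 3 fps figure. Reaching 15 fps would leave a budget of only 4 v-blanks per frame, so the whole pipeline would need to run about six times faster.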