According to SOA#8 (sattech.pdf, SEGA Saturn Technical Bulletins & all),
the MUL takes 1 cycle to complete, whereas
an ALU computation is done within the same cycle.
The DSP can handle multiple operations simultaneously. They just have to be operation commands, each one using either the ALU or a different bus.
I can have an ALU computation, a D1 bus access, an X bus access and a Y bus access all at the same time! Four operations at once, talk about parallelism! All I have to do is make sure that they are operation commands, that no two of them use the same bus, and write them on the same line:
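For instance, such a line might look like the following. This is only a sketch matching the description just below, assuming the usual SCU DSP assembler mnemonics; the `;` comment syntax is also an assumption.

```
MOV M0,X    MOV M1,Y    ; X bus: X = M0, Y bus: Y = M1, both in the same cycle
```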
In the same cycle, M0 is loaded into the X register and M1 is loaded into the Y register. Meanwhile, the MUL is computing X * Y.
On the next step, the MUL has finished and you can do
P = X * Y; after that, you can use P.
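That next step could be written as below. This is a hedged sketch, again assuming the usual SCU DSP assembler mnemonics and `;` comments. It only latches the multiplier output into P, so P becomes usable on the following line.

```
MOV MUL,P               ; P = X * Y, using the X and Y loaded on the previous step
```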
With the ALU, you can go even faster:
Code:
MOV M0,P MOV M1,A
ADD MOV ALU,A MOV ALL,MC2
Once P and A are loaded by the first line, the second line needs just a single cycle to add them and store the result: M2[CT2++] = M0 + M1