Getting GCC to generate the SH2 Assembly for Q16.16 fixed-point multiplication

Ponut

Gear Supporter
Credit to @ThePuristOfGreed (well, the Discord user by that name) for letting me know about this.

Basically, I had assumed for a while that the compiler wouldn't know how to do fixed-point multiplication on the Saturn, because the SH2 has no 64-bit general-purpose register or addressing mode and Q16.16 fixed-point multiplication needs a 64-bit intermediate product.
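To make the width problem concrete, here is a minimal sketch in portable C. The names fix16_t, FIX16_ONE, fix16_mul_narrow and fix16_mul_wide are hypothetical and do not come from SGL or any Saturn library. Multiplying two Q16.16 values yields a Q32.32 product, so the result we want sits in bits 16..47 of a 64-bit intermediate. Doing the multiply in 32 bits throws the top half away.

```c
#include <stdint.h>

/* Q16.16: 16 integer bits, 16 fraction bits (hypothetical type name). */
typedef int32_t fix16_t;
#define FIX16_ONE ((fix16_t)1 << 16)

/* Narrow version: the product is truncated to 32 bits before the shift,
   so everything above bit 31 of the Q32.32 product is lost.
   Unsigned arithmetic is used to avoid signed-overflow undefined behavior. */
static fix16_t fix16_mul_narrow(fix16_t a, fix16_t b)
{
    uint32_t p = (uint32_t)a * (uint32_t)b;
    return (fix16_t)(p >> 16);
}

/* Wide version: keep the full 64-bit product, then take bits 16..47.
   Assumes >> on a negative int64_t is an arithmetic shift and that
   narrowing to 32 bits wraps, which is how GCC behaves. */
static fix16_t fix16_mul_wide(fix16_t a, fix16_t b)
{
    int64_t p = (int64_t)a * (int64_t)b;
    return (fix16_t)(p >> 16);
}
```

For example, 2.0 * 3.0 is 0x20000 * 0x30000 = 0x600000000. The wide version returns 0x60000 (6.0), while the narrow version only sees the low 32 bits of the product, which are all zero.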

So normally, I do this:
inline FIXED fxm(FIXED d1, FIXED d2) //Fixed Point Multiplication
{
	register volatile FIXED rtval;
	asm(
	"dmuls.l %[d1],%[d2];"   // Signed 32 x 32 -> 64-bit multiply, result in MACH:MACL
	"sts MACH,r1;"           // Store system register [sts], high half of the 64-bit MAC result to r1
	"sts MACL,%[out];"       // Low half of the 64-bit MAC result to the register of output param "out"
	"xtrct r1,%[out];"       // This whole process gets the middle 32 bits of 32 * 32 -> (2x32 bit registers)
	: [out] "=r" (rtval)              //OUT
	: [d1] "r" (d1), [d2] "r" (d2)    //IN
	: "r1", "mach", "macl"            //CLOBBERS
	);
	return rtval;
}

I've also spread this assumption, along with this solution, to other developers.

This fellow pointed out to us that GCC can actually generate this sequence on its own.

cap3.PNG


That website (Compiler Explorer) is really cool for this.

After some smoothbrain complications in getting it to work in C (not just C++) and without the standard definition of uint64_t, I came to this result:
//this particular arrangement of C coaxes GCC into outputting dmuls.l followed by sts mach, sts macl, and xtrct
inline int fxm(int d1, int d2) //Fixed Point Multiplication
{
	unsigned long long c = (unsigned long long)d1 * (unsigned long long)d2;
	return c>>16;
}

The best news is this comes with a measurable performance benefit, since GCC no longer has to manage the registers going in and out of the inline assembly block.
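As a quick sanity check, a sketch like the following can be compiled on a desktop to confirm that the cast-to-unsigned version returns the same bits as a signed 64-bit reference. The fxm body follows the C version above (marked static here so it links in a single file). fxm_ref and fxm_matches are hypothetical helper names. This only checks the numerical result, not which instructions GCC emits for the SH2.

```c
#include <stdint.h>

/* Same expression as the C version of fxm in this post. */
static inline int fxm(int d1, int d2)
{
    unsigned long long c = (unsigned long long)d1 * (unsigned long long)d2;
    return c >> 16; /* narrowing keeps the low 32 bits on GCC */
}

/* Hypothetical reference: signed 64-bit product, then the middle 32 bits.
   Assumes arithmetic right shift of negative values, as GCC provides. */
static int fxm_ref(int d1, int d2)
{
    int64_t p = (int64_t)d1 * (int64_t)d2;
    return (int)(int32_t)(p >> 16);
}

/* Hypothetical helper: returns 1 when both versions agree. */
static int fxm_matches(int d1, int d2)
{
    return fxm(d1, d2) == fxm_ref(d1, d2);
}
```

The two agree even for negative inputs because converting a negative int to unsigned long long sign-extends it, so the low 64 bits of the unsigned product are the same as those of the signed product. For instance, fxm(0x18000, -0x20000), which is 1.5 * -2.0, gives -0x30000, which is -3.0.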

I just wanted to share so that this could be searchable in the future.
cap2.PNG
Capture.PNG
 
