To TakaIsSilly


I sent you the current saturn.c, and I describe the problem in this email. I hope you can find the bug.

In short, when I removed my glitchy off-screen buffer (i.e. brought the video routine closer to the original emu), a new glitch appeared: the background is black. The whole background uses the first color of the palette, and somehow that color is being set to black. With the off-screen buffer in use, it could not be set to black.

There may also be some problems with priority/transparency (not sure; they are in render.c and are OS independent).

Found the solution. Strange, but the palette needs to be shifted by +0x01 (and of course the color in putpixel needs +0x01 too). Now it works OK (at least in Sonic).

Somehow the Saturn sets the first color to black, and if the palette is shifted everything is OK.
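A minimal sketch of the +0x01 workaround in plain C, for anyone following along. The names `PALETTE_SHIFT`, `load_palette_shifted` and `shifted_pixel` are made up for illustration and are not taken from saturn.c; the sketch also assumes 16-bit color RAM entries and 8-bit color indices.

```c
#include <stddef.h>
#include <stdint.h>

#define PALETTE_SHIFT 1  /* hypothetical: leave color RAM entry 0 untouched */

/* Copy `count` emulated palette entries into color RAM starting at entry
   PALETTE_SHIFT instead of entry 0. `cram` must have room for
   count + PALETTE_SHIFT entries. */
void load_palette_shifted(uint16_t *cram, const uint16_t *pal, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++)
        cram[i + PALETTE_SHIFT] = pal[i];
}

/* Every pixel written by putpixel has to use the same offset. */
uint8_t shifted_pixel(uint8_t color_index)
{
    return (uint8_t)(color_index + PALETTE_SHIFT);
}
```

Entry 0 is never written here, so whatever the Saturn does with the first color no longer affects the emulated picture.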

Will send you the current saturn.c right now.
I believe that's got something to do with the slBackColSet() function.

Well, any specific areas you want me to work on? Your solution seems to be the only way... But I have a couple more optimizations to do, including a faster method for your bitmap display. Not that it matters much ^^;. I'll send the source when it's done, so you can integrate it if you want.

Made save/load SRAM support! For now it saves into the Saturn's internal backup memory (to save to the cart instead, only one byte in the code needs to change :) ). The problem is that the default SMS/GG SRAM size is 32 KB, the same as the Saturn's internal backup RAM. But the Saturn also uses some of that space for the file name/description (and time/date), so it's impossible to use ALL 32 KB of internal memory. So for testing purposes I use only 16 KB for the SRAM save. I tested it with the Game Gear game Phantasy Star Gaiden and it works OK.

For the future there are two options: save to the cart, or use some simple archiver to fit all 32 KB of SRAM into the "smaller-than-32-KB" internal memory.
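To illustrate the "simple archiver" option, here is a rough sketch of a byte-oriented run-length coder in C. It rests on the assumption that SRAM images contain long runs of the same byte, which is not checked here; `rle_encode` and `rle_decode` are hypothetical helpers, not part of the emulator.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode src as (run length, byte) pairs with run lengths 1..255.
   Returns the encoded size, or 0 if dst_cap is too small (or n is 0). */
size_t rle_encode(const uint8_t *src, size_t n, uint8_t *dst, size_t dst_cap)
{
    size_t i = 0, out = 0;
    while (i < n) {
        uint8_t b = src[i];
        size_t run = 1;
        while (i + run < n && src[i + run] == b && run < 255)
            run++;
        if (out + 2 > dst_cap)
            return 0;
        dst[out++] = (uint8_t)run;
        dst[out++] = b;
        i += run;
    }
    return out;
}

/* Decode (run length, byte) pairs. Returns the decoded size, or 0 if the
   input is malformed or dst is too small. */
size_t rle_decode(const uint8_t *src, size_t n, uint8_t *dst, size_t dst_cap)
{
    size_t i, out = 0;
    if (n % 2 != 0)
        return 0;
    for (i = 0; i < n; i += 2) {
        size_t run = src[i];
        if (run == 0 || out + run > dst_cap)
            return 0;
        for (; run > 0; run--)
            dst[out++] = src[i + 1];
    }
    return out;
}
```

In the worst case (no two neighbouring bytes equal) this doubles the data, so a real save routine would need a fallback, for example refusing the save when the encoded size does not fit.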

It seems I need help making a ROM loader. I don't think we need to use the GFS library. We just need to load an index file into an array, the same way we load ROMs, and then read strings from that array. The index must contain only the ROM name with extension plus the size of the ROM. So that's the basis.
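A rough sketch of reading such an index from a buffer already loaded into memory. The text layout (one "NAME.EXT SIZE" pair per line), the `RomEntry` type and `parse_rom_index` are all assumptions made for the example; the post only says the index holds the name with extension and the size.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_NAME 32  /* hypothetical limit on the file name length */

typedef struct {
    char name[MAX_NAME];
    unsigned long size;
} RomEntry;

/* Parse up to max_entries entries of the form "NAME.EXT SIZE" from a
   NUL-terminated index buffer. Returns the number of entries parsed and
   stops at the first entry that does not match. */
size_t parse_rom_index(const char *index, RomEntry *out, size_t max_entries)
{
    size_t count = 0;
    int used = 0;

    while (count < max_entries &&
           sscanf(index, " %31s %lu%n",
                  out[count].name, &out[count].size, &used) == 2) {
        index += used;  /* continue after the entry just read */
        count++;
    }
    return count;
}
```

sscanf is not the lightest routine to link on a console, so a hand-written scanner may be preferable in the real loader; the point here is only the shape of the data.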
OK. Why not start implementing SH2 asm in the Z80 core (we really need it to speed up the emu)?

Here is the ADD instruction from z80 emu with x86 asm:

#define ADD(value)                                              \
 asm (                                                          \
 " addb %2,%0           \n"                                     \
 " lahf                 \n"                                     \
 " setob %1             \n" /* al = 1 if overflow */            \
 " addb %1,%1           \n"                                     \
 " addb %1,%1           \n" /* shift to P/V bit position */     \
 " andb $0xd1,%%ah      \n" /* sign, zero, half carry, carry */ \
 " orb %%ah,%1          \n"                                     \
 :"=r" (_A), "=r" (_F)                                          \
 :"r" (value), "1" (_F), "0" (_A)                               \
 )


What it really does: it adds "value" to "_A" and sets the flags in "_F". For the flags in "_F" it uses the x86 AH register to check whether the result overflowed, carried, is negative, or is zero.
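For comparison, here is what that macro computes, written as a plain C sketch with no inline asm. `z80_add8` and the `FLAG_*` names are made up for this example; only the documented flags are produced, and the undocumented bits 5 and 3 are left at 0.

```c
#include <stdint.h>

/* Z80 F register bit positions */
#define FLAG_C  0x01  /* carry */
#define FLAG_N  0x02  /* add/subtract (stays 0 for ADD) */
#define FLAG_PV 0x04  /* parity/overflow */
#define FLAG_H  0x10  /* half carry */
#define FLAG_Z  0x40  /* zero */
#define FLAG_S  0x80  /* sign */

/* Add `value` to *a, store the 8-bit sum back and return the new F. */
uint8_t z80_add8(uint8_t *a, uint8_t value)
{
    unsigned sum = (unsigned)*a + value;  /* 9-bit result */
    uint8_t r = (uint8_t)sum;
    uint8_t f = 0;

    if (r & 0x80)                       f |= FLAG_S;
    if (r == 0)                         f |= FLAG_Z;
    if ((*a ^ value ^ r) & 0x10)        f |= FLAG_H;   /* carry out of bit 3 */
    if ((*a ^ r) & (value ^ r) & 0x80)  f |= FLAG_PV;  /* signed overflow */
    if (sum & 0x100)                    f |= FLAG_C;   /* carry out of bit 7 */

    *a = r;
    return f;
}
```

A version like this is slow compared with hand-written asm, but it is handy as a reference to test any SH2 version against.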

So, why not start writing this in SH2 asm?

Here is the beginning:

__asm__ volatile (

" add %2,%0;"

Or do we need to check for carry/overflow, use addc/addv, and test the T bit?

Can somebody help with this one?
So, here are my comments:

addb %2,%0

- adds value to operand 0 (_A) and stores the result in _A

lahf

- loads the AH register with the status flags set by the result of addb %2,%0 (SIGN, ZERO, AUXILIARY CARRY, PARITY, CARRY). These flags are very close to the Z80 ones, which must later be set in _F, so _F is somewhat like AH. The x86 OVERFLOW flag is not among them, which is why the next instruction fetches it separately.

setob %1

- sets _F to 0x01 if the overflow flag is 1.

addb %1,%1

addb %1,%1

- so if _F = 00000001 (overflow) we get 00000100, where bit 2 is the parity/overflow bit

if _F = 00000000 (no overflow) it stays 00000000

and with this we have already set the Parity/Overflow flag of the Z80 F register

andb $0xd1, %%ah

- checks for the other flags (sign, zero, half carry, carry) in AH? 0xd1 is the mask for those flags, so a bit stays 1 only where both are 1?? Not sure about this one....

orb %%ah,%1

- if the overflow flag is 1 in AH or in _F, then set it to 1??????? Why?

help me :))
Problem. From this code, I'm not sure: does it set only the Parity/Overflow flag in _F (i.e. without setting the other flags)? In other words, does this code always output _F as 0x00 or 0x04 (from the overflow bit)?? But what about all the other flags (sign/zero/carry/half-carry)? The Z80 needs them, so why does the code output only the overflow bit?

Why are the last two instructions (logical AND, OR) used if there is no output to F??????????
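One way to see what the last two instructions do is to model the tail of the macro in C. The byte that lahf leaves in AH holds the x86 flags as S (bit 7), Z (bit 6), auxiliary carry (bit 4), parity (bit 2) and carry (bit 0), which are the same bit positions the Z80 uses for S, Z, H, P/V and C. The mask 0xd1 (11010001) keeps S, Z, H and C and drops the x86 parity bit, and the OR then merges those four flags into _F, which so far holds only the overflow bit. `combine_flags` is a made-up name for this sketch.

```c
#include <stdint.h>

/* Combine the byte that `lahf` leaves in AH with the overflow flag
   captured by `setob`, mirroring the last five instructions of the macro. */
uint8_t combine_flags(uint8_t ah, int overflow)
{
    uint8_t f = overflow ? 1 : 0;  /* setob %1        : F = 0 or 1           */
    f = (uint8_t)(f + f);          /* addb %1,%1      : F = 0 or 2           */
    f = (uint8_t)(f + f);          /* addb %1,%1      : F = 0 or 4 (P/V bit) */
    ah &= 0xd1;                    /* andb $0xd1,%%ah : keep S, Z, H, C      */
    f |= ah;                       /* orb %%ah,%1     : merge them into F    */
    return f;
}
```

So the output is not limited to 0x00 or 0x04: the AND works on AH only, and the OR is the step that copies the remaining flags into _F.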

Btw, if we only output _F as 0x04 (00000100) or 0x00 (00000000) (but I think that's wrong), we can write code like this:

addv %2,%0

- (adds value to _A, places the result in _A, and sets the T bit as follows: overflow = 1; no overflow = 0)

movt %1

- moves the overflow bit (1 or 0) to _F

add %1,%1

add %1,%1

- shifts the overflow bit to its proper position in _F

Btw, about the sign: can we use this instruction to detect it?

MOV.B @(R0,Rm),Rn 0000nnnnmmmm1100 (R0 + Rm) --> Sign extension --> Rn

I.e. can we move the value to r0, then use this instruction, and then check Rn??

Please, help me.

andb $0xd1, %%ah

- checks for the other flags (sign, zero, half carry, carry) in AH? 0xd1 is the mask for those flags, so a bit stays 1 only where both are 1?? Not sure about this one....


A little addition: and then what?? We check those flags and the result is placed into memory position 0xd1, but what relation does that have to _F?? And does this reset the overflow bit?


orb %%ah,%1

- if the overflow flag is 1 in AH or in _F, then set it to 1??????? Why?


And don't we already set the overflow flag in _F (i.e. as I understand it, this can only check for overflow, because we don't have anything else in the _F register)? Why do we do this twice? Or does the _previous_ andb instruction somehow (??? how???) set the _F register with the other flags from AH (the ones chosen by mask 0xd1)?


Aw damn... I wish I could help right now, but I'll soon leave to celebrate X-Mas and can't give you a proper reply until Friday... As for looking at the code, I can only assume you are missing two things. RISC processors only perform additions on registers, meaning that you have to use a memory instruction to load the data into the registers first, and only then perform the operation... but I don't have any documentation on this PC... I believe ExCyber has more SH2 docs than me. Also note that all SH2 registers are 32 bits long, with no narrower mode to switch to, so the only way to check for those 8-bit flags is to perform other logical operations, or to load the values into the top 8 bits (i.e. shift them all 24 bits to the left).

Maybe I'm not making any sense, but I'm in a hurry ^^;
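The "shift 24 bits to the left" idea can be pictured in C like this: with both 8-bit operands placed in bits 31..24 of a 32-bit word, the carry and signed overflow of the 32-bit addition are the same as those of the 8-bit addition. This is only a model of the arithmetic, not SH2 code, and `add8_via_top_byte` is a made-up name.

```c
#include <stdint.h>

/* Returns the 8-bit sum; *carry and *overflow receive what a 32-bit
   addition reports when both operands sit in bits 31..24. */
uint8_t add8_via_top_byte(uint8_t a, uint8_t b, int *carry, int *overflow)
{
    uint32_t x = (uint32_t)a << 24;
    uint32_t y = (uint32_t)b << 24;
    uint32_t s = x + y;  /* wraps modulo 2^32 */

    *carry = s < x;                                /* carry out of bit 31 */
    *overflow = (int)(((x ^ s) & (y ^ s)) >> 31);  /* signed overflow */
    return (uint8_t)(s >> 24);
}
```

The price is the extra shifts before and after every operation, which may or may not be cheaper than masking and testing bits on an unshifted value.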
Oh, I wish you a happy X-mas, TakaIsSilly! =) (seems that early congratulations are not so bad really :)))

Will wait for your return (next year =))) )!
First of all, I think it might be a bit easier to work from a reference doc or two, at least initially, rather than an emulator written in x86 assembly language... a lot of the optimizations possible for an x86-hosted emulator will probably not translate well to SH2, because they're based on the fact that x86 is derived from the same architecture as Z80 (not directly of course, but they're both extensions to the 8080, so their instruction sets are much closer than either one is to SuperH). With my complaining now out of the way:

Btw, about the sign: can we use this instruction to detect it?

MOV.B @(R0,Rm),Rn 0000nnnnmmmm1100 (R0 + Rm) --> Sign extension --> Rn

I.e. can we move the value to r0, then use this instruction, and then check Rn??

I'm not sure what you're trying to accomplish exactly.

That instruction will read a byte from memory into a register and handle the sign extension (this just means padding it with zeroes if it's positive and padding it with ones if it's negative). It's for loading bytes without mangling the positive/negative aspect of it in the transition to a 32-bit number.
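In C terms the sign extension looks like the sketch below (`sign_extend8` and `sign_flag` are names invented for the example). It also shows why a sign-extended byte can be tested for the Z80 sign flag with a plain "less than zero" check.

```c
#include <stdint.h>

/* Widen a byte the way a sign-extending load does: bit 7 is copied
   into bits 8..31. */
int32_t sign_extend8(uint8_t byte)
{
    return (byte & 0x80) ? (int32_t)byte - 256 : (int32_t)byte;
}

/* The Z80 sign flag is bit 7 of the result, so after sign extension it
   can be read as "is the 32-bit value negative". */
int sign_flag(uint8_t result)
{
    return sign_extend8(result) < 0;
}
```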

I'm not as familiar with SH or Z80 as I'd like to be... this information might not be completely correct, and probably isn't the fastest way to implement these checks...

If you want to know how to set the sign flag (S) of the Z80 F register, as far as I know you can just copy bit 7 of the result to the emulated Z80 F register (according to the doc I linked above, this is how it works in a real Z80).

As for the other flags, here are my thoughts:

Note that when I refer to "copying" a bit, it is probably most easily achieved by using an instruction sequence similar to the following:

and #(bit mask), r0

xor #(bit mask), r0 (if an inversion is needed)

(necessary SHLLx/SHRLx instructions to shift the bit to the appropriate place)

or r0, (register where emulated Z80 F is stored)
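The same and/xor/shift/or sequence written as a C helper, as a sketch only. `copy_bit` and its parameters are hypothetical; note that it only ORs into F, so the target bit has to be clear beforehand.

```c
#include <stdint.h>

/* Copy one bit of `value` into `f`.
   mask   : selects the source bit (exactly one bit set)
   invert : nonzero to flip the bit first (the xor step)
   shift  : >0 shifts left, <0 shifts right, to reach the target position */
uint8_t copy_bit(uint8_t f, uint32_t value, uint32_t mask, int invert, int shift)
{
    uint32_t t = value & mask;   /* and #(bit mask), r0 */
    if (invert)
        t ^= mask;               /* xor #(bit mask), r0 */
    if (shift > 0)
        t <<= shift;             /* SHLLx */
    else if (shift < 0)
        t >>= -shift;            /* SHRLx */
    return (uint8_t)(f | t);     /* or r0, F */
}
```

For example, `copy_bit(f, result, 0x80, 0, 0)` copies bit 7 straight into the sign position, and `copy_bit(f, result, 0x100, 0, -8)` moves bit 8 of a 9-bit sum down to bit 0.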

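Zero: Use CMP/PL on the result, then copy and invert the T flag (of the SH status register). This only works if the result has first been masked to 8 bits, since CMP/PL tests for a value greater than zero; TST Rn,Rn on the masked result sets T directly when it is zero, with no inversion needed. The Zero flag is also used by the comparison/search instructions as a true/false flag, which will need a different implementation.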

bit 5: A copy of bit 5 of the result. This is officially unused, but a couple ZX Spectrum games are known to depend on it working correctly. It's probably not important for now.

Half-carry: As this relates to 4-bit math, I'm not sure how to implement it quickly (but then, I'm not fully awake yet either...).

bit 3: A copy of bit 3 of the result. Basically the same as bit 5.

Parity/Overflow Flag: This is a mess. It's used for indicating an overflow for add/subtract, but also for indicating parity for logical operations. It's probably best to use lookup tables of some sort for this one.

N: Set to 0 by addition instructions, and set to 1 by subtraction instructions. Apparently that's all there is to it.

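Carry: For adds, I think it should be sufficient to copy bit 8 of the result. For subtracts done directly in a wider register, bit 8 of the difference is already the borrow, which is what the Z80 carry flag holds after a subtraction; it would only be necessary to invert the bit if the subtraction is done by adding the complement.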
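As a sketch of the lookup-table idea for the flags that depend only on the result byte (S, Z, bits 5 and 3, and parity), something like the following could work. `szp_table`, `build_szp_table` and `logic_flags` are invented names; H, N and C depend on the instruction and are left to the caller.

```c
#include <stdint.h>

static uint8_t szp_table[256];  /* S, Z, bit 5, bit 3 and parity per result */

/* Fill the table once at start-up. */
void build_szp_table(void)
{
    int v, bit, ones;
    for (v = 0; v < 256; v++) {
        uint8_t f = (uint8_t)(v & 0xA8);  /* S (bit 7), bit 5, bit 3 */
        if (v == 0)
            f |= 0x40;                    /* Z */
        ones = 0;
        for (bit = 0; bit < 8; bit++)
            ones += (v >> bit) & 1;
        if ((ones & 1) == 0)
            f |= 0x04;                    /* P/V = 1 on even parity */
        szp_table[v] = f;
    }
}

/* Flags for a logical operation (AND/OR/XOR), where P/V means parity. */
uint8_t logic_flags(uint8_t result)
{
    return szp_table[result];
}
```

For add/subtract the P/V bit from the table would have to be replaced by the overflow bit, and half-carry can be taken from bit 4 of (operand1 XOR operand2 XOR result). The table costs 256 bytes and one memory read per instruction.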

I also did some calculations regarding performance. For a 3.58MHz Z80, you get almost 8 SH cycles per Z80 cycle. Z80 instructions take varying amounts of time to execute, but the minimum is 4 cycles. So, you get somewhere around 30 SH cycles for these instructions. The ones that access memory give you more cycles (these are listed in the Z80 manual; in Z80 terms, clock cycles are called "T states" ). You have to take a 16 (I think; it will be more if the Saturn RAM has wait states) cycle loss on each Z80 instruction due to the pipeline flushes from jumping in and out of the instruction code. I think this means that some instructions will need more than their fair share of cycles. Hopefully the difference will be made up by unused cycles in the longer Z80 instructions...
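The budget estimate above can be written down as a small C helper to play with the numbers. Both constants come from the rough figures in the post (about 8 SH cycles per Z80 cycle, and a guessed 16-cycle loss per instruction); `sh_cycle_budget` is a made-up name.

```c
/* Rough figures from the estimate above; neither is measured. */
#define SH_CYCLES_PER_Z80_CYCLE 8   /* "almost 8" */
#define DISPATCH_OVERHEAD       16  /* guessed pipeline-flush cost per instruction */

/* SH cycles left for the body of a handler that emulates a Z80
   instruction taking `t_states` Z80 clock cycles. */
int sh_cycle_budget(int t_states)
{
    return t_states * SH_CYCLES_PER_Z80_CYCLE - DISPATCH_OVERHEAD;
}
```

With these numbers a 4 T-state instruction leaves 16 SH cycles for the actual work, and a 7 T-state one leaves 40.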

Hope this helps :).

Edit: the code tag apparently doesn't treat angle brackets correctly...

(Edited by ExCyber at 2:23 pm on Dec. 24, 2001)

Tried this (not suitable for direct insertion into z80.c, because the AT&T format is broken, but it has my comments):

#define ADD(value)
__asm__ volatile (
" add %0,%2;"
mov %2,%%r0;
and #0x80, %%r0;
or %%r0,%1; //bit 7 "sign" set
cmp/pl %2;
movt %%r0;
xor #0x01,%%r0;
shll %1;
or %%r0,%1; //bit 6 "zero" set
//sign and zero bits are now set
shll %1;
mov %2,%%r0;
and #0x20, %%r0;
or %%r0,%1; //bit 5 set
shll2 %1; //move to bit 3 (bit 4 unknown)
mov %2,%%r0;
and #0x08, %%r0;
or %%r0,%1; //bit 3 set
shll %1; //move to bit 2 (p/v)
mov #0x01,%%r0;
or %%r0,%1; //bit 2 set (overflow)
shll %1; //move to bit 1
mov #0x00,%%r0;
or %%r0,%1; //bit 1 set (addition)
shll %1; //move to bit 0
mov %2,%%r0;
and #0x100, %%r0;
or %%r0,%1; //bit 0 set
:"=r" (_A), "=r" (_F)
:"r" (value), "1" (_F), "0" (_A)
);


r0 is used as a temporary register.

If it is converted into AT&T format it still won't work (it won't even compile) because of the r0 register (if you use it as %%r0). If you use it as @r0 (with mov as mov.l, and plain r0 in the other cases) it compiles, but it crashes SSF/the Saturn. So I'm sure the problem here is with the AT&T format in gcc...

ExCyber, can you look into this???
I'm not familiar with this syntax... it also looks like it's been mangled somehow (e.g. all the occurrences of "96"). Are you sure that you don't need to use gas's SH syntax?
Well, I'm back. On the trip, I suddenly remembered something... Assuming we can find the space, could we not use the hardware multiplier described somewhere in the GCC docs to make an _amazingly fast_ version of some of the complex opcodes using arrays? Er... Let me get all my stuff back in place and I'll see what I can do about the SH code...

ExCyber, Denis has an amazing capacity for getting his code from the least obvious places... He uses things that are commented only in the /sgl210us folder in the SMS emulator. It's quite bold of him :)
After reviewing the gas manual, I'm pretty sure that AT&T syntax is useless for SH code. This should be of interest. I haven't tried this syntax for the version of binutils in the Sega build environment, but I know that it does work fine for binutils 2.10.1.
Hnn... This might not be a revelation, but near the end of the GCC.FAQ file there is the following hint:

11) Generating mixed C/assembly listings

  To generate a nice assembly listing with interspersed C source, try:

  gcc -O2 -m2 -g -S foo.c

  as -ahld foo.s > junk.tmp

  "junk.tmp" will now contain a nicely formatted mixed C/assembly listing.

This not only lets you retrieve correctly formatted assembly code, it also lets you see what you can optimize better ^^; Well, back to reading :)