about Saturn dual CPUs

vbt

After playing with the dual-processor sample from SBL6, I tried to use the slave processor in a simple program. The result is that it only works if I do a sprintf (???) in my slave function, like this:

sprintf(toto,"%08d" ,SlaveParam );

SlaveParam is used to pass a parameter between the master processor and the slave processor.

Is there a reason for this strange behavior?
 
If you're sharing data between the CPUs the most important things to remember are to declare the variables as volatile and to access them bypassing the cache. The SH2s do not feature bus snooping, so one CPU has no idea the other one wrote something to memory. They also don't maintain cache coherency, so data that is still only in the cache of one CPU is invisible to the other one.

EDIT: Another important thing I forgot to mention is that memory accesses are not atomic, i.e. any operation can be interrupted at any time. Therefore any shared variables or memory areas should be guarded with locks or some other signalling mechanism. Sega's libraries should provide the necessary primitives, and GCC comes with a basic set as well.
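To make the "volatile plus bypassing the cache" part concrete, here is a minimal sketch. It assumes the usual SH-2 address map, where OR-ing 0x20000000 into an address selects the cache-through mirror of the same location. `CACHE_THROUGH_BIT` and `uncached_alias` are names made up for the example, and the address arithmetic is kept separate from the actual memory access so it can be checked on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: setting bit 29 selects the cache-through mirror of an address. */
#define CACHE_THROUGH_BIT 0x20000000u

/* Return the cache-through alias of an address in the cached area. */
static uint32_t uncached_alias(uint32_t cached_addr)
{
    assert((cached_addr >> 29) == 0u);   /* must start in the cached area */
    return cached_addr | CACHE_THROUGH_BIT;
}

/* On the Saturn, a shared word would then be reached like this (not run
 * here, since the addresses only exist on the console):
 *
 *   volatile uint32_t *shared =
 *       (volatile uint32_t *)uncached_alias((uint32_t)&SlaveParam);
 *   *shared = 42;    // store goes to memory, not only to this CPU's cache
 */
```

volatile makes the compiler issue a real load or store for every access, and the alias makes that access go to memory rather than the local cache. You need both.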
 
Originally posted by antime@Dec 30, 2003 @ 08:58 PM

If you're sharing data between the CPUs the most important things to remember are to declare the variables as volatile and to access them bypassing the cache

I declared my variable as volatile and it solved my problem :) Thanks a million :D

Thanks also for memory access tips :)
 
Yeah, these guys are impressive. What really amazes me, though, is how Charles MacDonald did all of his programming WITHOUT using the SGL/SBL libraries, and he did this years ago.
 
Do you mean Charles Doty? His programs use SGL. Charles MacDonald didn't start coding for the Saturn until I had released the C version of my (libless) copperbar sample. Bart Trzynadlowski, Tyranid and others made some programs without libraries, and Azuco started on his own set of libraries. Some of the documentation (VDP1 and VDP2 manuals, plus a few others) has been available on the net since around 1997; it's just a matter of being able to read and understand it.
 
Originally posted by M3d10n@Dec 31, 2003 @ 04:19 AM

Any evil plans for the newly found dual CPU power, VBT?

;)

Nothing for now. In fact, in my simple tests (applied to SMS Plus, after trying it in a throwaway program) I lost speed. I registered the background function to run on the slave processor, and while that was running, the master processor ran the sprite rendering function. Something like this:

Code:
....
#ifdef SLAVE2
    /* Hand the background rendering of this line to the slave CPU */
    useSlave((Uint32)line, render_bg_sms);
#else
    render_bg(line);
#endif
    /* Draw sprites (on the master CPU, while the slave works) */
    render_obj(line);

#ifdef SLAVE2
    /* Do not continue until the slave has finished the background */
    waitSlave();
#endif
...
//---------------------------------------------------------------------------------------------
void useSlave(Uint32 param, volatile Uint32 *function)
{
  slave_command = function;
  SlaveParam = param;

  /* Writing here raises the slave's FRT input capture interrupt */
  *(volatile Uint16 *)0x21000000 = 0xffff; /* slave FRT inp invoke */
}
//---------------------------------------------------------------------------------------------
void waitSlave(void)
{
  /* Busy-wait until the slave clears slave_command */
  while( slave_command != 0 )
  ;
}
//---------------------------------------------------------------------------------------------
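The slave side is not shown above, so here is a rough sketch of how the whole handshake fits together. It is written so it can run on a PC, which means the slave's interrupt handler is just an ordinary function called by hand. All names (`use_slave`, `slave_service`, `slave_busy`, `render_bg_line`) are hypothetical, and the function-pointer type for the command is my assumption, not what the SBL sample declares.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef void (*slave_fn)(uint32_t);

/* Mailbox shared by both CPUs. On hardware these would be accessed through
 * the cache-through mirror; here they are ordinary volatile globals. */
static volatile slave_fn slave_command = NULL;
static volatile uint32_t slave_param   = 0;

/* Master side: publish the parameter first, then the command. */
static void use_slave(uint32_t param, slave_fn fn)
{
    assert(slave_command == NULL);   /* previous job must be finished */
    slave_param   = param;
    slave_command = fn;
    /* On the Saturn: write 0xFFFF to 0x21000000 here to raise the
     * slave's FRT input capture interrupt. */
}

/* Slave side: hypothetical body of the FRT input capture handler. */
static void slave_service(void)
{
    slave_fn fn = slave_command;
    if (fn != NULL) {
        fn(slave_param);
        slave_command = NULL;        /* tells the master the job is done */
    }
}

/* Master side: what a wait loop would poll. */
static int slave_busy(void)
{
    return slave_command != NULL;
}

/* Example job (hypothetical): remember which line was rendered. */
static volatile uint32_t last_line_rendered = 0;
static void render_bg_line(uint32_t line)
{
    last_line_rendered = line;
}
```

On the console, the master would call `use_slave(line, render_bg_line)`, do its own sprite work, then spin on `slave_busy()`. Clearing the command only after the job has run is what makes the wait loop safe.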
 
Both CPUs are connected to the rest of the Saturn through a single, shared bus. When both CPUs want to access something, one CPU gets the bus and the other has to wait until it's free, resulting in slowdown. To help with this, the cache of each CPU can be configured as 2K of mixed cache plus 2K of RAM (the normal mode is 4K of mixed cache), and IIRC the slave CPU is configured like this by default. By working out of cache on data held in the internal RAM, external bus accesses can be minimized, which should lead to better performance.
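For reference, a rough sketch of how the cache mode bits could be put together. The register address and bit positions are what I recall from the cache chapter of the SH7604 hardware manual, so treat them as assumptions and check them against the manual. `ccr_compose` is a hypothetical helper that only builds the value; the actual register write is left as a comment because it only makes sense on the console.

```c
#include <assert.h>
#include <stdint.h>

/* SH7604 cache control register, as I read the hardware manual.
 * Assumptions: verify the address and bits before relying on them. */
#define CCR_ADDR     0xFFFFFE92u   /* 8-bit register */
#define CCR_CE       0x01u         /* cache enable */
#define CCR_ID       0x02u         /* instruction replacement disable */
#define CCR_OD       0x04u         /* data replacement disable */
#define CCR_TW       0x08u         /* two-way mode: half the cache becomes RAM */
#define CCR_CP       0x10u         /* cache purge */
#define CCR_RESERVED 0x20u         /* bit 5 is not used */

/* Build a CCR value from flag bits, rejecting the reserved bit. */
static uint8_t ccr_compose(unsigned flags)
{
    assert((flags & CCR_RESERVED) == 0u);
    assert(flags <= 0xFFu);
    return (uint8_t)flags;
}

/* On hardware (not executed here), selecting cache+RAM mode might end with:
 *
 *   *(volatile uint8_t *)CCR_ADDR = ccr_compose(CCR_CE | CCR_TW);
 *
 * The manual describes the disable/purge steps that should come first. */
```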
 
OK, I'll try to use the cache. If I understood correctly, I only have to create a second variable each time that points to the source variable's address with 0x20000000 added, and the variable will then be copied to the cache automatically?
 
No, that would bypass the cache entirely. When reading a data location with the top three address bits set to zero an entire cache line (16 bytes on the Saturn's CPUs) is read into the cache (which is why hardware register accesses have to bypass the cache). The cache chapter in the 7604 manual describes how it works. It's a tricky subject and not really worth bothering with unless you suspect you actually have a performance problem due to it (like having arranged your data so you always get cache misses).
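A small sketch of the address rules described above, under the stated assumptions that the top three address bits select the access type (000 cached, 001 cache-through) and that a cache line is 16 bytes. The helper names are made up for the example. `lines_touched` shows why data layout matters: an 8-byte object that straddles a line boundary costs two line fills instead of one.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 16u

/* The top three address bits select how the CPU treats the access. */
static unsigned address_area(uint32_t addr)
{
    return addr >> 29;
}

static int is_cached(uint32_t addr)
{
    return address_area(addr) == 0u;
}

static int is_cache_through(uint32_t addr)
{
    return address_area(addr) == 1u;
}

/* First byte of the 16-byte line that a cached read would pull in. */
static uint32_t cache_line_base(uint32_t addr)
{
    assert(is_cached(addr));
    return addr & ~(CACHE_LINE_BYTES - 1u);
}

/* Number of cache lines covered by `size` bytes starting at `addr`. */
static uint32_t lines_touched(uint32_t addr, uint32_t size)
{
    assert(size > 0u);
    uint32_t first = addr / CACHE_LINE_BYTES;
    uint32_t last  = (addr + size - 1u) / CACHE_LINE_BYTES;
    return last - first + 1u;
}
```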

When the cache is configured as cache+RAM, 0xc0000000 to 0xc00007ff become RAM so copy your data there, do whatever operations you want to on it and copy it back out to wherever you want it. The code that operates on this data should be as small as possible to make effective use of the remaining cache, which means many small loops rather than one big loop and so forth.
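The copy in, work, copy out pattern could look roughly like this. To keep the sketch runnable on a PC, the 2K of on-chip RAM is replaced by an ordinary array; on the Saturn the buffer would be a pointer to 0xC0000000 instead. `scratch_ram`, `brighten` and `process_in_scratch` are hypothetical names, and the brighten loop is just a stand-in for whatever small operation you actually want to run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCRATCH_SIZE 2048u   /* 0xC0000000..0xC00007FF in cache+RAM mode */

/* Stand-in for the on-chip RAM so the sketch runs on a PC.
 * On the Saturn this would be: (uint8_t *)0xC0000000 */
static uint8_t scratch_ram[SCRATCH_SIZE];

/* Hypothetical small loop: add 16 to each byte, saturating at 255. */
static void brighten(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t)(buf[i] > 239u ? 255u : buf[i] + 16u);
}

/* Copy in, work on the fast copy, copy the result back out. */
static void process_in_scratch(uint8_t *data, size_t len)
{
    assert(len <= SCRATCH_SIZE);
    memcpy(scratch_ram, data, len);
    brighten(scratch_ram, len);
    memcpy(data, scratch_ram, len);
}
```

Whether this wins anything depends on how much work is done per byte copied: two extra copies only pay off if the loop in the middle touches the data several times.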
 
Yes, you can do that as well, I forgot about that possibility. To create code that runs from that area you must play around with your link script and use GCC's section attribute to map the code and data to the right addresses. The ld manual has an example on how to create a section with different load and virtual addresses which you can use pretty much as-is.
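On the C side, the section attribute part might look like the sketch below. The section names `.cram_text` and `.cram_data` are made up and would have to match whatever your link script defines. The link script also needs to give those sections a run address inside 0xC0000000 to 0xC00007FF and a separate load address, and your startup code has to copy them there, none of which is shown here. The macros fall back to nothing on non-ELF compilers so the example still builds on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical section names; they must match the link script. Code and
 * data use separate sections because their flags (executable vs. writable)
 * differ. Non-ELF hosts simply skip the attribute. */
#if defined(__GNUC__) && defined(__ELF__)
#define CACHE_RAM_CODE __attribute__((section(".cram_text")))
#define CACHE_RAM_DATA __attribute__((section(".cram_data")))
#else
#define CACHE_RAM_CODE
#define CACHE_RAM_DATA
#endif

/* Small working buffer meant to live in the on-chip RAM. */
static uint16_t line_buffer[64] CACHE_RAM_DATA;

/* Small routine meant to run from the on-chip RAM. */
static CACHE_RAM_CODE uint32_t sum_line(const uint16_t *src, unsigned count)
{
    assert(src != 0);
    uint32_t total = 0;
    for (unsigned i = 0; i < count; i++)
        total += src[i];
    return total;
}
```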
 
VBT: There's another sample program and some more information in Saturn Technical Bulletin #28 if you need some more code to look at.
 