So I'm thinking about porting a multi-threaded program I'm working on (an interpreter for a dataflow language, if you're curious) to the 32X and/or Saturn, and I was trying to think of a sane way to use both processors in a reasonably efficient fashion. I think I have a reasonable solution, but I'd like some feedback to make sure I haven't missed anything obvious. So here it goes:
Threads locked to the processor they started on (I'll probably stick with a single thread per processor for this particular application)
Code and stack accessed through cached memory region
Globals and heap accessed through non-cached memory region
The logic is that code is read-only, so there are no coherency problems there. Each stack will only be touched by the single thread it belongs to, and since threads can't move between processors, only one processor will ever look at a given stack. Globals and heap, on the other hand, are potentially shared by all threads, and manually flushing everything would be difficult to do properly and probably not very performant (apart from some special cases where the data is mostly read-only, but those I can handle as exceptions if there's enough performance to be gained).
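To make the cached/non-cached split concrete, here is a rough sketch of how I picture the address handling. On the SH-2 the top three address bits select the access type, so the same RAM is visible through the cache at its base address and cache-through at that address with bit 29 set (0x06000000 vs. 0x26000000 for SDRAM on the 32X or high work RAM on the Saturn). The helper and macro names below are made up for illustration, and the heap address in the comment is hypothetical; this is a sketch of the idea, not tested on hardware.

```c
#include <assert.h>
#include <stdint.h>

/* SH-2 address convention: bits 31..29 select the access type.
   000 = access through the cache, 001 = cache-through (uncached). */
#define SH2_REGION_MASK       0xE0000000u
#define SH2_CACHE_THROUGH_BIT 0x20000000u

/* Hypothetical helper: map a cached address to its cache-through alias. */
static inline uint32_t sh2_uncached_alias(uint32_t cached_addr)
{
    assert((cached_addr & SH2_REGION_MASK) == 0u);
    return cached_addr | SH2_CACHE_THROUGH_BIT;
}

/* Hypothetical helper: map a cache-through address back to the cached one. */
static inline uint32_t sh2_cached_alias(uint32_t uncached_addr)
{
    assert((uncached_addr & SH2_REGION_MASK) == SH2_CACHE_THROUGH_BIT);
    return uncached_addr & ~SH2_CACHE_THROUGH_BIT;
}

/* Hypothetical macro: view a shared object (global or heap) through the
   cache-through alias so both CPUs always see what is actually in RAM.
   Only meaningful on the target; on a host it just builds a pointer. */
#define SH2_SHARED(type, cached_addr) \
    ((volatile type *)(uintptr_t)sh2_uncached_alias(cached_addr))
```

In practice I'd expect the linker script to do most of this work by placing code and the per-CPU stacks at cached addresses and globals and heap at the cache-through alias, with something like `SH2_SHARED(uint32_t, 0x06010000u)` (address made up) only needed for hand-placed data.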
Will this approach work? Does it sound like a good compromise between performance and complexity?
I also posted this over on the Spritesmind forum, but I figure you Saturn devs would have more experience dealing with the SH-2's unfortunate lack of cache coherency.