Mmmh ... there isn't any multithreading instruction in the SH2. If you want to create a multithreaded library, you'll have to do it from scratch at the C level.
But that sounds like a good idea (as long as the overhead doesn't kill the speed 😛)
[edit] The slave SH2's execution already looks pretty much like a thread: as soon as it's started, it loops waiting for the master SH2 to issue the FRT signal. When the signal arrives, it does its work, then goes back to the waiting state once it's done ...
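A rough C sketch of that wait/work/wait loop, runnable on a PC. It's only an illustration under assumptions: the FRT signal is modeled as a plain `signaled` flag instead of the real FRT registers, and `slave_mailbox`, `master_post`, `slave_poll_once` and `add_one` are hypothetical names, not from any SDK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job type: the work the master hands to the slave. */
typedef void (*job_fn)(void *arg);

/* Hypothetical shared "mailbox". On real hardware this would have to live in
   uncached (cache-through) memory so both CPUs see the same values. */
typedef struct {
    volatile bool signaled; /* stands in for the slave's FRT signal flag */
    job_fn fn;
    void *arg;
} slave_mailbox;

/* Master side: publish the job first, then raise the signal. */
static void master_post(slave_mailbox *mb, job_fn fn, void *arg) {
    assert(!mb->signaled);  /* previous job must have been picked up */
    mb->fn = fn;
    mb->arg = arg;
    mb->signaled = true;    /* on hardware: trigger the slave's FRT signal */
}

/* Slave side: one pass of its loop. Returns true if a job was run. */
static bool slave_poll_once(slave_mailbox *mb) {
    if (!mb->signaled)
        return false;       /* nothing yet, keep waiting */
    mb->signaled = false;   /* acknowledge the signal */
    if (mb->fn != NULL)
        mb->fn(mb->arg);    /* do the work */
    return true;            /* back to the waiting state */
}

/* Example job: increments the int it is given. */
static void add_one(void *arg) {
    (*(int *)arg)++;
}

/* On the slave, the endless loop would just be:
   for (;;) slave_poll_once(&mailbox); */
```

Usage: the master calls `master_post(&mb, add_one, &counter)`, and the slave's next `slave_poll_once(&mb)` runs the job and clears the flag. A real "thread library" would build on this with a job queue and some way for the master to know when the slave is finished.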