Saturn's SCSP and Dreamcast's AICA: What would it take to make homebrew sequenced music possible on them? Ideas, and a call for help and suggestions!

Hello there! My name is baku-chan, and I'm writing here to inquire about the possibility of creating homebrew sequenced music for the Saturn's SCSP (and for its close relative, the Dreamcast's AICA), and about what it would take to make it happen.

Just to let you all know about my background with all of this: while I do have basic technical knowledge, I'm no programmer myself. I'm just a lover of video game music and especially of sequenced video game music; the kind from eras long past when everything was done either in chiptune or on synthesizers, many of which I don’t think have really had their potential tapped out yet. Like, for instance, I think that sound chips like the Mega Drive/Genesis’s YM2612 and the SNES’s SPC700 have really gotten a lot of attention when it comes to authenticity in emulation and pushing the limits of the hardware, and that’s great! But I also think that later sample-based sound hardware like the SCSP and AICA (plus the PS1’s SPU and PS2’s SPU2) has been almost forgotten in comparison and isn't taken as seriously when it comes to the idea of writing music for it, the way we now write YM2612 and SPC700-based homebrew music in an almost casual, taken-for-granted way that was unthinkable not too long ago!

I know that there are some very knowledgeable people in this community who I’m hopeful might be able to help make my crazy dream possible, haha, and to help share that dream with others who might be interested in such a thing. Whether you’re a musician or a lover of video game music or just someone looking for a challenge — or if you’re some combination of all of the above! — then I’d love to make your acquaintance.

For the uninitiated or anyone just stopping by here wanting to know why I would want to chase such a dream, I would like to take the time to go over some of the things that make both SCSP and AICA special, my perspective on all of that compared to other sound chips, and why I think that they deserve to have homebrew sequenced music written for them.

To start, much has been said (rightly or wrongly) about the capabilities of the rather infamous SCSP in the Sega Saturn, as well as its arguably overshadowed younger sibling AICA in the Dreamcast. Of course, I'm sure that everyone here knows the basics about these chips, including the much-touted features of the SCSP in particular that have often been considered to be significant advantages over the likes of, say, the SPU found in the PS1. These include FM support via using SCSP's 32 voices as modulators (such that you could supposedly have up to eight channels of 4-op FM, five channels of 6-op FM, or even a single channel of 32-op FM should one be so inclined), as well as a DSP with several more effects over the basic reverb, echo, and delay capabilities of the PS1's SPU, including chorus, low- and high-pass filters, an equalizer, and more. With said features, the SCSP is theoretically very powerful, and indeed arguably more so than the SPU in many ways (or even later sound chips like AICA and the PS2's SPU2, for that matter).

With that said, the SCSP is also well-known for its "fatal flaw" in the form of its lack of hardware ADPCM compression. This is the one thing that the PS1's SPU does have over the SCSP, and it's a significant advantage indeed. One that I'd personally argue actually makes the SPU superior to the SCSP overall; vastly so, even. The lack of hardware ADPCM compression means that SCSP has only around one-fourth of the effective sound RAM that SPU does, making memory management a significant issue for anyone making music for the SCSP. You can hear the effects of this even with some of the most accomplished sequenced music OSTs on the Saturn (such as Radiant Silvergun or Panzer Dragoon Saga), where you can hear a noticeably "compressed" sound to them compared to similarly accomplished sequenced PS1 OSTs. For further consideration, take two such accomplished sequenced PS1 OSTs: Rhapsody: A Musical Adventure (with its sequel Little Princess having similarly constructed tracks) and Aitakute... your smiles in my heart. Would either OST be possible on the Saturn? Well, with Rhapsody, the sound test shows that most songs (which are highly orchestral in nature, by the way) key on all 24 of SPU's channels throughout the vast majority of their runtime, all while pushing what are clearly very high quality samples compared to most Saturn OSTs or even less ambitious PS1 OSTs. Meanwhile, observing the PSF file sizes of Aitakute... reveals that the game's songs frequently exceed the 400 KiB mark compressed, with the highest number being a staggering 489 KiB (barely leaving enough room to enable reverb on top of that), all again while using very high-quality samples. Pulling off either of these OSTs on a system with just over a fourth of the effective sound RAM available to it would inevitably be extremely difficult, if not impossible, without significant cuts to both sound quality and overall complexity.
After all, Aitakute...'s maximum sound RAM usage, in uncompressed terms, would be 1739 KiB (or 489 KiB times the compression ratio of SPU's ADPCM scheme, which is 32/9 or roughly 3.55555556:1). Assuming that Aitakute...'s samples are all 44.1 KHz, then cutting the sample rate in half down to 22.05 KHz still results in 869 KiB worth of samples being used, with a noticeable — if arguably acceptable — drop in sound quality all the while. Cutting that number in half would then give you a workable 435 KiB, but now the sample rate is an unequivocally unlistenable 11.025 KHz. Lowering the sample rate simply isn't an option past a certain point; your only options then are to remove instruments until everything fits into 512 KiB, or to lower the sample resolution to 8-bit in order to accomplish the same, with predictable — and very likely still unlistenable — results.
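The arithmetic in the paragraph above can be reproduced with a few lines of Python. This is only a rough sketch: the function names are made up for illustration, and it assumes (as the paragraph does) that every sample starts at 44.1 KHz and that the whole 512 KiB is available for samples, which ignores the driver, DSP, and sequence data.

```python
from fractions import Fraction

SPU_ADPCM_RATIO = Fraction(32, 9)  # SPU-ADPCM ratio quoted in the post
SCSP_RAM_KIB = 512                 # SCSP sound RAM, no hardware compression

def uncompressed_kib(compressed_kib, ratio=SPU_ADPCM_RATIO):
    """Size of ADPCM data once expanded to plain 16-bit PCM."""
    return float(compressed_kib * ratio)

def halvings_to_fit(size_kib, budget_kib=SCSP_RAM_KIB):
    """Halve the sample rate (and so the size) until the data fits the budget."""
    steps = 0
    while size_kib > budget_kib:
        size_kib /= 2
        steps += 1
    return steps, size_kib

full = uncompressed_kib(489)          # largest Aitakute... track, uncompressed
steps, final = halvings_to_fit(full)  # how many halvings until it fits
rate = 44100 / 2 ** steps             # resulting sample rate in Hz
print(round(full), steps, round(final), rate)
```

Run as-is, it gives the same figures as the text: about 1739 KiB uncompressed, two halvings, about 435 KiB, and an 11.025 KHz sample rate.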

To really and finally drive the point home, here's a rough summary of how much sample data the Saturn's SCSP, PS1's SPU, Dreamcast's AICA, and PS2's SPU2 can each store, expressed in seconds at several commonly used sample rates, with any bespoke compression schemes converted to a standard unit of uncompressed PCM and after things like sequence data, tone data, sound driver, and DSP effect RAM usage are taken into consideration:
NOTE: Sound driver, sequence data, and tone data estimates don't exist for SPU and SPU2 here because those are all stored in main RAM for the former and in the IOP's RAM for the latter, rather than in their separate pools of sound RAM.

SCSP = 384 KiB ((512 KiB - 44 KiB driver [based on "Saturn Sound Driver Implementation Manual" (ST-241-042795, pg.26)] - 48 KiB DSP [average DSP usage guesstimate] - 32 KiB [sequence data guesstimate] - 4 KiB [tone data guesstimate]) * (1/1) [already uncompressed])
@ 22.050 KHz => (384 KiB / 44.1 KiB/sec) = 8.70748 seconds
@ 32.000 KHz => (384 KiB / 64.0 KiB/sec) = 6.00000 seconds
@ 44.100 KHz => (384 KiB / 88.2 KiB/sec) = 4.35375 seconds
SPU = 1479 KiB ((512 KiB - 96 KiB reverb unit [max possible reverb unit RAM usage]) * (32/9) [Sony SPU-ADPCM])
@ 22.050 KHz => (1479 KiB / 44.1 KiB/sec) = 33.53741 seconds
@ 32.000 KHz => (1479 KiB / 64.0 KiB/sec) = 23.10938 seconds
@ 44.100 KHz => (1479 KiB / 88.2 KiB/sec) = 16.76871 seconds
AICA = 7408 KiB ((2048 KiB - 64 KiB driver [guesstimate based on 32 KiB x 2 for Puyo Puyo DA!'s MANATEE.DRV driver] - 96 KiB DSP [average DSP usage guesstimate from SCSP x 2] - 32 KiB [sequence data guesstimate] - 4 KiB [tone data guesstimate]) * (4/1) [Yamaha ADPCM])
@ 22.050 KHz => (7408 KiB / 44.1 KiB/sec) = 167.98186 seconds
@ 32.000 KHz => (7408 KiB / 64.0 KiB/sec) = 115.75000 seconds
@ 44.100 KHz => (7408 KiB / 88.2 KiB/sec) = 83.99093 seconds
SPU2 = 6372 KiB ((2048 KiB - 256 KiB reverb units [based on reverb unit RAM usage for both cores]) * (32/9) [same ADPCM as PS1's SPU])
@ 22.050 KHz => (6372 KiB / 44.1 KiB/sec) = 144.48980 seconds
@ 32.000 KHz => (6372 KiB / 64.0 KiB/sec) = 99.56250 seconds
@ 44.100 KHz => (6372 KiB / 88.2 KiB/sec) = 72.24490 seconds
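For anyone who wants to plug in their own guesstimates, the formula behind the table can be sketched as a small Python function. The function and parameter names are my own, and it follows the table's conventions: 16-bit mono PCM, the effective size rounded to a whole KiB first, and a data rate of sample rate x 2 / 1000 "KiB" per second. (If you count 1024 bytes per KiB in the data rate instead, every figure comes out roughly 2.4 percent longer.)

```python
from fractions import Fraction

def pcm_seconds(ram_kib, reserved_kib, ratio, sample_rate_hz):
    """Seconds of 16-bit mono PCM that fit, using the table's conventions."""
    # Effective capacity in uncompressed terms, rounded to a whole KiB.
    effective_kib = round((ram_kib - reserved_kib) * ratio)
    # Data rate as used in the table: 22050 Hz -> 44.1 "KiB"/sec, and so on.
    kib_per_sec = sample_rate_hz * 2 / 1000
    return effective_kib / kib_per_sec

# SCSP: 512 KiB minus 128 KiB of driver, DSP, sequence, and tone data.
scsp = pcm_seconds(512, 44 + 48 + 32 + 4, Fraction(1, 1), 22050)
# SPU: 512 KiB minus the 96 KiB reverb work area, times the 32/9 ADPCM ratio.
spu = pcm_seconds(512, 96, Fraction(32, 9), 22050)
print(f"SCSP {scsp:.5f} s, SPU {spu:.5f} s")
```

The two example calls reproduce the 22.05 KHz rows for SCSP and SPU; swapping in other reserved amounts shows how sensitive the totals are to the guesstimates.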

Yeah. Ultimately, all of the above underscores the kind of challenges that developers faced back in the day when writing sequenced music for the SCSP, and it all quite clearly explains why cross-platform games tended to sound worse on the Saturn compared to their PS1 counterparts. The SPU's ADPCM compression capabilities really were that much of a critical feature, and the SCSP's lack of them really was that much of a critical omission. It's arguably the chip's only real flaw in what is otherwise a very sophisticated and powerful system that's worth its praise, but it's also a very, very serious flaw with easily observable real-world results. Anyone who wants to write homebrew music for the SCSP will constantly have to keep that flaw in mind as they write, and understand that for all of the wonderful features that SCSP has for one to play with, the chip is in many ways much less practical than the simpler but still very powerful SPU (which is really quite the common refrain for pretty much the entire Saturn hardware compared to the PS1, isn't it?). There are things that you simply will not be able to do on SCSP that the SPU will be able to do quite easily. And so I think that anyone gaping in awe at the legendary Saturn with its supposedly superior sound chip needs to temper their expectations a bit, and understand that SCSP is not really inherently more powerful than the SPU and is really more so different from it, sometimes for good and sometimes for ill.

So after spending several paragraphs dunking on SEGA hardware on a SEGA fan forum, haha, why am I here again? Well, the other point to take away here is that just because the SCSP has weaknesses compared to its contemporaries that are often downplayed or unrecognized doesn't mean that it isn't still an excellent piece of hardware that's more than worthy of having homebrew music developed for it.

Finally, trying to find ways around the SCSP's weaknesses has led me to appreciate the silent power and potential of its successor AICA in the Dreamcast. Again coming from the PlayStation scene trying to get sequenced music working there, comparing the PS2's SPU2 to AICA revealed to me that the latter is essentially a supercharged version of SPU2, having 16 more channels while inheriting the superior DSP of its predecessor and adding hardware ADPCM support that said predecessor failed to include. In that sense, it is also arguably everything that the SCSP should've been, with the ADPCM support in particular being, by far, the single greatest improvement that alone quadruples the effective amount of sound RAM. And combined with an increase of the actual physical sound RAM from 512 KiB to 2 MiB, this means that AICA has a staggering sixteen times (!!!) the effective amount of sound RAM compared to the SCSP. Its sole regression compared to its predecessor is that FM is no longer possible, which is unfortunate but also no great loss compared to everything that's gained in exchange.

I've already mentioned some of the PS1 era's best sequenced OSTs, like Rhapsody and Aitakute... your smiles in my heart. Such OSTs would represent less than a fourth of what sound chips like AICA are able to handle if we're measuring by how much RAM is available to them, let alone considering the additional features of the latter's DSP that those SPU soundtracks were never able to use. Likewise, I've heard someone say in regards to the SCSP with all of its features that, so long as you can manage its memory deficits, there is pretty much nothing that you can't do on it; if that's true, then imagine what could be done on AICA! In fact, let's imagine for a moment, using some of the greatest sequenced OSTs on the PS2 and its SPU2 sound chip for comparison. There, you have the likes of Final Fantasy X and X-2, Final Fantasy XI and its expansions, Final Fantasy XII, Kingdom Hearts 1 & 2, the Atelier series (Lilie, Judie, Viorate, Iris), the Summon Night series (3, 4, Granthese), the Infinity series (Never 7, Ever 7, Remember 11), the Super Robot Wars series (MX, Z, OGs), Tokimeki Memorial 3 ~At the place of our promise~, Namco x Capcom, and Berwick Saga representing some of the greatest pieces of sequenced music ever written for a video game console... and not a single track from any of those games exceeds a single megabyte of RAM compressed; they literally don't even represent half of what SPU2 is capable of, let alone AICA.

If there's nothing that we're not able to do on SCSP, imagine what's possible with AICA... that's what's driving me here now to see if we can make homebrew music possible for these sound chips! But first, there are a lot of questions to ask and problems to be sorted out, and so I'm posting here hoping that some of you can answer at least some of those questions for me. Thank you in advance for taking the time to read! Because, please be forewarned, there's a lot of stuff here, haha...

(continued on the next page...)
The SCSP and AICA's DSP: How does it work and what can you do with it?
From what I've read about it, the SCSP's DSP can execute up to 128 steps of instructions written for it, doing so in an exact order with no looping, conditions, or repeats. It's also apparently programmable to at least some extent, with effects running on there as "modules" that are essentially mini-programs that run on the DSP. Such modules, according to SEGA's documents, include effects such as "Reverb", "Echo (Delay)", "Chorus", "Flanger", "Filter", "Parametric EQ", and others. All of this seems to differ significantly from what I know about the PS1's SPU, which on the contrary appears to have only a reverb unit that's effectively a "fixed-function DSP" of sorts, in the sense that it's logic baked into the SPU itself that does nothing but calculate reverb coefficients and can't be programmed (easily) to do anything else.

Meanwhile, AICA supposedly inherits a similar if not identical DSP. I've heard some rumblings that there have been some changes made to AICA's DSP versus that in the SCSP, but I don't know of any documentation that exists about what said changes actually are. In any case, I'm assuming that it at least inherits the same effects that its SCSP equivalent has?

(Also, for the record and kinda-sorta related to the DSP's clock speed, AICA's clock speed is 22.5792 MHz just like the SCSP, and not 67 MHz or some variant of that number as widely reported on several sites including Sega Retro; that's actually the clock for AICA's SDRAM interface. I confirmed that (Dreamcast Hardware Specification Outline (Rev.0819), pg.26) because I was researching whether AICA's alleged increase in clock speed versus the SCSP could equal more DSP effects being playable at once on AICA, but again, said higher clocks were never a thing in the first place.)

"SCSP/DSP Effect Module Specifications" (ST-069-121693) provides a list of the DSP modules available as standard on the SCSP. The document describes said list as "tentative" with the values outlined within supposedly subject to change, but it also appears to be the latest reference released that I'm aware of.

Weirdly, the document gives the RAM usage for each DSP module in kilowords. That said, knowing that 1024 words equals one kiloword and that a word equals two bytes, it's an easy conversion to kibibytes; essentially, multiply the given kilowords by exactly two to get the amount in kibibytes. The DSP modules mentioned are listed below with said conversion and with the following legend:

DSP module name = Steps for main module (Steps for subset module) / RAM usage for main module (RAM usage for subset module)

Reverb = 34 (28) steps / 30 KiB
Early Reflection = 100 (60) steps / 26 KiB
Echo (Delay) = 20 (10) steps / 52 (26) KiB
Pitch Shifter = 64 (32) steps / 30 (4) KiB
Chorus = 22 steps / 2 KiB
Flanger = 20 steps / 4 KiB
Symphonic = 21 steps / 2 KiB
Surround = 23 steps / 30 KiB
Voice Cancel = 36 steps / 0 KiB
Auto Pan = 4 steps / 0 KiB
Phaser = 22 steps / 4 KiB
Distortion = 20 steps / 0 KiB
Filter = 5 steps / 0 KiB
Parametric EQ = 5 steps / 0 KiB

Perhaps unsurprisingly, those effects that generate reverb (such as Reverb, Early Reflection, and Echo) take up the most RAM, and in the case of Early Reflection the most steps as well. Everything else consumes relatively little RAM, but does require a moderate but actually not terribly high number of steps. Not bad, especially compared to the PS1's SPU where the most intensive reverb unit settings could require as much as 96 KiB out of its own 512 KiB total RAM! That is, assuming that these numbers are even accurate, of course, given the again supposedly "tentative" nature of the document where they're sourced from. Certain parts of the same document are also rather confusing and vaguely defined, such as the concept of subsets. Also, it should be noted that the "Saturn Sound Tools Manual Supplement" published in late 1994 makes reference to two other DSP modules called "Dynamic Filter" and "Mixer" (ST-198-R1-121594, pg.6). Aside from the latter having four different kinds of input settings available to it (2-, 3-, 4-, and 8-input), no other details about these modules are given in said document. And yet more mystery DSP modules include "Q Sound" and "Yamaha 3D Sound", both mentioned in the "Saturn Sound Driver Implementation Manual" published in early 1995 (ST-241-042795, pg.48). The latter is stated to not have been released yet; it's unknown whether it was ever released at all.

So based on everything that I've read in those documents and elsewhere, it appears that you can use any amount and combination of the DSP modules listed above — including more than one of the same module — on any of the SCSP's 32 hardware voices as long as the total number of "steps" spent executing them doesn't exceed 128 and the total number of modules used doesn't exceed 16, is this correct?
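To make that reading concrete, here's a small Python sketch of the budget check as I understand it. Everything in it is an assumption resting on the interpretation above (a 128-step ceiling and a 16-module ceiling), the names are made up for illustration, and the table only contains a few of the main-module figures from the list earlier in this post.

```python
# name: (steps, RAM in KiB), main-module figures copied from the list above
MODULES = {
    "Early Reflection": (100, 26),
    "Echo (Delay)": (20, 52),
    "Chorus": (22, 2),
    "Flanger": (20, 4),
    "Auto Pan": (4, 0),
    "Filter": (5, 0),
    "Parametric EQ": (5, 0),
}
MAX_STEPS = 128    # assumed DSP step ceiling
MAX_MODULES = 16   # assumed ceiling on modules in use at once

def check_chain(names):
    """Return (fits, total_steps, total_ram_kib) for a list of module names."""
    steps = sum(MODULES[n][0] for n in names)
    ram_kib = sum(MODULES[n][1] for n in names)
    fits = steps <= MAX_STEPS and len(names) <= MAX_MODULES
    return fits, steps, ram_kib

print(check_chain(["Echo (Delay)", "Chorus", "Flanger", "Filter", "Parametric EQ"]))
print(check_chain(["Early Reflection", "Chorus", "Flanger"]))
```

Under those assumptions the first chain fits (72 steps, 58 KiB), while the second one doesn't, because Early Reflection alone eats 100 of the 128 steps.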

Some more questions: are these DSP programs above built into the SCSP itself (as in, any game running any sound driver can access them) or are they only available via certain SEGA-provided sound drivers? Also, would the programmable nature of the DSP mean that it's theoretically possible to write custom DSP programs that can do whatever you want, as long as it can be run within the DSP's 128 steps and without exceeding the available sound RAM? (See: ST-228-R1-030596, a "dAsms User Manual" where dAsms is apparently assembly for the DSP.) And how would any of the above be different, if at all, for AICA on the Dreamcast?

One final thing: I suspect that one of SCSP's DSP modules might've been used by developers to partially compensate for the need to use low-resolution samples to fit within the SCSP's relatively small 512 KiB of sound RAM, namely either "Filter" or "Parametric EQ". Ironically, the game that gave me that idea was Puyo Puyo DA!, a Dreamcast game that has streamed ADX files recorded at 22.05 KHz. Playing them raw gives you precisely the kind of muddy and low-resolution sound that you'd expect from a 22.05 KHz ADPCM file, but playing them on real Dreamcast hardware in the actual game engine results in a significantly more detailed sound that almost approaches CD quality (albeit with noticeable if not necessarily distracting aliasing). Could it indeed be either the DSP's "Filter" or "Parametric EQ" module that's responsible for creating such a dramatic difference from seemingly nothing in that case? (On a side note, it's especially weird to be hearing such potential trickery used with AICA which certainly isn't lacking in capacity for high-resolution samples, but hey...)

SCSP and AICA panpot resolution (including versus PS1's SPU and PS2's SPU2)
According to the "Saturn SCSP User's Manual", the SCSP is able to pan voices up to 15 steps each for the left and right channels plus a center step for a total of 31 steps (ST-77-R2-052594, pg.12). On the surface, this seems rather crude compared to the seemingly much higher panpot resolution of the PS1's SPU and by extension the PS2's SPU2, both of which can pan voices up to 63 steps each for the left and right channels plus a center step for a total of 127 steps (a more than four-fold difference!). However, a supposedly extended variant of panning that utilizes the DSP is also mentioned in the SCSP user's manual, referred to as "process panning" (complete with a graphic, Figure 4.55, on pg.82). Meanwhile, the only source that I was able to find for the panpot resolution of AICA is an old overview from 2002 of some of the chip's registers that the author admits might be spotty in accuracy; nonetheless, it suggests that AICA inherits the same panning system as SCSP, as in: 15 steps each for the left and right channel plus a center step for a total of 31 steps. Would anyone here happen to have any knowledge of this?

AICA's ARM CPU: what can it do, if anything?
The current authoritative resource on the capabilities of AICA's ARM CPU appears to be this venerable DCEmulation thread (which included such contributors as Rand Linden, one of the main developers of the Bleemcast! PS1 emulator). The main takeaway that I gained from there is that AICA's ARM CPU is a very, very slow piece of kit — much slower than you'd expect from its apparently deceptively high clock speed — and that it's not useful for much else other than running driver code.

Or is it...? For all that we do and don't know about this CPU, one thing that we do know for sure, by sheer inference, is that (at least for running driver code, anyway) it's at least as powerful in practical terms as the 68K that ran the SCSP in the Saturn. After all, AICA is a direct superset of said SCSP! If AICA's ARM was actually less capable than its direct predecessor's 68K, that would be pretty pathetic and would make you wonder how the hell it's useful for anything at all, wouldn't it?

Also, there are a few benchmarks (of sorts) available for both SCSP's 68K and AICA's ARM, specifically when it comes to performing decompression of CRI Middleware's proprietary ADX ADPCM. For one, there's ponut64's ponèSound, which according to them allows the SCSP's 68K to play roughly 25,000 ADX samples per second. Furthermore, based on an estimate (but not a test) made by them, the same code can play around 50,000 IMA ADPCM samples per second (which, if correct, underscores how much more processor-intensive decompressing ADX ADPCM is vs. IMA ADPCM). Meanwhile, the developer of another homebrew music playing project that I found, this time for AICA's ARM (one that I don't remember the source for and can't find right now, unfortunately), claimed that the number of ADX ADPCM samples playable per second with their code on said processor is higher than 44,100 (equivalent to a single 44.1 KHz PCM mono stream) but less than 88,200 (equivalent to a single 44.1 KHz PCM stereo stream), with the true number presumably being somewhere between the two. Such a result points to around a two-fold increase in power for AICA's ARM vs. the SCSP's 68K, at least for decoding ADX ADPCM specifically. Finally, according to the developer of another ADX player for the Dreamcast called LibADX who published actual results from their code, playing a single stereo stream of ADX supposedly uses only one percent of the Dreamcast's main SH-4 CPU, while the ARM CPU failed to play the same stream in real-time, or in other words: it took up 100 percent of the CPU and technically exceeded that to obviously no avail. No sample rate was given for either test, but assuming that it was 44.1 KHz for both (in other words: 88,200 total ADX samples per second for the aforementioned stereo stream), then that would roughly track with the claims made by the previous developer and, irrespective of sample rate, would make AICA's ARM CPU more than 100 times slower than the main SH-4 CPU!
Again for whatever specific type of calculations are needed to decode ADX on a general-purpose processor, anyway (and also keeping in mind that the ARM CPU apparently lacks cache which, according to the aforementioned Rand Linden from the DCEmulation thread, is a major reason why it's supposedly so pathetically slow in the first place).

Make of all of the above what you will, but what I'm getting at is that I'm wondering if there might be any use for the ARM CPU beyond just running driver code. For instance... what about FM? The lack of it on AICA has often been called a disappointment compared to its predecessor. Now, it could be argued that the massively increased effective memory available to AICA vs. the SCSP via just the addition of hardware ADPCM compression by itself (let alone the four-fold increase in actual physical RAM) makes FM completely unnecessary in any practical way on AICA in the first place, and that if you really wanted to have FM on AICA, you could easily just sample FM sounds from a synthesizer like the Yamaha DX7, or even from an FM-capable console like the Mega Drive with its famous YM2612 or, indeed, a Sega Saturn with its bespoke FM capabilities. But then, where's the fun in that? If we instead wanted to push the boundaries of what can be done on AICA and try to make the impossible possible, how much can its ARM CPU help us with that? Would it be possible for it to run a program that takes a waveform, pushes it through various FM algorithms, and then creates a new waveform that can then be manipulated by AICA with ADSR, DSP effects, and such, just like any other sample? And with such a program, would it be possible to approximate the functions of Yamaha FM chips such as the aforementioned YM2612 or the SCSP in FM mode, possibly even to the point where music written for those sound chips could run on AICA? Or could it even be possible to approximate more sophisticated Yamaha FM hardware like YM2608 (of PC-98 fame) or the also aforementioned DX7? And if it's not something that the ARM CPU can do, then what about perhaps the main SH-4 CPU instead...?
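To illustrate the basic idea of "render FM into a waveform, then treat it like any other sample", here's a minimal Python sketch of generic two-operator FM (strictly speaking phase modulation, which is what Yamaha-style FM is usually described as). To be clear about the assumptions: this is not an emulation of the YM2612, the SCSP's FM mode, or anything else, the function and parameter names are made up, and it says nothing about whether AICA's ARM could do this fast enough. It only shows the kind of math such a program would be doing.

```python
import math

def render_fm(carrier_hz, ratio, index, seconds, sample_rate=22050):
    """Render a two-operator FM tone as a list of signed 16-bit PCM samples.

    ratio: modulator frequency as a multiple of the carrier frequency.
    index: modulation depth; 0 gives a plain sine wave.
    """
    count = int(seconds * sample_rate)
    samples = []
    for i in range(count):
        t = i / sample_rate
        # Operator 1 (modulator) wobbles the phase of operator 2 (carrier).
        modulator = math.sin(2 * math.pi * carrier_hz * ratio * t)
        carrier = math.sin(2 * math.pi * carrier_hz * t + index * modulator)
        samples.append(int(round(carrier * 32767)))
    return samples

# Half a second of a 440 Hz tone with a modulator one octave above it.
tone = render_fm(440.0, 2.0, 1.5, 0.5)
print(len(tone), min(tone), max(tone))
```

The resulting list is just mono PCM, so in principle it could be loaded into sound RAM and played with an envelope and DSP effects like any sampled instrument; whether it could be generated on the fly is exactly the open question.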

The possibility of expanding SCSP's RAM
With that said, the SCSP certainly has the ability to side-step its memory limitations in several potential ways, one of which is to simply read sound data from elsewhere. Things are never that simple, however, and there are immediate complications that come up when approaching such a solution. For instance, consider the possibility of just swapping sound data into the SCSP's RAM from the CD-ROM drive whenever it's actually needed. Such a trick is not a substitute for an actual increase in sound RAM and certainly isn't equivalent in either literal or practical terms to an unlimited data source (because, again, you'd have to constantly swap data), but clever timing could certainly provide an effectively higher amount of sound data that the SCSP can work with in total throughout a given sequenced song, if not necessarily all at once. However, the obvious problems there are the significant bandwidth and seek time limitations that come with using the CD-ROM drive excessively for such a purpose, not to mention the massive wear that this would have on the physical drive itself! And so it's no surprise that there's no commercial software (that anyone talks or knows about, at least) that uses the CD-ROM drive in such a manner, especially given that developers had far more important things to worry about, like streaming software code and graphical elements from disc in the middle of gameplay.

Meanwhile, though, the Saturn has more than just a CD-ROM that it can grab data from; it also has a cartridge slot. Which, as we all know, had official RAM expansions made for it up to a significant 4 MiB (the maximum amount addressable, from what I've read). I understand that the SCSP has access to the cartridge slot via the SCU's "B-bus" that it shares with the two VDPs (whereas the "A-bus" is connected to the two SH-2s). Does the SCSP have direct access to the cartridge slot via this bus; as in: can it access it the same way that it can access its own 512 KiB of sound RAM, thus effectively expanding its memory capacity to a total of 4.5 MiB? Or, like with the CD-ROM drive, would data have to be swapped such that the SCSP can theoretically read as much data as it wants from the cartridge slot, but only actually use up to 512 KiB of said data at any given time? There are a number of possibilities that I've been thinking about if the former is true. For instance, if you were to take two Sega Saturns that each have a 4 MiB RAM cartridge attached to them and have the two consoles play in tandem, then such a setup would provide a total of 9 MiB for SCSP (two 4 MiB cartridges plus two 512 KiB pools of sound RAM) that would effectively equal, and actually even slightly surpass, the effective amount of sound RAM of the Dreamcast's AICA when its 4:1 ADPCM compression scheme is applied. (Which, relative to the SCSP with its lack of compression, would indeed roughly be 8 MiB.) Such a setup would effectively give you an AICA on Sega Saturn, except arguably even better than an actual AICA because then you would also have the SCSP's FM support, not to mention two DSPs to play with instead of just one (that is, a single DSP each per 32 channels) à la the PS2's SPU2 and its two reverb units (in its case, one each per 24 channels).
Even a much smaller 2 MiB expansion cartridge attached to a single Sega Saturn would result in a total of 2560 KiB for SCSP that would surpass the effective amount of sound RAM of the PS1's SPU when its own 32:9 ADPCM compression scheme is applied! (Which, again relative to the SCSP and its lack of compression, would roughly be 1820 KiB.) Meanwhile, even in a scenario where sound data must indeed be swapped from the cartridge slot, if you were to again take two Sega Saturns that both have 4 MiB RAM expansions attached to them and have the two consoles play in tandem, then that would again provide a total of 9 MiB for SCSP. And in this case, one could theoretically simulate the Model 3's SCSP setup including its 64 total sound channels, dual 68K processors, dual DSPs, 1 MiB of total sound RAM, and (most critically) roughly half of its 16.5 MiB sample ROM.

However, one concern that I do have about all of this is the necessity of using the SCU's B-bus to shuffle sound data around. Indeed, given the proportion between the SCSP's small pool of local 512 KiB of sound RAM and the as much as 4 MiB of cartridge RAM, it's very likely (and almost certain if one is trying to push SCSP to its absolute limits) that the B-bus is going to have sound data shuffled across it almost literally all of the time. Which means that if the SCU has to shuttle VDP data to another part of the system or stream graphical data from the CD-ROM drive (or in the case of homebrew, wherever the game data is coming from, like perhaps the Video CD slot), then that could create the potential for bus contention, couldn't it? Now, this might not necessarily be a problem in an environment where the system as a whole is doing nothing but playing music, such as if it's being played in a sequence player with a relatively simple UI that demands little if anything from the VDPs. However, if said music is being played at the same time that a graphically-intensive game is running, then that problem potentially becomes relevant again...

(continued on the next page...)
Mono samples vs. the possibility (or usefulness?) of stereo samples
In the world of sample-based synthesizers, there's often a distinction made between mono samples and stereo samples, with the general assumption typically being made that the latter are of higher quality (one example of that being when trying to simulate a realistic grand acoustic piano sound, for instance). However, the total polyphony of synthesizers is generally calculated based on mono samples, which means that for a synthesizer that has, say, 64-voice polyphony, using only stereo samples means that the synthesizer now effectively only has 32-voice polyphony, as two voices are taken to play a single sample. Now, I'm almost certain that the sample-based sound chips in consoles like the Saturn and the Dreamcast play only mono samples, unless I'm mistaken there. With that said, would it be theoretically possible to make stereo samples work there anyway, and even if it is possible, would it even be worth it? Now, as for whether it's possible or not, my hypothesis is that it could be done, maybe, but probably not using the official file formats, which almost certainly had no functions for such a thing given that, again, the sound chips themselves almost certainly don't have that feature. That said, one feature that those sound chips most certainly do have is panning, including (of course) hard panning on either the left or right channel even as the actual sample being played is mono. So then, would it not be possible to just use two different mono samples playing simultaneously, hard-panned to the left and right channels respectively and both associated with the same instrument, and use that to represent a single stereo sample? That said, in order to make that possible, we would almost certainly have to go the route of creating a new sequence format to accommodate such a relatively arcane function rather than using existing formats that, again, almost certainly don't even have any notion of such a concept as stereo samples. Which, actually, leads me to the next section...
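Here's a rough Python sketch of what that "two hard-panned mono voices" idea could look like from a tool's point of view. It's purely hypothetical: the dictionary fields and pan labels are invented for illustration and don't correspond to any real SCSP/AICA register layout or to any existing file format.

```python
def split_stereo(interleaved):
    """Split interleaved stereo PCM (L, R, L, R, ...) into two mono lists."""
    return interleaved[0::2], interleaved[1::2]

def stereo_voice_pair(interleaved, first_voice):
    """Describe one stereo sample as two mono voices, hard-panned apart.

    The "pan" values are placeholder labels, not real register values.
    """
    left, right = split_stereo(interleaved)
    return [
        {"voice": first_voice, "pan": "hard-left", "pcm": left},
        {"voice": first_voice + 1, "pan": "hard-right", "pcm": right},
    ]

def stereo_polyphony(total_voices):
    """Each stereo sample occupies two voices, so polyphony is halved."""
    return total_voices // 2

pair = stereo_voice_pair([10, -10, 20, -20, 30, -30], first_voice=0)
print(pair[0]["pcm"], pair[1]["pcm"], stereo_polyphony(32))
```

The catch is visible right in the last function: with every instrument in stereo, SCSP's 32 voices become 16 and AICA's 64 become 32, and the sample data takes up twice the RAM as well.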

Reverse-engineering the sequence, wave, and instrument definition formats (SEQ/TON, MSB/MPB) for SCSP and AICA
There are three "official" file formats for the Saturn and Dreamcast handling sequenced music that I'm aware of. First and for the former, there's SEQ/TON, which I presume is roughly equivalent to the PS1's SEQ/VAB in that SEQ is a relatively small MIDI-like series of note events and such while TON is a relatively large collection of raw sample files and instrument definitions that the SEQ file references. Second and for the latter, there's MSB/MPB which I presume is basically the Dreamcast equivalent of SEQ/TON, which I've personally found in Puyo Puyo DA! and in the "Kuuzokuban" trial version of Skies of Arcadia. Finally and also for the latter, there's MLT, an apparently very versatile container format that uses a bank system to potentially hold several different files at once including — among other things — MIDI-like sequence data, and which I've personally found in the final Skies of Arcadia release. Also based on my experience with that game and the fact that its MLT files are waaaaaaay too small to hold a full sample-based sequenced song of any meaningful complexity (as in not even triple-digit kilobytes, whereas the far less sophisticated Puyo Puyo DA!'s music regularly hovers around the 500 KiB range for its MSB/MPB files), it also appears that MLT files reference one or more external files for at least raw sample data if not instrument definitions as well, or at the very least that it can reference external files for such purposes, if not necessarily must.

Naturally, in order to create homebrew music on SCSP or AICA and make it possible to share it with people, said music has to be stored on something. And so, there are a few ways to accomplish that. The first option would be to reverse-engineer one or more of the above "official" sequence formats and use those as our deliverables for homebrew music. The closest thing that I could find for documentation of Saturn's SEQ format is a MIDI conversion program by CyberWarriorX named seq2mid (currently maintained by Misty De Méo on GitHub here). Meanwhile, TON has an official overview for it in the form of "Tone Editor User's Manual Addendum: File Formats" (ST-235-030795), and some of kingshriek's old Python conversion scripts such as toncnv might also be of use. However, the Dreamcast's MSB/MPB file formats appear to be completely unexplored and undocumented, while the closest thing that I could find for documentation of the MLT file format is how to convert it to DSF via another kingshriek Python script named dsfmake. The source code comments for dsfmake make frequent references to MSB and MPB, among other formats like MDB ("MIDI drum bank") and OSB ("one-shot bank"), and seem to suggest that MLT is not actually a sequence format itself but rather truly just a container format to hold the actual standard sequence formats for the Dreamcast, which are indeed MSB/MPB, as well as other essential files for playback like DSP program files. Or rather, it can hold the latter files if one so chooses (and incidentally, the comments say that one indeed should move all of said requisite files into an MLT file before running them through dsfmake). In any case, perhaps one way to approach reverse-engineering these formats is by comparing them with standard MIDI. I suggest this based on my experience with the SEQ and SQ formats of the PS1 and PS2 respectively, which according to people familiar with both are extremely close (if not practically identical save for a few minor changes) to SMF MIDI format 0.
While I have no idea if either SEQ or MSB is anywhere near as close to standardized MIDI as Sony's sequence formats apparently are, it should be noted that an official MIDI converter specification exists for SEQ on Saturn (ST-066-121593). And so with that said, one question that I'd like to ask is this: how close are Saturn SEQ and Dreamcast MSB to SMF MIDI format 0, and how trivial would it be to reverse-engineer them such that arbitrary data can be written to them using a sequence editor or tracker?
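
As a first triage step for that comparison, it may help to have the two most recognizable pieces of a Standard MIDI File on hand: the 14-byte "MThd" header chunk and the variable-length quantity (VLQ) encoding used for delta times. The Python sketch below parses both. It says nothing about how SEQ or MSB are actually laid out; it only describes the SMF side, so that similar patterns (big-endian fields, 7-bit continuation bytes) can be looked for in a hex dump.

```python
import struct


def read_smf_header(data: bytes) -> dict:
    """Parse the 'MThd' chunk that opens every Standard MIDI File."""
    if len(data) < 14 or data[:4] != b"MThd":
        raise ValueError("not a Standard MIDI File")
    # Chunk length, format, track count, and time division are all big-endian.
    length, fmt, ntracks, division = struct.unpack(">IHHH", data[4:14])
    return {"length": length, "format": fmt, "tracks": ntracks, "division": division}


def read_vlq(data: bytes, pos: int) -> tuple[int, int]:
    """Decode a MIDI variable-length quantity; returns (value, next position)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)  # low 7 bits carry data
        if not byte & 0x80:                   # high bit clear marks the last byte
            return value, pos


# A hand-built format 0 header: one track, 480 ticks per quarter note.
header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, 480)
print(read_smf_header(header))
print(read_vlq(b"\x81\x00", 0))  # (128, 2)
```

If a bespoke format turns out to store note events with VLQ-style delta times, that would be a strong hint that a converter in either direction is mostly a matter of remapping headers and event codes.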

Alternatively, a second if rather potentially insane option exists. While I admit to knowing relatively little about how MIDI works, I've read that it was meant to be an expandable and adaptable format given its need to be compatible with many different synthesizers with their own unique capabilities. So in addition to your standard MIDI events, I believe (anyone here familiar with MIDI, please correct me if I'm wrong!) that there are ways to define custom ones in order to cover everything that different pieces of MIDI-compatible hardware have to offer. If this is true, then I wonder: could we simply treat the SCSP and AICA as such different synthesizers, create custom events for them in MIDI (like, for instance, activating DSP modules?), and then just throw the resultant MIDI along with the requisite sample data to a sound driver for it to play back on SCSP and AICA directly? I mean, a part of me is certain that if it were that easy, then SEGA would've done the same thing for themselves from the beginning instead of creating bespoke formats like SEQ and MSB. But another part of me is wondering if it really is that simple... or at least realistically possible. Honestly, the only reason why I'm bringing up this possibility versus just reverse-engineering SEQ/TON and MSB/MPB is that I'm thinking about how universal MIDI is as a format compared to ancient, custom file formats designed to work with only a single piece of hardware, which exist as deliverables converted from more familiar formats like MIDI rather than being dealt with directly as if they actually were MIDI. That is to say, practically every single piece of music software in the world recognizes MIDI, while absolutely nothing knows that SEQ or MPB or any other console sound chip format even exists, let alone how to handle them. In addition, I've been thinking about the prospects of compatibility across multiple sound chips.
For instance, if I composed a piece of sequenced music for the Saturn, what if I wanted to create a version of it for the PS1 as well? If I created said song directly in SEQ/TON, then it's stuck in that bespoke format unless I convert it to PS1's equivalent in SEQ/VAB... which means that a converter needs to be written and tested and such (including porting all of the instrument data and converting PCM to Sony's custom ADPCM format and re-doing any DSP effects among many other things), and then that's all repeated for every other system that you want your music to play on with their own unique ways of doing things. Whereas if MIDI is used for everything instead, at least the most basic things like key on and offs, pitch bend values, velocity, etc. are all standard across any sound chip that supports reading MIDI, with only things unique to certain sound chips needing to be changed. With that said, I'd imagine that the success of such an approach depends on just how closely SCSP and AICA's inner systems track with how MIDI was designed, right? Which is most definitely no guarantee, and perhaps explains why bespoke formats exist for console sound chips in the first place, so please forgive my ramblings, haha. (And of course, it goes without saying that even if all of that were possible, it would all require a completely custom driver, as obviously the official SEGA sound drivers don't work anything like the above theoretical way.)
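
For what it's worth, MIDI does have standard extension points for this kind of thing. One of them is the System Exclusive (SysEx) message, which wraps arbitrary 7-bit data between the status bytes 0xF0 and 0xF7, and the MIDI specification reserves manufacturer ID 0x7D for non-commercial use. The Python sketch below builds such a message. The command layout inside it (a chip ID, a "set DSP program" command byte, a program number) is entirely made up for illustration; a real driver would have to define and document its own.

```python
NON_COMMERCIAL_ID = 0x7D  # manufacturer ID reserved for non-commercial use

# Hypothetical command numbers, invented for this example only.
CMD_SET_DSP_PROGRAM = 0x01


def build_sysex(payload: bytes) -> bytes:
    """Wrap a payload in a SysEx message; data bytes must stay within 7 bits."""
    if any(b > 0x7F for b in payload):
        raise ValueError("SysEx data bytes must be 7-bit (0-127)")
    return bytes([0xF0, NON_COMMERCIAL_ID]) + payload + bytes([0xF7])


def dsp_program_message(chip_id: int, program: int) -> bytes:
    """Hypothetical 'select DSP program' event for a custom sound driver."""
    return build_sysex(bytes([chip_id, CMD_SET_DSP_PROGRAM, program]))


msg = dsp_program_message(chip_id=0x00, program=3)
print(msg.hex(" "))  # f0 7d 00 01 03 f7
```

Any sequencer that can insert raw SysEx into a track could emit a message like this, and software that does not understand it is expected to skip it, so the same file would still play as ordinary MIDI elsewhere. Non-registered parameter numbers (NRPNs) are another standard route for per-channel custom controls.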

As such, perhaps the simplest solution to that last prospect especially is a third and final option, which would be to just, well, use MIDI, from the very beginning of the composition process and then convert the final result to each console's bespoke file formats from there, just like the composers attached to actual commercial software projects did back in the day. To not even worry about reverse-engineering formats like SEQ/TON and MSB/MPB at all as something to actually be edited themselves as if they were MIDI, but rather as a means to figure out how to convert a standard format like MIDI to them as accurately and painlessly as possible. Perhaps, anyway. What does everyone here think would be the best option out of the three here?

(continued on the next page...)
Choosing a deliverable format for homebrew SCSP and AICA music: SSF or DSF or...?
If you're familiar with how most people listen to Saturn and Dreamcast music outside of real hardware or OSTs of the in-game music that are almost certainly recorded from real hardware, then you're probably aware of the homebrew-created SSF and DSF file formats — themselves offshoots of the original PSF format for PS1 music — that are widely used to store and play back said music ripped from the original games. You might be thinking, then: hey, why not just use that format to distribute our homebrew Saturn and Dreamcast music on? It makes sense to use such common and well-known formats like SSF and DSF, right?

To put it simply, I don't think that would be a very good idea. For many reasons, but mainly because the state of both Saturn and Dreamcast emulation has, up until very recently, been rather dire. And in particular, the state of the programs that people use to actually listen to SSF and DSF files has been especially so in a way that, unlike general-use emulators, has yet to really be rectified as of late. Or rather, program, singular, at this point, said program being Highly Theoretical. Other programs that can play SSF and DSF files exist, yes, but aside from Mednafen for the former — which I'll be mentioning many times again later — Highly Theoretical is the only one that's been updated at anything resembling a recent date (2020/09/02 for version 2.0.53, as of this time of writing).

Now, I've had many encounters with Highly Theoretical going back to when I was creating videos for my (now-defunct) video game music YouTube channel, and the first thing that struck me when comparing it to real Saturn hardware was just how awfully off it was on practically every level. The track that first showed me that using the SSF rip was not going to be helpful for my goal of having the most hardware-accurate audio possible on my videos was "Twin Seeds (Growing Seeds)" from NiGHTS Into Dreams..., namely the intro section where entire sections of notes failed to decay correctly. Meanwhile, "Peaceful Moment" (the stage clear theme infamous in certain circles for supposedly being an example of FM on the Saturn, rightly or wrongly) is a night-and-day difference on Highly Theoretical where the obvious DSP usage is implemented incorrectly with a heavier and somewhat "crunchier" sound instead on the official 2008 OST CD, one that's beyond what can be explained by possible mastering differences on the latter. For its part, Mednafen did a much better job with both songs using the same SSF files, although even then it wasn't quite perfect. But at least it was usable! Indeed, I ended up using the then-latest Mednafen to record the individual "mood" variations of the stage tracks that weren't available on the OST and weren't accessible to me via the sound test at the time (I believe I would've had to have completed the game first), which, for the record, all demonstrated fewer glaring issues on Highly Theoretical but nonetheless did show smaller issues like sound clipping and instrument volume balance issues that didn't come up elsewhere. In general, I can say that I trust Mednafen to provide at least a decent facsimile of what Saturn music is supposed to sound like; certainly better than what Highly Theoretical appears to be currently capable of.
If you would like to confirm these observations for yourself, it's easy enough to track down the SSF and play it back on the latest versions of both Highly Theoretical and Mednafen alongside the OST; I'm confident that you'll come away with similar conclusions.

As for how well Highly Theoretical handles Dreamcast music, you can probably guess that if it can't emulate the SCSP properly, then it certainly can't be trusted to handle anything from AICA. You'd be correct. Skies of Arcadia's DSF rip confirmed that for me, in this case in direct comparison with recordings that I myself made from real Dreamcast hardware using the PAL prototype's sound test. Like with NiGHTS Into Dreams..., you're more than able to make comparisons yourself with the current version of Highly Theoretical against even Skies of Arcadia's flawed (incomplete and missing tracks, weird mastering, and such) but still hardware-accurate OST CD release. To start, "Sky Pirate Hideout" has a brighter sound on Highly Theoretical versus real hardware that appears to be slightly filtered in comparison, while the wind sounds that appear in and out in the background on the latter are barely audible on the former. "Kingdom of Ixa'taka" lacks the bass response of real hardware on Highly Theoretical, even when both are adjusted for volume. "Sudden Storm" seems to have instrument volume balance issues on Highly Theoretical that don't exist on real hardware, causing some parts in the background to cut through the mix more than they should. "Uninhabited Island" sounds noticeably flat and thin on Highly Theoretical with the background strings especially, apparently due to it not rendering the reverb properly where it sounds more abundant and enveloping on real hardware. "Theme of Loneliness" is a track that makes heavy use of reverb and what appears to be low-pass filtering to create a kind of lo-fi concert atmosphere for its sole piano accompaniment; said effects are almost completely missing on Highly Theoretical (save for the tiniest bit of reverb remaining).
"Theme of Fina" surprisingly doesn't have the same problem with its own reverb-heavy piano solo featured in its opening seconds, but it falters elsewhere with a lack of bass response and weak, quiet-sounding background strings compared to real hardware. And finally, "The Dark Rift" — a track not featured on the OST — is significantly lacking in reverb on Highly Theoretical as can be heard from the very beginning of the track, an omission that makes itself especially known in a synth-heavy section beginning around the two-and-a-half-minute mark that's utterly awash with reverb — as well as what appears to be a filter of some sort — on real hardware to ominous effect. (There are more examples than this, but these are the most glaring ones that I was able to find in just a few minutes of listening; I'll spare you all any more.)

So, I've obviously made it clear that I don't consider Highly Theoretical a great choice for emulating either SCSP or AICA. But again, it's currently the only real choice for playing the existing SSF and DSF formats on. I find this problematic, both in general and in the context of composing homebrew SCSP and AICA music especially; what's the point of doing so when there's no guarantee that it's even going to play correctly on the players that everyone uses? This doesn't necessarily mean that SSF and DSF formats themselves are at fault here or are useless — even if I would argue that those, and the xSF family of file formats themselves, are ripe for improvements, especially in the metadata department, but I digress — but without either a suitable update to Highly Theoretical or a worthy competitor to it, they're largely dead-end formats, I think. With that said, while how to handle the Dreamcast in this scenario remains an open question, things aren't completely hopeless on the Saturn side of things as there is still Mednafen with its ssfplay extension that does exactly what you'd expect from the name: play SSF files using Mednafen. Which, as I alluded to above, tends to result in a much closer representation of what the SCSP is supposed to sound like compared to most of its peers.

Plus, of course, you could just choose to play everything on real hardware instead. Which I'd argue you should do anyway, especially if you're a composer! And that leads nicely into...

Actually playing homebrew SCSP and AICA music... on real hardware
On a conceptual level, this is very simple! Just create a sequence player that runs as homebrew on either a Saturn or Dreamcast, including on real hardware where music can be played with 100 percent accuracy.

But then, of course, there are several complications that very quickly come up with that, namely actually acquiring real Saturn and Dreamcast hardware in the first place, and then figuring out how to get homebrew running on them. While neither the Saturn nor the Dreamcast enjoyed the level of sheer market saturation as, say, the PlayStation 2, both appear to be fairly easy to find in working order on the used market nonetheless. However, also unlike the PS2, there isn't, to my knowledge, any easy way to run homebrew on either console (a la the PS2's FreeMCBoot) without having to shell out a significant amount of money for an ODE (optical disc emulator). That's more of an accessibility problem than a technical problem, of course, but it is a problem nonetheless for what will likely be a large subset of people who simply cannot afford or justify the cost of an ODE. Thus, if there's any way to run relatively small programs on either a Saturn or a Dreamcast another way, I'd be very happy to hear your suggestions!

Meanwhile, once you do find a way to get homebrew running — however that may be — a potentially big problem still remains, or at least if you care about sound quality there is. That problem being the noisy analog output of both the Saturn and the Dreamcast, both of which I can personally attest to (it's not good). Compare that to the PS2, where getting perfect sound quality is an utter non-issue thanks to the fact that every single PS2 ever made has a built-in S/PDIF digital out port. Of course, both the Saturn and the Dreamcast can be modded for S/PDIF out (or even HDMI out with the latter; I've personally used this with a S/PDIF splitter to create digital hardware recordings of Skies of Arcadia and Puyo Puyo DA! with excellent results), but said mods are neither easy to do nor cheap to have someone do for you. For most people, this is not a practical option. Unfortunately, that means that for most people noisy analog audio out is the only option. Which might make some pine for emulation again, but then the caveats of that road have already been outlined in detail.

In any case, I'd imagine that the sequence player itself would require the most actual work on the programming side of things. You could theoretically do lots of cool things with it: allow users to adjust or turn off DSP programs, create custom playlists from multiple sequences, mute individual instruments... the possibilities are endless! But of course, I'd imagine that just the basics would be difficult enough... maybe? Would we play sequences from the SEQ/TON or MSB/MPB files themselves with related driver files elsewhere, or would the program basically be an SSF and/or DSF hardware player? Does either format even allow for it to be run on real hardware? (I know that PSF does, as it was literally designed from the very beginning to do that, and 2SF does too, but other xSF formats, PSF2 especially with its many hacks to get things running sans the main EE CPU, are much more suspect, I think.) And if not, would perhaps a new xSF-like format need to be created, built from its very foundation to run on real hardware and specifically catered for homebrew music?
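
For anyone weighing the "SSF/DSF hardware player" route, the xSF container itself is small enough to inspect by hand. The Python sketch below follows the PSF layout as it is commonly documented: a "PSF" signature, a version byte (0x01 for PSF, 0x11 for SSF, 0x12 for DSF), three little-endian 32-bit fields (reserved area size, compressed program size, CRC-32 of the compressed program), the zlib-compressed program, and an optional "[TAG]" block of name=value lines. Treat the details as an assumption to verify against the format documentation; the build_xsf helper exists only to produce a test file and is not part of any real tool.

```python
import struct
import zlib

XSF_VERSIONS = {0x01: "PSF (PlayStation)", 0x11: "SSF (Saturn)", 0x12: "DSF (Dreamcast)"}


def parse_xsf(data: bytes) -> dict:
    """Read the header, program, and tags of a PSF-family file."""
    if len(data) < 16 or data[:3] != b"PSF":
        raise ValueError("missing 'PSF' signature")
    version = data[3]
    reserved_size, program_size, program_crc = struct.unpack("<III", data[4:16])
    program_start = 16 + reserved_size
    program_end = program_start + program_size
    compressed = data[program_start:program_end]
    if zlib.crc32(compressed) != program_crc:
        raise ValueError("program CRC mismatch")
    tags = {}
    tail = data[program_end:]
    if tail.startswith(b"[TAG]"):
        for line in tail[5:].decode("utf-8", errors="replace").splitlines():
            name, sep, value = line.partition("=")
            if sep:
                tags[name.strip()] = value.strip()
    return {
        "system": XSF_VERSIONS.get(version, f"unknown (0x{version:02X})"),
        "program": zlib.decompress(compressed) if compressed else b"",
        "tags": tags,
    }


def build_xsf(version: int, program: bytes, tags: dict) -> bytes:
    """Hypothetical helper: assemble a minimal xSF file for testing the parser."""
    compressed = zlib.compress(program)
    header = b"PSF" + bytes([version])
    header += struct.pack("<III", 0, len(compressed), zlib.crc32(compressed))
    tag_block = b"[TAG]" + "".join(f"{k}={v}\n" for k, v in tags.items()).encode("utf-8")
    return header + compressed + tag_block


sample = build_xsf(0x11, b"\x00" * 32, {"title": "Test Song", "artist": "baku-chan"})
info = parse_xsf(sample)
print(info["system"], len(info["program"]), info["tags"])
```

The part that matters for real hardware is what sits inside the decompressed program: if it amounts to a sound RAM image plus a driver, then a hardware player mostly needs to upload it and start the sound CPU, whereas a homebrew-oriented format could store sequence, tone, and DSP data as separate, labeled sections instead.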

I think that I'll stop here for now (finally, haha!) and see where any potential conversation goes from here. Again, thank you for reading, especially if you've gotten this far! Let's see what we can do.

Additional resources that might prove useful:
I have code that plays VGM, MDX, and MOD on SCSP. VGM is mainly just a curiosity, and generally not practical for games because of how large the files are. MDX uses all 32 operators for FM so it is also not ideal for games. MOD is fine for games although it has some limitations.

Some XM files can be played by converting to MOD. This is what I did for SHMUP SALAD.

If there was interest, maybe the code could be expanded to support XMs natively.