Wizardry VI & VII Complete - Translation project

While I had already announced this on the Discord server some days ago, I've since reached an important milestone and so I feel this project now deserves its own thread.

I'll be translating VI first by using the official English script found in the original DOS version. Then, depending on how this goes, I'll decide if I want to tackle VII's script too.

First off, this was made possible by the findings made by Gertius for their Wizardry VII PSX translation patch, the source code of which can be found here (I'll publish mine too once I have a patch ready to be released). As the Saturn version employs the same Huffman-encoded text file formats as the DOS and PSX versions, I was able to roll my own Huffman decoder/encoder to reencode the English script with my edits.
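For anyone curious what "rolling my own Huffman decoder/encoder" boils down to, here's a minimal Python sketch of the general technique. It is not the actual on-disk layout of the game's files: the `codes` dictionary (byte value -> bit string), the function names, and the way the bit count is passed around are all assumptions made for illustration only.

```python
def huffman_encode(data: bytes, codes: dict[int, str]) -> tuple[bytes, int]:
    """Pack `data` into a bitstream using a prefix-free code table.

    `codes` maps each byte value to its bit string (hypothetical table format).
    Returns the packed bytes plus the number of meaningful bits.
    """
    bits = "".join(codes[b] for b in data)
    # Pad with zero bits so the stream fills a whole number of bytes.
    padded = bits + "0" * (-len(bits) % 8)
    packed = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return packed, len(bits)


def huffman_decode(packed: bytes, bit_count: int, codes: dict[int, str]) -> bytes:
    """Walk the bitstream and emit a byte whenever a full code has been read."""
    lookup = {code: sym for sym, code in codes.items()}
    bits = "".join(f"{byte:08b}" for byte in packed)[:bit_count]
    out = bytearray()
    current = ""
    for bit in bits:
        current += bit
        if current in lookup:  # prefix-free, so the first match is the symbol
            out.append(lookup[current])
            current = ""
    return bytes(out)


# Toy table: 0x82 gets the shortest code, two other bytes get longer ones.
toy_codes = {0x82: "0", 0x60: "10", 0x61: "11"}
packed, bit_count = huffman_encode(bytes([0x82, 0x60, 0x82, 0x61]), toy_codes)
print(packed.hex(), bit_count)
```

A real tool would read the table and the per-string offsets from the game's files instead of hardcoding them, but the encode/decode loop itself stays about this simple.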

Here's a quick rundown of the relevant files:
  • msg6j.dbs -> contains the majority of the game's text (dialogues, descriptions, etc.) encoded with Huffman. The actual decompressed text is Shift-JIS encoded
  • msg6j.hdr -> contains headers referencing the strings in msg6j.dbs by so-called indices (integer numbers)
  • misc6j.hdr -> Huffman table used to encode/decode
  • msgj.dbs, msgj.hdr, miscj.hdr -> Wizardry VII equivalents of the above
  • wiz6.bin -> contains both code and hardcoded strings (in plaintext, fortunately) specific to Wizardry VI (enemies, items, spells, combat log, UI, class information, etc.)
  • wizardry.bin -> more code and strings, shared by both games (character creation)
  • sl.bin -> from what I've gathered, it's a Wizardry VII file analogous to wiz6.bin, but I haven't looked much into it
  • font_1.dbb -> font tiles used for UI and menus only. Does not contain any kanji, only ASCII chars and kana
  • kanji12.fon -> this one's obvious, but it also contains the Latin letters used for dialogues. It's encoded in a proprietary format that I still haven't been able to reverse engineer. The only thing I know is that it uses '\' and '/' as separators between groups of glyphs
Now, you'd think that the devs would place all or at least most of the strings used by both games in a common file, but instead they chose to duplicate them and then scatter them throughout the binaries that I mentioned above. Nice.

So, I've mentioned that the Huffman-compressed files of the Saturn port follow the same exact format as their DOS equivalents, so why not try and brutally patch the original English files in, just like that?
If only it were that simple:

Screenshot_20250701_014120.png


This is supposed to be the very first dialogue that plays at the start of the game. Pressing C like 20-30 times will exhaust all invisible text and resume the game. For reference, the actual EN dialogue is made of 6 lines each filling the entire box, which means 6 presses should have been enough.
The reason for this behavior lies in the code of the print function. Using Ghidra to decompile it, I found that almost all ASCII characters are outright ignored, whereas a few special symbols are used for formatting, such as whitespace triggering a line break.
To tell you the truth, I only discovered this about 3 days ago. When I first started this project, I chose the lazy route and actively avoided decompiling the game. I've since changed my mind, but only after trying everything that I could and finally hitting an insurmountable wall.

When I was presented with that blank dialogue box, I simply thought ASCII characters were being ignored or something, so I tried converting all characters to their full-width Shift-JIS equivalents and compressed the text using the JP Huffman table. This is what I got:
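As a rough illustration of that conversion step, here's a small Python sketch (not my actual tool; the function name is made up). It only handles ASCII letters, digits and spaces, which map cleanly to full-width characters in Python's "shift_jis" codec; punctuation is deliberately left alone here because some full-width symbols don't round-trip through that codec.

```python
def to_fullwidth_sjis(text: str) -> bytes:
    """Convert ASCII letters/digits/spaces to full-width and encode as Shift-JIS."""
    out = []
    for ch in text:
        if ch == " ":
            out.append("\u3000")  # ideographic (full-width) space
        elif ch.isascii() and ch.isalnum():
            # Full-width forms sit at a fixed offset from their ASCII twins.
            out.append(chr(ord(ch) + 0xFEE0))
        else:
            out.append(ch)  # left untouched in this sketch
    return "".join(out).encode("shift_jis")


encoded = to_fullwidth_sjis("YOU DETECT SOMETHING")
print(len("YOU DETECT SOMETHING"), "chars ->", len(encoded), "bytes")
```

Every converted character becomes two bytes instead of one, which is why the uncompressed script doubles in size before Huffman even enters the picture.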

Screenshot_20250628_022404.png


This garbled phrase is supposed to appear much later on in the game. I found out by cutting like 80% of the text that this was due to the size of the newly encoded DBS file (166 KB), which was almost double that of the original (84 KB).
I tried to optimize the Huffman table: the one used by the Saturn version understandably encodes Latin letters to very long bit sequences, so I grabbed the DOS one and replaced all ASCII characters with their SJIS equivalents, while also assigning byte 0x82 (the lead byte shared by all full-width SJIS letters and digits) the shortest possible encoding. This strategy saved me about 50 KB and let the game show the actual starting dialogue:
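To show why 0x82 deserves the shortest code, here's a small Python sketch that builds Huffman code lengths from byte frequencies. This is the textbook construction, not the procedure I actually used to edit the DOS table, and the function name is hypothetical. In full-width English text, half of all bytes are 0x82, so a frequency-driven table naturally gives it a 1-bit code.

```python
import heapq
from collections import Counter


def huffman_code_lengths(data: bytes) -> dict[int, int]:
    """Return the Huffman code length (in bits) for every byte value in `data`."""
    freq = Counter(data)
    if len(freq) == 1:
        return {next(iter(freq)): 1}
    # Heap entries: (count, unique tiebreaker, symbols under this node).
    heap = [(count, i, [sym]) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freq, 0)
    tiebreak = len(heap)
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        # Merging two nodes pushes every symbol below them one level deeper.
        for sym in s1 + s2:
            lengths[sym] += 1
        heapq.heappush(heap, (c1 + c2, tiebreak, s1 + s2))
        tiebreak += 1
    return lengths


# Full-width version of a sample string (letters only), encoded as Shift-JIS.
sample = "".join(chr(ord(c) + 0xFEE0) for c in "YOUDETECTSOMETHING").encode("shift_jis")
lengths = huffman_code_lengths(sample)
print("0x82 code length:", lengths[0x82])
print("longest code length:", max(lengths.values()))
```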

Screenshot_20250628_024553.png


(Btw, the text is all uppercase because that's just how the DOS script is.)

However, the file still being too large meant that, eventually, we'd hit a point in the game where we're again shown the wrong text. For instance, the game showed this instead of "You detect something":

Screenshot_20250628_024839.png


By comparing the EN and JP scripts, I discovered that a *lot* of NPC dialogues were either shortened or cut out entirely. I suspect this was done because, contrary to the DOS version where you ask NPCs specific information by typing relevant topical words, the Saturn version will simply make NPCs say all that they have to say as some kind of weird monologue. Also, as I mentioned before, many strings have been hardcoded into the binaries, specifically, everything with an index smaller than 8179 (save for a couple of strings). With this knowledge, I deleted all unused strings and also many control characters that made sense in the DOS version, but which the Saturn one either ignores or interprets in a different way. With this, I managed to shrink the text file to about 88 KB.

With 4 KB left to shave off and me being short on ideas, I embarked on the excruciatingly boring task of condensing much of the game's flavor text that does not contain any relevant details.
As an example of what I mean, let's look at this string:

Screenshot_20250701_015212.png


... which I condensed to this:

Screenshot_20250630_143021.png


(Don't mind the missing spaces, that's due to how I handle line breaks.)

I recognize that my version sounds drier, but I figured this would be the best approach to save on valuable space without omitting important information. I've also decided not to touch any lines spoken by NPCs, as I know I'm not that good of a writer to properly replicate their style and tone.
And so, after shortening many, *many* strings that I deemed were too verbose, I finally got that file to under 84 KB. Mission accomplished?

I then wondered if the same thing could be feasibly done to the script of Wizardry VII. The original JP encoded file is 224 KB. So I grabbed the English script, deleted all strings with index < 8000, reencoded the text to SJIS using my optimized Huffman table and managed to obtain a humongous 300 KB file. If cutting all that text from VI's script just to save on 4 KB was painful, well, cutting roughly 76 KB from VII's script would be absolute torture.

With this approach clearly not being good enough anymore, I made a dump of the high working RAM while a dialogue box was open, analyzed it with Ghidra and found where the print function was stored.
Here's a very small excerpt of decompiled code:

Screenshot_20250630_145419.png


Basically, for each 16-bit character in the string it has to print, the function takes the high and low bytes and passes them to another function (which I'm going to refer to as the "classifier") that returns a number between 0 and 11 defining how that character should be handled. If that number is 0, the character is printable; otherwise it could be a control character, a null terminator or even a printable character that has to be replaced with another (e.g. three dots will be replaced by an ellipsis).
If the classifier receives a pair of ASCII characters as input, it may either decide that they must be ignored (returning 11) or, if one of them is a control character, it returns the appropriate number. Unfortunately, we cannot simply edit this function to make it return 0 with either SJIS or ASCII chars: if a character is deemed printable, the print function will *always* assume it's a 16-bit character and pass it to another function that renders its glyph, then increases the current character index by 2 to parse the next pair of bytes.

Thus, I chose to remove the code handling certain control characters to make room for the new code: the one that handles ellipses, one that replaces the current character with a dot, and one that handles the control character '@' (which works in a rather weird way that I've never managed to use properly). Then I made it so that, before the classifier is called, ASCII letters are converted to SJIS and the current character index is increased by 1 instead of 2. Otherwise, if the character is already SJIS or is not an ASCII letter, the print function handles it as it did previously.
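The actual patch is hand-written machine code inside the game's print function, but the logic can be modeled in a few lines of Python. Treat this as a sketch of the idea only: the function name is made up, and the classifier and control-character handling are left out entirely.

```python
def expand_mixed_text(data: bytes) -> list[int]:
    """Model of the patched read loop: return the 16-bit codes the renderer sees.

    ASCII letters are widened to their full-width SJIS codes and consume one
    byte; everything else is read as a 2-byte pair, as the original code did.
    """
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0x41 <= b <= 0x5A:        # 'A'..'Z' -> 0x8260..0x8279
            out.append(0x8260 + (b - 0x41))
            i += 1
        elif 0x61 <= b <= 0x7A:      # 'a'..'z' -> 0x8281..0x829A
            out.append(0x8281 + (b - 0x61))
            i += 1
        else:
            if i + 1 >= len(data):
                break                # dangling byte at the end: stop
            out.append((b << 8) | data[i + 1])
            i += 2
    return out


# "HI" stored as ASCII, followed by a full-width '!' stored as SJIS (0x8149).
print([hex(c) for c in expand_mixed_text(b"HI\x81\x49")])
```

The check only ever runs at a character boundary, and SJIS lead bytes never fall in the ASCII letter range, so an SJIS trail byte that happens to look like a letter is never misread.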

Why am I ignoring numbers and symbols? Well, not all numbers should be subjected to this conversion at runtime, as the game needs to interpret certain strings with ASCII numbers in a very specific way to handle player choices in dialogues, a relic from the DOS version. If a sequence of numbers is actually supposed to be shown to the player, then I simply encode it to SJIS beforehand, and do the same for punctuation marks since there's not much space left to handle them in the print function (well, there is enough to handle dots and commas at least, but it'll be for another time).

With these edits and even without the shortened strings that I had worked on previously, I managed to bring VI's text file down to about 71 KB! I guess all that rewriting work has been for nothing lmao.
As for VII, its text file is now 237 KB! Still about 13 KB too large, but I haven't yet compared it to the JP script for cut lines. I can definitely work with this.

While I was at it, I found a way to reduce the absolutely excessive letter spacing, which in turn also increased the maximum number of characters per line from 13 to 17! I think there's potential to add support for VWF, but I'm not sure I wanna deal with that.
Look at this! It's... kinda readable now, at least:

Screenshot_20250630_190347.png


The real issue here though is the atrocious font. Since the relevant glyphs are stored in kanji12.fon, I have no way of editing them at this time. I can, however, make it so that the text gets proper capitalization, as I've seen how the lowercase font looks and it's not too bad.

But then, it happened: while I was messing around in the game and reached the first NPC, I discovered the most maddening bug that would only occur when some (not all) text is encoded to ASCII instead of SJIS. In essence, when entering the room where that NPC is located, I would normally be met with his dialogue located at index 12000, and then, when pressing the "TALK" button, he would start reciting his monologue, which the above screenshot is part of. If the text involving these dialogues is ASCII-encoded instead of SJIS, then the first line that he speaks when entering the room is actually the one at index 12002, and when pressing "TALK", the dialog box does not even show up. I naively tried to put all the text at index 12000 into index 12002 and then also into 19998 and 12001, but then other completely different strings would be shown.

Later, something occurred to me: the line at index 12002, the only one that was shown, began with a SJIS-encoded exclamation mark. That was left over by my encoder because I messed up the code used to trim trailing characters, which in turn made that string the only one containing a SJIS character. With this detail in mind, I made the encoder replace the first character of each string belonging to that NPC with its SJIS equivalent, and that actually fixed the issue!
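In encoder terms, the workaround is tiny. Here's a hedged Python sketch of it (the function name is made up, and it assumes the string starts with an ASCII letter; anything else is returned unchanged):

```python
def force_sjis_first_char(raw: bytes) -> bytes:
    """Widen a leading ASCII letter to its 2-byte full-width SJIS form."""
    if not raw:
        return raw
    first = raw[0]
    if 0x41 <= first <= 0x5A:      # 'A'..'Z' -> 0x82 0x60..0x79
        return bytes([0x82, 0x60 + (first - 0x41)]) + raw[1:]
    if 0x61 <= first <= 0x7A:      # 'a'..'z' -> 0x82 0x81..0x9A
        return bytes([0x82, 0x81 + (first - 0x61)]) + raw[1:]
    return raw                     # already SJIS, or not a letter


print(force_sjis_first_char(b"HELLO").hex())
```

It costs one extra byte per affected string, which is nothing compared to what ASCII saves everywhere else.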

What did ASCII ever do to Data East to deserve this much hate and contempt?

As for hardcoded strings, there's not much that's worth talking about that you haven't already seen in other translation projects. Having added ASCII support has allowed me to use ASCII characters almost everywhere though, so instances where I have to mess around with pointer tables to carve out extra space are actually pretty rare now. I say 'almost' because some specific strings, like the one that shows up when an encounter begins, use YET ANOTHER print function that shows filled rectangles instead of ASCII characters. However, since such strings are not too frequent, I don't really care to investigate for now.

I guess I can showcase what the combat UI and log look like:

Screenshot_20250701_004152.png
Screenshot_20250701_004103.png
Screenshot_20250701_004245.png


Player choices:

Screenshot_20250701_005029.png


And that's it! I've skipped over many details in an effort to make this post as brief as I could. I will release a patch once I've at least managed to translate the whole of Wizardry VI.

If there are any new notable developments, I'll make sure to post about them.
 