Translating YU-NO - The Girl that Chants Love at the Edge of the World

I decided to move further discussion and progress information into a new thread after announcing my efforts on porting the existing fan translation for the Windows version of Yu-No here.

I finally managed to inject text into the game. This is still just for testing, so ignore the full-width font, remaining Japanese characters and wrong line breaks here. This will all be addressed.

YU-NO_Message.png



YU-NO_Option.png


Getting here proved to be a lot more work than I expected, because in my last post I thought I had understood the structure of the scenario files. It turns out there was more to it.

A scenario file for the Saturn version of Yu-No consists of five parts:
  • A 16-byte header containing a magic byte, the number of characters in the character/string dictionary (discussed below) and the offsets of the next four parts. Padded with 0s.
  • The script, which is a concatenated sequence of byte commands. Part of these are message display and menu option commands. We'll come back to these later.
  • A table of two-byte section or entry offsets into the script. The offsets are from the beginning of the script. This also imposes a soft length limit on the script: If it is longer than 64 KiB, any offset will wrap.
  • A table of two-byte string offsets. These strings form a dictionary from which the messages displayed in the current scenario are formed. As there are quite a few repeated characters, this allows for some reduction in the size of the whole, uncompressed scenario file. As with the section offsets, the format of these offsets imposes a soft 64 KiB limit on the character/string dictionary.
  • A character/string dictionary for the messages. The first entries are single characters followed by 0-terminated strings. The very first string after the characters is pointed to by the first offset in the table before. All characters and strings consist strictly of two-byte, SHIFT-JIS-encoded characters.
As I wrote in my last post, messages in Yu-No are byte sequences pointing into the character/string dictionary. Bytes between 0x01 and 0xFA are direct indices, while bytes 0xFB, 0xFC and 0xFD add 250, 500 and 750 respectively, so they sort of act as bank switches. Byte 0x00 terminates a message byte string. 0xFE and 0xFF never occur in the game, but I guess that they function similarly to codes 0xFB - 0xFD.
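To make the indexing scheme above concrete, here is a minimal Python sketch of turning a message byte string into dictionary indices. It is not the code used in the project. It assumes that a bank byte (0xFB - 0xFD) prefixes exactly one index byte and that the resulting numbers can be used as-is for the dictionary lookup; the function name decode_message_indices is made up for illustration.

```python
# Sketch only. Assumption: a bank byte (0xFB-0xFD) is followed by exactly one
# index byte, and the bank value is simply added to that byte.
BANK_OFFSETS = {0xFB: 250, 0xFC: 500, 0xFD: 750}


def decode_message_indices(data, start=0):
    """Return (list of dictionary indices, position just past the 0x00 terminator)."""
    indices = []
    pos = start
    while True:
        byte = data[pos]
        pos += 1
        if byte == 0x00:
            # End of the message byte string.
            return indices, pos
        if byte in BANK_OFFSETS:
            # Bank switch: the next byte is the index within that bank.
            indices.append(BANK_OFFSETS[byte] + data[pos])
            pos += 1
        elif byte <= 0xFA:
            # Direct index into the character/string dictionary.
            indices.append(byte)
        else:
            # 0xFE and 0xFF never occur in the game, so they are not guessed at here.
            raise ValueError(f"unexpected message byte 0x{byte:02X} at {pos - 1}")


example = bytes([0x01, 0xFA, 0xFB, 0x01, 0xFD, 0x10, 0x00])
print(decode_message_indices(example))
```

The returned end position makes it possible to continue parsing the script right after the message.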

Menu options are much easier, thankfully. Each of them is just a 0-terminated SHIFT-JIS string 🙂
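Reading such a menu option could look roughly like the sketch below. It assumes the terminator is a single 0x00 byte and that Python's built-in "shift_jis" codec covers every character used by the game; read_menu_option is a hypothetical helper name.

```python
def read_menu_option(data, start=0):
    """Return (decoded text, position just past the terminator).

    Assumption: the string ends at the first 0x00 byte. This is safe for
    Shift-JIS text because 0x00 never appears as the second byte of a
    two-byte character.
    """
    end = data.index(b"\x00", start)
    text = data[start:end].decode("shift_jis")
    return text, end + 1


raw = "はい".encode("shift_jis") + b"\x00"
print(read_menu_option(raw))
```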

So what does this all entail? To inject custom text, one has to first extract the message and menu option commands from the script without misinterpreting any of the other commands. I managed to write a parser that does exactly that.

The much harder part is to compose a scenario file:

First, all new messages and menu options have to be encoded to bytes. This must respect custom control characters such as end-of-message, newline etc. There are quite a few of these, and I am still not sure what most of them do. However, most of them occur as the last character in a message, so it will be enough to just put them there in the translated text as well. This step is also where the half-width-font-inside-full-width-font magic will happen that I hinted at in my last post.

Then, from all messages to inject, a new character/string dictionary has to be generated. I skipped this step as of now, encoding each message as its own dictionary entry. This unfortunately results in later messages being garbled, probably due to some memory overflow or the like. So this must be implemented later on.

From the original script bytecode and list of section offsets, the script must be rebuilt while inserting the new message and menu option bytes, all while keeping track of how the section offsets shift by doing so.
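The offset bookkeeping in this step can be sketched as follows. This is only an illustration under assumptions: the parser is taken to deliver each message or menu option as a (start, end) byte span in the original script, section offsets are assumed never to point into the middle of such a span, and rebuild_script is a made-up name.

```python
def rebuild_script(script, replacements, section_offsets):
    """Splice new byte strings into the script and shift the section offsets.

    replacements: list of (start, end, new_bytes) spans in the ORIGINAL script,
                  non-overlapping, with end exclusive.
    Returns (new_script, new_section_offsets).
    """
    replacements = sorted(replacements, key=lambda r: r[0])
    out = bytearray()
    shifts = []  # (start, end, cumulative size change after this span)
    cursor = 0
    delta = 0
    for start, end, new_bytes in replacements:
        if start < cursor:
            raise ValueError("overlapping replacement spans")
        out += script[cursor:start]   # untouched bytecode before the span
        out += new_bytes              # translated message or menu option
        delta += len(new_bytes) - (end - start)
        shifts.append((start, end, delta))
        cursor = end
    out += script[cursor:]            # untouched tail of the script

    new_offsets = []
    for offset in section_offsets:
        shift = 0
        for start, end, cumulative in shifts:
            if start < offset < end:
                raise ValueError("section offset points inside a replaced span")
            if end <= offset:
                shift = cumulative
        new_offset = offset + shift
        if new_offset > 0xFFFF:
            # Offsets are stored in two bytes, so the script must stay below 64 KiB.
            raise ValueError("section offset no longer fits into two bytes")
        new_offsets.append(new_offset)
    return bytes(out), new_offsets
```

Raising an error on offsets above 0xFFFF makes the soft 64 KiB limit visible at build time instead of letting an offset silently wrap.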

Then, the section offset table must be built, and after that, the string offset table and the character/string dictionary.

Finally, from all the lengths of the individual parts, the header can be generated and with this, the whole file can be put together.

The file then of course has to be compressed with Yu-No's custom LZSS compression and then embedded in SENARIO.ABL, the main scenario archive. As of now, I simply overwrite the existing file at the correct offset without any regard for its length, but in the end, I will have to implement actually rebuilding SENARIO.ABL, or any ABL file for that matter.

Rebuilding ABL files is also important for GRAPH.ABL, which holds nearly all graphics of the game. Some of the translated images I created are larger than their original counterparts. However, the encoder I wrote is a little bit more efficient than what the game's developers used. Reencoding all images in the game frees up around 1.5 MiB of space, more than enough to fit all edited images.

There is still a long way to go, but as of now I think that there are no more potential roadblocks to finish this translation. So it is just a matter of time and effort 🙂
 
It has been a month since my last update, and rereading it I felt it was more of a stream-of-consciousness dump than a well-structured, informative post on the whole project. So let me give an introduction to the general file structure and file types of YU-NO, and how I approached them. This is a long post since there is a LOT to cover. I am sorry if this post gets a little bit rambly, but most of the time my method was really nothing more than trial and error while writing a lot of Python code to try things out and noting down any information on the file formats I was able to gather. Much like bashing your head against a wall until you're through, but without knowing if it is even possible 😉

File structure

As with all Saturn games, YU-NO comes on regular CDs mastered in mixed mode, containing an ISO Mode 1 file system and audio data. This is just a fancy way of saying that it behaves like a mix of a regular CD-ROM and a regular Audio CD. As such, it can just be inserted into your PC's CD drive to take a look at the data stored on it. Likewise, with an image of the CD, one would use a virtual drive. Software like SSP or AnyBurn allows extracting files from CD images without resorting to a virtual drive.

Anyway, in YU-NO's case, there aren't even subfolders, just a flat folder containing four types of files: ABL, BIN, PCM and TXT. The TXT files are CD authoring information files and are not of interest. You will find similarly named files on any Saturn game disc. Likewise, CDDA.PCM or a similarly named file seems to be present on any Saturn game disc. While I do not know what exactly it does, one can infer from its name that it most likely contains some Red Book CD audio or the like. This file is not important either.

Archives: ABLINK

This brings us to the lonely BIN and all of the ABL files. Files similarly named to 0.BIN are found on all Saturn game discs. This file contains the data that is loaded on game launch. As such, it mainly contains byte code, but may also hide data! Setting this file aside for now, the ABL files are what is really interesting. From their names one can gather that they most likely contain the game's resources. So, if one manages to decode their format, this brings us one step closer to accessing the data we need to edit for a translation.

I began by researching if ABL is a format that is known, like TXT is known to contain plain text or BIN to contain binary data. Believe it or not, I not only found some old posts of Spazzery, but also a website from the late 90s/early 2000s from a Japanese hacker calling themself sage. On their website, there is an article on YU-NO and how they decoded the image format. Unfortunately, there is no further information on how ABL files store their data. But if it was possible to decode YU-NO's images in the late 90s, it must be possible now!

Hence I decided to try it myself. Opening e.g. GRAPH.ABL in a HEX editor (I like the one in VSCode), one can find the following data at the file's start:

graph_abl1.png


So plain ASCII "ABLINK" and some 12-byte packs with plain-ASCII strings inside of them. This looks suspiciously like a header followed by table entries - most likely for the data packaged in the ABL file. Perusing GRAPH.ABL and others, I repeatedly stumbled over lines in the HEX editor that looked like this:

graph_abl2.png


Plain-ASCII "ABZ" or sometimes "ABS" followed by another three ASCII-encoded letters. Curious - sage mentioned ABZ and ABS were file formats used in YU-NO. So these may indicate where a file begins. Relating the locations of these lines to the table entries mentioned above revealed that some bytes in the table entries give the start of the ABS and ABZ files within the ABL file in multiples of 2048-byte sectors. Why 2048? I don't know, but this is the size of a CD sector, so it at least makes sense in that way. This also means that the ABL format itself is only a way to pack files, not to compress them (like for example ZIP is capable of). This is very important information, since at this point I knew that it was possible to get at these files without having to reverse-engineer some kind of encoding or compression algorithm!

Continuing on that trail, I managed to make sense of most of the bytes at the beginning of an ABL file:

The first 16 bytes are a header.
  • Bytes 0 - 5: "ABLINK" encoded in ASCII
  • Bytes 6, 7: Always 0x0201 except for VOICE.ABL, for which it is 0x0231. VOICE.ABL is the only ABL file that does not contain a file list, and the only one I was not able to decode.
  • Bytes 8 - 11: Size of all files contained without padding for sector boundary alignment
  • Bytes 12 - 15: Number of files

This is followed by a file table that is a sequence of 12-byte entries. Each entry is structured as follows:
  • Bytes 0 - 7: File name in ASCII, 0x00-padded.
  • Byte 8: Denotes the type of file contained.
  • Bytes 9 - 11: Start of file in 2048-byte sectors.
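
The header and file table layout above can be read with a few lines of Python. This is a sketch, not the tool used for the project: it assumes all multi-byte fields are big-endian (the Saturn's native byte order; the layout above does not state this explicitly), and the names AblEntry and parse_ablink are made up for illustration. It does not handle VOICE.ABL, which has no file list.

```python
import struct
from typing import NamedTuple

SECTOR_SIZE = 2048  # size of a CD sector in bytes


class AblEntry(NamedTuple):
    name: str        # file name without the 0x00 padding
    file_type: int   # byte 8 of the table entry
    offset: int      # start of the file in bytes from the start of the ABL file


def parse_ablink(data):
    """Return (total size of contained files, list of AblEntry)."""
    if data[0:6] != b"ABLINK":
        raise ValueError("not an ABLINK archive")
    # Assumption: both 4-byte header fields are big-endian unsigned integers.
    total_size, file_count = struct.unpack_from(">II", data, 8)
    entries = []
    for i in range(file_count):
        raw = data[16 + 12 * i : 16 + 12 * (i + 1)]
        name = raw[0:8].rstrip(b"\x00").decode("ascii")
        # Assumption: the 3-byte sector number is big-endian as well.
        sector = int.from_bytes(raw[9:12], "big")
        entries.append(AblEntry(name, raw[8], sector * SECTOR_SIZE))
    return total_size, entries
```

Since the table stores no lengths, each entry only gives the start of an extent; the end has to be derived from the next entry's offset or from the contained file itself.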

With that out of the way, I was able to partition all ABL files besides VOICE.ABL into extents of files. Maybe you noticed that the length of a contained file is not encoded anywhere in the ABL metadata. So this has to be found out from the contained files themselves. Some file formats can be derived from the bytes found at the start of an extent: MOVIE.ABL contains SEGA FILM files, EFFECT.ABL mostly contains AIFF files. These store their size in their header data in some way or another, so by just truncating the extents accordingly, the actual files are finally extracted from the ABL archives.

Packed containers: ABZ and ABS

But what about those ABZ and ABS files? Sage mentioned on their page that they employ some kind of compression scheme and that their uncompressed size is stored somewhere inside of them... After much poking around, I decided to try the LZSS decompression Python tool kingshriek wrote for the TLWiki YU-NO translation project on ABZ files. After some trial and error, I noticed that the file sizes produced when applying the tool to the data after the first 16 bytes of a file are indeed stored in the files! To be exact, this information is part of the first 16 bytes (which are not decompressed by the tool). Also, by truncating the input file and running the decompression tool on it, I was able to determine the actual input file sizes - which I then also found in the file data. This information is absolutely crucial, since it means that the compression scheme is known already! Step by step, this led me to fully uncover the ABZ and ABS file formats.

First and foremost, both ABZ and ABS are just container formats. I.e. they themselves just store another file - ABZ with compression, ABS plainly. The format is as follows:
  • Bytes 0 - 2: ABS: "ABS" in ASCII; ABZ: "ABZ" in ASCII
  • Bytes 3 - 5: File type contained in ASCII. There are multiple, explained further below.
  • Byte 6: ABS: Always 0x00; ABZ: Always 0x01
  • Byte 7: File code (relates to the file type contained within the container)
  • Bytes 8 - 11: ABS: Stored file size excluding ABS header, i.e. total size - 16; ABZ: Decompressed file size
  • Bytes 12 - 15: ABS: Always 0x00000000; ABZ: Stored file size excluding ABZ header, i.e. total size - 16
With that I was now able to not only correctly truncate the cut-up ABLINK files to the correct size, I was also able to extract almost all files from the game in their proprietary formats!
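
As a rough illustration of how the size fields allow truncating an extent, here is a Python sketch that reads only the magic, the three type letters and the two 4-byte size fields. It assumes big-endian integers, which the description above does not state explicitly, and parse_container_header is a hypothetical name.

```python
import struct

HEADER_SIZE = 16


def parse_container_header(data, start=0):
    """Read an ABS/ABZ header and return its size information as a dict."""
    magic = data[start:start + 3]
    if magic not in (b"ABS", b"ABZ"):
        raise ValueError("not an ABS or ABZ container")
    content_type = data[start + 3:start + 6].decode("ascii")
    # Assumption: both size fields are big-endian unsigned 32-bit integers.
    field_8, field_12 = struct.unpack_from(">II", data, start + 8)
    if magic == b"ABS":
        # Stored plainly: bytes 8 - 11 hold the payload size, bytes 12 - 15 are zero.
        stored_size = unpacked_size = field_8
    else:
        # Compressed: bytes 8 - 11 hold the decompressed size,
        # bytes 12 - 15 the size of the compressed payload.
        unpacked_size, stored_size = field_8, field_12
    return {
        "compressed": magic == b"ABZ",
        "content_type": content_type,
        "stored_size": stored_size,
        "unpacked_size": unpacked_size,
        "total_size": HEADER_SIZE + stored_size,  # how much of the extent to keep
    }
```

Cutting an extent from an ABL archive down to total_size bytes then yields the container file without its sector padding.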

The graphics: ABG

The final frontier was now determining how the proprietary scenario and graphics formats work. I already described the MES file format for the scenario files in brief, so I won't go into detail here. Instead, I just want to briefly describe how I uncovered the ABG file format.

Again, sage plays a role. I have not mentioned it, but they not only decoded the ABG file format, they also wrote a tool called MamiList that allows for opening game archive files and viewing the graphics stored within for a variety of older visual novels. It is still available for download on their site. This tool was most likely originally written for Windows 95, but lo and behold, it also runs on Windows 11, almost 30 years later! This proved to be very helpful since now I knew the image data each file contained. With a reference to work with, the whole reverse-engineering got a whole lot easier.

Taking a long, hard look at the byte data, I was able to make out where the image dimensions were stored and that the first couple of bytes contain palette data. However, the actual pixel data seemed to be encoded somehow - and sage did not say a word about that. It turned out that a simple run-length encoding is employed. I was able to deduce that from the few images in the game that mainly consist of a single color: The number of repeating same-colored pixels was the same as some of the byte values in the pixel data. Lots of trial and error later, I got the scheme:

Bytes come in chunks. If the MSB of the first byte is 1, mask the byte with 0x7F. The byte now gives the number of literal bytes that follow it. If the MSB of the first byte is 0 instead, the byte gives the number of repetitions of the following byte. That way one can reconstruct the whole image's pixel data.
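
Written out in Python, the chunk scheme just described looks roughly like this. It is a minimal sketch of the run-length step only and leaves out palettes, dimensions and the transposed pixel layout; rle_decode is a made-up name.

```python
def rle_decode(data):
    """Expand run-length-encoded pixel data following the chunk scheme above."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        control = data[pos]
        pos += 1
        if control & 0x80:
            # MSB set: the low 7 bits give the number of literal bytes that follow.
            count = control & 0x7F
            out += data[pos:pos + count]
            pos += count
        else:
            # MSB clear: the byte gives how often the following byte is repeated.
            if pos >= len(data):
                raise ValueError("run chunk is missing its value byte")
            out += bytes([data[pos]]) * control
            pos += 1
    return bytes(out)


# Three literal bytes, a run of four 0x09 bytes, then one literal byte.
print(rle_decode(bytes([0x83, 1, 2, 3, 0x04, 9, 0x81, 7])))
```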

Of course, the format is more complicated than that, providing varying palette size options, palette offsets, the pixel data being transposed, the works. I repeatedly modified my Python code, trying things out and automatically decoding all images until they all looked like the reference images I got from MamiList.

To be honest, getting here felt a lot like doing a puzzle: A mixture of sheer luck, determination and lots of pattern recognition. After also being able to get a grasp of the scenario file format, I was finally sure that YU-NO for Saturn can be translated.

Progress

With that finally out of the way, I can give a short update on progress: I've edited all images besides the credits and created two custom fonts: One narrow and one wide. The narrow one will be used for all regular text, the wide one is necessary for some parts of the game which presumably show some kind of computer terminal and all generated text (i.e. game progress percentage and play time). Here they are, let me know what you think!

pxfont2_narrow_latin.png


pxfont2_wide_ascii.png


I also finally managed to understand the format of all control characters in the game's strings. Turns out exactly two of them are single-byte - as opposed to every other character in the game's text. I also implemented the string dictionary encoding for the scenario files and finalized the intermediate format for scenario text for translation. There are still some bugs, but once these are ironed out, "only" font tile generation and the translation itself are left.

Stay tuned!
 