Translating Culdcept: tutorials, notes, whatever

Hi everyone,

After getting hold of the original disc, I explored the contents of the card/board-RPG game "Culdcept" a bit and found out that it "might be" relatively easy and straightforward to translate into English. In this thread I will post updates about my findings, but also call for help or advice when I'm stuck. Ultimately, my aim is to translate it.

On the other hand, I really appreciated other segaxtreme threads which documented, step by step, the approach taken for other translations.
They were very helpful for exploring my 1st Saturn game and might help other future enthusiasts.
So I decided to turn my posts into a small tutorial, when possible.

I also find that this approach (forum posts) is much, much better than having knowledge spread across Discord conversations (finding some information a few weeks later is a nightmare...)


Current situation is:

Confirmed:

- main script is in shift-JIS and can be edited relatively easily
- a full translation exists for the DS port AND is very well documented here : Culdcept DS translation wiki
- there are minor modifications between both scripts, but not more than a few kanji here and there

In progress :

- multiplayer mode will need a specific re-translation: it is wifi-based in the DS version, while it is local 4-player in the Saturn version, and there are quite a few differences
- it seems that both fixed and variable width fonts are present in the game (needs confirmation, more on this later)
- on top of the shift-JIS font used for dialog windows, there are some smaller accessory fonts. They do not seem to match any classic font, but I may be wrong. Values and offsets need to be determined.


Will need help at some point :

- how to modify the text routine to switch between variable width and fixed width, depending on context (e.g. dialogs VS menus)


So, after this short introduction, let's dive into the content.

----------------------------------------------------------

######################################
PART 1 : Discovery of text encoding
######################################

Game version used in tutorials :

REDUMP lists 2 versions.
My disc matches v1.004; every offset listed below is valid for this particular version.
I had a look at the individual track files, but there is only 1 small binary file of a few megabytes and 2 large bin files.

So I went back to work on the full track dump.
All offsets below are valid for track_1.bin, sha-1 == 82003c2bf26d23f8824f8934ce5ca9ae403f0043

If anyone has information about any potential difference between these versions, please let me know.


Tools:


  • Windows calculator (for octal/hexa conversions and shifts)
  • Notepad++ : for my markdown notes
  • CrystalTile & Tile Molester : tile search, font search, texture search
  • wxMEdit : direct editing of the main scenario text, pattern or text searches...
  • Mednafen and Yaba Sanshiro emulators : VDP1/2 and CPU RAM exploration via debuggers, creation of savestates, testing of modifications...
  • Hex to String Converter Online - DenCode : for rapid hex / shift-JIS conversions

Dialogs font & text replacement :
  • the shift-JIS font is present in : [ 1DE64,36404 )
HOWTO :
  • open track_1.bin with CrystalTile
  • click on the "tile" icon in the top bar
  • in the left menu, select width = 16, height = 12, tile form = solid 1bpp
  • set the offset to 1DE64 to see the shift-JIS font

crystaltile_shift-JIS.png
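
For readers without CrystalTile, here is a minimal sketch of the same view in Python. It only restates the settings above (16x12 tiles, 1bpp, range [1DE64, 36404)); the row layout (2 bytes per row, most significant bit on the left) is my assumption, and the function names are made up for this example.

```python
GLYPH_W, GLYPH_H = 16, 12
BYTES_PER_GLYPH = GLYPH_W * GLYPH_H // 8  # 24 bytes per glyph at 1 bit per pixel

# Half-open range given in the post, valid for track_1.bin only.
FONT_START, FONT_END = 0x1DE64, 0x36404


def glyph_count(start=FONT_START, end=FONT_END):
    """Number of whole 24-byte glyphs that fit in [start, end)."""
    return (end - start) // BYTES_PER_GLYPH


def glyph_at(data, index, start=FONT_START):
    """Slice the raw bytes of glyph number `index` out of the dump."""
    off = start + index * BYTES_PER_GLYPH
    return data[off:off + BYTES_PER_GLYPH]


def render_glyph(raw):
    """Draw one glyph as text, '#' for set pixels and '.' for empty ones.

    Assumption: each row is 2 bytes, leftmost pixel = most significant bit.
    """
    if len(raw) != BYTES_PER_GLYPH:
        raise ValueError("expected exactly 24 bytes")
    rows = []
    for y in range(GLYPH_H):
        row_bits = int.from_bytes(raw[2 * y:2 * y + 2], "big")
        rows.append("".join(
            "#" if row_bits & (0x8000 >> x) else "." for x in range(GLYPH_W)))
    return "\n".join(rows)


if __name__ == "__main__":
    # Synthetic glyph (a hollow box), so the sketch runs without the disc image.
    demo = b"\xFF\xFF" + b"\x80\x01" * 10 + b"\xFF\xFF"
    print(render_glyph(demo))
    print("glyph slots in the range:", glyph_count())
```

With the real dump you would read track_1.bin into `data` and call `render_glyph(glyph_at(data, n))`. If the glyphs come out scrambled, the row layout assumption is the first thing to revisit.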


So it may be that the game uses the same encoding as in https://mattsmessyroom.com/uploads/sjis.tbl.
We will test that by searching for a simple Japanese word that uses a few easy-to-recognize hiragana or katakana.

For instance, we will replace the word "creature" (クリーチャー), which appears a lot in the first game dialogs.

How did I get this? (I do not speak Japanese.) Well, using the DeepL app and my phone camera, I observed the dialogs in the first 5 minutes of the main scenario, and I saw that this word appears many times early in the tutorial dialogs.


HOWTO :
  • using the shift-JIS table above, we expect that クリーチャー in shift-JIS hexa is 834E 838A 815B 8360 8383 815B
  • open track_1.bin with wxMEdit
  • menu display -> encoding -> East Asian -> select shift-JIS
  • menu search -> check "find hex string" -> paste 834E838A815B83608383815B
  • click count, you should see 246 occurrences
  • click next, then previous to display the 1st occurrence
  • you should see dialog text on the right, as shown in the following image:
vxMedit_look_for_creature.png


  • If you look at the 1st occurrence at 00CE110A, you can see that this word "creature" (クリーチャー) appears in a block of text, separated from other blocks of text by a run of 00 values:
vxMedit_text_block.png


  • We can copy a sentence and check it using the website: Hex to String Converter Online - DenCode
    • select a piece of a sentence
    • menu edit -> advanced -> copy as hex string
    • paste it into the aforementioned website and make sure it uses shift-JIS. I used this piece of sentence:
      • hex value : 834A838B8368838982CC8EF4949B82AA81418DA182BE82C989E482AA91CC82F0 0A 8D5391A982B582C482A282E982C682A282A482CC82A9814581458145
      • in the website, it should be displayed as :
        カルドラの呪縛が、今だに我が体を
        拘束しているというのか・・・
        • note that all characters are 2 bytes long (8xxx or 9xxx in hexa), while 0A is a 1-byte character, set apart with spaces in the hex value above
        • this one has been converted to a line return
      • so, 0A is likely a line return
      • spoiler : value 07 will be a special code meaning "go to new dialog window"
  • If you browse this block with your mouse in wxMEdit, you will notice that blocks of text are split by a chunk of seven "00".
    • you can see this pattern: 00 seven times, then 32 00 ** 0F [some shift-JIS text] [some control code to end the block]
  • So, there is a chance we have found the text data structure, with maybe some pointer to character names in the window header, or some other control codes.

  • Now, let's move back to the existing translation: we look for the beginning of the sentence above (カルドラの呪縛が) and do a CTRL-F on each page of the translation website, which gives a match in this page :
  • We just translated something !
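
The search and decode steps above can also be done with a few lines of Python, which has a built-in "shift_jis" codec. This is only a sketch of the same checks, not a replacement for the hex editor; the helper names are mine, and the demo uses a small synthetic buffer instead of track_1.bin.

```python
def sjis_hex(text):
    """Hex string of `text` encoded in shift-JIS (what you paste in the search box)."""
    return text.encode("shift_jis").hex().upper()


def decode_hex(hex_string):
    """Decode a hex string copied from the editor; spaces between bytes are allowed."""
    return bytes.fromhex(hex_string).decode("shift_jis")


def count_occurrences(data, needle):
    """Same as the 'count' button: non-overlapping matches in the whole buffer."""
    return data.count(needle)


def first_offset(data, needle):
    """Offset of the first match, or -1 when there is none."""
    return data.find(needle)


if __name__ == "__main__":
    word = "クリーチャー"
    needle = word.encode("shift_jis")
    print(word, "->", sjis_hex(word))

    # Synthetic stand-in for the disc image: seven 00 bytes, then some text.
    fake_image = b"\x00" * 7 + needle + b"\x0A" + needle + b"\x00"
    print("occurrences:", count_occurrences(fake_image, needle))
    print("first at   :", hex(first_offset(fake_image, needle)))

    # The sentence piece from the post, pasted as-is (spaces around 0A included).
    sentence = ("834A838B8368838982CC8EF4949B82AA81418DA182BE82C989E482AA91CC82F0 0A "
                "8D5391A982B582C482A282E982C682A282A482CC82A9814581458145")
    print(decode_hex(sentence))
```

To run it on the real dump, replace `fake_image` with `open("track_1.bin", "rb").read()`. The count should then agree with what wxMEdit reports.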

Going further, let's see if modifying some text directly in wxMEdit actually makes the translation appear in-game

HOW TO :
  • We will replace the word "クリーチャー" (creature in Japanese), whose shift-JIS hexa is 834E 838A 815B 8360 8383 815B, by 63 72 65 61 74 75 72 65 20 20 20 20, which is "creature    ", i.e. 8 characters plus 4 spaces (hex value 20), so that it takes as many bytes as "クリーチャー" (note that the spaces in the hexa patterns above are only there for readability)
  • menu search -> replace -> check "find hex string"
  • then replace 834E838A815B83608383815B by 637265617475726520202020
  • save the result as track_1.bin (make sure the name matches the cue file associated with your bin/cue dump)
  • then load this modified game image into Mednafen
  • in the main menu, select the first icon and enter your name to create a new game
  • you enter the main scenario and, after a few dialogs, you will see this 🙂


vxMedit_creature_text.png
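
For those who prefer a script to the search/replace dialog, the same-length replacement above can be sketched like this. The only rule it enforces is the one from the HOWTO: the replacement is padded with spaces (hex 20) so the byte count never changes. The function name and the demo buffer are made up for the example.

```python
def patch_same_length(data, old, new, pad=b" "):
    """Replace every `old` by `new`, padded with `pad` up to len(old).

    Returns (patched_data, number_of_replacements). Refuses replacements
    that are longer than the original, since that would shift every byte
    after it in the image.
    """
    if len(new) > len(old):
        raise ValueError("replacement is longer than the original bytes")
    padded = new + pad * (len(old) - len(new))
    return data.replace(old, padded), data.count(old)


if __name__ == "__main__":
    old = "クリーチャー".encode("shift_jis")   # 12 bytes
    new = b"creature"                          # 8 bytes, padded to 12

    # Synthetic buffer; with the real game this would be the bytes of track_1.bin.
    image = b"\x00" * 7 + old + b"\x0A" + old + b"\x00"
    patched, n = patch_same_length(image, old, new)

    print("replacements:", n)
    print("same size   :", len(patched) == len(image))
    print(patched)
```

On the real dump, write `patched` back to a file named like the one your cue file points to, and keep a copy of the original.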






TODO in next updates :

- supplementary fonts found via VDP memory exploration
- deciphering the dialog control codes
- observations related to variable width font in menus
 
You can work on the BIN as 1 big file or extract the contents, but Culdcept has the lowest file count I've ever seen.

1713708376346.png


If you open Culdcept.dto in a hex editor, you'll see the same data.
1713708736112.png



I've never made an SSP file using the whole bin (I've always done it by file), but I'm sure there's a way to make it work. Great write-up, looking forward to watching the progress and ultimately playing this 🙂

EDIT: just looked and the txt files are in English, neat.
 
######################################
PART 2 : Control codes / text data structures
######################################

This next post will show how I explored text data structures and deduced some textbox control codes, cards statistics flags, offset tables, etc ...

First, I looked for a good hex editor that would
1) allow me to see text decoded as shift-JIS (see previous posts) and
2) allow me to highlight some bytes patterns, defined as regular expressions.

(If you do not know what a regexp is, check Wikipedia, or some good tutorial in whatever coding language you like. For humanity's sake, do not use chatGPT, that burns 20 times more eqCO2 per query...).

I went for 010 Editor, which is Mac / Unix / Windows compatible (note that it is commercial software, not open source). An open source equivalent would be ImHex.
After installation, I opened the file CULDCEPT.DT0, where we previously spotted some text.
Set the byte translation to shift-JIS : View -> Charset -> International -> select shift-JIS

My first regexp will match the byte intervals of the shift-JIS encoding, i.e. the intervals for single-byte characters (mostly Latin characters) and the intervals for 2-byte (Japanese) characters. You can see the full shift-JIS byte table and these intervals here: (picori::shift_jis_1997 - Rust). Basically, single-byte characters are [x00-x7F] (ASCII-like) and [xA1-xDF] (half-width katakana), while 2-byte characters have a first byte in ([x81-x9F] | [xE0-xEF]) followed by a second byte in ([x40-x7E] | [x80-xFC]). Note that value 7F is excluded for the 2nd byte. That would be a few intervals to enter, so for quick exploration I summarize this as the single interval [x8140-x9FFC]. It is only an approximation (it lets through some invalid 2nd bytes and ignores the xE0-xEF first bytes), but it is OK for this tutorial.
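
To make the difference between the exact 2-byte rule and the quick [x8140-x9FFC] interval concrete, here is a small sketch that writes both down as predicates. The function names are mine; the ranges are the first-byte and second-byte intervals quoted above.

```python
def is_lead(b):
    """First byte of a 2-byte shift-JIS character."""
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF


def is_trail(b):
    """Second byte of a 2-byte shift-JIS character (7F is excluded)."""
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC


def is_sjis_pair(hi, lo):
    """Exact rule: valid first byte followed by valid second byte."""
    return is_lead(hi) and is_trail(lo)


def in_rough_interval(hi, lo):
    """Quick rule used for highlighting: one 16-bit interval."""
    return 0x8140 <= (hi << 8 | lo) <= 0x9FFC


if __name__ == "__main__":
    samples = [
        (0x83, 0x4E),  # first character of the word "creature" in katakana
        (0x81, 0x7F),  # second byte 7F: rejected by the exact rule only
        (0x82, 0x0A),  # second byte too low: accepted by the rough rule only
        (0xE0, 0x40),  # E0-EF first bytes: missed by the rough rule
    ]
    for hi, lo in samples:
        print(f"{hi:02X}{lo:02X}  exact={is_sjis_pair(hi, lo)}  "
              f"rough={in_rough_interval(hi, lo)}")
    # Decimal bounds, as needed by editors that take decimal values.
    print(0x8140, 0x9FFC)
```

For spotting text blocks by eye the rough rule is good enough; the exact one matters more once a script has to walk the text byte by byte.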

I created a "highlight" in the editor to colour the background of whatever matches this interval. When translated to base 10 (decimal), this 2-byte interval (2-byte values are named shorts in 010 Editor) is [33088-40956]. Go to menu View -> Highlight -> Edit Highlights, remove existing entries and create a new one, as a 'short' (2-byte) value, and select a colour (I used light blue).

I also entered some control codes that I spotted earlier, e.g. x0A for line return and x07 for wait_button_input (both in red), and the interval [x0F00-x0FFF], which looks to be character portraits displayed on top of the dialog window (in green).

highlights.png


Now my aim is to scroll down in the bytes with page_down key and find some large coloured blocks, which will probably be text-related. The right summary pane is helpful for that. In Culdcept's case, we are lucky because texts are grouped in relatively dense blocks and not compressed, and text structures appear rapidly. See the image ? Look at this big block full of blue, that's probably some text block !

text_block.png


You will notice that the highlight tool is rather limited: as soon as a 1-byte control code follows some 2-byte characters, the highlighting is shifted by 1 byte and will not show blue again until the next 1-byte code restores the alignment. But this is enough to spot blue blocks. After scrolling through the full file, I listed 21 text blocks.
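
The 1-byte shift problem goes away if the bytes are walked one character at a time instead of on a fixed 2-byte grid. Below is a rough sketch of such a scanner, under my own assumptions: only x0A and x07 are treated as in-text 1-byte codes, a run needs at least 4 two-byte characters to count, and all names are hypothetical.

```python
def is_lead(b):
    return 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xEF


def is_trail(b):
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC


# 1-byte codes seen inside text so far (line return, wait for button).
CONTROL_CODES = {0x0A, 0x07}


def find_text_runs(data, min_chars=4):
    """Return (start, end) offsets of runs that look like shift-JIS text.

    A run is a sequence of 2-byte characters, possibly interleaved with
    the 1-byte control codes above. Anything else ends the run.
    """
    runs = []
    start, chars = None, 0
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if i + 1 < n and is_lead(b) and is_trail(data[i + 1]):
            if start is None:
                start = i
            chars += 1
            i += 2          # step over a full 2-byte character
        elif start is not None and b in CONTROL_CODES:
            i += 1          # 1-byte code: stay inside the run, realign
        else:
            if start is not None and chars >= min_chars:
                runs.append((start, i))
            start, chars = None, 0
            i += 1
    if start is not None and chars >= min_chars:
        runs.append((start, n))
    return runs


if __name__ == "__main__":
    a = "クリーチャー".encode("shift_jis")
    b = "カルドラ".encode("shift_jis")
    sample = b"\x00" * 7 + a + b"\x0A" + b + b"\x00" * 3
    for start, end in find_text_runs(sample):
        print(hex(start), hex(end), sample[start:end].decode("shift_jis"))
```

On a real file this will also report false positives (any binary data that happens to fall in the right ranges), so it is a way to shortlist candidate blocks, not a proof that something is text.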

After copying a few words from each block and searching for matches in the existing Nintendo DS translation wiki, I managed to associate each block with a text category (scenario, taunts, cards, items...) and determine their offsets. This takes a long time: all together, I probably spent around 6 hours to reach this step. Weirdly, 6 blocks are a repetition of the scenario text, all with the exact same bytes (a data packing oversight from the developers, I suppose).

Now let's dig into text data structures. I found 3 general patterns :

1) offset tables + dense text : easy to guess as you will see (scenario, tutorials).
2) pointer tables + dense text. There is more than control codes here: some blocks of bytes between texts have some function.
3) Data/text mixed tables, with pointers and/or tables : short texts sit in the middle of structured game assets, such as card, item and spell statistics...

Next post will detail pattern 1.

###########################
## Text data pattern n°1

This is the simplest text pattern. We are going to use the "tutorial" text as an example.

Have a look at this screenshot taken from position xC0B9C0, which is the "Scenario" text.

block_scenario.png


You can clearly see the text block in blue (full block in the summary pane on the right, beginning of the block in the hexadecimal pane) with control codes appearing in red/green. We could already start translating via 010 Editor from here, but with a strong limitation : we would have to stick to the same byte size for each text. Meaning that if a text was 14 bytes long (so 7 shift-JIS Japanese characters, as they are 2 bytes long), we could replace it with at most 14 Latin characters (they are 1 byte each in shift-JIS, i.e. ASCII-compatible).

Could we hope for more freedom ? Well, so far I have found that yes, but not much more. We are stuck with the fact that all data is packed in 1 file, and until someone guesses the packing format (not me) we will have to fit the English translation within the limits of these blue blocks. But we would be happy to get some freedom to shift text blocks around inside this interval, because some sentences, when translated from Japanese to English, might need more characters, while others might need fewer.

Did you notice something about the bytes just before the text block ? Have a close look at each pair, can you guess a pattern ? A clue: look at the values every 2 bytes.

[try before going to the next sentence !]

So x00F0, x030D x0607 x06CA ... until x6116, x622A, x0765. Then starts the text.
These are systematically increasing values !
x00F0 < x030D < x0607 < x06CA < ... < x6116 < x622A < x0765.

Let's take the 1st pair, x00F0, which is located at address xC0B9C0 :
C0B9C0 + 00F0 = address of the 1st sentence ! More precisely, it starts with x0F02, which is a character portrait code, followed by text (spoiler: x1307 is replaced by the player name).
Let's take the second : address of 00F0 (C0B9C0) + 030D = 2nd block of text !

block_offsets.png



So we may have a way to set where each text block starts. With a bit of scripting, that will allow us to build tools to modify this text with more freedom. Also, you will notice that each sub-block targeted by this list of offsets ends with x00 (highlighted with a black background in my screenshot). If you launch the game and compare the text, you can confirm that the 1st sub-block is the 1st conversation of the game (on the world map). The second is the 1st conversation at the start of the 1st battle, etc.
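
As a starting point for that "bit of scripting", here is a hedged sketch of how the offset table could be read. It relies on several assumptions of mine that the observations above suggest but do not prove: offsets are 16-bit big-endian, they are relative to the start of the table, the table ends where the first offset points (so the entry count is first_offset / 2), and sub-blocks are stored back to back. The names are hypothetical.

```python
import struct

TABLE_BASE = 0xC0B9C0  # address of the first pair (x00F0) given in the post


def read_offsets(block):
    """Read the offset table sitting at the start of `block`.

    Assumption: the first offset points just past the table, so it also
    gives the table size in bytes (2 bytes per entry).
    """
    first = struct.unpack_from(">H", block, 0)[0]
    count = first // 2
    return list(struct.unpack_from(f">{count}H", block, 0))


def split_sub_blocks(block):
    """Cut the block into sub-blocks, one per offset.

    Assumption: sub-block N runs from offset N up to offset N+1
    (or to the end of the block for the last one).
    """
    offsets = read_offsets(block)
    ends = offsets[1:] + [len(block)]
    return [block[a:b] for a, b in zip(offsets, ends)]


if __name__ == "__main__":
    # Absolute addresses of the first two sub-blocks, as computed in the post.
    print(hex(TABLE_BASE + 0x00F0), hex(TABLE_BASE + 0x030D))

    # Tiny synthetic block with 2 entries, so the sketch runs without the game.
    t0 = b"\x0F\x02" + "カルドラ".encode("shift_jis") + b"\x00"
    t1 = "クリーチャー".encode("shift_jis") + b"\x00"
    block = struct.pack(">2H", 4, 4 + len(t0)) + t0 + t1
    print(read_offsets(block))
    for sub in split_sub_blocks(block):
        print(sub.hex(" "))
```

Cutting at the next offset rather than at the first x00 byte is deliberate: codes such as the [x0F00-x0FFF] portraits may contain a 00 byte themselves, so a naive search for x00 could stop too early.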

Now, let's modify bytes to confirm we guessed everything correctly.
I modified sub-blocks 1 and 2. I changed their text, but also some portraits and offset of block 2. Basically the dialog on world map (1st sub-block) will be much shorter and dialog in 1st battle intro (2nd sub-block) will be longer.
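
The same kind of edit (one sub-block shorter, the next one longer, offsets adjusted) can be scripted instead of done by hand. This is a sketch under my assumptions only: 16-bit big-endian offsets relative to the table start, sub-blocks stored back to back and each ending with x00, and a total size that must not exceed the original block. The function name is made up.

```python
import struct


def rebuild_block(sub_blocks, block_size):
    """Rebuild offset table + texts, padded with 00 up to `block_size`.

    Raises ValueError when the new texts do not fit in the original
    block, since we cannot grow the block inside the packed file.
    """
    # Make sure every sub-block keeps its x00 terminator.
    subs = [s if s.endswith(b"\x00") else s + b"\x00" for s in sub_blocks]

    pos = 2 * len(subs)            # the texts start right after the table
    offsets = []
    for s in subs:
        offsets.append(pos)
        pos += len(s)

    if pos > block_size:
        raise ValueError(f"texts need {pos} bytes, block has {block_size}")
    if offsets and offsets[-1] > 0xFFFF:
        raise ValueError("offset does not fit in 16 bits")

    table = struct.pack(f">{len(offsets)}H", *offsets)
    return (table + b"".join(subs)).ljust(block_size, b"\x00")


if __name__ == "__main__":
    # First dialog made shorter, second one made longer (ASCII for the demo).
    new_texts = [b"Hi", b"A much longer line for the first battle"]
    block = rebuild_block(new_texts, block_size=64)
    print(block.hex(" "))
```

Because the table is recomputed from the actual lengths, bytes saved on one dialog are automatically available to the following ones, as long as the total stays within the block.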

I used Sega Saturn Patcher from Knight0fDragon, following Malenko's tutorial, to patch the image with the modified CULDCEPT.DT0 file.

Here is the result, recorded from Mednafen. 🙂

 
###########################
## Text data pattern n°2

[writing in progress !]

Scenario and tutorials are classic dialogs, for which you scroll through the text window after window. They are loaded sequentially, which may explain how fast we understood how to control them.

However, this is not the case for the other half of NPC dialogs. In particular, the "taunts" that NPCs throw during gameplay depend on what is actually happening in the round, for instance some evil laugh comment just after you lost a card in a battle. These taunts are structured using offset tables and batches of bytes associated with each text (which may be related to their frequency or to some game logic).

To summarize, we aim to edit the bytes corresponding to the text without breaking anything in the game logic. This makes a minimal study of the associated data structure a compulsory step. Let's start this study.

I chose the block that matches Taunts #1 in the Nintendo DS translation, e.g. taunts from character "Zeneth". He launches taunts in the very first battle, so that will be useful for rapid tests.

Matching sentences can be found in a text block starting at xB462DE. Similarly to the previous section, we can easily distinguish a preliminary pattern, maybe an offset or pointer table, from a block containing many shift-JIS characters. However, we can immediately observe that more non-character bytes, and in particular batches of x00 (highlighted with black background/grey font), separate the taunt texts.

taunt_1_start.png


Batches of x00 are often (not always) a consequence of "filling" unused bytes when binary data structures have fixed or normalized sizes. For instance, imagine a sheet of paper with a grid: you draw a table of 2 columns and 5 lines, and you decide that each column is 8 grid squares wide. If you write 'sword' in the 1st column, 1st line, 1 letter per grid square, you have 3 squares left without a letter. If you write 'helmets' in the 2nd column, you have 1 empty square. When serializing (converting) a data structure to bytes, something similar can happen, and batches of x00 may be these empty space fillers.
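
The grid analogy maps directly onto fixed-size records. As an illustration only (this is not the game's actual format), Python's struct module pads fixed-width byte fields with x00 exactly in this way:

```python
import struct

# One line of the table: 2 columns, 8 bytes each ("8s" = 8-byte field).
LINE = struct.Struct("8s8s")

record = LINE.pack(b"sword", b"helmets")
print(record)            # the unused squares show up as \x00 bytes
print(record.hex(" "))
print(len(record), "bytes,", record.count(0), "of them are fillers")

# Reading it back: strip the fillers to recover the original words.
col1, col2 = (field.rstrip(b"\x00") for field in LINE.unpack(record))
print(col1, col2)
```

In a hex editor such a record shows up as short readable chunks separated by runs of 00, which is the kind of picture the taunt block gives.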

To test that, we will copy/paste bytes in a text editor and align everything. Maybe we will observe a pattern. Here, I used bytes xXXXX to xXXXX. After playing with those for some time, we end with this screenshot :


Here, we can have several hypotheses. Mine was that independent taunts start with x10FX, followed by some blocks of x00 or a few non-zero bytes, and finish with text as we deciphered in the scenario case. These unknown bytes might be related to game logic.

We will try to confirm that by studying the probable pointer or offset table that precedes these patterns.

Table starts at xB58B04 :

00 7A 00 C6 01 16 01 78 01 CA 02 2F 02 9A 02 D6 03 28 03 4B 03 7D 03 A5 03 D7 03 F9
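
As a first check on these bytes, here is a small sketch that reads them as 16-bit big-endian values and looks at the gaps between neighbours. Reading them this way is an assumption carried over from the scenario table; whether they are really offsets (and relative to what) is still the hypothesis being tested.

```python
import struct

# The 28 bytes quoted above, copied as-is.
RAW = bytes.fromhex(
    "00 7A 00 C6 01 16 01 78 01 CA 02 2F 02 9A "
    "02 D6 03 28 03 4B 03 7D 03 A5 03 D7 03 F9")

# Assumption: big-endian unsigned 16-bit entries.
values = struct.unpack(f">{len(RAW) // 2}H", RAW)
gaps = [b - a for a, b in zip(values, values[1:])]

print([f"x{v:04X}" for v in values])
print("strictly increasing:", all(g > 0 for g in gaps))
print("gaps between entries:", gaps)
```

If the values are offsets, the gaps would be the byte sizes of consecutive taunt records, which can then be compared with the aligned records from the text editor step.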
 