The SPEECH file format of WC2 for FM TOWNS

The FM TOWNS WC2 is the only Japanese version of Wing Commander 2. It is not fully dubbed either, but professional Japanese voice actors were hired; for example, we can hear Touru FURUYA as Maniac. Moreover, some lines differ a bit from the English version. So I think the FM TOWNS WC2 is an interesting port.

Back to the technical topic.

The FM TOWNS WC2 SPEECH files use a different data format, and the audio sample encoding in particular is unusual. The samples are also 8-bit, 11025Hz PCM (EDIT: the DOSBox debugger indicates the sample rate is 10752Hz), but they are not encoded the same way as in the DOS version. You can convert them to normal PCM data like this:

Python:
    for i in range(0, len(in_bytearray)):
        if in_bytearray[i] < 0x80:
            in_bytearray[i] = 0x80 - in_bytearray[i]

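In other words, bytes from 0x80 up are kept as they are, while a byte b below 0x80 becomes 0x80 - b, so bit 7 seems to act as a sign flag with the low 7 bits as the magnitude. This way, the converted samples play correctly in the DOS version.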
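The same per-byte mapping can be precomputed as a 256-entry lookup table and applied with Python's built-in bytes.translate, which avoids a Python-level loop over every sample of a large speech file. A minimal sketch; the names FMTOWNS_TO_PCM and fmtowns_to_pcm are hypothetical, and the mapping is exactly the loop above.

```python
# Lookup table for the FM TOWNS -> normal PCM mapping (hypothetical names).
# Bytes 0x80..0xFF are kept; a byte b in 0x00..0x7F becomes 0x80 - b.
FMTOWNS_TO_PCM = bytes(b if b >= 0x80 else 0x80 - b for b in range(256))


def fmtowns_to_pcm(data: bytes) -> bytes:
    """Convert FM TOWNS WC2 speech samples to 0x80-centred 8-bit PCM."""
    return bytes(data).translate(FMTOWNS_TO_PCM)


if __name__ == "__main__":
    print(fmtowns_to_pcm(bytes([0x00, 0x01, 0x7F, 0x80, 0x81, 0xFF])).hex())
    # prints 807f018081ff
```

Note that both 0x00 and 0x80 map to 0x80 (silence), as expected from a sign-and-magnitude reading.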

The remaining problem is that the index area is also different. So far I have not figured anything out about it, except that it seems to contain a file name.

By brute-force replacing the data (pasting the converted audio over the original), I got the dialogue of the Emperor of Kilrah and Thrakhath in the WC2 opening to play in Japanese. Of course, the lines are split at the wrong points.

Now I need help analyzing the index format of the speech files.

Thanks in advance for any support.

EDIT:
Version 0.1 of the converter; it runs under Python 3.

Copy the voice file into the same folder as the script, rename it to input.bin, and run the script. It produces output.bin.
Import output.bin into Audacity as 8-bit 11025Hz raw PCM (EDIT: the DOSBox debugger indicates the sample rate is 10752Hz).
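Instead of importing raw data into Audacity, converted samples can be wrapped in a WAV header with Python's standard wave module so that any player opens them directly. A minimal sketch, assuming pcm holds only already-converted samples (0x80-centred unsigned 8-bit mono, which is what 8-bit WAV expects); the name write_wav is hypothetical, and the default rate of 10752 Hz follows the DOSBox debugger note (pass 11025 to match the original instructions).

```python
import io
import wave


def write_wav(dest, pcm: bytes, rate: int = 10752) -> None:
    """Wrap unsigned 8-bit mono PCM (0x80 = silence) in a WAV container.

    dest may be a file name or a writable, seekable binary file object.
    """
    with wave.open(dest, "wb") as w:
        w.setnchannels(1)      # mono
        w.setsampwidth(1)      # 8-bit samples; WAV stores 8-bit data unsigned
        w.setframerate(rate)   # hypothetical default, see the rate note above
        w.writeframes(pcm)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_wav(buf, bytes([0x80] * 100))
    with wave.open(io.BytesIO(buf.getvalue()), "rb") as r:
        print(r.getframerate(), r.getnframes())  # prints 10752 100
```

For example, write_wav("output.wav", data) with data being a single converted section.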
 

Attachment: FMTOWNS2DOSv0.1.zip

The basic format of WC2 speech files (DOS):

0x00~0x03: The total length of the file, stored as a 4-byte little-endian integer. This is actually the end offset of the last audio section; if these 4 bytes are left as 00, the last section will not be played.

0x04~0x07: The starting offset of the first audio section, also a 4-byte little-endian value, but only the first 3 bytes hold the offset (the 4th byte is not part of it). Since the first section starts right after the index, this value also gives the length of the index area.

0x08~the end of the index area: The starting offsets of the following audio sections, one 4-byte entry each in the same format.

The rest of the file simply stores all the audio sections one after another, as 11025Hz, 8-bit, mono raw PCM.
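The layout above can be read back with the struct module. A minimal sketch, assuming exactly this layout: the value at 0x00 marks the end of the last section, the index runs from 0x04 up to the first section, and only the low 3 bytes of each entry are the offset. The function name dos_sections is hypothetical.

```python
import struct


def dos_sections(data: bytes) -> list:
    """Return (start, end) byte ranges of the audio sections in a DOS WC2 speech file."""
    # 0x00-0x03: end of the last audio section (the total file length)
    end_of_data = struct.unpack_from("<I", data, 0)[0]
    # 0x04-0x07: start of the first section; only the low 3 bytes are the offset.
    # The index ends where the first section starts.
    index_len = struct.unpack_from("<I", data, 4)[0] & 0xFFFFFF
    starts = [struct.unpack_from("<I", data, pos)[0] & 0xFFFFFF
              for pos in range(4, index_len, 4)]
    # Each section ends where the next one starts; the last one ends at end_of_data
    ends = starts[1:] + [end_of_data]
    return list(zip(starts, ends))


if __name__ == "__main__":
    # Made-up two-section file: 12-byte index, sections of 3 and 2 bytes
    demo = struct.pack("<III", 17, 12, 15) + bytes([0x80] * 5)
    print(dos_sections(demo))  # prints [(12, 15), (15, 17)]
```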

The FM TOWNS speech files have almost the same format, except that:

1. Each audio section starts with a 32-byte header, which seems to contain the original audio file name.

2. The raw PCM sample encoding is different. The conversion method mentioned earlier turns it into normal raw PCM samples.

3. The last byte of each audio section address (the 4th byte of the index entry) differs from the DOS files. I am not sure what it means yet.
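To help investigate point 3, a small dump of the index may be useful: for each entry it lists the offset, the unexplained 4th byte, and the printable part of the 32-byte section header. A sketch, assuming the FM TOWNS index uses the same 4-byte entries as the DOS files (as the test converter below also assumes); dump_fmtowns_index is a hypothetical name.

```python
import struct


def dump_fmtowns_index(data: bytes) -> list:
    """List (offset, fourth_byte, header_text) for each section of an FM TOWNS speech file."""
    # Assumed: same 4-byte index entries as the DOS files, starting at 0x04,
    # with the index ending where the first section starts.
    index_len = struct.unpack_from("<I", data, 4)[0] & 0xFFFFFF
    rows = []
    for pos in range(4, index_len, 4):
        entry = struct.unpack_from("<I", data, pos)[0]
        offset = entry & 0xFFFFFF          # low 3 bytes: section offset
        fourth = entry >> 24               # the byte that differs from the DOS files
        header = data[offset:offset + 32]  # 32-byte per-section header
        # Show printable ASCII only; the header seems to hold the original file name
        text = "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in header)
        rows.append((offset, fourth, text))
    return rows
```

Printing the rows for a few files and comparing the 4th byte against the matching DOS file might reveal a pattern.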

Finally, the test converter:

NOTE: The converted SPEECH.J00 produces a lot of noise and eventually an error in the game. I am not sure why yet. For now, change bytes 0x08~0x0B of the generated file to FF FF 00 00 to get it working.

Python:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Test converter: reads fmtowns.bin (an FM TOWNS WC2 speech file) and
# writes dospc.bin in the DOS speech file layout.

import binascii
import struct

def main():
 
    fmtowns_bytearray = bytearray(0)
    header_bytearray = bytearray(0)
 
    sectionIndex_bytearray = bytearray(0)
 
    headerLength = 0
    pcmIndexAdress = 0
    nextIndexAdress = 0
 
    dospc_bytearray = bytearray(0)
    dospcSectionIndex_bytearray = bytearray(0)
 
 
    with open("fmtowns.bin", "rb") as f:
        fmtowns_bytearray = bytearray(f.read())
 
    rawpcm_bytearray = fmtowns_bytearray
 
    print (binascii.hexlify(fmtowns_bytearray[4:8]))
 
    # The address of first section is equal to the header length
    sectionIndex_bytearray = fmtowns_bytearray[4:8]
 
    print (binascii.hexlify(sectionIndex_bytearray))
 
    # Ignore the fourth byte: only the low 3 bytes hold the offset
    sectionIndex_bytearray[3] = 0
    print (binascii.hexlify(sectionIndex_bytearray))
 
    headerLength = struct.unpack("<I", sectionIndex_bytearray[0:4])[0]
    print (headerLength)
    header_bytearray = fmtowns_bytearray[0:headerLength]
 
    # Reserve a zero-filled index area of the same length in the output
    dospc_bytearray.extend(bytearray(headerLength))
 
 
    print (binascii.hexlify(header_bytearray))
    print (binascii.hexlify(dospc_bytearray))
 
    # Walk the index: one 4-byte entry per audio section, starting at 0x04
    for i in range(4, len(header_bytearray), 4):
 
        print (i)
        print (binascii.hexlify(header_bytearray[i:i+4]))
        sectionIndex_bytearray = header_bytearray[i:i+4]
 
        sectionIndex_bytearray[3] = 0
        print (binascii.hexlify(sectionIndex_bytearray[0:4]))
        pcmIndexAdress = struct.unpack("<I", sectionIndex_bytearray[0:4])[0]
        print (pcmIndexAdress)
 
        if i < (len(header_bytearray) - 4):
            sectionIndex_bytearray = header_bytearray[i+4:i+4+4]
            sectionIndex_bytearray[3] = 0
            print (binascii.hexlify(sectionIndex_bytearray[0:4]))
            nextIndexAdress = struct.unpack("<I", sectionIndex_bytearray[0:4])[0]
            print (nextIndexAdress)
        else:
            nextIndexAdress = len(fmtowns_bytearray)
            print (nextIndexAdress)
 
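        # Skip the 32-byte section header (it seems to hold the original file name)
        rawpcm_bytearray = bytearray(fmtowns_bytearray[pcmIndexAdress+32:nextIndexAdress])

        # Convert the FM TOWNS sample encoding to normal 8-bit PCM
        for j in range(0, len(rawpcm_bytearray)):
            if rawpcm_bytearray[j] < 0x80:
                rawpcm_bytearray[j] = 0x80 - rawpcm_bytearray[j]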
         
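        print ("FILE LENGTH:")

        # This section starts at the current end of the output: store that
        # offset (all 4 bytes, so the fourth byte becomes 0) in index slot i
        dospc_bytearray[i:i+4] = bytearray(struct.pack("<I", len(dospc_bytearray)))[0:4]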
 
        print (len(dospc_bytearray))
        dospc_bytearray.extend(rawpcm_bytearray)
        print (len(dospc_bytearray))
 
    # 0x00~0x03: total length of the converted file
    dospc_bytearray[0:4] = bytearray(struct.pack("<I", len(dospc_bytearray)))[0:4]
 
 
    with open("dospc.bin", "wb") as f:
        f.write(dospc_bytearray)

    print ("OK! Conversion finished!")
    print ("Rename dospc.bin to the normal speech file name.")

if __name__ == '__main__':
    main()
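Because the converter above touches every sample in a Python loop (and prints a lot of debug output), it can be slow on large files. Below is a more compact sketch of the same conversion, using a lookup table with bytes.translate and struct.pack_into. It assumes the same layout as the script (4-byte index entries from 0x04, low 3 bytes = offset, 32-byte section headers, last section ending at the end of the file) and, like the script, writes full 4-byte offsets, so the 4th byte of each entry ends up 0. The name fmtowns_to_dos is hypothetical.

```python
import struct

# Same byte mapping as the conversion loop: bytes < 0x80 become 0x80 - b
_TABLE = bytes(b if b >= 0x80 else 0x80 - b for b in range(256))
SECTION_HEADER = 32  # per-section header in the FM TOWNS files


def fmtowns_to_dos(src: bytes) -> bytes:
    """Rebuild an FM TOWNS speech file in the DOS layout (same logic as the script)."""
    index_len = struct.unpack_from("<I", src, 4)[0] & 0xFFFFFF
    starts = [struct.unpack_from("<I", src, pos)[0] & 0xFFFFFF
              for pos in range(4, index_len, 4)]
    ends = starts[1:] + [len(src)]  # the script uses the file size for the last end
    out = bytearray(index_len)      # zero-filled index area, patched below
    for n, (start, end) in enumerate(zip(starts, ends)):
        # Index entry n holds the offset where this section starts in the output
        struct.pack_into("<I", out, 4 + 4 * n, len(out))
        out += src[start + SECTION_HEADER:end].translate(_TABLE)
    struct.pack_into("<I", out, 0, len(out))  # 0x00-0x03: total length
    return bytes(out)
```

For example, read fmtowns.bin into data and write fmtowns_to_dos(data) to dospc.bin. This does not fix the SPEECH.J00 problem; the 0x08~0x0B workaround from the note still applies to its output.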
 