ScummSpeaks v3 is a program that assists in adding or replacing speech in LucasArts games. It can create sound resources in the compressed MONSTER.SOU format (as used by SCUMM V5 and V6 games) and in the compressed .BUN format (as used by SCUMM V7 and V8 games). The current release of ScummSpeaks only supports sound resources that have been compressed with ScummVM's tools (compress_scumm_sou and compress_scumm_bun). As a result, the modified games can only be played using ScummVM, not the original interpreter.
ScummSpeaks requires the third-party tool ScummTr, which is included in the binary distribution of ScummSpeaks.
ScummSpeaks works by manipulating a "Speech Map". This Speech Map contains a listing of sounds, with information such as the ID of the sound, any metadata like lip-synching information, the location of the sound data on your hard drive, and a mapping of the line numbers that make use of the sound. It also contains any comments that you want to add for lines of text or sounds.
Here is an example of using ScummSpeaks to replace the speech for The Dig.
Compress the original game file DIGVOICE.BUN, using ScummVM's compress_scumm_bun tool. For this example, we will compress it in the Ogg Vorbis format, and the output file will be digcomp.bun.
Extract the dialogue text from The Dig using ScummTr. For this example, the output file will be dig.txt.
Go to File→New Speech Map.
Point the Text File field to dig.txt, and the Sound Resource field to digcomp.bun.
Highlight a sound in the right-hand column and click on the "Replace Sound" button.
Navigate to a new sound file.
Go to File→Export→Export to Game Resources.
Specify a new text file and sound resource name, e.g. dignew.txt and digout.bun.
Wait for a while…
Use ScummTr to insert the text from dignew.txt back into the game.
Delete the original DIGVOICE.BUN, and replace it with digout.bun (renaming it to DIGVOICE.BUN).
Start up the game!
When you play the game, once you reach the line of dialogue that has the new sound, you should hear that sound.
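The file swap in the second-to-last step can be scripted. The following is a minimal Python sketch, not part of ScummSpeaks. The function name install_bun is hypothetical, and unlike the step above it keeps a copy of the original file (with an assumed ".orig" suffix) instead of deleting it, so that the change can be undone.

```python
import shutil
from pathlib import Path


def install_bun(game_dir, new_bun, target_name="DIGVOICE.BUN"):
    """Copy new_bun into game_dir under target_name.

    The original file is kept once as target_name + ".orig" (an assumed
    naming choice) rather than deleted.
    """
    game_dir = Path(game_dir)
    target = game_dir / target_name
    backup = game_dir / (target_name + ".orig")
    # Only back up the first time, so a later run never overwrites
    # the untouched original with an already modified file.
    if target.exists() and not backup.exists():
        shutil.copy2(target, backup)
    shutil.copy2(new_bun, target)
    return target
```

For example, install_bun("C:/games/dig", "digout.bun") would place digout.bun in the (hypothetical) game directory as DIGVOICE.BUN.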
On the left is a list containing each line of dialogue from the text file. On the right is a list containing each sound in the game resources. Following is a description of each column in the lists.
Line = the sequence number of the line of text, starting at 0.
Text = the actual line of text
SFX = the sequence number of any sounds that are mapped to this line of text.
Description = a custom description or comment that you can add to the text.
Num = the sequence number of the sound, starting at 0.
ID/Original Offset = an ID that will be used to play this sound when exporting the game resources. For MONSTER resources, the ID must be a unique number. For BUN resources, the ID must be like an 8.3 DOS filename (e.g. "AIRLOCK.001").
Lines Used = a list of the sequence numbers of any lines that are mapped to this sound.
Description = a custom description or comment that you can add to the sound. If no custom description has been entered, and the sound is mapped to a line of text, this column displays the associated line of text.
Metadata = sounds can contain "metadata", such as lip synch tags. This information is visible in this column, but cannot be edited in ScummSpeaks. If you want to edit the metadata, you can either save the Speech Map XML and edit the XML file containing the metadata, or you can export the sounds, edit the XML containing the metadata for each sound, then import the sounds.
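The ID rules above can be checked before exporting. This is a rough Python sketch, not ScummSpeaks code. The function names are hypothetical, and the exact set of characters allowed in a BUN ID is an assumption based on one reading of "like an 8.3 DOS filename".

```python
import re

# Assumption: 1-8 name characters, a dot, then 1-3 extension characters.
_BUN_ID = re.compile(r"[A-Za-z0-9_\-]{1,8}\.[A-Za-z0-9]{1,3}")
_MONSTER_ID = re.compile(r"[0-9]+")


def is_valid_id(sound_id, resource_type):
    """Return True if sound_id has the shape expected for resource_type."""
    if resource_type == "MONSTER":
        return _MONSTER_ID.fullmatch(sound_id) is not None
    if resource_type == "BUN":
        return _BUN_ID.fullmatch(sound_id) is not None
    raise ValueError("resource_type must be 'MONSTER' or 'BUN'")


def find_duplicate_ids(ids):
    """Return the sorted list of IDs that occur more than once."""
    seen, dupes = set(), set()
    for sound_id in ids:
        if sound_id in seen:
            dupes.add(sound_id)
        seen.add(sound_id)
    return sorted(dupes)
```

MONSTER IDs must be unique, so find_duplicate_ids is meant to be run over the whole ID column rather than one ID at a time.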
You can click on some column headers to sort by those columns. This has not been implemented for all columns.
Because replacing each sound one at a time will be pretty slow, there is the option to export all sounds at once, and import all sounds at once.
Exporting sounds will save all sounds to a specified directory. Each sound file will also have an XML file saved alongside it, storing information required for importing the sound again into ScummSpeaks. You can rename the sounds to whatever you want, but you have to remember to also rename the XML file. Also, be careful! Exporting sounds will generate thousands of files, which can slow down Windows Explorer when you try to browse the directory.
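Renaming a sound together with its XML file is easy to get wrong by hand. Here is a small Python sketch of doing both in one step. It assumes that the XML file has the same base name as the sound with an ".xml" extension; check this against the files that ScummSpeaks actually exports. The function name is hypothetical.

```python
from pathlib import Path


def rename_with_sidecar(sound_path, new_stem):
    """Rename a sound file and its XML file (assumed: same stem, .xml)."""
    sound = Path(sound_path)
    sidecar = sound.with_suffix(".xml")
    new_sound = sound.with_name(new_stem + sound.suffix)
    new_sidecar = new_sound.with_suffix(".xml")
    # Refuse to overwrite anything that already exists.
    if new_sound.exists() or new_sidecar.exists():
        raise FileExistsError(f"{new_stem} is already in use")
    sound.rename(new_sound)
    if sidecar.exists():
        sidecar.rename(new_sidecar)
    return new_sound, new_sidecar
```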
Importing will replace any existing sounds, and replace any mapping of sounds to lines of text. This information will come from the external XML file associated with each sound. If an imported sound has no XML file with the same name, it will be imported with some default metadata and given a new sound ID, but it will not be mapped to any lines of text.
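Since a sound without an XML file is imported unmapped, it can be useful to list such sounds before importing. The sketch below is only an illustration in Python. The list of sound file extensions and the ".xml" naming of the associated file are assumptions, and the function name is hypothetical.

```python
from pathlib import Path


def sounds_without_xml(directory, sound_suffixes=(".ogg", ".mp3", ".flac", ".wav")):
    """Return names of sound files that have no same-named .xml file."""
    missing = []
    for path in sorted(Path(directory).iterdir()):
        is_sound = path.suffix.lower() in sound_suffixes
        if is_sound and not path.with_suffix(".xml").exists():
            missing.append(path.name)
    return missing
```

Any file reported here would be imported with default metadata and a new sound ID, and would not be mapped to any lines of text.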
When importing an external sound, some metadata will be automatically generated (e.g. for BUN resources, the "Regions", "Channels", and "Frequency" metadata is determined from reading the sound file). As such, this information will not be written to the XML files when exporting, and will not be read from the XML files when importing. This generated metadata will override any "default" metadata, outlined in the section Default Metadata.
There are also some options to export nicely formatted text in case you want to make a script for voice actors. Some of this functionality requires comments or descriptions entered for the lines of text you want to export.
When ScummSpeaks saves its data, it is stored in XML files.
Speech Map XML:
SpeechMap = root node
- TextPath = the absolute path of the dialogue text file for this speech map
- ResourceType = MONSTER or BUN
- CompressionType = MP3, OGG, FLAC, or Original. (Original is not yet supported.)
- Sounds = multiple SoundEntry nodes
-- SoundEntry
--- ID = should be a number for MONSTER resources, and should be in the format "ABCDEFGH.123" for BUN resources. Inserted into lines of text to trigger sounds.
--- Source = a node containing information on where the sound data is located on your hard drive.
---- Path
---- Offset
---- Size
--- Metadata = the contents of this node will change depending on the resource type.
For MONSTER resources:
---- LipSynch = a series of numbers separated by commas and spaces, e.g. "10, 20, 30" will trigger lip synch actions after 10, 20, and 30 units of time.
For BUN resources:
---- BitDepth = defaults to 16. I don't know if any other value is supported by ScummVM.
---- Channels = defaults to 1. Read from sound data when importing.
---- Frequency = defaults to 22050. Read from sound data when importing.
---- Jumps = iMUSE data. I don't think it's used for speech.
---- Markers = iMUSE data. I don't think it's used for speech.
---- Regions = iMUSE data. This is used for speech. ScummSpeaks only supports 1 region, and will generate it automatically from the sound data.
---- Syncs = iMUSE data. I don't think it's used for speech.
---- Version = defaults to 3. Refers to the version of the compressed BUN format.
--- LinesUsed = a series of numbers separated by commas and spaces, e.g. "1, 2, 3" will map the sound to lines #1, #2, and #3 in the dialogue text. Note that numbering starts from 0.
- TextComments = multiple Comment nodes
-- Comment = contains an attribute "id", which maps the comment to a line of dialogue. The text value of the node will be the comment.
- SoundComments = multiple Comment nodes
-- Comment = contains an attribute "id", which maps the comment to a sound's ID. The text value of the node will be the comment.
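The LinesUsed and LipSynch values share the same "numbers separated by commas and spaces" form. As an illustration only, the Python sketch below parses such a value and inverts the sound-to-lines mapping into a line-to-sounds mapping, which is roughly what the SFX column in the text list shows. The function names are hypothetical, and sounds are identified here by their position in the list (the Num column), starting at 0.

```python
from collections import defaultdict


def parse_number_list(text):
    """Parse a value such as "1, 2, 3" into [1, 2, 3]; empty text gives []."""
    return [int(part) for part in text.split(",") if part.strip()]


def lines_to_sounds(lines_used_by_sound):
    """Invert per-sound LinesUsed strings into {line number: [sound numbers]}."""
    index = defaultdict(list)
    for sound_num, text in enumerate(lines_used_by_sound):
        for line in parse_number_list(text):
            index[line].append(sound_num)
    return dict(index)
```

For instance, if sound 0 has LinesUsed "0, 2" and sound 1 has LinesUsed "2", then line 2 is mapped to both sounds.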
When exporting sounds, XML files are saved with each sound. When importing sounds, ScummSpeaks will look for these XML files. The format is roughly similar to the Speech Map XML.
ExternalSound = root node
- SoundEntry
-- ID
-- Metadata
For MONSTER resources:
--- LipSynch
For BUN resources:
--- BitDepth
--- Jumps
--- Markers
--- Syncs
--- Version
-- LinesUsed
- Comment = unlike the Speech Map XML, there is no "id" attribute.
The main differences are the lack of a "Source" node, less metadata for sounds used in BUN resources, and no "id" attribute in the Comment node.
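If you want to generate these XML files yourself (for example, for newly recorded sounds), something like the following Python sketch could be a starting point. It follows the node names listed above, but it assumes that every value is stored as the text of a child element; whether ScummSpeaks uses child elements or attributes for these values is not stated here, so compare the output against a file exported by ScummSpeaks first. The function name is hypothetical and only the MONSTER metadata is covered.

```python
import xml.etree.ElementTree as ET


def build_external_sound(sound_id, lines_used, comment="", lip_synch=None):
    """Build an ExternalSound tree (assumed layout: values as element text)."""
    root = ET.Element("ExternalSound")
    entry = ET.SubElement(root, "SoundEntry")
    ET.SubElement(entry, "ID").text = sound_id
    metadata = ET.SubElement(entry, "Metadata")
    if lip_synch is not None:
        # MONSTER resources only: numbers separated by commas and spaces.
        ET.SubElement(metadata, "LipSynch").text = ", ".join(str(n) for n in lip_synch)
    ET.SubElement(entry, "LinesUsed").text = ", ".join(str(n) for n in lines_used)
    # No "id" attribute on Comment, unlike the Speech Map XML.
    ET.SubElement(root, "Comment").text = comment
    return root
```

The tree can then be written out with ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True).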
Default Metadata
For MONSTER resources:
LipSynch : 0xFFF
For BUN resources:
Version : 3
BitDepth : 16
Frequency : 22050
Channels : 1
Regions : none
Jumps : none
Syncs : none
Markers : none
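To make the interaction between these defaults and the automatically generated metadata concrete, here is a small Python sketch. It is not taken from ScummSpeaks. It assumes a simple precedence: defaults first, then values read from the sound's XML file, then values generated from the sound data. The names DEFAULTS and effective_metadata are hypothetical, and "none" is represented as None.

```python
DEFAULTS = {
    "MONSTER": {"LipSynch": "0xFFF"},
    "BUN": {
        "Version": 3,
        "BitDepth": 16,
        "Frequency": 22050,
        "Channels": 1,
        "Regions": None,
        "Jumps": None,
        "Syncs": None,
        "Markers": None,
    },
}


def effective_metadata(resource_type, from_xml=None, generated=None):
    """Merge defaults, XML values, and generated values (later wins)."""
    merged = dict(DEFAULTS[resource_type])
    merged.update(from_xml or {})
    # Generated values (e.g. Channels, Frequency, Regions for BUN)
    # override whatever the defaults said.
    merged.update(generated or {})
    return merged
```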
Order of Sounds
Sounds imported from game resources or external sound files are ordered by their IDs. MONSTER resources will be sorted in ascending numerical order. BUN resources will be sorted in alphabetical order.
Any sounds added using the "Add Sound" button will be added to the end of the sound resource.
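The ordering rules can be summarised in a few lines of Python. This is a sketch of the described behaviour, not the actual ScummSpeaks code; the function name is hypothetical, and whether the alphabetical sort is case-sensitive is an assumption (a plain string sort is used here).

```python
def order_sounds(imported_ids, added_ids, resource_type):
    """Sort imported IDs by type, then append manually added sounds."""
    if resource_type == "MONSTER":
        # Numeric sort, so "9" comes before "10".
        ordered = sorted(imported_ids, key=int)
    else:
        ordered = sorted(imported_ids)
    # Sounds added with "Add Sound" keep their order at the end.
    return ordered + list(added_ids)
```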
ScummSpeaks provides two options under the Tools menu, labelled "Extract Text from Game" and "Insert Text into Game". These options bring up a new dialog window, where you can select the command line switches for ScummTr. This provides a convenient frontend for ScummTr, auxiliary to the main ScummSpeaks functionality.
By default, ScummSpeaks automatically selects some options for you:
Use Windows newlines (Cr/Lf).
Convert to ANSI.
Use Hex char codes for output.
If these options are okay, all you need to do is select the game ID, the path for the text file, and the path for the game directory.
ScummSpeaks will require a path to an actual external text file, which does not have to be one that you have open for the Speech Map.
ScummSpeaks expects the ScummTr executable to be in the same directory as ScummSpeaks, and will fail if it is not present.
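If you are packaging or moving ScummSpeaks around, a quick check like the Python sketch below can confirm that ScummTr is where it is expected. The executable file names tried here ("scummtr.exe" and "scummtr") are assumptions, as is the function name.

```python
from pathlib import Path


def find_scummtr(app_dir, names=("scummtr.exe", "scummtr")):
    """Return the path of the ScummTr executable in app_dir, or None."""
    for name in names:
        candidate = Path(app_dir) / name
        if candidate.is_file():
            return candidate
    return None
```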
ScummSpeaks includes a small utility called trspack. This utility is used for converting *.TRS files, which provide subtitles for SAN cutscenes (as used by The Dig), into text files, and vice versa.
To unpack a TRS file, use it as follows:
trspack -u DIGTXT.TRS digout.txt
And to pack a text file to a TRS file:
trspack -p digout.txt DIGTXT.TRS
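When many TRS files need converting, the two commands above can be driven from a script. The Python sketch below only builds and runs the command lines shown above; the function names are hypothetical, and it assumes that trspack can be found on the PATH (otherwise pass its full path as exe).

```python
import subprocess


def trspack_args(mode, source, dest, exe="trspack"):
    """Build the argument list for unpacking (-u) or packing (-p)."""
    flags = {"unpack": "-u", "pack": "-p"}
    if mode not in flags:
        raise ValueError("mode must be 'unpack' or 'pack'")
    return [exe, flags[mode], str(source), str(dest)]


def run_trspack(mode, source, dest, exe="trspack"):
    """Run trspack and raise CalledProcessError if it reports failure."""
    subprocess.run(trspack_args(mode, source, dest, exe), check=True)
```

For example, run_trspack("unpack", "DIGTXT.TRS", "digout.txt") corresponds to the first command above.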
The following keyboard shortcuts exist:
BACKSPACE   Remove speech text
DELETE      Remove speech text
INSERT      Insert speech text