Translation Retriever

Introduction

The goal of Translation Retriever is to retrieve translations from other files, whether they come from an older version of a mod or from the original game (for instance, the unidentified description of a long sword).

Translation Retriever works on .tra files.

Translation Retriever retrieves translations from a pair of reference files: one in the original language (typically English) and the equivalent file in the target language (French, Spanish, ...). For instance, to retrieve the translation of the unidentified description of the long sword, you would need .tra files generated from the dialog.tlk file in both the original language and the target language.

The strings to translate come from a .tra file in the original language. Translation Retriever loads this .tra file, then, for each of its strings, searches the original-language reference file for the exact same string.
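The loading step can be sketched in Python as follows. This is not Translation Retriever's own code: `parse_tra` and `load_tra` are hypothetical names, the parser only understands the common `@id = ~text~` form (WeiDU also accepts other string delimiters, which this sketch ignores), and the latin-1 encoding is an assumption to adjust to your files.

```python
import re

# Simplified grammar: only entries of the form  @<id> = ~<text>~
# (optionally followed by a [SOUND] reference) are recognised.
# Other WeiDU delimiters ("...", %...%, ~~~~~...~~~~~) are NOT handled.
TRA_ENTRY = re.compile(r"@(\d+)\s*=\s*~(.*?)~", re.DOTALL)


def parse_tra(text: str) -> dict[int, str]:
    """Return a mapping from string Id to string text."""
    return {int(m.group(1)): m.group(2) for m in TRA_ENTRY.finditer(text)}


def load_tra(path: str, encoding: str = "latin-1") -> dict[int, str]:
    """Read a .tra file from disk and parse it (encoding is an assumption)."""
    with open(path, encoding=encoding) as f:
        return parse_tra(f.read())
```

A reference pair is then simply two such dictionaries sharing the same Ids, one per language.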

This tool was written to automate as much as possible the translation of The Darkest Day, in its WeiDU form, by retrieving the translation already made for the original version. Since a WeiDU mod usually has many .tra files, Translation Retriever lets you specify a directory instead of a single .tra file as the "to translate" data. The software processes all the .tra files contained in that directory and creates files with the same names in the destination directory.
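A minimal sketch of the directory processing, assuming a per-file translation function is available (`process_directory` and its `translate_file` parameter are hypothetical names). Like the tool, it writes each result under the same name in the destination directory and silently overwrites existing files.

```python
from pathlib import Path
from typing import Callable


def process_directory(src_dir: str, dst_dir: str,
                      translate_file: Callable[[str], str]) -> list[str]:
    """Apply translate_file to every .tra file in src_dir and write the
    result under the same name in dst_dir (existing files are overwritten)."""
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    # Match the extension case-insensitively (.tra and .TRA both occur).
    for tra in sorted(p for p in src.iterdir()
                      if p.is_file() and p.suffix.lower() == ".tra"):
        result = translate_file(tra.read_text(encoding="latin-1"))
        (dst / tra.name).write_text(result, encoding="latin-1")
        written.append(tra.name)
    return written
```

For example, `process_directory("to_translate", "translated", my_translator)` returns the names of the files it wrote.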

When using Translation Retriever with references from a WeiDU mod, you have to pick a pair of .tra files (one in the original language and the equivalent in the target language) as the reference. In this case, you may want to keep only one .tra file of the mod to translate (one that presumably uses the same texts as the reference) in the "to translate" directory at a time.

Principle

Translation Retriever automates the following process:

Summary example

For a given string to translate, Translation Retriever compares it to each string of the reference. The first comparison is strict, although case-insensitive. If an exact match is found, the corresponding string in the translated reference is retrieved and appears in the resulting .tra file. Note: depending on an option, only a comment pointing to the string Id in the reference file may be added instead.
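The exact-match step might look like the following sketch. It assumes the reference files were loaded into dictionaries keyed by string Id, and it uses a dictionary lookup on lower-cased text instead of the string-by-string browsing described above (the outcome is the same for case-insensitive exact matches). The function names and the wording of the pointer comment are hypothetical.

```python
from typing import Optional


def build_exact_index(ref_original: dict[int, str]) -> dict[str, int]:
    """Map each lower-cased reference string to its Id (first occurrence wins)."""
    index: dict[str, int] = {}
    for ref_id, text in ref_original.items():
        index.setdefault(text.lower(), ref_id)
    return index


def exact_translation(text: str,
                      index: dict[str, int],
                      ref_target: dict[int, str],
                      comment_only: bool = False) -> Optional[str]:
    """Return the target-language string for a case-insensitive exact match,
    a pointer comment if comment_only is set, or None if there is no match."""
    ref_id = index.get(text.lower())
    if ref_id is None or ref_id not in ref_target:
        return None
    if comment_only:
        # Hypothetical wording; the tool's actual comment text may differ.
        return f"// Translation Retriever: exact match at Id {ref_id}"
    return ref_target[ref_id]
```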

If no exact match is found, Translation Retriever splits the string to translate into words, then goes through the reference strings again and compares them at word level.

First, the tool compares words in string order (first word with first word, second word with second word, ...). If no difference is found within the first 90% of the words, the reference string is considered a likely match. The likeliness is then computed as the number of identical words divided by the number of words in the searched string, reduced by the relative difference in string size.

If a difference appears within the first 90% of the words, the words are compared in no particular order. The likeliness is then computed as the number of identical words divided by the number of words in the searched string, reduced by a fixed amount. This tolerates the addition, modification or removal of a word at the beginning of the string without reducing the likeliness too much.
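The word-level likeliness can be sketched as below. Several details are assumptions, not taken from the tool: words are runs of letters and digits compared case-insensitively (so punctuation is ignored), the size difference is measured in characters relative to the searched string, the "no difference up to 90%" rule is read as "the first differing word position is at or beyond 90% of the searched words", and `UNORDERED_PENALTY` is a hypothetical stand-in for the undocumented fixed amount.

```python
import re
from collections import Counter

# Hypothetical value: the documentation only says "a fixed amount".
UNORDERED_PENALTY = 0.05


def words(text: str) -> list[str]:
    # Assumption: words are runs of letters/digits, compared case-insensitively.
    return re.findall(r"\w+", text.lower())


def likeliness(searched: str, reference: str) -> float:
    """Return a likeliness in [.., 1.0] (1.0 means identical words, same size)."""
    sw, rw = words(searched), words(reference)
    if not sw:
        return 0.0
    # 1) Ordered comparison: first word with first word, and so on.
    #    If every compared pair is equal, the first difference is where
    #    the shorter word list ends.
    first_diff = next((i for i, (a, b) in enumerate(zip(sw, rw)) if a != b),
                      min(len(sw), len(rw)))
    if first_diff >= 0.9 * len(sw):
        same = sum(a == b for a, b in zip(sw, rw))
        # Assumption: size difference in characters, relative to the searched string.
        size_ratio = abs(len(reference) - len(searched)) / len(searched)
        return same / len(sw) - size_ratio
    # 2) Unordered comparison: words in common regardless of position.
    common = sum((Counter(sw) & Counter(rw)).values())
    return common / len(sw) - UNORDERED_PENALTY
```

Under these assumptions, a string whose only change is one misspelt word near its end, with the same total length, scores (N - 1)/N for N words, which is the kind of 99%+ value discussed later in this document.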

The reference Ids of the 5 most likely matching strings are provided for the searched string, and the string itself is copied as is (in the original language) into the resulting .tra file.

If there is no likely match beyond a likeliness of 50%, the original string is copied into the resulting .tra file with a statement of failure to retrieve any translation. Note: this statement can be disabled in the options.
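Selecting the reported candidates can be sketched as follows: score every reference string, keep those above the threshold (50% here, as stated above) and return the 5 best. The scoring function is passed in as a parameter so that the sketch does not depend on a particular likeliness formula; `likely_matches` is a hypothetical name.

```python
import heapq
from typing import Callable


def likely_matches(searched: str,
                   reference: dict[int, str],
                   score: Callable[[str, str], float],
                   threshold: float = 0.5,
                   keep: int = 5) -> list[tuple[int, float]]:
    """Return at most `keep` (reference Id, likeliness) pairs whose
    likeliness is above `threshold`, best match first."""
    candidates = []
    for ref_id, ref_text in reference.items():
        value = score(searched, ref_text)
        if value > threshold:
            candidates.append((ref_id, value))
    return heapq.nlargest(keep, candidates, key=lambda pair: pair[1])
```

An empty result corresponds to the "failure to retrieve any translation" case.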

Using the software

Translation Retriever currently only has one main window:

Main window

Use the first two "file" buttons to select the reference .tra file in the original language and the equivalent reference file in the target language.

Use the third "file" button to select the directory where the .tra files to translate are located. It may contain any number of .tra files (one or more).

Use the fourth "file" button to select the directory where the translated .tra files will be created. Using an empty directory is recommended; otherwise, files may be overwritten without warning!

Click on Import to start the retrieval process. At the end of the process, the operation summary is displayed:

Import results for TDD

The summary is not saved to a file, so it's a good idea to take a screen capture of it for reference. The exact match count reports the number of strings whose translation was retrieved as is. The likely match count refers to the number of strings for which at least one likely match was found. See the chapter How to use the resulting files for details about the resulting files and the information Translation Retriever provides about likely matches.

Please note that if the same original-language .tra file is processed several times to retrieve strings from several reference files (for instance BG and BG II), Translation Retriever always overwrites the destination file instead of adding the translated strings to the ones already present. Maybe in a future version. ;-) So, when recovering translations for a single .tra file from several reference files, save a copy of the resulting file after processing each reference file. Merging the various resulting files then requires a manual operation. See the chapter Other useful tool for a tool that can ease that operation.

Options

When Translation Retriever is not importing anything, the Preferences... button opens a window that lets you adjust several behaviours:

Options

Output line matching information instead of copying the texts acts as a kind of simulation: Translation Retriever does not replace the text to translate but tells where to retrieve it.

Comments on retrieved texts defines for which lines Translation Retriever should add a comment:

Minimum likeliness percentage sets the threshold above which a string is considered a likely match (see the Principle chapter). I suggest you don't set it too low; otherwise you will get lots of likely match comments pointing to wrong texts.

Warning

The retrieval process can be very long depending on several factors:

For instance, processing the 387 of TDD WeiDU took about 4 hours on my Athlon XP 3000, with 5600 exact matches among the 6200 strings. Check the Bodies, with 6600 strings, took about 12 hours (!), with only 500 exact matches and 800 likely matches.

Hint

While Translation Retriever is running, it reopens the file dialog for references in the same directory as the previous selection. For directories, however, you have to browse the whole disk again each time. To save time, it is easier to keep the same directories all the time and use Explorer to move files in and out of them when working with individual files to translate.

How to use the resulting files

The translated file has the same content as the original file. In addition to the strings themselves, Translation Retriever also copies comments and empty lines from the original file.

Strings for which an exact match was found in the reference are replaced by the translated string from the reference.

Notes

Strings with likely matches are preceded by comments giving the Id (using the numbering of the reference .tra file) of each matching string along with its evaluated likeliness. Only likeliness values above 50% are reported, and only the 5 most likely matches are listed.

// Translation Retriever: Likely string found in reference at Id 73121, likeliness 99.2
// Translation Retriever: Likely string found in reference at Id 30652, likeliness 93.3
// Translation Retriever: Likely string found in reference at Id 61713, likeliness 93.3
@nn = ~original string~

A likeliness above 99% usually indicates a string whose only differences are in punctuation. A likeliness of 100.0% is even possible, for instance if a semicolon is replaced by a full stop.

A likeliness above 90% indicates a high degree of word similarity, whether in strict order or in no particular order.

Strings without any match (or with a likeliness below 50%) are preceded by a comment stating that no translation could be retrieved:

// Translation Retriever: Exact match not found
@nn = ~original string~
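The two comment formats shown above can be reproduced with a small helper such as this hypothetical `annotate` function. It expects likeliness values already expressed as percentages; its `mark_failures` parameter mirrors the option that disables the failure statement.

```python
def annotate(entry_id: int, original: str,
             matches: list[tuple[int, float]],
             mark_failures: bool = True) -> str:
    """Build the .tra lines for a string without an exact match.
    `matches` holds (reference Id, likeliness in percent), best first."""
    lines = [
        f"// Translation Retriever: Likely string found in reference at Id {ref_id}, likeliness {pct:.1f}"
        for ref_id, pct in matches
    ]
    if not matches and mark_failures:
        lines.append("// Translation Retriever: Exact match not found")
    # The original-language string is kept as is.
    lines.append(f"@{entry_id} = ~{original}~")
    return "\n".join(lines)
```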

In practice, looking for Translation Retriever comments should be enough to identify strings that require additional work.

Locating Translation Retriever comments easily

Provided you didn't disable comment generation in the resulting .tra files, there will be Translation Retriever comments in several files. Finding and analysing them can be tedious. However, the search-in-files feature of editors such as Notepad++ or PSPAD can help greatly, as can the similar search feature of Windows Explorer. The latter didn't work so well under Windows XP, though, so I started using a free and very powerful tool called Agent Ransack.

Here is an example showing some results with BG1 NPC Project and texts from Baldur's Gate.

Agent Ransack example with BG1 NPC Project

You'll need to open the files in an editor to see the texts. However, this gives a shorter list of files to process (104 files out of 128 for BG1 NPC Project, for instance).
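If you prefer a script to Agent Ransack, a few lines of Python can produce a similar list of files. This sketch (hypothetical `files_needing_work` function) counts, for each .tra file under a directory, the lines starting with the `// Translation Retriever:` comment prefix used in the resulting files; the latin-1 encoding is an assumption.

```python
from pathlib import Path

MARKER = "// Translation Retriever:"


def files_needing_work(directory: str) -> dict[str, int]:
    """Map each .tra file (relative path) that contains Translation Retriever
    comments to the number of such comment lines."""
    root = Path(directory)
    counts: dict[str, int] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() == ".tra":
            lines = path.read_text(encoding="latin-1").splitlines()
            n = sum(1 for line in lines if line.lstrip().startswith(MARKER))
            if n:
                counts[path.relative_to(root).as_posix()] = n
    return counts
```

Note that this counts comment lines, not strings: a string with several likely matches contributes one line per match.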

Note that Agent Ransack can also help locate reference texts that escaped Translation Retriever and its rather rigid search. In Agent Ransack (as well as editors such as Notepad++), you can search using regular expressions, so you can look for strings starting with "Once upon a time" and ending with "they lived happily everafter." using a pattern such as "Once upon a time.*they lived happily everafter\." (. means any character, * means any number of occurrences of the preceding item, and \. is required to match a literal dot rather than any character).
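The same pattern can be applied with Python's `re` module, for instance to list which reference strings match it. In this sketch, `re.DOTALL` lets `.` match line breaks as well, because .tra strings may span several lines; the simplified `@id = ~text~` parsing and the `matching_ids` name are assumptions.

```python
import re

# Pattern from the text above; DOTALL lets "." also match line breaks.
STORY = re.compile(r"Once upon a time.*they lived happily everafter\.", re.DOTALL)

# Simplified .tra entry pattern (only ~...~ delimited strings).
TRA_ENTRY = re.compile(r"@(\d+)\s*=\s*~(.*?)~", re.DOTALL)


def matching_ids(tra_text: str, pattern: re.Pattern) -> list[int]:
    """Return the Ids of the .tra entries whose text matches `pattern`."""
    return [int(m.group(1)) for m in TRA_ENTRY.finditer(tra_text)
            if pattern.search(m.group(2))]
```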

Comparing likely matches to the text to translate

In the example above, line 49 of x#pcinit_tmp.tra is the comment placed before Viconia's biography:

// Translation Retriever: Likely string found in reference at Id 10217, likeliness 99.4
@50 = ~When asked about her past, VICONIA reveals (quite proudly) that she is a dark elf from the Underdark city of Menzoberranzan. She says very little about her reasons for leaving that sunless realm, though separations of such a nature are never gentle. She does claim to no longer worship the spider goddess Lolth; a change that even you know is often fatal. Her new faith is in the night goddess Shar, an appropriate choice for a drow, though this is not a firm indication that she has given up the brutal ways of her people. She finds the laws of the surface world quaint and more than a little strange, but this is simply because of her lack of experience. Likewise she seems a bit naive about how her race is viewed by surface dwellers. Many will not give her the chance she seems to expect, and even being seen with her may affect how people think of you. You know that as a drow, she has resistance to magic, both beneficial and harmful.~ [DUMMYSND]

In the Baldur's Gate reference file, the string is:

@10217 = ~When asked about her past, VICONIA reveals (quite proudly) that she is a dark elf from the Underdark city of Menzoberranzan. She says very little about her reasons for leaving that sunless realm, though separations of such a nature are never gentle. She does claim to no longer worship the spider goddess Lolth; a change that even you know is often fatal. Her new faith is in the night goddess Shar, an appropriate choice for a drow, though this is not a firm indication that she has given up the brutal ways of her people. She finds the laws of the surface world quaint and more than a little strange, but this is simply because of her lack of experience. Likewise she seems a bit naive about how her race is viewed by surface dwellers. Many will not give her the chance she seems to expect, and even being seen with her may affect how people think of you. You know that as a drow, she has resistance to magic, both benificial and harmful.~

The very high percentage means there is only a slight difference, which makes it (or them) difficult to spot. Visual diff tools can help with this task, for instance KDiff3, which can highlight a single differing character within a line, unlike less precise tools.

To use it, create two temporary files, one containing line @50 from x#pcinit_tmp.tra and the other containing line @10217 from the reference file. Then compare them with KDiff3. Under Windows, provided the two files are in the same directory, you just need to select them and use the context menu to ask KDiff3 to compare them.

KDiff3 example with BG1 NPC Project

Apart from the obvious @nnnnn part, you'll notice that KDiff3 highlighted one character in benificial and beneficial, to show you where one difference was. No wonder it was difficult to spot!
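When no visual tool is at hand, Python's standard `difflib` module can point out the same kind of difference. This is a swapped-in technique, not how KDiff3 works: `SequenceMatcher` compares the two strings character by character and `get_opcodes()` lists the blocks that differ. The `differences` function name is hypothetical.

```python
from difflib import SequenceMatcher


def differences(a: str, b: str) -> list[tuple[str, str, str]]:
    """Return (operation, text in a, text in b) for every block that differs."""
    # autojunk=False so that long biographies (200+ characters) are not
    # compared with the "popular character" heuristic.
    sm = SequenceMatcher(None, a, b, autojunk=False)
    return [(op, a[i1:i2], b[j1:j2])
            for op, i1, i2, j1, j2 in sm.get_opcodes() if op != "equal"]
```

For instance, `differences("both beneficial and harmful", "both benificial and harmful")` reports a single replacement of `e` by `i`.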

Note that you can keep KDiff3 running while you change the content of the files. So you can replace the texts in both files with new strings from a mod file and the reference file, then ask KDiff3 to reload and compare them (F5 shortcut), which saves you from selecting the files to compare again.

KDiff3 also handles file merging from up to 3 sources. This can be handy when trying to merge translations retrieved from several references (for instance BG1 and BG2) for a single .tra file.

History

Version 1.0

Version 2.0

Version 2.0.1

Version 2.0.2

Version 2.0.3

Version 2.0.4

Version 2.0.5

Version 2.0.6