The goal of Translation Retriever is to retrieve translations from other files, whether from an older version of a mod or from the original game (for instance, the unidentified description of a long sword).
Translation Retriever works on .tra files.
Translation Retriever retrieves translations from a pair of reference files: one in the original language (typically English) and the equivalent file in the target language (French, Spanish, ...). For instance, to retrieve the translation of the unidentified description of the long sword, you would need .tra files generated from the dialog.tlk file in both the original language and the target language.
The strings to translate come from a .tra file in the original language. Translation Retriever loads this .tra file and then, for each string in it, searches the reference file for the exact same string.
This tool was written to automate as much as possible the translation of The Darkest Day, in its WeiDU form, by retrieving the translation already made for the original version. Since a WeiDU mod usually has many .tra files, Translation Retriever lets you specify a directory instead of a single .tra file as the "to translate" data. The software processes all the .tra files contained in that directory and creates files with the same names in the destination directory.
When using Translation Retriever with references from a WeiDU mod, you have to pick a pair of .tra files (one in the original language and the equivalent in the target language) as the reference. In this case, you may want to keep only one .tra file from the mod to translate (one that presumably uses the same texts as the reference) in the "to translate" directory at a time.
Translation Retriever automates the following process:
For a given string to translate, Translation Retriever compares it to each string of the reference. The first comparison is strict, although case-insensitive. If an exact match is found, the corresponding string in the translated reference is retrieved and appears in the resulting .tra file. Note: depending on an option, only a comment pointing to the string Id in the reference file may be added instead.
If no exact match is found, Translation Retriever splits the string to translate into words, then browses the reference strings again and compares them word by word.
First, the tool compares words in string order (first word with first word, second word with second word, ...). If no difference appears within the first 90% of the words, the reference string is considered a likely match. Its likeliness is then computed as the ratio of identical words to the number of words in the searched string, minus the ratio of the string size difference.
If a difference appears before 90% of the words have been compared, the words are compared in no particular order. The likeliness is then computed as the ratio of identical words to the number of words in the searched string, minus a fixed amount. This tolerates adding, modifying or removing a word at the beginning of the string without lowering the likeliness too much.
The reference-file Ids of the 5 most likely matching strings are given for the searched string, and the string itself is copied as is (in the original language) into the resulting .tra file.
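The exact-match step can be sketched in Python as below. This is a minimal illustration, not the tool's actual code: it assumes the two reference .tra files have already been parsed into dictionaries mapping string Ids to texts, and the helper names and the comment format used for the "comment only" option are hypothetical.

```python
def build_exact_index(ref_original):
    """Map each lower-cased reference string to its reference Id.

    When the same text appears under several Ids, the first one
    encountered is kept (an assumption; the tool's choice is not documented).
    """
    index = {}
    for ref_id, text in ref_original.items():
        index.setdefault(text.lower(), ref_id)
    return index


def exact_translation(text, index, ref_translated, comment_only=False):
    """Return the translation for a case-insensitive exact match, or None."""
    ref_id = index.get(text.lower())
    if ref_id is None:
        return None  # no exact match: fall back to word-level comparison
    if comment_only:
        # Hypothetical comment format for the "point to the Id" option.
        return f"// exact match: reference @{ref_id}"
    return ref_translated[ref_id]


# Usage with tiny in-memory references (hypothetical data).
ref_en = {10: "Long Sword", 11: "Short Sword"}
ref_fr = {10: "Épée longue", 11: "Épée courte"}
index = build_exact_index(ref_en)
print(exact_translation("long sword", index, ref_fr))  # Épée longue
```

A dictionary keyed on the lower-cased text turns each exact lookup into a single hash access instead of a scan of the whole reference.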
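A possible reading of this ordered comparison, written as a Python sketch. Several details are assumptions rather than documented behaviour: words are lower-cased and punctuation is ignored, the 90% threshold is rounded up to a whole number of words, the "string size difference" is measured in characters relative to the searched string, and all function names are hypothetical.

```python
import re


def words(text):
    """Lower-cased words, punctuation ignored (assumed tokenisation)."""
    return re.findall(r"[\w']+", text.lower())


def ordered_likeliness(searched, reference, prefix_percent=90):
    """Compare words position by position.

    Returns None when a difference shows up before `prefix_percent` of the
    searched words have been compared (the reference is then not a likely
    match for this method); otherwise returns the likeliness as a ratio.
    """
    s_words, r_words = words(searched), words(reference)
    if not s_words:
        return None
    # Leading words that must match, rounded up with integer arithmetic.
    needed = (prefix_percent * len(s_words) + 99) // 100
    first_diff = 0
    for s_word, r_word in zip(s_words, r_words):
        if s_word != r_word:
            break
        first_diff += 1
    if first_diff < needed:
        return None
    identical = sum(1 for s_word, r_word in zip(s_words, r_words) if s_word == r_word)
    # Assumed: size difference in characters, relative to the searched string.
    size_penalty = abs(len(reference) - len(searched)) / len(searched)
    return identical / len(s_words) - size_penalty


# A semicolon replaced by a full stop keeps the same words and the same length.
print(ordered_likeliness("Keep it sharp; beware.", "Keep it sharp. Beware."))  # 1.0
```

Under these assumptions, punctuation-only changes of equal length score exactly 100%, and changes that alter the length score slightly below, which is consistent with the behaviour described in the Notes.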
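A possible Python sketch of this unordered comparison follows. The value of the fixed deduction is not given here, so the 0.05 used below is purely a placeholder; counting each shared word at most as many times as it occurs in both strings (a multiset intersection) and ignoring punctuation are also assumptions, and the function names are hypothetical.

```python
import re
from collections import Counter


def words(text):
    """Lower-cased words, punctuation ignored (assumed tokenisation)."""
    return re.findall(r"[\w']+", text.lower())


def unordered_likeliness(searched, reference, fixed_penalty=0.05):
    """Share of searched words found anywhere in the reference, minus a
    fixed deduction (0.05 is a placeholder, not the tool's real value)."""
    s_words = words(searched)
    if not s_words:
        return 0.0
    # Multiset intersection: a word counts at most as often as it occurs in both.
    shared = Counter(s_words) & Counter(words(reference))
    return sum(shared.values()) / len(s_words) - fixed_penalty


# An extra word at the start of the searched string barely lowers the score.
score = unordered_likeliness("Well, the long sword is here.", "The long sword is here.")
print(round(score, 3))  # 0.783
```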
If no likely match has a likeliness above 50%, the original string is copied into the resulting .tra file with a comment stating that no translation could be retrieved. Note: this comment can be disabled in the options.
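Keeping the five most likely candidates and applying the 50% floor could look like the sketch below. It assumes the likeliness of every reference string has already been computed (None meaning "not a likely match"); the function names and the comment wording are hypothetical, not the tool's actual output.

```python
import heapq


def best_matches(scores, minimum=0.5, count=5):
    """Return up to `count` (reference Id, likeliness) pairs, best first,
    keeping only likeliness values strictly above `minimum`."""
    candidates = [(ref_id, value) for ref_id, value in scores.items()
                  if value is not None and value > minimum]
    return heapq.nlargest(count, candidates, key=lambda pair: pair[1])


def match_comment(matches):
    """Build the comment written before the untranslated string
    (hypothetical wording)."""
    if not matches:
        return "// no translation found"
    listed = ", ".join(f"@{ref_id} ({value:.1%})" for ref_id, value in matches)
    return f"// likely matches: {listed}"


scores = {1: 0.95, 2: 0.40, 3: None, 4: 0.70, 5: 0.60, 6: 0.55, 7: 0.80, 8: 0.51}
print(match_comment(best_matches(scores)))
# // likely matches: @1 (95.0%), @7 (80.0%), @4 (70.0%), @5 (60.0%), @6 (55.0%)
```

`heapq.nlargest` avoids sorting every candidate when only the top five are needed, which matters when each string is scored against thousands of reference strings.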
Translation Retriever currently only has one main window:
Use the first two "file" buttons to select the reference .tra file in the original language and the equivalent reference file in the target language.
Use the third "file" button to select the directory containing the .tra files to translate. It may contain any number of .tra files (one or more).
Use the fourth "file" button to select the directory where the translated .tra files will be created. Using an empty directory is recommended; otherwise, files may be overwritten without warning!
Click on Import to start the retrieval process. At the end of the process, the operation summary is displayed:
No summary is saved to a file, so it is a good idea to take a screenshot of the summary for reference. The exact match count reports the number of strings whose translation was retrieved. The likely match count is the number of strings for which at least one likely match was found. See the chapter How to use the resulting files for details about the resulting files and the information Translation Retriever provides about likely matches.
Please note that if the same original-language .tra file is processed several times to retrieve strings from several reference files (for instance BG and BG II), Translation Retriever always overwrites the destination file instead of adding the translated strings to those already in it. Maybe in a future version. ;-) So, when recovering translations for a single .tra file from several reference files, save a copy of the resulting file after processing each reference file. Merging the resulting files then has to be done manually. See the Other useful tool chapter for a tool that can ease that operation.
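Part of the manual merge can be scripted. The sketch below is an assumption-laden illustration, not a feature of Translation Retriever: it only understands tilde-delimited entries of the form @123 = ~text~, ignores comments, assumes the default mode in which retrieved texts are copied (not the line matching simulation), and treats an entry as translated when its text differs from the original-language file. For each entry it keeps the first result file that changed it. All function names are hypothetical.

```python
import re

# Assumed entry syntax: @<number> = ~<text>~ (text may span several lines).
TRA_ENTRY = re.compile(r"@(\d+)\s*=\s*~(.*?)~", re.DOTALL)


def parse_tra(text):
    """Return {entry number: text} for every tilde-delimited entry."""
    return {int(number): body for number, body in TRA_ENTRY.findall(text)}


def merge_results(original_text, result_texts):
    """For each entry, take the first result whose text differs from the
    original-language text; otherwise keep the original text."""
    original = parse_tra(original_text)
    results = [parse_tra(text) for text in result_texts]
    merged = {}
    for number, source in original.items():
        merged[number] = source
        for result in results:
            candidate = result.get(number)
            if candidate is not None and candidate != source:
                merged[number] = candidate
                break
    return merged


def to_tra(entries):
    """Serialise entries back to .tra syntax, in entry-number order
    (comments and blank lines from the original are not preserved)."""
    return "".join(f"@{number} = ~{text}~\n" for number, text in sorted(entries.items()))


original = "@1 = ~Long Sword~\n@2 = ~Short Sword~\n"
from_bg1 = "@1 = ~Épée longue~\n@2 = ~Short Sword~\n"
from_bg2 = "// comment\n@1 = ~Long Sword~\n@2 = ~Épée courte~\n"
print(to_tra(merge_results(original, [from_bg1, from_bg2])), end="")
# @1 = ~Épée longue~
# @2 = ~Épée courte~
```

Because comments and layout are dropped, the output is best used as a checklist next to the real resulting files rather than as a replacement for them.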
When Translation Retriever is not importing anything, the Preferences... button opens a window that lets you adjust several behaviours:
Output line matching information instead of copying the texts acts as a kind of simulation: Translation Retriever does not replace the text to translate but tells you where to retrieve it.
Comments on retrieved texts defines for which lines Translation Retriever should add a comment:
Minimum likeliness percentage sets the threshold above which a string is considered a likely match (see the Principle chapter). I suggest not setting it too low; otherwise you will get many likely-match comments pointing to wrong texts.
The retrieval process can be very long depending on several factors:
For instance, processing the 387 .tra files of TDD WeiDU took about 4 hours on my Athlon XP 3000, with 5600 exact matches among the 6200 strings. Check the Bodies, with 6600 strings, took about 12 hours (!), with only 500 exact matches and 800 likely matches.
While Translation Retriever is running, it reopens the file dialog for references in the same directory as the previous selection. For directories, however, you have to browse the whole disk again each time. To save time, it is easier to keep the same directories all the time and use Explorer to move files in and out of them when working with individual files to translate.
The translated file has the same content as the original file. In addition to the strings themselves, Translation Retriever also copies comments and empty lines from the original file.
Strings for which an exact match was found in the reference are replaced by the translated string from the reference.
Notes
Strings with likely matches are preceded by a comment giving the Id (using the numbering of the reference .tra file) of each matching string along with its evaluated likeliness. Only likeliness values above 50% are listed, and only the 5 highest are given.
A likeliness above 99% usually indicates a string whose only differences are in punctuation. A 100.0% likeliness is even possible, for instance if a semicolon is replaced by a full stop.
A likeliness above 90% indicates very high word similarity, whether in strict order or in no particular order.
Strings without any match (or with a likeliness below 50%) are marked with a preceding comment stating that no translation could be retrieved:
In practice, looking for Translation Retriever comments should be enough to identify the strings that require additional work.
Unless you disabled comment generation in the resulting .tra files, there will be Translation Retriever comments in several files. Finding and analysing them can be tedious. However, the search-in-files feature of editors like Notepad++ or PSPad can help greatly, as can the search feature of Windows Explorer. The latter did not work so well under Windows XP, so I started using a free and very powerful tool called Agent Ransack.
Here is an example showing some results with BG1 NPC Project and texts from Baldur's Gate.
You'll need to open each file in an editor to see the texts. However, this gives a shorter list of files to process (104 files out of 128 for BG1 NPC Project, for instance).
Note that Agent Ransack can also help locate reference texts that escaped Translation Retriever and its rather rigid search. In Agent Ransack (as well as in editors such as Notepad++), you can search using regular expressions, so you can look for a string starting with "Once upon a time" and ending with "they lived happily ever after." using a pattern such as "Once upon a time.*they lived happily ever after\." (. matches any character, * means any number of occurrences of the preceding item, and \. is required to match a literal dot rather than any character).
In the Agent Ransack results shown earlier, line 49 of x#pcinit_tmp.tra is placed before Viconia's biography.
In the Baldur's Gate file, the string is:
The very high percentage means there is only a slight difference, which makes it (or them) difficult to spot. Visual difference tools can help with this task, for instance KDiff3, which can highlight a single character within a line, unlike less capable tools.
To use it, create two temporary files, one containing string @50 from x#pcinit_tmp.tra and the other string @10217 from the reference file. Then compare them with KDiff3. Under Windows, provided both files are in the same directory, you just need to select them and use the context menu to ask KDiff3 to compare them.
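Where a dedicated search tool is not available, the search across .tra files can also be scripted. This Python sketch (function names hypothetical) reports every line of every .tra file in a directory that matches a regular expression; you would pass the marker text found in Translation Retriever's comments, or any other pattern. Reading files as UTF-8 with undecodable bytes replaced is an assumption: older translations may use a Windows code page, in which case pass the matching encoding (for example "cp1252").

```python
import re
from pathlib import Path


def search_lines(text, pattern, ignore_case=True):
    """Return (line number, line) for each line of `text` matching `pattern`."""
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    return [(number, line.strip())
            for number, line in enumerate(text.splitlines(), start=1)
            if regex.search(line)]


def search_tra_files(directory, pattern, encoding="utf-8", ignore_case=True):
    """Search every .tra file directly inside `directory` (not recursively)."""
    hits = []
    for path in sorted(Path(directory).glob("*.tra")):
        text = path.read_text(encoding=encoding, errors="replace")
        for number, line in search_lines(text, pattern, ignore_case):
            hits.append((path.name, number, line))
    return hits


# Example (hypothetical directory and marker text):
# for name, number, line in search_tra_files("translated", r"likely match"):
#     print(f"{name}:{number}: {line}")
```

Like a line-based search in an editor, this only finds patterns contained in a single line; strings that span several lines would need the whole file searched with `re.DOTALL`.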
Apart from the obvious @nnnnn part, you'll notice that KDiff3 highlighted one character in beneficial and beneficial, to show you where one difference was. No wonder it was difficult to spot!
Note that you can keep KDiff3 running while you change the content of the files. You can therefore replace the texts in both files with new strings from a mod file and from the reference file, then ask KDiff3 to reload and compare them (F5 shortcut), which saves you from selecting the files to compare again.
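If KDiff3 is not at hand, Python's standard difflib module can point out the same kind of small difference without creating temporary files. This is a swapped-in technique, not part of Translation Retriever or KDiff3; the function name and sample strings are illustrative.

```python
import difflib


def char_differences(a, b):
    """List (operation, text in a, text in b, position in a) for every
    character-level difference between two strings."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2], i1)
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


# A digit zero replaced by a capital O is easy to miss by eye.
print(char_differences("Cost: 100 gold", "Cost: 10O gold"))
# [('replace', '0', 'O', 8)]
```

`autojunk=False` keeps difflib from ignoring frequent characters in long strings, which could otherwise hide exactly the kind of one-character change being looked for.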
KDiff3 also handles file merging from up to 3 sources. This can be handy when trying to merge translations retrieved from several references (for instance BG1 and BG2) for a single .tra file.
Version 1.0
Version 2.0
Version 2.0.1
Version 2.0.2
Version 2.0.3
Version 2.0.4
Version 2.0.5
Version 2.0.6