The requirement for terminology management applies to both human
and machine translation. Neither a human nor a computer can
magically select consistent equivalents for specialized terms. I
repeat, using machine translation instead of human translation does
not reduce the need for terminology management.
If anything, machine translation increases the need for
terminology management. A machine
translation system can only put out the terms that are put into it.
Therefore, the specification of which termbase to
use (which is part of the second leg of the translation tripod) and
the inclusion of the actual termbase (the third leg) are
not relevant factors in the choice of whether to use human or machine
translation. If the termbase does not
yet exist, the translation job
(whether for human or machine translation) should be delayed
until the termbase is ready. So far, the major factors in the
decision of whether to use human translation or
machine translation for a given job have been
whether the source text is available in machine-readable form
and whether high-quality translation is needed. If the source text
is available in machine-readable form and high-quality translation is
required, then another major factor is the nature of the source text.
Skilled human translators are able to adapt to various kinds
of source text. Some translators can even start with poorly written source
texts and produce translations that exceed the quality of
the original. However, current machine translation systems strictly
adhere to the principle of "garbage in -- garbage out." Therefore, if high
quality translation is needed yet the source text is poorly written,
forget about machine translation. There is more. Machine
translation systems cannot currently produce high-quality translations
of general-language texts even when well written. It is well-known
within the field of machine translation that current systems
can only produce high-quality translations when the source
text is restricted to a narrow domain of knowledge and,
furthermore, conforms to some sublanguage. A sublanguage is
restricted not just in vocabulary and domain but also in
syntax and metaphor. Only certain grammatical constructions are
allowed and metaphors must be of the frozen variety (that is, used
over and over in the same form) rather than
dynamic (that is, creatively devised for a particular text).
Naturally occurring sublanguages are rather rare, so the
current trend is toward what is called "controlled language."
A controlled language is almost an artificial language.
It is a consciously engineered sublanguage. Rules of
style are set up to reduce ambiguity and to avoid known
problems for the machine translation system. This leads to another
requirement concerning the nature of the source text: There must
be lots of it. It is cheap to set up a machine translation system to produce
indicative translation. It is expensive to develop a document
production chain that includes high-quality machine translation.
Therefore, for such a document chain to be cost-effective, there must
be a large quantity of similar text in the same sublanguage going
into the same target language or languages.
Now it should be somewhat clearer
why less than ten percent of what is
translated is appropriate for publication-quality machine translation.
To qualify, a text must be (1) available in machine-readable form, (2)
part of a voluminous series of similar texts, and (3)
restricted to a single sublanguage. The first requirement is becoming
easier and easier to meet. The second requirement is purely a
question of economies of scale that allow development expenses to
be spread over a large quantity of text. The third requirement is the
most difficult to satisfy. If the nature of the source text does not
allow a machine translation system to produce high-quality
output, then there is little that can be done to remedy the situation,
other than obtain a better machine translation system or assign
a human translator to revise the raw output of the machine-translation
system. This type of revision is usually called post-editing. We
will discuss the possibility
of improving the quality of raw machine translation and
the pros and cons of post-editing, but first I would like
to list an alternative set of requirements for successful use
of machine translation by a translation company. This list was
provided by a colleague, Karin Spalink.
Spalink says (in a paraphrase of a slide she sent me) that machine
translation may be right for a translation company if (1) the number of
language pairs is small, (2) the number of domains [with each
domain being at the core of a sublanguage] is small, (3) the source
text is available in machine-readable form with format codes that
can be handled by the machine translation system, (4) the complexity
of the source texts
[another aspect of restriction to a sublanguage]
matches the capabilities of the machine translation
system, and (5) the costs of customizing and maintaining the
machine translation system are bearable [a factor directly related
to volume of similar texts that are processed]. Spalink indirectly
includes the three
requirements I have given (machine-readable source text,
volume considerations, and restriction to a sublanguage) and other
requirements as well. There is a growing consensus concerning
when machine translation is appropriate. Now we will return
to the questions of post-editing and improving raw machine translation.