The requirement for terminology management applies to both human
and machine translation. Neither a human nor a computer can
magically select consistent equivalents for specialized terms. I
repeat, using machine translation instead of human translation does
not reduce the need for terminology management.
If anything, machine translation increases the need for
terminology management. A machine
translation system can only put out the terms that are put into it.
Therefore, the specification of which termbase to
use (which is part of the second leg of the translation tripod) and
the inclusion of the actual termbase (the third leg) are
not relevant factors in the choice of whether to use human or machine
translation. If the termbase does not
yet exist, the translation job
(whether for human or machine translation) should be delayed
until the termbase is ready. So far, the major factors in the
decision of whether to use human translation or
machine translation for a given job have been
whether the source text is available in machine-readable form
and whether high-quality translation is needed. If the source text
is available in machine-readable form and high-quality translation is
required, then another major factor is the nature of the source text.
Skilled human translators are able to adapt to various kinds
of source text. Some translators can even start with poorly written source
texts and produce translations that exceed the quality of
the original. However, current machine translation systems strictly
adhere to the principle of "garbage in -- garbage out." Therefore, if high
quality translation is needed yet the source text is poorly written,
forget about machine translation. There is more. Machine
translation systems cannot currently produce high-quality translations
of general-language texts even when well written. It is well-known
within the field of machine translation that current systems
can only produce high-quality translations when the source
text is restricted to a narrow domain of knowledge and,
furthermore, conforms to some sublanguage. A sublanguage is
restricted not just in vocabulary and domain but also in
syntax and metaphor. Only certain grammatical constructions are
allowed and metaphors must be of the frozen variety (that is, used
over and over in the same form) rather than
dynamic (that is, creatively devised for a particular text).
Naturally occurring sublanguages are rather rare, so the
current trend is toward what is called "controlled language."
A controlled language is almost an artificial language.
It is a consciously engineered sublanguage. Rules of
style are set up to reduce ambiguity and to avoid known
problems for the machine translation system. This leads to another
requirement concerning the nature of the source text: There must
be lots of it. It is cheap to set up a machine translation system to produce
indicative translation. It is expensive to develop a document
production chain that includes high-quality machine translation.
Therefore, for such a document chain to be cost-effective, there must
be a large quantity of similar text in the same sublanguage going
into the same target language or languages.
Now it should be somewhat clearer
why less than ten percent of what is
translated is appropriate for publication-quality machine translation.
To qualify, a text must be (1) available in machine-readable form, (2)
part of a voluminous series of similar texts, and (3)
restricted to a single sublanguage. The first requirement is becoming
easier and easier to meet. The second requirement is purely a
question of economies of scale that allow development expenses to
be spread over a large quantity of text. The third requirement is the
most difficult to satisfy. If the nature of the source text does not
allow a machine translation system to produce high-quality
output, then there is little that can be done to remedy the situation,
other than obtain a better machine translation system or assign
a human translator to revise the raw output of the machine-translation
system. This type of revision is usually called post-editing. We
will discuss the possibility
of improving the quality of raw machine translation and
the pros and cons of post-editing, but first I would like
to list an alternative set of requirements for successful use
of machine translation by a translation company. This list was
provided by a colleague, Karin Spalink.
Spalink says (in a paraphrase of a slide she sent me) that machine
translation may be right for a translation company if (1) the number of
language pairs is small, (2) the number of domains [with each
domain being at the core of a sublanguage] is small, (3) the source
text is available in machine-readable form with format codes that
can be handled by the machine translation system, (4) the complexity
of the source texts
[another aspect of restriction to a sublanguage]
matches the capabilities of the machine translation
system, and (5) the costs of customizing and maintaining the
machine translation system are bearable [a factor directly related
to volume of similar texts that are processed]. Spalink indirectly
includes the three
requirements I have given (machine-readable source text,
volume considerations, and restriction to a sublanguage) and other
requirements as well. There is a growing consensus concerning
when machine translation is appropriate. Now we will return
to the questions of post-editing and improving raw machine translation.