A translation project can be thought of as sitting on a
tripod whose three legs are the source text, the
specifications, and the terminology. If any of the three
legs is removed, the project falls down.
[Figure 1]
Source text
Obviously, no translation can be done without a
source text (i.e., the document to be translated).
But for machine translation, an additional basic requirement is
that the source text
be available in machine-readable form. That is, it must come
on diskette or cartridge or tape or by modem and end up as a
text file on your disk. A fax of the source text is not
considered to be in machine-readable form, even if it is in
a computer file. A fax in a computer file is only a graphical
image of the text, and the computer does not know which dots compose
the letter a or the letter b. Conversion
of a source text on paper or in a graphical image file
to machine-readable form using imaged character recognition (ICR)
is not usually accurate enough to be used without human editing, and
human editing is expensive,
adding an unacceptable cost component to the total cost of
machine translation. Thus, for machine translation to be appropriate,
it is usually necessary to obtain the word processing or desktop
publishing file from the organization that created the source text.
But this is only one of many requirements.
Specifications
All translation projects have specifications. The problem is
that they are seldom written down. Specifications tell how the
source text is to be translated. One specification that is
always given is what language to translate into. But that is
insufficient. Should the format of the target
text (i.e., the translation) be the same as
that of the source text or different? Who is the
intended audience for the target text? Does the level of
language need to be adjusted? In technical translation, perhaps the most
important specification is what equivalents to use for technical terms.
Are there other target texts with which this translation should be
consistent? What is the purpose of the translation? If the purpose
is just to get a general idea of the content of the source text,
then the specifications would include "indicative translation
only." An indicative translation is usually for the benefit of
one person rather than for publication and need not be a high-quality
translation. Thus, publication-quality translations are
high-quality translations (and are usually the result of
human translation), while indicative translations are low-quality
translations (and are usually the result of machine translation).
These two types of translation are not normally in competition with
each other, since a requester of translation will typically want
one type or the other for a given document and a given set of
specifications. Sometimes, the two types are complementary, such
as when an indicative translation is used to
decide whether or not to request a high-quality translation of
a particular document. In this environment, an indicative translation
may be requested for a number of documents, and, using the indicative
translations, the requester may select one or two documents for
publication-quality translation.
As previously mentioned, indicative translations are
usually done using machine translation
and high-quality translations are usually done using
human translation. This fact reveals a basic difference between humans
and computers. Humans, with proper study and practice, are good at
producing high-quality translations but can typically translate only a few
hundred to approximately a thousand words an hour,
depending on such factors as the difficulty of the source text.
Even with very familiar material, human translators are limited by how
fast they can type or dictate their translations. Computers are good at
producing low-quality translations very quickly. Some machine translation
systems can translate tens of thousands of words an hour. But as they
are "trained" by adding to their dictionaries and grammars, they reach
a plateau where the quality of the output does not improve. By upgrading
to a more powerful computer, the speed of translation improves but not the
quality. By upgrading to a "more powerful" human translator, the
quality of translation improves but not necessarily the speed. Here
we have a classic case of a trade-off. You can have high speed or
high quality but not both.
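To make the speed side of this trade-off concrete, here is a minimal arithmetic sketch. The document length and the exact rates are assumptions chosen for illustration: 300 and 1,000 words an hour stand in for "a few hundred" and "approximately a thousand," and 20,000 words an hour stands in for "tens of thousands."

```python
def hours_needed(word_count, words_per_hour):
    """Return the hours required to translate word_count words at a given rate."""
    return word_count / words_per_hour

# Hypothetical document length, not a figure from the text.
doc_words = 50_000

# Assumed rates standing in for the ranges described above.
human_slow = hours_needed(doc_words, 300)     # "a few hundred words an hour"
human_fast = hours_needed(doc_words, 1_000)   # "approximately a thousand"
machine = hours_needed(doc_words, 20_000)     # "tens of thousands"

print(f"Human translator: {human_fast:.1f} to {human_slow:.1f} hours")
print(f"Machine translation: {machine:.1f} hours")
```

The calculation covers speed only. It says nothing about the quality of the resulting text, which is the other half of the trade-off.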
Indicative translation (high speed, low cost, but low
quality) represents a new and growing market but does not substantially
overlap with the existing market for publication-quality
translation. The existing market,
variously estimated at 10 to 20 billion US
dollars worldwide per year, is primarily for high-quality technical
translation. If, on the one hand, your
specifications include low-quality (barely
understandable) translation, then machine translation is for
you, and you can stop reading right
here. If, on the more likely hand, your specifications include
high-quality translation, then it
is not obvious that machine translation is
appropriate for your current translation job.
Here quality would be measured by whether the target
text is grammatical, accurate,
understandable, readable, and usable. Usability
can be measured by selecting tasks, such as maintenance
operations, which can be accomplished by a source-language reader
with the help of the source text and seeing whether those same
tasks can be performed by a target-language reader with the
help of the target text. Such measurements are notoriously
expensive, but a skilled reviewer can accurately predict
usability simply by studying the source and target texts.
Grammaticality, understandability, and readability,
which are progressively more stringent requirements, can be
measured by a target-language monolingual person. But accuracy
requires the assistance of a skilled bilingual person who
examines both the source and target texts.
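The task-based usability measurement described above could be tallied roughly as follows. This is only a sketch under stated assumptions: the task names and results are invented, and scoring usability as the fraction of source-achievable tasks that are also achieved with the target text is one possible choice, not a measure prescribed here.

```python
def usability_score(source_results, target_results):
    """Fraction of tasks completed with the source text that were also
    completed with the target text."""
    # Only tasks a source-language reader could complete form the baseline.
    baseline = [task for task, ok in source_results.items() if ok]
    if not baseline:
        raise ValueError("no task was completed with the source text")
    passed = sum(1 for task in baseline if target_results.get(task, False))
    return passed / len(baseline)

# Hypothetical maintenance tasks and outcomes (True means task completed).
source_results = {"replace filter": True, "reset alarm": True, "calibrate sensor": False}
target_results = {"replace filter": True, "reset alarm": False, "calibrate sensor": False}

score = usability_score(source_results, target_results)
print(f"Usability relative to the source text: {score:.0%}")
```

Tasks that fail even with the source text are left out of the baseline, since a failure there says nothing about the translation.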
Terminology
The treatment of terminology could have been included solely under
specifications. But terminology is so important that the actual
terminological database (also called a "termbase")
supplied with a source text has been listed
as a third essential component of a translation job. The aspect
of terminology that does fit under specifications is the
requirement that the translation job use a certain termbase
in order to achieve desired consistency. Let me explain
what I mean by consistency.
Translation requesters typically want the terminology in their
translated documents to mesh closely with terminology in related
documents. For example, a software company will want all revisions of
a software manual to use the same terms as the original, to avoid confusing
readers. Translation requesters should track all terminology relevant to a
given document and deliver that terminology to the translation provider
along with specifications and source text. The specification component
of the job tells which termbase to use and what to do if, as is all too
common, a source-text term is missing from
the termbase. The terminology component of the job contains the
termbase itself.
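As a rough illustration of how a termbase supports consistency, the sketch below looks up source-text terms and reports those that are missing, which is the case the specifications must say how to handle. The dictionary layout, the function name `check_terms`, and the English-to-French entries are assumptions made for this example; a real termbase would carry far more information per entry.

```python
def check_terms(source_terms, termbase):
    """Split source-text terms into those covered by the termbase
    and those missing from it."""
    covered = {}
    missing = []
    for term in source_terms:
        key = term.lower()  # assume termbase keys are stored in lower case
        if key in termbase:
            covered[term] = termbase[key]
        else:
            missing.append(term)
    return covered, missing

# Hypothetical English-to-French termbase with two entries.
termbase = {"hard disk": "disque dur", "text file": "fichier texte"}

covered, missing = check_terms(["hard disk", "modem"], termbase)
print("Use these equivalents:", covered)
print("Not in termbase, consult the specifications:", missing)
```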
Now we can define an appropriate translation job
(for a human or for a computer) as
one that sits on a stable tripod. It
must include a source text (in machine-readable
form if for machine translation); it must include
well-defined specifications; and
it must include any specified termbase. In addition, we can
define an appropriate translation as a translation that
combines the source text and the termbase in a way that
matches the specifications. Note that I said "appropriate"
translation, not "good" translation. A poor (low-quality) translation
may be appropriate if the specifications include a requirement
for a fast, indicative translation.
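The definition of an appropriate translation job can be sketched as a simple check on the three legs of the tripod. Everything named here is hypothetical: the `TranslationJob` fields, the `termbase_required` key, and the `missing_legs` function are illustrative choices, not part of any standard.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TranslationJob:
    source_text: Optional[str]
    specifications: Optional[dict]
    termbase: Optional[dict]
    machine_readable: bool = True          # is the source a text file, not an image?
    for_machine_translation: bool = False

def missing_legs(job):
    """Return the tripod legs that are absent; an empty list means the job is stable."""
    problems = []
    if not job.source_text:
        problems.append("source text")
    elif job.for_machine_translation and not job.machine_readable:
        problems.append("machine-readable source text")
    if not job.specifications:
        problems.append("specifications")
    elif job.specifications.get("termbase_required", False) and not job.termbase:
        # A termbase is needed only when the specifications call for one.
        problems.append("termbase")
    return problems

job = TranslationJob(
    source_text="Replace the filter every six months.",
    specifications={"target_language": "fr", "termbase_required": True},
    termbase=None,
)
print(missing_legs(job))
```

In this example the specifications call for a termbase that was never delivered, so the check reports the termbase as the missing leg.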