Contacts and discussions with colleagues working with lexicographical
and NLP applications (e.g., thesauri and MT-based lexical resources)
have elicited comments to the effect that MARTIF is an inappropriate
interchange vehicle, i.e., that it is not powerful enough
for their purposes. Obviously, MARTIF does not address the interchange
needs of these environments because it was designed specifically
for the interchange of concept-oriented terminological data, where
each entry treats a concept and all the terms associated with
that concept. Lexicographical entries, on the other hand, are
word-oriented rather than concept-oriented: they treat a
word and all its meanings. Links between the two environments
would have to join a meaning node in a lexicographical entry with
an individual terminology information group (i.e., a specific
term) in a terminological entry. The complexity of these links
would increase with the number of languages included in the databases,
the number of subject fields covered, and the degree of polysemy
inherent in those subject fields.
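
The structural mismatch described above can be illustrated with a small
sketch. It is not MARTIF or any actual lexicographical format: the class
names (ConceptEntry, WordEntry, SenseTermLink), the function link_senses,
and the sample data are all hypothetical and serve only to show why a
link has to join one meaning node of a word entry to one specific term
within a concept entry.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    language: str
    text: str


@dataclass
class ConceptEntry:
    """Concept-oriented entry: one concept, all terms that designate it."""
    concept_id: str
    subject_field: str
    terms: List[Term] = field(default_factory=list)


@dataclass
class Sense:
    sense_id: str
    gloss: str


@dataclass
class WordEntry:
    """Word-oriented entry: one word, all of its meanings."""
    headword: str
    language: str
    senses: List[Sense] = field(default_factory=list)


@dataclass(frozen=True)
class SenseTermLink:
    """Joins one meaning node to one specific term in a concept entry."""
    headword: str
    sense_id: str
    concept_id: str
    term_language: str
    term_text: str


def link_senses(word: WordEntry,
                concepts: Dict[str, ConceptEntry],
                sense_to_concept: Dict[str, str]) -> List[SenseTermLink]:
    """Build one link per sense that has a matching term (hypothetical helper)."""
    links = []
    for sense in word.senses:
        concept_id = sense_to_concept.get(sense.sense_id)
        if concept_id is None:
            continue  # this meaning has no terminological counterpart
        for term in concepts[concept_id].terms:
            # Link only to the term that matches the headword and language.
            if term.language == word.language and term.text == word.headword:
                links.append(SenseTermLink(word.headword, sense.sense_id,
                                           concept_id, term.language, term.text))
    return links


# Illustrative data only: a polysemous word spanning two subject fields.
concepts = {
    "C1": ConceptEntry("C1", "zoology", [Term("en", "mouse"), Term("de", "Maus")]),
    "C2": ConceptEntry("C2", "computing", [Term("en", "mouse"), Term("de", "Maus")]),
}
word = WordEntry("mouse", "en",
                 [Sense("s1", "small rodent"), Sense("s2", "pointing device")])
links = link_senses(word, concepts, {"s1": "C1", "s2": "C2"})
```

In this toy case a single word entry already needs two links, one per
meaning and subject field. Adding a word entry for each further language
would add a further set of links, which is the growth in complexity noted
above.
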
Positive experience testing the existing MARTIF format and the
definition of the blind interchange levels (see above) leads to
the conclusion that it does not make sense to expand the existing
MARTIF format itself to accommodate these essentially different
applications. Even though certain data categories are used in
common in these different environments, they are frequently interpreted
and used differently as a result of structural variation and the
divergent objectives of the two theoretical approaches. As a result,
different kinds of systems employ different data modeling conventions.
In order to coordinate data exchange between these two environments,
it would be highly desirable to pursue parallel development between
MARTIF and interchange formats designed for use in specific related
areas and to provide linking mechanisms among these formats. In
fact, the TEI work group had originally hoped to achieve this
kind of linkage as a result of the work done by the lexicography
group in that project. Unfortunately, the counterpart TEI lexicography
group failed to resolve internal differences in their own discipline
and returned two conflicting DTD fragments to the TEI central
committee, at which point efforts to coordinate DTDs between terminology
and lexicography were regrettably abandoned.
The first prerequisite for a renewed attempt to coordinate
terminological and lexicographical interchange formats is
that a comparable lexicography group develop a format that is
based on the general SGML approach and that reflects the level
of sophistication that MARTIF has reached over the years that
it has been under development. Once this requirement has been
met, it will be possible to design an integrated framework within
which the exchange of information among lexicographical, terminological,
and other approaches to linguistic information processing could
take place. Initial steps have been taken to design such a uniform
framework, and cooperation has begun with the EU-supported OTELO
project, as well as with the MARCLIF (Machine-readable Conceptual
and Lexicographical Interchange Format) project being conducted
by the International Association for Machine Translation (IAMT).
Some critics have questioned the idea of using SGML as a language
for expressing terminological data structures unless the SGML
DTD is accompanied by a conceptual data model. Although MARTIF
was originally developed without using this methodology, there
is a commercial endeavor [CMR-TermSoft RELTEF] to develop
a relational database that parallels MARTIF, and this relational
database is designed according to a conceptual data model consisting
of an entity-relationship diagram. This model is designed to address
issues arising in the environment of the MARCLIF framework.