Convertible UTX must conform to the UTX-Simple 1.00 specification (with one exception, noted below) and these guidelines. A sample file, data.utx, is included in the Samples.
TBX-Glossary and convert_glossary were developed for UTX-Simple 1.00. Since then, the UTX standard has advanced to version 1.11, gaining certain features which are not compatible with our software. To prevent incompatibilities, we describe them here:
First, convert_glossary is not prepared to bypass additional, descriptive lines in the UTX header. It must find the column definitions in the second and last header line. However, additional descriptions can be included as a glossary-wide note, as described below.
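The two-line header requirement can be checked before conversion. The following is a minimal sketch, not the converter's own code; it assumes that header lines are the leading lines beginning with '#' and that column names are tab-separated, and the function name column_definitions is hypothetical.

```python
def column_definitions(lines):
    """Return the column names from the second (and last) header line.

    Assumption: header lines are the leading lines that start with '#',
    and the column names on the second one are tab-separated.
    """
    header = []
    for line in lines:
        if not line.startswith("#"):
            break
        header.append(line)
    if len(header) != 2:
        # Extra descriptive header lines are what convert_glossary cannot skip.
        raise ValueError("expected exactly 2 header lines, found %d" % len(header))
    return header[1].lstrip("#").split("\t")
```

A file that fails this check would need its extra descriptive lines moved into the glossary-wide note before conversion.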
Second, UTX 1.11 provides for bidirectionality, grouping by concept ID, and term status. None of these data categories are convertible according to our current design, which was based on UTX-S 1.00, and all of this information will be lost in conversion. Entries are converted as though they were all separate concepts, all monodirectional, and all approved terms. In order to produce a file with no forbidden terms, with only approved terms, etc., one must pre-filter the UTX.
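Pre-filtering can be done with a few lines of script. The sketch below is only an illustration: the function name prefilter is hypothetical, and the name of the status column and its permitted values are deliberately left as parameters, because they must be taken from the UTX 1.11 file being filtered rather than from this document.

```python
def prefilter(columns, rows, status_column, keep_values):
    """Keep only the rows whose status cell holds one of keep_values.

    columns       -- list of column names from the last header line
    rows          -- list of rows, each a list of cell strings
    status_column -- name of the column to filter on (supplied by the caller)
    keep_values   -- set of cell values to keep (supplied by the caller)
    """
    idx = columns.index(status_column)
    kept = []
    for row in rows:
        # A row shorter than the header is treated as having a blank cell here.
        value = row[idx] if idx < len(row) else ""
        if value in keep_values:
            kept.append(row)
    return kept
```

The filtered rows can then be written back out as UTX-Simple 1.00 and passed to the converter.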
In the remainder of this documentation, UTX refers to UTX-Simple 1.00.
This document also describes our quick-input format, illustrated by another sample, data.txt. This format is identical to UTX, except (a) that quick input files may omit any UTX element not mentioned below, even if it is mandatory in proper UTX, and (b) that quick input provides several conveniences for entering the mandatory part-of-speech data.
Although the UTX standard specifies a carriage-return/linefeed sequence for its end-of-line code (conventional in Windows systems), this converter will also accept files with linefeed by itself (conventional in Unix-like systems, including Mac OS X). The converter does not waive UTX's prohibition of files starting with a byte-order mark.
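The end-of-line and byte-order-mark rules can be illustrated with a short pre-check. This is a sketch under stated assumptions, not the converter's implementation: it assumes the file is UTF-8 encoded, and the function name read_utx_lines is hypothetical.

```python
import codecs

def read_utx_lines(raw):
    """Decode raw bytes and split them into lines, accepting CRLF or bare LF.

    Assumption: the file is UTF-8 encoded.
    """
    # A leading byte-order mark is prohibited, so reject it rather than strip it.
    if raw.startswith(codecs.BOM_UTF8):
        raise ValueError("file starts with a byte-order mark")
    text = raw.decode("utf-8")
    # Treat CRLF and bare LF alike by normalising to LF before splitting.
    lines = text.replace("\r\n", "\n").split("\n")
    # Drop the empty string left behind by a final end-of-line code.
    if lines and lines[-1] == "":
        lines.pop()
    return lines
```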
Source and target languages are expressed in the first header line, as in all UTX.
Subject field is expressed in the first header line, as an 'optional' field indicated by the key word 'subject'. It is mandatory for convertibility.
A glossary-wide note is expressed in the first header line, as an optional field with the key word 'comment'.
Source and target terms are expressed as in all UTX.
In convertible UTX, source part-of-speech is expressed as in all UTX. In the quick-input format, it can be implied: a blank in the src:pos column, or that column's entire absence, indicates that the source term is a noun.
Target part-of-speech is mandatory for convertibility. In convertible UTX, it may be expressed explicitly in a tgt:pos column, or implicitly: A blank in the tgt:pos column, or that column's absence, indicates that the part of speech is the same as in the source language. In the quick-input format, a third option joins these two: The 'note' field can contain the tag 'tgt:pos:' followed by a part of speech. This special note formatting will override the implicit same-as-source assumption (but will not override an explicit tgt:pos in its proper place). This is designed to allow the quick-input user to avoid keying a tgt:pos column; implicit same-as-source covers the most common case, and special note formatting covers the exceptions. (The tgt:pos portion of the note field is removed before the note is processed further.)
The convertible part-of-speech values are adjective, adverb, noun, properNoun, and verb. Sentence is not a convertible part of speech.
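The part-of-speech rules above can be summarized as a small resolution function. This is a minimal sketch of the precedence described in the text, not the converter's actual code. The function name resolve_pos and the exact pattern used to recognize the 'tgt:pos:' tag (optional spaces, then one word) are assumptions. The sketch also removes the tag from the note whether or not it ends up being used, which is one reading of the parenthetical remark above.

```python
import re

CONVERTIBLE_POS = {"adjective", "adverb", "noun", "properNoun", "verb"}

# Assumed tag syntax: 'tgt:pos:' followed by optional spaces and a single word.
_TGT_POS_TAG = re.compile(r"tgt:pos:\s*(\w+)")

def resolve_pos(src_pos="", tgt_pos="", note="", quick_input=False):
    """Return (source_pos, target_pos, remaining_note) for one entry."""
    src = src_pos.strip()
    if not src and quick_input:
        # Quick input: a blank or absent src:pos column implies a noun.
        src = "noun"

    tgt = tgt_pos.strip()
    if quick_input:
        match = _TGT_POS_TAG.search(note)
        if match:
            # Remove the tag from the note before the note is processed further.
            note = (note[:match.start()] + note[match.end():]).strip()
            # The tag never overrides an explicit tgt:pos column value.
            if not tgt:
                tgt = match.group(1)
    if not tgt:
        # Implicit case: same part of speech as the source term.
        tgt = src

    for pos in (src, tgt):
        if pos not in CONVERTIBLE_POS:
            raise ValueError("part of speech %r is not convertible" % pos)
    return src, tgt, note
```

For instance, a quick-input entry with no part-of-speech columns and a note containing 'tgt:pos: verb' resolves to a noun source term and a verb target term, with the tag removed from the note.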
The remaining data categories (note on an entry (in source language), definition, source of definition, contextual example, and source of contextual example) are convertible but not mandatory. They must appear in columns headed by the correct abbreviations, as seen in the sample file. Per the UTX standard, columns after the mandatory three may appear in any order so long as they are consistent within a file.
When UTX is selected as the output format, the converter will produce
files conforming to the UTX-Simple 1.00 specification and the above
requirements, with this exception: Language tags in the RFC 4646 format
will neither be expanded nor reduced to conform to the narrower xx-XX
format shown in the UTX spec. This may be done after conversion if desired.