Setting up entries

Dear readers!

September 28, 2012 by Barbara Inge Karsch

Thank you very much for your positive feedback while I was busy with things like Windows 8 terminology, teaching at NYU, attending TKE and the ISO meetings in Madrid, and doing webinars. During one of the webinars, we didn’t get around to all of the questions. I will address some of them here.

Question: As you add terminology into your database, you might not remember that you have already entered some word that is a synonym. So, might you not end up with a different ID for 2 synonyms?

Answer: Yes, that scenario is very common, and everyone who sets up terminology entries faces it. We do our best to enter terms and names in canonical form in order to find them again and to avoid creating duplicates. So, we document, say, operating system and not Operating Systems, or we enter purge, and not to purge or purged, in the database. Even if we are careful about the form of our terms, we might not remember the meaning of every entry we created and thus willy-nilly create doublettes in our database. Oftentimes, we create them because we are not aware that one entry is a view onto a concept from one angle while a second entry presents the same concept from another angle, similar to these two pictures of the same flower.

Here are a few thoughts on what might help you avoid duplicate entries:

Start out by specifying the subject field in your database. It will help you narrow down the concept for which you are about to create an entry. You might do a search on the subject field and see what concepts you defined at an earlier time. Sometimes that helps trigger your memory.
As you narrow down the subject field and glance through some of the existing definitions, you might recognize an existing concept as the one you are about to work on.

If you set up a doublette anyway—and it is bound to happen—you might find it later in one of the following ways and eradicate it:

Export your database into a spreadsheet program and do a quick QA on your entries. In a spreadsheet program, such as Excel, you can sort each column. If there are true doublettes, you might have started their definitions with the same superordinate, so sorting by definition lines the entries up next to each other.
If you don’t have time for QA, I would simply wait until you notice a doublette while you are using your database and take care of it then. The damage in databases with lots of languages attached to a source language entry is bigger, but there are usually also more people working in the system, so errors are identified quickly. For the freelance translator, a doublette here and there is not as costly, and it is also eliminated quickly once identified.
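The spreadsheet check described above can also be scripted. The following is only a rough sketch: the column names (id, term, definition), the sample definitions and the three-word threshold are assumptions made up for illustration and do not come from any particular terminology management system. It sorts the exported entries by definition and pairs up neighbors whose definitions begin with the same few words.

```python
import csv
import io

def candidate_doublettes(rows, prefix_words=3):
    """Sort entries by definition and pair up neighbours whose definitions
    start with the same few words (often the same superordinate)."""
    def prefix(row):
        words = row["definition"].lower().split()
        return " ".join(words[:prefix_words])

    # Sorting by definition is what the spreadsheet sort does by hand.
    ordered = sorted(rows, key=lambda r: r["definition"].lower())
    pairs = []
    for prev, curr in zip(ordered, ordered[1:]):
        if prefix(prev) == prefix(curr):
            pairs.append((prev["id"], curr["id"]))
    return pairs

# Hypothetical export; IDs and definitions are invented for this sketch.
EXPORT = """id,term,definition
101,automated teller machine,A machine that lets bank customers withdraw cash without a teller.
102,purge,To remove data permanently from a system.
103,electronic cash machine,A machine that lets customers take out cash from their account.
"""

rows = list(csv.DictReader(io.StringIO(EXPORT)))
pairs = candidate_doublettes(rows)
print(pairs)
```

The pairs are only candidates. A terminologist still has to read both definitions and decide whether the entries really describe the same concept.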

Developers of terminology management systems might eventually get to a point where maintenance functionality becomes part of the out-of-the-box program. At Microsoft, a colleague worked on an algorithm that helped us identify duplicates. The project was not completed when I left the corporate world, but a first test showed that the noise the program identified was not overwhelming. So, there is hope that with increasing demand for clean terminological and conceptual data such functionality becomes standard in off-the-shelf TMSs. In the meantime, stick with best practices when documenting your terms and names and use the database.

Doublettes—such a pretty term, yet such a bad concept

June 10, 2011 by Barbara Inge Karsch

Sooner rather than later, terminologists need to think about database maintenance. Initially, with few entries in the database, data integrity is easy to guarantee: The terminologist might remember every entry they ever compiled; my Italian colleague, Licia, remembered just about any entry she ever opened in the database. But even the best human brains will eventually ‘run out of memory’ and blunders will happen. One of these blunders is the so-called doublette.

According to ISO TR 26162, a doublette is a “terminological entry that describes the same concept as another entry.” Sometimes these entries are also referred to as duplicates or duplicate entries, but the technical term in standards is doublette. It is important to note that homonyms do not equal doublettes. In other words, two terms that are spelt the same way and that are in two separate entries may refer to the same concept and may therefore be doublettes. But they may also justifiably be listed in separate entries, because they denote slightly or completely different concepts.

As an example, I deliberately set up doublettes in i-Term, a terminology management system developed by DANTERM: The terms automated teller machine and electronic cash machine can be considered synonyms and should be listed in one terminological entry. Below you can see that automated teller machine and its abbreviated form ATM have one definition and definition source, while electronic cash machine and its abbreviated form, cash machine, are listed in a separate entry with another, yet similar definition and its definition source. During database maintenance, these entries should be consolidated into one terminological entry with all its synonyms.

It is much easier to detect homographs that turn out to be doublettes. Rather, it should be easier to avoid them in the first place: after all, every new entry in a database starts with a search for the term denoting the concept; if the term already exists with the same spelling, the search returns a hit. Here are ‘homograph doublettes’ from the Microsoft Language Portal. While we can’t see the ID, the definition shows pretty clearly that the two entries are describing the same concept.
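As a rough illustration of why homograph doublettes are the easier case, here is a minimal sketch that groups entries by spelling and reports every spelling that occurs under more than one entry ID. The (ID, term) pairs and the normalization (trimming whitespace, ignoring case) are assumptions for the example, not the behavior of any specific tool.

```python
from collections import defaultdict

def homograph_candidates(entries):
    """Return spellings that appear under more than one entry ID.

    `entries` is an iterable of (entry_id, term) pairs.
    """
    by_term = defaultdict(set)
    for entry_id, term in entries:
        # Ignore case and surrounding whitespace when comparing spellings.
        by_term[term.strip().casefold()].add(entry_id)
    return {term: sorted(ids) for term, ids in by_term.items() if len(ids) > 1}

# Hypothetical entries; the IDs are invented for this sketch.
entries = [(1, "log"), (2, "Log "), (3, "logging")]
hits = homograph_candidates(entries)
print(hits)
```

Because homonyms may justifiably sit in separate entries, each hit still needs a look at the definitions before anything is merged.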

Doublettes happen, particularly in settings where more than one terminologist adds and approves entries in a database. But even if one terminologist approves all new concepts, s/he cannot guarantee that a database remains free of doublettes. The right combination of skills, processes and tool support can help limit the number, though.

Gerunds, oh how we love them

December 9, 2010 by Barbara Inge Karsch

Well, actually we do. They are an important part of the English language. But more often than not, they get used incorrectly in writing and, what’s worse, documented incorrectly in terminology entries.

I have been asked at least a few times by content publishers whether they can use gerunds or whether a gerund would present a problem for translators. It doesn’t present a problem for translators, since translators do not work word for word or term for term (see this earlier posting). They must understand the meaning of the semantic unit in the source text and then render the same meaning in the target language, no matter the part of speech they choose.

It is a different issue with machine translation. There is quite a bit of research in this area of natural language processing. Gerunds, for example, don’t exist in the German language (see Interaction between syntax and semantics: The case of gerund translation). But more importantly, gerunds can express multiple meanings and function as verbs or nouns (see this article by Rafael Guzmán). Therefore, human translators have to make choices. They are capable of that. Machines are not. If you are writing for machine translation and your style guide tells you to avoid gerunds, you should comply.

Because gerunds express multiple meanings, they are also interesting for those of us with a terminologist function. I believe they are the single biggest source of mistakes I have seen in my 14 years as corporate terminologist. Here are a few examples.

Example 1:	Example 2:

In Example 1, it is clear that logging refers to a process. The first instance could be part of the name of a functionality, which, as the first instance in Example 2 shows, can be activated. In the second instance (“unlike logging”), it is not quite clear what is meant. I have seen logging used as a synonym of the noun log, i.e. the result of logging. But here, it probably refers to the process or the functionality.

It matters what the term refers to; it matters to the consumer of the text, the translator, who is really the most critical reader, and it matters when the concepts are entered in the terminology database. It would probably be clearest if the following terms were documented:

logging = The process of recording actions that take place on a computer, network, or system. (Microsoft Language Portal)
logging; log = A record of transactions or events that take place within an IT managed environment. (Microsoft Language Portal)
Process Monitoring logging = The functionality that allows users to …(BIK based on context)
log = To record transactions or events that take place on a computer, network or system. (BIK based on Microsoft Language Portal).

Another example of an –ing form that has caused confusion in the past is the term backflushing. A colleague insisted that it be documented as a verb. To backflush, the backflushing method or a backflush are curious terms, no doubt (for an explanation see Inventoryos.com). But we still must list them in canonical form and with the appropriate definition. Why? Well, for one thing, anything less than precise causes more harm than good even in a monolingual environment. But what is a translator or target terminologist to do with an entry where the form of the term suggests an adjective, the definition starts with “A method that…”, and the Part of Speech says Verb? Hopefully, they complain, but if they don’t and simply make a decision, it’ll lead to errors. Human translators might just be confused, but the MT engine won’t recognize the mistake.
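A contradiction like the one in the backflushing entry can be caught with a very simple check. The sketch below is only a heuristic built on an assumed drafting convention, namely that verb definitions start with “To” and noun definitions start with an article, as in the logging examples above. The field names (term, pos, definition) are hypothetical.

```python
def expected_pos(definition):
    """Guess the part of speech from the first word of a definition.

    Assumed convention: verb definitions start with "To", noun
    definitions start with an article. Returns None when unsure.
    """
    words = definition.strip().split()
    if not words:
        return None
    first = words[0].lower()
    if first == "to":
        return "verb"
    if first in {"a", "an", "the"}:
        return "noun"
    return None

def pos_conflict(entry):
    """True if the recorded part of speech contradicts the definition."""
    guess = expected_pos(entry["definition"])
    recorded = entry["pos"].strip().lower()
    return guess is not None and guess != recorded

# Entries modeled on the examples in the text.
backflushing = {"term": "backflushing", "pos": "Verb",
                "definition": "A method that…"}
log_verb = {"term": "log", "pos": "Verb",
            "definition": "To record transactions or events that take "
                          "place on a computer, network or system."}

print(pos_conflict(backflushing), pos_conflict(log_verb))
```

A check like this only flags entries for review. Deciding whether the term, the definition or the Part of Speech is wrong remains the terminologist's job.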

So, the answer to the question: “Can I use gerunds?” is, yes, you can. But be sure you know exactly what the gerund stands for. The process or the result? If it is used as a verb, document it in its canonical form. Otherwise, there is trouble.
