BIK Terminology—

Solving the terminology puzzle, one posting at a time

  • Author

    Barbara Inge Karsch - Terminology Consulting and Training

  • Images

    Bear cub by Reiner Karsch


Archive for the ‘Setting up entries’ Category

Dear readers!

Posted by Barbara Inge Karsch on September 28, 2012

Thank you very much for your positive feedback while I was busy with things like Windows 8 terminology, teaching at NYU, attending TKE and the ISO meetings in Madrid, and doing webinars. During one of the webinars, we didn’t get around to all questions. I will be addressing some of these here now.

Photos by Ute Karsch

Question: As you add terminology into your database, you might not remember that you have already entered some word that is a synonym. So, might you not end up with a different ID for 2 synonyms?

Answer: Yes, that is a very common scenario, and everyone setting up terminology entries faces it. We do our best to enter terms and names in canonical form in order to find them again and to avoid creating duplicates. So, we document, say, operating system and not Operating Systems, or we enter purge, and not to purge or purged in the database. Even if we were good about the form of our terms, we might not remember the meaning of all entries created and thus willy-nilly create doublettes in our database. Oftentimes, we create them because we are not aware that one entry is a view onto a concept from one angle and a second entry might present the same concept from another angle, similar to these two pictures of the same flower.

Here are a few thoughts on what might help you avoid duplicate entries:

  • Start out by specifying the subject field in your database. It will help you narrow down the concept for which you are about to create an entry. You might do a search on the subject field and see what concepts you defined at an earlier time. Sometimes that helps trigger your memory.
  • As you narrow down the subject field and glance through some of the existing definitions, you might recognize an existing concept as the one you are about to work on.

If you set up a doublette anyway—and it is bound to happen—you might find it later in one of the following ways and eradicate it:

  • Export your database into a spreadsheet program and do a quick QA on your entries. In a spreadsheet, such as Excel, you can sort each column. If there are true doublettes, you might have started their definitions with the same superordinate, so sorting by definition lines these entries up next to each other.
  • If you don’t have time for QA, I would simply wait until you notice a doublette while you are using your database and take care of it then. The damage in databases with lots of languages attached to a source language entry is bigger, but there are usually also more people working in the system, so errors are identified quickly. For the freelance translator, a doublette here and there is not as costly and it is also eliminated quickly once identified.
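The spreadsheet check from the first bullet can also be scripted. Here is a minimal sketch in Python, assuming the export yields rows with hypothetical `term` and `definition` fields; the sample definitions are invented for illustration. It sorts the entries by definition and pairs up neighbors whose definitions open with the same word, which is likely to be the superordinate. The hits are only candidates for a human to review, not confirmed doublettes.

```python
ARTICLES = {"a", "an", "the"}

def opening_word(definition):
    """Return the first word of a definition, lowercased, skipping articles."""
    for word in definition.lower().split():
        word = word.strip(".,;:()")
        if word and word not in ARTICLES:
            return word
    return ""

def candidate_doublettes(entries):
    """Sort entries by definition and pair neighbors that open with the same word."""
    ordered = sorted(
        entries,
        key=lambda e: (opening_word(e["definition"]), e["definition"].lower()),
    )
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if opening_word(left["definition"]) == opening_word(right["definition"]):
            pairs.append((left["term"], right["term"]))
    return pairs

# Hypothetical export rows; field names and definitions are assumptions.
entries = [
    {"term": "automated teller machine",
     "definition": "A machine that dispenses cash to bank customers."},
    {"term": "purge",
     "definition": "To remove data permanently."},
    {"term": "electronic cash machine",
     "definition": "A machine for withdrawing cash from a bank account."},
    {"term": "operating system",
     "definition": "Software that manages computer hardware."},
]

for pair in candidate_doublettes(entries):
    print(pair)
```

With these sample rows, only the two cash machine entries are paired, because both definitions open with "machine". A check like this misses doublettes whose definitions start with different superordinates, so it complements a careful search before entry creation rather than replacing it.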

Developers of terminology management systems might eventually get to a point where maintenance functionality becomes part of the out-of-the-box program. At Microsoft, a colleague worked on an algorithm that helped us identify duplicates. The project was not completed when I left the corporate world, but a first test showed that the noise the program identified was not overwhelming. So, there is hope that with increasing demand for clean terminological and conceptual data such functionality becomes standard in off-the-shelf TMSs. In the meantime, stick with best practices when documenting your terms and names and use the database.

Posted in Maintaining a database, Setting up entries, Terminology 101 | Tagged: | 8 Comments »

Doublettes—such a pretty term, yet such a bad concept

Posted by Barbara Inge Karsch on June 10, 2011

Sooner rather than later terminologists need to think about database maintenance. Initially, with few entries in the database, data integrity is easy to ensure: In fact, the terminologist might remember just about any entry they ever compiled; my Italian colleague, Licia, remembered just about any entry she ever opened in the database. But even the best human brains will eventually ‘run out of memory’ and blunders will happen. One of these blunders is the so-called doublette.

According to ISO TR 26162, a doublette is a “terminological entry that describes the same concept as another entry.” Sometimes these entries are also referred to as duplicates or duplicate entries, but the technical term in standards is doublette. It is important to note that homonyms do not equal doublettes. In other words, two terms that are spelt the same way and that are in two separate entries may refer to the same concept and may therefore be doublettes. But they may also justifiably be listed in separate entries, because they denote slightly or completely different concepts.

As an example, I deliberately set up doublettes in i-Term, a terminology management system developed by DANTERM: The terms automated teller machine and electronic cash machine can be considered synonyms and should be listed in one terminological entry. Below you can see that automated teller machine and its abbreviated form ATM have one definition and definition source, while electronic cash machine and its abbreviated form, cash machine, are listed in a separate entry with another, yet similar definition and its definition source. During database maintenance, these entries should be consolidated into one terminological entry with all its synonyms.

[Screenshots of the two i-Term entries]

It is much easier to detect homographs that turn out to be doublettes. Rather, it should be easier to avoid them in the first place (after all, every new entry in a database starts with a search of the term denoting the concept; if it already exists with the same spelling, it would be a hit). Here are ‘homograph doublettes’ from the Microsoft Language Portal. While we can’t see the ID, the definition shows pretty clearly that the two entries are describing the same concept.

[Screenshot of the two entries in the Microsoft Language Portal]

Doublettes happen, particularly in settings where more than one terminologist adds and approves entries in a database. But even if one terminologist approves all new concepts, s/he cannot guarantee that a database remains free of doublettes. The right combination of skills, processes and tool support can help limit the number, though.

Posted in iTerm, Maintaining a database, Microsoft Language Portal, Process, Setting up entries | Tagged: , | 4 Comments »

Gerunds, oh how we love them

Posted by Barbara Inge Karsch on December 9, 2010

Well, actually we do. They are an important part of the English language. But more often than not they get used incorrectly in writing and, what’s worse, documented incorrectly in terminology entries.

I have been asked at least a few times by content publishers whether they can use gerunds or whether a gerund would present a problem for translators. It doesn’t present a problem for translators, since translators do not work word for word or term for term (see this earlier posting). They must understand the meaning of the semantic unit in the source text and then render the same meaning in the target language, no matter the part of speech they choose.

It is a different issue with machine translation. There is quite a bit of research in this area of natural language processing. Gerunds, for example, don’t exist in the German language (see Interaction between syntax and semantics: The case of gerund translation ). But more importantly, gerunds can express multiple meanings and function as verbs or nouns (see this article by Rafael Guzmán). Therefore, human translators have to make choices. They are capable of that. Machines are not. If you are writing for machine translation and your style guide tells you to avoid gerunds, you should comply.

Because gerunds express multiple meanings, they are also interesting for those of us with a terminologist function. I believe they are the single biggest source of mistakes I have seen in my 14 years as a corporate terminologist. Here are a few examples.

Example 1:

[image]

Example 2:

[image]

In Example 1, it is clear that logging refers to a process. The first instance could be part of the name of a functionality, which, as the first instance in Example 2 shows, can be activated. In the second instance (“unlike logging”), it is not quite clear what is meant. I have seen logging used as a synonym of the noun log, i.e. the result of logging. But here, it probably refers to the process or the functionality.

It matters what the term refers to; it matters to the consumer of the text, the translator, who is really the most critical reader, and it matters when the concepts are entered in the terminology database. It would probably be clearest if the following terms were documented:

  • logging = The process of recording actions that take place on a computer, network, or system. (Microsoft Language Portal)
  • logging; log = A record of transactions or events that take place within an IT managed environment. (Microsoft Language Portal)
  • Process Monitoring logging = The functionality that allows users to …(BIK based on context)
  • log = To record transactions or events that take place on a computer, network or system. (BIK based on Microsoft Language Portal).

Another example of an –ing form that has caused confusion in the past is the term backflushing. A colleague insisted that it be documented as a verb. To backflush, the backflushing method or a backflush are curious terms, no doubt (for an explanation see Inventoryos.com). But we still must list them in canonical form and with the appropriate definition. Why? Well, for one thing, anything less than precise causes more harm than good even in a monolingual environment. But what is a translator or target terminologist to do with an entry where the term indicates that it is an adjective, the definition starts with “A method that…”, and the Part of Speech says Verb? Hopefully, they complain, but if they don’t and simply make a decision, it’ll lead to errors. Human translators might just be confused, but the MT engine won’t recognize the mistake.

So, the answer to the question: “Can I use gerunds?” is, yes, you can. But be sure you know exactly what the gerund stands for. The process or the result? If it is used as a verb, document it in its canonical form. Otherwise, there is trouble.

Posted in Content publisher, Interesting terms, Machine translation, Setting up entries, Translator | Tagged: | 4 Comments »

What do we do with terms?

Posted by Barbara Inge Karsch on September 23, 2010

We collect or extract terms. We research their underlying concepts. We document terms, and approve or fail them. We might research their target language equivalents. We distribute them and their terminological entries. We use them. Whatever you do with terms, don’t translate them.

A few years ago, Maria Teresa Cabré rightly criticized Microsoft Terminology Studio when a colleague showed it at a conference, because the UI tab for target language entries said “Term Translations.” And if you talk to Klaus-Dirk Schmitz about translating terminology, you will surely be set straight. I am absolutely with my respected colleagues.

If we translate terms, why don’t we pay $.15 per term, as we do for translation work? At TKE in Dublin, Kara Warburton quoted a study conducted by Guy Champagne Inc. for the Canadian government in 2004. They found that between 4 and 6% of the words in a text need to be researched; on average, it takes about 20 min to research a term. That is why we can’t pay USD .15 per term.

Note also that we pay USD .15 per word and not per term. Terms are the signs that express the most complex ideas (concepts) in our technical documents. They carry a lot more meaning than the lexical units called words that connect them.

Let’s assume we are a buyer of translation and terminology services. Here is what we can expect:

                                                         Translation            Terminology work
Number of units a person can generally process per day   Ca. 2000 per day       Ca. 20 to 50 entries
Cost for the company                                     Ca. USD .25 per word   Ca. USD 55 per hour
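To put these figures side by side, here is a rough back-of-the-envelope calculation in Python. It assumes an eight-hour working day, which the comparison above does not state; the other numbers are the hourly rate and daily throughput for terminology work and the 20 minutes of research per term from the study quoted earlier.

```python
HOURLY_RATE_USD = 55         # terminology work, cost per hour
ENTRIES_PER_DAY = (20, 50)   # terminology work, daily throughput range
MINUTES_PER_TERM = 20        # average research time per term (quoted study)
HOURS_PER_DAY = 8            # assumption, not stated in the source

def cost_per_entry(entries_per_day):
    """Cost of one terminological entry at a given daily throughput."""
    return HOURLY_RATE_USD * HOURS_PER_DAY / entries_per_day

# Cost of the research time alone for a single term.
research_cost = HOURLY_RATE_USD * MINUTES_PER_TERM / 60

slow, fast = ENTRIES_PER_DAY
print(f"Per entry at {slow} entries/day: USD {cost_per_entry(slow):.2f}")
print(f"Per entry at {fast} entries/day: USD {cost_per_entry(fast):.2f}")
print(f"Research time per term:      USD {research_cost:.2f}")
```

Under these assumptions an entry costs between USD 8.80 and USD 22.00, and the research time for one term alone comes to about USD 18.33. Either way, the figure is far above a per-word translation rate, which is the point of the comparison.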

At the end of the translation process, we have a translated text which in this form can only be used once. Of course, it might become part of a translation memory (TM) and be reused. But reuse can only happen if the second product using the TM serves the same readership; if the purpose of the text is the same; if someone analyses the new source text with the correct TM, etc. And even then, it would be a good idea to proofread the outcome thoroughly.

The terminological entry, on the other hand, should be set up to serve the present purpose (e.g. support a translator during the translation of a particular project). But it might also be set up to allow a support engineer in a branch office to look up the definition of the target equivalent. Or it might enable a technical writer in another product unit to check on the correct and standardized spelling of the source term.

I am not sure that this distinction is clear to all translators who sell terminology services. You might get away with translating terms a few times. But eventually your client’s customers will indicate that there is something wrong, that the product is hard to understand or operate because it is not in their vernacular.

There are much more scientific reasons why we should not confuse translation and terminology work; while related and often (but not always) coincidental, these tasks have different objectives. More about that some other time. Today, let me appeal to you whose job it is to support clear and precise communication to reserve the verb “to translate” for the transfer of “textual substance in one language to create textual substance in another language” as Juan Sager puts it in the Routledge Encyclopedia of Translation Studies. If we can be precise in talking about our own field, we should do so.

Posted in Events, Interesting terms, Microsoft Terminology Studio, Researching terms, Setting up entries, Terminology of terminology | Tagged: , | 4 Comments »

Quantity AND Quality

Posted by Barbara Inge Karsch on September 16, 2010

In “If quantity matters, what about quality?” I promised to shed some light on how to achieve quantity without skimping on quality. In knowledge management, it boils down to solid processes supported by reliable and appropriate tools and executed by skilled people. Let me drill down on some aspects of setting up processes and tools to support quantity and quality.

If you cannot afford to build up an encyclopedia for your company (and who can?), select metadata carefully. The number and types of data categories (DCs), as discussed in “The Year of Standards,” can make a big difference. That is not to say use fewer. Use the right ones for your environment.

Along those lines, hide data categories or values where they don’t make sense. For example, don’t display Grammatical Gender when Language=English; invariably a terminologist will accidentally select a gender, and even if only a few users wonder why that is or note the error but can’t find a way to alert you to it, too much time is wasted. Similarly, hide Grammatical Number when Part of Speech=Verb, and so on.
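As an illustration, the two hiding rules just mentioned could be expressed as simple conditions on the entry. This is a minimal sketch in Python, not the behavior of any particular terminology management system; the list of data categories and the function name are assumptions made for the example.

```python
# Hypothetical set of data categories shown in an entry form.
ALL_CATEGORIES = [
    "Term",
    "Language",
    "Part of Speech",
    "Grammatical Gender",
    "Grammatical Number",
    "Definition",
]

# Each rule says when a data category should be hidden.
HIDE_RULES = {
    "Grammatical Gender": lambda entry: entry.get("Language") == "English",
    "Grammatical Number": lambda entry: entry.get("Part of Speech") == "Verb",
}

def visible_categories(entry):
    """Return the data categories that make sense for this entry."""
    visible = []
    for category in ALL_CATEGORIES:
        rule = HIDE_RULES.get(category)
        if rule is None or not rule(entry):
            visible.append(category)
    return visible

english_verb = {"Language": "English", "Part of Speech": "Verb"}
german_noun = {"Language": "German", "Part of Speech": "Noun"}

print(visible_categories(english_verb))
print(visible_categories(german_noun))
```

Keeping the rules in one table means a new condition can be added without touching the form logic, and the terminologist never sees a field that cannot be filled in meaningfully.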

Plan dependent data, such as product and version, carefully. For example, if versions for all your products are numbered the same way (e.g. 1, 2, 3,..), it might be easiest to have two related tables. If most of your versions have very different version names, you could have one table that lists product and version together (e.g. Windows 95, Windows 2000, Windows XP, …); it makes information retrieval slightly simpler, especially for non-expert users. Or maybe you cannot afford or don’t need to manage down to the version level because you are in a highly dynamic environment.

[Photo: Anton by Lee Dennis]
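The two table layouts for product and version can be sketched with Python's built-in sqlite3 module. The table names, column names and the product name ExampleApp are hypothetical; a real terminology database would link these tables to its entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Layout A: two related tables, for uniformly numbered versions.
    CREATE TABLE product (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE version (
        id         INTEGER PRIMARY KEY,
        product_id INTEGER NOT NULL REFERENCES product(id),
        number     TEXT NOT NULL,
        UNIQUE (product_id, number)
    );
    -- Layout B: one flat table, for irregular version names.
    CREATE TABLE product_version (
        id    INTEGER PRIMARY KEY,
        label TEXT NOT NULL UNIQUE
    );
""")

conn.execute("INSERT INTO product (id, name) VALUES (1, 'ExampleApp')")
conn.executemany(
    "INSERT INTO version (product_id, number) VALUES (1, ?)",
    [("1",), ("2",), ("3",)],
)
conn.executemany(
    "INSERT INTO product_version (label) VALUES (?)",
    [("Windows 95",), ("Windows 2000",), ("Windows XP",)],
)

# Layout A needs a join to show product and version together.
layout_a = [row[0] for row in conn.execute(
    "SELECT p.name || ' ' || v.number FROM version v "
    "JOIN product p ON p.id = v.product_id ORDER BY v.number"
)]
# Layout B is a single lookup, which is simpler for non-expert users.
layout_b = [row[0] for row in conn.execute(
    "SELECT label FROM product_version ORDER BY id"
)]

print(layout_a)
print(layout_b)
```

The join in layout A is the price for being able to filter by product or by version separately; layout B trades that flexibility for a pick list that anyone can read.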

Enforce mandatory data when a terminologist releases (approves or fails) an entry. If you have decided that five out of your ten DCs are mandatory, let the tool help terminologists by not letting them get away with a shortcut or an oversight.
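A release check of this kind might look like the following Python sketch. The five mandatory data categories, the status values and the function names are assumptions for illustration; which DCs are mandatory depends on your environment.

```python
# Hypothetical choice of five mandatory data categories.
MANDATORY = ["Term", "Language", "Part of Speech", "Definition", "Definition Source"]

class IncompleteEntryError(ValueError):
    """Raised when an entry is released without all mandatory data."""

def missing_mandatory(entry):
    """List the mandatory data categories that are absent or blank."""
    return [dc for dc in MANDATORY if not str(entry.get(dc) or "").strip()]

def release(entry, status):
    """Approve or fail an entry, but only if it is complete."""
    if status not in {"approved", "failed"}:
        raise ValueError(f"Unknown release status: {status}")
    missing = missing_mandatory(entry)
    if missing:
        raise IncompleteEntryError("Missing mandatory data: " + ", ".join(missing))
    return {**entry, "Status": status}

draft = {"Term": "purge", "Language": "English"}
print(missing_mandatory(draft))

complete = {
    "Term": "purge",
    "Language": "English",
    "Part of Speech": "Verb",
    "Definition": "To remove data permanently.",  # invented for illustration
    "Definition Source": "BIK",
}
print(release(complete, "approved")["Status"])
```

The check runs only at release time, so pre-released entries can stay incomplete while they are being researched.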

It is obviously not an easy task to anticipate what you need in your environment. But well-designed tools and processes support high quality AND quantity and therefore boost your return on investment.

On a personal note, Anton is exhausted with anticipation of our big upcoming event: He will be the ring bearer in our wedding this weekend.

Posted in Advanced terminology topics, Designing a terminology database, Producing quality, Producing quantity, Return on investment, Setting up entries, Terminologist, Tool | Tagged: , , | 1 Comment »

If quantity matters, what about quality?

Posted by Barbara Inge Karsch on September 9, 2010

Linguistic quality is one of the persistent puzzles in our industry, as it is such an elusive concept. It doesn’t have to be, though. But if only quantity matters to you, you are on your way to ruining your company’s linguistic assets.

Because terminology management is not an end in itself, let’s start with the quality objective that users of a prescriptive terminology database are after. Most users access terminological data for support with monolingual, multilingual, manual or automated authoring processes. The outcomes of these processes are texts of some nature. The ultimate quality goal that terminology management supports with regard to these texts could be defined as “the text must contain correct terms used consistently.” In fact, Sue Ellen Wright “concludes that the terminology that makes up the text comprises that aspect of the text that poses the greatest risk for failure.” (Handbook of Terminology Management)

In order to get to this quality goal, other quality goals must precede it. For one, the database must contain correct terminological entries; and second, there must be integrity between the different entries, i.e. entries in the database must not contradict each other.

In order to attain these two goals, others must be met in their turn: The data values within the entries must contain correct information. And the entries must be complete, i.e. no mandatory data is missing. I call this the mandate to release only correct and complete entries (of course, a prescriptive database may contain pre-released entries that don’t meet these criteria yet).

Let’s see what that means for terminologists who are responsible for setting up, approving or releasing a correct and complete entry. They need to be able to:

  • Do research.
  • Transfer the result of the research into the data categories correctly.
  • Assure integrity between entries.
  • Approve only entries that have all the mandatory data.
  • Fill in an optional data category, when necessary.

Let’s leave aside for a moment that we are all human and that we will botch the occasional entry. Can you imagine if instead of doing the above, terminologists were told not to worry about quality? From now on, they would:

  • Stop at 50% of the research or not validate the data already present in the entry.
  • Fill in only some of the mandatory fields.
  • Choose the entry language randomly.
  • Add three or four different designations to the Term field.
  • ….

Do you think that we could meet our number 1 goal of correct and consistent terminology in texts? No. Instead a text in the source language would contain inconsistencies, spelling variations, and probably errors. Translations performed by translators would contain the same, or possibly worse, problems. Machine translations would be consistent, but they would consistently contain multiple target terms for one source term, etc. The translation memory would propagate issues to other texts within the same product, the next version of the product, to texts for other products, and so on. Some writers and translators would not use the terminology database anymore, which means that fewer errors are challenged and fixed. Others would argue that they must use the database; after all, it is prescriptive.

Unreliable entries are poison in the system. With a lax attitude towards quality, you can do more harm than good. Does that mean that you have to invest hours and hours in your entries? Absolutely not. We’ll get to some measures in a later posting. But if you can’t afford correct and complete entries, don’t waste your money on terminology management.

Posted in Advanced terminology topics, Producing quality, Producing quantity, Return on investment, Setting up entries, Terminologist, Terminology methods, Terminology principles | Tagged: , , | 1 Comment »

 