BIK Terminology—

Solving the terminology puzzle, one posting at a time

  • Author

    Barbara Inge Karsch - Terminology Consulting and Training

Archive for the ‘Microsoft Terminology Studio’ Category

What do we do with terms?

Posted by Barbara Inge Karsch on September 23, 2010

We collect or extract terms. We research their underlying concepts. We document terms, and approve or fail them. We might research their target language equivalents. We distribute them and their terminological entries. We use them. Whatever you do with terms, don’t translate them.

A few years ago, Maria Teresa Cabré rightly criticized Microsoft Terminology Studio when a colleague showed it at a conference, because the UI tab for target language entries said “Term Translations.” And if you talk to Klaus-Dirk Schmitz about translating terminology, you will for sure be set straight. I am absolutely with my respected colleagues.

If we translate terms, why don’t we pay USD .15 per term, as we do for translation work? At TKE in Dublin, Kara Warburton quoted a study conducted by Guy Champagne Inc. for the Canadian government in 2004. They found that between 4 and 6% of the words in a text need to be researched; on average, it takes about 20 minutes to research a term. That is why we can’t pay USD .15 per term.

Note also that we pay USD .15 per word and not per term. Terms are the signs that express the most complex ideas (concepts) in our technical documents. They carry a lot more meaning than the lexical units called words that connect them.

Let’s assume we are a buyer of translation and terminology services. Here is what we can expect:

                                 Translation             Terminology work

Number of units a person can     Ca. 2000 words          Ca. 20 to 50 entries
generally process per day

Cost for the company             Ca. USD .25 per word    Ca. USD 55 per hour
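
To make the gap concrete, here is a rough back-of-the-envelope sketch in Python. It uses only figures quoted in this post (about 20 minutes of research per term, 4 to 6% of words needing research, roughly USD 55 per hour for terminology work). The 10,000-word document and the function names are assumptions for illustration only; real rates and research times vary.

```python
# Back-of-the-envelope comparison based on the figures quoted in this post.
WORD_RATE_USD = 0.15       # per-word translation rate mentioned above
HOURLY_RATE_USD = 55.0     # approximate hourly rate for terminology work
RESEARCH_MINUTES = 20      # average research time per term (2004 study)


def research_cost_per_term(hourly_rate=HOURLY_RATE_USD, minutes=RESEARCH_MINUTES):
    """Cost of researching a single term at the given hourly rate."""
    return hourly_rate * minutes / 60


def terms_to_research(word_count, share):
    """Number of words in a text that need terminological research."""
    return round(word_count * share)


def research_budget(word_count, share):
    """Total research cost for a text of the given length."""
    return terms_to_research(word_count, share) * research_cost_per_term()


word_count = 10_000  # hypothetical document length, not from the study
for share in (0.04, 0.06):
    terms = terms_to_research(word_count, share)
    budget = research_budget(word_count, share)
    print(f"{share:.0%} of {word_count} words: {terms} terms, about USD {budget:,.0f}")

print(f"Research cost per term: USD {research_cost_per_term():.2f}")
```

At these figures, researching one term costs roughly USD 18, more than a hundred times the USD .15 per-word rate.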

At the end of the translation process, we have a translated text which in this form can only be used once. Of course, it might become part of a translation memory (TM) and be reused. But reuse can only happen if the second product using the TM serves the same readership; if the purpose of the text is the same; if someone analyses the new source text with the correct TM, etc. And even then, it would be a good idea to proofread the outcome thoroughly.

The terminological entry, on the other hand, should be set up to serve the present purpose (e.g. support a translator during the translation of a particular project). But it might also be set up to allow a support engineer in a branch office to look up the definition of the target equivalent. Or it might enable a technical writer in another product unit to check on the correct and standardized spelling of the source term.

I am not sure that this distinction is clear to all translators who sell terminology services. You might get away with translating terms a few times. But eventually your client’s customers will indicate that there is something wrong, that the product is hard to understand or operate because it is not in their vernacular.

There are much more scientific reasons why we should not confuse translation and terminology work; while related and often (but not always) coincidental, these tasks have different objectives. More about that some other time. Today, let me appeal to you whose job it is to support clear and precise communication to reserve the verb “to translate” for the transfer of “textual substance in one language to create textual substance in another language” as Juan Sager puts it in the Routledge Encyclopedia of Translation Studies. If we can be precise in talking about our own field, we should do so.

Posted in Events, Interesting terms, Microsoft Terminology Studio, Researching terms, Setting up entries, Terminology of terminology

Quantity matters

Posted by Barbara Inge Karsch on August 19, 2010

Losing a terminologist position because the terminologist couldn’t show any quantitative progress is shocking. But it happened, according to a participant in the TKE conference that just concluded in Dublin. While managing terminology is a quality measure, quantity must not be disregarded. After all, a company or organization isn’t in it for the fun of it. Here are numbers that three teams established in different types of databases.

At J.D. Edwards, quality was a big driving factor. Each conceptual entry passed through a three-step workflow before it was approved. The need for change management was extremely low, but the upfront investment was high. Seven full-time terminologists, who each spent 1/3 of their time on English entries, 1/3 on entries in their native language and 1/3 on other projects, produced just below 6000 conceptual entries between 1999 and 2003.

In comparison, the Microsoft terminology database contained 9000 concepts in January of 2005, most of them (64%) not yet released (for more details see this article in the German publication eDITion). The team of five full-time English terminologists, who spent roughly 50% of their time on terminology work, increased the volume to about 30,000 in the following five years, 95% of which were released entries. The quality of the entries was not as high as at JDE, and there was less complex metadata available (e.g. no concept relations).

According to Henrik Nilsson at the Swedish Centre for Terminology (TNC), three full-time resources built up a terminology database, the Rikstermbanken, with 67,000 conceptual entries in three years. That seems like a large number. But one has to take into consideration that the team consolidated data from many different sources in a more or less automated fashion. The entries have not been harmonized, as one of the goals was to show the redundancy of work between participating institutions. The structure of the entries is deliberately simple.
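
For readers who want to put the three sets of figures side by side, the following Python sketch normalizes them to entries per full-time-equivalent (FTE) year. It is only a rough illustration: the four-year span for J.D. Edwards is my reading of "between 1999 and 2003", the Microsoft growth is taken as 30,000 minus 9000, and the class and field names are made up for this example. The results say nothing about entry quality or the degree of automation.

```python
from dataclasses import dataclass


@dataclass
class TeamOutput:
    name: str
    headcount: int
    time_share: float  # fraction of working time spent on terminology entries
    entries: int       # conceptual entries produced in the period
    years: float       # length of the period in years

    def fte(self) -> float:
        """Full-time equivalents actually spent on entries."""
        return self.headcount * self.time_share

    def entries_per_fte_year(self) -> float:
        return self.entries / (self.fte() * self.years)


teams = [
    # 2/3 of the time on entries; 4 years is an assumption
    TeamOutput("J.D. Edwards", 7, 2 / 3, 6000, 4),
    # growth from 9000 to about 30,000 concepts over five years
    TeamOutput("Microsoft", 5, 0.5, 30_000 - 9_000, 5),
    # largely automated consolidation of existing sources
    TeamOutput("TNC (Rikstermbanken)", 3, 1.0, 67_000, 3),
]

for team in teams:
    print(f"{team.name}: about {team.entries_per_fte_year():.0f} entries per FTE-year")
```

Under these assumptions the three teams come out at roughly 320, 1680 and 7440 entries per FTE-year, which mostly reflects how differently the entries were produced.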

The needs that these databases serve are different: In a corporation, solid entries that serve as a prescriptive reference for the product releases are vital. Entries in a collection from various sources, such as in national terminology banks, serve to support the public and public institutions. They may not be harmonized yet, but contain a lot of different terminology for different users. And they may not be prescriptive.

As terminologists, we are sometimes very focused on quality. But let’s not forget that eventually someone will want to hear what has been accomplished by a project. The number of entries is one of the easiest ways to communicate that to a business person.

Posted in J.D. Edwards TDB, Microsoft Terminology Studio, Producing quantity, Return on investment, Rikstermbanken

ISO 12620—Why bother

Posted by Barbara Inge Karsch on July 22, 2010

Standards are nice, but they don’t do anything for you or, more importantly, the user of your terminology database, if you are the only one applying them. But how do you get a large virtual team of terminologists or language specialists to agree on and apply standards, such as ISO 12620, to database entries? And first: Why bother climbing such a mountain?

Imagine you have a large document to author or translate. Your client gave you a dictionary to use. Because you are not sure of the meaning or usage of 50 terms, you look them up. But the dictionary holds you up more than anything: One entry contains a definition, the next one doesn’t; one provides context, but it is in a language you don’t understand; most terms make sense, but several of them are cryptic and the entry doesn’t provide clarity. If your client hadn’t insisted that you use the dictionary, you wouldn’t: It just slows you down.

The objective of a terminology database is to have consistent and correct terminology used in the product, in source as well as in target languages. To support that goal, users must be able to use a database entry quickly and easily—structure really helps here. Furthermore, users must be able to trust the information provided—transparent, clear and consistent entries create trust.

Ideally, you have a centralized team of trained terminologists who know the standards inside out and apply them religiously. If you don’t, select/create a tool that supports standards adherence as much as possible. Some simple examples: If definition is mandatory, automatically enforce it; if the term is a verb, hide the Number field; if the language is English, hide the Gender field. Tools can do a lot, but your team very likely still needs a standard.
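
The three example rules above can be sketched in a few lines of Python. This is a minimal illustration under my own assumptions, not how Term Studio or any particular tool implements them: the `TermEntry` class, the field names and the language and part-of-speech codes are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TermEntry:
    term: str
    language: str                  # hypothetical codes such as "en" or "de"
    part_of_speech: str            # hypothetical values such as "noun" or "verb"
    definition: Optional[str] = None
    number: Optional[str] = None
    gender: Optional[str] = None


MANDATORY_FIELDS = ("definition",)


def visible_fields(entry: TermEntry) -> List[str]:
    """Fields the editing form should show for this entry."""
    fields = ["term", "language", "part_of_speech", "definition"]
    if entry.part_of_speech != "verb":   # hide Number for verbs
        fields.append("number")
    if entry.language != "en":           # hide Gender for English
        fields.append("gender")
    return fields


def validate(entry: TermEntry) -> List[str]:
    """Return a list of problems; an empty list means the entry passes."""
    errors = []
    for name in MANDATORY_FIELDS:
        value = getattr(entry, name)
        if value is None or not value.strip():
            errors.append(f"{name} is mandatory")
    shown = visible_fields(entry)
    for name in ("number", "gender"):
        if name not in shown and getattr(entry, name) is not None:
            errors.append(f"{name} does not apply to this entry")
    return errors


english_verb = TermEntry("download", "en", "verb")
german_noun = TermEntry("Datei", "de", "noun",
                        definition="(definition text)", gender="feminine")

print(validate(english_verb))        # the missing definition is reported
print(visible_fields(english_verb))  # neither number nor gender is offered
print(validate(german_noun))         # no problems
```

Rules like these take routine decisions away from the person filling in the entry, but they cannot tell a good definition from a poor one. That part still needs a shared standard.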

The Microsoft terminology team did. Simply handing a standards document off to the team had not been successful in the past: nobody could remember it, many entries therefore contained unstructured, if not incorrect, information, and there was no incentive to adhere to standards. A more collaborative effort was called for: Together, in-house terminologists went through data categories one by one. Because we were a virtual team, e-mail was the best form of communication. Each data category was dealt with in one e-mail that contained the definition, a scenario, and voting buttons that allowed the team to agree with the meaning or disagree and make a better suggestion. Team members could participate in the voting, but they didn’t have to. However, everyone knew from the beginning that they had to accept the outcome, regardless of whether they participated or not. After the new guide had been published, measurements were carried out and documented in a quarterly report. Terminologists then set their own deadlines for cleaning up entries to comply with the standards.

ISO 12620 doesn’t just enable data exchange, as we saw in last week’s entry. At J.D. Edwards and Microsoft, it also helped create standards guides. I am sure not every field is filled in correctly; perfection is not the point. But with shrinking budgets and tighter deadlines, a database that could cost millions of dollars must support users as well as possible in their endeavor to create reliable communication. A standards guide based on an international standard is a good tool you can use to climb that mountain.

Posted in Content publisher, Microsoft Terminology Studio, Standardizing entries, Terminologist, Terminology 101

The Year of Standards

Posted by Barbara Inge Karsch on July 16, 2010

The Localization Industry Standards Association (LISA) reminded us in their recent Globalization Insider that they had declared 2010 the ‘Year of Standards.’ It resonates with me because socializing standards was one of the objectives that I set for this blog. Standards and standardization are the essence of terminology management, and yet practitioners either don’t know of standards, don’t have time to read them, or think they can do without them. In the following weeks, as the ISO Technical Committee 37 ("Terminology and other language and content resources") is gearing up for the annual meeting in Dublin, I’d like to focus on standards. Let’s start with ISO 12620.

ISO 12620:1999 (Computer applications in terminology—Data categories—Part 2: Data category registry) provides standardized data categories (DCs) for terminology databases; a data category is the name of the database field, as it were, its definition, and its ID. Did everyone notice that terminology can now be downloaded from the Microsoft Language Portal? One of the reasons why you can download the terminology today and use it in your own terminology database is ISO 12620. The availability of such a tremendous asset is a major argument in favor of standards.

I remember when my manager at J.D. Edwards slapped 12620 on the table and we started the selection process for TDB. It can be quite overwhelming. But I turned into a big fan of 12620 very quickly: It allowed us to design a database that met our needs at J.D. Edwards.

When I joined Microsoft in 2004, my colleagues had already selected data categories for a MultiTerm database. Since I was familiar with 12620, it did not take much time to be at home in the new database. We reviewed and simplified the DCs over the years, because certain data categories chosen initially were not used often enough to warrant their existence. One example is ‘animacy,’ which is defined in 12620 as “[t]he characteristic of a word indicating that in a given discourse community, its referent is considered to be alive or to possess a quality of volition or consciousness”…most of the things documented in Term Studio are dead and have no will or consciousness. But we could simply remove ‘animacy’, while it would have been difficult or costly to integrate a new data category late in the game. If you are designing a terminology database, err on the side of being more comprehensive. Because we relied on 12620, it was easy when earlier in 2010 we prepared for making data exportable into a TBX format (ISO 30042). The alignment was already there, and communication with the vendor, an expert in TBX, was easy.
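
One way to spot data categories that are "not used often enough to warrant their existence" is to measure how often each field is actually filled in. The Python sketch below is a hedged illustration, not the review process we followed: the entries are made-up sample data, the field names are informal labels rather than official 12620 identifiers, and the 5% threshold is an arbitrary assumption.

```python
from collections import Counter


def usage_rates(entries, data_categories):
    """Share of entries in which each data category is filled in."""
    counts = Counter()
    for entry in entries:
        for dc in data_categories:
            if entry.get(dc) not in (None, ""):
                counts[dc] += 1
    total = len(entries)
    return {dc: (counts[dc] / total if total else 0.0) for dc in data_categories}


def rarely_used(entries, data_categories, threshold=0.05):
    """Data categories filled in for less than `threshold` of all entries."""
    rates = usage_rates(entries, data_categories)
    return [dc for dc in data_categories if rates[dc] < threshold]


# Made-up sample entries; real databases hold thousands of them.
sample_entries = [
    {"term": "dialog box", "definition": "(definition text)", "part_of_speech": "noun"},
    {"term": "download", "definition": "(definition text)", "part_of_speech": "verb"},
    {"term": "toolbar", "definition": "(definition text)", "part_of_speech": "noun"},
    {"term": "sync", "definition": "", "part_of_speech": "verb"},
]
categories = ["definition", "part_of_speech", "animacy"]

print(usage_rates(sample_entries, categories))
print(rarely_used(sample_entries, categories))
```

In this sample, 'animacy' is never filled in and is the only category flagged. A report like this is only a starting point for discussion, since a rarely used field can still be essential for the few entries that need it.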

ISO 12620:1999 has since been retired and was succeeded by ISO 12620:2009, which “provides guidelines […] for creating, selecting and maintaining data categories, as well as an interchange format for representing them.” The data categories themselves were moved into the ISOcat “Data Category Registry” open to use by anyone.

ISO 12620, or now the Data Category Registry, allows terminology database designers to apply tried and true standards rather than reinventing the wheel. Like all standards, they enable quick adoption by those familiar with them and they enable data sharing (e.g. in large term banks, such as the EuroTermBank). If you are not familiar with standards, read A Standards Primer written by Christine Bucher for LISA. It is a fantastic overview that helps navigate the standardization maze.

Posted in Advanced terminology topics, Designing a terminology database, EuroTermBank, J.D. Edwards TDB, Microsoft Language Portal, Microsoft Terminology Studio, Terminologist

