BIK Terminology—

Solving the terminology puzzle, one posting at a time

  • Author

    Barbara Inge Karsch - Terminology Consulting and Training

  • Images

    Bear cub by Reiner Karsch


Archive for the ‘Terminology portals’ Category

Doublettes—such a pretty term, yet such a bad concept

Posted by Barbara Inge Karsch on June 10, 2011

Sooner rather than later, terminologists need to think about database maintenance. Initially, with few entries in the database, data integrity is easy to ensure: In fact, the terminologist might remember just about every entry they ever compiled; my Italian colleague, Licia, remembered just about every entry she ever opened in the database. But even the best human brains will eventually ‘run out of memory’ and blunders will happen. One of these blunders is the so-called doublette.

According to ISO TR 26162, a doublette is a “terminological entry that describes the same concept as another entry.” Sometimes these entries are also referred to as duplicates or duplicate entries, but the technical term in standards is doublette. It is important to note that homonyms do not equal doublettes. In other words, two terms that are spelt the same way and that are in two separate entries may refer to the same concept and may therefore be doublettes. But they may also justifiably be listed in separate entries, because they denote slightly or completely different concepts.

As an example, I deliberately set up doublettes in i-Term, a terminology management system developed by DANTERM: The terms automated teller machine and electronic cash machine can be considered synonyms and should be listed in one terminological entry. Below you can see that automated teller machine and its abbreviated form ATM have one definition and definition source, while electronic cash machine and its abbreviated form, cash machine, are listed in a separate entry with another, yet similar definition and its definition source. During database maintenance, these entries should be consolidated into one terminological entry with all its synonyms.

[Screenshots of the two i-Term entries]

It is much easier to detect homographs that turn out to be doublettes. Rather, it should be easier to avoid them in the first place (after all, every new entry in a database starts with a search for the term denoting the concept; if it already exists with the same spelling, it would be a hit). Here are ‘homograph doublettes’ from the Microsoft Language Portal. While we can’t see the ID, the definition shows pretty clearly that the two entries are describing the same concept.

[Screenshot of the two Microsoft Language Portal entries]

Doublettes happen, particularly in settings where more than one terminologist adds and approves entries in a database. But even if one terminologist approves all new concepts, s/he cannot guarantee that a database remains free of doublettes. The right combination of skills, processes and tool support can help limit the number, though.
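As a minimal sketch of what such tool support might look like for the ‘homograph doublette’ case, the following Python snippet flags terms that appear, after simple normalization, in more than one entry. The entry IDs, the flat dictionary layout, and the function names are assumptions made for illustration; they are not how i-Term or the Microsoft Language Portal store or check data.

```python
from collections import defaultdict

# Hypothetical data: entry ID -> terms listed in that entry.
# IDs and layout are invented for this illustration.
entries = {
    "E1": ["automated teller machine", "ATM"],
    "E2": ["electronic cash machine", "cash machine"],
    "E3": ["Cash  Machine"],  # same spelling as a term in E2 once normalized
}


def normalize(term):
    """Case-fold a term and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.casefold().split())


def homograph_candidates(entries):
    """Map each normalized term found in more than one entry to those entry IDs."""
    seen = defaultdict(set)
    for entry_id, terms in entries.items():
        for term in terms:
            seen[normalize(term)].add(entry_id)
    return {term: sorted(ids) for term, ids in seen.items() if len(ids) > 1}


candidates = homograph_candidates(entries)
for term, ids in candidates.items():
    print(f"review: '{term}' appears in entries {', '.join(ids)}")
```

A check like this only produces candidates for human review: homonyms may legitimately sit in separate entries, and synonym doublettes such as E1 and E2 above are not caught by a spelling comparison at all.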

Posted in iTerm, Maintaining a database, Microsoft Language Portal, Process, Setting up entries | 4 Comments

TKE 2010—A Short Report

Posted by Barbara Inge Karsch on September 2, 2010

TKE (International Conference of Terminology and Knowledge Engineering) was recently held in Dublin. The title this year was “Presenting Terminology and Knowledge Engineering Resources Online: Models and Challenges”. Here are my thoughts on three presentations on large database projects and one workshop.

One of the invited talks was given by Michal Boleslav Měchura and Brian Ó Raghallaigh, who are the technical brains behind the Irish National Terminology Database (focal.ie), which serves a stunning 600,000 users. Much like the Rikstermbanken of the Swedish Center for Terminology discussed in Quantity matters, this project makes a (corporate) terminologist’s mouth water for its funding. According to the project website, there are no fewer than 18 people on the project team. Michal shared how the team is using statistics and user feedback to improve the search capabilities, the user interface, and the data presentation.

BACUS (Base de Coneixement Universitari) is a terminology database created at the Universitat Autònoma de Barcelona by students as part of their course work. Students work with subject matter experts to create entries in at least three languages. Two of them must be languages taught at the Faculty of Translation and Interpreting: Catalan, Spanish, English, French, German, Portuguese, Italian, Russian, Arabic, Chinese, or Japanese. The third may be a language not taught at the Universitat, such as Basque, Bulgarian, Danish, Slovak, Galician, Greek, Dutch, Norwegian, Latin, Pulaar and Swedish. In their paper, Aguilar-Amat, Mesa-Lao, and Pahisa-Solé describe in detail the high-quality approach that students are taking to arrive at their entries. For example, “all linguistic data included in the BACUS project are obtained from corpora of original texts in different languages on the same specialized subject.” The work on the database has been discontinued, but it is well worth a look.

Such a high-quality approach cannot be expected for entries from a federated term bank, such as EuroTermBank. This project, developed and managed by Tilde, is probably not new to you. Andrejs Vasiljevs presented the results of a survey of different groups of potential system users. In his paper, Andrejs discusses the need to open up term banks to user participation.

At J.D. Edwards, user participation in the form of entry requests and comments was implemented in a format that allowed for prescriptive terminology management, as is necessary in the corporate environment. There is no reason, though, that federated term banks should not adopt Wikipedia-style knowledge sharing, approval mechanisms known from commercial sites, and the like. Once sharing, voting or commenting mechanisms are implemented, the key might be to get as many experts to use the database as possible, so that unreliable data can be found and eliminated quickly. It would be interesting to discuss entry reliability with regard to these projects and the ones mentioned in Quantity matters.

The main workshop I would like to mention is, of course, the discussion of standard ISO 704. Thank you for participating through the survey and comments in Who cares about ISO 704, which I mentioned in my presentation. During the workshop, we agreed to suggest to the respective workgroup in ISO TC 37 to streamline the current content, review the example used, and add parts geared towards the different user groups. I very much enjoyed the work in this group and feel that it will lead to a better standard down the road.

The TKE organizing committee decided to expand membership of GTW (Association for Terminology and Knowledge Transfer), the organization behind TKE. A new subgroup of the Terminology group on LinkedIn is being formed specifically for that purpose. If you are interested, join the group called Association for Terminology and Knowledge Transfer; just allow a bit of time for approval.

To conclude my little TKE report: It was a particular pleasure to witness Gerhard Budin bestow the Eugen Wüster Prize upon Sue Ellen Wright and Klaus-Dirk Schmitz from Kent State University and Cologne University of Applied Sciences, respectively. It couldn’t have gone to two more well-deserving individuals.

Posted in BACUS, EuroTermBank, Events, Irish National Terminology Database, J.D. Edwards TDB, Rikstermbanken, Terminology portals | 1 Comment

Quantity matters

Posted by Barbara Inge Karsch on August 19, 2010

Losing a terminologist position because the terminologist couldn’t show any quantitative progress is shocking. But it happened, according to a participant of the TKE conference that just concluded in Dublin. While managing terminology is a quality measure, quantity must not be disregarded. After all, a company or organization isn’t in it for the fun of it. Here are numbers that three teams established in different types of databases.

At J.D. Edwards, quality was a big driving factor. Each conceptual entry passed through a three-step workflow before it was approved. The need for change management was extremely low, but the upfront investment was high. Seven full-time terminologists who worked 1/3 of their time on English entries, 1/3 of their time on entries in their native language and 1/3 of the time on other projects, produced just below 6000 conceptual entries between 1999 and 2003.

In comparison, the Microsoft terminology database contained 9000 concepts in January of 2005, most of them (64%) not yet released (for more details see this article in the German publication eDITion). The team of five full-time English terminologists, who spent roughly 50% of their time on terminology work, increased the volume to about 30,000 in the five following years, 95% of which were released entries. The quality of the entries was not as high as at JDE, and there was less complex metadata available (e.g. no concept relations).

According to Henrik Nilsson at the Swedish Centre for Terminology (TNC), three full-time resources built up a terminology database, Rikstermbanken, with 67,000 conceptual entries in three years. That seems like a large number. But one has to take into consideration that the team consolidated data from many different sources in a more or less automated fashion. The entries have not been harmonized, as one of the goals was to show the redundancy of work between participating institutions. The structure of the entries is deliberately simple.
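To put the three sets of figures side by side, here is a rough back-of-the-envelope sketch in Python that converts them into entries per full-time-equivalent (FTE) year. The inputs come from the descriptions above. The assumptions are that 1999 to 2003 counts as five years, that ‘just below 6000’ is rounded up to 6000, that J.D. Edwards terminologists spent two thirds of their time on entries, and that the Microsoft team added the difference between 9000 and about 30,000. The function name is invented for the illustration.

```python
# Figures from the three team descriptions; derived rates are only as good
# as the assumptions noted on each line.
teams = {
    # name: (entries produced, headcount, share of time on terminology, years)
    "J.D. Edwards": (6000, 7, 2 / 3, 5),       # assumes 1999-2003 = 5 years, 6000 as upper bound
    "Microsoft": (30000 - 9000, 5, 0.5, 5),    # growth from 9000 to about 30,000
    "Rikstermbanken": (67000, 3, 1.0, 3),      # largely automated consolidation
}


def entries_per_fte_year(entries, headcount, share, years):
    """Entries divided by the full-time-equivalent years spent on terminology work."""
    return entries / (headcount * share * years)


rates = {name: entries_per_fte_year(*figures) for name, figures in teams.items()}
for name, rate in rates.items():
    print(f"{name}: about {rate:.0f} entries per FTE-year")
```

The resulting rates differ by more than an order of magnitude, which says as much about the differences in workflow, entry structure, and degree of automation as it does about productivity.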

The needs that these databases serve are different: In a corporation, solid entries that serve as prescriptive reference for the product releases are vital. Entries in a collection from various sources, such as in national terminology banks, serve to support the public and public institutions. They may not be harmonized yet, but contain a lot of different terminology for different users. And they may not be prescriptive.

As terminologists, we are sometimes very focused on quality. But let’s not forget that eventually someone will want to hear what has been accomplished by a project. The number of entries is one of the easiest ways to communicate that to a business person.

Posted in J.D. Edwards TDB, Microsoft Terminology Studio, Producing quantity, Return on investment, Rikstermbanken | 5 Comments

The Year of Standards

Posted by Barbara Inge Karsch on July 16, 2010

The Localization Industry Standards Association (LISA) reminded us in their recent Globalization Insider that they had declared 2010 the ‘Year of Standards.’ It resonates with me because socializing standards was one of the objectives that I set for this blog. Standards and standardization are the essence of terminology management, and yet practitioners either don’t know of standards, don’t have time to read them, or think they can do without them. In the following weeks, as the ISO Technical Committee 37 ("Terminology and other language and content resources") is gearing up for the annual meeting in Dublin, I’d like to focus on standards. Let’s start with ISO 12620.

ISO 12620:1999 (Computer applications in terminology—Data categories—Part 2: Data category registry) provides standardized data categories (DCs) for terminology databases; a data category is the name of the database field, as it were, its definition, and its ID. Did everyone notice that terminology can now be downloaded from the Microsoft Language Portal? One of the reasons why you can download the terminology today and use it in your own terminology database is ISO 12620. The availability of such a tremendous asset is a major argument in favor of standards.

I remember when my manager at J.D. Edwards slapped 12620 on the table and we started the selection process for TDB. It can be quite overwhelming. But I turned into a big fan of 12620 very quickly: It allowed us to design a database that met our needs at J.D. Edwards.

When I joined Microsoft in 2004, my colleagues had already selected data categories for a MultiTerm database. Since I was familiar with 12620, it did not take much time to be at home in the new database. We reviewed and simplified the DCs over the years, because certain data categories chosen initially were not used often enough to warrant their existence. One example is ‘animacy,’ which is defined in 12620 as “[t]he characteristic of a word indicating that in a given discourse community, its referent is considered to be alive or to possess a quality of volition or consciousness”…most of the things documented in Term Studio are dead and have no will or consciousness. But we could simply remove ‘animacy’, while it would have been difficult or costly to integrate a new data category late in the game. If you are designing a terminology database, err on the side of being more comprehensive. Because we relied on 12620, it was easy when earlier in 2010 we prepared for making data exportable into a TBX format (ISO 30042). The alignment was already there, and communication with the vendor, an expert in TBX, was easy.
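The idea of a data category as a name, a definition, and an ID, and of pruning a selection once a category proves unused, can be pictured with a small Python sketch. Everything here is illustrative: the identifiers are made up rather than real registry IDs, the usage counts are invented, and the class and function names are assumptions rather than part of ISO 12620 or of any terminology tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCategory:
    identifier: str  # hypothetical ID, not a real registry identifier
    name: str
    definition: str


# A hypothetical selection of data categories for a database design.
selected = {
    dc.name: dc
    for dc in [
        DataCategory("dc-001", "definition", "(definition omitted in this sketch)"),
        DataCategory("dc-002", "part of speech", "(definition omitted in this sketch)"),
        DataCategory(
            "dc-003",
            "animacy",
            "The characteristic of a word indicating that in a given discourse "
            "community, its referent is considered to be alive or to possess a "
            "quality of volition or consciousness",
        ),
    ]
}

# Invented usage counts: how often each field was actually filled in.
usage_counts = {"definition": 950, "part of speech": 1200, "animacy": 0}


def prune(selection, usage, min_uses=1):
    """Return a copy of the selection without categories used fewer than min_uses times."""
    return {name: dc for name, dc in selection.items() if usage.get(name, 0) >= min_uses}


pruned = prune(selected, usage_counts)
print(sorted(pruned))
```

Removing a category is a one-line filter over an existing selection, whereas adding one late would mean revisiting every existing entry, which is the asymmetry behind erring on the side of a more comprehensive initial design.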

ISO 12620:1999 has since been retired and was succeeded by ISO 12620:2009, which “provides guidelines […] for creating, selecting and maintaining data categories, as well as an interchange format for representing them.” The data categories themselves were moved into the ISOcat “Data Category Registry” open to use by anyone.

ISO 12620, or now the Data Category Registry, allows terminology database designers to apply tried and true standards rather than reinventing the wheel. Like all standards, they enable quick adoption by those familiar with them and they enable data sharing (e.g. in large term banks, such as the EuroTermBank). If you are not familiar with standards, read A Standards Primer written by Christine Bucher for LISA. It is a fantastic overview that helps navigate the standardization maze.

Posted in Advanced terminology topics, Designing a terminology database, EuroTermBank, J.D. Edwards TDB, Microsoft Language Portal, Microsoft Terminology Studio, Terminologist | 1 Comment

Terminological Scatterlings*

Posted by Barbara Inge Karsch on June 10, 2010

While it is hard to avoid soccer these days—not that I, who enjoyed the Sommermärchen (the fabulous atmosphere) in Germany four years ago, would want to—, it is not hard to link South Africa, soccer and terminology. So, let me leave corporate terminology management behind for this posting and talk a bit about South Africa from a language and terminology point-of-view.

[Image: South African Sunset by Eunjung Choi]

Let’s start with South African English. If you visit this beautiful country, you will very likely notice that ‘just now’ “denotes varying levels of urgency. Phoning someone ‘now now’ is sooner than ‘now’ or ‘just now’ but not as soon as ‘right now’” according to this short glossary of South Africanisms.

In their latest blog post, ‘South African World Cup Draws Multilingual Audiences’, Common Sense Advisory talks about the language industry benefitting from the World Cup, which is not a surprise: It always benefits from large international events. I fondly remember my time at the Atlanta Olympics with hundreds of other interpreters. I worked for overtired German journalists, who would normally get away without an interpreter, but long hours and the Southern accent incapacitated them. Back then, I had put together my own glossary of sports terminology. For the World Cup, we don’t need to: CLS Communication just released a multilingual World Cup dictionary in five European languages. I never knew that “Goal” is used in Swiss German. I am sure you’ll find other little surprises in the CLS Football Dictionary. Speaking of football, former Microsoft colleague and good friend, Licia Corbolante, points out cultural differences with regard to soccer in her Italian blog.

But the World Cup is just an event that lets us focus on South Africa. The country has had a rich multilingual history. With the end of apartheid came the recognition of eleven official languages: Afrikaans and English as well as the following nine indigenous languages: seSotho sa Lebowa, seSotho, seTswana, siSwati, Tshivenda, Xitsonga, isiNdebele, isiXhosa and isiZulu. We, in Europe and America, are used to other multilingual countries, like Canada, Switzerland or Belgium, leading multilingual topics. But the general level of understanding of terminology issues in South Africa makes the US look like a developing nation.

There are many governmental bodies who do terminology work. The organization charged with promoting language issues is the Pan South African Language Board:

“…PanSALB is a statutory body established to create conditions to develop and promote the equal use and enjoyment of all the official South African languages. It actively promotes an awareness of multilingualism as a national resource.” (http://www.pansalb.org.za/index.html).

The establishment of terminology is merely one of PanSALB’s many tasks and services. PanSALB members have also been actively involved in the creation of Microsoft terminology accessible on the Microsoft Language Portal in the following South African languages: Afrikaans, isiXhosa, isiZulu, seSotho sa Lebowa, and seTswana.

A look at the website for the TAMA (Terminology in Advanced Management Applications) conference in 2003 shows the multitude of terminology projects under way in South Africa: From the establishment of legal terminology in South African languages by organizations, such as the Centre for Legal Terminology in African Languages, to the work of the Terminology Subdirectorate of the National Language Service. And in time for the World Cup, the Department of Arts and Culture sponsored the publication of a Multilingual Soccer Terminology List.

Let me finish with a personal story: When I was in South Africa for TAMA in 2003, I had an exchange with a cab driver that really made me a terminologist. Ok, I had had the title “terminologist” since 1998, but I never told anyone. It was often too much trouble to explain what terminologists do. So, when the cab driver asked me what I had come to South Africa for, I told him that I was in Pretoria for a conference. “What subject area?” he kept prying. When I said terminology management, he responded, “oh, I have always been interested in semiotics.” Goooooooaal! Never after this conversation have I dumbed down what I really do.

*Borrowed from “Scatterlings of Africa” by Johnny Clegg

Posted in Events, Terminologist, Terminology portals | 9 Comments

 