BIK Terminology—

Solving the terminology puzzle, one posting at a time

  • Author

    Barbara Inge Karsch - Terminology Consulting and Training


Archive for the ‘Skills and qualities’ Category

Terminology extraction with memoQ 5.0 RC

Posted by Barbara Inge Karsch on August 15, 2011

In the framework of a TermNet study, I have been researching and gathering data about terminology management systems (TMS). We will not focus on term extraction tools (TE), but since one of our tool candidates recently released a new term extraction module, I wanted to check it out. Here is what I learned from giving the TE functionality of the memoQ 5.0 release candidate a good run.

Let me start by saying that this test made me realize again how much I enjoy working with terminological data; I love analyzing terms and concepts, researching meaning and compiling data in entries; to me it is a very creative process. Note furthermore that I am not an expert in term extraction tools: I was a serious power-user of several proprietary term extraction tools at JDE and Microsoft; I haven’t worked with the Trados solution since 2003; and I have only played with a few other methods (e.g. Word/Excel and SynchroTerm). So, my view of the market at the moment is by no means a comprehensive one. It is, however, one of a user who has done some serious term mining work. One of the biggest projects I ever did was the Axapta 4.0 specs. It took us several days to even just load all documents on a server directory; it took the engine at least a night to “spit out” 14,000 term candidates; and it took me an exhausting week to nail down 500 designators worth working with.

As a mere user, as opposed to a computational linguist, I am not primarily interested in the performance of the extraction engine (I actually think the topic is a bit overrated); I like that in memoQ I can set the minimum/maximum word lengths, the minimum frequency, and the inclusion/exclusion of words with numbers (the home-grown solutions had predefined settings for all of this). But beyond the rough selection, I can deal with either too many or too few suggestions, if the tool allows me to quickly add or delete what I deem the appropriate form. There will always be noise and lots of it. I would rather have the developer focus on the usability of the interface than “waste” time on tweaking algorithms a tiny bit more.
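To make the “rough selection” settings concrete, here is a toy sketch of an n-gram candidate filter with minimum/maximum word length, minimum frequency, and a numbers toggle. This is my own illustration, not memoQ’s implementation; the function name and defaults are invented:

```python
import re
from collections import Counter

def extract_candidates(tokens, min_len=1, max_len=4, min_freq=2, allow_numbers=False):
    """Count n-gram candidates and keep those that pass the rough-selection settings."""
    counts = Counter()
    # Collect every n-gram between min_len and max_len words long.
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    result = {}
    for phrase, freq in counts.items():
        if freq < min_freq:
            continue  # below the minimum frequency
        if not allow_numbers and re.search(r"\d", phrase):
            continue  # exclude candidates containing numbers
        result[phrase] = freq
    return result
```

With `min_freq=2`, a repeated phrase like “term base” survives while one-off words drop out; the real work, as described above, then lies in racing through the surviving list.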

So, along the lines of the previous posting on UX design, my requirements on a TE tool are that it allows me to

  • Process term candidates (go/no-go decision) extremely fast and
  • Move data into the TMS smoothly and flawlessly.

memoQ by Kilgray Translation Technologies* meets the first requirement very nicely. My (monolingual) test project was the PowerPoint presentations of the ECQA Certified Terminology Manager, which I had gone through in detail the previous week and which contained 28,979 English words. Because the subject matter is utterly familiar to me, there was no question as to what should make the cut and what shouldn’t. I loved that I could “race” through the list and go yay or nay; that I could merge obvious synonyms; and that I could modify term candidates to reflect their canonical form. Because the contexts for each candidate are all visible, I could have even checked the meaning in context quickly if I had needed to.

I also appreciated that there is already a stop word list in place. It was very easy to add to it, though here is one suggestion: It would be great to have the term candidate automatically inserted in the stop-word dialog. Right now, I still have to type it in; it would save time if it were prefilled. Since the stop word list is not very extensive (e.g. even words like “doesn’t” are missing from the English list), it’ll take everyone considerable time to build up a list, which at its core will not vary substantially from user to user. But that may be too much to ask of a first release.

As for my second requirement, memoQ term extraction doesn’t meet that (yet) (note that I only tested the transfer of data to memoQ, but not to qTerm). I know it is asking for a lot to have a workflow from cleaned-up term candidate list to terminological entry in a TMS. Here are two suggestions that would make a difference to users:

  • Provide a way to move context from the source document, incl. context source, into the new terminological entry.
  • Merging terms into one entry because they are synonyms is great. But they need to show up as synonyms when imported into the term base; none of my short forms (e.g. POS, TMS) showed up in the entry for the long forms (e.g. part of speech, terminology management systems) when I moved them into the memoQ term base.
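To show what I mean by the second bullet, here is a hypothetical sketch of how merged candidates could be folded into single entries on import, so that short forms land as synonyms of their long forms rather than as separate entries. The entry structure is invented for illustration; it is not memoQ’s or qTerm’s data model:

```python
def merge_into_entries(candidates, synonym_groups):
    """Fold merged candidates into single entries so short forms
    arrive as synonyms of their long forms, not as separate entries."""
    entries = []
    grouped = set()
    for group in synonym_groups:  # e.g. ["terminology management system", "TMS"]
        entries.append({"terms": list(group)})
        grouped.update(group)
    for cand in candidates:
        if cand not in grouped:  # everything not merged becomes its own entry
            entries.append({"terms": [cand]})
    return entries
```

The point is simply that the merge decision made during extraction should survive the transfer into the term base.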

My main overall wish is that we integrate TE with authoring and translation in a way that allows companies and LSPs, writers and translators to have an efficient workflow. It is imperative in technical communication/translation to document terms and concepts. When this task is put on the translators, it is already quite late, but it is better than if it doesn’t happen. Only fast and flawless processing will allow one-person or multi-person enterprises, for that matter, to carry out terminology work as part of the content supply chain. When the “fast and flawless” prerequisite is met, even those of my translator-friends who detest the term “content supply chain” will have enough time to enjoy themselves with the more creative aspects of their profession. Then, economic requirements essential on the macro level are met, and the need of the individual to get satisfaction out of the task is fulfilled on the micro level. The TE functionality of memoQ 5.0 RC excels in design and, in my opinion, is ready for translators’ use. If you have any comments, if you agree or disagree with me, I’d love to hear it.

*Kilgray is a client of BIK Terminology.

Posted in Designing a terminology database, memoQ, Producing quantity, Selecting terms, Term extraction tool, Usability | Tagged: | 3 Comments »


Posted by Barbara Inge Karsch on November 2, 2010

For the last few years, I was part of a team called Microsoft Language Excellence. Now, I am part of a consultant group called ExcellenceTerm. To some, including excellence in one’s name might be presumptuous, even arrogant. To me, it is part of the vision.

Let’s look into the etymology. Excellence comes from the Latin excellere, which means to distinguish oneself or to raise oneself above. If we look up ‘to excel’ in OneLook© Dictionary Search, we find that most dictionaries define it as to do better than, to surpass, to be outstanding, to have a particular talent in something, to do better than a given standard, etc.

Is there something wrong with doing better than a particular standard? Or with being outstanding? I believe not in our Western culture. In a competitive environment, such as the Microsoft culture, there certainly is a positive connotation with the fact that you think you can surpass someone else. My vision for Microsoft Language Excellence was always to be the best resource for terminology management within the company. I believe we fulfilled that vision during most of the existence of Language Excellence.


ExcellenceTerm is part of TermNet, the International Network of Terminology. TermNet was founded in 1988 based “on the initiative of UNESCO, with the aim to establish a network for co-operation in the field of terminology.” ExcellenceTerm is a small group of terminology consultants who are working on various projects, including a certification program for terminologists called the ECQA Certified Terminology Manager.

Economic ups-and-downs aside, we all have to be motivated in our professional lives in order to keep our jobs, make a living, not burn out, etc. Striving for excellence—not achieving perfection—is for me a healthy way to add value and enjoy what we are doing.

Posted in Skills and qualities, TermNet | Tagged: , , , | Leave a Comment »

To centralize or not to centralize—it’s not even a question

Posted by Barbara Inge Karsch on October 21, 2010

In May, I saw the announcement of a new research brief by Common Sense Advisory, which, according to its summary, would explain why companies are starting to centralize their language services. That made sense to me. In fact, it made me happy.

Not happy enough to cough up the money to purchase the study, I am afraid. But as people interested in terminology management, don’t you think that the following paragraph from the announcement sounds good? “Large organizations have begun consolidating their translation activities into internal service groups responsible for a broad range of language-related functions. This brief outlines the rationale behind and steps involved in enterprise language processing, including centralized operations, process re-engineering, automation, and content and metadata remediation.”

It sounds good, because anything but a centralized service for prescriptive terminology management in an enterprise would be counterproductive. A centralized terminology database with a centralized service allows an entire company to contribute to and make use of the asset. According to Fred Lessing’s remark in an earlier posting, Daimler did a good job with this. Here is what they and companies, such as IBM and SAP, who have had a centralized service for years, if not decades, are getting out of it:

  • Standardization: If product teams reuse terms, it leads to consistent corporate language. Documenting a term once and reusing it a million times helps get a clear message out to the customer and sets a company apart from its competitors.
  • Cost savings: The Gilbane Group puts it nicely in this presentation: “Ca-ching each time someone needs to touch the content.” It might cost $20 to set up one entry initially, but ten questions that don’t need to be asked might save $200 and a lot of aggravation. Many terminology questions come in for a major release: if I remember correctly, there were 8,000 questions for a Windows Server release back when things hadn’t been centralized; many translators asked the same question or asked because they couldn’t access the database.
  • Skills recycling: That’s right. It takes “strange” skills to set up a correct and complete entry. A person who does it only every now and then might not remember the meaning of a data category field, might forget the workflow, or simply might not understand a translator’s question. And yet, entries have to be set up quickly and reliably; otherwise we get the picture painted in this posting. A centralized team that does it all the time refines its skills further and further and, again, saves time because no questions need to be asked later.

But all that glitters is not gold with centralization either. There are drawbacks, which a team of committed leaders should plan for:

  • Scale: Users, contributors and system owners all have to be on board. And that takes time and commitment, as the distance between people in the system may be large, both physically and philosophically. Evangelization efforts have to be planned.
  • Cost allocation: A centralized team might sit in a group that doesn’t produce revenue. As a member of terminology teams, I have worked in customer support, content publishing, product teams, and the training and standardization organization. When I had a benchmarking conversation with the Daimler team in 2007, they were located in HR. The label of the organization matters less than whether the group receives funding for terminology work from the groups that do generate revenue. Or whether leadership even understands what the team is doing.

I believe that last point was the straw that broke the camel’s back at Microsoft: Last week, the centralized terminologist team at Microsoft was dismantled. The terminologist in me is simply sad for all the work that we put into building up a centralized terminology management service. The business person in me is mad about the waste of resources. And the human in me worries about the four former colleagues who were let go and the rest who were re-organized into other positions. Good luck to all of them!

Posted in Return on investment, Skills and qualities, Standardizing entries | Tagged: , , , | 1 Comment »

Quantity AND Quality

Posted by Barbara Inge Karsch on September 16, 2010

In If quantity matters, what about quality? I promised to shed some light on how to achieve quantity without skimping on quality. In knowledge management, it boils down to solid processes supported by reliable and appropriate tools and executed by skilled people. Let me drill down on some aspects of setting up processes and tools to support quantity and quality.

If you cannot afford to build up an encyclopedia for your company (and who can?), select metadata carefully. The number and types of data categories (DCs), as discussed in The Year of Standards, can make a big difference. That is not to say to use fewer of them; use the right ones for your environment.

Along those lines, hide data categories or values where they don’t make sense. For example, don’t display Grammatical Gender when Language=English; invariably a terminologist will accidentally select a gender, and users who notice the error but can’t find a way to alert you to it will waste time puzzling over it. Similarly, hide Grammatical Number when Part of Speech=Verb, and so on.
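A minimal sketch of such conditional display rules, using the field and value names from the examples above (the rule mechanism itself is my own invention, not a feature of any particular TMS):

```python
# Each rule hides a data category when a condition on the entry holds.
HIDE_RULES = [
    ("Grammatical Gender", lambda e: e.get("Language") == "English"),
    ("Grammatical Number", lambda e: e.get("Part of Speech") == "Verb"),
]

def visible_fields(entry, all_fields):
    """Return the data categories that should be shown for this entry."""
    hidden = {field for field, cond in HIDE_RULES if cond(entry)}
    return [f for f in all_fields if f not in hidden]
```

Keeping the rules in data rather than scattered through the UI makes it easy to add the next “and so on” case later.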

Plan dependent data, such as product and version, carefully. For example, if versions for all your products are numbered the same way (e.g. 1, 2, 3, …), it might be easiest to have two related tables. If most of your versions have very different version names, you could have one table that lists product and version together (e.g. Windows 95, Windows 2000, Windows XP, …); it makes information retrieval slightly simpler, especially for non-expert users. Or maybe you cannot afford or don’t need to manage down to the version level because you are in a highly dynamic environment.
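The two designs can be sketched as toy schemas, assuming a relational backend (all table and column names are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Design 1: uniform version numbers -> two related tables.
con.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE version (product_id INTEGER REFERENCES product(id), number INTEGER)")
con.execute("INSERT INTO product (id, name) VALUES (1, 'ProductA')")
con.executemany("INSERT INTO version VALUES (?, ?)", [(1, 1), (1, 2), (1, 3)])

# Design 2: irregular version names -> one combined lookup table.
con.execute("CREATE TABLE product_version (label TEXT)")
con.executemany("INSERT INTO product_version VALUES (?)",
                [("Windows 95",), ("Windows 2000",), ("Windows XP",)])
labels = [row[0] for row in con.execute("SELECT label FROM product_version ORDER BY label")]
```

Design 1 keeps version numbering normalized; design 2 trades that for a single flat list that non-expert users can query directly.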

Enforce mandatory data when a terminologist releases (approves or fails) an entry. If you decided that five out of your ten DCs are mandatory, let the tool help terminologists by not letting them get away with a shortcut or an oversight.
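A release gate along these lines might look like this (a minimal sketch; the particular set of mandatory DCs is invented for illustration):

```python
MANDATORY = {"Term", "Definition", "Part of Speech", "Language", "Source"}

def release(entry):
    """Approve an entry only if every mandatory data category is filled in."""
    missing = [dc for dc in MANDATORY if not entry.get(dc)]
    if missing:
        # The tool, not the terminologist's memory, catches the oversight.
        raise ValueError(f"Cannot release entry; missing: {sorted(missing)}")
    entry["Status"] = "Released"
    return entry
```

The point is that completeness is checked by the tool at release time, so no shortcut or oversight slips through.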

It is obviously not an easy task to anticipate what you need in your environment. But well-designed tools and processes support high quality AND quantity and therefore boost your return on investment.

On a personal note, Anton is exhausted with anticipation of our big upcoming event: He will be the ring bearer in our wedding this weekend.

Posted in Advanced terminology topics, Designing a terminology database, Producing quality, Producing quantity, Return on investment, Setting up entries, Terminologist, Tool | Tagged: , , | 1 Comment »

If quantity matters, what about quality?

Posted by Barbara Inge Karsch on September 9, 2010

Linguistic quality is one of the persistent puzzles in our industry, as it is such an elusive concept. It doesn’t have to be, though. But if only quantity matters to you, you are on your way to ruining your company’s linguistic assets.

Because terminology management is not an end in itself, let’s start with the quality objective that users of a prescriptive terminology database are after. Most users access terminological data for support with monolingual, multilingual, manual or automated authoring processes. The outcomes of these processes are texts of some nature. The ultimate quality goal that terminology management supports with regard to these texts could be defined as “the text must contain correct terms used consistently.” In fact, Sue Ellen Wright “concludes that the terminology that makes up the text comprises that aspect of the text that poses the greatest risk for failure.” (Handbook of Terminology Management)

In order to get to this quality goal, other quality goals must precede it. For one, the database must contain correct terminological entries; and second, there must be integrity between the different entries, i.e. entries in the database must not contradict each other.

In order to attain these two goals, others must be met in their turn: The data values within the entries must contain correct information. And the entries must be complete, i.e. no mandatory data is missing. I call this the mandate to release only correct and complete entries (of course, a prescriptive database may contain pre-released entries that don’t meet these criteria yet).

Let’s see what that means for terminologists who are responsible for setting up, approving or releasing a correct and complete entry. They need to be able to:

  • Do research.
  • Transfer the result of the research into the data categories correctly.
  • Assure integrity between entries.
  • Approve only entries that have all the mandatory data.
  • Fill in an optional data category, when necessary.

Let’s leave aside for a moment that we are all human and that we will botch the occasional entry. Can you imagine if instead of doing the above, terminologists were told not to worry about quality? From now on, they would:

  • Stop at 50% research or don’t validate the data already present in the entry.
  • Fill in only some of the mandatory fields.
  • Choose the entry language randomly.
  • Add three or four different designations to the Term field.
  • ….

Do you think that we could meet our number 1 goal of correct and consistent terminology in texts? No. Instead a text in the source language would contain inconsistencies, spelling variations, and probably errors. Translations performed by translators would contain the same, possibly worse problems. Machine translations would be consistent, but they would consistently contain multiple target terms for one source term, etc. The translation memory would propagate issues to other texts within the same product, the next version of the product, to texts for other products, and so on. Some writers and translators would not use the terminology database anymore, which means that fewer errors are challenged and fixed. Others would argue that they must use the database; after all, it is prescriptive.

Unreliable entries are poison in the system. With a lax attitude towards quality, you can do more harm than good. Does that mean that you have to invest hours and hours in your entries? Absolutely not. We’ll get to some measures in a later posting. But if you can’t afford correct and complete entries, don’t waste your money on terminology management.

Posted in Advanced terminology topics, Producing quality, Producing quantity, Return on investment, Setting up entries, Terminologist, Terminology methods, Terminology principles | Tagged: , , | 1 Comment »

Quantity matters

Posted by Barbara Inge Karsch on August 19, 2010

Losing a terminologist position because the terminologist couldn’t show any quantitative progress is shocking. But it happened, according to a participant of the TKE conference that just concluded in Dublin. While managing terminology is a quality measure, quantity must not be disregarded. After all, a company or organization isn’t in it for the fun of it. Here are numbers that three teams established in different types of databases.

At J.D. Edwards, quality was a big driving factor. Each conceptual entry passed through a three-step workflow before it was approved. The need for change management was extremely low, but the upfront investment was high. Seven full-time terminologists, who worked 1/3 of their time on English entries, 1/3 on entries in their native language, and 1/3 on other projects, produced just under 6,000 conceptual entries between 1999 and 2003.

In comparison, the Microsoft terminology database contained 9,000 concepts in January of 2005, most of them (64%) not yet released (for more details, see this article in the German publication eDITion). The team of five full-time English terminologists, who spent roughly 50% of their time on terminology work, increased the volume to about 30,000 in the five following years, 95% of which were released entries. The quality of the entries was not as high as at JDE, and less complex metadata was available (e.g. no concept relations).

According to Henrik Nilsson of the Swedish Centre for Terminology, TNC, three full-time resources built up a terminology database, the Rikstermbanken, with 67,000 conceptual entries in three years. That seems like a large number. But one has to take into consideration that the team consolidated data from many different sources in a more or less automated fashion. The entries have not been harmonized, as one of the goals was to show the redundancy of work between participating institutions. The structure of the entries is deliberately simple.

The needs that these databases serve are different: In a corporation, solid entries that serve as prescriptive reference for the product releases are vital. Entries in a collection from various sources, such as in national terminology banks, serve to support the public and public institutions. They may not be harmonized yet, but contain a lot of different terminology for different users. And they may not be prescriptive.

As terminologists, we are sometimes very focused on quality. But let’s not forget that eventually someone will want to hear what has been accomplished by a project. The number of entries is one of the easiest ways to communicate that to a business person.

Posted in J.D. Edwards TDB, Microsoft Terminology Studio, Producing quantity, Return on investment, Rikstermbanken | Tagged: , | 5 Comments »

Who has the last word?

Posted by Barbara Inge Karsch on June 17, 2010

In most cases, terminology research leads to one obvious target terminology solution. But sometimes there are several options and many people are involved. How then do you make a decision? Who is the final authority? When a new target term is coined or a controversial term needs to be changed, stakeholders become extremely passionate about these questions (see also a recent discussion in the LinkedIn Terminology group).

I believe it is simply the wrong approach to this puzzle. Even just asking the question about authority does not help negotiations. Instead, let’s look at the process: The target terminologist does the research, because that is what they are trained and paid for. Research means accessing and evaluating pertinent resources to find the answer to the terminological problem. Resources may be print and online dictionaries, books and technical magazines, websites and terminology portals, the product itself, related products and material, and, yes, subject matter experts (SMEs).

Target terminologists are the hub in the middle of a bunch of experts. Sometimes they turn out to be self-proclaimed experts or people who are simply passionate about their native language. But especially in localization environments, a terminologist is a generalist and must never work in isolation. A good terminology management system (TMS) supports the terminologist by allowing knowledge sharing by others, voting, etc.

After doing the research, after consulting experts, after weighing each term candidate carefully, the answer should be apparent to the terminologist as well as to stakeholders. Here are some of the aspects that must be taken into consideration:

  • Linguistic presentation—is the new term a well-motivated term? Again, the DTT/DIT Best Practice contains a well-structured list of criteria.
  • Budgetary concerns—does changing from an old to a new term completely blow some product group’s budget and will they therefore not go for the new suggestion?
  • Input by end-users—when the term replaces an existing term: do end-users simply not understand the old term or have a strong dislike for it?
  • Sociolinguistic aspects—how rooted is the old term already in common parlance?
  • And finally, in certain environments, political aspects—e.g. are enough stakeholders convinced to make the change so that it will actually be successful?

It goes without saying that in each situation the criteria need to be weighted differently. For Windows Vista, for example, the German term for “to download” was changed from downloaden to herunterladen. The budgetary impact was high due to the high occurrence of the term in Microsoft material. Who had the final authority on that one? Well, a user survey conducted by the German terminologist at CeBIT in Hannover revealed that many users, even techies attending the computer fair, did not like the Anglicism. The German terminologist made the case to product groups, and the change was implemented by mutual agreement.

So, why do I say that the answer “should” be apparent? The most obvious reason is that, just like everyone else, terminologists are human and make mistakes—another good reason to not work in isolation. Apart from that, here are other aspects that I have observed impacting the negotiation process in today’s virtual world:

  • Culture, gender and hierarchy: At J.D. Edwards, some handbooks were translated multiple times into Japanese. Each time a higher-ranking person in the Japanese subsidiary had a complaint, the books or certain portions of them might be retranslated. Similarly, there was “terminology du jour”—terms that changed based on the input of the subsidiary and with little guidance by the female Japanese terminologist. Gender and hierarchy have an impact on terminological decisions in certain cultures.
  • Outsourcing: An external target terminologist isn’t necessarily in the strongest position. Many may not have contact with the local market subsidiary, because there is none or it is not staffed to discuss terminology. The worst case is, though, when the linguist makes a perfectly sound suggestion, but the counter-suggestion from the subsidiary prevails because it came from the client or the perceived expert. Subsidiary PMs may have strong technical knowledge, but that does not mean that they are always completely clear a) about the concept, b) about the impact of a term change, or even c) whether the term works for the end user. It is a terminologist’s job to assess how valuable the input of an expert is.
  • Expertise and experience: Let’s face it—some terminologists don’t deserve to be called terminologists. Terms are small, if not the smallest, units of knowledge, and terminologists need to deal with dozens of them on a daily basis. Experts usually don’t get paid or have no time for terminology work. Therefore, communication had better be efficient and fast. It takes tremendous skill to make a concise case and get to a solution smoothly and efficiently.

In my experience, the best designations result from the work of a team of equals who draw on each other’s strengths. In many software scenarios, the ultimate SME does not even exist—terminology-management and subject-matter expertise is contributed by different parties to an online centralized terminology database managed by the terminologist for the respective language.

What is your experience—do too many cooks spoil the terminology database? Or does it take a village?

Posted in Negotiation skills, Researching terms, Subject matter expert, Terminologist, Terminology 101, Translator | Tagged: , , , , , , , | 2 Comments »

