BIK Terminology—

Solving the terminology puzzle, one posting at a time

  • Author

    Barbara Inge Karsch - Terminology Consulting and Training

  • Images

    Cathedral of Our Lady, Antwerp, by Barbara Inge Karsch


Archive for the ‘Process’ Category

Is “cloud” a technical term (yet)?

Posted by Barbara Inge Karsch on October 11, 2011

We have jargon, we have words, we have phrases…we have terms. Can words become terms? How would that happen? And has “the cloud” arrived as a technical concept yet?

Cloud, as a word, is part of our everyday vocabulary. With the summer over, it’ll again be part of our daily lives in the Pacific Northwest for the next eight months. On the right is a good definition from the Merriam-Webster Learner’s Dictionary. The Learner’s Dictionary is not concerned with technical language, as it is compiled for non-native speakers. So, the definition doesn’t allude to the fact that clouds, in a related sense, are also part of the field of meteorology and therefore part of a language for special purposes (LSP).

When common everyday words are used in technical communication and with specialized meaning, they have become terms through a process called terminologization. Is cloud, as in cloud computing, there yet? Or is it still in this murky area where marketing babel meets technical communication? It certainly was initially.

Here is a great blog on when cloud was used for the first time. Author John M. Willis asked his Twitter followers Who Coined The Phrase Cloud Computing? and could then trace the first occurrences back to May of 1997 and a patent application for “cloud computing” by NetCentric; then to a 1999 NYT article that referred to a Microsoft “cloud of computers”; and finally to a speech by Google’s Eric Schmidt, which Willis says he would pick as the moment when the cloud metaphor became mainstream.

Cloud Managed, really? Picture by BIK

That was 2006, and “the cloud” may have become part of the tech world’s hype, but it wasn’t a technical term with a solid and clearly delineated definition. As Willis points out, “cloud computing was a collection of related concepts that people recognized, but didn’t really have a good descriptor for, a definition in search of a term, you could say.”

Yes, we had the designator, but did we really have a clear definition? In my mind, everyone defined it differently. For a while, the idea of “the cloud” was batted around mostly by marketing and advertising folks whose job it is to use hip language and create positive connotations. When “the cloud” and other marketing jargon sound like dreams coming true to disposed audiences, they usually spell nightmare to terminologists. The path of a “cloud dream” into technical language is a difficult one. In 2008, I was part of a terminology taskforce within the Windows Server team who tried to nail down what cloud computing was. I believe the final definition wasn’t set when I left in May 2010.

On a recent walk, though, I took my resident Azure architect evangelist (See You say Aaaazure, I say Azuuuure…) through a good analysis of the conceptual area. Although Greg kept saying that some of the many companies in cloud computing these days “would also include x, y, or z,” x, y and z all turned out to not be “essential characteristics.” And we ended up with the following definition. It is based largely on the one published by Netlingo, but modified to meet more of the criteria of a terminological definition:


“A type of computing in which dynamic, scalable and virtual resources are provided over the Internet and which includes services that provide common business applications online and accessible from a Web browser, while the software and data are stored on servers.”

Wouldn’t it be great, if a terminologist could stand by to assist any time a new concept is being created somewhere? Then, we’d have nice definitions and well-formed terms and appellations right away. Since that is utopia, at least it helps to be aware that language is in flux, that marketing language might be deliberately nebulous, and that it might take time before a majority of experts have agreed on what something is and how it is different from other things around it. I think “the cloud” and “cloud computing” have been terminologized and arrived in technical language.

 

Posted in Branding, Coining terms, Content publisher, Interesting terms, Terminologist | 10 Comments »

Terminology extraction with memoQ 5.0 RC

Posted by Barbara Inge Karsch on August 15, 2011

In the framework of a TermNet study, I have been researching and gathering data about terminology management systems (TMS). We will not focus on term extraction (TE) tools, but since one of our tool candidates recently released a new term extraction module, I wanted to check it out. Here is what I learned from giving the TE functionality of memoQ 5.0 release candidate a good run.

Let me start by saying that this test made me realize again how much I enjoy working with terminological data; I love analyzing terms and concepts, researching meaning and compiling data in entries; to me it is a very creative process. Note furthermore that I am not an expert in term extraction tools: I was a serious power-user of several proprietary term extraction tools at JDE and Microsoft; I haven’t worked with the Trados solution since 2003; and I have only played with a few other methods (e.g. Word/Excel and SynchroTerm). So, my view of the market at the moment is by no means a comprehensive one. It is, however, one of a user who has done some serious term mining work. One of the biggest projects I ever did was Axapta 4.0 specs. It took us several days to even just load all documents on a server directory; it took the engine at least a night to “spit out” 14,000 term candidates; and it took me an exhausting week to nail down 500 designators worth working with.

As a mere user, as opposed to a computational linguist, I am not primarily interested in the performance of the extraction engine (I actually think the topic is a bit overrated); I like that in memoQ I can set the minimum/maximum word lengths, the minimum frequency, and the inclusion/exclusion of words with numbers (the home-grown solutions had predefined settings for all of this). But beyond the rough selection, I can deal with either too many or too few suggestions, if the tool allows me to quickly add or delete what I deem the appropriate form. There will always be noise and lots of it. I would rather have the developer focus on the usability of the interface than “waste” time on tweaking algorithms a tiny bit more.

So, along the lines of the previous posting on UX design, my requirements on a TE tool are that it allows me to

  • Process term candidates (go/no-go decision) extremely fast and
  • Move data into the TMS smoothly and flawlessly.

memoQ by Kilgray Translation Technologies* meets the first requirement very nicely. My (monolingual) test project was the PowerPoint presentations of the ECQA Certified Terminology Manager, which I had gone through in detail the previous week and which contained 28,979 English words. Because the subject matter is utterly familiar to me, there was no question as to what should make the cut and what shouldn’t. I loved that I could “race” through the list and go yay or nay; that I could merge obvious synonyms; and that I could modify term candidates to reflect their canonical form. Because the contexts for each candidate are all visible, I could have even checked the meaning in context quickly if I had needed to.

I also appreciated that there is already a stop word list in place. It was very easy to add to it, although here comes one suggestion: It would be great to have the term candidate automatically inserted in the stop-word dialog. Right now, I still have to type it in. It would save time if it were prefilled. Since the stop word list is not very extensive (e.g. even words like “doesn’t” are missing in the English list), it’ll take everyone considerable time to build up a list, which in its core will not vary substantially from user to user. But that may be too much to ask for a first release.
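To make the rough selection concrete, here is a minimal sketch of how settings like these could filter term candidates. It is not memoQ's algorithm: the function name, the parameters, and the reading of "word length" as the number of words in a candidate are all assumptions made for illustration.

```python
import re
from collections import Counter


def extract_candidates(text, min_words=1, max_words=3, min_freq=2,
                       allow_digits=False, stop_words=frozenset()):
    """Return {candidate: frequency} for word n-grams that pass the filters.

    Hypothetical helper: n-grams are built over the whole text and may
    cross sentence boundaries, which a real tool would likely avoid.
    """
    tokens = re.findall(r"[A-Za-z0-9][A-Za-z0-9'-]*", text.lower())
    counts = Counter()
    for n in range(min_words, max_words + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip candidates that start or end with a stop word.
            if gram[0] in stop_words or gram[-1] in stop_words:
                continue
            # Optionally skip candidates that contain digits.
            if not allow_digits and any(ch.isdigit() for w in gram for ch in w):
                continue
            counts[" ".join(gram)] += 1
    # Keep only candidates that occur often enough.
    return {cand: freq for cand, freq in counts.items() if freq >= min_freq}


sample = ("The term base stores each term. A term base is not a glossary. "
          "Term extraction feeds the term base.")
stops = {"the", "a", "is", "not", "each"}
candidates = extract_candidates(sample, max_words=2, stop_words=stops)
print(candidates)
```

Even on this tiny sample the list contains noise ("base" on its own is hardly a term), which is why the quick go/no-go review by a human matters more than squeezing a little more out of the filters.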

As for my second requirement, memoQ term extraction doesn’t meet that (yet) (note that I only tested the transfer of data to memoQ, but not to qTerm). I know it is asking for a lot to have a workflow from cleaned-up term candidate list to terminological entry in a TMS. Here are two suggestions that would make a difference to users:

  • Provide a way to move context from the source document, incl. context source, into the new terminological entry.
  • Merging terms into one entry because they are synonyms is great. But they need to show up as synonyms when imported into the term base; none of my short forms (e.g. POS, TMS) showed up in the entry for the long forms (e.g. part of speech, terminology management systems) when I moved them into the memoQ term base.

My main overall wish is that we integrate TE with authoring and translation in a way that allows companies and LSPs, writers and translators to have an efficient workflow. It is imperative in technical communication/translation to document terms and concepts. When this task is put on the translators, it is already quite late, but it is better than if it doesn’t happen. Only fast and flawless processing will allow one-person or multi-person enterprises, for that matter, to carry out terminology work as part of the content supply chain. When the “fast and flawless” prerequisite is met, even those of my translator-friends who detest the term “content supply chain” will have enough time to enjoy themselves with the more creative aspects of their profession. Then, economic requirements essential on the macro level are met, and the need of the individual to get satisfaction out of the task is fulfilled on the micro level. The TE functionality of memoQ 5.0 RC excels in design and, in my opinion, is ready for translators’ use. If you have any comments, if you agree or disagree with me, I’d love to hear it.

*Kilgray is a client of BIK Terminology.

Posted in Designing a terminology database, memoQ, Producing quantity, Selecting terms, Term extraction tool, Usability | 3 Comments »

HCI International 2011

Posted by Barbara Inge Karsch on August 11, 2011

In July, I spent two days at Human Computer Interaction International 2011 in Orlando, Florida, with hundreds of UX designers, usability analysts, engineers and researchers from around the world. It surprised me that language as part of usability was mentioned just a few times. Furthermore, I didn’t expect to hear so much about the struggle of usability professionals within company hierarchies and cultures. It also occurred to me that many terminology management systems (TMS) may not have taken usability all that seriously so far.

Thunderstorm over Disney World

Challenged by a missed flight and an extra night in DC, I managed to attend about 40 presentations. None of them even mentioned language, let alone terminology, as a focus point or issue, although Helmut Windl from Continental Automotive GmbH had a wonderful series of translation errors as an intro to his paper on Empathy as a Key Factor for Successful Intercultural HCI Design. Linguistic faux pas are always good for a laugh. As you might expect, my own paper, Terminology Precision—A Key Factor in Product Usability and Safety, was focused on avoiding such faux pas, particularly in the life sciences where blunders could be less than funny.

What came across in more than one presentation is that UX professionals, like language professionals, struggle with their status in an enterprise. Clemens Lutsch from Microsoft Deutschland GmbH gave a good presentation on making the case for usability standards to management that had useful ideas for us terminologists as well, e.g., what he called “the trap of the cost is already there”. What he means by this is that existing roles already take care of the task, say, user-centered design or, for us, something like term formation, so why bother changing anything. The awareness that these employees may not have the right skill set does not (always) exist. Usability folks and terminologists can form alliances on more than one front.

Usability Standards across the Development Lifecycle by Theofanos and Stanton

Lutsch’s presentation was part of a whole session on ISO usability standards and enterprise software. The award-winning paper of this track (Design, User Experience, and Usability) by Theofanos and Stanton of the National Institute of Standards and Technology (US) introduced a comprehensive overview of all the standards provided or proposed by the respective ISO technical committee(s) and IEC. The graphic on the left, which stems from the paper, has lots of detail. But the main point of showing it here is that it has the user at the center and that any and all design tasks revolve around user needs.

I have participated in software development for terminology management systems (as well as in others) and this view was never the prevailing one. The result was often that TMS users struggled with the software: They would rather work in Excel and then import the data than work in the interface that was to support and facilitate their work.

So, here is a challenge to the designers and developers of TMS: Don’t provide systems that do a wonderful job hosting data; provide systems that allow us to do terminology work efficiently and reliably. In Quantity AND Quality, I discussed a few of the easy things that can be done on the interface level. I would love to see tools being developed following not only the soon-to-be-released ISO 26162, but also the usability standards put forth by ISO TC 159 (Ergonomics). By the same token, let the usability and ergonomics people in the committee inspire the rest of their industry. After all, their scope includes “standardization in the field of ergonomics, including terminology, methodology, and human factors data.”

Posted in Designing a terminology database, Events, Usability | Leave a Comment »

Avoiding doublettes or a report from the ISO meetings in Korea

Posted by Barbara Inge Karsch on June 23, 2011

One of the main reasons we have doublettes in our databases is that we often don’t get around to doing proper terminological analysis. I was just witness to and assistant in a prime example of a team doing this analysis at the meetings of ISO TC 37.

ISO TC 37 is the technical committee for “Terminology and other language and content resources.” It is the standards body responsible for standards such as ISO 12620 (now retired, as discussed in an earlier posting), 704 (as discussed here) or soon 26162 (already quoted here). This year, the four subcommittees (SCs) and their respective working groups (WGs) met in Seoul, South Korea, from June 12 through 17.

One of these working groups had considerable trouble coming to an agreement on various aspects of a standard. Most of us know how hard it is to get subject matter experts (or language people!) to agree on something. Imagine a multi-cultural group of experts who are tasked with producing an international standard and who have native languages other than English, the language of discussion! The convener, my colleague and a seasoned terminologist, Nelida Chan, recognized that the predicament could be alleviated by some terminology work, more precisely by thorough terminological analysis.

First, she gave a short overview of the basics of terminology work, as outlined in ISO 704 Terminology work – Principles and methods. Then the group agreed on the subject field and listed it on a white board. Any of the concepts up for discussion had to be in reference to this subject field; if the discussion drifted off into general language, the reminder to focus on the subject field was right on the board.

The group knew that they had to define and name three different concepts that they had been struggling with, although lots of research had been done; so we put three boxes on the board as well. We then discussed, agreed on and added the superordinate to each box, which was the same in each case. We also discussed what distinguished each box from the other two. Furthermore, we found examples of the concepts and added what turned out to be subordinates right into the appropriate box. Not until then did we give the concepts names. And now, naming was easy.

View from the meeting room onto Olympic National Park in Seoul, by BIK

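Step     | Concept 1                       | Concept 2                       | Concept 3
---------|---------------------------------|---------------------------------|--------------------------------
Step 1   |                                 | Subject field                   |
Step 2   | Superordinate                   | Superordinate                   | Superordinate
Step 3   | Distinguishing characteristic 1 | Distinguishing characteristic 1 | Distinguishing characteristic 1
         | Distinguishing characteristic 2 | Distinguishing characteristic 2 | Distinguishing characteristic 2
(Step 4) |                                 | Subordinate                     |
         |                                 | Subordinate                     |
Step 5   | Designator                      | Designator                      | Designator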

 

After this exercise, we had a definition, composed of the superordinate and its distinguishing characteristics as well as terms for the concepts. Not only did the group agree on the terms and their meanings, the data can now also be stored in the ISO terminology database. Without doublettes.
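The order of the exercise (subject field, superordinate, distinguishing characteristics, optional subordinates, and only then the designator) can be sketched as a small data structure. This is an illustration only: the class and field names are invented here, the subject field value is an assumption, and the example borrows the definition of "doublette" quoted elsewhere on this page rather than any of the concepts the working group actually discussed.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptBox:
    """One 'box' on the white board (hypothetical structure)."""
    subject_field: str                                 # Step 1
    superordinate: str                                 # Step 2
    characteristics: list                              # Step 3
    subordinates: list = field(default_factory=list)   # (Step 4), optional
    designator: str = ""                               # Step 5, filled in last

    def definition(self):
        # Definition = superordinate + distinguishing characteristics.
        return f"{self.superordinate} that {' and '.join(self.characteristics)}"


box = ConceptBox(
    subject_field="terminology management",  # assumed for this example
    superordinate="terminological entry",
    characteristics=["describes the same concept as another entry"],
)
# Naming happens only after the concept has been analyzed.
box.designator = "doublette"

print(f"{box.designator}: {box.definition()}")
```

Leaving the designator empty until the end mirrors the point of the exercise: once the superordinate and the distinguishing characteristics are agreed on, the name is the easy part.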

Granted, as terminologists we don’t often have the luxury of having 15 experts in one room for a discussion. But sometimes we do: I remember discussing terms and appellations for new gaming concepts in Windows Vista with marketing folks in a conference room at the Microsoft subsidiary in Munich. Even if we don’t have all experts in shouting distance, we can proceed in a similar fashion and collect the information from virtual teams and other resources in our daily work. It may take a little bit to become fluent in the process, but terminological analysis helps us avoid doublettes and pays off in the long run.

Posted in Events, Researching terms, Standardizing entries, Subject matter expert, Terminologist, Terminology 101, Terminology methods, Terminology principles | 3 Comments »

Why doublettes are bad

Posted by Barbara Inge Karsch on June 15, 2011

One of the main reasons for having a concept-oriented terminology database is that we can set up one definition to represent the concept and can then attach all its designations, including all equivalents in the target language. It helps save cost, drive standardization and increase usability. Doublettes offset these benefits.

The diagrams below are simplifications, of course, but they explain visually why concept orientation is necessary when you are dealing with more than one language in a database. To explain it briefly: once the concept is established through a definition and other concept-related metadata, source and target designators can be researched and documented. Sometimes this research will result in multiple target equivalents when there was only one source designator; sometimes it is just the opposite, where, say, the source language uses a long and a short form, but the target language only has a long form.

[Two diagrams: concept-oriented entries with their source and target designators]

If you have doublettes in your database, it means not only that the concept research happened twice and, to a certain degree, unsuccessfully, but also that designators have to be researched twice and their respective metadata documented twice. The more languages there are, the more expensive that becomes. Rather than having, say, a German terminologist research the concept denoted by automated teller machine, ATM, electronic cash machine, cash machine, etc. two or more times, research should take place once, with the German equivalent Bankautomat attached once, potentially as the equivalent for all English synonyms.
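As a rough sketch of what concept orientation means for the data, the record below holds one definition and hangs every designator, in every language, off that single concept. The class, its methods and the wording of the definition are assumptions made for illustration; they do not describe any particular terminology management system.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptEntry:
    """One record per concept: a single definition plus designators per language."""
    definition: str
    designators: dict = field(default_factory=dict)  # language code -> list of terms

    def add_designator(self, language, term):
        terms = self.designators.setdefault(language, [])
        if term not in terms:  # avoid listing the same term twice
            terms.append(term)

    def equivalents(self, language):
        return list(self.designators.get(language, []))


# Illustrative definition; not taken from a real term base.
atm = ConceptEntry("machine that lets bank customers withdraw cash without a teller")
for term in ["automated teller machine", "ATM",
             "electronic cash machine", "cash machine"]:
    atm.add_designator("en", term)

# Researched once, and valid for all four English synonyms.
atm.add_designator("de", "Bankautomat")

print(atm.equivalents("en"))
print(atm.equivalents("de"))
```

With two separate records for the same concept, the German research and its metadata would have to be entered twice, and again for every further language.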

Doublettes also make it more difficult to work towards standardized terminology. When you set up a terminological entry including the metadata to guide the consumer of the terminological data in usage, standardization happens even if there are multiple synonyms. Because they are all in one record, the user has, e.g., usage, product, or version information to choose the applicable term for their context. With doublettes, the data is also harder to use, because the reader has to compare two entries to find the guidance.

And lastly, if that information is in two records, it might be harder to discover. Depending on the search functionality, the designator and the language of the designator, the doublettes might display in one search. But chances are that only one is found and taken for the only record on the concept. With increasing data volumes more doublettes will happen, but retrievability is a critical part of usability. And without usability, standardization is even less likely and even more money is wasted.

Posted in Maintaining a database, Return on investment, Standardizing entries | 1 Comment »

Doublettes—such a pretty term, yet such a bad concept

Posted by Barbara Inge Karsch on June 10, 2011

Sooner rather than later, terminologists need to think about database maintenance. Initially, with few entries in the database, data integrity is easy to ensure: In fact, the terminologist might remember just about any entry they ever compiled; my Italian colleague, Licia, remembered just about any entry she ever opened in the database. But even the best human brains will eventually ‘run out of memory’ and blunders will happen. One of these blunders is the so-called doublette.

According to ISO TR 26162, a doublette is a “terminological entry that describes the same concept as another entry.” Sometimes these entries are also referred to as duplicates or duplicate entries, but the technical term in standards is doublette. It is important to note that homonyms do not equal doublettes. In other words, two terms that are spelt the same way and that are in two separate entries may refer to the same concept and may therefore be doublettes. But they may also justifiably be listed in separate entries, because they denote slightly or completely different concepts.

As an example, I deliberately set up doublettes in i-Term, a terminology management system developed by DANTERM: The terms automated teller machine and electronic cash machine can be considered synonyms and should be listed in one terminological entry. Below you can see that automated teller machine and its abbreviated form ATM have one definition and definition source, while electronic cash machine and its abbreviated form, cash machine, are listed in a separate entry with another, yet similar definition and its definition source. During database maintenance, these entries should be consolidated into one terminological entry with all its synonyms.

[Two screenshots: the two i-Term entries set up as doublettes]

It is much easier to detect homographs that turn out to be doublettes. Rather, it should be easier to avoid them in the first place: after all, every new entry in a database starts with a search for the term denoting the concept; if it already exists with the same spelling, the search returns a hit. Here are ‘homograph doublettes’ from the Microsoft Language Portal. While we can’t see the ID, the definition shows pretty clearly that the two entries are describing the same concept.

[Screenshot: two Microsoft Language Portal entries for the same term]

Doublettes happen, particularly in settings where more than one terminologist adds and approves entries in a database. But even if one terminologist approves all new concepts, s/he cannot guarantee that a database remains free of doublettes. The right combination of skills, processes and tool support can help limit the number, though.
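As one example of tool support, a check for ‘homograph doublettes’ could look roughly like the sketch below: it lists every term that occurs in more than one entry, so that a terminologist can review those entries. The function name and the dictionary layout are assumptions made for illustration, not the export format of any real system. Note that a hit is only a candidate, since homonyms may justifiably sit in separate entries.

```python
from collections import defaultdict


def homograph_candidates(entries):
    """Map each term that occurs in two or more entries to the sorted entry IDs.

    `entries` is a hypothetical structure: {entry_id: [terms in that entry]}.
    Terms are compared case-insensitively and without surrounding spaces.
    """
    by_term = defaultdict(set)
    for entry_id, terms in entries.items():
        for term in terms:
            by_term[term.strip().casefold()].add(entry_id)
    return {term: sorted(ids) for term, ids in by_term.items() if len(ids) > 1}


entries = {
    1: ["automated teller machine", "ATM"],
    2: ["electronic cash machine", "cash machine"],
    3: ["ATM"],
}
print(homograph_candidates(entries))
```

The check flags entries 1 and 3, which share the spelling ATM, but it cannot see that entry 2 describes the same concept under different names. Doublettes built from synonyms still need terminological analysis by a person.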

Posted in iTerm, Maintaining a database, Microsoft Language Portal, Process, Setting up entries | 4 Comments »

Twitterisms

Posted by Barbara Inge Karsch on March 22, 2011

Source: http://www.macmillandictionary.com/buzzword/entries/tweetheart.html

What do you call a user of the Twitter short messaging service who is liked and admired by other users?

A tweetheart! And how do you use the term? Here is an example of how Belgian tennis player, Kim Clijsters, used it in a tweet from the Yahoo-Eurosport site: "Happy Australia day to all my Aussie tweethearts!" It earned her the Tweet of the Day.

I am not a Twitter user, or tweeter, but the terminology of Twitter has been the subject of many conversations. While this social medium has been emerging at an incredible pace, some of the terminology around it is quite well developed. The glossary provided by the Twitter service contains the basics. But it doesn’t list all the good (and bad) terms that have sprung up around the service.

Some of the terms that don’t work so well are impossible to pronounce. The list in this article on About.com contains designations, like Twitpocalypse, which is defined as “the moment when the identification number of individual tweets surpassed the capacity of the most common data type. The Twitpocalypse crashed a number of Twitter clients.” The motivation behind the name is clear, though.

This article* in the quarterly webzine of the Macmillan English Dictionaries, MED Magazine, has a very nice list of twitterisms. I would consider most of them quite well-motivated. If you don’t want to check out the link, here is another example: What group do people belong to whose tweets attract a large number of readers? The twitterati.

*BIK: Unfortunately this article was removed recently.

Posted in Coining terms, Interesting terms | 6 Comments »

Brands, names and problems

Posted by Barbara Inge Karsch on March 20, 2011

The concept denoted by the term “brand” includes many different aspects of a product. Considering that it evolved from the common practice of burning a mark into cattle for identification, it certainly contains the aspect of marks or symbols.

In his book, Brand Failures, Matt Haig[1] says that ‘[b]rands need to acknowledge cultural differences. Very few brands have been able to be transferred into different cultures without changes to their formula.’ He then lists many of the well-known cases where translation errors or naming misfortunes did lasting damage to a brand:

Beavers in Redmond by BIK

  • Clairol’s Mist Stick curling iron launched in Germany: Mist is the German word for manure.
  • The Silver Mist car by Rolls Royce was not a good choice for the German language market for the same reason.
  • Rover connotes a dog; apparently, Land Rover had a problem selling cars; I am not sure that is still true. That connotation would obviously not bug me very much.

These are funny, if you are not the branding manager of the respective product. At Microsoft, product names, but also many feature names went through a process called a globalization review. A target language terminologist, who is a native speaker of the target-market language, reviews the suggested name for undesirable connotations in the target culture.

If the English name of a new feature is not to be retained in the target-language software, a so-called localizability review is performed. During this evaluation, the terminologist checks whether the connotations that the appellation has in English can be retained easily in the target language. They often try to find a designation that is very close to the original. If that is not possible, they will let the requesting product group know.

Here is a nice list of brand naming considerations offered by brand naming company, Brand Periscope, on their website:

  • easy to say and spell
  • memorable
  • extendable, has room for growth
  • positive feeling
  • international; doesn’t have bad meanings in other languages
  • available; from trademark and domain perspective
  • meaning, has relevance to your business

Sounds simple, but this terminology task is something that is forgotten very often. Product developers might have very little exposure to other cultures and/or languages and don’t think to include terminology or linguistic tasks or checks in their development process. When translators, localizers and terminologists point out a faux-pas, it often is either not taken seriously or it comes too late.

1. Haig, M., Brand Failures: The Truth About the 100 Biggest Branding Mistakes of All Times. 2003, London: Kogan Page Limited. 309.

Posted in Branding, Coining terms | 1 Comment »

A home run is a home run is a home run?

Posted by Barbara Inge Karsch on February 20, 2011

Indeed. Except if it has been “determinologized.” If terminologization is when a common everyday word turns into a technical term, then the reverse process is when a technical term from a technical subject field becomes part of our everyday vernacular. The process was identified, analyzed and, I believe, named by Ingrid Meyer and Kristen Mackintosh in a paper in Terminology in 2000.

They describe two categories of determinologization.

1. The term retains essentially the same meaning, but is no longer used by subject matter experts referring to a concept in their field. Rather the subject matter might have become popularized, and laypersons understand enough about the concept to use the term. The term in the layman’s use refers to a “more shallow” meaning of the concept or one that also has other connotations.

Good examples are medical terms for diseases that are prevalent enough for all of us to have an idea about them. Insomnia, for instance, is a condition that for medical professionals is highly complex. They might break it down into sleep maintenance insomnia, sleep-onset insomnia, etc. and treat it with benzodiazepines. It might be chronic or intermittent, familial or even fatal.

At some point, we all might have talked about it in a less medical sense. Here is the entry in the Urban Dictionary—a listing on this website is a good indication that a term has become a word in common usage. And to the right is an excerpt from the South African Mail&Guardian about a chess player who can’t sleep before competitions.

In these examples, the meaning behind the word “insomnia” remains the same as in the medical context: Someone can’t sleep. But our associations don’t take us to the clinical setting, rather we get a sense of the mood of the sufferer or the chosen cure.

2. The word now describes a completely different concept. While it shares some characteristics with the meaning in the technical subject of the term, it no longer shares the essential characteristics.

The Monday-morning quarterback is not a John Elway or Peyton Manning rising on the first workday of the week. It is the guy who watched them the day before and now tells his buddies how the quarterback could have done a better job or how anyone could have done a better job in any subject matter. The essence of the concept in sports, i.e. an American football player, is completely gone in this general use of “quarterback.”

A term from another sport, baseball, which has been determinologized, is home run. The excerpt from the Wall Street Journal shows that when someone hits a home run, there is no batter involved, not even a hit in the sport’s sense. But it is “a success.”

Enough baseball terminology has made it into the American vernacular that Dr. Jerry Roth at Sprachen- und Dolmetscher Institut in Munich gave it a special focus during our studies. He even had us meet in Englischer Garten for a game.

Why do we care? Well, if we create new terms, borrow them from other fields or languages, terminologize or determinologize them, the receiver of our message (and that does include translators in many cases) only understands it if our usage has the appropriate level of precision. Understanding the methods that we have at our disposal allows us to choose the best ones. The likelihood that others will understand our message then is much higher. And after all, understanding is what communication is about.

Posted in Advanced terminology topics, Content publisher, Interesting terms, Process, Terminologist | 1 Comment »

How many terms do we need to document?

Posted by Barbara Inge Karsch on December 17, 2010

Each time a new project is kicked off, this question is on the table. Content publishers ask how much they are expected to document. Localizers ask how many new terms will be used.

Who knows these things when each project is different, deadlines and scopes change, everyone understands “new term” to mean something else, etc. And yet, there is only the need to agree on a ballpark volume and schedule. With a bit of experience and a look at some key criteria, expectations can be set for the project team.

In a Canadian study, shared by Kara Warburton at TKE in Dublin, authors found that texts contain 2-4% terms. If you take a project of 500,000 words, that would be 10,000 to 20,000 terms, or roughly 15,000 at the midpoint. In contrast, product glossaries prepared for end-customers in print or online contain 20 to 100 terms. So, the discrepancy between what could be defined and what is generally defined for end-customers is large.
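The back-of-the-envelope estimate above is easy to repeat for a project of any size. The following is only a sketch: the function name is made up, and it simply applies the 2-4% range from the study to a word count, using whole-number percentages to keep the arithmetic exact.

```python
def term_estimate(word_count, low_pct=2, high_pct=4):
    """Return (low, midpoint, high) estimates of the number of terms in a text."""
    low = word_count * low_pct // 100
    high = word_count * high_pct // 100
    return low, (low + high) // 2, high


low, mid, high = term_estimate(500_000)
print(f"Between {low:,} and {high:,} terms, about {mid:,} at the midpoint")
```

Set against a product glossary of 20 to 100 terms, even the low end of the range shows how much potential terminology goes undocumented for end-customers.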

A product glossary is a great start. Sometimes, even that is not available. And, yet, I hear from at least one customer that he goes to the glossary first and then navigates the documentation. Ok, that customer is my father. But juxtapose that to the remark by a translator at a panel discussion at the ATA about a recent translation project (“aha, the quality of writing tells me that this falls in the category of ‘nobody will read it anyway’”), and I am glad that someone is getting value out of documentation.

In my experience, content publishing teams are staffed and ready to define about 20% of what localizers need. Ideally, 80% of new terms are documented systematically in the centralized terminology database upfront and the other 20% of terms submitted later, on an as-needed basis. Incidentally, I define “new terms” as terms that have not been documented in the terminology database. Anything that is part of a source text of a previous version or that is part of translation memories cannot be considered managed terminology.

Here are a few key criteria that help determine the number of terms to document in a terminology database:

  • Size of the project: small, medium, large, extra-large…?
  • Timeline: Are there five days or five months to the deadline?
  • Version: Is this version 1 or 2 of the product? Or is it 6 or 7?
  • Number of terms existing in the database already: Is this the first time that terminology has been documented?
  • Headcount: How many people will be documenting terms and how much time can they devote?
  • Level of complexity: Are there more new features? Is the SME content higher than normal?

These criteria can serve as guidelines, so that a project team knows whether they are aiming at documenting 50 or 500 terms upfront. If memory serves me right, we added about 2700 terms to the database for Windows Vista. 75% were documented upfront. It might be worthwhile to keep track of historic data. That enables planning for the next project. Of course, upfront documentation of terms takes planning. But answering questions later is much more time-consuming, expensive and resource-intense. Hats off to companies, such as SAP, where the localization department has the power to stop a project when not enough terms were defined upfront!

Posted in Content publisher, Selecting terms, Translator | Leave a Comment »

 