Posted by Barbara Inge Karsch on June 23, 2011
One of the main reasons we have doublettes in our databases is that we often don’t get around to doing proper terminological analysis. I just witnessed, and assisted in, a prime example of a team doing this analysis at the meetings of ISO TC 37.
ISO TC 37 is the technical committee for “Terminology and other language and content resources.” It is the standards body responsible for standards such as ISO 12620 (now retired, as discussed in an earlier posting), 704 (as discussed here) or soon 26162 (already quoted here). This year, the four subcommittees (SCs) and their respective working groups (WGs) met in Seoul, South Korea, from June 12 through 17.
One of these working groups had considerable trouble coming to an agreement on various aspects of a standard. Most of us know how hard it is to get subject matter experts (or language people!) to agree on something. Imagine a multi-cultural group of experts who are tasked with producing an international standard and who have native languages other than English, the language of discussion! The convener, my colleague and a seasoned terminologist, Nelida Chan, recognized that the predicament could be alleviated by some terminology work, more precisely by thorough terminological analysis.
First, she gave a short overview of the basics of terminology work, as outlined in ISO 704 Terminology work – Principles and methods. Then the group agreed on the subject field and listed it on a white board. Any of the concepts up for discussion had to be in reference to this subject field; if the discussion drifted off into general language, the reminder to focus on the subject field was right on the board.
The group knew that they had to define and name three different concepts that they had been struggling with, although lots of research had been done; so we put three boxes on the board as well. We then discussed, agreed on and added the superordinate to each box, which was the same in each case. We also discussed what distinguished each box from the other two. Furthermore, we found examples of the concepts and added what turned out to be subordinates right into the appropriate box. Not until then did we give the concepts names. And now, naming was easy.
[Diagram: three concept boxes side by side, each listing “Distinguishing characteristic 1” and “Distinguishing characteristic 2”]
After this exercise, we had a definition for each concept, composed of the superordinate and the distinguishing characteristics, as well as terms for the concepts. Not only did the group agree on the terms and their meanings; the data can now also be stored in the ISO terminology database. Without doublettes.
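The white-board exercise can be sketched in a few lines of Python. This is only an illustration, not the working group's method or data: the class `ConceptBox`, the helper `are_distinguished`, and the pencil and pen examples are invented here, and the definition pattern (superordinate plus distinguishing characteristics) is a simplification of what ISO 704 describes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConceptBox:
    """One box on the white board: a concept that is analysed before it is named."""
    superordinate: str
    characteristics: list                      # distinguishing characteristics
    subordinates: list = field(default_factory=list)
    term: Optional[str] = None                 # assigned last, once the concept is clear

    def definition(self) -> str:
        """Compose a definition from the superordinate and the characteristics."""
        return f"{self.superordinate} that {' and '.join(self.characteristics)}"


def are_distinguished(a: ConceptBox, b: ConceptBox) -> bool:
    """True if both boxes share a superordinate but differ in their characteristics."""
    return (a.superordinate == b.superordinate
            and set(a.characteristics) != set(b.characteristics))


# Hypothetical example data; the concepts discussed in Seoul are not given in this post.
box_a = ConceptBox("writing instrument", ["has a graphite core", "is encased in wood"])
box_b = ConceptBox("writing instrument", ["dispenses ink", "has a ball at its tip"])

# Naming comes last, after the analysis.
box_a.term = "pencil"
box_b.term = "ballpoint pen"

print(box_a.term, "=", box_a.definition())
print(are_distinguished(box_a, box_b))
```

The point of the sketch is the order of operations: the term field stays empty until the superordinate and the distinguishing characteristics are settled.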
Granted, as terminologists we don’t often have the luxury of having 15 experts in one room for a discussion. But sometimes we do: I remember discussing terms and appellations for new gaming concepts in Windows Vista with marketing folks in a conference room at the Microsoft subsidiary in Munich. Even if we don’t have all experts in shouting distance, we can proceed in a similar fashion and collect the information from virtual teams and other resources in our daily work. It may take a little bit to become fluent in the process, but terminological analysis helps us avoid doublettes and pays off in the long run.
Posted in Terminologist, Subject matter expert, Terminology 101, Researching terms, Events, Standardizing entries, Terminology principles, Terminology methods | Tagged: ISO 12620, ISO 704, ISO 26162, terminological analysis, ISO meeting, Seoul
Posted by Barbara Inge Karsch on June 15, 2011
One of the main reasons for having a concept-oriented terminology database is that we can set up one definition to represent the concept and can then attach all its designations, including all equivalents in the target language. It helps save cost, drive standardization and increase usability. Doublettes offset these benefits.
The diagrams below are simplifications, of course, but they explain visually why concept orientation is necessary when you are dealing with more than one language in a database. To explain it briefly: once the concept is established through a definition and other concept-related metadata, source and target designators can be researched and documented. Sometimes this research will result in multiple target equivalents when there was only one source designator; sometimes it is just the opposite, where, say, the source language uses a long and a short form, but the target language only has a long form.
If you have doublettes in your database, it means not only that the concept research happened twice and, to a certain extent, unsuccessfully. It also means that designators have to be researched twice and their respective metadata has to be documented twice. The more languages there are, the more expensive that becomes. Rather than having, say, a German terminologist research the concept denoted by automated teller machine, ATM, electronic cash machine, cash machine, etc. two or more times, research takes place once, and the German equivalent Bankautomat is attached, potentially as the equivalent for all English synonyms.
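To make the concept-oriented structure concrete, here is a minimal sketch in Python. It is not the schema of any particular terminology tool: the class `ConceptEntry`, its methods and the wording of the definition are assumptions made up for this illustration. Only the designators come from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptEntry:
    """One record per concept: a single definition plus designators in any language."""
    definition: str
    designators: dict = field(default_factory=dict)   # language code -> list of terms

    def add(self, language: str, term: str) -> None:
        """Attach a designator to the concept, ignoring exact repeats."""
        terms = self.designators.setdefault(language, [])
        if term not in terms:
            terms.append(term)

    def equivalents(self, term: str, target_language: str) -> list:
        """Return the target-language designators if `term` names this concept."""
        if any(term in terms for terms in self.designators.values()):
            return list(self.designators.get(target_language, []))
        return []


# The definition text is a placeholder written for this sketch.
atm = ConceptEntry(definition="machine that dispenses cash to bank customers")
for synonym in ["automated teller machine", "ATM", "electronic cash machine", "cash machine"]:
    atm.add("en", synonym)
atm.add("de", "Bankautomat")   # researched once, available for every English synonym

print(atm.equivalents("ATM", "de"))
print(atm.equivalents("cash machine", "de"))
```

Because all synonyms hang off one record, the German equivalent is researched and stored once, whichever English designator the user starts from.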
Doublettes also make it more difficult to work towards standardized terminology. When you set up a terminological entry including the metadata that guides the consumer of the terminological data in usage, standardization happens even if there are multiple synonyms. Because they are all in one record, the user has, e.g., usage, product, or version information to choose the applicable term for their context. With doublettes, that guidance is split across two records, which makes the data harder to use, because the reader has to compare two entries to find it.
And lastly, if that information is in two records, it might be harder to discover. Depending on the search functionality, the designator and the language of the designator, the doublettes might both display in one search. But chances are that only one is found and taken for the only record on the concept. With increasing data volumes, more doublettes will occur, yet retrievability is a critical part of usability. And without usability, standardization is even less likely and even more money is wasted.
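One simple way to surface candidate doublettes is to look for entries that share a designator in the same language. The sketch below assumes a hypothetical export format (entry ID mapped to a set of language and term pairs); the function name and the sample data are invented. A shared designator can also be a legitimate homonym, so the result is a list for a terminologist to review, not a list of confirmed duplicates.

```python
from collections import defaultdict
from itertools import combinations


def find_possible_doublettes(entries: dict) -> list:
    """Return sorted pairs of entry IDs that share a (language, designator) pair.

    `entries` maps an entry ID to a set of (language, term) tuples.
    Terms are compared case-insensitively.
    """
    index = defaultdict(set)
    for entry_id, designators in entries.items():
        for language, term in designators:
            index[(language, term.casefold())].add(entry_id)

    pairs = set()
    for ids in index.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return sorted(pairs)


# Hypothetical sample data for illustration only.
sample = {
    "E1": {("en", "ATM"), ("en", "automated teller machine")},
    "E2": {("en", "cash machine"), ("en", "atm")},
    "E3": {("en", "mouse")},
}

candidates = find_possible_doublettes(sample)
print(candidates)
```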
Posted in Maintaining a database, Return on investment, Standardizing entries | Tagged: cost, doublette, dublicate entry, duplicate, standardization, usability
Posted by Barbara Inge Karsch on October 21, 2010
In May, I saw the announcement of a new research brief by Common Sense Advisory, which, according to its summary, would explain why companies are starting to centralize their language services. That made sense to me. In fact, it made me happy.
Not happy enough to cough up the money to purchase the study, I am afraid. But as people interested in terminology management, don’t you think that the following paragraph from the announcement sounds good? “Large organizations have begun consolidating their translation activities into internal service groups responsible for a broad range of language-related functions. This brief outlines the rationale behind and steps involved in enterprise language processing, including centralized operations, process re-engineering, automation, and content and metadata remediation.”
It sounds good, because anything other than a centralized service for prescriptive terminology management in an enterprise would be counterproductive. A centralized terminology database with a centralized service allows an entire company to contribute to and make use of the asset. According to Fred Lessing’s remark in an earlier posting, Daimler did a good job with this. Here is what they and companies such as IBM and SAP, who have had a centralized service for years, if not decades, are getting out of it:
- Standardization: If product teams reuse terms, it leads to consistent corporate language. Documenting a term once and reusing it a million times helps get a clear message out to the customer and sets a company apart from its competitors.
- Cost savings: The Gilbane Group puts it nicely in this presentation: “Ca-ching each time someone needs to touch the content.” It might cost $20 to set up one entry initially, but ten questions that didn’t need to be asked might save $200 and a lot of aggravation. There are many terminology questions that come in for a major release. If I remember correctly, there were 8000 questions for a Windows Server release back when things hadn’t been centralized; many translators asked the same question or asked because they couldn’t access the database.
- Skills recycling: That’s right. It takes “strange” skills to set up a correct and complete entry. A person who does it every now and then might not remember the meaning of a data category field, might forget the workflow, or simply might not understand a translator’s question. And yet, entries have to be set up quickly and reliably; otherwise we get the picture painted in this posting. A centralized team that does it all the time refines its skills further and further, and again, saves time because no questions need to be asked later.
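The cost-savings figures in the list above can be turned into a back-of-the-envelope calculation. The $20 per avoided question is an assumption derived from the example ($200 for ten questions); the function name is made up for this sketch, and real costs will differ from team to team.

```python
def net_saving(entry_cost: float, cost_per_question: float, questions_avoided: int) -> float:
    """Money saved by avoided questions, minus the one-time cost of the entry."""
    return questions_avoided * cost_per_question - entry_cost


# Figures from the example: $20 per entry, ten avoided questions at an assumed $20 each.
saving = net_saving(entry_cost=20, cost_per_question=20, questions_avoided=10)
print(saving)
```

Under these assumptions the entry pays for itself with the first avoided question; every further question is a saving.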
But all that glitters is not gold with centralization either. There are drawbacks, which a team of committed leaders should plan for:
- Scale: Users, contributors and system owners all have to be on board. And that takes time and commitment, as the distance between people in the system may be large, both physically and philosophically. Evangelization efforts have to be planned.
- Cost allocation: A centralized team might be in a group that doesn’t produce revenue. As a member of terminology teams, I have worked in customer support, content publishing, product teams, and the training and standardization organization. When I had a benchmarking conversation with the Daimler team in 2007, they were located in HR. The label of the organization matters less than whether the group receives funding for terminology work from those groups that do generate revenue. Or whether the leadership even just gets what the team is doing.
I believe that last point is what broke the camel’s back at Microsoft: Last week, the centralized terminologist team at Microsoft was dismantled. The terminologist in me is simply sad for all the work that we put in to build up a centralized terminology management service. The business person in me is mad about the waste of resources. And the human in me worries about four former colleagues who were let go, and the rest who were re-organized into other positions. Good luck to all of them!
Posted in Return on investment, Skills and qualities, Standardizing entries | Tagged: centralization, Common Sense Advisory, Gilbane Group, Microsoft Language Excellence
Posted by Barbara Inge Karsch on July 22, 2010
Standards are nice, but they don’t do anything for you or, more importantly, the user of your terminology database, if you are the only one applying them. But how do you get a large virtual team of terminologists or language specialists to agree on and apply standards, such as ISO 12620, to database entries? And first: Why bother climbing such a mountain?
Imagine you have a large document to author or translate. Your client gave you a dictionary to use. Because you are not sure of the meaning or usage of 50 terms, you look them up. But the dictionary holds you up more than anything: One entry contains a definition, the next one doesn’t; one provides context, but it is in a language you don’t understand; most terms make sense, but several of them are cryptic and the entry doesn’t provide clarity. If your client hadn’t insisted that you use the dictionary, you wouldn’t: It just slows you down.
The objective of a terminology database is to have consistent and correct terminology used in the product, in source as well as in target languages. To support that goal, users must be able to use a database entry quickly and easily—structure really helps here. Furthermore, users must be able to trust the information provided—transparent, clear and consistent entries create trust.
Ideally, you have a centralized team of trained terminologists who know the standards inside out and apply them religiously. If you don’t, select or create a tool that supports standards adherence as much as possible. Some simple examples: If a definition is mandatory, automatically enforce it; if the term is a verb, hide the Number field; if the language is English, hide the Gender field. Tools can do a lot, but your team very likely still needs a standard.
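The three tool rules mentioned above can be sketched as follows. This is an assumption-laden illustration, not how Microsoft Terminology Studio or any other tool works: the field names, the language code "en" and the functions `visible_fields` and `validate` are all hypothetical.

```python
ALL_FIELDS = ["Term", "Definition", "Part of speech", "Number", "Gender"]


def visible_fields(part_of_speech: str, language: str) -> list:
    """Return the fields to show in the entry form for this term."""
    fields = list(ALL_FIELDS)
    if part_of_speech == "verb":
        fields.remove("Number")    # rule: hide Number for verbs
    if language == "en":
        fields.remove("Gender")    # rule: hide Gender for English
    return fields


def validate(entry: dict) -> list:
    """Return a list of error messages; an empty list means the entry can be saved."""
    errors = []
    if not (entry.get("Definition") or "").strip():
        errors.append("Definition is mandatory")   # rule: enforce the definition
    return errors


print(visible_fields("verb", "en"))
print(visible_fields("noun", "de"))
print(validate({"Term": "doublette"}))
```

Rules like these take care of the mechanical part of standards adherence; whether a definition is any good still has to be covered by the team's standards guide.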
The Microsoft terminology team did. Simply handing a standards document off to the team had not been successful in the past: nobody could remember it, many entries therefore contained unstructured, if not incorrect, information, and there was no incentive to adhere to standards. A more collaborative effort was called for: Together, in-house terminologists went through data categories one by one. Because we were a virtual team, e-mail was the best form of communication. Each data category was dealt with in one e-mail that contained the definition, a scenario and voting buttons that allowed the team to agree with the meaning or disagree and make a better suggestion. Team members could participate in the voting, but they didn’t have to. However, everyone knew from the beginning that they had to accept the outcome, regardless of whether they participated or not. After the new guide had been published, measurements were carried out and documented in a quarterly report. Terminologists then set their own deadlines for cleaning up entries to comply with the standards.
ISO 12620 doesn’t just enable data exchange, as we saw in last week’s entry. At J.D. Edwards and Microsoft, it also helped create standards guides. I am sure not every field is filled in correctly; perfection is not the point. But with shrinking budgets and tighter deadlines, a database that could cost millions of dollars must support users as well as possible in their endeavor to create reliable communication. A standards guide based on an international standard is a good tool you can use to climb that mountain.
Posted in Content publisher, Microsoft Terminology Studio, Standardizing entries, Terminologist, Terminology 101 | Tagged: ISO 12620, standardization