Posted by Barbara Inge Karsch on June 15, 2011
One of the main reasons for having a concept-oriented terminology database is that we can set up one definition to represent the concept and then attach all its designations, including all equivalents in the target language. This setup helps save cost, drive standardization, and increase usability. Doublettes offset these benefits.
The diagrams below are simplifications, of course, but they explain visually why concept orientation is necessary when you are dealing with more than one language in a database. To explain it briefly: once the concept is established through a definition and other concept-related metadata, source and target designators can be researched and documented. Sometimes this research will result in multiple target equivalents where there was only one source designator; sometimes it is just the opposite, where, say, the source language uses a long and a short form, but the target language only has a long form.
Doublettes in your database not only mean that the concept research happened twice and, to a certain degree, unsuccessfully; they also mean that designators have to be researched twice and their respective metadata has to be documented twice. The more languages there are, the more expensive that becomes. Rather than having, say, a German terminologist research the concept denoted by automated teller machine, ATM, electronic cash machine, cash machine, etc. two or more times, research takes place once and the German equivalent Bankautomat is attached, potentially as the equivalent for all English synonyms.
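The concept orientation described above can be sketched as a simple data structure. This is only an illustration; the class and field names are made up, not taken from any real termbase schema:

```python
from dataclasses import dataclass, field

@dataclass
class Designation:
    term: str
    language: str          # e.g. "en", "de"
    usage_note: str = ""   # e.g. "short form", "do not use"

@dataclass
class ConceptEntry:
    """One record per concept; all synonyms and target equivalents attach here."""
    definition: str
    designations: list = field(default_factory=list)

# One concept, several English synonyms, one German equivalent --
# the German research happens once, not once per synonym.
atm = ConceptEntry(definition="Machine that dispenses cash to bank customers.")
for en_term in ["automated teller machine", "ATM", "cash machine"]:
    atm.designations.append(Designation(en_term, "en"))
atm.designations.append(Designation("Bankautomat", "de"))

german = [d.term for d in atm.designations if d.language == "de"]
```

A doublette would mean a second `ConceptEntry` with the same definition, forcing the German research to be repeated and documented twice.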
Doublettes also make it more difficult to work towards standardized terminology. When you set up a terminological entry including the metadata that guides the consumer of the terminological data in usage, standardization happens even if there are multiple synonyms. Because they are all in one record, the user has, e.g., usage, product, or version information to choose the applicable term for their context. Split across doublettes, that data is harder to use, because the reader has to compare two entries to find the guidance.
And lastly, if that information is in two records, it might be harder to discover. Depending on the search functionality, the designator, and the language of the designator, the doublettes might both display in one search. But chances are that only one is found and taken for the only record on the concept. With increasing data volumes, more doublettes will occur, yet retrievability is a critical part of usability. And without usability, standardization is even less likely and even more money is wasted.
Posted in Maintaining a database, Return on investment, Standardizing entries | Tagged: cost, doublette, duplicate entry, duplicate, standardization, usability | 1 Comment »
Posted by Barbara Inge Karsch on October 21, 2010
In May, I saw the announcement of a new research brief by Common Sense Advisory, which, according to its summary, would explain why companies are starting to centralize their language services. That made sense to me. In fact, it made me happy.
Not happy enough to cough up the money to purchase the study, I am afraid. But as people interested in terminology management, don’t you think that the following paragraph from the announcement sounds good? “Large organizations have begun consolidating their translation activities into internal service groups responsible for a broad range of language-related functions. This brief outlines the rationale behind and steps involved in enterprise language processing, including centralized operations, process re-engineering, automation, and content and metadata remediation.”
It sounds good, because anything but a centralized service for prescriptive terminology management in an enterprise would be counterproductive. A centralized terminology database with a centralized service allows an entire company to contribute to and make use of the asset. According to Fred Lessing’s remark in an earlier posting, Daimler did a good job with this. Here is what they and companies such as IBM and SAP, which have had a centralized service for years, if not decades, are getting out of it:
- Standardization: If product teams reuse terms, the result is consistent corporate language. Documenting a term once and reusing it a million times helps get a clear message out to the customer and sets a company apart from its competitors.
- Cost savings: The Gilbane Group puts it nicely in this presentation: “Ca-ching each time someone needs to touch the content.” It might cost $20 to set up one entry initially, but ten questions that didn’t need to be asked might save $200 and a lot of aggravation. Many terminology questions come in for a major release. If I remember correctly, there were 8000 questions for a Windows Server release back when things hadn’t been centralized; many translators asked the same question or asked because they couldn’t access the database.
- Skills recycling: That’s right. It takes “strange” skills to set up a correct and complete entry. A person who does it only every now and then might not remember the meaning of a data category field, might forget the workflow, or simply might not understand a translator’s question. And yet, entries have to be set up quickly and reliably; otherwise we get the picture painted in this posting. A centralized team that does it all the time refines its skills further and further and, again, saves time because no questions need to be asked later.
But all that glitters is not gold with centralization either. There are drawbacks, which a team of committed leaders should plan for:
- Scale: Users, contributors and system owners all have to be on board. And that takes time and commitment, as the distance between people in the system may be large, both physically and philosophically. Evangelization efforts have to be planned.
- Cost allocation: A centralized team might sit in a group that doesn’t produce revenue. As a member of terminology teams, I have worked in customer support, content publishing, product teams, and the training and standardization organization. When I had a benchmarking conversation with the Daimler team in 2007, they were located in HR. The label of the organization doesn’t matter as much as whether the group receives funding for terminology work from the groups that do generate revenue. Or whether the leadership even understands what the team is doing.
I believe that last point was the straw that broke the camel’s back at Microsoft: last week, the centralized terminologist team at Microsoft was dismantled. The terminologist in me is simply sad about all the work that we put in to build up a centralized terminology management service. The business person in me is mad about the waste of resources. And the human being in me worries about four former colleagues who were let go, and the rest who were re-organized into other positions. Good luck to all of them!
Posted in Return on investment, Skills and qualities, Standardizing entries | Tagged: centralization, Common Sense Advisory, Gilbane Group, Microsoft Language Excellence | 1 Comment »
Posted by Barbara Inge Karsch on September 16, 2010
In If quantity matters, what about quality? I promised to shed some light on how to achieve quantity without skimping on quality. In knowledge management, it boils down to solid processes supported by reliable and appropriate tools and executed by skilled people. Let me drill down on some aspects of setting up processes and tools to support quantity and quality.
If you cannot afford to build up an encyclopedia for your company (and who can?), select metadata carefully. The number and types of data categories (DCs), as discussed in The Year of Standards, can make a big difference. That is not to say use fewer. Use the right ones for your environment.
Along those lines, hide data categories or values where they don’t make sense. For example, don’t display Grammatical Gender when Language=English; invariably a terminologist will accidentally select a gender, and even if only a few users wonder why or notice the error but can’t find a way to alert you to it, too much time is wasted. Similarly, hide Grammatical Number when Part of Speech=Verb, and so on.
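Such conditional visibility can be sketched in a few lines. The rule table and field names here are invented for illustration; a real termbase tool would define this in its entry model:

```python
# Data categories that only make sense for certain entries:
# (field to hide, condition field, condition value)
HIDE_RULES = [
    ("grammatical_gender", "language", "English"),
    ("grammatical_number", "part_of_speech", "Verb"),
]

def visible_fields(entry: dict, all_fields: list) -> list:
    """Return the data categories that should be displayed for this entry."""
    hidden = {f for f, cond, val in HIDE_RULES if entry.get(cond) == val}
    return [f for f in all_fields if f not in hidden]
```

For an English entry, `visible_fields({"language": "English"}, ["term", "grammatical_gender"])` drops the gender field, so no one can accidentally fill it in.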
Plan dependent data, such as product and version, carefully. For example, if versions for all your products are numbered the same way (e.g. 1, 2, 3, …), it might be easiest to have two related tables. If most of your versions have very different version names, you could have one table that lists product and version together (e.g. Windows 95, Windows 2000, Windows XP, …); that makes information retrieval slightly simpler, especially for non-expert users. Or maybe you cannot afford, or don’t need, to manage down to the version level because you are in a highly dynamic environment.
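The two modeling options can be sketched with SQLite (table and column names are my own, purely illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Option 1: two related tables -- works well when versions are
# numbered uniformly across products (1, 2, 3, ...).
cur.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("CREATE TABLE version (product_id INTEGER REFERENCES product(id), label TEXT)")

# Option 2: one combined table -- simpler retrieval when version
# names vary widely, at the cost of some redundancy.
cur.execute("CREATE TABLE product_version (label TEXT)")
for pv in ("Windows 95", "Windows 2000", "Windows XP"):
    cur.execute("INSERT INTO product_version VALUES (?)", (pv,))

rows = cur.execute("SELECT label FROM product_version ORDER BY label").fetchall()
```

With option 2, a non-expert user filters on a single human-readable column instead of joining two tables.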
Enforce mandatory data when a terminologist releases (approves or fails) an entry. If you decided that five out of your ten DCs are mandatory, let the tool help terminologists by not letting them get away with a shortcut or an oversight.
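A release gate of that kind can be sketched as follows. The set of mandatory data categories here is hypothetical; your own termbase would define its own:

```python
# Hypothetical mandatory data categories for a released entry.
MANDATORY = {"definition", "term", "part_of_speech", "product", "status"}

def release(entry: dict) -> dict:
    """Approve an entry only if every mandatory data category is filled in."""
    missing = MANDATORY - {k for k, v in entry.items() if v}
    if missing:
        raise ValueError(f"Cannot release entry; missing: {sorted(missing)}")
    entry["status"] = "released"
    return entry
```

The tool, not the terminologist’s memory, refuses the shortcut: an entry with an empty Definition field simply cannot be released.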
It is obviously not an easy task to anticipate what you need in your environment. But well-designed tools and processes support high quality AND quantity and therefore boost your return on investment.
On a personal note, Anton is exhausted with anticipation of our big upcoming event: He will be the ring bearer in our wedding this weekend.
Posted in Advanced terminology topics, Designing a terminology database, Producing quality, Producing quantity, Return on investment, Setting up entries, Terminologist, Tool | Tagged: data category, ISO 12620, ISOcat | 1 Comment »
Posted by Barbara Inge Karsch on September 9, 2010
Linguistic quality is one of the persistent puzzles in our industry, as it is such an elusive concept. It doesn’t have to be, though. But if only quantity matters to you, you are on your way to ruining your company’s linguistic assets.
Because terminology management is not an end in itself, let’s start with the quality objective that users of a prescriptive terminology database are after. Most users access terminological data for support with monolingual, multilingual, manual or automated authoring processes. The outcomes of these processes are texts of some nature. The ultimate quality goal that terminology management supports with regard to these texts could be defined as “the text must contain correct terms used consistently.” In fact, Sue Ellen Wright “concludes that the terminology that makes up the text comprises that aspect of the text that poses the greatest risk for failure.” (Handbook of Terminology Management)
In order to reach this quality goal, other quality goals must precede it. First, the database must contain correct terminological entries; second, there must be integrity between the different entries, i.e. entries in the database must not contradict each other.
In order to attain these two goals, others must be met in turn: the data values within the entries must contain correct information, and the entries must be complete, i.e. no mandatory data is missing. I call this the mandate to release only correct and complete entries (of course, a prescriptive database may contain pre-released entries that don’t meet these criteria yet).
Let’s see what that means for terminologists who are responsible for setting up, approving or releasing a correct and complete entry. They need to be able to:
- Do research.
- Transfer the result of the research into the data categories correctly.
- Assure integrity between entries.
- Approve only entries that have all the mandatory data.
- Fill in an optional data category, when necessary.
Let’s leave aside for a moment that we are all human and that we will botch the occasional entry. Can you imagine if instead of doing the above, terminologists were told not to worry about quality? From now on, they would:
- Stop at 50% research or don’t validate the data already present in the entry.
- Fill in only some of the mandatory fields.
- Choose the entry language randomly.
- Add three or four different designations to the Term field.
Do you think that we could meet our number 1 goal of correct and consistent terminology in texts? No. Instead a text in the source language would contain inconsistencies, spelling variations, and probably errors. Translations performed by translators would contain the same, possibly worse problems. Machine translations would be consistent, but they would consistently contain multiple target terms for one source term, etc. The translation memory would propagate issues to other texts within the same product, the next version of the product, to texts for other products, and so on. Some writers and translators would not use the terminology database anymore, which means that fewer errors are challenged and fixed. Others would argue that they must use the database; after all, it is prescriptive.
Unreliable entries are poison in the system. With a lax attitude towards quality, you can do more harm than good. Does that mean that you have to invest hours and hours in your entries? Absolutely not. We’ll get to some measures in a later posting. But if you can’t afford correct and complete entries, don’t waste your money on terminology management.
Posted in Advanced terminology topics, Producing quality, Producing quantity, Return on investment, Setting up entries, Terminologist, Terminology methods, Terminology principles | Tagged: errors, quality, quantity | 1 Comment »
Posted by Barbara Inge Karsch on August 19, 2010
Losing a terminologist position because the terminologist couldn’t show any quantitative progress is shocking. But it happened, according to a participant of the TKE conference that just concluded in Dublin. While managing terminology is a quality measure, quantity must not be disregarded. After all, a company or organization isn’t in it for the fun of it. Here are the numbers that three teams achieved with different types of databases.
At J.D. Edwards, quality was a big driving factor. Each conceptual entry passed through a three-step workflow before it was approved. The need for change management was extremely low, but the upfront investment was high. Seven full-time terminologists, who spent 1/3 of their time on English entries, 1/3 on entries in their native language, and 1/3 on other projects, produced just under 6000 conceptual entries between 1999 and 2003.
In comparison, the Microsoft terminology database contained 9000 concepts in January of 2005, most of them (64%) not yet released (for more details, see this article in the German publication eDITion). The team of five full-time English terminologists, who spent roughly 50% of their time on terminology work, increased the volume to about 30,000 over the five following years, 95% of which were released entries. The quality of the entries was not as high as at J.D. Edwards, and less complex metadata was available (e.g. no concept relations).
According to Henrik Nilsson at the Swedish Centre for Terminology, TNC, three full-time resources built up a terminology database, the Rikstermbanken, with 67,000 conceptual entries in three years. That seems like a large number, but one has to take into consideration that the team consolidated data from many different sources in a more or less automated fashion. The entries have not been harmonized, as one of the goals was to show the redundancy of work between participating institutions. The structure of the entries is deliberately simple.
The needs that these databases serve are different: in a corporation, solid entries that serve as a prescriptive reference for product releases are vital. Entries in a collection from various sources, such as in national term banks, serve to support the public and public institutions. They may not be harmonized yet, but they contain a lot of different terminology for different users. And they may not be prescriptive.
As terminologists, we are sometimes very focused on quality. But let’s not forget that eventually someone will want to hear what a project has accomplished. The number of entries is one of the easiest ways to communicate that to a business person.
Posted in J.D. Edwards TDB, Microsoft Terminology Studio, Producing quantity, Return on investment, Rikstermbanken | Tagged: eDITion, TNC | 5 Comments »
Posted by Barbara Inge Karsch on August 16, 2010
Even after nine years, the terminology ROI data from J.D. Edwards is still being quoted in the industry. The data made a splash, because it was the only data available at the time. It isn’t always quoted accurately, though, and since it just came up at TKE in Dublin, let’s revisit what the J.D. Edwards team did back then.
J.D. Edwards’ VP of content publishing, Ben Martin, was invited to present at the TAMA conference in Antwerp in February 2001. His main focus was on single-source publishing. Ben invited yours truly to talk more about the details of the terminology management system as part of his presentation, and he also encouraged a little study that a small project team conducted.
Ben’s argument for single-sourcing was and is simple: Write it once, reuse it multiple times; translate it once, reuse the translated chunk multiple times.
At that time, the J.D. Edwards terminology team and project were in their infancy. In fact, the TMS was just about to go live, as the timeline presented in Antwerp shows.
For the ROI (return on investment) study, my colleagues compared the following data:
- What does it cost to change one term throughout the J.D. Edwards software and documentation?
- What does it cost to manage one concept/term?
Twenty-seven different terms were changed in various languages, and the time it took was measured. Then the average change time was multiplied by the average hourly translation cost, including overhead. In the J.D. Edwards setting, the average cost to change one term in one language turned out to be $1900.
The average time that it took to create one entry in the terminology database had already been measured. At that early time in the project, it cost $150 per terminological entry.
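In round numbers, the trade-off the study measured looks like this. The two dollar figures come from the post; the number of target languages is my own illustrative assumption, not from the study:

```python
cost_per_change = 1900   # avg. cost to change one term in ONE language ($)
cost_per_entry = 150     # avg. cost to create one terminological entry ($)
languages = 20           # assumed number of target languages (illustrative)

# Changing one badly chosen term across all target languages:
full_change = cost_per_change * languages

# Number of entries that could have been researched up front
# for the cost of that single late change:
entries_instead = full_change // cost_per_entry
```

Under these assumptions, one late term change across 20 languages costs $38,000, which would have paid for more than 250 entries; even in a single language, one $150 entry pays for itself if it prevents a single $1900 change.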
The cost to manage one entry seems high. Therefore, it is important to note that
- There were three quality assurance steps in the flow of one entry for the source language English, and up to two steps in the flow of one entry for the target languages. So, the resulting entry was highly reliable, and change management was minimal.
- The cost came down dramatically over the months, as terminologists and other terminology stakeholders became more proficient in the process, standards and tool.
Both figures are highly system- and environment-dependent. In other words, if it is easy to find and replace a term in the documents, the change will cost less. While these figures were first published years ago, they served as the benchmark in the industry and established an ROI model that has since been used, further developed, and elaborated on. If you have any opinion or thoughts, or can share other information, feel free to add a comment or send me an e-mail.
Posted in Advanced terminology topics, Events, Process, Return on investment | Tagged: J.D. Edwards, ROI, TAMA | 4 Comments »