BIK Terminology

Solving the terminology puzzle, one posting at a time


Terminology extraction with memoQ 5.0 RC

August 15, 2011 by Barbara Inge Karsch

In the framework of a TermNet study, I have been researching and gathering data about terminology management systems (TMS). We will not focus on term extraction (TE) tools, but since one of our candidate tools recently released a new term extraction module, I wanted to check it out. Here is what I learned from giving the TE functionality of the memoQ 5.0 release candidate a good run.

Let me start by saying that this test made me realize again how much I enjoy working with terminological data: I love analyzing terms and concepts, researching meaning, and compiling data in entries; to me, it is a very creative process. Note, furthermore, that I am not an expert in term extraction tools. I was a serious power user of several proprietary term extraction tools at JDE and Microsoft, I haven’t worked with the Trados solution since 2003, and I have only played with a few other methods (e.g. Word/Excel and SynchroTerm). So my view of the market is by no means a comprehensive one. It is, however, the view of a user who has done some serious term mining. One of the biggest projects I ever did was mining the Axapta 4.0 specs. It took us several days just to load all the documents onto a server directory, it took the engine at least a night to “spit out” 14,000 term candidates, and it took me an exhausting week to nail down 500 designators worth working with.

As a mere user, as opposed to a computational linguist, I am not primarily interested in the performance of the extraction engine (I actually think the topic is a bit overrated). I do like that memoQ lets me set the minimum and maximum length (in words), the minimum frequency, and whether words containing numbers are included or excluded; the home-grown solutions had predefined settings for all of this. But beyond that rough selection, I can deal with either too many or too few suggestions, as long as the tool allows me to quickly add or delete candidates until I have what I deem the appropriate form. There will always be noise, and lots of it. I would rather have the developers focus on the usability of the interface than “waste” time tweaking the algorithms a tiny bit more.
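To make those settings concrete, here is a minimal Python sketch of a naive frequency-based extractor. It is not memoQ’s engine: the function name `extract_candidates`, its parameters, and the optional stop-word set are hypothetical, and “length” is interpreted here as the number of words in a candidate.

```python
import re
from collections import Counter


def extract_candidates(text, min_words=1, max_words=3, min_frequency=2,
                       include_numbers=False, stop_words=frozenset()):
    """Count word n-grams as term candidates.

    A naive frequency-based sketch (hypothetical), not memoQ's extraction engine.
    """
    # Split into lowercase word tokens (letters, digits, apostrophes, hyphens).
    tokens = re.findall(r"[\w'-]+", text.lower())
    counts = Counter()
    for n in range(min_words, max_words + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip candidates containing digits unless numbers are included.
            if not include_numbers and any(ch.isdigit() for word in gram for ch in word):
                continue
            # Skip candidates that start or end with a stop word.
            if gram[0] in stop_words or gram[-1] in stop_words:
                continue
            counts[" ".join(gram)] += 1
    # Keep candidates at or above the minimum frequency, most frequent first.
    return [(term, c) for term, c in counts.most_common() if c >= min_frequency]


text = ("A term base stores terms. Each term base entry describes one concept. "
        "The term base is shared.")
stops = {"a", "each", "is", "one", "the"}
print(extract_candidates(text, max_words=2, stop_words=stops))
# [('term', 3), ('base', 3), ('term base', 3)]
```

Raising `min_frequency` or narrowing the word range shrinks the list, but some noise always remains. This is why the go/no-go review, not the counting, is where the time goes.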

So, along the lines of the previous posting on UX design, my requirements for a TE tool are that it allows me to:

  • Process term candidates (go/no-go decision) extremely fast and
  • Move data into the TMS smoothly and flawlessly.

memoQ by Kilgray Translation Technologies* meets the first requirement very nicely. My (monolingual) test project consisted of the PowerPoint presentations of the ECQA Certified Terminology Manager, which I had gone through in detail the previous week and which contained 28,979 English words. Because the subject matter is utterly familiar to me, there was no question as to what should make the cut and what shouldn’t. I loved that I could “race” through the list and go yay or nay, that I could merge obvious synonyms, and that I could modify term candidates to reflect their canonical form. Because the contexts for each candidate are all visible, I could even have checked the meaning in context quickly if I had needed to.

I also appreciated that a stop word list is already in place. It was very easy to add to it, although here comes one suggestion: it would be great to have the term candidate automatically inserted in the stop word dialog. Right now, I still have to type it in; it would save time if it were prefilled. Since the stop word list is not very extensive (e.g. even words like “doesn’t” are missing from the English list), it will take everyone considerable time to build up a list that, at its core, will not vary substantially from user to user. But that may be too much to ask for a first release.

As for my second requirement, memoQ term extraction doesn’t meet it yet. (Note that I only tested the transfer of data to memoQ, not to qTerm.) I know it is asking a lot to have a workflow from a cleaned-up term candidate list to a terminological entry in a TMS. Here are two suggestions that would make a difference to users:

  • Provide a way to move the context from the source document, including the context source, into the new terminological entry.
  • Merging terms into one entry because they are synonyms is great. But they also need to show up as synonyms when imported into the term base: none of my short forms (e.g. POS, TMS) showed up in the entries for the long forms (e.g. part of speech, terminology management system) when I moved them into the memoQ term base.
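Both suggestions come down to a concept-oriented data model: one entry per concept that holds all of its designations, plus contexts that remember where they came from. The following Python sketch illustrates that idea with hypothetical `Term` and `Entry` classes and a hypothetical `merge_candidates` helper. It is not memoQ’s or qTerm’s actual data model, the file name is made up, and treating all-caps candidates as abbreviations is a simplifying assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    text: str
    term_type: str = "full form"  # hypothetical labels: "full form", "abbreviation", "synonym"


@dataclass
class Entry:
    """A concept-oriented entry: every designation of one concept lives here."""
    terms: list = field(default_factory=list)
    contexts: list = field(default_factory=list)  # (context sentence, source document) pairs


def merge_candidates(main_term, synonyms, contexts=()):
    """Turn one merged candidate group into a single entry (hypothetical helper)."""
    entry = Entry(terms=[Term(main_term)])
    for synonym in synonyms:
        # Rough assumption: an all-caps candidate is an abbreviation.
        term_type = "abbreviation" if synonym.isupper() else "synonym"
        entry.terms.append(Term(synonym, term_type))
    # Carry the contexts and their sources over instead of dropping them.
    entry.contexts.extend(contexts)
    return entry


entry = merge_candidates(
    "part of speech", ["POS"],
    contexts=[("Record the POS for each term.", "slides_03.pptx")],  # hypothetical file name
)
for term in entry.terms:
    print(term.text, "|", term.term_type)
# part of speech | full form
# POS | abbreviation
```

Because the short form is stored inside the long form’s entry rather than as a separate record, a lookup of either designation leads to the same concept and the same contexts.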

My main overall wish is that we integrate TE with authoring and translation in a way that gives companies and LSPs, writers and translators an efficient workflow. It is imperative in technical communication and translation to document terms and concepts. When this task falls to the translators, it is already quite late, but that is better than it not happening at all. Only fast and flawless processing will allow one-person enterprises, or multi-person ones for that matter, to carry out terminology work as part of the content supply chain. When the “fast and flawless” prerequisite is met, even those of my translator friends who detest the term “content supply chain” will have enough time to enjoy the more creative aspects of their profession. Then economic requirements are met on the macro level, and the individual’s need to get satisfaction out of the task is fulfilled on the micro level. The TE functionality of memoQ 5.0 RC excels in design and, in my opinion, is ready for translators’ use. If you have any comments, or if you agree or disagree with me, I’d love to hear from you.

*Kilgray is a client of BIK Terminology.


Excellence

November 2, 2010 by Barbara Inge Karsch

For the last few years, I had been part of a team called Microsoft Language Excellence. Now I am part of a consultant group called ExcellenceTerm. To some, including “excellence” in one’s name might seem presumptuous, even arrogant. To me, it is part of the vision.

Let’s look into the etymology. Excellence comes from the Latin excellere, which means to distinguish oneself or to raise oneself above others. If we look up ‘to excel’ in OneLook Dictionary Search, we find that most dictionaries define it as to do better than, to surpass, to be outstanding, to have a particular talent for something, to do better than a given standard, etc.

Is there something wrong with doing better than a particular standard? Or with being outstanding? Not in our Western culture, I believe. In a competitive environment, such as the Microsoft culture, the idea that you can surpass someone else certainly has a positive connotation. My vision for Microsoft Language Excellence was always to be the best resource for terminology management within the company. I believe we fulfilled that vision during most of the existence of Language Excellence.

ExcellenceTerm is part of TermNet, the International Network of Terminology. TermNet was founded in 1988 based “on the initiative of UNESCO, with the aim to establish a network for co-operation in the field of terminology.” ExcellenceTerm is a small group of terminology consultants who are working on various projects, including a certification program for terminologists called the ECQA Certified Terminology Manager.

Economic ups-and-downs aside, we all have to be motivated in our professional lives in order to keep our jobs, make a living, not burn out, etc. Striving for excellence—not achieving perfection—is for me a healthy way to add value and enjoy what we are doing.


To centralize or not to centralize—it’s not even a question

October 21, 2010 by Barbara Inge Karsch

In May, I saw the announcement of a new research brief by Common Sense Advisory, which, according to its summary, would explain why companies are starting to centralize their language services. That made sense to me. In fact, it made me happy.

Not happy enough to cough up the money to purchase the study, I am afraid. But to anyone interested in terminology management, doesn’t the following paragraph from the announcement sound good? “Large organizations have begun consolidating their translation activities into internal service groups responsible for a broad range of language-related functions. This brief outlines the rationale behind and steps involved in enterprise language processing, including centralized operations, process re-engineering, automation, and content and metadata remediation.”

It sounds good because anything other than a centralized service for prescriptive terminology management in an enterprise would be counterproductive. A centralized terminology database with a centralized service allows an entire company to contribute to and make use of the asset. According to Fred Lessing’s remark in an earlier posting, Daimler did a good job with this. Here is what Daimler and other companies, such as IBM and SAP, that have had a centralized service for years, if not decades, are getting out of it:

  • Standardization: If product teams reuse terms, the result is consistent corporate language. Documenting a term once and reusing it a million times helps get a clear message out to the customer and sets a company apart from its competitors.
  • Cost savings: The Gilbane Group puts it nicely in this presentation: “Ca-ching each time someone needs to touch the content.” It might cost $20 to set up one entry initially, but ten questions that didn’t need to be asked might save $200 and a lot of aggravation. Many terminology questions come in for a major release. If I remember correctly, there were 8,000 questions for a Windows Server release back when terminology work hadn’t been centralized; many translators asked the same question or asked because they couldn’t access the database.
  • Skills recycling: That’s right. It takes “strange” skills to set up a correct and complete entry. A person who does it every now and then might not remember the meaning of a data category field, might forget the workflow, or might simply not understand a translator’s question. And yet entries have to be set up quickly and reliably; otherwise, we get the picture painted in this posting. A centralized team that does this all the time refines its skills further and further and, again, saves time because no questions need to be asked later.
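The cost-savings figures can be checked with simple arithmetic. The post’s numbers imply roughly $20 per question handled ($200 / 10 questions); that per-question cost is an inference from those two figures, not a measured value, and the helper names below are hypothetical.

```python
import math


def net_saving(entry_cost, questions_avoided, cost_per_question):
    """Money saved by questions that never had to be asked, minus the set-up cost."""
    return questions_avoided * cost_per_question - entry_cost


def break_even_questions(entry_cost, cost_per_question):
    """Smallest number of avoided questions that pays for the entry."""
    return math.ceil(entry_cost / cost_per_question)


# Figures from the post: $20 per entry; $200 saved by ten avoided questions,
# i.e. an assumed $20 per question.
print(net_saving(20, 10, 20))        # 180
print(break_even_questions(20, 20))  # 1
```

Under these assumptions, an entry pays for itself after a single avoided question; a $60 entry would need three.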

But all that glitters is not gold with centralization either. There are drawbacks, which a team of committed leaders should plan for:

  • Scale: Users, contributors and system owners all have to be on board. And that takes time and commitment, as the distance between people in the system may be large, both physically and philosophically. Evangelization efforts have to be planned.
  • Cost allocation: A centralized team might be in a group that doesn’t produce revenue. As a member of terminology teams, I have worked in customer support, content publishing, product teams, and the training and standardization organization. When I had a benchmarking conversation with the Daimler team in 2007, they were located in HR. The label of the organization doesn’t matter as much as whether the group receives funding for terminology work from the groups that do generate revenue, or whether the leadership even understands what the team is doing.

I believe that last point is what broke the camel’s back at Microsoft: last week, the centralized terminologist team at Microsoft was dismantled. The terminologist in me is simply sad about all the work we put in to build up a centralized terminology management service. The business person in me is angry about the waste of resources. And the human in me worries about the four former colleagues who were let go and the rest who were reorganized into other positions. Here’s wishing all of them good luck!


Copyright © 2023 BIK Terminology. All Rights Reserved.