BIK Terminology

Solving the terminology puzzle, one posting at a time


A glossary for MT – terrific! MT on a glossary – horrific!

November 3, 2012 by Barbara Inge Karsch

In the last few months, I have been reading quite a bit about machine translation. And I also took the opportunity at the recent LocWorld in Seattle and the ATA conference in San Diego to attend sessions on MT.

In Seattle, TAUS presented several real-world examples of what can be done today with the Moses engine. It was refreshing to hear from experts on statistical MT that terminology matters, since that camp, at least at MS, had largely been ignorant of terminology management in the past. Here are a number of worthwhile tutorials on the TAUS site for those who’d like to stay abreast of developments.

At the ATA, the usual suspects, Laurie Gerber, Rubén de la Fuente, and Mike Dillinger, outdid each other once again in debunking the myth around MT. When fears did come up in the audience about MT and its effects, I had to think of a little story:

In the mid-90s, five of us German translators at J.D. Edwards were huddled in a conference room for some training. Something or someone was terribly delayed, and while chatting we all started catching up on the translation quota due that day. You know what that involved? It involved finding a string that came up 500 to 800 times. After translating it once, you could continue your chat and hit enter 500 to 800 times. See the screen print of the translation software to the right and you will realize that the software didn’t allow a human to translate like a human; we translated like machines, only worse because…oops, the 800 strings are through and you are on to the next source string. Some would call this negligent behavior and I am glad that today we have better translation software and that machines assist with or do the jobs that we are not good at.


ATA impressions

November 22, 2011 by Barbara Inge Karsch

I have traveled quite a bit during the last four weeks and it is high time for an update. Let me start with a review of yet another great conference of the American Translators Association in Boston.

At last year’s ATA conference in Denver, I was still stunned that the Association seemed to be only just catching up with technology and the opportunity to embrace machine translation. This year, I saw something completely different. Mike Dillinger gave a well-attended, entertaining and educational seminar on machine translation. He certainly lived up to his promise of showing “what the translator’s role is in this new business model.”

It was so clear that editing for MT is a market segment on the rise, if not during Mike’s seminar, then during Laurie Gerber’s presentation on the specifics of editing machine translation output. She also shared tips on how to educate “over-optimistic clients”. You add to that Jost Zetzsche’s presentation on dealing with that flood of data, and the puzzle pieces start forming a picture of new skills and new jobs.

Jost’s presentation is very much in line with an article by Detlef Reineke and Christian Galinski in eDITion, the publication of the German Terminology Association, DTT, about the flood of terminology in our future (“Vor uns die Terminologieflut”). To stem the flood, it helps to think of “data,” as Jost did, rather than texts, documents or even segments. He also declared the glossary outdated and announced a bright future for terminology databases. To think about texts, documents, segments, concepts and terms as data is helpful in the sense that data along with solid corresponding metadata have a higher reuse value, if you will, than unmanaged translation memories or the final translation product. That has been terminologists’ message for a long time.

I also attended sessions on translation education, one by the University of Illinois at Urbana-Champaign and one by New York University. Since I will be working with the Translation Center of the University of Illinois on a small research project and am currently preparing the online terminology course that will be part of the M.S. at NYU starting this spring, it was nice to meet my colleagues in person.


Terminology extraction with memoQ 5.0 RC

August 15, 2011 by Barbara Inge Karsch

In the framework of a TermNet study, I have been researching and gathering data about terminology management systems (TMS). We will not focus on term extraction (TE) tools, but since one of our tool candidates recently released a new term extraction module, I wanted to check it out. Here is what I learned from giving the TE functionality of the memoQ 5.0 release candidate a good run.

Let me start by saying that this test made me realize again how much I enjoy working with terminological data; I love analyzing terms and concepts, researching meaning and compiling data in entries; to me it is a very creative process. Note furthermore that I am not an expert in term extraction tools: I was a serious power-user of several proprietary term extraction tools at JDE and Microsoft; I haven’t worked with the Trados solution since 2003; and I have only played with a few other methods (e.g. Word/Excel and SynchroTerm). So, my view of the market at the moment is by no means a comprehensive one. It is, however, one of a user who has done some serious term mining work. One of the biggest projects I ever did was the Axapta 4.0 specs. It took us several days just to load all documents onto a server directory; it took the engine at least a night to “spit out” 14,000 term candidates; and it took me an exhausting week to nail down 500 designators worth working with.

As a mere user, as opposed to a computational linguist, I am not primarily interested in the performance of the extraction engine (I actually think the topic is a bit overrated); I like that in memoQ I can set the minimum/maximum word lengths, the minimum frequency, and the inclusion/exclusion of words with numbers (the home-grown solutions had predefined settings for all of this). But beyond the rough selection, I can deal with either too many or too few suggestions, if the tool allows me to quickly add or delete what I deem the appropriate form. There will always be noise and lots of it. I would rather have the developer focus on the usability of the interface than “waste” time on tweaking algorithms a tiny bit more.
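The rough-selection settings mentioned above (minimum/maximum word length, minimum frequency, inclusion of words with numbers) map onto a very simple frequency-based extractor. Here is a minimal sketch in Python of how such a candidate list could be produced; the function and parameter names are my own illustration, not memoQ’s actual implementation, and the stop word list is deliberately tiny:

```python
from collections import Counter
import re

# A toy stop word list; a real one would be far larger.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "doesn't", "too"}

def extract_candidates(text, min_len=1, max_len=3, min_freq=2, allow_numbers=False):
    """Naive n-gram term candidate extraction with frequency filtering."""
    tokens = re.findall(r"[A-Za-z0-9'-]+", text.lower())
    counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # skip n-grams that start or end with a stop word
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            # optionally exclude words containing digits
            if not allow_numbers and any(any(c.isdigit() for c in w) for w in gram):
                continue
            counts[" ".join(gram)] += 1
    # keep only candidates at or above the minimum frequency
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]

sample = ("Terminology management matters. Terminology management systems "
          "store terminological entries; a terminology database holds entries too.")
print(extract_candidates(sample))
```

Even on this tiny sample, the candidate list contains plenty of noise alongside the useful multi-word candidate “terminology management”, which illustrates why fast go/no-go processing in the interface matters more to a user than marginal gains in the algorithm.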

So, along the lines of the previous posting on UX design, my requirements for a TE tool are that it allow me to:

  • Process term candidates (go/no-go decision) extremely fast and
  • Move data into the TMS smoothly and flawlessly.

memoQ by Kilgray Translation Technologies* meets the first requirement very nicely. My (monolingual) test project was the PowerPoint presentations of the ECQA Certified Terminology Manager, which I had gone through in detail the previous week and which contained 28,979 English words. Because the subject matter is utterly familiar to me, there was no question as to what should make the cut and what shouldn’t. I loved that I could “race” through the list and go yay or nay; that I could merge obvious synonyms; and that I could modify term candidates to reflect their canonical form. Because the contexts for each candidate are all visible, I could have even checked the meaning in context quickly if I had needed to.

I also appreciated that there is already a stop word list in place. It was very easy to add to it, though here is one suggestion: it would be great to have the term candidate automatically inserted in the stop-word dialog. Right now, I still have to type it in; it would save time if it were prefilled. Since the stop word list is not very extensive (e.g. even words like “doesn’t” are missing from the English list), it will take everyone considerable time to build up a list, which at its core will not vary substantially from user to user. But that may be too much to ask of a first release.

As for my second requirement, memoQ term extraction doesn’t meet that (yet) (note that I only tested the transfer of data to memoQ, but not to qTerm). I know it is asking for a lot to have a workflow from cleaned-up term candidate list to terminological entry in a TMS. Here are two suggestions that would make a difference to users:

  • Provide a way to move context from the source document, incl. context source, into the new terminological entry.
  • Merging terms into one entry because they are synonyms is great. But they need to show up as synonyms when imported into the term base; none of my short forms (e.g. POS, TMS) showed up in the entry for the long forms (e.g. part of speech, terminology management systems) when I moved them into the memoQ term base.
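The synonym problem in the second point comes down to how entries are modeled: in a concept-oriented termbase, a long form and its short form belong to the same entry, not to separate ones. A minimal sketch of such an entry, with illustrative field names that are my own assumption and not qTerm’s or memoQ’s actual import format:

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    designation: str          # e.g. "terminology management system" or "TMS"
    context: str = ""         # sentence the term was extracted from
    context_source: str = ""  # document the context came from

@dataclass
class Entry:
    """One concept-oriented entry: long form and short forms share the entry."""
    terms: list = field(default_factory=list)

entry = Entry(terms=[
    Term("terminology management system",
         context="We gathered data about terminology management systems.",
         context_source="TermNet study notes"),
    Term("TMS"),  # merged synonym (short form) stays in the same entry
])

# After import, synonyms should remain grouped under one concept:
print([t.designation for t in entry.terms])
```

Keeping context and context source on the term level, as sketched here, is what would satisfy the first point: the provenance of each candidate travels with it into the termbase.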

My main overall wish is that TE be integrated with authoring and translation in a way that allows companies and LSPs, writers and translators to have an efficient workflow. It is imperative in technical communication and translation to document terms and concepts. When this task is left to translators, it happens quite late in the process, but that is better than its not happening at all. Only fast and flawless processing will allow one-person or multi-person enterprises, for that matter, to carry out terminology work as part of the content supply chain. When the “fast and flawless” prerequisite is met, even those of my translator friends who detest the term “content supply chain” will have enough time to enjoy the more creative aspects of their profession. Then the economic requirements essential on the macro level are met, and the individual’s need to get satisfaction out of the task is fulfilled on the micro level. The TE functionality of memoQ 5.0 RC excels in design and, in my opinion, is ready for translators’ use. If you have any comments, whether you agree or disagree with me, I’d love to hear them.

*Kilgray is a client of BIK Terminology.
