Savvy Diversification Series – Diversification into Machine Translation

The Savvy Newcomer team has been taking stock of the past year and finding that one key priority for many freelance translators and interpreters has been diversification. Offering multiple services in different sectors or to different clients can help steady us when storms come. Diversification can help us hedge against hard times. With this in mind, we’ve invited a series of guest authors to write about the diversified service offerings that have helped their businesses to thrive, in the hopes of inspiring you to branch out into the new service offerings that may be right for you!

Taking the pulse of the U.S. localization industry reveals what should be an economically prosperous period for qualified translators and editors. It’s true that it doesn’t sound great for the industry to be operating in what the Joint National Committee for Languages calls a period of “language crisis” in the United States. The materials distributed to U.S. lawmakers during the February 2021 Virtual Language Advocacy Days give alarming statistics: “9 out of 10 US employers rely on employees with world language skills[, and] 1 in 3 foreign language-dependent employers reports a language skills gap[ and] 1 in 4… lost business due to a lack of foreign language skills” (JNCL-NCLIS, Legislative Priorities). That is to say, at the same time that the U.S. market is feeling the repercussions of its lagging investment in multilingual education, qualified language professionals are in high demand, and the roles the market demands are becoming ever more technological in nature. In the article “Future Tense: Thriving Amid the Growing Tensions between Language Professionals and Intelligent Systems,” Jay Marciano points out, “The day-to-day work of the translator of today will be hardly recognizable to a language services professional in 2030.”

Newcomers to the industry are at a particular advantage within these circumstances. During Slator’s Briefing for their Pro Guide: Translation Pricing and Procurement, Anna Wyndham noted that experienced buyers of localization services are less likely to adopt new pricing models, while new buyers from the tech industry and beyond are more open to and indeed may expect “human-in-the-loop” pricing models based on full integration with machine translation. Likewise, savvy newcomers to the translation profession are more likely to adopt machine translation as a reality of the role, while more veteran translators may feel less incentivized to go through the disruptive change of integrating Machine Translation (MT) technology into their everyday workflows. Newcomers and veterans alike who are looking to diversify now and have their services remain relevant for decades to come would do well to incorporate machine translation before the learning curve has become so great as to effectively disqualify one from key markets.

This article outlines key MT-related services to include in your portfolio as a 21st-century translator reinventing yourself as a language technologist. As a language technologist, your expertise in translation makes you an asset at the MT-engine training, writing-for-MT, and post-editing of machine translation (PEMT) stages. This article considers these services in reverse order, starting with the PEMT services that translators are most likely to perform, before shifting further and further upstream, first to writing for MT and then to training MT engines. The discussion of each service type addresses common misconceptions and key competencies so you can start developing the skills needed to add MT services to your field of expertise. Check out the additional resources section for further reading to continue your exploration of this dynamic service area.

Service #1 – Post Editing of Machine Translation (PEMT)

In Episode 49 of The ATA Podcast, “A Look into the Future of Post-Editing and Machine Translation,” Jay Marciano defines post-editing of machine translation as a “step that a professional translator takes to review and make corrections to machine translation output in the provisioning of… high quality translation[s]” (Baird and Marciano). In Marciano’s view, the term “post editor” adds specialized meaning to a role that already involves substantial editing. Traditional translation denotes not only the invention of completely new copy, understood to be the translation of “new words,” but also the act of editing translation memory (TM) output at the segment level. The level of work involved depends on the quality of the contributions to shared, proprietary resources and on how closely the source segment matches existing segments within the TM, generally from 75% matches and above. Incorporating segments that have been pre-translated using MT adds another segment type for human post-editing, though the term “post-editing” itself is used exclusively to denote work reviewing machine translation output.

The belief that it takes less skill to post-edit machine translation than to produce traditional human translation is a misconception that has circulated in the translation field since the advent of MT. This misconception is tied to several factors, among them the outdated perception that MT produces poor-quality output that is too repetitive to be interesting for humans to review. Older rules-based or statistical models indeed perform better for content that corresponds to lower levels of the Interagency Language Roundtable (ILR) scale for translation performance. The ILR scale comprises five levels, with level 2 and below indicating limited or minimal performance, and level 3 and above indicating professional performance. Traditionally, rules-based and statistical models have been best geared toward texts that correspond to level 2 of the ILR scale: straightforward texts, like sets of instructions, produced using controlled language that leaves little room for creative interpretation. ATA certification is a mid-career certification demonstrating that a translator performs at (at least) level 3 of the ILR scale, and older MT models could not compete with professional humans on content characterized by the abstract language, implication, and nuance that requires a human mind to be parsed. However, machine translation technology has evolved at light speed, and even if MT cannot surpass the quality produced by human translators, the levels of fluency and correspondence now achievable using artificial intelligence and neural machine translation are remarkable. The linguistic challenges encountered in this work are also interesting for those who enjoy studying the intersection of human and machine-produced language.

No matter the complexity of the content that a machine translation engine is designed to pre-translate, MT engines are far from replacing humans. According to the ATA Position Paper on Machine Translation, this is because “Computers can be very sophisticated in calculating the likelihood of a certain translation, but they understand neither the source nor the target text, and language has not yet been captured by a set of calculations.” While the results of MT are getting better all the time, when confirmation of any degree of accuracy or polishing is needed, a professional post editor is the one to do that job. According to ISO 17100 Translation Services – Requirements for translation services of the International Organization for Standardization (ISO), the professional competences of translators are: translation, linguistic and textual competence in the source and target language, research, information acquisition and processing, and cultural, technical, and domain competences (3.1.3). Professionalism is a competence added to the translator competences indicated in ISO 17100 for MT post editors according to ISO 18587 – Translation services – Post-editing of machine translation output – Requirements. That professionalism entails a knowledge of MT technology, common linguistic errors produced by MT, and Computer-Assisted Translation (CAT) tools, and the ability to carry out linguistic analysis, provide structured feedback to improve MT output over time, and interact with terminology management systems (“5 Competences and qualifications of post-editors” ISO 18587).

To undertake the linguistic challenges that post-editing of machine translation presents requires a thorough understanding of key post-editing concepts and how those concepts relate to post-editing specifications. To review, specifications outline the requirements of buyers and the expectations of target users that shape how localization services are produced. With regard to machine translation, the value proposition of the content being produced will determine whether light post-editing or full post-editing is needed, that is, whether what the TAUS MT Post-Editing Guidelines refer to as “good enough” or “human translation” quality is required. If light post-editing is called for, such as when speed of delivery takes priority over fluency and stylistics, the post editor intervenes minimally in the raw MT output to correct inaccurately rendered meaning, grammar and spelling errors, and culturally offensive content. If full post-editing is called for, greater checks for consistency in terminology, product names, and mechanical aspects of the text are also employed.

Within either light or full post-editing models, discipline is key, and in post-editing, discipline is demonstrated by using the fewest keystrokes needed to make only the necessary corrections. Experienced post editors can quickly distinguish among segments that are good enough, segments that require minor edits, and segments that need to be redone from scratch. Localization managers use post-editing distance – the measure of the change between raw MT output and post-edited content – to gauge the overall quality of the MT engine and the post editor’s work and to identify instances of over-editing and under-editing. According to Silvio Picinini of eBay, low edit distances can be an indicator of both quality and productivity: if both the MT engine and the post editor have been well trained, lower edit distances should result. For those interested in working as post editors or in training post editors, Sharon O’Brien recommends the following curriculum in her 2002 paper “Teaching Post-editing: a proposal for course content”: “Introduction to Post-editing, Introduction to Machine Translation Technology, Introduction to Controlled Language Authoring, Advanced Terminology Management, Advanced Text Linguistics, [and] Basic Programming Skills” (103).
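Post-editing distance of the kind Picinini describes is typically computed as a Levenshtein (edit) distance between the raw MT output and the post-edited text, normalized by length. The character-level sketch below is an illustrative simplification (the function names are my own; production tools often measure at the word level or use metrics such as TER):

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def pe_distance(raw_mt: str, post_edited: str) -> float:
    """Normalized edit distance: 0.0 = segment untouched, 1.0 = fully rewritten."""
    if not raw_mt and not post_edited:
        return 0.0
    return levenshtein(raw_mt, post_edited) / max(len(raw_mt), len(post_edited))
```

A score near 0.0 suggests the engine (or an over-cautious post editor) left the segment essentially as-is, while a score near 1.0 flags a segment that was retranslated from scratch.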

Service #2 – Writing for Machine Translation

In a world in which more data is authored daily than could ever possibly be translated by humans, the authors of a great percentage of that data may not be good writers at all, much less good writers of content intended for translation. Within workflows that incorporate MT, professional linguists have an opportunity to get involved before any content is even imported into the engines that produce the raw output for PEMT. Just as workflows built around human translation benefit when the source content is written for translation, workflows that incorporate machine translation gain efficiency and quality when the source content is written expressly for that purpose. Localization workflows for human translation already incorporate copy-editing of source content to promote smooth processing during translation, especially where multiple target languages are involved. This copy-editing stage decreases the need for clarification mid-workflow and prevents the extensive rework that results from misunderstandings and poor comprehensibility, by identifying and correcting ambiguities and inconsistencies in source content before it is sent for translation.

Once post editors have a good sense of the errors that are common to a language pair, subject field, and text type, they will be better equipped to customize recommendations for how best to write for machine translation; for certain text types and subject fields, the professional recommendation may simply be that MT will not suffice. Ambiguities and inconsistencies that should be flagged prior to both human and machine translation include unclear referents, the use of synonyms, long compound nouns, and homonyms prone to misinterpretation, among many other textual features. Examples of some common sources of translation errors are provided below.

  • Unclear referent: Group A and group B compared their results, and they [Group A, Group B, or Group A & B?] decided to make changes based on finding C.
  • Potential synonym use: The drying process should take so many days. Once the dehydration process is complete, do this next. [Are drying and dehydration separate processes, or do both refer to the same process?]
  • Misinterpretation of homonyms: Our earnings for this quarter are as follows. [Depending on the context, the best equivalent for “earnings” may be an equivalent that conveys one of these senses: pay, profits, returns, income, etc.]

When getting started with writing for MT, the principles of controlled language and plain language offer good general rules that can be applied as well. Uwe Muegge’s Controlled Language Optimized for Uniform Translation, for instance, includes such guidelines as expressing only one idea per sentence, using simple yet complete grammatical structures, limiting the use of pronouns by restating nouns instead, and using articles so that nouns can be easily identified; and the Plain Language Association International recommends that jargon be avoided and simple words be employed (“What is plain language?”). The rules for controlled language and plain language may make these forms of communication sound easy to use, but even identifying the myriad textual features encompassed by these principles takes a great deal of study, practice, and experience. Simplified Technical English, the controlled language of the AeroSpace and Defence Industries Association of Europe, for instance, consists of sixty-five writing rules in nine categories and a dictionary of nearly 1,000 approved words.
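Some of these guidelines can be checked mechanically. Below is a toy controlled-language checker in Python; the word cap, the jargon deny-list, and the pronoun list are illustrative assumptions of mine, not actual CLOUT or ASD-STE100 rules:

```python
import re

MAX_WORDS = 20                                  # illustrative sentence-length cap
JARGON = {"leverage", "utilize", "synergize"}   # illustrative deny-list

def check_sentence(sentence: str) -> list[str]:
    """Return a list of controlled-language warnings for one sentence."""
    words = re.findall(r"[A-Za-z']+", sentence)
    warnings = []
    if len(words) > MAX_WORDS:
        warnings.append(f"too long ({len(words)} words > {MAX_WORDS})")
    for w in words:
        if w.lower() in JARGON:
            warnings.append(f"avoid jargon: '{w}'")
    if re.search(r"\b(it|they|this|these)\b", sentence, re.IGNORECASE):
        warnings.append("pronoun found: consider restating the noun")
    return warnings
```

A sentence like “Remove the cover.” passes cleanly, while “Utilize the tool.” is flagged for jargon; real checkers apply far richer rule sets, but the pattern of rule-per-warning is the same.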

Service #3 – Training Customized MT Engines

The development of machine translation has largely remained in the realm of programmers and engineers. Despite the noticeable lack of linguists involved in MT development, so much high-quality data is needed to train customized MT engines that getting corpus linguists involved before undertaking what can be an expensive, manual data collection process makes perfect sense. A corpus is a collection of texts that have been selected for a specific purpose. A general language corpus will include many millions of words, while a corpus of specialized texts written by experts in a specific subject field may include only hundreds of thousands of words to start. Parallel corpora of translated and aligned segments are the resource most frequently sought when training MT engines, whether rules-based, statistical, or neural models. However, high-quality parallel corpora take a long time to build and are exceedingly hard to find in any off-the-shelf format. Because they are so hard to find, those training MT engines may turn to comparable corpora – collections of similar texts in multiple languages – for lower-resource languages.
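To make the sizing discussion concrete, a first pass at evaluating a candidate corpus is often just counting. The sketch below profiles a corpus with a few basic statistics; the whitespace tokenization and the particular measures chosen are illustrative simplifications, not a standard recipe:

```python
from collections import Counter

def corpus_profile(texts: list[str]) -> dict:
    """Token count, vocabulary size, and type/token ratio for a corpus.

    Naive lowercased whitespace tokenization -- real corpus work
    would use a proper tokenizer for the language in question.
    """
    tokens = [tok.lower() for text in texts for tok in text.split()]
    vocab = Counter(tokens)
    return {
        "tokens": len(tokens),                   # running word count
        "types": len(vocab),                     # distinct word forms
        "type_token_ratio": len(vocab) / len(tokens) if tokens else 0.0,
        "top_terms": vocab.most_common(5),       # quick domain sanity check
    }
```

Numbers like these help a corpus linguist judge whether a specialized collection has reached the hundreds of thousands of words needed to start, and whether its most frequent terms actually reflect the target subject field.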

When building monolingual corpora, linguists will be able to identify the characteristics of the most representative data to collect for each corpus on which the MT engine will be trained. Corpora might include, per language, one technical corpus of general content written by subject matter experts in a specific subject field and one client-specific corpus of proprietary product documentation. Since MT is trained on human-produced language, it replicates human biases. Linguists can help identify and mitigate the race and gender biases that manifest in large data sets by identifying specific populations, geographical regions, or language dialects not adequately represented in a corpus. They can also help by eliminating any content from the corpus that is not fit for use, so that MT users are not insulted by offensive language produced by an MT engine and MT developers avoid alienating their users. Salvador Ordorica gives several examples of high-profile manifestations of racial and gender bias in MT, and of how to overcome them, in the article “Avoiding Bias and Discrimination in Machine Translation” published via Forbes.

Most would-be localizers need look no further than the translation memories under their command to start getting practice managing parallel corpora. Translation memories that contain high-quality content are highly sought after yet hard to find, which makes quality TMs exceedingly valuable. When a single person contributes to a TM, each segment should be tagged with anonymized client and project identifiers so that individual clients’ data can later be isolated as necessary, in keeping with any confidentiality agreements that govern the use of the content produced. If multiple clients’ content is mixed together in the TMs used to train an MT engine, those clients’ linguistic patterns will emerge in the output, so producing distinguishable copy from that content is a challenge that must be taken into consideration as well; linguists can help with the style and terminology guides that make producing distinguishable copy from MT possible. If multiple people contribute to a TM, keep the number of contributors limited and their identifiers clearly documented, with proper protections over copyrighted assets that include the ability to rate the contents according to the quality of the producer of the source and target segments and to revoke access rights as necessary. Again, take these precautions because high-quality TMs make the training of MT engines much more efficient, and such TMs therefore fetch a very high price.

Pricing MT Services According to Skill

In summary, to diversify into the MT services that are already a nearly ubiquitous part of the provisioning of human translation services, translators should develop advanced skills in CAT tools, technology in general, and linguistic post-editing; the ability to match services rendered with the quality expectations conveyed in specifications; and knowledge of controlled languages, corpus building and analysis, TM management at scale, terminology management, and data security. Despite the wide range of competencies necessary to work in MT, be aware that traditional buyers accustomed to per-word pricing models tend to see the incorporation of MT as an opportunity to purchase translation services at discounts even steeper than those of TM-based pricing models. As Slator emphasizes in the Pro Guide: Translation Pricing and Procurement, new buyers mean that new pricing models are possible. When working with new buyers, shift where you can to value-based pricing models that more adequately compensate you for your rich expertise. Above all, remember that in the design, implementation, and review of MT, teaching the parrot to talk is among the goals, but it is much more valuable if you can teach the parrot to say the correct thing.

Works Consulted & Recommended Resources for Further Reading

Aslan, Şölen. “9 Types of Data Bias in Machine Learning.” TAUS, 2021 Mar 22, https://blog.taus.net/9-types-of-data-bias-in-machine-learning. Accessed 2021 Apr 12.

“ATA Position Paper on Machine Translation: A Clear Approach to a Complex Topic.” American Translators Association, 2018 Aug. 13, https://www.atanet.org/client-assistance/machine-translation/. Accessed 2021 Apr 1.

Baird, Matt and Jay Marciano. “E49: A Look into the Future of Post-Editing and Machine Translation.” The ATA Podcast, Episode 49, 2020 Sept 24, https://www.atanet.org/podcast/e49-a-look-into-the-future-of-post-editing-and-machine-translation/.

Berger, Carola F. “An Introduction to Neural Machine Translation.” American Translators Association, ATA 59th Annual Conference, October 2018, https://ata-divisions.org/S_TD/wp-content/uploads/2018/11/ATA59_An_Introduction_to_Neural_Machine_Translation.pdf. Accessed, 2021 Apr 10.

“ILR Skill Level Descriptions for Translation Performance.” Interagency Language Roundtable, https://www.govtilr.org/Skills/AdoptedILRTranslationGuidelines.htm. Accessed 2021 Mar. 30.

ISO 17100:2015(E), Translation Services – Requirements for translation services, International Organization for Standardization, Geneva, Switzerland, 2015, www.iso.org.

ISO 18587:2017, Translation Services – Post-editing of machine translation output – Requirements, International Organization for Standardization, Geneva, Switzerland, 2017, http://www.iso.org.

Legislative Priorities of the Language Enterprise – 117th Congress. Joint National Committee for Languages and the National Council for Languages and International Studies (JNCL-NCLIS), 2021 Feb, handout.

Marciano, Jay. “Future Tense: Thriving Amid the Growing Tension between Language Professionals and Intelligent Systems.” The Chronicle, American Translators Association, July/August 2020, 29-32, https://www.nxtbook.com/nxtbooks/chronicle/20200708/index.php. Accessed 2021 Apr 12.

Massardo, Isabella, et al. MT Post-Editing Guidelines. TAUS, 2016, https://info.taus.net/mt-post-editing-guidelines.

Muegge, Uwe. Controlled Language Optimized for Uniform Translation (CLOUT). Bepress, 2002, https://works.bepress.com/uwe_muegge/88/.

O’Brien, Sharon. “Teaching Post-editing: A Proposal for Course Content.” European Association for Machine Translation, 2002.

Ordorica, Salvador. “Avoiding Bias and Discrimination in Machine Translation.” Forbes, 2021 Mar 1, https://www.forbes.com/sites/forbesbusinesscouncil/2021/03/01/avoiding-bias-and-discrimination-in-machine-translation/. Accessed 2021 Apr 12.

Picinini, Silvio. “Going the Distance – Edit Distance 1.” eBay blog, eBay Inc., 2019 Aug 8, https://tech.ebayinc.com/research/going-the-distance-edit-distance-1/. Accessed 2021 Mar 31. See also “Going the Distance – Edit Distance 2 & 3.”

Pro Guide Briefing: Pricing and Procurement. Slator, 2021 Apr 7, Webinar.

Pro Guide: Translation Pricing and Procurement. Slator, 2021 Mar 19, https://slator.com/data-research/pro-guide-translation-pricing-and-procurement/. Accessed 2021 Apr 12.

Simplified Technical English Specification ASD-STE100. AeroSpace and Defence Industries Association of Europe, Issue 7, 2017. PDF.

“What is plain language?” Plain Language Association International (PLAIN), 2021, https://plainlanguagenetwork.org/plain-language/what-is-plain-language/. Accessed 2021 Apr 12.

Zetzsche, Jost, Lynne Bowker, Sharon O’Brien, and Vassilina Nikoulina. “Women and Machine Translation.” The ATA Chronicle, American Translators Association, Nov/Dec 2020, Volume XLIX, Number 6. Print. Also available via: https://www.atanet.org/tools-and-technology/women-and-machine-translation/.

Author bio

Alaina Brandt is a Spanish>English translator with an MA in Language, Literature and Translation from the University of Wisconsin–Milwaukee. Her professional experience includes roles in terminology, vendor, quality, and localization project management. Alaina is currently an assistant professor of professional practices in the Translation and Localization Management program at the MIIS at Monterey. In 2017, she launched her own company Afterwards Translations to offer localization consulting and training services. Alaina is membership secretary of ASTM International Committee F43 on Language Services and Products and serves as an expert in Technical Committee 37 on Language and Terminology of the International Organization for Standardization. She has been the Assistant Administrator of ATA’s Translation Company Division since 2018.

Machine Translation and the Savvy Translator

Using machine translation is easy; using it critically requires some thought.

Tick tock! As translators, we’re all too familiar with the experience of working under pressure to meet tight deadlines. We may have various tools that can help us to work more quickly, such as translation memory systems, terminology management tools, and online concordancers. Sometimes, we may even find it helpful to run a text segment through a machine translation (MT) system.

There was a time when translators would have been embarrassed to admit “resorting” to MT because these tools often produced laughable rather than passable results. But MT has come a long way since its post-World War II roots. Early rule-based approaches, where developers tried to program MT systems to process language similarly to the way people do (i.e., using grammar rules and bilingual lexicons), have been largely set aside. Around the turn of the millennium, statistics rather than linguistics came into play, and new statistical machine translation (SMT) approaches allowed computers to do what they’re good at: number crunching and pattern matching. With SMT, translation quality got noticeably better, and companies such as Google and Microsoft, among others, released free online versions of their MT tools.

Neural Machine Translation: A game changer

In late 2016, the underlying approach to MT changed again. Now state-of-the-art MT systems use artificial neural networks coupled with a technique known as machine learning. Developers “train” neural machine translation (NMT) systems by feeding them enormous parallel corpora that contain hundreds of thousands of pages of previously translated texts. In a way, this should make translators feel good! Rather than replacing translators, NMT systems depend on having access to very large volumes of high quality translation in order to function. Without these professionally translated corpora, NMT systems would not be able to “learn” how to translate. Although the precise inner workings of NMT systems remain mysterious, the quality of the output has, for the most part, improved.

It’s not perfect, and no reasonable person would claim that it is better than the work of a professional translator. However, it would be short-sighted of translators to dismiss this technology, which has become more or less ubiquitous.

MT Literacy: Be a savvy MT user

Today, there should be no shame in consulting an MT system. Even if the suggested translation can’t be used “as is,” a translator might be able to fix it up quickly, or might simply be inspired by it on the way to producing a better translation. However, as with any tool, it pays to understand what you are dealing with. It’s always better to be a savvy user than not. Thinking about whether, when, why, and how to use MT is part of what we term “MT literacy.” It basically comes down to being an informed and critical user of this technology, rather than being someone who just copies, pastes and clicks. So what should savvy translators know about using free online MT systems?

— Information entered into a free online MT system doesn’t simply “disappear” once you close the window. Rather, the companies that own the MT system (e.g., Google, Microsoft) might keep the data and use it for other purposes. Don’t enter sensitive or confidential information into an online MT system. For more tips on security and online MT, see Don DePalma’s article in TC World magazine.

— Consider the notion of “fit-for-purpose” when deciding whether an MT system could help. Chris Durban and Alan Melby prepared a guide for the ATA entitled Translation: Buying a non-commodity in which they note that one of the most important criteria to consider is:

The purpose of the translation: Sometimes all you want is to get (or give) the general idea of a document (rough translation); in other cases, a polished text is essential.

The closer you are to needing a rough translation, the more likely it is that MT can help. As you move closer towards needing a polished translation, MT may still prove useful, but it’s likely that you are going to need to invest more time in improving the output. Regardless, it’s always worth keeping the intended purpose of the text in mind. Just as you wouldn’t want to under-deliver by offering a client a text that doesn’t meet their needs, there’s also no point in over-delivering by offering them a text that exceeds their needs. By over-delivering, you run the risk of doing extra work for free instead of using that time to work on another job or to take a well-earned break!

— Not all MT systems are the same. Each NMT system is trained using different corpora (e.g., different text types, different language pairs, different numbers of texts), which means they could be “learning” different things. If one system doesn’t provide helpful information, another one might. Also, these systems are constantly learning. If one doesn’t meet your needs today, try it again next month and the results could be different. Free online MT systems include those offered by Google and Microsoft, among others.

— Check the MT output carefully before deciding to use it. Whereas older MT systems tended to produce text that was recognizably “translationese,” a study involving professional translators that was carried out by Sheila Castilho and colleagues in 2017 found that newer NMT systems often produce text that is more fluent and contains fewer telltale errors such as incorrect word order. But just because the NMT output reads well doesn’t mean that it’s accurate or right for your needs. As a language professional, it’s up to you to be vigilant and to ensure that any MT output that you use is appropriate for and works well as part of your final target text.

Image credits: Pixabay 1, Pixabay 2, Pixabay 3

Author bio

Lynne Bowker, PhD, is a certified French to English translator with the Association of Translators and Interpreters of Ontario, Canada. She is also a full professor at the School of Translation and Interpretation at the University of Ottawa and 2019 Researcher-in-Residence at Concordia University Library where she is leading a project on Machine Translation Literacy. She has published widely on the subject of translation technologies and is most recently co-author of Machine Translation and Global Research (2019, Emerald).

Why machine translation should have a role in your life. Really!



By Spence Green (@LiltHQ)
Reblogged from The Language of Translation blog with permission from the author (incl. the image)

Guest author Spence Green talks about a heated topic: Machine Translation, Translation Memories and everything in between. Spence Green is a co-founder of Lilt, a provider of interactive translation systems. He has a PhD in computer science from Stanford University and a BS in computer engineering from the University of Virginia.

It is neither new nor interesting to observe that the mention of machine translation (MT) provokes strong opinions in the language services industry. MT is one scapegoat for ever-decreasing per-word rates, especially among independent translators. The choice to accept post-editing work is often cast in moral terms (peruse the ProZ forums sometime…). Even those who deliberately avoid MT can find it suddenly before them when unscrupulous clients hire “proof-readers” for MT output. And maybe you have had one of those annoying conversations with a new acquaintance who, upon learning your profession, says, “Oh! How useful. I use Google Translate all the time!”

But MT is a tool, and one that I think is both misunderstood and underutilized by some translators. It is best understood as generalized translation memory (TM), a technology that most translators find indispensable. This post clarifies the relationship between TM and MT, dispels myths about the two technologies, and discusses a few recent developments in translation automation.

Translation Memory

Translation memory (TM) was first proposed publicly by Peter Arthern, a translator, in 1979. The European Commission had been evaluating rule-based MT, and Arthern argued forcefully that raw MT output was an unsuitable substitute for translations produced from scratch. Nonetheless, there were intriguing possibilities for machine assistance. He observed a high degree of repetition in the EC’s texts, so efficiency could be improved if the EC stored “all the texts it produces in [a] system’s memory, together with their translations into however many languages are required” [1, p.94]. For source segments that had been translated before, high-precision translations could be immediately retrieved for human review.

Improvements upon Arthern’s proposal have included subsegment matching, partial matching (“fuzzies”) with variable thresholds, and even generalization over inflections and free variables like pronouns. But the basic proposal remains the same: Translation memory is a high-precision system for storing and retrieving previously translated segments.
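The store-and-retrieve idea, including partial (“fuzzy”) matching with a variable threshold, can be sketched in a few lines. This is a minimal illustration, not how any commercial TM is implemented; the segment pairs and the 75% threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher

class TranslationMemory:
    """High-precision store of previously translated segments."""

    def __init__(self):
        self.entries = {}  # source segment -> target segment

    def add(self, source, target):
        self.entries[source] = target

    def lookup(self, source, threshold=0.75):
        """Return (score, target) for the best match at or above threshold, else None."""
        if source in self.entries:  # exact match
            return 1.0, self.entries[source]
        best = None
        for stored, target in self.entries.items():
            score = SequenceMatcher(None, source, stored).ratio()
            if score >= threshold and (best is None or score > best[0]):
                best = (score, target)  # partial ("fuzzy") match
        return best

tm = TranslationMemory()
tm.add("The committee approved the report.", "Le comité a approuvé le rapport.")

print(tm.lookup("The committee approved the report."))  # exact match, score 1.0
print(tm.lookup("The committee approved the budget."))  # fuzzy match, score below 1.0
```

Note what the sketch cannot do: `lookup` returns `None` for a segment that resembles nothing in the memory — which is exactly the gap Arthern identified.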

Machine Translation

Arthern admitted a weakness in his proposal: the TM could not produce output for unseen segments. Therefore, the TM “could very conveniently be supplemented by ‘genuine’ machine translation, perhaps to translate the missing areas in texts retrieved from the text memory” [1, p.95]. Arthern viewed machine translation as a mechanism for increasing recall, i.e., a backoff in the case of “missing areas” in texts.

Think of MT this way: Machine translation is a high-recall system for translating unseen segments.

Modern MT systems are built on large collections of human translations, so they can of course translate previously seen segments, too. But for computational reasons they typically store only fragments of each sentence pair, so they often fail to produce exact matches. TM is therefore a special case of MT for repeated text: TM offers high precision, and general MT fills in to improve recall.

Myths and countermyths

By understanding MT and TM as closely related technologies, each with a specific and useful role in the translation process, you can offer informed responses when you hear the following proclamations:

  • TM is “better than” MT – false. MT is best suited to unseen segments, for which TM often produces no output.
  • Post-editing is MT – false. Both TM and MT produce suggestions for input source segments. Partial TM matches are post-edited just like MT. Errors can be present in TM exact matches, too.
  • MT post-editing leads to lower quality translation – false. The translator is always free to ignore the MT just as he or she can disregard TM partial matches. Any effect on quality is probably due to priming, apathy, and/or other behavioral phenomena.
  • MT is only useful if it is trained on my data – neither true nor false. Statistical MT systems are trained on large collections of human-generated parallel text, i.e., large TMs. If you are translating text that is similar to the MT training data, the output can be surprisingly good. This is the justification for the custom MT offered by SDL, Microsoft, and other vendors.
  • TMs improve with use; MT does not – true until recently. Lilt and CasmaCat (see below) are two recent systems that, like TM, learn from feedback.

Tighter MT Integration

Major desktop-based CAT systems such as Trados and memoQ emphasize TM over MT, which is typically accessible only as a plugin or add-on. This is a sensible default since TM has the twin benefits of high precision and domain relevance. But new CAT environments are incorporating MT more directly as in Arthern’s original proposal.

In the November 2015 issue of the ATA Chronicle I wrote about three research CAT systems based on interactive MT, that is, an MT system that responds to and learns from translator feedback. Two of them are now available for production use:

  • CasmaCat – Free, open source, runs locally on Linux or on a Windows virtual machine.
  • Lilt – Free, cloud-based, runs on all major browsers.

The present version of CasmaCat does not include TM, so I’ll briefly describe Lilt, which is based on research by me and others on translator productivity.

Lilt offers the translator an integrated TM / MT environment. TM entries, if present, are always shown before backing off to MT. The MT system is interactive, so it suggests words and full translations as the translator types. Smartphone users will be familiar with this style of predictive typing.

Lilt also learns. Recall that both TM and MT are derived from parallel text. In Lilt, each confirmed translation is immediately added to the TM and MT components. The MT system extracts new words and phrases, which can be offered as future suggestions.
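That feedback loop can be sketched as follows. This is a deliberately naive illustration of the idea that each confirmation updates both components — the data, the word-pair extraction, and every name here are assumptions for the example, not Lilt’s actual method or API.

```python
tm = {}        # confirmed segment pairs (source -> target)
phrases = {}   # word pairs mined from confirmed translations

def confirm(source, target):
    """Translator confirms a translation; both components learn from it."""
    tm[source] = target
    # Naive phrase extraction: align words pairwise when the lengths match.
    src_words, tgt_words = source.split(), target.split()
    if len(src_words) == len(tgt_words):
        phrases.update(zip(src_words, tgt_words))

confirm("guten Morgen", "good morning")
print(tm)       # the memory now holds the confirmed segment pair
print(phrases)  # {'guten': 'good', 'Morgen': 'morning'} -- future suggestions
```

Real systems use statistical word alignment rather than positional pairing, but the effect is the same: the very next segment can already benefit from the translator’s last confirmation.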

Conclusion

New translators should think about how to integrate MT into their workflows as a backoff. Experiment with it in combination with your TM. Measure yourself. In a future post, I’ll offer some tips for working with both conventional and interactive MT systems.

—————
[1] Peter J. Arthern. 1979. Machine translation and computerized terminology systems: A translator’s viewpoint. In Translating and the Computer, B.M. Snell (ed.).