Corpus analysis: The Ugly Duckling of Translation

Not long ago, hearing the term “corpus linguistics” made me shriek; after all, it was something that only linguists in academia did, right? So, when I signed up for a course, I was not fully convinced that I would learn something that I could truly put into practice. However, by the end of the course, I had concluded that corpus analysis is the Ugly Duckling of Translation.

Before you get to know it, it looks ugly and worthless, but as your relationship deepens, you start seeing the beauty of it. And don’t take my word for it; others have seen it too. Take my husband, for example, a freelance translator with all the best tools. He had also heard about corpus analysis; he knew that learning how to analyze a corpus might be useful, but he had not taken the time to do it. Once I showed him how easy it was to do searches, he was immediately hooked. He even built a huge corpus from his legal and oil & gas documentation, which are his specializations. Recently, after I gave a colleague a 10-minute introduction, she said: “OMG, where has this been all my life!”

If you haven’t been overcome by this feeling yet, I am willing to bet that you are still looking at the Ugly Duckling from the outside. But I am sure I can convince you in the next few paragraphs by showing you the face of a cute little swan. There are three easy steps to start believing.

The first step: Decide which tool you want to use. AntConc, WordSmith, and Sketch Engine are some of the top names in the market. All of them are great tools. But you can start with AntConc (free) to take your first steps and then take advantage of the free trials and play with the others to pick your favorite. Of course, you could stick to using online corpora such as COCA, BNC, BNCweb, etc., and maybe that’s enough for you, but why not build your own corpus that can be controlled and expanded endlessly and effortlessly?

The second step is collecting your corpus and converting it to .txt files. Nothing easier! Create a folder with subfolders on your computer. For example, if you translate documents on energy, you can have two main folders, renewable and nonrenewable; then, inside the renewable folder, you may have wind energy, solar energy, bioenergy, etc. Why is this folder division important? Because sometimes you might be looking for a general term on renewable energy, but other times you only want to search in your documentation on solar energy, which could make your searches faster. If you are just starting out, don’t worry about the number of documents in the beginning; just make sure they are representative of the topic you are working with so that you get useful results. You can add more documents as you get the hang of it. Just remember: Quality over quantity!
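If you like to script things, the folder idea above can be sketched in a few lines of Python. This is only an illustration: the function name corpus_files and the folder names are assumptions taken from the energy example, not part of any corpus tool.

```python
from pathlib import Path


def corpus_files(root, *subfolders):
    """Return every .txt file under root, or under a chosen subfolder of it.

    corpus_files("corpus") searches the whole corpus;
    corpus_files("corpus", "renewable", "solar energy") narrows the search.
    """
    base = Path(root).joinpath(*subfolders)
    # rglob looks in the folder itself and in all folders below it
    return sorted(base.rglob("*.txt"))
```

Narrowing the search to one subfolder is what keeps a solar energy query from wading through all your wind and bioenergy documents.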

Corpus analysis tools only accept .txt files, but you can find free software that can do the conversion for you in a matter of seconds, including the collection of cute little tools provided by the creator of AntConc, Dr. Laurence Anthony. AntFileConverter and EncodeAnt help you convert PDF and Word files into .txt, and .txt files into UTF-8 files, respectively (“stubborn” .txt files that the tool may not recognize might need that extra step of conversion to UTF-8 files). The conversion takes seconds, even for a large number of documents.
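For the curious, here is a rough sketch of what re-saving a “stubborn” .txt file as UTF-8 can look like in Python. It is not how EncodeAnt works internally; the function names and the list of fallback encodings (cp1252, then latin-1) are assumptions chosen for Western European texts.

```python
from pathlib import Path


def decode_text(raw, fallbacks=("cp1252", "latin-1")):
    """Decode bytes as UTF-8 if possible, otherwise try legacy encodings."""
    try:
        return raw.decode("utf-8-sig")  # also drops a leading byte-order mark
    except UnicodeDecodeError:
        pass
    for encoding in fallbacks:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    raise ValueError("could not decode the text with the encodings tried")


def convert_file_to_utf8(path):
    """Rewrite one .txt file in place as UTF-8."""
    path = Path(path)
    text = decode_text(path.read_bytes())
    path.write_text(text, encoding="utf-8")
```

Guessing an encoding is never fully reliable, so it is worth opening a converted file and checking that accented characters still look right.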

The third step is getting training, free training, that is. I know what you’re thinking: That’s going to take a long time. Wrong! Take AntConc, for example: Dr. Anthony has a collection of 5- to 10-minute videos that explain every function clearly. The fact that they are short suggests that it doesn’t take long to understand how the software works. By the way, when I say “software” I am actually referring to a downloadable file. It can’t get any easier than that! If you are just starting out, don’t get overwhelmed. First, play with the concordance tool until you feel comfortable using it before going to the next one. And that’s it! If you complete those three steps, you are ready to play. And, really… Play! It is so much fun.

What do I use it for? Corpus analysis tools include many great functions. I look for terms to confirm that they have been previously translated in this or that way. You can see how many times each term has been used and make an appropriate decision. For example, “operational” in Spanish could be “operativo,” “operacional,” “de negocios,” etc. When I check my corpus, which has been translated by professional translators, I can see how every term is used in its context and make my choice.
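The counting described above can be sketched in Python. This is a toy illustration, not what AntConc does under the hood; the function name count_candidates and the sample sentences in the usage note are made up.

```python
import re
from collections import Counter


def count_candidates(texts, candidates):
    """Count whole-word, case-insensitive hits of each candidate term."""
    counts = Counter()
    for candidate in candidates:
        # \b keeps "operativo" from matching inside a longer word
        pattern = re.compile(r"\b" + re.escape(candidate) + r"\b", re.IGNORECASE)
        for text in texts:
            counts[candidate] += len(pattern.findall(text))
    return counts
```

For example, count_candidates(["El riesgo operativo es alto.", "Riesgo operacional y plan operativo."], ["operativo", "operacional"]) counts “operativo” twice and “operacional” once. Inflected forms such as “operativa” are not counted in this sketch, and a raw count is no substitute for reading each hit in context.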

I can also “guess” a translation for a term to see if my guess is correct and, consequently, an accurate term for my translation. To illustrate, I can enter the word “framework” to search for a term that I know for sure contains it. I can sort my results by one, two or three words to the left or to the right (as shown by the colors red, green, and purple in the illustration) of the word “framework.” And I know it is an acronym, so I ask the program to look only for capitalized “Framework.” And, voilà, I get what I am looking for: Corporate Results Framework (CRF). If I click on Framework to see the context for every hit, the program takes me to the .txt file where the term came from. That is music to my ears.
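To show the idea behind that search, here is a bare-bones keyword-in-context sketch in Python. It is a simplification under stated assumptions: the function names are hypothetical, words are split with a simple regular expression, and a real concordancer offers far more sorting options.

```python
import re


def concordance(text, keyword, span=3, case_sensitive=True):
    """Return (left words, hit, right words) for every hit of keyword."""
    tokens = re.findall(r"\w+", text)
    hits = []
    for i, token in enumerate(tokens):
        if case_sensitive:
            matched = token == keyword
        else:
            matched = token.lower() == keyword.lower()
        if matched:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            hits.append((left, token, right))
    return hits


def sort_by_left(hits):
    """Sort hits by the nearest word to the left, then the next one out."""
    return sorted(hits, key=lambda hit: [w.lower() for w in reversed(hit[0])])
```

With case_sensitive=True, a search for “Framework” skips generic phrases like “a legal framework” and keeps only capitalized hits, which is how a named term such as Corporate Results Framework surfaces quickly.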

Another tool that is music to my ears is BootCaT, which converts your favorite websites into a format that can be examined in a corpus analysis tool. It is super easy to use, and it is extremely valuable if you have to translate a document about a topic that you still don’t know that well. (Great for newbies!) Just search the web, select sites or pages about your topic, and copy the URLs into BootCaT.

After that first course, my interest in corpus analysis grew. There are a few courses and webinars that show translators not only how useful corpus analysis tools are but also how to use them. However, few of them are free. I must confess, I am not an expert, but I am a good player. And when you become a skillful player, you too will see the ugly duckling become a beautiful swan!

Header image: Pixabay

Author bio

Patricia Brenes works in the Quality Control Unit of the Translation Section of the Inter-American Development Bank in Washington, D.C. She is a translator and terminologist, with a Master’s Degree in Specialized Translation from the University of Vic in Barcelona and certified by ECQA as Terminology Manager (TermNet, Vienna).

After realizing that there was a limited availability of resources and information for linguists and other stakeholders, she decided to start a terminology blog with resources and information: http://www.inmyownterms.com (Terminology for Beginners and Beyond).

Savvy Technical Translators: What do They Have that You Need?

When you come into the translation business, you usually know deep down if you have what it takes to be a technical translator. As a basic starting point, you need good technical instincts in the field you are interested in. That may come from a prior career, a course of study, a family business, or a hobby that you are managing to turn into a money-maker.

Hearing tales of the often amazing series of events that bring us to the point of beginning a career in translation is part of what makes us such a fun bunch of people. But once you are here, ready to begin, know your limits. Don’t translate chemistry if you don’t know silicon from silicone. Don’t translate automotive texts if you don’t know how an internal combustion engine works. You will fall flat on your face. Ask anyone who’s been doing this for a while. We all have a story about “that job we should never have accepted.”

Good technical translation produces precise, concise, and clear texts

“Precise” is usually covered by the terms you choose, so that takes us to two of the skills that you need to make it as a top-notch technical translator. One of those is subject-matter expertise; the other is strong terminology research skills. “Concise” and “clear” texts are produced from superb technical writing. When you combine these three skills, you can be a great technical translator.

New technical translators usually come in two “varieties.” The first is translators with credentials in translation, perhaps including technical translation, but with little hands-on work experience in any technical field. They usually come with a “Desperately Seeking Specialization” vibe. The second will usually have had a career in commerce or industry and come to translation later in life.

The former group often has stronger terminology research and writing skills. The latter group usually has strong subject matter expertise but can’t necessarily write well in their target language, or, and here I speak from personal experience, their proofreading skills might not be where they need to be.

Subject matter expertise

What really defines it? At the high end, you’ll hear people refer to the 10-year or 10,000-hour rule made popular by Malcolm Gladwell’s book Outliers, which says that no one can be an expert until they have spent 10 years working in a field. That’s a somewhat depressing concept for many technical translators wishing to build up expertise in a new field.

At the other extreme, you’ll find people who consider themselves experts after they have translated 10,000 words on some subject or other. That’s a recipe for disaster (well, at the very least, quality complaints). Unsurprisingly, perhaps, I think the answer lies somewhere in between. Yes, if you have 10 years’ experience you’ll have a head start and many customers will view you favorably.

But that doesn’t mean you are a brilliant translator and don’t have a great deal to learn. You should work on your writing. And people without hands-on experience can build up a body of expertise in a field over time. A long time, mind you, not a few weeks’ worth of work. The best and fastest way I know to build up this expertise is to have your work edited by somebody who knows what they’re talking about. Shake off your pride and ask people to track changes in your work. Feedback produces growth.

What about terminology?

Being able to research and pin down terminology in context successfully is the only way to produce reliable technical translations. Doing that quickly helps productivity and increases your hourly gross income. Over time you’ll know the key resources for your field and know how to use collocations to find out how people actually say it today. But any translator with Internet access and decent dictionaries can look up translations for technical terms. There’s nothing that can help you properly parse concepts that you do not truly understand. That brings us back to subject matter expertise. Sorry to harp on, but that’s the strongest prerequisite for success, in my mind.

The third skill, technical writing style, is less routinely talked about, but I have written and spoken about it, for instance here. Technical writing is a skill that can be learned and a fundamental part of the technical translator’s skill set. Don’t think that only commercial and marketing translators need to write well.

Make clarity a point of pride. Do one proofreading pass for numbers and units of measure alone, so that no errors of that nature ever creep into your work. Use a suitable style guide so that you always format units of measure correctly and know whether to hyphenate a term of art. Use document-specific style sheets to help you be consistent.
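A numbers-only pass can be partly automated as a safety net before the human read-through. The following Python sketch rests on assumptions: the function names are hypothetical, and it handles decimal commas versus decimal points simply by ignoring both separators, so it will not cope with thousands written with spaces or with numbers spelled out in words.

```python
import re
from collections import Counter

# A run of digits, optionally continued by "." or "," plus more digits
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")


def number_profile(text):
    """Count each number in the text, ignoring . and , separators."""
    return Counter(re.sub(r"[.,]", "", found) for found in NUMBER.findall(text))


def number_mismatches(source, target):
    """Return (numbers missing from target, numbers only in target)."""
    src, tgt = number_profile(source), number_profile(target)
    missing = sorted((src - tgt).elements())
    extra = sorted((tgt - src).elements())
    return missing, extra
```

For the pair “Chauffer 1,5 kg à 250 °C pendant 30 min.” and “Heat 1.5 kg at 205 °C for 30 min.”, the check reports 250 as missing and 205 as extra, which is exactly the kind of transposed digit a tired eye skips. It does not check units, so the separate proofreading pass is still needed.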

So start with good instincts, but don’t be that technical translator who “just translates what’s there.” Make the product a better piece of writing than the original, unless the purpose precludes that. Invest in yourself. Learn about cars or colloids, computer chips or contact lenses. Don’t leave great writing for artsy translations.

Be savvy: Know that your career will be much more successful if you treat technical translation with the respect it deserves, start with high standards, and raise them with every new customer. May you prosper!

Header image credit: Picjumbo
Header image edited with Canva

Author bio

Karen M. Tkaczyk

Karen Tkaczyk was the 2011-2015 Administrator of ATA’s Science and Technology Division. She is an ATA-certified French>English freelance translator. Her translation work is focused on chemistry and its industrial applications. She has an MChem in chemistry with French from the University of Manchester, UK, a diploma in French, and a PhD in organic chemistry from the University of Cambridge, UK. Initially, she worked in the pharmaceutical industry in Europe. After relocating to the U.S. in 1999, she worked in pharmaceuticals and cosmetics. She established her translation practice in 2005. She lives in Colorado with her family. Contact: karen@mcmillantranslation.com, @ChemXlator.

The Ins and Outs of Term Validation

By Patricia Brenes
Reblogged from In My Own Terms with permission from the author (including the images)

Every step in term processing during the preparation of glossaries or updating of termbases is important, but probably the one that will save you the most time is term validation. How and when it’s done is key to achieving cost-effective, efficient validation.

What is term validation?

Validation (conceptual/linguistic) is the verification and quality control process used to make sure a term or list of terms is accurate according to preferred usage or requirements established by the terminologist or the team involved in the process. It includes a series of steps such as evaluating the quality of the resources available (e.g., corpora) and consolidating terminological data (e.g., into glossaries). It involves choosing between several term candidates to pick a preferred term or even creating your own terms (neologisms). In some cases, validation also includes writing new or updated definitions.