Demystifying Neural Machine Translation

The AI-powered machine translation of today is very different from its previous incarnations. Where is machine translation currently, and where is it heading? More importantly, is it right for you?

If you’ve been anywhere near a digital-age business person lately, I’m sure you’ve heard buzzwords like ‘artificial intelligence’, ‘machine learning’ or ‘neural networks’ thrown around. Water-cooler conversations around the world teach us that developments in AI have infiltrated every aspect of our existence and are going to be much, much more present in our lives in the future. Because it’s such a versatile technology, machine learning can be used for many things, from predicting what people will buy next to creating art, and even translating.

Machine translation has been a thing for decades. Maybe you’ve even used it to read a work document in a pinch, to talk with someone online, or to order a flat white in Paris (that didn’t go well, now did it?). But don’t judge MT by its past failures. Thanks to AI, there have been dramatic changes to machine translation in the past few years, and it’s only going to get better.

You may have heard of AI’s contribution to machine translation and wondered what the fuss is all about, really. Is it truly as good as they make it out to be? That’s a good question, and one I will try to answer in the following pages. I wrote this e-book to give you a bit more insight into what, exactly, machine translation is today and where it’s going, so you can make an informed decision when faced with the question of which method of translation is best for you.

The way you should approach MT depends on a variety of factors (more on that later), but if you give it a chance, it can truly transform your workflow. And if you’re still not sure whether machine translation is right for you, don’t worry: I’ve included a handy checklist at the end to help you figure it out! Those of you who have no patience for theory can scroll down for the link. I won’t judge. The rest of you, join me for a quick glimpse into translation technology.

Human translation is as old as language

Humans started to converse in something resembling a language fairly recently (at least, in historical terms). Most specialists agree that the languages spoken today all descend from an African tongue developed approx. 50,000 years ago. As these ancient Africans started to spread out around the world, possibly thanks to that very same language, they brought their words with them. Over time, new languages started to develop from that primary tongue. It wasn’t long before people realized that in order to communicate with humans from other places, they would need to understand what those humans were saying. The translation profession was born.

It’s hard to tell exactly when and where the first act of linguistic translation took place. We know that the famous Epic of Gilgamesh was created approx. 2100 B.C. and contained an Akkadian version of Sumerian poems. The ancient Greeks, around the 12th century B.C., discussed translation work and even distinguished between literal and paraphrased translation. The 3rd century B.C. saw translations of the Bible from Hebrew into Greek, and the Rosetta Stone, basically the world’s first multilingual dictionary, was carved in 196 B.C. Obviously, many, many more translation endeavors followed.

And still, approximately 4000 years after the Epic of Gilgamesh was created, the translator’s job hasn’t changed much. That is, until computers came into the picture and changed everything.

From paper tapes to early computers: rule-based machine translation

The first stages of computer translation took place in the 1930s. The technology may seem primitive to us now, but if you can believe it, it was considered extremely advanced at the time (just like dial phones, hairspray, and pacifism). A Russian scientist named Peter Troyanskii proposed a machine that could produce actual, logical and localized translations using only paper tape and two human native speakers. Alas, his invention did not get the appreciation it deserved.

About 20 years later, the computer guys at IBM held a demonstration of their first translation machine. The state-of-the-art instrument could translate a whopping 60+ Russian sentences into English, though none of them was “does this have peanuts in it”, so we know the technology had its faults. Mainly, it wasn’t flexible enough to accommodate semantic ambiguities. Luckily, some still found it useful: like many new technologies, it was embraced for military uses, for example, by the U.S. Air Force and the Atomic Energy Commission.

Now, these early versions of MT used a rule-based machine translation method, or RBMT, also known as “the classical approach”. In this method, the computer was fed a set of linguistic rules and dictionaries. When presented with a new translation request, it would analyze the line in the source language (let’s be Anglocentric and say it’s English), determine the sentence structure and the rules used to create that sentence, then create a similar version of that sentence using the words and linguistic rules of the target language (say, Danish). But this method wasn’t very efficient. It was slow (as were computers, back then), could not really reproduce nuances in the source text, and worst of all, it was literal: easily confused by all the horse-holding and cake-piecing people kept doing.
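To make the analyze-then-generate idea concrete, here is a minimal sketch of a rule-based translator. Everything in it is an assumption made for illustration: the tiny lexicon, the two hand-written rules, and the choice of French as the target (rather than Danish) because its adjective placement makes the reordering rule easy to see. The function name `translate_noun_phrase` is hypothetical too.

```python
# Toy rule-based translation of "the ADJECTIVE NOUN" from English to French.
# The lexicon and rules are hand-written assumptions, not a real MT system.
LEXICON = {
    "red": ("rouge", "ADJ"),
    "black": ("noir", "ADJ"),
    "car": ("voiture", "NOUN"),
    "cat": ("chat", "NOUN"),
}
NOUN_GENDER = {"voiture": "f", "chat": "m"}
ARTICLE = {"m": "le", "f": "la"}


def translate_noun_phrase(phrase):
    words = phrase.lower().split()
    if len(words) != 3 or words[0] != "the":
        raise ValueError("this toy grammar only covers 'the ADJ NOUN'")
    # Analysis step: look up each word and check the sentence structure.
    adjective, adjective_tag = LEXICON[words[1]]
    noun, noun_tag = LEXICON[words[2]]
    if (adjective_tag, noun_tag) != ("ADJ", "NOUN"):
        raise ValueError("expected an adjective followed by a noun")
    # Generation step, rule 1: the article follows the noun's gender.
    article = ARTICLE[NOUN_GENDER[noun]]
    # Generation step, rule 2: the adjective moves after the noun.
    return f"{article} {noun} {adjective}"


print(translate_noun_phrase("the red car"))    # la voiture rouge
print(translate_noun_phrase("the black cat"))  # le chat noir
```

Notice how quickly the rules run out: “the black car” would come out as “la voiture noir”, which is wrong, because nothing here makes the adjective agree with a feminine noun. Every fix like that is one more rule someone has to write by hand.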

Better, but not quite there yet: example-based machine translation

Over the years, machine translation slowly improved and saw limited use, for example at the French Textile Institute and the printing company Xerox. It gained popularity around 1975, when international trade and commerce activities led to higher demand for translation in languages previously ignored, such as Japanese, German or French. It was during that time that a new approach to MT was developed: example-based machine translation, or EBMT.

EBMT is also based on dictionaries, but it takes a slightly different approach. Instead of using word-for-word translation, the example-based method uses a database of sentences and their translations. The computer then analyzes those sentences and extracts phrases to use for future translation. So if we have the translation for “Oh no, not again”, and “so long and thanks for all the fish”, we can easily get the translation for “Oh no, not all the fish”.
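The fish example can be sketched in a few lines. This is a simplified illustration, assuming the fragments have already been extracted and aligned from the two stored sentences; the French renderings, the `PHRASE_TABLE` name and the greedy longest-match strategy are all assumptions made for the sketch, not a description of any particular EBMT system.

```python
# Aligned fragments as they might be extracted from two stored example pairs.
# The French side is illustrative only.
PHRASE_TABLE = {
    ("oh", "no", "not"): "oh non, pas",
    ("again",): "encore",
    ("so", "long", "and", "thanks", "for"): "salut et merci pour",
    ("all", "the", "fish"): "tout le poisson",
}


def tokenize(sentence):
    # Lowercase the words and drop surrounding punctuation.
    return [word.strip(",.!?").lower() for word in sentence.split()]


def translate_by_example(sentence):
    words = tokenize(sentence)
    output = []
    i = 0
    while i < len(words):
        # Try the longest stored fragment that starts at position i.
        for j in range(len(words), i, -1):
            fragment = tuple(words[i:j])
            if fragment in PHRASE_TABLE:
                output.append(PHRASE_TABLE[fragment])
                i = j
                break
        else:
            raise KeyError(f"no stored example covers {words[i]!r}")
    return " ".join(output)


print(translate_by_example("Oh no, not all the fish"))
# oh non, pas tout le poisson
```

The new sentence was never in the database, yet it can be assembled from pieces of sentences that were. Anything the examples don’t cover simply fails, which hints at why this approach needed a lot of examples.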

This wasn’t a perfect method, but it solved some of the context and literalness issues people faced when using RBMT. Soon enough, though, EBMT had to prove its worth in the face of a bigger, better method: statistical machine translation (SMT). Spoiler: it lost ☹

20 years of fame is not so bad: statistical machine translation

In the late 1990s, several web-based machine translation services popped up, beginning with AltaVista’s Babelfish (cool name, guys). These services, like many other web and non-web MT services at the time, were based on SMT. Using a complex algorithm, they examined the various possible translations for the words in the sentence and calculated the probability of each translation to determine which version is most likely to be the correct one. And the more lines the machine had in its database, the better the results were.
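Here is a minimal sketch of that “pick the most probable version” step. The numbers are made up for illustration, and the scoring is an assumption too: it multiplies a translation score (how well the candidate matches the source words) by a language score (how natural the candidate sounds in the target language), which is one common way statistical systems were formulated. The source sentence is imagined to be “it’s raining cats and dogs”.

```python
import math

# Hypothetical probabilities standing in for statistics counted from a corpus.
CANDIDATES = {
    # Literal rendering: matches the source words well, but rarely seen in French.
    "il pleut des chats et des chiens": {"translation": 0.60, "language": 0.0001},
    # Idiomatic rendering: looser word match, but very common in French.
    "il pleut des cordes": {"translation": 0.20, "language": 0.05},
}


def score(probabilities):
    # Adding logarithms is the same as multiplying the probabilities,
    # and it avoids tiny numbers underflowing on long sentences.
    return math.log(probabilities["translation"]) + math.log(probabilities["language"])


def best_translation(candidates):
    return max(candidates, key=lambda text: score(candidates[text]))


print(best_translation(CANDIDATES))  # il pleut des cordes
```

With these numbers the idiomatic version wins, because 0.20 × 0.05 is far larger than 0.60 × 0.0001. Feed the system more text and the estimates get better, which is why database size mattered so much.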

Perfect, right? In SMT, you didn’t have to feed the system all those grammatical and linguistic rules, and there were far fewer context issues. And the results for some language combinations were quite good (although for others, they were so bad that MT became synonymous with ‘bad translation’. Yikes). So all in all, you can understand why this one lasted a couple of decades. And with the World Wide Web gaining popularity, there were that many more opportunities for translation. Chat rooms, web pages, MySpace posts, you name it. For the first time since transatlantic flying was invented, people were rediscovering the world (and this time, the food was much better).

SMT was made famous by online services like Google Translate, and it’s still in use today, but the times they are changing. The second half of the 2010s saw the unbelievably quick emergence of the neural machine translation method, or NMT. In 2016, a study by Junczys-Dowmunt et al. compared SMT and NMT translations in 15 language pairs and found that NMT was comparable to or better than SMT in every. single. one of these pairs.

All hail the (current) king of MT: neural machine translation

Using attention mechanisms in neural networks, this science-fictiony technology manages to neutralize most of the issues that plagued SMT. Artificial neural networks are made up of a large number of small, simple processors. When connected to many others, these processors are capable of amazingly complex calculations. With the help of state-of-the-art algorithms, they can predict results based on the information they are fed.
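To show just how simple one of those “processors” is, here is a rough sketch of a single artificial neuron and a tiny network built from three of them. The weights are arbitrary numbers picked for the example, the sigmoid squashing function is one common choice among several, and the names `neuron` and `layer` are hypothetical. Attention mechanisms are not shown here.

```python
import math


def neuron(inputs, weights, bias):
    # One simple processor: a weighted sum squashed into the range 0..1.
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-total))


def layer(inputs, weight_rows, biases):
    # A layer is just several neurons looking at the same inputs.
    return [neuron(inputs, weights, bias) for weights, bias in zip(weight_rows, biases)]


# Two inputs feed a hidden layer of two neurons,
# whose outputs feed a single output neuron.
hidden = layer([1.0, 0.0], [[0.5, -0.5], [-1.0, 1.0]], [0.0, 0.0])
output = neuron(hidden, [1.0, -1.0], 0.0)
print(hidden, output)
```

Each neuron does nothing more than multiply, add and squash. The interesting behavior comes from wiring many of them together and choosing the weights well, and real translation networks have millions of those weights.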

So, why are neural networks with attention mechanisms better than the regular computers and servers used to back the old SMT technologies?

Firstly, these networks can be trained, i.e. fed information (usually a large quantity of bilingual content) that will help them provide more accurate results when presented with a problem (for example, a text to translate). But in that sense, they’re not so different from SMT. What differentiates them from previous methods is the fact that they can learn. This means that they are ‘born’ a clean slate, but accumulate more and more information as they go along, getting better and better each time. The networks identify patterns in previous sequences and use them to predict the correct result.
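What “learning” means in practice can be sketched with the smallest possible example: one weight that starts as a clean slate and gets nudged after every example it sees. This is a toy under stated assumptions, not how a translation network is actually trained: the hidden rule (y = 2x), the learning rate and the `train` function are all invented for the illustration.

```python
# Training pairs that follow the hidden rule y = 2 * x.
EXAMPLES = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def train(examples, steps=200, learning_rate=0.05):
    weight = 0.0  # the 'clean slate'
    for _ in range(steps):
        for x, target in examples:
            error = weight * x - target
            # Nudge the weight in the direction that shrinks the squared error.
            weight -= learning_rate * error * x
    return weight


learned = train(EXAMPLES)
print(round(learned, 4))      # 2.0
print(round(learned * 10, 2))  # prediction for an input it never saw: 20.0
```

Nobody told the program that the rule was “multiply by two”. It found the pattern by repeatedly comparing its guesses with the right answers, which is the same basic idea, scaled up enormously, behind training on bilingual text.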

Each translation uses two neural networks — one for the source language, and another for the target. It’s important to remember that neural networks are machines, and like other machines, they don’t understand human concepts such as meaning, or that joyous feeling you get when you spoon up a chocolate chunk from a pint of ice cream. So if we want them to understand text written by humans, we have to convert it to something they know: numbers.

To do that, the first network encodes the source text into a numeric code. Now, the target network can finally understand it. Yay! The target network then uses that information to produce a text in the target language. This is not a word-for-word translation, nor is it based on strict rules. Rather, just like a real, live linguist, the target network uses the meaning of the content to create its result. This leaves us with a much more natural-sounding text.
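The very first step of that text-to-numbers conversion can be sketched as follows. This only covers giving each word an ID number; real encoders go further and turn those IDs into learned vectors that capture meaning, which this toy does not attempt. The helper names and the `<unk>` placeholder for unknown words are assumptions made for the example.

```python
def build_vocab(sentences):
    # ID 0 is reserved for words the system has never seen.
    vocab = {"<unk>": 0}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(sentence, vocab):
    # Replace every word with its ID number.
    return [vocab.get(word, vocab["<unk>"]) for word in sentence.lower().split()]


def decode(ids, vocab):
    # Map ID numbers back to words.
    inverse = {index: word for word, index in vocab.items()}
    return " ".join(inverse[index] for index in ids)


vocab = build_vocab(["the cat sleeps", "the dog sleeps"])
print(encode("the dog sleeps", vocab))   # [1, 4, 3]
print(encode("the bird sleeps", vocab))  # [1, 0, 3]
print(decode([1, 2, 3], vocab))          # the cat sleeps
```

Once the sentence is a list of numbers, the networks can do arithmetic on it. The unknown “bird” collapsing to 0 also shows why these systems struggle with words they were never trained on.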

The human factor in translation may be more important than you think

We know that neural networks can produce a much better translation than before. In some languages, the results are astounding. Others still need some work, but we expect them to get better soon. Not only that, but a lot of the human error currently present in translation work will become a thing of the past. Numbers will always be correct, and dates and weights will always be in the right format. Currency could be converted automatically. Think of the possibilities! But is NMT really going to be as good as a great human translator? As with all good questions, the answer varies. This is because people have two unique advantages over computers: they are creative and emotional.

In a very short time (shorter than we expect: predictions say as little as 1–3 years!) NMT could do a lot of the work humans do, and do it quicker and cheaper. Texts that require little creativity, like software strings, legal contracts, economic reports or medical documents, often have very straightforward, simple translations, and are an easy target for NMT. Yes, in the near future, a human linguist will still need to review the translation, make sure everything was done accurately and fix any errors. But those checkers are expected to have less and less work as the machine improves.

Things get complicated when we consider the way languages evolve and change with society. A good example of this is the case of non-gendered Hebrew. Hebrew is, and always has been, a gendered language. Verbs, adjectives: all gendered, and usually written in the male form by default. But in recent years, as people and companies shifted towards a more inclusive approach to content, they started expecting non-gendered texts. The gender-neutral trend in Hebrew is a fairly new one, and as NMT systems are trained on massive amounts of existing content, it might take them a while to pick up on those new preferences. Such is also the case for slang or new syntactic structures that take root in languages all the time.

Other specific preferences, such as cultural sensitivities, also present a problem. Languages are extremely fluid and evolve at light speed in the information age. The words chosen are often woven with emotion and hidden meanings that are impossible for machines to understand. It’s safe to say that if we want our content to speak to other humans (to speak their language, if you will) we’ll still need to include humans in the process. But for the first time in many, many years, the translator’s job is going through a formative change. Linguists will have to become experts in local language and culture, more local adaptors than translators.

This transformation is not so different from the ones other professions are expected to go through in the near future. Futurologists claim that deep learning will make many technical and repetitive jobs obsolete, and humans will be left with the creative and emotional side of things. But while tasks that were previously done by people will now be done better, and faster, by computers, new tasks — some say even better, more interesting ones — will be created.

For translators, it’s adapt-or-die. Yes, some linguists will find that their job has become obsolete, mostly the less-qualified, less-specialized ones. But those who learn to work alongside the machines, to insert their own unique humanness and their understanding of their local culture into the technically correct translation produced by AI, may enjoy the satisfaction of creating texts that are truly better, both accurate and creative.

For business owners, NMT presents a real opportunity to create bilingual content that is accurate, error-free and consistent. Machine translation is not the obvious choice for everyone — marketing content or literature, for example, are better off with good old human translators, for the time being. If you’re lucky, though, and your content is perfect for NMT, you can enjoy a quicker translation and the added advantage of being able to brag about how advanced you are to your colleagues (score). If you’re not sure if your content is right for machine translation, that’s OK — it’s not all black-and-white, anyway. Consult with your translation provider to help you decide.
