English’s dominance in technology is reinforced by the fact that many other languages lack the resources needed to compete. Now, research by Master’s in Artificial Intelligence student JAKE DALLI is offering a faster and cheaper way for these languages to catch up.
Technology and the internet may have opened up incredible avenues for people to explore and experience, but these avenues have also come to be monopolised by a small number of languages. There are many reasons for this, but the most important is that changing it requires expensive research. Through his Master’s thesis, Jake Dalli is working on a system that lets software build on the resources that already exist to help other languages thrive.
“When it comes to teaching a computer to understand sentences the way humans do, we work in an area of AI called Natural Language Inference (NLI), which is a branch of Natural Language Processing (NLP),” Jake explains. “In this area, we use machine learning models to teach a computer to deduce the relationship between sentences, so that it can grasp the semantics [meaning] of language.”
In his research, Jake explored the inference task, in which a computer must decide whether one sentence (the premise) supports a second sentence (the hypothesis), contradicts it, or does neither. Researchers call these three outcomes entailment, contradiction and neutral.
To understand this better, take ‘The boy jumps over the wall in the garden’ as an example premise. The hypothesis ‘There is a wall in the garden’ is supported by the premise. ‘The boy is in the bedroom’ contradicts it. Meanwhile, ‘The boy is wearing a green shirt’ is neutral: the premise says nothing about his clothes, so it neither supports nor rules out the hypothesis.
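To make the task concrete, the three pairs above can be written down as the kind of labelled data a model learns from. This is a minimal sketch: the label names entailment, contradiction and neutral follow the convention of widely used English NLI datasets such as SNLI and MultiNLI, while the `NLIExample` class and its field names are hypothetical, chosen for illustration rather than taken from Jake’s thesis.

```python
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    # The standard three-way NLI labels.
    ENTAILMENT = "entailment"        # the premise supports the hypothesis
    CONTRADICTION = "contradiction"  # the premise rules the hypothesis out
    NEUTRAL = "neutral"              # the premise neither supports nor rules it out


@dataclass(frozen=True)
class NLIExample:
    # Hypothetical container for one premise-hypothesis pair and its gold label.
    premise: str
    hypothesis: str
    label: Label


PREMISE = "The boy jumps over the wall in the garden."

# The article's three examples, written as labelled training pairs.
EXAMPLES = [
    NLIExample(PREMISE, "There is a wall in the garden.", Label.ENTAILMENT),
    NLIExample(PREMISE, "The boy is in the bedroom.", Label.CONTRADICTION),
    NLIExample(PREMISE, "The boy is wearing a green shirt.", Label.NEUTRAL),
]

if __name__ == "__main__":
    # Real NLI datasets contain many thousands of pairs like these.
    for ex in EXAMPLES:
        print(f"{ex.label.value:>13}: {ex.hypothesis}")
```

A model is trained on pairs like these, so for every new language the same kind of labelled collection would normally have to be built, which is where the cost comes from.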
“This may seem easy and obvious to us, but computers work in a different way. To understand the semantic reasoning behind language, they turn words and sentences into a kind of arithmetic. That’s why we need to point out certain things, such as what figures of speech and metaphors actually imply, and which words carry a lot of weight. ‘Not’, for example, can completely change the meaning of a sentence.”
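The point about ‘not’ can be shown with a toy calculation. One simple, deliberately naive way to turn a sentence into arithmetic is to give every word a list of numbers (a vector) and average them. The three-number vectors below are invented purely for illustration; real systems learn vectors with hundreds of dimensions and use far more sophisticated models than plain averaging. Even so, the sketch shows how a single word like ‘not’ can get drowned out.

```python
import math

# Hypothetical 3-dimensional word vectors, made up for this illustration only.
WORD_VECTORS = {
    "the": [0.1, 0.0, 0.0],
    "boy": [0.9, 0.1, 0.0],
    "is": [0.1, 0.1, 0.0],
    "in": [0.0, 0.2, 0.1],
    "not": [0.05, 0.05, 0.05],
    "garden": [0.1, 0.9, 0.2],
    "bedroom": [0.1, 0.2, 0.9],
}


def sentence_vector(sentence: str) -> list[float]:
    """Average the vectors of a sentence's words (a naive bag-of-words encoding)."""
    words = sentence.lower().split()
    dims = len(next(iter(WORD_VECTORS.values())))
    total = [0.0] * dims
    for word in words:
        for i, value in enumerate(WORD_VECTORS[word]):
            total[i] += value
    return [value / len(words) for value in total]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Score how closely two vectors point the same way (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


garden = sentence_vector("the boy is in the garden")
not_garden = sentence_vector("the boy is not in the garden")
bedroom = sentence_vector("the boy is in the bedroom")

if __name__ == "__main__":
    # The negated sentence scores as almost identical to the original...
    print(f"garden vs not garden: {cosine_similarity(garden, not_garden):.3f}")
    # ...while a sentence about a different room scores as less similar.
    print(f"garden vs bedroom:    {cosine_similarity(garden, bedroom):.3f}")
```

Because averaging treats ‘not’ as just one more ingredient in the mix, a model has to be shown, through many examples, that this particular word flips the meaning: the kind of guidance Jake describes.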
As one can imagine, it takes many examples and a great deal of work for computers to learn to make these distinctions by themselves, which makes the process time-consuming and expensive. While this investment may make financial sense for global languages like English or French, it is far harder to justify for low-resource languages like Urdu or Maltese, which have much less digital data to work with and, in Maltese’s case, a small user base.
“To counteract this, my research looks at cross-lingual NLP, which uses what computers have been taught about English to understand other languages without having to start from scratch,” Jake continues. “To test this, I started by feeding an artificial neural network [a computer system loosely modelled on the human brain] all the pages on Wikipedia in a variety of languages, including English and German. Then, I got the computer to work out the correspondences between words in these languages based on what it already knew about language in general.
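The article doesn’t spell out the details of Jake’s setup, so what follows is only a toy illustration of the general idea behind cross-lingual transfer: if words from different languages are placed in one shared vector space, a word’s nearest neighbour in the other language tends to be its translation. All the vectors here are hypothetical and hand-picked; in practice such spaces are learned from large text collections such as Wikipedia.

```python
import math

# Hypothetical vectors in a shared English-German space (invented for illustration).
ENGLISH = {
    "wall": [0.9, 0.1, 0.0],
    "garden": [0.1, 0.9, 0.1],
    "boy": [0.0, 0.2, 0.9],
}
GERMAN = {
    "Mauer": [0.85, 0.15, 0.05],
    "Garten": [0.15, 0.85, 0.0],
    "Junge": [0.05, 0.1, 0.95],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Score how closely two vectors point the same way (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest_neighbour(vector: list[float], vocabulary: dict[str, list[float]]) -> str:
    """Return the word in `vocabulary` whose vector is most similar to `vector`."""
    return max(vocabulary, key=lambda word: cosine_similarity(vector, vocabulary[word]))


if __name__ == "__main__":
    for word, vector in ENGLISH.items():
        print(f"{word} -> {nearest_neighbour(vector, GERMAN)}")
```

In principle, the same shared space is what lets a model trained only on English examples make predictions for another language, since both sets of words are described by the same kind of numbers.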
“This was actually more accurate than we had first expected. We also discovered that languages from the same language family need very similar things. For example, once we teach the computer that some languages allow free word order, that knowledge can be applied across the board, whether it is translating Bulgarian or Russian into another language, or vice versa. It’s similar with Romance languages, Semitic languages, and the list goes on.”
The benefits of such research could be manifold. On top of saving vast amounts of money, a system like this could improve machine translation, make it easier for companies to provide instruction manuals in many languages, allow film studios to offer subtitles and dubbing in more languages, and much more. Perhaps the biggest benefit, however, is a more diverse online world in which people can communicate easily in any language, leaving fewer people behind.