Rimarju: A rhyming dictionary for the Maltese language

This work proposes a system that generates words rhyming with a given input word and ranks them using topic modelling. Two natural language processing (NLP) techniques, rhyme detection and topic modelling, were adopted to accomplish this task. Whilst both techniques are long established in the field of NLP, very few studies have applied them to languages other than English. This project focused on Maltese.


Rhyme is a prominent aspect of many languages, and plays a significant role in various forms of expression, such as poetry and music lyrics. Phonetics is the study of speech sounds and tonality. Individual words are made up of phonemes which, when transcribed, indicate how a specific word is pronounced. These phonemes are listed in the International Phonetic Alphabet. In written language, they are represented by letters, or graphemes. The proposed system uses a converter that transcribes the written form of any Maltese word into its sound units, or phonemes. The final 45% of each word's transcription is then compared with that of other words to find those ending in the same sound, and thus rhyming.
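
The comparison of word endings described above can be sketched as follows. This is a minimal illustration and not the project's implementation: the toy lexicon, its rough hand-written transcriptions (vowel length is ignored), the function names and the rule of rounding the tail length up are all assumptions made for the example.

```python
import math

# Hypothetical toy lexicon: Maltese words mapped to rough, hand-written
# phoneme lists. These are illustrative only and are NOT the output of
# the project's converter.
LEXICON = {
    "dar": ["d", "a", "r"],
    "nar": ["n", "a", "r"],
    "qamar": ["ʔ", "a", "m", "a", "r"],
    "baħar": ["b", "a", "ħ", "a", "r"],
    "kelb": ["k", "e", "l", "p"],
    "qalb": ["ʔ", "a", "l", "p"],
}


def rhyme_tail(phonemes, fraction=0.45):
    """Return the final `fraction` of a phoneme list (at least one phoneme).

    Rounding the tail length up is an assumption made for this sketch.
    """
    n = max(1, math.ceil(len(phonemes) * fraction))
    return tuple(phonemes[-n:])


def find_rhymes(word, lexicon=LEXICON, fraction=0.45):
    """List the other lexicon words whose transcription ends with the tail of `word`."""
    tail = rhyme_tail(lexicon[word], fraction)
    return sorted(
        other
        for other, phonemes in lexicon.items()
        if other != word and tuple(phonemes[-len(tail):]) == tail
    )


print(find_rhymes("dar"))
print(find_rhymes("kelb"))
```

Because the tail is taken from the input word, longer input words require a longer matching ending, so fewer candidates qualify as rhymes.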

 
Whilst rhyme is the major aspect of the creative forms mentioned above, a flowing and thematically coherent piece of verbal art is also important. Hence, topic modelling was incorporated into the system, so that it would not simply provide the user with arbitrary rhyming words but would rank them, outputting semantically closer words first.

The Gensim library was used to build the Latent Dirichlet Allocation (LDA) model, which was adopted for this section of the project. The model was trained on a large corpus of Maltese sentences in order to identify various topics and the words most likely to be used in each of them. The resulting word-topic probabilities were then used to determine which rhyming words were semantically closest to the input word, and to output them in order of closeness.
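
The ranking step can be illustrated with a short sketch. The source does not state how semantic closeness was computed, so the cosine similarity used here is an assumption, as are the two made-up topics, the probability values and the function names; in practice the probabilities would come from the trained LDA model.

```python
import math

# Hypothetical p(word | topic) values for two invented topics.
# A trained topic model would supply these; the numbers are illustrative.
TOPIC_WORD_PROBS = {
    "dar": [0.30, 0.02],
    "nar": [0.20, 0.05],
    "qamar": [0.03, 0.25],
    "baħar": [0.02, 0.30],
}


def topic_profile(word, probs=TOPIC_WORD_PROBS):
    """Normalise a word's per-topic probabilities so that they sum to 1."""
    weights = probs[word]
    total = sum(weights)
    return [w / total for w in weights]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_topic(query, candidates, probs=TOPIC_WORD_PROBS):
    """Order rhyme candidates from most to least topically similar to `query`."""
    q = topic_profile(query, probs)
    return sorted(
        candidates,
        key=lambda w: cosine(q, topic_profile(w, probs)),
        reverse=True,
    )


print(rank_by_topic("dar", ["baħar", "nar", "qamar"]))
```

With these invented values, "nar" shares its dominant topic with "dar" and is therefore ranked ahead of "qamar" and "baħar".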


Creating rhyming poetry and lyrics has been popular for many generations in numerous cultures. This is particularly the case in the Maltese context, where traditional rhymes and poetry, as well as għana, the most popular type of folk music, have played a significant role in popular and oral culture, especially in rural life. However, these forms of entertainment have become less sought after, especially among the younger generation. The proposed app is intended to renew interest in the practice of spontaneous and written rhyming in Maltese, and possibly in its use in creative literature.

Given its interactive nature, the app could also be used as a language-learning tool for beginners, through which they could learn different words and inflections, and how to pronounce them. Furthermore, it could assist poets and songwriters. Rimarju is an easy-to-use interactive app that has the potential to benefit many users in the fields of language arts and education.

Figure 1. Topic-modelling system

Student: Lucas Lautier
Supervisor: Prof. Matthew Montebello