This work addresses grammatical inference for Maltese, that is, learning the grammatical rules of the language from training data. A neural approach was chosen, using the pre-trained models BERT and MBERT. These models have been trained to recognise patterns across a number of languages but, to date, not on Maltese. They have since been built upon through the creation of BERTu and MBERTu, which were further trained on Maltese data and were also used in this study.
Maltese is a complex language. It is morphologically rich, meaning that it has many word forms and grammatical structures. At the same time, it is a low-resource language, meaning that little data is readily available for training models. Different approaches were attempted to improve the models’ performance and overcome these difficulties, two examples being cross-lingual transfer and zero-shot learning.
The data used to train the models in this project was a UniMorph-style database for Maltese, compiled by a previous student. It was chosen over the standard UniMorph database because it is more detailed and considerably larger for Maltese. Furthermore, when training the multilingual models, the Italian and Spanish UniMorph databases were also included. Maltese is a hybrid language with significant Romance influence, so introducing these languages could potentially facilitate the model’s learning process. Each entry consists of the root word, the conjugated word, and its grammatical information, for example: [‘baħbaħ’, ‘tbaħbħu’, ‘V;PERF;FIN;IND;PST;PL;3;POS;INTR’]
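As a rough illustration of how such entries can be read, the following minimal Python sketch parses a UniMorph-style file into (root, conjugated form, feature list) triples. The file name and the tab-separated layout are assumptions for illustration; the project's actual database format may differ.

# Minimal sketch: load UniMorph-style triples (root, inflected form, features).
# The path and tab-separated layout are assumed for illustration only.
def load_unimorph(path):
    """Read a file where each line holds a root word, an inflected form,
    and a semicolon-separated bundle of grammatical features."""
    entries = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            root, form, feats = line.split("\t")
            entries.append((root, form, feats.split(";")))
    return entries

# Example entry, matching the triple quoted above:
# ('baħbaħ', 'tbaħbħu', ['V', 'PERF', 'FIN', 'IND', 'PST', 'PL', '3', 'POS', 'INTR'])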
This information was then converted into vector input, with the grammatical information being one-hot encoded. Once trained, only a conjugated Maltese word is fed into the model, which then outputs the relevant morphological features, as demonstrated in Figure 1.
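A minimal sketch of this setup is given below: the grammatical feature bundles are one-hot (multi-hot) encoded, and a BERT-style encoder with a linear head predicts those features from the conjugated word alone. The checkpoint name "MLRS/BERTu", the frozen encoder, and the simple sigmoid head are illustrative assumptions rather than the project's exact configuration.

# Sketch only: multi-hot encoding of feature bundles and feature prediction
# from a single conjugated word with a BERT-style encoder. The checkpoint
# name and the untrained linear head are assumptions for illustration.
import torch
from sklearn.preprocessing import MultiLabelBinarizer
from transformers import AutoModel, AutoTokenizer

# One-hot (multi-hot) encode the grammatical feature bundles.
feature_sets = [["V", "PERF", "PST", "PL", "3"], ["ADJ", "MASC+FEM", "PL"]]
mlb = MultiLabelBinarizer()
targets = torch.tensor(mlb.fit_transform(feature_sets), dtype=torch.float)

tokenizer = AutoTokenizer.from_pretrained("MLRS/BERTu")   # assumed checkpoint
encoder = AutoModel.from_pretrained("MLRS/BERTu")
head = torch.nn.Linear(encoder.config.hidden_size, len(mlb.classes_))

def predict_features(word, threshold=0.5):
    """Return predicted feature labels for a single conjugated word."""
    inputs = tokenizer(word, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**inputs).last_hidden_state[:, 0]  # [CLS] vector
        probs = torch.sigmoid(head(hidden))[0]
    return [label for label, p in zip(mlb.classes_, probs) if p >= threshold]

# e.g. after training the head, one would expect something like:
# predict_features("newwilija") -> ['ADJ', 'MASC+FEM', 'PL']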
Along with the BERT models, a basic neural model was also trained on the same task to provide a baseline for comparison. The final models were compared to their counterparts, i.e., BERT to BERTu and MBERT to MBERTu, and all the BERT-based models were compared to the basic neural model. The hypothesis was that the Maltese models would outperform their BERT counterparts, and that all BERT and MBERT models would outperform the basic model.
Figure 1. Process overview of the trained model, showing the input as the conjugated word, represented by ‘newwilija’, and the output as its morphological features, in this case [‘ADJ’, ‘MASC+FEM’, ‘PL’].
Student: Alana Busuttil
Supervisor: Dr Claudia Borg