SimplifyIt: Automatic simplification of text

Health literacy is often overlooked in the public health sector. Medical reports are usually written to share a medical situation with other physicians and, in order to describe the patient’s needs as accurately as possible, they use highly technical language. In most cases, this makes it very hard for patients to understand what is being said about them, thus creating a health literacy problem.

Unable to understand their own medical situation from official medical reports, patients may resort to less reliable sources of information or misinterpret the physician’s reports. This situation provided the motivation for this project, which sought to simplify medical text by experimenting with text simplification techniques used in the field of natural language processing.

Generally, text simplification systems are trained on a two-part (parallel) dataset: one part contains text in complex language and the other part offers the same content in simpler language. One of the primary challenges in this project was finding such training data in the medical domain. In this work, the limited data available was used to evaluate the effectiveness of fine-tuning a pre-trained model, BART (Bidirectional and Auto-Regressive Transformers) [1], for medical text simplification through three different experiments. More specifically, the experiments were set up to investigate the effect of the data used to fine-tune the base version of BART for medical text simplification.

In the first experiment, the model was fine-tuned on EW-SEW (English Wikipedia and Simple English Wikipedia) [2], a general English corpus for text simplification compiled from Wikipedia. The second experiment extended the model trained in the first experiment by fine-tuning it further on the limited data available for the medical domain, which was a subset of the EW-SEW corpus made up of only medical sentences. This model was evaluated on the same medical data as the model from the first experiment. In the third experiment, the pre-trained model was fine-tuned only on the medical training data, and then evaluated on the same test set as the other two experiments.
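The three set-ups described above can be summarised as a small configuration sketch. This is only an illustration of the experimental design, not the project's actual code: the `Experiment` class, the dataset labels and the experiment names are all hypothetical placeholders introduced here.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical labels; the source does not name the data splits.
GENERAL = "ew-sew-general"               # full EW-SEW corpus
MEDICAL_TRAIN = "ew-sew-medical-train"   # medical subset, training part
MEDICAL_TEST = "ew-sew-medical-test"     # medical subset, shared test part


@dataclass(frozen=True)
class Experiment:
    name: str
    base_model: str                    # pre-trained starting point
    finetune_stages: Tuple[str, ...]   # datasets applied in order
    test_set: str                      # data used for evaluation


EXPERIMENTS = [
    # Experiment 1: general-domain fine-tuning only.
    Experiment("exp1_general", "bart-base", (GENERAL,), MEDICAL_TEST),
    # Experiment 2: continue from experiment 1 with the medical data.
    Experiment("exp2_general_then_medical", "bart-base",
               (GENERAL, MEDICAL_TRAIN), MEDICAL_TEST),
    # Experiment 3: medical data only, starting from the pre-trained model.
    Experiment("exp3_medical_only", "bart-base", (MEDICAL_TRAIN,), MEDICAL_TEST),
]


def describe(exp: Experiment) -> str:
    """Return a one-line summary of the fine-tuning pipeline."""
    stages = " -> ".join(exp.finetune_stages)
    return f"{exp.name}: {exp.base_model} -> {stages} | evaluated on {exp.test_set}"


if __name__ == "__main__":
    for exp in EXPERIMENTS:
        print(describe(exp))
```

Holding the test set fixed across all three entries is what allows the scores to be compared directly: only the fine-tuning data differs between experiments.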

These models were evaluated using three metrics, namely: the BiLingual Evaluation Understudy (BLEU) score, the System output Against References and against the Input sentence (SARI) score [3] and the Flesch-Kincaid Grade Level (FKGL) score. The BLEU score was developed for evaluating machine translation. Despite studies showing that it does not evaluate simplification accurately, early works used BLEU, and it was implemented in this project to provide a direct comparison with those studies. The SARI score, on the other hand, was developed specifically to evaluate text simplification systems and is now the standard for such tasks. While FKGL does not take into consideration whether the meaning is preserved in the output sentence, it was used to score the readability of the output sentence.
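To illustrate why FKGL measures readability but not meaning preservation, the following minimal sketch computes it from surface counts alone, using the standard formula 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The syllable counter is a rough vowel-group heuristic assumed here for illustration; the source does not say which implementation the project used, and established libraries may give slightly different values.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as vowel groups, dropping a silent final 'e'.

    This is a heuristic assumption, not an exact syllabifier.
    """
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def fkgl(text: str) -> float:
    """Flesch-Kincaid Grade Level; lower scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * len(words) / len(sentences)
            + 11.8 * syllables / len(words)
            - 15.59)


if __name__ == "__main__":
    # Illustrative sentences only; they are not taken from the project's data.
    complex_text = "The patient presented with acute myocardial infarction."
    simple_text = "The patient had a heart attack."
    print(f"complex: {fkgl(complex_text):.2f}")
    print(f"simple:  {fkgl(simple_text):.2f}")
```

Because the score depends only on sentence length and syllable counts, an output that is short and simple but changes the meaning would still score well, which is why FKGL is reported alongside SARI and BLEU rather than on its own.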

This project set out to address two research questions, namely: a) establishing the extent to which BART-base performs the task of text simplification, and b) establishing the extent to which it adapts to domain-specific language when fine-tuned accordingly.

Figure 1. The set-up of the experiments

[1] M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, and L. Zettlemoyer, “BART: denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension,” CoRR, vol. abs/1910.13461, 2019. [Online]. Available:

[2] W. Hwang, H. Hajishirzi, M. Ostendorf, and W. Wu, “Aligning sentences from standard Wikipedia to Simple Wikipedia,” in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Denver, Colorado: Association for Computational Linguistics, May–Jun. 2015, pp. 211–217. [Online]. Available:

[3] W. Xu, C. Napoles, E. Pavlick, Q. Chen and C. Callison-Burch, “Optimizing Statistical Machine Translation for Text Simplification”, Transactions of the Association for Computational Linguistics, vol. 4, pp. 401-415, 2016. Available: 10.1162/tacl_a_00107.

Student: Benjamin Bezzina
Course: B.Sc. IT (Hons.) Artificial Intelligence
Supervisor: Dr Claudia Borg