Language analysis of the Voynich manuscript

The Voynich manuscript (VMS) has long been a subject of interest to cryptanalysts, scholars and linguists alike. The VMS contains around 240 pages of writing in an unknown language, as well as illustrations related to plants, astronomy, astrology and the zodiac, and biological scenes often involving female figures. Its origin, authorship, meaning, alphabet and writing system have to date remained a mystery.

Numerous theories have arisen as to whether the VMS is written in a natural language, is enciphered in some fashion, or is merely an elaborate hoax.

Among the many hypotheses, the natural-language theory claims that the VMS is neither a code nor a hoax, but rather represents a language that is yet to be identified and explained. The strength of this theory lies in the fact that the VMS follows specific natural-language patterns, such as Zipf’s law (a word's frequency is roughly inversely proportional to its frequency rank) and the existence of repeated words and phrases. In recent years, advances in computing techniques have given researchers many ways to analyse the VMS in greater depth and to compare it to other known (similar) texts.
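As an illustration of how a Zipf’s law check might be carried out on a tokenised text, the following is a minimal sketch and not the method used in this study. It assumes the text has already been split into word tokens, and the function name `zipf_slope` is hypothetical. It fits a least-squares line to log(frequency) against log(rank); a slope close to -1 is what Zipf’s law predicts.

```python
import math
from collections import Counter


def zipf_slope(tokens):
    """Return the least-squares slope of log(frequency) vs log(rank).

    `tokens` is a list of word tokens (an assumed input format).
    A slope near -1 is consistent with Zipf's law.
    """
    # Word frequencies sorted from most to least common (rank 1, 2, ...).
    counts = sorted(Counter(tokens).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct words to fit a slope")

    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)

    # Ordinary least-squares slope: covariance(x, y) / variance(x).
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

On a real corpus the fit is only approximate, so the slope is better read as a rough indicator than as a test that a text is a natural language.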

This study mainly consisted of analysing common sequences of words found in the VMS using n-grams, in order to identify patterns that are also common in known natural languages. To gather data, n-grams of different sizes were collected with varying gaps between them. These patterns could shed light on the underlying structure of a language, such as common phrases and how word patterns are used.
Ultimately, the goal of this analysis was to provide evidence for or against the natural-language theory outlined above. This could be achieved by comparing the language of the VMS to widespread languages with distant origins, such as the Romance and Germanic languages, among other groups.
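The collection of n-grams with gaps could be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: it assumes whitespace-separated word tokens, it interprets a "gap" as the number of tokens skipped between consecutive words of an n-gram, and the names `gapped_ngrams` and `count_ngrams` are hypothetical.

```python
from collections import Counter


def gapped_ngrams(tokens, n, gap=0):
    """Yield n-grams as tuples, skipping `gap` tokens between chosen words.

    gap=0 gives ordinary contiguous n-grams. The meaning of "gap" here
    is an assumption made for illustration.
    """
    step = gap + 1                 # distance between chosen tokens
    span = (n - 1) * step + 1      # tokens covered by one n-gram
    for start in range(len(tokens) - span + 1):
        yield tuple(tokens[start + i * step] for i in range(n))


def count_ngrams(text, n, gap=0):
    """Count gapped n-grams in a whitespace-tokenised, lower-cased text."""
    tokens = text.lower().split()
    return Counter(gapped_ngrams(tokens, n, gap))
```

For example, `list(gapped_ngrams("a b c d".split(), 2, gap=1))` yields `[('a', 'c'), ('b', 'd')]`. Applying `count_ngrams` to texts in different languages gives frequency tables that can then be compared across languages.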

Figure 1. Graph displaying the n-gram comparisons between languages

Figure 2. A page from the Voynich manuscript

Student: Ryan Scerri

Supervisor: Dr Colin Layfield