The Voynich manuscript (VMS) is a handwritten book believed to have originated in medieval Europe, and it has long been the subject of debate among researchers. The VMS takes its name from Wilfrid Voynich, a Polish book dealer who purchased the manuscript in 1912. Its authorship is unknown, as is the creator or origin of the language in which it is written (referred to as Voynichese).
Among the many theories surrounding the origin, language and authorship of the VMS, Prescott H. Currier, a cryptologist who was fascinated by the mysterious manuscript, hypothesised that the script was written in two statistically distinct languages by five to eight medieval scribes, who primarily acted as copyists of various manuscripts. Lisa F. Davis, a palaeographer at Yale University, took Currier’s theory a step further by applying a digital palaeographic tool to differentiate between the handwriting of these scribes. This method allowed Davis to identify a total of five scribes (see Figure 1). Her findings therefore indicate that Currier’s multiple-authorship theory is plausible.
This dissertation statistically analysed a transliteration of the VMS, generated from the Extensible Voynich Alphabet (EVA), using a stylometry-based approach. Stylometry examines a text in order to establish its authorship on the basis of certain stylometric features; in this experiment, such features were used to measure the linguistic style of each scribe.
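As an illustration of what a stylometric feature can look like, the sketch below turns each text segment into a vector of relative frequencies of the most frequent words. It is a minimal example under stated assumptions, not the implementation used in the dissertation: the function names `top_words` and `relative_frequencies`, the page labels, and the short EVA-style token strings are all hypothetical and are not taken from actual folios.

```python
from collections import Counter


def top_words(pages, n):
    """Return the n most frequent words over all pages.

    Ties are broken alphabetically so the result is deterministic.
    """
    totals = Counter(word for text in pages.values() for word in text.split())
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


def relative_frequencies(text, vocabulary):
    """Return the share of tokens in `text` taken up by each vocabulary word."""
    counts = Counter(text.split())
    total = sum(counts.values())
    return [counts[word] / total if total else 0.0 for word in vocabulary]


# Hypothetical toy data: whitespace-separated EVA-style tokens, made up for
# illustration only.
pages = {
    "page_a": "daiin chol daiin shol chol daiin",
    "page_b": "qokeedy shedy qokeedy daiin shedy qokeedy",
}

vocab = top_words(pages, 3)
features = {name: relative_frequencies(text, vocab) for name, text in pages.items()}

for name, vector in features.items():
    print(name, [round(value, 3) for value in vector])
```

Relative rather than absolute frequencies are used here so that segments of different lengths remain comparable; this is a common choice in stylometry, but it is an assumption of the sketch.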
The most frequently used words proved to be good stylometric features for differentiating between the five scribes. Machine learning algorithms were applied to the transliteration to find possible distinctions between the scribes identified by Davis. Unsupervised learning algorithms, such as k-means clustering and hierarchical agglomerative clustering, were used to cluster the scribes according to these features. Supervised learning algorithms, such as naive Bayes and support-vector machines (SVMs), were also applied to determine the likely scribe or scribes on the basis of the most common words. The results point to the possibility of more than one scribe, thus further corroborating the findings of Currier and Davis.
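To make the supervised step more concrete, the following sketch shows one way a multinomial naive Bayes classifier with Laplace smoothing could assign a text segment to a scribe using counts of common words. It is only an illustrative sketch and not the dissertation's code: the names `train_naive_bayes` and `predict`, the labels `scribe_1` and `scribe_2`, the smoothing parameter `alpha`, and the toy token strings are all assumptions introduced here.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(samples, vocabulary, alpha=1.0):
    """Fit a multinomial naive Bayes model.

    samples: list of (label, text) pairs with whitespace-separated tokens.
    vocabulary: list of words used as features (e.g. the most common words).
    alpha: Laplace smoothing constant (assumed value, not from the source).
    """
    vocab = set(vocabulary)
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for label, text in samples:
        doc_counts[label] += 1
        word_counts[label].update(w for w in text.split() if w in vocab)

    total_docs = sum(doc_counts.values())
    log_priors = {label: math.log(n / total_docs) for label, n in doc_counts.items()}

    log_likelihoods = {}
    for label in doc_counts:
        denominator = sum(word_counts[label].values()) + alpha * len(vocab)
        log_likelihoods[label] = {
            w: math.log((word_counts[label][w] + alpha) / denominator) for w in vocab
        }
    return log_priors, log_likelihoods


def predict(text, log_priors, log_likelihoods):
    """Return the label with the highest posterior log-score for `text`."""
    scores = {}
    for label, prior in log_priors.items():
        table = log_likelihoods[label]
        # Words outside the vocabulary are ignored.
        scores[label] = prior + sum(table[w] for w in text.split() if w in table)
    return max(scores, key=scores.get)


# Hypothetical training data: labels and tokens are made up for illustration.
samples = [
    ("scribe_1", "daiin chol daiin chol shol"),
    ("scribe_1", "chol daiin shol daiin"),
    ("scribe_2", "qokeedy shedy qokeedy shedy"),
    ("scribe_2", "shedy qokeedy daiin qokeedy"),
]
vocabulary = ["daiin", "chol", "shol", "qokeedy", "shedy"]

log_priors, log_likelihoods = train_naive_bayes(samples, vocabulary)
print(predict("chol daiin daiin", log_priors, log_likelihoods))
print(predict("qokeedy shedy qokeedy", log_priors, log_likelihoods))
```

The same word-count features could equally be passed to a clustering routine such as k-means; a classifier is shown here only because it is short enough to write out in full.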
Course: B.Sc. IT (Hons.) Software Development
Supervisor: Dr Colin Layfield
Co-supervisor: Prof. John M. Abela