Analysis of Aviation Safety and Aviation Accidents

Aviation is a key mode of transport in modern society and, statistically, one of the safest [1]. Nevertheless, with tens of millions of commercial flights taking place annually [2], some accidents are inevitable.

Data mining is the process of extracting new and potentially significant information from large databases. This makes it an invaluable tool in any field of research where data is recorded on a large scale, and aviation safety is no exception. Previous research has already applied a multitude of data-mining techniques to this field, using a variety of databases. Two of the most notable are the aviation database of the National Transportation Safety Board (NTSB) of the United States and the NASA Aviation Safety Reporting System (ASRS) database. Such databases may contain structured data, unstructured data (such as reports written by airport staff), or a mix of the two. This project focused on finding noteworthy correlations and deviations primarily in the structured data, since extracting information from the unstructured data would require various natural-language processing techniques.

Figure 1. Fatalities per billion passenger miles for different modes of transport in the US between 2000 and 2009 [1]

The research carried out in this project aims to contribute to the discovery of significant deviations in this data, thus supporting domain experts in increasing air-travel safety. This could be achieved by comparing the effectiveness of different contrast set mining algorithms on the data, using a variety of metrics that do not rely on expert feedback. Moreover, using multiple databases would allow for a better comparison of these algorithms, since they could be evaluated on differently structured sets of data. Finally, this variety of data would also help in searching for a wider range of deviations, such as the differences between fatal and non-fatal accidents, or between incidents caused by human factors and incidents resulting from technical issues.

Figure 2. Sample contrast sets found from the ASRS database using the STUCCO algorithm
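
To make the idea of a contrast set concrete, the following is a minimal Python sketch, not the implementation used in this project. It enumerates small conjunctions of attribute-value pairs and keeps those whose support differs between two groups by at least a threshold and whose 2x2 chi-square statistic exceeds the 0.05 critical value for one degree of freedom (3.841). These two checks mirror the "large" and "significant" criteria usually associated with STUCCO; the search-space pruning and significance-level adjustment of the full algorithm are omitted. The records, attribute names (phase, weather), thresholds and function names are all hypothetical and chosen only for illustration.

```python
from itertools import combinations

GROUP_KEY = "group"  # assumed name of the field that holds the group label

# Hypothetical toy records; they do not come from the NTSB or ASRS databases.
RECORDS = [
    {"group": "fatal", "phase": "approach", "weather": "IMC"},
    {"group": "fatal", "phase": "approach", "weather": "IMC"},
    {"group": "fatal", "phase": "approach", "weather": "IMC"},
    {"group": "fatal", "phase": "cruise", "weather": "IMC"},
    {"group": "fatal", "phase": "cruise", "weather": "IMC"},
    {"group": "fatal", "phase": "cruise", "weather": "VMC"},
    {"group": "non-fatal", "phase": "approach", "weather": "VMC"},
    {"group": "non-fatal", "phase": "approach", "weather": "VMC"},
    {"group": "non-fatal", "phase": "approach", "weather": "IMC"},
    {"group": "non-fatal", "phase": "cruise", "weather": "VMC"},
    {"group": "non-fatal", "phase": "cruise", "weather": "VMC"},
    {"group": "non-fatal", "phase": "cruise", "weather": "VMC"},
]


def count_hits(records, group, itemset):
    """Return (matching records, total records) for one group."""
    members = [r for r in records if r[GROUP_KEY] == group]
    hits = sum(all(r.get(attr) == val for attr, val in itemset) for r in members)
    return hits, len(members)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def find_contrast_sets(records, group_a, group_b,
                       min_diff=0.3, chi_crit=3.841, max_size=2):
    """Brute-force search for itemsets that are both large and significant."""
    items = sorted({(k, v) for r in records for k, v in r.items()
                    if k != GROUP_KEY})
    found = []
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            # Skip itemsets that use the same attribute twice.
            if len({attr for attr, _ in itemset}) < size:
                continue
            hits_a, n_a = count_hits(records, group_a, itemset)
            hits_b, n_b = count_hits(records, group_b, itemset)
            if n_a == 0 or n_b == 0:
                continue
            sup_a, sup_b = hits_a / n_a, hits_b / n_b
            chi2 = chi_square_2x2(hits_a, n_a - hits_a, hits_b, n_b - hits_b)
            if abs(sup_a - sup_b) >= min_diff and chi2 > chi_crit:
                found.append((itemset, sup_a, sup_b, chi2))
    return found


if __name__ == "__main__":
    for itemset, sup_a, sup_b, chi2 in find_contrast_sets(
            RECORDS, "fatal", "non-fatal"):
        print(itemset, round(sup_a, 2), round(sup_b, 2), round(chi2, 2))
```

On real data the chi-square approximation is unreliable when expected cell counts are small, so a sketch like this would need larger samples and a correction for the number of hypotheses tested before any deviation is reported.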


[1] I. Savage, “Comparing the fatality risks in United States transportation across modes and over time,” Research in Transportation Economics, vol. 43, no. 1, pp. 9–22, 2013.

[2] (Last accessed 17/12/2019)

Student: Jamie Grech
Course: B.Sc. IT (Hons.) Artificial Intelligence
Supervisor: Dr. Joel Azzopardi