Aviation is a growing industry and, with an average of 400 aviation accidents occurring every month, it is imperative to determine the causes of these accidents in order to improve aviation safety. Figure 1 presents the number of aviation accidents occurring each year, according to data collected by the Aviation Safety Reporting System (ASRS).
In this research, data and text mining techniques were used to extract useful information from a large database of aviation accident reports. The study draws largely on the ASRS database, which contains over 210,000 aviation accident reports submitted since 1988. ASRS reports contain narratives giving a detailed account of what occurred, as well as categorical information about the flight in question, such as weather conditions and aircraft details. Studying such reports helps to identify the causes of these accidents and to extract similarities and differences amongst them, with a view to preventing fatalities and minimising the loss of resources.
This work demonstrates the use of data mining techniques to determine the primary problem described in ASRS accident reports and to predict the risk factor of these accidents. This is achieved using machine learning classifiers, such as naive Bayes and support-vector machines (SVMs), as well as deep learning techniques, for both classification and prediction.
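As an illustration of the naive Bayes classifier mentioned above, the following is a minimal multinomial naive Bayes sketch in Python using only the standard library. It assumes each report has already been tokenised into a list of words. The class name `MultinomialNB`, the four training narratives and the category labels are hypothetical; they are not taken from the ASRS data or from this project's implementation.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial naive Bayes over bags of words, with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        # docs: list of token lists; labels: one class name per document.
        self.classes = sorted(set(labels))
        self.vocab = {tok for doc in docs for tok in doc}
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / len(labels)) for c in self.classes}

        # Count how often each token occurs in documents of each class.
        token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            token_counts[label].update(doc)

        # Smoothed log P(token | class) for every vocabulary token.
        self.log_likelihood = {}
        for c in self.classes:
            denom = sum(token_counts[c].values()) + self.alpha * len(self.vocab)
            self.log_likelihood[c] = {
                tok: math.log((token_counts[c][tok] + self.alpha) / denom)
                for tok in self.vocab
            }
        return self

    def predict(self, doc):
        # Pick the class with the highest log posterior; unknown tokens are ignored.
        def score(c):
            return self.log_prior[c] + sum(
                self.log_likelihood[c][tok] for tok in doc if tok in self.vocab
            )
        return max(self.classes, key=score)


# Hypothetical toy training data (not real ASRS reports or categories).
train = [
    ("hydraulic pump failed during climb", "Aircraft"),
    ("engine fire warning light illuminated", "Aircraft"),
    ("thunderstorm and severe turbulence on approach", "Weather"),
    ("icing and low visibility forced diversion", "Weather"),
]
model = MultinomialNB().fit([text.split() for text, _ in train], [label for _, label in train])

print(model.predict("hydraulic light".split()))          # Aircraft
print(model.predict("severe icing on approach".split()))  # Weather
```

Laplace smoothing (alpha = 1) prevents a word that never appeared with a given class from driving that class's score to negative infinity.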
To identify the primary problem of an accident, the narratives were first preprocessed: words were reduced to their stems, punctuation and stop words were removed, and synonyms and acronyms were mapped to umbrella terms. Machine learning classifiers were then trained to predict the primary problem of an accident. The best result, an accuracy of 60% on the test data, was achieved with an SVM.
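The preprocessing steps described above can be sketched as follows. This is a simplified Python illustration, not the project's actual pipeline. The stop-word list and the `UMBRELLA_TERMS` mapping are tiny hypothetical examples, and `crude_stem` is a toy suffix stripper standing in for a proper stemming algorithm such as Porter's.

```python
import re

# Hypothetical, deliberately tiny resources. A real pipeline would use a full
# stop-word list and a domain glossary of aviation synonyms and acronyms.
STOP_WORDS = {"the", "a", "an", "and", "of", "on", "to", "was", "we", "in", "at"}
UMBRELLA_TERMS = {
    "acft": "aircraft",
    "plane": "aircraft",
    "atc": "air_traffic_control",
    "ctlr": "air_traffic_control",
    "tstm": "thunderstorm",
}
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip one common suffix: a toy stand-in for a real stemmer such as Porter's."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalise(token):
    """Map a token to its umbrella term, trying the raw form and then its stem."""
    if token in UMBRELLA_TERMS:
        return UMBRELLA_TERMS[token]
    stem = crude_stem(token)
    return UMBRELLA_TERMS.get(stem, stem)


def preprocess(narrative):
    # Lowercase, then replace every character other than ASCII letters, digits,
    # underscores and whitespace (i.e. punctuation) with a space.
    text = re.sub(r"[^a-z0-9_\s]", " ", narrative.lower())
    return [normalise(tok) for tok in text.split() if tok not in STOP_WORDS]


print(preprocess("The ACFT was climbing; ATC issued a turn to avoid TSTMs."))
# ['aircraft', 'climb', 'air_traffic_control', 'issu', 'turn', 'avoid', 'thunderstorm']
```

The umbrella-term lookup is tried both before and after stemming so that plural acronyms such as "TSTMs" still reach their umbrella term.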
For the task of predicting the risk factor of accidents, similar preprocessing steps were applied to the synopses, which are brief summaries of the narratives. SVM once again proved to be the best-performing classifier, with a test accuracy of 61%. Structured data was also used to predict the risk factor: after encoding the data and labels, SVM achieved an accuracy of 66% on the test data.
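Encoding the structured data and labels might look like the following sketch, which one-hot encodes categorical fields and maps each risk label to an integer. The field names (`flight_phase`, `weather`), their values and the risk labels are hypothetical placeholders rather than actual ASRS attributes; the resulting vectors would then be passed to a classifier such as an SVM.

```python
# Hypothetical structured records and risk labels (not actual ASRS fields).
records = [
    {"flight_phase": "Climb", "weather": "VMC"},
    {"flight_phase": "Landing", "weather": "IMC"},
    {"flight_phase": "Cruise", "weather": "VMC"},
]
labels = ["low", "high", "low"]


def fit_categories(rows, fields):
    """Record the sorted set of values observed for each categorical field."""
    return {field: sorted({row[field] for row in rows}) for field in fields}


def one_hot(row, categories):
    """Turn one record into a flat 0/1 vector; unseen values give all zeros."""
    vector = []
    for field, values in categories.items():
        vector.extend(1 if row.get(field) == value else 0 for value in values)
    return vector


categories = fit_categories(records, ["flight_phase", "weather"])
X = [one_hot(row, categories) for row in records]

# Integer-encode the class labels in sorted order.
label_to_id = {label: i for i, label in enumerate(sorted(set(labels)))}
y = [label_to_id[label] for label in labels]

print(categories)  # {'flight_phase': ['Climb', 'Cruise', 'Landing'], 'weather': ['IMC', 'VMC']}
print(X)           # [[1, 0, 0, 0, 1], [0, 0, 1, 1, 0], [0, 1, 0, 0, 1]]
print(y)           # [1, 0, 1]
```

One-hot encoding is used for the features, rather than plain integer codes, so that a linear model such as an SVM does not read an artificial ordering into categorical values.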
The results achieved by the proposed system indicate that machines can identify a flight’s primary problem and detect high-risk situations during a flight with a useful degree of accuracy.
Course: B.Sc. IT (Hons.) Artificial Intelligence
Supervisor: Dr Joel Azzopardi
Co-supervisor: Mr Nicholas Mamo