This project focuses on using data-mining techniques within the crime analysis domain, which has been increasing in popularity in view of many datasets becoming publicly available to researchers.
Crime has become a socio-economic problem, with an ongoing impact on quality of life and economic growth . Criminology is also one of the most important fields to which to apply data-mining techniques, as these could produce significant results . In fact, providing accurate and reliable crime predictions assists law enforcement authorities and other entities to effectively prevent crimes from recurring, while handling them effectively if and when they occur .
The study focused on three objectives, namely: predicting the type of crime (crime type prediction), predicting the number of future crimes that could occur (crime rate prediction) and establishing how much data would be needed to train these models. Two distinct American
public datasets were chosen for this study ‒ one focusing on the city of San Francisco and the other on New York City (NYC). Each dataset provided geographic, temporal, demographic, and historical crime data, and were preprocessed differently, depending on the objective in
For the first objective, which was crime type prediction, a number of models were implemented. These ranged from traditional machine learning to deep learning methods, all of which could classify the crime category based on the inputted data, mainly: the time of crime occurrence, its location and whether an arrest was made or not. The results obtained were then compared to those presented in an existing research paper, with the twofold aim of replicating the findings in the said paper and to improve upon the performance of classification of the work. It is worth mentioning the current experiment created models that classified the crimes to a high degree
of accuracy than in the said research paper.
The second objective concerned crime rate prediction, which essentially means forecasting the number of future crimes in a particular place. As with the first objective, the experiment set out to
replicate and improve upon the performance of the models from the read literature (which utilised the same NYC dataset) by implementing both machine learning and deep learning models. This task took a regression-based approach, meaning that it predicted a continuous value (the expected future crimes), as opposed to the classification-based approach above (which predicted a category). Furthermore, the task integrated additional data for this objective that helped create
models offering higher accuracy. As with the first objective, the results obtained were compared to a research paper. Again, the experiment yielded better predictions overall.
The third objective required establishing the minimum amount of data required by the learning models in order to produce effective results. With each objective, the original size of the training set was reduced, while keeping the test set unchanged, so as to check how much data would be required by the developed models to retain their optimal performance. The results obtained were around 70% f1-score for the crime type prediction and under 6% MAPE for crime rate prediction.
: Andrey Bogomolov, Bruno Lepri, Jacopo Staiano, Nuria Oliver, Fabio Pianesi, and Alex Pentland. Once upon a crime: Towards crime prediction from demographics and mobile data. 09 2014. doi: 10.1145/2663204.2663254
: K. Zakir Hussain, M. Durairaj, and G. Rabia Jahani Farzana. Criminal behavior analysis by using data mining techniques. In IEEE-International Conference On Advances In Engineering, Science And Management (ICAESM 2012), pages 656–658, 2012
Student: Karl Attard
Course: B.Sc. IT (Hons.) Artificial Intelligence
Supervisor: Dr Joel Azzopardi