Analysis of online behaviour to ensure the appropriate use of web-based systems

The project was motivated by specific weaknesses identified in online verification systems designed to keep certain vulnerable user groups safe online. These groups include underage users, users who are tired, and users under the influence of alcohol. An experimental application was developed using the Django web framework. It consisted of four tests analysing online behaviour and cognitive performance metrics that had been established as able to differentiate these vulnerable user groups from non-vulnerable user groups (see Figure 1).

Participants belonging to one of the three vulnerable user groups (underage users, users under the influence of alcohol, and tired users) were recruited mainly by word of mouth, and most of those invited to take part were relatives and friends. In total, 43 participants agreed to take part in this research. Of these, 11 were under 18 years of age, 13 carried out the test under the influence of alcohol, which they consumed of their own accord and at their convenience, and, overall, 23 classified themselves as being quite fatigued to severely fatigued.

The application was deployed to a live environment on the Heroku cloud platform, where the participants were able to carry out the test. Once all the participants had completed the test, the data was gathered, analysed, and optimized. Using RapidMiner, a series of machine learning tests was then run on the normalized data to assess how well predictive models could attribute the results of a test to a specific user group.
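
The normalization step mentioned above can be illustrated with a minimal sketch. The study does not state which normalization method was applied in RapidMiner, so min-max scaling is assumed here purely for illustration. The function name `min_max_normalize` and the sample values (a reaction time in milliseconds and an error count) are hypothetical and are not taken from the study's data.

```python
def min_max_normalize(rows):
    """Rescale every column of `rows` to the range [0, 1].

    Assumption: min-max scaling; the study does not name its method.
    A constant column is mapped to 0.0 to avoid division by zero.
    """
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    normalized = []
    for row in rows:
        normalized.append([
            (value - low) / (high - low) if high != low else 0.0
            for value, low, high in zip(row, lows, highs)
        ])
    return normalized


# Made-up measurements for three test runs: [reaction time (ms), error count]
sample = [[300.0, 2.0], [500.0, 0.0], [400.0, 4.0]]
scaled = min_max_normalize(sample)
```

Scaling each metric to a common range keeps a feature measured in hundreds of milliseconds from dominating one measured in single-digit counts, which matters for distance-based models such as k-nearest neighbour.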

The four machine learning models chosen for data analysis were the naïve Bayes classification model, the k-nearest neighbour model, the random forest model, and a deep learning model. Of these four, the random forest model performed best across the three user groups analysed.
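
To make the classification task concrete, the following sketch shows one of the four model types, k-nearest neighbour, written in plain Python. It is not the RapidMiner process used in the study. The leave-one-out evaluation, the value `k=3`, the function names, and the toy feature vectors and labels are all assumptions introduced for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to `query`.

    `train` is a list of (features, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(data, k=3):
    """Hold out each sample in turn, predict it from the rest, and report
    the fraction predicted correctly (an assumed evaluation scheme)."""
    correct = 0
    for i, (features, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        if knn_predict(rest, features, k) == label:
            correct += 1
    return correct / len(data)


# Made-up, already-normalized feature vectors; not data from the study.
toy = [
    ([0.10, 0.20], "vulnerable"),
    ([0.20, 0.10], "vulnerable"),
    ([0.15, 0.25], "vulnerable"),
    ([0.80, 0.90], "other"),
    ([0.90, 0.80], "other"),
    ([0.85, 0.75], "other"),
]
accuracy = leave_one_out_accuracy(toy, k=3)
```

With a data set as small as the one described here, holding out one sample at a time is one way to use every participant for both training and testing; whether the study did so is not stated.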

The machine learning models performed most effectively on the data set gathered from participants under the age of 18. Overall, the project succeeded in addressing the research question, since the application introduced novel methods for recognising the three vulnerable user groups identified previously. From the results obtained during data analysis, it can be concluded that such methods can be used effectively to help detect vulnerable users when they access web pages containing high-risk content.


Figure 1. Screenshot of the second of the four tests that users carried out when participating in this study.

Student: Damian Lee Callus
Supervisor: Dr Lalit Garg
Co-supervisor: Dr Peter Albert Xuereb
Course: B.Sc. IT (Hons.) Computing and Business