Mining the CIA World Factbook

Recent years have seen impressive advances across many areas of artificial intelligence (AI). As a result, AI and data mining are increasingly used to gain insight from available datasets and to draw practical conclusions from them. These conclusions can then be used to increase profits, advance research, and improve personal security by detecting fraudulent or illegitimate activities.

This study applied data mining techniques to the CIA World Factbook, a large dataset covering a wide range of topics that the CIA (Central Intelligence Agency) maintains and updates continuously. The project explored the extent to which data mining could be applied to a large dataset, and analysed the results to determine whether they adequately reflected real-world situations. This was accomplished by using a number of data mining and graph-analysis techniques to bring together AI and the field of international relations.

The first phase of the project consisted of data extraction. Data was extracted from the CIA World Factbook and converted to a format better suited to the task at hand. The resulting files were then mined for useful information, which in turn was entered into a graph data structure and stored in a graph database.
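As a minimal sketch of the graph-building step, the following assumes that the extracted information can be reduced to (country, relation, partner, weight) records. The record layout, the relation names and the build_graph helper are hypothetical, the values are placeholders rather than Factbook data, and an in-memory dictionary stands in for the graph database, which the project description does not name.

```python
from collections import defaultdict

# Hypothetical extracted records: (source, relation, target, weight).
# Placeholder names and numbers, not values taken from the Factbook.
RECORDS = [
    ("A", "exports_to", "B", 0.40),
    ("A", "exports_to", "C", 0.10),
    ("B", "exports_to", "A", 0.25),
    ("C", "borders", "B", 1.0),
]


def build_graph(records):
    """Build a directed, labelled graph: graph[src][dst] = {relation: weight}."""
    graph = defaultdict(dict)
    for src, relation, dst, weight in records:
        graph[src].setdefault(dst, {})[relation] = weight
        # Register the target as a node even if it has no outgoing edges.
        graph.setdefault(dst, {})
    return dict(graph)


graph = build_graph(RECORDS)
for country, neighbours in sorted(graph.items()):
    print(country, neighbours)
```

Keeping the relation name on each edge allows several kinds of relationship between the same pair of countries to coexist in one graph.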

The second phase was data clustering, in which a number of different algorithms and representations of the data were used in an attempt to divide the countries and territories contained within the CIA World Factbook into clusters. This made it possible to determine whether the clusters reflected real-world alliances and dependencies.
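The clustering algorithms used in the project are not named here, so the sketch below uses a deliberately simple stand-in: edges weaker than a threshold are discarded and the remaining connected components are treated as clusters. The function name, the threshold and the edge weights are illustrative assumptions, not the project's method or data.

```python
def threshold_clusters(weights, nodes, threshold):
    """Group nodes into connected components using only edges with weight >= threshold."""
    # Undirected adjacency restricted to sufficiently strong edges.
    adjacency = {node: set() for node in nodes}
    for (a, b), weight in weights.items():
        if weight >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)

    clusters, seen = [], set()
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal collects one component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Placeholder countries and similarity weights (not Factbook values).
NODES = ["A", "B", "C", "D", "E"]
WEIGHTS = {("A", "B"): 0.6, ("B", "C"): 0.5, ("C", "D"): 0.1, ("D", "E"): 0.7}

clusters = threshold_clusters(WEIGHTS, NODES, threshold=0.4)
print(clusters)
```

With these placeholder weights the weak C-D edge is dropped, so the five nodes split into two groups. Comparing such groups against known alliances is one way to check whether clusters reflect real-world relationships.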

The final phase consisted of data analysis. This part of the experiment entailed applying other analytical techniques to mine further results from the graph. These results, along with those from the previous phase, were then analysed in order to formulate a number of conclusions. This evaluation assessed the ability of AI to reflect the state of the world and of international relations, despite the continuously shifting nature of current affairs.

The final analysis established that the implemented algorithms detected a number of alliances between countries (such as those between EU Member States), identified various countries that are dependent on trade with others, and identified the main global superpowers.
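Findings of this kind could, for example, be derived from simple measures on a weighted trade graph. The sketch below is an assumption about how that might look, not the project's actual analysis: a country is flagged as dependent when a single partner takes a large share of its exports, and weighted in-degree serves as a rough proxy for global influence. The function names, the 0.6 cut-off and the export figures are all hypothetical placeholders.

```python
def trade_dependencies(exports, min_share=0.6):
    """Return (exporter, partner) pairs where one partner takes at least min_share of exports."""
    found = []
    for exporter, partners in exports.items():
        total = sum(partners.values())
        if total == 0:
            continue  # no recorded exports, nothing to compare
        for partner, value in partners.items():
            if value / total >= min_share:
                found.append((exporter, partner))
    return found


def weighted_in_degree(exports):
    """Sum the value flowing into each country across all incoming edges."""
    totals = {}
    for partners in exports.values():
        for partner, value in partners.items():
            totals[partner] = totals.get(partner, 0) + value
    return totals


# Placeholder export values per partner (not Factbook values).
EXPORTS = {
    "A": {"B": 80, "C": 20},
    "B": {"C": 50, "A": 50},
    "D": {"C": 30, "B": 70},
}

dependencies = trade_dependencies(EXPORTS)
incoming = weighted_in_degree(EXPORTS)
print(dependencies)
print(max(incoming, key=incoming.get))
```

In this toy graph, A and D both send most of their exports to B, which also receives the largest total inflow.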

Figure 1. Subgraph of data mined from the CIA World Factbook

Student: Matteo Farrugia
Course: B.Sc. IT (Hons.) Artificial Intelligence
Supervisor: Dr Joel Azzopardi