Constructing and analysing knowledge maps via source code

The most valuable asset held by software engineering organisationsis, arguably, the knowledge held by their employees. It is the creation, transfer and application of such organisational knowledge that provides such companies with the competitive edge required to provide value to their customers. With high staff-turnover in the ICT industry, companies need to protect knowledge assets ‒ or rather, they would need to find a way to mitigate and address knowledge risk.

Knowledge risk is generally defined as an ‘operational risk caused by a dependency on, loss of, unsuccessful intended or unintended transfer of knowledge assets, and results in a lack of, or non-exclusivity of these assets’. One example of knowledge risk may involve scenarios whereby a few members of staff would hold a disproportionate amount of knowledge about specific ‘knowledge assets’ when compared to the larger staff complement. This would be an issue (i.e., a risk) because in the event of such people leaving the company, access to important knowledge and/or expertise would be lost.

The motivation for this project was to help mitigate the notion of knowledge risk by analysing source code repositories, and subsequently building knowledge maps for an organisation or project. The underlying hypothesis is that one could deduce ’what’ and ’how much’ a person knows about a particular knowledge asset, on the basis of the frequency and nature of their commits of code related to that asset.

Following a literature review, the approach taken to test this hypothesis was to create a tool that would pull information from source code repositories (e.g., Git), build a mathematical representation of an organisation’s knowledge using graphs, and then applying graph theory to identify potential instances of knowledge risk. Evaluation was carried out on a number of open-source repositories.

The tool analyses commits on the project to build a profile for every knowledge worker. Knowledge assets are then extracted from both the internal and external packages used in a particular commit. The knowledge maps produced by the tool are represented as graphs
structures, which consist of knowledge assets and knowledge workers as vertices and represent the relationships between them through weighted edges in order to identify potential knowledge risk and mitigate it.

An evaluation exercise was carried out with the help of a number of open-source repositories, to determine the extent to which the constructed knowledge map would represent the real situation.

Figure 1. Example of a generated knowledge map

Student: Jack Piscopo
Course: B.Sc. (Hons.) Computing Science
Supervisor: Dr Mark Micallef