Automatically Resolving Forward-References in Article Headlines to Identify Clickbait

Today, journalists must take several linguistic features into consideration when publishing material online in order to draw users’ attention, motivate engagement and ultimately increase the potential for commercial revenue [1]. These features include bold claims such as “Cure to cancer found”, taboo headlines written purposely to provoke certain readers, and titles that are simply gossip or hearsay, such as an alleged scandal concerning a celebrity or politician [1]. Consequently, clickbait has become an ongoing problem: it floods our news feeds and is used even by trustworthy newspaper websites [2]. This is not only a cause for annoyance but also presents misleading information to the user. Identifying clickbait headlines can be arduous given the wide variety of strategies used to create them. It would therefore be convenient if a common underlying structure within clickbait headlines could be found, since a simple yet accurate approach built on such a structure could play a key role in filtering these headlines out of users’ news feeds and internet searches.

A technique that is rarely researched but commonly used by journalists is forward referencing [1]. In a forward reference, an entity refers to a mention that appears later in the text and is usually more specific. Such entities generally take the form of pronouns, common nouns or noun phrases, and proper nouns or proper noun phrases. An example is the headline “She has managed to do it again, incredible”: at this point, neither the identity of the person nor what “she has managed to do again” is known. Because entities in the headline refer to more specific mentions within the article body, the hypothesis of this dissertation is that detecting such forward-referring entities in a headline could be a way of identifying “clickbaity” headlines.

To test this hypothesis, an expert system was implemented in which a list of rules, built on the knowledge base of forward referencing, was produced manually. Using these rules, the system assigns each headline a forward-referencing score that indicates its level of clickbait: it detects the forward-referring entities, assigns an appropriate score to each, and gives the headline the average of these scores. The research concluded that although forward referencing is widely used by journalists when writing article headlines, the results obtained by the system were not up to standard. Moreover, the dataset used was very large and, since this is an expert system, every relevant factor has to be encoded within the manually constructed rules [3]. Forward referencing alone is therefore not sufficient to detect clickbait. This does not mean that it is not a useful technique, but rather that it is a step in the right direction.
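The scoring step described above (detect forward-referring entities, score each, average the scores) can be illustrated with a minimal sketch. The dissertation's actual rules and weights are not given here, so the word lists, the entity categories and the weights in `RULE_SCORES` below are hypothetical placeholders chosen only to show the shape of a rule-based scorer. A real system would use part-of-speech tagging and far richer rules rather than fixed word lists.

```python
import re

# Hypothetical weights per forward-referring entity type (illustrative only,
# NOT the weights used in the dissertation's expert system).
RULE_SCORES = {
    "personal_pronoun": 1.0,  # e.g. "She", "It" with no antecedent in the headline
    "demonstrative": 0.8,     # e.g. "This", "These"
    "definite_np": 0.5,       # e.g. "the reason", "the secret"
}

# Hypothetical word lists standing in for the manually produced rules.
PERSONAL_PRONOUNS = {"she", "he", "they", "it", "her", "him", "them"}
DEMONSTRATIVES = {"this", "these"}
VAGUE_NOUNS = {"reason", "secret", "thing", "trick", "truth", "moment"}


def detect_entities(headline):
    """Return (text, entity_type) pairs for candidate forward-referring entities."""
    tokens = re.findall(r"[A-Za-z']+", headline.lower())
    found = []
    for i, tok in enumerate(tokens):
        if tok in PERSONAL_PRONOUNS:
            found.append((tok, "personal_pronoun"))
        elif tok in DEMONSTRATIVES:
            found.append((tok, "demonstrative"))
        elif tok == "the" and i + 1 < len(tokens) and tokens[i + 1] in VAGUE_NOUNS:
            # A definite article followed by a vague noun, e.g. "the reason".
            found.append(("the " + tokens[i + 1], "definite_np"))
    return found


def forward_reference_score(headline):
    """Average the rule scores of all detected entities; 0.0 if none are found."""
    entities = detect_entities(headline)
    if not entities:
        return 0.0
    return sum(RULE_SCORES[kind] for _, kind in entities) / len(entities)


if __name__ == "__main__":
    for h in [
        "She has managed to do it again, incredible",
        "This is the reason you should never skip breakfast",
        "Government announces new budget for 2019",
    ]:
        print(f"{forward_reference_score(h):.2f}  {h}")
```

With these placeholder weights, the example headline from the text scores 1.0 (“She” and “it” are both detected as pronouns), while a plain news headline with no forward-referring entities scores 0.0. Averaging, as described in the text, also means that a single strong forward reference is diluted by weaker ones in the same headline.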

Figure 1. “Owner” refers to Jamie, while “dog” and “it” both refer to Bobo.

References

[1]         Blom, J.N. and Hansen, K.R., 2015. Click bait: Forward-reference as lure in online news headlines. Journal of Pragmatics, 76, pp. 87-100.

[2]         Potthast, M., Köpsel, S., Stein, B. and Hagen, M., 2016, March. Clickbait detection. In European Conference on Information Retrieval (pp. 810-817). Springer, Cham.

[3]         Waltl, B., Bonczek, G. and Matthes, F., 2018. Rule-based Information Extraction: Advantages, Limitations, and Perspectives. Jusletter IT, 22 February 2018.

Student: Rafael Warr
Supervisor: Dr Chris Staff
Course: B.Sc. IT (Hons.) Artificial Intelligence