Formal languages are a fundamental concept in computer science. Understanding their grammars is essential in various fields, including in artificial intelligence (AI) applications. In AI, one of the ways to learn the rules of a language is through grammatical inference, which uses a set of positive and negative strings which respectively belong to and do not belong to a language. One common model used for representing these regular languages is a deterministic finite automaton (DFA), which is a mathematical model that recognises string patterns in the language. An example of a DFA is shown in Figure 1.
The main methods for DFA learning include state-merging algorithms and connectionist approaches (Figure 2). To date, state-merging algorithms have yielded the better results. Nevertheless, connectionist approaches are a promising alternative.
State merging algorithms work by simplify an initial highly-specific DFA by grouping states that behave the same way, making the DFA smaller and easier to understand, while still recognising the same language. However, this method could be limited in their ability to handle large or complex DFAs.
Connectionist approches in the context of machine learning refer to the use of neural networks composed of interconnected neurons, such as recurrent neural networks (RNNs). RNNs process sequential data and have delivered promising results in previous studies and other fields.
This final-year project compares the effectiveness of RNNs for learning DFAs to the state-merging EDSM algorithm, using Python and GO programming languages.
In essence, this project highlights the importance of exploring different techniques for DFA learning and the potential of connectionist approaches in this field. Understanding the strengths and limitations of these approaches, would be conducive to developing more effective methods for learning and extracting the rules of languages, which could be applied in various fields.
Figure 1. DFA in 3 states, where State 1 is the starting state and State 2 is the ending state; transitioning through states helps update the string with the number in-between the states. Strings marked with a green tick are positive, whilst the red cross indicates negative strings
Figure 2. Simple flowchart representing the flow of DFA learning for 2 approaches, respectively; the results are generated through the Python and GO code to facilitate the comparison process
Student: Nikola Jančić
Supervisor: Dr Kristian Guillaumier