Morphological classification of SDSS galaxies using machine learning techniques

This project set out to classify images of galaxies according to their morphological type, that is, their visual and physical characteristics. Classifying such images manually at scale is infeasible, which motivates the use of automated techniques.

Projects such as the Sloan Digital Sky Survey (SDSS) generate data on hundreds of thousands of galaxies. This work uses deep learning (DL), specifically convolutional neural networks (CNNs), to train models that classify galaxies into three morphological types. The motivation behind the project is that morphological classification can improve our understanding of how different galaxies formed and evolved.


CNN architectures for classifying galaxies already exist in the academic literature. However, they are difficult to compare because they differ in the datasets used and the classification schemes adopted. This research therefore sought to compare CNN architectures more comprehensively when applied to morphological classification.


This project used a dataset from the crowd-sourced project Galaxy Zoo 2 (GZ2), which provides a catalogue of around 300,000 visually classified galaxies sourced from the SDSS. The GZ2 catalogue was processed and reduced to a subset of robust classifications suitable for training and testing the models. The models classify galaxies into three categories: ellipticals, spirals and barred spirals (an example of a barred spiral is shown in Figure 1).
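The exact procedure used to reduce GZ2 to robust classifications is not described here. One common approach is to keep only galaxies whose volunteer vote fractions clear a confidence threshold and to discard ambiguous ones. The sketch below illustrates that idea under stated assumptions: the `VoteFractions` fields, the `robust_label` and `reduce_catalogue` helpers and the 0.8 threshold are all hypothetical, not the actual GZ2 column names or the values used in this project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VoteFractions:
    """Hypothetical per-galaxy vote fractions in [0, 1].
    Field names are illustrative, not real GZ2 catalogue columns."""
    smooth: float    # fraction of volunteers answering "smooth"
    features: float  # fraction answering "features or disk"
    bar: float       # fraction answering "bar" (asked only for disk galaxies)


def robust_label(v: VoteFractions, threshold: float = 0.8) -> Optional[str]:
    """Return a label only when the vote is decisive; otherwise None.
    The default threshold is an assumed value."""
    if v.smooth >= threshold:
        return "elliptical"
    if v.features >= threshold:
        if v.bar >= threshold:
            return "barred_spiral"
        if v.bar <= 1.0 - threshold:  # clearly no bar
            return "spiral"
    return None  # ambiguous galaxies are dropped from the training set


def reduce_catalogue(catalogue: dict[str, VoteFractions],
                     threshold: float = 0.8) -> dict[str, str]:
    """Keep only galaxies with a robust label (hypothetical helper)."""
    labelled = {}
    for galaxy_id, votes in catalogue.items():
        label = robust_label(votes, threshold)
        if label is not None:
            labelled[galaxy_id] = label
    return labelled


if __name__ == "__main__":
    sample = {
        "g1": VoteFractions(smooth=0.92, features=0.08, bar=0.00),
        "g2": VoteFractions(smooth=0.05, features=0.95, bar=0.85),
        "g3": VoteFractions(smooth=0.10, features=0.90, bar=0.10),
        "g4": VoteFractions(smooth=0.55, features=0.45, bar=0.30),
    }
    # g4 is ambiguous and is dropped
    print(reduce_catalogue(sample))
```

Raising the threshold yields cleaner labels at the cost of a smaller training set, which is the main trade-off in this kind of reduction.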


The CNN architectures considered include those adapted from similar studies as well as state-of-the-art architectures developed for more general image classification tasks. In addition, techniques to address class imbalance were implemented, along with data augmentation, to investigate their effect on performance.
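The specific class-imbalance techniques and augmentations are not named here. As a sketch, assuming inverse-frequency class weighting (one common option, the same heuristic as scikit-learn's "balanced" mode) and orientation-based augmentation (90-degree rotations and flips, which do not change a galaxy's morphology), the ideas look like this. Images are toy nested lists so the example needs only the standard library; all function names are hypothetical.

```python
import random
from collections import Counter

Image = list[list[float]]  # toy single-channel image


def inverse_frequency_weights(labels: list[str]) -> dict[str, float]:
    """Assumed scheme: weight = n_samples / (n_classes * class_count),
    so rarer classes (e.g. barred spirals) contribute more to the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def rotate90(img: Image) -> Image:
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def hflip(img: Image) -> Image:
    """Mirror an image left to right."""
    return [row[::-1] for row in img]


def random_dihedral(img: Image, rng: random.Random) -> Image:
    """Apply a random multiple of 90 degrees rotation and an optional flip.
    These transformations preserve the morphological label."""
    for _ in range(rng.randrange(4)):
        img = rotate90(img)
    if rng.random() < 0.5:
        img = hflip(img)
    return img


if __name__ == "__main__":
    labels = ["elliptical"] * 6 + ["spiral"] * 3 + ["barred_spiral"]
    print(inverse_frequency_weights(labels))
    print(random_dihedral([[1, 2], [3, 4]], random.Random(0)))
```

Weighting changes how much each class contributes to the loss without altering the data, while augmentation enlarges the effective training set; the two can be combined.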


Evaluation of the models using several metrics showed that the generic architectures could match the performance of a novel architecture taken from the consulted literature (see Figure 2). In some instances, they also outperformed a study that used a similar dataset and classification scheme.
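Figure 2 reports F1 scores, but the averaging scheme is not stated. The sketch below shows how per-class F1 (the harmonic mean of precision and recall) and an unweighted macro average could be computed from predicted and true labels; the function names are hypothetical and macro averaging is an assumption. With imbalanced classes, macro F1 is often preferred over accuracy because every class counts equally.

```python
def per_class_f1(y_true: list[str], y_pred: list[str]) -> dict[str, float]:
    """F1 = 2 * precision * recall / (precision + recall), computed per class."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
    return scores


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


if __name__ == "__main__":
    y_true = ["elliptical", "elliptical", "spiral", "spiral",
              "barred_spiral", "barred_spiral"]
    y_pred = ["elliptical", "spiral", "spiral", "spiral",
              "barred_spiral", "elliptical"]
    print(per_class_f1(y_true, y_pred))
    print(macro_f1(y_true, y_pred))
```

In this toy example, elliptical scores 0.5, spiral 0.8 and barred spiral about 0.667, giving a macro F1 of about 0.656.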

Figure 1. Barred spiral galaxy from the SDSS

Figure 2. F1 scores for implemented architectures

Student: Robert Mifsud

Supervisor: Mr Joseph Bonello