This final-year project investigated the complex task of generating art from a text prompt. It implemented an architecture based on generative adversarial networks (GANs) and machine learning (ML) neural networks, capable of creating artistic paintings from text. The VQGAN model was adopted, and its parameters were modified to support the experimental approach.
The implementation was divided into two tasks. The first was text-to-image synthesis, the process of generating an artwork from a given text prompt. The second used neural style transfer to blend two images, combining the content of one with the style of the other.
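As a rough illustration of what combining content and style can mean in practice, the sketch below computes a content loss and a style loss on small feature maps. It assumes the widely used Gram-matrix formulation of style, which this abstract does not confirm was the one used in the project. The function names and the toy feature values are hypothetical, and a real system would take the features from a pretrained convolutional network rather than from hand-written lists.

```python
def gram_matrix(features):
    """Channel-by-channel correlations of a feature map.

    `features` is a list of channels, each a flat list of activations.
    Spatial positions are summed out, so the result captures which
    channels fire together (style), not where they fire (content).
    """
    n = len(features[0])
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n for fj in features]
        for fi in features
    ]


def style_loss(generated, style):
    """Mean squared difference between the two Gram matrices."""
    g, s = gram_matrix(generated), gram_matrix(style)
    c = len(g)
    return sum(
        (g[i][j] - s[i][j]) ** 2 for i in range(c) for j in range(c)
    ) / (c * c)


def content_loss(generated, content):
    """Mean squared difference between the raw activations."""
    total = sum(
        (a - b) ** 2
        for fg, fc in zip(generated, content)
        for a, b in zip(fg, fc)
    )
    count = sum(len(channel) for channel in generated)
    return total / count


# Toy example: two channels, two spatial positions each (made-up values).
original = [[1.0, 2.0], [3.0, 4.0]]
shuffled = [[2.0, 1.0], [4.0, 3.0]]  # same activations, positions swapped

print(style_loss(shuffled, original))    # 0.0: Gram matrices are identical
print(content_loss(shuffled, original))  # 1.0: the layout has changed
```

Swapping the spatial positions leaves the style loss at zero while the content loss rises. This is why the two terms can be traded off against each other when one image supplies the content and another supplies the style.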
The text-to-image synthesis used the VQGAN model together with the multimodal neural network CLIP. Starting from random noise, the model generated an image, which was then encoded and refined over a series of iterations until a final image, as close to the prompt as possible, was produced. Subsequently, neural style transfer was applied to the generated image, retaining its content whilst applying different styles from the available dataset.
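The iterative, prompt-guided refinement described above can be illustrated with a toy sketch. It is not the project's implementation: in the real pipeline the latent is decoded by VQGAN and the resulting image is embedded by CLIP, whereas here a plain vector stands in for the image embedding so that the loop runs without any model. The names `optimise_latent`, `steps`, `lr` and `seed`, and the example embedding values, are hypothetical.

```python
import math
import random


def normalise(v):
    """Scale a vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


def cosine(a, b):
    """Cosine similarity, the kind of score used to compare embeddings."""
    a, b = normalise(a), normalise(b)
    return sum(x * y for x, y in zip(a, b))


def optimise_latent(text_embedding, steps=300, lr=0.1, seed=0):
    """Gradient ascent on the similarity between a latent and the prompt.

    Assumption: the latent `z` plays the role of the embedding of the
    decoded image. A real system would backpropagate through the image
    encoder and the decoder instead of using this closed-form gradient.
    """
    rng = random.Random(seed)
    target = normalise(text_embedding)
    # Start from random noise, as in the pipeline described above.
    z = normalise([rng.gauss(0.0, 1.0) for _ in target])
    history = [cosine(z, target)]
    for _ in range(steps):
        score = sum(zi * ti for zi, ti in zip(z, target))
        # Gradient of the cosine similarity with respect to a unit-length z.
        grad = [ti - score * zi for zi, ti in zip(z, target)]
        z = normalise([zi + lr * gi for zi, gi in zip(z, grad)])
        history.append(cosine(z, target))
    return z, history


# Made-up 8-dimensional "prompt embedding" for demonstration only.
prompt_embedding = [0.2, -1.0, 0.5, 0.3, 0.9, -0.4, 0.1, 0.7]
latent, history = optimise_latent(prompt_embedding)
print(f"similarity: start={history[0]:.3f}, end={history[-1]:.3f}")
```

Each step nudges the latent towards a higher similarity score with the prompt. The same idea drives the real system, with the difference that there the gradient flows through two large neural networks.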
The HEART (Holistic Evaluation of Art) approach was adopted and adapted to evaluate the outcome of this investigation. The first stage of the evaluation tested different parameter configurations to assess the quality of the artistic outcome, and used two qualitative surveys to measure individual appreciation of the generated works. The second stage consisted of a survey distributed amongst 150 participants, who were asked to compare the outputs of the VQGAN model with those of state-of-the-art applications, such as Dream by Wombo, Midjourney and DALL-E 2.
The overall result of the investigation indicates that, while the state-of-the-art applications could generate hyper-realistic images, the VQGAN model could generate more abstract and intriguing pieces that remained loyal to the prompt.
Figure 1. Generated artwork entitled Flowers in Space, inspired by Vincent van Gogh’s The Starry Night
Figure 2. Neural style transfer applied to generated duck sketch
Student: Jean Claude Sacco
Supervisor: Dr Vanessa Camilleri