AI Generates Entire Songs in Your Chosen Style Using New Transformer-GAN Breakthrough

By embedding style cues directly into a Transformer-GAN framework, researchers have created an AI that composes music matching emotional tones or composer signatures, with listeners ranking it as the most humanlike and musically rich yet.

Style-conditioned music generation with Transformer-GANs. Image Credit: Ole.CNX / Shutterstock

Researchers from South China University of Technology published a paper in the special issue "Latest Advances in Artificial Intelligence Generated Content" of Frontiers of Information Technology & Electronic Engineering (2024, Vol. 25, No. 1). They propose a music generation algorithm capable of composing a complete musical piece from scratch in a specified target style.

Rule-based music generation models rely on music-theoretic rules, which makes it difficult to capture deep musical structure and limits the diversity of their output. Among deep learning approaches, generative adversarial networks (GANs), variational auto-encoders (VAEs), and Transformers each have advantages but suffer from problems such as difficult training, poor handling of long sequences, or a lack of style control. Prior style-based music generation work has introduced style information but has not accounted for the model's structural awareness. This paper combines structural awareness with interpretive ability, and verifies the method's effectiveness on emotion-style and composer-style generation.

The paper proposes a style-controlled music generation model, the style-conditioned Transformer-GAN (SCTG), and examines the role of each of its three components: the data representation, the style-conditioned linear Transformer, and the style-conditioned patch discriminator. The data representation encodes MIDI event sequences with inserted style information and groups related musical attributes. The style-conditioned linear Transformer addresses the weaknesses of earlier style-conditioned generation by injecting the style embedding directly into the model's hidden space and combining it with the output features, so that the condition influences the entire generated sequence. Converting the generated music into discrete scores strengthens the discriminator's learning and promotes the expression of style information in the generated music.
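To make the conditioning mechanism concrete, here is a minimal PyTorch sketch of a decoder that fuses a learned style embedding with its hidden features before the output projection. Everything in it is an illustrative assumption rather than the authors' exact architecture: the module names, layer sizes, additive fusion, and the use of standard self-attention in place of the paper's linear attention.

```python
import torch
import torch.nn as nn

class StyleConditionedDecoder(nn.Module):
    """Sketch of a style-conditioned Transformer decoder (assumed design,
    not the paper's exact SCTG architecture)."""

    def __init__(self, vocab_size=512, n_styles=8, d_model=256,
                 n_heads=4, n_layers=6):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.style_emb = nn.Embedding(n_styles, d_model)  # one vector per style
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, style_id):
        # tokens: (batch, seq_len) MIDI event IDs; style_id: (batch,) labels
        h = self.token_emb(tokens)
        # Causal mask so each position only attends to earlier events.
        mask = nn.Transformer.generate_square_subsequent_mask(
            tokens.size(1)).to(tokens.device)
        h = self.backbone(h, mask=mask)
        # Fuse the style vector with the hidden features so the condition
        # influences every position of the generated sequence.
        h = h + self.style_emb(style_id).unsqueeze(1)
        return self.out_proj(h)  # next-event logits
```

Generation would then proceed autoregressively, feeding each sampled event back into the model with the same style_id supplied at every step.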

Experiments were conducted on the emotion-style dataset EMOPIA and the composer-style dataset Pianist8, with comparisons against two state-of-the-art models. In objective evaluations, the proposed model achieved the best results on traditional metrics as well as on style distance (SD) and classification accuracy (CA), showing strong style consistency in the generated music and similarity to the original data. In subjective evaluations, participants gave the music generated by this model the highest scores for humanness, richness, and overall quality, indicating its potential for practical use.
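The article does not define SD and CA precisely, but one plausible reading is a pretrained style classifier used as a judge: CA as the fraction of generated clips assigned to the intended style, and SD as a distance between classifier features of generated and real music of that style. The sketch below follows that assumed reading; the classifier interface and both metric definitions are hypothetical and may differ from the paper's.

```python
import torch

@torch.no_grad()
def style_metrics(classifier, gen_clips, target_styles, real_clips):
    """Hypothetical CA/SD evaluation. `classifier(x)` is assumed to
    return (logits, features); the paper's exact definitions may differ."""
    logits, gen_feat = classifier(gen_clips)
    # Classification accuracy: does a style classifier recognise the
    # intended style in the generated music?
    ca = (logits.argmax(dim=-1) == target_styles).float().mean().item()
    # Style distance: gap between mean classifier features of generated
    # clips and real clips of the same style (smaller = more consistent).
    _, real_feat = classifier(real_clips)
    sd = torch.norm(gen_feat.mean(0) - real_feat.mean(0)).item()
    return ca, sd
```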

Building on CP-Transformer, the model innovatively embeds style conditions and implements a style-conditioned patch discriminator. Comparisons between the style-conditioned generator and the style-conditioned patch discriminator show that the style information helps the discriminator distinguish music styles and enables the generator to produce music of a specific style. The model also performs worse when either loss term is removed, so both the classification loss (Loss_Cls) and the adversarial loss (Loss_GAN) are important to the style-conditioned patch discriminator's performance.
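The ablation pairs an adversarial term with a style-classification term, in the spirit of an auxiliary-classifier GAN. Below is a minimal sketch of how a patch discriminator might combine the two; the class and function names (PatchStyleDiscriminator, discriminator_loss), layer shapes, and 1-D convolutional "patch" design are assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchStyleDiscriminator(nn.Module):
    """Sketch of a discriminator with the two heads the ablation refers
    to: a real/fake head (Loss_GAN) and a style head (Loss_Cls)."""

    def __init__(self, vocab_size=512, n_styles=8, d=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d)
        # Convolutions score local patches of the discrete score rather
        # than the whole sequence at once.
        self.body = nn.Sequential(
            nn.Conv1d(d, d, kernel_size=4, stride=2), nn.LeakyReLU(0.2),
            nn.Conv1d(d, d, kernel_size=4, stride=2), nn.LeakyReLU(0.2),
        )
        self.adv_head = nn.Conv1d(d, 1, kernel_size=1)         # real/fake per patch
        self.cls_head = nn.Conv1d(d, n_styles, kernel_size=1)  # style per patch

    def forward(self, tokens):
        h = self.body(self.emb(tokens).transpose(1, 2))
        return self.adv_head(h), self.cls_head(h)

def discriminator_loss(disc, real, fake, style_ids):
    # Loss_GAN: distinguish real music from generated music.
    adv_r, cls_r = disc(real)
    adv_f, _ = disc(fake)
    loss_gan = (F.binary_cross_entropy_with_logits(adv_r, torch.ones_like(adv_r))
                + F.binary_cross_entropy_with_logits(adv_f, torch.zeros_like(adv_f)))
    # Loss_Cls: recognise the style of real music on every patch.
    patch_styles = style_ids[:, None].expand(-1, cls_r.size(-1))
    loss_cls = F.cross_entropy(cls_r, patch_styles)
    return loss_gan + loss_cls  # per the ablation, dropping either term hurts
```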
