AI Generates Entire Songs in Your Chosen Style Using New Transformer-GAN Breakthrough

By embedding style cues directly into a Transformer-GAN framework, researchers have created an AI that composes music matching emotional tones or composer signatures, with listeners ranking it as the most humanlike and musically rich yet.

Style-conditioned music generation with Transformer-GANs. Image Credit: Ole.CNX / Shutterstock

Researchers from South China University of Technology published a paper in the special issue "Latest Advances in Artificial Intelligence Generated Content" of Frontiers of Information Technology & Electronic Engineering (2024, Vol. 25, No. 1). They propose a music generation algorithm capable of composing a complete musical piece from scratch in a specified target style.

Rule-based music generation models rely on music-theoretic rules, which makes it difficult to capture deep musical structure and limits the diversity of their output. Among deep learning approaches, generative adversarial networks (GANs), variational auto-encoders (VAEs), and Transformers each have advantages but suffer from problems such as difficult training, poor handling of long sequences, or a lack of style control. Prior style-based music generation work has introduced style information but has not accounted for the model's structural awareness. This paper combines structural awareness with interpretive ability, and verifies the method's effectiveness on emotion-style and composer-style generation.

The paper proposes a style-controlled music generation model, the style-conditioned Transformer-GAN (SCTG), and examines the role of each of its three components: the data representation, the style-conditioned linear Transformer, and the style-conditioned patch discriminator. The data representation encodes MIDI event sequences with inserted style information and groups related musical attributes. The style-conditioned linear Transformer addresses the weaknesses of earlier style-conditioned generation by injecting the style embedding directly into the model's hidden space and combining it with the output features, so that the condition influences the entire generated sequence. Converting the generated music into discrete scores strengthens the discriminator's learning and promotes the expression of style information in the generated music.
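To make the conditioning mechanism concrete, here is a minimal PyTorch sketch of a decoder that fuses a learned style embedding with its hidden features before the output projection. Everything in it is an illustrative assumption rather than the authors' exact architecture: the module names, layer sizes, additive fusion, and the use of standard self-attention in place of the paper's linear attention.

```python
import torch
import torch.nn as nn

class StyleConditionedDecoder(nn.Module):
    """Sketch of a style-conditioned Transformer decoder (assumed design,
    not the paper's exact SCTG architecture)."""

    def __init__(self, vocab_size=512, n_styles=8, d_model=256,
                 n_heads=4, n_layers=6):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)
        self.style_emb = nn.Embedding(n_styles, d_model)  # one vector per style
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, n_layers)
        self.out_proj = nn.Linear(d_model, vocab_size)

    def forward(self, tokens, style_id):
        # tokens: (batch, seq_len) MIDI event IDs; style_id: (batch,) labels
        h = self.token_emb(tokens)
        # Causal mask so each position only attends to earlier events.
        mask = nn.Transformer.generate_square_subsequent_mask(
            tokens.size(1)).to(tokens.device)
        h = self.backbone(h, mask=mask)
        # Fuse the style vector with the hidden features so the condition
        # influences every position of the generated sequence.
        h = h + self.style_emb(style_id).unsqueeze(1)
        return self.out_proj(h)  # next-event logits
```

Generation would then proceed autoregressively, feeding each sampled event back into the model with the same style_id supplied at every step.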

Experiments were conducted on the emotion-style dataset EMOPIA and the composer-style dataset Pianist8, with comparisons against two state-of-the-art models. In objective evaluations, the proposed model achieved the best results on traditional metrics as well as on style distance (SD) and classification accuracy (CA), showing strong style consistency in the generated music and similarity to the original data. In subjective evaluations, participants gave the music generated by this model the highest scores for humanness, richness, and overall quality, indicating its potential for practical use.
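The article does not define SD and CA precisely, but one plausible reading is a pretrained style classifier used as a judge: CA as the fraction of generated clips assigned to the intended style, and SD as a distance between classifier features of generated and real music of that style. The sketch below follows that assumed reading; the classifier interface and both metric definitions are hypothetical and may differ from the paper's.

```python
import torch

@torch.no_grad()
def style_metrics(classifier, gen_clips, target_styles, real_clips):
    """Hypothetical CA/SD evaluation. `classifier(x)` is assumed to
    return (logits, features); the paper's exact definitions may differ."""
    logits, gen_feat = classifier(gen_clips)
    # Classification accuracy: does a style classifier recognise the
    # intended style in the generated music?
    ca = (logits.argmax(dim=-1) == target_styles).float().mean().item()
    # Style distance: gap between mean classifier features of generated
    # clips and real clips of the same style (smaller = more consistent).
    _, real_feat = classifier(real_clips)
    sd = torch.norm(gen_feat.mean(0) - real_feat.mean(0)).item()
    return ca, sd
```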

Building on CP-Transformer, the model innovatively embeds style conditions and implements a style-conditioned patch discriminator. Comparisons between the style-conditioned generator and the style-conditioned patch discriminator show that the style information helps the discriminator distinguish music styles and enables the generator to produce music of a specific style. The model also performs worse when either loss term is removed, so both the classification loss (Loss_Cls) and the adversarial loss (Loss_GAN) are important to the style-conditioned patch discriminator's performance.
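The ablation pairs an adversarial term with a style-classification term, in the spirit of an auxiliary-classifier GAN. Below is a minimal sketch of how a patch discriminator might combine the two; the class and function names (PatchStyleDiscriminator, discriminator_loss), layer shapes, and 1-D convolutional "patch" design are assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchStyleDiscriminator(nn.Module):
    """Sketch of a discriminator with the two heads the ablation refers
    to: a real/fake head (Loss_GAN) and a style head (Loss_Cls)."""

    def __init__(self, vocab_size=512, n_styles=8, d=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d)
        # Convolutions score local patches of the discrete score rather
        # than the whole sequence at once.
        self.body = nn.Sequential(
            nn.Conv1d(d, d, kernel_size=4, stride=2), nn.LeakyReLU(0.2),
            nn.Conv1d(d, d, kernel_size=4, stride=2), nn.LeakyReLU(0.2),
        )
        self.adv_head = nn.Conv1d(d, 1, kernel_size=1)         # real/fake per patch
        self.cls_head = nn.Conv1d(d, n_styles, kernel_size=1)  # style per patch

    def forward(self, tokens):
        h = self.body(self.emb(tokens).transpose(1, 2))
        return self.adv_head(h), self.cls_head(h)

def discriminator_loss(disc, real, fake, style_ids):
    # Loss_GAN: distinguish real music from generated music.
    adv_r, cls_r = disc(real)
    adv_f, _ = disc(fake)
    loss_gan = (F.binary_cross_entropy_with_logits(adv_r, torch.ones_like(adv_r))
                + F.binary_cross_entropy_with_logits(adv_f, torch.zeros_like(adv_f)))
    # Loss_Cls: recognise the style of real music on every patch.
    patch_styles = style_ids[:, None].expand(-1, cls_r.size(-1))
    loss_cls = F.cross_entropy(cls_r, patch_styles)
    return loss_gan + loss_cls  # per the ablation, dropping either term hurts
```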
