
The State of AI Music Generation

This is a list of tools, VSTs, services, websites and research on AI music generation. Last updated: 30/Mar/2024.

I’ve divided the list into four areas: Research, Websites & Services, AI Vocalists, and VSTs & Plugins for DAWs. The use of AI models varies quite a bit across the board: the first section primarily focuses on prompt-to-music, aka “ChatGPT for music”, but there are also tools and VSTs that use RNNs, deep neural nets, diffusion models, etc.

I’ve also provided my opinions and thoughts on some of these tools. I didn’t have a chance to try out every website/service, nor did I purchase every VST (some are a bit expensive), but I did go through example videos, and sometimes comparison videos with other tools/services, to get a high-level overview of each tool/service/plugin. Hopefully it is useful to you as a reference list.

State-of-the-art research in text-to-music prompts and audio effect generators

  • Meta: MusicGen (part of the AudioCraft research line)
  • Google: MusicLM
  • Stability AI
    • Stable Audio: generative models based on latent audio diffusion
    • They had a few demos here that, while impressive, I would put on par with Google’s and Meta’s efforts so far. Check the official website here.
    • Another demo that seems to be more recent (and better!)
    • HN comment about Ed Newton-Rex, who built Stable Audio and quit after its release over training-data concerns; he went on to found https://www.fairlytrained.org/
  • Generative Music in Games
    • Research paper on the topic
    • Could be the most interesting area of research: on-the-fly generation of music that adapts to the character, story arc, etc. It could work incredibly well for games with a bunch of different endings, where every choice matters in the decision tree. That way the musicians and audio director can focus on the theme and primary music while leaving atmosphere and subtle background music to embedded AI music generators (a rough sketch of the idea follows this list).
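To make that idea a bit more concrete, here is a rough, hypothetical sketch of how a game loop might drive an embedded generator: map game state to a prompt, and only regenerate (or re-balance) music when the state changes significantly. Every name in it (GameState, build_prompt, generate_cue) is invented for illustration and does not refer to any real engine or model API.

```python
# Hypothetical sketch only: none of these names refer to a real engine or model API.
from dataclasses import dataclass

@dataclass
class GameState:
    tension: float      # 0.0 = calm exploration .. 1.0 = boss fight
    location: str       # e.g. "forest", "dungeon"
    story_branch: str   # which ending the player is heading towards

def build_prompt(state: GameState) -> str:
    """Turn game state into a text prompt for an embedded music generator."""
    mood = "driving, percussive" if state.tension > 0.6 else "ambient, sparse"
    return f"{mood} {state.location} theme, {state.story_branch} story arc"

def generate_cue(prompt: str, seconds: int = 30) -> bytes:
    """Placeholder for a call into whatever on-device model the engine embeds."""
    print(f"[music-gen] requesting {seconds}s of audio for: {prompt}")
    return b""  # a real implementation would return (or stream) audio here

def on_state_change(old: GameState, new: GameState) -> None:
    # Only regenerate when the shift is big enough to justify a new cue;
    # smaller changes could just re-balance stems of the current cue instead.
    if new.location != old.location or abs(new.tension - old.tension) > 0.3:
        cue = generate_cue(build_prompt(new))
        # the audio engine would then crossfade from the current cue to `cue`

# Example: the player walks from a calm forest into a dungeon
on_state_change(
    GameState(tension=0.2, location="forest", story_branch="redemption"),
    GameState(tension=0.7, location="dungeon", story_branch="redemption"),
)
```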

Websites and services to generate music and sound effects with AI

  • Audiogen.co: Doesn’t mention which model they are using, but it seems very likely to be one of the research models covered above
  • Lalal.ai: Extracts or isolates stems in music, so you can, for example, remove the drums and overdub your own, or use the vocal takes in a remix (for an open-source way to do this locally, see the sketch after this list)
  • AI Mastering Services: Quite a few options available here, with more popping up all the time. AI mastering has already been a thing for a while, but certain tools/VSTs and websites are making it much easier to use and afford. This article from Ars Technica includes a very good breakdown and comparison among popular services such as LANDR, Ozone, etc.
  • Soundraw: An interesting one, as it doesn’t follow the usual prompt-to-music principle; it uses segments built from samples submitted by the devs, and you can turn segments on and off and rearrange them, like in a rudimentary DAW, as well as generate new segments. A big pro is that you can try it out without making an account, but I’m unsure about the licensing aspects (there is a FAQ that dives into this, however)
  • Beatoven.ai: At a glance seems similar to the previous one
    • Fast registration (requires email confirmation)
    • Generation took less than a minute, with 4 alternate tracks
    • Quality was poor, and the default length was very short (paid plans start at $6)
  • AIVA: AI music generation assistant
    • Login / account creation was broken when I tried it out
  • Loudly: Also similar to previous websites
    • Quick registration process
    • Instead of being purely prompt-based, it offers parameters where you select duration, instruments, genre, etc.
    • Generation is FAST
    • Provides 3 options per generation; I liked some of them, but I’d like to see how much variety it generates with longer tracks
    • Limited to 31-second output for free; you have to upgrade to get longer tracks
  • Suno: Also similar to the previous websites.
    • BEST choice out there right now in my opinion
    • Very simple interface, got 50 credits for free upon logging in the first time.
    • V3 of their engine allows you to make a track up to 2 minutes long, you can also toggle to generate instrumental-only songs.
    • It took a couple of minutes to finish, and it generated 2 options for 5 credits per track, which means you can try 5 prompts for free
    • It allows you to continue from the end of a previously generated track, which lets you “stitch the results” together to create longer tracks
    • The results were quite interesting. I asked for “a droning techno track with a slow buildup towards an epic theme and then a decay back into techno”… The first option delivered just this, albeit a bit lackluster, but the second option went from absolute rubbish to amazing, bright, vivid, super-uplifting music, which quite surprised me
    • Login only works with Discord, Google or Microsoft (why no email option?)
  • Boomy: Untested. Con: Need to make an account
  • Musicfy AI: Similar to previous but with AI vocals as well?
  • Soundful
    • Pro: The license covers everything: videos, websites, corporate use, games, etc.
    • Pricing is per year, at less than half of what Epidemic Sound costs per year. Or you can pay a bit more to obtain full copyright over all creations rather than just a usage license.
    • It’s the first one on this list billed as “royalty-free music for creators”. I think this is what a lot of us expected when AI music generation started becoming viable: that its primary use case would be generating background music for videos and live streams on Twitch and YouTube, not creating new content to flood streaming services like Spotify with auto-generated songs… yet several of the last 6 or 7 services seem aimed at people without music or singing experience, helping them create tracks and publish them on streaming services
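As promised above for the stem-separation item: if you want to experiment with splitting stems locally rather than through a service, the open-source Spleeter library offers a similar capability. This is just a minimal sketch of Spleeter’s documented Python API, not Lalal.ai’s (proprietary) models, and the input file name is a placeholder.

```python
# Minimal sketch using the open-source Spleeter library (pip install spleeter).
# "my_track.mp3" is a placeholder path.
from spleeter.separator import Separator

# The 4-stem model splits a mix into vocals, drums, bass and other
separator = Separator("spleeter:4stems")

# Writes vocals.wav, drums.wav, bass.wav and other.wav into out/my_track/
separator.separate_to_file("my_track.mp3", "out")
```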

There’s a section in Colin and Samir’s Jacob Collier interview where they dive into the topic of AI and music, and I really loved the positive attitude, especially where he talks about crafting new sounds and using AI as a tool to unlock or further creativity.

For a “music world builder” (as Jacob Collier describes himself), it can be instrumental to find sounds that change the way we dream up and work with compositions, venturing into unexplored audio territory, much like analog and modular synths have opened up a new world of possibilities over the past 50 or 60 years, or granular synthesis over the past 20 to 30 years.

It seems anyone with a say on AI and music generation takes one of three stances:

  • Hate it: afraid of the implications for copyright, and put off by the potential saturation with “fake work” and the dilution of quality music in a sea of artificial sounds.
  • Afraid and unsure about what it means: how much impact will it have, will it be used as a tool or will people use it to create full works, will it take away jobs?
  • Positive but slightly wary: knowing there is potential for wrongdoing, but believing the toolset it gives us outweighs the cons, such as crafting new soundscapes, imagining sounds we could never have thought possible otherwise, helping create draft soundscapes we can then build upon, or helping unblock the continuation of songs for artists struggling with a creative block

AI singers and artist imitation

AI virtual music instruments and VSTs

  • Synplant 2: Lets you craft sounds by growing and tweaking branches - this is an EPIC plugin!
  • iZotope Elements: RX, Neutron, Ozone and Nectar plugins
  • Guitar Rig 7 Pro: Guitar and bass amp simulators. Uses AI (with convolution?) to craft epic guitar tones
    • For example, you feed it the tone of an amp or a sound you are looking for, and it adjusts its parameters to let you reach that sound without having to own expensive amps/preamps. It is expensive though. (A toy sketch of this parameter-matching idea follows this list.)
    • Great example video
  • Focusrite’s FAST bundle: Makes mixing much easier
  • Magenta Studio (for Ableton)
    • I covered 1.0 previously, but 2.0 is now out and has been redesigned to be much simpler to use. It only works in Session View, but you can drag and drop the clips into Arrangement View. A very easy plugin for making drum loops sound a bit more natural, or for extending parts of a song.
    • Powered by RNNs
  • Synthesizer V - Voice synthesizer
    • From lyrics and MIDI to vocals, without having to open your mouth! I’m still conflicted about this area of plugins; it opens doors and lowers the barrier to entry, but on the other hand live music is amazing, vocals are often the centrepiece, and I’m not sure I want to live in a world where even more emphasis is placed on the DJ, especially when no vocalist is required anymore and everything can be virtual. Using it in combination with real lyrics, or in a few tracks where it can be used stylistically: hell yes! But as a full replacement? I’m not sure. I’m still on the fence.
    • The price? 89 dollars. It sounds amazing - scarily amazing - and supports 3 languages as of February 2024
    • Research
    • Check out the website for videos/examples
    • Deep neural network-based
  • Vocaloid 6 (by Yamaha) - Voice synthesizer
    • Another vocal synth, a bit less impressive than Synthesizer V in my opinion, with fewer voices and quite a bit more expensive out of the box (although you can pay per voice pack with Synthesizer V). It also supports 3 languages. IMO it sounds a bit more robotic, yet for some styles this might be the right fit!
    • Research (same link as Synthesizer V)
    • Check out this example video from their website
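As mentioned in the Guitar Rig item above, here is a toy sketch of the parameter-matching idea. This is NOT Native Instruments’ actual algorithm; it only illustrates the general principle of “adjust a plugin’s parameters until its curve approximates a target tone”, using a made-up three-band EQ model and a hypothetical target curve.

```python
# Toy sketch only: not Guitar Rig's real algorithm, just the optimisation idea.
import numpy as np
from scipy.optimize import minimize

freqs = np.geomspace(50, 15_000, 256)  # analysis frequencies in Hz

def eq_curve(params: np.ndarray) -> np.ndarray:
    """Summed dB response of three bell-shaped bands (centre Hz, gain dB, width)."""
    curve = np.zeros_like(freqs)
    for centre, gain_db, width in params.reshape(3, 3):
        centre = max(abs(centre), 20.0)  # keep band parameters in a sane range
        width = max(abs(width), 0.05)
        curve += gain_db * np.exp(-((np.log(freqs) - np.log(centre)) / width) ** 2)
    return curve

# Hypothetical target: a mid-forward "amp tone" we want to imitate
target = 6.0 * np.exp(-((np.log(freqs) - np.log(800.0)) / 0.8) ** 2) - 3.0

def loss(params: np.ndarray) -> float:
    return float(np.mean((eq_curve(params) - target) ** 2))

# Start from three flat bands and let the optimiser move them towards the target
x0 = np.array([150, 0, 0.5, 800, 0, 0.5, 4000, 0, 0.5], dtype=float)
fit = minimize(loss, x0, method="Nelder-Mead", options={"maxiter": 5000})
print("fitted (centre Hz, gain dB, width) per band:")
print(fit.x.reshape(3, 3))
```

The real plugin presumably works on measured amp responses rather than a hand-drawn curve, but the fit-parameters-to-a-target loop is the part the example is meant to show.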

Continuation of my previous rant:

Even though the AI singers imitating known artists generated a lot of drama, in my opinion it pales in comparison to voice synthesis… I mean, a lot of people were already frustrated with Auto-Tune, since it means you don’t even need to know how to sing; YouTubers such as Roomie have made many videos comparing Auto-Tune to real vocals. Some artists would use it covertly to fix mistakes, but it also spawned a use case where it was used as a texture: the whole point was cranking it to 10 and making it a stylistic choice…

And I just think that, in a way, AI voice synthesis for singing means some people who make a living singing on tracks may earn less, if producers no longer have to find the right talent (not even someone with a voice to then Auto-Tune) and can skip the vocalist completely. How will that work in a live setting?

These are just some of my raw thoughts, as all of this is still quite new and I don’t know how it will play out in the long run. But the pace of innovation in AI music generation and voice synthesis seems to be accelerating, and I’m very curious what’s going to come out next, and how these tools will be used once the dust has settled.

Other tools and services

There are a bunch of lists of top AI tools, including quite a few I didn’t cover, like Orb Producer Suite 3, the Playbeat sequencer, Atlas 2, Spark Chords and others… This website covers some of them if you are interested in more overviews and links to the plugins.

Thanks for reading!