Is the future of game sound in jeopardy due to the rise of artificial intelligence, or is it on the cusp of an exciting new era? We discussed what the future holds with our community…
The future of game sound is a topic constantly being discussed. The advancements we've seen in games development as a whole in the last 30 years are staggering, and we're continuing to see truly mind-blowing results as the technology is refined.
At dBs Institute, we've seen first-hand how technology, particularly artificial intelligence, is changing the landscape of what's possible in game sound. You only have to look back at the collaboration between Charisma.ai and dBs Pro on 'The Kraken Wakes' to see the impact AI voice actors can have on player immersion.
But what is the true impact that these emerging technologies have had on the current workflows of game audio professionals, and how will it change over the next 5-10 years? To discuss this question, we invited composer, sound designer and dBs alum Virginia Leo (Team Terrible Games), technical sound designer and dBs graduate Matthew Owen (Virtex), and software developer and dBs tutor William Sawyerr to share their perspectives.
Has your suite of tools / processes changed due to emerging technologies?
Virginia Leo: Honestly, it has not changed much, but the ways it has changed have only been beneficial to my workflow. There are some useful tools out there, such as Splitter AI, which I recently became aware of when one of my work colleagues showed it to me. It's great for separating and isolating music layers such as vocals, percussion, and bass, and exporting them as individual stems. This can all be done within a browser as well, which is very convenient (https://vocalremover.org/splitter-ai). I use this occasionally in conjunction with Splice samples and it works really well; however, it's not perfect.
I find that emerging technologies can be exciting, but they often fail to deliver exactly what they set out to achieve. I don't rely too heavily on them because they are often unreliable, and it could be a while until they reach their full potential, but I understand the importance of being aware of them so you can follow their development within the industry.
Matt Owen: So far, my direct tools haven't changed just yet, but I have been closely watching the rise of AI, both in voice creation and in smart NPCs. The way you can now give a simple one-sentence prompt and an AI tool will hand you a fully fledged song is interesting (whilst also being scary!).
Where do you foresee the future of game sound going?
Virginia Leo: I believe the future of game sound has a few different paths it could take depending on which industry we are talking about.
From the perspective of indie video game development, which is what I am more familiar with, I believe that emerging tools and technologies can help smaller studios create assets much more quickly and efficiently on tighter budgets. However, I think that in terms of AI, the world of indie games will always have more of a human characteristic compared to, for example, AAA video game development.
I think the larger companies will be investing in these technologies at a higher rate and they will pave the way for smaller studios to follow, should they even decide to. I also think that procedural audio is something that could become even more widespread in all areas of game audio. It has already existed in the field of game audio for many years and it will only continue to become more powerful as technology advances.
Matt Owen: I'd love to see continued research into AI voice use. For example, having NPCs that talk back to you dynamically, and do so with realistic voices, will be a game-changer for a lot of games.
Emotionally driven conversations that are dynamic would be a huge win for world-building, especially for smaller studios who may not be able to afford the large costs associated with recording lots of voice actors.
I'd also love to see more dynamic, easier-to-implement reverb systems that react to destruction and material changes in the environment.
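To make that idea concrete, here is a minimal C++ sketch of one way such a system could work, assuming a simple Sabine RT60 estimate drives a reverb's decay-time parameter. It isn't tied to any engine, and all the names and numbers are invented for the example; real acoustic modelling is far more involved. When a surface's material changes, the room's total absorption is re-summed and the estimated reverb tail shortens or lengthens accordingly:

```cpp
#include <iostream>
#include <vector>

// Hypothetical illustration: recompute a Sabine RT60 estimate when room
// materials change, then feed it to a reverb's decay-time parameter.
// Not engine code; every name here is made up for the sketch.

struct Surface {
    double areaM2;      // surface area in square metres
    double absorption;  // absorption coefficient (0 = reflective, 1 = absorbent)
};

double sabineRT60(double roomVolumeM3, const std::vector<Surface>& surfaces) {
    double totalAbsorption = 0.0;  // A = sum of (area * coefficient)
    for (const Surface& s : surfaces)
        totalAbsorption += s.areaM2 * s.absorption;
    return 0.161 * roomVolumeM3 / totalAbsorption;  // Sabine's formula
}

int main() {
    std::vector<Surface> room = {
        {100.0, 0.02},  // concrete walls: highly reflective
        { 50.0, 0.02},  // concrete floor
        { 50.0, 0.60},  // acoustic ceiling tiles
    };
    double volume = 250.0;  // e.g. a 10m x 5m x 5m room

    std::cout << "RT60 before: " << sabineRT60(volume, room) << " s\n";

    // A wall is blown out and now behaves like an open, absorbent boundary,
    // so the reverb tail should shorten audibly.
    room[0].absorption = 0.9;
    std::cout << "RT60 after:  " << sabineRT60(volume, room) << " s\n";
}
```

In this toy room, destroying the reflective wall pushes the total absorption from 33 to 121 sabins, so the estimated tail drops from roughly 1.2 s to 0.3 s, exactly the kind of audible reaction to destruction Matt describes.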
William Sawyerr: There are several different avenues of sound that I foresee changing in the future. I believe game audio will increasingly leverage spatial audio technologies like Dolby Atmos, DTS:X, and Sony's 3D audio for PlayStation 5 to really push the boundaries of player immersion.
I think we’ll start to see more dynamic and responsive audio. Procedural audio generation, where sound effects are created algorithmically in real-time, will allow for more varied and nuanced audio experiences. Adaptive music systems that change based on player actions, game state, or emotional tone will become more sophisticated, too, offering a seamless auditory experience that feels alive and reactive.
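As a rough illustration of the adaptive-music idea, here is a small self-contained C++ sketch in which stems fade in and out as a single "intensity" value tracks the game state. The layer names, thresholds, and fade speed are all invented for the example; a real system would be driven by actual game events and a proper audio engine:

```cpp
#include <algorithm>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical sketch of an adaptive music layer mixer: each stem has a
// target gain driven by game state (here, a 0-1 "combat intensity"), and
// gains ease toward their targets each update so transitions stay smooth.

struct MusicLayer {
    std::string name;
    double threshold;   // intensity at which this stem fades in
    double gain = 0.0;  // current gain, smoothed per update
};

void updateLayers(std::vector<MusicLayer>& layers, double intensity, double dt) {
    const double fadeSpeed = 2.0;  // full fade takes about half a second
    for (MusicLayer& layer : layers) {
        double target = intensity >= layer.threshold ? 1.0 : 0.0;
        double step = fadeSpeed * dt;
        layer.gain += std::clamp(target - layer.gain, -step, step);
    }
}

int main() {
    std::vector<MusicLayer> layers = {
        {"ambient pads", 0.0}, {"percussion", 0.4}, {"brass stabs", 0.8},
    };
    // Simulate intensity rising as a fight breaks out, 60 updates/second.
    for (int frame = 0; frame <= 180; ++frame) {
        double intensity = std::min(1.0, frame / 120.0);
        updateLayers(layers, intensity, 1.0 / 60.0);
        if (frame % 60 == 0) {
            std::printf("t=%ds intensity=%.2f:", frame / 60, intensity);
            for (const MusicLayer& l : layers)
                std::printf("  %s=%.2f", l.name.c_str(), l.gain);
            std::printf("\n");
        }
    }
}
```

The same pattern scales up: swap the intensity value for any game-state signal, and the per-layer thresholds for whatever musical logic the composer wants.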
AI will play a significant role in game audio development too, from automating sound design and mixing to enhancing dialogue generation and lip-syncing. Machine learning can be used to analyse player behaviour and preferences, adapting the audio experience in real-time for a more personalised experience.
Finally, with the rise of cloud gaming, more audio processing could be offloaded to the cloud, allowing for more complex and high-quality soundscapes without taxing the local hardware. This also enables real-time audio updates and streaming of new audio content as the game evolves.
Are there any specific facets within game sound that you feel will change dramatically?
Virginia Leo: Whatever changes come will certainly be prominent among those of us who work in game audio. Our workflows may change dramatically, but I don't think the changes will be perceived as strongly by the audience playing the video games we work on.
I recently watched an interview with the late Nintendo president Satoru Iwata, where he mentioned that even if you had a machine with 10x the processing power and 20x the graphics capability, that mostly means more workload for developers; it doesn't necessarily mean the audience will clearly recognise the difference from the previous machine.
I think the first changes will be more noticeable on the backend of programming, and I think it's a positive change for us creatives. As much as I enjoy audio implementation, I'd be lying if I said I wasn't looking forward to spending less time implementing audio and more time creating it when the technologies advance even further.
Matt Owen: I believe voice acting could potentially change for lesser roles within games (NPCs). Without going into the legalities too much, I think voice actors could start finding ways to sell "their voice" for use with these new AI NPC tools.
William Sawyerr: Dialogue is definitely going to be an area of change, but it's one of many ways AI is impacting the industry.
Think AI-driven dialogue giving players a more dynamic experience, localisation and accessibility improvements through machine learning, AI-generated environmental soundscapes, and emotion AI, which analyses a player's emotions within the context of the game.
We could also start to see more developers introduce a biometric element to their games, using a player's biometric data to enhance and adapt the soundscape based on their physiological state.
Outside of AI, when we look at middleware tools like Wwise and FMOD and how they continue to evolve, we’re seeing deeper integration with game engines and a more intuitive interface for sound designers. By extension, I think we will also start to see more in-game audio mixers allowing players to customise their audio on-the-fly, and to a greater level than what we’ve seen before.
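A player-facing mixer of that kind usually sits on top of a bus hierarchy. Here is a hedged C++ sketch of the underlying idea, with hypothetical bus names and no specific middleware implied: each options-menu slider sets a bus volume, and a sound's final gain is the product of the volumes up its bus chain.

```cpp
#include <cstdio>
#include <map>
#include <string>

// Illustrative sketch of the bus layout behind a player-facing audio
// options menu. Bus names and structure are invented for the example.

struct Bus {
    double volume = 1.0;  // slider value, 0.0 - 1.0
    std::string parent;   // empty for the master bus
};

std::map<std::string, Bus> buses = {
    {"master",   {1.0, ""}},
    {"music",    {1.0, "master"}},
    {"sfx",      {1.0, "master"}},
    {"dialogue", {1.0, "master"}},
};

// Effective gain = product of volumes up the bus chain to the master.
double effectiveGain(const std::string& busName) {
    double gain = 1.0;
    for (std::string name = busName; !name.empty(); name = buses[name].parent)
        gain *= buses[name].volume;
    return gain;
}

int main() {
    // The player turns music down and the master down in the options menu.
    buses["music"].volume = 0.3;
    buses["master"].volume = 0.8;

    std::printf("music gain:    %.2f\n", effectiveGain("music"));     // 0.24
    std::printf("dialogue gain: %.2f\n", effectiveGain("dialogue"));  // 0.80
}
```

Exposing a few more buses (ambience, UI, footsteps) is all it takes to give players the finer-grained control William anticipates.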
What innovative tools / techniques aren't getting that much attention, but you feel are really exciting for the future of game sound?
Virginia Leo: I think projects like Google Magenta can be very inspiring. I hope to see more projects like it develop in the future, because they're a great source of ideas that can be incorporated into the creative process.
I understand the controversies that arise around using AI tools and technology in combination with art and music, but I also see a harmony where the worlds of technology and creativity collide, especially when it comes to video game music, where it's all about mixing and matching artistic and technical disciplines.
As creatives, we can't always find instant inspiration; we get burnt out, and sometimes we just can't find the drive. That's completely normal, and I think it's part of being human. Even when a demanding career in an ever-expanding industry requires us to be creative, we can't always find the spark, and I think that's when technology can become a useful tool.
Matt Owen: With UE5 came the introduction of MetaSounds, which work well with Blueprints. This has given us designers a lot more control over DSP, and I've seen some very cool dynamic music systems built with it (using Quartz timing). MetaSounds feels like something smaller studios could use as their audio solution (as opposed to FMOD/Wwise middleware) while still getting an impressive suite of tools.
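The core trick behind Quartz-style systems is scheduling events on musical boundaries with sample accuracy rather than on the game's frame tick. The sketch below shows only the underlying arithmetic in plain C++, not the actual UE5 Quartz API: given a tempo and time signature, it snaps a trigger requested "now" to the first sample of the next bar, so a stinger lands exactly on the downbeat.

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

// Rough illustration of the quantisation idea behind beat-synced triggering.
// This is NOT the UE5 Quartz API, just the arithmetic such a system relies on.

int64_t nextBarSample(int64_t nowSample, double bpm,
                      int beatsPerBar, double sampleRate) {
    double samplesPerBeat = sampleRate * 60.0 / bpm;
    double samplesPerBar  = samplesPerBeat * beatsPerBar;
    // Index of the bar we're currently inside, then the start of the next one.
    int64_t barIndex = static_cast<int64_t>(nowSample / samplesPerBar);
    return static_cast<int64_t>(std::ceil((barIndex + 1) * samplesPerBar));
}

int main() {
    const double sampleRate = 48000.0;
    const double bpm = 120.0;   // 2 beats per second
    const int beatsPerBar = 4;  // 4/4: one bar = 2 s = 96000 samples

    int64_t now = 130000;       // mid-bar, somewhere inside bar 1
    int64_t fireAt = nextBarSample(now, bpm, beatsPerBar, sampleRate);
    std::printf("trigger at sample %lld (%.3f s)\n",
                static_cast<long long>(fireAt), fireAt / sampleRate);
}
```

Because the target is expressed in samples rather than frames, the event can be handed to the audio renderer ahead of time and land precisely on the bar line, which is what makes these dynamic music systems feel tight.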
William Sawyerr: Real-time 3D audio engines like Embody's Immerse and Steam Audio are moving really quickly. They allow for more precise sound placement and acoustic modelling, which has massive implications for the realism of game audio, yet they are still being underutilised.
I’ve mentioned procedural audio already, but I do think software such as Tsugi’s GameSynth and other procedural audio tools are making strides towards offering sound designers more tools to create complex and dynamic soundscapes without the need for large libraries of pre-recorded sounds.
Neural synthesis is another really exciting area for the future of game sound. The use of neural networks to synthesise realistic sound effects and music on the fly is an emerging field, one that can dramatically reduce the footprint of audio files in games while still offering high-quality, real-time audio.
We’re seeing Audiokinetic’s Wwise Reflect and Unity’s built-in audio tools evolve to provide more intuitive and powerful ways to integrate and manipulate audio directly within game development environments. Plus, advanced HRTF (Head-Related Transfer Function) implementations can provide more accurate and individualised 3D audio experiences, especially in VR and AR applications. These systems can adapt to different head and ear shapes for a more personalised auditory experience.
Do you want to be on the cutting edge of the future of game sound? Check out our brand-new MA Game Sound degree.
FIND OUT MORE
https://www.virginia-leo.com/
https://www.linkedin.com/in/matthew-owen-a04152182/
https://www.dbsinstitute.ac.uk/team/william-sawyerr