Abbey Road's Mirek Stiles Explores The Tempest 3D Audio Tech for Sony's PS5

Last week Sony PlayStation’s lead system architect and all-round gaming legend Mark Cerny presented a technical deep dive into the system architecture for the much-anticipated Sony PS5, and I have to say the messaging around audio is very exciting indeed (if you are into that sort of thing).

Audio has always felt like an afterthought, or at least lower down the priority list, when it comes to spec’ing out a games console. The PS4 even took a few steps backwards from the PS3 in the sound department, because the Cell Broadband Engine in the PS3 had SPUs (Synergistic Processing Units) that were very well suited to audio processing, and the PS4 had no direct equivalent. So it’s massively encouraging to see this announcement from Sony, which in a nutshell is rather game-changing (excuse the pun) for the audio department.

The main message from Sony is “3D Audio for All”. The Sony vision for 3D audio will be delivered via the rather epically named Tempest 3D Audio Tech. The Tempest Engine is based on AMD GPU (Graphics Processing Unit) technology and has been modified so that it behaves very much like the SPUs found in the PS3, but with a little more kick. It has no caches and all data is accessed via Direct Memory Access, independent of the CPU (Central Processing Unit). Using a GPU to process audio (as opposed to graphics) is something that’s been explored in the past by various parties, as in theory its parallel design can handle thousands of operations at once, versus just a handful on a CPU, which is designed for largely serial processing. Using GPU technology to process audio can be a tricky nut to crack, but it looks like Sony might have something up their sleeve.

To give some context to the almighty power of the Tempest Engine: the PS4 has eight “Jaguar” CPU cores, and with fierce competition for those cores, sound would typically get a fraction of one core for an entire game (so unfair). The Tempest Engine is roughly equivalent to all eight Jaguar cores combined, and that’s just for sound processing! In the past, for 99.9% of games, audio teams were expected to achieve a lot with very little; hopefully this will no longer be the case. This feels like a large leap in the right direction towards unleashing the true potential of game audio.

Mark explained that, for him personally, a game is dead without audio, and that the quality of the audio has a huge impact.
 
 
The goals Sony addressed are:

1. Great Audio for everyone – so the audio is part of the console itself, and not just a peripheral

2. Support hundreds of sound sources – every sound in the game to have dimensionality, so developers don’t have to make compromises

3. Address the challenges of presence and locality – presence meaning you feel you’re really there and locality referring to directions from all around you, including sounds from above and below


The reference given in the presentation was to imagine the sound of rain in a video game. Usually this would come from a single sound source, or bed, but with the power of Tempest each raindrop could have its own sound source, falling all around you. The sense of immersion would be massively amplified. This would be possible because the Tempest Engine is capable of delivering around 5000 sound sources. By way of comparison, Dolby Atmos can deliver 32. That is one of the reasons Sony decided not to go down the Dolby route; the other is that the key message is “3D Audio for All”, and not just for those with licensed soundbars and devices.
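To make the rain example a little more concrete, here is a minimal Python sketch of what "each raindrop is its own sound source" might look like as data. It is not Sony's API. The function names, the coordinate convention (listener at the origin, x forward, y left, z up) and the simple inverse-distance gain are all assumptions of mine for illustration.

```python
import math
import random

def describe_source(x, y, z):
    """Return (azimuth_deg, elevation_deg, gain) for a source at (x, y, z).

    Assumed convention: listener at the origin, x forward, y left, z up,
    distances in metres. Gain is a plain inverse-distance law clamped at
    1 m; a real engine would likely use its own rolloff curve.
    """
    distance = math.sqrt(x * x + y * y + z * z)
    azimuth = math.degrees(math.atan2(y, x))
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    gain = 1.0 / max(distance, 1.0)
    return azimuth, elevation, gain

def make_rain(count, seed=0):
    """Scatter `count` hypothetical raindrop impacts around the listener."""
    rng = random.Random(seed)
    drops = []
    for _ in range(count):
        x = rng.uniform(-10.0, 10.0)
        y = rng.uniform(-10.0, 10.0)
        z = rng.uniform(-1.7, 3.0)  # from the ground up to overhead surfaces
        drops.append(describe_source(x, y, z))
    return drops

# One entry per raindrop, instead of a single pre-mixed rain bed.
rain = make_rain(5000)
```

Each tuple would then be handed to a spatial renderer, so that every drop gets its own direction and level rather than sharing one stereo loop.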

The key to unlocking 3D audio for all is the use of HRTFs (Head-Related Transfer Functions) to render the spatial sound over any pair of headphones. The PS5 will initially launch with 5 profiles to choose from to best suit the user, but in future there will be functionality to create your own personal HRTF via photos/videos of your head and ears, or via some sort of video game to tune your ears. This is important because, in my experience, using a personalised HRTF does improve the clarity of spatial information and could give the player an edge, especially in first-person shooters.
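For anyone wondering what "rendering via an HRTF" means in practice, the core idea is that a mono source is filtered once for the left ear and once for the right, using a pair of impulse responses (HRIRs) measured for the direction the sound should come from. The sketch below shows that idea only. The HRIR values are made-up toy numbers, not measured data, and the function names are my own.

```python
def convolve(signal, impulse_response):
    """Direct-form FIR convolution (output length: len(signal) + len(ir) - 1)."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono source through a left/right HRIR pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)

# Toy HRIR pair for a source on the listener's left (hypothetical values):
# the left ear hears it immediately at full level, the right ear hears it
# two samples later and quieter.
TOY_HRIR_LEFT = [1.0, 0.0, 0.0]
TOY_HRIR_RIGHT = [0.0, 0.0, 0.5]

left, right = render_binaural([1.0, 0.5], TOY_HRIR_LEFT, TOY_HRIR_RIGHT)
```

Real HRIRs are hundreds of samples long and differ from person to person, which is why the choice of profile, or a personalised measurement, matters so much.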
 
 
It was noted by Mark that 3D audio is a major academic research topic and that no one currently has all the answers, but Sony seem to be focusing on binaural audio over headphones, using HRTFs to render not only audio objects but also Ambisonic sound fields. The fact that Mark had a slide in his presentation dedicated to Ambisonics, and Higher Order Ambisonics (HOA) from what I could tell, is exciting (for someone who has been playing around with the technology over the last few years), and I think it’s the first time it’s been given such prominence in a presentation for a mass consumer device. The use of HOA allows developers to create extremely detailed 3D sound beds to complement the various sound objects, and when used together the end results can be spectacular.
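As a rough illustration of what an Ambisonic sound field is, the sketch below encodes a single mono sample into first-order Ambisonics and shows how the channel count grows with order. I have assumed the common AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalisation, azimuth measured anticlockwise from straight ahead). Nothing here is taken from Sony's presentation, and the function names are my own.

```python
import math

def ambisonic_channel_count(order):
    """A full-sphere Ambisonic signal of order N has (N + 1)^2 channels."""
    return (order + 1) ** 2

def encode_first_order(sample, azimuth_deg, elevation_deg):
    """Encode one mono sample to first-order Ambisonics.

    Assumed convention: ACN channel order (W, Y, Z, X), SN3D normalisation,
    azimuth anticlockwise from the front, elevation upwards from horizontal.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample                                  # omnidirectional component
    y = sample * math.sin(az) * math.cos(el)    # left/right
    z = sample * math.sin(el)                   # up/down
    x = sample * math.cos(az) * math.cos(el)    # front/back
    return (w, y, z, x)

front = encode_first_order(1.0, 0.0, 0.0)    # source straight ahead
above = encode_first_order(1.0, 0.0, 90.0)   # source directly overhead
```

First order needs only 4 channels, while third order needs 16, which is where the extra spatial detail of HOA comes from and also why it is more demanding to store and process.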

The strategy for launching “3D Audio for All” initially via headphones makes sense because most people have access to a pair of headphones, and with headphones you can control exactly what each ear hears. Sony are also developing what they call Virtual Surround, which I assume will be some sort of binaural rendering over speakers. This is tricky technology to crack, mainly due to the crosstalk inherent in speaker playback, where each ear also hears the opposite speaker. I have heard several examples of this technology from start-ups and academic research, and it is possible, albeit with a relatively small “sweet spot”. Sony said they have their own version currently working and are looking at ways to enlarge the sweet spot, as well as improving how accurately sounds can be localised. It will be interesting to see what they come up with.
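Sony have not said how their Virtual Surround works, so the following is only a sketch of the textbook idea behind crosstalk cancellation, not a description of their method. At a single frequency, the paths from two speakers to two ears can be written as a 2x2 matrix of complex gains, and pre-filtering the binaural signal with the inverse of that matrix delivers the intended signal to each ear. The matrix values below are hypothetical.

```python
def invert_2x2(m):
    """Invert a 2x2 matrix of complex numbers."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("transfer matrix is singular at this frequency")
    return ((d / det, -b / det), (-c / det, a / det))

def apply_2x2(m, v):
    """Multiply a 2x2 matrix by a 2-element vector."""
    return (m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1])

# Hypothetical speaker-to-ear transfer matrix at one frequency.
# Rows are ears (left, right), columns are speakers (left, right).
# The off-diagonal terms are the crosstalk paths.
H = ((1.0 + 0.0j, 0.4 - 0.2j),
     (0.4 - 0.2j, 1.0 + 0.0j))

C = invert_2x2(H)                       # crosstalk canceller
binaural = (1.0 + 0.0j, 0.0 + 0.0j)     # target: sound in the left ear only
speaker_feeds = apply_2x2(C, binaural)  # what the speakers should play
at_ears = apply_2x2(H, speaker_feeds)   # what actually reaches the ears
```

The catch is that the matrix depends on where the listener's head is relative to the speakers. Move away from the position it was calculated for and the cancellation degrades, which is one way to think about the small sweet spot.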

So in summary it’s all rather promising and does genuinely feel like a fresh approach to gaming audio. The messaging around “3D Audio for All”, huge audio processing power, thousands of audio objects, Ambisonics, binaural audio and HRTFs is extremely exciting for those in the game audio development world. It will be interesting to see how game engines like Unity and Unreal react to this announcement, especially as they seem to be geared more towards traditional channel-based audio workflows like 5.1 and 7.1. For example, it’s currently not possible to import Higher Order Ambisonic files directly into Unity or Unreal, even though we have microphones, plugins and DAWs that let us easily create beautiful HOA sound fields. Hopefully this will change in the near future.

I will be keeping a keen eye on Tempest. As someone who has been experimenting with the possibilities of 3D audio, especially from a game engine workflow point of view, this announcement from Sony was very encouraging indeed. Bring on the new dimension in game audio!
 
