Abbey Road Red: 10 Trends for 2021 Red Talk - Immersive Audio and High Tech Hearables

6th July 2021

In our second blog post covering our 10 to watch in 2021 Red Talk, we would like to highlight our first featured trend and speaker. The trend was immersive audio and high-tech hearables, and our featured speaker was Dr. Poppy Crum, Chief Scientist at Dolby Laboratories and a leading light in the area of spatial audio and hearable hardware technology.

In the run-up to the event, we enjoyed getting to know Poppy and her key areas of interest for the talk, which we cover below.

 

The context we set was that immersive audio is firmly making its way into the mainstream in various ways: gaming experiences driven by more powerful hardware; hearables with head tracking and directional audio, like Apple's AirPods Pro and Max; smart devices with multi-directional speakers, like Amazon's Echo Studio; affordable Dolby Atmos soundbars; content being mixed in spatial audio formats like Atmos and made available to these devices; and more.

Coupled with this, advances in processing power, cloud computation, and hardware miniaturisation mean that our hearable hardware, from buds to over-ears, is becoming smarter and capable of increased personalisation, head tracking, and spatial audio reproduction. Whereas immersive audio used to be a niche content experience, it is now becoming widely accessible and affordable.

This was all before Apple announced its launch of spatial audio via Dolby Atmos in Apple Music, a huge step forward into the immersive mainstream. We threw this into the melting pot with Poppy.

 
 

What technologies do you feel will make an impact this year and next in the hearables and immersive audio space?

Technologies that enable deep personalisation. There are two layers to the technology driving this personalisation. The first is empathetic technology: something that listens to the body and its context (for example, readings from temperature, background noise level, or movement sensors) and then uses that context to drive experience and utility. Examples include adaptive music that responds to movement, and automatic noise suppression that suppresses background noise and enhances voices when people are talking in loud environments. The second is the Personalised Head-Related Transfer Function, or P-HRTF. The HRTF is a mathematical description of how ambient sound of different frequencies interacts with a listener’s physical body, specifically the contours and shapes of the listener’s head, ears, and torso.

We each have our own bespoke HRTF, and our brain has grown accustomed to its unique pattern of amplification and attenuation across the frequency spectrum. Use of an HRTF is critical for creating an experience of spatial audio. However, to date, most spatial audio algorithms make use of a single exemplar HRTF for everyone’s experience. This is a great limitation when you consider that the reason spatial audio algorithms include an HRTF is specifically to depict to a listener’s brain how sound interacts with the individual human body, and we all come in different shapes and sizes: one size doesn’t fit all.
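To make the "different shapes and sizes" point concrete, here is a minimal sketch of one cue that an HRTF encodes: the interaural time difference (ITD), the delay between a sound reaching the near ear and the far ear. It uses the classic Woodworth rigid-sphere approximation, which is an assumption brought in for illustration and not a method described in the talk. The head radii are illustrative values, not measurements.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # approximate speed of sound in air at room temperature


def woodworth_itd_seconds(head_radius_m: float, azimuth_deg: float) -> float:
    """Interaural time difference for a rigid spherical head (Woodworth model).

    Assumes a distant source at an azimuth between 0 degrees (straight ahead)
    and 90 degrees (directly to one side).
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth_deg must be between 0 and 90")
    theta = math.radians(azimuth_deg)
    return (head_radius_m / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


# Illustrative head radii only (hypothetical values, not data from the talk).
for radius_m in (0.080, 0.0875, 0.095):
    itd_us = woodworth_itd_seconds(radius_m, 90.0) * 1e6
    print(f"head radius {radius_m * 100:.2f} cm: ITD at 90 degrees = {itd_us:.0f} microseconds")
```

Even in this simplified model the delay scales directly with head size, which hints at why a single exemplar HRTF cannot match every listener. A real HRTF also captures frequency-dependent effects of the ears and torso that this sketch ignores.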

Including an HRTF as a filter is a critical stage in helping trick the brain into perceiving sound as being all around us, as it would be in the three-dimensional natural world. A one-size-fits-all approach doesn’t work when our brain has a lifetime of data to tell it otherwise. We are collectively very sensitive to these differences.
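As a rough sketch of what applying an HRTF as a filter can look like, the example below convolves a mono signal with a pair of head-related impulse responses (HRIRs, the time-domain counterpart of an HRTF), one per ear. The impulse responses here are toy values invented for illustration (a pure delay and attenuation at the far ear), and the function names are hypothetical. Real HRIRs are measured or estimated per listener and are far richer.

```python
from typing import List, Sequence, Tuple


def convolve(signal: Sequence[float], impulse_response: Sequence[float]) -> List[float]:
    """Direct-form linear convolution; adequate for short illustrative signals."""
    out = [0.0] * max(len(signal) + len(impulse_response) - 1, 0)
    for i, sample in enumerate(signal):
        for j, tap in enumerate(impulse_response):
            out[i + j] += sample * tap
    return out


def render_binaural(
    mono: Sequence[float],
    hrir_left: Sequence[float],
    hrir_right: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Filter one mono source through a left and a right HRIR."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy HRIRs for a source on the listener's left (assumed values, not measured):
# the left ear hears the sound immediately, the right ear later and quieter.
TOY_HRIR_LEFT = [1.0, 0.0, 0.0, 0.0]
TOY_HRIR_RIGHT = [0.0, 0.0, 0.0, 0.6]

click = [1.0, 0.0, 0.0]
left, right = render_binaural(click, TOY_HRIR_LEFT, TOY_HRIR_RIGHT)
print("left ear: ", left)
print("right ear:", right)
```

Swapping in a different pair of impulse responses changes what each ear receives, which is the step a personalised HRTF would tailor to the individual listener.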

Developments in mobile device cameras and sensors, paired with machine learning and AI, have enabled scalable, user-friendly, rapid, and accurate solutions for personalising the HRTF. Using a Personalised HRTF (P-HRTF) gives listeners improvements in the timbral quality of sounds; in the experienced spatial resolution in azimuth, depth, and height; in the consistency of image size, which reduces unintended masking differences; and in fatigue over longer periods of headphone listening. More generally, it allows a more accurate, consistent, and intimate translation of creative intent between the mixer and the listener. The positional stationarity of sounds experienced via head tracking is also improved with a P-HRTF.

If P-HRTF is an underlying driver of consistently successful high-quality immersive audio, a key area of opportunity for technology in the near future is the effective measurement of P-HRTFs from consumer devices.

Computational methods can now infer a listener’s P-HRTF from physical features of their body, captured by the cameras on mobile devices and processed with computer vision and machine learning. On many levels these methods have reached parity with the historic gold-standard approach, which required days of measurement in anechoic labs and could reach only a few select individuals. New methods of computing the P-HRTF from consumer devices allow a democratised solution that can reach individuals globally.

Diversity and inclusion in who gets to share in successful technology experiences is also an important issue here. Historically, most one-size-fits-all HRTF solutions have been modelled on a mid-sized European white male. Sex, body size, and ethnicity all influence our physical features which, in turn, influence the details of the personal HRTF (shaped by our bodies' contours) that our brain is adapted to. Enabling personalised solutions is a great start in bringing the opportunities of this technology to everyone.
 

What are the implications for sound designers and engineers?

So, in summary, the opportunity is here for technologies that make these measurements more accurate and easier for listeners to do, and that make the design of the hearable hardware that uses them more finely tailored or adaptable to the individual. Future technologies must enable more accurate P-HRTFs, better hardware fit, and compensation for shaping from the unique device transform. P-HRTF is only the beginning of personalisation. We will see hearables supporting improvements in hearing thresholds as well as other features.

It’s time to start thinking about the hearable as a personalised listening device that should work for all of us in ways that can augment immersion and how we interact with the world around us.

From a technology perspective, with new formats, it's got to be about enabling more meta-data that allows for richer creative control, and then focusing on technology implementations that consistently support translation of the creative intent from these experiences.

Making it successful is always about how effectively and consistently you translate creative intent into a meaningful perception for everyone, to enable a differentiated and great listening experience. When you introduce more meta-data, like tags for a sound’s distance (e.g. near, mid, far), you are enabling new ways of working with sound in the immersive environment.

These types of inclusions in meta-data allow more dimensional descriptions of a mixer’s creative intent, which in turn gives the technology that delivers the experience to the listener more opportunity to help achieve it.
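As a hedged illustration of the distance-tag idea, the sketch below attaches a near/mid/far tag to a sound object and lets a playback-side policy decide how to realise it. Every name and number here is hypothetical (`DistanceTag`, `AudioObjectMetadata`, `RENDER_HINTS`, the gain and reverb values). It does not reflect the metadata schema of Dolby Atmos or any other real format.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class DistanceTag(Enum):
    """Coarse distance label a mixer could attach to a sound (hypothetical)."""
    NEAR = "near"
    MID = "mid"
    FAR = "far"


@dataclass(frozen=True)
class AudioObjectMetadata:
    """Illustrative per-object metadata; field names are assumptions."""
    name: str
    azimuth_deg: float
    elevation_deg: float
    distance: DistanceTag


# Hypothetical renderer policy: the mixer states the intent (the tag) and the
# playback device chooses how to realise it. The values are placeholders.
RENDER_HINTS: Dict[DistanceTag, Dict[str, float]] = {
    DistanceTag.NEAR: {"gain_db": 0.0, "reverb_send": 0.1},
    DistanceTag.MID: {"gain_db": -6.0, "reverb_send": 0.3},
    DistanceTag.FAR: {"gain_db": -12.0, "reverb_send": 0.6},
}


def render_hint(obj: AudioObjectMetadata) -> Dict[str, float]:
    """Look up how this device would treat an object with the given tag."""
    return RENDER_HINTS[obj.distance]


whisper = AudioObjectMetadata("whisper", azimuth_deg=-30.0, elevation_deg=0.0,
                              distance=DistanceTag.NEAR)
thunder = AudioObjectMetadata("thunder", azimuth_deg=120.0, elevation_deg=20.0,
                              distance=DistanceTag.FAR)
for obj in (whisper, thunder):
    print(obj.name, obj.distance.value, render_hint(obj))
```

The design choice worth noting is the separation of concerns: the tag records what the mixer intended, while the lookup table stands in for the device-specific processing that tries to honour it.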
 

What would you like to see in terms of exploration from start-up founders and new technology platforms?

I'm very interested to see what the role of the hearable is in immersive environments: what is an augmented experience for music? Hearable hardware can also become an augmented layer of information, a constant part of our day providing us with reactive, closed-loop interactions with our technologies and our environments. Within the ear, hearable devices can leverage sensors to capture incredibly rich insight about the state of our bodies and brains as we interact with our environments (see Hearables Will Monitor Your Brain and Body to Augment Your Life, IEEE Spectrum). What can we do to enhance the experience of the listener, or creatively in our music, when our technology has insight into our real-time engagement and experiences? How do we integrate the hearable in our ear into our other listening experiences, with the speakers that surround us or other elements in our environments?

The question of occlusivity versus non-occlusivity becomes important here in hearable design. There is a delicate balance to strike: fully isolating designs and features may interfere with a listener’s sense of connection to their environment, and there is also the impact of providing persistent neural representations that don’t align with natural listening. Every device that colours how we hear will shape our brain in different ways that we can anticipate. Some are less good than others. A device with less occlusivity enables longer wear times without fatigue and gives the listener a natural and seamless integration with their environment.

Across the continuum of occlusivity there is always a trade-off between noise reduction and isolation on one side and comfort and connection on the other.

This means there will be use cases and needs for both types of devices. Current technology feature trends have focused on occlusive designs with electroacoustic passthrough. These are successful for shorter-term listening in noisy environments, but still don’t provide the low-fatigue, seamless integration with the environment that can be achieved with the fully non-occlusive directional transducers on a smart-frame.
 
 

In the next and final blog post from our 10 Trends for 2021 Red Talk we will look at our other featured topic, voice activation and social audio.

 
 
