Safe and sound: a few thoughts on pleasantness in videogame sound effects

Denis Zlobin
Published in UX Collective
Oct 26, 2020
A photo of a cotton field
Photo by Amber Martin on Unsplash

In my two previous posts, I looked for a reliable technique to control the perceived unpleasantness of video game sound effects. Even though I couldn’t discover a 100% working method, there is evidence that it is not an impossible goal. Here I’m investigating if it works the other way around, and exploring several ways to make sounds pleasant to hear.

I’m not trying to build a theory that explains why things sound pleasant to us. Instead, I focus on psychology- and psychoacoustics-driven approaches to help sound designers solve design problems. Remember that I’m no expert in either of these fields, so please prepare your grain of salt.

My recent study of unpleasantness started with the idea of universally hated sounds: sounds that every person finds horrible, regardless of context. The scratching of fingernails on a chalkboard is the best-known example. But is there any universally loved sound? Something that sounds pleasant even if it wakes you up in the night or appears out of nowhere? To my knowledge, there is no such thing.

In the book “The Universal Sense,” Seth Horowitz explains that positive emotions are more complex than negative ones. They depend heavily on the listener’s previous experience. David Poeppel expresses similar thoughts in the 13th episode of the “Twenty Thousand Hertz” podcast, adding that alarm signals should produce very similar outcomes for the sake of survival, but positive stimuli are open to interpretation. One can say that our auditory system, at a low level, is tuned to detect threats, but not beauty.

We cannot rely exclusively on low-level auditory cues to trigger pleasant sensations in human beings. But that doesn’t mean we cannot evoke positive emotions with sounds; sound designers for various media do it every day. We just need to look for a different, less ambitious approach and align with contextual factors instead of ignoring them. It also means we must acknowledge that the wrong context can ruin our best efforts, no matter what we do.

In other words, there will be no 100% reliable method to make a sound effect “intrinsically” pleasant to hear. But there might be ways to increase the likelihood of such an interpretation.

Acoustic features

As I said above, low-level auditory cues are unreliable when we need to induce pleasant sensations. But they are also not entirely useless. In the previous posts, I focused on the acoustic features that communicate a threat on a preconscious level. It might be interesting to reverse that idea and explore the acoustic features that indicate safety.

First of all, safety is the absence of a threat. In that sense, we might want to neutralize or reduce some of the threat-related features I described in the previous posts. However, I must warn you against treating this as a guideline for the design process: doing so may result in sterile, soulless, and bland sounds. Instead, I see it as an analytical framework. If I feel there is something wrong with a sound, I examine it for those features and see if I can control them to reduce the unpleasantness.

One study suggests that reverberation time is an important safety-related auditory cue. We perceive sounds as more pleasant when they exist in a small or medium-sized space. Evolution has probably taught us that small, tight rooms are safer and more predictable than open spaces. Interestingly, this effect only works one way. A shorter reverberation time doesn’t reduce the unpleasantness of aversive sounds because a dangerous object should not seem less threatening when located in a small room.

Of course, the reverberation of diegetic sounds depends on the room acoustics of the level. I by no means suggest overriding it to make diegetic sounds more pleasant to hear. Doing so will likely result in the opposite effect because of incongruity between acoustic properties and visual presentation of the space. But this technique may be handy when working on non-diegetic sound effects like user interface feedback and notifications.

More importantly, this idea opens a question about the players’ listening setup. We usually place speakers in the room we know and consider to be safe. So the sound coming from speakers is typically heard with some familiar room reverb, unlike the sound from headphones. Does listening to a dry sound with speakers feel “safer” and more pleasant? Technically speaking, it should, and I would love to test this assumption in a controlled environment. If true, we may start sweetening some parts of the mix with a room reverb exclusively for the headphone mix to compensate for the lack of feedback from the listening space.
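The routing idea from the last two paragraphs can be sketched as a tiny decision function. This is only an illustration of an untested hypothesis: the `Output` enum, the `room_reverb_send` function, and the default send level of 0.15 are all hypothetical names and values, not part of any real engine or middleware.

```python
from enum import Enum


class Output(Enum):
    """Hypothetical listening setups a game might distinguish."""
    SPEAKERS = "speakers"
    HEADPHONES = "headphones"


def room_reverb_send(output: Output, diegetic: bool,
                     headphone_send: float = 0.15) -> float:
    """Return a linear send level (0..1) to a small-room reverb bus.

    Assumption: only non-diegetic sounds (UI feedback, notifications)
    are sweetened, and only in the headphone mix.
    """
    if diegetic:
        # In-world sounds keep the acoustics of the level they live in.
        return 0.0
    if output is Output.HEADPHONES:
        # Headphones add no room of their own, so add a little.
        return headphone_send
    # Speakers already excite the listener's own, familiar room.
    return 0.0


if __name__ == "__main__":
    print(room_reverb_send(Output.HEADPHONES, diegetic=False))
    print(room_reverb_send(Output.SPEAKERS, diegetic=False))
```

Whether the extra reverb actually makes the headphone mix feel "safer" is exactly the open question above, so the send level should be treated as a parameter to test rather than a recommendation.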

Gaming desktop
Photo by Abdul Barie on Unsplash

Another concept deeply related to auditory pleasantness is the tonal relation between sounds. The term “consonance” is well known to anyone familiar with Western music theory and well described in the literature on music perception. If you are curious to learn more, check the Wikipedia article about consonance and dissonance. From a practical standpoint, we can think of sonance as a spectrum where dissonant sounds lead to unpleasant sensations, and consonant ones feel more pleasant to hear, even though people of various cultural backgrounds may perceive it differently.

Perceptual fluency

One explanation for our preference towards consonance is tied to the idea of perceptual fluency. Consonant intervals have simpler ratios between harmonics, so they are easy to process. And we tend to like things when our brains can quickly process the sensory information about them. Ease of processing makes information appear more truthful. And even more, there is a theory that directly links perceptual fluency to the very concept of beauty!
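To make the "simpler ratios" idea concrete, here is a minimal sketch that ranks a few intervals by how simple their frequency ratio is. The ratios are common just-intonation values (the tritone in particular has several accepted tunings), and the `ratio_complexity` measure, numerator plus denominator, is a crude proxy I am assuming for illustration, not an established psychoacoustic metric.

```python
from fractions import Fraction

# Common just-intonation frequency ratios (one possible tuning each).
INTERVALS = {
    "unison": Fraction(1, 1),
    "octave": Fraction(2, 1),
    "perfect fifth": Fraction(3, 2),
    "perfect fourth": Fraction(4, 3),
    "major third": Fraction(5, 4),
    "minor second": Fraction(16, 15),
    "tritone": Fraction(45, 32),
}


def ratio_complexity(ratio: Fraction) -> int:
    """Crude simplicity proxy: numerator + denominator of the reduced ratio."""
    return ratio.numerator + ratio.denominator


def rank_by_simplicity(intervals: dict) -> list:
    """Return interval names ordered from simplest to most complex ratio."""
    return sorted(intervals, key=lambda name: ratio_complexity(intervals[name]))


if __name__ == "__main__":
    for name in rank_by_simplicity(INTERVALS):
        r = INTERVALS[name]
        print(f"{name:15s} {r.numerator}:{r.denominator}  "
              f"complexity={ratio_complexity(r)}")
```

With this proxy, the intervals traditionally called consonant (octave, fifth, fourth) land near the top of the list, and the minor second and tritone land at the bottom, which matches the intuition described above without proving anything about perception.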

Perceptual fluency is interconnected with the mere-exposure effect: the idea that we develop a preference for familiar objects. Generally speaking, the more we experience something, the more we prime ourselves to like it. And some researchers suggest that easy-to-process information naturally seems familiar to us.

To understand why familiarity is important, let’s once again think about the evolutionary role of hearing as a threat detection system. An unfamiliar sound tells us about a novel object somewhere around us. It may or may not be dangerous, but we need to stay alert until we somehow resolve this uncertainty. Familiar sounds relate to the objects we have previously examined, so we already know how to react in their presence. An unfamiliar sound is always a mystery that requires extra attention.

A picture of a labyrinth inside a human brain
Photo by Morgan Housel on Unsplash

But there is a limit to these effects. Both oversimplification and overexposure lower the level of pleasure we experience. If we reduce a rich musical instrument timbre to a simple sine wave, it will be easy to process but sound dull and inexpressive. Children, naïve listeners, and even animals are attracted to consonant music, but professional musicians often get used to it and develop a preference for dissonant tones. Many researchers write about an inverted U-shape relationship between complexity and aesthetic pleasure: a piece needs to be complex enough to be perceived as beautiful. And the more experience the person has with a particular kind of information, the more complexity they need to feel pleased.

Reber et al. explain that perceptual fluency has the strongest impact when the source of fluency is not obvious and processing efficiency comes as a surprise. Here is my understanding of this phenomenon, probably wrong and oversimplified, that ties these ideas together:

The human brain may set estimates for information processing tasks. If it manages to process the information faster than expected (and doesn’t detect any threat), it rewards itself for saving resources, and we feel a pleasant sensation. If the information is obviously simple, the brain expects shorter processing time from the beginning, so we don’t get any “efficiency bonus.” The more practice we have with certain kinds of information, the more fluent our brain becomes, so the estimates become more accurate after repeated exposure. It leads to a smaller discrepancy between estimated and real processing time, so we get less reward for processing fluency.
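The mental model above can be written down as a toy simulation. Everything here is an assumption made for illustration: the function name `simulate_fluency_reward`, the idea that the "reward" equals the gap between estimated and actual processing time, and the exponential update of the estimate. It is not a model taken from the literature.

```python
def simulate_fluency_reward(actual_time: float, initial_estimate: float,
                            exposures: int, learning_rate: float = 0.5) -> list:
    """Return the hypothetical 'efficiency bonus' for each repeated exposure.

    actual_time      -- how long processing really takes (arbitrary units)
    initial_estimate -- how long the brain expects it to take at first
    learning_rate    -- how quickly the estimate moves toward reality
    """
    estimate = initial_estimate
    rewards = []
    for _ in range(exposures):
        # A bonus appears only when processing beats the estimate.
        rewards.append(max(0.0, estimate - actual_time))
        # Each exposure pulls the estimate closer to the real time.
        estimate += learning_rate * (actual_time - estimate)
    return rewards


if __name__ == "__main__":
    # Surprisingly easy stimulus: expected 2.0, actually takes 1.0.
    print(simulate_fluency_reward(1.0, 2.0, exposures=4))
    # Obviously simple stimulus: the estimate is already accurate.
    print(simulate_fluency_reward(1.0, 1.0, exposures=4))
```

In the first case the bonus halves with every exposure (1.0, 0.5, 0.25, 0.125), which mirrors the overexposure effect; in the second case there is no bonus at all, which mirrors the dullness of obviously simple sounds.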

I have no authority to speculate on this subject, so please don’t take my explanation seriously: it is just an attempt to build a coherent mental model that aligns with the literature I’ve read. The remark about a threat is worth looking at further. Some researchers see processing fluency not as the mechanism that drives positive affect but as an amplification factor that makes positive stimuli appear more positive and negative stimuli more negative. It implies that boosting the perceptual fluency of an unpleasant sound, especially one that communicates a threat, will only make it more unpleasant to hear.

One can think of several ways to apply these concepts in game audio design, starting with how we mix the game and ending with the design process of individual sound effects. For example:

  • To be perceived as pleasant, sound effects should be easily audible rather than buried in the mix.
  • They should contain enough detail to challenge our brain but not be overloaded with useless information.
  • They need to be easy to recognize and clearly communicate the message.
  • We need to maintain the right balance of repetition and variation.
  • Tonal relationships between sounds are important.
  • Less is more, and mix clarity matters.

None of these statements is a revelation for audio designers, but it is always good to back our intuitive judgments with evidence.

We could look even deeper and think about how we can trick our brain by communicating that the sound is harder to process than it really is. Sound exists in time, so the most obvious trick is to pack more information into the attack phase and let it resolve into something beautiful and straightforward. The THX Deep Note can serve as an exaggerated illustration of this principle. Many reward-related sound effects in games follow the same pattern, resolving an initial non-musical layer into a consonant reverberant tail.
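As a rough sketch of that pattern, the snippet below generates a short sound whose attack is a burst of noise that crossfades into a decaying major triad. All names and values (`reward_sound`, the 80 ms attack, the decay rate, the just-intonation triad) are assumptions chosen for illustration, and a plain exponential decay stands in for the reverberant tail; a real reward sound would involve far more design work.

```python
import math
import random

SAMPLE_RATE = 44100  # samples per second


def reward_sound(duration: float = 1.0, attack: float = 0.08,
                 root_hz: float = 440.0, seed: int = 0) -> list:
    """Return mono samples in [-1, 1]: noisy attack, consonant tail."""
    rng = random.Random(seed)
    n = int(duration * SAMPLE_RATE)
    # Major triad in just intonation: 1 : 5/4 : 3/2.
    partials = [root_hz, root_hz * 5 / 4, root_hz * 3 / 2]
    samples = []
    for i in range(n):
        t = i / SAMPLE_RATE
        # Dense, non-musical layer: full level at t=0, silent after `attack`.
        noise_gain = max(0.0, 1.0 - t / attack)
        noise = rng.uniform(-1.0, 1.0) * noise_gain
        # Simple, consonant layer: fades in during the attack, then decays.
        tone_gain = min(1.0, t / attack) * math.exp(-3.0 * t)
        tone = sum(math.sin(2 * math.pi * f * t) for f in partials) / len(partials)
        samples.append(0.5 * noise + 0.5 * tone * tone_gain)
    return samples


if __name__ == "__main__":
    sound = reward_sound()
    print(len(sound), max(abs(s) for s in sound))
```

The samples can be scaled to 16-bit integers and written to a file with Python's standard `wave` module to audition the result.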

I will return to the concept of perceptual fluency in a future post, where I will explore how sound contributes to cognitive load, focusing on the ways we can help players make faster decisions.

References and metaphors

We can look at familiar sounds from a different angle and use them to remind players of good memories of their past. Joel Beckerman and Tyler Gray describe a similar design process in the book “The Sonic Boom: How Sound Transforms the Way We Think, Feel, and Buy”. When it comes to positive emotions, they suggest thinking not about the sounds themselves but about the experiences people commonly associate with them. For example, many people have childhood memories connected to ice-cream truck jingles, so hearing similar sounds might bring positive emotions through the association.

The technique of referencing familiar experiences is not a new thing in sound design. It has been used in audio branding, product UI sound design, and some of the best-sounding games. The iconic example from the videogame world is the Overwatch hit marker, which derives from the sound of opening a beer can, an experience many of us find pleasant.

Pay attention to the high-frequency sounds that mark successful hits.

But like any method, it has its limitations. The most obvious one is that associations vary hugely across the population. The beer can reference works for many people but might trigger bad memories in people struggling with alcohol addiction.

Of course, “missing” the sound reference in a game will not make it unplayable. But I still think we should be careful with this approach to make it work. Instead of referencing what feels pleasant to us, we need to search for what is enjoyable to the players. And to do that, we need to understand the players’ values and culture codes, which becomes harder the broader our audience gets.

Another problem may arise when the audience is unfamiliar with the reference. In “The Sound Book: The Science of the Sonic Wonders of the World,” Trevor Cox gives the example of birdsong, which is usually associated with safety and calmness. Such a connection makes perfect sense from the evolutionary perspective: birds stay silent if they sense a threat, so the mere presence of birdsong preconsciously tells us that the environment is safe. But this association may only work for the local birds in your region. People who move abroad sometimes report that birdsong sounds irritating in a new place. It most likely happens because it doesn’t sound familiar, so it draws attention instead of being filtered out as a part of the habitual soundscape.

To make things sound more familiar, we could start localizing or culturizing the iconic sound effects as we do with voice-over for different regions. But this approach would require gathering a lot of information about the users while having a questionable ROI, so I doubt it is realistic in the current state of the industry.

I hope you are still reading because I’m going to end this post with a super important note. It is easy to think that I advocate for making game soundscapes pleasant to hear and avoiding unpleasant sensations. But in fact, I don’t. To explain why, I’ll quote one excellent article about healthy soundscapes and cognition:

The pleasant sonic environment allows the listener full freedom and control over the mind states.

While this is entirely true for the real world, videogames are fundamentally different. Games are virtual environments where people expect to be challenged. Controlling the player’s mind states is what we, as game developers, eventually do. It is what players want us to do. Experiences they enjoy in real life won’t necessarily feel good in the game context, and vice versa.

The methods and approaches I describe in my blog deal with very low-level problems. They may harm the product if applied thoughtlessly. Make sure you clearly understand what you want to achieve before applying any of them in your work, and trust your ears.



