Hear Here: How Loudness and Acoustic Cues Help Us Judge Which Way a Speaker Is Facing
Researchers explored how humans use loudness and other acoustic cues to judge a speaker's facing direction in virtual environments.
Immersive media, including augmented and virtual realities, are taking the world by storm, underscoring the need to keep improving the user experience through greater realism. A relatively underexplored question in this field is how listeners perceive a speaker's orientation.
Now, research led by Sophia University has shown that the loudness of a speaker's voice, followed by spectral cues, helps listeners judge the speaker's orientation. The findings address longstanding questions in auditory perception.
As technology increasingly integrates complex soundscapes into virtual spaces, understanding how humans perceive directional audio becomes vital. This need is bolstered by the rise of immersive media, such as augmented reality (AR) and virtual reality (VR), where users are virtually transported into other worlds. In a recent study, researchers explored how listeners identify the direction a speaker is facing while speaking.
The research was led by Dr. Shinya Tsuji, a postdoctoral fellow, Ms. Haruna Kashima, and Professor Takayuki Arai from the Department of Information and Communication Sciences, Sophia University, Japan. The team also included Dr. Takehiro Sugimoto, Mr. Kotaro Kinoshita, and Mr. Yasushige Nakayama from the NHK Science and Technology Research Laboratories, Japan. Their study was published on May 1, 2025, in Volume 46, Issue 3 of the journal Acoustical Science and Technology.
In the study, the researchers conducted two experiments in which participants identified the direction a speaker was facing using only sound recordings. The first experiment used recordings with natural variations in loudness, and the second used recordings with constant loudness. The researchers found that loudness was consistently a strong cue for judging the speaker's facing direction, but when loudness cues were minimized, listeners still made correct judgments based on the spectral cues of the sound. These spectral cues involve the distribution and quality of sound frequencies, which change subtly depending on the speaker's orientation.
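To make the two cue types concrete, the minimal Python sketch below (not taken from the paper) computes two acoustic features that roughly correspond to them: overall loudness as an RMS level, and a simple spectral-balance measure, the spectral centroid. The file name "speech_facing_front.wav" is a hypothetical example input; any mono or stereo WAV recording would do.

```python
# Illustrative sketch: loudness vs. spectral cues in a speech recording.
# Assumes numpy and scipy are installed; the WAV file name is hypothetical.

import numpy as np
from scipy.io import wavfile

def rms_level_db(signal):
    """Root-mean-square level in dB relative to full scale (a loudness proxy)."""
    rms = np.sqrt(np.mean(signal.astype(np.float64) ** 2))
    return 20 * np.log10(rms + 1e-12)

def spectral_centroid_hz(signal, sample_rate):
    """Centre of mass of the magnitude spectrum: a crude spectral-balance cue."""
    spectrum = np.abs(np.fft.rfft(signal.astype(np.float64)))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)

sample_rate, audio = wavfile.read("speech_facing_front.wav")  # hypothetical file
if audio.ndim > 1:              # keep a single channel if the file is stereo
    audio = audio[:, 0]

print(f"RMS level:         {rms_level_db(audio):.1f} dBFS")
print(f"Spectral centroid: {spectral_centroid_hz(audio, sample_rate):.0f} Hz")
```

Comparing such features across recordings made at different speaker orientations would show the kind of level and spectral differences that listeners appear to exploit, though the study itself measured human judgments rather than prescribing any particular feature set.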
“Our study suggests that humans mainly rely on loudness to identify a speaker’s facing direction,” said Dr. Tsuji. “However, it can also be judged from some acoustic cues, such as the spectral component of the sound, not just loudness alone.”
These findings are particularly useful in virtual sound fields that allow six degrees of freedom—immersive environments like those found in AR and VR applications, where users can move freely and experience audio in different spatial configurations. “In contents having virtual sound fields with six-degrees-of-freedom—like AR and VR—where listeners can freely appreciate sounds from various positions, the experience of human voices can be significantly enhanced using the findings from our research,” said Dr. Tsuji.
The research emerges at a time when immersive audio is a major design frontier for consumer tech companies. Devices such as Meta Quest 3 and Apple Vision Pro are already shifting how people interact with digital spaces. Accurate rendering of human voices in these environments can significantly elevate user experience—whether in entertainment, education, or communication.
“AR and VR have become common with advances in technology,” Dr. Tsuji added. “As more content is developed for these devices in the future, the findings of our study may contribute to such fields.”
Beyond the immediate applications, this research has broader implications for how we might build more intuitive and responsive soundscapes in the digital world. By improving realism through audio, companies can create more convincing immersive media—an important factor not only for entertainment, but also for accessibility solutions, virtual meetings, and therapeutic interventions.
By uncovering the role of both loudness and spectral cues in voice-based directionality, this study deepens our understanding of auditory perception and lays a foundation for the next generation of spatial audio systems. The findings pave the way for designing more realistic virtual interactions, particularly those involving human speech, which is probably the most familiar and meaningful sound we process every day.
Reference
- Title of original paper
Perception of speech uttered as speaker faces different directions in horizontal plane: Identification of speaker’s facing directions from the listener
- Journal
Acoustical Science and Technology
- Authors
Shinya Tsuji¹, Haruna Kashima¹, Takayuki Arai¹, Takehiro Sugimoto², Kotaro Kinoshita², and Yasushige Nakayama²
- Affiliations
¹Department of Information and Communication Sciences, Sophia University, Japan; ²NHK Science and Technology Research Laboratories, Japan
About Dr. Shinya Tsuji
Dr. Shinya Tsuji is a postdoctoral fellow at the Department of Information and Communication Sciences, Sophia University. His major research interests include unilateral hearing loss and reverberation, and his expertise spans experimental psychology, human interfaces and interactions, informatics, and humanities and social sciences. He has published five articles and has received multiple awards, including the 2022 Student Outstanding Presentation Award from the Acoustical Society of Japan. He is also engaged in social activities and contributes actively to the Information and Community Site for Unilateral Hearing Loss.
Media Contact
Office of Public Relations, Sophia University
sophiapr-co@sophia.ac.jp