High dynamic range audio (HDR audio) is a technique for designing a mix using level values that span a very high dynamic range, as occurs in nature. HDR is also a run-time system that dynamically maps this wide range of levels to a range that is better suited to your sound system's digital output.
In the real world, the audible dynamic range, which spans from the threshold of human hearing to the loudest possible sound in air, is several times wider than the dynamic range offered by speakers at gameplay levels. The role of an HDR system is to collapse or "compress" the whole real-life dynamic range, approximately 190 dB, into 96 dB (the dynamic range available on a 16-bit digital device), and even less in practice because of the noise floor.
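The 96 dB figure follows from the bit depth of the output. As a rough sketch (the function name is illustrative and is not part of Wwise), the theoretical dynamic range of linear PCM can be estimated as the ratio between full scale and one quantization step:

```python
import math


def dynamic_range_db(bit_depth: int) -> float:
    """Theoretical dynamic range of linear PCM, in dB.

    Computed as 20 * log10 of the number of quantization steps (2 ** bit_depth).
    Real devices offer less because of their noise floor.
    """
    return 20.0 * math.log10(2 ** bit_depth)


print(round(dynamic_range_db(16), 1))  # 96.3
```

This is only the theoretical ceiling; the usable range of a real playback chain is narrower.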
In HDR photography, local tone mapping is applied independently to various regions of an image to enhance the contrast within each region. HDR audio works in the same way; it performs sound level mapping instead of tone mapping, and does so locally in time. Thus, at any given moment, the system automatically adapts the mapping based on the levels of sounds that constitute the audio scene.
HDR Glossary:
Term | Definition |
---|---|
Decibel (dB) | A logarithmic measure of the level of a sound compared to the level of another sound or an arbitrary reference value. One decibel equals |
Decibel full scale (dBFS) | A logarithmic measure of the amplitude of a signal compared with the maximum that a device can handle before clipping occurs. A value of 0 dBFS is thus the loudest sound that can be generated by the digital audio output. 16-bit digital audio output devices range from 0 dBFS down to -96 dBFS. The level of the audio signal coming out of the Master Audio Bus in Wwise should therefore lie between these values. |
In HDR audio, you can assign volume values to sounds of the game's virtual world that span over a much larger dynamic range than the standard 96 dB of 16-bit output devices, much like they would in the real world. It is the task of the HDR system to translate these values into dBFS, as illustrated in the figure below.
The inputs are the sound levels of the virtual world, expressed in decibels (dB) relative to an arbitrary reference. The values can be chosen arbitrarily high or low, and thus have a high dynamic range. The outputs are the levels of the corresponding sounds in dBFS. The range of these values depends on the output device, which typically has lower dynamic range than the input.
In its simplest form, the HDR system operates as follows: at each time slice, the system selects the sound assigned the highest volume in the virtual world, automatically maps it to the output value of 0 dBFS, and then maps all other sounds proportionally.
Let's use an example to illustrate this. Suppose that at a given moment ("time 1") sound "blue" plays at +30 dB in the virtual world, as illustrated on the left side of Figure 11, HDR window. The reference (0 dB) is arbitrary. Because "blue" is the loudest sound at time 1, it plays at 0 dBFS at the output of the HDR system. Another sound, "purple", plays at 0 dB in the virtual world, that is, 30 dB below sound "blue". Thus, it comes out at -30 dBFS at the output of the HDR system. A third sound, "green", plays at -66 dB in the virtual world, which results in -96 dBFS at the output of the HDR system. Since the output of the system is constrained to a dynamic range of 96 dB, the level of "green" corresponds to the lower bound of all audible sounds. At time 1, any sound that is softer than "green" is inaudible.
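The mapping in this example can be sketched in a few lines. This is a minimal illustration of the simplest form described above, not Wwise's implementation; the function name and the dictionary layout are assumptions made for the sketch. Because levels are expressed in decibels, mapping "proportionally" amounts to subtracting the same offset from every sound.

```python
def map_to_output(levels_db):
    """Map virtual-world levels (dB, arbitrary reference) to output levels (dBFS).

    The loudest sound is placed at 0 dBFS and every other sound keeps
    its distance, in dB, from the loudest one.
    """
    if not levels_db:
        return {}
    loudest = max(levels_db.values())
    return {name: level - loudest for name, level in levels_db.items()}


# Levels at time 1, taken from the example above.
time_1 = {"blue": 30, "purple": 0, "green": -66}
print(map_to_output(time_1))  # {'blue': 0, 'purple': -30, 'green': -96}
```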
In the previous example, the range [-66, +30] dB in the virtual world (input side) is referred to as the HDR window. It is represented by the blue region on the left side of Figure 11, HDR window. The HDR window has a fixed width, determined by the dynamic range of the output. For a 16-bit device, it is equal to 96 dB at most, but in practice it is usually smaller. At time 1, the sound at +30 dB is the loudest in the virtual world, and any sound below -66 dB is inaudible because it is below the HDR window.
Suppose that later, at time 2, another sound, "orange", starts playing at +50 dB in the virtual world. To accommodate this new louder sound, the HDR system slides the window up by 20 dB, so that its bounds are now [-46, +50] dB on the input side. All sounds are then mapped to the new values. The sound at +50 dB plays at 0 dBFS, the sound at +30 dB now plays at -20 dBFS, and the sound at -66 dB is now below the window and is therefore completely inaudible. When the orange sound stops playing at time 3, the window gently slides back to where it was, and the other sounds return to their former volume.
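The sliding window can be sketched as follows. This is an illustrative model only: the names are hypothetical, the window width is assumed to be the full 96 dB of a 16-bit output, and inaudible sounds are represented by `None`. The lower bound of the window is simply the level of the loudest sound minus the window width.

```python
WINDOW_WIDTH_DB = 96.0  # assumption: full 16-bit range; usually smaller in practice


def hdr_map(levels_db, window_width_db=WINDOW_WIDTH_DB):
    """Return output levels in dBFS, or None for sounds below the HDR window."""
    if not levels_db:
        return {}
    window_top = max(levels_db.values())
    window_bottom = window_top - window_width_db
    return {
        name: (level - window_top if level >= window_bottom else None)
        for name, level in levels_db.items()
    }


time_1 = {"blue": 30, "purple": 0, "green": -66}
time_2 = dict(time_1, orange=50)  # a louder sound starts playing

print(hdr_map(time_1))  # green sits exactly on the lower bound: -96 dBFS
print(hdr_map(time_2))  # the window slid up by 20 dB: green is now inaudible
```

Returning `None` rather than a very low level makes the "below the window" case explicit; an engine could use that information to skip processing such sounds, although this sketch does not model that.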
On the left, sound levels are represented at the input of the system, in decibels (dB) with arbitrary reference. The HDR window is represented by the blue region. At time 1, the top of the window is aligned with the loudest playing sound (blue). At time 2, another sound (orange) starts playing at +50 dB, and the window slides up immediately to accommodate it. The sound at -66 dB (green) is then clearly below the window, and thus inaudible. The resulting levels in dBFS at the output of the system are shown on the right. When the HDR window slides up by 20 dB because of the orange sound, the volume of the other sounds drops by 20 dB. During this time, the green sound is completely excluded from the output. Notice that the orange sound plays at the same output level as the blue sound did during time 1. When the orange sound stops playing, the window gently slides down to its former level, and the volume of the other sounds increases accordingly.
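The asymmetric motion of the window (immediate rise, gentle fall) can be sketched with a simple per-update rule. The linear release rate used here is a hypothetical parameter chosen for illustration; it is not meant to describe the release curve that Wwise actually applies.

```python
def update_window_top(current_top_db, loudest_db, dt_s, release_db_per_s=10.0):
    """Return the new top of the HDR window after dt_s seconds.

    The window jumps up immediately when a louder sound plays, and
    slides down at a fixed rate (assumed linear here) otherwise.
    """
    if loudest_db >= current_top_db:
        return loudest_db
    return max(loudest_db, current_top_db - release_db_per_s * dt_s)


# The orange sound (+50 dB) has just stopped; blue (+30 dB) is now the loudest.
top = 50.0
for _ in range(4):
    top = update_window_top(top, loudest_db=30.0, dt_s=0.5)
    print(top)  # 45.0, 40.0, 35.0, 30.0
```

As the window top falls, the output level of the remaining sounds rises by the same amount, which is the gradual return to their former volume described above.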
The HDR system works like a dynamic range limiter/compressor. It affects your mix by making soft sounds inaudible when loud sounds play, and making them audible again when they play alone. The relative levels of sounds in the HDR world are preserved, creating the illusion of a greater dynamic range, while in fact they are compressed within the output device's lower dynamic range. Furthermore, thanks to the system's automatic volume ducking when louder sounds play, your mix will be cleaner and have better focus. The next figure illustrates this principle.
The window slides up only when louder sounds play. When the window slides up on the input side, the sound volume drops on the output side. What was formerly audible, such as the sound of leaves in a tree, can become completely inaudible when a gunshot is played. The actual volume of sounds at the output of the system depends on the distance between them and the top of the window, at any given moment. Here, the gunshot and explosion would come out of the system at the same level if played alone, but because the explosion is considered louder than the gunshot and effectively ducks its volume, listeners are left with the impression that it is indeed louder.