Immersive Sound, SMPTE DCP, and Security

Audio is the stepchild of digital cinema. It’s not that no one understands it, but rather that so many think they understand it better than anyone else. There is now an opportunity to manage the standardization of immersive sound in cinema and clean up a few things along the way. It remains to be seen if the industry is smart enough to get this right.

The complication behind audio is not the audio signal itself, but the metadata that accompanies it. The wonderful thing about metadata is that it provides so many ways to do the same thing. SMPTE DCP, my favorite example, provides at least three incompatible ways to package 5.1 audio. This unchecked creativity of standards committees results in a burden to manufacturers and distributors. Without an ROI for implementing the whims of standards committees, technology providers rarely do more than the minimum. For SMPTE DCP, this results in distributing audio in a manner similar to Interop DCP. This diminishes the value of published standards, forcing fulfillment companies to study real products to learn how to successfully play movies with sound.

Ironically, it is this same complaint of inconsistent implementation that plagues the Interop DCP format. But history does not stop there. Film also provided three incompatible ways to distribute digital 5.1 sound, introduced by Dolby, DTS, and Sony. While exhibitors and distributors lamented such inefficiency, film duplication companies profited from it. A similar division exists today, making it important to remember that one company’s woes can be another’s joy when observing who is opposed to what.

The emergence of immersive sound could exacerbate the situation. Immersive sound requires that metadata and audio signal be more tightly coupled than any sound format before it, which should give one pause. But immersive sound also has the potential to simplify distributions if a 5.1 fallback track were to be included. A fallback track would allow the movie to play without an immersive sound system present. The first observation spells the need for tools that allow manufacturers to properly and uniformly evaluate their implementations. The second observation provides an opportunity to clean up the mess previously introduced by an unrestrained standards committee. There is a third area that deserves consideration, as well, that of security. I’ll review all three.

Immersive sound is characterized by what can best be described as a database of audio clips accompanied by metadata that instructs a rendering engine what to do with the clips. Usually, although there is no requirement to do so, linear sound tracks are included as a “sound bed,” over which the sound clips are positioned, processed, and panned. The accurate portrayal of creative intent will become the meat of competitive rendering engines. But this could also become a nasty game where a manufacturer adds some clever metadata that simply doesn’t register on competing products.
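To make the clips-plus-metadata idea concrete, here is a minimal sketch in Python. It is not drawn from any real immersive sound format: the `AudioObject` fields, the `spread` extension, and the two-speaker constant-power pan are all hypothetical stand-ins chosen for illustration. What it shows is the risk described above, where a rendering engine quietly skips metadata it does not recognize.

```python
import math
from dataclasses import dataclass, field

@dataclass
class AudioObject:
    """One clip plus its rendering metadata (all field names are hypothetical)."""
    clip_id: str
    position: float            # assumed range: -1.0 (full left) to +1.0 (full right)
    gain: float = 1.0
    extensions: dict = field(default_factory=dict)  # vendor-specific metadata

# Extensions this illustrative renderer claims to understand.
KNOWN_EXTENSIONS = {"spread"}

def render_gains(obj, known_extensions=KNOWN_EXTENSIONS):
    """Constant-power pan across two speakers.

    Returns (left_gain, right_gain, ignored), where `ignored` lists the
    extension keys this renderer does not recognize and therefore drops.
    """
    angle = (obj.position + 1.0) * math.pi / 4.0   # maps [-1, 1] to [0, pi/2]
    left = obj.gain * math.cos(angle)
    right = obj.gain * math.sin(angle)
    ignored = sorted(k for k in obj.extensions if k not in known_extensions)
    return left, right, ignored

if __name__ == "__main__":
    obj = AudioObject("door_slam", position=0.0,
                      extensions={"spread": 0.2, "vendor_x_reverb": 1})
    print(render_gains(obj))
```

A mixing stage that audits its mix on several engines would, in effect, be collecting the `ignored` list from each one.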

There are two ways to monitor and mitigate this problem. The first is to monitor the mix on competing rendering engines while the soundtrack is in production. The ability to do so could become a competitive capability of sound mixing facilities. The second is to assist design engineers with test materials that enable them to properly evaluate their products. This is not as simple. Test materials made available through the grace of competing technology providers are too easy to oversimplify, and too easy to ignore. A suitable baseline of rendering capabilities will need to be recognized by an accepted industry body, which may or may not be SMPTE.

The second idea, that of a fallback track, goes back to the days of digital sound tracks on film, where the fallback was the Lt/Rt matrix-encoded analog track that was also on the film print. In immersive sound, a fallback track is useful when a rendering engine fails or is not available. The successful management of rendering failure is more likely to occur with outboard sound processors, the popularity of which is likely to fade as in-media-block rendering engines emerge. But the utility of a fallback track can and should be extended to legacy systems, too. If the industry is to simplify its distributions, the immersive sound DCP must also play on non-immersive systems. The best way to do this is to include Interop-style 5.1 sound with the immersive sound DCP, since Interop-style sound tracks are the only sound tracks that play on all systems, including legacy systems. This would bypass the convoluted and multiple methods of wrapping 5.1 sound currently called for by SMPTE.
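The fallback behavior amounts to a simple selection rule, sketched below in Python. This is an illustration only: the `Composition` class, its field names, and the file names are hypothetical and do not reflect how any actual DCP or media block represents its tracks.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Composition:
    """Sound assets carried in one package (field names are hypothetical)."""
    immersive_track: Optional[str]     # e.g. an immersive audio track file
    fallback_51_track: Optional[str]   # e.g. an Interop-style 5.1 track file

def select_track(comp: Composition, renderer_available: bool) -> str:
    """Pick the immersive track when it can be rendered, else the 5.1 fallback."""
    if comp.immersive_track and renderer_available:
        return comp.immersive_track
    if comp.fallback_51_track:
        return comp.fallback_51_track
    # Without a fallback, a failed or missing renderer means no sound at all.
    raise RuntimeError("no playable sound track")

if __name__ == "__main__":
    comp = Composition("immersive.mxf", "pcm_51.mxf")
    print(select_track(comp, renderer_available=True))   # immersive.mxf
    print(select_track(comp, renderer_available=False))  # pcm_51.mxf
```

The final branch is the case the fallback track exists to prevent: a package that carries only immersive sound cannot play on a legacy system.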

Security of sound tracks could also be a concern. With normal cinema sound, sound tracks arrive in the cinema in encrypted form. They are decrypted and simultaneously streamed by a DCI-compliant media block. The sound track emerges from the media block in unencrypted form, but forensically marked. It is the unencrypted nature of the output signal that raises the question as to the value of securely decrypting the many channels of immersive sound. Dolby manages it by decrypting within an outboard processor that is not necessarily as secure as a DCI-compliant media block. This was not an irrational move, considering that the output signal can be composed of as many as 64 signals, which is not a useful form for pirating.

But that won’t keep DCI from attempting to define a test plan for secure DCI-compliant immersive sound processors. There is plenty of speculation that the less-than-transparent DCI is giving this consideration. However, with Dolby as the dominant technology provider in this space, having an installed base of hardware that would be costly to replace, there is risk that DCI would embarrass itself if Dolby were to ignore its wishes. This time ‘round, it may be better to let technology providers have a voice in a new security specification for cinema immersive sound.