
Just a few years ago, some colleagues at a Silicon Valley company decided to launch a video site to make it easier for just about anyone to share videos with their friends.

Supporting only low resolutions and low video quality, this small site (YouTube) changed consumers’ general perception of video. Video was no longer something viewed only on TVs; now it was on their computers and soon on their phones. The popularity of nontraditional video soared, bringing with it many new video sites and devices. The professional media production community rapidly realized the value of these sites for promotion.

This created problems for the distribution of this content. Every new site or video-enabled device seemed to introduce a new codec, container, or transmission method, each requiring a new way of creating compatible video. Some of the new media sites would take any video format and automatically reprocess it to meet their required standards, while others demanded that the user upload or transmit an already-compatible format. While this was not a problem for consumers, who typically use a few formats shared with their friends, it created significant challenges for professional content creators seeking a wide audience.

In attempting to get their videos onto every device, available anytime and anywhere with the highest possible video quality, site operators and content creators have been struggling to keep up with the onslaught of new codecs and formats.

Following is a description of some of the challenges in multiscreen transcoding and some of the technological advances that answer them.

Bandwidth Concerns vs. Video Quality

These diverse ways of consuming media have created challenges on last-mile operator networks, especially in bandwidth-limited countries, and have made the selection of codecs and formats an even more significant task. Content that plays acceptably on legacy handhelds, for example, may be deemed unwatchable on later-generation devices such as iPads. Adaptive bitrate streaming gives operators the flexibility to choose multiple formats, frame rates, and bitrates to suit any number of screens.
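To make the idea concrete, here is a small sketch (my own illustration, not taken from the article) of an adaptive-bitrate "ladder" and the kind of client-side logic that picks a rendition. The renditions, labels, and bitrates are invented for the example.

```python
# Hypothetical adaptive-bitrate ladder: (label, width, height, video kbit/s).
# Real ladders are tuned per service; these numbers are illustrative only.
LADDER = [
    ("mobile-low",  416,  234,  300),
    ("mobile-high", 640,  360,  700),
    ("sd",          960,  540, 1500),
    ("hd",         1280,  720, 3000),
    ("full-hd",    1920, 1080, 6000),
]

def pick_rendition(measured_kbps, headroom=0.8):
    """Pick the highest-bitrate rendition that fits within the measured
    bandwidth, leaving some headroom for network fluctuations."""
    budget = measured_kbps * headroom
    best = LADDER[0]  # always fall back to the lowest rendition
    for rendition in LADDER:
        if rendition[3] <= budget:
            best = rendition
    return best

# A 4 Mbit/s connection, with 20% headroom, fits the 3000 kbit/s "hd" layer.
print(pick_rendition(4000))
```

The client re-measures throughput as segments download and switches layers accordingly; the operator's job, as the article notes, is making sure every rung of that ladder looks as good as its bitrate allows.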

Selecting transcoders that maintain high video quality at all bitrates is crucial to the success of the service. If the content doesn’t look good, users will simply walk away and look for that content elsewhere. Quality of experience is king.

Getting high quality transcodes while keeping costs low requires some clever planning and use of technology.

“One-to-Many” Transcoding

“One-to-many” transcoding is the process of reading the source once, doing any common scaling and frame rate conversion operations required, and feeding multiple encoders. Doing this greatly reduces the number of machines required to complete a task with multiple output variants.

Consolidating common outputs significantly reduces the number of transcodes by reducing the number of tasks required of the transcoding farm, whether live or file-based. Without “one-to-many” transcoding, an HD source would be decoded and scaled to the same intermediate formats multiple times unnecessarily. Eliminating these redundant scaling operations greatly reduces the stress on the system.
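A toy model (my own sketch, not vendor code) shows where the savings come from: the source is decoded once and each unique scaling operation is performed once, with the result shared by every encoder that needs that resolution. The profile list below is hypothetical.

```python
# Hypothetical output profiles: (codec, width, height, kbit/s).
PROFILES = [
    ("h264", 1920, 1080, 6000),
    ("h264", 1280,  720, 3000),
    ("h264", 1280,  720, 1500),   # same scale as above, lower bitrate
    ("h264",  640,  360,  700),
]

def naive_task_count(profiles):
    """One independent transcode per profile: each output decodes
    and scales the source for itself."""
    decodes = len(profiles)
    scales = len(profiles)
    encodes = len(profiles)
    return decodes + scales + encodes

def one_to_many_task_count(profiles):
    """Decode once, scale once per unique resolution, then run each
    encoder from the shared scaled frames."""
    unique_resolutions = {(w, h) for _, w, h, _ in profiles}
    return 1 + len(unique_resolutions) + len(profiles)

print(naive_task_count(PROFILES))        # 12 heavy operations
print(one_to_many_task_count(PROFILES))  # 8: 1 decode + 3 scales + 4 encodes
```

The gap widens as profiles are added: every new bitrate at an existing resolution costs only one extra encode, not an extra decode and scale as well.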

To take advantage of this methodology, the operator of the transcoders must design output profiles with some amount of commonality and configure the transcoders to work accordingly. There is room for improvement in allowing transcoders to automatically take advantage of this type of processing, but it should always remain a configurable option as it will increase the latency for some use cases.

Re-Packaging

The second solution to the problem of scale in transcoding is the ability to re-wrap and pass through video and audio essences. As the number of video-enabled devices and container types increases, the common thread among them is the codecs: MPEG-2, H.264, AAC, and AC-3 currently represent the majority of the video and audio codecs supported on any type of device.

Suppose there are three sets of file wrappings (or containers) for web variants, with three rates each. This is required because while Silverlight, QuickTime, and Flash all support H.264 and AAC, they use different container types and store metadata differently. The future transcoder should be able to encode one H.264 video stream and one AAC audio stream at each rate, and then simply re-package these streams three times for the three different player types. This again significantly reduces the required transcoding horsepower, performing only three encodes and nine muxes (“wrapping” tasks) instead of nine encodes and nine wraps.
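The arithmetic in that example can be spelled out in a few lines. This is a back-of-the-envelope model of the scenario above, with encoding as the expensive step and muxing as the cheap one:

```python
# Three player container types (e.g. Silverlight, QuickTime, Flash),
# each delivered at three bitrates.
containers = 3
rates = 3

# Without re-packaging: every container/rate pair is a full encode plus a wrap.
encodes_without = containers * rates   # 9 encodes
wraps_without = containers * rates     # 9 wraps

# With re-packaging: encode each rate once, then re-wrap per container.
encodes_with = rates                   # 3 encodes
wraps_with = containers * rates        # still 9 muxes

print(encodes_without, wraps_without)  # 9 9
print(encodes_with, wraps_with)        # 3 9
```

Since an encode typically costs orders of magnitude more compute than a mux, cutting nine encodes to three is where nearly all of the savings lies.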

The most dramatic savings offered by repackaging versus transcoding come with the adaptive bit-rate streaming technologies. Adobe, Microsoft, and Apple have each developed different containerization and packaging technologies, all of which use multiple layers of identical video and audio essences. As a side note, MPEG DASH (Dynamic Adaptive Streaming over HTTP) is a developing ISO standard. As the name suggests, DASH is a standard for adaptive streaming over HTTP that has the potential to replace existing proprietary technologies such as Microsoft Smooth Streaming, Adobe Dynamic Streaming, and Apple HTTP Live Streaming (HLS).

The future of transcoding lies in the ability to keep up with new formats while more efficiently using and repurposing assets. While superior transcoding technology is vital for success, strategic implementation of that technology is equally important. “One-to-many” transcoding, in most cases, is a different way of implementing technology that already exists, while re-packaging is a familiar concept that has not been widely implemented. The combination of these two approaches can create a reasonably-sized and manageable transcoding farm capable of taking on new formats and containers without growing exponentially.

 

Guy Li-Ran, Director, Regional Solutions
Harmonic Inc.
San Jose, California, USA
