Video on the Web
A performance assessment of the royalty-free and open video compression specifications Dirac, Dirac Pro, and Theora and their open-source implementations
Additional comments: May 2009
Despite the tremendous success of online services like YouTube, video on the web is far from ubiquitous compared with, for instance, still images. This article discusses likely reasons and points to interesting recent developments that could remedy this. Its main contribution, however, lies in a performance assessment of some newly released video compression algorithms.
We are all familiar with how the situation has been for the last couple of years. HTML 4.01, with its support for text and still images, has been with us for a while now. For everything beyond that, such as audio and video, there are browser plug-ins, and among these the de facto standard, Flash Player, is probably the most successful plug-in so far.
But recently, the winds of change have started blowing. The W3C now have an activity addressing web video: recent drafts of the upcoming HTML 5 standard include support for video (and audio), browser makers like Mozilla and Opera are preparing to implement that support, and there is an ongoing public discussion about which multimedia formats to support. The choice of supported formats is the most crucial point for the success of standardized video support, and the editors of HTML 5 are well aware of this. In the Working Draft of February 12, 2009, they write:
It would be helpful for interoperability if all browsers could support the same codecs. However, there are no known codecs that satisfy all the current players: we need a codec that is known to not require per-unit or per-distributor licensing, that is compatible with the open source development model, that is of sufficient quality as to be usable, and that is not an additional submarine patent risk for large companies. This is an ongoing issue and this section will be updated once more information is available.
The increased focus on video content is driven by recent progress in the development of royalty-free and open specifications for video compression. In 2008, the BBC went public with two specifications called Dirac Pro and Dirac, together with reference implementations. Even earlier, in 2006, the Xiph.org Foundation released the specification for a compression algorithm known as Theora I. Now, why have the BBC and the Xiph.org Foundation developed their own specifications? The answer is coming right up.
Moving images vs. still images
Concerning compression of still images, the situation is quite clear. People typically use GIFs or PNGs for graphics and JPEGs for natural images, for private storage as well as on the web. When it comes to video, however, the situation is much more complicated. There are a number of MPEG standards, such as MPEG-1, MPEG-2 Video, MPEG-4 Visual, and MPEG-4 AVC, and there are plenty of ITU-T standards, including H.261, H.262, H.263, H.263+, H.263++, and H.264. Not to mention all the proprietary solutions for various Microsoft and Apple platforms, and many, many more.
Now why is that? Obviously, this is mostly about royalties. GIF, PNG, and JPEG are by now royalty-free as well as open, and most people are satisfied with their performance. For video, on the other hand, there has so far been no royalty-free and open format that also performs well. The question is: did the situation change with the advent of Dirac and Theora? Both are claimed to be royalty-free, but what about their performance? The next section tries to answer that.
Testing and results
In a recent project at the research institution where I work, I had the opportunity to take a closer look at Dirac and Theora. The project was actually concerned with long-term storage of video content for private use. A good compression algorithm, however, is useful not only for off-line storage but, at reasonable complexity, also for transmission (that is, web video), since bandwidth and storage capacity are limited in both cases. Besides, a format that is used on the web has the potential to become a long-term storage format as well; we have seen this with JPEG, for example.
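To make the bandwidth and storage argument concrete, here is a back-of-the-envelope sketch of what uncompressed video costs; the resolutions and frame rates below are illustrative examples, not the test conditions of the project.

```python
# Rough uncompressed bit rate of a video stream, to illustrate why
# compression is indispensable for both web delivery and storage.
# The parameters are example values, not those used in the project.

def raw_bitrate_mbps(width, height, fps, bits_per_pixel=12):
    """Bit rate of uncompressed video in Mbit/s.

    bits_per_pixel=12 corresponds to 8-bit 4:2:0 sampling
    (8 bits luma plus 4 bits of subsampled chroma per pixel).
    """
    return width * height * bits_per_pixel * fps / 1e6

# Standard-definition example: 720x576 at 25 frames/s
print(raw_bitrate_mbps(720, 576, 25))   # ~124 Mbit/s uncompressed
# QCIF example: 176x144 at 15 frames/s
print(raw_bitrate_mbps(176, 144, 15))   # ~4.6 Mbit/s uncompressed
```

Even a modest standard-definition stream needs well over 100 Mbit/s uncompressed, which is why compression ratios of two orders of magnitude are taken for granted both on the web and on disk.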
So the concrete question in this project was how suitable the specifications mentioned above are for transmission and off-line storage. Here, I give pointers to the experiments conducted and summarize the most important results; all details of the experiments can be found in the conference article.
The specifications' pros and cons
The first conclusions can be drawn without any experiments, simply by reading the specifications. They are as follows.
- Dirac Pro, Dirac
- Dirac Pro and Dirac have a lossless mode. This is useful if you have later storage in a different format in mind and have to give re-compression a thought, with all the implications involved, such as quality loss during the transcoding process.
- Dirac Pro provides spatial and quality scalability, which is useful for saving bandwidth when a single bit stream is transmitted to receivers with different resolution and bandwidth requirements.
- Theora I
- The Theora specification leaves its target areas unspecified. I consider this a drawback, as research has consistently shown that there is no such thing as one-size-fits-all: all algorithms have strengths and weaknesses, and they are typically tailored to operate under particular conditions. The profiles defined in the MPEG standards are an example of that.
- A big disadvantage of Theora's specification is that it lacks a lossless mode. Imagine a better compression format emerges in the future and you want all your Theora video in that format. You then have to transcode your collection, and with lossy formats this introduces unwanted artifacts, so a lossless mode is an advantage for private storage.
- Finally, I miss support for any form of scalability in Theora. This missing feature matters in particular in light of the ambition to use Theora widely on the web, for instance for streaming.
The implementations' pros and cons
Next, I tested the implementations of both specifications with regard to their performance. That is, I tested two applications that claim to implement the respective specifications: dirac and libtheora. The main questions in such a performance evaluation are always: what image quality is achieved at which bit rate, and at what computational cost? Below are only the most important results; for more details, I refer to the conference article.
- Dirac / dirac
- The most important result is that Dirac/dirac is inferior to the reference standard/implementation H.264/x264 (H.264 is also known as MPEG-4 AVC) by a substantial margin: more specifically, between roughly 10 dB (at low rates) and roughly 5 dB (at high rates) in terms of PSNR, depending on the video material. That is a significant difference in image quality.
- The performance of Dirac Pro is comparable to that of Motion JPEG2000, i.e. their rate-distortion curves are quite similar. However, H.264 in intra-frame mode appears to be superior to both specifications, in particular at high bit rates. In lossless intra-frame mode, Dirac Pro is even inferior to Motion JPEG2000 by a substantial margin.
- Dirac/dirac is severely limited when it comes to processing at very low bit rates. It was impossible for dirac to go below a particular minimum bit rate, and the image artifacts introduced were severe, especially since the reference systems produced significantly fewer artifacts at an even lower bit rate.
- The lossless mode of dirac (version 0.9.1) was not stable enough to allow an evaluation.
- dirac is not capable of real-time decompression, even with QCIF-sized video.
- dirac is the least efficient implementation in terms of processing speed, measured in video frames per second, if openjpeg, the Motion JPEG2000 implementation, is left out of the comparison.
- dirac has no option for so-called rate-distortion optimization. This makes it difficult to conduct fair software and algorithm comparisons, which require this kind of optimization (a sketch of the idea follows after this list). A near-optimal configuration can be achieved by forcing a full motion estimation search.
- Theora / libtheora
- Theora/libtheora is outperformed by H.264/x264 by a very substantial margin of roughly 10 dB. That means a huge difference in image quality, and most likely plenty of artifacts.
- The implementation of the quality factor (in version 0.19) seems to have serious flaws. In some cases, the desired quality / target bit rate range cannot be reached. Also, with certain video dimensions (QCIF and HD), the video quality cannot be controlled at all.
- libtheora is not capable of running in intra-frame mode, i.e. without motion estimation.
- libtheora has no option to output raw Theora data, i.e. data not embedded in the Ogg container format. This not only complicates the calculation of the true rate consumption, but is also a hindrance if a different container format is desired.
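To clarify what the missing rate-distortion optimization mentioned above refers to, here is a minimal, purely illustrative sketch of Lagrangian mode decision, the technique RD-optimized encoders such as x264 use internally. All names and candidate modes below are hypothetical and do not correspond to dirac's or x264's actual interfaces.

```python
# Minimal sketch of Lagrangian rate-distortion optimization for mode decision:
# choose the coding mode that minimizes the cost J = D + lambda * R.
# All names (CodingResult, the candidate modes) are hypothetical and only
# illustrate the principle, not any real encoder's API.

from dataclasses import dataclass

@dataclass
class CodingResult:
    mode: str          # e.g. "intra", "inter_16x16", "skip"
    distortion: float  # e.g. sum of squared differences for the block
    rate_bits: int     # bits needed to code the block in this mode

def best_mode(candidates, lam):
    """Return the candidate with the lowest RD cost J = D + lambda * R."""
    return min(candidates, key=lambda c: c.distortion + lam * c.rate_bits)

candidates = [
    CodingResult("intra",       distortion=120.0, rate_bits=400),
    CodingResult("inter_16x16", distortion=90.0,  rate_bits=250),
    CodingResult("skip",        distortion=200.0, rate_bits=2),
]

# With a large lambda (rate weighs heavily) the cheap 'skip' mode wins;
# with a small lambda (quality weighs heavily) 'inter_16x16' wins.
print(best_mode(candidates, lam=1.0).mode)  # -> "skip"         (202 < 340 < 520)
print(best_mode(candidates, lam=0.2).mode)  # -> "inter_16x16"  (140 < 200 < 200.4)
```

Without this kind of optimization an encoder may spend bits where they buy little quality, so comparing it against an RD-optimized encoder inevitably mixes the effects of the algorithm and of the implementation.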
Both Dirac and Theora have been developed to be state-of-the-art technology. However, my experiments have shown that they are in fact not even close to the cutting edge. And what is the cutting edge, anyway? H.264 is from 2003 and Motion JPEG2000 from 2001; that is ages in ICT terms. Dirac and Theora are comparable to MPEG-2 and H.263+ at best, and those date from the previous century.
So, what is needed in video compression for private use (storage) and for video on the web? Improved implementations, and where the implementations cannot be improved further, better algorithms and better specifications. Of course, it does not take much imagination to realize that it is difficult to compete with MPEG and ITU-T standardization, as a billion-dollar industry is involved in the participating groups, and all the patent work associated with developing a new specification is likely very cumbersome. But companies like the BBC and organizations like the Xiph.org Foundation have shown that it is possible. The 'only' thing missing is a specification describing a moderately efficient algorithm.
The BBC have already realized this and are working on a better implementation named schrödinger. Extensions to existing specifications (I am thinking of Dirac and Theora) are another option, one that Xiph.org is aware of, as is the development of an entirely new specification (as announced by Sun with the Open Media Commons initiative). However, the reference systems will evolve as well: maybe there will be extensions to H.264, maybe there will be an H.265 in the not-too-distant future, maybe the development of JPEG2000 will be continued. The current progress of JPEG XR is quite interesting; maybe someday a Motion JPEG XR will see the light of day. What I am trying to say is that the state of the art in, say, two years' time will be different from today's cutting edge, and this has to be taken into account in the development of any new specification.
- Article at IMAGAPP conference [PDF file]
- Note: This is an altered version of the conference article. The changes comprise a couple of corrected (minor) errors and a slightly extended reference section.
- Presentation at IMAGAPP [PDF file]
- BibTeX entry for citations [.bib file]
In the last couple of weeks I've been discussing the paper with various people, some of them working on libtheora and dirac. They asked me to stress a couple of important points.
- The software versions tested are from Q2 2008, even though the paper was first published in February 2009. This is because the paper was rejected twice at other conferences for a "lack of innovation", which I leave uncommented here. Needless to say, newer (improved) versions are available, as both Dirac and Theora appear to be very active projects at the moment.
- Evaluations like this one can look at a particular specification only through the eyes of its implementation. That is, the only thing that can actually be evaluated is the software: if it is good and standard-compliant, you can draw positive conclusions about the specification; if it is bad, it is impossible to say whether the specification or the implementation is at fault.
- I have intentionally tested the codecs at bit rates they might not have been designed for. I believe it is useful to see exactly where the boundaries lie and whether a codec can be used for purposes other than those originally intended. This kind of testing also checks the specification's target application claims, if any.
- The study uses a limited number of sequences, so the results cannot be generalized and should only be taken as a first indication. More sequences are needed for objectively more valid results.
- PSNR is known to have limits in its ability to say something about real, perceived image quality. The most valid method would be to let many human observers judge the quality. A brief sketch of how the PSNR figures used here are computed follows below.
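For reference, this is how the PSNR figures quoted above are typically obtained; a minimal sketch, assuming 8-bit frames held as NumPy arrays, with the sequence PSNR reported as the average over all frames. Note that a 10 dB gap at the same bit rate corresponds to roughly a tenfold difference in mean squared error.

```python
import numpy as np

def psnr(reference: np.ndarray, decoded: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two 8-bit frames.

    A higher value means the decoded frame is numerically closer to the
    reference; it does not always agree with perceived quality, which is
    exactly the limitation discussed above.
    """
    diff = reference.astype(np.float64) - decoded.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical frames, e.g. after lossless coding
    return 10.0 * np.log10(peak ** 2 / mse)

# Sequence PSNR is then usually reported as the mean over all frame pairs:
# sequence_psnr = np.mean([psnr(ref, dec) for ref, dec in frame_pairs])
```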
All content on this page is licensed under the Creative Commons Attribution 3.0 License