Real-time HD video encoding on PS3?
Digital Foundry puts Cell to a serious test.
I decided to pit CodecSys against two different PC configurations running the x264 open source encoder, and the reasoning behind this is very simple. With its dual $200/$2,000 price points, a personal user could buy a CPU and RAM upgrade for the price of the encoder, while a professional could buy a state-of-the-art Core i7 workstation for the same financial outlay. So here are four different encodes using three sets of equipment. The videos use the Eurogamer HD player, but to see the full 720p resolution you need to press the full-screen button. The default window is 960x540, with the HD video scaled down. This produces an effect known as super-scaling, which can mask artifacting: not what you want in a quality comparison.
1. Core i7 Workstation: Running at 3.33GHz, this unit was built specifically for h264 encoding plus a spot of gaming. I matched as many parameters as I could to the CodecSys encoder (including limiting encoding to one pass, which handicaps x264 right from the off) while still keeping an eye on reasonable quality, and the machine produced this video in 183 seconds. It's worth pointing out that the same hardware takes up to 30 minutes to process a clip like this for my typical Eurogamer HD encodes, so this is still nowhere near x264's top end.
2. Pentium Dual Core PC: An E5200 2.5GHz dual-core CPU, based on Core 2 architecture and purchased with my own funds for an astonishingly low £52. It has since been discontinued in favour of the 2.6GHz E5300, which sits at the same ultra-low price point. I also fitted 2GB of RAM (another £25). At current exchange rates, spending the price of a CodecSys licence on these parts would leave £44 over for a new motherboard if you needed one (or a faster CPU). Using the same settings as the Core i7 video, the encode took 478 seconds, but this video, with a pared-back encoding profile, was complete in 208 seconds.
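The £44 figure can be checked with a little arithmetic: if the £52 CPU, the £25 of RAM and the £44 left over together add up to the $200 licence, the implied exchange rate falls out directly. This is a minimal sketch; the exchange rate is derived from the article's numbers, not quoted in it.

```python
# Prices quoted in the article; the exchange rate below is derived, not quoted.
CODECSYS_PERSONAL_USD = 200
CPU_GBP = 52       # Pentium E5200
RAM_GBP = 25       # 2GB of RAM
LEFTOVER_GBP = 44  # left for a motherboard or a faster CPU

def implied_budget_gbp(cpu, ram, leftover):
    """Total sterling budget implied by the article's breakdown."""
    return cpu + ram + leftover

def implied_usd_per_gbp(usd, gbp):
    """Exchange rate implied by equating the licence price with that budget."""
    return usd / gbp

budget = implied_budget_gbp(CPU_GBP, RAM_GBP, LEFTOVER_GBP)
rate = implied_usd_per_gbp(CODECSYS_PERSONAL_USD, budget)
print(f"budget: £{budget}, implied rate: ${rate:.2f} per £1")
```

The budget comes to £121, which implies roughly $1.65 to the pound.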
3. CodecSys CE-10: The PS3 was connected to the i7 workstation (the importance of which will become evident later) and averaged an astonishing 35-40FPS, completing the video in an eye-opening 52 seconds. Yup, four times faster than the "fast" dual-core Pentium encode. Unfortunately, the result is this pretty sorry-looking video, nowhere near the same league as the x264 encodes.
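The relative speeds are easier to compare side by side. The sketch below simply divides each reported encode time by the CE-10 time; the dictionary keys are illustrative labels, not names used in the article.

```python
# Encode times in seconds, as reported in the article (keys are illustrative labels).
ENCODE_TIMES = {
    "i7_x264_matched": 183,        # Core i7, settings matched to CodecSys
    "e5200_x264_matched": 478,     # Pentium E5200, same settings as the i7
    "e5200_x264_pared_back": 208,  # Pentium E5200, pared-back profile
    "ps3_codecsys_ce10": 52,       # PS3 running CodecSys CE-10
}

def slowdown_vs(times, reference):
    """Return how many times longer each encode took than `reference`."""
    ref = times[reference]
    return {name: t / ref for name, t in times.items() if name != reference}

ratios = slowdown_vs(ENCODE_TIMES, "ps3_codecsys_ce10")
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x the CE-10 encode time")
```

The pared-back Pentium encode works out at exactly 4.0x (208 / 52), matching the "four times faster" figure. The matched-settings i7 encode is about 3.5x and the matched-settings Pentium about 9.2x.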
So the conclusion is somewhat disheartening. CodecSys achieves lightning-fast results, but they are simply unusable when encoding demanding video at low bitrates. In fact, analysing the stream that the CE-10 encoder produced reveals a number of drawbacks, most notably poor motion search, bandwidth wasted on a relatively huge number of intra-blocks, and bad rate control, with bitrate cut-outs severely impacting quality.
While it has an obvious speed advantage, the fact is that you can effectively make x264 work as fast as you want it to if you dial back enough quality settings. Not that there's much point if you're left with a sub-optimal end result. Unfortunately, you cannot ramp up the quality settings on CodecSys. On the stress test video there just aren't enough tweakables to produce a meaningful increase in video quality, and it's simply not good enough compared to the competition.
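The speed/quality dial on the x264 side can be illustrated with its command line. This is a minimal sketch, assuming a current x264 build that supports `--preset` (the article does not say which individual options were changed); it only builds argument lists, and actually running them would need x264 installed (for example via `subprocess.run`). The file names are hypothetical.

```python
import shlex

# Preset names accepted by current x264 builds, ordered fastest to slowest.
X264_PRESETS = ["ultrafast", "superfast", "veryfast", "faster", "fast",
                "medium", "slow", "slower", "veryslow", "placebo"]

def x264_command(src, dst, bitrate_kbps, preset="medium"):
    """Build an x264 argument list; faster presets trade quality for speed."""
    if preset not in X264_PRESETS:
        raise ValueError(f"unknown x264 preset: {preset!r}")
    return ["x264", "--preset", preset, "--bitrate", str(bitrate_kbps),
            "-o", dst, src]

# Hypothetical file names; 8000 kbps mirrors the article's 8Mbps tests.
quick = x264_command("clip.y4m", "quick.264", 8000, preset="veryfast")
careful = x264_command("clip.y4m", "careful.264", 8000, preset="slower")
print(shlex.join(quick))
print(shlex.join(careful))
```

Keeping the bitrate fixed and moving only the preset isolates the trade-off the text describes: the same bandwidth, spent with more or less encoder effort.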
Its speed is also conditional on the power of the host PC. I originally ran the CodecSys test on my 2.5GHz Core 2 laptop (separate from the 2.5GHz desktop used above) and managed a poor 8FPS, due to the lossless nature of the HD source. This is because while the PS3 does the encoding, any decoding of the source clip has to be done on the PC, and in this case laptop hard disks are significantly slower than their desktop equivalents too. So depending on what you're encoding, you'll still need a fairly powerful PC to keep CodecSys CE-10 "fed" with enough data to sustain its awesome speed. Fixstars itself recommends a 1.8GHz dual-core PC with a RAID array.
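To give a feel for why the host PC matters, here is a back-of-the-envelope sketch of how much decoded picture data the PC has to produce each second to keep the encoder running at a given frame rate. It assumes, which the article does not state, that frames are handed over as uncompressed 8-bit YUV 4:2:0 at 1280x720; the real CE-10 transfer format may differ, and disk reads of the compressed source come on top.

```python
def raw_yuv420_bytes_per_frame(width, height, bit_depth=8):
    """Size of one uncompressed 4:2:0 frame: a full-resolution luma plane plus
    two chroma planes at a quarter of the resolution (1.5 samples per pixel)."""
    samples = width * height * 3 // 2
    bytes_per_sample = 1 if bit_depth <= 8 else 2
    return samples * bytes_per_sample

def feed_rate_mb_per_s(width, height, fps, bit_depth=8):
    """Megabytes (10**6 bytes) per second needed to sustain `fps` frames."""
    return raw_yuv420_bytes_per_frame(width, height, bit_depth) * fps / 1e6

# Assumed 720p 8-bit frames at the speeds quoted in the article.
for fps in (8, 35, 40):
    print(f"{fps:>2} FPS -> {feed_rate_mb_per_s(1280, 720, fps):.1f} MB/s")
```

Under these assumptions a 720p frame is about 1.38MB, so sustaining 40FPS means decoding and delivering roughly 55MB/s, against about 11MB/s at the laptop's 8FPS.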
The more bandwidth you give CodecSys CE-10, the better the results, but the same is true for any video encoder, and in my tests with 8Mbps and 16Mbps clips, x264 once again came out with a clear quality advantage. It's worth bearing in mind that this was with one-pass encoding too; there's even more quality to glean from x264 by going for a two-pass approach, which in my tests added between 25 and 30 per cent to the encode time. CodecSys CE-10 allows you to encode at up to 150Mbps, but once you reach those massively high throughput levels, you're not going to be using h264; you'll be using one of the professional industry codecs, such as Avid DNxHD, CineForm HD or Apple's ProRes.
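As a rough illustration of what that two-pass overhead means in practice, the sketch below applies the reported 25-30 per cent to the Core i7's 183-second one-pass encode. Applying it to that particular encode is an assumption: the article does not say which clip or settings the overhead was measured on.

```python
def two_pass_range(one_pass_seconds, overhead=(0.25, 0.30)):
    """Estimate (low, high) two-pass encode times from a one-pass time,
    using the 25-30 per cent extra time reported in the article."""
    low, high = overhead
    return one_pass_seconds * (1 + low), one_pass_seconds * (1 + high)

low, high = two_pass_range(183)  # the Core i7 one-pass encode
print(f"estimated two-pass time: {low:.0f}-{high:.0f} s")
```

That gives roughly 229-238 seconds, still well over four times the CE-10's 52 seconds, which is the price x264 pays for its quality lead.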
What I did find curious was that CodecSys appeared to sustain very similar encoding speeds no matter what changes you made to the settings. That suggests the encoder is built totally for speed, and backs up my finding that quality is very much a secondary concern. Flexibility is a big issue too. There's no support for the MP4 container (yet: it's coming in an update, but this is basic stuff), and only a limited set of profiles. You'd expect a personal h264 encoder to pump out PSP, Apple TV and iPod encodes automatically, for example, but CE-10 doesn't even contain an inbuilt scaler to resize the image. That's another feature coming later.
All of which leads to the inescapable conclusion that, in the here and now, you're better off upgrading your PC than tapping into the power of the Cell you might already have in your home. The speed is there, but it comes at the cost of too much of a hit in quality. Now, it is true that my test case here isn't typical video. By its very nature, it's a stress test. The average movie rip or home camcorder footage will look a lot better, but the point is that PC software encoding will do an even better job there too, and it won't cost you $200. As for the more expensive package, a professional with a $2,000 budget to play with would obviously be looking for a flexible, quality-based solution; CE-10 is missing so much key functionality that it's not worth the asking price.
This raises the question: just how useful will the Cell ever be as a co-processor? The PS3's Folding@Home client annihilates the PC's CPU client in terms of speed, but the GPU clients are significantly faster still. That said, the faster the host platform, the narrower the focus of the Folding work: the PC multi-core client deals with more complex work units than the PS3, which in turn works on harder tasks than the GPU version. Different architectures, different utilisations, different results. All have their uses, but it remains the PC that is the most versatile and ultimately the most useful overall.
Applying that logic to video encoding, the question is whether the quality level will ever compete. Is Cell really built for high-end, quality-based encoding work? Right now, the jury is out. On a positive note, CodecSys CE-10 is an ongoing project: like x264, it is a software-based encoder, and Fixstars is working on improving the program. In the here and now, though, CE-10 is an intriguing experiment and fun to play with for the duration of the 14-day trial period, but it needs some serious upgrades to compete with the more traditional solutions and so become worth paying for.