Tech Interview: Metro 2033
Oles Shishkovstov on engine development, platform strengths and 4A's design philosophy.
You can calculate it like this: each 360 CPU core is approximately a quarter of a same-frequency Nehalem (i7) core. Add in approximately 1.5 times better performance because of the second, shared hardware thread on 360 and around 1.3 times for Nehalem, multiply by three cores, and you get around 70 to 85 per cent of a single modern CPU core on generic (but multi-threaded) code.
Bear in mind, though, that the above calculation does not hold where the code is properly vectorised. In that case the 360 can actually exceed the PC on a per-thread, per-clock basis. So, is it enough? Nope, there is no CPU in the world that is enough for games!
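The back-of-envelope estimate above can be checked directly. This sketch just plugs in the point values quoted in the answer; the 70 to 85 per cent range reflects how approximate each factor is, and the point estimate lands near the top of it.

```python
# Back-of-envelope from the interview: three Xenon cores vs one Nehalem core.
xenon_per_core = 0.25  # each 360 core ~ a quarter of a same-frequency Nehalem core
xenon_smt      = 1.5   # ~1.5x from the second, shared hardware thread on 360
nehalem_smt    = 1.3   # ~1.3x from the second thread on Nehalem
cores          = 3     # the 360 has three cores

ratio = (xenon_per_core * xenon_smt * cores) / nehalem_smt
print(f"Three 360 cores ~ {ratio:.0%} of one SMT-enabled Nehalem core")
```

With the stated factors the ratio comes out around 86 per cent; treating each "approximately" a little more pessimistically gives the quoted lower bound of roughly 70 per cent.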
The 360 GPU is a different beast. Compared to today's high-end hardware it is five to ten times slower, depending on what you do. But raw hardware performance is only one side of the equation. Because we as programmers can optimise for the specific GPU, we can reach nearly 100 per cent utilisation of all its sub-units. That's just not possible on a PC.
In addition to this we can do dirty MSAA tricks, like treating some surfaces as multi-sampled (for example, hi-stencil masking of light influence does that), or rendering multi-sampled shadow maps and then sampling the correct sub-pixel values, because we know exactly what pattern and positions the sub-samples have, and so on. So it's not directly comparable.
Yes and no. When you have more performance on the table, you can either do nothing as you say, and as most direct console ports do, or you add the features. Because our platforms got equal attention, we took the second route.
Naturally most of the features are graphics-related, but not all. The internal PhysX tick-rate was doubled on PC, resulting in more precise collision detection and joint behaviour. We "render" almost twice the number of sounds (all with wave-tracing) compared to consoles. Those are just a few examples, so you can see that graphics isn't the only thing that gets a boost. On the graphics side, here's a partial list:
- Most of the textures are 2048^2 (consoles use 1024^2).
- The shadow-map resolution is up to 9.43 Mpix.
- The shadow filtering is much, much better.
- The parallax mapping is enabled on all surfaces, some with occlusion-mapping (optional).
- We've utilised a lot of "true" volumetric stuff, which is very important in dusty environments.
- From DX10 upwards we use correct "local motion blur", sometimes called "object blur".
- The light-material response is nearly "physically-correct" on the PC on higher quality presets.
- The ambient occlusion is greatly improved (especially on higher-quality presets).
- Sub-surface scattering makes a lot of difference on human faces, hands, etc.
- The geometric detail is somewhat better, because of different LOD selection, not even counting DX11 tessellation.
- We are considering enabling global illumination (as an option), which really enhances the lighting model. However, that comes with some performance hit, because of the literally tens of thousands of secondary light sources involved.
Great! It's simply great. Although the API is still awkward from a pure C++ design perspective, the functionality is there. I really enjoy three things: compute shaders, tessellation shaders and the separation between draw and create contexts.
The major thing that can lift performance is compute shaders. Today, games spend the majority of the frame doing various kinds of post-processing. The easy route to extracting some performance is to rewrite that post-processing with compute. Even simple blurs can be almost twice as fast. For example, we've rewritten our depth-of-field code to greatly enhance quality while still maintaining a playable frame-rate.
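A rough way to see where compute shaders win on post-processing is to count texture fetches. This is not 4A's code, just an illustrative fetch count: a pixel-shader-style blur re-reads every texel in each pixel's footprint, while a compute-shader thread group can stage the tile plus its apron in groupshared memory once and filter from there, which is the usual reason a compute blur approaches twice the speed of its pixel-shader equivalent.

```python
def fetches_naive(tile_w, tile_h, radius):
    """Pixel-shader-style horizontal blur: every pixel fetches its whole footprint."""
    taps = 2 * radius + 1
    return tile_w * tile_h * taps

def fetches_tiled(tile_w, tile_h, radius):
    """Compute-shader-style blur: the thread group loads the tile plus its
    left/right apron into groupshared memory exactly once, then filters
    from that cached copy."""
    return (tile_w + 2 * radius) * tile_h

naive = fetches_naive(8, 8, radius=4)  # 64 pixels * 9 taps = 576 fetches
tiled = fetches_tiled(8, 8, radius=4)  # (8 + 2*4) * 8 rows = 128 fetches
print(naive, tiled, naive / tiled)
```

In practice the measured win is smaller than the raw fetch ratio, since texture caches already absorb some of the redundancy, which is consistent with "almost twice as fast" for simple blurs.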
Although we do not use tessellation on the Xbox 360, we use it when running on DX11 hardware. Specifically, all the "organic" things like humans are tessellated, and monsters use real displacement mapping, to greatly enhance visuals.
The core advantage is simply performance. The CPUs just aren't there to enable large-scale physical effects (although they are very competitive at processing traditional rigid-body work). However, when you offload costly PhysX processing to the GPU, there is less GPU time left for rendering.
So it's difficult to say which hardware configuration will provide the best experience. I'd say that dedicating another (maybe less powerful) GPU specifically to PhysX is the right thing to do!
We do not add PhysX effects if they aren't integral to the gameplay experience; we don't add an effect for the sake of an effect. Human eyes and brains are trained to see inconsistencies. We are only trying to remove those inconsistencies, so as not to distract from gameplay and not to lose the immersion we have been building, brick by brick.
That's easy. The PhysX SDK has a notion of a "task" similar to the one we use. The SDK spawns them for every operation which can be safely parallelised: each rigid-body shape-shape collision detection, each cloth or fluid update; even the solvers are heavily sub-divided into tasks.
We forward those tasks to our task-scheduler and they are processed in the same manner as everything else. The only "conceptual" difference between their task model and ours is that we "spawn and forget" tasks, while PhysX uses a "spawn and wait" model.
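A minimal sketch of that bridging, in Python rather than the engine's C++ and with hypothetical names (`Scheduler`, `spawn_and_forget`, `spawn_and_wait`): both task styles go through the same worker pool, but engine tasks are fired and forgotten, while a PhysX-style caller submits its batch of sub-tasks and blocks until all of them complete.

```python
from concurrent.futures import ThreadPoolExecutor, wait

class Scheduler:
    """Stand-in for the engine's task scheduler (hypothetical name)."""

    def __init__(self, workers=3):
        self.pool = ThreadPoolExecutor(max_workers=workers)

    def spawn_and_forget(self, fn, *args):
        # Engine-style task: fire it and move on; nobody blocks on the result.
        self.pool.submit(fn, *args)

    def spawn_and_wait(self, fns):
        # PhysX-style batch: the SDK hands over sub-tasks (collision pairs,
        # cloth updates, solver islands) and waits for all of them to finish.
        futures = [self.pool.submit(fn) for fn in fns]
        wait(futures)
        return [f.result() for f in futures]

sched = Scheduler()
sched.spawn_and_forget(print, "fire-and-forget task dispatched")
results = sched.spawn_and_wait([lambda i=i: i * i for i in range(4)])
print(results)  # every sub-task has finished before this line runs
```

The point of the design is that both models share one pool of worker threads, so PhysX work interleaves with the engine's own tasks instead of fighting it for cores.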
Yes, we're investigating this scenario. Please wait to hear more about it.
Well, the majority of Metro 2033 runs at 40 to 50 frames per second on 360 if we disable vertical synchronisation. The majority of the levels have more than 100MB of heap space left unused. That means we under-utilised the hardware a bit...
Oles Shishkovstov is chief technical officer at 4A Games.