Author: Paul Preißner
Supervisor: Prof. Gudrun Klinker
Advisor: Sven Liedtke
Submission Date: 15.2.2020



Abstract

Virtual reality (VR) and modern low-level graphics APIs, such as Vulkan, are hot topics in the
field of high performance real-time graphics. Especially enterprise VR applications show the
need for fast and highly optimized rendering of complex industrial scenes with very high
object counts. However, solutions often need to be custom-tailored and the use of middleware
is not always an option. Optimizing a Vulkan graphics renderer for high performance VR
applications is a significant task. This thesis researches and presents a number of suitable
optimization approaches. The goal is to integrate them into an existing renderer intended
for enterprise usage, benchmark the respective performance impact in detail and evaluate
those results. This thesis likewise includes all research and development documentation of
the project, an explanation of successes and failures during the project and finally an outlook
on how the findings may be used further.

Results/Implementation/Project Description

Conclusion

[...] the goal of this thesis was to gather a selection of optimization methods specifically
tailored to VR. Of these methods, a subset was to be implemented in an industrial
real-time visualization rendering engine. Finally, these implementations were to be benchmarked
in a high-stress scenario to assess the performance of each optimization by itself and
in conjunction with the remaining methods. This goal sprang from the hope to collect useful
information and tangible numbers about ways to speed up VR rendering by a significant
margin.
While the list of presented optimization approaches is not exhaustive, as new or
more advanced methods are constantly being developed in this field, this thesis does
provide an overview of and elaboration on key avenues. Chapters 3 through 6
cover multiple angles such as GPU versus CPU performance gains, pipeline speedups and
varying hardware architectures and their intricacies. The implemented optimizations, while
promising, did not all pan out as expected. Stencil Masking and Multiview Rendering showed
clear and tangible improvements in frametimes and were considered a success. Superfrustum
Culling provided a tradeoff to alleviate stress and provide headroom on the CPU in exchange
for higher GPU workloads and required fitting circumstances to pay off in a target scenario.
Finally, Monoscopic Far-Field Rendering was the most interesting of the concepts presented,
but in the practical implementation delivered incorrect and disappointing results.
In the end, valuable insight for deployment of these four optimizations was gained and even
MFFR still shows promise given additional iteration and care to iron out the observed issues.
It is my hope that the presented approaches and demonstrated results are of similar value
to other efforts in the field. After all, every millisecond shaved off is precious in real-time
graphics.



tl;dr Final presentation: 
Building on top of an existing Vulkan renderer (RTG Echtzeitgraphik GmbH's Tachyon/FTL; the thesis was an industry collaboration), this thesis implemented, analysed and benchmarked four major VR rendering optimizations. The goal was to gain better insight into the performance impact of each, as existing documentation in the field is lacking. The renderer changes necessary for each optimization were implemented and documented for reference.

The optimizations implemented were: 

  • Multiview Stereo Rendering (MVIEW)
    (per-eye draw command/data submission in a single call instead of one for each eye, reducing CPU time; with hardware acceleration significantly reduces GPU time as well)
  • Hierarchical Superfrustum Culling (SFRUST)
    (in contrast to naive stereo frustum culling with one view frustum per eye/viewport, a Superfrustum combines both into one to reduce CPU time at the expense of slightly higher GPU time)
  • Fitted Stencil Masking (SMASK)
    (use of a render mask shaped after the VR HMD's eye-visible screen area to lower the fragment-stage pixel count and thus GPU time)
  • Monoscopic Far-Field rendering (MFFR)
    (in an effort to exploit negligible stereo separation at large distances, splitting the render effort into two passes; one for objects within the stereo-relevant near field, another for the far field where stereo separation becomes insignificant and objects can be rendered monoscopically with minimal visual accuracy loss; implementation flawed, performance results catastrophic, requires further work)
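
The premise behind MFFR can be illustrated with a little arithmetic. The following is a minimal sketch, not the thesis implementation: it assumes two parallel eye cameras with a symmetric horizontal field of view and estimates at which distance the stereo disparity of a point drops below a chosen pixel threshold. The function names, the interpupillary distance, the field of view, the render width and the one-pixel threshold are all illustrative assumptions.

```cpp
#include <cassert>
#include <cmath>

// Disparity in pixels between the left-eye and right-eye projection of a
// point straight ahead at distance `depth` (same unit as `ipd`).
// Assumption: parallel eye cameras with a symmetric horizontal FOV.
double disparityPixels(double ipd, double depth, double tanHalfFov, double widthPx) {
    assert(depth > 0.0 && tanHalfFov > 0.0);
    // Each eye sees the point shifted sideways by ipd/2, so the two NDC
    // x-coordinates differ by ipd / (depth * tanHalfFov). NDC spans 2 units
    // across widthPx pixels, hence the factor widthPx / 2.
    return ipd / (depth * tanHalfFov) * (widthPx / 2.0);
}

// Distance beyond which the disparity stays below `maxDisparityPx`.
// This is a candidate near-field/far-field split distance.
double farFieldSplitDistance(double ipd, double tanHalfFov, double widthPx,
                             double maxDisparityPx) {
    assert(tanHalfFov > 0.0 && maxDisparityPx > 0.0);
    return ipd * widthPx / (2.0 * tanHalfFov * maxDisparityPx);
}
```

With an assumed interpupillary distance of 0.064 m, a 90 degree horizontal field of view (tanHalfFov = 1) and 2000 pixels of render width per eye, the disparity falls below one pixel at roughly 64 m. A real split distance would additionally depend on the HMD optics and on how much visual error is acceptable.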

Using a synthetic test scene with high object count, benchmarked in fixed frame count loops for identical workload per frame, the most important performance results (of all 16 possible permutations) are as shown here: 

Primary lessons from measurements:

  • Multiview is a must-have for high performance, especially when hardware-accelerated; no significant downsides except lack of support on older or some mobile GPUs
  • Stencil Masking is also a must-have for high performance; once again no significant downsides, implementation is simple
  • Superfrustum is only viable in conjunction with Multiview as otherwise the CPU savings are mostly offset by increased GPU time
  • Superfrustum plus software-Multiview are ideal candidates when CPU power is scarce and provide significant gains
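
The Superfrustum tradeoff noted above follows from its geometry, which can be sketched in a top-down 2D view (x sideways, z forward). This is a simplified illustration, not the renderer's code: it assumes parallel eye cameras and builds one conservative frustum whose apex sits slightly behind the eyes, so a single culling test covers both views. All type and function names are hypothetical, and the near plane and the vertical extent are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Horizontal half-angle tangents of one eye, both given as positive values.
struct EyeFov { double tanLeft; double tanRight; };

struct SuperFrustum {
    double apexX;    // sideways position of the combined apex (head centre = 0)
    double recess;   // distance of the apex behind the eye plane
    double tanLeft;  // conservative left tangent
    double tanRight; // conservative right tangent
};

// Build a frustum enclosing both eye frusta. The left plane passes through
// the left eye and the right plane through the right eye; their intersection
// behind the eyes is the new apex.
SuperFrustum makeSuperFrustum(double ipd, EyeFov left, EyeFov right) {
    SuperFrustum sf;
    sf.tanLeft = std::max(left.tanLeft, right.tanLeft);
    sf.tanRight = std::max(left.tanRight, right.tanRight);
    assert(ipd >= 0.0 && sf.tanLeft + sf.tanRight > 0.0);
    sf.recess = ipd / (sf.tanLeft + sf.tanRight);
    sf.apexX = -ipd / 2.0 + sf.tanLeft * sf.recess;
    return sf;
}

// Point-in-frustum test for a single eye located at x = eyeX, z = 0.
bool insideEye(double eyeX, EyeFov fov, double x, double z) {
    double dx = x - eyeX;
    return z > 0.0 && dx >= -fov.tanLeft * z && dx <= fov.tanRight * z;
}

// Point-in-frustum test for the combined frustum.
bool insideSuper(const SuperFrustum& sf, double x, double z) {
    double depth = z + sf.recess; // distance measured from the recessed apex
    double dx = x - sf.apexX;
    return depth > 0.0 && dx >= -sf.tanLeft * depth && dx <= sf.tanRight * depth;
}
```

Because the combined frustum is wider than either eye frustum, some objects pass the single test although only one eye (or neither) can see them. That is the extra GPU work traded for halving the number of CPU-side culling passes. In a full implementation the near plane distance would also have to grow by the recess, since the apex moves backwards.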

Secondary lessons from thesis: 

  • Vulkan requires a lot of caution to extract its benefits over more accessible APIs like OpenGL, but is essential for highly optimized, low-level customized rendering


[ PDF ] 

[ Slides Kickoff ] 

[ Slides Final ]