Author: Simon Leber
Supervisor: Prof. Gudrun Klinker
Advisor: Benedikt Biallowons B.Sc.; Linda Rudolph M.Sc.
Submission Date: 15.05.2019

Abstract

To reduce the cost of maintenance services, an augmented video guidance system can help
customers solve issues with the help of remote technicians. The developed and evaluated
system lets customers live stream video of the object in question (a coffee machine, a
minor car maintenance or repair task, a fuse box, etc.) from their personal mobile device.
The technician can place virtual objects in the video stream to provide guidance in
addition to standard voice guidance. The system architecture relies solely on
well-established and/or standardized web technologies such as HTML5/JavaScript, WebRTC,
and OpenCV for object tracking. To use standard web technologies without building a
complete media server that processes (decodes, alters, and re-encodes) the video, the
system relies on a simple WebRTC video stream combined with either a server-supported
WebSocket channel or a serverless WebRTC DataChannel, which carries information about
the augmentation in both directions. In this work I show that such a system can be
implemented using standard web technologies only. I also show that, at the time of
writing, a completely serverless solution is not feasible: NAT traversal and signaling
to overcome firewalls and other network restrictions can be challenging without a
server. A further finding is that today's handheld devices provide enough processing
power to perform expensive tasks such as real-time object tracking in a browser.
Keywords: Remote guidance, collaborative augmented reality, OpenCV, JavaScript,
WebRTC, real-time object tracking, object tracking, remote technical guidance, computer
vision
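The architecture described in the abstract never decodes or alters the video on a server; only a description of the augmentation travels over the WebSocket or DataChannel side channel. A minimal TypeScript sketch of such messages follows. It assumes a JSON encoding, a hypothetical `AnnotationMessage` schema, and hypothetical helpers (`encodeAnnotation`, `decodeAnnotation`, `toPixels`); none of these names come from the thesis. Positions are sent in normalized coordinates so that both peers can render the object regardless of the resolution at which they display the video.

```typescript
// Hypothetical message schema for the augmentation channel. The video travels
// unmodified over a WebRTC media track; only these small JSON messages travel
// over the RTCDataChannel or the WebSocket fallback.
type AnnotationKind = "arrow" | "circle" | "label";

interface AnnotationMessage {
  id: string;           // identifier of the virtual object being placed or moved
  kind: AnnotationKind;
  // Position in normalized video coordinates (0..1), independent of the
  // resolution at which either peer renders the stream.
  x: number;
  y: number;
  text?: string;        // only meaningful for "label"
}

const KINDS: readonly AnnotationKind[] = ["arrow", "circle", "label"];

function encodeAnnotation(msg: AnnotationMessage): string {
  return JSON.stringify(msg);
}

// Parse and validate an incoming message; returns null for anything malformed
// so that a single bad packet cannot break the rendering loop.
function decodeAnnotation(raw: string): AnnotationMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const m = data as Record<string, unknown>;
  const inUnitRange = (v: unknown): v is number =>
    typeof v === "number" && v >= 0 && v <= 1;
  if (typeof m.id !== "string") return null;
  if (KINDS.indexOf(m.kind as AnnotationKind) === -1) return null;
  if (!inUnitRange(m.x) || !inUnitRange(m.y)) return null;
  if (m.text !== undefined && typeof m.text !== "string") return null;
  return m as unknown as AnnotationMessage;
}

// Map normalized coordinates onto the pixel size of the local video element.
function toPixels(msg: AnnotationMessage, width: number, height: number): [number, number] {
  return [Math.round(msg.x * width), Math.round(msg.y * height)];
}
```

In a browser, the encoded string would be passed to `RTCDataChannel.send()` or `WebSocket.send()`, and `decodeAnnotation` would run in the channel's `onmessage` handler. Whether the prototype used JSON or normalized coordinates is not stated in this summary.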

Conclusion

Implementing an AR remote guidance system using standard web technologies comes
with several benefits and challenges. The main benefit is instant availability to all
users who own a smartphone or tablet with a web browser installed: there is no need to
install an application for a task that is likely not performed regularly.
Based on new web technologies such as WebRTC for real-time communication and
WebAssembly for efficient code execution, I introduced a feasible architecture made of
three main components: a front-end application in JavaScript with a customized OpenCV
module compiled to WebAssembly or asm.js, which brings advanced visual tracking
algorithms to the web; a back-end server application that features two REST endpoints
to establish and manage sessions and to provide communication between peers; and a
STUN/TURN server that helps with WebRTC connection establishment and NAT traversal. By
implementing a functional prototype that was successfully tested on different platforms
with good to acceptable performance, I showed that such a system is feasible, although
limitations exist. Depending on the hardware platform, the performance of the available
tracking algorithms in OpenCV ranged from more than sufficient (> 15 FPS) to
insufficient (< 15 FPS). Compatibility with all major browsers has been shown, with the
exception of Microsoft Edge, which does not implement the WebRTC APIs correctly and was
therefore found incompatible with the developed application. One finding of the
performance tests on the various platforms was that not all systems perform equally well
under the same configuration. For example, Apple's Safari web browser delivered the best
object tracking performance with the combination of OpenCV.js compiled to asm.js (in
contrast to the WebAssembly target, which is generally twice as fast) and the MOSSE
tracking algorithm. This and other results lead to the conclusion that platform-specific
code paths should be used to achieve the best possible experience. Compared to a native
app approach, this architecture lags behind in tracking performance and AR capabilities.
The gap between the toolkits available for native apps and for web apps should narrow in
the coming years due to the efforts of large companies such as Google to bring AR to the
web.
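The conclusion recommends platform-specific code paths but does not describe how one is chosen. One possible approach, sketched here in TypeScript under that assumption, is to benchmark each candidate configuration briefly on the user's device, keep the fastest, and check it against the 15 FPS threshold used above. The type names, the `averageFps` and `chooseCodePath` helpers, and the candidate list are hypothetical; only MOSSE, asm.js, and WebAssembly come from the text, and KCF appears merely as another OpenCV tracker name.

```typescript
// Hypothetical selection of a platform-specific code path. Each candidate
// combines an OpenCV.js build target with a tracking algorithm; the FPS values
// are expected to come from a short benchmark run on the user's device.
type BuildTarget = "wasm" | "asmjs";
type TrackerName = "MOSSE" | "KCF";

interface Candidate {
  target: BuildTarget;
  tracker: TrackerName;
}

interface Measurement extends Candidate {
  fps: number;
}

// Threshold from the text: below 15 FPS, tracking was considered insufficient.
const MIN_FPS = 15;

// Average frames per second from frame timestamps in milliseconds
// (e.g. collected with performance.now() after each tracker update).
function averageFps(timestampsMs: number[]): number {
  if (timestampsMs.length < 2) return 0;
  const elapsed = timestampsMs[timestampsMs.length - 1] - timestampsMs[0];
  return elapsed > 0 ? ((timestampsMs.length - 1) * 1000) / elapsed : 0;
}

// Pick the fastest measured configuration; `sufficient` tells the caller
// whether it reaches the 15 FPS threshold or a degraded mode is needed.
function chooseCodePath(
  results: Measurement[],
): { choice: Candidate; fps: number; sufficient: boolean } | null {
  if (results.length === 0) return null;
  let best = results[0];
  for (const r of results) {
    if (r.fps > best.fps) best = r;
  }
  return {
    choice: { target: best.target, tracker: best.tracker },
    fps: best.fps,
    sufficient: best.fps >= MIN_FPS,
  };
}
```

Selecting by measurement rather than by browser detection means a device that happens to favor asm.js with MOSSE, as Safari did in the tests, receives that configuration without a hard-coded user-agent check.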
