Author: Min-Shan Luong
Supervisor: Prof. Gudrun Klinker
Advisor: Linda Rudolph, MSc
Submission Date: 15.03.2020

Abstract


In this thesis, we propose an approach to track the 6-DoF pose of a pen by attaching
an object marker specifically designed for this purpose. Our work includes designing
an optimal object marker, as well as a 6-DoF tracking system for this object with one
stationary webcam. For the design of the object marker, we focused on textured planar
models. After calibrating the camera, the object to be tracked is registered in order
to use this model information as input for the tracking algorithm. Furthermore, 3D
scans of the objects are created to obtain a digital 3D representation, which is used
for object registration. Tracking quality is then evaluated by measuring accuracy and
speed.
The aforementioned work is part of a more extensive project whose goal is to enable
intuitive interaction with objects in the augmented environment by adding extra
functionality to an ordinary pen. The pen can then be used similarly to a laser pointer.
In that application, the camera is attached to the user instead of being stationary,
providing high mobility while capturing visual input for the tracking.

Implementation


[ Overview of the 6-DoF tracking algorithm ]


[ Detailed flowchart diagram of the 6-DoF tracking algorithm ]

Flowchart diagram of the 6-DoF tracking algorithm
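The loop shown in the flowchart (detection via feature matching, then 2D optical flow tracking, falling back to detection when tracking fails or the pose check rejects a frame) can be sketched as a small state machine. This is a minimal illustration, not the thesis implementation; detect_pose, track_optical_flow, and pose_is_plausible are placeholder callables.

```python
# Minimal sketch of the detect-then-track loop from the flowchart.
# The three callables are placeholders for the actual detection,
# optical flow tracking, and plausibility-check stages.

DETECTING, TRACKING = "detecting", "tracking"

class TrackerLoop:
    def __init__(self, detect_pose, track_optical_flow, pose_is_plausible):
        self.detect_pose = detect_pose
        self.track = track_optical_flow
        self.plausible = pose_is_plausible
        self.state = DETECTING
        self.pose = None

    def step(self, frame):
        if self.state == DETECTING:
            pose = self.detect_pose(frame)       # feature matching + pose estimation
            if pose is not None:
                self.pose, self.state = pose, TRACKING
        else:
            pose = self.track(frame, self.pose)  # 2D optical flow update
            if pose is None or not self.plausible(self.pose, pose):
                self.state = DETECTING           # tracking lost: fall back to detection
            else:
                self.pose = pose
        return self.pose if self.state == TRACKING else None
```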


[ Various designs of object markers with our optimal marker on the very left ]


The future application will consist of a camera attached to the user and an object
marker attached to the end of a pen, so that the pen can function as an augmented
reality raycasting interactable, similar to a laser pointer. Generally speaking,
it is possible to detect and track any arbitrary object. However, for the previously
stated purpose, objects with certain shapes and textures may work better than others.
Therefore, we tested various designs for our object marker: a pyramidal and a
diamond-shaped 3D marker, combined with textures of varying coarseness.
For each prototype, we modeled the object so that, while it is in use, a relatively
large area of the object marker remains visible to the camera attached to the user.
Furthermore, we added a slot to the object so that it can be plugged onto a pencil.
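For laser-pointer-style raycasting, the interaction ray can be derived from the tracked 6-DoF pose: the ray originates at the marker position and points along the pen's forward axis rotated into world space. The following is an illustrative sketch under that assumption; pen_ray and the choice of forward axis are not taken from the thesis.

```python
import math

def pen_ray(position, rotation, forward=(0.0, 0.0, 1.0)):
    """Ray for laser-pointer-style interaction from a 6-DoF pose.

    position: (x, y, z) of the tracked marker / pen tip.
    rotation: 3x3 rotation matrix (list of rows) of the marker pose.
    forward:  the pen's local pointing axis (an assumed convention).
    Returns (origin, unit direction). Illustrative sketch only.
    """
    # Rotate the local forward axis into world space: d = R * forward.
    d = tuple(sum(rotation[i][j] * forward[j] for j in range(3)) for i in range(3))
    # Normalize so the caller gets a unit direction.
    n = math.sqrt(sum(c * c for c in d))
    return position, tuple(c / n for c in d)
```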

Before printing, the 3D models should be checked for non-manifold geometry, which is
a common cause of issues during 3D printing. Finally, we covered the object marker
with a printed texture intended to be well suited for tracking.
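The non-manifold check mentioned above boils down to an edge count: in a watertight, manifold mesh every edge is shared by exactly two faces. A minimal sketch of such a check (illustrative; mesh-repair tools in common 3D software perform this far more thoroughly):

```python
from collections import Counter

def non_manifold_edges(faces):
    """Return edges not shared by exactly two faces.

    faces: list of vertex-index tuples (e.g. triangles). Edges used
    once (open boundary) or three or more times (non-manifold) are
    common causes of 3D printing issues. Illustrative sketch only.
    """
    edges = Counter()
    for face in faces:
        # Walk the face boundary; frozenset ignores edge direction.
        for a, b in zip(face, face[1:] + face[:1]):
            edges[frozenset((a, b))] += 1
    return [tuple(sorted(e)) for e, n in edges.items() if n != 2]
```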

Results


[ Demo video ]


Conclusion


In this thesis, we presented an approach to track a textured object marker with 6-DoF
using a single RGB camera. We designed two object markers (a 4-sided and an 8-sided
pyramid) and tested different textures in order to improve detection and tracking
results. Pose estimation issues due to coplanarity can be reduced by adding more
faces to the object marker. The object marker is shaped in such a way that mainly
one side is visible to the camera, since the overall idea is to transform a pen
into an interaction device with a camera attached to the user. Moreover, its texture is
designed to offer the distinctive feature points necessary for feature matching.
These feature points are supposed to remain clearly recognizable over a preferably wide
spatial range, up to one arm's length away from the camera.
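Matching such feature points against the registered model is commonly done as a nearest-neighbour search over descriptors with a ratio test that discards ambiguous matches. The sketch below assumes that scheme (Lowe's ratio test); the thesis does not specify the exact matcher, and real implementations work on binary or float descriptors from a feature detector.

```python
def match_features(query, model, ratio=0.75):
    """Nearest-neighbour descriptor matching with a ratio test.

    query, model: lists of descriptor vectors (tuples of floats).
    Returns (query_index, model_index) pairs whose best match is
    clearly better than the second best. Illustrative sketch of a
    common scheme, not the thesis implementation.
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    matches = []
    for qi, q in enumerate(query):
        d = sorted((dist2(q, m), mi) for mi, m in enumerate(model))
        # Ratio test on squared distances: best must beat second best.
        if len(d) > 1 and d[0][0] < (ratio ** 2) * d[1][0]:
            matches.append((qi, d[0][1]))
    return matches
```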
To register our object marker, we use synthetic data obtained through its 3D scan.
The synthetic data is created by rendering the 3D scan in Unity from various poses. This
facilitates the object registration procedure, as the pose of the rendered 3D scan is
known in Unity. While one fixed camera provides input images, our object tracking
algorithm estimates the camera pose during the detection stage and subsequently tracks
the textured object in 2D using optical flow. When pose estimation succeeds, the
object mesh is rendered onto the screen during tracking to visualize the estimated pose.
To obtain the keypoints necessary for pose estimation, our detection algorithm performs
feature matching between features detected in the live camera feed and features from the
registered object model. Furthermore, we implemented tests to identify likely wrong
poses during the optical flow tracking. Implausible pose changes (e.g., flipped axes)
between two consecutive frames are detected by our filter so that appropriate measures
can be taken. Moreover, our optical flow tracking runs at up to 50 FPS on a modern
laptop CPU (Intel i5-9300H), enabling smooth real-time usage.
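One way to flag implausible pose changes such as flipped axes is to measure the angle of the relative rotation between consecutive frames and reject frames above a threshold. The sketch below assumes this rotation-angle criterion and an illustrative 30-degree threshold; neither is confirmed as the thesis's exact filter.

```python
import math

def rotation_change_deg(r_prev, r_curr):
    """Angle (degrees) of the relative rotation between two poses.

    r_prev, r_curr: 3x3 rotation matrices (lists of rows). The relative
    rotation is R_prev^T * R_curr, and its angle follows from the trace:
    cos(angle) = (trace - 1) / 2.
    """
    # trace(R_prev^T * R_curr) equals the sum of elementwise products.
    t = sum(r_prev[i][j] * r_curr[i][j] for i in range(3) for j in range(3))
    c = max(-1.0, min(1.0, (t - 1.0) / 2.0))  # clamp against rounding error
    return math.degrees(math.acos(c))

def pose_is_plausible(r_prev, r_curr, max_deg=30.0):
    """Reject implausibly large frame-to-frame rotations, e.g. a flipped
    axis (about 180 degrees). The threshold is an assumed example value."""
    return rotation_change_deg(r_prev, r_curr) <= max_deg
```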



[ Thesis ] 

[ Slides Kickoff ] 

[ Slides Final ]