KotaruGupta

From Psych 221 Image Systems Engineering

Introduction

Visual navigation systems track the position and orientation of a moving camera using a series of images captured by that camera. They have important applications in navigation, 3D object recognition, and virtual/augmented reality [6]. Early visual navigation systems required dedicated external markers such as infrared light-emitting diodes [4]. Widespread adoption of such systems was an expensive proposition because it involved installing precisely machined ceiling panels. Current visual-inertial navigation systems [1, 2] achieve similar accuracy without requiring any dedicated external infrastructure. These systems track the camera by combining it with Inertial Measurement Unit (IMU) sensors rigidly attached to it, and they rely on features available in natural images rather than dedicated external markers. With visual-inertial navigation systems making their way into our daily lives through navigation and augmented reality applications, it is important to understand the performance of these systems in typical indoor environments.

Background

Visual-inertial navigation systems have demonstrated errors of less than 0.5 percent of the trajectory length when estimating the trajectory traced by a camera [3]. These systems have been adopted by commercial products such as Google Tango [8] and Microsoft Hololens [7]. However, the literature reports problems with reflective and textureless objects, dim lighting, and repeated patterns [5]. Unlike previous work [1], we want to evaluate the performance of the system and quantify its degradation specifically in these failure scenarios.

Methods

We used the Hololens, a commercially available system that employs visual-inertial motion tracking. The Hololens' visual-inertial system uses four 'environment-understanding' cameras together with one depth sensor and one RGB camera [7]. We conduct experiments by changing the input image stimulus presented to the visual-inertial system, and we test how the accuracy of the Hololens' position tracking changes as a function of that stimulus. Specifically, we place different images in front of the Hololens to change the input to its cameras, and we measure position tracking accuracy as the absolute difference between the ground-truth distance traversed by the Hololens and the distance estimated by the visual-inertial system. The Hololens is rigidly mounted on a mechanical stage as shown in the figure below.

Experiment setup with Hololens rigidly attached to mechanical stage

The mechanical stage, along with the Hololens, is moved by 2 mm, and this is our estimate of the ground-truth distance. The least count of the mechanical stage is 0.05 mm, so the ground-truth distance is accurate to within 0.05 mm. We developed a Hololens application that logs the position data reported by the Hololens. We start the application at the start of each experiment and stop it at the end. We subtract the position reported at the start of the experiment from the position reported at the end to obtain the distance estimated by the Hololens. We average multiple position samples at the start and the end to reduce the error in the accuracy estimate due to measurement noise.
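
As a rough illustration of this post-processing, the sketch below converts a position log into an estimated displacement and a relative error. The log format (one 'x y z' sample per line, in meters), the field order, and the number of samples averaged (n_avg) are illustrative assumptions, not the authors' actual app or script; those are described in the appendix.

  import numpy as np

  def estimate_displacement(log_file, n_avg=20, ground_truth_mm=2.0):
      # Assumed log format: one "x y z" position sample per line, in meters.
      positions = np.loadtxt(log_file)            # shape (num_samples, 3)
      start = positions[:n_avg].mean(axis=0)      # average the first n_avg samples
      end = positions[-n_avg:].mean(axis=0)       # average the last n_avg samples
      est_mm = np.linalg.norm(end - start) * 1000.0
      rel_error = abs(est_mm - ground_truth_mm) / ground_truth_mm
      return est_mm, rel_error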

We also report the measurement noise as a function of the input image stimulus. Measurement noise is important for augmented reality and virtual reality applications because even millimeter-level jitter in position tracking when the user is stationary can cause visual discomfort. We estimate the measurement noise as the standard deviation of the reported positions while the Hololens is stationary. We detail each of the experiments and the input image stimulus used in the following section.
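
A corresponding sketch for the stationary-noise estimate is below: it takes the standard deviation of the reported positions while the Hololens is not moved. The same assumed log format as above applies; this is not the authors' actual processing script.

  import numpy as np

  def stationary_noise(log_file):
      # Assumed log format: one "x y z" position sample per line, in meters.
      positions = np.loadtxt(log_file)         # shape (num_samples, 3)
      per_axis_std_mm = positions.std(axis=0) * 1000.0
      return per_axis_std_mm                   # jitter along x, y, z in millimeters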

Experimental setup with Hololens for different experiments

Results

In this section we report the results of the experiments carried out in three different settings:

  1. Different physical environments.
  2. Varying spatial frequency of the input image.
  3. Varying texture of the input image.

In all three cases we conduct two sets of experiments, as described in the Methods section above.

  1. We keep the Hololens stationary and measure variation in location reported by the Hololens.
  2. We use a mechanical stage to displace the Hololens by 2 mm. We then calculate the difference between the initial and final position as reported by Hololens.

Different Physical Environment

We conducted the experiments in four different settings.

  1. Lab Environment
  2. Repeated Pattern
  3. Textureless surface
  4. Glass surface
Standard Deviation in position values for different Physical Environments
Relative Error for 2 mm Displacement for different Physical Environments

We can see from the results above that a textureless surface is clearly a failure mode for the Hololens. It is somewhat surprising, though, that the Hololens performs marginally better with the repeated pattern and the glass surface than in the lab environment.

Varying Spatial Frequency

Images with Varying Spatial Frequency. Top left k = 0, Top Right k = 1, Bottom left k = 3, Bottom right k = 10

We generate images with varying spatial frequency. Our goal is to create an image containing, as far as possible, a single frequency component. We start from a one-dimensional cosine signal cos(kx), where k controls the spatial frequency, and extend it to a corresponding two-dimensional function over the image plane. We then evaluate this function for a grid of (x, y) pairs and store the values in a matrix to generate the images. The code for image generation can be found in the appendix section. The images are printed on a square sheet of side 0.7 m and attached to a wall at a distance of 1.2 m from the Hololens.
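
The exact two-dimensional form used in the report is not recoverable from this page, so the sketch below assumes a separable extension cos(kx)·cos(ky) of the one-dimensional cosine; this gives a uniform (textureless) image for k = 0 and an increasingly fine plaid pattern for larger k. The image size and output filenames are also illustrative.

  import numpy as np
  from PIL import Image

  def cosine_image(k, size=1024):
      # Assumed 2D form: separable product cos(k*x) * cos(k*y) over one period.
      x = np.linspace(0, 2 * np.pi, size)
      xx, yy = np.meshgrid(x, x)
      img = np.cos(k * xx) * np.cos(k * yy)             # values in [-1, 1]
      img = ((img + 1.0) / 2.0 * 255).astype(np.uint8)  # rescale to 8-bit grayscale
      return Image.fromarray(img)

  for k in (0, 1, 3, 10):                               # frequencies used in the report
      cosine_image(k).save(f"cosine_k{k}.png")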

Standard Deviation in position values for varying Spatial Frequency
Relative Error for 2 mm Displacement for varying Spatial Frequency

We can see from the plot below that the relative error is very high for both low and high spatial frequencies. This can be attributed to the fact that low-frequency images approximate a textureless surface and high-frequency images approximate a repeated pattern, both of which appear to be failure scenarios for the Hololens.

Plot of Relative Error vs Spatial Frequency

Varying Texture of Input Images

Input Images for Varying Texture Experiments

For this experiment we start with an image of random black dots on a white background. We repeatedly smooth the image with a Gaussian filter to obtain varying textures. The images are printed on a square sheet of side 0.7 m and attached to a wall at a distance of 1.2 m from the Hololens.
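
The sketch below illustrates this stimulus generation under stated assumptions: the dot density, the Gaussian sigma, and the image size are illustrative choices rather than the report's exact parameters; the repeat counts follow the 100 and 10,000 blur passes mentioned in the results.

  import numpy as np
  from scipy.ndimage import gaussian_filter

  def textured_images(size=1024, dot_fraction=0.05, sigma=1.0, repeats=(0, 100, 10000)):
      # Random black dots on a white background (dot_fraction and sigma are assumed values).
      rng = np.random.default_rng(0)
      img = np.ones((size, size))
      img[rng.random((size, size)) < dot_fraction] = 0.0
      out = {}
      for n in repeats:
          blurred = img.copy()
          for _ in range(n):
              blurred = gaussian_filter(blurred, sigma)  # repeated Gaussian smoothing
          out[n] = blurred
      return out

Since the variances of successive Gaussian blurs add, n passes with standard deviation sigma are equivalent to a single pass with sigma * sqrt(n), which can be used to speed up the 10,000-pass case.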

Standard Deviation in position values for different Textures
Relative Error for 2 mm Displacement for different Textures

We expected the relative error to increase as we increased the blurring. However, we notice that after blurring 10,000 times the relative error is lower than after blurring 100 times. As we repeatedly apply blurring, the contrast of the image also changes, which might have had a counteracting effect that decreased the relative error.

Conclusions

Experiments with input images of varying spatial frequency demonstrate that the accuracy of the system degrades greatly when the input image stimulus contains either no spatial frequency content or very high spatial frequencies. We initially conducted experiments by placing the Hololens at 21 cm from the wall. The errors in estimating the distance traversed by the Hololens were greater than 25 percent for all of these experiments, irrespective of the image stimulus. This demonstrates that the Hololens visual navigation system performs poorly when objects are very near to it.

For our varying-texture experiments, in addition to smoothing the image we are also reducing its contrast and brightness, making these additional independent variables that vary along with texture. Another limitation of our experiments is that while the visual-inertial system obtains both image and depth data as input, we vary only the input image stimulus. An interesting extension would be to vary the depth input provided to the system in a controlled fashion. Also, to the best of our knowledge, the current development version of the Hololens does not allow us to evaluate the performance of the system with the inertial sensors alone or with a reduced number of cameras. An interesting extension would be to analyze the contribution of each component of the visual-inertial system to the overall accuracy.

References

[1] J. Engel, V. Koltun, and D. Cremers. Direct sparse odometry. arXiv preprint arXiv:1607.02565, 2016. 

[2] J. A. Hesch, D. G. Kottas, S. L. Bowman, and S. I. Roumeliotis. Consistency analysis and improvement of vision-aided inertial navigation. IEEE Transactions on Robotics, 30(1):158–176, 2014. 

[3] C. Cadena, L. Carlone, H. Carrillo, Y. Latif, D. Scaramuzza, J. Neira, I. D. Reid, and J. J. Leonard. Past, present, and future of simultaneous localization and mapping: Towards the robust-perception age. arXiv preprint arXiv:1606.05830, 2016. 

[4] G. Welch, G. Bishop, L. Vicci, S. Brumback, K. Keller, et al. The hiball tracker: High-performance wide-area tracking for virtual and augmented environments. In Proceedings of the ACM symposium on Virtual reality software and technology, pages 1–ff. ACM, 1999. 

[5] A. Yates and J. Selan. Positional tracking systems and methods, May 2016.

[6] G. Welch and E. Foxlin. Motion tracking survey. 2002.

[7] https://developer.microsoft.com/en-us/windows/holographic/hardware_details

[8] https://get.google.com/tango/

Appendices

Appendix I

  1. Most of the Hololens app development and setup followed the tutorial at https://developer.microsoft.com/en-us/windows/holographic/holograms_101e. The Hololens app is too large to upload and can be made available on request.
  2. Matlab code for image generation and code to process the data and image can be found here : File:SupplementaryCode.zip

Follow the instructions in the tutorial to deploy the app on the Hololens. After that, you can see the positions being logged using https://developer.microsoft.com/en-us/windows/holographic/using_the_windows_device_portal. Run the Python code on the saved logs to obtain the standard deviation and relative error.

Appendix II

Equal contribution by both the authors in developing the method, executing experiments and analyzing results.