Mahmoud is showing the results to Samuel at Modeso’s office in Alexandria, Egypt
Tango is a technology platform developed and authored by Google that uses computer vision to enable mobile devices, such as smartphones and tablets, to detect their position relative to the world around them without using GPS or other external signals. This allows application developers to create user experiences that include indoor navigation, 3D mapping, physical space measurement, environmental recognition, augmented reality, and windows into a virtual world.
At the time of writing this article, four devices support the Tango technology:
The idea behind building a showcase with Tango was to learn more about this smart and powerful device.
The showcase consists of two requirements:
The Tango project is built on three core technologies: motion tracking, area learning and depth perception.
Our target required working with both area learning and depth perception. Area learning is essential for learning the surrounding environment, and it depends on depth perception, so you cannot enable area learning in an application without also enabling depth perception.
Place a virtual Modeso logo on a real-world surface (e.g. a wall) by touching the screen: while the device camera is pointing at a flat surface, a touch should place the virtual Modeso logo on that surface.
We could simplify it through the following approach:
To make working with OpenGL easier, we used the Rajawali library. Google provides examples that achieve the same concept, with some extra functions and features.
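The article does not spell out how the touched surface is located. One common approach, assumed here rather than taken from our code, is to collect the depth points around the touched position and fit a plane to them: the plane gives the logo's position and its normal gives the orientation. Below is a minimal plain-Python sketch of the fitting step only, assuming camera-space points with the camera at the origin looking along +Z and a surface that roughly faces the camera (so it can be written as z = a·x + b·y + c). `fit_plane` is a hypothetical helper, and selecting the points near the touch is left out.

```python
import math

Point = tuple[float, float, float]


def _det3(m: list[list[float]]) -> float:
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(points: list[Point]) -> tuple[Point, Point]:
    """Least-squares fit of z = a*x + b*y + c to camera-space points.

    Returns (center, normal): the centroid of the points, which lies on the
    fitted plane, and the unit normal oriented towards the camera (-Z side).
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)

    # Normal equations A @ (a, b, c) = rhs, solved with Cramer's rule.
    A = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    rhs = [sxz, syz, sz]
    det = _det3(A)
    if abs(det) < 1e-12:
        raise ValueError("points are degenerate (e.g. collinear)")
    # The intercept c is not needed: the plane passes through the centroid.
    a, b, _ = (
        _det3([[rhs[r] if k == col else A[r][k] for k in range(3)] for r in range(3)]) / det
        for col in range(3)
    )
    # (a, b, -1) is the gradient of a*x + b*y - z; it points towards smaller z, i.e. the camera.
    length = math.sqrt(a * a + b * b + 1.0)
    normal = (a / length, b / length, -1.0 / length)
    center = (sx / n, sy / n, sz / n)
    return center, normal
```

A production version would also reject outliers (for example with RANSAC), since raw depth points near edges are noisy.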
From the first requirement we already know the depth of the logo in the 3D world. Using depth perception and the point cloud it provides, we can filter the points and keep the subset whose depth is less than the logo's depth, taking the logo's orientation (its quaternion) into account.
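One way to make "taking the quaternion into account" concrete is to rotate the logo's face normal by its quaternion and keep only the points on the camera's side of the logo's plane. The plain-Python sketch below assumes the points and the logo pose are in the same camera frame with the camera at the origin, that quaternions are unit length in (w, x, y, z) order, and that the logo faces +Z in its own model space; the function names are hypothetical.

```python
Vec3 = tuple[float, float, float]
Quat = tuple[float, float, float, float]  # (w, x, y, z), assumed unit length


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate v by unit quaternion q: v' = v + w*t + q_xyz x t, with t = 2 * (q_xyz x v)."""
    w, qx, qy, qz = q
    vx, vy, vz = v
    tx, ty, tz = 2 * (qy * vz - qz * vy), 2 * (qz * vx - qx * vz), 2 * (qx * vy - qy * vx)
    return (vx + w * tx + (qy * tz - qz * ty),
            vy + w * ty + (qz * tx - qx * tz),
            vz + w * tz + (qx * ty - qy * tx))


def points_in_front_of_logo(points: list[Vec3], logo_center: Vec3, logo_rotation: Quat,
                            local_normal: Vec3 = (0.0, 0.0, 1.0)) -> list[Vec3]:
    """Keep the points lying on the camera's side of the logo plane.

    For points in front of the sensor (z > 0) this means they are closer than the
    logo plane along their viewing ray, i.e. potential occluders of the logo.
    """
    normal = rotate(logo_rotation, local_normal)
    # Signed distance (up to scale) of the camera, i.e. the origin, from the logo plane.
    camera_side = -dot(normal, logo_center)
    kept = []
    for p in points:
        offset = (p[0] - logo_center[0], p[1] - logo_center[1], p[2] - logo_center[2])
        # Same sign as the camera: the point is between the camera and the plane.
        if dot(normal, offset) * camera_side > 0:
            kept.append(p)
    return kept
```

Testing against the rotated plane rather than a single depth value keeps the filter correct when the logo sits on a wall that is not square to the camera.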
The Tango update listener provides the callback onXyzIjAvailable, which is invoked whenever new point cloud data becomes available from Tango. Note that this callback does not run on the main thread.
Google warns about these callbacks: you have to keep the work done inside them to a minimum, because you won’t receive new point cloud data until the callback returns.
For example, doing heavy processing inside the callback delays the delivery of the next readings.
Every time we receive new points from Tango, we filter them against the depth of the logo and update the Rajawali renderer with the new data.
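A common way to respect that warning is to let the callback only copy the data and hand it to a worker thread that does the filtering and renderer update. The sketch below shows the pattern with Python's threading module standing in for Android threads; `on_xyz_ij_available` is a hypothetical method named after onXyzIjAvailable, not the Tango API, and it assumes the callback is always invoked from a single thread.

```python
import queue
import threading
from typing import Callable, Optional

Points = list[tuple[float, float, float]]


class LatestCloudWorker:
    """Runs heavy point-cloud work off the sensor callback thread.

    The callback copies the data into a one-slot queue and returns at once;
    a cloud the worker has not picked up yet is replaced by the newer one.
    """

    def __init__(self, process: Callable[[Points], None]) -> None:
        self._process = process  # e.g. filter against the logo depth, then update the renderer
        self._slot: "queue.Queue[Optional[Points]]" = queue.Queue(maxsize=1)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _offer(self, item: Optional[Points]) -> None:
        # Single producer: after draining, put_nowait cannot find the slot full.
        try:
            self._slot.get_nowait()  # drop a stale, unprocessed cloud
        except queue.Empty:
            pass
        self._slot.put_nowait(item)

    def on_xyz_ij_available(self, points: Points) -> None:
        """Hypothetical stand-in for the sensor callback: copy, hand off, return."""
        self._offer(list(points))  # copy, since the sensor buffer may be reused

    def _run(self) -> None:
        while True:
            points = self._slot.get()
            if points is None:  # shutdown sentinel
                return
            self._process(points)

    def close(self) -> None:
        """Stop the worker; a cloud still waiting in the slot is discarded."""
        self._offer(None)
        self._thread.join()
```

Dropping stale clouds instead of queueing them all keeps the rendered occlusion in sync with the latest sensor data.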
The second step is to have real-world objects occlude the logo: when a real object is between the camera and the logo, it should hide the part of the logo behind it.
In order to make this possible the following three steps are required:
We used the Rajawali method calculateModelMatrix from the ATransformable3D class, passing it the matrix of the current point cloud. With the resulting points we could build a mask.
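Conceptually, bringing the point cloud into the scene means multiplying each point by a 4x4 model matrix built from the cloud's pose (a translation plus a rotation quaternion). The plain-Python sketch below shows only that math; it is not Rajawali's calculateModelMatrix, whose exact signature we do not reproduce, and it uses row-major nested lists for readability, whereas OpenGL conventionally stores matrices column-major. `pose_matrix` and `transform_points` are hypothetical helpers.

```python
Vec3 = tuple[float, float, float]
Quat = tuple[float, float, float, float]  # (w, x, y, z), assumed unit length
Mat4 = list[list[float]]                  # row-major 4x4


def pose_matrix(translation: Vec3, q: Quat) -> Mat4:
    """Rigid transform (rotation from a unit quaternion, then translation)."""
    w, x, y, z = q
    tx, ty, tz = translation
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y), tx],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x), ty],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y), tz],
        [0.0, 0.0, 0.0, 1.0],
    ]


def transform_points(m: Mat4, points: list[Vec3]) -> list[Vec3]:
    """Apply a homogeneous 4x4 transform to 3D points (divides by w for generality)."""
    out = []
    for x, y, z in points:
        hx = m[0][0] * x + m[0][1] * y + m[0][2] * z + m[0][3]
        hy = m[1][0] * x + m[1][1] * y + m[1][2] * z + m[1][3]
        hz = m[2][0] * x + m[2][1] * y + m[2][2] * z + m[2][3]
        hw = m[3][0] * x + m[3][1] * y + m[3][2] * z + m[3][3]
        out.append((hx / hw, hy / hw, hz / hw))
    return out
```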
We were lucky to stumble upon https://github.com/stetro/project-Tango-poc, which was a great reference and gave us the right idea of how to implement the masking.
Here we go a little deeper into the different approaches used and proposed for achieving this goal.
This is the technique used by the repo at https://github.com/stetro/project-Tango-poc, and it is based on depth mapping. In short, a depth map is a regular image (often a grayscale one) that contains information about the distance of the surfaces of scene objects from a specific viewpoint/camera. The value of each pixel in the depth image represents the real-world distance (depth) of that pixel from the camera; e.g. in grayscale depth images, dark areas represent points closer to the camera while lighter areas represent points further away (or the reverse, depending on how you encoded the image).
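To make the "dark means close" convention concrete, here is a minimal sketch of encoding metric depth into an 8-bit gray level and back. The near/far range of 0.5 m to 4 m is a hypothetical choice for illustration, not a Tango specification; swapping the roles of near and far gives the reverse convention.

```python
def depth_to_gray(depth_m: float, near_m: float = 0.5, far_m: float = 4.0) -> int:
    """Encode metric depth as an 8-bit gray level: near_m -> 0 (dark), far_m -> 255 (light)."""
    clamped = min(max(depth_m, near_m), far_m)  # depths outside the range saturate
    return round((clamped - near_m) / (far_m - near_m) * 255)


def gray_to_depth(gray: int, near_m: float = 0.5, far_m: float = 4.0) -> float:
    """Inverse mapping; precision is limited to (far_m - near_m) / 255 per gray level."""
    return near_m + gray / 255 * (far_m - near_m)
```

With only 256 levels over 3.5 m, each gray step covers roughly 1.4 cm, which bounds how finely such a depth map can separate an occluder from the surface behind it.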
How do depth maps help us achieve the goal?
If we have a depth map for our camera view, i.e. if we know the depth of each pixel of the camera view, we can perform what computer graphics calls the “depth test” or “Z-buffering”. While rendering the 3D content, the hardware compares the depth of each rendered pixel with the corresponding pixel in the depth map and decides whether to draw it, depending on whether it is occluded, i.e. whether another object at the same pixel is closer to the camera. That’s basically the idea.
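On the GPU this comparison is done by OpenGL's depth buffer. As an illustration only, here is a minimal CPU sketch of the same per-pixel decision, assuming the camera depth map and the rendered model depth are already in the same units and resolution; the function name and image layout are hypothetical.

```python
from typing import Optional, Sequence


def depth_test_composite(
    camera_rgb: Sequence[Sequence[object]],
    scene_depth: Sequence[Sequence[float]],
    model_rgb: Sequence[Sequence[object]],
    model_depth: Sequence[Sequence[Optional[float]]],
) -> list[list[object]]:
    """Software depth test over equally sized images.

    scene_depth holds the real-world depth per camera pixel; model_depth holds the
    rendered model's depth, or None where the model does not cover the pixel.
    """
    out = []
    for y, cam_row in enumerate(camera_rgb):
        out_row = []
        for x, cam_px in enumerate(cam_row):
            md = model_depth[y][x]
            if md is not None and md < scene_depth[y][x]:
                out_row.append(model_rgb[y][x])  # model is closer: draw it
            else:
                out_row.append(cam_px)  # real surface is closer (or no model): keep camera pixel
        out.append(out_row)
    return out
```

For example, a model pixel at 2 m is hidden where the real scene is 1 m away and drawn where the scene is 3 m away.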
How is this done in code?
The first step is to compute the depth map of our view using the Tango color camera. To achieve that, we have to:
1. Initialize the Tango service as documented, using the C API.
2. Configure the Tango device to use the color camera:
3. Configure the Tango device to use the depth camera:
4. Configure the Tango device to connect local callbacks for the different feeds from the device:
5. In the “OnFrameAvailableRouter” callback, which is called with the frame’s image buffer whenever a new camera frame arrives, construct the depth image from the camera image:
But what exactly happens inside this callback?
First, we convert the received image buffer (TangoImageBuffer) from the YUV color space (the default) to RGB by calling the YUV2RGB function for each pixel.
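We have not reproduced the repo's YUV2RGB function. As a sketch of what such a per-pixel conversion looks like, the code below assumes the color buffer is in the NV21 (YCrCb 4:2:0 semi-planar) layout, even image dimensions, and the common BT.601 full-range (JPEG) coefficients; the repo may use different, e.g. integer, coefficients.

```python
Rgb = tuple[int, int, int]


def _clamp(c: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range."""
    return max(0, min(255, round(c)))


def yuv2rgb(y: int, u: int, v: int) -> Rgb:
    """Convert one YUV pixel to RGB (BT.601 full-range / JPEG coefficients)."""
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    return _clamp(r), _clamp(g), _clamp(b)


def nv21_to_rgb(data: bytes, width: int, height: int) -> list[list[Rgb]]:
    """Convert an NV21 frame to rows of RGB tuples.

    NV21 layout: width*height luma (Y) bytes, followed by one interleaved V/U pair
    per 2x2 block of pixels (the chroma plane has height/2 rows of `width` bytes).
    """
    frame_size = width * height
    rows = []
    for row in range(height):
        out_row = []
        for col in range(width):
            y = data[row * width + col]
            uv = frame_size + (row // 2) * width + (col & ~1)  # this block's V/U pair
            v, u = data[uv], data[uv + 1]  # NV21 stores V (Cr) before U (Cb)
            out_row.append(yuv2rgb(y, u, v))
        rows.append(out_row)
    return rows
```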
Next, we wrap the resulting RGB image in an OpenCV matrix (image) and create a grayscale version of it.
After that, we apply a guided filter (GuidedFilter) to the OpenCV image. The guided filter is an edge-preserving smoothing filter; in effect, it smooths the occluded parts of the 3D object so the result looks more realistic. The resulting filtered grayscale image is used as our depth map.
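For readers unfamiliar with the guided filter, here is a minimal grayscale version of the original formulation by He et al., in plain Python with naive box means (O(N·r²), so only suitable for small test images). It takes a guide image I and an input p and returns q = mean(a)·I + mean(b), where a and b are per-window linear coefficients. In depth-completion setups the guide is typically the grayscale camera image and the input a noisy or sparse depth image; we have not verified which inputs the repo passes, and the r and eps defaults are illustrative.

```python
from typing import Callable

Image = list[list[float]]


def box_mean(img: Image, r: int) -> Image:
    """Mean over a (2r+1)x(2r+1) window, clipped at the image borders."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            total = sum(img[yy][xx] for yy in range(y0, y1) for xx in range(x0, x1))
            out[y][x] = total / ((y1 - y0) * (x1 - x0))
    return out


def _map(f: Callable[..., float], *imgs: Image) -> Image:
    """Apply f pixel-wise across equally sized images."""
    return [[f(*px) for px in zip(*rows)] for rows in zip(*imgs)]


def guided_filter(guide: Image, src: Image, r: int = 2, eps: float = 1e-3) -> Image:
    """Smooth `src` while following the edges of `guide` (values ideally in [0, 1])."""
    mean_i = box_mean(guide, r)
    mean_p = box_mean(src, r)
    corr_ii = box_mean(_map(lambda i: i * i, guide), r)
    corr_ip = box_mean(_map(lambda i, p: i * p, guide, src), r)
    var_i = _map(lambda c, m: c - m * m, corr_ii, mean_i)
    cov_ip = _map(lambda c, mi, mp: c - mi * mp, corr_ip, mean_i, mean_p)
    # Per-window linear model q = a*I + b; eps limits a where the guide is flat.
    a = _map(lambda cv, vr: cv / (vr + eps), cov_ip, var_i)
    b = _map(lambda mp, ai, mi: mp - ai * mi, mean_p, a, mean_i)
    mean_a = box_mean(a, r)
    mean_b = box_mean(b, r)
    return _map(lambda ma, i, mb: ma * i + mb, mean_a, guide, mean_b)
```

Where the guide has a strong edge, a approaches 1 and the output follows the guide; in flat regions a approaches 0 and the output is a local average, which is what makes the filter edge-preserving.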
Based on this depth map, we construct a 3D representation of the camera view. We now have the Z coordinate of each pixel in the camera view and need the other two coordinates (X, Y). Tango helps here by providing an equation, based on the camera intrinsics, that translates between 3D camera coordinates and 2D pixel coordinates.
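Given a 3D point (X, Y, Z) in camera coordinates, the corresponding pixel coordinates (x, y) are:
x = X / Z * fx * rd / ru + cx
y = Y / Z * fy * rd / ru + cy
where fx and fy are the focal lengths in pixels, (cx, cy) is the principal point, ru = sqrt((X / Z)^2 + (Y / Z)^2) is the undistorted radius, and rd is the distorted radius given by the camera’s lens distortion model.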
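Solving the projection equations for X and Y is a direct rearrangement:

$$X = \frac{(x - c_x)\,Z}{f_x}\cdot\frac{r_u}{r_d}, \qquad Y = \frac{(y - c_y)\,Z}{f_y}\cdot\frac{r_u}{r_d}$$

Because r_u itself depends on X and Y, the ratio r_u / r_d has to be recovered by inverting the distortion model, usually numerically. If lens distortion is ignored (r_d = r_u), this reduces to the pinhole form:

$$X = \frac{(x - c_x)\,Z}{f_x}, \qquad Y = \frac{(y - c_y)\,Z}{f_y}$$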
After solving these equations for X and Y, we have all three components X, Y and Z, so we can construct a 3D point version of the camera’s real view. As the last required step, we construct a vector of vertices and fill it with this data.
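A minimal sketch of filling such a vertex list from a depth map, assuming the depth map has already been converted to metric Z values and that lens distortion is ignored (rd / ru taken as 1); `depth_map_to_vertices` is a hypothetical helper, and the intrinsics would come from the camera in practice.

```python
Vertex = tuple[float, float, float]


def depth_map_to_vertices(depth: list[list[float]], fx: float, fy: float,
                          cx: float, cy: float) -> list[Vertex]:
    """Back-project a depth map (rows of Z values) into camera-space vertices.

    Uses the distortion-free inverse X = (x - cx) * Z / fx, Y = (y - cy) * Z / fy.
    Pixels with Z <= 0 carry no valid depth and are skipped.
    """
    vertices = []
    for y, row in enumerate(depth):
        for x, z in enumerate(row):
            if z <= 0:
                continue  # no depth measured at this pixel
            vertices.append(((x - cx) * z / fx, (y - cy) * z / fy, z))
    return vertices
```

For an OpenGL vertex buffer the tuples would then be flattened into one float array, e.g. `[c for v in vertices for c in v]`.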
With all that in place, we can perform the depth test in OpenGL, since we have both the 3D model to render and a 3D representation of the camera view. To render the parts closer to the camera and occlude the farther ones, we only need to test them against each other:
This is what you will get:
The results were not accurate enough to proceed with the targeted scenario.
We noticed that the Tango device heats up heavily when used for more than five minutes, a problem Google already mentions. Area learning uses a lot of processing power, which heats up the processor; to protect it, the device lowers the processor speed, which degrades the readings and the data produced by the Tango sensors in general.
Furthermore, object detection is very imprecise: the camera is not yet good enough to produce good results from image processing. Even with a better camera, the image processing would be very expensive in processing power and would hurt overall performance.
On the other hand, it is very good at augmented reality in general and at placing objects on walls or floors without the need for markers. Compared directly with Vuforia and its markers, it is the clear winner.
But for real-time occlusion by real-world objects, it still seems to lack the needed functionality.
Credits: Modeso’s Mobile Engineers Belal Mohamed, Mahmoud Abd El Fattah Galal & Mahmoud Galal.