Immersive Experience with Spherical Videos
TL;DR: RidgeRun is integrating real-time spherical video technology on embedded platforms. From creating accelerated GStreamer plug-ins to performing camera calibration, color matching, stabilization and low-level optimization, RidgeRun can help you bring full 360° views to your system.
Spherical videos provide a more immersive and interactive experience to viewers. Unlike traditional video, where a single perspective is recorded, they capture the full surroundings in a single shot. The viewer can then decide which direction to look in, change it while the footage is playing, and even view multiple perspectives simultaneously. Being able to look around makes you feel like you are there, recording the video yourself. This ability to look in any direction, as if from the center of a sphere, is what gives these videos the "spherical" name. You may also find them referred to as "360", "immersive" or "surround" video. The video below gives you a glimpse of this immersive feeling.
Spherical videos open up a plethora of possible applications. You can make remote work feel more personal by allowing users to look at their teammates individually. You can add this system to your vehicle so operators have complete awareness of their surroundings. You can control the resulting perspective automatically with a tracking algorithm to simulate a physical camera panning around. You can eliminate blind spots in your security systems. In general, many (if not most) camera-powered applications can be enhanced by the magic of surround video.
But how does it work?
We, as humans, cannot view our full surroundings all at once. A simple model that may help you understand how our vision works is to think of our "eye" as a camera in the center of an invisible sphere. When you look in a certain direction, you are looking at a certain latitude and longitude in this sphere. Like a camera, our eye needs a way to map this sphere region (which is 3D) to a plane (which is 2D), such as a photo. This is called a projection.
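To make the sphere model concrete, here is a minimal Python sketch that converts a viewing direction given as longitude and latitude into a 3D unit vector and back. The axis convention (+z straight ahead, +x to the right, +y up) and the function names are assumptions chosen for illustration; any consistent convention works.

```python
import math

def sphere_to_direction(lon, lat):
    """Unit vector pointing at (longitude, latitude), both in radians.

    Assumed convention: lon = lat = 0 is straight ahead (+z), positive longitude
    turns right (+x) and positive latitude looks up (+y)."""
    return (math.cos(lat) * math.sin(lon),
            math.sin(lat),
            math.cos(lat) * math.cos(lon))

def direction_to_sphere(x, y, z):
    """Inverse mapping; (x, y, z) does not need to be normalized."""
    lon = math.atan2(x, z)
    lat = math.atan2(y, math.hypot(x, z))
    return lon, lat
```

For example, `sphere_to_direction(math.radians(90), 0.0)` returns approximately (1, 0, 0), that is, looking to the right.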
The projection our eyes use to map the sphere to a plane is the rectilinear projection. Most day-to-day cameras also use this projection to take a photograph. This projection feels natural to us because straight lines in the real world remain straight in the photo. Geometrically, you can construct a rectilinear mapping by placing a plane tangent to the sphere and tracing rays from the camera to it. The color found where a ray intersects the sphere becomes the value of the pixel placed where that same ray intersects the plane. The math, which is beyond the scope of this post, follows from this configuration.
From the picture it can be deduced that the rectilinear projection can only have a limited field of view (FOV). In fact, it is impossible to capture a 180-degree FOV, since it would require an infinitely large plane. In practice, you rarely see FOVs above 100 degrees because the borders become stretched to a point that is unpleasant to the viewer.
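A minimal sketch of the tangent-plane construction, assuming a camera at the origin looking along +z and a plane at distance f from it (f = 1 is the plane tangent to a unit sphere). The function names are hypothetical. `rectilinear_project` returns where a ray hits the plane, which lies at f·tan(θ) from the plane's center for a ray θ away from the optical axis; `plane_half_width` shows how fast the plane must grow as the FOV approaches 180 degrees.

```python
import math

def rectilinear_project(x, y, z, f=1.0):
    """Intersect the ray from the origin along (x, y, z) with the plane z = f.

    Returns the (u, v) position on the plane, or None when the ray is 90 degrees
    or more away from the optical axis and therefore never reaches the plane."""
    if z <= 0:
        return None
    return f * x / z, f * y / z

def plane_half_width(fov_deg, f=1.0):
    """Half-width of the plane z = f needed to cover a horizontal FOV of fov_deg."""
    if fov_deg >= 180:
        return math.inf  # tan(90 degrees) is unbounded: no finite plane is enough
    return f * math.tan(math.radians(fov_deg) / 2.0)

for fov in (60, 90, 100, 120, 150, 170, 179):
    print(f"FOV {fov:3d} deg -> half-width {plane_half_width(fov):8.2f} x f")
```

At a 90-degree FOV the half-width equals f; at 179 degrees it is already more than 100 times f, and at 180 degrees no finite plane suffices.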
By now, it is evident that a different projection is needed to capture the full surface of the sphere. While there are several techniques, the most widely used is the equirectangular projection. Surprisingly, this mapping is easier to understand than the rectilinear one, as it is the result of unwrapping the sphere. Even if you didn't know it, you've seen this projection before: it is how the Earth is commonly represented on a world map.
[image taken from Kaidaer Mini Speaker]
There is a direct correspondence between the sphere and the equirectangular projection. On the sphere, the longitude ranges from -π to π and the latitude from -π/2 to π/2. On the plane, they are mapped to -1 to 1 and -0.5 to 0.5, respectively. No complex math, just a proportional relationship. Because of this, you will often hear the term "spherical projection" used interchangeably with equirectangular.
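The proportional relationship translates directly into code. This sketch assumes longitude and latitude in radians, an image whose column 0 corresponds to longitude -π and whose row 0 is the north pole, and it ignores half-pixel offsets; the function names are illustrative.

```python
import math

def sphere_to_equirect(lon, lat):
    """Normalized equirectangular coordinates: x in [-1, 1], y in [-0.5, 0.5]."""
    return lon / math.pi, lat / math.pi

def equirect_to_pixel(x, y, width, height):
    """Normalized coordinates -> pixel coordinates (column, row).

    Assumes column 0 is longitude -pi and row 0 is the north pole;
    half-pixel offsets are ignored."""
    col = (x + 1.0) / 2.0 * width
    row = (0.5 - y) * height
    return col, row

def pixel_to_sphere(col, row, width, height):
    """Inverse of the two functions above: pixel -> (longitude, latitude) in radians."""
    lon = (col / width * 2.0 - 1.0) * math.pi
    lat = (0.5 - row / height) * math.pi
    return lon, lat
```

For a 3840×1920 frame, the point straight ahead (longitude 0, latitude 0) lands at pixel (1920, 960), the image center. The 2:1 aspect ratio typical of equirectangular frames follows from the 2π by π angular ranges.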
Inevitably, some portions of the sphere will be stretched. Look at the north and south poles, for example: these single points are stretched to fill the entire top and bottom borders of the plane, respectively. Straight lines no longer remain straight, resulting in a very unnatural view. Ultimately, this is the price to pay in order to capture the full sphere.
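The horizontal stretching can be quantified. On a unit sphere, the circle of latitude φ has circumference 2π·cos φ, yet in the equirectangular image it spans the full width, just like the equator. Horizontal distances are therefore magnified by 1/cos φ, while vertical distances are scaled uniformly. A small sketch (the function name is hypothetical):

```python
import math

def horizontal_stretch(lat_deg):
    """Horizontal magnification of an equirectangular image at a given latitude,
    relative to the equator (where it is 1)."""
    c = math.cos(math.radians(lat_deg))
    if c < 1e-12:
        return math.inf  # at the poles a single point covers the whole image width
    return 1.0 / c

for lat in (0, 30, 60, 80, 89, 90):
    print(f"latitude {lat:2d} deg -> {horizontal_stretch(lat):.2f}x wider")
```

At 60° latitude, features appear twice as wide as at the equator, and the factor grows without bound towards the poles.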
So what does a camera capable of recording a 360° view look like? The most common approach is to combine several wide-angle captures, such as those from fisheye lenses. The term fisheye refers to yet another projection capable of mapping wide angles to a plane. While there are several subtypes of fisheye projections, by far the most common is the one known as equidistant (from now on referred to simply as fisheye).
Similar to the rectilinear projection, the fisheye mapping can be modeled with the camera at the center of an invisible sphere and a tangent plane. The intersection of the ray from the camera with the sphere determines the pixel value. Unlike the rectilinear case, however, the pixel is not placed at the ray-plane intersection. Instead, its distance from the tangent point on the plane equals the arc length along the sphere between the tangent point and the ray-sphere intersection.
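The equidistant rule fits in a few lines. This sketch assumes a camera frame with the optical axis along +z, +x to the right and +y up, and a focal length f in image units per radian; with f = 1 the distance from the image center is exactly the arc length on the unit sphere. Unlike the rectilinear r = f·tan θ, the equidistant r = f·θ stays finite past 90 degrees, which is what lets a wide lens record rays more than 90° off axis. The function names are illustrative.

```python
import math

def fisheye_project(x, y, z, f=1.0):
    """Equidistant fisheye: image radius = f * theta, theta = angle from the +z axis.

    Works for rays beyond 90 degrees (z < 0), unlike a rectilinear projection."""
    theta = math.atan2(math.hypot(x, y), z)
    phi = math.atan2(y, x)  # direction around the image center
    r = f * theta
    return r * math.cos(phi), r * math.sin(phi)

def fisheye_unproject(u, v, f=1.0):
    """Inverse: image point -> unit direction (x, y, z) in the camera frame."""
    theta = math.hypot(u, v) / f
    phi = math.atan2(v, u)
    s = math.sin(theta)
    return s * math.cos(phi), s * math.sin(phi), math.cos(theta)
```

With f = 1, a 190° lens (rays up to 95° off axis) produces an image circle of radius about 1.66, whereas the rectilinear construction cannot represent those rays at all.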
Stitching it all together
Now we can put all the pieces together. For the sake of the example, let's imagine we are using two 190° fisheye cameras pointing in opposite directions. This is enough to map the full 360° view with some overlap between them.
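A quick sanity check on coverage, assuming identical cameras evenly spaced in yaw around the horizon (a simplification: real rigs also need to cover the poles and are never perfectly aligned). Adjacent cameras are 360/N degrees apart, so the overlap at each seam is the per-camera FOV minus that spacing; a negative value means a gap. The function name is hypothetical.

```python
def seam_overlap_deg(num_cameras, fov_deg):
    """Overlap (in degrees) between neighboring cameras around the horizon.

    Assumes num_cameras identical cameras evenly spaced in yaw.
    A negative result means the cameras leave a gap instead of overlapping."""
    spacing = 360.0 / num_cameras
    return fov_deg - spacing

for n, fov in ((2, 190), (3, 190), (2, 170)):
    print(f"{n} cameras x {fov} deg -> {seam_overlap_deg(n, fov):+.0f} deg per seam")
```

Two back-to-back 190° lenses leave a 10° overlap band at each seam, which the stitcher can use to align and blend the images.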
The first step is to convert each fisheye image to an equirectangular representation. Since the equirectangular projection maps the whole sphere while a fisheye doesn't, there will be missing portions in each resulting image. Furthermore, since the cameras are looking in different directions, each one maps a different portion of the sphere. Once all fisheye images have been mapped to their equirectangular projection, you can stitch them together to get the full sphere.
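One common way to do this conversion is backwards: for every pixel of the equirectangular output, find which camera sees that direction and where it falls in that camera's fisheye image. The sketch below assumes an idealized rig (every camera is a perfect equidistant fisheye with its optical axis on the horizon, a shared focal length in pixels per radian and a shared image center) and uses the crudest possible stitch: the camera whose axis is closest to the direction wins, with no blending. The function and parameter names are hypothetical; a real system would use calibrated per-camera rotations and lens models and blend across the overlap.

```python
import math

def equirect_pixel_to_fisheye(col, row, eq_width, eq_height, cameras, f_px, center_px):
    """Find which fisheye camera sees one equirectangular pixel, and where.

    Hypothetical rig model: each entry of `cameras` is (yaw, half_fov) in radians,
    an ideal equidistant fisheye whose optical axis lies on the horizon at longitude
    `yaw`. All cameras share `f_px` (pixels per radian) and `center_px` = (cx, cy).

    Returns (camera_index, u, v) in fisheye pixel coordinates, or None if no camera
    covers this direction. Overlaps are resolved by picking the camera whose optical
    axis is closest (no blending)."""
    # Equirectangular pixel -> longitude/latitude (column 0 = -pi, row 0 = north pole).
    lon = (col / eq_width * 2.0 - 1.0) * math.pi
    lat = (0.5 - row / eq_height) * math.pi
    best = None
    for index, (yaw, half_fov) in enumerate(cameras):
        # Turning a camera by `yaw` about the vertical axis only shifts longitude.
        rel_lon = lon - yaw
        x = math.cos(lat) * math.sin(rel_lon)
        y = math.sin(lat)
        z = math.cos(lat) * math.cos(rel_lon)
        theta = math.atan2(math.hypot(x, y), z)  # angle from this camera's axis
        if theta > half_fov:
            continue  # outside this lens' field of view
        if best is None or theta < best[0]:
            phi = math.atan2(y, x)
            r = f_px * theta  # equidistant fisheye: radius proportional to angle
            u = center_px[0] + r * math.cos(phi)
            v = center_px[1] - r * math.sin(phi)  # image rows grow downwards
            best = (theta, index, u, v)
    return None if best is None else best[1:]
```

Evaluating this once per output pixel yields a lookup table that can be reused for every frame as long as the cameras do not move relative to each other. Working from output to input also guarantees that every output pixel gets either a value or an explicit "no coverage", avoiding the holes a forward mapping could leave.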
Finally, you can take a portion of the equirectangular image and map it using the rectilinear projection. Remember that it can only be a portion because, unlike the equirectangular projection, the rectilinear projection has a limited FOV. The interesting part is that you can change the region of the original image you are mapping, or even have several different regions for several simultaneous viewers. More technically, each user can specify the longitude and latitude at which they wish to look.
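The viewer-driven step can be sketched the same way: cast a ray through each pixel of a virtual rectilinear camera, rotate it by the requested latitude (pitch) and longitude (yaw), and look up where it lands in the equirectangular frame. The axis convention (+z forward, +x right, +y up), the parameter names and the layout (column 0 = longitude -π, row 0 = north pole) are assumptions for illustration.

```python
import math

def viewport_to_equirect(i, j, out_w, out_h, fov_deg, yaw, pitch, eq_w, eq_h):
    """Map pixel (i, j) of a virtual rectilinear view to equirectangular pixel coordinates.

    yaw (longitude) and pitch (latitude), in radians, set where the view looks;
    fov_deg is the horizontal field of view and must stay below 180 degrees."""
    # Focal length in pixels so that the view spans fov_deg horizontally.
    f = (out_w / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    # Ray through the pixel in the view's own frame (+z forward, +x right, +y up).
    x = i - out_w / 2.0
    y = out_h / 2.0 - j
    z = f
    # Tilt up by `pitch` (rotation about the x axis).
    y, z = (y * math.cos(pitch) + z * math.sin(pitch),
            -y * math.sin(pitch) + z * math.cos(pitch))
    # Turn by `yaw` (rotation about the vertical y axis).
    x, z = (x * math.cos(yaw) + z * math.sin(yaw),
            -x * math.sin(yaw) + z * math.cos(yaw))
    # Ray -> longitude/latitude -> equirectangular pixel.
    lon = math.atan2(x, z)
    lat = math.atan2(y, math.hypot(x, z))
    col = (lon / math.pi + 1.0) / 2.0 * eq_w
    row = (0.5 - lat / math.pi) * eq_h
    return col, row
```

Because the viewing direction is just a pair of parameters, each viewer can have their own yaw, pitch and FOV while all of them read from the same stitched frame.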
How can RidgeRun help?
This post is a simplified explanation of the actual math and processing under the hood. In practice, there are many details you need to take into account. To mention a few:
Individual camera calibration to account for manufacturing and mounting variances.
Image statistics matching between different cameras.
Camera capture synchronization.
Real-time performance, e.g. 4K@60fps.
Contact us to learn more about how RidgeRun can help you provide your users a more immersive and interactive experience!