Computer Vision

Computer Vision (CV) is a field of study that develops techniques to help computers understand and react to the content of digital images and videos.

What is Computer Vision?

The goal of Computer Vision is to use observed image data to infer something about the world, by developing methods that attempt to reproduce the capability of human vision. Understanding the content of a digital image may involve extracting information from it, such as an object, a text description or a three-dimensional model. People, even very young children, solve this problem with apparent ease. However, it remains an unsolved computing problem, both because biological vision is only partly understood and because visual perception is complex in a dynamic and nearly infinitely varying physical world.

Computer Vision is a multidisciplinary field that can be considered a subfield of Artificial Intelligence. It may also use Machine Learning and Deep Learning techniques, which can involve specialized methods as well as general learning algorithms.

Some high-level problems where computer vision has seen success are listed here.

Optical Character Recognition (OCR)

Machine inspection

3D model building (photogrammetry)

Medical Imaging

Match moving

Automotive safety

Motion capture

Surveillance

Fingerprint scanning

Deep Learning and Computer Vision

[1] Liu, L., Ouyang, W., Wang, X., Fieguth, P., Chen, J., Liu, X., & Pietikäinen, M. (2020). Deep learning for generic object detection: A survey. International Journal of Computer Vision, 128(2), 261–318.

Deep Learning (DL) is a subset of Machine Learning techniques originally inspired by the human brain. DL is used in digital image processing to solve difficult problems such as image colourization, classification, segmentation and detection. DL methods such as Convolutional Neural Networks (CNNs) improve prediction performance mostly by using big data and large amounts of computing resources, and they have pushed the limits of what is possible.

Fast growth in DL, together with improvements in device capabilities such as power consumption, memory, processing, image sensor resolution and optics, has improved the performance and cost-effectiveness of vision-based applications and further quickened their spread.

Compared to traditional Computer Vision (CV) techniques, DL achieves greater accuracy in tasks such as image classification, semantic segmentation, object detection and Simultaneous Localization and Mapping (SLAM). Since the neural networks used in DL are trained rather than programmed, applications using this approach often require less expert analysis and fine-tuning, and they can exploit the tremendous amount of video data available in today’s systems. DL also provides superior flexibility, because CNN models and frameworks can be re-trained with a custom dataset for any use case, whereas CV algorithms tend to be more domain-specific.

The image at the left, taken from [1], shows the improvement in precision that Computer Vision achieved after adopting Deep Learning techniques, starting in 2012.

Computer Vision Experts

In recent years, RidgeRun has developed a set of Computer Vision projects in different application domains. This is a brief overview of some of the projects RidgeRun is currently working on, in areas such as Camera Calibration, Video Stabilization, Motion Detection and others.

Lens Distortion Correction

Most modern camera sensors, regardless of the precision of current technology, have geometric manufacturing defects. Although these defects typically go unnoticed by the naked eye, many Computer Vision algorithms require them to be corrected to obtain appropriate results.

RidgeRun offers camera calibration and lens correction solutions for a variety of embedded platforms. These are designed to use the available hardware resources to meet real-time performance requirements on resource-constrained devices.

FPGA Image Processing

RidgeRun developed an Image Signal Processor (ISP) that runs fully on an FPGA. It offers video processing accelerators commonly found in integrated ISPs, such as demosaicing, histogram equalization, auto white balancing, color space conversion and geometric transformations. It is also possible to integrate your own accelerators into FPGA ISP. You can connect FPGA ISP directly to your camera, preprocess the image and send the final result to your CPU, which reduces transmission overhead and delivers an image that is ready to use.

FPGA ISP is powered by Xilinx High-Level Synthesis, a powerful framework that enables us to implement complex image processing solutions faster than with Verilog or VHDL. Moreover, it makes it easier to adapt FPGA ISP to your needs, which shortens time to market while exploiting the potential of FPGAs.

For more information, please visit: https://developer.ridgerun.com/wiki/index.php?title=FPGA_Image_Signal_Processor/Introduction/Overview

Bird’s Eye View

Bird's Eye View is an algorithm that creates a top-down view of the scene from several frontal input views. To do so, it performs a perspective transformation called Inverse Perspective Mapping (IPM). IPM takes the frontal view, applies a homography and creates a top-down view of the captured scene by mapping pixels to a 2D frame (the bird's eye view).

This algorithm is intended to help drivers cope with the perspective effects introduced by on-vehicle cameras, which can cause the real distance between the vehicle and other objects to be misjudged. To obtain the output bird's eye view image, the algorithm uses a projection matrix that relates each pixel of the bird's eye view image to a pixel of the input image. The IPM transformation works well in the immediate proximity of the car, assuming the road surface is planar.
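The pixel mapping described above can be illustrated with a minimal pure-Python sketch. It assumes the projection matrix is already known and is given as a 3x3 nested list that maps output (top-down) pixel coordinates to input pixel coordinates. The names `project` and `birds_eye_view` are hypothetical and are not part of any RidgeRun API; a real implementation would also interpolate between pixels and run on accelerated hardware.

```python
def project(h, x, y):
    """Apply a 3x3 homography h (nested lists) to the point (x, y)."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if abs(w) < 1e-12:
        return None  # the point maps to infinity (for example, the horizon)
    return u / w, v / w


def birds_eye_view(src, h_out_to_src, out_w, out_h, fill=0):
    """Build the top-down image by looking up, for every output pixel,
    the input pixel that the (assumed known) projection matrix maps it to."""
    src_h, src_w = len(src), len(src[0])
    out = [[fill] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            p = project(h_out_to_src, x, y)
            if p is None:
                continue
            # Nearest-neighbour sampling keeps the sketch short.
            sx, sy = int(round(p[0])), int(round(p[1]))
            if 0 <= sx < src_w and 0 <= sy < src_h:
                out[y][x] = src[sy][sx]
    return out
```

Iterating over output pixels rather than input pixels guarantees that every pixel of the top-down image receives a value or the fill colour, so the result has no holes.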

For more information please visit:
https://developer.ridgerun.com/wiki/index.php?title=Birds_Eye_View/Introduction/Research

Video Stabilization

RidgeRun recognizes the importance of high-quality digital imaging tools such as video stabilization. Video stabilization improves video quality by removing unwanted camera shake and jitter caused by hand movement and unintentional camera panning.

Video stabilization is especially useful for real-time applications. GstNvStabilize is based on VisionWorks and OpenVX and uses CUDA to leverage the GPU and accelerate the processing. It relies mainly on the Harris feature detector and a sparse pyramidal optical flow method to estimate motion between frames.
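To give an idea of what the feature detection step computes, here is a minimal pure-Python sketch of the Harris corner score. It is only an illustration, not the VisionWorks implementation used by GstNvStabilize: the function names, the 3x3 window and the constant k = 0.04 are assumptions, and the optical flow tracking step is not shown.

```python
def harris_response(image, k=0.04, radius=1):
    """Return a grid of Harris corner scores for a 2D list of grey levels."""
    h, w = len(image), len(image[0])
    # Central-difference gradients; border pixels are left at zero.
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2.0
            iy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2.0

    scores = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sxx = syy = sxy = 0.0
            # Sum the structure tensor over a small window, clipped at the borders.
            for j in range(max(0, y - radius), min(h, y + radius + 1)):
                for i in range(max(0, x - radius), min(w, x + radius + 1)):
                    sxx += ix[j][i] * ix[j][i]
                    syy += iy[j][i] * iy[j][i]
                    sxy += ix[j][i] * iy[j][i]
            det = sxx * syy - sxy * sxy
            trace = sxx + syy
            # High score: gradients vary in two directions (a corner).
            scores[y][x] = det - k * trace * trace
    return scores


def strongest_corner(image):
    """Return (x, y) of the pixel with the highest Harris score."""
    scores = harris_response(image)
    h, w = len(scores), len(scores[0])
    return max(((x, y) for y in range(h) for x in range(w)),
               key=lambda p: scores[p[1]][p[0]])
```

Flat regions score zero and straight edges score negative, so only corner-like points stand out. Such points are good candidates for tracking from one frame to the next, which is what the optical flow stage needs in order to estimate camera motion.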

For more information please visit: https://developer.ridgerun.com/wiki/index.php?title=GStreamer_Video_Stabilizer_for_NVIDIA_Jetson_Boards

Motion Detection

RidgeRun has developed a Motion Detection GStreamer element that detects motion in an incoming video stream. The element implements the approximate median method for background subtraction, with an adaptive background model.

The Motion Detection GStreamer element generates a start-motion signal when it detects movement and a stop-motion signal when the movement ends. An option allows the video frame data to be modified so that the movement trail becomes visible, producing a kind of movement wave in the displayed video.
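The approximate median update and the start/stop signalling described above can be sketched in a few lines of pure Python. This is an illustrative sketch under stated assumptions, not the element's actual code: frames are 2D lists of grey levels, and the function names, the threshold of 25 and the `min_pixels` parameter are hypothetical.

```python
def approximate_median_step(background, frame, threshold=25, step=1):
    """Return the foreground mask for `frame` and adapt `background` in place."""
    mask = []
    for b_row, f_row in zip(background, frame):
        mask_row = []
        for x, pixel in enumerate(f_row):
            # A pixel is foreground when it differs enough from the background.
            mask_row.append(abs(pixel - b_row[x]) > threshold)
            # Nudge the background one step toward the current frame, so it
            # converges to an approximation of the per-pixel median over time.
            if pixel > b_row[x]:
                b_row[x] += step
            elif pixel < b_row[x]:
                b_row[x] -= step
        mask.append(mask_row)
    return mask


def motion_events(frames, threshold=25, min_pixels=1):
    """Yield (frame_index, "start" or "stop") whenever the motion state changes."""
    background = [list(row) for row in frames[0]]  # the first frame seeds the model
    moving = False
    for index, frame in enumerate(frames[1:], start=1):
        mask = approximate_median_step(background, frame, threshold)
        active = sum(cell for row in mask for cell in row) >= min_pixels
        if active != moving:
            moving = active
            yield index, "start" if moving else "stop"
```

Because the background moves by only one step per frame, a briefly passing object barely changes it, while a lasting change in the scene is absorbed gradually.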

For more information, please visit: https://www.ridgerun.com/gstreamer-motion-detection

GstDispTEC

Another project that RidgeRun has developed in the Motion Detection area is GstDispTEC. This is a GStreamer plug-in that integrates the DispTEC library's algorithms, making functionality such as motion detection in a video sequence, multi-object tracking and static gesture recognition available in GStreamer pipelines. GstDispTEC can show where an object is located in a frame, even with non-stationary cameras.

GstDispTEC lets you add several DispTEC-based elements to the same pipeline to strengthen image analysis, connect your own applications to the pipeline to receive the analysis data, and much more. The algorithms can run on either the CPU or the GPU, making the most of the computational resources and reducing processing time.

For more information, please visit: https://developer.ridgerun.com/wiki/index.php?title=GStreamer_DispTEC-Plugin

GstCUDA

GstCUDA is a RidgeRun-developed GStreamer plug-in that enables easy integration of CUDA algorithms into GStreamer pipelines. GstCUDA offers a framework that allows users to develop custom GStreamer elements that execute any CUDA algorithm. The GstCUDA framework is a series of base classes that abstract the complexity of both CUDA and GStreamer.

GstCUDA offers a GStreamer plug-in that contains a set of elements ideal for quick GStreamer/CUDA prototyping. Those elements are a set of filters with different input/output pad combinations. They are loadable at run time with an external custom CUDA library, which contains the algorithm to be executed on the GPU for each video frame that passes through the pipeline. With GstCUDA, developers avoid writing elements from scratch and can focus on the algorithm logic, which accelerates time to market.

For more information, please visit: https://www.ridgerun.com/gstcuda