Applications and Algorithms for Computer Vision
Updated: Nov 6, 2020
Applications for Computer Vision
Object recognition, or classification, identifies one or more objects, or classes of objects, that have been specified or learned. It is usually performed together with estimating their 2D positions in an image or 3D poses in a scene.
Object recognition involves identification, where a single instance of an object is recognized. It also involves detection, where image data is scanned to detect a specific condition. Object classification can also assign several labels to the same object, which makes it a multi-label problem. It might also require information about the meaning of objects in an image, which makes it a semantic problem.
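To make the multi-label idea concrete, here is a minimal sketch that contrasts picking a single class with keeping every class that passes a threshold. The class names, the raw scores, and the 0.5 threshold are illustrative assumptions, not values from any particular model.

```python
import numpy as np

# Hypothetical class names and raw scores (logits) for one image.
CLASS_NAMES = ["car", "pedestrian", "road sign"]
logits = np.array([2.0, 1.5, -3.0])

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    z = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return z / z.sum()

def single_label(scores):
    """Single-label case: return the one class with the highest probability."""
    return CLASS_NAMES[int(np.argmax(softmax(scores)))]

def multi_label(scores, threshold=0.5):
    """Multi-label case: keep every class whose independent probability passes the threshold."""
    probs = sigmoid(scores)
    return [name for name, p in zip(CLASS_NAMES, probs) if p >= threshold]

print(single_label(logits))
print(multi_label(logits))
```

In the single-label case the classes compete for one answer, while in the multi-label case each class is decided independently, so an image can carry several labels at once.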
Object Classification Examples
Examples of object classification include identifying the face or fingerprint of a specific person, identifying handwritten digits, or identifying a specific vehicle brand.
Object localization refers to identifying the location of one or more objects in an image and drawing a bounding box around each object.
Object Localization Examples
Objects are usually located in an image by using regression, where numbers are returned instead of classes. Such numbers are related to bounding boxes and the system is trained with an image and ground truth bounding boxes, trying to reduce the distance between the predicted bounding boxes and the ground truth.
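The "distance" between a predicted box and its ground truth can be measured in several ways. The sketch below shows two common choices, assuming boxes are given as (x_min, y_min, x_max, y_max): a mean absolute (L1) coordinate error that a regressor could minimize, and intersection over union (IoU) as an overlap score. The box values are made up for illustration.

```python
import numpy as np

def l1_box_loss(pred, truth):
    """Mean absolute difference between the four box coordinates."""
    pred = np.asarray(pred, dtype=float)
    truth = np.asarray(truth, dtype=float)
    return float(np.mean(np.abs(pred - truth)))

def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0

# Illustrative ground truth and prediction.
truth = (10, 10, 50, 50)
pred = (12, 8, 48, 54)

print(l1_box_loss(pred, truth))  # lower is better
print(iou(pred, truth))          # closer to 1.0 is better
```

During training the loss is what gets reduced, while IoU is often reported afterwards to judge how well the predicted boxes overlap the ground truth.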
A typical example of object localization is found in self-driving car applications, where information about the location of cars, pedestrians, animals, road signs, and barriers is fundamental.
Object detection combines the localization and classification of one or more objects in an image.
Object Detection Examples
Localizing elements on the road might not be enough in a computer vision application for self-driving cars. The elements might also need to be classified, as in an object detection process, to obtain information about signs, pedestrians, traffic, barriers, and more in order to make the right decisions.
More examples of object detection include the detection of possible abnormal cells or tissues in medical images, or the detection of a vehicle in an automatic toll system. Object detection based on simple and fast calculations is sometimes used to find smaller regions of interest on image data that can be further analyzed by using more computationally demanding techniques to produce a correct interpretation.
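The coarse-to-fine idea in the last sentence can be sketched as follows. A cheap intensity threshold stands in for the "simple and fast" detector, and `expensive_fn` is a hypothetical placeholder for whatever heavier analysis would run on the cropped region. The synthetic image and threshold are assumptions for illustration.

```python
import numpy as np

def find_roi(image, threshold):
    """Cheap first pass: bounding box (row0, row1, col0, col1) of pixels above threshold.

    Returns None when no pixel passes the threshold.
    """
    rows, cols = np.nonzero(image > threshold)
    if rows.size == 0:
        return None
    return int(rows.min()), int(rows.max()) + 1, int(cols.min()), int(cols.max()) + 1

def analyze(image, threshold, expensive_fn):
    """Run the costly step only on the region found by the cheap step."""
    roi = find_roi(image, threshold)
    if roi is None:
        return None
    r0, r1, c0, c1 = roi
    return expensive_fn(image[r0:r1, c0:c1])

# Synthetic 100x100 image with one bright rectangle.
image = np.zeros((100, 100), dtype=np.uint8)
image[40:60, 30:45] = 200

print(find_roi(image, 128))
print(analyze(image, 128, lambda crop: crop.shape))
```

The heavier step here sees only a 20x15 crop instead of the full 100x100 image, which is the point of the two-stage approach.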
Face recognition is the problem of identifying and verifying people in a photograph by their face. Face recognition techniques detect the faces of individuals whose images are present in a data set. The process of recognizing human faces is usually described as comprising detection, alignment, feature extraction, and a recognition task.
Face recognition can be achieved by using deep learning methods, which are able to leverage large datasets of faces and learn rich and compact representations of faces, reportedly matching or outperforming the face recognition capabilities of humans on some benchmarks.
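The final recognition step often comes down to comparing compact representations (embeddings). The sketch below assumes a hypothetical upstream model has already turned each face into a vector. The three-dimensional vectors, the names, and the 0.8 threshold are toy assumptions, and cosine similarity is only one possible comparison.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(query, gallery, threshold=0.8):
    """Return the name of the most similar enrolled embedding, or None if none is close enough."""
    best_name, best_score = None, -1.0
    for name, embedding in gallery.items():
        score = cosine_similarity(query, embedding)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None

# Toy gallery of enrolled identities (hypothetical embeddings).
gallery = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}

print(identify([0.9, 0.1, 0.0], gallery))  # close to the "alice" embedding
print(identify([0.0, 0.0, 1.0], gallery))  # close to nobody in the gallery
```

Real systems use embeddings with far more dimensions and tune the threshold on validation data, but the comparison logic has the same shape.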
Face Recognition Examples
Face recognition techniques are gradually being applied to more industries, disrupting design, manufacturing, construction, law enforcement, security, and healthcare. Applications are being developed in fields like advertising, access control and payment security, criminal identification, and also medicine, where diseases can be identified from detected face features.
Semantic segmentation in images consists of labeling each pixel of the image with the class of what it represents. It is also referred to as dense prediction.
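As a minimal sketch of what "labeling each pixel" means in practice, assume a model has produced one score map per class with shape (num_classes, height, width). The label map is then the index of the highest-scoring class at every pixel. The tiny 2x2 score array below is made up for illustration.

```python
import numpy as np

def label_pixels(scores):
    """Turn per-class score maps (num_classes, H, W) into an (H, W) map of class indices."""
    return np.argmax(scores, axis=0)

# Three classes on a 2x2 image (illustrative scores).
scores = np.zeros((3, 2, 2))
scores[0, 0, :] = 1.0  # top row scores highest for class 0
scores[1, 1, 0] = 1.0  # bottom-left pixel scores highest for class 1
scores[2, 1, 1] = 1.0  # bottom-right pixel scores highest for class 2

labels = label_pixels(scores)
print(labels)
```

The output has the same height and width as the input image, which is why the task is called dense prediction.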
Semantic Segmentation Examples
Semantic segmentation is widely used in autonomous vehicles, giving them the perception needed to understand road environments. It is also used in medical applications such as image diagnostics, where systems can augment the analysis performed by radiologists and reduce the time required to run diagnostic tests.
3D-image reconstruction consists of inferring the geometrical structure of a scene captured by a collection of images. The camera position and internal parameters are usually assumed to be known or they can be estimated from the set of images. By using multiple images, 3D information can be partially recovered by solving a pixel-wise correspondence problem. Since automatic correspondence estimation is usually ambiguous and incomplete, prior knowledge about the object is necessary.
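When the camera parameters are known and a correspondence between two images has been found, the 3D point can be recovered by triangulation. The sketch below uses linear (DLT) triangulation, one of several possible methods. The intrinsic matrix, the camera placement, and the test point are assumptions chosen for illustration, and the correspondence is simulated by projecting a known point into both views.

```python
import numpy as np

def project(P, X):
    """Project a 3D point X with a 3x4 camera matrix P to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point observed at x1 and x2 in two views."""
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The homogeneous solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# Assumed intrinsics and two cameras looking along +z, the second one shifted 0.2 units along x.
K = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-0.2], [0.0], [0.0]])])

X_true = np.array([0.1, -0.05, 2.0])
X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
print(X_est)
```

With noisy or ambiguous correspondences the recovered point degrades, which is where the prior knowledge mentioned above comes in.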
Reconstructing Images Example
3D-image reconstruction has been used in tomographic reconstruction, 3D echo sounding map reconstruction, augmented reality, archaeology, gaming, and many more.
How is NVIDIA a Leader in This Space?
Computer vision and image processing algorithms are computationally intensive, and the increasing demands of computer vision require creative architectures. NVIDIA’s research scientists analyze the interplay between hardware, software, and media processing algorithms, and collaborate with NVIDIA’s internal product and engineering teams. By using CUDA, NVIDIA’s parallel computing platform and programming model, computer vision applications can achieve interactive, video frame-rate performance.
NVIDIA organizes the GPU Technology Conference (GTC) annually. It is considered a must-attend digital event for developers, researchers, engineers, and innovators looking to enhance their skills, exchange ideas, and gain a deeper understanding of how AI could transform their work. This event allows the community to discover the latest breakthroughs in artificial intelligence (AI), high-performance computing (HPC), graphics, data science, and more.
As part of this community, RidgeRun presented “How to Build a Multi-Camera Media Server for AI Processing on Jetson” and the related blog post “https://developer.nvidia.com/blog/building-multi-camera-media-server-ai-processing-jetson/” at NVIDIA GTC 2020.
GPU Technology, Supercomputing, & AI
Deep learning relies on GPU acceleration, both for training and inference. NVIDIA delivers GPU acceleration to data centers, desktops, laptops, and the world’s fastest supercomputers. NVIDIA GPU deep learning is also available on services from Amazon, Google, IBM, Microsoft, and many others.
NVIDIA GPUs are currently powering the fastest supercomputers in the U.S. and Europe. In the U.S., Oak Ridge National Laboratory’s Summit is the world’s smartest supercomputer, fusing HPC and AI to deliver over 200 petaFLOPS of double-precision computing for HPC and 3 exaFLOPS of mixed-precision computing for accelerating scientific discovery.
Supercomputer centers around the world are now adopting the NVIDIA Ampere architecture. They are using it to bring science into the exascale era and simulate larger models, train and deploy deeper networks, and pioneer an emerging hybrid field of AI-assisted simulations.
How GPU Technology Allows for Better Computer Vision Task Performance
Computer vision analyzes images to create numerical representations of the scene. Computer vision tasks are computationally intensive and repetitive, and they often exceed the real-time capabilities of the CPU, leaving little time for higher-level tasks. However, many computer vision operations map efficiently onto modern GPU technology, whose programmability allows a wide variety of computer vision algorithms to be implemented.
GPU technology provides a streaming, data-parallel arithmetic architecture. This type of architecture carries out a similar set of calculations on an array of image data. The single-instruction, multiple-data (SIMD) capability of the GPU makes it suitable for running computer vision tasks, which often involve similar calculations operating on an entire image. A GPU can be used to accelerate computer vision computation and free up the CPU for other tasks. Furthermore, multiple GPUs can be used on the same machine, creating an architecture capable of running multiple computer vision algorithms in parallel.
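The "same calculation on every pixel" pattern can be illustrated with a simple threshold. The sketch below runs on the CPU with NumPy, so it is only a stand-in: the point is that the second function expresses the rule once for the whole array, which is the formulation a GPU executes across many threads. The image and threshold level are arbitrary assumptions.

```python
import numpy as np

def threshold_loop(image, level):
    """Sequential version: visit one pixel at a time."""
    out = np.empty_like(image)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = 255 if image[r, c] > level else 0
    return out

def threshold_parallel(image, level):
    """Data-parallel version: the same rule written as one operation over every pixel."""
    return np.where(image > level, 255, 0).astype(image.dtype)

# Random 8-bit test image (illustrative).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(48, 64), dtype=np.uint8)

same = np.array_equal(threshold_loop(image, 128), threshold_parallel(image, 128))
print(same)
```

Because no output pixel depends on any other output pixel, the work can be split across as many processing elements as the hardware provides.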
GPU Technology Example: Self-Driving Cars
Autonomous vehicles are intended to establish safer and more efficient roads. However, they require massive computational horsepower and large-scale production software expertise. Tapping into decades-long experience in high-performance computing, imaging, and AI, NVIDIA has built a software-defined, end-to-end platform for the transportation industry that enables continuous improvement and continuous deployment through over-the-air updates. It delivers everything needed to develop autonomous vehicles at scale.
Computer Vision Image Processing Algorithms
Computer vision can use data that comes straight from a sensor, but most often it uses the output of an image signal processor (ISP):
RidgeRun offers image sensor drivers like RidgeRun's Sony IMX219 CMOS Image Sensor Linux Driver for NVIDIA Jetson Xavier and Jetson TX1/TX2.
RidgeRun offers ISPs like the GStreamer OpenCL Accelerated ISP.