
Exploring Multi-Camera Setups in Embedded Vision

  • Writer: ridgerun
  • Jul 13
  • 5 min read

Updated: Oct 3

Understanding the Challenges


Multi-camera systems are becoming essential in a growing range of applications: they enhance depth perception and provide more comprehensive coverage. However, they also introduce several challenges, including frame synchronization, increased data bandwidth, and per-camera metadata handling.


Table of Contents

  • Synchronization

  • Metadata and Triggering

  • Bandwidth

  • Driver and User-space

  • Example: Signal Flow from Sensor to Application

  • Conclusion


Synchronization


To capture frames simultaneously from multiple camera interfaces, both hardware and software approaches are available:


  • Hardware Sync: Many camera sensors support a sync input or master-slave configuration, where one sensor (the master) drives a synchronization signal to the others (the slaves) so that all sensors begin exposure together. With rolling shutter sensors, synchronization usually aligns the start of readout, although the rolling readout means microsecond-level differences can remain. In contrast, global shutter sensors can align exposures accurately with a sync pulse.


  • Shared Clock: Another hardware method involves feeding the same reference clock to all sensors, so their frame timing is inherently aligned. This is often combined with a frame-start sync signal.


  • SerDes Sync: In SerDes systems like FPD-Link III or GMSL, the deserializer can generate a synchronized frame sync output for all connected cameras. For example, TI’s FPD-Link III chipset can distribute a common synchronization signal, ensuring all cameras start frame capture together.


  • Software Sync: If hardware synchronization isn’t feasible, software methods can be employed. This includes timestamping frames and aligning them in software. However, this approach is less precise, usually achieving only millisecond precision via system timestamps. It often adds latency and is considered a last resort when hardware sync is unavailable.
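
When falling back to software sync, the pairing logic is usually a small matching loop. Below is a minimal C sketch of timestamp-based alignment: frames from two cameras are paired by nearest timestamp within a tolerance. The pop_frame() helper is a hypothetical stand-in for the real capture path (for example, a VIDIOC_DQBUF loop), and the tolerance, skew, and frame rate are illustrative values.

```c
/*
 * Minimal sketch of timestamp-based software sync. Assumption: each
 * camera path tags frames with a monotonic nanosecond timestamp.
 * pop_frame() is a hypothetical stand-in for the real capture call;
 * here it fabricates timestamps with a fixed skew so the sketch runs
 * anywhere.
 */
#include <stdint.h>
#include <stdio.h>

#define MAX_SKEW_NS 2000000LL       /* 2 ms pairing tolerance (tunable) */
#define FRAME_NS   33333333LL       /* ~30 fps frame period */

struct frame {
    int64_t ts_ns;                  /* capture timestamp, nanoseconds */
    void   *data;                   /* pixel buffer (omitted here) */
};

static struct frame pop_frame(int camera)
{
    static int64_t next_ts[2] = { 0, 700000 };   /* 0.7 ms initial skew */
    struct frame f = { next_ts[camera], NULL };
    next_ts[camera] += FRAME_NS;
    return f;
}

/* Produce one matched pair: advance whichever camera is behind until
 * both timestamps fall within the tolerance. */
static void get_synced_pair(struct frame *a, struct frame *b)
{
    *a = pop_frame(0);
    *b = pop_frame(1);
    for (;;) {
        int64_t skew = a->ts_ns - b->ts_ns;
        if (skew < 0)
            skew = -skew;
        if (skew <= MAX_SKEW_NS)
            return;
        if (a->ts_ns < b->ts_ns)
            *a = pop_frame(0);      /* camera 0 is behind: fetch its next frame */
        else
            *b = pop_frame(1);
    }
}

int main(void)
{
    for (int i = 0; i < 3; i++) {
        struct frame a, b;
        get_synced_pair(&a, &b);
        printf("pair %d: cam0=%lld ns  cam1=%lld ns\n",
               i, (long long)a.ts_ns, (long long)b.ts_ns);
    }
    return 0;
}
```

In a real system the same loop runs against live capture queues, and frames that cannot be paired within the tolerance are simply dropped.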


Metadata and Triggering


Some camera interfaces output a frame counter or timestamp in embedded metadata. This metadata can help verify synchronization or facilitate software alignment. Drivers must capture and expose this metadata effectively.


For instance, on Jetson platforms, the Argus camera stack can ingest embedded metadata lines if the driver marks them properly in the device tree. On i.MX8, the V4L2 API delivers metadata either through separate V4L2 buffers or sideband channels. This area is still evolving, and not all platforms support metadata out-of-the-box.
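
On platforms that route embedded data to a dedicated V4L2 metadata node, a quick way to check what the driver exposes is to query that node's format. The C sketch below does exactly that; the /dev/video1 path is hypothetical, and the metadata buffer type is only usable when the driver implements it.

```c
/* Query a (hypothetical) V4L2 metadata capture node for its format. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/videodev2.h>

int main(void)
{
    const char *node = "/dev/video1";   /* illustrative metadata node path */
    int fd = open(node, O_RDWR);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct v4l2_format fmt;
    memset(&fmt, 0, sizeof(fmt));
    fmt.type = V4L2_BUF_TYPE_META_CAPTURE;

    /* If the driver exposes embedded metadata here, this reports the
     * vendor dataformat (a FourCC) and the per-buffer size. */
    if (ioctl(fd, VIDIOC_G_FMT, &fmt) == 0)
        printf("metadata format: %.4s, %u bytes per buffer\n",
               (char *)&fmt.fmt.meta.dataformat, fmt.fmt.meta.buffersize);
    else
        perror("VIDIOC_G_FMT (no metadata support on this node?)");

    close(fd);
    return 0;
}
```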


Bandwidth


Utilizing multiple cameras naturally increases total throughput. A Jetson AGX Xavier, for example, can handle multiple 4K cameras by using separate CSI interfaces or by multiplexing virtual channels over a shared CSI interface. The camera driver developer must configure the VI/ISP capture settings accordingly to allocate resources for each stream.


NVIDIA provides guidelines on the maximum number of cameras supported. For instance, an Orin might support up to 16 cameras via 16 virtual channels on multiple CSI ports. On NXP i.MX8, limitations may arise, such as the total pixels per second that the ISI or DDR can handle.
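
Budgeting the aggregate throughput early avoids surprises. The short snippet below is a back-of-the-envelope check with example values (4K RAW12 at 30 fps, four cameras); CSI-2 protocol overhead and platform-specific ISP/DDR limits come on top of the raw pixel rate.

```c
/* Rough bandwidth estimate for a multi-camera setup (example values). */
#include <stdio.h>

int main(void)
{
    const double width = 3840, height = 2160;   /* 4K sensor */
    const double bits_per_pixel = 12;           /* RAW12 */
    const double fps = 30;
    const int cameras = 4;

    double per_camera_gbps = width * height * bits_per_pixel * fps / 1e9;
    printf("per camera: %.2f Gbit/s\n", per_camera_gbps);            /* ~2.99 */
    printf("%d cameras: %.2f Gbit/s before CSI-2 protocol overhead\n",
           cameras, cameras * per_camera_gbps);                      /* ~11.94 */
    return 0;
}
```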


Driver and User-space


RidgeRun’s camera drivers often support multi-camera setups by instantiating multiple V4L2 devices (one per camera) or by exposing multiple video nodes under a single media device. For synchronized capture, user-space can rely on the buffer timestamps and sequence numbers returned by VIDIOC_DQBUF, or ensure that it triggers all cameras together and waits for the corresponding frames.


In GStreamer, users can employ the ts-offset or synchronization properties to align streams. Custom pipeline elements can also be created to merge or sync frames. For example, RidgeRun has developed an image stitcher that merges multiple camera feeds in real-time for surround view, requiring tight synchronization between inputs.
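
As a rough illustration of the GStreamer side, the C sketch below builds a two-camera pipeline with gst_parse_launch and shifts one sink's ts-offset (a standard GstBaseSink property, in nanoseconds) to compensate for a measured capture skew. The device paths, the 5 ms offset, and the xvimagesink placeholder (swap in your platform's preferred sink, for example an NVIDIA sink on Jetson) are all assumptions for illustration.

```c
/* Two-camera GStreamer pipeline with a ts-offset correction (sketch). */
#include <gst/gst.h>

int main(int argc, char **argv)
{
    gst_init(&argc, &argv);

    GError *error = NULL;
    GstElement *pipeline = gst_parse_launch(
        "v4l2src device=/dev/video0 ! videoconvert ! xvimagesink name=sink0 "
        "v4l2src device=/dev/video1 ! videoconvert ! xvimagesink name=sink1",
        &error);
    if (!pipeline) {
        g_printerr("Failed to build pipeline: %s\n", error->message);
        return 1;
    }

    /* Shift the second stream's render time by a measured skew
     * (the 5 ms value here is purely illustrative). */
    GstElement *sink1 = gst_bin_get_by_name(GST_BIN(pipeline), "sink1");
    g_object_set(sink1, "ts-offset", (gint64)(5 * GST_MSECOND), NULL);
    gst_object_unref(sink1);

    gst_element_set_state(pipeline, GST_STATE_PLAYING);

    GMainLoop *loop = g_main_loop_new(NULL, FALSE);
    g_main_loop_run(loop);
    return 0;
}
```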


Example: Signal Flow from Sensor to Application


To illustrate the process, let’s consider a typical signal flow for a camera interface on an embedded platform, such as a GMSL camera on an NVIDIA Jetson:


  1. Image Sensor: Captures images (either rolling or global shutter) and outputs raw data (often Bayer RAW10/12) over MIPI CSI-2.

  2. Serializer/Bridge: If used (like a GMSL serializer or an HDMI-to-CSI bridge), it encodes the sensor output for transmission over coax cable or another medium.

  3. Physical Link: For instance, a 15m coax cable carries the high-speed serial data. If using MIPI CSI without SerDes, this step involves a short FPC cable or board trace.

  4. Deserializer/Receiver: Converts data back to MIPI CSI-2 for SerDes or directly feeds the SoC if it’s already CSI-2. On Jetson, this is where the data reaches the Tegra CSI-2 receiver block.

  5. SoC CSI-2 Receiver: This hardware IP parses CSI-2 packets, separates video streams by virtual channel ID if necessary, and forwards them to memory or ISP. On Jetson, this is the “VI” (video input), while on i.MX8, it’s the ISI/CSI capture interface.

  6. Image Signal Processor (ISP): Many SoCs feature an ISP to convert raw Bayer data to usable RGB/YUV. Jetsons have a built-in ISP accessible via the libargus camera stack or the nvarguscamerasrc in GStreamer. If using raw data directly or if the sensor has its own ISP or outputs YUV, this stage might be bypassed.

  7. Kernel Drivers: The V4L2 driver orchestrates the above processes, configuring the sensor via I²C, setting up the SerDes chip, and programming the SoC capture pipeline. Once streaming, it mediates buffer exchange, delivering frames to user space.

  8. User-Space Application: This could be a GStreamer pipeline (e.g., using v4l2src for raw frames or nvarguscamerasrc for Jetson to obtain ISP processed frames), OpenCV grabbing from /dev/videoX, or a custom app using V4L2 ioctls or NVIDIA libargus. The application receives frames for display, encoding, or running computer vision algorithms like TensorRT for AI inference.

  9. Synchronization & Control: If multiple cameras are in use, an additional synchronization mechanism might be employed either in the driver or user-space. Control software may also adjust camera settings (exposure, gain) via V4L2 controls, coordinating between cameras for consistent imaging.
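
To make the last step concrete, here is a minimal sketch that pushes the same exposure and gain to two cameras through standard V4L2 controls so both expose consistently. The device paths and control values are illustrative, and which controls a given sensor driver actually exposes varies (v4l2-ctl -l lists them).

```c
/* Apply identical exposure/gain settings to two cameras via V4L2 controls. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/videodev2.h>

static int set_ctrl(int fd, unsigned int id, int value)
{
    struct v4l2_control ctrl = { .id = id, .value = value };
    return ioctl(fd, VIDIOC_S_CTRL, &ctrl);
}

int main(void)
{
    const char *devices[] = { "/dev/video0", "/dev/video1" };

    for (int i = 0; i < 2; i++) {
        int fd = open(devices[i], O_RDWR);
        if (fd < 0) {
            perror(devices[i]);
            continue;
        }
        /* Matching settings across cameras keeps brightness consistent
         * in stitched or multi-view output. Values are illustrative. */
        if (set_ctrl(fd, V4L2_CID_EXPOSURE, 1000) < 0)
            perror("VIDIOC_S_CTRL exposure");
        if (set_ctrl(fd, V4L2_CID_GAIN, 16) < 0)
            perror("VIDIOC_S_CTRL gain");
        close(fd);
    }
    return 0;
}
```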


Throughout this chain, careful attention is needed to maintain signal integrity, especially for high-speed MIPI or SerDes links. Proper driver timing is crucial to avoid dropped frames, and tagging frames with metadata such as timestamps or sequence numbers is essential for downstream synchronization. RidgeRun’s extensive experience in camera driver development means these considerations are well understood.


Conclusion


In embedded vision development, choosing the right camera interface and camera shutter technology is as important as selecting the processor. With platforms like NVIDIA Jetson AGX Xavier/Orin and NXP i.MX8, developers have a solid foundation of CSI-2 inputs and ISP capabilities to build complex vision systems. The key is software support: device drivers and system integration.


This is where RidgeRun’s expertise comes in. With years of experience building V4L2 drivers for sensors and SerDes, handling multi-camera synchronization, and delivering seamless user-space integration using GStreamer and OpenCV, RidgeRun has enabled customers to capture from 6+ cameras in sync on Jetson, deploy custom thermal solutions, and optimize performance across the entire stack.


By understanding the strengths of each interface and shutter type—and leveraging proven development practices—you can confidently design an embedded vision system that meets your needs for bandwidth, range, and image quality.


For more technical depth or assistance with custom camera drivers, consult RidgeRun’s Developer Wiki or reach out to our team—we’re here to help turn cutting-edge camera technology into reality on embedded Linux platforms.
