
NVIDIA Holoscan and the Streaming AI Pipeline Landscape: A Technical Deep Dive

  • allannavarro1
  • Apr 28

Real-time AI processing at the edge is no longer a niche problem — it is the defining engineering challenge across medical devices, industrial inspection, autonomous systems, and broadcast media. NVIDIA has responded with a family of frameworks that, at first glance, can seem overlapping or even redundant: DeepStream SDK, Holoscan SDK, and Holoscan for Media. Understanding what each one is, what it is not, and when to reach for it is essential before committing to any of them in a production design.

This post gives you that map.


1. GStreamer

GStreamer is an open-source, cross-platform multimedia framework built around a plugin pipeline model. Data flows through a directed graph of elements (sources, filters, and sinks) connected by pads. The core library is written in C with mature Python and Rust bindings, and a vast ecosystem of community plugins surrounds it. It is also the foundation of NVIDIA's DeepStream SDK (and, through DeepStream, of Holoscan for Media); the Holoscan SDK, by contrast, is not built on GStreamer.


Core abstractions:

  • Element — the atomic processing unit (demuxer, decoder, encoder, etc.)

  • Pad — a typed connection point on an element (src or sink)

  • Pipeline — a directed graph of connected elements

  • Bus — an asynchronous message channel for state changes and errors

  • Caps (Capabilities) — the negotiated media type contract between two pads
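
To make the pad/caps contract concrete, here is a deliberately simplified model in plain Python. It is not the GStreamer API: the `Pad` class and `negotiate` function are hypothetical, and real caps are richer structures (fields, ranges, caps features) that the library intersects with calls such as `Gst.Caps.intersect()`. The idea it captures is the same, though: a link only succeeds if the upstream and downstream pads can agree on a format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Pad:
    """Hypothetical stand-in for a GStreamer pad (not the real Gst.Pad)."""
    name: str
    direction: str          # "src" or "sink"
    caps: Tuple[str, ...]   # supported formats, most preferred first


def negotiate(src: Pad, sink: Pad) -> str:
    """Pick a format both pads support, or fail like a not-negotiated link."""
    if src.direction != "src" or sink.direction != "sink":
        raise ValueError("a link must go from a src pad to a sink pad")
    # Walk the downstream preferences and keep the first one upstream can produce.
    for fmt in sink.caps:
        if fmt in src.caps:
            return fmt
    raise ValueError(f"not-negotiated: {src.name} -> {sink.name}")


decoder_src = Pad("decoder.src", "src",
                  ("video/x-raw,format=NV12", "video/x-raw,format=I420"))
encoder_sink = Pad("encoder.sink", "sink",
                   ("video/x-raw,format=I420", "video/x-raw,format=NV12"))
display_sink = Pad("display.sink", "sink", ("video/x-raw,format=RGBA",))

print(negotiate(decoder_src, encoder_sink))  # video/x-raw,format=I420
```

Linking `decoder_src` to `display_sink` raises `ValueError`, the toy equivalent of a `not-negotiated` error; in a real pipeline you would insert a converter element such as `videoconvert` between the two.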


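GStreamer orchestrates everything from the CPU and has no built-in AI inference. Upstream does ship some GPU-aware elements (OpenGL, VA-API, and NVIDIA's nvcodec decode/encode elements), but anything beyond that typically means bringing your own CUDA kernels or third-party plugins. On discrete-GPU systems, frames also cross the PCIe bus whenever a GPU element hands off to a CPU-only element, unless you deliberately keep buffers in device memory end to end. For basic capture-encode-stream pipelines it is the lowest common denominator, but scaling to multi-stream AI inference without additional tooling quickly becomes painful.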
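
A quick back-of-envelope calculation shows why those PCIe crossings matter. The sketch assumes a single 4K (3840 x 2160) NV12 stream at 60 fps, NV12 being 1.5 bytes per pixel, and counts a GPU-to-CPU-to-GPU round trip as two transfers; the figures are illustrative arithmetic, not a benchmark.

```python
def frame_bytes(width: int, height: int, bytes_per_pixel: float) -> int:
    # NV12: 8-bit luma plus quarter-resolution interleaved chroma = 1.5 bytes/pixel.
    return int(width * height * bytes_per_pixel)


def pcie_traffic_gbps(width: int, height: int, bpp: float,
                      fps: int, round_trips: int) -> float:
    # Each GPU -> CPU -> GPU round trip moves every frame across PCIe twice.
    bytes_per_second = frame_bytes(width, height, bpp) * fps * 2 * round_trips
    return bytes_per_second * 8 / 1e9  # gigabits per second


print(frame_bytes(3840, 2160, 1.5))                              # 12441600
print(round(pcie_traffic_gbps(3840, 2160, 1.5, 60, 1), 2))       # 11.94
print(round(pcie_traffic_gbps(3840, 2160, 1.5, 60, 3), 2))       # 35.83
```

One unnecessary round trip already costs about 12 Gb/s of bus traffic for one stream, and each additional CPU-only element in the middle of the graph adds the same again, before multiplying by the number of cameras.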


2. NVIDIA DeepStream SDK

DeepStream is NVIDIA's production streaming analytics toolkit, built on top of GStreamer. It extends the GStreamer model with over 40 hardware-accelerated plugins that use the video codec engines (NVDEC, NVENC) and the GPU, plus the VIC and DLA engines found on Jetson, across Jetson and discrete-GPU platforms. Think of it as "GStreamer with NVIDIA superpowers for vision AI."


Core abstractions (layered on GStreamer):

  • nvinfer / nvinferserver: inference plugins backed by TensorRT (nvinfer) or Triton Inference Server (nvinferserver)

  • nvtracker: multi-object tracking plugin (NvDCF, DeepSORT, etc.)

  • nvmsgbroker: edge-to-cloud messaging via MQTT, Kafka, AMQP

  • nvdsudpsrc / nvdsudpsink: ST 2110 uncompressed video over Rivermax

  • Service Maker: a C++ object-oriented abstraction layer on top of raw Gst pipeline construction (introduced in DeepStream 7.0)

  • DeepStream Libraries: low-level GPU ops powered by CV-CUDA, NvImageCodec, PyNvVideoCodec
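
A typical DeepStream graph reads like any other GStreamer pipeline description, just with NVIDIA elements. The sketch below assembles a `gst-launch-1.0`-style string for one H.264 file: `nvv4l2decoder` decodes into NVMM device memory, `nvstreammux` batches frames, `nvinfer` runs the model described by its config file, `nvtracker` tracks objects, and `nvdsosd` draws the results. The file names and the tracker library path are placeholders, and element properties vary between DeepStream releases, so treat this as a starting template under those assumptions rather than a verified command line.

```python
def deepstream_pipeline(h264_path: str, infer_config: str,
                        tracker_lib: str, tracker_config: str,
                        width: int = 1920, height: int = 1080) -> str:
    """Build a gst-launch-1.0 style description for one H.264 elementary stream."""
    elements = [
        f"filesrc location={h264_path}",
        "h264parse",
        "nvv4l2decoder",                                   # NVDEC decode into NVMM buffers
        # Link into request pad sink_0 of the muxer named "m", which forms batches.
        f"m.sink_0 nvstreammux name=m batch-size=1 width={width} height={height}",
        f"nvinfer config-file-path={infer_config}",        # TensorRT inference
        f"nvtracker ll-lib-file={tracker_lib} ll-config-file={tracker_config}",
        "nvvideoconvert",                                  # convert for on-screen display
        "nvdsosd",                                         # draw boxes and labels
        "fakesink sync=false",                             # headless sink
    ]
    return " ! ".join(elements)


# All paths are placeholders; substitute files from your own DeepStream install.
desc = deepstream_pipeline("sample.h264", "config_infer_primary.txt",
                           "/path/to/libnvds_nvmultiobjecttracker.so",
                           "tracker_config.yml")
print(desc)
```

With DeepStream installed, a string like this can be handed to `Gst.parse_launch()` from Python; building it programmatically makes it easy to vary batch size or swap the sink without hand-editing a long command.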


DeepStream is the right choice for multi-camera, multi-stream, smart-city-style analytics: retail, traffic, manufacturing QC, and security surveillance. Its Metropolis blueprint and the Video Search and Summarization (VSS) reference architecture are production-proven at scale. It has bidirectional edge-to-cloud messaging built in and ships Python bindings (`pyds`) that work alongside the standard GStreamer Python bindings (`Gst-Python`).

The key limitation is that DeepStream inherits GStreamer's latency model: elements communicate via GStreamer buffers, scheduling granularity is buffer by buffer, and every queue or batching stage can hold frames back. For applications demanding deterministic, sub-millisecond latency (surgical robotics, ultrasound, radar), that model is not a fit.
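
A rough way to see why buffer-by-buffer scheduling lands in the tens of milliseconds: with a live source, every buffer waiting ahead of a frame in a queue or batching element delays it by roughly one frame interval. The model below is a simplification of our own (not a DeepStream formula), and the per-stage costs are hypothetical numbers chosen only to show the shape of the result.

```python
def pipeline_latency_ms(fps: float, queued_buffers: int, stage_ms: list) -> float:
    """Toy end-to-end latency: processing time plus time spent waiting in queues.

    Each buffer queued ahead of a frame delays it by about one frame
    interval (1000 / fps milliseconds) when the source runs in real time.
    """
    frame_interval_ms = 1000.0 / fps
    return sum(stage_ms) + queued_buffers * frame_interval_ms


# Hypothetical per-stage costs in ms: decode, inference, on-screen display.
stages = [2.0, 8.0, 1.0]
print(round(pipeline_latency_ms(30, 0, stages), 1))  # 11.0
print(round(pipeline_latency_ms(30, 2, stages), 1))  # 77.7
```

Even with fast stages, two queued buffers at 30 fps dominate the budget, which is the regime where a GPU-resident, event-driven scheduler starts to pay off.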


3. NVIDIA Holoscan SDK

Holoscan SDK is a fundamentally different beast: it is not built on GStreamer. It is an AI sensor-processing SDK whose execution backbone is the Graph Execution Framework (GXF), a high-performance, low-latency task-graph engine developed internally at NVIDIA. The SDK was originally named Clara Holoscan and targeted medical devices only; with SDK v0.4.0 it dropped the Clara name and became domain-agnostic.


3.1 Architecture

Holoscan structures applications as a DAG (Directed Acyclic Graph) of Operators executing within Fragments. Multiple fragments can be allocated to different physical nodes in a distributed deployment.

Core abstractions (from NVIDIA official docs):


  • Application — acquires and processes streaming data; a collection of fragments.

  • Fragment — runs a graph of operators on a single physical node.

  • Operator — the most basic unit of work; receives data at input ports, processes it, publishes to output ports (replaces GXF's Codelet concept).

  • Port — an interaction point between operators; input ports ingest, output ports publish (replaces GXF's Receiver/Transmitter).

  • Message — a generic data object for inter-operator communication.

  • Condition — a runtime predicate controlling whether an operator executes (replaces GXF's Scheduling Term).

  • Resource — memory pools, GPU allocators, etc., allocated at initialization.

  • Executor — manages fragment execution using the GXF Scheduler.
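
The real Holoscan Python API expresses these ideas by subclassing `holoscan.core.Operator`, declaring ports in `setup(spec)`, doing the work in `compute(op_input, op_output, context)`, and wiring operators with `add_flow()` inside `Application.compose()`. Because that needs the SDK installed, the sketch below is a dependency-free toy that mirrors the same shape: input ports are queues, `max_count` stands in for a `CountCondition`, and a greedy loop stands in for the GXF scheduler. None of these classes are the Holoscan API; the names are hypothetical.

```python
from collections import deque


class Operator:
    """Toy operator: input ports are queues, output ports fan out to downstream queues."""

    def __init__(self, name, inputs=(), outputs=(), max_count=None):
        self.name = name
        self.inputs = {port: deque() for port in inputs}
        self.outputs = {port: [] for port in outputs}
        self.max_count = max_count   # plays the role of a CountCondition
        self.count = 0

    def ready(self):
        # Condition: not exhausted, and a message is waiting on every input port.
        if self.max_count is not None and self.count >= self.max_count:
            return False
        return all(self.inputs.values())

    def emit(self, port, message):
        for queue in self.outputs[port]:
            queue.append(message)

    def compute(self, messages):
        raise NotImplementedError


class Source(Operator):
    def compute(self, messages):
        self.emit("out", self.count)           # publish 0, 1, 2, ...


class Double(Operator):
    def compute(self, messages):
        self.emit("out", messages["in"] * 2)


class Sink(Operator):
    def __init__(self, name, **kwargs):
        super().__init__(name, **kwargs)
        self.received = []

    def compute(self, messages):
        self.received.append(messages["in"])


class Fragment:
    """Toy fragment: owns operators and runs them with a greedy loop."""

    def __init__(self):
        self.operators = []

    def add_flow(self, upstream, downstream, ports=("out", "in")):
        out_port, in_port = ports
        upstream.outputs[out_port].append(downstream.inputs[in_port])
        for op in (upstream, downstream):
            if op not in self.operators:
                self.operators.append(op)

    def run(self):
        # Greedy scheduling: keep executing ready operators until none is ready.
        progressed = True
        while progressed:
            progressed = False
            for op in self.operators:
                if op.ready():
                    messages = {port: q.popleft() for port, q in op.inputs.items()}
                    op.compute(messages)
                    op.count += 1
                    progressed = True


fragment = Fragment()
tx = Source("tx", outputs=("out",), max_count=5)
double = Double("double", inputs=("in",), outputs=("out",))
rx = Sink("rx", inputs=("in",))
fragment.add_flow(tx, double)
fragment.add_flow(double, rx)
fragment.run()
print(rx.received)  # [0, 2, 4, 6, 8]
```

The source has no inputs, so without its count limit it would always be ready; that is exactly why source operators in Holoscan applications are usually paired with a condition such as `CountCondition` or a periodic condition.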

Under the hood, GXF minimizes data copies across pipeline stages. Combined with GPUDirect RDMA support and the optional Holoscan Sensor Bridge (an FPGA front-end), the platform can ingest high-bandwidth sensor data directly into GPU memory with near-zero CPU involvement.
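
To put numbers on "near-zero CPU involvement": on a conventional bounce-buffer path, the NIC first writes sensor data into host memory and a host-to-device copy then reads it back out, so host DRAM sees twice the sensor rate. With GPUDirect RDMA the NIC writes straight into GPU memory. The sketch assumes a hypothetical 100 Gb/s sensor stream at full payload rate and ignores protocol overhead; it is illustrative arithmetic, not a Holoscan measurement.

```python
def host_memory_traffic_gbytes_per_s(sensor_gbits_per_s: float,
                                     gpudirect_rdma: bool) -> float:
    """Host DRAM traffic generated while moving sensor data into GPU memory.

    Bounce-buffer path: the NIC writes each byte into host memory (1 write),
    then a host-to-device copy reads it back (1 read). With GPUDirect RDMA
    the NIC writes directly into GPU memory and host DRAM is not touched.
    """
    if gpudirect_rdma:
        return 0.0
    bytes_per_s = sensor_gbits_per_s / 8      # Gb/s -> GB/s
    return 2 * bytes_per_s                     # one write plus one read


print(host_memory_traffic_gbytes_per_s(100, gpudirect_rdma=False))  # 25.0
print(host_memory_traffic_gbytes_per_s(100, gpudirect_rdma=True))   # 0.0
```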


3.2 Key Differentiators


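GPU-resident pipelines. Holoscan is architected to keep data in GPU memory throughout the entire pipeline. Operators can hand device-memory tensors to one another directly, so there is no implicit CPU round-trip between stages, unlike a bare GStreamer pipeline, where any CPU-only element forces a device-to-host copy.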


Deterministic, ultra-low latency. The GXF scheduler provides predictable execution timing suitable for real-time control loops. NVIDIA's own benchmark showed a 3x reduction in system latency for a surgical robotics application (Virtual Incision's MIRA robot) compared to a prior stack.


Distributed execution. A single application graph can span multiple physical nodes via the UCX (Unified Communications X) framework for high-performance point-to-point data transfer.
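
In the real SDK, a distributed application's `compose()` creates fragments and connects them with `add_flow(fragment_a, fragment_b, {("op_a.out", "op_b.in")})`, and fragment placement is supplied when the application is launched rather than hard-coded in the graph. Holoscan carries inter-fragment messages over UCX, which can choose a transport per connection (for example, shared memory between processes on one host, TCP or RDMA between hosts). The toy planner below makes no Holoscan or UCX calls; the `"fragment.operator.port"` naming and the hostnames are hypothetical. It simply classifies each inter-fragment connection by whether it must cross the network.

```python
def classify_connections(placement, connections):
    """Group inter-fragment connections by whether both ends share a host.

    placement:   {fragment_name: node_name}
    connections: [("fragment.operator.port", "fragment.operator.port"), ...]
    """
    groups = {"same_host": [], "cross_host": []}
    for src, dst in connections:
        src_fragment = src.split(".", 1)[0]
        dst_fragment = dst.split(".", 1)[0]
        if src_fragment == dst_fragment:
            raise ValueError(f"{src} -> {dst} stays inside one fragment")
        same = placement[src_fragment] == placement[dst_fragment]
        groups["same_host" if same else "cross_host"].append((src, dst))
    return groups


# Hypothetical layout: capture and display on an edge box, inference on a GPU server.
placement = {"capture": "edge-01", "inference": "gpu-01", "display": "edge-01"}
connections = [
    ("capture.rx.out", "inference.infer.in"),
    ("inference.infer.out", "display.viz.in"),
    ("capture.rx.out", "display.viz.video_in"),
]
groups = classify_connections(placement, connections)
print(len(groups["same_host"]), len(groups["cross_host"]))  # 1 2
```

Running a check like this before deployment makes it obvious which edges will consume network bandwidth, which is usually where a distributed pipeline's latency budget goes.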


Hardware portability. The same pipeline code runs on NVIDIA Jetson (embedded), IGX Orin (industrial edge), AGX, and DGX (data center) without rewriting.


HoloInfer and Holoviz. The SDK ships dedicated inference (`InferenceOp`, built on the HoloInfer module) and visualization (`HolovizOp`, built on the Holoviz module) operators that are optimized for AI streaming pipelines.


HoloHub. A community repository of reusable operators and reference applications (endoscopy, radar, ultrasound, high-energy light source) that extends the built-in SDK operators.


Language support. Full C++ API with idiomatic Python bindings. Existing GXF codelets can be wrapped as Holoscan operators, so existing GXF extensions remain reusable.


4. NVIDIA Holoscan for Media

Holoscan for Media is a software-defined platform for the broadcast and live media production industry. It is conceptually distinct from the Holoscan SDK: it is not a programming framework but an application platform that orchestrates containerized media workloads on repurposable GPU clusters.


Its architecture layers are:


  • DeepStream SDK — the GStreamer-based inference and processing engine at the core

  • Media Gateway — a reference containerized application built on DeepStream that provides ST 2110 ingress/egress with NMOS IS-05 dynamic connection management

  • Rivermax SDK — NVIDIA's kernel-bypass IP media transport library for ST 2110 uncompressed video

  • Kubernetes / Red Hat OpenShift — the orchestration layer (production deployments use OpenShift 4.14 with NVIDIA Network Operator)

  • Platform services — Whereabouts (IP address management), Longhorn (persistent storage), Istio service mesh
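
The reason a kernel-bypass transport such as Rivermax matters is the raw bit rate of uncompressed ST 2110-20 video. The sketch computes only the active-video payload for 4:2:2 sampling at 10 bits per sample, which averages 20 bits per pixel; ST 2110-20 carries active video only, and RTP/UDP/IP header overhead is left out, so real link budgets are somewhat higher. Frame rates are rounded to 60 fps for simplicity.

```python
def st2110_20_payload_gbps(width: int, height: int, fps: float,
                           bits_per_pixel: int = 20) -> float:
    """Active-video payload rate of an uncompressed ST 2110-20 stream, in Gb/s.

    20 bits per pixel = 4:2:2 at 10 bits per sample
    (10 bits of luma plus, on average, 10 bits of chroma per pixel).
    Packet header overhead is ignored.
    """
    return width * height * bits_per_pixel * fps / 1e9


print(round(st2110_20_payload_gbps(1920, 1080, 60), 2))  # 2.49
print(round(st2110_20_payload_gbps(3840, 2160, 60), 2))  # 9.95
```

A single 2160p60 stream nearly saturates a 10 GbE link before any header overhead, which is why these packets are handled by a kernel-bypass library rather than the regular socket stack.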


The platform targets broadcast engineers and media OEMs who want to migrate traditional SDI infrastructure to software-defined IP workflows. It is not a framework you code against directly; it is a platform you deploy and configure — primarily via Helm chart values files with pipeline DSL strings.


5. Framework Comparison


| Dimension | Bare GStreamer | DeepStream SDK | Holoscan SDK | Holoscan for Media |
|---|---|---|---|---|
| Abstraction level | Low: element/pad/pipeline | Medium: GStreamer + NVIDIA plugins | High: Operator/Fragment DAG | Platform: containerized apps |
| Execution engine | GStreamer streaming threads + GLib main loop | GStreamer + NVIDIA accelerators | GXF (Graph Execution Framework) | DeepStream + Kubernetes |
| Primary language | C, Python, Rust | C/C++, Python | C++, Python | YAML/Helm DSL + DeepStream |
| GPU data residency | Manual, opt-in | Partial (NvBufSurface / NVMM buffers) | First-class, end-to-end | Via DeepStream |
| Target latency | Milliseconds to seconds | Low-to-mid milliseconds | Sub-millisecond possible | Broadcast frame latency |
| AI inference | DIY plugins | TensorRT via nvinfer | HoloInfer (TensorRT) | Via DeepStream nvinfer |
| Multi-stream scaling | Manual | Native (batched nvinfer) | Per-operator parallelism | Via Kubernetes |
| Distributed execution | Not native | Not native | Native (UCX, multi-fragment) | Via Kubernetes |
| Hardware targets | Any | Jetson, x86 dGPU | Jetson, IGX, AGX, DGX | x86 dGPU clusters |
| Open source | Yes (LGPL) | Partial | Yes (SDK on GitHub) | Partial (reference apps) |
| Primary use case | General multimedia | Smart cities, surveillance, retail | Medical, robotics, industrial AI | Broadcast, live media production |



6. When to Use Each

Use bare GStreamer when:

  • Your pipeline does not require AI inference or GPU-accelerated processing

  • You need maximum plugin ecosystem breadth (RIST, SRT, MPEG-TS, RTSP all work out of the box)

  • You are prototyping or building a lightweight streaming relay/transcode service

  • Your team already has deep GStreamer expertise and the workload fits


Use DeepStream SDK when:

  • You are building multi-sensor, multi-camera vision AI analytics

  • Your use case is smart city, retail analytics, manufacturing QC, or traffic management

  • You need NVIDIA Metropolis ecosystem integration or edge-to-cloud message brokering

  • Latency is important but not deterministic sub-millisecond — tens of milliseconds is acceptable

  • You want Python-accessible pipelines with existing GStreamer plugin compatibility


Use Holoscan SDK when:

  • You need deterministic, ultra-low-latency AI pipelines (surgical robotics, ultrasound, radar)

  • Data must remain GPU-resident end-to-end without PCIe round-trips

  • Your application spans multiple physical nodes in a distributed inference topology

  • You are targeting regulated domains (medical devices, industrial inspection) that require a hardened, NVIDIA-supported SDK

  • You want portability across Jetson → IGX → data center without pipeline rewrites


Use Holoscan for Media when:

  • Your team is deploying broadcast infrastructure (live production, playout, contribution links)

  • You are migrating SDI workflows to ST 2110 IP-based production

  • You need NMOS IS-05 dynamic connection management at scale

  • You want a vendor-supported platform rather than assembling a framework from scratch

  • Your deployment target is a Kubernetes-managed GPU cluster in a data center or cloud
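
The checklists above can be condensed into a first-pass triage function. This is a rough, hypothetical encoding of this post's guidance (the field names and the 1 ms threshold are assumptions), useful as a starting point in a design review rather than a substitute for profiling the actual workload.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    needs_ai_inference: bool
    latency_budget_ms: float            # worst-case end-to-end budget
    spans_multiple_nodes: bool = False
    broadcast_st2110: bool = False      # SDI-to-IP / live production workflow


def suggest_framework(req: Requirements) -> str:
    # Order matters: the most specialised option is checked first.
    if req.broadcast_st2110:
        return "Holoscan for Media"
    if req.latency_budget_ms < 1.0 or req.spans_multiple_nodes:
        return "Holoscan SDK"
    if req.needs_ai_inference:
        return "DeepStream SDK"
    return "GStreamer"


print(suggest_framework(Requirements(needs_ai_inference=True, latency_budget_ms=50)))
# DeepStream SDK
print(suggest_framework(Requirements(needs_ai_inference=True, latency_budget_ms=0.5)))
# Holoscan SDK
```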


For more technical depth or assistance with these technologies, consult RidgeRun’s Developer Wiki or reach out to our team—we’re here to help turn cutting-edge technology into reality on embedded Linux platforms.


References

  1. NVIDIA. Holoscan SDK Overview. NVIDIA Developer Documentation. https://docs.nvidia.com/holoscan/sdk-user-guide/overview.html

  2. NVIDIA. Holoscan Core Concepts. NVIDIA Developer Documentation. https://docs.nvidia.com/holoscan/sdk-user-guide/holoscan_core.html

  3. NVIDIA. Relevant Technologies: GXF, UCX, NPP. NVIDIA Developer Documentation. https://docs.nvidia.com/holoscan/sdk-user-guide/relevant_technologies.html

  4. NVIDIA. Holoscan and GXF. NVIDIA Developer Documentation. https://docs.nvidia.com/holoscan/sdk-user-guide/gxf/holoscan_and_gxf.html

  5. NVIDIA. DeepStream SDK Overview. NVIDIA Metropolis Documentation. https://docs.nvidia.com/metropolis/deepstream/dev-guide/text/DS_Overview.html

  6. NVIDIA. DeepStream SDK Developer Page. https://developer.nvidia.com/deepstream-sdk

  7. NVIDIA. Holoscan for Media — DeepStream Integration. NVIDIA Developer Documentation. https://docs.nvidia.com/holoscan-for-media/latest/user-guide/getting-help/deepstream.html

  8. NVIDIA. Transform Live Media Pipelines with NVIDIA Holoscan for Media. NVIDIA Technical Blog, 2024. https://developer.nvidia.com/blog/transform-live-media-pipelines-with-nvidia-holoscan-for-media/

  9. NVIDIA. Holoscan Platform for Real-Time Edge Computing. https://www.nvidia.com/en-us/edge-computing/holoscan/

  10. NVIDIA. Holoscan SDK GitHub Repository. https://github.com/nvidia-holoscan/holoscan-sdk

  11. NVIDIA. GXF Core Concepts. NVIDIA Developer Documentation. https://docs.nvidia.com/holoscan/archive/holoscan-0.4.0/gxf/gxf_core_concepts.html

  12. NVIDIA. Clara Holoscan SDK — GStreamer and DeepStream Sample Applications. NVIDIA Developer Documentation (Archive). https://docs.nvidia.com/clara-holoscan/archive/clara-holoscan-0.1.0/introduction.html


