SEI or In-band: Which Metadata solution do you need?
Metadata is ubiquitous. Whether you need to add synchronized subtitles to a movie, GPS information to snapshots captured by a drone, or custom data to robot video recordings for ML inference, metadata injection/extraction is one of the most frequent requirements of our customers' multimedia/AI projects.
Over the years, RidgeRun has created several solutions to help our customers accelerate their time to market. Two of them were specifically engineered to solve metadata requirements: the GstSEIMetadata and In-band Metadata GStreamer plugins.
This article is a guide to help you understand both solutions and choose the one that best fits your project requirements. The GstSEIMetadata and In-band Metadata sections explain the basic concepts and usage of each solution. The GstSEIMetadata vs In-band Metadata section presents the main limitations of each solution and provides a flowchart to guide the choice according to your project requirements. The Conclusion section summarizes the highlights of each option.
GstSEIMetadata
GstSEIMetadata is a GStreamer plugin developed by RidgeRun that provides elements to inject and extract metadata in H.264/H.265 encoded video streams. Metadata injection relies on a provision of the H.264/AVC and H.265/HEVC standards called Supplemental Enhancement Information (SEI) messages. SEI messages are carried in a dedicated type of NAL unit.
In an H.264/H.265 stream, the encoded video bytes are contained in Network Abstraction Layer (NAL) units. The structure of a NAL unit is shown in Figure 1:
Figure 1. Structure of NAL units.
There are different types of NAL units. In the H.264 standard, SEI NAL units have type 6 (H.265 uses types 39 and 40 for prefix and suffix SEI NAL units). You can learn more about GstSEIMetadata in our developer wiki.
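To make NAL unit types concrete, here is a minimal Python sketch, independent of GstSEIMetadata, that splits a stream at its start codes and reads each unit's type from its header. It assumes the Annex B byte-stream format used by raw .h264 files and MPEG-TS. MP4 files store length-prefixed NAL units instead, so they would need a different splitter. The function names are our own.

```python
from typing import List

H264_SEI = 6                 # SEI NAL unit type in H.264
H265_SEI_TYPES = {39, 40}    # prefix and suffix SEI NAL unit types in H.265


def split_annexb(stream: bytes) -> List[bytes]:
    """Split an Annex B byte stream into NAL units (3- or 4-byte start codes)."""
    starts = []
    i = stream.find(b"\x00\x00\x01")
    while i >= 0:
        starts.append(i + 3)  # the NAL unit begins right after the start code
        i = stream.find(b"\x00\x00\x01", i + 3)
    units = []
    for n, start in enumerate(starts):
        end = starts[n + 1] - 3 if n + 1 < len(starts) else len(stream)
        # A NAL unit never ends in 0x00, so trailing zeros belong to the next
        # 4-byte start code (or are padding) and can be dropped.
        unit = stream[start:end].rstrip(b"\x00")
        if unit:
            units.append(unit)
    return units


def h264_nal_type(unit: bytes) -> int:
    # 1-byte header: forbidden_zero_bit(1) | nal_ref_idc(2) | nal_unit_type(5)
    return unit[0] & 0x1F


def h265_nal_type(unit: bytes) -> int:
    # 2-byte header: forbidden_zero_bit(1) | nal_unit_type(6) | layer id(6) | temporal id(3)
    return (unit[0] >> 1) & 0x3F


def sei_units(stream: bytes, codec: str = "h264") -> List[bytes]:
    """Keep only the SEI NAL units of the stream."""
    units = split_annexb(stream)
    if codec == "h264":
        return [u for u in units if h264_nal_type(u) == H264_SEI]
    return [u for u in units if h265_nal_type(u) in H265_SEI_TYPES]
```

Because the encoder inserts emulation prevention bytes, the pattern 00 00 01 never appears inside a NAL unit, which is why a plain search for start codes is enough here.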
The following subsections provide examples of how to inject and extract metadata using GstSEIMetadata.
Figure 2 shows a diagram with the basic blocks needed in a GStreamer pipeline to inject metadata into a recording:
A video source, such as a camera.
An encoder (such as x264enc or nvv4l2h264enc).
The seiinject element, which adds the message to the encoded video.
A muxer to store the video (such as mp4mux).
A sink element that stores the video in a file or streams it over the network (with udpsink, for example).
Figure 2. Injecting SEI metadata into the stream.
Concretely, the following GStreamer pipeline injects a "Hello World" string message into an H.264 test video stream and stores the result in the testsei.mp4 file using the qtmux muxer:
gst-launch-1.0 videotestsrc num-buffers=1000 ! 'video/x-raw,width=1280,height=720' ! x264enc ! seiinject metadata="Hello World" ! qtmux ! filesink location=testsei.mp4 -e
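seiinject takes care of building the SEI NAL unit for you. For intuition about what such a unit can look like, the sketch below builds one by hand using the H.264 user_data_unregistered SEI payload (payloadType 5): a 16-byte UUID followed by arbitrary bytes. This is an assumption for illustration only; we have not verified which payload type or layout seiinject uses, and the UUID below is hypothetical.

```python
import uuid

START_CODE = b"\x00\x00\x00\x01"
SEI_NAL_HEADER = b"\x06"        # forbidden_zero_bit=0, nal_ref_idc=0, nal_unit_type=6
USER_DATA_UNREGISTERED = 5      # SEI payloadType for user_data_unregistered

# Hypothetical application UUID; a real application would choose its own.
EXAMPLE_UUID = uuid.UUID("12345678-1234-5678-1234-567812345678").bytes


def ff_coded(value: int) -> bytes:
    """Encode payloadType/payloadSize: one 0xFF byte per 255, then the remainder."""
    out = bytearray()
    while value >= 255:
        out.append(0xFF)
        value -= 255
    out.append(value)
    return bytes(out)


def add_emulation_prevention(rbsp: bytes) -> bytes:
    """Insert 0x03 whenever two zero bytes would be followed by 0x00..0x03."""
    out = bytearray()
    zeros = 0
    for b in rbsp:
        if zeros >= 2 and b <= 3:
            out.append(3)
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def build_sei_nal(message: str, uuid_bytes: bytes) -> bytes:
    """Build an Annex B H.264 SEI NAL unit carrying `message` as unregistered user data."""
    if len(uuid_bytes) != 16:
        raise ValueError("user_data_unregistered needs a 16-byte UUID")
    payload = uuid_bytes + message.encode("utf-8")
    # sei_message(): payloadType, payloadSize, payload, then rbsp_trailing_bits (0x80).
    rbsp = ff_coded(USER_DATA_UNREGISTERED) + ff_coded(len(payload)) + payload + b"\x80"
    return START_CODE + SEI_NAL_HEADER + add_emulation_prevention(rbsp)


if __name__ == "__main__":
    print(build_sei_nal("Hello World", EXAMPLE_UUID).hex(" "))
```

A unit like this would sit in front of the coded slice NAL units of the frame it describes, which is what ties the metadata to that frame.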
Figure 3 shows the process for extracting the metadata stored within the H.264 stream. The basic elements are:
A recording file reader, such as filesrc. The input stream can also come from the network.
A demuxer, such as qtdemux.
The seiextract element, which pulls the metadata out of the stream and decodes it back to text.
A decoder, like avdec_h264.
An element to display the video such as xvimagesink.
Figure 3. Extracting SEI metadata from the stream.
The following Python gist shows a concrete example of how to extract the encoded metadata:
Running this example displays the video along with the extracted metadata, as shown in Figure 4.
Figure 4. Decoded video and metadata.
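At the bitstream level, extraction reverses the injection steps: strip the NAL header, remove emulation prevention bytes, then walk the SEI messages. The Python sketch below does this for a single H.264 SEI NAL unit (1-byte header, no start code). It assumes the user_data_unregistered layout (16-byte UUID followed by UTF-8 text); that layout is our assumption, not a description of what seiextract does internally. The function names are hypothetical.

```python
from typing import List, Tuple


def remove_emulation_prevention(ebsp: bytes) -> bytes:
    """Drop the 0x03 byte that the encoder inserted after every 00 00 pair."""
    out = bytearray()
    zeros = 0
    for b in ebsp:
        if zeros >= 2 and b == 3:
            zeros = 0
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def read_ff_coded(data: bytes, pos: int) -> Tuple[int, int]:
    """Read a payloadType/payloadSize value; return (value, next position)."""
    value = 0
    while data[pos] == 0xFF:
        value += 255
        pos += 1
    return value + data[pos], pos + 1


def parse_sei(nal: bytes) -> List[Tuple[int, bytes]]:
    """Return (payloadType, payload) pairs from an H.264 SEI NAL unit."""
    nal = nal.rstrip(b"\x00")                    # drop any zero padding after the unit
    rbsp = remove_emulation_prevention(nal[1:])  # skip the 1-byte NAL header
    messages = []
    pos = 0
    # Keep reading while more than the final rbsp_trailing_bits byte remains.
    while pos < len(rbsp) - 1:
        ptype, pos = read_ff_coded(rbsp, pos)
        psize, pos = read_ff_coded(rbsp, pos)
        messages.append((ptype, rbsp[pos:pos + psize]))
        pos += psize
    return messages


def user_data_unregistered(nal: bytes) -> List[Tuple[bytes, str]]:
    """Extract (uuid, text) from every user_data_unregistered (type 5) message."""
    return [(payload[:16], payload[16:].decode("utf-8", errors="replace"))
            for ptype, payload in parse_sei(nal) if ptype == 5]
```

For H.265, the same logic applies after skipping a 2-byte NAL header instead of one.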
Consider using GstSEIMetadata in the following cases:
Container-agnostic metadata is required. You can use any container that supports H.264/H.265 (e.g., 3GP, MP4, MPEG-TS, QuickTime) without worrying about its particular method for storing metadata.
No particular metadata coding standard is needed.
In-band Metadata
In-band Metadata also allows the injection and extraction of metadata in your GStreamer-based application. It lets you process, record, or stream video together with its associated metadata. Figure 5 shows the general block diagram of In-band Metadata injection.
Figure 5. In-Band Metadata General Diagram.
Concretely, In-band Metadata augments the GStreamer mpegtsmux element to multiplex metadata along with video and audio streams into an MPEG Transport Stream (MPEG-TS), a standard container for video, audio, and metadata.
In some cases, the metadata needs to follow specific standards so that consumer applications can extract it from an MPEG-TS stream. Often, the required metadata standards are defined by the Motion Imagery Standards Board (MISB), whose mission is to maintain the interoperability, integrity, and quality of motion imagery, associated metadata, audio, and other related data. Some MISB standards, such as ST0601 (UAS Datalink Local Set), are based on KLV encoding. KLV (Key-Length-Value) is a data encoding standard often used to transport metadata along with video: the first field is the Key (which identifies the data type), the second is the Length of the data, and the remaining bytes carry the data itself (the Value).
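A minimal Python sketch of KLV encoding and decoding, assuming 16-byte keys and BER-encoded lengths (short form below 128 bytes, long form above). The key constant is the Universal Label commonly quoted for ST0601; verify it against the MISB document before relying on it. The value is treated as opaque bytes: a real ST0601 packet is itself a local set of nested tag/length/value items with mandatory fields such as a timestamp and a checksum, which this sketch does not build.

```python
from typing import Tuple

# Universal Label commonly quoted for MISB ST0601 (verify against the standard).
ST0601_KEY = bytes.fromhex("060E2B34020B01010E01030101000000")


def ber_length(n: int) -> bytes:
    """BER-encode a length: one byte below 128, else 0x80|count then big-endian bytes."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def klv_encode(key: bytes, value: bytes) -> bytes:
    """Concatenate Key, Length, and Value into one KLV triplet."""
    return key + ber_length(len(value)) + value


def klv_decode(packet: bytes, key_len: int = 16) -> Tuple[bytes, bytes]:
    """Split one KLV triplet into (key, value); assumes a fixed-size key."""
    key = packet[:key_len]
    first = packet[key_len]
    if first < 0x80:                      # short form: the byte is the length
        length, offset = first, key_len + 1
    else:                                 # long form: low 7 bits give the byte count
        count = first & 0x7F
        length = int.from_bytes(packet[key_len + 1:key_len + 1 + count], "big")
        offset = key_len + 1 + count
    return key, packet[offset:offset + length]
```

The BER length is what lets a reader skip over a packet whose key it does not recognize, which is one reason KLV works well for mixed metadata streams.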
You can learn more about In-band Metadata in our developer wiki.
The following subsections provide examples of how to inject and extract metadata using In-band Metadata.
An artificial metadata source can easily be used to generate metadata and store it in a recording along with a video feed. Figure 6 shows the blocks and connections used to inject metadata into a recording file. Instead of recording to a file, the result could also be streamed over the network, for example.
Figure 6. Metadata injection using GStreamer In-Band metadata.
The block diagram of Figure 6 translates into a GStreamer pipeline like the following:
gst-launch-1.0 -v metasrc metadata=%T period=1 ! 'meta/x-klv' ! mpegtsmux name=mux ! filesink location=metadata.ts videotestsrc is-live=true ! queue ! x264enc ! mux.
Figure 7 shows how metadata extraction is done with GStreamer In-band Metadata. The video is extracted from the container and displayed, while the metadata is extracted and parsed by a sink element that can send the metadata back to the application as a signal.
Figure 7. Metadata extraction using GStreamer In-Band metadata.
gst-launch-1.0 -v filesrc location=metadata.ts ! tsdemux name=demux demux. ! queue ! h264parse ! 'video/x-h264, stream-format=byte-stream, alignment=au' ! avdec_h264 ! autovideosink demux. ! queue ! 'meta/x-klv' ! metasink
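tsdemux can route video and metadata to different branches because MPEG-TS carries each elementary stream on its own packet identifier (PID). The Python sketch below counts 188-byte transport packets per PID, which is a quick way to see that video and KLV metadata travel as separate streams in a file such as metadata.ts. It assumes plain 188-byte packets starting at offset 0 (no 192-byte M2TS variant); we make no claim about which PID numbers mpegtsmux assigns.

```python
from collections import Counter

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def count_pids(ts: bytes) -> Counter:
    """Count transport packets per PID; the data must start on a packet boundary."""
    counts: Counter = Counter()
    for off in range(0, len(ts) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        pkt = ts[off:off + TS_PACKET_SIZE]
        if pkt[0] != SYNC_BYTE:
            raise ValueError(f"lost sync at offset {off}")
        # The 13-bit PID spans the low 5 bits of byte 1 and all of byte 2.
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        counts[pid] += 1
    return counts


if __name__ == "__main__":
    with open("metadata.ts", "rb") as f:
        for pid, n in sorted(count_pids(f.read()).items()):
            print(f"PID 0x{pid:04x}: {n} packets")
```

PID 0 always carries the Program Association Table; the remaining PIDs belong to the program map table and the individual elementary streams.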
The main use case for In-band Metadata is carrying MISB-compliant metadata within a Transport Stream container in a standard way. This makes the metadata extractable by Transport Stream players that support MISB metadata.
GstSEIMetadata vs In-band Metadata
Both solutions handle metadata injection into and extraction from a video stream; however, which one better suits your project depends on your specific requirements. The main factors to take into account are:
MISB-Compliant Playback
Some media player applications support the extraction of KLV metadata compliant with specific MISB standards from MPEG-TS streams. If your use case needs an existing media player application to extract MISB-compliant metadata from the MPEG-TS stream, then In-band Metadata is the most convenient solution.
Video Compression Formats
GstSEIMetadata supports only H.264 and H.265 encoded video streams. In-band Metadata is limited to the video codecs supported by the GStreamer mpegtsmux element; as of GStreamer 1.20, mpegtsmux supports MPEG and Dirac compression formats in addition to H.264 and H.265. Therefore, In-band Metadata offers more video compression formats to choose from than GstSEIMetadata.
Multimedia Containers
With In-band Metadata, you are limited to the MPEG-TS container. GstSEIMetadata, on the other hand, is container-agnostic, so you can use any container that supports H.264 or H.265 compressed video.
Image to Metadata Mapping
If you need to associate the metadata with a specific video frame, GstSEIMetadata is the most convenient solution: the metadata is carried in an SEI NAL unit next to the encoded frame's NAL units, so it becomes part of the same video stream. With In-band Metadata it is also possible to associate each metadata buffer with a corresponding video frame; however, the metadata and the video are separate streams, so you need additional logic that maps each video buffer timestamp to the closest metadata buffer timestamp.
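The timestamp mapping mentioned above can be done with a binary search over the sorted metadata timestamps. The Python sketch below returns, for a given video timestamp, the index of the nearest metadata timestamp. It assumes both lists use the same clock in nanoseconds (as GStreamer buffer timestamps do); in a real application these values would come from the video and metadata buffers, and the function name is hypothetical.

```python
import bisect
from typing import List, Optional


def closest_metadata_index(video_pts: int, meta_pts: List[int]) -> Optional[int]:
    """Index of the metadata timestamp nearest to video_pts (meta_pts sorted ascending)."""
    if not meta_pts:
        return None
    i = bisect.bisect_left(meta_pts, video_pts)
    if i == 0:
        return 0
    if i == len(meta_pts):
        return len(meta_pts) - 1
    # Compare the neighbours on either side; ties go to the earlier buffer.
    before, after = meta_pts[i - 1], meta_pts[i]
    return i - 1 if video_pts - before <= after - video_pts else i


if __name__ == "__main__":
    SECOND = 1_000_000_000
    meta_pts = [0, SECOND, 2 * SECOND]                  # one metadata buffer per second
    frame_pts = [n * SECOND // 30 for n in range(90)]   # 3 seconds of 30 fps video
    mapping = [closest_metadata_index(t, meta_pts) for t in frame_pts]
    print(mapping[14:18])  # prints [0, 0, 1, 1]
```

Nearest-neighbour matching is one possible policy. If a frame must never use metadata captured after it, use `bisect.bisect_right(meta_pts, video_pts) - 1` instead, which picks the most recent metadata at or before the frame.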
Which solution should I evaluate?
RidgeRun can provide evaluation versions of GstSEIMetadata and In-band Metadata on demand. Figure 8 presents a flowchart to help you choose the solution that best suits your project:
Figure 8. Flowchart for choosing between GstSEIMetadata and In-band Metadata.
Conclusion
RidgeRun offers two metadata injection/extraction solutions that fit different scenarios. In-band Metadata is the most convenient for use cases that require a playback application to read MISB-compliant metadata from an MPEG-TS stream, or that need a wider choice of video codecs. GstSEIMetadata, on the other hand, is the most convenient when you need flexibility in the multimedia container or an easy way to map each video frame to its associated metadata.
Feel free to contact us if you have any questions about our metadata solutions!