By Michael Gruner

Accelerating Motion Detection with GPU Processing

Updated: Apr 17

New Product Announcement!

Introducing RidgeRun's GPU Motion Detector

Although deep learning has made significant progress in image processing and analysis, there are still scenarios that are better suited for classical algorithms: for instance, situations with limited training data, limited computing resources, or the need for a deterministic and interpretable solution. Even without these constraints, integrating deep learning and classical algorithms can provide a powerful solution that leverages the accuracy and capacity of deep learning techniques along with the simplicity and efficiency of classical ones.


Motion detection is one of the most popular and useful processing stages in machine vision systems. It refers to an algorithm that detects the presence of moving objects within a specific area. In the case of camera-based detectors, it works by analyzing structured changes in the pixel values and triggering an alarm if movement is detected. Motion detectors are useful in many applications: security systems, where they can help detect intruders or suspicious activity; traffic monitoring, where they can help track vehicles or report unidentified objects; or basically any other scenario where the nature of the moving object may be unknown. Even though it is such a cornerstone algorithm, it can be efficiently implemented using classical methods.


RidgeRun's GPU Accelerated Motion Detector is a new software solution that allows you to integrate real-time motion detection into your application. The processing is performed on the GPU using CUDA, which not only speeds up processing but also frees up the CPU for other important tasks. It comes in the form of a library that can be easily integrated into an existing application, and it also provides a GStreamer plug-in to ease the integration with existing pipelines. The detector will not only notify you when motion is detected, but will also provide the exact coordinates of the different moving objects in the scene. All of this happens without the detector ever needing to be trained or being limited to a specific class of objects.

Example of the motion detector in action. Top: Binary mask highlighting areas where movement was detected. Bottom: Bounding boxes over the distilled motion regions.


Read on to find more details about its implementation, its benefits, and how it can boost your application's reactive ability in an efficient way.


Project Design

RidgeRun's GPU accelerated motion detector is an out-of-the-box solution to detect motion over a video feed. While you use it as a drop-in software module, the detector is actually composed of three processing blocks:

  1. Motion segmentation (masks)

  2. Noise reduction

  3. Blob detection

These stages are chained together to provide an integral motion detection solution as shown in the following figure.


Motion detection processing pipeline.


Motion Detection

Although the full project is named Motion Detection, it is the first of the three stages that actually performs this detection. The technique used is called background subtraction, and the idea is quite simple in concept: if you have an image of the background as a reference, you can subtract the current image from it, and the pixels with significant differences correspond to those with movement. The following figure illustrates this concept.

An image and its corresponding background subtraction result. Besides the vehicles and pedestrians, leaves and smaller types of movements are also detected.


The hard part is coming up with the background reference. Since there is rarely a ground-truth background available, and it may even vary over time, the algorithm needs to estimate this reference over time. This technique is called background modeling, and one of the most effective methods to achieve it consists of evaluating the values of the pixels over several video frames. The pixels that remain constant over a longer period of time have a greater probability of belonging to the background. More specifically, RidgeRun makes use of two algorithms named MOG (Mixture of Gaussians) and MOG2, where the frames in the time window under consideration are not evaluated equally, but are weighted according to a combination of Gaussian distributions and weighting parameters.
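
To make the concept more concrete, here is a minimal sketch of background subtraction using OpenCV's stock MOG2 implementation. This is purely illustrative: RidgeRun's detector uses its own CUDA-accelerated implementation, exposed through the rr:: API shown later in this post.

#include <opencv2/opencv.hpp>

int main() {
  cv::VideoCapture capture(0); /*Any video source works*/
  cv::Ptr<cv::BackgroundSubtractorMOG2> mog2 =
      cv::createBackgroundSubtractorMOG2(500 /*history*/, 16 /*varThreshold*/);

  cv::Mat frame, foreground_mask;
  while (capture.read(frame)) {
    /*Pixels that differ significantly from the learned background model
      become white in the mask; static pixels become black*/
    mog2->apply(frame, foreground_mask);

    cv::imshow("motion mask", foreground_mask);
    if (cv::waitKey(1) == 27) {
      break; /*ESC to quit*/
    }
  }
  return 0;
}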


The motion detection stage can be configured to fine-tune the algorithm for your specific application. One of these parameters is the region of interest (ROI): a subsection of the original image where motion is to be searched for. The following image shows an example of a system where a ROI in the center of the image has been specified.

In a system with a ROI configured in the center of the frame, motion is detected within that region.


Similarly, the library exposes a sampling-frequency parameter that instructs the detector to only look for motion in 1-in-every-N frames. The ability to specify the ROI and the sampling frequency can greatly reduce the resource consumption of the algorithm and, hence, of the full system.


Noise Reduction

Cameras are noisy devices. Image sensors translate light photons from the captured scene into proportional electrical signals, but this is a highly chaotic and random process. Even if the camera is still, two consecutive image frames will never have identical pixel values: small light and/or electrical variations will make corresponding pixels have slightly different values. The problem is that all these variations may be detected as movement, when in reality they are just false positives.


To overcome this, RidgeRun's Motion Detector provides a noise reduction module that minimizes this effect. This stage applies a low pass filter and a thresholding algorithm to the image. The former reduces variability among neighboring pixels so that their values are more alike. The latter simply discards pixels below the given threshold. The combination of both effectively cleans the initial motion detection result. The following figure shows the noise reduction process.

The result of low-pass filtering the background subtraction of the previous step.


The module exposes two parameters to control the quality of the noise reduction. The size parameter relates to the effectiveness of the low-pass filter: the larger the size, the more neighboring pixels are considered and, hence, the more effective the filter. However, this also comes at the cost of higher resource consumption and potentially missing small moving objects.


The second parameter is the threshold, which, as its name implies, specifies the value below which pixel values are discarded. The higher the threshold, the more selective the system is, at the cost, of course, of missing small movements. Once thresholded, the output image is a binary mask: the pixel value no longer represents the "amount" of potential movement, but whether or not it contains movement.
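
Conceptually, the noise reduction stage can be sketched with OpenCV primitives as shown below. The size and threshold arguments mirror the parameters described above; the actual library performs these operations with CUDA kernels, so take this as an illustration of the idea rather than the real implementation.

#include <opencv2/opencv.hpp>

/*Illustrative noise reduction: a Gaussian low-pass filter followed by
  thresholding. size must be odd for the Gaussian kernel*/
cv::Mat reduce_noise(const cv::Mat &motion_mask, int size, double threshold) {
  cv::Mat filtered, binary;

  /*Larger kernels average more neighboring pixels: more smoothing, more
    compute and a higher chance of erasing small moving objects*/
  cv::GaussianBlur(motion_mask, filtered, cv::Size(size, size), 0);

  /*Pixels below the threshold are discarded; the result is a binary mask
    that only tells whether a pixel contains movement or not*/
  cv::threshold(filtered, binary, threshold, 255, cv::THRESH_BINARY);

  return binary;
}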


Blob Detection

The blob detector is in charge of "grouping" neighboring pixels that qualify as movement. These groups are called blobs and typically represent moving objects in the scene, such as leaves on a tree, cars on the street or a bird in the sky. The positions and sizes of these blobs are returned to the user, so that the precise coordinates where the movement is occurring are known.


The blob detection stage also serves as another filter to discard small, undesired movement. The module exposes a min-area parameter that allows the user to specify the minimum size that a potential blob must have before being considered a detection. This is extremely useful to filter out undesired motion. For example, if the system is supposed to focus on the movement of cars and other vehicles, the min-area can be set sufficiently large that motion produced by pedestrians, leaves or animals is discarded. The following image shows the blob detection algorithm in action.

The biggest blobs of neighboring pixels are identified and the coordinates are returned to the user.
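
Conceptually, this stage amounts to a connected-region search over the binary mask followed by an area filter. The sketch below uses OpenCV contours only to illustrate the idea; RidgeRun's library implements its own blob detection and reports the results through the Motion objects and JSON structures described below.

#include <opencv2/opencv.hpp>
#include <vector>

/*Illustrative blob detection: group the white pixels of the binary mask into
  connected regions and keep only those whose area is at least min_area*/
std::vector<cv::Rect> detect_blobs(const cv::Mat &binary_mask, double min_area) {
  std::vector<std::vector<cv::Point>> contours;
  cv::findContours(binary_mask.clone(), contours, cv::RETR_EXTERNAL,
                   cv::CHAIN_APPROX_SIMPLE);

  std::vector<cv::Rect> blobs;
  for (const auto &contour : contours) {
    /*Small regions (leaves, birds, residual noise) fall below min_area and
      are discarded; larger objects such as vehicles are reported*/
    if (cv::contourArea(contour) >= min_area) {
      blobs.push_back(cv::boundingRect(contour));
    }
  }
  return blobs;
}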


Using the Motion Detector

The simplest way to get started using RidgeRun's GPU accelerated motion detector is through the GStreamer plug-in. The project provides a convenient single element named rrmotiondetectionbin that encapsulates the three stages described above. Internally, the three modules are automatically chained together but their individual properties are always accessible through top-level element properties. To ease visualization and debugging, a helper element named rrmotionoverlay is provided. The sole purpose of this element is to paint the bounding boxes on top of the blobs where motion was detected.


The following image shows a diagram of a possible pipeline.

GStreamer pipeline that uses the GPU Accelerated Motion Detector and the overlay helper to detect and display movement from a camera.


Upon processing, the detector will announce the detected blobs in two different ways: as GStreamer buffer metas and in the form of a GObject signal. The former is a structure that is inserted into each processed buffer so that the application can intercept these buffers and inspect this information. The latter is a signal that will be emitted if blobs are detected, so that the application may connect a callback to receive the notification. Both versions provide basically the same information and, omitting low-level details, it comes in the form of a JSON structure like the following:

{
   "ROIs":[
      {
         "motion":[
            {
               "x1":0.13177083432674408,
               "x2":0.17578125,
               "y1":0.7282407283782959,
               "y2":0.80509257316589355
            },
            {
               "x1":0.62526041269302368,
               "x2":0.6484375,
               "y1":0.62870371341705322,
               "y2":0.75648152828216553
            }
         ],
         "name":"roi",
         "x1":0,
         "x2":1,
         "y1":0,
         "y2":1
      }
   ]
}

From the structure above it can be seen that the JSON presents a list of regions of interest, each containing a list of the movement blobs detected within it. In this specific example, the pipeline was configured with a single ROI and two movement blobs were detected within it.
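
As a quick illustration of how an application might consume this structure, here is a sketch that walks the JSON using the open-source nlohmann/json library. The library choice is an assumption made for the example (it is not part of the motion detector), and the coordinates are fractions of the frame dimensions, as seen in the example above.

#include <nlohmann/json.hpp>
#include <iostream>
#include <string>

/*Walk the motion JSON and print every detected blob per ROI*/
void print_motion(const std::string &motion_json) {
  auto root = nlohmann::json::parse(motion_json);

  for (const auto &roi : root.at("ROIs")) {
    std::cout << "ROI " << roi.at("name").get<std::string>() << std::endl;

    for (const auto &blob : roi.at("motion")) {
      /*In the example above, x1/y1 and x2/y2 are the normalized corners of
        the bounding box of a moving object*/
      std::cout << "  blob: (" << blob.at("x1").get<double>() << ", "
                << blob.at("y1").get<double>() << ") - ("
                << blob.at("x2").get<double>() << ", "
                << blob.at("y2").get<double>() << ")" << std::endl;
    }
  }
}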


Jetson NVMM Support

Since the main goal of the project is to provide an efficient GPU accelerated motion detector, the GStreamer plug-in is capable of consuming and producing NVMM memory directly. This effectively results in a zero-copy pipeline implementation. The following gst-launch-1.0 pipeline captures from a camera, detects movement and displays the images, all without unnecessary memory transfers between the CPU and the GPU.

gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),width=3840,height=2160' ! queue !  rrmotiondetectionbin grayscale=true motion_detector::motion=mog2  ! queue ! nv3dsink sync=false

An application may connect to the on-new-motion signal to receive notifications of the detections. To visualize the results on the display, introduce the rrmotionoverlay helper element as follows:

gst-launch-1.0 nvarguscamerasrc ! 'video/x-raw(memory:NVMM),width=3840,height=2160' ! queue ! rrmotiondetectionbin grayscale=true motion_detector::motion=mog2 ! queue ! nvvidconv ! rrmotionoverlay thickness=2 ! nvvidconv ! 'video/x-raw(memory:NVMM),format=I420' ! queue ! nv3dsink sync=false -v

In this case, the memory does need to be transferred to the CPU for the overlay boxes to be painted, and back to the GPU to be displayed. However, since this is a debugging helper element, the performance hit is not critical in this scenario.
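
For applications that prefer to receive the notifications programmatically rather than running gst-launch-1.0, the same pipeline can be built with gst_parse_launch and the on-new-motion signal connected from code. The sketch below assumes the signal delivers the detection information as a string argument; check the element's documentation in the developer wiki for the exact callback signature, which may differ. The name=detector property is added here only so the bin can be retrieved by name.

#include <gst/gst.h>

/*Hypothetical callback: the exact signature of "on-new-motion" is documented
  in the developer wiki; here we assume the detections arrive as a string*/
static void on_new_motion_cb(GstElement *element, const gchar *motion,
                             gpointer user_data) {
  g_print("Motion detected: %s\n", motion);
}

int main(int argc, char *argv[]) {
  gst_init(&argc, &argv);

  GError *error = NULL;
  GstElement *pipeline = gst_parse_launch(
      "nvarguscamerasrc ! video/x-raw(memory:NVMM),width=3840,height=2160 ! "
      "queue ! rrmotiondetectionbin name=detector grayscale=true "
      "motion_detector::motion=mog2 ! queue ! nv3dsink sync=false",
      &error);
  if (!pipeline) {
    g_printerr("Failed to build pipeline: %s\n", error->message);
    return -1;
  }

  /*Retrieve the bin by the name given above and connect the signal*/
  GstElement *detector = gst_bin_get_by_name(GST_BIN(pipeline), "detector");
  g_signal_connect(detector, "on-new-motion", G_CALLBACK(on_new_motion_cb), NULL);

  gst_element_set_state(pipeline, GST_STATE_PLAYING);

  /*Run until an error or end-of-stream is posted on the bus*/
  GstBus *bus = gst_element_get_bus(pipeline);
  GstMessage *msg = gst_bus_timed_pop_filtered(
      bus, GST_CLOCK_TIME_NONE,
      (GstMessageType)(GST_MESSAGE_ERROR | GST_MESSAGE_EOS));

  if (msg)
    gst_message_unref(msg);
  gst_object_unref(bus);
  gst_object_unref(detector);
  gst_element_set_state(pipeline, GST_STATE_NULL);
  gst_object_unref(pipeline);
  return 0;
}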


Check our developers wiki for a more complete list of GStreamer pipelines to bootstrap your application.


Using the Standalone Library

The motion detector is not tied to GStreamer; it may also be used as a standalone library. The API is simple enough to allow a new application to be built in minutes, but flexible enough to fine-tune the algorithms for the most demanding applications. A typical application is composed of five steps, described below:

1. Create the factory: The factory is the object that will build all the other detector objects. Different factories will create objects for different implementations. To use the Jetson implementation, simply create the factory as:

std::shared_ptr<rr::IMotionFactory> factory = std::make_shared<rr::JetsonMotionDetection>();

2. Create the parameters: The parameters will configure the runtime operation of the detector. Each of the three stages exposes its own parameters. For example, to configure the first detection stage, you'd write:

/*Define ROI*/
rr::Coordinate<float> roi_coord(ROI_X1, ROI_X2, ROI_Y1, ROI_Y2);
rr::ROI<float> roi(roi_coord, ROI_MIN_AREA, "roi");

/*Motion detection (Mog algorithm)*/
std::shared_ptr<rr::MogParams> motion_params = std::make_shared<rr::MogParams>();
motion_params->setLearningRate(LEARNING_RATE);
motion_params->addROI(roi);

3. Create the algorithms: With the parameters at hand, now we create an instance of each algorithm:

/*Motion Algorithm*/
std::shared_ptr<rr::IMotionDetection> motion_detection = factory->getMotionDetector(rr::IMotionDetection::Algorithm::MOG2, motion_params);

/*Denoise algorithm*/
std::shared_ptr<rr::IDenoise> denoise = factory->getDenoise(rr::IDenoise::Algorithm::GaussianFilter, denoise_params);

/*Blob detection*/
std::shared_ptr<rr::IBlobDetection> blob = factory->getBlobDetector(rr::IBlobDetection::Algorithm::BRTS, blob_params);

4. Create the intermediate frames: To improve flexibility, the algorithms won't allocate frames themselves. Instead, the application must create them:

/*Create intermediate frames*/
std::shared_ptr<rr::Frame> mask = factory->getFrame(rr::Resolution(WIDTH, HEIGHT), rr::Format::FORMAT_GREY);
std::shared_ptr<rr::Frame> filtered = factory->getFrame(rr::Resolution(WIDTH, HEIGHT), rr::Format::FORMAT_GREY);

5. Process frames: Now we're ready to finally process frames. The returned motion_list is a vector with Motion objects that contain the coordinates of each moving object. Ignoring error handling, the processing will look like:

while(true) {
    /*Get input data using one of the provided ways*/

    /*Apply algorithms*/
    CHECK_RET_VALUE(motion_detection->apply(input, mask, motion_params))
    CHECK_RET_VALUE(denoise->apply(mask, filtered, denoise_params))
    CHECK_RET_VALUE(blob->apply(filtered, motion_list, blob_params))
    
    /*Check for motion*/
    if (motion_list.size() != 0) {
        /*Do something with motion objects*/
    }
}

The snippets above have been stripped of proper error handling and other implementation details in favor of simplicity and readability. For a more complete walkthrough, refer to the tutorial in our developer's wiki.


Besides this, RidgeRun's GPU Accelerated Motion Detector provides convenience data wrappers to ease interoperability with some common frameworks, such as:

  • GStreamer GstVideoFrame

  • RidgeRun GstCUDA

  • OpenCV cv::Mat

  • OpenCV cv::GpuMat


Closing Remarks

Motion detection is a fundamental stage in many computer vision applications. RidgeRun's GPU Accelerated Motion Detector provides an out-of-the-box solution for it, not only without the bloat of a deep learning implementation, but also accelerated with CUDA to meet real-time performance requirements.


Learn more in the User Guide in our Developers Wiki, or purchase directly from our store. Not convinced? Ask for an evaluation version to try the library by yourself! Send a message to support@ridgerun.com for more information.
