GStreamer OpenCV face detection for Jetson TX2
Problems running the pipelines shown on this page? Please see our GStreamer Debugging guide for help.
Introduction
Computer vision is one of the most promising fields for embedded technologies, and it improves every day with new algorithms, new products, and new implementation approaches. OpenCV provides a framework for developing optimized applications, libraries, and algorithms for pattern recognition and image processing in general, while the well-known GStreamer framework provides the tools for building complex video, audio, and data processing systems.
The NVIDIA Jetson TX1 platform is a capable embedded device, with four ARM CPU cores and a Maxwell-based GPU with 256 cores. So why not take advantage of that capacity?
That is why RidgeRun started a project to develop GStreamer elements that use OpenCV and CUDA so that any algorithm can run on the GPU; this is how Gst-OCV was born. Ocvfacedetection is the first element. It is still in progress, but the first tests have been carried out.
Gst-OCV Overview
Gst-OCV is a project whose objective is to create a framework for easily developing new GStreamer elements based on OpenCV algorithms. The main idea is to provide an interface, through macros, that developers can use to generate new elements with their own algorithms. This framework is still in progress, but the first version is being tested with the face detection algorithm.
Limitations
- Only in-place processing is supported.
- Only affine transformation metadata is supported.
Architecture
The architecture of the gst-ocv element is shown in Figure 1.
- Input image: The full size original image to process.
- Pyramid image: The original image after pyramid processing. This typically means downscaled and, on some occasions, blurred. See the pyramid concept.
- Sink pad: The pad where the full-sized input image is pushed. This pad is always present.
- Sink queue: Configurable queue that stores past buffers so that algorithms can perform temporal analysis.
- Sink pyramid pad: The pad where the pyramid-processed input image is pushed. This pad is optional: if the pipeline does not request it, processing is done on the original image.
- Sink pyramid queue: Queue to store past-time pyramid-processed buffers. This queue is guaranteed to be in-sync with the Sink queue in a 1:1 relationship.
- OCV Element: GStreamer element wrapper that links into the pipeline.
- OCV Algorithm: OpenCV custom algorithm. This is the only piece of code that needs to be filled in by the developer.
- Src pad: The output of the OCV algorithm is pushed to the downstream elements through this pad.
- Output image: The output image, possibly processed. It may carry an affine transformation matrix to be processed by further elements.
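The relationship between the sink queue and the sink pyramid queue can be illustrated with a small model. This is only a sketch in plain Python, not the Gst-OCV API: the `PairedQueues` class, its method names, and the string "frames" are hypothetical and stand in for GStreamer buffers. It shows two bounded queues that are filled together so that entry i in one always corresponds to entry i in the other, and the fallback to the full-size image when no pyramid image is provided.

```python
from collections import deque


class PairedQueues:
    """Toy model of the sink queue and the sink pyramid queue.

    Hypothetical names for illustration; this is not the Gst-OCV API.
    """

    def __init__(self, depth):
        # Both queues share one depth so they always hold the same frames.
        self.sink = deque(maxlen=depth)
        self.pyramid = deque(maxlen=depth)

    def push(self, full_frame, pyramid_frame=None):
        # If no pyramid image is supplied (pyramid pad not requested),
        # fall back to the full-size frame.
        if pyramid_frame is None:
            pyramid_frame = full_frame
        # Appending to both queues in the same call keeps them in sync.
        self.sink.append(full_frame)
        self.pyramid.append(pyramid_frame)

    def history(self):
        # Oldest first; each tuple pairs a full frame with its pyramid frame.
        return list(zip(self.sink, self.pyramid))


queues = PairedQueues(depth=3)
for n in range(5):
    queues.push(f"frame{n}", f"frame{n}-small")

# Only the three most recent pairs remain in the history.
print(queues.history())
```

An algorithm that needs temporal analysis would read `history()` and always find matching full-size and pyramid entries at the same index.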
GstOCVFacedetection: First results
The ocvfacedetection plugin is built on the Gst-OCV framework. However, the pyramid sink pad is not used yet, so the downscaling is done inside the element through the OpenCV-CUDA API. At this point, the algorithm works in the grayscale colorspace; RGB support will be added later.
Also, at this point, the face detector only handles frontal faces; broader support will be added later.
A pipeline like the following displays the video captured from the camera:
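Because the detector works on a downscaled image, a face found there has to be mapped back to full-size coordinates, and a pure scale is the simplest case of an affine transformation. The sketch below illustrates that mapping in plain Python. It is an assumption about how such a mapping could look, not the element's actual code: the function names, the (x, y, w, h) rectangle layout, and the frame sizes are all hypothetical.

```python
def scale_matrix(full_size, small_size):
    """Return a 2x3 affine matrix mapping small-image points to full-size points."""
    full_w, full_h = full_size
    small_w, small_h = small_size
    return [
        [full_w / small_w, 0.0, 0.0],
        [0.0, full_h / small_h, 0.0],
    ]


def apply_affine(matrix, point):
    """Apply a 2x3 affine matrix to an (x, y) point."""
    x, y = point
    (a, b, tx), (c, d, ty) = matrix
    return (a * x + b * y + tx, c * x + d * y + ty)


def map_rect(matrix, rect):
    """Map an axis-aligned (x, y, w, h) rectangle through a scale-only matrix."""
    x, y, w, h = rect
    x0, y0 = apply_affine(matrix, (x, y))
    x1, y1 = apply_affine(matrix, (x + w, y + h))
    return (x0, y0, x1 - x0, y1 - y0)


# Illustrative numbers only: a 1280x720 frame analysed at 320x180.
m = scale_matrix((1280, 720), (320, 180))
face_small = (40, 30, 50, 50)
face_full = map_rect(m, face_small)
print(face_full)
```

Mapping only two corners is valid here because the matrix has no rotation or shear; a general affine matrix would require mapping all four corners.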
DISPLAY=:0 gst-launch-1.0 -v nvcamerasrc sensor-id=5 ! nvvidconv ! 'video/x-raw,format=GRAY8' ! ocvfacedetection ! \
nvvidconv ! 'video/x-raw(memory:NVMM),format=I420,width=640,height=480' ! autovideosink
And with a pipeline like this you can create a video file:
DISPLAY=:0 gst-launch-1.0 -v nvcamerasrc sensor-id=5 ! nvvidconv ! 'video/x-raw,format=GRAY8' ! ocvfacedetection ! \
nvvidconv ! 'video/x-raw(memory:NVMM),format=I420,width=640,height=480' ! omxh264enc ! h264parse ! mp4mux ! \
filesink location=test.mp4
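The two pipelines share everything up to the second caps filter and differ only in the sink. As a sketch, the helper below builds either variant as an argument list that could be passed to `subprocess.run` on the board. The `build_pipeline` function and its parameters are hypothetical; the element names, properties, and caps are copied from the pipelines above. Setting `DISPLAY=:0` would be done through the process environment and is left out here.

```python
import shlex


def build_pipeline(sensor_id=5, width=640, height=480, output=None):
    """Return a gst-launch-1.0 argument list for display or recording.

    Hypothetical helper: with output=None the video goes to autovideosink,
    otherwise it is encoded to H.264 and written to the given MP4 file.
    """
    args = [
        "gst-launch-1.0", "-v",
        "nvcamerasrc", f"sensor-id={sensor_id}", "!",
        "nvvidconv", "!",
        "video/x-raw,format=GRAY8", "!",
        "ocvfacedetection", "!",
        "nvvidconv", "!",
        f"video/x-raw(memory:NVMM),format=I420,width={width},height={height}",
        "!",
    ]
    if output is None:
        args += ["autovideosink"]
    else:
        args += ["omxh264enc", "!", "h264parse", "!", "mp4mux", "!",
                 "filesink", f"location={output}"]
    return args


display_cmd = build_pipeline()
record_cmd = build_pipeline(output="test.mp4")

# shlex.join adds the shell quoting needed for the caps strings and "!".
print(shlex.join(display_cmd))
print(shlex.join(record_cmd))
```

Passing a list avoids shell quoting problems with the parentheses in the `memory:NVMM` caps string.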
With the face detection plugin in its current state, you should see something like this:
Performance
Using the tegrastats application included in the Jetson TX1 file system, we measured the CPU and GPU load while running a pipeline with the ocvfacedetection plugin, as shown below:
RAM 1520/3994MB (lfb 392x4MB) cpu [0%,0%,0%,0%]@921 GR3D 49%@153 EDP limit 0
RAM 1520/3994MB (lfb 392x4MB) cpu [48%,18%,14%,16%]@921 GR3D 42%@153 EDP limit 0
RAM 1520/3994MB (lfb 392x4MB) cpu [54%,19%,15%,14%]@710 GR3D 29%@153 EDP limit 0
RAM 1520/3994MB (lfb 392x4MB) cpu [44%,22%,20%,13%]@710 GR3D 14%@153 EDP limit 0
RAM 1520/3994MB (lfb 392x4MB) cpu [49%,21%,21%,7%]@825 GR3D 67%@153 EDP limit 0
RAM 1520/3994MB (lfb 392x4MB) cpu [48%,12%,27%,9%]@825 GR3D 67%@153 EDP limit 0
RAM 1520/3994MB (lfb 392x4MB) cpu [49%,12%,24%,16%]@921 GR3D 56%@153 EDP limit 0
RAM 1520/3994MB (lfb 392x4MB) cpu [54%,18%,21%,6%]@921 GR3D 37%@153 EDP limit 0
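To summarize samples like these, the per-core CPU load and the GR3D (GPU) load can be extracted with a regular expression and averaged. The following is a minimal sketch that assumes the line layout shown above; other tegrastats versions print different fields, so the pattern would need adjusting. The variable and function names are illustrative.

```python
import re

# The eight tegrastats samples from this page.
SAMPLES = """\
RAM 1520/3994MB (lfb 392x4MB) cpu [0%,0%,0%,0%]@921 GR3D 49%@153 EDP limit 0
RAM 1520/3994MB (lfb 392x4MB) cpu [48%,18%,14%,16%]@921 GR3D 42%@153 EDP limit 0
RAM 1520/3994MB (lfb 392x4MB) cpu [54%,19%,15%,14%]@710 GR3D 29%@153 EDP limit 0
RAM 1520/3994MB (lfb 392x4MB) cpu [44%,22%,20%,13%]@710 GR3D 14%@153 EDP limit 0
RAM 1520/3994MB (lfb 392x4MB) cpu [49%,21%,21%,7%]@825 GR3D 67%@153 EDP limit 0
RAM 1520/3994MB (lfb 392x4MB) cpu [48%,12%,27%,9%]@825 GR3D 67%@153 EDP limit 0
RAM 1520/3994MB (lfb 392x4MB) cpu [49%,12%,24%,16%]@921 GR3D 56%@153 EDP limit 0
RAM 1520/3994MB (lfb 392x4MB) cpu [54%,18%,21%,6%]@921 GR3D 37%@153 EDP limit 0
"""

# Captures the per-core percentages and the GR3D percentage.
LINE_RE = re.compile(r"cpu \[([^\]]*)\]@\d+ GR3D (\d+)%@\d+")


def parse_line(line):
    """Return (per-core CPU loads, GPU load) or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    cpu = [int(p.rstrip("%")) for p in match.group(1).split(",")]
    return cpu, int(match.group(2))


parsed = [p for p in map(parse_line, SAMPLES.splitlines()) if p is not None]

# Average each core across all samples, then average the GPU load.
cpu_avg = [sum(core) / len(parsed) for core in zip(*(cpu for cpu, _ in parsed))]
gpu_avg = sum(gpu for _, gpu in parsed) / len(parsed)

print("CPU average per core (%):", cpu_avg)
print("GR3D average (%):", gpu_avg)
```

The averages include the first sample, where all four cores read 0%; dropping it would raise the CPU figures slightly.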
The load on the four CPUs is low (roughly 45% to 55% on one core and below 30% on the others), which indicates that no element is processing buffers with CPU algorithms. The framerate also holds up. When capturing 720p@30 from a camera (Sony IMX219 Linux driver for Jetson TX1), the framerate is preserved and ocvfacedetection does not affect it. Here are some samples:
Timestamp: 1:22:21.823469039; Bps: 302064; fps: 29.49; CPU: 28;
INFO:
Timestamp: 1:22:22.836659540; Bps: 303257; fps: 30.60; CPU: 27;
INFO:
Timestamp: 1:22:23.854855758; Bps: 301768; fps: 29.46; CPU: 25;
INFO:
Timestamp: 1:22:24.855143854; Bps: 307200; fps: 30.0; CPU: 31;
INFO:
Timestamp: 1:22:25.857528247; Bps: 306586; fps: 29.94; CPU: 25;
INFO:
Timestamp: 1:22:26.861957115; Bps: 305976; fps: 29.88; CPU: 27;
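The framerate claim can be checked directly from these samples. The sketch below pulls the `fps` and `CPU` fields out of the log text and reports the mean and the range. It assumes the `key: value;` layout shown above; the variable names are illustrative.

```python
import re

# The six samples from this page, joined into one string.
LOG = (
    "Timestamp: 1:22:21.823469039; Bps: 302064; fps: 29.49; CPU: 28; INFO: "
    "Timestamp: 1:22:22.836659540; Bps: 303257; fps: 30.60; CPU: 27; INFO: "
    "Timestamp: 1:22:23.854855758; Bps: 301768; fps: 29.46; CPU: 25; INFO: "
    "Timestamp: 1:22:24.855143854; Bps: 307200; fps: 30.0; CPU: 31; INFO: "
    "Timestamp: 1:22:25.857528247; Bps: 306586; fps: 29.94; CPU: 25; INFO: "
    "Timestamp: 1:22:26.861957115; Bps: 305976; fps: 29.88; CPU: 27;"
)

# findall works whether the samples are on one line or on several.
fps = [float(v) for v in re.findall(r"fps: ([0-9.]+);", LOG)]
cpu = [int(v) for v in re.findall(r"CPU: (\d+);", LOG)]

mean_fps = sum(fps) / len(fps)
mean_cpu = sum(cpu) / len(cpu)

print(f"fps: mean {mean_fps:.2f}, min {min(fps):.2f}, max {max(fps):.2f}")
print(f"CPU: mean {mean_cpu:.1f}%")
```

With these six samples the framerate stays within about 0.6 fps of the 30 fps target.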
Contact Us
For direct inquiries, please refer to the contact information available on our Contact page. Alternatively, you may complete and submit the form provided at the same link. We will respond to your request at our earliest opportunity.