FPGA Image Signal Processor - FPGA ISP Accelerators - Convolution

From RidgeRun Developer Wiki







Introduction

The FPGA-ISP convolution is the professional version of the V4L2 FPGA convolution. It is developed as a templated module, so its resource usage can be tuned to the needs of each application. Like any convolution accelerator, it receives a video frame from kernel space, applies a custom kernel, and returns the filtered frame.

By combining multiple convolution accelerators, it is possible to perform more complex operations, such as demosaicing, Sobel edge detection, DoG (Difference of Gaussians), LoG (Laplacian of Gaussian), and other spatial filters.
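Combining accelerators typically means running several kernels over the same frame and merging their outputs. The Python sketch below models this for Sobel edge detection: one pass with a horizontal-gradient kernel, one with a vertical-gradient kernel, and a per-pixel combination. It is a software model under stated assumptions: borders are skipped (only interior pixels are produced), kernels are applied without flipping (correlation form), and the magnitude uses the common |Gx| + |Gy| approximation clamped to 8 bits. The function names are hypothetical and not part of the FPGA-ISP API.

```python
# Sobel gradient kernels (correlation form: not flipped before use).
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def convolve_valid(image, kernel):
    """Slide a KxK kernel over the image; return signed sums for interior pixels only."""
    k = len(kernel)
    r = k // 2
    h, w = len(image), len(image[0])
    return [
        [
            sum(kernel[j][i] * image[y - r + j][x - r + i]
                for j in range(k) for i in range(k))
            for x in range(r, w - r)
        ]
        for y in range(r, h - r)
    ]

def sobel_magnitude(image):
    """Combine two convolution passes into an 8-bit edge magnitude (|Gx| + |Gy|, clamped)."""
    gx = convolve_valid(image, SOBEL_X)  # would run on one convolution unit
    gy = convolve_valid(image, SOBEL_Y)  # would run on a second convolution unit
    return [
        [min(255, abs(a) + abs(b)) for a, b in zip(row_x, row_y)]
        for row_x, row_y in zip(gx, gy)
    ]

# Vertical step edge: three dark columns (10) next to three bright columns (200).
step = [[10] * 3 + [200] * 3 for _ in range(5)]
print(sobel_magnitude(step))  # [[0, 255, 255, 0], [0, 255, 255, 0], [0, 255, 255, 0]]
```

Because the magnitude takes absolute values, the sign convention of the gradient kernels (flipped or not) does not change the result.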

Optimizations in the hardware description keep this accelerator from becoming a bottleneck. Thanks to data parallelism, up to eight pixels fit in a single bus transfer.
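To illustrate what fitting eight pixels into one bus transfer means for 8-bit mono data, the sketch below packs eight pixels into a single 64-bit word and unpacks them again. The 64-bit width follows from 8 pixels x 8 bits; the byte order (first pixel in the least significant byte) is an assumption, and the helper names are hypothetical.

```python
PIXELS_PER_BEAT = 8  # up to eight 8-bit pixels per bus transfer
BITS_PER_PIXEL = 8

def pack_pixels(pixels):
    """Pack eight 8-bit pixels into one 64-bit word (pixel 0 in the lowest byte; assumed order)."""
    if len(pixels) != PIXELS_PER_BEAT:
        raise ValueError(f"expected {PIXELS_PER_BEAT} pixels, got {len(pixels)}")
    word = 0
    for i, p in enumerate(pixels):
        if not 0 <= p <= 0xFF:
            raise ValueError(f"pixel {p} does not fit in 8 bits")
        word |= p << (BITS_PER_PIXEL * i)
    return word

def unpack_pixels(word):
    """Inverse of pack_pixels: split a 64-bit word back into eight 8-bit pixels."""
    return [(word >> (BITS_PER_PIXEL * i)) & 0xFF for i in range(PIXELS_PER_BEAT)]

word = pack_pixels([1, 2, 3, 4, 5, 6, 7, 8])
print(hex(word))  # 0x807060504030201
```

With this packing, a 3840-pixel line needs 3840 / 8 = 480 transfers instead of 3840.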

Our initial goal of running this accelerator at 30 fps at 4K was met. This little monster reaches a peak throughput of 60 fps at 4K, limited only by the FPGA clock and the PCIe bandwidth.

Convolution I/O properties

The input/output images have limitations in format and size, which are indicated below:

Property     Input image          Output image
Min width    8 px                 8 px
Max width    4096 px              4096 px
Min height   8 px                 8 px
Max height   2160 px              2160 px
Formats      8-bit Gray (Mono)    8-bit Gray (Mono)
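A host application can check frames against these limits before handing them to the accelerator. The sketch below encodes the table as constants; the function name and the use of GStreamer's GRAY8 format string for 8-bit mono are assumptions made for illustration.

```python
# Limits from the I/O table (identical for input and output images).
MIN_WIDTH, MAX_WIDTH = 8, 4096
MIN_HEIGHT, MAX_HEIGHT = 8, 2160
SUPPORTED_FORMATS = {"GRAY8"}  # GStreamer name for 8-bit mono; assumed mapping

def check_frame(width, height, fmt="GRAY8"):
    """Return a list of problems; an empty list means the frame fits the limits."""
    problems = []
    if not MIN_WIDTH <= width <= MAX_WIDTH:
        problems.append(f"width {width} not in [{MIN_WIDTH}, {MAX_WIDTH}]")
    if not MIN_HEIGHT <= height <= MAX_HEIGHT:
        problems.append(f"height {height} not in [{MIN_HEIGHT}, {MAX_HEIGHT}]")
    if fmt not in SUPPORTED_FORMATS:
        problems.append(f"format {fmt} not supported (8-bit mono only)")
    return problems

print(check_frame(3840, 2160))         # []
print(check_frame(4096, 2304, "RGB"))  # reports the height and the format
```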

Convolution in action

Currently, the convolution example is fixed to a 3x3 Gaussian kernel, whose coefficients are represented in 16-bit fixed point (Q0.16):

0.0625 0.125 0.0625
0.125 0.25 0.125
0.0625 0.125 0.0625

In a Gaussian blur, the edges will degrade, losing their sharpness.
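The Q0.16 representation of the kernel above can be checked with a small software model. Q0.16 has no integer bits and 16 fractional bits, so a coefficient c is stored as round(c x 2^16): 0.0625 becomes 4096, 0.125 becomes 8192, and 0.25 becomes 16384. The sketch below quantizes the kernel and applies it with integer arithmetic, shifting the accumulator right by 16 bits with round-half-up. The rounding mode, the border handling (interior pixels only), and the function names are assumptions, not a description of the accelerator's internals.

```python
FRAC_BITS = 16  # Q0.16: 0 integer bits, 16 fractional bits
ONE = 1 << FRAC_BITS

GAUSSIAN_3X3 = [
    [0.0625, 0.125, 0.0625],
    [0.125, 0.25, 0.125],
    [0.0625, 0.125, 0.0625],
]

def to_q0_16(value):
    """Quantize a coefficient in [0, 1) to an unsigned Q0.16 integer."""
    q = round(value * ONE)
    if not 0 <= q < ONE:
        raise ValueError(f"{value} is not representable in unsigned Q0.16")
    return q

def convolve_q0_16(image, kernel_q):
    """Apply a KxK Q0.16 kernel to an 8-bit image; only interior pixels are produced."""
    k = len(kernel_q)
    r = k // 2
    h, w = len(image), len(image[0])
    out = []
    for y in range(r, h - r):
        row = []
        for x in range(r, w - r):
            acc = sum(kernel_q[j][i] * image[y - r + j][x - r + i]
                      for j in range(k) for i in range(k))
            # Round half up, drop the 16 fractional bits, clamp to 8 bits.
            row.append(min(255, (acc + ONE // 2) >> FRAC_BITS))
        out.append(row)
    return out

GAUSSIAN_Q = [[to_q0_16(c) for c in row] for row in GAUSSIAN_3X3]
print(GAUSSIAN_Q)  # [[4096, 8192, 4096], [8192, 16384, 8192], [4096, 8192, 4096]]

impulse = [[0, 0, 0], [0, 255, 0], [0, 0, 0]]
print(convolve_q0_16(impulse, GAUSSIAN_Q))  # [[64]]
```

The quantized coefficients sum to 65536, that is, exactly 1.0, so a flat input passes through unchanged. The value 1.0 itself does not fit in one unsigned Q0.16 word; only the accumulator needs the extra bits.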

Another example is an edge-detection kernel, described by:

-1 -1 -1
-1 8 -1
-1 -1 -1

Because the coefficients of this kernel sum to zero, uniform regions map to zero in the output, and only intensity transitions (edges) produce a strong response.
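The zero-sum behavior of the edge-detection kernel above can be checked on a synthetic vertical step edge. The coefficient 8 lies outside the range of Q0.16, so this sketch uses plain signed integers. Clamping negative results to 0 and large results to 255 is an assumption about how the output is mapped back to 8 bits (taking the absolute value is another common choice), and the function name is hypothetical.

```python
EDGE_3X3 = [
    [-1, -1, -1],
    [-1, 8, -1],
    [-1, -1, -1],
]

def convolve_int(image, kernel):
    """Apply a KxK integer kernel to interior pixels and clamp the result to 8 bits."""
    k = len(kernel)
    r = k // 2
    h, w = len(image), len(image[0])
    return [
        [
            max(0, min(255, sum(kernel[j][i] * image[y - r + j][x - r + i]
                                for j in range(k) for i in range(k))))
            for x in range(r, w - r)
        ]
        for y in range(r, h - r)
    ]

# Vertical step edge: three dark columns (10) next to three bright columns (200).
step = [[10] * 3 + [200] * 3 for _ in range(5)]
for row in convolve_int(step, EDGE_3X3):
    print(row)  # [0, 0, 255, 0] on every row
```

Flat areas on both sides produce 0. The raw responses at the two pixels bordering the edge are -570 and +570, so with clamping only the bright side of the edge survives.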

Current throughput

The convolution accelerator achieves the following throughput at several resolutions:

Maximum framerate using a 3x3 convolution at several standard resolutions on a PicoEVB

Resolution  Peak framerate, PicoEVB (fps)  Actual framerate, PicoEVB over PCIe (fps)
4K          60                             45.137
1080p       241                            159.7
720p        542                            300.2
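The peak column is consistent with a fixed pixel rate of roughly 500 Mpixel/s, which also gives a common unit for comparing the measured PCIe figures. The sketch below derives framerates from an assumed pixel rate and converts measured framerates back into pixel rates (equal to bytes per second for 8-bit mono). Both the 500 Mpixel/s figure and the reading of "4K" as 3840x2160 are inferences from the table, not published specifications; a 4096x2160 frame would give about 56 fps at the same rate. Eight pixels per clock at 62.5 MHz, for example, would yield this rate, but the actual clock frequency is not stated here.

```python
# Assumed sustained pixel rate, inferred from the peak-framerate column.
PEAK_PIXEL_RATE = 500_000_000  # pixels per second

RESOLUTIONS = {
    "4K": (3840, 2160),  # assumed UHD; the table only says "4K"
    "1080p": (1920, 1080),
    "720p": (1280, 720),
}

# Measured framerates over PCIe, copied from the table.
MEASURED_FPS = {"4K": 45.137, "1080p": 159.7, "720p": 300.2}

def peak_fps(width, height, pixel_rate=PEAK_PIXEL_RATE):
    """Frames per second if the pipeline sustains pixel_rate pixels per second."""
    return pixel_rate / (width * height)

def effective_pixel_rate(width, height, fps):
    """Pixels per second actually moved; for 8-bit mono this equals bytes per second."""
    return width * height * fps

for name, (w, h) in RESOLUTIONS.items():
    print(f"{name}: peak {int(peak_fps(w, h))} fps, "
          f"measured {effective_pixel_rate(w, h, MEASURED_FPS[name]) / 1e6:.1f} Mpixel/s")
```

Run as is, this prints peak values of 60, 241, and 542 fps, matching the table, and measured rates of about 374, 331, and 277 Mpixel/s, all below the inferred 500 Mpixel/s.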

These measurements use a 3x3 kernel. Larger kernels are possible without sacrificing speed, because pixel calculations are parallelized and each one takes a single clock cycle; the trade-off is that larger kernels consume more FPGA area. Similarly, a multichannel image with N channels can be fully convolved by splitting the channels and instantiating N convolution units. These units run truly in parallel inside the FPGA, so, depending on your application, the execution time matches that of a single-channel convolution.
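The channel-splitting approach can be modeled in software: deinterleave an N-channel image into N single-channel planes, filter each plane with the same kernel, and interleave the results. In the FPGA each plane would go to its own convolution unit and run concurrently; this Python sketch processes the planes one after another, so it shows the data flow only, not the timing. It uses the Gaussian kernel from the example above in integer form (each coefficient scaled by 16, followed by a 4-bit right shift with rounding), skips border pixels, and the helper names are hypothetical.

```python
# Gaussian kernel from the example above, scaled by 16 (0.0625 = 1/16, and so on).
GAUSS_INT = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]
SHIFT = 4  # divide by 16

def convolve_plane(plane, kernel=GAUSS_INT, shift=SHIFT):
    """Filter one 8-bit channel; only interior pixels are produced."""
    k = len(kernel)
    r = k // 2
    h, w = len(plane), len(plane[0])
    return [
        [
            (sum(kernel[j][i] * plane[y - r + j][x - r + i]
                 for j in range(k) for i in range(k)) + (1 << (shift - 1))) >> shift
            for x in range(r, w - r)
        ]
        for y in range(r, h - r)
    ]

def split_channels(image, n):
    """Deinterleave rows of n-tuples into n single-channel planes."""
    return [[[px[c] for px in row] for row in image] for c in range(n)]

def merge_channels(planes):
    """Interleave n planes back into rows of n-tuples."""
    return [list(zip(*rows)) for rows in zip(*planes)]

def convolve_multichannel(image, n):
    planes = split_channels(image, n)
    # One convolution unit per plane in hardware; sequential here.
    filtered = [convolve_plane(p) for p in planes]
    return merge_channels(filtered)

rgb = [[(10, 100, 200)] * 4 for _ in range(4)]
print(convolve_multichannel(rgb, 3))
# [[(10, 100, 200), (10, 100, 200)], [(10, 100, 200), (10, 100, 200)]]
```

Since no plane reads data from another, the units need no synchronization beyond sharing the frame timing, which is what lets them run side by side.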

Known issues

1. GStreamer autonegotiation: caps such as width, height, and format are not autonegotiated, so they must be specified explicitly in the pipeline.

2. Numerical precision: the kernel coefficients are represented in fixed point, which has lower resolution than floating point, so results may differ slightly from a floating-point implementation.
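The size of such differences can be estimated with a single-pixel model. The Gaussian coefficients shown earlier are powers of two and are exact in Q0.16, so this sketch uses a 3x3 box blur (every coefficient 1/9) to show both effects: 1/9 rounds to 7282/65536, so the nine coefficients sum to 65538 instead of 65536, and the output is rounded to an integer while a floating-point reference keeps the fraction. The box kernel, the round-half-up output rounding, and the function names are assumptions for illustration.

```python
FRAC_BITS = 16
ONE = 1 << FRAC_BITS

def quantize(c):
    """Round a coefficient to the nearest Q0.16 integer."""
    return round(c * ONE)

def fixed_point_pixel(window, coeffs):
    """One output pixel with Q0.16 coefficients, rounded half up to an integer."""
    acc = sum(quantize(c) * p for c, p in zip(coeffs, window))
    return (acc + ONE // 2) >> FRAC_BITS

def float_pixel(window, coeffs):
    """Floating-point reference for the same pixel, without rounding."""
    return sum(c * p for c, p in zip(coeffs, window))

BOX_3X3 = [1 / 9] * 9  # flattened 3x3 box blur

print(quantize(1 / 9), sum(quantize(c) for c in BOX_3X3))  # 7282 65538

window = [10, 20, 30, 40, 50, 60, 70, 80, 91]
print(fixed_point_pixel(window, BOX_3X3))      # 50
print(round(float_pixel(window, BOX_3X3), 3))  # 50.111
```

For 8-bit inputs the gap stays at about 0.51 at most: up to 0.5 from rounding the output, plus under 0.008 from the coefficient error.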


