FPGA Image Signal Processor - FPGA ISP Accelerators - Convolution
Introduction
The FPGA-ISP convolution is the professional version of the V4L2 FPGA convolution, developed as a templated module that can tune its resource usage to the needs of the application. Like any convolution accelerator, it receives a video frame from kernel space, applies a custom kernel, and returns the filtered frame.
With multiple convolution accelerators, it is possible to perform more complex operations, such as demosaicing, Sobel, DoG (Difference of Gaussians), LoG (Laplacian of Gaussian), and other spatial filters.
Optimizations in the hardware description keep this module from becoming a bottleneck: thanks to data parallelism, up to eight pixels fit in a single bus transfer.
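As a rough illustration of what fitting eight pixels in one bus transfer means, the sketch below packs eight 8-bit pixels into a single 64-bit word and unpacks them again. The 64-bit word width, the little-endian byte order, and the function names `pack_pixels` and `unpack_pixels` are assumptions made for this example; they are not taken from the accelerator's actual bus interface.

```python
def pack_pixels(pixels):
    """Pack exactly eight 8-bit pixels into one 64-bit word.

    Assumption: pixel 0 lands in the least significant byte (little-endian).
    """
    if len(pixels) != 8:
        raise ValueError("expected exactly eight pixels")
    # bytes() rejects values outside 0..255, which enforces the 8-bit range.
    return int.from_bytes(bytes(pixels), "little")


def unpack_pixels(word):
    """Recover the eight 8-bit pixels from a 64-bit word."""
    return list(word.to_bytes(8, "little"))


row = [10, 20, 30, 40, 50, 60, 70, 80]
word = pack_pixels(row)
print(hex(word), unpack_pixels(word))
```

With this layout, a row of W pixels needs W / 8 transfers instead of W, which is where the parallelism in the data path comes from.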
Our initial goal of running this accelerator at 30 fps @4K has been met. This little monster can reach a peak throughput of 60 fps @4K, limited only by the FPGA clock and the PCIe bandwidth.
Convolution I/O properties
The input/output images have limitations in format and size, which are indicated below:
Property | Input image | Output image |
---|---|---|
Min width | 8 | 8 |
Max width | 4096 | 4096 |
Min height | 8 | 8 |
Max height | 2160 | 2160 |
Formats | 8-bit Gray (Mono) | 8-bit Gray (Mono) |
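The limits in the table can be checked in software before a frame is sent to the accelerator. The following is a minimal sketch of such a check; the function name `frame_is_supported` and the format label `"GRAY8"` (used here to stand for 8-bit gray) are hypothetical and not part of the FPGA-ISP API.

```python
# Limits copied from the I/O properties table.
MIN_WIDTH, MAX_WIDTH = 8, 4096
MIN_HEIGHT, MAX_HEIGHT = 8, 2160
# Assumption: "GRAY8" is only a label chosen here for 8-bit gray (mono).
SUPPORTED_FORMATS = {"GRAY8"}


def frame_is_supported(width, height, pixel_format):
    """Return True if the frame geometry and format fit the documented limits."""
    return (
        MIN_WIDTH <= width <= MAX_WIDTH
        and MIN_HEIGHT <= height <= MAX_HEIGHT
        and pixel_format in SUPPORTED_FORMATS
    )


print(frame_is_supported(1920, 1080, "GRAY8"))
print(frame_is_supported(1920, 1080, "RGB"))
```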
Convolution in action
Currently, the convolution example is fixed to a 3x3 Gaussian kernel, represented in 16-bit Fixed-Point (Q0,16):
0.0625 | 0.125 | 0.0625 |
0.125 | 0.25 | 0.125 |
0.0625 | 0.125 | 0.0625 |
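To make the fixed-point representation concrete, the sketch below converts the Gaussian kernel above to 16 fractional bits and applies it to one 3x3 window using integer arithmetic only. It is a software model written for illustration: the helper names `to_fixed` and `convolve_window`, and the choice to truncate the result with a right shift, are assumptions and may not match what the hardware does.

```python
FRACTION_BITS = 16  # fractional bits of the Q0.16 coefficients

GAUSSIAN = [
    [0.0625, 0.125, 0.0625],
    [0.125, 0.25, 0.125],
    [0.0625, 0.125, 0.0625],
]


def to_fixed(kernel, fraction_bits=FRACTION_BITS):
    """Scale each coefficient by 2**fraction_bits and round to an integer."""
    scale = 1 << fraction_bits
    return [[round(c * scale) for c in row] for row in kernel]


def convolve_window(window, fixed_kernel, fraction_bits=FRACTION_BITS):
    """Multiply-accumulate one 3x3 window, then drop the fractional bits.

    Assumption: the result is truncated (right shift), not rounded.
    """
    acc = sum(
        window[r][c] * fixed_kernel[r][c] for r in range(3) for c in range(3)
    )
    return acc >> fraction_bits


FIXED_GAUSSIAN = to_fixed(GAUSSIAN)
flat_window = [[200] * 3 for _ in range(3)]
print(FIXED_GAUSSIAN)
print(convolve_window(flat_window, FIXED_GAUSSIAN))
```

All coefficients of this kernel are powers of two, so they are represented exactly (4096, 8192 and 16384) and add up to 65536, that is, exactly 1.0. A flat region therefore keeps its value after filtering.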
In a Gaussian blur, the edges will degrade, losing their sharpness.
Another example is an edge detection kernel, described by:
-1 | -1 | -1 |
-1 | 8 | -1 |
-1 | -1 | -1 |
With the Gaussian kernel used by the example, the resulting image is blurred, which corresponds to the expected convolution output.
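For comparison, the behaviour of the edge detection kernel can be sketched on two small windows: one taken from a flat region and one that straddles a vertical edge. Clamping the result to the 8-bit range is an assumption made for this example, and `edge_response` is a hypothetical helper name.

```python
EDGE_KERNEL = [
    [-1, -1, -1],
    [-1, 8, -1],
    [-1, -1, -1],
]


def edge_response(window, kernel=EDGE_KERNEL):
    """Apply the 3x3 kernel to one window and clamp the result to 0..255.

    Assumption: the output saturates to the 8-bit range.
    """
    acc = sum(window[r][c] * kernel[r][c] for r in range(3) for c in range(3))
    return max(0, min(255, acc))


flat = [[100, 100, 100] for _ in range(3)]  # uniform region
step = [[0, 100, 100] for _ in range(3)]    # dark-to-bright vertical edge

print(edge_response(flat))
print(edge_response(step))
```

The coefficients add up to zero, so a uniform region produces no response, while the window on the edge produces a strong response that saturates.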
Current throughput
The convolution accelerator has the following throughput in several resolutions:
Resolution | Peak framerate PicoEVB (fps) | Actual framerate PicoEVB - PCIe (fps) |
---|---|---|
4K | 60 | 45.137 |
1080p | 241 | 159.7 |
720p | 542 | 300.2 |
These measurements use a 3x3 kernel. Larger kernels are possible without sacrificing speed, because all the products for an output pixel are computed in parallel and the calculation takes one clock cycle. The trade-off is that larger kernels consume more FPGA area. Furthermore, a multichannel image (N channels) can be convolved by splitting the channels and instantiating N convolution units. Inside the FPGA, the units run in parallel, so the execution time can match that of a mono-channel convolution, depending on your application.
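The peak framerates in the table are consistent with a simple model: pixels per second equals clock frequency times pixels per cycle. The sketch below works through that arithmetic. The 62.5 MHz clock and the frame sizes (3840x2160, 1920x1080, 1280x720) are assumptions chosen because they reproduce the peak column; neither is stated in this page.

```python
PIXELS_PER_CYCLE = 8    # eight 8-bit pixels per bus transfer
CLOCK_HZ = 62_500_000   # assumption: not stated in this page

# Assumption: frame sizes behind the labels used in the table.
RESOLUTIONS = {
    "4K": (3840, 2160),
    "1080p": (1920, 1080),
    "720p": (1280, 720),
}


def peak_fps(width, height, clock_hz=CLOCK_HZ, pixels_per_cycle=PIXELS_PER_CYCLE):
    """Upper bound on frames per second if pixels stream in every cycle."""
    return clock_hz * pixels_per_cycle / (width * height)


for name, (width, height) in RESOLUTIONS.items():
    print(f"{name}: {peak_fps(width, height):.1f} fps")
```

Under these assumptions the model gives roughly 60, 241 and 542 fps. The measured figures are lower, which fits the statement that the PCIe link limits the actual framerate.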
Known issues
1. GStreamer autonegotiation: Caps are not negotiated automatically, so the width, height, and format must be specified explicitly in the pipeline.
2. Numerical precision: Fixed-point arithmetic has lower precision than floating point, so results might differ slightly from a floating-point reference.
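The precision issue can be reproduced in software with a kernel whose coefficients are not exactly representable with 16 fractional bits. The sketch below uses a hypothetical 3x3 box filter (all coefficients equal to 1/9), which is not one of the kernels shipped with the example, and assumes that both the coefficient and the final result are truncated.

```python
FRACTION_BITS = 16
BOX_COEFF = 1 / 9  # hypothetical box-filter coefficient, not exact in binary


def box_fixed(window):
    """3x3 box filter with a truncated fixed-point coefficient.

    Assumption: the coefficient and the final result are both truncated.
    """
    coeff = int(BOX_COEFF * (1 << FRACTION_BITS))  # 7281 instead of 7281.78
    acc = sum(pixel * coeff for row in window for pixel in row)
    return acc >> FRACTION_BITS


def box_float(window):
    """Floating-point reference, rounded to the nearest integer."""
    return round(sum(pixel * BOX_COEFF for row in window for pixel in row))


white = [[255] * 3 for _ in range(3)]
print(box_fixed(white), box_float(white))
```

For a fully white window, the fixed-point version returns 254 while the floating-point reference returns 255. One-level differences of this kind are what to expect when comparing the accelerator output against a floating-point implementation.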