FPGA Image Signal Processor - FPGA ISP Accelerators - Fast Fourier Transform 1D
FPGA Image Signal Processor |
---|
Introduction |
FPGA ISP Accelerators/Modules |
Getting the Code |
Examples |
GStreamer Pipelines |
Tested Platforms |
Contact Us |
Introduction
The Fast Fourier Transform is widely used for image analysis and processing. Its applications go from data filtering, pattern recognition, de-noising, and many others. It is also useful for processing data that are not images. With the FPGA ISP FFT, you can use it in several applications, for accelerating the applications on images mentioned above or for scientific computing, such as fast derivative and integral computations.
RidgeRun ships this FFT as a module for your projects, with the flexibility of extending its scope for single dimension and multi-dimension problems, being possible to run several FFT units at the same time, parallelizing at the data level. With this feature, you can process the FFT over an entire image in several rows at the same time, finishing synchronously.
We also exploited the capability of HLS of using C++11, allowing us to change the module size according to your problem, consuming the resources really needed by your problem. Nevertheless, the FPGA ISP module also works on several data types, since it is also templated in our module. You can easily switch from floating-point numbers to fixed-point numbers by just changing a template parameter.
Algorithm
We have found that the best approach for minimizing the area footprint without sacrificing the FFT throughput is using the Cooley-Tukey[1] algorithm with Decimation in Frequency (DIF). The foundations of the algorithm are based on the following illustration:
Based on the picture shown above, the process starts by reading the inputs in the time-space according to the output's indices but bit reverted.
Our FPGA ISP receives the input array, either in real numbers or complex numbers, load the data in a bit reverted fashion into the buffers and start doing the FFT in place. The process finishes with copying back the data from the buffers to achieve maximum throughput by fast data access (multi-port).
Applications
The FFT algorithm is widely used in many fields, from image processing to scientific computing. Let's summarize some of them.
Image processing:
1. Filtering: by cutting the high frequencies, it is possible to reduce the noise within an image. Also, you can apply ideal filters by using step-like low/high-pass filters.
2. Pattern search: there are object trackers that base their foundations on the coupled-filters, such as the Kernelized Correlated Filter (KCF)[2] and Minimum Output Sum of Squared Error (MOSSE)[3]
Scientific computing:
1. Function interpolation: finding integrals and derivatives in molecular dynamics (MD): [4]
2. Spectroscopy: finding patterns in the frequency
3. Differential equation solvers.
Properties
The FPGA ISP FFT is currently a uni-dimensional module (1D). Nevertheless, it can easily scale to 2D by assigning rows to several FFT instances, which can run in parallel if the memory access makes it possible.
Thanks to the templated development, it is possible to have almost any data type possible compatible with the commonly used C++11 types. Also, it allows us to have fixed size FFT, whose size is given by a template parameter as well.
Data types
For real FFTs:
- Fixed-Point (
ap_fixed<W,I>
) - Single Precision Floating-Point (
float
) - Double Precision Floating-Point (
float
)
For complex FFTs:
- Complex Fixed-Point (
std::complex<ap_fixed<W,I> >
) - Complex SP Floating-Point (
std::complex<float>
) - Complex DP Floating-Point (
std::complex<double>
)
The IP Core area footprint might vary according to the chosen data type.
FFT sizes
It is adjustable by a template argument. The maximum size depends on the input data (stability) and the data type used (overflow risks). At the moment, the FFT support the following sizes:
- Any size power of two () which match the conditions mentioned above.
FFT API
The FPGA ISP FFT currently supports general real and complex data. The following functions are available from the API:
/* Multipurpose FFT */ rr::isp::fft_1d_complex<SIZE, DATATYPE>(std::complex<DATATYPE> * input, std::complex<DATATYPE> * output, forward); /* Real FFT */ rr::isp::fft_1d_real<SIZE, DATATYPE>(DATATYPE * input, std::complex<DATATYPE> * output); rr::isp::ifft_1d_real<SIZE, DATATYPE>(std::complex<DATATYPE> * input, DATATYPE * output);
The API for complex numbers (line 2) combines the FFT forward and backward by using a boolean called forward
, where a true
value means a conversion from time to frequency, and from frequency to time otherwise. This will allow us to recycle the unit without creating extra instances.
The API for real numbers has been split into two functions (lines 5, 6), which are intended for special cases where a unidirectional usage is required.
IP Core Characteristics
FPGA ISP is very flexible and low-weight compared to other FFTs, such as the HLS FFT.
For a :
FFT Variant | Latency (Speed) | Footprint |
---|---|---|
Free HLS Library (Xilinx) | 6369 (4.35x) | 25% LUT |
FPGA ISP FFT | 27694 (1x) | 9% DSP |
FPGA ISP FFT (boost) | 13847 (2x) | 9% DSP |
Conditions:
- FPGA: xc7a50 (PicoEVB)
- Data type:
ap_fixed<16,1>
, complex
Note: The results may vary depending on the type, size, and architecture.
Peak performance
The peak performance can be approximated computed to
This considers the boost effect.
Boost
The boost is really a mathematical trick. It is possible to compute two real FFTs in a wrapped complex FFT:
where and are real functions.
Consumption
The consumption may vary depending on the size of the FFT and the data type. The following table is based on the data type ap_fixed<16,1>
and the complex FFT.
FFT Size | BRAM | DSP | FF | LUT |
---|---|---|---|---|
64 | 4 | 11 | 1024 | 2247 |
128 | 8 | 11 | 1002 | 2190 |
256 | 8 | 11 | 1023 | 2198 |
512 | 8 | 11 | 1044 | 2206 |
1024 | 8 | 11 | 1065 | 2216 |
Known issues
1. Numerical precision: The throughput and footprint results may differ when using different data types. Also, depending on the input values, the maximum number of inputs can be shortened because of the FFT algorithm instability (overflows). Consider the input datatype as a knob to fix possible numerical issues.
References
- ↑ Cooley, James W.; Tukey, John W. (1965). "An algorithm for the machine calculation of complex Fourier series". Math. Comput. 19 (90): 297–301. doi:10.2307/2003354.
- ↑ Henriques, Caseiro, Martins, Batista. (2015)."High-Speed Tracking with kernelized Correlation Filters". http://www.robots.ox.ac.uk/~joao/publications/henriques_tpami2015.pdf
- ↑ Bolme, Beveridge, Draper, Lui. (2009). "Visual Object Tracking using Adaptive Correlation Filters". https://www.cs.colostate.edu/~draper/papers/bolme_cvpr10.pdf
- ↑ Wikipedia. Difussion Equation. https://en.wikipedia.org/wiki/Diffusion_equation