FPGA Image Signal Processor - FPGA ISP Accelerators - Fast Fourier Transform 1D

Introduction

The Fast Fourier Transform is widely used for image analysis and processing. Its applications go from data filtering, pattern recognition, de-noising, and many others. It is also useful for processing data that are not images. With the FPGA ISP FFT, you can use it in several applications, for accelerating the applications on images mentioned above or for scientific computing, such as fast derivative and integral computations.

RidgeRun ships this FFT as a module for your projects, with the flexibility of extending its scope for single dimension and multi-dimension problems, being possible to run several FFT units at the same time, parallelizing at the data level. With this feature, you can process the FFT over an entire image in several rows at the same time, finishing synchronously.

We also exploited the capability of HLS of using C++11, allowing us to change the module size according to your problem, consuming the resources really needed by your problem. Nevertheless, the FPGA ISP module also works on several data types, since it is also templated in our module. You can easily switch from floating-point numbers to fixed-point numbers by just changing a template parameter.

Algorithm

We have found that the best approach for minimizing the area footprint without sacrificing the FFT throughput is using the Cooley-Tukey^[1] algorithm with Decimation in Frequency (DIF). The foundations of the algorithm are based on the following illustration:

FFT Radix-2 Cooley-Tukey with Decimation in Frequency (DIF)

Based on the picture shown above, the process starts by reading the inputs in the time-space according to the output's indices but bit reverted.

Our FPGA ISP receives the input array, either in real numbers or complex numbers, load the data in a bit reverted fashion into the buffers and start doing the FFT in place. The process finishes with copying back the data from the buffers to achieve maximum throughput by fast data access (multi-port).

Applications

The FFT algorithm is widely used in many fields, from image processing to scientific computing. Let's summarize some of them.

Image processing:

1. Filtering: by cutting the high frequencies, it is possible to reduce the noise within an image. Also, you can apply ideal filters by using step-like low/high-pass filters.

2. Pattern search: there are object trackers that base their foundations on the coupled-filters, such as the Kernelized Correlated Filter (KCF)^[2] and Minimum Output Sum of Squared Error (MOSSE)^[3]

Scientific computing:

1. Function interpolation: finding integrals and derivatives in molecular dynamics (MD): ${\frac {\partial \phi (\mathbf {r} ,t)}{\partial t}}=\nabla \cdot {\big [}D(\phi ,\mathbf {r} )\ \nabla \phi (\mathbf {r} ,t){\big ]}$ ^[4]

2. Spectroscopy: finding patterns in the frequency

3. Differential equation solvers.

Properties

The FPGA ISP FFT is currently a uni-dimensional module (1D). Nevertheless, it can easily scale to 2D by assigning rows to several FFT instances, which can run in parallel if the memory access makes it possible.

Computing a 2D FFT by reusing the same units concurrently

Thanks to the templated development, it is possible to have almost any data type possible compatible with the commonly used C++11 types. Also, it allows us to have fixed size FFT, whose size is given by a template parameter as well.

Data types

For real FFTs:

Fixed-Point (ap_fixed<W,I>)
Single Precision Floating-Point (float)
Double Precision Floating-Point (float)

For complex FFTs:

Complex Fixed-Point (std::complex<ap_fixed<W,I> >)
Complex SP Floating-Point (std::complex<float>)
Complex DP Floating-Point (std::complex<double>)

The IP Core area footprint might vary according to the chosen data type.

FFT sizes

It is adjustable by a template argument. The maximum size depends on the input data (stability) and the data type used (overflow risks). At the moment, the FFT support the following sizes:

Any size power of two ( $N=2^{B}$ ) which match the conditions mentioned above.

FFT API

The FPGA ISP FFT currently supports general real and complex data. The following functions are available from the API:

/* Multipurpose FFT */
rr::isp::fft_1d_complex<SIZE, DATATYPE>(std::complex<DATATYPE> * input, std::complex<DATATYPE> * output, forward);

/* Real FFT */
rr::isp::fft_1d_real<SIZE, DATATYPE>(DATATYPE * input, std::complex<DATATYPE> * output);
rr::isp::ifft_1d_real<SIZE, DATATYPE>(std::complex<DATATYPE> * input, DATATYPE * output);

The API for complex numbers (line 2) combines the FFT forward and backward by using a boolean called forward, where a true value means a conversion from time to frequency, and from frequency to time otherwise. This will allow us to recycle the unit without creating extra instances.

The API for real numbers has been split into two functions (lines 5, 6), which are intended for special cases where a unidirectional usage is required.

IP Core Characteristics

FPGA ISP is very flexible and low-weight compared to other FFTs, such as the HLS FFT.

For a $N=1024$ :

Comparison of FPGA ISP FFT to the Xilinx Free FFT included in HLS
FFT Variant	Latency (Speed)	Footprint
Free HLS Library (Xilinx)	6369 (4.35x)	25% LUT
FPGA ISP FFT	27694 (1x)	9% DSP
FPGA ISP FFT (boost)	13847 (2x)	9% DSP

Conditions:

FPGA: xc7a50 (PicoEVB)
Data type: ap_fixed<16,1>, complex

Note: The results may vary depending on the type, size, and architecture.

Peak performance

The peak performance can be approximated computed to

$T=1.35\times N\log _{2}(N)$

This considers the boost effect.

Boost

The boost is really a mathematical trick. It is possible to compute two real FFTs in a wrapped complex FFT:

$a(x)=f(x)+jg(x)$

where $f$ and $g$ are real functions.

$A(\omega )=F(\omega )+jG(\omega )$

$A(-\omega )=F(-\omega )+jG(-\omega )$

$A(-\omega )^{*}=F(\omega )+jG(\omega )$

$F(\omega )={\frac {1}{2}}(A(\omega )+A(-\omega )^{*})$

$G(\omega )={\frac {1}{j2}}(A(\omega )+A(-\omega )^{*})$

Consumption

The consumption may vary depending on the size of the FFT and the data type. The following table is based on the data type ap_fixed<16,1> and the complex FFT.

Consumption examples
FFT Size	BRAM	DSP	FF	LUT
64	4	11	1024	2247
128	8	11	1002	2190
256	8	11	1023	2198
512	8	11	1044	2206
1024	8	11	1065	2216

Known issues

1. Numerical precision: The throughput and footprint results may differ when using different data types. Also, depending on the input values, the maximum number of inputs can be shortened because of the FFT algorithm instability (overflows). Consider the input datatype as a knob to fix possible numerical issues.

References

↑ Cooley, James W.; Tukey, John W. (1965). "An algorithm for the machine calculation of complex Fourier series". Math. Comput. 19 (90): 297–301. doi:10.2307/2003354.
↑ Henriques, Caseiro, Martins, Batista. (2015)."High-Speed Tracking with kernelized Correlation Filters". http://www.robots.ox.ac.uk/~joao/publications/henriques_tpami2015.pdf
↑ Bolme, Beveridge, Draper, Lui. (2009). "Visual Object Tracking using Adaptive Correlation Filters". https://www.cs.colostate.edu/~draper/papers/bolme_cvpr10.pdf
↑ Wikipedia. Difussion Equation. https://en.wikipedia.org/wiki/Diffusion_equation

Previous: Modules/Convolution

Index

Next: Getting_the_Code

❯

[1] Cooley, James W.; Tukey, John W. (1965). "An algorithm for the machine calculation of complex Fourier series". Math. Comput. 19 (90): 297–301. doi:10.2307/2003354.

[2] Henriques, Caseiro, Martins, Batista. (2015)."High-Speed Tracking with kernelized Correlation Filters". http://www.robots.ox.ac.uk/~joao/publications/henriques_tpami2015.pdf

[3] Bolme, Beveridge, Draper, Lui. (2009). "Visual Object Tracking using Adaptive Correlation Filters". https://www.cs.colostate.edu/~draper/papers/bolme_cvpr10.pdf

[4] Wikipedia. Difussion Equation. https://en.wikipedia.org/wiki/Diffusion_equation

[1]

[2]

[3]

[4]