Pruning for AI on FPGA


Introduction

Pruning is a model optimisation technique that removes weights that do not significantly impact the inference result. It reduces the storage and computation required for inference tasks. However, exploiting the resulting sparsity requires specialised data structures and hardware.
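For illustration, a minimal magnitude-pruning sketch in Python (the weight values and the threshold are assumptions made up for the example):

def prune_by_magnitude(weights, threshold):
    """Zero out weights whose absolute value falls below the threshold."""
    return [[w if abs(w) >= threshold else 0.0 for w in row]
            for row in weights]

# Hypothetical 3x3 weight matrix; the small weights are pruned away
weights = [[0.01, 2.0, -0.03],
           [1.0, 0.002, 0.0],
           [-0.04, 0.0, 5.0]]
pruned = prune_by_magnitude(weights, threshold=0.1)
# pruned == [[0.0, 2.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 5.0]]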

This wiki explains sparse data representations, acceleration considerations, and how an FPGA can be used to accelerate sparse operations.

Data Representations

Sparse matrix representations are used for matrix computations on pruned models: rather than storing every element, they encode only the non-zero values together with their row and column positions. However, there are multiple ways to encode the rows, columns and values:

Coordinate (COO)

The coordinate (COO) format encodes a dense matrix as three arrays:

  • Row indices
  • Column indices
  • Values

The three arrays have the same length and are accessed with a shared index: row_indices[i], column_indices[i] and values[i] together describe the i-th non-zero element.
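As a sketch of how the encoding works (plain Python, with illustrative names), a dense matrix can be converted to COO by recording the coordinates of every non-zero entry:

def dense_to_coo(dense):
    """Encode a dense matrix (list of lists) as three COO arrays."""
    row_indices, column_indices, values = [], [], []
    for r, row in enumerate(dense):
        for c, value in enumerate(row):
            if value != 0:  # only non-zero entries are stored
                row_indices.append(r)
                column_indices.append(c)
                values.append(value)
    return row_indices, column_indices, values

dense = [[0, 2, 0],
         [1, 0, 0],
         [0, 0, 5]]
rows, cols, vals = dense_to_coo(dense)
# rows == [0, 1, 2], cols == [1, 0, 2], vals == [2, 1, 5]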

Figure: COO Addressing

To recover the position of the i-th non-zero element in the flattened (row-major) dense matrix, where leading_dimension is the number of columns of the dense matrix, the following pseudocode can be used:

index_1d = row_indices[i] * leading_dimension + column_indices[i]
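For instance, a minimal sketch that applies this addressing to decode the COO arrays from the previous example back into a flat, row-major dense matrix (the helper name is illustrative):

def coo_to_dense(row_indices, column_indices, values, num_rows, num_cols):
    """Decode COO arrays into a dense matrix stored as a flat row-major list."""
    leading_dimension = num_cols  # row-major: stride between consecutive rows
    flat = [0] * (num_rows * num_cols)
    for i in range(len(values)):
        index_1d = row_indices[i] * leading_dimension + column_indices[i]
        flat[index_1d] = values[i]
    return flat

flat = coo_to_dense(rows, cols, vals, num_rows=3, num_cols=3)
# flat == [0, 2, 0, 1, 0, 0, 0, 0, 5]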


RidgeRun Services

RidgeRun has expertise in offloading processing algorithms to FPGAs, from Image Signal Processing to AI acceleration. Our services include:

  • Algorithm Acceleration using FPGAs.
  • Image Signal Processing IP Cores.
  • Linux Device Drivers.
  • Low Power AI Acceleration using FPGAs.
  • Accelerated C++ Applications.

And much more. Contact us at https://www.ridgerun.com/contact.


