J Imaging. 2019 Jan 13;5(1):16.
doi: 10.3390/jimaging5010016.

FPGA-Based Processor Acceleration for Image Processing Applications

Fahad Siddiqui et al. J Imaging. 2019.

Abstract

FPGA-based embedded image processing systems offer considerable computing resources but present programming challenges when compared to software systems. The paper describes an approach based on an FPGA-based soft processor called the Image Processing Processor (IPPro), which can operate at up to 337 MHz on a high-end Xilinx FPGA family, and gives details of the dataflow-based programming environment. The approach is demonstrated for a k-means clustering operation and a traffic sign recognition application, both of which have been prototyped on an Avnet Zedboard featuring a Xilinx Zynq-7000 system-on-chip (SoC). A number of parallel dataflow mapping options were explored, giving a speed-up of 8 times for the k-means clustering using 16 IPPro cores and a speed-up of 9.6 times for the morphology filter operation of the traffic sign recognition using 16 IPPro cores, both compared to their equivalent ARM-based software implementations. We show that, for k-means clustering, the 16 IPPro cores implementation is 57, 28 and 1.7 times more power efficient (fps/W) than an ARM Cortex-A7 CPU, an NVIDIA GeForce GTX980 GPU and an ARM Mali-T628 embedded GPU, respectively.
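
As a rough software illustration of the data-parallel k-means mapping mentioned in the abstract, the sketch below splits the pixel stream into equal chunks, one per core, and assigns each pixel to its nearest centroid. It is not the authors' IPPro assembly or RVC-CAL code: the function names, the use of grayscale pixels, and the absolute-difference distance are assumptions made for illustration, and the "cores" are simply processed one after another.

```python
# Illustrative sketch only: not the authors' IPPro or RVC-CAL implementation.
# Assumptions: grayscale integer pixels, absolute-difference distance,
# and hypothetical function names.
from typing import List, Sequence


def nearest_centroid(pixel: int, centroids: Sequence[int]) -> int:
    """Return the index of the centroid closest to the pixel value."""
    best_index = 0
    best_distance = abs(pixel - centroids[0])
    for index in range(1, len(centroids)):
        distance = abs(pixel - centroids[index])
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


def assign_labels_data_parallel(
    pixels: Sequence[int], centroids: Sequence[int], num_cores: int
) -> List[int]:
    """Label every pixel, splitting the work into num_cores contiguous chunks.

    The chunks are processed sequentially here; in a hardware mapping each
    chunk would be streamed to its own core.
    """
    chunk = (len(pixels) + num_cores - 1) // num_cores  # ceiling division
    labels: List[int] = []
    for core in range(num_cores):
        part = pixels[core * chunk:(core + 1) * chunk]
        labels.extend(nearest_centroid(p, centroids) for p in part)
    return labels


def update_centroids(
    pixels: Sequence[int], labels: Sequence[int], centroids: Sequence[int]
) -> List[int]:
    """Recompute each centroid as the integer mean of its assigned pixels."""
    sums = [0] * len(centroids)
    counts = [0] * len(centroids)
    for pixel, label in zip(pixels, labels):
        sums[label] += pixel
        counts[label] += 1
    # An empty cluster keeps its previous centroid.
    return [
        sums[k] // counts[k] if counts[k] else centroids[k]
        for k in range(len(centroids))
    ]
```

Because each pixel's label depends only on that pixel and the current centroids, the assignment step splits cleanly across cores; only the centroid update needs the partial results to be combined.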

Keywords: FPGA; hardware acceleration; heterogeneous computing; image processing; processor architectures.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Bandwidth/memory distribution in a Xilinx Virtex-7 FPGA, highlighting how bandwidth and computation improve closer to the datapath parts of the FPGA.
Figure 2
Illustration of possible data and task parallel decomposition of a dataflow algorithm found in image processing designs, where the number of rows indicates the level of parallelism.
Figure 3
A brief description of the design flow of a hardware and software heterogeneous system highlighting key features. More detail of the flow is contained in reference [11].
Figure 4
(a) Impact of DSP48E1 configurations on the maximum achievable clock frequency across different speed grades of Kintex-7 FPGAs: fully pipelined without (NOPATDET) and with (PATDET) the PATtern DETector; multiply with no MREG, without (MULT_NOMREG) and with (MULT_NOMREG_PATDET) the pattern detector; and multiply with pre-adder and no ADREG (PREADD_MULT_NOADREG). (b) Impact of BRAM configurations on the maximum achievable clock frequency of Artix-7, Kintex-7 and Virtex-7 FPGAs for single and true-dual port RAM configurations.
Figure 5
A range of dataflow models taken from [24,25]. (a) DFG node without internal storage called configuration ①; (b) DFG actor without internal storage t1 and constant i called configuration ②; (c) Programmable DFG actor with internal storage t1, t2 and t3 and constants i and j called configuration ③.
Figure 6
FPGA datapath models resulting from Figure 5. (a) Programmable ALU corresponding to configuration ①; (b) Fine-grained processor corresponding to configuration ②; (c) Coarse-grained processor corresponding to configuration ③.
Figure 7
Impact of the various datapath models ①, ②, ③ on fmax across Xilinx Artix-7, Kintex-7 and Virtex-7 FPGA families.
Figure 8
Block diagram of FPGA-based soft core Image Processing Processor (IPPro) datapath highlighting where relevant the fixed Xilinx FPGA resources utilised by the approach.
Figure 9
System architecture of IPPro-based hardware acceleration highlighting data distribution and control infrastructure, FIFO configuration and Finite-State-Machine control.
Figure 10
High-level implementation of k-means clustering algorithm: (a) Graphical view of Orcc dataflow network; (b) Part of dataflow network including the connections; (c) Part of Distance.cal file showing distance calculation in RVC-CAL where two pixels are received through an input FIFO channel, processed and sent to an output FIFO channel; (d) Compiled IPPro assembly code of Distance.cal.
Figure 11
IPPro-based hardware accelerator designs to explore and analyse the impact of parallelism on area and performance, based on a single-core IPPro ①, an eight-way parallel SIMD IPPro ②, a parallel dual-core IPPro ③ and a combined dual-core eight-way SIMD IPPro ④.
Figure 12
Section execution times and ratios for each stage of the traffic sign recognition algorithm.
Figure 13
(a) The simplified IPPro assembly code of a 3 × 3 dilation operation. (b) The output result of the implemented design.
Figure 14
Stage-wise comparison of traffic sign recognition acceleration using the ARM-based and IPPro-based approaches.
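
The 3 × 3 dilation named in the Figure 13 caption (the morphology filter stage of the traffic sign recognition) can be sketched in software as below. This is a generic textbook formulation, not the IPPro assembly shown in the figure: the function name, the list-of-lists image layout, and the choice to ignore out-of-bounds neighbours at the border are assumptions.

```python
# Illustrative sketch only: a generic 3 x 3 dilation, not the IPPro code.
# Assumptions: image is a list of equal-length rows of integers, and
# neighbours outside the image are ignored (hypothetical border policy).
from typing import List, Sequence


def dilate_3x3(image: Sequence[Sequence[int]]) -> List[List[int]]:
    """Replace each pixel with the maximum over its 3 x 3 neighbourhood."""
    height = len(image)
    width = len(image[0]) if height else 0
    output = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            value = image[row][col]
            for d_row in (-1, 0, 1):
                for d_col in (-1, 0, 1):
                    r, c = row + d_row, col + d_col
                    if 0 <= r < height and 0 <= c < width:
                        value = max(value, image[r][c])
            output[row][col] = value
    return output
```

For a binary image, a single set pixel grows into a 3 × 3 block of set pixels. Each output pixel depends only on a fixed window of inputs, which is why this kind of operation suits a multi-core or SIMD mapping.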

References

    1. Conti F., Rossi D., Pullini A., Loi I., Benini L. PULP: A Ultra-Low Power Parallel Accelerator for Energy-Efficient and Flexible Embedded Vision. J. Signal Process. Syst. 2016;84:339–354. doi: 10.1007/s11265-015-1070-9.
    2. Lamport L. The Parallel Execution of DO Loops. Commun. ACM. 1974;17:83–93. doi: 10.1145/360827.360844.
    3. Markov I.L. Limits on Fundamental Limits to Computation. Nature. 2014;512:147–154. doi: 10.1038/nature13570.
    4. Bacon D.F., Rabbah R., Shukla S. FPGA Programming for the Masses. ACM Queue Mag. 2013;11:40–52. doi: 10.1145/2436256.2436271.
    5. Gort M., Anderson J. Design re-use for compile time reduction in FPGA high-level synthesis flows; Proceedings of the IEEE International Conference on Field-Programmable Technology (FPT); Shanghai, China. 10–12 December 2014; pp. 4–11.