J Imaging. 2019 Mar 6;5(3):34.

doi: 10.3390/jimaging5030034.

High-Throughput Line Buffer Microarchitecture for Arbitrary Sized Streaming Image Processing

Runbin Shi¹, Justin S J Wong¹, Hayden K-H So¹

PMID: 34460462
PMCID: PMC8320917
DOI: 10.3390/jimaging5030034

Runbin Shi et al. J Imaging. 2019.

Affiliation

¹ Department of Electrical and Electronic Engineering, The University of Hong Kong, Pok Fu Lam, Hong Kong.

Abstract

Parallel hardware designed for image processing promotes vision-guided intelligent applications. With its high throughput and low latency, a streaming architecture on FPGA is especially attractive for real-time image processing. Notably, many real-world applications, such as region of interest (ROI) detection, require hardware that can continuously process images of different sizes and resolutions without interruption. FPGAs are well suited to implementing such flexible streaming architectures, but most existing solutions require run-time reconfiguration and hence cannot switch image sizes seamlessly. In this paper, we propose a dynamically-programmable buffer architecture (D-SWIM), based on the Stream-Windowing Interleaved Memory (SWIM) architecture, that realizes image processing on FPGA for image streams of arbitrary sizes defined at run time. D-SWIM redefines how on-chip memory is organized and controlled, and the hardware adapts to an arbitrary image size with a sub-100 ns delay, minimizing interruptions to image processing at high frame rates. Compared with the prior SWIM buffer for high-throughput scenarios, D-SWIM achieves dynamic programmability with only a slight overhead in logic resource usage while saving up to 56% of BRAM resources. The D-SWIM buffer achieves a maximum operating frequency of 329.5 MHz and reduces power consumption by 45.7% compared with the SWIM scheme. Real-world image processing applications, such as 2D convolution and the Harris Corner Detector, were also used to evaluate D-SWIM's performance, achieving pixel throughputs of 4.5 Giga Pixel/s and 4.2 Giga Pixel/s, respectively. Compared with implementations based on prior streaming frameworks, the D-SWIM-based design not only realizes seamless image size-switching but also improves hardware efficiency by up to 30×.

Keywords: D-SWIM; FPGA; high-throughput; line buffer; low-latency; streaming architecture.
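The central idea of the abstract, a buffer whose line width is chosen at run time rather than fixed at synthesis time, can be illustrated with a small software model. This is a generic line-buffer sketch, not the D-SWIM microarchitecture: the class name, `max_width` (standing in for fixed on-chip memory capacity), and `set_width` (standing in for run-time programming) are hypothetical, and the model accepts one pixel per call rather than one multi-pixel block per cycle.

```python
class RuntimeLineBuffer:
    """Software model of a line buffer whose line width is set at run time.

    Hypothetical sketch: `max_width` stands in for the fixed on-chip memory
    capacity, and `set_width()` stands in for run-time programming between
    frames. One pixel is accepted per call (no multi-pixel blocks).
    """

    def __init__(self, max_width: int, height: int = 3):
        if height < 2:
            raise ValueError("height must be at least 2")
        self.max_width = max_width
        self.height = height
        # Storage is allocated once at full capacity, like on-chip BRAM.
        self.rows = [[0] * max_width for _ in range(height - 1)]
        self.set_width(max_width)

    def set_width(self, width: int) -> None:
        """Reprogram the line width; only control state changes."""
        if not 0 < width <= self.max_width:
            raise ValueError(f"width {width} outside 1..{self.max_width}")
        self.width = width
        self.col = 0
        self.pixels_seen = 0

    def push(self, pixel):
        """Accept one pixel and return the vertical column of `height`
        pixels (oldest line first), or None until enough lines are buffered."""
        c = self.col
        column = [row[c] for row in self.rows] + [pixel]
        # Shift column c up by one line and store the new pixel at the bottom.
        for r in range(self.height - 2):
            self.rows[r][c] = self.rows[r + 1][c]
        self.rows[-1][c] = pixel
        self.col = (c + 1) % self.width
        self.pixels_seen += 1
        ready = self.pixels_seen > (self.height - 1) * self.width
        return column if ready else None
```

Because storage is sized once for `max_width`, switching to a narrower image only resets counters; stale data from the previous frame is never emitted, since output is gated until `height - 1` full lines of the new frame have arrived.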

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 1**
An example streaming architecture for image processing with a 2D pattern. The architecture has three components: a buffer, an operator, and interconnections.

**Figure 2**
Motivation for arbitrary sized image processing: (a) user-defined Region of Interest (ROI) processing; (b) arbitrary sized image processing in cloud computing.

**Figure 3**
(a) Image lines are sequenced into a stream and then clipped into multi-pixel blocks; the FPGA accepts one block per cycle. (b) The general pixel buffer, in which the BRAM-misalignment issue occurs. (c) The SWIM buffer, which avoids BRAM misalignment by using a specific BRAM partition.
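To see why a generic buffer can run into misalignment when the line width is not a multiple of the block size, the sketch below clips a row-major pixel stream into fixed-size blocks, as panel (a) describes, and reports which blocks carry pixels from two different image lines. The function names and the small image sizes are illustrative assumptions; this does not model the BRAM partitioning used by SWIM.

```python
def clip_stream(width: int, height: int, n_blk: int):
    """Split a row-major pixel stream into blocks of n_blk pixels.

    Each block is a list of (row, col) coordinates; the final block may be
    partial if width * height is not a multiple of n_blk.
    """
    stream = [(r, c) for r in range(height) for c in range(width)]
    return [stream[i:i + n_blk] for i in range(0, len(stream), n_blk)]


def straddling_blocks(blocks):
    """Indices of blocks whose pixels belong to more than one image line."""
    return [i for i, blk in enumerate(blocks) if len({r for r, _ in blk}) > 1]
```

For example, a 6-pixel-wide image clipped into 4-pixel blocks yields a block holding the last two pixels of one line and the first two of the next, whereas an 8-pixel-wide image divides evenly and no block straddles a line boundary.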

**Figure 4**
Overview of D-SWIM framework.

**Figure 5**
(a) LB write behavior with conventional BRAM usage. (b) LB write behavior with the byte-wise write-enable signal, using BRAM primitive instantiation.
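Panel (b) relies on the byte-wise write-enable feature of the BRAM primitive. As a hedged illustration, assuming each pixel occupies one byte of a memory word, the function below models how a write with a byte-enable mask updates only the selected pixels of a stored word and leaves the others intact, with no read-modify-write. The function name, the 4-byte word width, and the one-byte-per-pixel layout are assumptions for illustration, not details taken from the paper.

```python
def masked_write(word: int, data: int, byte_enable: int, n_bytes: int = 4) -> int:
    """Model a memory write with a per-byte write-enable mask.

    Bit i of `byte_enable` selects whether byte i of `data` replaces byte i
    of the stored `word` (byte 0 is least significant); disabled bytes keep
    their previous contents.
    """
    result = word
    for i in range(n_bytes):
        if (byte_enable >> i) & 1:
            mask = 0xFF << (8 * i)
            # Clear the enabled byte, then insert the corresponding new byte.
            result = (result & ~mask) | (data & mask)
    return result
```

With `byte_enable = 0b0011`, only the two low-order bytes (two pixels, under the one-byte-per-pixel assumption) are overwritten, which is how a partial block can be stored without disturbing neighboring pixels in the same word.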

**Figure 6**
Example of buffer load and store with the line-rolling behavior.
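The caption does not spell out the line-rolling mechanism. One common way to realize such behavior, assumed here purely for illustration, is to write each incoming line into the line buffer (LB) that currently holds the oldest line and to recover the logical row order by modular indexing instead of copying data between LBs. The class and method names are hypothetical.

```python
class RollingLineBuffers:
    """Hypothetical model of `height` LBs reused cyclically (line rolling).

    Assumption of this sketch: line y is stored in LB (y mod height),
    overwriting line y - height, so no data moves between LBs.
    """

    def __init__(self, height: int):
        self.height = height
        self.lbs = [None] * height
        self.lines_written = 0

    def store_line(self, line) -> None:
        # The target LB is the one holding the oldest buffered line.
        self.lbs[self.lines_written % self.height] = list(line)
        self.lines_written += 1

    def window_rows(self):
        """Most recent `height` lines, oldest first, or None if too few."""
        if self.lines_written < self.height:
            return None
        start = self.lines_written % self.height  # LB holding the oldest line
        return [self.lbs[(start + k) % self.height] for k in range(self.height)]
```

After five lines are stored into three LBs, the physical LBs hold lines 3, 4, and 2, yet `window_rows()` still returns lines 2, 3, 4 in order.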

**Figure 7**
(a) An example of the block storage pattern with parameters $N_{line} = 44$, $N_{blk} = 16$, and $H = 3$. (b) The buffer instruction list for achieving the access pattern in (a).
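With these parameters, a line of $N_{line} = 44$ pixels spans $44/16 = 2.75$ blocks of $N_{blk} = 16$ pixels, so successive lines begin at different offsets inside a block. The arithmetic below is a sketch that assumes lines are packed back to back in the stream, as in Figure 3(a); it computes those offsets and the period after which the alignment pattern repeats, $\mathrm{lcm}(44, 16) = 176$ pixels, i.e., 4 lines or 11 blocks. It does not reproduce the buffer instruction list of panel (b), whose format is not given here, and the function names are hypothetical.

```python
from math import gcd


def line_offsets(n_line: int, n_blk: int, n_lines: int):
    """Offset (in pixels) of each line's first pixel within its block."""
    return [(y * n_line) % n_blk for y in range(n_lines)]


def pattern_period(n_line: int, n_blk: int):
    """(lines, blocks) after which the line/block alignment repeats."""
    lcm = n_line * n_blk // gcd(n_line, n_blk)
    return lcm // n_line, lcm // n_blk
```

For $N_{line} = 44$ and $N_{blk} = 16$, the first five lines start at offsets 0, 12, 8, 4, and 0, confirming that the pattern repeats every 4 lines (11 blocks).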

**Figure 8**
D-SWIM workflow with dynamic programming for arbitrary sized image processing.

**Figure 9**
The D-SWIM buffer is composed of LBs and a Controller. Each BRAM in an LB is equipped with an Address Counter that manages the write address, incrementing or resetting it according to the signal on the controller bus. During a block write to a specific LB, the Addr MUX broadcasts that LB's write addresses to the other LBs as their read addresses for block loading.
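The counter-and-broadcast scheme described in this caption can be sketched in software. In this hedged model, each counter reacts to only the two commands the caption mentions ("inc" and "reset"), one write address per LB is tracked as a simplification of the per-BRAM counters, and the wrap-around at `depth` is a safety bound of the sketch rather than a documented hardware detail. All names are hypothetical.

```python
class AddressCounter:
    """Hypothetical write-address counter driven by the controller bus."""

    def __init__(self, depth: int):
        self.depth = depth  # sketch assumption: addresses wrap at depth
        self.addr = 0

    def step(self, cmd: str) -> int:
        if cmd == "inc":
            self.addr = (self.addr + 1) % self.depth
        elif cmd == "reset":
            self.addr = 0
        else:
            raise ValueError(f"unknown command {cmd!r}")
        return self.addr


def addr_mux(write_addrs, writing_lb: int):
    """Read addresses for all LBs while `writing_lb` performs a block write.

    The writing LB's write address is broadcast as the read address of every
    other LB; the writing LB itself gets None in this model.
    """
    shared = write_addrs[writing_lb]
    return [None if i == writing_lb else shared for i in range(len(write_addrs))]
```

Broadcasting one address means the other LBs read the same location that is being written, which lets a whole vertical column of blocks be loaded in the same cycle as the store.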

**Figure 12**
(a) The $f_{max}$ of the D-SWIM and SWIM designs with the configurations in Table 3. (b) Power consumption of D-SWIM and SWIM, broken down into static and dynamic power.

**Figure 13**
D-SWIM-based architecture for Conv2D ($3 \times 3$ window).
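A software golden model is a common way to check the output of a streaming design such as this one. The function below is a plain frame-based $3 \times 3$ convolution that assumes valid-region output (no border padding) and an unflipped kernel (i.e., cross-correlation, as is common in image processing). It does not model the D-SWIM buffer or its per-cycle block parallelism, and its name is hypothetical.

```python
def conv2d_3x3(image, kernel):
    """Reference 3x3 2D convolution over the 'valid' region (no padding).

    Computed as a sliding-window weighted sum without flipping the kernel;
    border handling and kernel orientation are assumptions of this sketch.
    """
    h, w = len(image), len(image[0])
    out = []
    for y in range(1, h - 1):
        row = []
        for x in range(1, w - 1):
            acc = 0
            for dy in range(3):
                for dx in range(3):
                    acc += kernel[dy][dx] * image[y - 1 + dy][x - 1 + dx]
            row.append(acc)
        out.append(row)
    return out
```

An $H \times W$ input yields an $(H-2) \times (W-2)$ output; comparing this against the hardware stream pixel by pixel is a simple way to validate a windowed design.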

**Figure 14**
D-SWIM-based architecture for HC detector ($3 \times 3$ window).
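The Harris corner (HC) detector uses the same $3 \times 3$ windowing. As a hedged software reference, the function below computes the standard Harris response $R = \det(M) - k\,\mathrm{tr}(M)^2$ from a structure tensor $M$ accumulated over a $3 \times 3$ window. The gradient operator (central differences), the unweighted window, and $k = 0.04$ are assumptions of this sketch, not parameters reported in the paper.

```python
def harris_response(image, k: float = 0.04):
    """Harris corner response R = det(M) - k * trace(M)^2 at interior pixels.

    Assumptions of this sketch: central-difference gradients, an unweighted
    3x3 window for the structure tensor M, and k = 0.04.
    Returns a dict mapping (y, x) to R.
    """
    h, w = len(image), len(image[0])
    # Gradients where central differences are defined (borders left at 0).
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            ix[y][x] = (image[y][x + 1] - image[y][x - 1]) / 2
            iy[y][x] = (image[y + 1][x] - image[y - 1][x]) / 2
    resp = {}
    # Only pixels whose whole 3x3 window has valid gradients.
    for y in range(2, h - 2):
        for x in range(2, w - 2):
            sxx = syy = sxy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    gx, gy = ix[y + dy][x + dx], iy[y + dy][x + dx]
                    sxx += gx * gx
                    syy += gy * gy
                    sxy += gx * gy
            resp[(y, x)] = (sxx * syy - sxy * sxy) - k * (sxx + syy) ** 2
    return resp
```

As expected for the Harris measure, a flat region gives $R = 0$, a straight edge gives $R < 0$, and a corner gives $R > 0$.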

