J Imaging. 2019 Mar 6;5(3):34.
doi: 10.3390/jimaging5030034.

High-Throughput Line Buffer Microarchitecture for Arbitrary Sized Streaming Image Processing

Runbin Shi et al. J Imaging. 2019.

Abstract

Parallel hardware designed for image processing promotes vision-guided intelligent applications. With its high throughput and low latency, streaming architecture on FPGA is especially attractive for real-time image processing. Notably, many real-world applications, such as region of interest (ROI) detection, demand the ability to process images continuously at different sizes and resolutions in hardware without interruption. FPGAs are especially suitable for implementing such a flexible streaming architecture, but most existing solutions require run-time reconfiguration and hence cannot achieve seamless image size-switching. In this paper, we propose a dynamically-programmable buffer architecture (D-SWIM), based on the Stream-Windowing Interleaved Memory (SWIM) architecture, to realize image processing on FPGA for image streams of arbitrary sizes defined at run time. D-SWIM redefines the way on-chip memory is organized and controlled, and the hardware adapts to an arbitrary image size with a sub-100 ns delay, which minimizes interruptions to image processing at high frame rates. Compared to the prior SWIM buffer for high-throughput scenarios, D-SWIM achieves dynamic programmability with only a slight overhead in logic resource usage while saving up to 56% of the BRAM resource. The D-SWIM buffer achieves a maximum operating frequency of 329.5 MHz and reduces power consumption by 45.7% compared with the SWIM scheme. Real-world image processing applications, such as 2D convolution and the Harris Corner Detector, were also used to evaluate D-SWIM's performance, achieving pixel throughputs of 4.5 gigapixels/s and 4.2 gigapixels/s, respectively. Compared to implementations with prior streaming frameworks, the D-SWIM-based design not only realizes seamless image size-switching but also improves hardware efficiency by up to 30×.

Keywords: D-SWIM; FPGA; high-throughput; line buffer; low-latency; streaming architecture.
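
For readers unfamiliar with the term, the role of a line buffer in a streaming pipeline can be illustrated in software. The following is a minimal Python sketch, not the D-SWIM hardware design: it keeps only the most recent k image lines and emits k×k windows as rows complete, with the image width passed as a run-time parameter. The function name `stream_windows` and its parameters are hypothetical and chosen for illustration only.

```python
from collections import deque


def stream_windows(pixels, width, k=3):
    """Yield k x k windows from a row-major pixel stream.

    Only the last k rows are retained, which is the job a line buffer
    performs in a streaming architecture. Illustrative software model,
    not the hardware described in the paper.
    """
    lines = deque(maxlen=k)  # holds the k most recent complete rows
    row = []
    for p in pixels:
        row.append(p)
        if len(row) == width:          # a full image line has arrived
            lines.append(row)
            row = []
            if len(lines) == k:        # enough rows for a full window
                for x in range(width - k + 1):
                    yield [r[x:x + k] for r in lines]


# A 4-pixel-wide, 3-line image yields two 3x3 windows.
windows = list(stream_windows(range(12), width=4))
```

Because `width` is an ordinary argument, a differently sized image can be processed by the next call without rebuilding anything, which is the software analogue of the size-switching requirement the abstract describes.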

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
A streaming architecture example for image processing with a 2D pattern. The architecture has three components: buffer, operator, and interconnections.
Figure 2
Motivation for arbitrary sized image processing: (a) user-defined Region of Interest (ROI) processing; (b) arbitrary sized image processing in cloud computing.
Figure 3
(a) The image lines are sequenced into a stream and then clipped into multi-pixel blocks; the FPGA accepts one block in each cycle. (b) The general pixel buffer, in which the BRAM-misalignment issue occurs. (c) The SWIM buffer avoids the BRAM misalignment by using a specific BRAM partition.
Figure 4
Overview of the D-SWIM framework.
Figure 5
(a) LB write behavior with conventional BRAM usage. (b) LB write behavior with the byte-wise write enable signal using BRAM primitive instantiation.
Figure 6
Example of buffer load and store with the line-rolling behavior.
Figure 7
(a) An example of the block storage pattern with parameters N_line = 44, N_blk = 16, and H = 3. (b) The buffer instruction list for achieving the access pattern in (a).
Figure 8
D-SWIM workflow with dynamic programming for arbitrary sized image processing.
Figure 9
The D-SWIM buffer is composed of LBs and a Controller. Each BRAM in the LB is equipped with an AddressCounter to manage the write address. It increments or resets the address according to the signal on the controller bus. During a block write to a specific LB, the AddrMUX allows the write addresses to be broadcast as the read addresses of the other LBs for block loading.
Figure 10
Buffer-write logic.
Figure 11
Buffer-read logic.
Figure 12
(a) The fmax of the D-SWIM and SWIM designs with the configurations in Table 3. (b) The power consumption of D-SWIM and SWIM, with the breakdown of static and dynamic power.
Figure 13
D-SWIM-based architecture for Conv2D (3×3 window).
Figure 14
D-SWIM-based architecture for HC detector (3×3 window).
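
The misalignment mentioned in the captions of Figures 3 and 7 can be seen with a little arithmetic. The sketch below assumes that the first parameter in the Figure 7 caption (44) is the number of pixels per image line and the second (16) is the number of pixels per block; those readings are inferred from the parameter names, and the helper names `line_segments` and `pattern_period` are hypothetical. It shows how one image line is split across fixed-size blocks when the line length is not a multiple of the block size.

```python
from math import gcd


def line_segments(n_line, n_blk, line):
    """Return (block_index, offset_in_block, pixel_count) tuples
    covering one image line of a row-major stream cut into blocks."""
    pos, end = line * n_line, (line + 1) * n_line
    segments = []
    while pos < end:
        blk, off = divmod(pos, n_blk)
        count = min(n_blk - off, end - pos)  # stop at block or line end
        segments.append((blk, off, count))
        pos += count
    return segments


def pattern_period(n_line, n_blk):
    """Number of lines after which the line-to-block alignment repeats."""
    return n_blk // gcd(n_line, n_blk)


# With 44-pixel lines and 16-pixel blocks, line 1 starts 12 pixels
# into block 2, so that block holds pixels from two different lines.
first_line = line_segments(44, 16, 0)
second_line = line_segments(44, 16, 1)
```

Under these assumptions the alignment pattern repeats every `pattern_period(44, 16)` lines, since 4 lines of 44 pixels fill exactly 11 blocks of 16 pixels.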
