DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs
Xiaofan Zhang, Junsong Wang, Chao Zhu, Yonghua Lin, Jinjun Xiong, Wen-mei Hwu, Deming Chen
University of Illinois at Urbana-Champaign, IBM Research
Abstract
Building a high-performance FPGA accelerator for Deep Neural Networks (DNNs) often requires RTL programming, hardware verification, and precise resource allocation, all of which are time-consuming and challenging even for seasoned FPGA developers. To bridge the gap between fast DNN construction in software (e.g., Caffe, TensorFlow) and slow hardware implementation, we propose DNNBuilder, a tool that automatically builds high-performance DNN hardware accelerators on FPGAs. DNNBuilder is demonstrated on four DNNs (AlexNet, ZF, VGG16, and YOLO) on two FPGAs (XC7Z045 and KU115), corresponding to edge and cloud computing, respectively. The proposed fine-grained layer-based pipeline architecture and column-based cache scheme contribute to 7.7x and 43x reductions in latency and BRAM utilization, respectively, compared to conventional designs. We achieve the best performance (up to 5.15x faster) and efficiency (up to 5.88x more efficient) compared to published FPGA-based classification-oriented DNN accelerators for both edge and cloud computing cases.
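To make the BRAM argument concrete, the sketch below compares, for a single convolution layer, the on-chip words needed to buffer a whole input feature map against a cache that holds only a few columns at a time. It is a toy model, not DNNBuilder's implementation: the `ConvLayer` shape, the helper names, and the assumption that the cache keeps K + S columns (K for the current window position plus S prefetched for the next one) are all illustrative.

```python
"""Toy illustration (not DNNBuilder's code) of on-chip buffer size for a
full feature-map buffer versus an assumed column-based cache."""

from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    height: int    # input feature-map height (rows)
    width: int     # input feature-map width (columns)
    channels: int  # input channels
    kernel: int    # kernel size K (K x K window)
    stride: int    # stride S


def full_buffer_words(layer: ConvLayer) -> int:
    # Conventional scheme: keep the whole input feature map on chip.
    return layer.height * layer.width * layer.channels


def column_cache_words(layer: ConvLayer) -> int:
    # Assumed column cache: K columns feed the current window position,
    # S more columns are prefetched for the next position.
    cols = min(layer.kernel + layer.stride, layer.width)
    return layer.height * cols * layer.channels


def sliding_columns(layer: ConvLayer):
    """Yield the (first_col, last_col) span the window reads, left to right."""
    for start in range(0, layer.width - layer.kernel + 1, layer.stride):
        yield start, start + layer.kernel - 1


if __name__ == "__main__":
    # Hypothetical layer shape chosen only for illustration.
    layer = ConvLayer(height=224, width=224, channels=64, kernel=3, stride=1)
    full = full_buffer_words(layer)
    cache = column_cache_words(layer)
    print(f"full buffer : {full} words")
    print(f"column cache: {cache} words ({full / cache:.1f}x smaller)")
```

For this hypothetical 224x224x64 layer with a 3x3 kernel and stride 1, the ratio is simply W / (K + S) = 56; the example is not an attempt to reproduce the 43x figure reported in the abstract.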
Results
DNNBuilder's performance peaks at 4218 GOPS when running object detection DNNs, which is the highest throughput reported to the best of our knowledge. It provides millisecond-scale real-time performance for processing HD video and delivers higher efficiency (up to 4.35x) than GPU-based solutions.
- Figure: Comparison using embedded FPGAs for edge-AI solutions
- Figure: Comparison using mid-range FPGAs for cloud-AI solutions
- Figure: Comparison to GPU solutions
DNNBuilder won the IEEE/ACM William J. McCalla ICCAD Best Paper Award
Demo
Following the DNNBuilder design flow, we generated an FPGA-based YOLO accelerator that performs real-time pedestrian, cyclist, and car detection in real-life scenarios. With 1280x384 inputs, the generated accelerator achieved 22.1 FPS on an embedded FPGA platform (Xilinx ZC706).
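As a quick sanity check on what 22.1 FPS means for real-time use, the arithmetic below converts the demo figures into a per-frame interval and a pixel rate. It assumes steady-state throughput; in a pipelined accelerator, 1/FPS is the interval between finished frames, which need not equal the end-to-end latency of a single frame.

```python
# Back-of-the-envelope arithmetic from the demo figures (22.1 FPS, 1280x384 inputs).
FPS = 22.1
WIDTH, HEIGHT = 1280, 384

frame_time_ms = 1000.0 / FPS          # interval between finished frames
pixels_per_frame = WIDTH * HEIGHT     # input pixels per frame
pixels_per_second = pixels_per_frame * FPS

print(f"{frame_time_ms:.1f} ms per frame")
print(f"{pixels_per_frame} pixels per frame")
print(f"{pixels_per_second / 1e6:.1f} Mpixel/s")
```

This works out to roughly 45.2 ms per frame and about 10.9 Mpixel/s of input.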
Invited Talks about DNNBuilder
Conference / Seminar | Location | Date |
---|---|---|
IBM Research AI Horizons Colloquium | IBM Research (Cambridge) | Oct. 2018 |
C3SR Seminar | University of Illinois at Urbana-Champaign | Nov. 2018 |
IBM Workshop on Architectures for Secure, Cognitive, and Datacenter Computing | IBM Research (Yorktown) | Nov. 2018 |
Seminar | Google Brain | Nov. 2019 |
Citation
If you find DNNBuilder useful, please cite the DNNBuilder paper: