High-Performance Video Content Recognition with Long-term Recurrent Convolutional Network for FPGA

Xiaofan Zhang, Xinheng Liu, Anand Ramachandran, Chuanhao Zhuge, Shibin Tang, Peng Ouyang, Zuofu Cheng, Kyle Rupnow, Deming Chen

University of Illinois at Urbana-Champaign, Tsinghua University, Beihang University, Inspirit IoT, Inc.

Abstract

FPGAs are promising candidates for accelerating Deep Neural Networks (DNNs), offering improved latency and energy consumption compared to CPU- and GPU-based implementations. DNNs use sequences of layers of regular computation that are well suited to high-level synthesis (HLS) based design for FPGAs. However, optimizing large neural networks under resource constraints remains a key challenge: an HLS design must manage on-chip computation, buffering resources, and off-chip memory accesses to minimize the total latency. In this paper, we present a design framework for DNNs that uses highly configurable IPs for neural network layers together with a new design space exploration engine for Resource Allocation Management (REALM). We also carry out efficient memory subsystem design and fixed-point weight re-training to further improve our FPGA solution. We demonstrate our design framework on the Long-term Recurrent Convolutional Network (LRCN) for video inputs. Our implementation on a Xilinx VC709 board achieves a 3.1X speedup over an NVIDIA K80 and a 4.75X speedup over an Intel Xeon, with 17.5X lower energy per image.
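
To make the fixed-point step concrete, the following is a minimal sketch of rounding floating-point weights to a signed fixed-point format with saturation. It is not the implementation from the paper: the function names, the 16-bit word length, and the 8 fractional bits are assumptions chosen for illustration only. The sketch shows the rounding error that quantization introduces, which is the error that re-training is intended to compensate for.

```python
def quantize_fixed_point(value, total_bits=16, frac_bits=8):
    """Round a float to the nearest signed fixed-point value, saturating on overflow.

    total_bits and frac_bits are illustrative assumptions, not the paper's settings.
    """
    scale = 1 << frac_bits                      # 2**frac_bits steps per unit
    q_min = -(1 << (total_bits - 1))            # most negative integer code
    q_max = (1 << (total_bits - 1)) - 1         # most positive integer code
    code = round(value * scale)                 # nearest integer code
    code = max(q_min, min(q_max, code))         # saturate instead of wrapping
    return code / scale                         # value the hardware would represent


def quantize_weights(weights, total_bits=16, frac_bits=8):
    """Apply the same fixed-point format to every weight in a list."""
    return [quantize_fixed_point(w, total_bits, frac_bits) for w in weights]


if __name__ == "__main__":
    weights = [0.5, -1.25, 0.1, 200.0]          # made-up example weights
    for w, q in zip(weights, quantize_weights(weights)):
        print(f"{w:>8} -> {q} (error {abs(w - q):.6f})")
```

For values inside the representable range, the rounding error is at most half of one step, that is 2^-(frac_bits+1). Values outside the range are clamped to the largest or smallest representable value.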

Results

We demonstrated our design on a proposed end-to-end, real-time video content description system, with a webcam capturing images and the LRCN kernel implemented on a VC709 FPGA for DNN inference.

LRCN4.png
LRCN3.png
LRCN_poster.png
LRCN_slides.png

Citation

If you find our LRCN accelerator useful, please cite: