Welcome
I am a software engineer at Google. I work on developing and optimizing large-scale AI systems for LLM training and serving.
Before joining Google, I was a Google Ph.D. Fellow and a Mavis Future Faculty Fellow, and I received my Ph.D. from the University of Illinois Urbana-Champaign (UIUC) in 2022. My advisor was Prof. Deming Chen, and I worked closely with Prof. Wen-mei Hwu and Prof. Jinjun Xiong. I received my B.S. and M.S. degrees from UESTC, Chengdu, China.
My research interests include AI Systems, Energy-efficient Computing, and HW/SW Co-design.
Contact Information
xiaofanz at google dot com
Successfully Defended My Ph.D. Thesis, July 2022
I have successfully defended my Ph.D. thesis! I am exceptionally grateful for receiving a lot of help and support from my advisors, colleagues, friends, and family over the years.
My thesis, "Efficient AI Hardware Acceleration", is available via open access.
AutoDistill makes NLP models run efficiently on TPUv4i
Our recent collaboration with Google proposes AutoDistill, an end-to-end model distillation and model architecture exploration framework for building hardware-aware NLP pre-trained models with BERT-level accuracy but 5X fewer parameters.
[ Paper ]
Xiaofan Wins ACM Student Research Competition at ICCAD'21, Nov. 2021
Xiaofan received first place in the ACM Student Research Competition at ICCAD 2021 for proposing three end-to-end design flows for building efficient edge and cloud AI systems. These flows are part of Xiaofan's latest research, including EcoSys (published in TCAD), F-CAD (published in DAC'21), and SkyNet (published in MLSys'20).
Distributed GPU Acceleration Work Accepted by ICPP'21, Jun. 2021.
This work proposes a distributed GPU system with HW/SW optimization strategies to accelerate texture identification applications. It combines a highly optimized cuBLAS implementation, a hybrid GPU cache design, a customized batch process, and an asymmetric local feature extraction scheme for a more powerful and efficient system design.
It is the first work to provide real-time, large-scale texture identification on distributed GPUs, with 31X faster search and 20X larger feature cache capacity than a conventional CUDA implementation.
Xiaofan Receives Rambus and Mavis Future Faculty Fellowships, Apr. 2021
Xiaofan has been awarded two highly competitive fellowships for 2021-2022: 1) the Rambus Computer Engineering Fellowship, recognizing his outstanding research achievements in computer engineering;
2) the Mavis Future Faculty Fellowship (MF3), recognizing his potential to join the next generation of engineering faculty.
F-CAD Accepted by DAC'21 as the First VR Avatar Accelerator Design, Feb. 2021
Our work with Facebook Reality Labs Research, titled "F-CAD: A Framework to Explore Hardware Accelerators for Codec Avatar Decoding", has been accepted by DAC'21. F-CAD is the first work to provide systematic design methodologies for building hardware accelerators that handle the demanding codec avatar decoding process. Codec Avatars are among the most impressive breakthroughs in VR applications, providing photorealistic reproduction of human appearance and real-time expressions.
[ Paper ]
Xiaofan's News Selected as the Top-5 CSL Stories of 2020, Jan. 2021
The Coordinated Science Laboratory (CSL) selected the five most popular news stories from its website in 2020. One of them was the story titled "CSL student receives 2020 Google Ph.D. Fellowship", reporting the story behind Xiaofan's Google Ph.D. fellowship.
Xiaofan Receives 2020 Google Ph.D. Fellowship, Oct. 2020
Xiaofan has been awarded the prestigious Google Ph.D. Fellowship and is the only awardee in the mobile computing area worldwide. The Google Ph.D. Fellowship Program was created in 2009 to recognize outstanding graduate students doing exceptional and innovative research in a range of computing disciplines.
DNNExplorer Accepted by ICCAD'20, Jul. 2020
Our paper, "DNNExplorer: A Framework for Modeling and Exploring a Novel Paradigm of FPGA-based DNN Accelerator", has been accepted by the 2020 International Conference on Computer-Aided Design (ICCAD). It proposes an automation tool that enables fast architecture exploration under the new DNN accelerator paradigm, addressing the drawbacks of existing FPGA-based accelerators.
Xiaofan Receives Sundaram Seshu International Student Fellowship, Apr. 2020
The ECE fellowship committee has selected Xiaofan as a recipient of a Sundaram Seshu International Student Fellowship for 2020-2021. The fund was established in April 1966 in memory of Dr. Sundaram Seshu, a distinguished former member of the ECE faculty, to support "Academically outstanding students, preferably, but not invariably, from abroad" in the ECE department of the University of Illinois at Urbana-Champaign.
Three Recent Works Accepted by DAC'20, Feb. 2020
1) EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions proposes a differentiable DNN and accelerator co-design to deliver the best AI capability on embedded systems.
2) HybridDNN: A Framework for High-Performance Hybrid DNN Accelerator Design and Implementation introduces a novel framework for building DNN accelerators with hybrid Spatial/Winograd CONV PEs.
3) A-QED Verification of Hardware Accelerators presents a new approach for pre-silicon formal verification of stand-alone hardware accelerators.
SkyNet Paper Accepted by MLSys'20, Jan. 2020
Our paper, "SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems", has been accepted by the 2020 Conference on Machine Learning and Systems (MLSys). It will be given an oral presentation and a poster presentation at the conference on March 2.
See you soon in Austin!
Best Poster Award at ICML Workshop, June 2019
Our DNN design strategy (Bi-Directional Co-Design Approach) won the Best Poster Award at the ICML'19 Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations (ODML-CDNNR). This workshop focuses on novel resource-efficient machine learning methods and compact DNN representations.
Congratulations!
UIUC Wins the First DAC-SDC Double Championships, June 2019
Two UIUC teams won first place in both the GPU and FPGA tracks of the DAC'19 System Design Contest on June 5 in Las Vegas, NV. The GPU team proposed a lightweight DNN, SkyNet, designed with a novel bottom-up approach, which achieved the highest IoU (0.73) and FPS (67) running on the TX2 GPU. The FPGA team used the same DNN with a customized high-performance accelerator design, which delivered 0.71 IoU and 25 FPS on the Ultra96 FPGA.
[ Report ] Team Members Press: IBM Research Blog SyncedTech
FPGA/DNN Co-design Paper Accepted by DAC'19, April 2019
The paper titled "FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge" has been accepted by DAC'19, the premier conference for design and automation of electronic systems. This co-design combines a bottom-up, hardware-oriented DNN model search for high accuracy with a top-down FPGA accelerator design that accounts for DNN-specific characteristics. It provides an automatic flow, including an Auto-DNN engine that performs the hardware-oriented DNN model search and an Auto-HLS engine that generates synthesizable C code for the corresponding FPGA accelerator.
[ Paper ] Press: SyncedTech
NMT paper presented at ASP-DAC'19, Jan 2019
Xiaofan gave a presentation at ASP-DAC on the work titled "Implementing Neural Machine Translation (NMT) with Bi-Directional GRU and Attention Mechanism on FPGAs Using HLS". It is the first FPGA-based design for a real-life NMT model, with a problem size of 172 GOP. In this paper, highly optimized HLS IPs are developed along with a hybrid parallel design and HLS-enabled design space exploration.
[ Paper ]
DNNBuilder Receives ICCAD Best Paper Award, Nov 2018
DNNBuilder received the IEEE/ACM William J. McCalla ICCAD Best Paper Award. ICCAD is the premier conference devoted to technical innovations in electronic design automation. Congratulations to the team members from UIUC, IBM Research China, and IBM T. J. Watson Research Center (from left to right: Jinjun Xiong, Deming Chen, Xiaofan Zhang, Wen-mei Hwu, Junsong Wang, and Yonghua Lin).
iSmart2 Team Wins Design Contest at DAC'18, Jun 2018
iSmart2 (UIUC + IBM + Inspirit IoT + Boeing) received the 3rd place award in the FPGA category of the DAC'18 System Design Contest. Our design was selected from 61 teams that participated from around the world. Shown in the picture is object detection running on an embedded FPGA (PYNQ-Z1) designed for IoT applications.
[ Code ] Press: IBM Research Blog SyncedTech
AccDNN Demo at FCCM'18, Apr 2018
A live demo of real-time pedestrian/cyclist/car detection was presented at FCCM'18. It runs on an embedded FPGA (XC7Z045) at 22.1 FPS (234 GOPS) with a 9.92 ms response time for HD inputs. Following the proposed AccDNN flow, this design can be mapped onto the FPGA automatically, without any RTL programming or manual design space exploration.
CSRNet Accepted by CVPR'18, Feb 2018
The paper titled "CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes" has been accepted by CVPR'18. We propose a novel architecture using dilated CONV layers for crowd counting and density map generation, achieving state-of-the-art performance.
[ Paper ] Press: Leiphone.com
LRCN Accelerator Presented at FPL'17, Sep 2017
We presented an FPGA-based accelerator for video/image analysis that efficiently handles hybrid DNN structures containing both CNNs and RNNs. We designed a resource allocation algorithm, REALM, to generate guidelines for mapping DNNs to FPGAs under resource constraints. This design delivers a 3.10X speedup over an NVIDIA K80 and a 4.75X speedup over an Intel Xeon, with 17.5X lower energy per image.
[ Paper ]
LRCN Demo to Vice President of IBM Power Systems Development, May 2, 2017
Brad McCredie, IBM Fellow, Vice President of IBM Power Systems Development, and President of the OpenPOWER Foundation, visited C3SR and reviewed the cognitive systems research agenda with C3SR faculty and students. Shown in the picture are C3SR students Xiaofan Zhang and Chuanhao Zhuge (on the left) demoing the FPGA-accelerated LRCN (Long-term Recurrent Convolutional Networks) for visual recognition and description to Dr. McCredie (front) and Ben Kreuz (back).