Xiaofan Zhang | AI Systems Engineer & Researcher

About Me

I am a Staff Software Engineer at Google DeepMind, working on large-scale AI systems to enable efficient Gemini training and serving on TPUs. My research interests include AI Systems, Energy-efficient Computing, and Hardware/Software Co-design.

Before I joined Google, I received my Ph.D. from the University of Illinois Urbana-Champaign (UIUC) in 2022. I was a Google Ph.D. Fellow and Mavis Future Faculty Fellow. My research was conducted under the supervision of Prof. Deming Chen, in close collaboration with Prof. Wen-mei Hwu and Prof. Jinjun Xiong. I completed my B.S. and M.S. at UESTC in Chengdu, China.

News

JUL 2022

Successfully defended my Ph.D. Thesis

My thesis, Efficient AI Hardware Acceleration, is now available for open access.

[Thesis] [Slides]

OCT 2020

Xiaofan Receives 2020 Google Ph.D. Fellowship

Awarded the prestigious Google Ph.D. Fellowship as the only recipient worldwide in the mobile computing area, in recognition of exceptional and innovative research.

JUN 2019

UIUC Wins the First DAC-SDC Double Championships

Two UIUC teams secured 1st place in the DAC'19 System Design Contest for both the GPU (SkyNet) and FPGA tracks, achieving high IoU and FPS on embedded platforms.

[Report]

NOV 2018

DNNBuilder Receives IEEE/ACM William J. McCalla ICCAD Best Paper Award

A huge honor for our paper, DNNBuilder, to receive the Best Paper Award at ICCAD, the premier conference for electronic design automation.

[Paper] [Project]

JUN 2018

iSmart2 Team Wins Design Contest at DAC'18

The iSmart2 team (UIUC + IBM + Inspirit IoT + Boeing) secured 3rd place in the DAC'18 System Design Contest for the FPGA category (out of 61 teams). Our object detection system ran efficiently on an embedded FPGA.

[Code]

Publications

2026

PROMPTS: PeRformance Optimization via Multi-Agent Planning for LLM Training and Serving

Yuran Ding, Ruobing Han, Xiaofan Zhang, Xinwei Chen.

Conference on Machine Learning and Systems (MLSys), May 2026.

2025

ASAP: an Agentic Solution to Auto-optimize Performance of Large-Scale LLM Training

Yuran Ding, Xinwei Chen, Xiaofan Zhang, Zongwei Zhou.

Conference on Neural Information Processing Systems (NeurIPS) ML for Systems Workshop, Dec. 2025.

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

Gemini Team.

arXiv preprint arXiv:2507.06261, July 2025.

Reconfigurable Stream Network Architecture

Chengyue Wang, Xiaofan Zhang, Jason Cong, James C. Hoe.

International Symposium on Computer Architecture (ISCA), Jun. 2025.

Profile-Guided Quantization: a compiler solution to automate quantization for efficient LLM training

Gil Tabak, Clemens JS Schaefer, Xiaofan Zhang, Denali Molitor, Jinliang Wei, Zongwei Zhou, Philip G Hendrix, Mitchelle Rasquinha.

International Symposium on Computer Architecture (ISCA) workshop on Machine Learning for Computer Architecture and Systems (MLArchSys), Jun. 2025.

SSDTrain: An Activation Offloading Framework to SSDs for Faster Large Language Model Training

Kun Wu*, Jeongmin Brian Park*, Xiaofan Zhang*, Mert Hidayetoğlu, Vikram Sharma Mailthody, Sitao Huang, Steven Sam Lumetta, Wen-mei Hwu. (*equal contributors)

Design Automation Conference (DAC), Jun. 2025.

2024

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Haoran You, Yipin Guo, Yichao Fu, Wei Zhou, Huihong Shi, Xiaofan Zhang, Souvik Kundu, Amir Yazdanbakhsh, Yingyan Lin.

Conference on Neural Information Processing Systems (NeurIPS), Dec. 2024.

New Solutions on LLM Acceleration, Optimization, and Application (Invited)

Yingbing Huang, Jiaxin Wan, Hanchen Ye, Manvi Jha, Jinghua Wang, Yuhong Li, Xiaofan Zhang, Deming Chen.

Design Automation Conference (DAC), June 2024.

AutoAI2C: An Automated Hardware Generator for DNN Acceleration on both FPGA and ASIC

Yongan Zhang, Xiaofan Zhang, Pengfei Xu, Yang Zhao, Cong Hao, Deming Chen, Yingyan Lin.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD).

Software/Hardware Co-design for LLM and Its Application for Design Verification (Invited)

Jiaxin Wan, Yingbing Huang, Yuhong Li, Hanchen Ye, Jinghua Wang, Xiaofan Zhang, Deming Chen.

Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2024.

HomeSGN: A Smarter Home with Novel Rule Mining Enabled by a Scorer-Generator GAN

Zehua Yuan, Junhao Pan, Xiaofan Zhang, Deming Chen.

Asia and South Pacific Design Automation Conference (ASP-DAC), Jan. 2024.

2023

Compilation and Optimizations for Efficient Machine Learning on Embedded Systems

Xiaofan Zhang, Yao Chen, Cong Hao, Sitao Huang, Yuhong Li, Deming Chen.

Book chapter in Embedded Machine Learning for Cyber-Physical, IoT, and Edge Computing, Springer Nature.

EH-DNAS: End-to-End Hardware-aware Differentiable Neural Architecture Search

Qian Jiang*, Xiaofan Zhang*, Deming Chen, Minh N Do, Raymond A Yeh. (*equal contributors)

International Conference on Machine Learning (ICML) Workshop on Differentiable Almost Everything, July 2023.

2022

Efficient AI Hardware Acceleration

Xiaofan Zhang.

Dissertation, University of Illinois Urbana-Champaign (UIUC).

Exploring HW/SW Co-Design for Video Analysis on CPU-FPGA Heterogeneous Systems

Xiaofan Zhang, Yuan Ma, Jinjun Xiong, Wen-mei Hwu, Volodymyr Kindratenko, Deming Chen.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD).

Algorithm/Accelerator Co-Design and Co-Search for Edge AI

Xiaofan Zhang, Yuhong Li, Junhao Pan, Deming Chen.

IEEE Transactions on Circuits and Systems II, 2022.

AutoDistill: an End-to-End Framework to Explore and Distill Hardware-Efficient Language Models

Xiaofan Zhang, Zongwei Zhou, Deming Chen, Yu Emma Wang.

arXiv preprint arXiv:2201.08539, Jan. 2022.

2021

F-CAD: A Framework to Explore Hardware Accelerators for Codec Avatar Decoding

Xiaofan Zhang, Dawei Wang, Pierce Chuang, Shugao Ma, Deming Chen, Yuecheng Li.

Design Automation Conference (DAC), Dec. 2021.

Exploring HW/SW Co-Optimizations for Accelerating Large-scale Texture Identification on Distributed GPUs

Junsong Wang, Xiaofan Zhang, Yubo Li, Yonghua Lin.

International Conference on Parallel Processing (ICPP), Aug. 2021.

Efficient Methods for Mapping Neural Machine Translator on FPGAs

Qin Li*, Xiaofan Zhang*, Jinjun Xiong, Wen-mei Hwu, Deming Chen. (*equal contributors)

IEEE Transactions on Parallel and Distributed Systems (TPDS).

Being-ahead: Benchmarking and Exploring Accelerators for Hardware-Efficient AI Deployment

Xiaofan Zhang, Hanchen Ye, Deming Chen.

Conference on Machine Learning and Systems (MLSys) workshop on Benchmarking Machine Learning Workloads on Emerging Hardware (MLBench), Apr. 2021.

2020

DNNExplorer: A Framework for Modeling and Exploring a Novel Paradigm of FPGA-based DNN Accelerator

Xiaofan Zhang*, Hanchen Ye*, Junsong Wang, Yonghua Lin, Jinjun Xiong, Wen-mei Hwu, Deming Chen. (*equal contributors)

International Conference on Computer Aided Design (ICCAD), Nov. 2020.

Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices (Invited)

Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, Jinjun Xiong, Wen-mei Hwu, Deming Chen.

ACM Great Lakes Symposium on VLSI (GLSVLSI), Sep. 2020.

HybridDNN: A Framework for High-Performance Hybrid DNN Accelerator Design and Implementation

Hanchen Ye, Xiaofan Zhang, Zhize Huang, Gengsheng Chen, Deming Chen.

Design Automation Conference (DAC), July 2020.

EDD: Efficient Differentiable DNN architecture and implementation co-search for embedded AI solutions

Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, Jinjun Xiong, Wen-mei Hwu, Deming Chen.

Design Automation Conference (DAC), July 2020.

SkyNet: a Hardware-Efficient Method for Object Detection and Tracking on Embedded Systems (DAC'19 Champion Design)

Xiaofan Zhang, Haoming Lu, Cong Hao, Jiachen Li, Bowen Cheng, Yuhong Li, Kyle Rupnow, Jinjun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen.

Conference on Machine Learning and Systems (MLSys). Mar. 2020.

AutoDNNchip: An Automated DNN Chip Predictor and Builder for Both FPGAs and ASICs

Pengfei Xu, Xiaofan Zhang, Cong Hao, Yang Zhao, Yongan Zhang, Yue Wang, Chaojian Li, Zetong Guan, Deming Chen, Yingyan Lin.

International Symposium on Field-Programmable Gate Arrays (FPGA). Feb. 2020.

2019

A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices (Best Poster Award)

Xiaofan Zhang, Cong Hao, Yuhong Li, Yao Chen, Jinjun Xiong, Wen-mei Hwu, Deming Chen.

International Conference on Machine Learning (ICML) Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations (ODML-CDNNR). June 2019.

FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge

Cong Hao*, Xiaofan Zhang*, Yuhong Li, Sitao Huang, Jinjun Xiong, Kyle Rupnow, Wen-mei Hwu, Deming Chen. (*equal contributors)

Design Automation Conference (DAC). June 2019.

Cloud-DNN: An Open Framework for Mapping DNN Models to Cloud FPGAs

Yao Chen, Jiong He, Xiaofan Zhang, Cong Hao, Deming Chen.

International Symposium on Field-Programmable Gate Arrays (FPGA). Feb. 2019.

Implementing Neural Machine Translation with Bi-Directional GRU and Attention Mechanism on FPGAs Using HLS

Qin Li*, Xiaofan Zhang*, Jinjun Xiong, Wen-mei Hwu, Deming Chen. (*equal contributors)

Asia and South Pacific Design Automation Conference (ASP-DAC). Jan. 2019.

2018

DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs (Best Paper Award)

Xiaofan Zhang, Junsong Wang, Chao Zhu, Yonghua Lin, Jinjun Xiong, Wen-mei Hwu, Deming Chen.

International Conference on Computer Aided Design (ICCAD). Nov. 2018.

Design Flow of Accelerating Hybrid Extremely Low Bit-width Neural Network in Embedded FPGA

Junsong Wang, Qiuwen Lou, Xiaofan Zhang, Chao Zhu, Yonghua Lin, Deming Chen.

International Conference on Field-Programmable Logic and Applications (FPL). Aug. 2018.

CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

Yuhong Li, Xiaofan Zhang, Deming Chen.

Computer Vision and Pattern Recognition (CVPR). June 2018.

Face Recognition with Hybrid Efficient Convolution Algorithms on FPGAs

Chuanhao Zhuge, Xinheng Liu, Xiaofan Zhang, Sudeep Gummadi, Jinjun Xiong, Deming Chen.

Great Lakes Symposium on VLSI (GLSVLSI). May 2018.

2017

An Energy Efficient Approach for C4.5 Algorithm using OpenCL Design Flow

Hai Peng, Xiaofan Zhang, Letian Huang.

International Conference on Field-Programmable Technology (FPT). Dec. 2017.

Machine Learning on FPGAs to Face the IoT Revolution (Invited)

Xiaofan Zhang*, Anand Ramachandran*, Chuanhao Zhuge*, Di He, Wei Zuo, Zuofu Cheng, Kyle Rupnow, Deming Chen. (*equal contributors)

International Conference on Computer Aided Design (ICCAD). Nov. 2017.

High-Performance Video Content Recognition with Long-term Recurrent Convolutional Network for FPGA

Xiaofan Zhang, Anand Ramachandran, Chuanhao Zhuge, Shibin Tang, Peng Ouyang, Zuofu Cheng, Kyle Rupnow, Deming Chen.

International Conference on Field-Programmable Logic and Applications (FPL). Sep. 2017.

Awards & Fellowships

Google Gold Perfy Award

2024, 2025

Google Silver Perfy Award

2023

Google Ph.D. Fellowship

2020, 2021

ACM Student Research Competition Winner Award (ICCAD)

2021

Mavis Future Faculty Fellowship (MF3)

2021

Rambus Computer Engineering Fellowship

2021

Sundaram Seshu International Student Fellowship

2020

Service

Peer Reviewer

Journal Reviewer

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD)
IEEE Transactions on Circuits and Systems Part II (TCAS-II)
IEEE Embedded Systems Letters (ESL)
ACM Transactions on Reconfigurable Technology and Systems (TRETS)

Conference Technical Program Committee / Reviewer

2025 International Symposium on Computer Architecture (ISCA) Workshop on Machine Learning for Computer Architecture and Systems (MLArchSys)
2025 ACM/IEEE Design Automation Conference (DAC)
2024 IEEE International Workshop on LLM-Aided Design (LAD)
2023 - 2024 ACM/IEEE International Conference on Computer-Aided Design (ICCAD)
2023 ACM/IEEE Supercomputing Conference (SC)
2023 Great Lakes Symposium on VLSI (GLSVLSI)
2023 Conference on Machine Learning and Systems (MLSys)
2016 - 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA)
2018 - 2022 IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM)

Session Chair / Competition Judge

Technical Session Chair: 2024 ICCAD: Architectural Mapping
Technical Session Chair: 2024 ICCAD: Applications and Architectures
Technical Session Chair: 2023 ICCAD: Sustainable AI Training at the Large and Tiny Scales
Competition Judge: 2023 ACM Student Research Competition at ICCAD
Competition Judge: 2023 Ph.D. Forum at FCCM
Competition Judge: 2022 ACM Student Research Competition at ICCAD

Teaching

Guest Lecturer: ELEC 515: Embedded Machine Learning: FPGA for AI Inference (Rice University, Fall 2020)

Topic: Hardware Accelerator Design and Development

Guest Lecturer: IEEE Council on Electronic Design Automation (CEDA) Lecture Series

Topic: FPGA-based Accelerator Design for AI Inference

Head Teaching Assistant: ECE 498 ICC: IoT and Cognitive Computing (UIUC, Spring 2020)
Teaching Assistant: ECE 498 ICC: IoT and Cognitive Computing (UIUC, Spring 2019)


Awards & Fellowships

1st Place Winner Award (GPU Track) DAC System Design Contest

1st Place Winner Award (FPGA Track) DAC System Design Contest

Best Poster Award (ICML ODML-CDNNR Workshop)

Best Paper Award (ICCAD)

Best Poster Award (IBM AI Horizons Network)

3rd Place Winner Award (FPGA Track) DAC System Design Contest

Best Graduate Thesis Award

China National Scholarship

1st Class Graduate Scholarship (Top 10%)

Best Undergraduate Thesis Award