TY - JOUR
T1 - AOS: an automated overclocking system for high performance CNN accelerator through timing delay measurement on FPGA
AU - Jiang, Weixiong
AU - Yu, Heng
AU - Chen, Fupeng
AU - Ha, Yajun
PY - 2023/1/10
Y1 - 2023/1/10
N2 - With the inherent algorithmic error resilience of convolutional neural networks (CNN) and the worst-case design methodologies of current electronic design automation tools, overclocking-based timing speculation is a promising technique to improve the performance of CNN accelerators on FPGA by removing unnecessary timing margins. To avoid potential timing errors, timing delay measurement should be used during overclocking. However, current approaches are not yet good at measuring paths with more intense variability factors such as jitter, and lack an automated process for testing circuit delays. In this paper, we first propose 2-dimension multi-frame fusion to deal with the sampling jitter, then present a timing delay measurement-based automatic overclocking system (AOS) running on heterogeneous FPGA for high-performance CNN accelerators. On the FPGA side, AOS is composed of timing delay monitors (TDM) that can measure all types of timing paths and a TDM controller that converts the sampled values of TDMs into timing delay in terms of the ratio of path delay to the clock period. On the CPU side, AOS converts the path delay from clock period ratio to absolute delay value and decides the frequency of the accelerator in the next iteration. We demonstrate AOS with a SkyNet accelerator on the Xilinx ZCU104 board and achieve 657 FPS at 436 MHz without accuracy degradation, which is 1.41× the performance of the baseline.
AB - With the inherent algorithmic error resilience of convolutional neural networks (CNN) and the worst-case design methodologies of current electronic design automation tools, overclocking-based timing speculation is a promising technique to improve the performance of CNN accelerators on FPGA by removing unnecessary timing margins. To avoid potential timing errors, timing delay measurement should be used during overclocking. However, current approaches are not yet good at measuring paths with more intense variability factors such as jitter, and lack an automated process for testing circuit delays. In this paper, we first propose 2-dimension multi-frame fusion to deal with the sampling jitter, then present a timing delay measurement-based automatic overclocking system (AOS) running on heterogeneous FPGA for high-performance CNN accelerators. On the FPGA side, AOS is composed of timing delay monitors (TDM) that can measure all types of timing paths and a TDM controller that converts the sampled values of TDMs into timing delay in terms of the ratio of path delay to the clock period. On the CPU side, AOS converts the path delay from clock period ratio to absolute delay value and decides the frequency of the accelerator in the next iteration. We demonstrate AOS with a SkyNet accelerator on the Xilinx ZCU104 board and achieve 657 FPS at 436 MHz without accuracy degradation, which is 1.41× the performance of the baseline.
KW - Delays
KW - Clocks
KW - Field programmable gate arrays
KW - Monitoring
KW - Time division multiplexing
KW - Registers
KW - Measurement uncertainty
UR - https://doi.org/10.1109/TCAD.2023.3235803
U2 - 10.1109/TCAD.2023.3235803
DO - 10.1109/TCAD.2023.3235803
M3 - Article
SN - 0278-0070
JO - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
JF - IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
ER -