TY - GEN
T1 - TAIT: One-Shot Full-Integer Lightweight DNN Quantization via Tunable Activation Imbalance Transfer
T2 - 58th ACM/IEEE Design Automation Conference, DAC 2021
AU - Jiang, Weixiong
AU - Yu, Heng
AU - Liu, Xinzhe
AU - Sun, Hao
AU - Li, Rui
AU - Ha, Yajun
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/12/5
Y1 - 2021/12/5
AB - Both parameter quantization and depthwise convolution are essential measures for providing high-accuracy, lightweight, and resource-friendly solutions when deploying deep neural networks (DNNs) onto edge-AI devices. However, combining the two methodologies can have adverse effects: it either suffers from significant accuracy loss or requires a long finetuning time. In addition, contemporary quantization methods are applied only to weight and activation values but not to bias and scaling-factor values, making them less practical for ASIC/FPGA accelerators. To solve these issues, we propose a novel quantization framework that is effectively optimized for depthwise convolution networks. We discover that the uniformity of the value range within a tensor can serve as a predictor of the tensor's quantization error. Guided by this predictor, we develop a mechanism called Tunable Activation Imbalance Transfer (TAIT), which tunes the value-range uniformity between an activated feature map and the weights of the following layer. Moreover, TAIT fully supports full-integer quantization. We demonstrate TAIT on SkyNet and deploy it on FPGA. Compared to the state-of-the-art, our quantization framework and system design achieve 2.2%+ IoU, 2.4× speed, and 1.8× energy-efficiency improvements, without any requirement of finetuning.
UR - http://www.scopus.com/inward/record.url?scp=85114648944&partnerID=8YFLogxK
U2 - 10.1109/DAC18074.2021.9586109
DO - 10.1109/DAC18074.2021.9586109
M3 - Conference contribution
AN - SCOPUS:85114648944
T3 - Proceedings - Design Automation Conference
SP - 1027
EP - 1032
BT - 2021 58th ACM/IEEE Design Automation Conference, DAC 2021
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 5 December 2021 through 9 December 2021
ER -