CLRL-Tuning: A Novel Continual Learning Approach for Automatic Speech Recognition

Zhihan Wang, Feng Hou, Ruili Wang

Research output: Journal Publication › Conference article › peer-review

Abstract

In this paper, we propose a novel Continual Learning approach, Randomly Layer-wise Tuning (CLRL-Tuning), for pre-trained Automatic Speech Recognition (ASR) models. CLRL-Tuning copes with the randomness of subsequent datasets by updating the parameters of randomly selected encoder layers of a pre-trained model (such as wav2vec 2.0) in every training epoch. Unlike previous approaches, CLRL-Tuning neither reuses previous datasets nor expands or reruns previous models. Furthermore, we evaluate our approach against four strong baselines, including Knowledge Distillation and Gradient Episodic Memory, and our approach achieves significant improvements over the baselines in average word error rate (WER) for the wav2vec 2.0 model. Additionally, we conduct ablation studies by tuning one, three, six, and all encoder layers of the model; the experimental results show that tuning only one encoder layer per training epoch is the most effective way to mitigate catastrophic forgetting.
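The core mechanism described above, selecting which encoder layer(s) to unfreeze anew at each epoch, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the layer count and the mask-based freezing interface are assumptions standing in for a real model such as wav2vec 2.0.

```python
import random


def select_trainable_layer(num_encoder_layers: int, rng=random) -> int:
    """Pick one encoder layer index uniformly at random for this epoch."""
    return rng.randrange(num_encoder_layers)


def epoch_trainable_mask(num_encoder_layers: int, rng=random) -> list:
    """Return a per-layer boolean mask: True only for the layer to tune.

    In a real framework, this mask would drive per-layer parameter freezing
    (e.g. setting requires_grad on each encoder layer's parameters).
    """
    chosen = select_trainable_layer(num_encoder_layers, rng)
    return [i == chosen for i in range(num_encoder_layers)]


if __name__ == "__main__":
    rng = random.Random(0)
    # wav2vec 2.0 Base has 12 transformer encoder layers (an assumption
    # about the backbone; the paper also uses wav2vec 2.0).
    for epoch in range(3):
        mask = epoch_trainable_mask(12, rng)
        print(f"epoch {epoch}: tuning encoder layer {mask.index(True)}")
```

Per the ablation result quoted in the abstract, the mask unfreezes exactly one layer; tuning three, six, or all layers would simply widen the set of `True` entries.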

Original language: English
Pages (from-to): 1279-1283
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2023-August
DOIs
Publication status: Published - 2023
Externally published: Yes
Event: 24th Annual Conference of the International Speech Communication Association, Interspeech 2023 - Dublin, Ireland
Duration: 20 Aug 2023 – 24 Aug 2023

Keywords

  • automatic speech recognition
  • continual learning
  • fine-tuning
  • lifelong learning
  • partial layers tuning
  • pre-trained model
  • self-adaptation

ASJC Scopus subject areas

  • Language and Linguistics
  • Human-Computer Interaction
  • Signal Processing
  • Software
  • Modelling and Simulation
