TY - GEN
T1 - Syllable nucleus durations estimation using linear regression based ensemble model
AU - Lu, Jingli
AU - Wang, Ruili
AU - De Silva, Liyanage C.
AU - Gao, Yang
PY - 2009
Y1 - 2009
N2 - Unlike conventional automatic continuous speech segmentation models that deal with each boundary time-mark individually, in this paper, we propose an interval-data-based Linear Regression Model for syllable nucleus Durations Estimation (LRM-DE), which treats syllable boundary time-marks in pairs. This characteristic of LRM-DE makes it more suitable for estimating syllable durations for English sentences, which can be used for sentence stress detection. LRM-DE combines the outcomes of multiple base automatic speech segmentation machines (ASMs) to generate final boundary time-marks that minimize the average distance between the predicted and reference boundary-pairs of syllable nuclei. Experimental results show that on the TIMIT dataset, LRM-DE reduces the average difference between the predicted syllable nucleus durations and their reference ones from 13.64 ms (the best result of a single ASM) to 11.81 ms. Also, LRM-DE improves the syllable nucleus segmentation accuracy from 81.59% to 83.98% within a tolerance of 20 ms.
AB - Unlike conventional automatic continuous speech segmentation models that deal with each boundary time-mark individually, in this paper, we propose an interval-data-based Linear Regression Model for syllable nucleus Durations Estimation (LRM-DE), which treats syllable boundary time-marks in pairs. This characteristic of LRM-DE makes it more suitable for estimating syllable durations for English sentences, which can be used for sentence stress detection. LRM-DE combines the outcomes of multiple base automatic speech segmentation machines (ASMs) to generate final boundary time-marks that minimize the average distance between the predicted and reference boundary-pairs of syllable nuclei. Experimental results show that on the TIMIT dataset, LRM-DE reduces the average difference between the predicted syllable nucleus durations and their reference ones from 13.64 ms (the best result of a single ASM) to 11.81 ms. Also, LRM-DE improves the syllable nucleus segmentation accuracy from 81.59% to 83.98% within a tolerance of 20 ms.
KW - Automatic speech segmentation
KW - Ensemble model
KW - Multiple linear regression
UR - http://www.scopus.com/inward/record.url?scp=70349209384&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2009.4960717
DO - 10.1109/ICASSP.2009.4960717
M3 - Conference contribution
AN - SCOPUS:70349209384
SN - 9781424423545
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 4849
EP - 4852
BT - 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing - Proceedings, ICASSP 2009
T2 - 2009 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2009
Y2 - 19 April 2009 through 24 April 2009
ER -