000 03925nam a2200193 4500
008 230223s2021 |||||||| |||| 00| 0 eng d
022 _lphd
041 _aeng
082 _a006.454378242
_bJAV
100 _eAU
_aJaved, Muhammad
_9693004
245 _aAutomatic Speech Segmentation Through Forward and Inverse Characteristics of The Vocal Tract (PhD Thesis)
260 _aKarachi,
_bNED University of Engineering and Technology Department of Electrical Engineering
_c2021
300 _aXI, 129 p.
_b: ill
520 _aAbstract: Speech segmentation refers to splitting the continuous speech signal into syllable, word and phoneme segments. Speech that is segmented, labeled and time-aligned at the phoneme level is used to develop large corpora. A precisely time-aligned phoneme-level corpus is significant for linguistic research and for building automatic speech recognition (ASR), speaker verification and speech synthesis systems. Manual segmentation is considered more accurate than automatic segmentation, because humans are better at locating the boundaries of distinct events from the inherent temporal and spectral cues in the speech signal. However, for the development of a large corpus, manual segmentation is very time-consuming, painstaking, laborious, costly and human-resource intensive. For this reason, automatic phonetic time-alignment is currently unavoidable. Automatic phonetic alignment is based on the hypothesis that its accuracy can approach human performance if it uses the evidence in the way human experts do. This thesis presents an unsupervised (implicit) method of automatic speech segmentation that identifies phoneme boundaries in utterances. A framework is designed that employs short-time spectral and temporal speech characteristics and is based on cosine distance scores (CDS). This framework informs the user about the performance of various speech processing techniques, thus simplifying the selection of an appropriate technique from among the many available options. Furthermore, a systematic step-by-step study of the phonetic spectral characteristics of the TIMIT dataset [1] is made. Based on their spectral characteristics, the phonemes are grouped into the 0-2 kHz, 0-4 kHz and 4-8 kHz frequency bands. To model the vocal tract dynamics in these three distinct frequency ranges, selective linear predictive (SLP) analysis has been employed.
SLP allows the spectral information present in each of these three frequency ranges to be extracted individually. In this thesis, we use SLP-based forward and inverse characteristics of the vocal tract to develop the automatic segmentation technique at the phoneme level. Based on the developed framework, a novel feature is formed. The proposed feature combines the best scores of each frequency range and is named "Extended Forward and Inverse Characteristic of Vocal Tract using selective Linear predictive analysis (EFICV)". The performance of EFICV is evaluated against the manually marked phoneme boundaries of the TIMIT dataset. The accuracy of the proposed EFICV system, measured as agreement with the TIMIT boundaries, is found to be 61.13%, 81.4%, 85.82% and 88.85% within 10 msec, 20 msec, 30 msec and 105 msec, respectively. The error rate is found to be 17.96%. The percent improvement in boundary accuracy with respect to the state of the art in the 5 msec, 10 msec, 15 msec, 20 msec, 25 msec and 30 msec accuracy ranges is 4.27%, 14.04%, 12.59%, 9.6% and 6.60%, respectively, whereas the error rate within 30 msec is reduced to 30.7%. The results show that the developed EFICV technique outperforms current state-of-the-art schemes in terms of both accuracy and error rate.
650 0 _9674020
_aAutomatic Speech Recognition Thesis
650 0 _9882807
_aImplicit Segmentation Thesis
856 _uhttps://eaklibrary.neduet.edu.pk:8443/catalog/bk/books/toc/*.pdf
942 _2ddc
_n0
_cPHD
999 _c701343
_d701343