Automatic Speech Segmentation Through Forward and Inverse Characteristics of The Vocal Tract (PhD Thesis) (Record no. 701343)

MARC details
000 -LEADER
fixed length control field 03925nam a2200193 4500
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 230223s2021 |||||||| |||| 00| 0 eng d
022 ## - INTERNATIONAL STANDARD SERIAL NUMBER
ISSN-L phd
041 ## - LANGUAGE CODE
Language code of text/sound track or separate title English
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 006.454378242
Item number JAV
100 ## - MAIN ENTRY--PERSONAL NAME
Relator term author
Personal name Javed, Muhammad
9 (RLIN) 693004
245 ## - TITLE STATEMENT
Title Automatic Speech Segmentation Through Forward and Inverse Characteristics of The Vocal Tract (PhD Thesis)
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Place of publication, distribution, etc. Karachi,
Name of publisher, distributor, etc. NED University of Engineering and Technology Department of Electrical Engineering
Date of publication, distribution, etc. 2021
300 ## - PHYSICAL DESCRIPTION
Extent XI, 129 p.
Other physical details : ill
520 ## - SUMMARY, ETC.
Summary, etc. Abstract :<br/> <br/>Speech segmentation refers to splitting the continuous speech signal into syllable, word and phoneme segments. Time-aligned, segmented and labeled speech at the phoneme level is used to develop large corpora. A precisely time-aligned corpus at the phoneme level is significant for linguistic research and for building automatic speech recognition (ASR), speaker verification and speech synthesis systems. Manual segmentation is considered more accurate than automatic segmentation, because humans are better at locating the boundaries of distinct events based on the inherent temporal and spectral cues in the speech signal. However, for the development of a large corpus, it is very time-consuming, painstaking, laborious, costly and human-resource intensive. For this reason, the use of automatic phonetic time-alignment is currently inevitable. The accuracy of automatic phonetic alignment rests on the hypothesis that it can reach a level close to human performance by utilizing the evidence in the way human experts do. <br/>This thesis presents an unsupervised (implicit) method of automatic speech segmentation to identify phoneme boundaries in utterances. A framework based on Cosine distance scores (CDS) is designed by employing short-time spectral and temporal speech characteristics. This framework informs the user about the performance of various speech processing techniques, thus simplifying the selection of an appropriate technique from among the many available options. Furthermore, a systematic step-by-step study of the spectral characteristics of the phonemes in the TIMIT dataset [1] is made. Based on their spectral characteristics, the phonemes are grouped into the 0-2 kHz, 0-4 kHz and 4-8 kHz frequency bands. In order to model the vocal tract dynamics in these three distinct frequency ranges, 'Selective Linear Predictive' (SLP) analysis has been employed. 
It helps in extracting the spectral information present in each of these three frequency ranges individually. <br/><br/>In this thesis, we use SLP-based forward and inverse characteristics of the vocal tract to develop an automatic segmentation technique at the phoneme level. Based on the developed framework, a novel feature is formed. The proposed feature combines the best scores of each frequency range and is named "Extended Forward and Inverse Characteristic of Vocal Tract using selective Linear predictive analysis (EFICV)". <br/><br/>The performance of EFICV is evaluated against the manually marked phoneme boundaries of the TIMIT dataset. The accuracy of the proposed EFICV system, in agreement with the TIMIT boundaries, is found to be 61.13 %, 81.4 %, 85.82 % and 88.85 % within 10 msec, 20 msec, 30 msec and 105 msec respectively. The error rate is found to be 17.96 %. The percent improvement in boundaries with respect to the state of the art in the 5 msec, 10 msec, 15 msec, 20 msec, 25 msec and 30 msec accuracy ranges is 4.27 %, 14.04 %, 12.59 %, 9.6 % and 6.60 % respectively, whereas the error rate at 30 msec is reduced to 30.7 %. The results show that the developed EFICV technique outperforms current state-of-the-art schemes in terms of both accuracy and error rate. <br/><br/>
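The abstract describes locating phoneme boundaries from cosine distance scores between short-time spectral feature frames and scoring the detected boundaries against TIMIT within fixed tolerances (10 msec, 20 msec, and so on). The following is a minimal Python sketch of that general idea under stated assumptions, not the thesis's EFICV implementation. It assumes per-frame feature vectors are already computed (in the thesis they would be derived from SLP-based forward and inverse vocal-tract characteristics in the 0-2, 0-4 and 4-8 kHz bands; here they are toy inputs). The function names, the local-maximum peak picker with a fixed threshold, and the greedy one-to-one matching used for scoring are all hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Sequence

Vector = Sequence[float]


def cosine_distance(u: Vector, v: Vector) -> float:
    """Return 1 - cos(angle between u and v): 0 for same direction, 1 for orthogonal."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # Assumption: an all-zero (silent) frame is treated as "no spectral change".
        return 0.0
    return 1.0 - dot / (norm_u * norm_v)


def distance_curve(frames: Sequence[Vector]) -> List[float]:
    """Cosine distance between each frame and the frame that follows it."""
    return [cosine_distance(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]


def pick_boundaries(curve: Sequence[float], frame_shift_ms: float, threshold: float) -> List[float]:
    """Hypothetical peak picker: local maxima of the curve that reach `threshold`.

    curve[i] compares frame i with frame i + 1, so a peak there is reported
    at the start time of frame i + 1, i.e. (i + 1) * frame_shift_ms.
    """
    boundaries = []
    for i, d in enumerate(curve):
        left = curve[i - 1] if i > 0 else float("-inf")
        right = curve[i + 1] if i + 1 < len(curve) else float("-inf")
        if d >= threshold and d > left and d >= right:
            boundaries.append((i + 1) * frame_shift_ms)
    return boundaries


def hit_rate(reference_ms: Sequence[float], detected_ms: Sequence[float], tolerance_ms: float) -> float:
    """Percentage of reference boundaries matched by a distinct detected boundary
    within +/- tolerance_ms (greedy one-to-one matching, nearest detection wins)."""
    if not reference_ms:
        return 0.0
    unused = sorted(detected_ms)
    hits = 0
    for ref in sorted(reference_ms):
        best: Optional[int] = None
        for j, det in enumerate(unused):
            if abs(det - ref) <= tolerance_ms and (best is None or abs(det - ref) < abs(unused[best] - ref)):
                best = j
        if best is not None:
            unused.pop(best)  # each detection may count for only one reference boundary
            hits += 1
    return 100.0 * hits / len(reference_ms)


if __name__ == "__main__":
    # Toy "feature" frames standing in for per-frame spectral vectors (assumed input).
    frames = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3 + [[1.0, 1.0]] * 2
    curve = distance_curve(frames)
    detected = pick_boundaries(curve, frame_shift_ms=10.0, threshold=0.2)
    print(detected)  # [30.0, 60.0]
    print(hit_rate([32.0, 75.0], detected, tolerance_ms=20.0))  # 100.0
    print(hit_rate([32.0, 75.0], detected, tolerance_ms=10.0))  # 50.0
```

The toy frames change at frames 3 and 6, so with a 10 msec frame shift the sketch reports boundaries at 30 and 60 msec. Scored against hypothetical reference boundaries at 32 and 75 msec, it reaches 100 % within 20 msec but only 50 % within 10 msec, which illustrates why tolerance-based accuracy figures such as those in the abstract grow as the tolerance widens.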
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
9 (RLIN) 674020
Topical term or geographic name entry element Automatic Speech Recognition Thesis
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
9 (RLIN) 882807
Topical term or geographic name entry element Implicit Segmentation Thesis
856 ## - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier <a href="https://eaklibrary.neduet.edu.pk:8443/catalog/bk/books/toc/*.pdf">https://eaklibrary.neduet.edu.pk:8443/catalog/bk/books/toc/*.pdf</a>
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme Dewey Decimal Classification
Suppress in OPAC No
Koha item type PHD Thesis
Holdings
Source of classification or shelving scheme | Physical form | Home library | Current library | Shelving location | Date acquired | Stock type | Full call number | Barcode | Date last seen | Accession date | Koha item type
Dewey Decimal Classification | Text, Hardcover | Government Document Section | Government Document Section | Govt Publication Section | 23/02/2023 | Donation | 006.454378242 JAV | 97710 | 23/02/2023 | 23/02/2023 | Reference Collection
Dewey Decimal Classification | Text, Hardcover | Government Document Section | Government Document Section | Govt Publication Section | 23/02/2023 | Donation | 006.454378242 JAV | 97711 | 23/02/2023 | 23/02/2023 | Reference Collection