
Automatic Speech Segmentation Through Forward and Inverse Characteristics of The Vocal Tract (PhD Thesis) (Record no. 701343)

MARC details
000 -LEADER
fixed length control field	03925nam a2200193 4500
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	230223s2021 |||||||| |||| 00| 0 eng d
022 ## - INTERNATIONAL STANDARD SERIAL NUMBER
ISSN-L	phd
041 ## - LANGUAGE CODE
Language code of text/sound track or separate title	English
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number	006.454378242
Item number	JAV
100 ## - MAIN ENTRY--PERSONAL NAME
Relator term	author
Personal name	Javed, Muhammad
9 (RLIN)	693004
245 ## - TITLE STATEMENT
Title	Automatic Speech Segmentation Through Forward and Inverse Characteristics of The Vocal Tract (PhD Thesis)
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Place of publication, distribution, etc.	Karachi,
Name of publisher, distributor, etc.	NED University of Engineering and Technology Department of Electrical Engineering
Date of publication, distribution, etc.	2021
300 ## - PHYSICAL DESCRIPTION
Extent	XI, 129 p.
Other physical details	: ill
520 ## - SUMMARY, ETC.
Summary, etc.	Abstract:
Speech segmentation refers to the splitting of a continuous speech signal into syllable, word and phoneme segments. Time-aligned, segmented and labeled speech at the phoneme level is used to develop large corpora. A precisely time-aligned corpus at the phoneme level is significant for linguistic research and for building automatic speech recognition (ASR), speaker verification and speech synthesis systems. Manual segmentation is considered more accurate than automatic segmentation, because humans are better at locating the boundaries of distinct events from the inherent temporal and spectral cues in the speech signal. For the development of a large corpus, however, it is time-consuming, laborious, costly and human-resource intensive. For this reason, the use of automatic means of phonetic time-alignment is currently inevitable. The accuracy of an automatic phonetic alignment method rests on the hypothesis that it can reach a level close to human performance by using the evidence in the way human experts do.
This thesis presents an unsupervised (implicit) method of automatic speech segmentation to identify the phoneme boundaries in utterances. A framework is designed that employs short-time spectral and temporal speech characteristics and is based on Cosine distance scores (CDS). This framework informs the user about the performance of various speech processing techniques, thus simplifying the selection of an appropriate technique from among the many available options. Furthermore, a systematic step-by-step study of the phonetic spectral characteristics of the TIMIT dataset [1] is made. Based on their spectral characteristics, the phonemes are grouped into the 0-2 kHz, 0-4 kHz and 4-8 kHz frequency bands. To model the vocal tract dynamics in these three distinct frequency ranges, 'Selective Linear Predictive' (SLP) analysis has been employed. It helps extract the spectral information present in each of these three frequency ranges individually.
In this thesis, we use SLP-based forward and inverse characteristics of the vocal tract in developing the automatic segmentation technique at the phoneme level. Based on the developed framework, a novel feature is formed. The proposed feature combines the best scores of each frequency range and is named "Extended Forward and Inverse Characteristic of Vocal Tract using selective Linear predictive analysis (EFICV)".
The performance of the EFICV is evaluated against the manually marked phoneme boundaries of the TIMIT dataset. The accuracy of the proposed EFICV system is found to be 61.13 %, 81.4 %, 85.82 % and 88.85 % in 10 msec, 20 msec, 30 msec and 105 msec respectively, in agreement with TIMIT boundaries. The error rate is found to be 17.96 %. The percent improvement in boundaries with respect to the state-of-the-art in the 5 msec, 10 msec, 15 msec, 20 msec, 25 msec, and 30 msec accuracy range is 4.27 %, 14.04 %, 12.59 %, 9.6 % and 6.60 % respectively. The error rate in 30 msec is reduced to 30.7 %. The results show that the developed EFICV technique outperforms the current state-of-the-art schemes in terms of both accuracy and error rate.
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
9 (RLIN)	674020
Topical term or geographic name entry element	Automatic Speech Recognition Thesis
650 #0 - SUBJECT ADDED ENTRY--TOPICAL TERM
9 (RLIN)	882807
Topical term or geographic name entry element	Implicit Segmentation Thesis
856 ## - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier	https://eaklibrary.neduet.edu.pk:8443/catalog/bk/books/toc/.pdf
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme	Dewey Decimal Classification
Suppress in OPAC	No
Koha item type	PHD Thesis

Holdings
Withdrawn status	Lost status	Source of classification or shelving scheme	Physical Form	Damaged status	Not for loan	Home library	Current library	Shelving location	Date acquired	Stock Type	Total Checkouts	Full call number	Barcode	Date last seen	Accession Date	Koha item type
		Dewey Decimal Classification	Text, Hardcover			Government Document Section	Government Document Section	Govt Publication Section	23/02/2023	Donation		006.454378242 JAV	97710	23/02/2023	23/02/2023	Reference Collection
		Dewey Decimal Classification	Text, Hardcover			Government Document Section	Government Document Section	Govt Publication Section	23/02/2023	Donation		006.454378242 JAV	97711	23/02/2023	23/02/2023	Reference Collection
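
Illustrative note on the summary (field 520): the abstract mentions boundary detection based on Cosine distance scores and reports agreement with TIMIT boundaries within tolerances such as 10, 20 and 30 msec. The following is a minimal sketch of those two ideas only, not the thesis's EFICV feature or its SLP analysis. All function names, the peak-picking rule, the threshold and the toy frames are assumptions introduced here for illustration.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero frame is treated as "no change"
    return 1.0 - dot / (norm_u * norm_v)


def candidate_boundaries(frames, hop_ms, threshold):
    """Return times (ms) where the adjacent-frame distance is a local peak
    at or above `threshold`. The peak rule is an assumption for this sketch."""
    scores = [cosine_distance(frames[i], frames[i + 1])
              for i in range(len(frames) - 1)]
    times = []
    for i, score in enumerate(scores):
        left = scores[i - 1] if i > 0 else -1.0
        right = scores[i + 1] if i + 1 < len(scores) else -1.0
        if score >= threshold and score > left and score >= right:
            times.append((i + 1) * hop_ms)  # boundary at the start of frame i+1
    return times


def agreement(detected_ms, reference_ms, tolerance_ms):
    """Percentage of reference boundaries that have at least one detected
    boundary within `tolerance_ms`. Simplification: one detection may be
    counted for more than one reference boundary."""
    if not reference_ms:
        return 0.0
    hits = sum(
        1 for ref in reference_ms
        if any(abs(det - ref) <= tolerance_ms for det in detected_ms)
    )
    return 100.0 * hits / len(reference_ms)


# Toy two-dimensional "spectral" frames with a 10 ms hop (hypothetical data).
frames = [[1, 0], [1, 0.1], [0, 1], [0.1, 1], [1, 1], [1, 1]]
detected = candidate_boundaries(frames, hop_ms=10, threshold=0.2)
reference = [25, 70]  # hypothetical hand-marked boundaries in ms
for tol in (10, 20, 30):
    print(tol, agreement(detected, reference, tol))
```

Widening the tolerance can only keep or raise the agreement figure, which is why accuracy in the summary increases from the 10 msec to the 30 msec range.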
