Deep Learning for Noise Robust Distant Speech Recognition (PhD Thesis)

Khan, Danish ur Rehman

Deep Learning for Noise Robust Distant Speech Recognition (PhD Thesis) - Karachi : NED University of Engineering and Technology Department of Electronic Engineering, 2023 - xxi, 22-159 p. : ill

Includes Bibliographical References

Abstract
This thesis addresses an important knowledge gap concerning noise-robust distant speech recognition. The study aims to research and develop a procedure that improves speech recognition in distant and noisy scenarios. Information can be extracted from both clear and inarticulate speech signals with the help of speech signal analysis and processing, and machine learning algorithms provide robust analytical tools for signal exploration.
This dissertation comprises three parts. The first explores the best features for the analysis: the features are extracted, and a threshold is then applied to every feature to develop an algorithm. The second part implements an algorithm that serves as a classifier for distant and noisy speech. The third part compares the developed algorithm with conventional algorithms to assess its efficiency.
In this research we extracted and analyzed 14 signal features of the TensorFlow Speech Commands dataset without noise and with moderate and elevated noise. The features include MFCCs, RMS, CENS, Mel-scaled spectrogram, spectral centroid, tonal centroid (tonnetz), spectral contrast, poly features, STFT, chroma STFT, ZCR, LPCC, roll-off frequency, RASTA-PLP, and pitch. Through speech signal analysis and processing we selected the features most strongly associated with distant and noisy speech, then transformed the feature dataset for machine learning model implementation. We applied deep and machine learning models (convolutional neural network, LSTM, Random Forest, KNN, SVM, and a voting model) and compared all features on the simple, noisy, and very noisy datasets, together with an ensemble model.
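As an illustration of the frame-level feature extraction described above, the following is a minimal sketch of two of the listed features, zero-crossing rate and RMS energy, in plain Python. It is not the thesis implementation: the function names, the frame length of 400 samples, the hop of 160 samples, and the synthetic 440 Hz test tone are all assumptions chosen for the example.

```python
import math


def frame_signal(samples, frame_length, hop_length):
    """Split a list of samples into overlapping frames (no padding)."""
    frames = []
    for start in range(0, len(samples) - frame_length + 1, hop_length):
        frames.append(samples[start:start + frame_length])
    return frames


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def rms_energy(frame):
    """Root-mean-square amplitude of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def extract_features(samples, frame_length=400, hop_length=160):
    """Return one (zcr, rms) pair per frame. Frame sizes are assumed values."""
    frames = frame_signal(samples, frame_length, hop_length)
    return [(zero_crossing_rate(f), rms_energy(f)) for f in frames]


# Hypothetical input: 0.1 s of a 440 Hz sine tone sampled at 16 kHz.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(1600)]
features = extract_features(tone)
```

Each entry of `features` is one row of a per-frame feature table; a real pipeline would add the remaining features as further columns before training a classifier.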
We compared the machine and deep learning ensemble model against every individual model, performed recognition, and compared the features and results for speech with and without noise and distance.
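The ensemble comparison described above can be sketched with a simple majority (hard) vote over the label predictions of several models. The abstract does not state how the thesis combines its models, so hard voting is an assumption here, and the model outputs and command labels below are hypothetical values made up for the example.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine label predictions from several models by majority vote.

    predictions_per_model holds one list of labels per model, all of equal
    length. Ties go to the label predicted by the earliest-listed model.
    """
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = [preds[i] for preds in predictions_per_model]
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    correct = sum(1 for p, e in zip(predicted, expected) if p == e)
    return correct / len(expected)


# Hypothetical predictions from three models on four utterances.
truth = ["yes", "no", "up", "no"]
model_a = ["yes", "no", "down", "no"]
model_b = ["yes", "up", "up", "no"]
model_c = ["no", "no", "up", "yes"]

ensemble = majority_vote([model_a, model_b, model_c])
individual_scores = [accuracy(m, truth) for m in (model_a, model_b, model_c)]
ensemble_score = accuracy(ensemble, truth)
```

In this toy data each model makes at least one error, but the errors fall on different utterances, so the vote recovers every label. That is the general motivation for comparing an ensemble against its individual members.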
The major findings described in the thesis indicate that:
1. Four features are highlighted in this study: MFCCs, Mel-scaled spectrogram, poly features, and ZCR. These features have a significant effect on the classification accuracy of the algorithm.
2. The developed Robust Ensemble algorithm serves as the classifier for distant and noisy speech recognition.
3. The implemented algorithm achieves 97% accuracy in distant and noisy speech recognition.
4. The machine and deep learning ensemble model was applied to recognition and compared with all individual models. The results show an improved classification accuracy of almost 97% in most distant and noisy speech recognition cases, with a minor trade-off.
5. The correlation method yielded a reduced feature set that shows improved performance.
6. Further improvement can be achieved by using Robust Ensemble deep learning algorithms on large data sets.
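
The correlation-based feature reduction mentioned in finding 5 can be sketched as follows. The abstract does not specify the correlation measure or the selection rule, so this example assumes the Pearson correlation between each feature column and a numeric class label, with a hypothetical cut-off of 0.5 on its absolute value. The feature names and values are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / math.sqrt(var_x * var_y)


def select_features(columns, labels, threshold=0.5):
    """Keep features whose absolute correlation with the label meets the threshold."""
    scores = {name: pearson(values, labels) for name, values in columns.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return kept, scores


# Hypothetical feature table: six utterances, two classes (0 and 1).
labels = [0, 0, 0, 1, 1, 1]
columns = {
    "feature_a": [0.1, 0.2, 0.15, 0.8, 0.9, 0.85],  # rises with the label
    "feature_b": [0.5, 0.1, 0.9, 0.5, 0.1, 0.9],    # unrelated to the label
    "feature_c": [1.0, 0.9, 0.95, 0.2, 0.1, 0.15],  # falls with the label
}

kept, scores = select_features(columns, labels)
```

Here `feature_b` is dropped because its values are distributed identically in both classes, while the other two columns are kept regardless of the sign of their correlation.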





Deep Learning Thesis
Distant Speech Recognition Thesis
LSTM Thesis

006.454378242 / KHA