[This article belongs to Volume - 38, Issue - 05]

LASSO-Based Acoustic Modelling for Tamil Text-to-Speech Synthesis

Speech synthesis is the field of computer science that focuses on designing computer systems that produce spoken output from written text. A computer can convert written text into voice and deliver it over a channel such as a telephone. In this research, a novel deep learning based Tamil text-to-speech approach is proposed. First, features are extracted from the text using Principal Component Analysis (PCA), and features are extracted from the speech signals as Mel Frequency Cepstral Coefficients (MFCC). These feature sets are fused using the least absolute shrinkage and selection operator (LASSO), and a Convolutional Deep Belief Network (CDBN) is trained on the fused features. The proposed method is implemented in Python and uses real-time data. Its performance is evaluated in terms of false positive rate, precision, accuracy, specificity, F1 score, and recall. The proposed technique yields more accurate results than existing deep neural networks, and it improves accuracy by 10.25% over the existing hidden Markov model (HMM) method.
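The six evaluation measures named in the abstract can all be derived from the four counts of a confusion matrix. The sketch below is a minimal illustration, assuming a binary decision with true/false positive and negative counts; the abstract does not state how the counts are obtained for the speech synthesis task. The names `ConfusionCounts` and `evaluation_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for binary confusion-matrix counts."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluation_metrics(c: ConfusionCounts) -> dict:
    """Compute the six measures listed in the abstract from raw counts."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "false_positive_rate": _ratio(c.fp, c.fp + c.tn),
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the paper.
    example = ConfusionCounts(tp=40, fp=10, tn=45, fn=5)
    for name, value in evaluation_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

Specificity and false positive rate always sum to 1 under these definitions, so reporting both is a consistency check, not independent evidence.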