## QUALIFYING Exam Committee: CASSIO TRINDADE BATISTA

A DOCTORAL qualifying exam committee has been registered by the program.

STUDENT: CASSIO TRINDADE BATISTA

DATE: 18/02/2020

TIME: 09:00

LOCATION: Room FC-02, ICEN/UFPA

TITLE:

Towards Utterance Copy via Deep Neuroevolution: From Supervised to Unsupervised Training of DNN Weights Using Genetic Algorithms

KEYWORDS:

Utterance copy. Neuroevolution. Deep learning. Genetic algorithm. Speech synthesis.

PAGES: 60

MAJOR AREA: Exact and Earth Sciences

AREA: Computer Science

SUBAREA: Computer Systems

ABSTRACT:

Utterance copy, also known as speech imitation, is the task of estimating the parameters of an input, target speech signal in order to artificially reconstruct another signal with the same properties at the output. This can be considered a difficult inverse problem, since the input-output relationship is often non-linear, apart from having several parameters to be estimated and adjusted. This work describes the development of an application that learns how to estimate the input parameters of a formant-based speech synthesizer called Klatt. Formant-based synthesizers do not reach state-of-the-art performance for text-to-speech (TTS) applications, but they are an important tool for linguistic studies due to the high interpretability of their input parameters. Some success has already been achieved by applying a supervised-trained long short-term memory (LSTM) neural network to learn how to estimate Klatt's input parameters. However, when compared to a baseline software called WinSnoori with respect to similarity measures such as perceptual evaluation of speech quality (PESQ), root mean square error (RMSE), signal-to-noise ratio (SNR), and log-spectral distance (LSD), LSTMs trained only on synthetic data proved effective only at estimating the parameters of synthetic voices, as expected. The need for an architecture that allows natural-voice samples to be used already at the training stage then became evident. Thus, we switched from a supervised to an unsupervised approach, and decided to investigate whether applying traditional search optimization methods to update the weights of a multivariate regressor might improve results. In this work, genetic algorithms (GA) are being studied as an alternative to the conventional training algorithms for deep neural networks (DNN), such as back-propagation (BP) and stochastic gradient descent (SGD), or even high-level optimizers such as Adam. This combination of GAs and DNNs is known as deep neuroevolution (DNE).
Unlike BP-SGD, which depends on the Klatt parameters from synthetic speech as prior knowledge in order to compute the error gradients, DNE can cope with the unavailability of such ground-truth labels in natural speech data. Finally, we conjecture that using a GA instead of a gradient-based method to train a DNN would result in better scores when dealing with natural voices, in spite of the computational resources required for it.
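The DNE idea in the abstract can be illustrated with a minimal sketch: a population of weight vectors for a tiny regressor is evolved by a plain generational GA, with fitness defined as negative resynthesis RMSE against the target signal, so no ground-truth parameter labels are needed. The toy `synthesize` function, the network sizes, and all GA hyperparameters below are illustrative assumptions, not the thesis setup (which targets the Klatt synthesizer).

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a synthesizer: parameter vector -> "signal".
# (Hypothetical; the actual work uses the Klatt formant synthesizer.)
def synthesize(params):
    t = np.linspace(0, 1, 100)
    return params[0] * np.sin(2 * np.pi * params[1] * t)

target = synthesize(np.array([0.8, 5.0]))  # signal whose parameters we want back

# Tiny regressor (100 -> 3 -> 2) whose flat weight vector the GA evolves.
N_W = 100 * 3 + 3 + 3 * 2 + 2  # = 311

def predict(weights, signal):
    W1 = weights[:300].reshape(100, 3)
    b1 = weights[300:303]
    W2 = weights[303:309].reshape(3, 2)
    b2 = weights[309:311]
    return np.tanh(signal @ W1 + b1) @ W2 + b2

# Label-free fitness: resynthesize from the estimated parameters and
# compare waveforms (negative RMSE, higher is better).
def fitness(weights):
    resynth = synthesize(predict(weights, target))
    return -np.sqrt(np.mean((resynth - target) ** 2))

# Generational GA: truncation selection + Gaussian mutation + elitism.
pop = rng.normal(0.0, 0.1, size=(64, N_W))
history = []
for gen in range(50):
    scores = np.array([fitness(ind) for ind in pop])
    history.append(scores.max())
    elite = pop[np.argsort(scores)[-8:]]            # keep the 8 best
    parents = elite[rng.integers(0, 8, size=64)]    # resample parents
    pop = parents + rng.normal(0.0, 0.02, size=parents.shape)
    pop[0] = elite[-1]                              # elitism: best survives unmutated

best = max(pop, key=fitness)
print("final RMSE:", -fitness(best))
```

Because the best individual is carried over unmutated each generation, the best fitness in `history` is non-decreasing; the same loop structure scales to the large populations and parameter counts the abstract alludes to, at the cost of one full resynthesis per individual per generation.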

COMMITTEE MEMBERS:

Chair - 2659210 - NELSON CRUZ SAMPAIO NETO

Internal - 1176325 - ALDEBARO BARRETO DA ROCHA KLAUTAU JUNIOR

Internal - 2323064 - FILIPE DE OLIVEIRA SARAIVA

External to the Program - 3132807 - REGINALDO CORDEIRO DOS SANTOS FILHO

External to the Institution - HELENA DE MEDEIROS CASELI