News

DEFENSE Committee: ANA LARISSA DA SILVA DIAS

A MASTER'S DEFENSE committee has been registered by the program.
STUDENT: ANA LARISSA DA SILVA DIAS
DATE: 09/07/2021
TIME: 09:00
LOCATION: REMOTE EVENT BY VIDEOCONFERENCE
TITLE:

Towards a Free, Forced Phonetic Aligner for Brazilian Portuguese Using Kaldi Tools


KEYWORDS:

Phonetic alignment; Acoustic modeling; Kaldi; Brazilian Portuguese; Deep Neural Networks


PAGES: 60
BROAD AREA: Exact and Earth Sciences
AREA: Computer Science
SUBAREA: Computing Systems
ABSTRACT:

Phonetic analysis of speech generally requires aligning audio samples to their phonetic transcription. This task can be performed manually for a handful of files, but as the corpus grows it becomes infeasibly time-consuming, which underscores the need for computational tools that perform such speech-to-phoneme alignment automatically. Several open-source forced alignment tools are currently available. Most of them are built upon HTK, a widespread speech recognition toolkit, but the most recent aligners use Kaldi, a toolkit that has been the state of the art for open-source speech recognition and that provides, among other advantages over HTK, a deep neural network (DNN) framework for acoustic modeling. Although several free forced aligners are available, they often provide acoustic models only for languages with more freely available speech resources, such as English. FalaBrasil, a research group on speech and natural language processing for Brazilian Portuguese (BP) at the Federal University of Pará, therefore works to narrow the resource gap for BP by providing phonetic dictionaries, language models, acoustic models, voice and text corpora, and, among others, an HTK-based automatic phonetic alignment tool for BP called UFPAlign. Given the scarce availability of tools for BP and the need to update FalaBrasil's HTK-based aligner, we describe the evolution process towards creating a free phonetic alignment tool for BP using the Kaldi toolkit. The updated UFPAlign works under Linux environments and provides a user-friendly graphical interface as a plugin to Praat, a free software package for speech analysis in phonetics.
The contributions of this work are twofold: developing resources to perform forced alignment in BP through UFPAlign, including five acoustic models built with Kaldi, under open licenses; and comparing the result with two other phonetic aligners that provide resources for BP, the outdated HTK-based UFPAlign and the Montreal Forced Aligner (MFA), the latter also being Kaldi-based. Evaluation was carried out in terms of the phone boundary metric over a dataset of 200 hand-aligned utterances. Results show that Kaldi-based aligners perform better overall and that the models from the updated UFPAlign are more accurate than MFA's. Furthermore, complex deep-learning-based approaches did not seem to improve performance compared to simpler models.
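
As a rough illustration of what a phone boundary comparison can look like, the sketch below measures the absolute time difference between hand-aligned and automatically produced phone boundaries, then reports the mean error and the share of boundaries within a tolerance. The abstract does not define the metric in detail, so the function names, the 20 ms tolerance, and the boundary times are all assumptions made for illustration, not the procedure used in the dissertation.

```python
from statistics import mean


def phone_boundary_errors(reference, predicted):
    """Absolute differences (in seconds) between matching phone boundaries."""
    if len(reference) != len(predicted):
        raise ValueError("both alignments must contain the same number of phones")
    return [abs(r - p) for r, p in zip(reference, predicted)]


def summarize(errors, tolerance=0.02):
    """Return the mean error and the share of boundaries within `tolerance` seconds.

    The 20 ms default is an assumed threshold, not one taken from the source.
    """
    within = sum(1 for e in errors if e <= tolerance)
    return mean(errors), within / len(errors)


# Hypothetical boundary times (seconds) for one utterance, invented for illustration.
hand_aligned = [0.10, 0.25, 0.40, 0.62]
automatic = [0.11, 0.24, 0.45, 0.62]

errors = phone_boundary_errors(hand_aligned, automatic)
mean_error, share_within = summarize(errors)
print(f"mean boundary error: {mean_error * 1000:.1f} ms")
print(f"boundaries within tolerance: {share_within:.0%}")
```

Under this reading, an aligner is more accurate when its mean boundary error is lower or when a larger share of its boundaries falls within the chosen tolerance.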


COMMITTEE MEMBERS:
Chair - 2659210 - NELSON CRUZ SAMPAIO NETO
Internal - 2378314 - JEFFERSON MAGALHAES DE MORAIS
External to the Program - 1269902 - RENATO HIDAKA TORRES
Notice registered on: 30/06/2021 09:47