News

QUALIFICATION examination committee: ANA LARISSA DA SILVA DIAS

A MASTER'S QUALIFICATION examination committee has been registered by the program.
STUDENT: ANA LARISSA DA SILVA DIAS
DATE: 29/10/2020
TIME: 09:30
LOCATION: Presentation by videoconference
TITLE:

Towards a Free, Forced Phonetic Aligner for Brazilian Portuguese Using Kaldi Tools


KEYWORDS:

Phonetic alignment; Acoustic modeling; Kaldi; Brazilian Portuguese; Deep Neural Networks


PAGES: 48
MAJOR AREA: Exact and Earth Sciences
AREA: Computer Science
SUBAREA: Computing Systems
ABSTRACT:

Throughout human evolution, the predominant form of communication has been speech, produced and perceived by the vocal-auditory apparatus. The smallest units of speech sound that distinguish one word from another in a language are called phonemes. A valuable method for language science research and speech technologies is therefore phonetic alignment, the process of identifying the time boundaries between phonemes in a speech signal. This task can be performed manually for a handful of files, but as the corpus grows it becomes infeasibly time-consuming, which emphasizes the need for computational tools that perform the alignment automatically. The process that automates this task is called forced phonetic alignment. Several techniques exist to perform forced alignment, but the most explored is the approach based on automatic speech recognition (ASR), which uses acoustic models to represent the relationship between an audio signal and phonemes or other linguistic units of speech. Forced alignment thus requires only a recorded audio file and its corresponding transcription as input, and its output is often a TextGrid, a text file containing the alignment information that can be visualized in speech analysis software such as Praat. Currently, several open-source forced alignment tools are available. Most of them are built upon HTK, a widespread speech recognition toolkit, but the most recent aligners use the Kaldi toolkit, which currently represents the state of the art in speech recognition and provides, among other advantages over HTK, a deep neural network (DNN) framework for acoustic modeling. Although several free forced aligners are available, they often provide acoustic models only for languages with more freely available speech resources, such as English.
The FalaBrasil group, a research group on speech and natural language processing for Brazilian Portuguese (BP) at the Federal University of Pará, therefore works to narrow the resource gap for BP by providing phonetic dictionaries, language models, acoustic models, voice and text corpora, and, among other resources, an HTK-based automatic phonetic alignment tool for BP called UFPAlign. In this context, given the scarce availability of phonetic alignment tools for BP and the need to update FalaBrasil’s UFPAlign, this work describes the evolution process towards creating a free phonetic alignment tool for BP using the Kaldi toolkit. The tool was developed as a plugin for Praat, providing a user-friendly graphical interface. Five acoustic models were trained with Kaldi and tested on phonetic alignment, with evaluation carried out in terms of the phone boundary metric. The results show that its performance is similar to that of some Kaldi-based aligners for other languages, and superior to that of the outdated HTK-based UFPAlign.
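
The abstract does not define the phone boundary metric. As a rough illustration only, the sketch below assumes one common reading: the absolute time difference between each manually annotated boundary and the boundary placed by the aligner, summarized as the fraction of boundaries that fall within a tolerance. The function names, the 20 ms tolerance, and the boundary values are hypothetical and are not taken from the work itself.

```python
from typing import List, Sequence


def boundary_errors(reference_ms: Sequence[int], predicted_ms: Sequence[int]) -> List[int]:
    """Absolute difference (in ms) between paired reference and predicted boundaries.

    Assumes both sequences describe the same phones in the same order.
    """
    if len(reference_ms) != len(predicted_ms):
        raise ValueError("reference and predicted boundaries must be paired")
    return [abs(r - p) for r, p in zip(reference_ms, predicted_ms)]


def fraction_within(errors_ms: Sequence[int], tolerance_ms: int) -> float:
    """Share of boundaries whose error does not exceed the tolerance."""
    if not errors_ms:
        raise ValueError("no boundaries to evaluate")
    return sum(e <= tolerance_ms for e in errors_ms) / len(errors_ms)


# Hypothetical boundaries, in milliseconds, for four phones of one utterance.
reference = [100, 250, 400, 620]   # manual annotation (illustrative values)
predicted = [110, 220, 450, 620]   # aligner output (illustrative values)

errors = boundary_errors(reference, predicted)
within_20ms = fraction_within(errors, tolerance_ms=20)
```

Working in integer milliseconds avoids floating-point rounding when a difference is compared against the tolerance.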


COMMITTEE MEMBERS:
Chair - 2659210 - NELSON CRUZ SAMPAIO NETO
Internal member - 2378314 - JEFFERSON MAGALHAES DE MORAIS
External to the program - 1738946 - FABIOLA PANTOJA OLIVEIRA ARAUJO
News item registered on: 15/10/2020 11:35