
Щенникова Е.В.; Флерина Д.Ю.; Навошин Р.Е.

Preprocessing speech data to train a neural network

Schennikova E.V., Flerina D.Y., Navashin R.E.

Article received: 09.08.2023

This article analyzes the problems of preparing data for training a neural network. The first stage of model training, feature extraction, is discussed in detail, with a focus on the method of mel-frequency cepstral coefficients (MFCC). The spectrum of the voice signal was plotted. By multiplying the signal spectrum vector by the window functions, we found the signal energy that falls into each of the analysis windows, and then calculated the mel-frequency cepstral coefficients. The mel scale is useful in audio analysis tasks and is used when training neural networks on speech. Using mel-cepstral coefficients significantly improved recognition quality because it made it possible to identify the most informative coefficients, which were then used as input to the neural network. The MFCC method made it possible to reduce the amount of input data needed for training, increase performance, and improve recognition clarity.
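The abstract describes the MFCC pipeline only in words. The following is a minimal sketch of that pipeline in pure Python, assuming a common textbook variant rather than the authors' exact settings: the mel formula m = 2595·log10(1 + f/700), 26 triangular mel filters, 13 coefficients, and 256-sample frames with 50% overlap at 8 kHz. The "window functions" whose product with the spectrum gives the energy in each analysis window are interpreted here as triangular mel filters, which is an assumption. A naive DFT stands in for an FFT so that the code needs only the standard library. All function names (`hz_to_mel`, `mel_filterbank`, `mfcc`, etc.) are hypothetical.

```python
import cmath
import math


def hz_to_mel(f_hz):
    # Common mel-scale formula (assumed; the article does not state its variant)
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def hann(n):
    # Symmetric Hann window of length n (n >= 2), applied to each frame before the DFT
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def power_spectrum(frame):
    # Naive DFT, O(N^2): stands in for an FFT to keep the sketch dependency-free.
    # Returns |X[k]|^2 / N for the non-negative frequency bins k = 0..N/2.
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2 / n
        for k in range(n // 2 + 1)
    ]


def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters whose edges are equally spaced on the mel scale
    lo, hi = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mels = [lo + (hi - lo) * i / (n_filters + 1) for i in range(n_filters + 2)]
    edges = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    n_bins = n_fft // 2 + 1
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = edges[j - 1], edges[j], edges[j + 1]
        filt = [0.0] * n_bins
        for k in range(left, center):  # rising slope
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling slope, peak 1.0 at center
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def dct2(x, n_coeffs):
    # Unnormalized DCT-II: c_i = sum_j x_j * cos(pi * i * (j + 0.5) / M)
    m = len(x)
    return [
        sum(x[j] * math.cos(math.pi * i * (j + 0.5) / m) for j in range(m))
        for i in range(n_coeffs)
    ]


def mfcc(signal, sample_rate, frame_len=256, hop=128, n_filters=26, n_coeffs=13):
    # Frame -> Hann window -> power spectrum -> mel filter energies -> log -> DCT-II
    # (all parameter defaults are illustrative assumptions)
    window = hann(frame_len)
    bank = mel_filterbank(n_filters, frame_len, sample_rate)
    result = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        spec = power_spectrum(frame)
        # Energy falling into each filter: dot product of the spectrum and the filter
        energies = [sum(p * h for p, h in zip(spec, filt)) for filt in bank]
        log_e = [math.log(e + 1e-10) for e in energies]  # small floor avoids log(0)
        result.append(dct2(log_e, n_coeffs))
    return result


# Usage: a 0.25 s, 1 kHz test tone sampled at 8 kHz
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(sr // 4)]
coeffs = mfcc(tone, sr)
print(len(coeffs), len(coeffs[0]))  # 14 13
```

Each frame is reduced from 256 samples to 13 coefficients, which illustrates the reduction in input data that the abstract mentions. For real recordings, an FFT such as `numpy.fft.rfft` would normally replace `power_spectrum`. The naive DFT is used here only for readability.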

Keywords: machine learning, data preprocessing, audio analysis, mel-cepstral coefficients, feature extraction, voice signal spectrum, Fourier transform, Hann window, discrete cosine transform, short-time Fourier transform