MATHEMATICS FOUNDATION AND MFCCS – AUDIO FEATURE EXTRACTION

Thế Cường Nguyễn, Thanh Vi Nguyễn, Ngọc Hải Trương


Abstract

The most significant kinds of information people use daily are images and language (sound and text), and these are also the most important raw data for building real-world applications in artificial intelligence (AI). Machine learning (ML) algorithms are trained on such data, but before training can begin, feature extraction must convert an image, text, or audio file into a matrix or vector that ML algorithms can consume. Visual and linguistic input can be processed in many well-established ways; audio data, however, are presented neither as images nor as text, so many researchers find them opaque, and little attention has been paid to the mathematical foundations of audio data processing. This article discusses that mathematical foundation and the MFCC (Mel-Frequency Cepstral Coefficients) approach to extracting features from audio data.
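To make the feature-extraction idea concrete, the standard MFCC pipeline (pre-emphasis, framing, windowing, power spectrum, triangular mel filterbank, log, and a discrete cosine transform) can be sketched in NumPy as below. This is a minimal illustration, not the article's implementation; the parameter values (16 kHz sampling, 400-sample frames, 26 mel filters, 13 coefficients) are common defaults chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_mels=26, n_ceps=13):
    # Pre-emphasis: boost high frequencies to balance the spectrum
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice the signal into overlapping frames and apply a Hamming window
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular filters spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    # Log filterbank energies (small offset avoids log of zero)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log energies, giving cepstral coefficients
    n = np.arange(n_mels)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return log_energy @ basis.T

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 440 * t)   # one second of a 440 Hz test tone
feats = mfcc(tone)
print(feats.shape)                    # (98, 13): one 13-coefficient vector per frame
```

The resulting matrix, one row per 25 ms frame, is exactly the kind of vector representation the abstract describes feeding into an ML algorithm.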

