Journal of the Korean Society of Manufacturing Technology Engineers - Vol. 30, No. 5, pp.366-371
ISSN: 2508-5107 (Online)
Print publication date 15 Oct 2021
Received 24 Aug 2021 Revised 06 Oct 2021 Accepted 08 Oct 2021
DOI: https://doi.org/10.7735/ksmte.2021.30.5.366

STFT를 사용한 음성 신호 기반의 감정 분류 연구

신영하a ; 송 규a ; 윤찬녕a ; 조우진a ; 박형주a ; 장동영a, *
A Classification Study on the Emotional Recognition Based on the Speech Signals Using STFT
Young-ha Shina ; Kyu Songa ; Chan-nyeong Yuna ; Woo-jin Choa ; Hyung-joo Parka ; Dong-young Janga, *
aKorea Electronics-Machinery Convergence Technology Institute

Correspondence to: *Tel.: +82-2-970-7094 E-mail address: dyjang@kemcti.re.kr (Dong-young Jang).

Abstract

The human voice has various characteristics, such as loudness, pitch, and speaking rate. This research presents a method for classifying human emotions from voice signals transformed using the short-time Fourier transform (STFT). The STFT reveals the frequency components present at a desired time point, and the transformed signals can be examined using three criteria. Using the first criterion, the frequency of the maximum sound intensity (MSI), the emotions can be classified into two groups: normal/angry and happy. The second criterion, the dwell time of the MSI, cannot distinguish between the emotions. Using the third criterion, the onset of the MSI, the two groups normal and angry/happy are identified. Therefore, the first and third criteria together can classify the three emotions. These results can provide valuable insight for future research on the classification of human emotions.
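As a rough illustration of the three criteria described above, the sketch below computes a magnitude STFT with NumPy and extracts the MSI frequency, its onset, and its dwell time from a synthetic tone. This is a hypothetical reconstruction, not the authors' code: the window length, hop size, and the 50% amplitude threshold used to define "onset" and "dwell" are all assumptions.

```python
import numpy as np

def stft_mag(signal, fs, win_len=256, hop=128):
    """Magnitude STFT via a sliding Hann window (parameters are assumed)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(signal) - win_len) // hop
    frames = np.stack([signal[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))        # shape: (frames, freq bins)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)     # bin center frequencies (Hz)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs  # frame centers (s)
    return mag, freqs, times

def msi_features(mag, freqs, times, thresh_ratio=0.5):
    """Extract the abstract's three criteria:
    1) frequency of the maximum sound intensity (MSI),
    2) dwell time of the MSI (how long its bin stays above a threshold),
    3) onset of the MSI (first frame its bin exceeds the threshold)."""
    t_idx, f_idx = np.unravel_index(np.argmax(mag), mag.shape)
    msi_freq = freqs[f_idx]
    track = mag[:, f_idx]                    # the MSI bin's magnitude over time
    above = track >= thresh_ratio * track.max()
    onset = times[np.argmax(above)]          # first True index
    hop_dt = times[1] - times[0] if len(times) > 1 else 0.0
    dwell = above.sum() * hop_dt
    return msi_freq, dwell, onset

# Synthetic check: a 300 Hz tone that starts 0.2 s into a 1 s recording.
fs = 8000
t = np.arange(fs) / fs
sig = np.where(t >= 0.2, np.sin(2 * np.pi * 300 * t), 0.0)
mag, freqs, times = stft_mag(sig, fs)
f0, dwell, onset = msi_features(mag, freqs, times)
```

With the assumed 256-sample window at 8 kHz, the frequency resolution is 31.25 Hz, so the recovered MSI frequency lands on the bin nearest 300 Hz, the onset is close to 0.2 s, and the dwell time is close to the 0.8 s the tone is active.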

Keywords:

Emotional speech, Speech recognition, Emotion recognition, Speech signals

Acknowledgments

This research was supported by the Korea Institute for Advancement of Technology (KIAT), funded by the Ministry of Trade, Industry and Energy (No. S2640869).

References

  • Park, C. H., Sim, J. Y., Lee, D. W., Sim, K. B., 2001, Analyzing the Element of Emotion Recognition from Speech, Proc. Korean Inst. Intell. Syst. Conf., 199-202.
  • Kim, J. M., Kwon, C. H., 2014, Qualitative Classification of Voice Quality of Normal Speech and Derivation of its Correlation with Speech Features, Phonetics and Speech Sciences, 6:1 71-76. [https://doi.org/10.13064/KSSS.2014.6.1.071]
  • Lee, J. I., Kang, H. G., 2013, On the Importance of Tonal Features for Speech Emotion Recognition, J. Broadcast Eng., 18:5 713-721. [https://doi.org/10.5909/JBE.2013.18.5.713]
  • Kim, Y. G., Bae, Y. C., 2000, Design of Emotion Recognition Model Using Fuzzy Logic, Proc. Korean Inst. Intell. Syst. Conf., 268-282.
  • Moriyama, T., Ozawa, S., 1999, Emotion Recognition and Synthesis System on Speech, Proc. IEEE Int. Conf. on Multimedia Computing and Systems, 840-844.
  • Kim, J. H., Lee, S. P., 2021, Multi-modal Emotion Recognition using Speech Features and Text Embedding, Trans. Korean. Inst. Elect. Eng., 70:1 108-113. [https://doi.org/10.5370/KIEE.2021.70.1.108]
  • Ekman, P., Friesen, W. V., 1982, Emotion in the Human Face, 2nd Ed., Cambridge University Press, New York, USA.
  • Plutchik, R., 2003, Emotions and Life: Perspectives from Psychology, Biology, and Evolution, APA Books, USA.
  • Timothy, J. L., 2019, viewed 10 September 2019, Big Feels and How to Talk About Them, <https://www.healthline.com/health/list-of-emotions>.
  • Kim, M. S., Moon, J. S., 2019, Speaker Verification Model Using Short-Time Fourier Transform and Recurrent Neural Network, Journal of The Korea Institute of Information Security & Cryptology, 29:6 1393-1401. [https://doi.org/10.13089/JKIISC.2019.29.6.1393]
  • Cha, Y. T., Lee, Y. H., Choi, S. J., 2020, Simulation of EI Shaping using STFT for Manipulator Vibration Reduction of Special-purpose Equipment Responding Disaster, Proc. Korean Soc. Manuf. Technol. Eng. Autumn Conf., 153.
  • Ingale, R., 2014, Harmonic Analysis Using FFT and STFT, International Journal of Signal Processing, Image Processing and Pattern Recognition, 7:4 345-362. [https://doi.org/10.14257/ijsip.2014.7.4.33]
  • Berglund, B., Lindvall, T., Schwela, D. H., 1995, viewed 13 October 2021, Guidelines for Community Noise, World Health Organization, <https://www.who.int/docstore/peh/noise/Comnoise-1.pdf>.
Young-ha Shin

Researcher / Korea Electronics Machinery Convergence Technology Institute.

E-mail: sin0312222@kemcti.re.kr

Kyu Song

Associate Research Engineer / Korea Electronics Machinery Convergence Technology Institute.

E-mail: songkue@kemcti.re.kr

Chan-nyeong Yun

Associate Research Engineer / Korea Electronics Machinery Convergence Technology Institute.

E-mail: ycindia@kemcti.re.kr

Woo-jin Cho

Associate Research Engineer / Korea Electronics Machinery Convergence Technology Institute.

E-mail: sin0312222@kemcti.re.kr

Hyung-joo Park

Researcher / Korea Electronics Machinery Convergence Technology Institute.

E-mail: gudwn0117@kemcti.re.kr

Dong-young Jang

Research Director / Korea Electronics Machinery Convergence Technology Institute.

E-mail: dyjang@kemcti.re.kr