[ Article ]

Journal of the Korean Society of Manufacturing Technology Engineers - Vol. 33, No. 1, pp. 27-34

ISSN: 2508-5107 (Online)

Print publication date 15 Feb 2024

Received 12 Jan 2024; Revised 25 Jan 2024; Accepted 26 Jan 2024

DOI: https://doi.org/10.7735/ksmte.2024.33.1.27

Application of Deep Learning Models for Bone-Conducted Speech Signals Extracted in the Form of Bone Conduction Headphones

Heeju Song^a; Seona Yu^a; Shikang Sun^a; Woong Ki Jang^a; Hyang-Hee Hwang^a; Hyun-Ouk Kim^a; Byeong-Hee Kim^a,*; Hyungseok Lee^a,*

^a Department of Smart Health Science and Technology, Kangwon National University

*Correspondence to: Byeong-Hee Kim (Tel.: +82-33-250-6374; E-mail: kbh@kangwon.ac.kr) and Hyungseok Lee (Tel.: +82-33-250-6309; E-mail: ahl@kangwon.ac.kr).

Abstract

In this study, we used deep learning to align bone-conducted speech signals with air-conducted speech signals, with the aim of replacing conventional air conduction microphones, which also pick up surrounding sounds, in voice-based services. Following the configuration of conventional bone conduction headphones, we fabricated headphones that place bone conduction microphones on the rami (the upright branches of the lower jawbone, or mandible). We built databases that aligned bone-conducted speech signals with their air-conducted counterparts, trained LSTM, CNN, and CRNN models on them, and tested the models with bone-conducted speech signals captured through our custom-made headphones. Among the three models, the CNN performed best, accurately distinguishing three English words (“apple,” “hello,” and “pass”), including voiceless pronunciations of these words. In conclusion, our results show that deep learning models can effectively use bone-conducted speech signals extracted from the rami for automatic speech recognition (ASR), paving the way for ASR technology that precisely recognizes only the speaker’s own voice.
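The abstract reports that a CNN classified three words from bone-conducted signals, but it does not describe the network. As a rough illustration only, the sketch below shows the forward pass of a minimal 1D CNN word classifier in plain Python: one convolution layer, ReLU, global max pooling over time, a dense layer, and a softmax over the labels “apple,” “hello,” and “pass.” Everything else is an assumption made for illustration and is not the authors' model. This includes the layer sizes, the function names (`conv1d`, `relu`, `softmax`, `classify`), the use of raw sample values as input (practical systems more often feed spectral features such as MFCCs or spectrograms), and the hand-picked weights. A real classifier would learn its weights from training data such as the paired bone/air-conducted databases.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical toy model for illustration: NOT the architecture used in the paper.
# The three target words named in the abstract.
LABELS = ["apple", "hello", "pass"]


def conv1d(x: Sequence[float], kernel: Sequence[float], bias: float = 0.0) -> List[float]:
    """'Valid' 1D convolution, computed as cross-correlation like most DL frameworks."""
    k = len(kernel)
    if len(x) < k:
        raise ValueError("signal is shorter than the kernel")
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]


def relu(v: Sequence[float]) -> List[float]:
    """Element-wise ReLU activation."""
    return [max(0.0, a) for a in v]


def softmax(z: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max logit before exponentiating)."""
    m = max(z)
    e = [math.exp(a - m) for a in z]
    total = sum(e)
    return [a / total for a in e]


def classify(signal: Sequence[float],
             kernels: Sequence[Sequence[float]],
             dense_w: Sequence[Sequence[float]],
             dense_b: Sequence[float]) -> Tuple[str, List[float]]:
    """Forward pass: conv -> ReLU -> global max pool per kernel, then dense -> softmax."""
    # One feature per convolution kernel (global max pooling over time).
    feats = [max(relu(conv1d(signal, k))) for k in kernels]
    # Fully connected layer: one logit per word label.
    logits = [sum(w * f for w, f in zip(row, feats)) + b
              for row, b in zip(dense_w, dense_b)]
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return LABELS[best], probs


if __name__ == "__main__":
    # Hand-picked, untrained weights purely for illustration.
    toy_signal = [0.0, 0.5, 1.0, 0.5, 0.0]
    kernels = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]]
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    label, probs = classify(toy_signal, kernels, identity, [0.0, 0.0, 0.0])
    print(label)  # -> hello
```

With an identity dense layer, the predicted word is simply the one whose kernel gives the strongest pooled response. In a trained network, both the kernels and the dense weights would be learned by gradient descent, and recurrent (LSTM) or convolutional-recurrent (CRNN) variants would replace or follow the pooling step.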

Keywords:

Bone conduction, Bone-conducted speech signals, Automatic speech recognition, Deep learning

Acknowledgments

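This paper is the result of the Regional Innovation Strategy (RIS) project based on local government-university cooperation, supported by the National Research Foundation of Korea (NRF) with funding from the Ministry of Education in 2023 (2022RIS-005).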

Heeju Song

MS Candidate in the Department of Smart Health Science and Technology, Kangwon National University. Her research interests are Organ-on-a-Chip and 3D Bioprinting.

E-mail: s.canary462@gmail.com

Seona Yu

MS Student in the Department of Smart Health Science and Technology, Kangwon National University. Her research interest is Nanobio Engineering.

E-mail: 89seona@kangwon.ac.kr

Shikang Sun

Ph.D. Student in the Department of Smart Health Science and Technology, Kangwon National University. His research interest is the Sociology of Sports and Leisure.

E-mail: ssk960911@gmail.com

Woong Ki Jang

Ph.D. in the Department of Smart Health Science and Technology, Kangwon National University. His research interests are Micro/Nanoscale Surface Texturing Technologies, the Design of Medical Devices, and AI Application System Design.

E-mail: wkddndrl@kangwon.ac.kr

Hyang-Hee Hwang

Professor in the Department of Smart Health Science and Technology, Kangwon National University. Her research interest is the Sociology of Sports and Leisure.

E-mail: phyhee@kangwon.ac.kr

Hyun-Ouk Kim

Professor in the Department of Smart Health Science and Technology, Kangwon National University. His research interest is Nanobio Engineering.

E-mail: kimhoman@kangwon.ac.kr

Byeong-Hee Kim

Professor in the Department of Smart Health Science and Technology, Kangwon National University. His research interests are Micro and Nano System Design, the Precision Control of Machine Tools, the Design of Medical Devices, and AI Application System Design.

E-mail: kbh@kangwon.ac.kr

Hyungseok Lee

Professor in the Department of Smart Health Science and Technology, Kangwon National University. His research interests are 3D Bioprinting, Tissue Engineering, and Wearable Devices.

E-mail: ahl@kangwon.ac.kr