Application of Deep Learning to Bone-Conducted Speech Signals Acquired in a Bone Conduction Headphone Form Factor
Abstract
In this study, we used deep learning to align bone-conducted speech signals with air-conducted speech signals, with the aim of replacing the traditional air conduction microphones used in voice-based services, which also capture surrounding sounds. We fabricated headphones with bone conduction microphones placed on the rami (the branches of a bone in the jaw area), in line with traditional bone conduction headphone configurations. Using LSTM, CNN, and CRNN models, we created databases that aligned bone-conducted speech signals with their air-conducted counterparts and tested the models with bone-conducted speech signals captured via our custom-made headphones. The CNN model performed best at distinguishing three English words (“apple,” “hello,” and “pass”), including their voiceless pronunciations. In conclusion, our study shows that deep learning models can effectively use bone-conducted speech signals extracted from the rami for automatic speech recognition (ASR), paving the way for future ASR technology that precisely recognizes only the speaker’s voice.
Keywords:
Bone conduction, Bone-conducted speech signals, Automatic speech recognition, Deep learning
Acknowledgments
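This paper is a result of the Local Government-University Cooperation-Based Regional Innovation Project, carried out in 2023 with the support of the National Research Foundation of Korea and funded by the Ministry of Education (2022RIS-005).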
MS Candidate in the Department of Smart Health Science and Technology, Kangwon National University. Her research interests are Organ-on-a-chip and 3D Bioprinting.
E-mail: s.canary462@gmail.com
MS Student in the Department of Smart Health Science and Technology, Kangwon National University. Her research interest is Nanobio Engineering.
E-mail: 89seona@kangwon.ac.kr
Ph.D. Student in the Department of Smart Health Science and Technology, Kangwon National University. His research interest is the Sociology of Sports and Leisure.
E-mail: ssk960911@gmail.com
Ph.D. in the Department of Smart Health Science and Technology, Kangwon National University. His research interests are Micro/Nanoscale Surface Texturing Technologies, the Design of Medical Devices, and AI Application System Design.
E-mail: wkddndrl@kangwon.ac.kr
Professor in the Department of Smart Health Science and Technology, Kangwon National University. Her research interest is the Sociology of Sports and Leisure.
E-mail: phyhee@kangwon.ac.kr
Professor in the Department of Smart Health Science and Technology, Kangwon National University. His research interest is Nanobio Engineering.
E-mail: kimhoman@kangwon.ac.kr
Professor in the Department of Smart Health Science and Technology, Kangwon National University. His research interests are Micro and Nano System Design, the Precision Control of Machine Tools, the Design of Medical Devices, and AI Application System Design.
E-mail: kbh@kangwon.ac.kr
Professor in the Department of Smart Health Science and Technology, Kangwon National University. His research interests are 3D Bioprinting, Tissue Engineering, and Wearable Devices.
E-mail: ahl@kangwon.ac.kr