American Journal of Circuits, Systems and Signal Processing
Article Information
American Journal of Circuits, Systems and Signal Processing, Vol.1, No.3, Aug. 2015, Pub. Date: Jul. 10, 2015
Using Spectro-Temporal Features for Environmental Sounds Recognition
Pages: 60-68
Authors
[01] Souli Sameh, Signal, Image and Pattern Recognition Research Unit, Dept. of Electrical Engineering, National School of Engineering, Belvedere, Tunisia.
[02] Zied Lachiri, Dept. of Physics and Instrumentation, National School of Engineering, Centre Urbain, Tunisia.
Abstract
The paper addresses the task of recognizing environmental sounds for audio surveillance and security applications. Various features have been proposed for audio classification, including the popular Mel-frequency cepstral coefficients (MFCCs), which describe the spectral shape of the audio signal. Temporal-domain features have also been developed to characterize audio signals. Here, we carry out an empirical feature analysis for environmental sound classification and propose to use the log-Gabor filter algorithm to obtain effective time-frequency characteristics. The log-Gabor-filter-based method uses a time-frequency decomposition for feature extraction, resulting in a flexible and physically interpretable set of features. These log-Gabor features are adopted to supplement the MFCC features and yield higher classification accuracy for environmental sounds. Extensive experiments demonstrate the effectiveness of these joint features for environmental sound recognition. Moreover, we provide empirical results showing that our method is robust for audio surveillance applications.
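To make the feature-extraction stage described above concrete, the sketch below builds a bank of 1-D log-Gabor filters (Field's transfer function) and applies it to a magnitude spectrogram, taking per-frame log-energies as features. This is a minimal illustration, not the authors' exact pipeline: the filter count, bandwidth ratio, and the toy spectrogram are all invented here for demonstration.

```python
import numpy as np

def log_gabor_bank(n_freqs, n_filters=12, sigma_ratio=0.65):
    """Bank of 1-D log-Gabor filters over `n_freqs` FFT bins.

    Transfer function (Field, 1987):
        G(f) = exp(-(ln(f/f0))^2 / (2 * (ln(sigma_ratio))^2))
    Center frequencies f0 are log-spaced; the DC bin is zeroed
    because ln(f/f0) is undefined at f = 0.
    """
    freqs = np.arange(n_freqs, dtype=float)
    centers = np.logspace(0.0, np.log10(n_freqs - 1), n_filters)
    bank = np.zeros((n_filters, n_freqs))
    for i, f0 in enumerate(centers):
        with np.errstate(divide="ignore"):
            r = np.log(freqs / f0)          # -inf at the DC bin
        bank[i] = np.exp(-(r ** 2) / (2.0 * np.log(sigma_ratio) ** 2))
        bank[i, 0] = 0.0                    # zero the DC bin explicitly
    return bank

def log_gabor_features(spectrogram, n_filters=12):
    """Per-frame log-energy of each filter output; shape (frames, n_filters)."""
    bank = log_gabor_bank(spectrogram.shape[0], n_filters)
    energies = bank @ np.abs(spectrogram) ** 2   # (n_filters, frames)
    return np.log(energies + 1e-10).T

# toy example: a random "spectrogram" with 257 bins x 100 frames
spec = np.abs(np.random.default_rng(0).standard_normal((257, 100)))
feats = log_gabor_features(spec)
```

In the paper's setting, these features would be concatenated with the MFCC vector of each frame to form the joint feature set.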
Keywords
Environmental Sounds, MFCC, Log-Gabor Filters, Spectrogram, SVM Multiclass
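As an illustration of the multiclass-SVM stage named in the keywords, the sketch below trains an RBF-kernel SVM on synthetic 12-dimensional feature vectors standing in for the joint MFCC/log-Gabor features. The data, kernel choice, and hyperparameters are invented for this example; the paper's actual settings are not stated on this page.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# toy data: 3 well-separated "sound classes" in a 12-D feature space
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(40, 12))
               for c in (0.0, 3.0, 6.0)])
y = np.repeat([0, 1, 2], 40)

# RBF-kernel SVM; scikit-learn handles the multiclass case internally
# via a one-vs-one decomposition of binary SVMs
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
clf.fit(X, y)
acc = clf.score(X, y)
```

Standardizing the features before the SVM matters in practice, since MFCC and log-Gabor energies live on very different scales.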
References
[01] V. Peltonen, J. Tuomi, A. Klapuri, J. Huopaniemi, and T. Sorsa, “Computational auditory scene recognition,” presented at the IEEE Int. Conf. Acoustics, Speech Signal Processing, FL, May 2002.
[02] M. Vacher, D. Istrate, L. Besacier, J. F. Serignat, and E. Castelli, “Sound detection and classification for medical telesurvey,” in Proc. IASTED Biomedical Conf., Innsbruck, Austria, Feb. 2004, pp. 395–399.
[03] A. Dufaux, L. Besacier, M. Ansorge, and F. Pellandini, “Automatic sound detection and recognition for noisy environment,” in Proc. European Signal Processing Conference (EUSIPCO), 2000, pp. 1033–1036.
[04] A Fleury, N Noury, M Vacher, H Glasson and J.F Serignat, Sound and speech detection and classification in a Health Smart Home. 30th IEEE Engineering in Medicine and Biology Society (EMBS), 4644-4647(2008).
[05] D Mitrovic, M Zeppelzauer, H Eidenberger, Analysis of the Data Quality of Audio Descriptions of Environmental Sounds. Journal of Digital Information Management (JDIM), 5(2), 48-54 (2007).
[06] K El-Maleh, A Samouelian, and P Kabal, Frame-level noise classification in mobile environments. In Proc. ICASSP, 237–240, (1999).
[07] D. Istrate, Détection et reconnaissance des sons pour la surveillance médicale [Sound detection and recognition for medical monitoring], PhD thesis, INPG, France, 2003.
[08] A. Bregman, Auditory Scene Analysis. Cambridge, MA: MIT Press, 1990.
[09] M. P. Cooke, Modeling Auditory Processing and Organisation. Cambridge, U.K.: Cambridge University Press, 1993.
[10] L. He, M. Lech, N. Maddage, N. Allen, “Stress and Emotion Recognition Using Log-Gabor Filter”, Affective Computing and Intelligent Interaction and Workshops, ACII, 3rd International Conference on, 2009, pp.1-6.
[11] L. He, M. Lech, N. C. Maddage and N Allen, “Stress Detection Using Speech Spectrograms and Sigma-pi Neuron Units”, Int. Conf. on Natural Computation, 2009, pp.260-264.
[12] M. Kleinschmidt, “Methods for capturing spectro-temporal modulations in automatic speech recognition”, Electrical and Electronic Engineering Acoustics, Speech and Signal Processing Papers, Acta Acustica, Vol.88, No. 3, 2002, pp. 416-422.
[13] M. Lades, J.C. Vorbruggen, J. Buhmann, J. Lange, C. von der Malsburg, R.P. Wurtz, and W. Konen. Distortion invariant object recognition in the dynamic link architecture. Transactions on Computers. vol. 42, no. 3, pp. 300-311, 1993.
[14] L. Wiskott and C. von der Malsburg, “Recognizing faces by dynamic link matching,” in A. Wismüller and D. R. Dersch, Eds., Symposion über biologische Informationsverarbeitung und Neuronale Netze (SINN '95), München, 1996, pp. 63–68.
[15] O. Ayinde and Y.H. Yang. Face recognition approach based on rank correlation of Gabor-filtered images. Pattern Recognition, Vol. 35, no. 6, pp: 1275-1289, June 2002.
[16] C. J. Lee and S. D. Wang. Fingerprint feature extraction using Gabor filters. Electronics Letters, 1999.
[17] A. K. Jain and F. Farrokhnia. Unsupervised texture segmentation using Gabor filters. Pattern Recogn., vol. 24, no. 12, pp.1167-1186, 1991.
[18] J. Daugman. How iris recognition works. Circuits and Systems for Video Technology, IEEE Transactions on, vol. 14, no.1, pp. 21-30, Jan. 2004.
[19] M. Zhou and H. Wei. Face verification using gabor wavelets and adaboost. In ICPR '06: Proceedings of the 18th International Conference on Pattern Recognition, p. 404-407, Washington, DC, USA, IEEE Computer Society, 2006.
[20] S. Souli, Z. Lachiri, Multiclass Support Vector Machines for Environmental Sounds Classification in visual domain based on Log-Gabor Filters, International Journal of Speech Technology (IJST), vol.16, no.2, pp.203-213, Springer Link, 2013.
[21] S Chu, S Narayanan, and C.C.J Kuo, Environmental Sound Recognition with Time-Frequency Audio Features. IEEE Trans. on Speech, Audio, and Language Processing. 17(6), 1142-1158, (2009).
[22] S. Souli, Z. Lachiri, “On the Use of Time-Frequency Reassignment and SVM-based Classifier for Audio Surveillance Applications,” International Journal of Image, Graphics and Signal Processing (IJIGSP), vol. 6, no. 12, 2014.
[23] Dennis, J. and Tran, H.D. and Li, H. (2011). Spectrogram Image Feature for Sound Event Classification in Mismatched Conditions. Signal Processing Letters, IEEE, 18: 130-133.
[24] F. Auger and P. Flandrin, “Improving the Readability of Time-Frequency and Time-Scale Representations by the Reassignment Method,” IEEE Trans. Signal Processing, vol. 43, no. 5, pp. 1068–1089, May 1995.
[25] Kleinschmidt, M. (2002). Methods for capturing spectro-temporal modulations in automatic speech recognition. Electrical and Electronic Engineering Acoustics, Speech and Signal Processing Papers, Acta Acustica, 88:416-422.
[26] Kleinschmidt, M. (2003) .Localized spectro-temporal features for auto-matic speech recognition. In Proc. Eurospeech, pp. 2573-2576.
[27] Rabaoui, A. Davy, M. Rossignol, S. and Ellouze, N. (2008). Using One-Class SVMs and Wavelets for Audio Surveillance. IEEE Transactions on Information Forensics And Security. 3: 763-775.
[28] V. Espinosa-Duro, M. Faundez-Zanuy. Face Identification by Means of a Neural Net Classifier. Proceedings of IEEE 33rd Annual, International Carnahan Conf. on Security Technology, pp. 182-186, 1999.
[29] S.M. Lajevardi, M. Lech. Facial Expression Recognition Using a Bank of Neural Networks and logarithmic Gabor Filters. DICTA08, Canberra, Australia, 2008.
[30] D. J. Field, “Relations between the statistics of natural images and the response properties of cortical cells,” Journal of the Optical Society of America A, vol. 4, no. 12, pp. 2379–2394, 1987.
[31] Leonardo Software website. [Online]. Available: http://www.leonardosoft.com. Santa Monica, CA 90401.
[32] Real World Computing Partnership, CD-Sound Scene Database in Real Acoustical Environments, 2000, http://tosa.mri.co.jp/sounddb/indexe.htm.
[33] Christopher M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 2003.
[34] V. N. Vapnik, An Overview of Statistical Learning Theory. IEEE Transactions on Neural Networks, 10(5), 988-999, (1999).
[35] V Vapnik, and O Chapelle, Bounds on Error Expectation for Support Vector Machines. Journal Neural Computation, MIT Press Cambridge, MA, USA, 12(9), 2013-2036, (2000).
[36] B Scholkopf, and A Smola, Learning with Kernels, (MIT Press, 2001).
[37] C.-W Hsu, C.-J Lin, A comparison of methods for multi-class support vector machines. J. IEEE Transactions on Neural Networks, 13(2), 415-425, (2002).
[38] Lawrence Rabiner and Biing-Hwang Juang, Fundamentals of Speech Recognition, Prentice-Hall, 1993.
[39] D. Mitrovic, M. Zeppelzauer, H. Eidenberger, “Towards an Optimal Feature Set for Environmental Sound Recognition”, Technical Report TR-188-2, 2006