Audio-based event detection poses a number of different challenges that are not encountered in other fields, such as image detection. Challenges such as ambient noise, low Signal-to-Noise Ratio (SNR) and microphone distance are not yet fully understood. If the multimodal approaches are to become better in a range of fields of interest, audio analysis will have to play an integral part. Event recognition in autonomous vehicles (AVs) is such a field at a nascent stage that can especially leverage solely on audio or can be part of the multimodal approach. In this manuscript, an extensive analysis focused on the comparison of different magnitude representations of the raw audio is presented. The data on which the analysis is carried out is part of the publicly available MIVIA Audio Events dataset. Single channel Short-Time Fourier Transform (STFT), mel-scale and Mel-Frequency Cepstral Coefficients (MFCCs) spectrogram representations are used. Furthermore, aggregation methods of the aforementioned spectrogram representations are examined; the feature concatenation compared to the stacking of features as separate channels. The effect of the SNR on recognition accuracy and the generalization of the proposed methods on datasets that were both seen and not seen during training are studied and reported.

Document type: Article

Full document

The PDF file did not load properly or your web browser does not support viewing PDF files. Download directly to your device: Download PDF document

Original document

The different versions of the original document can be found in:

https://doaj.org/toc/2079-9292 under the license cc-by
under the license https://creativecommons.org/licenses/by/4.0/
Back to Top

Document information

Published on 01/01/2020

Volume 2020, 2020
DOI: 10.3390/electronics9101593
Licence: Other

Document Score


Views 5
Recommendations 0

Share this document

claim authorship

Are you one of the authors of this document?