Abstract

Emotion recognition is crucial for advancing human–computer interaction (HCI) by enabling systems to interpret complex affective states. While electroencephalogram (EEG) signals provide direct insight into neural activity, facial expressions offer complementary external emotional cues; unimodal systems, however, often struggle with robustness and generalization across diverse subjects. This study presents a Hierarchical Convolutional Neural Network (HCNN) framework that integrates EEG and facial expressions through multi-level convolutional feature extraction and feature-level fusion. The proposed model combines deep hierarchical representations with handcrafted temporal–frequency and texture-based descriptors to form a unified feature vector. Experiments on the MAHNOB-HCI and DEAP datasets show that the HCNN achieves accuracies of 91.40% and 88.09%, respectively, outperforming CNN-, LSTM-, and SVM-based methods. The results demonstrate the model's ability to capture complementary cross-modal correlations while reducing feature redundancy and computational complexity. The HCNN framework thus offers a scalable, interpretable, and data-efficient solution for real-time multimodal emotion recognition in next-generation HCI systems.
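
As a rough illustration of the feature-level fusion described in the abstract (a minimal sketch, not the authors' implementation), the code below concatenates deep features from an assumed EEG convolutional branch and an assumed face convolutional branch with a handcrafted descriptor vector before classification. All names, layer shapes, the descriptor length N_HANDCRAFTED, and the class count are illustrative assumptions.

    # Minimal sketch of feature-level fusion of EEG + face features.
    # Architecture details are assumptions, not taken from the paper.
    import torch
    import torch.nn as nn

    N_HANDCRAFTED = 64  # assumed length of temporal-frequency + texture descriptors

    class FusionHCNNSketch(nn.Module):
        def __init__(self, n_classes: int = 4):
            super().__init__()
            # EEG branch: 1-D convolutions over (channels x time), assumed 32 channels
            self.eeg_branch = nn.Sequential(
                nn.Conv1d(32, 64, kernel_size=7, padding=3), nn.ReLU(),
                nn.MaxPool1d(4),
                nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
                nn.AdaptiveAvgPool1d(1), nn.Flatten(),          # -> (B, 128)
            )
            # Face branch: 2-D convolutions over grayscale face crops (assumed 48x48)
            self.face_branch = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),          # -> (B, 64)
            )
            # Classifier over the unified (deep + handcrafted) feature vector
            self.head = nn.Sequential(
                nn.Linear(128 + 64 + N_HANDCRAFTED, 128), nn.ReLU(),
                nn.Dropout(0.5),
                nn.Linear(128, n_classes),
            )

        def forward(self, eeg, face, handcrafted):
            # Feature-level fusion: concatenate all representations into one vector
            fused = torch.cat([self.eeg_branch(eeg),
                               self.face_branch(face),
                               handcrafted], dim=1)
            return self.head(fused)

    # Shape check with random tensors standing in for preprocessed inputs
    model = FusionHCNNSketch()
    logits = model(torch.randn(8, 32, 512),        # EEG: batch x channels x samples
                   torch.randn(8, 1, 48, 48),      # face crops
                   torch.randn(8, N_HANDCRAFTED))  # handcrafted descriptors
    print(logits.shape)  # torch.Size([8, 4])

One design note: fusing at the feature level (rather than averaging decisions from per-modality classifiers) lets a single classifier weight cross-modal interactions directly, which is the complementarity the abstract attributes to the HCNN.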


Document information

Published on 04/05/26
Accepted on 04/05/26
Submitted on 03/05/26

Volume Online First, 2026
DOI: 10.23967/j.rimni.2026.10.72094
Licence: CC BY-NC-SA
