Sensors (Basel). 2025 Feb 27;25(5):1471. doi: 10.3390/s25051471.

EEG-Based Music Emotion Prediction Using Supervised Feature Extraction for MIDI Generation

Oscar Gomez-Morales et al.

Abstract

Advances in music emotion prediction are driving AI-based algorithmic composition, enabling the generation of complex melodies. However, bridging the neural and auditory domains remains challenging due to the semantic gap between brain-derived low-level features and high-level musical concepts, making alignment computationally demanding. This study proposes a deep learning framework for generating MIDI sequences aligned with labeled emotion predictions through supervised feature extraction from the neural and auditory domains. EEGNet is employed to process the neural data, while an autoencoder-based piano-roll algorithm handles the auditory data. To address modality heterogeneity, Centered Kernel Alignment is incorporated to enhance the separation of emotional states. Furthermore, regression between feature domains is applied to reduce intra-subject variability in the extracted Electroencephalography (EEG) patterns, followed by clustering of the latent auditory representations into denser partitions to improve MIDI reconstruction quality. Evaluation on real-world data using musical metrics shows that the proposed approach improves emotion classification (namely, arousal and valence) and the system's ability to produce MIDI sequences that better preserve temporal alignment, tonal consistency, and structural integrity. Subject-specific analysis reveals that subjects with stronger imagery paradigms produced higher-quality MIDI outputs, as their neural patterns aligned more closely with the training data. In contrast, subjects with weaker performance exhibited less consistent auditory data.
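The abstract names Centered Kernel Alignment (CKA) as the mechanism that reconciles the EEG and auditory feature spaces, but does not spell out its form. Below is a minimal sketch of linear CKA between two paired feature matrices, assuming matched samples per stimulus; the linear kernel and the function name are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between paired feature matrices.

    X: (n_samples, d1) EEG-derived features; Y: (n_samples, d2) auditory
    (piano-roll) features. Returns a similarity score in [0, 1].
    Illustrative only; the paper's kernel choice is not stated in the abstract.
    """
    # Center each representation column-wise.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F).
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    return cross / (np.linalg.norm(X.T @ X, ord="fro") * np.linalg.norm(Y.T @ Y, ord="fro"))
```

A score near 1 indicates closely aligned representations; used as a loss term, one would typically maximize CKA (or minimize 1 − CKA) alongside the emotion-classification objective.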

Keywords: EEG; kernel methods; music emotion recognition; piano-roll algorithm.


Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Proposed deep learning framework for EEG-based emotion prediction using supervised feature extraction for MIDI generation. Stages: (i) segment-wise preprocessing; (ii) supervised deep feature extraction for emotion classification; and (iii) affective-based MIDI prediction and feature alignment.
Figure 2
Visualization of emotion labels and MIDI feature representations. (a) Emotion labels set by Subject #1, where the x-axis represents arousal and the y-axis represents valence. (b) Two-dimensional t-SNE projection (n_components=2, perplexity=5) of the piano-roll arrays, illustrating clustering of MIDI features based on emotion labels. Colors indicate the class of each audio stimulus. (c) Embedding space obtained from the bottleneck representation of the piano-roll autoencoder trained with CKA loss. Dots correspond to training data, while crosses (×) represent test data. Of note, the axes are resized to provide better visual perception of the plotted values.
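For reference, the projection in panel (b) can be reproduced in outline with scikit-learn's TSNE using the parameters stated in the caption; the array shape and file name below are hypothetical.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical input: one binary piano-roll (128 pitches x n_frames) per audio stimulus.
piano_rolls = np.load("piano_rolls.npy")          # placeholder file name
X = piano_rolls.reshape(len(piano_rolls), -1)     # flatten each roll into a feature vector

# Parameters as in the caption: n_components=2, perplexity=5.
embedding = TSNE(n_components=2, perplexity=5, random_state=0).fit_transform(X)
# embedding has shape (n_stimuli, 2); scatter it colored by each stimulus's emotion label.
```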
Figure 3
Comparison between original MIDI (top) and reconstructed MIDI (bottom). Of note, the axes are resized to provide better visual perception of the plotted values.
Figure 4
Probability density functions (PDFs) of model characteristics for all subjects in fold 1. Each subfigure corresponds to a specific feature extracted from the MIDI data: (a) Feature 0 (pitch range); (b) Feature 1 (total used pitch); (c) Feature 2 (average IOI); (d) Feature 3 (pitch-class histogram).
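The example descriptors in this caption (pitch range, total used pitch, pitch-class histogram) can be computed directly from a binary piano-roll; the definitions below are a rough sketch rather than the paper's exact metric implementations, and average IOI is omitted because it requires explicit note onsets.

```python
import numpy as np

def pianoroll_descriptors(roll):
    """Illustrative descriptors of a binary piano-roll (128 pitches x n_frames)."""
    active = np.flatnonzero(roll.any(axis=1))   # MIDI pitches that sound at least once
    pitch_range = int(active.max() - active.min()) if active.size else 0
    total_used_pitch = int(active.size)

    # Pitch-class histogram: fold the 128 MIDI pitches onto 12 classes,
    # weighted by how many frames each pitch is active.
    pc_hist = np.zeros(12)
    for p in active:
        pc_hist[p % 12] += roll[p].sum()
    if pc_hist.sum() > 0:
        pc_hist /= pc_hist.sum()

    return {"pitch_range": pitch_range,
            "total_used_pitch": total_used_pitch,
            "pitch_class_histogram": pc_hist}
```

Comparing these descriptors between an original and a reconstructed roll is the spirit of the per-feature distributions shown in this figure.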
Figure 5
Violin plots comparing metrics for the best- and worst-performing subjects.

