Transformer-Based Models for Predicting Molecular Structures from Infrared Spectra Using Patch-Based Self-Attention

Wenjin Wu1,2, Aleš Leonardis1, Jianbo Jiao1, Jun Jiang2, Linjiang Chen1,2

1MIx Group, University of Birmingham, 2University of Science and Technology of China

The Journal of Physical Chemistry A

Abstract

Infrared (IR) spectroscopy, a type of vibrational spectroscopy, provides rich information about molecular structure and is a highly effective technique for chemists to determine molecular structures. However, analysing experimental spectra remains challenging because it requires specialised knowledge and because spectra vary with experimental conditions.

Here, we propose a Transformer-based model with a patch-based self-attention spectrum embedding layer, designed to prevent the loss of spectral information while remaining simple and effective. To further enhance the model’s understanding of IR spectra, we introduce a data augmentation approach that adds vertical noise only at absorption peaks.

Our approach not only achieves state-of-the-art performance on simulated datasets but also attains a top-1 accuracy of 55% on real experimental spectra, surpassing the previous state-of-the-art by approximately 10%. Additionally, our model demonstrates proficiency in analysing intricate and variable fingerprint regions, effectively extracting critical structural information.

Method

We introduce a patch-based self-attention spectrum embedding layer to increase the sampling resolution of spectral data while ensuring that no critical information is lost during the sampling process. Specifically, we sample 3,200 points from each infrared (IR) spectrum within the range of 400–3,982 cm⁻¹. Each spectrum is then divided into patches, which are processed through a self-attention mechanism. Because attention operates on patches rather than on individual sampled points, this approach improves computational efficiency while still allowing the model to capture dependencies between different regions of the spectrum.
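The patching and attention steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the patch size (64), the embedding width (32), the single attention head, the random projection weights, the evenly spaced wavenumber grid, and the toy Gaussian spectrum are all assumptions made for illustration. Only the 3,200 sampled points and the 400–3,982 cm⁻¹ range come from the text.

```python
import numpy as np

def patchify(spectrum, patch_size):
    """Split a 1-D spectrum into non-overlapping patches of equal length."""
    spectrum = np.asarray(spectrum, dtype=np.float64)
    if spectrum.ndim != 1 or spectrum.size % patch_size != 0:
        raise ValueError("spectrum length must be a multiple of patch_size")
    return spectrum.reshape(-1, patch_size)

def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over patch tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over patches
    return weights @ v, weights

rng = np.random.default_rng(0)
# 3,200 points is from the text; patch_size and d_model are assumed values.
n_points, patch_size, d_model = 3200, 64, 32
wavenumbers = np.linspace(400.0, 3982.0, n_points)  # assumed even spacing
# Toy spectrum: a single Gaussian band near 1700 cm^-1 (illustrative only).
spectrum = np.exp(-0.5 * ((wavenumbers - 1700.0) / 15.0) ** 2)

patches = patchify(spectrum, patch_size)  # one row per patch
# Hypothetical random weights stand in for learned parameters.
w_embed = rng.normal(scale=patch_size ** -0.5, size=(patch_size, d_model))
tokens = patches @ w_embed
w_q, w_k, w_v = (
    rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)) for _ in range(3)
)
embedded, attn = self_attention(tokens, w_q, w_k, w_v)
print(patches.shape, embedded.shape, attn.shape)
```

With these assumed sizes, attention is computed over 50 patch tokens instead of 3,200 individual points, which is where the efficiency gain in this sketch comes from, and every sampled point still enters the embedding through its patch.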

Additionally, we introduce a data augmentation method for spectral data, called Adaptive Noise. This technique applies vertical noise only to regions containing absorption peaks, so that the introduced noise does not alter the intrinsic spectral information and the model is encouraged to focus more on peak positions.
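A rough sketch of the idea behind Adaptive Noise follows. The text does not specify how peak regions are detected or what noise distribution is used, so the relative intensity threshold, the multiplicative Gaussian noise, and the parameter names `threshold` and `noise_scale` are assumptions rather than the published method.

```python
import numpy as np

def adaptive_noise(spectrum, rng, threshold=0.05, noise_scale=0.1):
    """Perturb intensities only where the spectrum shows absorption.

    Assumed peak criterion: intensity above `threshold` times the maximum.
    Assumed noise model: multiplicative Gaussian noise on the intensity axis.
    """
    spectrum = np.asarray(spectrum, dtype=np.float64)
    peak_mask = spectrum > threshold * spectrum.max()
    noise = rng.normal(loc=0.0, scale=noise_scale, size=spectrum.shape)
    augmented = spectrum.copy()
    # Vertical noise: scale intensities, leave the wavenumber axis untouched.
    augmented[peak_mask] *= 1.0 + noise[peak_mask]
    # Keep intensities non-negative after the perturbation.
    return np.clip(augmented, 0.0, None), peak_mask

rng = np.random.default_rng(0)
wavenumbers = np.linspace(400.0, 3982.0, 3200)
# Toy spectrum: a single Gaussian band near 1700 cm^-1 (illustrative only).
spectrum = np.exp(-0.5 * ((wavenumbers - 1700.0) / 15.0) ** 2)

augmented, peak_mask = adaptive_noise(spectrum, rng)
print(int(peak_mask.sum()), "of", spectrum.size, "points perturbed")
```

In this sketch the baseline outside the masked region is returned unchanged, so the augmentation varies peak heights without adding spurious features elsewhere in the spectrum.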

Experiments

We evaluate our model on two simulated datasets, QM9S and the IBM dataset, as well as one real experimental dataset, the NIST dataset. Our method outperforms the current state-of-the-art (SOTA) approaches on all three datasets. These results demonstrate the effectiveness and generalizability of our method in accurately predicting molecular structures from IR spectra. Moreover, the consistent performance improvements across both simulated and real-world datasets highlight the robustness of our approach.
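Results of this kind are reported as top-k accuracy, such as the top-1 figure quoted in the abstract. The sketch below shows how such a metric can be computed from ranked candidate structures. The function name, the SMILES strings, and the use of exact string matching are illustrative assumptions. A real evaluation would normally canonicalise structures before comparing them, and the toy data are not results from the paper.

```python
def top_k_accuracy(ranked_predictions, targets, k=1):
    """Fraction of samples whose target appears among the first k candidates."""
    if len(ranked_predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    hits = sum(
        target in candidates[:k]
        for candidates, target in zip(ranked_predictions, targets)
    )
    return hits / len(targets)

# Hypothetical ranked candidates (best first) for three spectra.
ranked = [
    ["CCO", "COC"],
    ["CC(=O)O", "OCC=O"],
    ["c1ccccc1", "C1CCCCC1"],
]
truth = ["CCO", "OCC=O", "CCN"]

print(top_k_accuracy(ranked, truth, k=1))
print(top_k_accuracy(ranked, truth, k=2))
```

In this toy example one of the three targets is ranked first and a second appears within the top two candidates, so the two calls give one third and two thirds respectively.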

Although the differences between computed and experimental spectra are too significant to ignore, our method achieves comparable performance on both kinds of data. The t-SNE visualization further illustrates that the Transformer encoder effectively reduces the discrepancy between computed and experimental IR spectra in the latent space during fine-tuning.

BibTeX

@article{wu2025transformer,
 title={Transformer-Based Models for Predicting Molecular Structures from Infrared Spectra Using Patch-Based Self-Attention},
 author={Wu, Wenjin and Leonardis, Aleš and Jiao, Jianbo and Jiang, Jun and Chen, Linjiang},
 journal={The Journal of Physical Chemistry A},
 volume={129},
 number={8},
 pages={2077--2085},
 year={2025},
 doi={10.1021/acs.jpca.4c05665},
 publisher={ACS Publications}
}