Latent Diffusion-Based 3D Molecular Recovery from Vibrational Spectra

Wenjin Wu1,2, Aleš Leonardis1, Linjiang Chen1,2, Jianbo Jiao1

1MIx Group, University of Birmingham, United Kingdom
2University of Science and Technology of China, China


Abstract

Infrared (IR) spectroscopy, a type of vibrational spectroscopy, is widely used for molecular structure determination and provides critical structural information for chemists. However, existing approaches for recovering molecular structures from IR spectra typically rely on one-dimensional SMILES strings or two-dimensional molecular graphs, which fail to capture the intricate relationship between spectral features and three-dimensional molecular geometry. Recent advances in diffusion models have greatly enhanced the ability to generate molecular structures in 3D space. Yet, no existing model has explored the distribution of 3D molecular geometries corresponding to a single IR spectrum.

In this work, we introduce IR-GeoDiff, a latent diffusion model that recovers 3D molecular geometries from IR spectra by integrating spectral information into both node and edge representations of molecular structures. We evaluate IR-GeoDiff from both spectral and structural perspectives, demonstrating its ability to recover the molecular distribution corresponding to a given IR spectrum. Furthermore, an attention-based analysis reveals that the model is able to focus on characteristic functional group regions in IR spectra, qualitatively consistent with common chemical interpretation practices.

Contributions

IR-GeoDiff

Model

Spectral features $S$ are first extracted by a Transformer-based spectral classifier $\tau_\theta$. The encoder $\mathcal{E}_\phi$ then maps molecular geometries into a latent representation $\mathbf{z}_\mathrm{x}$, which is perturbed through a forward diffusion process and denoised by an equivariant network $\epsilon_\theta$. The denoiser is conditioned on the spectral features, which are injected into node and edge representations via cross-attention. Finally, the decoder $\mathcal{D}_\delta$ reconstructs the 3D geometry $G$ from the denoised latent representation $\mathbf{z}_\mathrm{x}^T$.
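To make the conditioning step more concrete, the following is a minimal sketch of single-head cross-attention in which node features act as queries and spectral features act as keys and values. It is an illustration under stated assumptions, not the IR-GeoDiff implementation: the function name `cross_attention`, the random projection matrices, the residual update, and all dimensions are hypothetical, and the same pattern would apply to edge features.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(h, S, W_q, W_k, W_v):
    """Hypothetical single-head cross-attention.

    h: (n, d_h) node (or edge) features, used as queries.
    S: (m, d_s) spectral features, used as keys and values.
    Returns the attended update (n, d) and the attention map (n, m).
    """
    Q = h @ W_q
    K = S @ W_k
    V = S @ W_v
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    attn = softmax(scores, axis=-1)  # each row sums to 1 over spectral tokens
    return attn @ V, attn

# Toy shapes and random weights, chosen only for illustration.
rng = np.random.default_rng(0)
n_atoms, n_tokens, d_h, d_s, d = 5, 8, 16, 32, 16
h = rng.normal(size=(n_atoms, d_h))
S = rng.normal(size=(n_tokens, d_s))
W_q = rng.normal(size=(d_h, d)) / np.sqrt(d_h)
W_k = rng.normal(size=(d_s, d)) / np.sqrt(d_s)
W_v = rng.normal(size=(d_s, d)) / np.sqrt(d_s)

update, attn = cross_attention(h, S, W_q, W_k, W_v)
h_cond = h + update  # residual injection (an assumption of this sketch)
```

The attention map `attn` has one row per atom and one column per spectral token, which is the kind of quantity an attention-based analysis would inspect.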

Evaluation Metrics

The goal for IR-GeoDiff is to recover molecular structures consistent with a given IR spectrum instead of encouraging diversity. Here, we evaluate the recovery performance of IR-GeoDiff from both structural and spectral perspectives.

Examples

Reference Molecule
simg: 1.00, SIS: 0.974, SIS*: 0.969
Reference Molecule
simg: 1.00, SIS: 0.985, SIS*: 0.984

Attention-based Analysis

We further investigate how the proposed model works. We find that both the spectral-edge cross-attention module and the atom-edge cross-attention module within the model are able to focus on spectral regions associated with characteristic functional groups. This behaviour qualitatively resembles how chemists interpret IR spectra.

Examples here include two representative vibrational modes: the stretching vibration of the carbon-nitrogen triple bond (C$\equiv$N) and the stretching vibration of the O-H bond in a hydroxy group. These correspond to absorption peaks typically observed around 2,250 and 3,600 cm$^{-1}$, respectively.

For molecules a and b, which each contain only a single type of functional group, the model correctly identifies the corresponding characteristic spectral peak. Furthermore, for molecule c containing two different functional groups, different cross-attention layers attend to different spectral regions associated with each group. This layer-wise specialisation indicates the model's ability to disentangle and localise multiple spectral signatures within a single molecule.
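One simple way to read off where a cross-attention layer is looking is to average its attention map over queries and locate the wavenumber with the highest weight. The sketch below does this on synthetic attention maps. Everything in it is an assumption made for illustration: the helper names, the 50 cm$^{-1}$ wavenumber grid, and the Gaussian-shaped toy attention centred on the two peak positions mentioned above. It does not reproduce the analysis pipeline used for IR-GeoDiff.

```python
import numpy as np

# Hypothetical wavenumber grid: 500 to 4000 cm^-1 in steps of 50.
wavenumbers = np.linspace(500.0, 4000.0, 71)

def toy_attention(center, n_queries=4, width=60.0):
    # Synthetic attention map: every query attends with a Gaussian
    # profile centred on `center`. This is made-up data, not model output.
    w = np.exp(-0.5 * ((wavenumbers - center) / width) ** 2)
    w = w / w.sum()
    return np.tile(w, (n_queries, 1))

def attended_wavenumber(attn, grid):
    # Average over queries, then return the grid point with the
    # largest mean attention together with the averaged profile.
    profile = attn.mean(axis=0)
    return grid[int(np.argmax(profile))], profile

# Two toy "layers" specialising on different functional-group regions.
layer_a = toy_attention(2250.0)  # C≡N stretch region
layer_b = toy_attention(3600.0)  # O-H stretch region

peak_a, _ = attended_wavenumber(layer_a, wavenumbers)
peak_b, _ = attended_wavenumber(layer_b, wavenumbers)
```

Comparing `peak_a` and `peak_b` across layers is a crude proxy for the layer-wise specialisation described above; a real analysis would also need to account for how spectral tokens map back to wavenumber ranges.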

Analysis on Exceptions

Figure A here presents the joint distribution of graph similarity simg and SIS across 1,000 test spectra, with 50 sampled molecules per spectrum. A sharp density peak appears near the top-right corner, indicating a cluster of molecules that are near-exact matches to the reference.

In practical applications, chemists typically identify compounds by visually comparing the IR spectra of unknown samples with those of known references. Motivated by this, we analyse the distribution of graph similarity across different SIS ranges, as shown in Figure B. While graph similarity generally increases with SIS, the correlation between the two is moderate. Notably, there exists a non-negligible number of cases where molecules exhibit high graph similarity but low SIS, and vice versa.
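The binning behind this kind of analysis can be sketched as follows: group samples by SIS range, summarise graph similarity within each range, and compute an overall correlation. The arrays and bin edges below are toy values invented for the example and are not results from the paper; the function name `mean_by_sis_bin` is likewise hypothetical.

```python
import numpy as np

def mean_by_sis_bin(sis, sim_g, edges):
    """Mean graph similarity within each SIS range.

    edges: increasing bin edges, e.g. [0.0, 0.5, 0.8, 1.0].
    Values equal to the last edge fall into the last bin.
    """
    # Using only the inner edges gives bin indices 0 .. len(edges) - 2.
    idx = np.digitize(sis, edges[1:-1])
    means = []
    for b in range(len(edges) - 1):
        mask = idx == b
        means.append(sim_g[mask].mean() if mask.any() else np.nan)
    return np.array(means)

# Toy values for illustration only (not taken from the evaluation).
sis = np.array([0.10, 0.30, 0.60, 0.70, 0.90, 0.95])
sim_g = np.array([0.20, 0.40, 0.50, 0.70, 1.00, 0.90])
edges = np.array([0.0, 0.5, 0.8, 1.0])

bin_means = mean_by_sis_bin(sis, sim_g, edges)
pearson_r = np.corrcoef(sis, sim_g)[0, 1]
```

On real data, inspecting the full per-bin distribution rather than only the mean is what exposes the off-trend cases, such as high graph similarity with low SIS.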

High graph similarity but low SIS

Reference Molecule
Recovered Molecule
simg: 1.00, SIS: 0.378, SIS*: 0.268

Samples with high simg but low SIS are primarily caused by conformational changes, some of which result in the formation of intramolecular hydrogen bonds. These interactions can shift the vibrational frequencies of the associated functional groups, leading to notable discrepancies in the IR spectra despite high structural similarity.

High SIS but low graph similarity

Reference Molecule
Recovered Molecule
simg: 0.167, SIS: 0.896, SIS*: 0.970

Samples with low simg but high SIS typically arise from mismatches in the molecular scaffolds, particularly when molecules lack distinctive functional groups and consist of carbon and hydrogen atoms. In such cases, the IR spectral signals reflecting differences in carbon backbone topology are often subtle and difficult to interpret, highlighting the limited ability of IR spectroscopy to resolve differences in molecular skeletons.

Future work could incorporate additional spectral modalities, particularly nuclear magnetic resonance (NMR), which offers complementary information. For instance, $^{1}$H-NMR and $^{13}$C-NMR spectra are highly informative about molecular backbones and can also reveal conformational details.