Benchmark Problems

Progress in computational chemistry depends on shared, reproducible evaluation targets. This section collects notes on benchmark problems and datasets used to assess new methods, from classic analytical potential energy surfaces such as the Müller-Brown surface to standardized generative modeling platforms such as MOSES. These resources matter because they define what “better” means in practice, and understanding their design choices is essential for interpreting results reported in the literature.

Computational Chemistry

Activity cliffs benchmark showing method rankings by RMSE on cliff compounds, with SVM plus ECFP outperforming deep learning approaches

Exposing Limitations of Molecular ML with Activity Cliffs

This paper benchmarks 24 machine learning and deep learning methods on activity cliff compounds (pairs of structurally similar molecules with large differences in potency) across 30 macromolecular targets. Traditional ML methods built on molecular fingerprints consistently outperform graph neural networks and SMILES-based transformers on these challenging cases, especially in low-data regimes.
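
To make the idea of an activity cliff concrete, here is a minimal sketch of how cliff pairs could be flagged and how an error restricted to cliff compounds could be computed. It is not the paper's code: the function names, the 0.9 similarity cutoff, the 10-fold potency cutoff, and the use of plain Python sets as stand-ins for fingerprint bit positions are all assumptions chosen for illustration.

```python
from itertools import combinations
from math import sqrt


def tanimoto(a, b):
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def find_cliff_pairs(fingerprints, potencies, sim_threshold=0.9, fold_threshold=10.0):
    """Return index pairs that are similar in structure but far apart in potency.

    Both thresholds are illustrative assumptions. Potencies are expected on a
    linear scale (for example nM), so the ratio is a fold difference.
    """
    pairs = []
    for i, j in combinations(range(len(fingerprints)), 2):
        similar = tanimoto(fingerprints[i], fingerprints[j]) >= sim_threshold
        high = max(potencies[i], potencies[j])
        low = min(potencies[i], potencies[j])
        if similar and low > 0 and high / low >= fold_threshold:
            pairs.append((i, j))
    return pairs


def rmse_on_cliffs(y_true, y_pred, cliff_pairs):
    """RMSE computed only over compounds that take part in at least one cliff pair."""
    idx = sorted({k for pair in cliff_pairs for k in pair})
    if not idx:
        return float("nan")
    return sqrt(sum((y_true[k] - y_pred[k]) ** 2 for k in idx) / len(idx))


# Toy data: compounds 0 and 1 share 19 of 20 bits but differ 160-fold in potency.
toy_fps = [set(range(20)), set(range(19)), {100, 101, 102}]
toy_potencies_nm = [5.0, 800.0, 6.0]
toy_pairs = find_cliff_pairs(toy_fps, toy_potencies_nm)
```

In practice the bit sets would come from a fingerprinting toolkit, and the regression targets passed to `rmse_on_cliffs` would usually be on a log scale; both details are left out here to keep the sketch dependency-free.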

Density plot showing training vs generated physicochemical property distribution

Molecular Sets (MOSES): A Generative Modeling Benchmark

MOSES introduces a comprehensive benchmarking platform for molecular generative models, offering standardized datasets, evaluation metrics, and baselines. By providing a unified measuring stick, it aims to resolve reproducibility challenges in chemical distribution learning.
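
As a rough illustration of the kind of metric such a platform standardizes, the sketch below computes simplified versions of two distribution-learning metrics, uniqueness and novelty. It is not the MOSES implementation: the function names are hypothetical, and the code assumes every string is already a valid, canonical SMILES, so that string equality stands in for molecular identity.

```python
def uniqueness(generated):
    """Fraction of generated molecules that are distinct."""
    if not generated:
        return 0.0
    return len(set(generated)) / len(generated)


def novelty(generated, training):
    """Fraction of distinct generated molecules that do not appear in the training set."""
    unique = set(generated)
    if not unique:
        return 0.0
    return len(unique - set(training)) / len(unique)


# Toy example; assumes the strings are already canonical SMILES.
generated_smiles = ["CCO", "CCO", "c1ccccc1", "CCN"]
training_smiles = ["CCO", "CCC"]

u = uniqueness(generated_smiles)               # 3 distinct out of 4 generated
n = novelty(generated_smiles, training_smiles)  # 2 of the 3 distinct are unseen
```

A real evaluation would first filter out invalid strings and canonicalize the rest with a cheminformatics toolkit, since two different SMILES strings can encode the same molecule.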

Müller-Brown Potential Energy Surface showing the three minima and two saddle points

The Müller-Brown Potential: A 2D Benchmark Surface

The Müller-Brown potential is a two-dimensional analytical potential energy surface introduced in 1979 for testing optimization algorithms. It features three minima connected by two saddle points, and its curved transition pathways test an algorithm’s ability to follow a path that does not run in a straight line between minima.
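
The surface is a sum of four two-dimensional Gaussian-like terms, which makes it easy to evaluate directly. The sketch below uses the parameter values commonly quoted for this potential; they are not given in this note, so treat them as an assumption and check them against the original paper before relying on them. The function name and the approximate minimum coordinates are likewise illustrative.

```python
from math import exp

# Commonly quoted Müller-Brown parameters (assumed here, not taken from this note).
A = (-200.0, -100.0, -170.0, 15.0)
a = (-1.0, -1.0, -6.5, 0.7)
b = (0.0, 0.0, 11.0, 0.6)
c = (-10.0, -10.0, -6.5, 0.7)
x0 = (1.0, 0.0, -0.5, -1.0)
y0 = (0.0, 0.5, 1.5, 1.0)


def muller_brown(x, y):
    """Evaluate V(x, y) = sum_k A_k * exp(a_k*dx^2 + b_k*dx*dy + c_k*dy^2),
    where dx = x - x0_k and dy = y - y0_k."""
    total = 0.0
    for k in range(4):
        dx = x - x0[k]
        dy = y - y0[k]
        total += A[k] * exp(a[k] * dx * dx + b[k] * dx * dy + c[k] * dy * dy)
    return total


# Approximate locations of the three minima, rounded to three decimals.
approx_minima = [(-0.558, 1.442), (0.623, 0.028), (-0.050, 0.467)]
energies = [muller_brown(x, y) for x, y in approx_minima]
```

Evaluating the function on a grid over roughly x in [-1.5, 1.2] and y in [-0.2, 2.0] is a common way to reproduce the familiar contour plot and to sanity-check an optimizer before moving to a real molecular system.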