
GutenOCR: A Grounded Vision-Language Front-End for Documents
GutenOCR is a family of vision-language models designed to serve as a ‘grounded OCR front-end’, providing high-quality text transcription and explicit geometric grounding.

Optimizing Sequence Models for Dynamical Systems
We systematically ablate core mechanisms of Transformers and RNNs, finding that attention-augmented Recurrent Highway Networks outperform standard Transformers on forecasting high-dimensional chaotic systems.

Neural Scaling of Deep Chemical Models
Frey et al. discover empirical power-law scaling relations for both chemical language models (ChemGPT, up to 1B parameters) and equivariant GNN interatomic potentials, finding that neither domain has saturated with respect to model size, data, or compute.
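
A power-law scaling relation of the form loss ≈ a · N^(-alpha) becomes a straight line in log-log space, so its exponent can be estimated with ordinary least squares. The sketch below illustrates that idea only; the function name `fit_power_law` and the synthetic data are assumptions made for illustration and are not taken from the paper.

```python
import math

def fit_power_law(sizes, losses):
    """Fit loss ≈ a * size**(-alpha) by least squares on (log size, log loss)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # In log space: log(loss) = log(a) - alpha * log(size)
    return math.exp(intercept), -slope

# Synthetic, illustrative data generated from a known power law (not paper results).
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [5.0 * s ** -0.2 for s in sizes]
a, alpha = fit_power_law(sizes, losses)
```

On real measurements the points will scatter around the line, and a fitted exponent that stays stable as larger models are added is what "not yet saturated" would look like in this picture.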

Genetic Algorithms as Baselines for Molecule Generation
This position paper demonstrates that genetic algorithms (GAs) perform surprisingly well on molecular generation benchmarks, often outperforming complex deep learning methods. The authors propose the GA criterion: new molecule generation algorithms should demonstrate a clear advantage over GAs.
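
To make the kind of baseline behind the GA criterion concrete, here is a minimal sketch of a genetic algorithm with selection, crossover, and mutation. Everything in it is a toy assumption: candidates are fixed-length strings over a three-letter alphabet rather than valid molecules, and `score` is a hypothetical stand-in for a property oracle. It is not the GA implementation evaluated in the paper.

```python
import random

ALPHABET = "CNO"  # toy alphabet; no chemical validity is enforced

def score(candidate):
    """Hypothetical oracle: counts 'C' characters (stand-in for a real property)."""
    return candidate.count("C")

def mutate(candidate, rng):
    """Replace one randomly chosen position with a random symbol."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]

def crossover(a, b, rng):
    """Single-point crossover between two equal-length parents."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def run_ga(pop_size=20, length=12, generations=30, seed=0):
    rng = random.Random(seed)
    population = ["".join(rng.choice(ALPHABET) for _ in range(length))
                  for _ in range(pop_size)]
    history = [max(score(c) for c in population)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # elitism: the top half survives
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
        history.append(max(score(c) for c in population))
    return max(population, key=score), history

best, history = run_ga()
```

Because the top half of each generation is carried over unchanged, the best score never decreases. A new generative method would be compared against a loop like this under the same oracle budget.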

MolGenSurvey: Systematic Survey of ML for Molecule Design
MolGenSurvey systematically reviews ML models for molecule design, organizing the field by molecular representation (1D/2D/3D), generative method (deep generative models vs. combinatorial optimization), and task type (8 distinct generation/optimization tasks). It catalogs over 100 methods, unifies task definitions via input/output/goal taxonomy, and identifies key challenges including out-of-distribution generation, oracle costs, and lack of unified benchmarks.

SMINA Docking Benchmark for De Novo Drug Design Models
This paper proposes a benchmark for de novo drug design based on SMINA docking scores across eight drug targets, and finds that popular generative models fail to outperform random subsets of ZINC.

Tartarus: Realistic Inverse Molecular Design Benchmarks
Tartarus introduces a modular suite of realistic molecular design benchmarks grounded in computational chemistry simulations. Benchmarking eight generative models reveals that no single algorithm dominates all tasks, and simple genetic algorithms often outperform deep generative models.

Tied Two-Way Transformers for Diverse Retrosynthesis
This paper couples a retrosynthesis transformer with a forward reaction transformer through parameter sharing, cycle consistency checks, and multinomial latent variables. The combined approach reduces top-1 SMILES invalidity to 0.1% on USPTO-50K, improves top-10 accuracy to 78.5%, and achieves 87.3% pathway coverage on a multi-pathway in-house dataset.
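
One simple way to picture a cycle consistency check is as a filter: a proposed reactant set is kept only if a forward model maps it back to the original product. The sketch below assumes that reading. The function `cycle_consistent`, the lookup-table forward model, and the example strings are hypothetical; the paper's actual models, scoring, and SMILES canonicalization are not reproduced here.

```python
def cycle_consistent(product, candidates, forward_model):
    """Keep candidate reactant strings whose forward prediction equals the product.

    `forward_model` is any callable mapping a reactant string to a product string.
    Real SMILES would need canonicalization before comparing; this toy skips it.
    """
    return [reactants for reactants in candidates
            if forward_model(reactants) == product]

# Hypothetical stand-in for a trained forward reaction model.
TOY_FORWARD = {
    "CC(=O)O.OCC": "CC(=O)OCC",
    "CC(=O)Cl.OCC": "CC(=O)OCC",
    "CCO.CCO": "CCOCC",
}

def toy_forward_model(reactants):
    return TOY_FORWARD.get(reactants, "")

kept = cycle_consistent(
    "CC(=O)OCC",
    ["CC(=O)O.OCC", "CCO.CCO", "CC(=O)Cl.OCC"],
    toy_forward_model,
)
```

Here the second candidate is dropped because its forward prediction is a different product, while the two remaining candidates represent distinct routes to the same target.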

BARTSmiles: BART Pre-Training for Molecular SMILES
BARTSmiles pre-trains a BART-large model on 1.7 billion SMILES strings from ZINC20 and achieves the best reported results on 11 classification, regression, and generation benchmarks.

Language Models Learn Complex Molecular Distributions
This study benchmarks RNN-based chemical language models against graph generative models on three challenging tasks: high penalized LogP distributions, multi-modal molecular distributions, and large-molecule generation from PubChem. The LSTM language models consistently outperform JTVAE and CGVAE.

LIMO: Latent Inceptionism for Targeted Molecule Generation
LIMO combines a SELFIES-based VAE with a novel stacked property predictor architecture (the decoder output serves as the predictor input) and gradient-based reverse optimization in the latent space. It is 6-8x faster than RL baselines and 12x faster than sampling methods while generating molecules with nanomolar binding affinities, including a predicted KD of 6e-14 M against the human estrogen receptor.
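
The core idea of gradient-based optimization in a latent space is to hold the model weights fixed and update the latent vector itself in the direction that increases the predicted property. The sketch below shows only that loop, under strong simplifying assumptions: the decoder and property predictor are collapsed into one hypothetical quadratic function `predicted_property` with a hand-written gradient, so it is not LIMO's architecture or training code.

```python
def predicted_property(z, target):
    """Hypothetical frozen predictor: highest when z equals `target`."""
    return -sum((zi - ti) ** 2 for zi, ti in zip(z, target))

def property_gradient(z, target):
    """Analytic gradient of `predicted_property` with respect to z."""
    return [-2.0 * (zi - ti) for zi, ti in zip(z, target)]

def optimize_latent(z0, target, lr=0.1, steps=100):
    """Gradient ascent on the latent vector; the predictor itself is never updated."""
    z = list(z0)
    for _ in range(steps):
        grad = property_gradient(z, target)
        z = [zi + lr * gi for zi, gi in zip(z, grad)]
    return z

z_start = [0.0, 0.0, 0.0]
z_target = [1.0, -2.0, 0.5]  # illustrative optimum of the toy predictor
z_opt = optimize_latent(z_start, z_target)
```

In a real system the gradient would come from automatic differentiation through the decoder and predictor, and the optimized latent vector would then be decoded into a molecule.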

Regression Transformer: Prediction Meets Generation
The Regression Transformer (RT) reformulates regression as conditional sequence modelling, enabling a single XLNet-based model to both predict continuous molecular properties and generate novel molecules conditioned on desired property values.
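
Treating regression as sequence modelling requires writing a continuous value as tokens that a language model can read or fill in. The sketch below shows one such scheme, where each digit becomes a token that also records its decimal place. The token format `_digit_place_` and the helper names are assumptions for illustration and may differ from the RT's actual tokenizer; negative values are not handled.

```python
def encode_number(value, decimals=2):
    """Turn a non-negative float into digit tokens tagged with their decimal place."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    integer, fraction = f"{value:.{decimals}f}".split(".")
    tokens = []
    for i, digit in enumerate(integer):
        tokens.append(f"_{digit}_{len(integer) - 1 - i}_")
    for i, digit in enumerate(fraction):
        tokens.append(f"_{digit}_{-(i + 1)}_")
    return tokens

def decode_number(tokens, decimals=2):
    """Invert `encode_number` by summing digit * 10**place over all tokens."""
    total = 0.0
    for token in tokens:
        digit, place = token.strip("_").split("_")
        total += int(digit) * 10 ** int(place)
    return round(total, decimals)

tokens = encode_number(0.37)
value = decode_number(tokens)
```

With the property written this way, masking the numeric tokens turns the task into prediction, while masking part of the molecule and fixing the numeric tokens turns it into conditional generation.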