
GutenOCR: A Grounded Vision-Language Front-End for Documents
GutenOCR is a family of vision-language models designed to serve as a ‘grounded OCR front-end’, providing high-quality text transcription and explicit geometric grounding.

Optimizing Sequence Models for Dynamical Systems
We systematically ablate core mechanisms of Transformers and RNNs, finding that attention-augmented Recurrent Highway Networks outperform standard Transformers on forecasting high-dimensional chaotic systems.

Block-Recurrent Transformers for Long Sequences
A transformer architecture that applies a recurrent cell over blocks of tokens, achieving linear complexity in sequence length while outperforming Transformer-XL baselines on PG19, arXiv, and GitHub datasets.

Ewald Message Passing for Molecular Graphs
Proposes Ewald message passing, a Fourier-space scheme inspired by Ewald summation that captures long-range interactions in molecular graphs. The method is architecture-agnostic and improves energy MAEs by 10% on OC20 and 16% on OE62 across four baseline GNN models.

Lagrangian Neural Networks for Physics
Lagrangian Neural Networks (LNNs) use neural networks to parameterize arbitrary Lagrangians, enabling energy-conserving learned dynamics without canonical coordinates. Unlike Hamiltonian approaches, LNNs handle relativistic systems and extend to graphs via Lagrangian Graph Networks.

Liquid-S4: Input-Dependent State-Space Models
Liquid-S4 extends the S4 framework by incorporating a linearized liquid time-constant formulation that introduces input-dependent state transitions. This yields an additional convolutional kernel capturing input correlations, improving generalization across long-range sequence tasks.

RWKV: Linear-Cost RNN with Transformer Training
RWKV is a novel sequence model that achieves transformer-level performance while maintaining linear time and constant memory complexity during inference, scaled up to 14 billion parameters.

InChI: The International Chemical Identifier
InChI (International Chemical Identifier) is an open standard from IUPAC that represents molecular structures as hierarchical, layered strings optimized for database interoperability, unique identification, and web search via its hashed InChIKey.

MarkushGrapher-2: End-to-End Markush Recognition
An 831M-parameter encoder-decoder model that jointly encodes image, OCR text, and layout information through a two-stage training strategy, achieving state-of-the-art multimodal Markush structure recognition while remaining competitive on standard molecular structure recognition.

Materials Representations for ML Review
A comprehensive review of how solid-state materials can be numerically represented for machine learning, spanning structural features, graph neural networks, compositional descriptors, transfer learning, and generative models for inverse design.

NaViT: Native Resolution Vision Transformer
NaViT applies sequence packing (Patch n’ Pack) to Vision Transformers, enabling training on images of arbitrary resolution and aspect ratio while improving training efficiency by up to 4x over standard ViT.

BioT5: Cross-Modal Integration of Biology and Chemistry
BioT5 uses SELFIES representations and separate tokenization to pre-train a unified T5 model across molecules, proteins, and text, achieving state-of-the-art results on 10 of 15 downstream tasks.