Method

Diagram showing block-recurrent transformer architecture with vertical and horizontal processing directions

Block-Recurrent Transformers for Long Sequences

A transformer architecture that applies a recurrent cell over blocks of tokens, achieving linear complexity in sequence length while outperforming Transformer-XL baselines on PG19, arXiv, and GitHub datasets.
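
The linear-complexity claim can be checked with a rough count of attention scores. The sketch below is illustrative only: the function names and the counting model are assumptions, not the paper's own accounting. It assumes each block of `block_size` tokens attends to itself and to `n_state` recurrent state vectors, and that the state attends back to the block and to itself.

```python
def full_attention_scores(n_tokens: int) -> int:
    """Query-key pairs scored by standard full self-attention."""
    return n_tokens * n_tokens


def block_recurrent_scores(n_tokens: int, block_size: int, n_state: int) -> int:
    """Query-key pairs scored when the sequence is processed block by block.

    Assumed counting model (hypothetical, for illustration):
      - tokens in a block attend to the block's tokens and the state vectors
      - state vectors attend to the block's tokens and to themselves
    """
    n_blocks = -(-n_tokens // block_size)  # ceiling division
    token_side = block_size * (block_size + n_state)
    state_side = n_state * (block_size + n_state)
    return n_blocks * (token_side + state_side)


if __name__ == "__main__":
    for n in (4096, 8192, 16384):
        print(n, full_attention_scores(n), block_recurrent_scores(n, 512, 512))
```

Under this model, doubling the sequence length doubles the block-wise count but quadruples the full-attention count, because the per-block cost is fixed and only the number of blocks grows.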

Machine Learning

Diagram showing NaViT packing variable-resolution image patches into a single sequence

NaViT: Native Resolution Vision Transformer

NaViT applies sequence packing (Patch n’ Pack) to Vision Transformers, enabling training on images of arbitrary resolution and aspect ratio while improving training efficiency by up to 4x over standard ViT.
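
A minimal sketch of the packing idea follows, assuming a 16-pixel patch size and a simple first-fit placement rule. Both are illustrative choices rather than details taken from this summary, and all function names are hypothetical. Images of different shapes yield different patch counts, several images share one fixed-length sequence, and a mask keeps attention from crossing image boundaries.

```python
from typing import List


def patch_count(height: int, width: int, patch: int = 16) -> int:
    """Number of non-overlapping patches in an image (patch size is an assumption)."""
    return (height // patch) * (width // patch)


def pack_first_fit(lengths: List[int], max_len: int) -> List[List[int]]:
    """Place each example in the first packed sequence with enough free room."""
    bins: List[List[int]] = []   # example indices per packed sequence
    free: List[int] = []         # remaining token slots per packed sequence
    for idx, n in enumerate(lengths):
        if n > max_len:
            raise ValueError(f"example {idx} has {n} tokens, more than {max_len}")
        for b, room in enumerate(free):
            if n <= room:
                bins[b].append(idx)
                free[b] -= n
                break
        else:  # no existing sequence had room, so open a new one
            bins.append([idx])
            free.append(max_len - n)
    return bins


def segment_ids(bin_lengths: List[int], max_len: int) -> List[int]:
    """Label each token with its image number (1, 2, ...); 0 marks padding."""
    ids: List[int] = []
    for seg, n in enumerate(bin_lengths, start=1):
        ids.extend([seg] * n)
    ids.extend([0] * (max_len - len(ids)))
    return ids


def attention_mask(ids: List[int]) -> List[List[bool]]:
    """Token i may attend to token j only if both belong to the same image."""
    return [[a == b and a != 0 for b in ids] for a in ids]


if __name__ == "__main__":
    shapes = [(224, 224), (128, 384), (64, 64), (320, 160)]
    lengths = [patch_count(h, w) for h, w in shapes]
    print(lengths)                       # [196, 192, 16, 200]
    print(pack_first_fit(lengths, 256))  # [[0, 2], [1], [3]]
```

Without packing, each of the four images would be resized or padded to a common shape. Here the 64x64 image fills otherwise unused slots next to the 224x224 image.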

Computational Chemistry

Adaptive grid merging visualization for benzene molecule showing multi-resolution spatial discretization

Beyond Atoms: 3D Space Modeling for Molecular Pretraining

ICML 2025 paper introducing SpaceFormer, a Transformer architecture that challenges the atom-centric paradigm by modeling the continuous 3D space surrounding molecules with adaptive multi-resolution grids. It ranks first on 10 of 15 molecular property prediction tasks.