Hi, I’m Hunter.

I’m an AI Research Scientist & Engineer at Roots.ai, bridging abstract ML research and production deployment. I specialize in Large Language Models (LLMs) and Vision-Language Models (VLMs) for document processing, and conduct research in physics-informed AI for scientific simulation. I take ideas from papers to working code, building open-source tools and real-world systems. More about me →
Document Processing
GutenOCR Mascot

GutenOCR: A Grounded Vision-Language Front-End for Documents

GutenOCR is a family of vision-language models designed to serve as a ‘grounded OCR front-end’, providing high-quality text transcription and explicit geometric grounding.

Document Processing
Statistics of the PubMed-OCR dataset including number of articles, pages, words, and bounding boxes.

PubMed-OCR: PMC Open Access OCR Annotations

PubMed-OCR provides 1.5M pages of scientific articles with comprehensive OCR annotations and bounding boxes to support layout-aware modeling and document analysis.

Computational Chemistry
Müller-Brown Potential Energy Surface showing the three minima and two saddle points

Müller-Brown Potential: A PyTorch ML Testbed

A high-performance, GPU-accelerated PyTorch testbed for ML-MD algorithms featuring JIT-compiled analytical Jacobian force kernels achieving 3-10x speedup over autograd, robust Langevin dynamics with Velocity-Verlet integration, and modular architecture designed as ground-truth validation for novel machine learning approaches in molecular dynamics.

Scientific Computing
Velocity Autocorrelation Function showing the signature negative region characteristic of liquid dynamics and the cage effect discovered by Rahman

Modernizing Rahman''s 1964 Argon Simulation

A digital restoration of Rahman’s seminal 1964 molecular dynamics paper using LAMMPS and a production-grade Python analysis pipeline featuring intelligent decorator-based caching, fully vectorized NumPy computations for O(N^2) operations, and modern tooling (uv, type hints, Makefile automation) transforming academic scripts into reproducible research toolkit.

Document Processing
Stream accuracy versus relative throughput for Mistral-7B and XGBoost models

LLMs for Insurance Document Automation

We explore LLM applications for page stream segmentation in insurance document processing, demonstrating that parameter-efficient fine-tuning achieves strong accuracy but revealing significant calibration challenges that limit deployment confidence.

Time Series Forecasting
Forecasting comparison of different neural architectures on the Multiscale Lorenz-96 system

Optimizing Sequence Models for Dynamical Systems

We systematically ablate core mechanisms of Transformers and RNNs, finding that attention-augmented Recurrent Highway Networks outperform standard Transformers on forecasting high-dimensional chaotic systems.

Document Processing
Diagram showing page stream segmentation workflow: an input stream of pages is processed through binary classification of page pairs to predict document breaks, producing segmented output documents

LLMs for Page Stream Segmentation

We create TabMe++, an enhanced page stream segmentation benchmark with commercial-grade OCR, and show that parameter-efficiently fine-tuned decoder-based LLMs like Mistral-7B achieve 80% straight-through processing rates, dramatically outperforming encoder-based models.

Computational Biology
Molecular visualization of a methionine dipeptide structure from MD simulation

Mini-Protein Trajectory Generation

An automated GROMACS pipeline for generating high-fidelity molecular dynamics datasets suitable for machine learning, simulating capped dipeptides across nine residue types with 0.1 ps resolution and atomic force extraction optimized for training Neural Network Potentials.

Computational Social Science
Diagram of the Universal Message schema showing fields like ID, Text, Author, and Reply Sets that normalize data across platforms

PyConversations: Social Media Conversational Analysis

Research project that investigated how different NLP models perform on social media data, finding that domain-specific approaches often outperform large pre-trained models. Includes PyConversations, a Python module for analyzing conversations across social media platforms.

AI Safety
A nonsensical trigger sequence 'WTC theoriesclimate Flat Hubbard Principle' is fed into GPT-2, which then generates Flat Earth conspiracy text

GPT-2 Susceptibility to Universal Adversarial Triggers

We demonstrate that universal adversarial triggers can control both the topic and stance of GPT-2’s generated text, revealing security vulnerabilities in deployed language models and proposing constructive applications for bias auditing.

Computational Chemistry

Molecular Sets (MOSES): A Generative Modeling Benchmark

MOSES introduces a comprehensive benchmarking platform for molecular generative models, offering standardized datasets, evaluation metrics, and baselines.

Document Processing
Chart showing the trade-off between accuracy and throughput in document automation

The Reliability Trap: The Limits of 99% Accuracy

We explore the ‘Silent Failure’ mode of LLMs in production: the limits of 99% accuracy for reliability, how confidence decays in long documents, and why standard calibration techniques struggle to fix it.