Research Overview

My research spans machine learning theory and practical applications, focusing on natural language processing, AI safety, and real-world challenges in computational science and industry. I’m particularly interested in making AI systems more robust and interpretable across diverse domains, from document automation to social media analysis.

Peer-Reviewed Publications

These publications represent my contributions to machine learning and NLP, exploring both practical applications and fundamental research questions.

“Page Stream Segmentation with LLMs: Challenges and Applications in Insurance Document Automation”
Hunter Heidenreich, Ratish Dalvi, Nikhil Verma, Yosheb Getachew
31st International Conference on Computational Linguistics: Industry Track (COLING ’25), pp. 305-317
📄 Read Paper

Applies LLMs to document processing automation in insurance, using parameter-efficient fine-tuning for page stream segmentation and analyzing the calibration challenges that arise in high-stakes applications.
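
Page stream segmentation is commonly framed as a per-page decision: does this page begin a new document, or continue the previous one? As a minimal illustrative sketch (not the method used in the paper), assume some classifier, hypothetical here, has already produced one boolean per page; the function name `segment_stream` is likewise an assumption. The code only shows how such per-page predictions split a stream into inclusive page ranges.

```python
from typing import List, Tuple


def segment_stream(starts_new_doc: List[bool]) -> List[Tuple[int, int]]:
    """Group a stream of pages into documents.

    starts_new_doc[i] is True if page i is predicted (by a hypothetical
    classifier) to be the first page of a new document. Returns one
    (first_page, last_page) pair per document, with inclusive indices.
    Page 0 always opens a document, whatever its prediction says.
    """
    documents: List[Tuple[int, int]] = []
    start = 0
    for i in range(1, len(starts_new_doc)):
        if starts_new_doc[i]:
            # Close the current document just before the new first page.
            documents.append((start, i - 1))
            start = i
    if starts_new_doc:
        # Close the final document at the last page of the stream.
        documents.append((start, len(starts_new_doc) - 1))
    return documents


print(segment_stream([True, False, False, True, True, False]))
```

For the six-page example, the predictions yield three documents: pages 0-2, page 3 alone, and pages 4-5.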

“The earth is flat and the sun is not a star: The susceptibility of GPT-2 to universal adversarial triggers”
Hunter Scott Heidenreich, Jake Ryland Williams
AAAI/ACM Conference on AI, Ethics, and Society (AIES ’21), pp. 566-573
📄 Read Paper

Examines universal adversarial triggers in natural language models, showing how input-agnostic text sequences can manipulate GPT-2’s outputs on controversial topics and highlighting important vulnerabilities in language model deployment.

“Latent semantic network induction in the context of linked example senses”
Hunter Heidenreich, Jake Williams
5th Workshop on Noisy User-generated Text (W-NUT 2019), pp. 170-180
📄 Read Paper

Explores data-driven wordnet construction using Wiktionary, showing that semantic networks can be effectively induced from noisy, user-annotated lexical resources.

Preprints & Working Papers

My ongoing research covers document processing, forecasting of dynamical systems, word embeddings, and computational social science.

Document Processing & Automation

Work on improving document processing workflows through machine learning.

“Large Language Models for Page Stream Segmentation”
Hunter Heidenreich, Ratish Dalvi, Rohith Mukku, Nikhil Verma, Neven Pičuljan
arXiv preprint arXiv:2408.11981 (2024)
📄 Read Paper

Introduces the TABME++ benchmark and evaluates LLM performance on page stream segmentation, finding that decoder-based models fine-tuned with parameter-efficient methods outperform the other models evaluated.

Forecasting & Dynamical Systems

Research on neural architectures for temporal and spatiotemporal prediction tasks.

“Deconstructing recurrence, attention, and gating: Investigating the transferability of transformers and gated recurrent neural networks in forecasting of dynamical systems”
Hunter S Heidenreich, Pantelis R Vlachas, Petros Koumoutsakos
arXiv preprint arXiv:2410.02654 (2024)
📄 Read Paper

Ablation study examining key architectural components of neural networks for spatiotemporal forecasting, finding that neural gating and attention improve RNN performance while recurrence is detrimental to transformers.

NLP Foundations & Word Embeddings

Research on representation learning and the mathematical foundations of embedding methods.

“EigenNoise: A Contrastive Prior to Warm-Start Representations”
Hunter Scott Heidenreich, Jake Ryland Williams
arXiv preprint arXiv:2205.04376 (2022)
📄 Read Paper

A novel initialization scheme for word vectors based on dense co-occurrence modeling, performing competitively with GloVe without requiring any pre-training data.

“To Know by the Company Words Keep and What Else Lies in the Vicinity”
Jake Ryland Williams, Hunter Scott Heidenreich
arXiv preprint arXiv:2205.00148 (2022)
📄 Read Paper

An analytical model of the statistics learned by Word2Vec and GloVe, deriving the first known solution to Word2Vec’s softmax-optimized skip-gram algorithm and analyzing bias detection in word embeddings.

Social Media & Computational Social Science

Studies of social phenomena and information patterns using computational approaches.

“NewsTweet: a dataset of social media embedding in online journalism”
Munif Ishad Mujib, Hunter Scott Heidenreich, Colin J Murphy, Giovanni C Santia, Asta Zelenkauskaite, Jake Ryland Williams
arXiv preprint arXiv:2008.02870 (2020)
📄 Read Paper

A large-scale dataset and analysis pipeline for studying social media embedding in digital news, finding that 13% of news stories include embedded tweets and offering insights into patterns of newsworthiness.

“Investigating Coordinated ‘Social’ Targeting of High-Profile Twitter Accounts”
Hunter Scott Heidenreich, Munif Ishad Mujib, Jake Ryland Williams
arXiv preprint arXiv:2008.02874 (2020)
📄 Read Paper

Investigation of coordinated manipulation campaigns targeting high-profile Twitter accounts, revealing bot networks and compromised accounts during the 2020 U.S. presidential election period.

Datasets & Experiments

In addition to publications, I work on practical machine learning applications and create resources for the research community.

US Policy Area Classification Challenge
Developed machine learning models to classify congressional bills by policy area with 87%+ accuracy.
Read the full analysis →