This section collects research notes on computational chemistry. Most of the notes cover optical chemical structure recognition (OCSR), the problem of extracting structured chemical information from images of molecular diagrams. The remaining notes span molecular dynamics simulations, molecular representations, chemical language models that learn from molecular string notations, LLMs for chemistry (including multimodal and reasoning models adapted for chemical tasks), molecular modeling, curated datasets, and a small set of classic papers in the field.
LLMs for Chemistry
An archive of notes covering large language models and multimodal models adapted for chemistry tasks. Unlike chemical language models that learn directly from molecular string representations (SMILES, SELFIES), these models build on general-purpose LLM or VLM backbones and are fine-tuned for chemical reasoning, multimodal molecular understanding, or scientific document processing.