A substantial fraction of chemical knowledge is recorded as 2D diagrams in journals, patents, and textbooks. Optical Chemical Structure Recognition (OCSR) is the task of extracting machine-readable molecular representations from those images: strings like SMILES (a compact text encoding of molecular structure) and InChI (a standardized identifier for chemical substances), or molecular graphs that encode atoms as nodes and bonds as edges. For a longer introduction to the field and its motivations, see the "What is OCSR?" post.
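To make the three output formats concrete, here is a minimal sketch of one molecule, ethanol, written as a SMILES string, an InChI string, and a small atom/bond graph. The `MolGraph` class and its methods are hypothetical helpers invented for this illustration, not part of any OCSR tool or cheminformatics library, and the graph stores heavy atoms only, leaving hydrogens implicit.

```python
from dataclasses import dataclass, field


@dataclass
class MolGraph:
    """Toy molecular graph (hypothetical helper): atoms are nodes, bonds are edges."""

    atoms: list[str] = field(default_factory=list)               # element symbol per node
    bonds: list[tuple[int, int, int]] = field(default_factory=list)  # (atom_a, atom_b, bond_order)

    def add_atom(self, element: str) -> int:
        """Append an atom and return its node index."""
        self.atoms.append(element)
        return len(self.atoms) - 1

    def add_bond(self, a: int, b: int, order: int = 1) -> None:
        """Record an undirected bond between two atom indices."""
        self.bonds.append((a, b, order))

    def neighbors(self, idx: int) -> list[int]:
        """Return the sorted indices of atoms bonded to atom `idx`."""
        out = []
        for a, b, _ in self.bonds:
            if a == idx:
                out.append(b)
            elif b == idx:
                out.append(a)
        return sorted(out)


# The same molecule (ethanol) in the two string formats...
ETHANOL_SMILES = "CCO"
ETHANOL_INCHI = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"

# ...and as a graph: C-C-O with single bonds, hydrogens implicit.
ethanol = MolGraph()
c1 = ethanol.add_atom("C")
c2 = ethanol.add_atom("C")
o = ethanol.add_atom("O")
ethanol.add_bond(c1, c2)
ethanol.add_bond(c2, o)

print(ethanol.atoms)          # ['C', 'C', 'O']
print(ethanol.bonds)          # [(0, 1, 1), (1, 2, 1)]
print(ethanol.neighbors(c2))  # [0, 2]
```

In practice a cheminformatics toolkit would handle parsing and conversion between these formats; the sketch only shows what each representation holds.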
The notes are organized into eight sub-groups:
- Rule-Based Systems cover the original OCSR pipeline (1990s to mid-2010s): vectorize an image, classify atoms and bonds with hand-coded rules, and compile a connection table. Tools like Kekulé, CLiDE, OSRA, and ChemReader defined this era.
- Image-to-Sequence Models reframe recognition as image captioning, using encoder-decoder architectures to generate SMILES, InChI, or SELFIES strings. DECIMER, Img2Mol, and SwinOCSR are representative examples.
- Image-to-Graph Models predict molecular graphs directly, detecting atoms and bonds as nodes and edges. This group includes MolGrapher, MolScribe, and ABC-Net.
- Vision-Language Models represent the latest generation, building on large pretrained vision-language backbones for improved generalization. MolParser, GTR-CoT, MolNexTR, and SubGrapher fall here.
- Hand-Drawn Structure Recognition addresses the distinct challenge of interpreting molecules sketched by hand, from early structural analysis to modern deep learning augmentation strategies.
- Online Recognition processes real-time pen strokes on tablets and touch devices, using stroke order and timing for chemical symbol and expression recognition.
- Benchmarks and Reviews collects survey papers, the TREC-Chem 2011 and CLEF-IP 2012 competition reports and system descriptions, and comparative analyses of OCSR tools.
- Markush Structures covers detection and parsing of the generic chemical representations used in patents to claim compound families.
For orientation, the two survey papers in the Benchmarks and Reviews group are the best starting points: Rajan et al. 2020 covers the rule-based era and benchmarks the transition period, while Musazade et al. 2022 picks up the thread with deep learning methods.