A substantial fraction of chemical knowledge is recorded as 2D diagrams in journals, patents, and textbooks. Optical Chemical Structure Recognition (OCSR) is the task of extracting machine-readable molecular representations from those images: strings like SMILES (a compact text encoding of molecular structure) and InChI (a standardized identifier for chemical substances), or molecular graphs that encode atoms as nodes and bonds as edges. For a longer introduction to the field and its motivations, see the What is OCSR? post.
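To make the three output formats concrete, here is a minimal sketch that writes ethanol as a SMILES string, as an InChI string, and as a small graph. The `MolGraph` class and its `neighbors` method are hypothetical names introduced only for illustration, not part of any OCSR tool; hydrogens are left implicit, as they usually are in SMILES.

```python
from dataclasses import dataclass

# Ethanol (CH3-CH2-OH) in the two string formats mentioned above.
ETHANOL_SMILES = "CCO"
ETHANOL_INCHI = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"


@dataclass
class MolGraph:
    """Hypothetical minimal molecular graph: atoms are nodes, bonds are edges."""

    atoms: list[str]                    # element symbol per node
    bonds: list[tuple[int, int, int]]   # (atom index, atom index, bond order)

    def neighbors(self, idx: int) -> list[int]:
        """Return the sorted indices of atoms bonded to atom `idx`."""
        found = []
        for a, b, _order in self.bonds:
            if a == idx:
                found.append(b)
            elif b == idx:
                found.append(a)
        return sorted(found)


# Heavy atoms only: C0 - C1 - O2, joined by two single bonds.
ethanol_graph = MolGraph(
    atoms=["C", "C", "O"],
    bonds=[(0, 1, 1), (1, 2, 1)],
)

# The middle carbon is bonded to the other carbon and to the oxygen.
middle_neighbors = ethanol_graph.neighbors(1)
```

In practice, a cheminformatics toolkit would handle conversion between these forms; the sketch only shows that all three describe the same connectivity.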

The notes are organized into eight sub-groups:

For orientation, the two survey papers in the benchmarks group are the best starting points: Rajan et al. 2020 covers the rule-based era and benchmarks the transition period, while Musazade et al. 2022 picks up the thread with deep learning methods.