Read and extract text and other content from PDFs in C# (port of PDFBox)
OCR engine for all the languages
Document Layout Analysis resources for development with PdfPig.
Conversions between various OCR formats
An OCR evaluation tool
Text Overlay plugin for Mirador 3
ALTO XML schema - latest and all former versions
Python tools for performing various operations on ALTO XML files
Kitodo.Presentation is a feature-rich framework for building a METS- or ...