🔍 Better text detection by combining multiple OCR engines (EasyOCR, Tesseract, and Pororo) with 🧠 LLM.
🔍 BetterOCR combines results from multiple OCR engines with an 🧠 LLM to correct & reconstruct the output.
Before | After (✨ latest at v1.2.0) |
---|---|
Pororo is used only if the language options (`lang`) specified include either 🇺🇸 English (`en`) or 🇰🇷 Korean (`ko`). Also, the additional dependencies listed in `[tool.poetry.group.pororo.dependencies]` must be available. (If not, Pororo is automatically excluded from the enabled engines.)
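The gating rule above can be sketched roughly as follows. This is only an illustration, not BetterOCR's actual code: `select_engines`, `PORORO_LANGS`, and the `pororo_available` parameter are hypothetical names, and checking for an importable `pororo` module is an assumed stand-in for the real dependency check.

```python
from importlib.util import find_spec

# Hypothetical constant: languages for which Pororo is considered (per the note above).
PORORO_LANGS = {"en", "ko"}


def select_engines(lang, pororo_available=None):
    """Return the names of the engines to enable for the requested languages."""
    engines = ["easyocr", "tesseract"]
    if pororo_available is None:
        # Assumption: an importable `pororo` module means the optional
        # dependency group is installed.
        pororo_available = find_spec("pororo") is not None
    # Pororo is enabled only when its dependencies exist AND en/ko was requested.
    if pororo_available and PORORO_LANGS.intersection(lang):
        engines.append("pororo")
    return engines
```

Passing `pororo_available` explicitly makes the rule easy to exercise without installing the optional dependencies.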
Full Changelog: https://github.com/junhoyeo/BetterOCR/compare/v1.1.2...v1.2.0
Full Changelog: https://github.com/junhoyeo/BetterOCR/compare/v1.1.1...v1.1.2
`detect_boxes`'s fallback logic (when LLM output format is invalid)

Full Changelog: https://github.com/junhoyeo/BetterOCR/compare/v1.1.0...v1.1.1
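As a rough illustration of what a fallback for invalid LLM output can look like, here is a minimal sketch. It assumes the LLM is asked for a JSON list of objects with `box` and `text` keys; that schema, and the names `parse_boxes` and `fallback_boxes`, are hypothetical and not taken from the BetterOCR source.

```python
import json


def parse_boxes(llm_output, fallback_boxes):
    """Parse LLM output as a JSON list of {"box", "text"} items.

    Returns `fallback_boxes` (e.g. the raw OCR engine boxes) whenever the
    output is not valid JSON or does not match the assumed shape.
    """
    try:
        data = json.loads(llm_output)
    except (json.JSONDecodeError, TypeError):
        return fallback_boxes
    if not isinstance(data, list):
        return fallback_boxes
    for item in data:
        # Every item must be an object carrying both assumed keys.
        if not isinstance(item, dict) or "box" not in item or "text" not in item:
            return fallback_boxes
    return data
```

Returning the uncorrected boxes instead of raising keeps the caller's result usable even when the model ignores the requested format.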
(`detect_boxes`) had been implemented by @junhoyeo in https://github.com/junhoyeo/BetterOCR/pull/1
Original | Detected |
---|---|
Full Changelog: https://github.com/junhoyeo/BetterOCR/commits/v1.1.0
🔍 Better text detection by combining multiple OCR engines with 🧠 LLM.
OCR still sucks! ... Especially when you're from the other side of the world (and face a significant lack of training data in your language) — or just not thrilled with noisy results.
BetterOCR combines results from multiple OCR engines with an LLM to correct & reconstruct the output.
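To make the idea concrete, here is a minimal sketch of how outputs from several engines might be gathered into a single correction prompt for an LLM. The function name `build_correction_prompt`, the prompt wording, and the requested answer format are all assumptions for illustration; they are not BetterOCR's actual prompt or API.

```python
import json


def build_correction_prompt(results, lang, context=""):
    """Build one prompt from per-engine OCR text (hypothetical format)."""
    lines = [
        "Combine and correct the OCR results below into a single best reading.",
        "Languages: " + ", ".join(lang),
    ]
    if context:
        lines.append("Context: " + context)
    for engine, text in results.items():
        # json.dumps keeps newlines and quotes in the OCR text unambiguous.
        lines.append(f"[{engine}] {json.dumps(text, ensure_ascii=False)}")
    lines.append('Answer as JSON: {"data": "<corrected text>"}')
    return "\n".join(lines)
```

For example, `build_correction_prompt({"easyocr": "Hel1o wor1d", "tesseract": "Hello world"}, ["en"])` yields a prompt listing both candidate readings, so the model can reconcile them.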
Head over to 💯 Examples to view performance by language (🇺🇸, 🇰🇷, 🇮🇳).
Coming Soon: improved interface, async support, box detection, and more.