Improve OCR Results with Sparrow (running on Streamlit/Python and Ngrok)
Description
OCR often returns fields in a different order from one document to the next. To build a dataset for fine-tuning a data extraction ML model (for example, Donut), however, the fields must be ordered consistently across all documents. Sparrow, our open-source solution for data annotation/labeling, includes functionality for reordering OCRed fields. In this video, I explain and show how it works.
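To illustrate the idea only (this is not Sparrow's code), here is a minimal Python sketch that puts OCRed fields into one fixed order, so every document in the dataset lists its fields the same way. The field names, sample values, `FIELD_ORDER`, and `reorder_fields` are hypothetical, chosen for the example.

```python
import json
from typing import Dict, List

# Hypothetical target order of fields for an invoice-like document (assumption).
FIELD_ORDER = ["invoice_no", "invoice_date", "seller", "client", "total"]


def reorder_fields(fields: List[Dict[str, str]], order: List[str]) -> List[Dict[str, str]]:
    """Return OCRed fields sorted by the position of their label in `order`.

    Labels missing from `order` are placed last and keep their original OCR order,
    because Python's sort is stable.
    """
    rank = {label: i for i, label in enumerate(order)}
    return sorted(fields, key=lambda f: rank.get(f["label"], len(order)))


# Hypothetical OCR output: the same fields, but in whatever order OCR produced them.
ocr_output = [
    {"label": "total", "value": "1,250.00"},
    {"label": "seller", "value": "ACME Ltd"},
    {"label": "invoice_no", "value": "40378170"},
    {"label": "invoice_date", "value": "10/15/2012"},
]

ordered = reorder_fields(ocr_output, FIELD_ORDER)
# Dicts keep insertion order, so the JSON keys follow FIELD_ORDER.
record = {f["label"]: f["value"] for f in ordered}
print(json.dumps(record))
```

Running it prints the fields as `invoice_no`, `invoice_date`, `seller`, `total`, regardless of the order OCR returned them in. Sorting against a fixed schema (rather than by position on the page) is what makes the serialized records consistent across documents.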
Sparrow - data extraction from documents with ML:
https://github.com/katanaml/sparrow
Sparrow UI running on Hugging Face Spaces:
https://katanaml-org-sparrow-ui.hf.space
0:00 Introduction
0:40 Sparrow
1:15 OCRed Results Reordering
4:50 Deployment with NGROK
5:40 Deployment with Hugging Face Spaces
8:15 Code
9:27 NGROK
10:30 Summary
CONNECT:
- Subscribe to this YouTube channel
- Twitter: https://twitter.com/andrejusb
- LinkedIn: https://www.linkedin.com/in/andrej-baranovskij/
- Medium: https://medium.com/@andrejusb
#machinelearning #python #data