EasyOCR — free paperless office
Paper-free life made simple 👌 and free 💰 = ❤️
I created EasyOCR to streamline batch processing of documents for text recognition using free and open-source software. EasyOCR packs all dependencies into a single Docker container, which makes it easy to set up anywhere.
Just 3 simple steps
- Scan using App
- OCR using free software
- Store and Find
Step 1: Scan using App
I use my phone to scan documents, because the photo quality is more than good enough and there are good free apps for it.
I chose the iOS version of Scanbot (https://scanbot.io).
The free version supports a permanent flashlight for taking bright enough pictures, simple manual cropping, and sharing to Dropbox as PDF.
So I take pictures of the document, crop the pages, and save the assembled PDF to my Dropbox.
Step 2: OCR
I found out that many commercial applications, and even hardware scanners that do OCR, use the same open-source library: https://github.com/tesseract-ocr/tesseract
Since this is just a library, I use a tool called OCRmyPDF that takes PDF files as input and does the heavy lifting for me: https://github.com/jbarlow83/OCRmyPDF
To avoid worrying about the setup, I built a GitHub repo that uses Docker: https://github.com/Extrawurst/easy-ocr
EasyOCR takes a src folder and batch processes all PDFs in it by OCR’ing them and saving the results in a dst folder.
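To make the src-to-dst idea concrete, here is a minimal Python sketch of the same kind of batch loop. It is not the code from the easy-ocr repo. It assumes the ocrmypdf command-line tool is installed and on the PATH, and the function names plan_jobs and ocr_folder are made up for illustration.

```python
import subprocess
from pathlib import Path


def plan_jobs(src: Path, dst: Path) -> list[tuple[Path, Path]]:
    """Pair every PDF in src with an output path of the same name in dst."""
    return [(pdf, dst / pdf.name) for pdf in sorted(src.glob("*.pdf"))]


def ocr_folder(src: Path, dst: Path) -> None:
    """Run the ocrmypdf command-line tool once per planned job."""
    dst.mkdir(parents=True, exist_ok=True)
    for pdf_in, pdf_out in plan_jobs(src, dst):
        if pdf_out.exists():
            continue  # already processed on an earlier run
        # Basic invocation (assumes ocrmypdf is installed):
        #   ocrmypdf <input.pdf> <output.pdf>
        subprocess.run(["ocrmypdf", str(pdf_in), str(pdf_out)], check=True)
```

Calling ocr_folder(Path("src"), Path("dst")) would then process everything in src. Skipping files that already exist in dst means the loop can be re-run safely after adding new scans.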
See example usage here:
Step 3: Storing and Searching
There are a lot of options and considerations here: cloud, NAS, external drives, encryption, redundancy, and so on. It really depends on the use case. I am using a regular cloud provider with an encrypted drive.
For redundancy, I sync the drive to my laptop and to a NAS at home.
Finder (the standard file explorer on macOS) is my tool for full-text search and does its job. It might not scale well, but contrary to my expectations I really do not search that often.
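If you prefer a script over the Finder window, macOS also ships mdfind, a command-line front end to the same Spotlight index. The following is only a rough sketch of how a search could be wrapped in Python. The function names are hypothetical, and it will only return results on a Mac where Spotlight has indexed the folder.

```python
import subprocess
from pathlib import Path


def build_search_command(folder: Path, query: str) -> list[str]:
    """Build an mdfind call restricted to one folder via -onlyin."""
    return ["mdfind", "-onlyin", str(folder), query]


def search_documents(folder: Path, query: str) -> list[Path]:
    """Return the files Spotlight reports, one path per output line."""
    result = subprocess.run(
        build_search_command(folder, query),
        capture_output=True,
        text=True,
        check=True,
    )
    return [Path(line) for line in result.stdout.splitlines() if line]
```

For example, search_documents(Path.home() / "Documents", "insurance") would list matching files under that folder, where the folder and the search term are placeholders.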
The key for me is a good file naming convention and folder structure:
Filenames are like: 2019-01-26-descriptive-label.pdf
For letters, I often add the sender to the filename as well.
Folder convention: a folder per year has been enough so far.
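The naming scheme is simple enough to automate. Here is a small Python sketch under a few assumptions of my own: the helper names make_filename and year_folder are hypothetical, the sender is placed right after the date, and anything other than ASCII letters and digits is dropped from the label.

```python
import re
from datetime import date
from pathlib import Path


def make_filename(day: date, label: str, sender: str = "") -> str:
    """Build 'YYYY-MM-DD-[sender-]label.pdf' from free-form text."""
    # Keep only lowercase ASCII letters and digits, joined by hyphens.
    words = re.findall(r"[a-z0-9]+", f"{sender} {label}".lower())
    return "-".join([day.isoformat(), *words]) + ".pdf"


def year_folder(root: Path, filename: str) -> Path:
    """Return root/<year>/<filename>, reading the year from the date prefix."""
    match = re.match(r"(\d{4})-\d{2}-\d{2}-", filename)
    if match is None:
        raise ValueError(f"no date prefix in {filename!r}")
    return root / match.group(1) / filename
```

With these helpers, make_filename(date(2019, 1, 26), "Descriptive Label") gives 2019-01-26-descriptive-label.pdf, and year_folder places that file in a 2019 folder below the chosen root.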
Conclusion
I don’t want to go back to hoarding bags of paper. I like the convenience of having everything safe and sound while still streamlined and efficient. Paperless also plays nicely with minimalism: out of sight, out of mind 👍