EasyOCR — free paperless office

Stephan D.
3 min read · Aug 5, 2019

Paper-free life made simple 👌 and free 💰 = ❤️

I created EasyOCR to streamline batch processing of my documents for text recognition using free and open-source software. EasyOCR packs all dependencies into a single Docker container, which makes it easy to set up anywhere.

Just 3 simple steps

  1. Scan using App
  2. OCR using free software
  3. Store and Find

Step 1: Scan using App

I use my phone to scan documents because the photo quality is more than enough and there are good free apps for it.

I chose the iOS version of Scanbot (https://scanbot.io).

The free version supports a permanent flashlight for taking bright enough pictures, simple manual cropping, and sharing to Dropbox as a PDF.

So I take pictures of the document, crop the pages, and save the assembled PDF to my Dropbox.

Step 2: OCR

I found out that most commercial applications, and even hardware scanners that do OCR, use this one open-source library: https://github.com/tesseract-ocr/tesseract

Since this is just a library, I use an app called OCRmyPDF that reads files in as PDFs and does the heavy lifting for me: https://github.com/jbarlow83/OCRmyPDF

To avoid worrying about the setup, I built a GitHub repo that uses Docker: https://github.com/Extrawurst/easy-ocr

EasyOCR takes a src folder, batch processes all PDFs in it by running OCR on them, and saves the results in a dst folder.
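As a rough illustration of that batch step, the sketch below walks a source folder and calls the `ocrmypdf` command-line tool once per PDF. It is not the code from the easy-ocr repo: the function names `plan_jobs` and `run_batch` are made up for this example, and it assumes `ocrmypdf` is installed and on the PATH rather than running inside Docker.

```python
import subprocess
from pathlib import Path


def plan_jobs(src: Path, dst: Path) -> list[tuple[Path, Path]]:
    """Pair every PDF in src with an output path of the same name in dst."""
    return [(pdf, dst / pdf.name) for pdf in sorted(src.glob("*.pdf"))]


def run_batch(src: Path, dst: Path, run=subprocess.run) -> list[Path]:
    """OCR each PDF in src into dst, skipping outputs that already exist."""
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_in, pdf_out in plan_jobs(src, dst):
        if pdf_out.exists():
            continue  # already processed in an earlier run
        # Basic OCRmyPDF invocation: ocrmypdf <input.pdf> <output.pdf>
        run(["ocrmypdf", str(pdf_in), str(pdf_out)], check=True)
        written.append(pdf_out)
    return written
```

Calling `run_batch(Path("src"), Path("dst"))` would then process whatever PDFs are waiting in `src`. Skipping files that already exist in `dst` is a design choice of this sketch so the script can be re-run safely after adding new scans.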

See example usage here:

Step 3: Storing and Searching

There are a lot of options and considerations here: cloud, NAS, external drives, encryption, redundancy and so on. It really depends on the use case. I am using a regular cloud provider with an encrypted drive.

For redundancy, I sync the drive to my laptop and to a NAS at home.

Finder (the standard file explorer on macOS) is my tool for full-text search and does its job. It might not scale well, but contrary to my expectations I really do not search that often.
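For anyone who prefers a script over the Finder window, macOS also ships the `mdfind` command, which as far as I know queries the same Spotlight index. The sketch below is only an assumption about how one might wrap it; `spotlight_command` and `search_archive` are hypothetical helper names, and the search itself only works on macOS.

```python
import subprocess
from pathlib import Path


def spotlight_command(folder: Path, query: str) -> list[str]:
    """Build an mdfind call that limits the search to one folder."""
    return ["mdfind", "-onlyin", str(folder), query]


def search_archive(folder: Path, query: str) -> list[Path]:
    """Run the search (macOS only) and return the matching file paths."""
    result = subprocess.run(
        spotlight_command(folder, query),
        capture_output=True,
        text=True,
        check=True,
    )
    # mdfind prints one matching path per line
    return [Path(line) for line in result.stdout.splitlines() if line]
```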

The key for me is a good file naming convention and folder structure:

Filenames look like 2019-01-26-descriptive-label.pdf. For letters, I often add the sender to the filename as well.

Folder convention: one folder per year has been enough so far.
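The naming scheme is simple enough to generate automatically. Here is a minimal sketch, assuming the sender (when present) goes between the date and the label; that position, and the helper names `slugify` and `archive_path`, are choices made for this example only.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def slugify(text: str) -> str:
    """Lowercase the text and join runs of ASCII letters/digits with hyphens.

    Anything else (spaces, punctuation, umlauts) becomes a hyphen.
    """
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def archive_path(doc_date: date, label: str, sender: str = "") -> PurePosixPath:
    """Return '<year>/<yyyy-mm-dd>[-sender]-<label>.pdf' for a document."""
    parts = [doc_date.isoformat()]
    if sender:
        parts.append(slugify(sender))  # assumed position: right after the date
    parts.append(slugify(label))
    return PurePosixPath(str(doc_date.year)) / ("-".join(parts) + ".pdf")
```

For example, `archive_path(date(2019, 1, 26), "Descriptive Label")` gives `2019/2019-01-26-descriptive-label.pdf`. Putting the ISO date first means a plain alphabetical sort is also a chronological one.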

Conclusion

I don’t want to go back to hoarding bags of paper. I like the convenience of having everything safe and sound while keeping the process streamlined and efficient. Paperless also plays nicely with minimalism: out of sight, out of mind 👍
