TextCleaner is a script created by Fred Weinhaus. It’s used to clean up scanned documents and make the resulting image more readable for the OCR process.
Today I will show you how to install and run TextCleaner along with a simple command to pre-process an image.
The first step is to download ImageMagick. Scroll down to find the Windows binaries and then install the program.
Next, download Cygwin.
Cygwin is a Unix-like command-line environment designed for Windows. It provides a large collection of Unix tools together with a compatibility layer, so Unix applications and scripts can run on Windows with little change to their original source code. In addition, Windows applications can be launched from within Cygwin, and Cygwin tools can be used from within Windows.
In fact, it is the only way I have found to run the scripts I need on Windows.
Run the installer (I recommend leaving everything at its default value, especially the installation folder) and pay attention when you arrive at this screen:
Here, be sure to select Full in the View menu, type bc in the Search box, and select the first item in the list (make sure it is under the Math category). This package is needed to correctly run most of the scripts that use ImageMagick, among others. Go ahead and finish the installation.
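Once a Cygwin terminal is available, a quick check can confirm that bc was actually installed. This is only a minimal sketch: the helper name check_bc is my own and is not part of Cygwin or ImageMagick.

```shell
# Hypothetical helper: report whether bc is installed and working.
check_bc() {
    if command -v bc >/dev/null 2>&1; then
        # bc reads an expression from standard input; scale sets the decimals.
        echo "bc works: 10 / 4 = $(echo 'scale=2; 10 / 4' | bc)"
    else
        echo "bc not found: re-run the Cygwin installer and select the bc package"
        return 1
    fi
}

check_bc || true
```

If the second message appears, running the installer again and selecting bc should be enough; the existing installation is kept.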
After that, there are two important steps: add some folders to the PATH environment variable and create a new variable. Let’s start with the latter.
Under System variables, create a new variable with the name SHELLOPTS and the value igncr.
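Some background on why this variable helps: scripts saved with Windows line endings (CRLF) carry a carriage return at the end of every line, and a Unix shell treats that character as part of the command. The igncr option tells Cygwin’s bash to ignore those carriage returns. The sketch below illustrates the problem from the other side, by stripping the carriage returns by hand with tr; the file names are made up for the demonstration.

```shell
# Build a tiny script with Windows (CRLF) line endings in a temp directory.
workdir=$(mktemp -d)
printf 'echo hello\r\n' > "$workdir/crlf.sh"

# Strip the carriage returns so a plain Unix shell can run it unchanged.
tr -d '\r' < "$workdir/crlf.sh" > "$workdir/lf.sh"

# Count the carriage returns before and after the conversion.
cr_before=$(tr -cd '\r' < "$workdir/crlf.sh" | wc -c)
cr_after=$(tr -cd '\r' < "$workdir/lf.sh" | wc -c)
echo "carriage returns: before=$((cr_before)) after=$((cr_after))"
```

With SHELLOPTS set to igncr, this manual conversion should not be necessary for scripts run through Cygwin’s bash.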
Then add these paths. For ImageMagick, add the path where you installed it.
Under User variables, add the following paths.
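To confirm that the paths took effect, it can help to ask the shell where each tool resolves once a new Cygwin terminal is open. The following is a sketch under assumptions: report_tools is a hypothetical helper, and the tool names passed to it (bc, magick, convert) depend on your ImageMagick version and installer options, so adjust them as needed.

```shell
# Hypothetical helper: print where each named tool resolves on PATH,
# or MISSING if the shell cannot find it.
report_tools() {
    for tool in "$@"; do
        if resolved=$(command -v "$tool" 2>/dev/null); then
            echo "$tool -> $resolved"
        else
            echo "$tool -> MISSING"
        fi
    done
}

# Tool names below are assumptions; adjust to what your install provides.
report_tools bc magick convert
```

Printing the resolved path, rather than only a yes or no, also shows which program wins when two with the same name are on the PATH. Windows ships its own convert.exe, for example, which is unrelated to ImageMagick.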
Now, the last step is to download the script you want to use: in this case, we will download the TextCleaner script from Fred Weinhaus’s site.
Once downloaded, copy it into C:\cygwin64\bin (or wherever you installed Cygwin). Then rename the script from textcleaner to textcleaner.exe.
With that done, launch your terminal from the Cygwin Terminal shortcut. That’s all. From here on, it is only a matter of using the scripts and knowing their commands.
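If you prefer the terminal to Explorer, the copy and rename can also be done from a Cygwin terminal. This is a sketch, not the only way to do it: install_script is a hypothetical helper, the download location in the example call is an assumption, and /bin corresponds to C:\cygwin64\bin only in a default 64-bit installation.

```shell
# Hypothetical helper: copy a downloaded script into a bin directory
# under the .exe name used in this post and mark it executable.
install_script() {
    src=$1
    destdir=$2
    cp "$src" "$destdir/textcleaner.exe" && chmod +x "$destdir/textcleaner.exe"
}

# Example call (paths are assumptions):
# install_script "$HOME/textcleaner" /bin
```

The chmod step is there because a file downloaded through a browser is not necessarily marked as executable.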
We will use this for our demonstration:
textcleaner -g -e normalize -f 5 -o 10 -s 2 C:/Users/Link/Desktop/card.jpg C:/Users/Link/Desktop/clean-card.png
I will not explain here what the parameters are or their values. I will only say that it’s structured like this:
scriptname -parameter(s) input_image_location.extension output_image_location.extension
This is the result of the script:
Now, doing OCR on the processed image should not be so difficult, right?
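The same command structure extends naturally to a whole folder of scans. Here is a minimal sketch that reuses the exact parameters from the demonstration above; clean_folder is a hypothetical helper, and the folder names in the example call are assumptions.

```shell
# Hypothetical helper: run textcleaner on every .jpg in a folder,
# writing PNG results with a clean- prefix into an output folder.
clean_folder() {
    indir=$1
    outdir=$2
    mkdir -p "$outdir"
    for img in "$indir"/*.jpg; do
        [ -e "$img" ] || continue   # skip when no .jpg files match
        name=$(basename "$img" .jpg)
        textcleaner -g -e normalize -f 5 -o 10 -s 2 "$img" "$outdir/clean-$name.png"
    done
}

# Example call (folder names are assumptions):
# clean_folder C:/Users/Link/Desktop/scans C:/Users/Link/Desktop/cleaned
```

Writing to a separate output folder keeps the original scans untouched, which is handy if you want to try different parameter values later.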
This script and many others developed by Fred Weinhaus are free for non-commercial use only; for any other use, you will need to contact him for a license.