# Need help converting yellow scans to white



## JustJoe (Oct 26, 2012)

Although I prefer a hardcopy, I've downloaded quite a few old books and catalogs while doing old-tool research. It seems that 99 times out of 100 the original was yellowed with age and the people doing the scan just tossed it through the machine without tweaking the color settings, so I'm left with a PDF that looks like this:

[Image: sample page from a yellowed scan]

(That is actually one of the "better" ones.) I need a way to tweak these so the background is white. It would also be nice, if possible, to sharpen the text so I can read it without getting dizzy. Most of these books/catalogs are in pdf form. Is there a truly *FREE* conversion software that I can use to rescan a pdf and fix the colors at the same time? I wouldn't mind even if it was some sort of OCR that pulled out the text as long as it left space for the images to be cut/pasted too.


----------



## Tim457 (Jan 11, 2013)

This is actually harder than it seems. That yellow color on the computer is made up of a mix of colors, and so is the black. When you remove the yellow to get white, it removes those same colors from the black and the black actually gets worse. Then you have to use other techniques to try to smooth the text back out without messing it up, merging parts of letters, etc.

That said, there is a tool called Scan Tailor that is designed to help with that, but it still takes some tweaking. Here's the standard version (you probably want the .exe one), and if that doesn't work, you can try the experimental version.

There are some people who scan old books and try to make nice readable books at diybookscanner.org, if you want to look for still more options.
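To make the point about yellow and black sharing the same color channels concrete, here is a minimal sketch in plain Python. The two RGB values are assumptions picked for illustration (a yellowed paper tone and a faded ink tone), not measurements from any real scan. It compares a naive "shift the paper to white" fix with a per-channel levels stretch, which is one common alternative and not necessarily what Scan Tailor does internally.

```python
def shift_to_white(pixel, paper):
    """Naive fix: add whatever offset turns the paper colour into pure white."""
    return tuple(min(255, c + (255 - p)) for c, p in zip(pixel, paper))


def stretch_levels(pixel, ink, paper):
    """Per-channel levels stretch: the ink colour maps to 0, the paper colour to 255."""
    out = []
    for c, lo, hi in zip(pixel, ink, paper):
        scaled = (c - lo) * 255 / (hi - lo)
        out.append(max(0, min(255, round(scaled))))
    return tuple(out)


PAPER = (230, 215, 160)  # assumed yellowed-paper colour (R, G, B)
INK = (70, 60, 40)       # assumed faded-ink colour (R, G, B)

# The naive shift whitens the paper but also lifts the ink,
# so the text ends up lighter and bluish: (95, 100, 135).
print(shift_to_white(PAPER, PAPER), shift_to_white(INK, PAPER))

# The levels stretch pins both ends of each channel instead.
print(stretch_levels(PAPER, INK, PAPER), stretch_levels(INK, INK, PAPER))
```

On a real page the ink and paper tones vary from pixel to pixel, so the stretch leaves ragged edges around letters. That is where the extra smoothing work comes in.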


----------



## HamS (Nov 10, 2011)

The best way to fix this is to use character recognition software and convert the scan to text. Then, if you want a PDF instead of text, print the text file to a PDF virtual printer. You will probably have to spend a little time cleaning up the text where characters were not properly recognized. This kind of software is all available as freeware.


----------



## HamS (Nov 10, 2011)

http://www.makeuseof.com/tag/top-5-free-ocr-software-tools-to-convert-your-images-into-text-nb/


----------



## renners (Apr 9, 2010)

Any decent image viewing software should have controls for saturation, brightness/contrast and sharpness. First open your pdf image, desaturate it (gets rid of all the colour), then add contrast, brighten and sharpen it to taste.
'Preview' will do it on a Mac; if you are on Windows, Picasa from Google will do it, or if you bought a digital camera in the last ten years it probably came with Photoshop Elements, which would do it too.
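For anyone who would rather script those steps than click through them, here is a rough sketch using the free Pillow library for Python. It assumes Pillow is installed. The function name `clean_page` and the parameter values are made up for illustration and will need tuning per scan.

```python
from PIL import Image, ImageFilter, ImageOps


def clean_page(page, cutoff=2):
    """Desaturate, stretch contrast, and sharpen one scanned page image."""
    # Step 1: desaturate, which drops the yellow cast along with all other colour.
    gray = ImageOps.grayscale(page)
    # Step 2: add contrast by clipping `cutoff` percent of the darkest and
    # lightest pixels and stretching what remains across the full range.
    stretched = ImageOps.autocontrast(gray, cutoff=cutoff)
    # Step 3: sharpen with an unsharp mask (values are starting guesses).
    return stretched.filter(ImageFilter.UnsharpMask(radius=2, percent=150, threshold=3))
```

Usage would be something like `clean_page(Image.open("page-001.png")).save("page-001-clean.png")`, where the file names are placeholders.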


----------



## JustJoe (Oct 26, 2012)

The problem with any of the OCR software that I've found is the images. That page I showed doesn't have any images, but most of them do, like this:

[Image: sample scanned page that includes illustrations]

The OCR gets some or all of the words, depending on how good it is, and then the pics need to be cut/pasted over one by one. And each image I cut/paste still has the yellow background that needs to be fixed. One or two can be fixed by hand, but not an entire catalog.

With the image software, I'm using IrfanView to create and edit images. But these are PDF files with anywhere from 50-500 pages. They don't open in IrfanView, or any of the other image editors I've found. I have to take a snapshot of each page in the PDF, paste it into the image software, do all the saturation/contrast editing and then find a way to save all those newly created images back into one coherent book.

That "Scan Tailor" looks promising, but it needs images to input, not a multi-page pdf.
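One possible way around the "needs images" problem is to dump the embedded page scans out of the PDF first. The sketch below wraps Poppler's free `pdfimages` command-line tool from Python. It assumes `pdfimages` is installed and on the PATH, and the function and file names are hypothetical. A scanned book usually stores one image per page, but that is not guaranteed.

```python
import subprocess
from pathlib import Path


def pdfimages_command(pdf_path, out_dir, prefix="page"):
    """Build the pdfimages call that writes every embedded image as a PNG."""
    # pdfimages names its output <image-root>-000.png, <image-root>-001.png, ...
    return ["pdfimages", "-png", str(pdf_path), str(Path(out_dir) / prefix)]


def extract_pages(pdf_path, out_dir):
    """Run pdfimages and return the extracted PNG files in page order."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(pdfimages_command(pdf_path, out_dir), check=True)
    return sorted(Path(out_dir).glob("page-*.png"))
```

Calling `extract_pages("catalog.pdf", "pages")` should leave a folder of PNG files that can then be loaded into Scan Tailor or any image editor.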


----------



## patcollins (Jul 22, 2010)

Hey Joe

I am hoping that is a low-resolution version of your file, because I can't read any text on it to begin with.

I found a couple of things that may help. I have Adobe Photoshop Elements version 9, which is the cheap consumer one, cost about $75. There is an eraser tool in it that has a couple of versions: one is the background eraser, which lets you set a threshold and erase the background; the other is the magic eraser, which tries to do the thinking for you, and it seemed to work quite well with the low-resolution image you provided.

Adobe Acrobat Pro may be able to do something with it. I don't have that at home, only at work.


----------



## renners (Apr 9, 2010)

You might be able to set up an 'Action' in Elements to batch process the pages. You start 'recording', perform saturation, brightness, contrast, unsharp mask, and save, then stop 'recording'. I think you still need to manually open each page though.


----------



## JustJoe (Oct 26, 2012)

*Adobe Photoshop Elements version 9, which is the cheap consumer one, cost about $75*

That's about $75 more than I'm willing to spend on the project. I just found an add-on for my IrfanView that let me open the PDF file, all the pages of it at once. So I'm heading in the right direction.


----------



## patcollins (Jul 22, 2010)

Some good free image editing software, such as Paint.NET and GIMP, might also be worth checking out.


----------



## JustJoe (Oct 26, 2012)

↑ thanks.


----------



## renners (Apr 9, 2010)

RELEASE THE GIMP!


----------



## DanLyke (Feb 8, 2007)

I came in here to suggest GIMP, but you really want something that'll extract the images, do those operations on them, and put them back into the PDF.

If you're willing to take the time to learn how to make it scream, ImageMagick is the best tool around for scripting operations like this, and I've used it to do things like pick apart the city budget, do a bunch of OCR on it, and recompose selected portions.
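As a starting point, here is a hedged sketch of what such an ImageMagick run might look like when driven from Python. It assumes ImageMagick and Ghostscript (which ImageMagick relies on to read PDFs) are installed. The function name, file names, and level percentages are guesses for illustration. On ImageMagick 7 the executable is `magick` rather than the older `convert`.

```python
import subprocess


def whiten_command(src_pdf, dst_pdf, density=300, black="15%", white="85%", exe="convert"):
    """Build an ImageMagick call that rasterises, whitens, sharpens, and rewrites a PDF."""
    return [
        exe,
        "-density", str(density),       # DPI used when rasterising the PDF pages
        str(src_pdf),
        "-colorspace", "Gray",          # discard colour, and the yellow cast with it
        "-level", f"{black},{white}",   # push dark ink to black and pale paper to white
        "-unsharp", "0x1",              # mild sharpening of the text edges
        str(dst_pdf),
    ]


def whiten_pdf(src_pdf, dst_pdf):
    """Run the command; raises CalledProcessError if ImageMagick fails."""
    subprocess.run(whiten_command(src_pdf, dst_pdf), check=True)
```

A 500-page catalog at 300 DPI can use a lot of memory in one pass, so it may be worth splitting the work into page ranges.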


----------

