LumberJocks

Need help converting yellow scans to white


Forum topic by JustJoe, posted 11-18-2013 05:52 PM (616 views, 0 times favorited, 12 replies)

JustJoe

1554 posts in 689 days


11-18-2013 05:52 PM

Although I prefer a hardcopy, I’ve downloaded quite a few old books and catalogs while doing old-tool research. It seems that 99 times out of 100 the original was yellowed with age and the people doing the scan just tossed it through the machine without tweaking the color settings so I’m left with a pdf that looks like this:

(That is actually one of the “better” ones.) I need a way to tweak these so the background is white. It would also be nice, if possible, to sharpen the text so I can read it without getting dizzy. Most of these books/catalogs are in pdf form. Is there a truly FREE conversion software that I can use to rescan a pdf and fix the colors at the same time? I wouldn’t mind even if it was some sort of OCR that pulled out the text as long as it left space for the images to be cut/pasted too.

-- This Ad Space For Sale! Your Ad Here! Reach a targeted audience! Affordable Rates, easy financing! Contact an ad represenative today at JustJoe's Advertising Consortium.


12 replies so far


Tim

1267 posts in 612 days


#1 posted 11-18-2013 07:47 PM

This is actually harder than it seems. That yellow color on the computer is made up of a mix of colors, and so is the black. When you change the yellow to white, it removes those same colors from the black, and the black actually gets worse. Then you have to use other techniques to try to smooth the text back out without messing it up, merging parts of letters, etc.
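A rough way to see the effect described above, using made-up pixel values (the `PAPER` and `INK` colors are assumptions for illustration, not measurements from any real scan). Shifting every pixel by the amount that turns the paper white also lifts the ink and tints it blue, while scaling each channel is gentler on the dark pixels:

```python
# Hypothetical colors: a yellowed-paper pixel and a not-quite-black ink pixel.
PAPER = (235, 215, 160)
INK = (60, 50, 35)


def shift_to_white(pixel, paper=PAPER):
    """Add the per-channel offset that would turn `paper` into pure white."""
    return tuple(min(255, c + (255 - p)) for c, p in zip(pixel, paper))


def scale_to_white(pixel, paper=PAPER):
    """Scale each channel so that `paper` becomes pure white."""
    return tuple(min(255, round(c * 255 / p)) for c, p in zip(pixel, paper))


print(shift_to_white(PAPER))  # (255, 255, 255)
print(shift_to_white(INK))    # (80, 90, 130): lighter and bluish, the text got worse
print(scale_to_white(INK))    # (65, 59, 56): still dark and close to neutral
```

Neither version does anything about blur or broken letter shapes, which is where the extra smoothing work comes in.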

That said, there is a tool called Scan Tailor that is designed to help with that, but it still takes some tweaking. Start with the standard version (you probably want the .exe one), and if that doesn’t work, you can try the experimental version.

There are some people who scan old books and try to make nice, readable books at diybookscanner.org, if you want to look for still more options.


HamS

1168 posts in 1040 days


#2 posted 11-18-2013 09:08 PM

The best way to fix this is to use character recognition software and convert the scan to text. Then if you want a pdf instead of text, print the text file to a pdf virtual printer. You will probably have to spend a little time cleaning up the text where characters were not properly recognized. All this kind of software is available as freeware.

-- My mother named me Hamilton, I have been trying to earn my nickname ever since.


HamS

1168 posts in 1040 days


#3 posted 11-18-2013 09:10 PM

http://www.makeuseof.com/tag/top-5-free-ocr-software-tools-to-convert-your-images-into-text-nb/

-- My mother named me Hamilton, I have been trying to earn my nickname ever since.


404 - Not Found

2544 posts in 1620 days


#4 posted 11-18-2013 09:13 PM

Any decent image viewing software should have controls for saturation, brightness/contrast and sharpness. First open your pdf image, desaturate it (gets rid of all the colour), then add contrast, brighten and sharpen it to taste.
‘Preview’ will do it on a Mac. If you are on Windows, Picasa from Google will do it, or if you bought a digital camera in the last ten years it probably came with Photoshop Elements, which would do it too.
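For anyone curious what the desaturate and brightness/contrast steps amount to, here is a minimal sketch in plain Python. It assumes a page is just a list of rows of (R, G, B) tuples, and the `black_point` and `white_point` defaults are guesses that would need tuning per scan. Sharpening is left out, and this is not how any of the programs above implement it internally.

```python
def desaturate(pixel):
    """Collapse an (R, G, B) pixel to one grey level using Rec. 601 luma weights."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def stretch(grey, black_point=60, white_point=200):
    """Levels stretch: black_point maps to 0, white_point to 255, clamped outside."""
    if grey <= black_point:
        return 0
    if grey >= white_point:
        return 255
    return round((grey - black_point) * 255 / (white_point - black_point))


def clean_page(rows, black_point=60, white_point=200):
    """Desaturate every pixel, then push paper toward white and ink toward black."""
    return [
        [stretch(desaturate(p), black_point, white_point) for p in row]
        for row in rows
    ]


# Hypothetical one-row "page": a yellowed-paper pixel next to an ink pixel.
page = [[(235, 215, 160), (60, 50, 35)]]
print(clean_page(page))  # [[255, 0]]
```

Pixels between the two points keep a proportional grey, so anti-aliased letter edges are softened rather than chopped off.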


JustJoe

1554 posts in 689 days


#5 posted 11-18-2013 09:24 PM

The problem with any of the OCR software that I’ve found is the images. That page I showed doesn’t have any images, but most of them do, like this:

The OCR gets some or all of the words, depending on how good it is, and then the pics need to be cut/pasted over one by one. And each image I cut/paste still has the yellow background that needs fixed. One or two can be fixed by hand, but not an entire catalog.

With the image software, I’m using IrfanView to create and edit images. But these are PDF files with anywhere from 50-500 pages. They don’t open in IrfanView, or any of the other image editors I’ve found. I have to take a snapshot of each page in the PDF, paste it into the image software, do all the saturation/contrast editing and then find a way to save all those newly created images back into one coherent book.

That “Scan Tailor” looks promising, but it needs images to input, not a multi-page pdf.

-- This Ad Space For Sale! Your Ad Here! Reach a targeted audience! Affordable Rates, easy financing! Contact an ad represenative today at JustJoe's Advertising Consortium.


patcollins

995 posts in 1516 days


#6 posted 11-18-2013 10:35 PM

Hey Joe

I am hoping that is a low-resolution version of your file, because I can’t read any text on it to begin with.

I found a couple things that may help. I have Adobe Photoshop Elements version 9, which is the cheap consumer one, cost about $75. There is an eraser tool in it that has a couple versions, one is background eraser that lets you set a threshold and erase the background, the other is the magic eraser that tries to do the thinking for you, and it seemed to work quite well with the low resolution image you provided.

Adobe Acrobat Pro may be able to do something with it; I don’t have that at home, only at work.
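The background eraser with a threshold, as described above, can be approximated in a few lines. This is only a guess at the general idea, not Adobe's algorithm: every pixel whose color is within a chosen `tolerance` of a sampled background color is replaced with white. The function name, the tolerance value and the sample colors are all made up for illustration.

```python
def erase_background(rows, background, tolerance=40):
    """Replace every pixel within `tolerance` of `background` with white.

    Distance is the largest per-channel difference between the two colors.
    """
    white = (255, 255, 255)
    cleaned = []
    for row in rows:
        new_row = []
        for pixel in row:
            distance = max(abs(c - b) for c, b in zip(pixel, background))
            new_row.append(white if distance <= tolerance else pixel)
        cleaned.append(new_row)
    return cleaned


# Hypothetical row: two shades of yellowed paper and one ink pixel.
page = [[(235, 215, 160), (225, 205, 150), (60, 50, 35)]]
print(erase_background(page, background=(235, 215, 160)))
# [[(255, 255, 255), (255, 255, 255), (60, 50, 35)]]
```

A tolerance that is too high starts eating the lighter edges of the letters, which is probably why the real tools expose it as a setting.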


404 - Not Found

2544 posts in 1620 days


#7 posted 11-18-2013 10:49 PM

You might be able to set up an ‘Action’ in Elements to batch process the pages. You start ‘recording’, perform the saturation, brightness, contrast and unsharp mask steps, save, then stop ‘recording’. I think you still need to manually open each page though.


JustJoe

1554 posts in 689 days


#8 posted 11-18-2013 10:52 PM

“Adobe Photoshop Elements version 9, which is the cheap consumer one, cost about $75”

That’s about $75 more than I’m willing to spend on the project. I just found an add-on for my IrfanView that let me open the PDF file, all the pages of it at once. So I’m heading in the right direction.

-- This Ad Space For Sale! Your Ad Here! Reach a targeted audience! Affordable Rates, easy financing! Contact an ad represenative today at JustJoe's Advertising Consortium.


patcollins

995 posts in 1516 days


#9 posted 11-18-2013 11:30 PM

Some good free image editing software, such as Paint.NET and GIMP, might also be worth checking out.


JustJoe

1554 posts in 689 days


#10 posted 11-18-2013 11:33 PM

↑ thanks.

-- This Ad Space For Sale! Your Ad Here! Reach a targeted audience! Affordable Rates, easy financing! Contact an ad represenative today at JustJoe's Advertising Consortium.


404 - Not Found

2544 posts in 1620 days


#11 posted 11-18-2013 11:43 PM

RELEASE THE GIMP!


Dan Lyke

1474 posts in 2776 days


#12 posted 11-19-2013 12:26 AM

I came in here to suggest GIMP, but you really want something that’ll extract the images, do those operations on them, and put them back into the PDF.

If you’re willing to take the time to learn how to make it scream, ImageMagick is the best tool around for scripting operations like this, and I’ve used it to do things like pick apart the city budget, do a bunch of OCR on it, and recompose selected portions.
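As a starting point for that kind of scripting, here is a small Python wrapper that builds a single ImageMagick command to read a PDF, turn it grey, stretch the levels and sharpen, then write a new PDF. It is a sketch under several assumptions: ImageMagick is installed with Ghostscript available for reading PDFs (some installations restrict PDF handling in their policy settings), the command is called `convert` (on ImageMagick 7 it is `magick`), and the density, level and unsharp values are guesses that need tuning per book. The file names and function names are hypothetical.

```python
import shlex
import subprocess


def build_clean_command(pdf_in, pdf_out, density=200, black="20%", white="80%"):
    """Build (but do not run) an ImageMagick command line as a list of arguments."""
    return [
        "convert",
        "-density", str(density),      # resolution used when rasterizing the PDF pages
        pdf_in,
        "-colorspace", "Gray",         # drop the yellow by dropping all color
        "-level", f"{black},{white}",  # darker than black -> black, lighter than white -> white
        "-unsharp", "0x1",             # mild sharpening of the text
        pdf_out,
    ]


def clean_pdf(pdf_in, pdf_out):
    """Run the command; raises CalledProcessError if ImageMagick reports a failure."""
    subprocess.run(build_clean_command(pdf_in, pdf_out), check=True)


# Print the command so it can be inspected or pasted into a terminal.
print(shlex.join(build_clean_command("catalog.pdf", "catalog-clean.pdf")))
```

A 500-page catalog may be too much to process in one go, so splitting it into page ranges first and cleaning each chunk separately is probably safer.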

-- Dan Lyke, Petaluma California, http://www.flutterby.net/User:DanLyke
