In Part 1 of this series I looked at the basics of scanning a book or article. In this section I look at how to process the scanned images into OCRed PDF files.
1) Rotating and cropping
Processing can be done straight after scanning or in a batch later. Using Advanced Tiff Editor, select all the images first (Select All [CONTROL + A]) and then rotate them. Resize the image so that you can see the whole page on the screen and use the crop tool to cut it down to the required size.
2) Cleaning up gutters, spots and shadows
Using the cut tool, cut out gutter-shadows, dark edges, spots or other marks. Once all the pages are completed, save the file and exit the program.
Converting to an OCRed PDF
Open Adobe Acrobat and use the “Recognise Text in Multiple Files” tool to select all the finished files. Select a 300 dpi overlay and run the tool.
Final optimisation and output
The output PDF file is saved to your disk automatically. Check the file size and the quality. If the image quality is poor or the file seems excessively large (experience will tell you this), run Adobe’s file optimisation option.
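Until experience tells you what "excessively large" looks like, one rough way to judge is to work out the average size per page. The short Python sketch below does that. The function names, the example file name and the 150 KB-per-page limit are all assumptions chosen for illustration and are not Adobe recommendations; adjust the limit to suit your own material.

```python
import os


def size_per_page_kb(pdf_path: str, page_count: int) -> float:
    """Return the average file size per page in kilobytes."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    return os.path.getsize(pdf_path) / 1024 / page_count


def needs_optimising(pdf_path: str, page_count: int, limit_kb: float = 150.0) -> bool:
    """True if the average page is larger than limit_kb (an assumed limit)."""
    return size_per_page_kb(pdf_path, page_count) > limit_kb


if __name__ == "__main__":
    path = "my-scanned-article.pdf"  # hypothetical file name
    pages = 24                       # page count, read off in Acrobat
    if os.path.exists(path):
        print(f"{size_per_page_kb(path, pages):.0f} KB per page")
        print("Run optimisation" if needs_optimising(path, pages) else "Size looks fine")
```

The page count is passed in by hand because reading it from the PDF itself would need an extra library.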
I make no claim that this is the definitive introduction, but it is based on several years' experience acquired using the hardware and software described. Can anyone contribute any insights they have gained in doing similar work? If so, please leave a comment.
This is the first in a 2-part series in which I will be explaining the basics of digitisation. I am assuming no previous experience with any of the hardware or software that will be mentioned.
A Word about Copyright
Before you begin work on any printed material you must ensure that it is either in the public domain or that you have the written permission of the current copyright holder. Failure to do so is a breach of copyright law and could result in prosecution. Never assume that you have the right to digitise any material without checking its copyright status and obtaining explicit permission to proceed.
What you will need
Advanced Tiff Editor: This is available for purchase and download from http://www.tiffedit.com/ and is used to prepare images of the material in multi-page TIFF format, ready for conversion to PDF. This editing software is quite powerful and you will probably only need to use a few of its features.
The software is available for personal use, business use and as a site licence. You can get a 10% discount on this software using this code: 0764E089F0. I am a member of this company's affiliate program and so I am able to offer this discount from the commission received from sales.
Adobe Acrobat Writer: This software outputs the TIFF images as finished PDF documents ready for use. It may be possible to obtain cheaper PDF writing software from other sources, but it must be able to perform the tasks described below to be useful. It is not necessary to purchase the “Professional” version (which is more expensive). All the features you will need are included in the “Standard” version, such as Adobe Acrobat X Standard.
DropBox: This program allows you to back-up and share your files with colleagues easily. The free version gives you 2GB of free storage. Use this link to get an extra 500MB when you download and install the software.
PC / Laptop: In order to run the above software you will need a PC or laptop running MS Windows XP or a newer version of Windows. Adobe Acrobat is the most demanding piece of software, so you should check the minimum requirements of both the version of Acrobat Writer you purchase and the scanner to ensure that your machine is powerful enough to run them.
Scanner: You will require a professional quality flat-bed scanner such as the Canon CanoScan 9000F Color Image Scanner. It is possible to scan material using cheaper scanners, but they are generally much slower and their lamps are not as bright. Using an inferior scanner will increase the length of time spent scanning and result in poorer quality results, especially if the material does not lie completely flat on the scanner platen as in the case of tightly bound journal articles and books. Investing in a good quality scanner will be worth the extra initial cost.
The Scanning Process
1) Running Advanced Tiff Editor
Start the program (with the scanner connected and turned on) and click on the “Acquire” button (it has a picture of a scanner on it). This brings up the scanner preview and settings screen. Place the material to be scanned on the platen and click on the “Preview” button.
2) Setting scan resolution and colour
Scanning at 300 dpi gives the best balance between image quality and file size. Scan in Black & White unless the bending of the pages means that the text becomes unreadable. In this case Greyscale should be selected, which gives you extra options to improve the image. The increased file size can be reduced using Adobe Acrobat later (see Part 3).
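To see why the choice of mode matters so much for file size, it helps to work out the raw size of a single page before any compression is applied. The sketch below assumes an A4 page (about 8.27 x 11.69 inches), 1 bit per pixel for Black & White and 8 bits per pixel for Greyscale. The function name is made up for the example, and real TIFF and PDF files will be smaller once compressed.

```python
def uncompressed_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Approximate raw image size in bytes, before TIFF or PDF compression."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


# Assumed page size: A4, roughly 8.27 x 11.69 inches.
A4 = (8.27, 11.69)
for label, bits in (("Black & White", 1), ("Greyscale", 8)):
    size = uncompressed_bytes(*A4, dpi=300, bits_per_pixel=bits)
    print(f"{label}: {size / 1_000_000:.1f} MB per page")
```

At 300 dpi this comes to roughly 1.1 MB per page in Black & White against 8.7 MB in Greyscale. Doubling the resolution to 600 dpi quadruples both figures, because the pixel count grows with the square of the resolution.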
Once you are happy with the settings click the “Scan” button and see what the scan looks like. If it is OK then continue to the next page and repeat until the document is complete.
4) Saving the images
Save the file using a web-safe file name (one that contains no spaces, capital letters or other characters that can cause problems in web addresses). It will save time if you use the same file name throughout the process, although the extension will change from .tif to .pdf.
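If you have many files to name, the rule above can be automated. The following Python sketch is one possible approach, not a feature of Advanced Tiff Editor or Acrobat: the function name web_safe_name, the use of hyphens as separators and the example title are all assumptions made for illustration.

```python
import re


def web_safe_name(title: str, extension: str = "tif") -> str:
    """Turn a free-form title into a lowercase, hyphen-separated file name."""
    name = title.strip().lower()
    # Replace runs of whitespace or underscores with a single hyphen.
    name = re.sub(r"[\s_]+", "-", name)
    # Drop anything that is not a lowercase letter, digit or hyphen.
    name = re.sub(r"[^a-z0-9-]", "", name)
    # Collapse repeated hyphens and trim them from both ends.
    name = re.sub(r"-{2,}", "-", name).strip("-")
    return f"{name}.{extension}"


# Hypothetical title, used for both the TIFF and the final PDF.
title = "My Scanned Article, Part 1"
print(web_safe_name(title))         # my-scanned-article-part-1.tif
print(web_safe_name(title, "pdf"))  # my-scanned-article-part-1.pdf
```

Generating both names from the same title keeps the base name identical throughout, so only the extension changes.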
In Part 2 I will explain how the scanned image is processed prior to conversion to a PDF file.