Website Development Plans for 2015

Over the Christmas holiday I have been evaluating how best to spend my time on website development and have decided on the following plans for 2015 (in no particular order):

Plans for 2015

1) Addition of at least one free-to-download commentary on each book of the Bible

Looking at my visitor logs for 2014 it was clear that the most popular downloads were of commentaries. During 2015 I will be trying to find more public domain works that still have value to the biblical exegete and make them available via the website.

2) Digitisation of a Khmer/English Theological Journal

Continuing my commitment to making international Biblical Scholarship available, I am very pleased to announce that I have received permission to digitise Honeycomb – a Khmer-English journal from Cambodia. I am looking forward to making this available – hopefully by Easter.

3) Continuing the digitisation of public domain materials from long-running theological journals

A number of theological journals have been published for so long that much of their older material is now in the public domain. These include:

Bulletin of the John Rylands Library

Journal of Theological Studies (old series)

Journal of the Transactions of the Victoria Institute

Palestine Exploration Quarterly

4) New Public Domain Material

As of 1st January, for those publications covered by the “70 years from death of author” copyright term, works by authors who died during or before 1944 have entered the public domain. I will therefore be working through my sites and making this material available. These include works by:

James Moffatt [1870-1944]

Robert Martin Pope [1865-1944]

Thomas Banks Strong [1861-1944]
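
The date arithmetic behind this can be sketched in a few lines of Python. This is only an illustration, assuming a life-plus-70 term that runs to the end of the calendar year in which the author died; the function name and its default are hypothetical, and the result is no substitute for checking the copyright status of each individual work.

```python
def public_domain_year(death_year: int, term_years: int = 70) -> int:
    """Return the first calendar year in which an author's works are out of copyright.

    Assumption: the term runs until 31 December of the year
    (death_year + term_years), so works are free from 1 January of the next year.
    """
    return death_year + term_years + 1


# Authors named above, with the years of death given in the list.
authors = [
    ("James Moffatt", 1944),
    ("Robert Martin Pope", 1944),
    ("Thomas Banks Strong", 1944),
]

for name, died in authors:
    print(f"{name}: public domain from 1 January {public_domain_year(died)}")
```

The same function can be pointed at other terms by changing `term_years`, but which term applies depends on where the work was published and where it is being hosted.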

Visitor numbers to the websites continue to climb, and are expected to exceed 2 million this year. I would like to take this opportunity to thank my readers for their ongoing support and encouragement.

Some Thoughts about Archive.org Digital Texts

A few months ago I was asked to undertake a project for Tyndale House which involved searching through their catalogue for out-of-copyright books and trying to link these with electronic versions already on-line. This naturally brought me to archive.org to search through its massive collection of on-line texts. On the basis of that experience I thought that it might be helpful to write down a few thoughts on the subject that may spark a discussion.

First of all, here are some of the many positive features of archive.org texts:

  1. There is a huge amount of material available. Well over 90% of the 450 titles I searched for were already there.
  2. This material can be downloaded in a wide range of formats, including PDF, DJVU, TEXT, HTML and Kindle-compatible files.
  3. The site is supported by an enthusiastic user-base who are constantly adding new material.
Now, some issues that need to be considered.
  1. Some books that are still under copyright in the UK because they were printed there are listed as being in the Public Domain on archive.org because it is hosted in the United States. In order to prevent them being downloaded outside the US, Google Books (linked from archive.org) has blocked non-US IP addresses from accessing them – which of course can always be circumvented using a US-based proxy.
  2. Some material that is in the Public Domain in the UK is being blocked by Google Books.
  3. The first two points serve as a reminder that users cannot rely on the accuracy of the copyright declaration on the site outside of the US – you need to double check everything.
  4. Some scans are incomplete and/or of poor quality.
  5. Scans to PDF are often very large files. In one trial I conducted, reprocessing the files reduced the file size by 50%.
  6. The search facility is fine if you know the exact title of the work you are after. However, if you misspell it or get a word wrong then the book you are after will not appear in the results.
  7. Perhaps as a result of (6) the usage statistics listed next to certain titles showing the number of downloads are often surprisingly low.
Please “weigh” rather than just “count” the points above, as the benefits of the site far outweigh the negative issues. For me they indicate a number of opportunities to take this work further:
  • Important UK-published theological books in the Public Domain could be re-scanned and hosted so as to avoid the unnecessary blocks on accessing them.
  • Poor quality scans can be replaced.
  • When serving users on dial-up or slow access Internet connections there is scope for reprocessing selected works and hosting them elsewhere to reduce the file sizes.
  • The site lends itself to being linked with specialist bibliographies (such as those provided by the TheologyOnTheWeb sites) that link directly to material hosted on archive.org. This gets round the search problem, provided the material is not being blocked.
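
To illustrate the last point, here is a minimal sketch of how a specialist bibliography could tolerate the kind of misspelling that defeats an exact-title search. It uses Python's standard `difflib` module; the bibliography entries and the item identifiers are hypothetical placeholders, and the only thing assumed about archive.org is that an item page lives at `https://archive.org/details/` followed by its identifier.

```python
import difflib

# Hypothetical bibliography: title -> archive.org item identifier.
# The identifiers are placeholders, not real items.
BIBLIOGRAPHY = {
    "The Epistle to the Hebrews": "example-identifier-1",
    "An Introduction to the Literature of the New Testament": "example-identifier-2",
    "A Manual of Christian Doctrine": "example-identifier-3",
}


def find_title(query, cutoff=0.6):
    """Return the bibliography title closest to `query`, or None if nothing is close."""
    # Compare in lower case so capitalisation does not affect the match.
    lowered = {title.lower(): title for title in BIBLIOGRAPHY}
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


def archive_url(title):
    """Build the item page address for a title that is in the bibliography."""
    return "https://archive.org/details/" + BIBLIOGRAPHY[title]


match = find_title("The Epistel to the Hebrews")  # note the misspelling
if match is not None:
    print(match, "->", archive_url(match))
```

The `cutoff` value controls how forgiving the match is; 0.6 is simply the library default and would need tuning against a real list of titles.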
What has been your experience with using archive.org? Can you suggest any other ways in which the wealth of material there can be better used?

An Introduction to Digitisation Part 2 of 2: Processing the Scanned Image

In Part 1 of this series I looked at the basics of scanning a book or article. In this part I look at how to process the scanned images into OCRed PDF files.


1) Rotating and cropping
Processing can be done straight after scanning or done in a batch later. Using Advanced Tiff Editor, select all the images first (Select All [CONTROL + A]) and then rotate them. Resize the image so you can see the whole page on the screen and use the crop tool to cut it down to the required size.

2) Cleaning up gutters, spots and shadows
Using the cut tool, cut out gutter-shadows, dark edges, spots or other marks. Once all the pages are completed, save the file and exit the program.

3) Converting to an OCRed PDF

Open Adobe Acrobat and use the “Recognise Text in Multiple Files” tool to select all the finished files. Select a 300dpi overlay and run.

4) Final optimisation and output

The output PDF file appears automatically on your disk. Check the file size and the quality. If the image quality is poor or the file seems excessively large (experience will tell you this) run Adobe’s file optimisation option.

I make no claim that this is the definitive introduction, but it is based on several years’ experience acquired using the hardware and software described. Can anyone contribute any insights they have gained in doing similar work? If so, please leave a comment.

An Introduction to Digitisation Part 1 of 2: Scanning Your Material

This is the first in a 2-part series in which I will be explaining the basics of digitisation. I am assuming no previous experience with any of the hardware or software that will be mentioned.

A Word about Copyright

Before you begin work on any printed material you must ensure that it is either in the Public Domain or that you have the written permission of the current copyright holder. Failure to do so is a breach of International Laws and could result in prosecution. Never assume that you have the right to digitise any material without checking its copyright status and obtaining explicit permission to proceed.

What you will need

Software

Advanced Tiff Editor: This is available for purchase and download from http://www.tiffedit.com/ You will use this software to prepare images of the material in multi-page TIFF format ready for conversion to PDF. This editing software is quite powerful and you will probably only need to use a few of its features.
The software is available for personal use, business use and as a site licence. You can get a 10% discount on this software using this code: 0764E089F0. I am a member of this company’s affiliate program and so I am able to offer this discount from the commission received from sales.
Adobe Acrobat Writer: This software converts the TIFF images into a finished PDF document ready for use. It may be possible to obtain cheaper PDF writing software from other sources, but it must be able to perform the tasks described below to be useful. It is not necessary to purchase the “Professional” version (which is more expensive). All the features you will need are included in the “Standard” version, such as Adobe Acrobat X Standard.
DropBox: This program allows you to back-up and share your files with colleagues easily. The free version gives you 2GB of free storage. Use this link to get an extra 500MB when you download and install the software.

Hardware

PC / Laptop: In order to run the above software you will need a PC or Laptop running MS Windows XP or a newer operating system. Adobe Acrobat is the most demanding piece of software, so you should check the minimum requirements of both the version of Acrobat Writer you purchase and the scanner to ensure that your machine is powerful enough to run them.

Scanner: You will require a professional quality flat-bed scanner such as the Canon CanoScan 9000F Color Image Scanner. It is possible to scan material using cheaper scanners, but they are generally much slower and their lamps are not as bright. Using an inferior scanner will increase the length of time spent scanning and result in poorer quality results, especially if the material does not lie completely flat on the scanner platen as in the case of tightly bound journal articles and books. Investing in a good quality scanner will be worth the extra initial cost.

The Scanning Process

1) Running Advanced Tiff Editor
Start the program (with the scanner connected and turned on) and click on the “Acquire” button (it has a picture of a scanner on it). This brings up the scanner preview and settings screen. Place the material to be scanned on the platen and click on the “Preview” button.

2) Setting scan resolution and colour
Scanning at 300 dpi gives the best balance between image quality and file size. Scan in Black & White unless the bending of the pages means that the text becomes unreadable. In this case Greyscale should be selected, which gives you extra options to improve the image. The increased file size can be reduced using Adobe Acrobat later (see Part 2).
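
To give a rough feel for why Greyscale scans are so much larger, the sketch below estimates the uncompressed size of a single page. It assumes 1 bit per pixel for Black & White and 8 bits per pixel for Greyscale, and a 6 x 9 inch page chosen purely as an example; saved TIFF files are normally compressed, so real files will be smaller, but the proportions are a useful guide. The function name is hypothetical.

```python
import math


def uncompressed_scan_bytes(width_in, height_in, dpi=300, bits_per_pixel=1):
    """Estimate the raw (uncompressed) size in bytes of one scanned page."""
    width_px = round(width_in * dpi)    # pixels across the page
    height_px = round(height_in * dpi)  # pixels down the page
    return math.ceil(width_px * height_px * bits_per_pixel / 8)


page = (6.0, 9.0)  # assumed page size in inches (width, height)

black_and_white = uncompressed_scan_bytes(*page, bits_per_pixel=1)
greyscale = uncompressed_scan_bytes(*page, bits_per_pixel=8)

print(f"Black & White: {black_and_white / 1e6:.1f} MB per page")
print(f"Greyscale:     {greyscale / 1e6:.1f} MB per page")
```

Under these assumptions a Greyscale page holds eight times as much raw data as a Black & White one, and doubling the resolution from 300 to 600 dpi quadruples the size in either mode.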

3) Scanning
Once you are happy with the settings click the “Scan” button and see what the scan looks like. If it is OK then continue to the next page and repeat until the document is complete.

4) Saving the images
Save the file using a web-safe file name (one that does not have illegal characters like spaces or capital letters). It will save time if you use the same file-name throughout the process, although the extension will change from .tif to .pdf.
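
If you have a lot of files to name, the rule above can be applied automatically. The following is a minimal sketch rather than a definitive recipe: it assumes that “web-safe” means lower-case letters, digits and hyphens only, and the function name is hypothetical.

```python
import re
from pathlib import PurePath


def web_safe_name(name: str) -> str:
    """Lower-case a file name and replace spaces and other unsafe characters with hyphens."""
    path = PurePath(name)
    stem = path.stem.lower()
    # Collapse every run of characters other than a-z and 0-9 into one hyphen.
    stem = re.sub(r"[^a-z0-9]+", "-", stem).strip("-")
    return stem + path.suffix.lower()


tiff_name = web_safe_name("Journal of Theological Studies 1.tif")
print(tiff_name)

# The same name is kept for the PDF; only the extension changes.
pdf_name = str(PurePath(tiff_name).with_suffix(".pdf"))
print(pdf_name)
```

Note that this simple version also strips accented and non-Latin characters, so titles containing them would need a more careful treatment.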

In Part 2 I will explain how the scanned image is processed prior to conversion to a PDF file.