Website Development Plans for 2015

Theology on the Web News

Over the Christmas holiday I have been evaluating how best to spend my time on website development and have decided on the following plans for 2015 (in no particular order):

Plans for 2015

1) Addition of at least one free-to-download commentary on each book of the Bible

Looking at my visitor logs for 2014, it was clear that the most popular downloads were commentaries. During 2015 I will be trying to find more public domain works that still have value to the biblical exegete and make them available via the website.

2) Digitisation of a Khmer/English Theological Journal

Continuing my commitment to make international biblical scholarship available, I am very pleased to announce that I have received permission to digitise Honeycomb – a Khmer-English journal from Cambodia. I am looking forward to making this available – hopefully by Easter.

3) Continuing the digitisation of public domain materials from long-running theological journals

A number of theological journals have been published for so long that much of their older material is now in the public domain. These include:

Bulletin of the John Rylands Library

Journal of Theological Studies (old series)

Journal of the Transactions of the Victoria Institute

Palestine Exploration Quarterly

4) New Public Domain Material

As of 1st January, works by authors who died in or before 1944 have entered the public domain wherever the “70 years from the death of the author” copyright term applies. I will therefore be working through my sites and making this material available. These include works by:

James Moffatt [1870-1944]

Robert Martin Pope [1865-1944]

Thomas Banks Strong [1861-1944]
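
The date arithmetic behind this list can be sketched in a few lines of Python. This is only an illustration of the simple "life plus 70 years" rule described above, assuming that protection runs to the end of the calendar year 70 years after the author's death. The function name public_domain_year is made up for this example, and the sketch ignores complications such as joint authorship, translations, posthumous publication and differing national rules.

```python
def public_domain_year(death_year, term_years=70):
    """Return the first calendar year in which an author's works are free of copyright.

    Assumption: the term runs to 31 December of the year that falls
    term_years after the year of death, so the works enter the public
    domain on 1 January of the following year.
    """
    return death_year + term_years + 1


# Authors named in this post, with their year of death.
authors = {
    "James Moffatt": 1944,
    "Robert Martin Pope": 1944,
    "Thomas Banks Strong": 1944,
}

for name, died in authors.items():
    print(name, "- public domain from 1 January", public_domain_year(died))
```

For an author who died in 1944 this gives 2015, which matches the date given above.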

Visitor numbers to the websites continue to climb, and are expected to exceed 2 million this year. I would like to take this opportunity to thank my readers for their ongoing support and encouragement.

Some Thoughts about Archive.org Digital Texts

A few months ago I was asked to undertake a project for Tyndale House which involved searching through their catalogue for out-of-copyright books and trying to link these with electronic versions already on-line. This naturally brought me to archive.org to search through its massive collection of on-line texts. On the basis of that experience I thought that it might be helpful to write down a few thoughts on the subject that may spark a discussion.

First of all, here are some of the many positive features of archive.org texts:

  1. There is a huge amount of material available. Well over 90% of the 450 titles I searched for were already there.
  2. This material can be downloaded in a wide range of formats, including PDF, DjVu, text, HTML and Kindle-compatible files.
  3. The site is supported by an enthusiastic user-base who are constantly adding new material.
Now, some issues that need to be considered.
  1. Some books that are still under copyright in the UK, because they were printed there, are listed as being in the Public Domain on archive.org because it is hosted in the United States. To prevent them being downloaded outside the US, Google Books (linked from archive.org) has blocked non-US IP addresses from accessing them – which of course can always be circumvented using a US-based proxy.
  2. Some material that is in the Public Domain in the UK is being blocked by Google Books.
  3. The first two points serve as a reminder that users cannot rely on the accuracy of the copyright declaration on the site outside of the US – you need to double check everything.
  4. Some scans are incomplete and/or of poor quality.
  5. Scans to PDF are often very large files. In one trial I conducted, reprocessing the files reduced the file size by 50%.
  6. The search facility is fine if you know the exact title of the work you are after. However, if you misspell it or get a word wrong then the book you are after will not appear in the results.
  7. Perhaps as a result of (6) the usage statistics listed next to certain titles showing the number of downloads are often surprisingly low.
Please “weigh” rather than just “count” the points above, as the benefits of the site far outweigh the negative issues. For me they indicate a number of opportunities to take this work further:
  • Important UK-published theological books in the Public Domain could be re-scanned and hosted so as to avoid the unnecessary blocks on accessing them.
  • Poor quality scans can be replaced.
  • When serving users on dial-up or slow access Internet connections there is scope for reprocessing selected works and hosting them elsewhere to reduce the file sizes.
  • The site lends itself to being used alongside specialist bibliographies (such as those provided by the TheologyOnTheWeb sites) that link directly to material hosted on archive.org. This gets round the search problem, provided the material is not being blocked.
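
To illustrate how a bibliography could tolerate the kind of misspelling that defeats an exact-title search, here is a minimal sketch using Python's standard difflib module. It works on a local list of titles only and says nothing about how archive.org's own search operates. The function name find_title and the cutoff value are assumptions chosen for the example.

```python
import difflib


def find_title(query, titles, cutoff=0.6):
    """Return up to three titles that approximately match query, best match first.

    Matching is case-insensitive. cutoff (0 to 1) is an assumed similarity
    threshold; lower values accept looser matches.
    """
    by_lowercase = {title.lower(): title for title in titles}
    matches = difflib.get_close_matches(
        query.lower(), list(by_lowercase), n=3, cutoff=cutoff
    )
    return [by_lowercase[match] for match in matches]


# A small sample list of titles held in a local bibliography.
titles = [
    "Bulletin of the John Rylands Library",
    "Journal of Theological Studies",
    "Palestine Exploration Quarterly",
]

# "Quaterly" is misspelt, yet the intended journal is still found.
print(find_title("Palestine Exploration Quaterly", titles))
```

A bibliography page built this way could then link the matched title straight to the hosted file, so the reader never has to type the title into a search box.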
What has been your experience with using archive.org? Can you suggest any other ways in which the wealth of material there can be better used?

An Introduction to Digitisation Part 2 of 2: Processing the Scanned Image

In Part 1 of this series I looked at the basics of scanning a book or article. In this section I look at how to process the scanned images into OCRed PDF files.


1) Rotating and cropping
Processing can be done straight after scanning or in a batch later. Using Advanced Tiff Editor, select all the images [CONTROL + A] and rotate them. Resize the image so you can see the whole page on the screen and use the crop tool to cut it down to the required size.

2) Cleaning up gutters, spots and shadows
Using the cut tool, cut out gutter-shadows, dark edges, spots or other marks. Once all the pages are completed, save the file and exit the program.

3) Converting to an OCRed PDF

Open Adobe Acrobat and use the “Recognise Text in Multiple Files” tool to select all the finished files. Select a 300dpi overlay and run.

4) Final optimisation and output

The output PDF file appears automatically on your disk. Check the file size and the quality. If the image quality is poor or the file seems excessively large (experience will tell you this), run Adobe’s file optimisation option.
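
When a whole batch has been processed, the file-size check can be done in one pass rather than file by file. The following is a rough sketch in Python, not part of the Acrobat workflow itself: the function name oversized_pdfs and the 20 MB default limit are assumptions for illustration, and the right limit will depend on the page count and the kind of material being scanned.

```python
from pathlib import Path


def oversized_pdfs(folder, limit_mb=20.0):
    """Return (file name, size in MB) for each PDF in folder larger than limit_mb.

    limit_mb is an assumed threshold; adjust it to suit your own material.
    """
    flagged = []
    for path in sorted(Path(folder).glob("*.pdf")):
        size_mb = path.stat().st_size / (1024 * 1024)
        if size_mb > limit_mb:
            flagged.append((path.name, round(size_mb, 1)))
    return flagged


if __name__ == "__main__":
    # List the PDFs in the current folder that may be worth optimising.
    for name, size_mb in oversized_pdfs("."):
        print(f"{name}: {size_mb} MB")
```

The script only flags candidates. Whether a flagged file really needs optimising is still a matter of opening it and judging the image quality.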

I make no claim that this is the definitive introduction, but it is based on several years’ experience acquired using the hardware and software described. Can anyone contribute any insights they have gained in doing similar work? If so, please leave a comment.