Internet Archive Book Images: Scholar digitizes huge collection of images from

August 29, 2014 by Joyce Valenza

Attention fans of history, historical images, book art, and archives in general:
Georgetown University scholar, Yahoo fellow and developer Kalev Leetaru is working to create a searchable database of 12 million copyright-free images drawn from 600 million library book pages.
The recently launched Flickr-based Internet Archive Book Images currently hosts 2.6 million images that were automatically added with searchable tags. Ranging from 1500 through 1922, most images are beyond the scope of copyright.
With its focus on images as individual items, Leetaru’s project differs from earlier large-scale library digitization efforts.
In a BBC article posted today, Leetaru notes:

For all these years all the libraries have been digitising their books, but they have been putting them up as PDFs or text searchable works.
They have been focusing on the books as a collection of words. This inverts that.
Stretching half a millennia, it’s amazing to see the total range of images and how the portrayals of things have changed over time. Most of the images that are in the books are not in any of the art galleries of the world – the original copies have long ago been lost.

The BBC article describes how Leetaru’s process departs from earlier digitization practice:

The Internet Archive had used an optical character recognition (OCR) program to analyse each of its 600 million scanned pages in order to convert the image of each word into searchable text.
As part of the process, the software recognised which parts of a page were pictures in order to discard them.
Mr Leetaru’s code used this information to go back to the original scans, extract the regions the OCR program had ignored, and then save each one as a separate file in the jpeg picture format.
The software also copied the caption for each image and the text from the paragraphs immediately preceding and following it in the book.
Each jpeg and its associated text was then posted to a new Flickr page, allowing the public to hunt through the vast catalogue using the site’s search tool.
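The steps the BBC describes can be sketched in a few lines of Python. This is only an illustration of the idea, not Leetaru's actual code, which had not been published at the time of writing. The `Block` and `ImageRecord` types, the `extract_images` function, and the layout of the OCR output are all assumptions made for the example, and a scanned page is modeled as a plain grid of pixel values so the sketch runs without an imaging library. Saving each crop as a jpeg and posting it to Flickr are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Block:
    """One region of a page as reported by a hypothetical OCR pass."""
    kind: str                                   # "text", "picture", or "caption"
    text: str = ""
    box: Optional[Tuple[int, int, int, int]] = None  # (left, top, right, bottom)


@dataclass
class ImageRecord:
    """A cropped picture plus the text that travels with it."""
    pixels: List[List[int]]
    caption: str
    text_before: str
    text_after: str


def crop(scan, box):
    """Cut the rectangle `box` out of a scan stored as rows of pixel values."""
    left, top, right, bottom = box
    return [row[left:right] for row in scan[top:bottom]]


def extract_images(scan, blocks):
    """Return one ImageRecord for every picture region the OCR pass skipped."""
    records = []
    for i, block in enumerate(blocks):
        if block.kind != "picture":
            continue
        # Assumption: a caption, when present, is the block right after the picture.
        caption = ""
        if i + 1 < len(blocks) and blocks[i + 1].kind == "caption":
            caption = blocks[i + 1].text
        # Nearest ordinary paragraph before and after the picture.
        before = next((b.text for b in reversed(blocks[:i]) if b.kind == "text"), "")
        after = next((b.text for b in blocks[i + 1:] if b.kind == "text"), "")
        records.append(ImageRecord(crop(scan, block.box), caption, before, after))
    return records


# A tiny made-up page: a 6x4 grid of pixel values and four OCR blocks.
scan = [[x + 10 * y for x in range(6)] for y in range(4)]
blocks = [
    Block("text", "The ship sailed in April."),
    Block("picture", box=(1, 1, 4, 3)),
    Block("caption", "Fig. 1. The liner at sea."),
    Block("text", "Crowds gathered on the pier."),
]
records = extract_images(scan, blocks)
```

Each record pairs an image with its caption and neighboring paragraphs, which is what lets a keyword search surface a picture even though the picture itself contains no searchable text.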

Full records show text before and after an image, hyperlinked tags, access to the book in which the image appears and its catalog entry, as well as the ability to gather all the images from the book.
This database is a huge boon for students of history.

[Screenshot: Titanic search]

It offers plenty of fodder for multimedia production and for creating galleries for study. With its wealth of journalistic photographs, charts, portraits, headlines, maps, drawings and editorial cartoons, it also offers vast opportunities to teach visual and historical analysis. Images relating to race, religion and gender will also spark conversation.
Science teachers will appreciate its content and diagrams relating to invention and innovation, as well as its beautiful images of natural history.
Art teachers will appreciate the wealth of decorative images and historical design elements.
This is also a huge boon to libraries.
Leetaru plans to share his code and encourages other libraries to run the same process on their own books, so that this universe of images keeps expanding.

Thanks again to @infodocket for this lead!

[Screenshot: Image from page 29 of “Bell telephone magazine” (1922)]

For more public domain historical images, consider:

Filed under: digitization, flickr, images

About Joyce Valenza

Joyce is an Assistant Professor of Teaching at Rutgers University School of Information and Communication, a technology writer, speaker, blogger and learner. Follow her on Twitter: @joycevalenza
