Published October 15, 2020


Another “Roots & Branches” column on newspapers?

Yes, another column on newspapers!

That’s because I recently learned about a new project from the folks who run Chronicling America, a leading website with free, searchable scans of millions of pages from historical newspapers.

A joint effort of the Library of Congress and the National Endowment for the Humanities, Chronicling America also houses the search tool “U.S. Newspaper Directory, 1690–Present,” which is a way to track down all the newspapers that existed for a particular community, county or state for a certain time period.

But what’s new from Chronicling America is the work of Benjamin Charles Germain Lee, a Library of Congress “innovator-in-residence” (one of the best job titles I think I’ve ever heard, by the way!). He has created an experimental web application that lets users browse more than 1.56 million images that were extracted, using machine learning, from the Chronicling America database of digitized historical newspapers.

The larger idea is that users can train their own machine learning classifiers to search the images by visual similarity! The first stage of the project, called Newspaper Navigator, is to extract content such as photographs, illustrations, cartoons, and news topics from the Chronicling America newspaper scans and their corresponding OCR (optical character recognition) text, using emerging machine learning techniques.

The second stage is to reimagine an exploratory search interface over the collection in order to enable a wide range of people to navigate the collection according to their interests. With Newspaper Navigator, Lee hopes to engage the American public, enable new digital humanities and cultural heritage research, and advance computer science research.

Lee’s application targets the following types of newspaper visuals: photographs, illustrations, maps, comics, editorial cartoons, headlines, and advertisements.

For genealogists, there are a couple of potential payoffs.

Because searching has previously relied solely on OCR’d text, the search tools were “blind” to photographs (and to the illustrations used before photography) in newspapers unless an accompanying caption was OCR’d. And because captions are usually set in a different style of type than normal newspaper text, the OCR misreads them more often, so searches are more likely to miss them. As a result, photos and illustrations involving ancestors may not have turned up in searches to date.

Secondly, being able to search for items such as editorial cartoons, comics and advertisements can open doors to a larger understanding of the worlds of your ancestors. Is the family lore that grandfather always read “Katzenjammer Kids” in his evening newspaper? Here’s a way to find strips such as that one.

In addition to being the Library of Congress innovator-in-residence, Lee is a second-year Ph.D. student in the Paul G. Allen School of Computer Science & Engineering at the University of Washington, where he studies human-AI interaction with his advisor, Professor Daniel Weld.

2 Comments

  1. Eric M. Bender

    4 years ago  

    Aha! When I was in Viet Nam I was photographed with one of my sergeants (also from Albuquerque). His wife sent him the newspaper picture of us. My wife was living with her family (in a different town), so no one in my family (none of whom lived in Albuquerque) nor hers ever saw the newspaper picture.
    I’VE NEVER FOUND IT IN ANY OF MY SEARCHES. I don’t remember exactly when the photo was taken (probably the middle of 1968) and I’m not up to scanning page-by-page for one photo of very little consequence.
    I know those editions are on-line and searchable. I think you’ve explained why that photo doesn’t show up in the searches. — Rick