Artifact Interest Meter: What’s the Story?

matthew a. tucker
5 min read · Jun 27, 2024


Doodles by M.A.Tucker, 24JUN24

“to become a citizen of the United States”. Image by M.A.Tucker.

Images of artifacts from bygone eras can tell fascinating stories. If they can be coaxed to reveal the mysteries they contain, the fragmentary pieces of lore might be re-assembled into a coherent whole. But there are challenges.

An astute observer will note the degraded condition of the 1864 citizenship document in the image. It had separated in places where it had been folded. Someone long ago used an invention called Scotch Tape to hold it together and preserve it. Unfortunately, that tape has since decomposed. Folds and creases obscure portions of the text.

This artifact could simply have been added to the pile of hundreds of other similar images. But instead, might machine learning get the discovery process moving? Gathering the key content (actors, actions, outcomes, dates, locations) from these images can be quite challenging. How to discover the story each artifact tells and piece all those stories together into a coherent whole?

Artifact Interest Meter concept. Doodle by M.A.Tucker.

An Artifact Interest Meter (AIM) might be handy for gathering at least part of the story. Imagine a tag-team process to reconstruct the contents: actors, actions, outcomes, dates, locations. With an extensive collection of images, semi-automatic cataloging and even a crude timeline would go a long way toward allowing a human to focus on the most interesting parts of the story.

The source images are an extensive set of poor-quality captures: poor because of the condition of the original documents, and because of the challenge of capturing images without further damaging the artifacts. To accomplish this tag-team approach, the AIM pipeline is composed of a number of discrete steps where models performing various tasks can interact with a human as a guide toward better understanding.

transformer

The transformer transforms the source image in various ways in the hope of capturing more content:

  • grayscale conversion
  • threshold elimination
  • noise reduction
  • sharpening
  • brightness and contrast

Transformations. Image by M.A.Tucker.

The transformations alter the raw image data to generate images that may improve the predictions, by forming crisper edges where the original is blurred or by removing “noise” (random junk) from the image.
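For the curious, here is a minimal sketch of what this step could look like in Python with OpenCV. The post does not specify tools or parameter values, so the function name, the Otsu thresholding choice, and the kernel and scaling constants below are illustrative assumptions, not the pipeline’s actual settings.

```python
import cv2
import numpy as np

def transform_variants(path):
    """Generate transformed variants of a source image, each of which
    may reveal content the raw scan hides."""
    raw = cv2.imread(path)
    gray = cv2.cvtColor(raw, cv2.COLOR_BGR2GRAY)
    variants = {"gray": gray}

    # Threshold elimination: Otsu's method picks a global cutoff,
    # pushing faded background toward white and ink toward black.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    variants["threshold"] = binary

    # Noise reduction: soften speckle from the damaged paper.
    variants["denoised"] = cv2.fastNlMeansDenoising(gray, h=15)

    # Sharpening: a simple kernel to firm up blurred letter edges.
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]],
                      dtype=np.float32)
    variants["sharpened"] = cv2.filter2D(gray, -1, kernel)

    # Brightness and contrast: linear rescale, where alpha is
    # contrast and beta is brightness.
    variants["brightened"] = cv2.convertScaleAbs(gray, alpha=1.4, beta=20)

    return variants
```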

scanner

The scanner scans the source and the various transformed images to predict the words, letters, order, and bounding box locations of the words within each image. As you may note, the predictions are less than perfect: “applicant having declared on oath before this court that he will support the constitution of the united statess nd that he doth absolutely and entirely renounce and abjure all allegiance and fidelity to every foreign princer poentater state or sovereigntys whatsoever nd pars ticularly to the power above named”
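The post doesn’t name the scanner’s OCR engine, so treat the following as a hedged sketch assuming Tesseract via the pytesseract library, whose image_to_data call returns words in reading order along with confidences and bounding boxes:

```python
import pytesseract
from pytesseract import Output

def scan_image(img):
    """Predict words and their bounding boxes within an image.

    Returns a list of (word, left, top, width, height, confidence)
    tuples in reading order."""
    data = pytesseract.image_to_data(img, output_type=Output.DICT)
    predictions = []
    for i, word in enumerate(data["text"]):
        conf = float(data["conf"][i])  # -1 marks non-word layout boxes
        if word.strip() and conf > 0:
            predictions.append((word,
                                data["left"][i], data["top"][i],
                                data["width"][i], data["height"][i],
                                conf))
    return predictions
```

Running a scan like this over every variant from the transformer gives the next step, the rater, something to compare.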

rater

The rater evaluates the effectiveness of the scanning and transformations to suggest the “best” set of predictions. Comparing the predictions to the “ground truth” (the actual contents laboriously extracted via human eye and hand) quickly shows which transformations perform well. Exact and partial match rates between ground truth and generated predictions show that the transformations make a difference!

Percentage of exact & partial matches between ground truth and predictions. Image by M.A.Tucker.
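As a rough illustration of how exact and partial matches might be scored, here is a word-level sketch using Python’s difflib. The positional pairing and the 0.8 similarity threshold are simplifying assumptions on my part, not the post’s actual method:

```python
from difflib import SequenceMatcher

def rate(ground_truth, predicted, partial_threshold=0.8):
    """Score a prediction against ground truth at the word level.

    Exact: identical word (case-insensitive) at the same position.
    Partial: similar but not identical, e.g. HORORABLE vs HONORABLE
    scores about 0.89."""
    truth_words = ground_truth.lower().split()
    pred_words = predicted.lower().split()
    exact = partial = 0
    for t, p in zip(truth_words, pred_words):
        if t == p:
            exact += 1
        elif SequenceMatcher(None, t, p).ratio() >= partial_threshold:
            partial += 1
    n = max(len(truth_words), 1)  # avoid divide-by-zero on empty truth
    return {"exact_pct": 100 * exact / n, "partial_pct": 100 * partial / n}
```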

overlayer

The overlayer overlays the predicted text and its predicted bounding box on the image, allowing human eyes to evaluate the effectiveness of the method. Overlaying the best predictions can help to extract interesting aspects. For example, it became apparent that the cursive mixed into the stream of printed letters, quite easy for a human to discern, resulted in gibberish predictions. Dates are also a challenge for the model when they take the form: “On the Seventh day of November in the year of our Lord one thousand eight hundred and Sixty four.”

Overlays of predictions. Image by M.A.Tucker.
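An overlayer might be as simple as the following OpenCV sketch, which draws each predicted word and its box onto a copy of the image. It assumes the (word, left, top, width, height, confidence) tuples from the scanner sketch above; the colors and font are arbitrary choices:

```python
import cv2

def overlay(img, predictions):
    """Draw each predicted word and its bounding box onto a copy of
    the image so a human can eyeball the OCR quality."""
    # Promote grayscale variants to color so the boxes stand out.
    annotated = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR) if img.ndim == 2 else img.copy()
    for word, left, top, width, height, _ in predictions:
        # Green box around the predicted word location.
        cv2.rectangle(annotated, (left, top),
                      (left + width, top + height), (0, 255, 0), 1)
        # Red predicted text just above the box.
        cv2.putText(annotated, word, (left, max(top - 4, 10)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.4, (0, 0, 255), 1)
    return annotated
```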

logger

The logger catalogs the results: what actors, actions, outcomes, dates, and locations are revealed by the analysis?
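One plausible shape for the logger, sketched under my own assumptions about the schema (the field names and example values below are hypothetical, not the pipeline’s actual catalog): a small record per artifact, appended to a CSV file a human can sort by date or actor.

```python
import csv
from dataclasses import dataclass, asdict, fields

@dataclass
class ArtifactRecord:
    """One cataloged artifact: the story elements recovered so far."""
    source_image: str   # e.g. "citizenship_1864.png"
    actors: str         # e.g. "applicant; presiding judge"
    actions: str        # e.g. "declared oath; renounced allegiance"
    outcomes: str       # e.g. "granted citizenship"
    dates: str          # e.g. "1864-11-07"
    locations: str      # e.g. "State of Iowa"

def log_records(records, path="artifact_catalog.csv"):
    """Append records to a CSV catalog, writing the header once."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=[fld.name for fld in fields(ArtifactRecord)])
        if f.tell() == 0:  # empty file: emit the header row first
            writer.writeheader()
        writer.writerows(asdict(r) for r in records)
```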

Do the Eyes Have It?

Easy for the eyes-> There are a number of challenges in reconstructing the contents. For example, a crease or fold might obscure a letter or word. Sometimes the human eye sees the mistake in the prediction immediately. In a court document, the word associated with the judge should be HONORABLE, not HORORABLE! In this case, the crease hides the second half of the “n”, resulting in an “r” prediction (and a partial match). Wrong! At least, you’d hope the presiding judge was not HORORABLE! ;}

The judge is HONORABLE, not HORORABLE! Image by M.A.Tucker.

Surprise for the eyes-> But the human eye is not always better. The topmost “United States of America” banner is followed by an engraving and then a “State of Iowa” banner. These are quite clear and recognizable, so the initial predictions were almost always accurate. However, one transformation resulted in gibberish between the topmost “United States of America” banner and the “State of Iowa” banner. Digging into the gibberish, I noted that my eye had not caught a small banner in the engraving. Once noticed, the human eye tracks the waving banner quite easily. The model failed miserably to predict the words but did detect the presence of “something” interesting.

Challenging banner: OUR LIBERTY WE PRIZE AND OUR RIGHTS WE WILL MAINTAIN. Image by M.A.Tucker.

What Now?

It appears the path forward in reconstructing these stories is a combined, tag-team approach, with the models and the human ricocheting predictions and insights back and forth until a coherent story emerges. Who was this fellow? What was he like? And what about his descendants? How did events flow down to the modern era? What’s the story?

Answering that question, Dear Reader, is another story. ;}

What’s the story? Image by M.A.Tucker.
