Next meeting: Bias in metadata

Words have meaning. As catalogers and people who work with metadata, we use words to categorize and organize and describe our collections. We want people to find things. We want researchers to discover that item that will help answer questions, point in the right direction, or show a new path. The words we apply matter. The words we apply also have historical, social, and political significance. The words we use reflect how we understand the world in our time. When we use words to categorize and organize and describe people, groups, and countries, we are reflecting who we are and how we view the world. Our work requires that we strive to recognize and counter those biases to make our collections as useful as possible to the widest audience possible.

Join us for a discussion about the words we use in our metadata work. We’ll review authorized sources, subscription sources, and our own records to reflect on how we apply words to categorize and organize and describe people, groups, and countries. We’ll identify the processes that exist to make changes to those words and examine whether we are serving the world we think we are serving.

DATE: Wednesday, April 25
TIME: 9-10 am
PLACE: Wells Library Room E174
MODERATORS: Julie Hardesty & Jennifer Liss

Summary of March 22 Facts in Metadata Discussion

Julie kicked off the session by showing a digitized photograph from the Libraries’ collections. A woman poses in front of a computer terminal. Session participants identified features of the photo, including the person depicted, but questioned the date of the photo and wondered where the existing information presented on the website came from. This exercise highlighted the first theme of the discussion on Facts in Metadata: metadata provenance. When libraries present information to a user, we rarely say anything about where that information came from. Did the photograph descriptions come from writing on the back of the photograph? Were the photo descriptions provided by a cataloger?

Jennifer asked attendees to consider an image from a public domain image repository like Unsplash. The photo of an elderly woman crying was retrieved with a search for “grief.” However, the image metadata shows that the title assigned by the photographer is “grandma crying moment during a wedding.” The facts (elderly woman, crying) were clear, but the context of the situation (was the woman expressing grief or happiness?) was not. Cultural heritage institutions and image repositories that supply images for commercial use have different missions and serve different user needs. Unsplash need not concern itself with truth in the descriptions of images (grandma was overcome with tears of happiness, not grief). The company’s goal is to help people find images to use in their own work. Provenance reveals not just who wrote a description but also the purpose for which the description was created.

In contrast to evaluating content in the wilds of the web, Jennifer shared what should have been a routine cataloging situation in which the facts were difficult to parse. The title in question was Hitler’s Munich Man: the Downfall of Admiral Sir Barry Domvile by Martin Connolly (IUB Libraries owns the print and licenses access to the ebook version). A very complete and straightforward-looking catalog record was already available. The record contained the following topical subject headings:

World War, 1939-1945—Collaborationists—Great Britain
Traitors—Great Britain—History—20th century
Nazis—Great Britain—History—20th century

For reasons that I can explain in a future post, the first and third Library of Congress Subject Headings (LCSH) strings are invalid as constructed and must be deleted from the record. The second subject heading string is problematic because 1) factually, Domvile was never tried for, or convicted of, treason and 2) the author doesn’t come to a definitive conclusion on whether Domvile committed treason (if the author had argued that Domvile did commit treason, the second heading might be justified). The subject headings raised many questions about facts (is a sympathizer a collaborationist?) and revealed catalogers’ frustrations with taxonomies that aren’t constructed carefully enough to help us know when it is appropriate to use a subject term.

The question of how different domains treat subject analysis came up while discussing the 20% rule. Library catalogers include a subject heading if at least 20% of the resource is about that topic. In archival finding aids, collection-level subject lists tend to be extensive (I don’t think archivists observe a 20% rule but I’m happy to be corrected on that matter). Documenting expectations for metadata access is crucial. Even within the cultural heritage domains (libraries, archives, and museums), metadata practice varies.
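
For readers who think in code, the 20% rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the idea of using page counts as a stand-in for a cataloger's estimate of coverage, and the topic labels (which are not authorized headings) are all made up for this example.

```python
def topics_meeting_threshold(topic_pages, total_pages, threshold_percent=20):
    """Return the topics that cover at least `threshold_percent` of a resource.

    topic_pages: dict mapping a topic label to the estimated number of pages
                 devoted to it (a hypothetical proxy for a cataloger's judgment).
    total_pages: total number of pages in the resource.
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    # Compare in whole numbers to avoid floating-point edge cases at exactly 20%.
    return [
        topic
        for topic, pages in topic_pages.items()
        if pages * 100 >= total_pages * threshold_percent
    ]


# Invented figures for an imaginary 300-page book.
coverage = {"topic A": 120, "topic B": 60, "topic C": 30}
selected = topics_meeting_threshold(coverage, 300)
print(selected)  # ['topic A', 'topic B']
```

Here topic B sits exactly at 20% (60 of 300 pages) and is kept, while topic C at 10% is dropped. A domain without such a threshold, as suggested for archival finding aids above, would simply list every topic.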

About halfway through the meeting hour, the discussion began to turn away from fact and toward the topic of bias. Libraries have a specific purpose; if library users misinterpret that purpose, it may affect their understanding of the resources and services libraries provide. A participant brought up Emily Drabinski’s article, “Queering the Catalog: Queer Theory and the Politics of Correction,” in which the author argues that because knowledge organization systems are inherently biased and cannot be corrected, libraries are responsible for educating library users about the biases and limitations of the catalog as a dataset. Participants teased out this notion, and by the end of the session the conversation had strayed far from the facts and well into how to respond to bias in metadata.

Join us for our next meeting on Wednesday, April 25. We will pick this topic up and see if we can identify some next steps.

Resources mentioned during this discussion:

Drabinski, Emily. “Queering the Catalog: Queer Theory and the Politics of Correction.” The Library Quarterly: Information, Community, Policy, vol. 83, no. 2, 2013, pp. 94–111.

Haynes, David. “Metadata – have we got the ethics right?” ALA Editions Blog, March 21, 2018.