Summary of April 25 Bias in Metadata Discussion

Our session on Bias in Metadata began with Jennifer sharing the story of the Starbucks racial bias training that will be held in 8,000 Starbucks stores on a single day to combat implicit bias and prevent another incident like the one that occurred in a Philadelphia Starbucks. [1] There is recognition that a one-day training isn’t going to fix the problem, but it is a place to start. We recognized as a group that we have implicit biases and need to be proactive in keeping them from affecting the metadata work we do.

We then discussed how bias in metadata affects authority work. The saga of proposing “white privilege” as a Library of Congress (LC) Subject Heading showed that new terms take a very long time to process (two years and two rejections in this case) and that what ends up being accepted can be so altered as to be unrecognizable for what it was intended to describe: Privilege (Social Psychology). A comparison was drawn to the U.S. Patent Office granting patents for web technology in its early days, before really understanding how that technology would be used or what was needed: is LC in a similar moment, approving and disapproving terms without reaching out to the community to understand the need for them? LC is a conservative body (the term “intersectionality” was used in a book title 20 years before LC approved it as a subject heading), and it is also not the most transparent organization. Highlights and excerpts from monthly meetings are published, but not the full transcripts, so it can be difficult to know why a term was rejected or how to better explain the need for it. It was also pointed out, however, that LC responds to proposals from within LC the same way it does to external proposals, so LC’s own catalogers seem to be just as in the dark as we are about how to successfully propose a new subject term.

Participants also noted that LC classification assumes a white male as the default perspective and treats everything else as “other”; the term “women,” for example, is added to further classify something that is otherwise not gendered.

We also discussed specific examples of problematic items in our own digital collections. Derogatory sheet music from the 1920s and 1930s is one example. The subject headings applied to it include geographic-specific subjects, but the sheet music is not from that place; it is about the place and is meant to be discriminatory, insulting, and demeaning. Sharing those subject headings as geographic locations that could be used for mapping purposes in aggregators like DPLA does not seem appropriate, so our mapping of those collections for sharing has kept those subjects as topical subjects only, with nothing geographic-specific. When items like these are shared beyond IU, the original collection site and context can be lost and the metadata can be skewed in unexpected ways.
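The mapping decision described above can be sketched in code. This is a minimal illustration only, assuming a simplified record in which each subject heading carries a hypothetical "type" of either "topical" or "geographic". The field names ("subjects", "subject", "spatial") and the function name are assumptions made for this example; they do not reflect our actual mapping or any aggregator’s schema.

```python
def map_subjects_for_sharing(record, keep_geographic=True):
    """Map a local record's subject headings into a shared record.

    Hypothetical sketch: when keep_geographic is False, headings typed
    as geographic are shared as topical subjects only, and the shared
    record's place field ("spatial") stays empty.
    """
    shared = {"title": record["title"], "subject": [], "spatial": []}
    for heading in record.get("subjects", []):
        is_geographic = heading.get("type") == "geographic"
        if is_geographic and keep_geographic:
            # Default behavior: expose the heading as a mappable place.
            shared["spatial"].append(heading["label"])
        else:
            # Topical headings, and suppressed geographic ones, go here.
            shared["subject"].append(heading["label"])
    return shared


# Placeholder values, not taken from a real record.
example = {
    "title": "Example sheet music title",
    "subjects": [
        {"label": "Example topical heading", "type": "topical"},
        {"label": "Example place name", "type": "geographic"},
    ],
}
shared = map_subjects_for_sharing(example, keep_geographic=False)
```

With keep_geographic set to False for a sensitive collection, the place name is still searchable as a subject, but nothing is offered to an aggregator as a location to plot on a map.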

One participant studied applying subject headings to problematic items like discriminatory and derogatory sheet music from the 20th century, to help triangulate the topical subjects associated with an item and clarify that aspect of it. The subject headings that would be used, however, don’t apply to the aboutness of the item (the sheet music is derogatory; it’s not about the derogatoriness), so it’s difficult to use subject headings to express these problems.

Another example was a digitized photograph for which the photographer’s description, used as the title, contains a racist term for the subject of the photograph, and to which the genre heading “Ethnographic photographs” was applied. The photographer was an amateur, so is that an appropriate genre, or is it somehow trying to explain the use of the racist term (the ethnography being applied to the photographer and not the subject of the photograph)?

Again, the context is easily lost when this photograph is shared outside of the collection’s original website and the title stands as it is. Should the racist term be corrected or changed? Is there research use for providing this information? Participants offered ideas and examples they have experienced elsewhere – a click-through statement that has to be acknowledged before accessing a collection that contains potentially harmful imagery or terminology; showing something as a direct transcription (racist term in quotes, for example); showing changes over time in how people express themselves and current terminology used.

The discussion then turned to how we can show these kinds of changes in cataloging practice and whether we have the authority to declare, for example, that something involves racist content. Our time came to an end with many questions unanswered. We now prepare to meet at the In-house Institute on May 7 to continue this conversation and consider strategies to address historical cataloging problems and ways to head off new problems in our cataloging practice.

[1] Chang, Ailsa. (2018-04-19). “A Lesson In How To Overcome Implicit Bias.” Code Switch: Race and Identity, Remixed. NPR. https://www.npr.org/sections/codeswitch/2018/04/19/604070231/a-lesson-in-how-to-overcome-implicit-bias

Summary of March 22 Facts in Metadata Discussion

Julie kicked off the session by showing a digitized photograph from the Libraries’ collections. A woman poses in front of a computer terminal. Session participants identified features of the photo, including the person depicted, but questioned the date of the photo and wondered where the existing information presented on the website came from. This exercise highlighted the first theme of the discussion on Facts in Metadata: metadata provenance. When libraries present information to a user, we rarely say anything about where that information comes from. Did photograph descriptions come from writing on the back of the photograph? Were the photo descriptions provided by a cataloger?

Jennifer asked attendees to consider an image from a public domain image repository like Unsplash. The photo of an elderly woman crying was retrieved with a search for “grief.” However, the image metadata shows that the title assigned by the photographer is “grandma crying moment during a wedding.” The facts (elderly woman, crying) were clear, but the context of the situation (was the woman expressing grief or happiness?) perhaps was not. Cultural heritage institutions and image repositories that supply images for commercial use have different missions and serve different user needs. Unsplash need not concern itself with truth in the descriptions of images (grandma was overcome with tears of happiness, not grief). The company’s goal is to help people find images to use in their own work. Provenance reveals not just who wrote a description but also the purpose for which it was created.

In contrast to evaluating content in the wilds of the web, Jennifer shared what should have been a routine cataloging situation in which the facts were difficult to parse. The title in question was Hitler’s Munich Man: the Downfall of Admiral Sir Barry Domvile by Martin Connolly (IUB Libraries owns the print and licenses access to the ebook version). A very complete and straightforward-looking catalog record was already available. The record contained the following topical subject headings:

World War, 1939-1945—Collaborationists—Great Britain
Traitors—Great Britain—History—20th century
Nazis—Great Britain—History—20th century

For reasons that I can explain in a future post, the first and third Library of Congress Subject Headings (LCSH) strings are invalid as constructed and must be deleted from the record. The second subject heading string is problematic because 1) factually, Domvile was never tried for or convicted of treason and 2) the author doesn’t come to a definitive conclusion on whether Domvile committed treason (if the author’s argument were that Domvile did commit treason, the second heading might be justified). The subject headings raised many questions about facts (is a sympathizer a collaborationist?) and revealed catalogers’ frustrations with taxonomies that aren’t constructed carefully enough to help us know when it is appropriate to use a subject term.

The topic of how different domains treat subject analysis was raised in discussing the 20% rule. Library catalogers include a subject heading if at least 20% of the resource is about that topic. In archival finding aids, collection-level subject lists tend to be extensive (I don’t think archivists observe a 20% rule, but I’m happy to be corrected on that matter). Documenting expectations for metadata access is crucial. Even within the cultural heritage domains (libraries, archives, and museums), metadata practice varies.
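As a rough illustration of the arithmetic behind the 20% rule, here is a small sketch. It assumes coverage is measured in pages, which is only one possible proxy; the function name and parameters are hypothetical, and in practice this is a cataloger’s judgment rather than a calculation.

```python
def meets_twenty_percent_rule(pages_on_topic, total_pages, threshold=0.20):
    """Return True if a topic covers at least `threshold` of a resource.

    Hypothetical sketch: page counts stand in for "how much of the
    resource is about the topic," which is an assumption of this example.
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    if pages_on_topic < 0 or pages_on_topic > total_pages:
        raise ValueError("pages_on_topic must be between 0 and total_pages")
    return pages_on_topic / total_pages >= threshold


# Illustrative numbers only: a 200-page book.
substantial = meets_twenty_percent_rule(50, 200)  # 25% of the book
passing_mention = meets_twenty_percent_rule(30, 200)  # 15% of the book
```

Under these assumptions a topic covering 50 of 200 pages would get a heading and one covering 30 pages would not, whereas a collection-level archival subject list might include both.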

About halfway through the meeting hour, the discussion began to turn away from fact and toward the topic of bias. Libraries have a specific purpose; if that purpose is misinterpreted by library users, it may affect their understanding of the resources and services libraries provide. A participant brought up Emily Drabinski’s article, “Queering the Catalog: Queer Theory and the Politics of Correction,” in which the author argues that because knowledge organization systems are inherently biased and cannot be corrected, libraries are responsible for educating library users about the biases and limitations of the catalog as a dataset. Participants teased out this notion, and by the end of the session the conversation had strayed far from the facts and well into how to respond to bias in metadata.

Join us for our next meeting on Wednesday, April 25. We will pick this topic up and see if we can identify some next steps.

Resources mentioned during this discussion:

Drabinski, Emily. “Queering the Catalog: Queer Theory and the Politics of Correction.” The Library Quarterly: Information, Community, Policy, vol. 83, no. 2, 2013, pp. 94–111. www.jstor.org/stable/10.1086/669547.

Haynes, David. “Metadata – have we got the ethics right?” ALA Editions Blog, March 21, 2018. http://www.alaeditions.org/blog/310/metadata-have-we-got-ethics-right.