Next meeting: Bias in metadata

Words have meaning. As catalogers and people who work with metadata, we use words to categorize and organize and describe our collections. We want people to find things. We want researchers to discover that item that will help answer questions, point in the right direction, or show a new path. The words we apply matter. The words we apply also have historical, social, and political significance. The words we use reflect how we understand the world in our time. When we use words to categorize and organize and describe people, groups, and countries, we are reflecting who we are and how we view the world. Our work requires that we strive to recognize and counter those biases to make our collections as useful as possible to the widest audience possible.

Join us for a discussion about the words we use in our metadata work. We’ll review authorized sources, subscription sources, and our own records to reflect on how we apply words to categorize and organize and describe people, groups, and countries. We’ll identify the processes that exist to make changes to those words and examine whether we are serving the world we think we are serving.

DATE: Wednesday, April 25
TIME: 9-10 am
PLACE: Wells Library Room E174
MODERATORS: Julie Hardesty & Jennifer Liss

Summary of March 22 Facts in Metadata Discussion

Julie kicked off the session by showing a digitized photograph from the Libraries’ collections. A woman poses in front of a computer terminal. Session participants identified features of the photo, including the person depicted, but questioned the date of the photo and wondered where the existing information presented on the website came from. This exercise highlighted the first theme of the discussion on Facts in Metadata: metadata provenance. When libraries present information to a user, we rarely say anything about where that information comes from. Did photograph descriptions come from writing on the back of the photograph? Were the photo descriptions provided by a cataloger?

Jennifer asked attendees to consider an image from a public domain image repository like Unsplash. The photo of an elderly woman crying was retrieved with a search for “grief.” However, when viewing the image metadata, we see that the title assigned by the photographer is “grandma crying moment during a wedding.” The facts (elderly woman, crying) were clear, but the context of the situation (was the woman expressing grief or happiness?) perhaps was not. Cultural heritage institutions and commercial-use image repositories have different missions and serve different user needs. Unsplash need not concern itself with truth in the descriptions of images (grandma was overcome with tears of happiness, not grief). The company’s goal is to help people find images to use in their own work. Provenance reveals not just who wrote a description but also the purpose for which it was created.

In contrast to evaluating content in the wilds of the web, Jennifer shared what should have been a routine cataloging situation in which the facts were difficult to parse. The title in question was Hitler’s Munich Man: the Downfall of Admiral Sir Barry Domvile by Martin Connolly (IUB Libraries owns the print and licenses access to the ebook version). A very complete and straightforward-looking catalog record was already available. The record contained the following topical subject headings:

World War, 1939-1945—Collaborationists—Great Britain
Traitors—Great Britain—History—20th century
Nazis—Great Britain—History—20th century

For reasons that I can explain in a future post, the first and third Library of Congress Subject Headings (LCSH) strings are invalid as constructed and must be deleted from the record. The second subject heading string is problematic because 1) factually, Domvile was never tried for or convicted of treason and 2) the author doesn’t come to a definitive conclusion on whether Domvile committed treason (if the author had argued that Domvile did commit treason, the second heading might be justified). The subject headings raised many questions about facts (is a sympathizer a collaborationist?) and revealed catalogers’ frustrations with taxonomies that aren’t constructed carefully enough to help us know when it is appropriate to use a subject term.

The topic of how different domains treat subject analysis was raised in discussing the 20% rule. Library catalogers include subject headings if at least 20% of the resource is about that topic. In archival finding aids, collection-level subject lists tend to be extensive (I don’t think archivists observe a 20% rule but I’m happy to be corrected on that matter). Documenting expectations for metadata access is crucial. Even within the cultural heritage domains (libraries, archives, and museums), metadata practice varies.

About halfway through the meeting hour, the discussion began to turn away from fact and toward the topic of bias. Libraries have a specific purpose. If library users misinterpret that purpose, it may affect their understanding of the resources and services libraries provide. A participant brought up Emily Drabinski’s article, “Queering the Catalog: Queer Theory and the Politics of Correction,” in which the author argues that because knowledge organization systems are inherently biased and cannot be corrected, libraries are responsible for educating library users in the biases and limitations of the catalog as a dataset. Participants teased out this notion and by the end of the session, the conversation had definitely strayed far from the facts and well into how to respond to bias in metadata.

Join us for our next meeting on Wednesday, April 25. We will pick this topic up and see if we can identify some next steps.

Resources mentioned during this discussion:

Drabinski, Emily. “Queering the Catalog: Queer Theory and the Politics of Correction.” The Library Quarterly: Information, Community, Policy, vol. 83, no. 2, 2013, pp. 94–111.

Haynes, David. “Metadata – have we got the ethics right?” ALA Editions Blog, March 21, 2018.

Next meeting: Paths to a Linked Data Catalog

Join the next Metadata Discussion Group meeting, where we’ll welcome in the new academic year with a discussion about the many possible paths to implementing linked library data. Participants will consider homegrown and vended solutions and think about the implications of when and where to introduce linked data into the library data stream.

DATE: Tuesday, September 20
TIME: 9-10 am
PLACE: Wells Library Room 043
TOPIC: Paths to a Linked Data Catalog
MODERATOR: Jennifer Liss

We hope to see you there!

Save the Dates: Fall 2016 meetings

The Metadata Discussion Group at Indiana University Libraries welcomes anyone from the IU community to attend our upcoming meetings.

Meetings will be from 9:00 am – 10:00 am in Room 043 of the Wells Library.

September 20
Paths to a Linked Data Catalog
Moderator: Jennifer Liss

November 29
Moderator: Julie Hardesty

Round Robin Updates – March 2016 meeting summary

The discussion on Moving from MODS to usable RDF, originally planned for March 8, will be rescheduled (date TBD). Stay tuned!

Since the time was blocked off on everyone’s calendars (and since we hadn’t met in a while), group members participated in a round robin. Below is a summary of the updates. Errors and misunderstandings on my part may be corrected in the comments below and I’ll do my best to update the post.


  • Andrea Morrison reported seeing more and more ORCID identifiers in Library of Congress Name Authority File (NAF) records

Browse Functionality in Progress for IUCAT

  • Rachael Cohen reported on the progress to implement browse for IUCAT
    • Working from code developed by Cornell (they also use Blacklight for their discovery layer)
    • Development team will start with author browse, then tackle Kinsey subject headings, then Library of Congress Subject Headings (LCSH)
  • Spencer Anspach clarified how browse will work:
    • Authorized access points in bibliographical records will be hyperlinked
    • Browse results will display all access points (authorized and unauthorized) that appear in bibliographic records–each result will have a number next to it, denoting how many times that access point is used in the bibliographic records database; some access points will also have an icon next to them, denoting that the access point is the authorized version (NAF, etc.)
      • Clicking on the icon will take a user to the authority record for that authorized access point (MARC 670 fields will NOT display to users)
    • Clicking on any of the access points will conduct a search on that access point
  • Concerns: batch loaded records from vendors (we won’t mention names) do not have authority control; one vendor in particular never includes dates (MARC subfield d) in authorized access points–this will certainly have an adverse impact on browse!
  • Shelf ready materials often do not have authority control–those errors are picked up in post-cataloging, by the Database Management team
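
To make the browse behavior described above concrete, here is a minimal sketch in Python. It is not the Cornell or IUCAT code; the function name `build_browse_list` and the sample headings are hypothetical, and the authority file is assumed to be a simple set of authorized strings. It counts how often each access point appears in bibliographic records, flags the ones that match an authorized form, and shows how a vendor heading that lacks dates ends up as a separate, unauthorized browse entry.

```python
from collections import Counter


def build_browse_list(bib_access_points, authorized_forms):
    """Return sorted (heading, count, is_authorized) tuples for a browse display.

    bib_access_points: every access point string found in bibliographic records
    authorized_forms: set of authorized strings (a stand-in for an authority file)
    """
    counts = Counter(bib_access_points)
    return [
        (heading, count, heading in authorized_forms)
        for heading, count in sorted(counts.items())
    ]


# Hypothetical sample data: two records carry the authorized form,
# one vendor record omits the dates (MARC subfield d).
bib_access_points = [
    "Woolf, Virginia, 1882-1941",
    "Woolf, Virginia, 1882-1941",
    "Woolf, Virginia",
]
authorized_forms = {"Woolf, Virginia, 1882-1941"}

browse = build_browse_list(bib_access_points, authorized_forms)
for heading, count, is_authorized in browse:
    marker = " [authorized]" if is_authorized else ""
    print(f"{heading} ({count}){marker}")
```

In this toy data the undated vendor heading gets its own line with a count of 1 and no authorized marker, which is the kind of split in browse results that the concerns above anticipate.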

Use of BoundingBox Tool Adds Value to Map Records

  • Heiko Mühr reported a pilot (and subsequent adoption) of the BoundingBox Tool
    • The tool allows catalogers to find geographical coordinates for an area and outputs longitude and latitude data in a number of formats (including–but not limited to!–OCLC MARC)
    • After piloting use of the tool in the month of December, catalogers determined that use of the tool did not adversely impact cataloging productivity–the quality of records increased
    • Coordinate data is now required in bibliographic records for sheet maps
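
As an illustration of the kind of conversion such a tool performs, the sketch below turns a bounding box in decimal degrees into coded coordinates in the hdddmmss form (hemisphere, degrees, minutes, seconds) used in MARC field 034 subfields d, e, f, and g. This is not the BoundingBox Tool's own code; the function names and the sample box are hypothetical, and the exact output layout of the tool's "OCLC MARC" option is not reproduced here.

```python
def to_marc_coordinate(decimal_degrees, is_latitude):
    """Convert decimal degrees to an hdddmmss string (e.g. W0863000)."""
    if is_latitude:
        hemisphere = "N" if decimal_degrees >= 0 else "S"
    else:
        hemisphere = "E" if decimal_degrees >= 0 else "W"
    # Work in whole seconds so that rounding cannot produce "60" seconds.
    total_seconds = round(abs(decimal_degrees) * 3600)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hemisphere}{degrees:03d}{minutes:02d}{seconds:02d}"


def bounding_box_to_034(west, east, north, south):
    """Build the coordinate subfields of a MARC 034 field from a bounding box.

    Subfield d = westernmost longitude, e = easternmost longitude,
    f = northernmost latitude, g = southernmost latitude.
    """
    return (
        f"$d{to_marc_coordinate(west, False)}"
        f"$e{to_marc_coordinate(east, False)}"
        f"$f{to_marc_coordinate(north, True)}"
        f"$g{to_marc_coordinate(south, True)}"
    )


# Hypothetical bounding box, in decimal degrees.
print(bounding_box_to_034(west=-86.75, east=-86.25, north=39.5, south=39.0))
```

A cataloger would still need to supply the rest of the field (indicators and the category-of-scale subfields), which this sketch leaves out.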

Government Publications Metadata Update

  • Andrea Morrison reported on her collaboration with the new GIMMS librarian
    • Working on finding ways to streamline acquisitions and cataloging workflows and provide better metadata services for federal documents

Media Digitization and Preservation Initiative

  • Ronda Sewald reported on progress made in the Media Digitization and Preservation Initiative (MDPI) project
    • Tens of thousands of media objects have been digitized to date
    • Digitization is ongoing
    • Currently sorting out ways to determine the rights for digital objects and how to create metadata (at such a huge scale) for discovery

Medical Subject Headings (MeSH) Deconstruction Update

  • James Castrataro reported that the deconstruction of MeSH headings seems to be complete
    • It was uncertain whether or not MeSH access points could be subdivided chronologically before the December 2015 decision to deconstruct
    • No one present was sure whether or not MeSH access points can be subdivided chronologically at present

Using the Catalog to Support Teaching

  • Bob Noel sought recommendations from the group for providing access to individual vendor streaming video titles (records were batch loaded at the item-level, rather than the collection-level)
    • Participants came up with a handful of possible strategies, including creating a LibGuide where all updates to links, etc., would be handled in one place
    • Interesting discussion about making access as easy as possible for the user (faculty, researchers, and students)