Summary of April 25 Bias in Metadata Discussion

Our session on Bias in Metadata began with Jennifer sharing the story of the Starbucks racial bias training that will be held in 8,000 Starbucks stores on one day as a way to combat implicit bias and prevent another incident like the one that occurred in a Philadelphia Starbucks. [1] There is recognition that a one-day training isn’t going to fix the problem, but it is a place to start. We recognized as a group that we have implicit bias and that we need to be proactive in preventing it from affecting the metadata work we do.

We then discussed how bias in metadata affects authority work. The saga of proposing “white privilege” as a Library of Congress (LC) Subject Heading showed that new terms take a very long time to process (two years and two rejections in this case) and that what ends up being accepted is potentially so altered as to be unrecognizable for what it was intended to describe: Privilege (Social Psychology). The question was raised whether LC is in a moment of approving and disapproving terms without reaching out to the community to understand the need for them, much as the U.S. Patent Office granted patents for early web technology before really understanding how that technology would be used or was needed. LC is a conservative body (the term “intersectionality” was used in a book title 20 years before LC approved it as a subject heading), and it is also not the most transparent organization. Highlights and excerpts from monthly meetings are published, but not the full transcript, so it can be difficult to know why a term was rejected or how to better explain the need for a rejected term. It was also pointed out, however, that LC responds to proposals from within LC the same way it does to external proposals, so LC’s own catalogers seem to be just as in the dark as we are about how to successfully propose a new subject term.

The discussion also noted that LC classification is based on a default perspective of a white male, with everything else treated as “other.” For example, the term “women” is added to further classify something that is otherwise not gendered.

We also discussed specific examples of problematic items in our own digital collections. Derogatory sheet music from the 1920s and 1930s is one example. The subject headings applied to it include geographic-specific subjects. The sheet music is not from that place but is about the place and meant to be discriminatory, insulting, and demeaning. Sharing those subject headings as geographic-specific locations that could be used for mapping purposes in aggregators like DPLA does not seem appropriate, so our mapping of those collections for sharing has kept those subjects as topical subjects only, with nothing geographic-specific. When items like these are shared beyond IU, the original collection site and context can be lost and the metadata can be skewed in unexpected ways.

One participant studied applying subject headings to address problematic items like discriminatory and derogatory sheet music from the 20th century to help triangulate topical subjects associated with the item and clarify that aspect of the item. The subject headings that would be used, however, don’t apply to the aboutness of the item (the sheet music is derogatory, it’s not about the derogatoriness). So it’s difficult to use subject headings to express these problems.

Another example was a digitized photograph where the description from the photographer used a racist term as the title to describe the subject of the photograph and a genre heading of “Ethnographic photographs” was applied. The photographer was an amateur photographer so is that an appropriate genre or is that somehow trying to explain the use of the racist term (the ethnography being applied to the photographer and not the subject of the photograph)?

Again, the context is easily lost when this photograph is shared outside of the collection’s original website and the title stands as it is. Should the racist term be corrected or changed? Is there research use for providing this information? Participants offered ideas and examples they have experienced elsewhere – a click-through statement that has to be acknowledged before accessing a collection that contains potentially harmful imagery or terminology; showing something as a direct transcription (racist term in quotes, for example); showing changes over time in how people express themselves and current terminology used.

The discussion then turned to how we can show these kinds of changes in cataloging practice and whether or not we have the authority to declare, for example, that something involves racist content. Our time came to an end with many questions unanswered. We now prepare to meet at the In-house Institute on May 7 to continue this conversation and consider strategies to address historical cataloging problems and ways to head off new problems in our cataloging practice.

[1] Chang, Ailsa. (2018-04-19). “A Lesson In How To Overcome Implicit Bias.” Code Switch: Race and Identity, Remixed. NPR. https://www.npr.org/sections/codeswitch/2018/04/19/604070231/a-lesson-in-how-to-overcome-implicit-bias

Summary of March 22 Facts in Metadata Discussion

Julie kicked off the session by showing a digitized photograph from the Libraries’ collections. A woman poses in front of a computer terminal. Session participants identified features of the photo, including the person depicted, but questioned the date of the photo, wondering where the existing information presented on the website came from. This exercise highlighted the first theme of the discussion on Facts in Metadata: metadata provenance. When libraries present information to a user, we rarely say anything about where that information came from. Did photograph descriptions come from writing on the back of the photograph? Were the photo descriptions provided by a cataloger?

Jennifer asked attendees to consider an image from a public domain image repository like Unsplash. The photo of an elderly woman crying was retrieved with a search for “grief.” However, when viewing the image metadata, we see that the title assigned by the photographer is “grandma crying moment during a wedding.” Facts (elderly woman, crying) were clear, but perhaps the context of the situation (was the woman expressing grief or happiness?) was not. Cultural heritage institutions and for-commercial-use image repositories have different missions and serve different user needs. Unsplash need not concern itself with truth in the descriptions of images (grandma was overcome with tears of happiness, not grief). The company’s goal is to help people find images to use in their own work. Provenance reveals not just who wrote the descriptions but also the purpose for which each description was created.

In contrast to evaluating content in the wilds of the web, Jennifer shared what should have been a routine cataloging situation in which the facts were difficult to parse. The title in question was Hitler’s Munich Man: the Downfall of Admiral Sir Barry Domvile by Martin Connolly (IUB Libraries owns the print and licenses access to the ebook version). A very complete and straightforward-looking catalog record was already available. The record contained the following topical subject headings:

World War, 1939-1945—Collaborationists—Great Britain
Traitors—Great Britain—History—20th century
Nazis—Great Britain—History—20th century

For reasons that I can explain in a future post, the first and third Library of Congress Subject Headings (LCSH) strings are invalid as constructed and must be deleted from the record. The second subject heading string is problematic because 1) factually, Domvile was never tried for or convicted of treason and 2) the author doesn’t come to a definitive conclusion on whether Domvile committed treason (if the author’s argument were that Domvile did commit treason, the second heading might be justified). The subject headings raised many questions about facts (is a sympathizer a collaborationist?) and revealed catalogers’ frustrations with taxonomies that aren’t constructed carefully enough to help us know when it is appropriate to use a subject term.

The topic of how different domains treat subject analysis was raised in discussing the 20% rule. Library catalogers include subject headings if at least 20% of the resource is about that topic. In archival finding aids, collection-level subject lists tend to be extensive (I don’t think archivists observe a 20% rule but I’m happy to be corrected on that matter). Documenting expectations for metadata access is crucial. Even within the cultural heritage domains (libraries, archives, and museums), metadata practice varies.

About halfway through the meeting hour, the discussion began to turn away from fact and toward the topic of bias. Libraries have a specific purpose; if that purpose is misinterpreted by library users, it may affect their understanding of the resources and services libraries provide. A participant brought up Emily Drabinski’s article, “Queering the Catalog: Queer Theory and the Politics of Correction,” in which the author argues that because knowledge organization systems are inherently biased and cannot be corrected, libraries are responsible for educating library users about the biases and limitations of the catalog as a dataset. Participants teased out this notion, and by the end of the session the conversation had definitely strayed far from the facts and well into how to respond to bias in metadata.

Join us for our next meeting on Wednesday, April 25. We will pick this topic up and see if we can identify some next steps.

Resources mentioned during this discussion:

Drabinski, Emily. “Queering the Catalog: Queer Theory and the Politics of Correction.” The Library Quarterly: Information, Community, Policy, vol. 83, no. 2, 2013, pp. 94–111. www.jstor.org/stable/10.1086/669547.

Haynes, David. “Metadata – have we got the ethics right?” ALA Editions Blog, March 21, 2018. http://www.alaeditions.org/blog/310/metadata-have-we-got-ethics-right.

Round Robin Updates – March 2016 meeting summary

The discussion on Moving from MODS to usable RDF, originally planned for March 8, will be rescheduled (date TBD). Stay tuned!

Since the time was blocked off on everyone’s calendars (and since we hadn’t met in a while), group members participated in a round robin. Below is a summary of the updates. Errors and misunderstandings on my part may be corrected in the comments below and I’ll do my best to update the post.

ORCID

  • Andrea Morrison reported seeing more and more ORCID identifiers in Library of Congress Name Authority File (NAF) records

Browse Functionality in Progress for IUCAT

  • Rachael Cohen reported on the progress to implement browse for IUCAT
    • Working from code developed by Cornell (they also use Blacklight for their discovery layer)
    • Development team will start with author browse, then tackle Kinsey subject headings, then Library of Congress Subject Headings (LCSH)
  • Spencer Anspach clarified how browse will work:
    • Authorized access points in bibliographic records will be hyperlinked
    • Browse results will display all access points (authorized and unauthorized) that appear in bibliographic records; each result will have a number next to it, denoting how many times that access point is used in the bibliographic records database; some access points will have an icon next to them, which will denote that the access point is the authorized version (NAF, etc.)
      • Clicking on the icon will take a user to the authority record for that authorized access point (MARC 670 fields will NOT display to users)
    • Clicking on any of the access points will conduct a search on that access point
  • Concerns: batch loaded records from vendors (we won’t mention names) do not have authority control; one vendor in particular never includes dates (MARC subfield d) in authorized access points–this will certainly have an adverse impact on browse!
  • Shelf ready materials often do not have authority control–those errors are picked up in post-cataloging, by the Database Management team
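
The browse behavior and the missing-dates concern described above can be illustrated with a minimal Python sketch. This is not the Cornell or IUCAT code; the sample headings, the set of authorized forms, and the function name build_browse_list are all hypothetical, chosen only to show how a heading without dates ends up as its own browse entry.

```python
from collections import Counter

# Hypothetical access points as they might appear across bibliographic records.
record_access_points = [
    "Doe, Jane, 1950-",
    "Doe, Jane, 1950-",
    "Doe, Jane",                 # e.g. a vendor record lacking dates (subfield d)
    "Roe, Richard, 1901-1980",
]

# Hypothetical set of authorized forms (for example, taken from the NAF).
authorized_forms = {
    "Doe, Jane, 1950-",
    "Roe, Richard, 1901-1980",
}

def build_browse_list(access_points, authorized):
    """Return sorted (heading, count, is_authorized) rows for a browse display."""
    counts = Counter(access_points)
    return [
        (heading, counts[heading], heading in authorized)
        for heading in sorted(counts)
    ]

browse_rows = build_browse_list(record_access_points, authorized_forms)

# An asterisk stands in for the icon that marks the authorized version.
for heading, count, is_authorized in browse_rows:
    marker = "*" if is_authorized else " "
    print(f"{marker} {heading} ({count})")
```

In this sketch the undated heading is listed separately from the authorized form and carries its own count, which is the kind of split that records without authority control would introduce into a browse list.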

Use of BoundingBox Tool Adds Value to Map Records

  • Heiko Mühr reported a pilot (and subsequent adoption) of the BoundingBox Tool
    • The tool allows catalogers to find geographical coordinates for an area and outputs longitude and latitude data in a number of formats (including–but not limited to!–OCLC MARC)
    • After piloting use of the tool in the month of December, catalogers determined that use of the tool did not adversely impact cataloging productivity–the quality of records increased
    • Coordinate data is now required in bibliographic records for sheet maps
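
For readers curious about what coordinate output for a MARC record involves, here is a minimal Python sketch. It is not the BoundingBox Tool's code or its exact output; it assumes the hemisphere-degrees-minutes-seconds form (hdddmmss) and the subfield assignments of MARC 21 field 034 (d = westernmost longitude, e = easternmost longitude, f = northernmost latitude, g = southernmost latitude). The function names and the sample coordinates are hypothetical.

```python
def to_hdddmmss(value, is_latitude):
    """Convert decimal degrees to hemisphere-degrees-minutes-seconds text."""
    if is_latitude:
        hemisphere = "N" if value >= 0 else "S"
    else:
        hemisphere = "E" if value >= 0 else "W"
    # Work in whole seconds so that rounding can never produce "60" seconds.
    total_seconds = round(abs(value) * 3600)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hemisphere}{degrees:03d}{minutes:02d}{seconds:02d}"

def bounding_box_subfields(west, east, north, south):
    """Map a decimal-degree bounding box to assumed 034 coordinate subfields."""
    return {
        "d": to_hdddmmss(west, is_latitude=False),   # westernmost longitude
        "e": to_hdddmmss(east, is_latitude=False),   # easternmost longitude
        "f": to_hdddmmss(north, is_latitude=True),   # northernmost latitude
        "g": to_hdddmmss(south, is_latitude=True),   # southernmost latitude
    }

# Hypothetical bounding box in decimal degrees.
subfields = bounding_box_subfields(west=-86.5, east=-86.25, north=39.25, south=39.0)
print(subfields)
```

Converting to whole seconds before splitting into degrees and minutes is a small design choice that avoids values such as 59 minutes 60 seconds when a coordinate rounds up.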

Government Publications Metadata Update

  • Andrea Morrison reported on her collaboration with the new GIMMS librarian
    • Working on finding ways to streamline acquisitions and cataloging workflows and provide better metadata services for federal documents

Media Digitization and Preservation Initiative

  • Ronda Sewald reported on progress made in the Media Digitization and Preservation Initiative (MDPI) project
    • Tens of thousands of media objects have been digitized to date
    • Digitization is ongoing
    • Currently sorting out ways to determine the rights for digital objects and how to create metadata (at such a huge scale) for discovery

Medical Subject Headings (MeSH) Deconstruction Update

  • James Castrataro reported that the deconstruction of MeSH headings seems to be complete
    • It was uncertain whether or not MeSH access points could be subdivided chronologically before the December 2015 decision to deconstruct
    • No one present was sure whether or not MeSH access points can be subdivided chronologically at present

Using the Catalog to Support Teaching

  • Bob Noel sought recommendations from the group for providing access to individual vendor streaming video titles (records were batch loaded at the item-level, rather than the collection-level)
    • Participants came up with a handful of possible strategies, including creating a LibGuide where all updates to links, etc., would be handled in one place
    • Interesting discussion about making access as easy as possible for the user (faculty, researchers, and students)

Metadata at IU InfoShare

The Metadata Discussion Group is back! We kicked off our first meeting of the academic year with an infoshare, in which attendees were invited to share a few quick highlights about their metadata work. Attendees discussed project progress, described how their workflows are adapting, and elaborated on how they are coping with new or evolving metadata standards. As the discussion progressed, a number of common metadata challenges emerged.

Couldn’t attend but still wish to share your metadata project? Leave a comment below!