Metadata Discussion Group: data nerdery in the service of resource discovery


Summary of January meeting

Transforming archival finding aid metadata for searching and sharing

This month, the group discussed how libraries, archives, and museums (LAMs) might address legacy data problems in preparation for a potential move to linked data. Resources to be consulted in preparation for January’s meeting can be found here.

Discussion opened with a quick overview of how XML, SGML, and HTML are related. Then the group looked at a few XML documents together by searching for a Library of Congress Subject Heading term and viewing its “MARC/XML” version (scroll to the bottom of the page and look under “Alternate Formats”). Group members noted that the MARCXML:

  1. Is verbose
  2. Preserves MARC tags, subfield codes, and first and second MARC indicators in the form of attributes
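
To make the second observation concrete, here is a minimal sketch in Python (standard library only) that reads a small MARCXML fragment and lists the tag, indicators, and subfield codes it finds as attributes. The fragment is hand-written for illustration and is not copied from a real authority record; the function name `list_fields` is likewise just an assumption for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative MARCXML fragment (not a real authority record).
MARCXML = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="150" ind1=" " ind2=" ">
    <subfield code="a">Metadata</subfield>
  </datafield>
  <datafield tag="450" ind1=" " ind2=" ">
    <subfield code="a">Data about data</subfield>
  </datafield>
</record>
"""

# Namespace used by the MARCXML ("MARC21 slim") schema.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def list_fields(xml_text):
    """Return (tag, ind1, ind2, subfield code, value) for every subfield."""
    root = ET.fromstring(xml_text)
    rows = []
    for field in root.findall("marc:datafield", NS):
        for sub in field.findall("marc:subfield", NS):
            # Tag and indicators are attributes of <datafield>;
            # the subfield code is an attribute of <subfield>.
            rows.append((field.get("tag"), field.get("ind1"),
                         field.get("ind2"), sub.get("code"), sub.text))
    return rows


fields = list_fields(MARCXML)
for row in fields:
    print(row)
```

Everything a cataloger would recognize from the MARC record is still there; it has simply moved into attributes, which is also why the file is so verbose.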

Next the group looked at the same subject authority record in RDF/XML (MADS and SKOS) and noted:

  1. Elements are qualified by the namespace of the schema that defines them
  2. Different schemas may be referenced in a single XML document: MADSRDF, OWL, SKOS, etc.
  3. Data strings are normalized for easier processing (e.g., dates are normalized according to a W3C standard)
  4. The language of each data string is declared
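
The sketch below illustrates those four points with a tiny RDF/XML document read in Python. The document is invented for this example (the `example.org` URI, the labels, and the date are all assumptions), and it borrows a Dublin Core Terms date only to show a typed value; a real record from LC will look different and be much longer.

```python
import xml.etree.ElementTree as ET

# Invented RDF/XML for illustration; URIs under example.org are placeholders.
RDFXML = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#"
         xmlns:madsrdf="http://www.loc.gov/mads/rdf/v1#"
         xmlns:dcterms="http://purl.org/dc/terms/">
  <skos:Concept rdf:about="http://example.org/subjects/example1">
    <skos:prefLabel xml:lang="en">Metadata</skos:prefLabel>
    <madsrdf:authoritativeLabel xml:lang="en">Metadata</madsrdf:authoritativeLabel>
    <dcterms:created rdf:datatype="http://www.w3.org/2001/XMLSchema#date">2011-01-15</dcterms:created>
  </skos:Concept>
</rdf:RDF>
"""

# ElementTree spells namespaced attribute names as "{namespace-uri}localname".
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RDF_DATATYPE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}datatype"


def describe(xml_text):
    """List each statement with its namespace, language tag, and datatype."""
    root = ET.fromstring(xml_text)
    concept = root[0]  # the single skos:Concept in this sample
    rows = []
    for child in concept:
        namespace, local = child.tag[1:].split("}", 1)
        rows.append({
            "namespace": namespace,
            "element": local,
            "lang": child.get(XML_LANG),
            "datatype": child.get(RDF_DATATYPE),
            "value": child.text,
        })
    return rows


rows = describe(RDFXML)
for row in rows:
    print(row)
```

Three schemas sit side by side in one document, each label carries its language, and the date is a typed, normalized string rather than free text.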

Participants then wondered how we would clean up bad data in our MARC records, citing examples listed in the Bowen reading. How much of the cleanup can be done programmatically? Since it’s not feasible to correct every bad record by hand, are there errors we are willing to let slip by? Are there certain problematic fields (such as the now-defunct MARC 410 field) that we are willing to let fall off completely? Are we capable of accepting a dumbed-down, lossy mapping away from MARC? One participant suggested that it may be useful to make these decisions on a format-by-format basis. For instance, the MARC map format may require a different treatment than the MARC continuing resources record format.
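
One programmatic first step would be an audit: count how often a problematic field occurs before deciding whether to fix it or let it go. The following is only a sketch under assumptions. The sample records are invented, the set `FLAGGED_TAGS` holds just the 410 field mentioned above, and a real list would have to be agreed on format by format.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Tags to flag. "410" comes from the discussion; anything else is up to the group.
FLAGGED_TAGS = {"410"}

# Invented two-record MARCXML collection for illustration.
COLLECTION = """
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">First example title</subfield>
    </datafield>
    <datafield tag="410" ind1="2" ind2="0">
      <subfield code="a">Example Society.</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">Second example title</subfield>
    </datafield>
  </record>
</collection>
"""


def audit(xml_text, flagged=FLAGGED_TAGS):
    """Count flagged fields and the number of records that contain any."""
    root = ET.fromstring(xml_text)
    field_counts = Counter()
    affected_records = 0
    for record in root.findall("marc:record", NS):
        tags = [f.get("tag") for f in record.findall("marc:datafield", NS)]
        hits = [t for t in tags if t in flagged]
        field_counts.update(hits)
        if hits:
            affected_records += 1
    return field_counts, affected_records


field_counts, affected_records = audit(COLLECTION)
print(dict(field_counts), affected_records)
```

A report like this does not clean anything up, but it gives numbers to argue over when deciding which errors are worth fixing.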

Jennifer demonstrated a few XML conversions using Oxygen XML Editor¹. XSLT (eXtensible Stylesheet Language Transformations) is used to take an XML document (which could be a MARCXML file, a TEI file, etc.) and turn it into some other kind of XML-based file (an HTML webpage, an EAD file, a MODS record, etc.).
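
For readers who did not see the demonstration, the sketch below shows the kind of element-to-element mapping that an XSLT stylesheet expresses. It is written in plain Python rather than XSLT, and it is not the stylesheet used in the meeting: it maps only the MARC 245 title field to a MODS `titleInfo`, and the sample record and function name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)  # serialize with a readable prefix

# Invented MARCXML record for illustration.
MARC_RECORD = """
<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Example family papers :</subfield>
    <subfield code="b">correspondence and photographs</subfield>
  </datafield>
</record>
"""


def marc_title_to_mods(marcxml_text):
    """Map MARC 245 $a and $b to MODS titleInfo/title and titleInfo/subTitle."""
    record = ET.fromstring(marcxml_text)
    mods = ET.Element(f"{{{MODS_NS}}}mods")
    for field in record.findall(f"{{{MARC_NS}}}datafield[@tag='245']"):
        title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
        for sub in field.findall(f"{{{MARC_NS}}}subfield"):
            # Drop trailing ISBD punctuation such as " :" or " /".
            text = (sub.text or "").rstrip(" /:")
            if sub.get("code") == "a":
                ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = text
            elif sub.get("code") == "b":
                ET.SubElement(title_info, f"{{{MODS_NS}}}subTitle").text = text
    return mods


mods_record = marc_title_to_mods(MARC_RECORD)
print(ET.tostring(mods_record, encoding="unicode"))
```

An XSLT stylesheet would state the same rules declaratively as templates, which is what makes it convenient to run inside an XML editor.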

Transforming archival finding aid metadata for searching and sharing

The first mapping started with a mixed-materials format MARC record for a collection-level archives title. Jennifer demonstrated how to use an Oxygen plugin² developed by the IU Digital Library Program to map a MARC record to an EAD/XML file. After generating the EAD file, Jennifer demonstrated how to transform the collection-level EAD file to multiple item-level MODS records using XSLT. Though not demonstrated, it should be noted that the IU DLP takes an added step of mapping MODS to Dublin Core. Those Dublin Core records are then shared with the world via the Open Archives Initiative (OAI) protocols.
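
The one-to-many step, a single collection-level EAD file becoming several item-level MODS records, can be sketched as follows. This is a plain-Python stand-in and not the IU DLP's XSLT: the EAD sample is invented, it assumes items are encoded as `c01` components with `level="item"`, and the choice to record the collection as a MODS host `relatedItem` is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Invented EAD finding aid (no namespace, in the style of DTD-based EAD 2002).
EAD = """
<ead>
  <archdesc level="collection">
    <did><unittitle>Example family papers</unittitle></did>
    <dsc>
      <c01 level="item">
        <did><unittitle>Letter to a friend</unittitle><unitdate>1901</unitdate></did>
      </c01>
      <c01 level="item">
        <did><unittitle>Photograph of a house</unittitle><unitdate>1905</unitdate></did>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def mods_tag(name):
    """Build a namespace-qualified MODS element name."""
    return f"{{{MODS_NS}}}{name}"


def ead_to_item_mods(ead_text):
    """Return one MODS record per item-level component in the finding aid."""
    ead = ET.fromstring(ead_text)
    collection_title = ead.findtext("archdesc/did/unittitle")
    records = []
    for item in ead.iter("c01"):
        if item.get("level") != "item":
            continue
        mods = ET.Element(mods_tag("mods"))
        title_info = ET.SubElement(mods, mods_tag("titleInfo"))
        ET.SubElement(title_info, mods_tag("title")).text = item.findtext("did/unittitle")
        date = item.findtext("did/unitdate")
        if date:
            origin = ET.SubElement(mods, mods_tag("originInfo"))
            ET.SubElement(origin, mods_tag("dateCreated")).text = date
        # Each item points back to the collection it came from.
        host = ET.SubElement(mods, mods_tag("relatedItem"), {"type": "host"})
        host_title = ET.SubElement(host, mods_tag("titleInfo"))
        ET.SubElement(host_title, mods_tag("title")).text = collection_title
        records.append(mods)
    return records


item_records = ead_to_item_mods(EAD)
for record in item_records:
    print(ET.tostring(record, encoding="unicode"))
```

Because every item record carries the collection title with it, each one can later be mapped to Dublin Core and shared on its own without losing its context.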

Participants agreed that XML is useful in the transformation of metadata records; however, reconciling a record full of MARC fields into separate RDF triple statements is the bigger challenge. One participant noted that standards organizations, such as MARBI, needed to be on board in this effort in order for library linked data to become a reality. Note: LC’s Bibliographic Framework Transition Initiative has made the agenda of MARBI’s ALA Midwinter meeting.
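
A rough sketch of what that reconciliation involves: each MARC subfield has to become its own subject-predicate-object statement. Everything specific here is an assumption made for illustration, including the record data, the `example.org` subject URI, and the tiny mapping to Dublin Core Terms properties; it is not a proposed or official MARC-to-RDF mapping.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical mapping from (MARC tag, subfield code) to an RDF property.
MAPPING = {
    ("100", "a"): DCTERMS + "creator",
    ("245", "a"): DCTERMS + "title",
    ("650", "a"): DCTERMS + "subject",
}

# Invented record, flattened to (tag, subfield code, value) tuples.
RECORD = [
    ("100", "a", "Example, Author"),
    ("245", "a", "An example title"),
    ("500", "a", "A general note with no mapping"),
    ("650", "a", "Metadata"),
    ("650", "a", "Cataloging"),
]


def record_to_triples(subject_uri, fields, mapping=MAPPING):
    """Turn mapped subfields into (subject, predicate, literal) triples."""
    triples = []
    for tag, code, value in fields:
        predicate = mapping.get((tag, code))
        if predicate is None:
            continue  # unmapped data is dropped: this is the lossy part
        triples.append((subject_uri, predicate, value))
    return triples


def to_ntriples(triples):
    """Serialize triples with plain string literals as N-Triples lines."""
    lines = []
    for s, p, o in triples:
        escaped = o.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{s}> <{p}> "{escaped}" .')
    return "\n".join(lines)


triples = record_to_triples("http://example.org/record/1", RECORD)
ntriples = to_ntriples(triples)
print(ntriples)
```

Even this toy version shows the hard parts: the 500 note silently disappears, and the creator and subjects come out as text strings rather than links to authority URIs, so the result is not yet very "linked".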

Of more local concern: will Kuali OLE support linked data? We think the answer is no. Even if OLE is able to support linked data in the future, is Blacklight capable of utilizing the power of linked data?

  1. Oxygen XML Editor is available on all IUB LIT-managed staff machines. It is also available to all faculty, staff and students for download via IUware.
  2. Oxygen plugins developed by the IUB DLP may be found here.

Author: Jennifer A. Liss

Human. Librarian. Consumes large quantities of data.