Join the next Metadata Discussion Group meeting, where we’ll welcome in the new academic year with a discussion about the many possible paths to implementing linked library data. Participants will consider homegrown and vended solutions and think about the implications of when and where to introduce linked data into the library data stream.
DATE: Tuesday, September 20
TIME: 9-10 am
PLACE: Wells Library Room 043
TOPIC: Paths to a Linked Data Catalog
MODERATOR: Jennifer Liss
Since the time was blocked off on everyone’s calendars (and since we hadn’t met in a while), group members participated in a round robin. Below is a summary of the updates. Errors and misunderstandings on my part may be corrected in the comments below and I’ll do my best to update the post.
Andrea Morrison reported seeing more and more ORCID identifiers in Library of Congress Name Authority File (NAF) records.
Browse Functionality in Progress for IUCAT
Rachael Cohen reported on the progress to implement browse for IUCAT
Working from code developed by Cornell (they also use Blacklight for their discovery layer)
Development team will start with author browse, then tackle Kinsey subject headings, then Library of Congress Subject Headings (LCSH)
Spencer Anspach clarified how browse will work:
Authorized access points in bibliographical records will be hyperlinked
Browse results will display all access points (authorized and unauthorized) that appear in bibliographic records–all results will have a number next to them, denoting how many times that access point is used in the bibliographic records database; some access points will have an icon next to them, denoting that the access point is the authorized version (NAF, etc.)
Clicking on the icon will take a user to the authority record for that authorized access point (MARC 670 fields will NOT display to users)
Clicking on any of the access points will conduct a search on that access point
Concerns: batch loaded records from vendors (we won’t mention names) do not have authority control; one vendor in particular never includes dates (MARC subfield d) in authorized access points–this will certainly have an adverse impact on browse!
Shelf ready materials often do not have authority control–those errors are picked up in post-cataloging, by the Database Management team
At the Metadata Discussion Group meeting on April 5, 2016 (rescheduled from March 8), we will talk about some of the challenges of mapping a descriptive metadata structure standard (in this case, MODS) from an XML-based expression to one that is RDF-based. This post will explain what MODS is and what it’s used for.
MODS: the ‘Who, What, and When’
The Metadata Object Description Schema (MODS) was published in 2002 by the Library of Congress’ Network Development and MARC Standards Office. The standard is maintained by an editorial committee composed of library metadata practitioners from North America and Europe.
MODS is a “bibliographic element set” that may be used to describe information resources. MODS consists of 108 elements and subelements (there are 20 top-level or “parent” elements). At this point, I’ll urge you to go read the brief overview of MODS on the Library of Congress’ Standards website.
Go ahead. I’ll wait.
You read that bit about MODS being more or less based on MARC21, right? In the example below, I’ve described a sheet map using MODS elements and MARC tags.
DATA (formulated according to AACR2, if that sort of thing matters to you) | MODS ELEMENT | MARC TAG (and mapped MARC data value, when applicable)
Campbell County, Wyoming | titleInfo/title | 245 $a
Campbell County Chamber of Commerce (Wyo.) | name type="corporate"/namePart | 110 $a
Campbell County Chamber of Commerce | originInfo/publisher | 260 $b
1 map ; 33 x 15 cm | physicalDescription/extent | 300 $a $c
Table 1. Data expressed in MODS elements and MARC tags.
There’s a full mapping of MARC21 tags to MODS elements available, if you’re really curious. This example demonstrates that, although there are a few divergences, MODS was built so that almost every MARC21 tag maps directly into a MODS element.
MODS encodes descriptive metadata, or information about resources (title, creator, etc.). MODS and MARC21 are examples of data structure standards. Elements or tags are meant to serve as containers for data. Structure standards do not give any directions about how to formulate data—those directions come from data content standards (AACR2, RDA, DACS, etc.). The main purpose for structure standards (Dublin Core, EAD, and TEI are other examples of metadata structure standards) is to encode data so that it can be manipulated by machines. Elements separate discrete information for use in search and browse indices. Data structure standard elements often convey the meaning of the data. The MODS:title element only contains the word or words that are used to refer to a resource. MODS:title will never serve as a container for the resource’s size.
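The container idea can be sketched with Python’s standard XML tooling (this is my own illustration, not part of any IU workflow; the element names come from the MODS element set, and the sample values reuse the sheet map example above):

```python
import xml.etree.ElementTree as ET

# The real MODS version 3 namespace, published by the Library of Congress.
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

# Build a minimal MODS record: titleInfo/title holds ONLY the title string.
mods = ET.Element(f"{{{MODS_NS}}}mods")
title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
title = ET.SubElement(title_info, f"{{{MODS_NS}}}title")
title.text = "Campbell County, Wyoming"

# The size goes in its own container (physicalDescription/extent),
# never in the title element.
phys = ET.SubElement(mods, f"{{{MODS_NS}}}physicalDescription")
extent = ET.SubElement(phys, f"{{{MODS_NS}}}extent")
extent.text = "1 map ; 33 x 15 cm"

print(ET.tostring(mods, encoding="unicode"))
```

Because each piece of data sits in its own labeled container, a machine can index the title for browse without ever mistaking “33 x 15 cm” for part of the name of the resource.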
MODS: the ‘Where, Why, and How’
MODS was built “for library applications.” MODS has been chiefly implemented to support discovery of digital library collections. At IUB Libraries, MODS is the metadata standard of choice for the digital objects that are ingested into our digital collections repository, Fedora.
MODS elements are expressed in XML. XML is a metalanguage, which means that XML is an alphabet, of sorts, for expressing other languages. The figure below illustrates the XML syntax (the “alphabet”) by which XML expresses another language. A fake language with a bogus element named “greeting” is encoded in Figure 1.
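The gist of that figure can be sketched in code: the element name “greeting” means nothing to XML itself (it’s an invented language), but as long as the document follows XML’s syntax rules, any XML parser can read it. The `lang` attribute below is my own embellishment:

```python
import xml.etree.ElementTree as ET

# A bogus element from a made-up language, encoded in valid XML syntax:
# angle brackets, a matching start and end tag, a quoted attribute value.
doc = '<greeting lang="en">Hello, world!</greeting>'

root = ET.fromstring(doc)
print(root.tag)          # greeting
print(root.get("lang"))  # en
print(root.text)         # Hello, world!
```

The parser neither knows nor cares what a “greeting” is; it only enforces the alphabet.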
HTML (the language responsible for displaying this webpage to you right now), EAD, and TEI are also expressed using XML.
From the beginning, MODS was designed to be expressed as an XML schema. Schemata are the sets of rules for how languages work: which elements are valid and what their semantic meanings are, which elements nest within others, whether or not an element can be modified by attributes (e.g., the MODS:titleInfo might have an attribute called “type”), and whether there is a controlled list of values for a given attribute (e.g., the MODS:titleInfo “type” attribute is limited to the values “abbreviated,” “translated,” “alternative,” and “uniform”).
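In practice a real schema validator does this checking for you; as a rough hand-rolled sketch of the controlled-list rule (the record and its title values below are invented for illustration):

```python
import xml.etree.ElementTree as ET

# Controlled values for the MODS titleInfo "type" attribute,
# per the MODS schema.
ALLOWED_TITLE_TYPES = {"abbreviated", "translated", "alternative", "uniform"}

def check_title_types(mods_xml: str) -> list:
    """Return any titleInfo@type values not on the controlled list."""
    ns = {"mods": "http://www.loc.gov/mods/v3"}
    root = ET.fromstring(mods_xml)
    errors = []
    for ti in root.findall(".//mods:titleInfo", ns):
        t = ti.get("type")
        if t is not None and t not in ALLOWED_TITLE_TYPES:
            errors.append(t)
    return errors

record = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo type="alternative"><title>A fine title</title></titleInfo>
  <titleInfo type="subtitle"><title>A bad type value</title></titleInfo>
</mods>"""

print(check_title_types(record))  # ['subtitle']
```

A validating parser armed with the MODS schema would flag “subtitle” the same way, along with misnested elements and every other rule the schema expresses.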
MODS records are created in a number of ways. You could open up an XML editor and start creating a MODS/XML record. If you want to really get to know the MODS standard, that wouldn’t be a bad idea. However, if you wish to create metadata for half a million photographs, editing raw XML won’t be terribly efficient. At IU, we have a few different methods for creating MODS records for digital objects. My favorite is the Image Collections Online cataloging tool. Use of the tool is restricted but I’ve included a screenshot below.
Once a collection manager decides which metadata elements are desired and has consulted with the metadata specialist for digital collections (our own Julie Hardesty), those elements will display in a web form. Data may then be entered without needing to know XML or MODS. In the screenshot, you’ll see a box in the lower right-hand corner labeled “Transform metadata to…” Clicking the link that says “mods” allows me to download the data that I input into the web form as MODS/XML. You may view the full record for this photograph below.
That’s the five-cent tour of MODS, as it’s expressed in XML. Questions? Leave a comment below!
A message went out to the Program for Cooperative Cataloging (PCC) listserv announcing the appointment of the PCC Task Group on URIs in MARC. The task group was formed to “help fulfill our strategic objective to optimize library data for the web” (email from Kate Harcourt to PCCLIST, 8 September 2015).
The group’s charge is available online (.docx). In brief, the group’s objective is to find ways to “transition from string-based descriptive and authority data.”
Those working in an OCLC environment may have already seen URIs in MARC. In order to transition to linked library data, the German National Library (DNB) regularly adds the MARC subfield $0 to access points in MARC bibliographic records. Below is a screenshot of OCLC record 827841368.
The subfield $0 value consists of a code identifying the authority file, in parentheses, immediately followed by the authority record control number for the vocabulary term. DE-588 refers to the German National Library’s Integrated Authority File (Gemeinsame Normdatei, or GND), which is available as linked data.
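Given that layout—a parenthesized source code immediately followed by a control number—a $0 value can be split apart with a simple pattern. This is a sketch: the control number below is a made-up example, and real $0 subfields may instead carry bare URIs, which this pattern deliberately leaves alone:

```python
import re

# Pattern for "(SOURCE-CODE)control-number", e.g. "(DE-588)…".
SUBFIELD_0 = re.compile(r"^\((?P<source>[^)]+)\)(?P<control_number>.+)$")

def parse_subfield_0(value: str):
    """Split a $0 value into its authority-file code and control number."""
    m = SUBFIELD_0.match(value)
    if m is None:
        return None  # e.g. a bare URI, which this sketch doesn't handle
    return m.group("source"), m.group("control_number")

print(parse_subfield_0("(DE-588)1234567-8"))  # ('DE-588', '1234567-8')
print(parse_subfield_0("http://example.org/authority/123"))  # None
```

Pulling the source code out this way is exactly the kind of string-to-identifier plumbing the task group’s work implies: once the code and control number are isolated, they can be rewritten as a dereferenceable URI.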
Subjects and names selected from authority files are the first obvious identifiers that might be included in bibliographic records; however, those aren’t the only terms selected from controlled vocabularies. Task group charge #4 addresses “other” entities and relationships expressed in bibliographic records that would be better treated as identifiers. For instance, all RDA Relationship Designators (Appendices I, J, K, M, and eventually L) are terms that might be referenced with URIs, as all RDA element and vocabulary terms are available in the Library of Congress Linked Data Service and/or the Open Metadata Registry. The URI for “Sequel to,” as it might appear in a subfield $i of a 7XX field, would become a machine-actionable link between the first book in a series and the second. So much more of our RDA data (e.g., RDA carrier types or RDA media types) could be referenced by URIs.
Quoting again from the charge: “Providing URIs in MARC records will greatly facilitate the reuse of MARC data as linked data and opens the way for catalogers to work with entity registries and controlled vocabularies from the larger metadata community.”
The Metadata Discussion Group is officially on summer hiatus! In the meantime, we’ll be posting occasional new items. If you have some news to pass along, send us a note.
Uche Ogbuji (@uogbuji) is Partner and Chief Technology Officer of Zepheira. He’s been writing about his work on the LibHub Initiative at the Denver Public Library (DPL). His posts include preliminary observations regarding the impact of converting a library database to published linked data.
If you want to see more library linked data in action, Rachel Fewell of DPL included links in a recent post she wrote, Visible Library.
LibHub aims to use BIBFRAME and Schema.org to make it easier for web crawlers to discover library resources and send users to library websites/catalogs.
When I look at the DPL LibHub “record” for Giraffes, black dragons, and other pianos [click this link and then click on the “No thanks, I’ll stay here” button], I can see that the data is being published on the web as BIBFRAME and Schema.org. If you want to see the markup, hit CTRL+U in your browser then do a find (CTRL+F) for “bf:” and “schema”. You’ll see PURLs. You’ll see some Dublin Core. And lots of something called http://bibfra.me/vocab/lite/ (which is best addressed in a separate post). What you won’t see? Access points (author, subjects, etc.) being associated with their identifiers, such as the Library of Congress Linked Data Service or VIAF. I’d guess that more robust linking is in the works. In any case, it’s good to see more examples of linked library data services being launched.
At this point, I’m fairly certain that only MARC data was used to populate the DPL LibHub dataset (I trust, dear Internet, that you’ll correct me if I’m wrong). DPL uses CONTENTdm to host their digital collections but I haven’t found any evidence that CONTENTdm Dublin Core records were included in the conversion. If you find a record from the DPL digital repository in the DPL LibHub dataset, let us know in the comments.
So, do libraries launch datasets on their own in the future? Do we pay for a service to host our data for us? I like the CC-BY license because it requires attribution (metadata provenance is going to be a bigger deal in the LOD world)–is this the way to go? I kept enclosing the word “record” in quotation marks. What do we call the “record” in the linked data environment? Data view?