Join the next Metadata Discussion Group meeting, where we’ll welcome in the new academic year with a discussion of the many possible paths to implementing linked library data. Participants will consider homegrown and vended solutions and think about the implications of when and where to introduce linked data into the library data stream.
DATE: Tuesday, September 20
TIME: 9-10 am
PLACE: Wells Library Room 043
TOPIC: Paths to a Linked Data Catalog
MODERATOR: Jennifer Liss
The Metadata Discussion Group is officially on summer hiatus! In the meantime, we’ll be posting occasional news items. If you have some news to pass along, send us a note.
Uche Ogbuji (@uogbuji) is Partner and Chief Technology Officer of Zepheira. He’s been writing about his work on the LibHub Initiative at the Denver Public Library (DPL). His posts include preliminary observations regarding the impact of converting a library database to published linked data.
If you want to see more library linked data in action, Rachel Fewell of DPL included links in a recent post she wrote, Visible Library.
LibHub aims to use BIBFRAME and Schema.org to make it easier for web crawlers to discover library resources and send users to library websites/catalogs.
When I look at the DPL LibHub “record” for Giraffes, black dragons, and other pianos [click this link and then click the “No thanks, I’ll stay here” button], I can see that the data is being published on the web as BIBFRAME and Schema.org. If you want to see the markup, hit CTRL+U in your browser, then search (CTRL+F) for “bf:” and “schema”. You’ll see PURLs. You’ll see some Dublin Core. And lots of something called http://bibfra.me/vocab/lite/ (which is best addressed in a separate post). What you won’t see? Access points (authors, subjects, etc.) associated with their identifiers, such as those from the Library of Congress Linked Data Service or VIAF. I’d guess that more robust linking is in the works. In any case, it’s good to see more examples of linked library data services being launched.
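For comparison’s sake, linking an access point to its authority identifier might look something like this in RDFa (a purely hypothetical sketch; the author name and URI below are placeholders, not data from the DPL dataset):

```html
<!-- Hypothetical fragment: the author access point carries a URI
     pointing at an authority record (placeholder URI shown) -->
<span property="schema:author" typeof="schema:Person"
      resource="http://id.loc.gov/authorities/names/nXXXXXXXX">
  <span property="schema:name">Example, Author</span>
</span>
```

With markup like this, a crawler can tell that the string inside the span is not just text but a reference to a specific, globally identified person.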
At this point, I’m fairly certain that only MARC data was used to populate the DPL LibHub dataset (I trust, dear Internet, that you’ll correct me if I’m wrong). DPL uses CONTENTdm to host its digital collections, but I haven’t found any evidence that CONTENTdm Dublin Core records were included in the conversion. If you find a record from the DPL digital repository in the DPL LibHub dataset, let us know in the comments.
So, will libraries launch datasets on their own in the future? Or will we pay for a service to host our data for us? I like the CC-BY license because it requires attribution (metadata provenance is going to be a bigger deal in the LOD world). Is this the way to go? You may have noticed that I kept enclosing the word “record” in quotation marks. What do we call the “record” in the linked data environment? Data view?
At the upcoming March 25 meeting, the group will explore what it means to do business on the web scale. This post is the second in a series of two blog posts on the topic of making metadata scalable for the web. You can read the first post here.
There are numerous announcements peppering the web that library systems are now incorporating Schema.org to enhance search engine optimization (SEO). VuFind, an open source discovery interface developed and maintained by Villanova University’s Falvey Memorial Library, recently released version 2.2 with Schema.org microdata integration for its OPAC.
In October, Koha 3.14.0 was released with support for Schema.org microdata in its open source OPAC. Evergreen, another open source ILS, is now doing the same. Way back in 2012, OCLC added Schema.org markup to its WorldCat bibliographic records.
Exciting times, right? So how exactly is Schema.org enhancing the discoverability of a library’s collection via a web search? I was able to locate three libraries from a list of websites using Schema.org.
Searching “Last climb : the legendary Everest expeditions of George Mallory” in Google yields no hits for GWU Libraries. Nothing. Perhaps I am not understanding the functionality of the FindIt API and how it differs from a traditional OPAC, but I thought something would appear—especially since GWU Libraries took the time to tag the record with Schema.org itemprop attributes.
The Goodreads result shown below was the second item generated by my Google search. Goodreads does use Schema.org—as you can see, the search generated richer data (i.e., ratings, stars, votes, summary, breadcrumb links). Unfortunately, I didn’t see any libraries with unique information (holdings) displayed in my Google search—including WorldCat. Ditto for my Bing and Yahoo! searches.
Right now Schema.org seems to be adding value to search results for Google Scholar/Books, Amazon, and Goodreads. But wait—Amazon and Google Scholar/Books are not using Schema.org. [scratch head]
OCLC’s WorldCat has tons of rich bibliographic data, relationship data, user-contributed reviews, and holdings data, and it is using Schema.org—why the heck aren’t its results generating holdings in their search displays? [still scratching head]
I applaud folks like those at GWU Libraries who have jumped in and implemented Schema.org. Why don’t you give it a shot and search for your favorite or most dreaded work? Any luck seeing value-added data in your search results?
At the upcoming March 25 meeting, the group will explore what it means to do business on the web scale. This post is the first in a series of two blog posts on the topic of making metadata scalable for the web.
Perhaps you’ve heard of SEO, or search engine optimization. Once a term for strategies that make websites more discoverable to search engines, SEO has evolved into a business sector in its own right. SEO companies sprang up to help businesses “game” search engine algorithms in order to make those businesses appear at the top of search result lists. Years of increasing attention to SEO seem to have driven search engines like Google, Yahoo, and Bing to find new ways of leveraging web content to deliver relevant results to searchers. It’s not hard to imagine a future in which it isn’t enough to populate webpages with descriptive metadata about the content, authorship, and characteristics of that webpage. Doing business on the web is beginning to mean that organizations must mark up webpage content in a semantically meaningful and machine-processable way. This post introduces microdata and Schema.org as a way of telling machines the meaning of text.
Before elaborating on what microdata is, let’s back up and talk about how HTML has conveyed metadata in the past. HTML documents are composed of two areas, the head element (HTML tag: <head>) and the body element (HTML tag: <body>). The body element is where you put all of the content you want people to see. The text you’re reading right now resides in the <body> tag of this HTML page. HTML body elements include tags for demarcating headings, paragraphs, lists, etc. In other words, HTML marks up syntactic or structural information in a block of text. Without the structure provided by HTML tags, text would display in browsers as one long continuous clump without line breaks, white space, or font variation.
Though not typically displayed to users, the HTML head element provides information about the webpage such as the type of content and character set encoding (e.g., text/html, UTF-8), the website title (which is visible at the top of the browser window or tab), and sometimes the website author, description, and keywords. These website characteristics appear inside the <meta> tag, short for metadata. Content within the <meta name="description"> tag is most often used by the search engines Yahoo and Bing for search result display. Yahoo and Bing retrieved the search result snippets shown in Figure 1 from the quoted search “krups ea9000 barista automatic espresso machine black stainless.” For comparison’s sake, I’ve selected the search result for the product as it appears on Zappos.com.
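Put together, a head element carrying this kind of descriptive metadata might look like the following sketch (the title, description, and keywords text are invented for illustration; the actual Zappos markup differs):

```html
<!DOCTYPE html>
<html>
  <head>
    <!-- Content type and character set encoding -->
    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <!-- Title shown in the browser tab and often in search result headings -->
    <title>Krups EA9000 Barista Automatic Espresso Machine</title>
    <!-- Description text that Yahoo and Bing may display as a result snippet -->
    <meta name="description" content="Krups EA9000 Barista automatic espresso
      machine, black stainless. Grinds, brews, and self-cleans at the touch
      of a button.">
    <meta name="keywords" content="espresso, coffee, Krups, EA9000">
  </head>
  <body>
    <h1>Krups EA9000 Barista</h1>
    <p>Content visible to users appears here.</p>
  </body>
</html>
```

Nothing in the head is rendered on the page itself, yet it can determine much of what a searcher sees in a result list.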
If I look at the HTML source code for the product webpage, I can see the full text of the meta description element, as it appears within the HTML head (Figure 2).
Bing and Yahoo opted to display the same specific portion of the text included in the meta description element. Why did both search engines display this particular section of the description text? Only by looking at proprietary algorithms could we attempt to find a reason.
Google also retrieves the Zappos page for this search; however, Google displays what it calls a “rich snippet” (Figure 3). Google’s snippet includes some of the text from the meta description element, but it includes other text as well. You’ll notice that the terms I searched for appear in bold text. Google pulled text not only from the <meta> tag in the <head> of the HTML document but also from the <body> of the webpage where my search terms appear.
Google also displayed the list and sale price of the product, probably because someone at Google decided that such information is useful to searchers. How did Google know that those numbers were prices and not the number 9000 from the EA9000 model number or the number 23 from the product weight information? Because the prices on the Zappos webpage were encoded in microdata.
Microdata and Schema.org
In the web context, microdata is an HTML specification for embedding semantically meaningful markup, chiefly within the HTML body. Microdata isn’t the same thing as metadata: microdata isn’t restricted to conveying information about the creation of the text. Microdata becomes part of the web document itself and serves somewhat like an annotation within the HTML body text. Microdata tells machines something more about the meaning of the text. On the Zappos product page, we see a nice display of the list price and sale price in the upper right-hand corner of the webpage (Figure 4). Search engine web crawlers mining the same text in the HTML file see that the text “$2,499.99” is tagged with the Schema.org price property (Figure 5). Ah, so now we’ve come to it: how are microdata and Schema.org related? Basically, microdata is an HTML specification that allows for the expression of other vocabularies, such as Schema.org, within a webpage. Just as XML provides syntax for expressing TEI or EAD or MODS, microdata provides syntax for expressing vocabularies like Schema.org. (RDFa is an alternative syntax that serves a similar purpose.)
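A product block tagged this way might look like the following simplified sketch (the structure is illustrative; the actual Zappos markup and values may differ):

```html
<!-- A product block typed with the Schema.org Product vocabulary -->
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Krups EA9000 Barista Automatic Espresso Machine</span>
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    <!-- A crawler now knows this string is a price, not just a number -->
    <span itemprop="price">$2,499.99</span>
    <meta itemprop="priceCurrency" content="USD">
  </div>
</div>
```

The itemscope and itemtype attributes declare what kind of thing is being described, and each itemprop attribute labels a piece of text with its meaning, which is how Google can distinguish the price from the model number.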
I won’t go into the history of Schema.org (I touched upon it in past posts and this post has gotten quite a bit longer than I intended!); however, it’s worth noting that the espresso machine example I’ve given above is limited, as Zappos hasn’t deployed Schema.org as extensively in its website as other companies have.
Try searching Google for movie times at a specific theater in Bloomington. At the very top of the search result list you should find a structured display of movies, runtimes, MPAA ratings, and showtimes, with links to trailers. How does this work? With Schema.org.
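Under the hood, a theater’s showtimes page might carry markup along these lines (a simplified sketch using Schema.org’s ScreeningEvent type; the movie, theater, and time are invented):

```html
<!-- A single screening, typed as a Schema.org ScreeningEvent -->
<div itemscope itemtype="http://schema.org/ScreeningEvent">
  <div itemprop="workPresented" itemscope itemtype="http://schema.org/Movie">
    <span itemprop="name">Example Movie</span>
    <meta itemprop="contentRating" content="PG-13">
  </div>
  <meta itemprop="startDate" content="2014-03-25T19:30">
  <div itemprop="location" itemscope itemtype="http://schema.org/MovieTheater">
    <span itemprop="name">Example Theater, Bloomington</span>
  </div>
</div>
```

Because each screening is typed and each property labeled, a search engine can assemble those scattered page fragments into the tidy showtimes grid you see at the top of the results.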
Welcome to the semantic web.
In the next of this two-part series, Rachel Wheeler will look at how libraries and library discovery layers are using Schema.org to expose resources.
 The statistical community also uses the term “microdata” to describe individual response data in surveys and censuses–completely different beast!
 I would have spent hours trying to figure out the distinction between microdata, microformats, schema.org, etc. if not for an incredibly thorough description by Aaron Bradley, former cataloger turned web consultant.
In our next meeting, we’ll discuss how businesses and libraries are leveraging microdata and Schema.org to drive traffic to their websites. All are welcome to participate!
DATE: Tuesday, March 25
TIME: 9:30-10:30 am
PLACE: Wells Library Room 043
TOPIC: The Business of Metadata
MODERATOR: Jennifer Liss, with special guest Rachel Wheeler!
RESOURCES YOU MIGHT CONSULT
Each of the resources below may be watched or read in under 30 minutes. In addition, Rachel and Jennifer will be blogging background information on this topic during the week of March 10.
Ring, S. (2013, March). Schema.org pilot project. Minitex Digitization, Cataloging & Metadata Mailing, 1-5. [download]
Scott, D. (2013, April 12). Microdata: Making metadata matter for machines. [Recorded 30-minute presentation] 2013 Evergreen International Conference. Retrieved from https://archive.org/details/Microdata
OCLC recently released a 15-minute video on YouTube that introduces the concepts and technologies behind linked data and how it can benefit libraries and their users. It’s a great video and uses the running example of “The Raven” (as a thing and as a poem) to exemplify how linked data can disambiguate concepts and help improve search results.
Have you noticed recently that when you search for a well-known person on Google, a box appears on the right side of the page highlighting that person? Give it a shot–search for “Jon Stewart” in Google, and on the right-hand side you’ll see a box that says, “See results about Jon Stewart” and includes his photo. Click the Jon Stewart link and you will see bio information scraped from Wikipedia, a photo from askactor.com, and a listing of some of his books.
How is this done? With linked data, of course. In 2011 the big three search engines (Google, Bing, and Yahoo!) launched Schema.org to create, support, and maintain common schemas that are recognized by the major search providers.
OCLC has worked with the Schema.org folks to make sure that library metadata is added to the Schema.org ontology. In addition, OCLC has added linked data to WorldCat records (see the recent MDG post). The OCLC linked data for libraries YouTube video elaborates on why linked data is relevant to libraries.
It’s a great YouTube video and I highly recommend it [thumbs up]!