Like me, you may be eyeing your summer reading pile and wondering where the time has gone.
Just this morning, my GLAM pile grew by one: Cataloguing Culture: Legacies of Colonialism in Museum Documentation by Hannah Turner. From the University of British Columbia Press website:
How does material culture become data? Why does this matter, and for whom? As the cultures of Indigenous peoples in North America were mined for scientific knowledge, years of organizing, classifying, and cataloguing hardened into accepted categories, naming conventions, and tribal affiliations – much of it wrong.
Cataloguing Culture examines how colonialism operates in museum bureaucracies. Using the Smithsonian’s National Museum of Natural History as her reference, Hannah Turner organizes her study by the technologies framing museum work over 200 years: field records, the ledger, the card catalogue, the punch card, and eventually the database. She examines how categories were applied to ethnographic material culture and became routine throughout federal collecting institutions.
As Indigenous communities encounter the documentary traces of imperialism while attempting to reclaim what is theirs, this timely work shines a light on access to and return of cultural heritage.
While I haven’t yet read it, I did enjoy listening to this episode of the History Slam podcast in which Sean Graham interviews Turner about the book.
What have you been reading (or meaning to read)? Let us know in the comments. Take care, all.
GLAM – acronym for “galleries, libraries, archives, and museums.”
This is a guest blog post by Karen Koswara, School of Informatics, Computing, and Engineering, Indiana University.
I am Karen Koswara, and I am the first Undergraduate Research Opportunities in Computing (UROC) student to be involved in IU Libraries’ Inclusivity and Bias in Metadata research. As a new Informatics major, I want to explore as many areas as possible until I find the field that interests me most, so I signed up for many projects through UROC. The School of Informatics, Computing, and Engineering (SICE) had a large number of projects listed on UROC’s sign-up page, and we were encouraged to choose as many projects as interested us, since there was no guarantee of getting a first preference. Among the many projects I chose, the word “metadata” in Inclusivity and Bias in Library Cataloging and Metadata was very appealing to me; I thought it would be amazing to start working with metadata. A few weeks later, I was signed up to work with Julie Hardesty, and I had never been more excited.

At our first meeting, Julie clarified what I would be doing for the research. She explained that the Metadata Discussion Group had never had students involved before and that she was interested in exploring inclusivity and bias in metadata from a student’s point of view. The research would mainly involve reading articles, discussing my thoughts and points of view, and looking into ideas that would help the Metadata Discussion Group move forward. I officially signed up for the research after that meeting.
After my second meeting with Julie, I realized that many issues come from collections in other languages that the Library of Congress Subject Headings (LCSH) classify too broadly, making those collections hard to reach. This made me want to approach the issue by looking into the cataloging systems of other countries with different languages. My idea is that another country’s classification may be more specific, and that we could apply its classification terms to LCSH. I believe this is possible because languages often borrow words from one another. For example, many words in the Indonesian language are the same as in Dutch, as Indonesia was under Dutch rule for about three centuries. Indonesian also uses some English terms: the word “transition” is “transisi” in Indonesian, and they have the same meaning.
I first thought of Chinese libraries because of the complexity of Chinese characters. The most widely used cataloging system in China is CLC (Chinese Library Classification), also known as CCL (Classification for Chinese Library). Many other classifications are also used in China, such as LCPUC (Library Classification of the People’s University of China), LCCAS (Library Classification of the Chinese Academy of Sciences), and MSL (Library Classification for Medium and Small Libraries). Taiwan, Hong Kong, and Macau use a separate cataloging system known as the New Classification for Chinese Library. Harvard established its own cataloging system for Chinese collections, known as the Harvard-Yenching Classification System, which many organizations in the United States have adopted.
I also tried to think of other organizations that might have a library of some sort to keep track of their employees’ work. National Geographic came to mind, as I believed they would have some kind of cataloging system for their researchers’ and photographers’ work. Unfortunately, I was not able to find much, since there is not much information about this available on the internet. I did find out that they have a library of some kind to keep images by their photographers.
As a first-time, inexperienced researcher, I really hope this point of view helps lead to a new approach and a step forward. I have also learned that in doing this research, one has to look deep into oneself and consider how one interprets a word, and what it means to them, in order to avoid bias. I believe it would also help future researchers who are completely new to this topic to read a recent graduate student article, “The Language of Cataloguing: Deconstructing and Decolonizing Systems of Organization in Libraries” by Crystal Vaughan of Dalhousie University in Nova Scotia, Canada; it helped me very much in understanding the situation.
At the Metadata Discussion Group meeting on April 5, 2016, we will talk about some of the challenges of mapping a descriptive metadata structure standard (in this case, MODS) from an XML-based expression to one that is RDF-based. This post will explain what MODS is and what it’s used for.
MODS: the ‘Who, What, and When’
The Metadata Object Description Schema (MODS) was published in 2002 by the Library of Congress’ Network Development and MARC Standards Office. The standard is maintained by an editorial committee of library metadata practitioners from North America and Europe.
MODS is a “bibliographic element set” that may be used to describe information resources. MODS consists of 108 elements and subelements (there are 20 top-level or “parent” elements). At this point, I’ll urge you to go read the brief overview of MODS on the Library of Congress’ Standards website.
Go ahead. I’ll wait.
You read that bit about MODS being more or less based on MARC21, right? In the example below, I’ve described a sheet map using MODS elements and MARC tags.
| MODS element | Data (formulated according to AACR2, if that sort of thing matters to you) | MARC tag (and mapped MARC data value, when applicable) |
| --- | --- | --- |
| titleInfo / title | Campbell County, Wyoming | 245 |
| name / namePart | Campbell County Chamber of Commerce (Wyo.) | 110 (Campbell County Chamber of Commerce) |
| physicalDescription / extent | 1 map ; 33 x 15 cm | 300 |

Table 1. Data expressed in MODS elements and MARC tags.
There’s a full mapping of MARC21 tags to MODS elements available, if you’re really curious. This example demonstrates that, although there are a few divergences, MODS elements map almost directly to MARC21 tags.
MODS encodes descriptive metadata, or information about resources (title, creator, etc.). MODS and MARC21 are examples of data structure standards: elements or tags are meant to serve as containers for data. Structure standards do not give any directions about how to formulate data; those directions come from data content standards (AACR2, RDA, DACS, etc.). The main purpose of structure standards (Dublin Core, EAD, and TEI are other examples of metadata structure standards) is to encode data so that it can be manipulated by machines. Elements separate discrete pieces of information for use in search and browse indices, and they often convey the meaning of the data. The MODS:title element only contains the word or words that are used to refer to a resource; MODS:title will never serve as a container for the resource’s size.
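To make this concrete, here is a minimal sketch of a MODS record for the sheet map in Table 1. The element names are standard MODS; the record is abbreviated for illustration, not a complete production record:

```xml
<mods xmlns="http://www.loc.gov/mods/v3">
  <!-- the words used to refer to the resource (maps from/to MARC 245) -->
  <titleInfo>
    <title>Campbell County, Wyoming</title>
  </titleInfo>
  <!-- the corporate body responsible for the map (maps from/to MARC 110) -->
  <name type="corporate">
    <namePart>Campbell County Chamber of Commerce (Wyo.)</namePart>
  </name>
  <!-- the resource's size lives here, never in the title element (maps from/to MARC 300) -->
  <physicalDescription>
    <extent>1 map ; 33 x 15 cm</extent>
  </physicalDescription>
</mods>
```

Notice how each element is a container for exactly one kind of data: the title element holds title words and nothing else.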
MODS: the ‘Where, Why, and How’
MODS was built “for library applications.” MODS has been chiefly implemented to support discovery of digital library collections. At IUB Libraries, MODS is the metadata standard of choice for the digital objects that are ingested into our digital collections repository, Fedora.
MODS elements are expressed in XML. XML is a metalanguage, which means that XML is an alphabet, of sorts, for expressing other languages. The figure below illustrates the XML syntax (the “alphabet”) by which XML expresses another language. A fake language with a bogus element named “greeting” is encoded in Figure 1.
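As a sketch of what that looks like, the made-up “greeting” element from the figure could be written in XML syntax like this (the element name comes from the figure’s description; the content is invented for illustration):

```xml
<!-- a bogus element in a fake language, expressed using XML syntax -->
<greeting>Hello, world!</greeting>
```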
EAD and TEI are also expressed using XML, and HTML (the language responsible for displaying this webpage to you right now) uses a closely related markup syntax.
From the beginning, MODS was designed to be expressed as an XML schema. Schemata are the sets of rules for how languages work: which elements are valid and what their semantic meanings are, which elements nest within others, whether or not an element can be modified by attributes (e.g., the MODS:titleInfo element might have an attribute called “type”), and whether there is a controlled list of values for a given attribute (e.g., the MODS:titleInfo “type” attribute is limited to the values “abbreviated,” “translated,” “alternative,” and “uniform”).
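As an illustration of those rules in action, a titleInfo element carrying one of the controlled “type” values might look like this (the title itself is invented for the example):

```xml
<!-- an alternative title, flagged with the controlled "type" attribute -->
<titleInfo type="alternative">
  <title>Campbell County Chamber of Commerce Map</title>
</titleInfo>
```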
MODS records are created in a number of ways. You could open up an XML editor and start creating a MODS/XML record. If you want to really get to know the MODS standard, that wouldn’t be a bad idea. However, if you wish to create metadata for half a million photographs, editing raw XML won’t be terribly efficient. At IU, we have a few different methods for creating MODS records for digital objects. My favorite is the Image Collections Online cataloging tool. Use of the tool is restricted, but I’ve included a screenshot below.
Once a collection manager decides which metadata elements are desired and has consulted with the metadata specialist for digital collections (our own Julie Hardesty), those elements are displayed in a web form. Data may then be entered without needing to know XML or MODS. In the screenshot, you’ll see a box in the lower right-hand corner labeled “Transform metadata to…” Clicking the link that says “mods” allows me to download the data I entered into the web form as MODS/XML. You may view the full record for this photograph below.
That’s the five-cent tour of MODS, as it’s expressed in XML. Questions? Leave a comment below!
At the upcoming March 25 meeting, the group will explore what it means to do business on the web scale. This post is the second in a series of two blog posts on the topic of making metadata scalable for the web. You can read the first post here.
There are numerous announcements peppering the web that library systems are now incorporating Schema.org to enhance search engine optimization (SEO). VuFind, an open source discovery layer developed and maintained by Villanova University’s Falvey Memorial Library, recently released VuFind 2.2 with Schema.org microdata integration for its OPAC views.
In October, Koha 3.14.0 was released with support for Schema.org microdata in its open source OPAC. Evergreen, another open source ILS, is now doing the same. Way back in 2012, OCLC added Schema.org mark-up to its WorldCat bibliographic records.
Exciting times, right? So how exactly is Schema.org enhancing the discoverability of a library’s collection via a web search? I was able to locate three libraries from a list of websites using Schema.org.
Searching “Last climb : the legendary Everest expeditions of George Mallory” in Google turns up no hits for GWU Libraries. Nothing. Perhaps I am not understanding the functionality of the FindIt API and how it differs from a traditional OPAC, but I thought something would appear, especially since GWU Libraries took the time to use the following Schema.org itemprop tags:
The Goodreads result shown below was the second item generated by my Google search. Goodreads does use Schema.org, and as you can see, the search generated more enriched data (e.g., ratings, stars, votes, summary, breadcrumb links). Unfortunately, I didn’t see any libraries with unique information (holdings) displayed in my Google search, including WorldCat. Ditto for my Bing and Yahoo! searches.
Right now Schema.org seems to be adding value to search results for Google Scholar/Books, Amazon, and Goodreads. But wait: Amazon and Google Scholar/Books are not using Schema.org. [scratch head]
OCLC’s WorldCat has tons of rich bibliographic data, relationship data, user-contributed reviews, and holdings data, and OCLC is using Schema.org. Why the heck aren’t its results generating holdings in their search displays? [still scratching head]
I applaud folks like those at GWU Libraries who have jumped in and implemented Schema.org. Why not give it a shot and search for your favorite or most dreaded work? Any luck seeing value-added data in your search results?
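If you’d like to peek under the hood while you search, here is a minimal, hypothetical sketch of the kind of Schema.org microdata a library OPAC might embed for a book. The itemprop names are real Schema.org properties, but the markup and the author placeholder are invented, not GWU’s or Goodreads’ actual code:

```html
<div itemscope itemtype="http://schema.org/Book">
  <!-- the title becomes machine-readable "name" data -->
  <span itemprop="name">Last Climb: The Legendary Everest Expeditions of George Mallory</span>
  by
  <!-- a nested Person item for the author (placeholder name) -->
  <span itemprop="author" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">A. Author</span>
  </span>
</div>
```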
At the upcoming March 25 meeting, the group will explore what it means to do business on the web scale. This post is one in a series of two blog posts on the topic of making metadata scalable for the web.
Perhaps you’ve heard of SEO, or search engine optimization. Once a term for strategies that make websites more discoverable to search engines, SEO has evolved into a business sector in its own right. SEO companies sprang up to help businesses “game” search engine algorithms in order to make those businesses appear at the top of search result lists. Years of increasing attention to SEO seem to have driven search engines like Google, Yahoo, and Bing to find new ways of leveraging web content to deliver relevant results to searchers. It’s not hard to imagine a future in which it isn’t enough to populate webpages with descriptive metadata about the content, authorship, and characteristics of that webpage. Doing business on the web is beginning to mean that organizations must mark up webpage content in a semantically meaningful and machine-processable way. This post introduces microdata and Schema.org as a way of telling machines the meaning of text.
Before elaborating on what microdata is, let’s back up and talk about how HTML has conveyed metadata in the past. HTML documents are made up of two areas, the head element (HTML tag: <head>) and the body element (HTML tag: <body>). The body element is where you put all of the content you want people to see. The text you’re reading right now resides in the <body> tag of this HTML page. HTML body elements include tags for demarcating headings, paragraphs, lists, etc. In other words, HTML marks up syntactic or structural information in a block of text. Without the structure provided by HTML tags, text would display in browsers as one long continuous clump without line breaks, white space, or font variation.
Though not typically displayed to users, the HTML head element provides information about the webpage such as the type of content and character set encoding (e.g., text/html, UTF-8), the website title (which is visible at the top of the browser window or tab), and sometimes the website author, description, and keywords. These website characteristics appear inside of the <meta> tag, short for metadata. Content within the <meta name=”description”> tag is most often used by search engines Yahoo and Bing for search result display. Yahoo and Bing retrieved the search result snippets shown in Figure 1 from the quoted search “krups ea9000 barista automatic espresso machine black stainless.” For comparison’s sake, I’ve selected the search result for the product as it appears on Zappos.com.
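As a sketch, the head mark-up being described looks something like this (the content values are invented for illustration, not Zappos’s actual source):

```html
<head>
  <!-- character set encoding -->
  <meta charset="UTF-8">
  <!-- the title shown at the top of the browser window or tab -->
  <title>KRUPS EA9000 Barista Automatic Espresso Machine</title>
  <!-- description text that Yahoo and Bing may reuse as a result snippet -->
  <meta name="description" content="Shop for the KRUPS EA9000 Barista automatic espresso machine in black stainless.">
</head>
```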
If I look at the HTML source code for the product webpage, I can see the full text of the meta description element, as it appears within the HTML head (Figure 2).
Bing and Yahoo chose the same specific portion of the text included in the meta description element. Why did both search engines opt to display this particular section of the description text? Only by looking at proprietary algorithms could we attempt to find a reason.
Google also retrieves the Zappos page for this search; however, Google displays what it calls a “rich snippet” (Figure 3). Google’s snippet includes some of the text from the meta description element, but it includes other text as well. You’ll notice that the terms I searched for appear in bold text. Google pulled text not only from the <meta> tag in the <head> of the HTML document; it also pulled content from the <body> of the webpage where my search terms appear.
Google also displayed the list and sale price of the product, probably because someone at Google decided that such information is useful to searchers. How did Google know that those numbers were prices and not the number 9000 from the EA9000 model number or the number 23 from the product weight information? Because the prices on the Zappos webpage were encoded in microdata.
Microdata and Schema.org
In the web context, microdata is an HTML specification for embedding semantically meaningful markup, chiefly within the HTML body. Microdata isn’t the same thing as metadata, as microdata isn’t restricted to conveying only information about the creation of the text. Microdata becomes part of the web document itself and serves somewhat like an annotation within the HTML body text. Microdata tells machines something more about the meaning of the text. On the Zappos product page, we see a nice display of the list price and sale price in the upper right-hand corner of the webpage (Figure 4). Search engine web crawlers mining the same text in the HTML file see that the text “$2,499.99” is tagged with the Schema.org price property (Figure 5). Ah, so now we’ve come to it: how are microdata and Schema.org related? Basically, microdata is an HTML specification that allows for the expression of other vocabularies, such as Schema.org, within a webpage. Just as XML provides syntax for expressing TEI or EAD or MODS, microdata provides syntax for expressing vocabularies like Schema.org.
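A hypothetical fragment of that kind of mark-up (not Zappos’s actual source) might look like:

```html
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">KRUPS EA9000 Barista Automatic Espresso Machine</span>
  <div itemprop="offers" itemscope itemtype="http://schema.org/Offer">
    <!-- the itemprop tells crawlers this number is a price, not a model number -->
    <span itemprop="price" content="2499.99">$2,499.99</span>
    <meta itemprop="priceCurrency" content="USD">
  </div>
</div>
```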
I won’t go into the history of Schema.org (I touched upon it in past posts, and this post has gotten quite a bit longer than I intended!); however, it’s worth noting that the espresso machine example I’ve given above is limited, as Zappos hasn’t deployed Schema.org as extensively on its website as other companies have.
Try searching Google for movie times for a specific theater in Bloomington. At the very top of the search result list you should find structured display of movies, runtimes, MPAA ratings, showtimes, with links to trailers. How does this work? With Schema.org.
Welcome to the semantic web.
In the next of this two-part series, Rachel Wheeler will look at how libraries and library discovery layers are using Schema.org to expose resources.
The statistical community also uses the term “microdata” to describe individual response data in surveys and censuses, which is a completely different beast!
I would have spent hours trying to figure out the distinction between microdata, microformats, Schema.org, etc., if not for an incredibly thorough description by Aaron Bradley, a former cataloger turned web consultant.
Do you like browsing for information via Pinterest? A group (which individuals may request to join) is managing a Resource Description & Access (RDA) social media board on Pinterest. There don’t seem to be many people contributing to the board yet, so most of the content originates from one or two websites/blogs.
I wonder whether the visual bias of Pinterest is well suited to revealing discussions about RDA, in which content is typically delivered textually. Discussion of RDA, and our knowledge of information organization in general, may be enriched by visualizations that help contextualize the use of this content standard within the broader metadata landscape.
Visual created with easel.ly, a free tool that is pretty fun to use and doesn’t demand a lot of info when creating a user account (just a username and password).