Blacklight and Stemming

With the coming transition of the IUCAT public interface from the existing SirsiDynix OPAC to the new Blacklight discovery layer, there are a lot of exciting new features coming our way, including faceted searching, better results, and an easier-to-use interface. Along with the change in the interface, we will see changes in how search works. One of these changes relates to truncation and word stemming.

Truncation is the ability to expand a keyword search to retrieve multiple forms of a word by using a specified symbol to replace a character or set of characters. The truncation symbol can typically be placed anywhere within a word: at the end, at the beginning, or in the middle. For example, in the current IUCAT, a search for comput$ would find words such as computer, computers, computing, and computation. Truncation is a handy tool for retrieving a wide range of results, and it is a common search feature in most traditional OPACs and in many vendor databases. Blacklight, like other discovery layer interfaces such as VuFind, relies on a technique called word stemming rather than on truncation.
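Under the hood, truncation is simple pattern matching: the truncation symbol stands for any run of characters. A minimal sketch in Python (the symbol and word list follow the comput$ example above):

```python
import re

def truncation_match(pattern, words, symbol="$"):
    """Match words against a truncation pattern, where the truncation
    symbol stands for zero or more characters (as in comput$)."""
    regex = re.compile("^" + re.escape(pattern).replace(re.escape(symbol), ".*") + "$")
    return [w for w in words if regex.match(w)]

words = ["computer", "computers", "computing", "computation", "compost"]
print(truncation_match("comput$", words))
# -> ['computer', 'computers', 'computing', 'computation']
```

Because the symbol can sit anywhere in the pattern, the same function also handles left-hand truncation (e.g. `$ology`).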

Word stemming is a process in which the catalog searches for the “root” of a word and returns results for all words sharing that stem. Rather than relying on the searcher to insert a specific character to expand the search, as in truncation, word stemming automatically reduces each search term to its “root” and then returns results for all words associated with that stem. This is similar to how Google searches, so frequent Google users won’t notice much of a difference.

Because this is an automatic process, it is often difficult or impossible to predict the “stem” for any particular word. For example, knees has a stem of knee, but kneel has a stem of kneel, not knee. Similarly, “searching,” “search,” and “searches” all stem to “search,” but “searcher” does not; it stems to “searcher.”
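To make the idea concrete, here is a deliberately simplified suffix-stripping stemmer in Python. It is only a sketch: production search engines such as Solr typically apply the full Porter algorithm, which has many more rules and conditions, so its output will differ on harder cases like the kneel/searcher examples above.

```python
def toy_stem(word):
    """A toy stemmer: strip a few common suffixes.
    Real stemmers (e.g. the Porter algorithm) apply many more
    conditional rules, so results differ for trickier words."""
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

for w in ["searching", "knees", "kneel", "computers"]:
    print(w, "->", toy_stem(w))
# searching -> search, knees -> knee, kneel -> kneel, computers -> computer
```

Even this toy version shows why stemming is hard to predict: whether a suffix is stripped depends on rules the searcher never sees.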

For searchers who are accustomed to truncation, there may be similar terms that would have been retrieved using truncation, but which will not be retrieved using word stemming because they do not share the same stem.

For many of our users, this change will not be apparent, but we hope this is a helpful explanation of this change for expert searchers accustomed to relying on truncation.

More about Blacklight, the new interface for IUCAT

Since the last post about Blacklight, we’ve been asked a lot of questions about Blacklight and its development in IU Libraries. Those of you who missed reading it might want to check here. This post will focus on some of the questions that we have been asked most frequently.

What sorts of changes will there be in the new IUCAT?

People involved in the development phase are working hard to ensure that the new discovery interface not only retains the functionality currently available in IUCAT, but also delivers improved functionality.

Most next-generation catalogs have a simplified search box, which is quick and easy to use for users with a simple question. Blacklight provides a simple basic search box along with a facet structure for limiting searches.

The generic Blacklight interface is customized according to each library’s individual needs and specific environment. Here are some examples that adopt Blacklight’s basic format:

  • University of Wisconsin – Madison

  • Stanford University

  • University of Virginia

  • Johns Hopkins University

The new IUCAT will have a single search box for the basic search, and faceted searching on the left side will allow users to constrain searches by controlled vocabulary terms. There will also be an advanced search screen for more focused searches. As development is still in progress, your comments and ideas are highly appreciated.
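Faceted searching works by counting the distinct values of a controlled field across the current result set and offering each value as a one-click limit. A minimal illustration in Python (the records, fields, and values here are invented for the example):

```python
from collections import Counter

# Hypothetical search results; each record carries controlled-vocabulary fields.
results = [
    {"title": "Intro to Ruby",     "format": "Book",  "language": "English"},
    {"title": "Rails Screencast",  "format": "Video", "language": "English"},
    {"title": "Solr in Action",    "format": "Book",  "language": "English"},
    {"title": "Le catalogue",      "format": "Book",  "language": "French"},
]

def facet_counts(records, field):
    """Count how many records fall under each value of a facet field."""
    return Counter(r[field] for r in records)

print(facet_counts(results, "format"))    # e.g. Counter({'Book': 3, 'Video': 1})
```

In a real installation Solr computes these counts server-side at query time; the principle is the same.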

How do Apache Solr and Ruby on Rails work to index library resources?

Blacklight’s two fundamental technologies are the Solr search server and the Ruby on Rails web application framework. Developed by the Apache Lucene project, Apache Solr is used for indexing and searching records, while Ruby on Rails is used to create the front end. Here is a nice graphic representation of the Blacklight system.

Sadler, B. (2009). “Blacklight Infrastructure.” In Project Blacklight: a next generation library catalog at a first generation university. Library Hi Tech, 27(1), 57-67.

SolrMARC, a Java-based program, reads a library’s MARC records and indexes them into Solr, while custom Ruby scripts are used to map the metadata of non-MARC items to Solr. The Ruby on Rails application looks to the Solr server for its data, passing along search queries and formatting search results.
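As a rough sketch of the query side of this arrangement: the front end sends an HTTP request to Solr’s select handler, carrying the user’s query and any facet parameters. The host, core name, and field names below are invented for illustration; a real Blacklight install configures these in its Solr connection settings.

```python
from urllib.parse import urlencode

def build_solr_query(base_url, user_query, facet_fields=()):
    """Build a URL for Solr's /select handler (illustrative parameters only)."""
    params = [("q", user_query), ("wt", "json")]
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", f) for f in facet_fields)
    return base_url + "/select?" + urlencode(params)

url = build_solr_query("http://localhost:8983/solr/catalog",
                       "search engines", ["format", "language"])
print(url)
```

Solr answers such a request with the matching documents plus the facet counts, which the Rails layer then renders as the results page and its limit links.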

If you are interested in what others are doing with Blacklight, you can ask a question on the Blacklight mailing list and browse its codebase on GitHub.

A new interface for IUCAT: Blacklight

As you may have heard, work has begun on a new interface for IUCAT. The IU Libraries OLE Discovery Layer Implementation Task Force (DLITF) will be overseeing the implementation of a new discovery layer, powered by Blacklight, to overlay our current SirsiDynix system. Development work is going on during this fall semester and a public Beta will be launched in spring 2012. This is a good time to share some background information around the new discovery interface, Blacklight.

What is Blacklight?

Blacklight is a free and open source OPAC (Online Public Access Catalog) solution developed at the University of Virginia (UVA) Library; check the project site for detailed information. While some OSS (Open Source Software) systems, such as Evergreen and Koha, were developed to replace a library’s entire ILS (Integrated Library System), Blacklight has been designed to work with a library’s current ILS to assist in reengineering the library’s searching tools.  It uses Apache Solr for indexing and searching records and Ruby on Rails for its front end.

What are some of the features?

Blacklight features faceted browsing, relevance-based searching, bookmarkable items, permanent URLs for every item, and user tagging of items. Because it can search both catalog records and digital repository objects, digitized images and repositories become more discoverable for users. Unlike MARC records, which use similar templates for different types of objects, Ruby on Rails allows librarians to define behaviors specific to certain kinds of objects.

Where can we see examples?

The Task Force will begin soliciting feedback on the local beta implementation in the near future, but in the meantime, if you would like to see more, there are other mature installations of Blacklight you may review. The University of Virginia, Stanford University, Johns Hopkins University, and WGBH are the principal contributors to the code base, and there are dozens of Blacklight sites worldwide.

If you have questions about the task force or the project, feel free to contact us!

Additional reading:

Sadler, B. (2009). Project Blacklight: a next generation library catalog at a first generation university. Library Hi Tech, 27(1), 57-67. Access the full text.

Sadler, B., Gilbert, J., & Mitchell, M. (2009). Library catalog mashup: using Blacklight to expose collections. In Engard, N. C. (Ed.), Library mashups: exploring new ways to deliver library data. Medford, N.J.: Information Today, Inc. Access the record in WorldCat.org.

Brave new catalogs

Last week our department attended a NISO webinar titled The Future of Integrated Library Systems (Part 2): User Interaction.

In it, three next-generation library systems were discussed. As we are looking at Blacklight & VuFind for our next-generation catalog discovery layer here at IU, I’ll focus not so much on each system’s technology, but more on the other information covered:

  • Jennifer Bowen from the University of Rochester presented on the eXtensible Catalog. Many of the design & functionality decisions were driven by the ongoing ethnographic research being conducted on that campus (see Studying Students: The Undergraduate Research Project at the University of Rochester [PDF]).

    They approached the project with the perspective of thinking of the catalog in terms of “what do our users need to do.” They also have a new book, Scholarly practice, participatory design and the extensible catalog, just released by ACRL. Two examples of what they learned:

    • Users want to be able to choose between versions/formats
      Their users definitely had preferences when searching (limit to online only – avoid microforms – etc), and preferred when the catalog results showed search terms in context. They started with MARC and did a lot of transformation of the data, working with FRBR (works, expressions, manifestations, etc.).
    • Researchers value scholarly networks
      One way they accommodate this in their community is by defining local metadata: for example, noting the advisor on the record for a thesis.

  • SOPAC 2, a catalog primarily aimed at public libraries, was presented by John Blyberg of the Darien Public Library. Many of the items from this part of the webinar would be of more interest to public librarians and were perhaps not quite as transferable to our situation, but I did think their robust and creative use of tagging was quite intriguing. They used tags to create “virtual displays,” or easy ways to collect items around a concept (“Staff favorites”) or even a theme (“Movies Better than the Book”). As you can see from the previous example, they were also quite open to subjective metadata, and found that it added a lot of value for users.
  • and WorldCat Local, presented by Anya N. Arnold of the Orbis Cascade Alliance (Pacific Northwest) and Allie Flanary of Portland (OR) Community College. As we are generally more familiar with this system, there were fewer lightning bolts for me in this portion, but it was easy to appreciate their emphasis on user testing and on collaborating amongst the user community to identify and implement improvements for a better user experience. One quote in particular caught my ear (I’m paraphrasing): “Saying ‘Because Google & Amazon can do it’ is a reasonable expectation for our users.”

You can see info about the webinar here: http://www.niso.org/news/events/2011/nisowebinars/userinteraction/

If you’re interested in viewing the recording, drop us a note in the comments or contact me directly!