Friday, March 25, 2011

One search box to rule them all? Web Scale Discovery tools?

Last year, my institution launched both LibAnswers (an FAQ system) and LibGuides (a content management system that is wildly popular with libraries). One interesting thing I noticed is that a number of queries (about 5-10%) treated the LibAnswers or LibGuides search like a library catalogue search box. People were searching for keywords on very narrow subjects. Not only that, but I also saw searches for ISBNs/ISSNs and even librarian names in both LibAnswers and LibGuides.

This is of course a well-known effect: users do not care to understand what a search box covers. It is most often seen when users enter article titles into a library catalogue that has no article-level data, or search for article titles when the search box covers only database names.

It's such a strong effect that I remember reading a study in which researchers studying subject guides put a feedback box on their guides and, instead of receiving feedback, found that users were treating it like a search box.

Clearly many users will blindly enter search queries whenever they see a search box. They don't care what the search covers; they just want a Google-style search box that searches *everything*. But what is this *everything*?

Of course, right now many academic libraries are scrambling to implement one-stop-shop search tools, often dubbed "Web Scale Discovery tools". These include Summon, Ebsco Discovery Service, Primo Central and WorldCat Local.

These tools unify the silos that traditionally separate library-owned or licensed content, including:

  • Library catalogue
  • Content from databases bought/rented from vendors
  • Institutional repositories and other local archives

As libraries have lost ground to Google and Google Scholar due to the failure of the federated search approach, these tools seem custom-made to solve this problem.

However, such systems in general don't cover the following classes of content that are also provided by the library:
  • FAQs (LibAnswers, KBPublisher, RightNow etc)
  • Subject guides (LibGuides, wikis, SubjectPlus, Library à la Carte etc)
  • Help pages on various services
  • Librarian profiles
  • Events and programmes
These are not articles you use to do research per se, but people do search for such things.

Typically Google Search Appliance (GSA) or similar is what people use to search across these resources.

But how about searching across both classes of content? 

There are already some attempts to pull non-traditional content into library catalogues.

I just saw a request for a possible enhancement that will allow indexing library website pages into a Next-Gen catalogue. Innovative Interfaces also allows users to surface library events when searching the library catalogue.   

Realistically speaking, getting content from different publishers and aggregators into one index is a lot harder than getting in content that the library owns locally. So one way of searching across both types of content would be to rely heavily on Web Scale Discovery tools, which are designed to handle the harder problem, and add local content to them.

Adding LibGuides to Summon

For now, the most common case that I'm aware of is libraries adding LibGuides to Summon.

Among those doing so are the University of Sydney's implementation of Summon and Arizona State University Libraries' implementation.

For example, at ASU a search for a librarian's name in Summon brings up the LibGuides created by that librarian, with a nice profile to boot.

Unfortunately, in most cases I found it very difficult to surface the LibGuides as a hit, as most of the time they were buried under thousands of other results. General searches, even those that matched text in the LibGuides, usually failed to surface the guides because they returned too many other results.

The LibGuides are classified as a "Research Guide" under the content facet (it seems results from ERIC are research guides as well), so unless users limited by that facet they would probably miss them. One idea would be to look at the most popular searches and create guides catering to them, but I'm not sure even that would succeed in pulling them out.

What about searching from the "other end"?

On the opposite end, what about searching in systems like FAQs, LibGuides and website searches? Can you do the reverse and pull results from Summon etc. into the FAQ search or the website search results?

As mentioned above, this can be a good idea, as some people searching the FAQ system would sometimes be better served if it could also return articles, books and so on.

The good thing about Summon and its competitors is that they generally provide an API that you can use to plug in their results (for example, Summon's API).

One could imagine an FAQ search, say from LibAnswers, that would give priority to its own results but would add on results drawn from the Summon API. So while 95% of users would correctly use the library FAQ search for FAQ-like questions, the remaining 5% who treated it like a library catalogue and put in ISBNs would not be left high and dry.
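To make the idea concrete, here is a minimal sketch in Python. It is not based on the real LibAnswers or Summon APIs: `search_faq` and `search_discovery` are hypothetical stand-ins for whatever calls your systems actually expose, and the `source` label and `max_extra` cap are my own assumptions.

```python
from typing import Callable, Dict, List

Result = Dict[str, str]
SearchFn = Callable[[str], List[Result]]


def hybrid_search(query: str,
                  search_faq: SearchFn,
                  search_discovery: SearchFn,
                  max_extra: int = 5) -> List[Result]:
    """Return FAQ hits first, followed by a capped number of discovery hits.

    Both search functions are hypothetical callables supplied by the caller;
    each takes a query string and returns a list of result dicts.
    """
    # FAQ results always come first and are labelled with their origin.
    faq_hits = [dict(hit, source="faq") for hit in search_faq(query)]

    # The discovery service is a supplement only: if it is down or slow,
    # the FAQ search should still work, so any failure is swallowed here.
    try:
        extra = search_discovery(query)[:max_extra]
    except Exception:
        extra = []

    discovery_hits = [dict(hit, source="discovery") for hit in extra]
    return faq_hits + discovery_hits
```

Capping the number of discovery hits is one simple way to stop the much larger index from swamping a handful of FAQ answers.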

Examples of hybrid systems that pull in content from these two classes of products

As I have blogged before, MLibrary is probably one of the most advanced implementations of this.

"When you do a search on the University of Michigan Library's web site, you get not only results from the catalog, web site, online journal and database collections, and more, you also get a librarian who is a subject specialist related to your search term. While the matching is not perfect, it provides a human face on search results. So, for example, if you search for "Kant," in addition to books and databases, you also get the subject specialist librarians for humanities and philosophy. "Putting a Librarian's Face on Search"

You can do the search here. They don't seem to be pulling in article titles via their implementation of Summon, though they have pretty much everything else: database names, catalogue results, website results, institutional repository results, guides etc.

Display of results

MLibrary's implementation does not privilege any particular class of content and presents the different classes of results separately. But should it?

The other approach is just to merge everything, but as we have seen with LibGuides in Summon, the more numerous content can drown out the rest.

Take the example of a Springshare LibAnswers box again. Say I populate it mostly with FAQs about loan procedures. 90% of users search correctly for such information, while 10% treat it like a library catalogue box. And say I supplement the FAQ system with additional results drawn using the Summon API.

If the user enters an ISBN, it's pretty straightforward: the FAQ system would probably come up empty and Summon would pull in the right results.
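Spotting such queries up front is easy enough, since ISBNs and ISSNs carry check digits. The sketch below uses the standard checksum rules (ISBN-10 and ISSN use weighted sums modulo 11, ISBN-13 uses alternating weights of 1 and 3 modulo 10). The function names are my own, and how a search box would act on the result is left open.

```python
import re


def _clean(text: str) -> str:
    """Drop spaces and hyphens and upper-case a possible 'x' check digit."""
    return re.sub(r"[\s-]", "", text).upper()


def _value(ch: str) -> int:
    """'X' stands for 10 in ISBN-10 and ISSN check digits."""
    return 10 if ch == "X" else int(ch)


def is_isbn10(text: str) -> bool:
    s = _clean(text)
    if not re.fullmatch(r"\d{9}[\dX]", s):
        return False
    # Weights run 10, 9, ..., 1; the weighted sum must be divisible by 11.
    return sum((10 - i) * _value(ch) for i, ch in enumerate(s)) % 11 == 0


def is_isbn13(text: str) -> bool:
    s = _clean(text)
    if not re.fullmatch(r"\d{13}", s):
        return False
    # Weights alternate 1, 3, 1, 3, ...; the sum must be divisible by 10.
    return sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(s)) % 10 == 0


def is_issn(text: str) -> bool:
    s = _clean(text)
    if not re.fullmatch(r"\d{7}[\dX]", s):
        return False
    # Weights run 8, 7, ..., 1; the weighted sum must be divisible by 11.
    return sum((8 - i) * _value(ch) for i, ch in enumerate(s)) % 11 == 0


def looks_like_identifier(query: str) -> bool:
    """True if the whole query is a valid ISBN-10, ISBN-13 or ISSN."""
    return is_isbn10(query) or is_isbn13(query) or is_issn(query)
```

A query that passes this check could be sent straight to the catalogue or discovery tool rather than the FAQ index.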

But what happens if someone searches for "loan entitlement"? As the amount of content from Summon dwarfs that of the FAQ system, you would probably get a lot more library science articles than the FAQ on loan rules, which is usually what is wanted.

There are lots of ways around this, from prioritizing results from the FAQ and giving a penalty to anything else, to showing only the FAQ results followed by a link that says "do this search in Summon".
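The penalty option might look something like the sketch below. It assumes each system hands back a relevance score between 0 and 1 and that those scores are roughly comparable, which real systems do not guarantee. The `Hit` type, the `penalty` value and the sample titles are all made up for illustration.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    title: str
    score: float  # relevance in [0, 1]; assumed comparable across systems
    source: str   # "faq" or "discovery"


def merge_with_penalty(faq_hits: List[Hit],
                       discovery_hits: List[Hit],
                       penalty: float = 0.5) -> List[Hit]:
    """Merge both result lists, scaling down non-FAQ scores by `penalty`."""
    penalised = [hit._replace(score=hit.score * penalty) for hit in discovery_hits]
    # Sort the combined list by the adjusted score, best first.
    return sorted(list(faq_hits) + penalised, key=lambda hit: hit.score, reverse=True)


faq = [Hit("Loan entitlement for undergraduates", 0.6, "faq")]
discovery = [
    Hit("Loan entitlement in public libraries: a survey", 0.9, "discovery"),
    Hit("Interlibrary loan trends", 0.4, "discovery"),
]
merged = merge_with_penalty(faq, discovery)
```

With a penalty of 0.5 the FAQ entry ranks first even though one article had a higher raw score. Setting the penalty to 1.0 turns the adjustment off and the article wins again.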

Some cases are not so clear-cut. Say you have an FAQ or guide on how to search for business/economics statistics, and the user searches for "gdp of Singapore". Should the search present the FAQ/guide above other results (say, reports where the search terms are in the article/report title)?

Still thinking about it, though it occurs to me that none of this is really new. The same debate took place in the days of federated search systems.

Recently, I pulled everything I have read on Web Scale Discovery tools (e.g. Summon, Ebsco Discovery Service or EDS, WorldCat Local and Primo Central) together into a bibliography and posted it on the following Google Site on discovery tools, started by Andy Ekins (Christ Church University, UK) and Lukas Koster (Library of the University of Amsterdam, NL).

It has about 50 links on various discovery tool topics, from evaluation reports by university evaluation teams and taskforces to debates between competitors and presentations by vendors, and it took me about 4-6 hours after work to create. Please point out any other relevant resources I have missed, so I can continue to improve it!

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.