Sunday, March 23, 2014

Is known item searching *really* an issue for web scale discovery?

Is known item searching really a big issue in Web Scale discovery?

Since I began looking at web scale discovery in 2009-2010, I've read many librarians' comments on how known item search is harder in web scale discovery, and it's not just the rank-and-file librarians who say so.

In the latest Ithaka S+R US Library Survey 2013, in the section on discovery, for the question "To what extent do you think that your index-based discovery service has made your users' discovery experience better or worse in each of the following areas?", Library Directors felt that "Helping users find items they already know about" was Discovery's weakest area (Figure 35).

On a personal note, when we implemented Summon in my own institution, some of the most negative feedback we received was from graduate students and faculty, who lamented that many of the items they used to look for in the catalog were now hard to find.

Hence it was with great interest that I noticed the following tweet by Dave Pattern of Huddersfield University Library, a known library innovator and an early adopter of Summon.
I believe he was reacting to earlier tweets coming out of ER&L 2014, where a presenter claimed to have improved results by tweaking the ranking to improve, among other things, known item search. What followed was a wide-ranging discussion on Twitter, with many librarians and technologists working on discovery systems giving their two cents' worth.

Some claimed they heard of such complaints but could never get a credible non-contrived example and most examples surfaced were due to spelling errors. This group felt it could be more of a perception issue.

A few others felt it was a real problem at first, but that the issue has improved over the years.

Yet others (a smaller group) felt it was an important issue.

I myself am of the view that it is an issue that has gotten better with time, though issues remain. And yes, often the user complaining just remembers the one time out of a hundred where the known item search failed to bring up the item, but it is still frustrating for a full tenured professor to suddenly fail to find a simple item when previously they could.

It's somewhat difficult to generalise though because some of us commenting are using Primo Central, others Summon etc.

Even within Summon implementations (where you generally can't tweak the algorithms), results can vary, as I have often found by comparing "fails" here with other Summon libraries. The differences come down to

 1) Types of packages switched on (e.g. if you turn on HathiTrust or newspaper database packages, known item searching of catalog results gets worse due to a "crowding out" effect)

 2) Cataloging

That said, the question remains: how bad is the known item search issue? Even someone who is skeptical of known item search issues will probably concede that failures will happen, because there are many more results to sort through.

You can see this is the main reason for the problem, because most problems will disappear the moment you click "item in library catalogue" in Summon.

Currently in our instance of Summon, typing Freakonomics (which is part of the title of a popular book commonly known by that name, and a frequent course reading) gets you only journal articles, newspaper articles and book reviews: anything but the book.

But refining to Library Catalog gets you the item.

The book Freakonomics is found only by restricting to items in the Library Catalog

I agree that discovery systems have harder jobs than OPACs, but that is cold comfort to someone who used to be able to find a known item with one search in the OPAC.

Admittedly, as someone who is the point person for complaints on discovery services here, such issues loom large in my mind. Randomly looking through search logs in Google Analytics also helps me notice issues, though in reality the issue may not be that big.

There have been attempts to quantify this difficulty.

The most recent was Emily Singley's "Discovery systems – testing known item searching", where she tested 8 libraries using the 4 major discovery services.

The test is interesting in that it tried 5 types of queries
  • Single word titles (e.g. 1984)
  • Titles with “stop words” (e.g To have and have not)
  • Title/author keyword (e.g. Smith and On beauty)
  • Book citation (copied from bibliographies)
  • ISBN  
The results showed that WorldCat Local (name change to WorldCat Discovery Service coming?) came out on top. Google was slightly behind, followed by Summon, Primo Central and EDS.

Though interesting for comparison, the main issue, as pointed out in the comments, was that the test set was not drawn from real-world examples. Of course, Emily herself admits the test is "cursory".

Some libraries have done more specific tests like testing the top 1,000 most frequent known item search queries in logs to show their discovery service performs almost as well as the traditional OPAC. In my institution, we did the same for journal title/name searches, databases and books before launch. This helped a lot, but the long tail of searches means users will still run into issues in many cases.

Fear of issues with known item is not without precedent

In fact, this fear of known item search becoming harder has precedent before the current era of web scale discovery.

When libraries moved towards keyword searching as a default via "Next Generation catalogues" like AquaBrowser, Encore and Primo, there was a fear that known item searching would become harder compared to title browse.

I remember, as a newbie librarian, sitting in a committee worrying that keyword search would make known item search harder.

Was this fear borne out?

Known item searching - keyword searching vs title browse - a systematic test

Perhaps it's instructive to study this example from the University of Minnesota Libraries, where they systematically studied the effects of switching from

i) MNCAT classic - Aleph (Traditional catalogue typically title browse is default)

ii) MNCAT - Primo (basically next generation catalogue with keyword searching but no article index)

iii) MNCAT Discovery - Primo Central (Same as ii but includes article index)

H/T found via comment on Emily Singley's Discovery systems – testing known item searching blog post.


As explained in the very informative video above, they randomly selected 400 items from the search logs of their traditional OPAC to create benchmarks for MNCAT classic (OPAC), MNCAT (Primo) and eventually MNCAT Discovery (Primo Central).

These 400 may include items that the library did not have.

MNCAT classic was tested with "Title begins" - Or Title browse


MNCAT was tested with Keyword search.

If the entry appeared in the first 10, or "Did you mean" for MNCAT, it was considered found.

The results showed that 90% of results were the same (66% appeared in both, 24% neither).

8% of the time MNCAT classic found the item but MNCAT did not, and 2% of the time the reverse happened.

The video goes on to study the differences in results.

What's the bottom line?

Technically the classic catalog won. 98% of the time, the classic catalogue worked correctly with known item searches, while the next generation catalogue with keyword searching worked correctly 92% of the time. (This assumes that when neither search finds the item, both are working correctly.)
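To make the arithmetic behind those two figures explicit, here is a minimal sketch in Python. The four percentages are the ones reported above; the variable and function names are my own, and the "strict" found rate at the end is just an extra way of slicing the same numbers, not something taken from the video.

```python
# Outcome shares for the 400 sampled searches, in percent of queries.
outcomes = {
    "both_found": 66,     # found by MNCAT classic and by MNCAT
    "neither_found": 24,  # found by neither catalogue
    "classic_only": 8,    # found only by MNCAT classic (title browse)
    "mncat_only": 2,      # found only by MNCAT (keyword search)
}

def agreement_rate(o):
    """Share of searches where both catalogues gave the same outcome."""
    return o["both_found"] + o["neither_found"]

def worked_correctly(o, only_key):
    """Share of searches counted as 'worked correctly' for one catalogue.

    Assumption carried over from the post: when neither catalogue finds
    the item, both are treated as having worked correctly.
    """
    return o["both_found"] + o["neither_found"] + o[only_key]

def strict_found_rate(o, only_key):
    """Share of searches where the catalogue actually surfaced the item."""
    return o["both_found"] + o[only_key]

classic_rate = worked_correctly(outcomes, "classic_only")
mncat_rate = worked_correctly(outcomes, "mncat_only")

print("Same outcome in both:", agreement_rate(outcomes), "%")
print("MNCAT classic worked correctly:", classic_rate, "%")
print("MNCAT worked correctly:", mncat_rate, "%")
print("Strictly found (classic vs MNCAT):",
      strict_found_rate(outcomes, "classic_only"), "% vs",
      strict_found_rate(outcomes, "mncat_only"), "%")
```

Either way you slice it, the gap between the two catalogues is the same 6 percentage points, because the shared buckets cancel out.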

Is this difference significant? I would argue not.

Our own experience shifting to keyword searching in III Encore - a next generation catalogue - also backs up this finding: keyword searching is generally as capable as title browse for finding known items.

A lot depends on how the algorithm ranks items of course (III's Encore algorithm is very well tuned for known item searching, giving matches on title fields the highest priority), but it seems to me that because both traditional OPACs and next generation OPACs match only on traditional MARC records and not on articles, it's still relatively easy to get known item searches right.

What happens when you add an article index?

It will be very interesting to see the University of Minnesota Libraries results when they benchmark against MNCAT Discovery (Primo Central).

I will guess that known item search would be significantly worse (maybe 85%? particularly if we see author + title combos) without lots of customization because the challenge is now much harder sorting through all the newspaper and journal content.

A key to reducing this issue is the "Did you mean...." function. It's relatively easy to do this for searches for journal titles, as some Primo libraries have done, but it needs to be done for books as well.

A "Did you mean" that could recommend popular textbooks based on circulation, presence in reading lists as well as other metrics could help as suggested by Dave Pattern.
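As a rough illustration of that idea, here is a minimal Python sketch of a "Did you mean" that prefers popular items. Everything in it is an assumption made for illustration: the records, the loan counts, the reading-list bonus and the 0.8 threshold are all hypothetical, and no discovery vendor's API is being described. It simply matches the query against titles and breaks ties by a crude popularity score.

```python
from difflib import SequenceMatcher

# Hypothetical catalogue records: the loan counts and flags are made up.
CATALOGUE = [
    {"title": "Freakonomics: A Rogue Economist Explores the Hidden Side of Everything",
     "loans": 120, "on_reading_list": True},
    {"title": "On Beauty", "loans": 40, "on_reading_list": False},
    {"title": "To Have and Have Not", "loans": 15, "on_reading_list": False},
]

def title_match(query, title):
    """Score how well a query matches a title, from 0.0 to 1.0."""
    q = query.lower().strip()
    t = title.lower()
    if not q:
        return 0.0
    if t.startswith(q):
        # Users often type only the short, commonly known part of a title.
        return 1.0
    return SequenceMatcher(None, q, t).ratio()

def did_you_mean(query, records, min_match=0.8):
    """Return the best-matching popular title, or None if nothing is close."""
    candidates = []
    for record in records:
        match = title_match(query, record["title"])
        if match >= min_match:
            # Assumed popularity score: loans plus a flat reading-list bonus.
            popularity = record["loans"] + (100 if record["on_reading_list"] else 0)
            candidates.append((match, popularity, record["title"]))
    if not candidates:
        return None
    # Highest match wins; popularity breaks ties between equal matches.
    return max(candidates)[2]

print(did_you_mean("freakonomics", CATALOGUE))
print(did_you_mean("zzzz", CATALOGUE))
```

A real implementation would need a much better matching step and real circulation data, but the shape is the same: a small, curated pool of likely known items checked before the full blended result list.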

There are other ideas, not least of which is bento style.....


It's pretty obvious that web scale discovery systems will have tradeoffs, and one of them is slightly less effective known item searching.

The question that isn't answered is: how big is the trade-off? The answer varies from audience to audience. My suspicion is that the popularity of bento style and/or the refusal to load catalogue data into discovery at some of the highly ranked Ivy Leagues/ARLs suggests that known item search can be a serious enough issue for some audiences to switch away from a "blended" style of results.

NCSU Libraries - Bento Style

The more graduate students and faculty you have, the greater the likelihood that they will be doing known item searches for items that aren't covered by the typical reading lists or "did you mean" lists meant to help.

Granted a lot of searches they do can be challenging even for a traditional catalogue (looking for a particular edition of a common work for example), but web scale discovery makes it nearly impossible.

So what do you think? Do you think the known item searching issue in web scale discovery is overblown?

Tuesday, March 4, 2014

Library and Blue Ocean strategies (II) - Reconstruct Market boundaries for academic libraries

In my last post, I mused about blue ocean strategies and how libraries should consider spending time focusing more on blue ocean strategies.

I gave the example from the book of the declining circus industry and how Cirque du Soleil changed the rules of the game. Instead of competing along the usual circus industry factors, they innovated by blending in classic theater and reaching out to new markets, drawing in the more intellectual crowd while reducing other elements like animal acts.

I think that, like most industries, libraries have always focused on red ocean strategies, basically how to make existing processes better. We are good at tracking our input and output statistics and at process improvement. Increasingly, we do benchmarking studies, which focus on what other libraries are doing and on making sure we do the same.

Red ocean strategies are important no doubt, and they will always be the bulk of our strategies. But they won't suffice alone.

This is particularly so since our industry arguably shares characteristics similar to that of the circus industry, where the industry market demand is falling as users start to prefer other alternatives to our services.

Traditionally, libraries are also conservative, and it's always a safer bet to try to improve some existing process incrementally than to strike out and try a radical new initiative.

Brian Mathews of Virginia Tech library (a library that I think leads the way with many new ideas) wrote a white paper, "Think like a startup - A white paper to inspire library entrepreneurialism", and talked about the need for true innovators.

He wrote

"Many library strategic plans read more like to-do lists rather than entrepreneurial visions. With all the effort that goes into these documents I’m not sure that we’re getting a good return"

and then goes on to say

"They don’t say: we’re going to develop three big ideas that will shift the way we operate. They don’t say: we’re going delight our patrons by anticipating their needs. They don’t say: we’re going to transform how scholarship happens. They don’t attempt to dent the universe." [emphasis mine]

Blue ocean strategies, I think, are exactly the type of strategies designed to help produce the kind of thinking that can "develop three big ideas that will shift the way we operate".

Two chapters in the book that I found particularly helpful for promoting the thinking needed to find such big ideas are Chapter 3 - "Reconstructing Market Boundaries" and Chapter 5 - "Reach beyond existing demand".

Chapter 3 introduces the six paths framework, which helps promote thinking that breaks out of the fundamental assumptions underlying most industries' traditional strategies.

I am going to try to use them in the academic library context. Sadly, I don't have any ground-breaking ideas (at least not ones I wish to share). What I will do instead is examine the current "innovative" or "radical" innovations academic libraries are trying circa 2014, and show how they could be seen as attempts to find new blue ocean spaces of demand.

Look across complementary product and service offerings

This is probably the easiest idea to apply and it seems to me the bulk of new library ideas seem to come from here.

The idea here is to look at what happens before and after your service or product is used. Can you combine/absorb complementary services under one roof, making things a lot easier?

A toy example would be cinema operators making it easy for married couples to put their child at the baby sitters while they go out to have fun at the movies.

The academic library example of this could typically be summed up as "support the lifecycle of scholarly communication"

This leads to a host of things beyond merely supporting searching for articles and books including
  • Reference management
  • Grant searching/ proposal writing
  • Research Data support
  • Operating Institutional repositories
  • Library as open access publisher
  • Support of research assessment and benchmarking (e.g. bibliometrics)
  • Providing technical expertise for pretty much anything the researcher might need help with to do his research

Arguably one could also fit in the trend of combining IT with library support desks, as well as the provision of computer workstations and other authoring tools in the library (the next logical thing after finding a book for your paper is to write on a PC!), as a way to combine complementary services under one roof.

Look across alternative industries

The book points out that "Alternatives are broader than substitutes... Alternatives including different products or services that have different functions and forms but serve the same purpose".

On the other hand, substitutes tend to have the same core functionality but may have different forms.

It's a subtle point, but the authors give as an example a CPA (Certified Public Accountant) and accounting software as substitutes, because they have the same function (i.e. getting accounting done) but different forms.

On the other hand, visiting a restaurant or a cinema can be seen as alternatives: they have different forms and functions (enjoying a good meal vs watching a good movie) but they arguably serve the same purpose, i.e. enjoying a night out.

The idea here is to expand the market by embracing characteristics of alternatives and not just close substitutes.

It seems to me these definitions are a bit grey but let's see what I can do with them.

Patron Driven Acquisition (PDA) could arguably be one example. With PDA, users can look at an ebook in a library catalogue and, if they want it, get access with one click (and the library is charged), mimicking the ease of access of Amazon, iTunes etc.

Hence this combines the best of the ebook buying industry with the traditional library cost ($0 to the user).

But perhaps Amazon ebook buying and borrowing books from the library are substitutes, not alternatives.

In which case, the rise of maker spaces in both academic and public libraries could perhaps be an even better example of looking across alternative industries and taking in the characteristics, if not the functions, of alternatives.

An older example could be the conversion of spaces in libraries to support collaborative learning and discussions. While this may differ from the traditional purpose of libraries of providing access to books and information, such spaces do help draw usage by pulling in the attributes and values of the alternatives to visiting a library.

Of course such strategies run the risk of "mission creep"; Hugh Rundle's "Mission creep - a 3D printer will not save your library" is a well-known response to this.

Yet another example could be web scale discovery services that marry the ease of use of web search engines with the academic content of databases.

The idea of embedded librarianship, where librarians leave the library and set up shop at the offices of faculty or at teaching hospitals, can arguably also be seen as librarianship taking on characteristics of service industries, like doctors making house calls.

Look across strategic groups within industries

This one is tricky. It involves trying to carve out new spaces across segments (typically segmented on price and performance) in a given industry. One example given was the Sony Walkman in the 70s, which combined "the high fidelity of boom boxes with the low price and transistor radios within the audio equipment industry".

I am having problems coming up with examples for this, basically because libraries generally don't compete with one another, nor do we segment markets based on price and performance.

It could be I simply don't understand this one.

Look across chain of buyers

This simply points out that the purchasers who pay for the product might be different from the actual users.

Each group may value different things, so for example the person who purchases for a corporation might be more concerned about price and may be more willing to trade off functionality than the actual users.

The idea here is to see if one could target a different set of buyers than the traditional set.

The example given was Bloomberg in the 80s, which started targeting individual analysts as opposed to IT managers. They added features that appealed to analysts, even including purchasing services for traders to buy gifts and book holidays, because while traders were wealthy they were also time poor.

Another example given was how a company shifted from targeting doctors to targeting patients, allowing them to administer insulin themselves.

For libraries, I can think of the following examples.

Targeting faculty to influence students to "buy" reference and information services - this is pretty old hat.

A somewhat more unconventional idea came from "The Undergraduate Research Project at the University of Rochester", an ethnographic study of students.

They found "Students told us that their parents often edit their papers and advise them about assignments, so we decided to get to know parents through the library's sponsorship of the parent breakfast held during the class of 2010 orientation." (pg 12)

The other thing I can think of is how librarians, through advocating for Open Access mandates or for citation/bibliometrics standards in promotion and tenure systems, can arguably influence the "purchasing" of related services from librarians.

I say arguably because cause and effect could be argued here.

Look across functional or emotional appeal to buyers

This refers to how most industries have either

a) Functional orientation


b) Emotional orientation

Companies that manage to challenge these orientations may unlock new oceans.

Examples given are Swatch, which added an emotional component, and QB House, which went in the other direction towards more functional services, where extras were stripped away and the focus was on speed.

I guess few would disagree with me that library services lean strongly towards the functional orientation.

One good example of challenging this, I think, is what Mal Booth's University of Technology, Sydney Libraries is trying to achieve.

UTS Library Spectrogram 

There is also increased focus on "user experience", with user experience librarian jobs and the recent establishment of Weave - Journal of Library User Experience.

And of course libraries are now also spending a lot of effort on how library spaces make people feel....

Look across time

This is pretty obvious, look at some trend and try to project how it will ultimately affect your business and move to that point first!

In some ways you could say libraries are not too bad at this, at least in terms of technology trends (or are we?). We are pretty early on most IT trends at least, trying everything from 24/7 chat services, web conferencing for classes to SecondLife (which didn't work out well), though arguably we dropped the ball on search.

We see the writing on the wall for library space used to house print materials, and many libraries are slowly preparing for the day when print is not as dominant.

The authors state that the trends you are looking at need to be

i) Decisive to your business
ii) Irreversible
iii) Clear trajectory

Besides the slow shift away from print towards electronic (completed for journals, and slowly progressing for most monographs) and the trend towards increased access from remote areas, another trend that I think fits these three criteria is the rise of Open Access.

Others may disagree of course, but if Open Access is going to be the norm, academic libraries should prepare for the day when a lot of their services will be disrupted, and start to think about what an academic library would look like if most articles were Open Access.

Or alternatively, as Ithaka S+R Senior Anthropologist Nancy Fried Foster asks, "what it would be like to design academic libraries based not on precedent, but rather on everything we can learn right now about the work practices of the people who already use them".


So here was my attempt to apply blue ocean strategies to find new markets. I'm not sure how successful it was, particularly since I concentrated on fitting in examples I knew about rather than thinking of new ideas, which obviously made coming up with new ideas nearly impossible.

Perhaps you can do better?


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.