Saturday, August 31, 2013

How I learned to stop worrying about the size of the discovery index & love the search

I blogged "8 things we know about web scale discovery systems in 2013", and am working on a draft of "4 issues about web scale discovery systems we are still pondering about", but I was already pretty sure that beyond a certain point, the size of the index, while important, is no longer the be-all and end-all for evaluating the search.

The story I am about to tell pretty much drove that point home.

On 28 August at 9am Singapore time, while running a routine check of our discovery service Summon to sample-test the linking reliability of a new content provider/database we had just turned on, I noticed to my horror that the number of results I was getting in Summon had essentially been cut in half! (This seemed to be affecting other institutions on Summon as well.)

For those not familiar with Summon, you can do a "blank" search and it will show all the records you have turned on. I do a routine check at least once a week, so in our case I knew we should have roughly 330 million results, if not more.

But on 28 August we were showing about 169 million results, roughly half of what was expected. Based on my last recorded content-type breakdown, a lot of the missing content was journal articles, so it wasn't just inconsequential newspaper articles. For example, some major economics journals were now missing.

A check showed that Summon just wasn't registering a lot of our holdings, so it wasn't showing those articles as available online.

Panic mode

My first thought was that we were going to be in big trouble. It was week 3 of the term and the academic year was in full swing. Librarians were teaching classes, we had just got over the bump where it was mostly searches for class readings, and we were starting to get requests for advisory sessions for theses, dissertations & assignments.

I knew there were at least 4 classes in my library alone, and I myself was scheduled to do one the very next day, a starter class to assist honours year students who were planning their thesis next term.

It was panic mode time, or so I thought. I quickly informed the colleagues I knew were doing classes, warning them that their canned searches might give different results. In particular, I was worried that librarians extolling the power of Summon by demoing a known article title search (a very popular strategy) would be embarrassed if they suddenly found no results.

I was also worried about users. Would they be angry? Disappointed that the results were now so poor because of the relative poverty of the index? Would we get a lot more futile searches? More document delivery requests for items we already have access to?

Reaction was muted

It turns out none of these came to pass, except for one librarian who was doing a class in the morning (before I had time to warn her) and noticed that her known article title search did not bring up the link to the full text.

As far as I know, no user even noticed the index was halved. We did not get any complaints, and reference transactions and document delivery requests seemed to be at normal levels.

Because I sent out a quick mass email to all our librarians, we will never know how many of them would have noticed this on their own. I am guessing that, short of checking canned searches they had prepared beforehand or looking for articles they knew were covered, probably not many.

I did expect the Summon mailing list to light up with complaints, but amazingly it was quiet, even allowing for the time difference. Even after the first report was made on the mailing list more than 12 hours later, the reaction there was extremely mild.

But what did the aggregate statistics say about user behavior? We might expect the number of visits not to be affected, but searches per visit should be higher as people try harder to find what they want.

The issue lasted from roughly Wednesday, 28 Aug until around 3pm on Friday, 30 Aug, when I noticed it was fixed.

  • Pages per visit based on Google Analytics rose from 3.93 (Tuesday) to 4.24 (Wednesday), fell back to 3.97 (Thursday), and rose even further to 4.55 (Friday)
  • Curiously, Summon's own native statistics show a fall in searches per visit

I don't quite understand how Summon's own statistics count "searches", but in Google Analytics each refinement is also counted as a page view, so you can say that on Wednesday there were more searches and/or refinements per visit compared to Tuesday.

That said, I don't think it was very significant. In any case, on Thursday it dropped back to within a hair's breadth of Tuesday, and on Friday (when the issue was fixed at 3pm) it even rose to the highest level since 18 Aug.
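To put those swings in proportion, here is a minimal sketch that expresses each day's Google Analytics pages-per-visit figure as a percentage change from Tuesday. The numbers are the ones quoted above; taking Tuesday as the baseline and the helper name `percent_change` are my own choices for illustration.

```python
# Pages per visit from Google Analytics, as quoted in this post.
pages_per_visit = {
    "Tuesday": 3.93,    # day before the index was halved
    "Wednesday": 4.24,  # first day of the issue
    "Thursday": 3.97,
    "Friday": 4.55,     # issue fixed at around 3pm
}


def percent_change(baseline, value):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


# Assumption: Tuesday is treated as the "normal" baseline day.
baseline = pages_per_visit["Tuesday"]
changes = {
    day: round(percent_change(baseline, value), 1)
    for day, value in pages_per_visit.items()
    if day != "Tuesday"
}

for day, change in changes.items():
    print(f"{day}: {change:+.1f}% vs Tuesday")
```

That works out to roughly +7.9% on Wednesday, +1.0% on Thursday and +15.8% on Friday. Since the biggest jump came on the day the index was restored, it is hard to pin the Wednesday rise on the halved index.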

How much does index size matter?

Based on the reactions so far, the size of the index doesn't seem to bother most of our users. First off, most of them can't even tell that half of the index is missing! Librarians might, but only if it's an area they are very, very familiar with.

Even for a known article title search, users who can't find the article will just assume we don't have it. Though I wonder if librarians will stop teaching users to type in article titles in discovery services if the coverage of our collections in the index drops below the typical level of 90% for most academic libraries.

That said, does the quality of results suffer when searching over 170 million records compared to 330 million?

It's really hard to tell. Logically it should in some cases, but perhaps when you are up to a couple of hundred million, even losing half of it is not a big deal unless you are really doing something in-depth where there are very, very few relevant results and/or you are doing a comprehensive review.

When you are up to figures like 300-500 million, the relevancy ranking is far more important. Let's imagine a scenario where articles are randomly dropped from the index. 

Scenario 1 - A high number of results are relevant, say 200. So instead of getting 200 relevant results, you get 100... Most users are equally happy either way.

Scenario 2 - Only a small handful, say 3, are relevant. This time, halving the index makes a big difference. But the main point here is that relevancy ranking is very important: if only 3 results are relevant, a far greater problem is ensuring those 3 appear on top of the numerous results that are returned.
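The two scenarios can be put into rough numbers. This is only a sketch under a simplifying assumption of mine: each relevant record is dropped independently with probability 0.5 (the real outage was clearly not random, since whole holdings went missing). The function name `surviving_stats` is made up for illustration.

```python
def surviving_stats(relevant, keep_prob=0.5):
    """Summarise what is left of `relevant` records after a random drop.

    Assumption: each record survives independently with probability
    `keep_prob` (0.5 models an index that has been cut in half).

    Returns (expected number surviving,
             probability that none survive,
             probability that all survive).
    """
    expected = relevant * keep_prob
    p_none = (1 - keep_prob) ** relevant
    p_all = keep_prob ** relevant
    return expected, p_none, p_all


for label, relevant in (("Scenario 1", 200), ("Scenario 2", 3)):
    expected, p_none, p_all = surviving_stats(relevant)
    print(f"{label}: expect {expected:g} of {relevant} relevant records, "
          f"P(none left) = {p_none:.3g}, P(all left) = {p_all:.3g}")
```

With 200 relevant records you still expect about 100, and the chance of ending up with nothing is vanishingly small. With only 3, you expect 1.5, and there is a 1 in 8 chance that every relevant record is gone.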

That said, I do pity the students/researchers who are trying to do a comprehensive literature review; they will probably be missing out compared to usual without knowing it.

Yet remember this: today we live in a world where most of the major content providers have decided that they want to participate and ensure their content appears in the index of discovery services. This wasn't so clear back in 2009 when discovery services first started out, and one major concern of librarians was the comprehensiveness of the index.

We can imagine a world where, for some reason, discovery services were half as effective at getting publishers involved and the index was half as large for most academic libraries. Would discovery services still take off? Would they still be equally popular?

Given the results of this short "natural experiment", the answer seems to be yes. Beyond a certain point, for topical searches users don't even notice anything, and even a discovery index half the size of what it is now is still bigger than anything we have short of Google Scholar.

In some ways this reaffirms the results found in A Comparison of Article Search APIs via Blinded Experiment and Developer Review, which reported the counter-intuitive result that users actually preferred (though not significantly) a combined search of EBSCOhost databases over EBSCO Discovery Service, which had everything in the former plus other non-EBSCO content.

" EBSCO traditional is essentially a subset of the EDS corpus — EDS searches everything our EBSCO traditional API setup does, and more. They probably use similar ‘relevance’ rankings.

One would expect EDS to do substantially better than EBSCO traditional, being a newer product, with many enhancements and more coverage, from the same vendor. Yet, this did not happen. EBSCO ‘traditional’ in fact was preferred substantially more than EDS — 13 to 5, not a statistically significant level in part due to small sample size, but striking nonetheless."

Who knows, if we did a blind test of Summon with an index of 169 million results vs Summon with the 330 million results we have now, users might also prefer the former :)

I notice as I write this on Aug 31, 7pm Singapore time, that the bug is back and we are down to 169 million again. I've reported the issue again, but I'm not going to lose sleep over it.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.