Why not Web Scale Discovery Tools?

Recently, I pulled together everything I have read on Web Scale Discovery tools (e.g. Summon, EBSCO Discovery Service (EDS), WorldCat Local and Primo Central) into a bibliography and posted it on a Google Site on the topic started by Andy Ekins (Christ Church University, UK) and Lukas Koster (Library of the University of Amsterdam, NL).

It has about 50 links covering various aspects of discovery tools, from evaluation reports by university evaluation teams and taskforces to debates between competitors and presentations by vendors, and it took me about 4-6 hours after work to create. Please point out any relevant resources I have missed, so I can continue to improve it!

As I reread the material and organised my thoughts, it seemed to me that after the false dawns of federated search and next generation catalogues, Summon-like discovery tools might really be the "real deal".

That said, there still remain many huge questions and very few answers.

How big will the impact of discovery tools be?

Academic libraries around the world are excited by the potential of Web Scale Discovery tools, and initial results from early adopters seem promising. GVSU, for example, found that Summon seemed to be reversing the downward trend in usage: average usage of the top 100 journals increased by 42%, and for the top 1,000 journals by 82%.

It's still early days, with most services launched barely a year ago, and this might not seem that impressive, but compare it with the last "great hopes": next generation catalogues and, before that, federated search.

A couple of years ago, libraries were excited by the idea of "next generation catalogues" that included "Web 2.0" features like tagging, user reviews and facets. Libraries were purchasing products like Encore, AquaBrowser, Primo...

It was believed that such modern-looking interfaces would win some of our users back from Amazon and maybe even Google. I haven't done much research in this area, so please correct me if I'm wrong, but my impression is that, in general, next generation catalogues have so far failed to live up to their hype and have not done much to turn things around; if anything, the trend of users going to Google and Amazon has accelerated. While there is some evidence that facets are appreciated and used more (most users don't do advanced searches in the classic catalogue), users generally did not contribute tags or reviews to next generation catalogues, as there is little pay-off for them in doing so, and most libraries didn't have the critical mass for the social aspects to pay off (though LibraryThing for Libraries and BiblioCommons seem promising, as they overcome this lack of mass).

In short, many users continued to use Amazon and Google, and the library catalogue, if it was used at all, served more as a delivery tool ("obtain" in the FRBR user tasks), to check whether an item found in Amazon or Google was available in the library, rather than as a discovery tool.

I may be too pessimistic. It may be that current next generation library catalogues need more time to show their effects, or that they have their hearts in the right place but need to be designed better rather than just tacked on ("next next generation library catalogues", anyone?). But thus far I have not seen any report showing that implementing these tools has led to a large, concrete increase in usage.

In light of this, the results coming out of Summon and similar tools so far are astonishing, yet as I study the literature, I notice there has been some resistance to the idea among librarians. In fact, concerns about "one-stop shops" are not new; they go back to the days of federated search.

Discovery tools as a default search?

One thing about working in libraries is that you eventually get used to some odd things (and, worse, eventually stop seeing them), but certain things still give me pause. This is one. In theory, a library could offer all of the following options:

  • "Classic" catalogue (e.g. WebPac Pro)
  • Next Generation Catalogue (e.g. Encore)
  • Article Finder
  • Federated search (e.g. Research Pro)
  • WorldCat/Consortium Library or Union Catalogue
  • Discovery tool (e.g. Summon)
  • Google Scholar (promoted by the library officially)

I don't think any library I know of offers all of these, but I recall seeing libraries that come close, offering the classic catalogue, a next generation catalogue, WorldCat, Summon and Google Scholar on the main page.

While it is good to have choices, wouldn't it be a great relief for most users to be told to forget them all and just use Summon or a similar tool? Interestingly, not everyone agrees this class of products should be promoted as the default.

Leaving aside technical issues, cost and vendor trust/lock-in issues (Carl Grant of Ex Libris and the people at the Federated Search blog in particular warn about problems of content neutrality and hidden costs, and the danger of "library bypass" when relying solely on a unified index that no single library can influence), I have divided the reasons as follows:
  1. Searching through millions of items is not necessary and tends to confuse users.
  2. There are better, more discipline-specific tools such as PubMed, with better features such as thesauri.
  3. It's unclear what is in the index and, if something is, whether it has full-text coverage or just metadata of varying quality (related to #2).
  4. It encourages "quick and dirty" searches, leading to lazy, poor searching techniques (related to #2).
  5. It is too expensive, and anyway Google Scholar, which is free, is perfectly capable and outperforms them all.
The interesting thing, of course, is that all the reasons above except #5 can be leveled against Google Scholar, and yet we are getting evidence that researchers are increasingly turning to gateway services like Google Scholar (see, e.g., the CIBER group's research) to access ScienceDirect, JSTOR and the like, all but ignoring the expensive features built into such platforms. So are such complaints really valid?

Searching through millions of items is not necessary and tends to confuse users

Medical librarian Dean Giustini is on record saying "The idea of providing one-search for users in medicine — mostly physicians and medical students — is very difficult for me to justify (and teach). Further, it’s not appropriate in most search instances." It's an interesting comment, and he gives many reasons, one of which is "Some of its results for medical topics are illogical and force users to re-do searches through native tools such as OvidSP MEDLINE."

Dean also wonders whether it is always best to search across millions of records. Wouldn't most undergraduates be better served by searching something more limited in scope, like Academic Search Premier? Meredith Farkas wonders along the same lines about WorldCat Local, though her concern is more with WorldCat Local's inability to filter out non-scholarly results, a problem Summon doesn't suffer from because of its "scholarly results only" option.

I do get their point. The typical undergraduate is probably searching for fairly standard general topics, like trying to find a paper on Kantian ethics, where the issue usually isn't too few results but too many. Searching 750-1,000 million items is great, but it relies heavily on the relevancy algorithm to surface relevant content.

But this argument about too many results needs to be examined closely. A larger index is not a problem if the algorithm is capable. Being smart enough to refine using facets helps too.

I love Google Scholar. Speed and ease of use are no doubt factors, but for me personally, the fact that Google Scholar has by far the largest index is more often than not a boon rather than a curse.

I often need to search for topics concerning Singapore and Asia, and if you haven't done such searches you will be surprised how quickly the number of results shrinks once you add that as a search term, even to the most general terms. This is where a huge index is deeply appreciated. Discovery tools are not at the level of Google Scholar, but 700 million records (in Summon) is nothing to sneeze at.

I would guess most researchers tend to do long-tail searches as well, where the larger the index, the better.

I do agree that the relevancy algorithms of Summon and some other products I have tried seem a bit odd/weak, which contributes to what Dean is saying, but I seldom hear that complaint about Google Scholar. Yes, you get a lot of results, but the top few look relevant (precision is good enough).
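To make "precision is good enough" a bit more concrete: one common measure is precision at k, the fraction of the top k results that are relevant, which is roughly what a searcher scanning the first page experiences. Note that it does not depend on how many results there are in total. Here is a minimal Python sketch; the relevance judgements are made up purely for illustration and are not measurements of Summon or Google Scholar.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k ranked results judged relevant.

    relevance: list of booleans in ranked order (True = relevant),
    e.g. a searcher's own judgements of a result list.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top = relevance[:k]
    return sum(top) / k

# Hypothetical judgements for the first ten results of one search.
judged = [True, True, False, True, True, False, False, False, False, False]
print(precision_at_k(judged, 5))  # 0.8
```

A search can return a million hits and still score well here, as long as the top few are relevant.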

I am hoping discovery tools will eventually be nearly as good.

There are better, more discipline-specific tools such as PubMed, with better features such as thesauri.

This is probably a very strong argument for advanced, sophisticated users, particularly in the medical fields. I'm not a medical librarian, but it seems to me that the medical field tends to benefit least from web scale discovery tools because it has PubMed, which suffers less (compared to, say, the social sciences) from content fragmented across silos and has a powerful search interface to boot.

But as a default search for most users and most topics, I doubt anything is much better (except Google or Google Scholar?). Can you think of a tool that works for every situation and every user? It certainly improves on using the classic catalogue or next generation catalogues as defaults, preventing the very common problem of users typing article titles into the catalogue.

Moreover, database recommender systems like the one in Summon can help guide users to such high-quality resources.

Also, with more and more research being multidisciplinary, I wonder how well such subject-specific indexes are keeping up.

I suppose discovery tools are not for every situation, and not every case is a nail to be hammered, but that doesn't mean a hammer isn't a very useful tool in many situations.

It's unclear what is in the index and, if something is, whether it has full-text coverage or just metadata of varying quality

This was raised in a paper about librarians' responses to Summon at Edith Cowan University Library. They were worried about Summon's coverage of CINAHL, Academic OneFile etc. This is, IMHO, to borrow a phrase, a very "librarian concern".

While librarians are trained to think of content as divided into silos, most users just don't think that way. Their concern is simpler: did they find relevant papers? They are willing to trade off not covering 100% of some silo (and when you think about it, what's in a given database is fairly arbitrary, so why worry about it?) for being able to cover more ground in less time.

Content is not king; convenience is (see this OCLC report).

I think it's the librarian's obsession with being certain of having searched absolutely everything in a given silo (Academic Search Premier or a specific journal title) that drives this concern.

Of course, Google Scholar is notorious for not giving even a clue about what's in its index, leading to dozens of papers trying to quantify this, but to be honest, I wonder whether it really helps all that much to know exactly what is and what is not in the index, or whether you are searching full text.

Say you run a nursing search in Summon. You look at the results and find either a) tons of relevant results or b) mostly irrelevant results.

Does knowing that Summon covers 80% of CINAHL, even down to which journals and volumes are missing, really help you in either case? Or does knowing whether Summon's index covers full text or just metadata help?

A little, I guess, but you will still face the decision of whether it is worth redoing the search in CINAHL, and that is more a question of how much time you have and the cost of missing a relevant article.

Of course, known-item searches are a different matter, where even 99% coverage is not enough, but for topical searches, I'm not sure knowing coverage levels is useful to a typical user, or even a librarian for that matter, when it comes to finding relevant articles (for other decisions, like choosing between discovery tools, this data can of course be very relevant).

Or am I misunderstanding this objection?

Encourages "quick and dirty" searches, leading to lazy, poor searching techniques

Still thinking about this one. I can see their point. I still teach proper search construction using complicated Boolean strings (synonyms connected by the OR operator and grouped in parentheses), but to be honest, I seldom do so when searching Google Scholar, unless I need to do a comprehensive search.
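For anyone who hasn't seen the technique spelled out, the structure is mechanical: each concept becomes a group of synonyms joined by OR inside parentheses, and the groups are joined by AND. A minimal Python sketch follows; the function name is hypothetical, and exact syntax (phrase quoting, truncation symbols like *) varies between databases, so treat it as illustrative only.

```python
def build_boolean_query(concepts):
    """Combine concept groups into a Boolean search string.

    Each concept is a list of synonyms: synonyms are joined with OR and
    wrapped in parentheses, and the concept groups are joined with AND.
    Multi-word terms are quoted so they are searched as phrases.
    """
    groups = []
    for synonyms in concepts:
        terms = [f'"{t}"' if " " in t else t for t in synonyms]
        groups.append("(" + " OR ".join(terms) + ")")
    return " AND ".join(groups)

query = build_boolean_query([
    ["Kant", "Kantian"],
    ["ethics", "moral philosophy"],
])
print(query)  # (Kant OR Kantian) AND (ethics OR "moral philosophy")
```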

But am I really worse off for it? I just don't write complicated Boolean strings often, because it is seldom necessary if all you want is a few relevant articles.

I also generally don't use advanced thesaurus features, e.g. "exploding" a subject heading to include its narrower terms. Is that bad?

Note: techniques like pearl growing (forward citation, backward citation, "sideways" to find related articles) *are* useful, and I use them all the time to find related articles. I don't see how using Google Scholar or discovery tools will prevent the development of such skills, though admittedly some advanced databases make this easier with citation data (e.g. Scopus) and generally have superior indexing.
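Pearl growing is essentially following links in a citation network: backward means looking at what a paper cites, forward means finding the papers that cite it. A minimal Python sketch of one round of both, assuming a tiny made-up citation graph (the paper IDs and the `CITES` dictionary are hypothetical; in practice the citation data would come from a database such as Scopus). "Sideways" related-article suggestions are not modelled here.

```python
# Hypothetical toy citation data: each paper maps to the papers it cites.
CITES = {
    "A": ["B", "C"],
    "B": ["C"],
    "D": ["A"],
    "E": ["A", "B"],
}

def backward(paper, cites):
    """Backward chasing: the papers in this paper's reference list."""
    return set(cites.get(paper, []))

def forward(paper, cites):
    """Forward chasing: the papers whose reference lists include this paper."""
    return {p for p, refs in cites.items() if paper in refs}

def pearl_grow(seed, cites, rounds=1):
    """Expand outward from a seed article, following citations both ways."""
    found = {seed}
    frontier = {seed}
    for _ in range(rounds):
        candidates = set()
        for p in frontier:
            candidates |= backward(p, cites) | forward(p, cites)
        frontier = candidates - found  # only expand newly found papers
        found |= frontier
    return found - {seed}

print(sorted(pearl_grow("A", CITES)))  # ['B', 'C', 'D', 'E']
```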

A related concern is that discovery tools might mislead users into thinking they cover *all* of our resources, and resources not indexed in discovery tools might certainly become under-utilised. It is also speculated that such tools will lead users to expect that everything is online. That is certainly a valid concern.

Google Scholar is perfectly capable and outperforms them all.

I'm quite sympathetic to this point. Many libraries outright recommend Google Scholar, partnering with it via Google Scholar's library links. To be perfectly frank, Summon and other discovery tools were designed to be Google killers and, more to the point, Google Scholar killers.

But are such tools really better than Google Scholar? Currently I would say no. Google Scholar is far broader, though much of that material is grey literature, and its relevancy algorithm is superior.

It's funny to say this, but Summon and other discovery tools are in fact easier to use than Google Scholar (since when could you say a library product is easier to use!), as users tend to be frustrated by the task of determining whether the full text of an article they want is available via Google Scholar.

For a skilled searcher who knows his way around the library system, can find what he needs and is capable of evaluating sources, this probably wouldn't be as big a factor, but for most others, the unified indexes of discovery tools would be easier.


I don't mean to minimize the concerns of librarians. As it stands now, discovery tools are certainly nowhere near perfect, and being so new, they have many technical issues that need fixing. Aside from that, a skilled searcher using more advanced tools can almost always find more and do a more thorough search.

But discovery tools are not necessarily for them (or us); they are for the least knowledgeable, most helpless of our users.

They are for the ones who search our OPACs by article title and find nothing, and for the ones who search Google and settle for Wikipedia articles. Discovery tools help the ones who use JSTOR for every topical search because it was recommended in their very first course, and the ones who don't come to the reference desk but rely on peers for advice.

Can they do better than using a discovery tool? Perhaps. Will they? Doubtful.

For this group of users (which may be very large), it's hard to believe discovery tools will hurt them any more than what they are doing already. As for the others, we as librarians have influence over them anyway (or they naturally work hard to learn the tools they use), and we can teach them the limitations of these tools and point out the "proper way" to do things if they so desire.

Am I dismissing the concerns of some librarians too easily? Perhaps. I suspect my own information searching skills are fairly undeveloped for a librarian, so I may easily be failing to appreciate the nuances here.

What's your take on such tools? What impact do you think they will have? What are the potential pitfalls? How would you complete the following sentence: "Use Summon (or EDS or WorldCat Local or Primo Central or...) as a first search unless ______________________."

Going to ALA Annual 2011 - Thoughts & questions of a newbie

I'll be attending ALA Annual 2011 in New Orleans in June, thanks to support from my employers. It's amazing to think that I will be attending the largest library conference in the world, when my prior experience has been attending relatively small local library conferences and seminars in Singapore. Beyond that, the chance to meet international peers I have been corresponding with online is not something that comes along easily.... It's going to be a great learning experience for sure!

I notice Joeyanne, an academic librarian from the UK, is in the same boat as me: an international librarian attending her first ALA conference. She blogged about it here. I hope to run into her at the ALA 2011 Newbie & Veteran Tweet-Up.

Just looking at the ALA Annual website is overwhelming, which probably explains why, after booking conference tickets, rooms and airfare, I promptly stopped thinking about it on the excuse that most of the programmes weren't finalised yet. (Side note: I wonder if this is how our new users feel when they visit our library websites, which are chock full of information.)

But #ALA11 is less than 3 weeks away, so I have to start planning, and I finally settled down to do it on Friday.

Struggling with Google Calendar, travel apps

I'm not much of a traveler, so it was interesting to try out some new tools for international travel.

The first thing I did after booking my flight was to try to put my flight details into Google Calendar. I managed to import them automatically, but with the time zone differences, I was very wary of mistakes. On Friday I eventually realised you could enter different time zones for the start and end times.

Click on "Time zone" when adding events in Google Calendar so you can add flight details

I also took the opportunity to try many travel apps like TripIt, TripDeck, FlightStatus and GateGuru, which are supposed to make travel easier by capturing flight and accommodation details (often from emails or Google Calendar).

It remains to be seen how useful such apps will be. (I'm probably bringing along my iPad 2.)

TripIt app showing travel details

I also tried services like Planely, which allow users to connect with fellow travellers by flight number, but I seriously doubt I'm going to use them.

Picking events with ALA Online Scheduler 

The ALA Online Scheduler is an amazing piece of work. I used the recommender, which was pretty good at figuring out which events I would be interested in (based on my ALA profile). I went crazy selecting events; so many interesting events, so little time.... There doesn't seem to be an easy way to share your schedule, though. No mobile app?

ALA Scheduler: mine is way too full; currently just listing candidates

Feeling very uncertain

As a novice at library conferences attending my first international conference (which happens to be the biggest!), I'm obviously uncertain about a million things. Leaving aside the fact that I'm from Singapore and the social cues and practices might be different (e.g. tipping), the library-specific questions are huge. The gang at LSW has been very helpful, though.

  • How much should I plan? Should I plan every move with military precision, or should I go with the flow, e.g. follow people, or be prepared to jump from room to room when a session gets boring? I'm guessing I shouldn't be too ambitious.

  • I'm not quite sure about the differences between "Presentation/Session Tracked Programs" and "Discussion/Interest groups". I wonder if vendor-sponsored lunches are worth it.

  • I think I probably want to visit the vendor exhibits and have a look at the posters. What's the best time for the vendor exhibits? I have some free time after attending the Movers & Shakers celebratory luncheon on Friday, which ends at 3pm, so I was thinking of visiting the exhibits then, but I notice the listing says Friday, June 24, 2011 05:30 PM - Monday, June 27, 2011 04:00 PM. But what times do the exhibits close each day? 5:30pm seems an odd time for the exhibits to open.

  • There should be free Wi-Fi at most of the sites, right? The hotel I am staying at says there is, but I have to check whether it comes free with the room.

Many more questions, but oh well, I need to strike a balance between overthinking it and pretty much just winging it...

Events I will be attending

I will arrive on Wednesday night but have no plans for that day or the whole of Thursday for now, as I will not be attending any of the pre-conferences. Maybe I'll go on some tours? Is anyone free to meet?

It's probably pointless to list every event I intend to attend, since I will probably change my mind or even get lost! But the ones I will probably attend are the following:

1. Movers & Shakers celebratory luncheon - Friday, June 24, 2011 - 12:00pm - 3:00pm, Broussard's Restaurant
(Sadly, this means I am going to miss the International Librarians Orientation, which I dearly need, though I understand why it was placed in that time slot, since few LJ Movers & Shakers are international librarians.)

2. ALA 2011 Newbie & Veteran Tweet-Up - Saturday, June 25, 2011 - 7:00pm - 9:00pm, Bar Uncommon, 817 Common Street. I may attend the ALA Facebook After Hours Social later.

3. International Librarians Reception - Monday, June 27, 2011 - 6:00pm - 8:00pm, Generation Hall, 310 Andrew Higgins Drive. The main problem here is that I also want to watch Battledecks 2011, which runs from 5:30pm - 7pm; maybe I can be a bit late? :)

4. I will probably plan my schedule around these. As for the rest, I have been asked to attend the LibQUAL+ sessions, and my own interests tend towards mobile, social and discovery tools. So I will probably be at The Ultimate Debate: "Library Web Scale Discovery Services: Paradigm Shift or More of the Same?", the 2011 PR Forum - Going Mobile @ your library: How Libraries Can Serve Mobile Phone Users, Top Technology Trends etc.

Any suggestions? Any sessions by good speakers I should definitely go to?


All in all, I think I should try not to be too ambitious and avoid packing in too many things. I will try not to expect too much and just go and enjoy the ride! It is, after all, only my first international conference, hopefully with more to come. :)

Any tips or advice from jet-setting librarians, ALA veterans etc. would be very welcome....

