Sunday, December 7, 2014

Four possible Web Scale Discovery Future Scenarios

I was recently invited to give a talk at the Swedish Summon User Group meeting, where I presented on possible scenarios for the future of web scale discovery.

As Web Scale Discovery in libraries goes back to roughly 2009, most academic libraries have by now had 2-3 years of grappling with the concept. In hype cycle terms, many if not most have moved past the trough of disillusionment and are climbing the slope of enlightenment, or have even reached the plateau of productivity, as Dave Pattern has pointed out.

For example, nobody here today believes that web scale discovery in its current form can totally replace subject databases, an idea that was mooted during the peak of inflated expectations.


(Figure: the Gartner Hype Cycle. Source: http://en.wikipedia.org/wiki/Hype_cycle)


Still, one wonders: what lies in store for web scale discovery in academic libraries? What will things look like in 5, 10 or even 15 years' time?


How well is web scale discovery doing?

To forecast possible futures, we need to know where we are now. Generally, I think many of us (where "us" is the odd mix of technologists, including ILP librarians, who have worked on implementing discovery services, though we are of course biased) think it is doing decently well by any of the following criteria:

a) Usage of web scale discovery is typically increasing year on year at most institutions (e.g. see the University of Michigan's experience).

b) Statistics on OpenURL origins show a substantial share coming from discovery services (this varies by institution, but a typical share would be at least 40% for most destinations).

c) Studies are appearing that show that, when controlling for various factors, institutions with web scale discovery systems such as Summon or Primo do seem to see higher usage per FTE than institutions without any such system.
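Criterion (b) depends on link resolver logs. As a rough sketch of how such a share might be computed, assuming you can export the OpenURL request URLs your resolver received, the Python below tallies requests by their referrer identifier (`rfr_id` in OpenURL 1.0, falling back to the older 0.1 `sid` key). The resolver host and origin values are hypothetical, and real logs would need cleaning (bots, duplicate clicks) that this ignores.

```python
from collections import Counter
from urllib.parse import parse_qs, urlparse


def origin_shares(openurl_requests):
    """Return each origin's share of the given OpenURL requests.

    Uses the OpenURL 1.0 referrer key (rfr_id) if present, otherwise the
    older OpenURL 0.1 key (sid); requests with neither count as "unknown".
    """
    counts = Counter()
    for url in openurl_requests:
        params = parse_qs(urlparse(url).query)
        origin = (params.get("rfr_id") or params.get("sid") or ["unknown"])[0]
        counts[origin] += 1
    total = sum(counts.values())
    return {origin: n / total for origin, n in counts.items()}


# Hypothetical resolver log entries; hosts and origin values are made up.
sample_requests = [
    "https://resolver.example.edu/openurl?rfr_id=info:sid/discovery.example.edu&rft.atitle=A",
    "https://resolver.example.edu/openurl?rfr_id=info:sid/discovery.example.edu&rft.atitle=B",
    "https://resolver.example.edu/openurl?sid=scholar.example.com&atitle=C",
    "https://resolver.example.edu/openurl?atitle=D",
]

for origin, share in origin_shares(sample_requests).items():
    print(f"{origin}: {share:.0%}")
```

Which origin labels a given discovery service or Google Scholar actually sends is something to check against your own logs rather than assume.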





This bright picture seems to contradict the sense we get (via surveys etc) that more and more users favor Google, Google Scholar, Mendeley etc, and that the university library's role as a gateway for discovery is slowly declining.


The Ithaka S+R US Faculty Survey 2012 shows a slight improvement in 2012, though the general trend is down.

Lorcan Dempsey notes, “They have observed a declining role for the "gateway" function of the library, partly because it is at the wrong level or scale. Google and other network level services have become important gateways.”

So what is happening here? Is there a shift to other, non-library sources of discovery or not?


Lots of unknowns

One way to reconcile this contradiction is to note that undergraduates are typically by far the largest group of users, and we know undergraduates love web scale discovery while other groups, particularly faculty, do not. So it is no surprise that usage goes up when web scale discovery is first implemented, even while the most sophisticated users may be favoring something else.

A typical year-on-year increase in web scale discovery usage is good news. Still, I suspect part of it is inertia; in my experience it takes a while for any major change to filter down to users. So if you implemented web scale discovery in, say, 2012, you will likely continue to see gains until 2015 (assuming a 4-year course), when the freshmen who were the first batch to start using web scale discovery become final-year students. Before that, the more senior students will likely ignore it.
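The inertia argument is simple cohort arithmetic. Assuming equal-sized entering cohorts, a fixed four-year course and (as suggested above) that students who started before the launch mostly ignore the new tool, the share of the student body that has used web scale discovery from their first year grows by a quarter each year. A small Python sketch:

```python
def share_of_students_since_launch(year, launch_year, course_length=4):
    """Fraction of enrolled cohorts that entered in or after the launch year.

    Assumes one equal-sized entering cohort per year and a fixed course
    length, so the cohorts enrolled in `year` entered in
    year - course_length + 1 through year.
    """
    entry_years = range(year - course_length + 1, year + 1)
    since_launch = sum(1 for entry in entry_years if entry >= launch_year)
    return since_launch / course_length


for year in range(2012, 2017):
    print(year, share_of_students_since_launch(year, launch_year=2012))
# 2012 0.25
# 2013 0.5
# 2014 0.75
# 2015 1.0
# 2016 1.0
```

Under these assumptions growth from this effect alone levels off once the whole student body has entered after the launch, which is why year-on-year gains in the first few years may say little about long-term adoption.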

It still surprises me that many students are unaware that our default search tab covers articles as well as books, though part of this could be a UX issue.

It's also unclear to me what, say, a 10% year-on-year increase in usage of your discovery system really means, since we don't know how much the "demand for discovery" went up. Suppose that between 2014 and 2015 the number of "discovery incidents" for which one could conceivably use your discovery service doubled (a 100% increase), but usage of web scale discovery went up only 10%. That doesn't sound so good, does it?
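The point can be made concrete by treating usage as a share of discovery demand. If usage grows by u and demand by d, the new share relative to the old one is (1 + u) / (1 + d). With the hypothetical 10% and 100% figures above:

```python
def relative_share(usage_growth, demand_growth):
    """New share of discovery needs served, relative to the old share.

    share = usage / demand, so after growth the ratio of the new share to
    the old share is (1 + usage_growth) / (1 + demand_growth).
    """
    return (1 + usage_growth) / (1 + demand_growth)


ratio = relative_share(usage_growth=0.10, demand_growth=1.00)
print(f"New share is {ratio:.2f} of the old share")  # New share is 0.55 of the old share
```

In other words, a "10% increase" in this example would mean the discovery service served only about 55% of the share of discovery needs it served the year before.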

Lastly, web scale discovery systems also work beautifully as known item article search tools: users simply cut and paste article titles. So the increased usage seen in (a), the frequent use as an OpenURL origin in (b), or even the higher usage per FTE compared to peer control institutions in (c) could simply be due to the increased ease of finding known articles, as opposed to genuinely helping with discovery.

In short, there is a lot of uncertainty here. It's not easy for libraries to know whether web scale discovery has helped shift the balance of discovery from outside the library back to the library, in either the short or the long term.

I suspect the answer is probably not. Indirect evidence from my institution, namely usage of our proxy bookmarklet (which is used primarily when users do not start from the library homepage) and link resolver usage from Google Scholar, suggests that even though our implementation of Summon was successful, usage of both tools continued to rise. In fact, in the year after we implemented Summon, usage rose even higher than before. If this trend continues.....



Four Discovery Scenarios

In the long run, I suspect the ultimate fate of web scale discovery will fall into one of these four broad categories:


I Discovery Dominant - Web Scale Discovery continues to grow and becomes the prime source of discovery, displacing Google, Google Scholar and other external sources (Unlikely)

II Discovery Deferred - Web Scale Discovery continues to be used alongside other non-library tools, most often as a secondary source after users have looked elsewhere first (Possible)

III Discovery Diminished - In this scenario, web scale discovery services have been displaced from their discovery role and are used for known item search only. Kind of like a glorified catalogue, except that it includes article, conference paper and other titles (Perhaps)

IV Discovery Decommissioned - This is the most extreme scenario, where the whole system is removed and doesn't even play a role in known item searching. (Unlikely)



Discovery use matrix

After reading all the various arguments about the position of search in libraries, I was initially confused. But let's try to create a framework here.

Let's consider 2 dimensions of use here.

Firstly, are users using web scale discovery for known item search or for discovery?

Secondly, is it their primary go-to tool, or is it secondary?

The table below shows hypothetical usage of one library's discovery search.


                  Known item search   Subject search
First stop        50%                 10%
Secondary stop    30%                 40%
Total             80%                 50%

We could split this up further by content type, say searches for books vs articles, but let's keep it simple.

In this hypothetical example, say users have a "discovery need" 100,000 times a year (look at the second column).

10% of the time, they go straight to your discovery service and start typing keywords.

40% of the time, they do some preliminary search elsewhere eg Wikipedia, Google, Google Scholar, but they do eventually end up doing subject searches in library discovery for whatever reason.

50% of the time, they totally ignore our tools and use something else such as Google Scholar, Mendeley to search.

Note: If one cares about library-supplied tools in general, one would also need to take into account the subject abstracting and indexing databases provided by the library, such as Scopus, but I will ignore this for now.

Similarly say users have a need to find a known item 100,000 times a year (Look at the 1st column)

50% of the time, they go straight to your discovery service and start searching for the known item by entering article or book titles.

30% of the time, they do some preliminary search elsewhere eg Wikipedia, Google, Google Scholar, but they do eventually end up doing known item search in library discovery.

20% of the time, they totally ignore your discovery service for known item search. Perhaps they found the item using another tool, either library-related (such as the traditional catalogue or an A-Z browse list) or non-library (Google, Google Scholar); perhaps they used link resolvers from Google Scholar or LibX; or perhaps they just gave up.
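The hypothetical matrix can be turned into yearly counts with a few lines of Python. This is just arithmetic on the made-up percentages above (100,000 needs of each kind per year); the "elsewhere" figure is the share the text describes as bypassing library discovery entirely.

```python
# Hypothetical figures from the discovery use matrix above.
NEEDS_PER_YEAR = 100_000

USE_MATRIX = {
    "known item": {"first stop": 0.50, "secondary stop": 0.30},
    "subject": {"first stop": 0.10, "secondary stop": 0.40},
}


def summarize(matrix, needs):
    """Convert first/secondary-stop shares into yearly counts per search type."""
    summary = {}
    for kind, stops in matrix.items():
        first = round(stops["first stop"] * needs)
        secondary = round(stops["secondary stop"] * needs)
        summary[kind] = {
            "first stop": first,
            "secondary stop": secondary,
            "total": first + secondary,
            "elsewhere": needs - first - secondary,
        }
    return summary


for kind, row in summarize(USE_MATRIX, NEEDS_PER_YEAR).items():
    print(kind, row)
```

`round` is used because multiplying binary floating-point shares such as 0.3 by 100,000 does not always give an exact integer.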


Discovery is not just keyword based search

One thing to note is that when there is a desire for discovery, search based tools like web scale discovery or Google Scholar are not the only options.

There are recommender sources or systems (both humans and machines), as well as citation-based methods of discovery. Here I am assuming keyword-based discovery. In other words, when there is a need to enter keywords, how often do people use the library discovery service versus other systems?

It is possible to envision a future where a powerful Google Now-type recommender system becomes so dominant that keyword-based discovery becomes obsolete, but let's put aside this possibility.



Scenario I : Discovery Dominant 


                  Known item search   Subject search
First stop        Variable            High
Secondary stop    Variable            Moderate
Total             Variable            Very high

This would be the ideal scenario. Our discovery tools become the dominant first stop for discovery.

How popular such tools are for known item search would be less critical, though I suspect it's easier to optimize for one than for both.


Unfortunately, I think this scenario is unlikely. It implies at least one of the following:

a) All other non-library discovery sources dry up.

Google giving up on Google Scholar would be one example.

While Google Scholar is doing well now after 10 years, this has always been a possibility. Still, it is hard to believe this will happen to all the competitors.

b) We create tools so compelling that they make all other sources pale in comparison.

What would such compelling differences be?

I suspect the following would NOT be among them:

a) Gesture/motion-based inputs, e.g. Kinect/Leap Motion

b) Augmented/virtual reality outputs, e.g. Oculus

While such features might eventually be included in future discovery tools, they don't give those tools any competitive advantage. They would be the equivalent of, for example, mobile responsive design features: expected, but not a differentiator.

The following might give us a fighting chance:

Personally tuned discovery layers - Championed by Ken Varnum (see his two recent presentations, "Personally-Tuned Discovery" and "Library Discovery: From Ponds to Oceans to Streams"). As I understand it, the idea is that libraries can create specially tuned features, particularly scopes, that appeal specifically to their communities. In the context of my institution, for example, we could create filters and features specially designed to work well with research on Southeast Asia. A more generic global system is unlikely to be tuned to such an extent.

Improved semantic search - No doubt Google etc are working on this. However, libraries, and particularly content providers like EBSCO, have tons of expert-assigned subject headings in specialised thesauri. Could a crosswalked "mega thesaurus" be leveraged to improve relevancy? Do note, I have practically no idea how linked data will come into this.
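To make the "mega thesaurus" idea a little more concrete, here is a minimal Python sketch of crosswalk-based query expansion: when a query contains a known concept, that concept is replaced with an OR group of equivalent headings drawn from different thesauri. The crosswalk, its headings and the output query syntax are all hypothetical. A real system would match on parsed terms or linked data identifiers rather than raw substrings, and would need to stop one concept from matching inside another concept's expanded headings.

```python
import re

# Hypothetical crosswalk: each shared concept maps to equivalent headings
# from different (made-up) thesauri.
CROSSWALK = {
    "southeast asia": ["Southeast Asia", "Asia, Southeastern", "South-East Asia"],
    "rice farming": ["Rice -- Cultivation", "Paddy farming", "Rice production"],
}


def expand_query(query, crosswalk):
    """Replace each crosswalked concept in the query with an OR group of its headings."""
    expanded = query
    for concept, headings in crosswalk.items():
        variants = [concept] + [h for h in headings if h.lower() != concept]
        group = "(" + " OR ".join(f'"{v}"' for v in variants) + ")"
        # A function as the replacement avoids backslash/group-reference handling.
        expanded = re.sub(re.escape(concept), lambda _m: group, expanded, flags=re.IGNORECASE)
    return expanded


print(expand_query("rice farming in Southeast Asia", CROSSWALK))
```

Words that match no concept (here "in") are passed through unchanged, so the sketch only broadens the parts of a query it recognises.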


Scenario II : Discovery Deferred


                  Known item search   Subject search
First stop        Low to moderate     Low
Secondary stop    Moderate to high    Moderate
Total             Moderate            Moderate

This is the scenario I think is closest to the current situation. Our discovery tools are seldom the first stop in the discovery process, but many users do use them in combination with Google, Google Scholar etc. So overall use is moderate.

Use for known item searching is generally low to moderate. While tools such as library links in Google Scholar, and link resolver support in Mendeley and other systems, mean that users can check availability directly via the link resolver after discovery, there are still sufficient numbers of users who enter article or book titles into discovery services.

One can encourage use of the discovery service as a secondary discovery source by various means. See for example "6 ways to use Web Scale Discovery tools without visiting library sites", which provides some ideas, chief among them using LibX. If you have LibX with Summon, you can even pull off something interesting with the API.




For driving subject searches/discovery from Wikipedia, I've blogged about John Mark Ockerbloom's Forward to Libraries service, which takes the title of the Wikipedia article you are on (among other things) and runs the same search in your discovery service.
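The core step of this approach, turning the Wikipedia article you are on into a search in your own discovery service, can be sketched in Python as below. This is not Forward to Libraries' actual code, which does more than this; the discovery endpoint and its `q` parameter are hypothetical placeholders for whatever your own service uses.

```python
from urllib.parse import unquote, urlencode, urlparse

# Hypothetical discovery search endpoint; substitute your own service's URL.
DISCOVERY_SEARCH_URL = "https://discovery.example.edu/search"


def wikipedia_title(article_url):
    """Extract a readable title from a Wikipedia article URL."""
    path = urlparse(article_url).path       # e.g. "/wiki/Hype_cycle"
    slug = path.rsplit("/", 1)[-1]          # "Hype_cycle"
    return unquote(slug).replace("_", " ")  # "Hype cycle"


def discovery_search_url(article_url, base=DISCOVERY_SEARCH_URL):
    """Build a keyword search in the discovery service from a Wikipedia article title."""
    return f"{base}?{urlencode({'q': wikipedia_title(article_url)})}"


print(discovery_search_url("https://en.wikipedia.org/wiki/Hype_cycle"))
# https://discovery.example.edu/search?q=Hype+cycle
```

`unquote` handles percent-encoded titles (non-ASCII characters), and `urlencode` re-encodes the title safely for the discovery service's query string.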









Scenario III : Discovery Diminished


                  Known item search   Subject search
First stop        Moderate for books  Extremely low
Secondary stop    Moderate for books  Extremely low
Total             Moderate for books  Extremely low


This is basically the situation prior to web scale discovery in 2009. Federated search wasn't successful; most users went to the library catalogue to do known item searches for books and, to a lesser extent, subject searches for books, but not for articles.

So with web scale discovery we have happily moved past this scenario and are at least in Scenario II, where some discovery at least happens in our tools, right?

Is there any chance we should fall back to Scenario III?

I think Utrecht University believes so; it was one of the first, if not the first, to talk about giving up discovery to focus on delivery or fulfillment.






I've blogged about this in the past. Essentially the idea here is that Google etc have won the discovery battle already and there is no point trying to compete with them.

Libraries should focus on supporting fulfillment. In other words, discovery occurs elsewhere, say on Google or Google Scholar, and we provide the means to check for and obtain full text.

Google Scholar, of course, has the well-known library links program.
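Link resolvers are what make this fulfillment model work: the discovery tool passes citation metadata in an OpenURL, and the library's resolver works out where the full text is. Below is a minimal Python sketch of building an OpenURL 1.0 (Z39.88-2004) key/encoded-value link for a journal article; the resolver base URL and the sample citation are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical link resolver base URL; each library has its own.
RESOLVER_BASE = "https://resolver.example.edu/openurl"

ARTICLE_FIELDS = ("atitle", "jtitle", "issn", "date", "volume", "issue", "spage")


def journal_article_openurl(citation, base=RESOLVER_BASE):
    """Build an OpenURL 1.0 (Z39.88-2004) KEV link for a journal article.

    Only citation fields that are present and non-empty are included.
    """
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    for field in ARTICLE_FIELDS:
        if citation.get(field):
            params[f"rft.{field}"] = citation[field]
    return f"{base}?{urlencode(params)}"


# Made-up citation, e.g. as taken from a search result or a Wikipedia reference.
citation = {
    "atitle": "An example article",
    "jtitle": "Journal of Examples",
    "issn": "1234-5678",
    "date": "2014",
    "volume": "10",
    "spage": "1",
}
link = journal_article_openurl(citation)
print(link)
```

The resolver, not the discovery tool, decides whether the library has access, which is why this model lets discovery happen anywhere that can emit such a link.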



A newer idea is the collaboration WorldCat is working on with Wikipedia, which allows linking from references in Wikipedia to full text or library catalogues.

 http://en.wikipedia.org/wiki/Wikipedia:OCLC/Search


The title of the Utrecht talk is "A library without a catalogue". Interestingly, while the speaker has decided they won't bother to implement a web scale discovery system and will give up their federated search, they are not sure they can give up the traditional catalogue (as stated at the end).

I suspect that while one can give up web scale discovery, giving up a catalogue for known item search is harder. Sure, one can contribute holdings to WorldCat, which allows the holdings to be shown in various places including Google Books, but can this truly work for everything?



I doubt any library would go without a search facility for known item searching, or a comprehensive search of what is available to the institution, since it wouldn't be much more work to provide one, especially after the effort spent populating the ILS or knowledgebase.

Under this scenario, libraries would maintain something similar to classic catalogues, optimised for known item searching of books, DVDs etc. The difference is that they may also include an article index, but unlike the discovery services available between 2007 and 2014, they would give up any pretense of serving discovery at all.

Some libraries may totally dispense with this if most major outside discovery tools have good linkages with link resolvers, catalogue apis etc. But as I said this is unlikely.

By focusing on known item search of books and articles, relevancy would be much easier to get right than when trying to balance discovery and known item search needs. A bento box style search might even make more sense.
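As a rough illustration of why known item search is easier to tune, here is a minimal Python sketch of a bento-style search: the same query is sent to each source separately, results stay in their own box, and ranking simply puts exact title matches ahead of titles that merely contain the query. The in-memory sources stand in for whatever catalogue and article index APIs a library actually has; both they and the ranking rule are assumptions for illustration.

```python
def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't block a match."""
    return " ".join(text.lower().split())


def known_item_rank(query, titles):
    """Exact title matches first, then titles containing the query; drop the rest."""
    q = normalize(query)
    exact = [t for t in titles if normalize(t) == q]
    partial = [t for t in titles if normalize(t) != q and q in normalize(t)]
    return exact + partial


def bento_search(query, sources, per_box=3):
    """Send the same query to every source, keeping each source's results in its own box."""
    return {name: known_item_rank(query, titles)[:per_box] for name, titles in sources.items()}


# Hypothetical stand-ins for a catalogue and an article index.
SOURCES = {
    "Books": ["Beyond the hype cycle: a study", "The Hype Cycle", "Library catalogues"],
    "Articles": ["Web scale discovery in academic libraries", "The hype cycle revisited"],
}

for box, results in bento_search("the hype cycle", SOURCES).items():
    print(box, results)
```

Because each box is ranked independently and only for title lookup, there is no need to balance book records against article records in a single relevancy-ranked list.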



Scenario IV : Discovery Decommissioned

                  Known item search   Subject search
First stop        Low to zero         Zero
Secondary stop    Low to zero         Zero
Total             Low to zero         Zero

This is the least likely scenario. In this scenario, use of library discovery tools, for both known item and subject search, is utterly destroyed!

This scenario was described in "The day library discovery died - 2035". In it, open access has won out completely, with open access being the norm for both books and articles.

"How academic libraries may change when Open Access becomes the norm" details the implications.

After summarizing the argument in the scenario above about losing the war in discovery and focusing on delivery, I proposed that the rise of open access

"has the potential to disrupt even the delivery or fulfilment role. In an open access world when most articles or perhaps even books (open access models for books exist, as well as 'all you can eat' subscription services like Scribd, Oyster, Amazon Prime) can be gotten for free, academic libraries' role in both discovery and fulfillment will be greatly diminished."

In such a world, libraries would no longer need to maintain long lists of holdings, either for books and other traditional catalogue items or for an article index. Libraries don't even

As everything or nearly everything is open access, discovery and delivery would be coupled together: where you do your discovery is where the articles or books are delivered to you. There would still be some portion that is not immediately available (special collections, older books, articles not digitized), but in time this would shrink.

Even in such a world, some argue there may be a role for libraries in aiding discovery by providing better curated collections, metadata etc, based on the personally tuned discovery argument above. It is of course unclear whether that will be enough.


Conclusion

This is an extremely speculative piece, of course, but I had to get it off my chest.

If you found it thought-provoking, or at least entertaining, do comment or share.





This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.