Thursday, July 16, 2015

5 things Google Scholar does better than your library discovery service

I have had experience implementing Summon in my previous institution and currently have some experience with EDS and Primo (Primo Central).

The main thing that struck me is that while they have differences (e.g. the default Primo interface is extremely customizable but requires lots of work to get into shape, Summon is pretty much excellent UI-wise out of the box but less customizable, and EDS is basically Summon but with tons of features already included in the UI), they pretty much have the same strengths and weaknesses vis-à-vis Google Scholar.

So far, my experience with faculty here at my new institution is similar to that at my former one: more and more of them are shifting towards Google Scholar and even Google.

Though web scale discovery is our libraries' closest attempt so far at mimicking Google technology, it is still different, and it is in the differences that Google Scholar shines.

Why is Google Scholar a darling of faculty?

To anticipate the whole argument, Google Scholar serves one particular use case very well - the need to locate recent articles and to provide a comprehensive search.

Library discovery services, by contrast, are hampered not just by technological issues but also by the need to balance support for various use cases, including known item searching for book titles, journal titles and database titles.

It is no surprise a jack of all trades tool comes out behind.

Here are some things Google Scholar does better.

1. Google Scholar updates much quicker

One piece of feedback I tend to get is faculty asking me why their paper (often hot off the press) isn't appearing in the discovery service.

In the early days of library discovery service, often the journal title simply wasn't covered in the index, so that was that.

These days more often than not the journal title would be listed as covered in the index particularly if it was a well known mainstream journal. So why wasn't the particular article in the discovery service?

Unfortunately, typically I would discover the issue lies with the recency of the article. The article was so new it didn't appear in the discovery service index yet.

Yet I would notice time and time again that whenever an article appeared on, say, Springer, it would show up in Google Scholar within a day or two, while it could take over a month to appear in our discovery service index, if it appeared at all.

Google Scholar simply updates very quickly using its crawlers, compared to library discovery services, which may use other, slower methods to update.

I have also found that library discovery services often do not index "early access/edition" versions, while Google Scholar, whose harvesters seem to happily grab anything on an allowed publisher domain, has fewer issues.

The discovery service providers might argue that Google Scholar employs almost zero human oversight and quality control, and that as such it provides less accurate results.

This may be so, but it's unclear if the trade-off is worth it, in today's fast paced world where anxious faculty just want to see the article with their name appear.

2. Covers scholarly material not found in the usual "Scholarly" sources

Besides speed of updates, Google Scholar shines in identifying and linking to scholarly material even if it is not found on the usual publisher domains.

Take the experience back in 2014 of a Library Director who was trying to access a hot new paper on "Evaluating big deal journal bundles".

The library director was smart enough to know it wouldn't appear in the discovery service and so put in an ILL request for the article. It turns out she could have just used Google Scholar to find a free PDF that the author had linked off his homepage.

Here we see the great ability of Google Scholar's harvester to spot "Scholarly" papers (famously with some false positives), even if they reside on non-traditional sites. For instance, it can link to PDFs that authors have linked off their personal homepages (which may or may not be on university domains).

This is something none of our library discovery services even attempt to do. In general, our discovery services build their index at a higher level of aggregation, typically at journal level or database level, so there is no way they would spot individual papers sitting on some unusual domain.

3. Greater and more reliable coverage of Open Access and free sources

It's an irony that I find discovery services generally have much poorer coverage of Open Access than Google Scholar.

Let's not even start with hybrid journals, whose open access articles often sit in top journals yet are impossible to correctly identify and find in discovery services. (I notice the example tested in the article on the difficulty of finding hybrid articles works for Google Scholar.)

How about Gold journals? Most discovery services have indexed DOAJ (Directory of Open Access Journals), but many libraries have such a bad linking experience (linking may not be at article level and/or may lead to broken links) that they just turn it off. (Discovery indexes that cover OAIster might have better luck?)

How about institutional repositories, something created and managed by libraries? On most discovery services, you can typically add only the contents of your own institutional repository, plus a very limited selection of other institutions' repositories (always from customers of the same discovery service).

Usually you can add only the libraries that have volunteered to open their institutional repositories to other customers on the same discovery service and this is a very short list (probably a dozen or so).

The list is even shorter when you realise some of these repositories are not wholly full text, and the discovery service makes it difficult to offer only the full text items from them when you activate them, so you are eventually forced to turn them off.

I am not well versed enough in institutional repositories and OAI-PMH to understand why it is so difficult to figure out which items listed in them are full text and which are not, but all I can say is that Google Scholar's harvesters have no such issues identifying free full text and making it available. I would add that some of it is not quite legal (e.g. look at the PDFs on ResearchGate etc. surfacing in Google Scholar).

Reasons #2 and #3 above are the main reasons why Google Scholar is by far the most efficient way to find free full text, and why apps like the Google Scholar Chrome button and Lazy Scholar are so useful.

4. Better Relevancy due to technology and the need to just support article searching

Going through the few head to head comparisons between Google Scholar and discovery services in the literature (refer to the excellent - Discovery Tools, a Bibliography), it's hard to say which one is superior in terms of relevancy, though Google Scholar does come out on top a few times.

My own personal experience is that Google Scholar does indeed have some "secret sauce" that makes its ranking better. There are many reasons to suspect it is better: it can personalize, it uses many more signals (particularly the network of links and link text), and it has the sheer technical know-how that made Google the world's premier search company.

A less often expressed reason why Google Scholar seems to do so well is that, unlike library discovery services, it is designed for one primary use case: allowing users to find journal literature.

A library discovery service, on the other hand, has 5 possible use cases according to Ex Libris.

I would argue library discovery services are handicapped because they need to handle at the very least "Access to known book or journal" + "Find materials for a course assignment" + "Locate latest articles in the field".

Trying to balance all these cases simultaneously (which includes ranking totally different material types such as Books/articles/DVDs/Microforms etc) results in a relevancy ranking that can be mediocre compared to one that is optimised just for finding relevant journal articles aka Google Scholar.

During the early days of library web scale discovery, libraries and discovery service vendors learnt a costly lesson: despite the name "Discovery", a large proportion of searches (I see around 50% in most studies) were for known items. This included known item searches for book titles, journal titles and database titles.

Not catering for such users would typically lead to great unhappiness, so many discovery service vendors started working on their relevancy to support known item searching, adding features like featured boxes and recommenders to help with this.

All this meant that library web scale discovery services would always be at a disadvantage compared to Google Scholar, which focuses on one main goal, the discovery of articles, as nobody goes to Google Scholar to look for known book titles, journal titles or database titles.

They do go to Google Scholar for known article title searches but "ranking" of such queries is easy given how unique and long the titles tend to be. In any case, doing well for article known item search is less a matter of ranking and more a matter of ensuring the article needed is in the index and as we have seen above Google Scholar is superior in terms of coverage due to broader sources and faster updates.

5. Nice consistent features

Google Scholar has a small but nice set of features. It has a "related articles" function that you won't find in most web scale discovery services unless you subscribe to the BX recommender.

Many users like the "Cited by" function. Your library discovery service doesn't come with that natively, though mutual customers of Scopus or Web of Science can get citation networks from those two databases.

Because Google Scholar creates its own citation network, it can not only rank better but also provide the very popular Google Scholar Citations service. Preliminary results from this survey seem to indicate that Google Scholar Citations profiles are more popular than profiles on ResearchGate etc.

But more important than all this is the fact that it is worthwhile to invest in mastering Google Scholar. All major academic libraries will support Google Scholar via library links/OpenURL, so you can carry this skill with you no matter which institution you are at.

On the other hand, if you invest in learning the library discovery service interface at your current institution, there's no guarantee you will have access to the same system at your next institution, given that there are four major discovery services on the market (not counting libraries that use discovery service APIs to create their own interfaces).

Bonus #6 Google Scholar provides more details in the "Search snippets"

When I search in Google Scholar, I inevitably get results that show the keyword I searched for highlighted in what Google calls search snippets, in the description below the title.

For example here's me doing a vanity search for my name "Aaron Tay".

Google Scholar helpfully highlights where my keywords appear in the search snippets/descriptions, allowing me to quickly judge whether the result is relevant and whether I want to click in.

I would add, Google Scholar doesn't always do this, and sometimes will show snippets or descriptions where my keywords don't appear. Still, by comparison, our library discovery services show matches of keywords in snippets a lot less.

Here for example is a result from Summon. (Primo is similar.)

The result from EBSCO Discovery Service is interesting.

It highlights the keyword which appears in the subject heading!

When you click into the details page, you notice EDS not only has subject headings but also a nice abstract, giving context.

As an aside, the fact that EDS has subject headings and abstracts while Summon/Primo do not is one of the boasting points of EDS.

Still, maybe it's just me, but where possible I much prefer that the discovery system display and highlight the keyword I am searching for in the context of the full text and/or abstract as the description, rather than just in the subject heading or author (both would be fine).
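To make "keyword in context" concrete, here is a minimal Python sketch of that kind of snippet. It is only an illustration: the function name `make_snippet`, the `width` parameter and the `**` highlight marker are hypothetical, and nothing here is claimed to be how Google Scholar or any discovery service actually builds its descriptions.

```python
import re


def make_snippet(text, query, width=60, marker="**"):
    """Return an excerpt of `text` around the first query term, with matches marked.

    Hypothetical helper for illustration only.
    """
    terms = [re.escape(term) for term in query.split()]
    if not terms:
        return text[: width * 2]

    # Match any of the query terms, ignoring case
    pattern = re.compile("|".join(terms), re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        # No keyword found: fall back to the opening of the text
        return text[: width * 2]

    # Keep `width` characters of context on each side of the first match
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    excerpt = text[start:end]

    # Wrap every matching term inside the excerpt with the marker
    highlighted = pattern.sub(lambda m: marker + m.group(0) + marker, excerpt)

    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + highlighted + suffix


abstract = "A study of web scale discovery by Aaron Tay compares several library systems."
print(make_snippet(abstract, "aaron tay", width=10))
```

The point of the sketch is that this only works if the system holds text in which the keyword actually occurs (an abstract or full text), which is why the choice of what gets indexed matters so much for what the user sees.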

I can go further in depth on why I suspect Library discovery services tend to fall down in this area (Hint: Google Scholar indexes full text a lot more than our library systems), but from the user point of view we just don't care why.


Does this mean library web scale discovery services are useless? Not really.

I would argue that web scale discovery tools are designed to be versatile.

They may come off second best in the following cases:

  • In-depth literature review (both Google Scholar and Subject indexes are superior to web scale discovery in different ways)
  • Known item search for books/journal titles/database titles (Catalogues and A-Z journals and database lists are superior)

There are no other tools that can be "pretty good" in all these tasks, hence their popularity with undergraduates who want an all-in-one tool.

Can we solve this issue of being jack of all trades but master of none?

One interesting idea I have heard and read about at various conferences, including EBSCOhost's webinars, is a popup that appears after the user enters a keyword and clicks search, asking whether they are trying to find a known item, doing a subject search or following some other scenario. Based on the answer, the search would execute differently.

Somehow though I suspect it might get annoying fast.
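For what it's worth, the routing behind such a popup could be sketched roughly as below. Everything in it is an assumption made for illustration: the `Intent` choices, the `search_settings` function and the field and material type names are hypothetical, and I am not aware of any vendor's system that works exactly this way.

```python
from enum import Enum


class Intent(Enum):
    """The answer the user picks in the (hypothetical) popup."""
    KNOWN_ITEM = "known item"
    SUBJECT = "subject search"


def search_settings(query, intent):
    """Map the user's popup answer to hypothetical search settings."""
    if intent is Intent.KNOWN_ITEM:
        # Known item: match the whole phrase against titles only,
        # across books, journals and databases.
        return {
            "query": '"' + query + '"',
            "fields": ["title"],
            "material_types": ["book", "journal", "database"],
        }
    # Subject search: match keywords anywhere, favouring articles.
    return {
        "query": query,
        "fields": ["title", "abstract", "full_text"],
        "material_types": ["article"],
    }


print(search_settings("Nature", Intent.KNOWN_ITEM))
print(search_settings("open access citation advantage", Intent.SUBJECT))
```

The two branches stand in for the two separately tuned rankings argued for earlier: one for known items and one for finding articles.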

