Wednesday, April 18, 2012

How a "Facebook for researchers" platform will disrupt almost everything

I recently attended a talk about the Mendeley Institutional Edition (powered by Swets). I am fairly familiar with Mendeley, Zotero and other reference managers (though my main usage is with EndNote) but had not looked at the institutional version yet.

You can read about the exact features of the service (and also here), but more importantly, while looking at the features during the talk I finally grasped how powerful and disruptive a real and dominant "Facebook for researchers" platform is going to be.

Of course, the road to such a goal is strewn with many failures, including Elsevier's 2collab, Labmeeting, etc. (check a 2008 report on such tools and see how many still stand). Attempts have been, or could be, made from the social bookmarking/reference management angle (e.g. CiteULike/Connotea/Mendeley), the discovery/search angle (potentially web scale discovery/next generation catalogues with social features), or even more straightforwardly from identity management (e.g. ResearcherID).

But no matter who wins, how would a dominant "Facebook for researchers" platform affect academic research, and hence academic libraries? What areas would it disrupt?

Note: I am going to mostly use the Mendeley Institutional Edition as a stand-in for this hypothetical dominant "Facebook for researchers" platform. I actually haven't used the institutional service beyond looking at brochures, and I am not saying that Mendeley will eventually succeed either.


Disrupt search including webscale discovery tools

There is a reason why Google is so worried about Facebook coming after it in search, and is desperately trying to force people onto its own version of Facebook. Simply put, the more the system knows about you, the better the recommendations, and potentially the search results, you can get.

In the academic/research world the advantages are perhaps smaller, but still considerable.

Mendeley, CiteULike, etc. are already starting to show hints of this: when you search, you can see how many people have put a certain article in their reference libraries, which in itself could be a strong signal of quality. Think of it as having articles ranked by times cited, except you don't have to wait a year or so for the paper to be cited. You don't necessarily cite everything in your reference library, of course, but studies are starting to show a strong correlation between the two measures.

And that's just the beginning. One could imagine Mendeley or similar tools allowing you to restrict searches to take into account only people in your institution, your specific groups, your friends, etc., or doing collaborative filtering for recommendations based on researcher profile characteristics (see Mendeley's version) and more, i.e. "researchers like you have read this".
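To make the "researchers like you have read this" idea concrete, here is a minimal sketch in Python, with entirely made-up data, of the simplest co-occurrence flavour of collaborative filtering. This is not Mendeley's actual algorithm (which I don't know); it just illustrates how library overlap alone can drive recommendations.

```python
from collections import Counter

# Made-up data: which articles each researcher keeps in their library.
libraries = {
    "alice": {"paper_a", "paper_b", "paper_c"},
    "bob":   {"paper_a", "paper_b", "paper_d"},
    "carol": {"paper_b", "paper_c", "paper_d"},
}

def recommend(user, libraries, top_n=5):
    """Suggest articles the user lacks, weighted by library overlap."""
    own = libraries[user]
    scores = Counter()
    for other, items in libraries.items():
        if other == user:
            continue
        overlap = len(own & items)      # shared items = similarity weight
        for item in items - own:        # only suggest unseen articles
            scores[item] += overlap
    return [item for item, _ in scores.most_common(top_n)]

print(recommend("alice", libraries))    # ['paper_d']
```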

Mendeley currently claims to have 150 million unique items (Jan 2012) when you search it: "This makes it, according to Victor Henning, the company's CEO and co-founder, the world's largest research database."

Depending on how one defines "research database", this is probably false. Web scale discovery systems like Serials Solutions' Summon and OCLC's WorldCat Local have more items: Summon currently has 249 million items, for example, and WorldCat Local 663 million articles.

Still, it's clear Mendeley is catching up, and I could be wrong, but they probably have partnerships with publishers pulling in metadata, as I doubt crowdsourcing alone could gather so much so quickly. In fact, crowdsourcing would be a distinct advantage, since one could find items like datasets and reports inside that would not typically be found in a traditional discovery product.

Currently Mendeley gets you to full text using OpenURL, very much like Summon, and provides an option to upload your library holdings. While I am not sure what uploading your library holdings does at the moment, I would guess it wouldn't be impossible to use it eventually to allow a "search within your subscriptions" option, or at least to show the OpenURL button only when an item exists, as Google Scholar does now.
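For illustration, a minimal sketch (with invented holdings data) of the kind of check that uploaded holdings would enable: show the OpenURL button only when the library's claimed coverage includes the article's year.

```python
# Invented holdings: ISSN -> list of (start_year, end_year) coverage ranges.
holdings = {
    "0028-0836": [(1990, 2012)],   # e.g. Nature online (dates made up)
    "1234-5678": [(2005, 2012)],   # a made-up ISSN
}

def show_openurl_button(issn, year):
    """True if the library's holdings claim coverage for this ISSN and year."""
    return any(start <= year <= end for start, end in holdings.get(issn, []))

print(show_openurl_button("0028-0836", 2008))   # True
print(show_openurl_button("0028-0836", 1985))   # False: outside coverage
```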

I suppose it will never completely replace your ILS, as I doubt such a platform will want to take on that function (though who knows?), but discovery layers that sit atop the ILS might well be disrupted.

Disrupt unique author identifier rivals

I don't know much about this area, but as far as I can tell there is no leading contender yet.

I know of attempts like ResearcherID by Thomson Reuters, Elsevier's Scopus Author ID, and an attempt at a standard with ORCID.

But just as Facebook Connect is pretty much making OpenID irrelevant, could a Facebook for researchers platform make efforts like ORCID irrelevant?

Mendeley provides a researcher profile, and if it becomes as dominant for researchers as Facebook is for general social networking, it would be the one ID to rule them all.

The Mendeley Institutional Version also claims to let you "track your members' publications", "view the reach of your publications", etc.


Provide better analytics

Imagine being able to see what papers, articles or entries your researchers are downloading and putting in their libraries. You might think: so what? We have download usage statistics (COUNTER-compliant or not), so we already know what is used.

Not quite. What about items that are open access, which researchers download directly, say via Google Scholar? What about items they find by searching Google that are not in the databases you traditionally track? Perhaps you don't care about those. But what about items you don't own, for which they never get around to requesting document delivery because they obtain them via other methods?




One can imagine that the degree of tracking available with signed-in users would be considerable; in theory one could capture all sorts of user behaviour throughout the research process.

One wonders whether the collaboration with Swets will eventually lead to linkage with backend systems, but that's a long way off.



Replacing your library website

Everyone knows the finding that practically zero percent of library users start from the library website, and I have written wondering, if this is the case, how much effort we should spend on it versus trying to reach users outside the portal. But assuming this "Facebook for researchers" takes off, it is likely to be as sticky as the real Facebook, and the amount of time spent there while doing research is likely to be very high.

Add the fact that it is likely to have a superior search experience (see above), and it will become the first stop for research (perhaps even giving Google and Google Scholar a run for their money; the CEO already claims people are using Mendeley to search instead of Google Scholar), further displacing library portals.

No big deal, right? Users weren't coming to us anyway. But given that this is the case, should libraries try to put our offerings and services into this platform?

The Mendeley Institutional Version is starting in that direction, with the ability to upload an A-Z list (to allow direct linking to e-resources), to "have teachers set up course packs to direct students to important content" (presumably just links to e-resources the library subscribes to, not scans of hardcopy material?), etc.

What else would a user really need from the library website if he can search for articles from the platform and get access to the full text via his library's subscriptions?

Not much, really. Perhaps he might want a way to contact librarians with questions on research or policy issues? Or perhaps the library would like to "push" important news and events to users? I suspect the latter is more a want of the library than of researchers, though :)


Targeted marketing

So say you want to market something on this Facebook for researchers platform.

I suppose a liaison librarian could create research groups in Mendeley and invite all researchers into the group to communicate with them (the equivalent of Facebook pages/groups), or link up using the librarian's personal Mendeley account (the equivalent of friending people on Facebook with your personal account), but are there other ways to reach them?

Well... if this was actually Facebook, you could buy an ad :)

In Why Google Is Terrified Of Facebook, there is a nice screenshot showing the amazing amount of granular targeting one can do.


Now imagine if libraries could do this: target specific library news and events at the people in your university most likely to be interested, instead of blindly mass-emailing everyone in the university, or even in a department. Say you have a speaker coming on an exotic topic: you could immediately target only the researchers who might be interested, based on their profiles or, better yet, on the papers they put in their libraries. The system might notice, say, that you have plenty of papers by researcher X in your library...

Of course, if one had really top-notch liaison librarians with their fingers on the pulse of every researcher's interests, one could sort of do this already, but realistically, for large universities, that would be very hard. Imagine a system that automatically maximises the chances of reaching the ones most interested, as in the sketch below.
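A toy sketch of that matching step, with invented data; nothing like this exists as far as I know, but it shows how little machinery the idea needs.

```python
# Invented data: each user's library as a list of (title, authors) records.
libraries_by_user = {
    "alice": [("On widgets", ["X. Researcher"]),
              ("More widgets", ["X. Researcher"])],
    "bob":   [("Unrelated work", ["Y. Other"])],
}

def likely_audience(speaker, libraries_by_user, min_papers=2):
    """Users with at least min_papers by the visiting speaker in their library."""
    return [user for user, papers in libraries_by_user.items()
            if sum(speaker in authors for _, authors in papers) >= min_papers]

print(likely_audience("X. Researcher", libraries_by_user))   # ['alice']
```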

I envision a system where you would still push news on your normal broadcast channels like blogs and your portal, but specifically targeted researchers would see events streaming onto their platform as they did their work.

Is this going too far, I wonder? What about privacy? Are librarians too noble to use such marketing tactics? I don't know. I have heard of libraries experimenting with Google AdWords and Facebook ads to target users, so this isn't quite unheard of...

Google AdWords seems to work, but Facebook ads didn't: "They found that Facebook advertising was not effective because that is not where students are spending their time when they're in research mode". You wouldn't have that problem with this hypothetical "Facebook for researchers" platform.


Issues

Of course, I am just wildly speculating. There are many major differences between social networking in general and its use for academic research, and some of the network effects that work for Facebook will probably be a lot weaker in the academic world.

For one thing, it's unclear whether researchers want to share what is in their libraries, either with each other or with the library. Aggregated stats are probably okay, I suspect, but that would reduce the effectiveness of some of the social signals.

It's still an open question whether a network for researchers will work best operating more like Twitter's default-open model with asymmetric links, or like Facebook's default-closed model with symmetric links, or some combination.

Unlike with Facebook, it is also unclear whether there will be one dominant winner. In social networking sites the network effect leads to one solution winning out, as you want to be on the same network as your friends.

In the academic world, if most of your "friends" are in the same institution, you would by definition be on the same supported platform. Or would the desire to collaborate across institutions online push towards one dominant solution?

Still, even without one dominant platform used by all researchers, a locally dominant platform would have a lot of power over your institution's users, as all the eyeballs would be there...


Libraries' positioning

Let's say I am even half right, and eventually such a platform comes to dominate the research world (or perhaps just locally, at the institutional level). What should libraries do?

Firstly, one has to recognise that Mendeley and its cousins should not be treated as just another reference manager like RefWorks or EndNote: they have far bigger ambitions. How it performs as a reference manager versus your existing solution, while important, is not the only or even the major point.

In fact, their whitepaper makes their ambitions very clear. There are ample references to discovery, Facebook likes and Twitter trending, and it pretty much makes arguments similar to this blog post. Then there is this passage...

"Many researchers have welcomed social media into their workflow, using Twitter, LinkedIn and Facebook to organise groups and share information. However these all-purpose platforms do not always have the unique functionality that researchers need, and involve them stepping out of their workflow to login, post a link or make a contribution"

followed by their determination to be in the digital workflow at all stages of research. Pretty clear, isn't it, that Mendeley wants to be that platform...

Indeed, reference managers are a very good base around which to build a crowdsourcing/social platform, because 1) there is value in using them even alone, so early adopters still benefit, and 2) they do not require the researcher to do anything extra on top of what they do normally.

Mendeley's strategy seems to be to give the personal version away free to build brand awareness and, now that there is a sufficient user base that Mendeley isn't a complete stranger to most librarians, to go after institutions. This phase seems to have been announced by the partnership with Swets.

I was a bit puzzled at first by the tagline "Institutional edition powered by Swets": in what sense is it powered by Swets, particularly when all (?) of the technology is Mendeley's?




But then I realized Swets was brought in more for its marketing and sales arm, which has relationships with libraries that Mendeley lacks.

The fact that Mendeley made this move is a compliment to the power and influence academic libraries have over users' choice of reference managers. While many researchers will end up trying and learning tools on their own, sizable numbers will be taught by librarians in their honours or postgraduate year and might end up using that tool for life, so it makes sense for Mendeley and Swets to court libraries.

But I guess the question from the library's point of view is: what is in it for us? Cynical as it seems, my current opinion is that while some reference managers are better than others, the differences aren't really great enough to be worth the switching costs.

*paranoid mode on*

If libraries start supporting one platform together, we could potentially end up creating a powerful entity that would make the library even more invisible in the research workflow and tip the balance of power away from us. Once they are dominant, will they use that power against us?

*paranoid mode off*

I guess that's the same argument some librarians make against being on Facebook, the fear of giving it even more information and power. But to be fair, librarians were hardly the ones who gave Facebook its power...

This is not so for citation managers. I hate to sound cynical, but at this stage such services still need us more than we need them, I think. So while our bargaining position is strong, we should make a stand and not give away the store, at least not without a quid pro quo.

At the very least, switching will mean that in the end librarians are the ones who will bear the costs of training and of handling difficult troubleshooting queries on cite-while-you-write features, so it's not a small thing.

But what we should ask for in exchange for support, I leave as an exercise for the reader.


Looks like the parody below, about PubMed and set to the tune of the movie trailer for "The Social Network", could be redone for Mendeley :)



Notes


1. I am not the first to see how disruptive Mendeley can be:


"Mendeley has often been mentioned as a potential industry disruptor. With its presence as a resource manager, database, search tool, social network and now, thanks to the partnership with Swets, its integration with library holdings and provision of usage analysis to libraries, it’s not hard to see why."

http://www.researchinformation.info/news/news_story.php?news_id=879


2. Again, I reiterate that while I use Mendeley as an example here, it could be a stand-in for any service with similar ambitions to be a Facebook for researchers platform. So Mendeley supporters, please don't take it that I am targeting Mendeley.







Friday, April 13, 2012

Different ways of finding a known article - Which is best?

As a fresh graduate from library school with little practical experience, I used to think that known item searches, i.e. finding an article or book when you already know the title, were relatively trivial, and that the difficulty lay with the other type of search: subject/topical searches.

(BTW, I am well aware that there is quite a bit of disagreement over what actually counts as a known item search (more academic piece), but for simplicity I am going to take "known item" here to mean finding an article when you know at least the article title, and perhaps even the whole citation.)

But as time went by, after answering question after question on how to find whether the library has a certain known article, I realized that known item searches for articles, while not as hard as subject searches, are usually no piece of cake for users either.

It's not that users didn't ask about finding known titles of books; some did, particularly if they got the title wrong, or where they were looking for textbooks with common titles and dozens of editions, like "Financial Accounting" (a failure of the identify task in FRBR terms).

Still, in general these were dwarfed by users asking whether a known article exists, either because they were:
  • following up a reference in a paper (online or print)
  • looking for a paper that cited the one they were reading
  • following up something found in an indexing and abstracting database
  • or chasing something a professor/colleague/friend mentioned.

Why is it so difficult? Several reasons:
  • Users are used to searching by keywords in article titles thanks to web search engines, and a complete article index covering everything accessible doesn't exist
  • Difficulties in maintaining a clean knowledge base/source of journal titles mean that even if the user searches by journal name he may still get misleading results
  • Increasing amounts of open access or free material are not in the usual library silos/databases/OPACs

Today libraries support a bewildering list of options for finding known articles, from searching:
  • OPACs
  • Next generation catalogues, e.g. Encore (which do not list article titles)
  • Journal A-Z listings, e.g. Serials Solutions E-Journal Portal, EBSCO A-to-Z
  • Article finders/citation linkers (OpenURL), e.g. WebBridge/SFX/360 Link
  • Article index search engines like Google Scholar, web scale/unified discovery products like Summon or EBSCO Discovery Service, or Pubget-like services


Classic Library Catalogue - ASU Libraries

Serials Solutions EJ Portal - ASU Libraries

Citation Linker (SFX) - University of York

Summon - University of Queensland

Google Scholar - with Harvard 


Pubget


Which method is the best? In general they divide into two main classes: searching by source/journal title first, followed by the article, versus searching for the article title directly in a search engine that indexes article titles.

Of the two, users instinctively do an article title search unless first trained by a librarian. But we as librarians know that if we want to be sure whether an article is available, the source title approach, searching for the journal title, is the surer method, because the search engines that index article titles don't cover everything we own.

Warning: what follows is overly complicated over-thinking that holds little value. Feel free to skip to the comments and post what method you use yourself, or teach others, when asked how to find a known article.

A. Finding a known item using the journal title first

This is the method that was taught to me when I first joined my current library, and the method I used back at library school. Still, there are at least two options within it.

1. Searching using OPAC/ILS

At my current workplace, the official method we teach involves searching by source title, a.k.a. journal title, in either our classic catalogue (Innovative Interfaces' WebPAC Pro) or the next generation catalogue (Encore). Given the same information, a title browse works better in WebPAC Pro, cutting down the possibilities compared to a default keyword search.

Then, hopefully, you see the journal title, click on it, and find that there is an online version with the right coverage or, failing that, a print copy. Assuming an online copy, once you are in the right platform or database, you either browse by issue or do a search by article title. (I personally prefer the former, which, while slower, is surer: a search by article title might fail due to special characters like commas fouling the search, or copied-and-pasted space characters causing problems.)

Phew! I am so used to doing this that I can almost do it instinctively, but in fact there are many pitfalls, some specific to our system, some not. Below are some fairly common to most systems.
Cons

First, the obvious. For this method to achieve reasonable accuracy, the library has to have a policy of cataloguing all journals subscribed to, even those subscribed to via a database or aggregator. While many libraries do, some don't, and simply create a MARC record for the database. E.g. there are libraries that have a record for Business Source Premier but do not separately catalogue (or upload journal titles to the OPAC for) each journal in it.

Depending on the size of the collection, this can be a huge undertaking. Assuming it is done, there are other issues, to do with user error.


1) Difficulty getting the journal title to search for 

This could be due to the fact that the user only has the article title, for whatever reason. One can usually find the source title by googling or using Google Scholar, but for some titles, particularly older ones, this may fail to yield anything. This scenario typically happens when a student is told of an article title (which may be slightly off) by his supervisor.

2) Abbreviations of journals

Some citations/references use very obscure abbreviations. That in itself may not be a problem, depending on the quality of the journal title cataloguing. For many institutions the cataloguing of abbreviations is not very good; in our case we tend to recommend that users find the full journal name rather than try the abbreviation. Finding out the full name of a journal from an abbreviation, however, is sometimes not a simple matter.

3) Very generic Journal names 

Journal names like Nature or Science can lead to dozens of records. Depending on whether the library practices the single record approach (combining print and online journals into one record), the separate records approach, or a hybrid, this can lead to even more confusion, as can whether the user was smart enough to restrict the search to journals.

4) Inaccurate journal holdings or coverage dates

Electronic resource management is a big bugbear for all libraries. This approach presumes that journal holdings are accurate, and often they are not, with wrong coverage dates. As we will see later, some article-title-first approaches might actually give access even when the holdings are wrong.

5) Many OPAC systems may not cover free/open access journals unless special pains are taken to upload them. A subset of #4.

6) Time consuming

This is the biggest factor of all. While in theory this can be the most exhaustive method for confirming the existence of an article in the subscriptions (assuming no problems with #4 and #5), it can be extremely time consuming.

You need to navigate two different systems: the OPAC first, and then, once you reach the e-journal/database platform, you need to hunt around for the right way to access the article by searching or browsing.

Add the fact that there are so many platforms out there, constantly changing, and even an experienced reference librarian sent to an unfamiliar interface may have to spend minutes looking for the right place to browse by issue or figuring out where to click to download.

Pros

1) Covers print and online - unlike the other approaches, this method catches both print and online. If the library practices the single record approach for print and online, this is an even bigger advantage, since you can see everything in one view.


Classic Catalogue - Single record approach


2) Depending on the workflow for journal subscriptions, this may be the most accurate method

This varies from library to library. For my place of work it is definitely true. I have no idea whether an OPAC-centric journal collection is still the rule for most libraries, or whether they focus more on their e-journal A-Z lists (see below).


2. Searching using A-Z e-journal lists
I am pretty new to this class of products, though I remember using them back in library school, when I relied a lot on my library's EBSCO A-to-Z list. Currently I am playing with the Serials Solutions A-Z E-Journal Portal.


My understanding is that such lists are generally meant for e-journals (though it is not unknown for libraries to load up print holdings). Librarians manage holdings, or lists of e-journals, by selecting default packages or specific journal titles, customizing coverage dates where necessary. The main thing they don't do, compared to an ILS/OPAC, is load up or create MARC records.

In many ways finding a known item using this method is very similar to using the OPAC as it starts by searching for the journal title.

It has some advantages over searching the OPAC, in that:

1) It covers only journals, so you get fewer irrelevant results from books etc.

To be fair, one could always restrict searches to the journal collection in the OPAC to get around this.

2) It allows an easy way to browse by A-Z 

3) It may or may not have more accurate journal holdings and coverage dates, and probably has better journal information (e.g. titles, alternative titles, ISSNs, eISSNs) than in-house cataloguing of journals.

Many journal A-Z lists are backed by strong authority records managed centrally. For example, Serials Solutions' products are backed by KnowledgeWorks, which is managed centrally: once you indicate that a journal or package is owned by your library, any changes needed to journal names, alternative names, ISSNs, eISSNs, merging/splitting of journals, etc. are all handled automatically and centrally by Serials Solutions.

With the economies of scale that come from mistakes being corrected once for everyone, this can lead to a far more accurate journal search by title or ISSN than any one library can manage. It also makes searches by journal abbreviations more likely to work.

While Serials Solutions can handle authority control of journals centrally, one thing they cannot handle is holdings: the onus is on you to update those yourself when your subscriptions and packages change (particularly if you use lots of customized packages).

Depending on your library's workflow, the journal A-Z listings may have more or less accurate data than the OPAC, depending on where your priorities lie and where the source data comes from.

Some libraries push data from the Journal A-Z listings to the OPACs, some do the reverse, and yet others keep and update two independent systems.

I have always wondered whether one could maintain e-journal holdings in the A-Z listing (like Serials Solutions 360 Core), maintain print-only MARC records of journals in the OPAC, and then combine both in a web scale discovery product like Summon.

It gets even more complicated if you use the SFX link resolver with Summon, since you then need to maintain two knowledge bases on top of the OPAC.

The disadvantages of A-Z listings are often similar to those of using the OPAC to search/browse by journal title, including:

1) Time consuming and unintuitive

2) Usually does not include print journals


B. Finding a known item using the article title

The main problem with searching by journal title is that it's so indirect and extremely slow. In essence one must do FRBR's four user tasks of find-identify-select-obtain almost TWICE: once for the journal title, then again for the article title, which explains why it is slow.

What if we could just enter the article title, click on the result, authenticate  and get access?  

If I were writing this prior to 2007, I would probably talk about how one could use federated search for this. But in fact I wouldn't have bothered back then: federated search would have been a non-starter, since most library federated search systems did not provide enough coverage to make this method worth trying, and it would have been too slow anyway.

Of course, now we have Google Scholar and web scale discovery products like Summon that typically cover 90% of most academic libraries' collections in a unified article index, so it's worth a shot to see whether an article title search works.


3. Searching by article title using Web scale discovery products

I am most familiar with Summon, but EBSCO Discovery Service and the others are pretty much similar. You enter the article title and, with any luck, you see the article you want. You click on it, and it brings you to the full text via OpenURL linking or direct linking (via some sort of agreement with the provider).



Ebsco Discovery Service, Nanyang Technological University Library


Pros

1) Fast and efficient - if it works, it gives you an experience akin to Google, though you may have to go past a link resolver page and, of course, authentication.

2) No need to figure out journal title name, abbreviations etc


Cons

1) Problems with known item searches - discovery products, at least currently, struggle with known item searches. Often it is not that the article title is missing from the discovery index, but that it isn't surfaced, simply because it is buried on the second or a later page! This might be improving, but it can still be problematic for article titles with very generic or common words.

2) Inaccurate holdings - this is similar to the problem with the A-Z listings. In the case of Summon, it draws from the same holdings that populate the e-journal A-Z listings, so the same problems apply here; depending on the workflow, the holdings might be less accurate than the OPAC/ILS.

3) Article index does not cover the article - even when inaccurate holdings are not an issue, searching by article title in Summon and its cousins often fails. This is because Summon does not yet have the article metadata (much less the full text) indexed, so searching by article name fails where searching the e-journal A-Z listing by journal title first succeeds.

While most discovery services boast over 90% coverage of typical collections, this varies from subject area to subject area. Summon, for example, is almost certainly weaker in Chinese and law than in the sciences, so if you tried searching for law articles in Summon you would get far below 90%.

4) Access to full text is sometimes unstable (with problems ranging from wrong metadata at the source, to errors in the resolver's knowledge base, to provider target URL translation errors) - even if the article is correctly listed in Summon, clicking through to the full text might fail, as access typically goes via OpenURL, which is usually less stable than a direct link to the journal title in OPACs or journal A-Z lists.

In particular, in some cases the target does not allow OpenURL linking at the article level and drops the user at the journal level, which of course is almost the same as searching by journal title in the first place!

5) As with the other approaches, article level searches work only for online articles, though again it is possible to upload your print collection, as the University of Huddersfield does.


University of Huddersfield A-Z EJ Portal showing print holdings


4. Searching by article title using Google Scholar  



In many ways, Summon and similar products were designed to compete with Google Scholar, and hence the two are very similar: fast and quick, with article level searching features.

In fact, some libraries have evaluated web scale discovery products and opted for Google Scholar instead due to costs.

How then does one get to the full text via Google Scholar? Typically the library opts into the Google Scholar Library Links program. This uses the library's OpenURL resolver but also requires the library to provide holdings to Google Scholar, so that the search results page in Google Scholar is "smart" and shows the OpenURL link only when appropriate.

A lesser option, but still fairly popular, is using a proxy bookmarklet.

Using either method to access the full text leads to the same pros as discovery products, including:

1) Fast, quick and efficient

2) No need to figure out journal title name, abbreviations etc

Google Scholar also handles open access and free material very well, as a bonus.


The disadvantages are similar as well

1) The article may not be in the Google Scholar index (it is notorious that nobody knows what is inside)

2) Inaccurate holdings given to Google Scholar

3) Access via OpenURL may be unstable, etc.


To complicate matters, one can bypass the OpenURL/Google Library Links programme by using a proxy bookmarklet.

This can sometimes get around inaccurate holdings and OpenURL failures, since it blindly applies the EZproxy stem to see if access is available.

So even if the journal holdings coverage is wrong in the OPAC/A-Z journal listing, it doesn't matter: you will be brought to the article page via Google Scholar, and the proxy applied might just work.
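The bookmarklet itself is a line of JavaScript, but the rewriting it performs amounts to the following (a sketch with a made-up proxy hostname; real EZproxy stems are institution-specific):

```python
from urllib.parse import quote

# Made-up EZproxy prefix; every institution has its own.
EZPROXY_STEM = "http://ezproxy.example.edu/login?url="

def proxify(url):
    """Blindly prepend the EZproxy stem, as the bookmarklet does. Whether
    access actually exists is only discovered after authentication."""
    return EZPROXY_STEM + quote(url, safe=":/?&=")

print(proxify("http://www.example.com/journal/v1/article123"))
```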

In our institution this is a hugely popular method. Needless to say, it can fail: without OpenURL to solve the appropriate copy problem, Google Scholar's first choice of where to send you for the full text might be wrong.

So you may have access via a subscription agent like SwetsWise or an aggregator like EBSCOhost, but Google Scholar would not know, and would send you directly to, typically, the publisher's version ("Publisher's full-text, if indexed, is the primary version"), where you have no access.

In our institution, however, this is not so common: most science and social science users are perfectly happy with the proxy bookmarklet method, since we usually buy direct when we have access. I estimate this method works, in ideal conditions, better than 8 times out of 10.

Add the fact that our current OpenURL implementation is quite new, and relying on the proxy bookmarklet seems to be the best balance of speed and accuracy. More about this later.


5. Pubget - Even faster?


You might think the article-title-first approach is the fastest possible way to get to a known article in terms of number of clicks, but it can in fact be sped up further.

To repeat, searching using Google Scholar or a web scale discovery service involves:

1. Type the article title into the search box
2. Scan the results list (hopefully only one) and click on a result
3. Authenticate, here or after step 4
4. Scan the link resolver page and click on the appropriate option, which brings you to the article page
5. Scan the article page and click on the download PDF link or button


If you are using Google Scholar with a link resolver, or a web scale discovery product like Summon, you will usually see the link resolver page (#4). But is that screen really necessary?

Of course, Google Scholar + proxy bookmarklet avoids #4, but that has the drawbacks already stated, since it doesn't take the library's collection into account.

The link resolver page can be bypassed by turning on the one-click functionality in Serials Solutions 360 Link (an OpenURL resolver), so that it always sends you to the first option available, even when your library has the article in multiple places.




One click option from 360 link bypasses link resolver screen


The one-click option is nice, but OpenURL linking is well known to be unstable at times, so Serials Solutions tries to handle this with a "helper window", actually an iframe, so users can fall back to the link resolver screen if the direct link fails (see above).

Besides this option, depending on the discovery platform you use, there may be "direct linking" options that don't rely on OpenURL at all. Either way, you don't see the additional OpenURL screen.

In fact, a study has shown that 23% of students tested actually got stuck at the link resolver screen! So perhaps it would be good to bypass that screen if possible. So let's say you do that.

Still, is the following the fastest (in terms of clicks)?


1. Type the article title into the search box
2. Scan the results list (hopefully only one) and click on a result
3. Authenticate if necessary; this brings you to the article page (with a helper window for one-click OpenURL)
4. Scan the article page and click on the download PDF link or button


Surprisingly, no.

You can actually go one better by skipping #4 and offering the PDF download straight from the search engine page. This is in fact the selling point of Pubget.

As shown below, you search for the article in Pubget and you don't even need to go to the e-journal page: there is a "Find PDF" button that automatically fetches the PDF, so you never even see the original e-journal article page.





I am not familiar with the inner workings of Pubget, but I assume it's somewhat like an OpenURL resolver, except that it "knows", for a given journal/platform, the correct way to access the PDF directly with a constructed URL, so you don't even need to land on the e-journal page.
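I can only speculate, but the mechanism might look something like the sketch below: a table of per-platform URL templates (entirely invented here) that map citation metadata straight to a PDF, skipping the landing page.

```python
# Invented per-platform templates mapping metadata straight to the PDF.
PDF_TEMPLATES = {
    "hypothetical-press": "http://journals.example.com/{issn}/{volume}/{spage}.pdf",
}

def pdf_url(platform, **meta):
    """Return a direct PDF URL if we have a template for this platform."""
    template = PDF_TEMPLATES.get(platform)
    return template.format(**meta) if template else None

print(pdf_url("hypothetical-press", issn="1234-5678", volume="12", spage="345"))
```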

You might think this isn't a huge improvement over bringing you to the article page and then clicking download, but many users just want the PDF; they don't care to go to the e-journal page and struggle with diverse and varying user interfaces to hunt for the link to the PDF.



Using article finders/citation linkers (OpenURL) - e.g. WebBridge/SFX/360 Link




I was of two minds about placing this under A (using journal title) or B (using article title).

But since this method usually requires at least a journal title or ISSN, it belongs with the former; in any case, I left it for last.

This is actually using OpenURL with the user manually providing the needed data in a form. There is no article level index, and theoretically this should outperform article level indexes even when both are using the same journal holdings or knowledge base.

The HTML form, typically called an article finder or citation linker, can lead the user to the full text via OpenURL even where searching by article title in Summon fails because the article is not indexed in Summon.

This took me a while to grasp, particularly since you can enter an article title in the citation linker as well, which confused me.

But basically, the article finder/citation linker does not rely on an article index. It relies on 360 Core, which is a journal level index. Given the journal and certain other metadata such as author, issue, date and starting page, the OpenURL resolver can "guess" the correct URL to construct to get the user to the full text.

The OpenURL resolver does not need to know whether the article actually exists; it only knows that, for that platform, if an article with such-and-such characteristics did exist, its URL would be thus.

In comparison, Summon and its cousins need to have the article indexed before it can be found.

If this method works flawlessly and the user enters sufficient metadata, the OpenURL resolver brings him directly to the article (assuming one-click is on) or shows the resolver screen with options.
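To make this concrete, here is a sketch of the kind of OpenURL 1.0 (KEV format) query a citation linker form builds. The resolver base URL and citation values below are made up, but the key names are the standard ones.

```python
from urllib.parse import urlencode

# Made-up resolver base URL; each library has its own.
RESOLVER = "http://resolver.example.edu/openurl"

def citation_link(jtitle, issn, volume, issue, spage, date):
    """Build an OpenURL 1.0 (KEV) query from citation metadata. No article
    index is consulted: the resolver matches the journal against its
    knowledge base and constructs a target URL from the metadata alone."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.jtitle": jtitle,
        "rft.issn": issn,
        "rft.volume": volume,
        "rft.issue": issue,
        "rft.spage": spage,
        "rft.date": date,
    }
    return RESOLVER + "?" + urlencode(params)

# Illustrative citation values only.
print(citation_link("Nature", "0028-0836", "483", "7387", "531", "2012"))
```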

The article finder has many weaknesses, of course, including:

1) The citation needs to be fairly complete and accurate for direct linking to the article. Missing information can mean you are dropped at the journal level only, which can be frustrating.

2) For users, it's often unclear how much or how little citation data to give.

3) It's strictly OpenURL based and hence subject to all the problems of OpenURL already mentioned.

4) Time consuming, since you need the whole citation to maximise the chances of success.


Bonus method - DOI resolution

Articles have unique identifiers like DOIs and PMIDs that can bring you to an article. The main problem is that not all articles have one! Another problem is the failure to cope with the appropriate copy problem unless paired with an OpenURL solution.
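For what it's worth, DOI resolution really is just URL construction against the central resolver (the DOI below is made up), which is exactly why it knows nothing about which copy your library has access to:

```python
def doi_url(doi):
    """Resolve a DOI by prefixing the central resolver (dx.doi.org, 2012-era).
    This lands on the publisher's copy, ignoring the appropriate copy problem."""
    return "http://dx.doi.org/" + doi

print(doi_url("10.1234/example.5678"))   # made-up DOI for illustration
```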


What is the best method?

If one were looking for pure accuracy and could not tolerate false negatives (i.e. missing the full text for a journal title when there is one), what is the best method?

If the same knowledge base and holdings are used, accuracy in descending order (online only) seems to be:

1. Journal A-Z list/OPAC - browse by journal
2. Citation linker
3. Discovery service, e.g. Summon

#2 may fail to get articles that manual browsing in #1 finds, because of problems with OpenURL linking. And #3, even if informed by the same knowledge base and journal holdings, depends on the article being indexed, on top of the issues related to linking via OpenURL (usually).

For my library, the most accurate method typically involves the following algorithm:

1. Search the journal title in the OPAC (our OPAC is loaded with the most accurate journal holdings).

2. Only if neither print nor online is available there, try searching Google Scholar for free copies, and even, as a lark, try applying the proxy bookmarklet to online copies just in case it works. Sometimes an article might just happen to be free for a short period.

This method is often very cumbersome; perhaps only library staff checking document delivery requests would follow it, and typically we just tell users to do the first step.

But is that necessarily the most efficient way? Let's make it simple and assume we are looking only for online articles (a very common scenario for users who want quick access). Let's also assume that the OPAC holdings are 100% correct, so that if you can't find it there, it isn't available.

So which is, on average, faster?

A) Search Google Scholar by article title and use the proxy to access; if that fails, re-search by journal title in the OPAC

B) Search by journal title in the OPAC

Say you knew, for example, that searching Google Scholar first and then applying the proxy worked 90% of the time, taking on average 30 seconds.

In the remaining 10% of cases you would fail to find the full text and need to search the catalogue by journal title to confirm whether it exists. Say that takes 10 minutes (600 seconds) on average.

Simple maths shows it is more efficient, on average, to search Google Scholar first and re-search the OPAC only on failure than to always search the OPAC.

Mean time, method A (Google Scholar first + re-search if necessary) = 0.9*30 + 0.1*(30+600) = 90s

Mean time, method B (library catalogue only) = 1*600 = 600s

Of course, I just pulled all the numbers out of the air. The average time for each method could be obtained by time studies, which I don't think would be particularly hard.
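For what it's worth, the same arithmetic in a few lines, plus the break-even hit rate implied by these (admittedly invented) timings:

```python
# Invented timings (seconds) and hit rate from the example above.
GS_TIME, OPAC_TIME, HIT_RATE = 30, 600, 0.9

method_a = HIT_RATE * GS_TIME + (1 - HIT_RATE) * (GS_TIME + OPAC_TIME)
method_b = OPAC_TIME
print(method_a, method_b)          # 90.0 600

# Method A beats B whenever t_gs + (1 - p) * t_opac < t_opac,
# i.e. whenever p > t_gs / t_opac; here the break-even is 30/600 = 5%.
print(GS_TIME / OPAC_TIME)         # 0.05
```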

In any case, I would estimate, from fastest to slowest to complete/terminate one search (where terminating could mean either successfully finding the article or failing):

1. Pubget

2. Summon + OpenURL (one-click enabled or direct linking), Google Scholar + OpenURL (one-click enabled), or Google Scholar + proxy bookmarklet

3. Summon + OpenURL (normal) or Google Scholar + OpenURL (normal)

4. Journal title search using Journal A-Z

5. Journal title search using OPAC

But let's get back to the example comparing Google Scholar first (with an OPAC journal title re-search on failure) against searching the OPAC by journal title only.

It's fairly easy to estimate the average time for each step; somewhat harder to estimate is the probability of a hit the first time round, so you stop there. This is a function of a) how big the article index is (for cases where you search by article title in Summon or Google Scholar) and b) how large your subscriptions/collection are (for both article-title-first and source-title-first approaches).

a) is intuitively true: the larger the article index you are searching, the more likely the step terminates in success. b) is true as well if you think about it: the larger your subscriptions/collection, the less likely you will have to re-search.

E.g. suppose there are only 1,000 articles in the universe, and Library A *really subscribes* to 990 of them versus Library B's 10. No matter what method is used, searches at Library A will tend to terminate the first time with fewer re-searches, while Library B will almost always have to re-search, and still fail anyway.


Better methods?

Perhaps a hybrid method might work? 

This article suggests enhancing citation linkers with article level indexes using AJAX. So if you enter an article title into the article finder and it matches, the system would show the match without you even clicking the search button.

Think Google's auto-complete/auto-suggest: as you type the article title, it would suggest the closest matches drawn from its known article index. Of course, you could do the same to assist entry in other fields of the citation linker, say author or journal name.
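A minimal sketch of the suggestion step, assuming nothing more than a sorted list of known article titles (a real implementation would use a proper search index):

```python
import bisect

# A tiny, made-up sorted index of known article titles.
titles = sorted([
    "a study of widgets",
    "a study of widgets revisited",
    "known item searching in opacs",
])

def suggest(prefix, titles, limit=5):
    """Return up to `limit` titles starting with the typed prefix."""
    prefix = prefix.lower()
    i = bisect.bisect_left(titles, prefix)
    out = []
    while i < len(titles) and titles[i].startswith(prefix) and len(out) < limit:
        out.append(titles[i])
        i += 1
    return out

print(suggest("a study", titles))   # both "a study of widgets..." titles
```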

Talking about Google, how about a Google Instant version? Yes, I know it could be technically tough to display the full article due to authentication, though I wonder if it could just show the brief record with metadata.

Or how about a discovery search that could recognise known item searches and, failing a match, automatically suggest a journal name browse instead?

Conclusion

I am not sure if anyone made it all the way to the bottom of this overly complicated post. But if you did, I thank you.

I am not a systems or even an e-services librarian, so my understanding of systems relating to journals might be off base; if so, do let me know, I am always trying to learn.


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.