Friday, April 13, 2012

Different ways of finding a known article - Which is best?

As a fresh graduate from library school with little practical experience, I used to think that known item searches, i.e. finding an article or book when you already know the title, were relatively trivial, and that the real difficulty lay with the other type of search: subject/topical searches.

(BTW, I am well aware that there is quite a bit of disagreement over what actually counts as a known item search (more academic piece), but for simplicity I am going to take "known item" here to mean finding an article when you know at least the article title, and perhaps even the whole citation.)

But as time went by, after answering question after question on how to find out whether the library has a certain known article, I realized that known item searches for articles, while not as hard as subject searches, are usually no piece of cake for users either.

It's not that users never asked about finding known book titles; some did, particularly when they had the title wrong, or were looking for textbooks with common titles and dozens of editions, like "Financial Accounting" (a failure of the "identify" task in FRBR terms).

Still, in general, these were dwarfed by users asking if a known article exists, either because they were
  • following up from a reference in a paper (online or print) 
  • looking for a paper that cited the one they were reading
  • found it in an indexing and abstracting database
  • or a professor/colleague/friend mentioned it. 

Why is it so difficult? Several reasons:
  • Users are used to searching by keywords in article titles thanks to web search engines, but a complete article index that covers everything accessible doesn't exist
  • Difficulties in maintaining a clean knowledgebase/source of journal titles mean that even if the user does a search by journal name, he may still get misleading results
  • Increasing amounts of Open Access or free material are not in the usual library silos/databases/OPACs 

Today libraries support a bewildering list of options for finding known articles, from searching
  • OPACs 
  • Next generation catalogues, e.g. Encore (which do not list article titles)
  • Journal A-Z listings - e.g. Serials Solutions E-Journal Portal, EBSCO A-to-Z  
  • Article finders/citation linkers (OpenURL) - e.g. WebBridge/SFX/360Link
  • Article index search engines like Google Scholar, or the new web scale/unified discovery products like Summon or EBSCO Discovery Service, or Pubget-like services

Classic Library Catalogue - ASU Libraries

Serials Solutions EJ Portal - ASU Libraries

Citation Linker (SFX) - University of York

Summon - University of Queensland

Google Scholar - with Harvard 


Which method is the best? In general they divide into two main classes: searching by source/journal title first and then the article, versus searching for the article title directly in a search engine that indexes article titles.

Of the two methods, users instinctively do an article title search unless first trained by a librarian. But we as librarians know that if we want to be sure whether an article is available, the source title approach, searching by journal title, is the better method, because search engines that index article titles don't cover everything we own.

Warning: what follows is overly complicated over-thinking that holds little value. Feel free to skip to the comments and post what method you use yourself, or use to teach others when they ask you how to find a known article.

A. Finding a known item using the Journal Title first

This is the method that was taught to me when I first joined my current library, and the method I used when I was at library school. Still, there are at least 2 options if one uses this method.

1. Searching using OPAC/ILS

At my current workplace, the official method we teach involves searching by Source Title, aka Journal Title, in either our classic catalogue (Innovative Interfaces' WebPAC Pro) or the next generation catalogue, Encore. Given the same information, a title browse works better in WebPAC Pro than the default keyword search, because it cuts down the possibilities.

Then, hopefully, you see the journal title, click on it, and find that there is an online version with the right coverage or, failing that, a print copy. Assuming an online copy, once you are in the right platform or database, you either browse by issue or just do a search by article title. (I personally prefer the former which, while slower, is surer, as a search by article title might fail due to special characters like commas fouling the search, or copied-and-pasted space characters causing problems.)

Phew! I am so used to doing this that I can almost do it instinctively, but in fact there are many pitfalls, some specific to our system, some not. Below are some fairly common to most systems.

First, the obvious. For this method to achieve a reasonable amount of accuracy, the library has to have a policy of cataloguing all journals subscribed to, even those subscribed to via a database or aggregator. While many libraries do, some don't, and simply create a MARC record for the database. E.g. there are libraries that have a record for Business Source Premier but don't separately catalogue (or upload journal titles to the OPAC for) each journal in it.

Depending on the size of the collection this can be a huge undertaking. Assuming this is done, there are other issues to do with user error.

1) Difficulty getting the journal title to search for 

This could be because the user only has an article title, for whatever reason. Of course, one can usually find the source title by googling or using Google Scholar, but for some titles, particularly older ones, this may actually fail to yield anything. This scenario typically happens when a student is told of some article title (which may be slightly off) by his supervisor.

2) Abbreviations of journals

Some citations/references use very obscure abbreviations. That in itself may not be a problem, depending on the quality of the journal title cataloguing. For many institutions, the cataloguing of abbreviations is not very good; in our case we tend to recommend that users find the full journal name rather than try the abbreviation. Finding the full name of a journal from an abbreviation may, however, not be a simple matter.

3) Very generic Journal names 

Journal names like Nature or Science can often lead to dozens of records. Depending on whether the library practices the single record approach (combining print and online journals into one record), the separate records approach, or a hybrid, this can lead to even more confusion, as can whether the user was smart enough to restrict the search to journals.

4) Inaccurate journal holdings or coverage dates

Electronic resource management is a big bugbear for all libraries. This approach presumes that journal holdings are accurate, and often they are not, with wrong coverage dates. As we will see later, some article title first search approaches might actually give access even when the holdings are wrong.

5) Many OPAC systems may not cover free/open access journals unless special pains are taken to upload these. A subset of #4.

6) Time consuming

This is the biggest factor of all. While in theory this can be the most exhaustive method to confirm the existence of an article in the subscription (assuming no problems with #4 and #5), it can be extremely time consuming.

You need to navigate two different systems: the OPAC first, and then, once you reach the ejournal/database platform, you need to hunt around for the right way to access the article, by either searching or browsing.

Add to that the fact that there are so many platforms out there, constantly changing, and even an experienced reference librarian sent to an unfamiliar interface may have to spend minutes looking for the right place to browse by issue, or figuring out where to click to download.


On the plus side, this method also has advantages:

1) Covers print and online - Unlike other approaches, this method catches both print and online. If the library practices the single record approach for print and online, this is an even bigger advantage, since you can see everything in one view.

Classic Catalogue - Single record approach

2) Depending on the workflow for journal subscriptions, it may be the most accurate method

This varies from library to library. For my place of work, this is definitely true. I have no idea if an OPAC-centric journal collection is still the rule for most libraries, or whether libraries now focus more on their e-journal A-Z lists (see below).

2. Searching using A-Z Ejournal lists
I am pretty new to this class of products, though I remember using them back in library school, when I relied a lot on my library's EBSCO A-to-Z list. Currently I am playing with the Serials Solutions A-Z Ejournal portal.

My understanding is that such lists are generally meant for ejournals (though it is not unknown for libraries to load up print holdings). Librarians manage holdings or lists of ejournals by selecting default packages or specific journal titles and, if necessary, customizing coverage dates. The main thing they don't do, as compared to an ILS/OPAC, is load up or create MARC records.

In many ways finding a known item using this method is very similar to using the OPAC as it starts by searching for the journal title.

It has some advantages over searching the OPAC:

1) It covers only journals, so you get fewer irrelevant results from books etc.

To be fair, one could always restrict the OPAC to the journal collection by default to get around this

2) It allows an easy way to browse by A-Z 

3) May or may not have more accurate journal holdings and coverage dates, and probably has better journal information (e.g. title, alternative titles, ISSNs, eISSNs) than in-house cataloguing of journals.

Many Journal A-Z lists are backed by strong authority records managed centrally. For example, SerialsSolutions' products are backed by KnowledgeWorks, which is managed centrally, so once you indicate that a journal or package is owned by your library, any changes needed to journal names, alternative names, ISSN, e-ISSN, merging/splitting of journals etc. will all be handled automatically and centrally by SerialsSolutions.

With the economies of scale that come from mistakes being corrected for everyone, this can lead to a far more accurate journal search by title or ISSN than any one library can manage. This also makes searches by journal abbreviations and the like more likely to work.

While SerialsSolutions can handle authority control of journals centrally, one thing they cannot handle is holdings. The onus is on you to update those yourself when your subscriptions and packages change (particularly if you use lots of customized packages).

Depending on your library workflow, the Journal A-Z listing may have more or less accurate data than the OPAC, depending on where your priorities lie and where the source data comes from.

Some libraries push data from the Journal A-Z listings to the OPACs, some do the reverse, and yet others keep and update two independent systems.

I always wondered if one could also maintain ejournal holdings in the A-Z listing (like Serials Solutions 360 Core), maintain print-only MARC records of journals in the OPAC, and then combine both in a web scale discovery product like Summon.

It gets even more complicated if you use the SFX link resolver with Summon, since you then need to maintain two knowledgebases on top of the OPAC.

The disadvantages of A-Z listings are often similar to those of using the OPAC to search/browse by journal title:


1) Time consuming and unintuitive

2) Usually does not include print journals

B. Finding a known item using the Article Title

The main problem with searching by journal title is that it's so indirect and extremely slow. In essence, one must do FRBR's four user tasks of Find-Identify-Select-Obtain almost TWICE, once for the journal title and then again for the article title, which explains why it is slow.

What if we could just enter the article title, click on the result, authenticate  and get access?  

If I were writing this prior to 2007, I would probably talk about how one could use federated search for this. But in fact I wouldn't have bothered back then: federated search would actually have been a non-starter, since most library federated search systems did not provide enough coverage to make this method worth trying, and they would have been too slow anyway.

Of course, now we have Google Scholar and web scale discovery products like Summon that typically cover 90% of most academic libraries' collections in a unified article index, so an article title search is at least worth a shot.

3. Searching by article title using Web scale discovery products

I am most familiar with Summon, but EBSCO Discovery Service and the others are pretty similar. You enter the article title and, with any luck, you see the article you want. You click on it, and it brings you to the full text via OpenURL linking or direct linking (via some sort of agreement with the provider).

Ebsco Discovery Service, Nanyang Technological University Library


The advantages:

1) Fast, quick and efficient - If it works, it gives you an experience akin to Google, though you may have to go past a link resolver page and, of course, authentication.

2) No need to figure out journal title name, abbreviations etc


The disadvantages:

1) Problems with known item searches - Discovery products, at least currently, struggle with known item searches. Often it is not that the article title is missing from the discovery index, but that it isn't surfaced, simply because it is buried on the 2nd or later page! This might be improving, but it could still be problematic for article titles made up of very generic or common words.

2) Inaccurate holdings - This is similar to the problem with the A-Z listings. In the case of Summon, it draws from the same holdings that populate the E-Journal A-Z listings, so the same problems apply: depending on the workflow, the holdings here might be less accurate than the OPAC/ILS.

3) The article index does not cover the article - Even when inaccurate holdings are not an issue, searching by article title in Summon and its cousins often fails. This is because Summon does not yet have the article metadata (much less the full text indexed), so searching by article name fails where searching the ejournal A-Z listing by journal title first succeeds.

While most discovery services boast over 90% coverage of typical collections, this varies from subject area to subject area. For example, Summon is almost certainly weaker in Chinese and law than in the sciences, so if you tried searching for law articles in Summon you would get far below 90% coverage.

4) Access to the full text is sometimes not stable (with varying problems: wrong metadata from the source, errors in the resolver's knowledgebase, provider target URL translation errors, etc.) - Even if the article is correctly listed in Summon, clicking on the full text might fail, as OpenURL is typically used for access, which is usually less stable than a direct link to the journal title in OPACs or Journal A-Z lists.

In particular, in some cases the target does not allow OpenURL linking to the article level and drops the user at the journal level, which of course is almost the same as first searching by journal title!

5) As with the other approaches, article level searches work only for online articles, but again it is possible to upload your print collection, as the University of Huddersfield does.

University of Huddersfield A-Z EJ Portal showing print holdings

4. Searching by article title using Google Scholar  

In many ways, Summon and its peers were designed to compete with Google Scholar, and hence both are very similar: fast and quick, with article level searching.

In fact, some libraries have evaluated web scale discovery products and opted for Google Scholar instead due to costs.

How then does one get to the full text via Google Scholar? Typically the library opts into the Google Scholar Library Links program. This uses the library's OpenURL resolver, but also requires that the library provide holdings to Google Scholar, so that the search results page in Google Scholar is "smart" and shows the OpenURL link only when necessary.

A lesser, but still fairly popular, option is using a proxy bookmarklet.

Using either method to access the full text leads to the same pros as discovery products, including

1) Fast, quick and efficient

2) No need to figure out journal title name, abbreviations etc

Google Scholar also handles Open Access and free material very well, as a bonus.

The disadvantages are similar as well

1) The article may not be in the Google Scholar index (it's notorious that nobody knows what is inside)

2) Inaccurate holdings given to Google Scholar

3) Access via OpenURL may be unstable, etc.

To complicate matters, one can bypass the OpenURL/Google Library Links programme by using a proxy bookmarklet.

That can sometimes get around inaccurate holdings and OpenURL linking failures, since it blindly applies the EZproxy stem to see if access is available.

So even if the journal holding coverage is wrong in the OPAC/A-Z journal listing, it doesn't matter: you will be brought to the article page via Google Scholar, and the proxy, once applied, might work.
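As a rough illustration of what the bookmarklet does (the EZproxy host here is a hypothetical example; a real bookmarklet is a line of JavaScript applying the same rewrite to the page you are currently on):

```python
# Minimal sketch of proxy-bookmarklet behaviour, assuming a made-up
# EZproxy host "ezproxy.example.edu". It blindly prepends the proxy
# stem to whatever article URL the user has landed on.
EZPROXY_STEM = "https://login.ezproxy.example.edu/login?url="

def proxify(article_url: str) -> str:
    """Prepend the EZproxy stem; access works only if the library
    actually has a subscription for that target."""
    return EZPROXY_STEM + article_url

print(proxify("https://www.jstor.org/stable/1234567"))
```

Note that nothing here consults holdings at all, which is exactly why it can "work" even when the knowledgebase is wrong, and fail silently when it is not.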

In our institution this is a hugely popular method. Needless to say, it can fail: without OpenURL to solve the appropriate copy problem, Google Scholar's first choice of where to send you for the full text might be wrong.

So you may have access via a subscription agent like SwetsWise or an aggregator like EBSCOhost, but Google Scholar would not know this and would send you somewhere else directly, typically the publisher's version ("Publisher's full-text, if indexed, is the primary version"), where you have no access.

However, in our institution this is not so common; most science and social science users are perfectly happy with the proxy bookmarklet method, since we usually buy direct if we have access. And I estimate this method works correctly, in ideal conditions, better than 8 times out of 10.

Add the fact that our current OpenURL implementation is quite new, and relying on the proxy bookmarklet seems to be the best balance of speed and accuracy. More about this later.

5. Pubget - Even faster?

You might think the article title first approach is the fastest possible way to get a known article in terms of number of clicks, but you can in fact speed this up.

To repeat, searching using Google Scholar or web scale discovery involves

1. Type the article title in the search box
2. Scan the results list (hopefully only one) and click on the result
3. Authenticate here or after step 4
4. Scan the link resolver page for results and click on the appropriate result that brings you to the article page
5. Scan the article page and click on the download PDF link or button

If you are using Google Scholar with a link resolver, or a web scale discovery product like Summon, you will usually see the link resolver page (#4). But is that screen really necessary?

Of course, Google Scholar + proxy bookmarklet avoids #4, but that has the drawbacks already stated, since it doesn't take the library's collection into account.

The link resolver page can be bypassed if you turn on the one-click functionality in SerialsSolutions 360Link (OpenURL resolver), so that it always sends you to the first option available if your library happens to have the article in multiple places.

One click option from 360 link bypasses link resolver screen

The one-click option is nice, but OpenURL linking is well known to be unstable at times, so SerialsSolutions tries to handle this problem with a "helper window", actually an iframe, so users can fall back to the link resolver screen if the direct link fails (see above).

Besides this option, depending on the discovery platform you use, there may be "direct linking" options that don't rely on OpenURL at all. Either way you don't see the additional OpenURL screen.

In fact, a study has shown that 23% of students tested actually got stuck at the link resolver screen! So perhaps it would be good to bypass that screen if possible. So let's say you do that.

Still, is the following the fastest (in terms of clicks)?

1. Type the article title in the search box
2. Scan the results list (hopefully only one) and click on the result
3. Authenticate if necessary, which brings you to the article page (with a helper window for one-click OpenURL)
4. Scan the article page and click on the download PDF link or button

Surprisingly no.

You can actually go one better by skipping #4 and offering download of the PDF right from the search engine results page. This is in fact the selling point of Pubget.

As shown below, you search for the article in Pubget and you don't even need to go to the ejournal page; there is a "Find PDF" button that will automatically get you the PDF, so you never even see the original ejournal article page.

I am not familiar with the inner workings of Pubget, but I assume it's somewhat like an OpenURL resolver, except that it "knows", for a certain journal/platform, the correct way to access the PDF directly with a constructed URL, so you don't even need to land on the ejournal page.

You might think this isn't a huge improvement over being brought to the article page and then clicking download, but many users just want the PDF; they don't care to go to the ejournal page and struggle with the diverse and varying user interfaces to hunt for the link to the PDF.
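Purely as speculation about how such direct PDF linking could work, a per-platform URL template lookup might look something like the sketch below; the platform name, template, and sample citation are all invented, not anything Pubget actually documents:

```python
# Speculative sketch: a service keeps a table of per-platform URL
# templates for PDFs, filled in from citation metadata, so the user
# never has to land on the ejournal article page at all.
PDF_TEMPLATES = {
    # hypothetical platform -> URL pattern for the PDF itself
    "example-press": "https://journals.example.com/{issn}/{volume}/{spage}.pdf",
}

def guess_pdf_url(platform, issn, volume, spage):
    template = PDF_TEMPLATES.get(platform)
    if template is None:
        return None  # unknown platform: fall back to normal resolution
    return template.format(issn=issn, volume=volume, spage=spage)

print(guess_pdf_url("example-press", "0028-0836", "471", "161"))
```

The fragility is obvious: the moment a platform changes its URL scheme, every template for it breaks, which is presumably why such services need constant central maintenance.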

Using article finders/citation linkers (OpenURL) - e.g. WebBridge/SFX/360Link

I was of two minds about placing this under either A. Using Journal title or B. Using Article title.

But since this method usually requires at least a journal title or ISSN, it belongs under the former. In any case, I left it for last.

This is actually using OpenURL with the user manually providing the needed data in a form. There is no article level index involved, and theoretically this should outperform article level indexes even if both are using the same journal holdings or knowledgebase.

The HTML form, typically called an article finder or citation linker, can actually lead the user to the full text via OpenURL even when searching by article title in Summon fails because the article is not indexed there; the OpenURL can still lead the user to it.

This took me a while to grasp, particularly since you can enter an article title in the citation linker as well, which confused me.

But basically the article finder/citation linker does not rely on an article index. It relies on 360 Core, which is a journal level index. Given the journal and certain other metadata such as author, issue, date and starting page, the OpenURL resolver can "guess" the correct URL to construct to get the user to the full text.

The OpenURL resolver does not need to know whether the article actually exists; it just knows that, for that platform, if an article with such and such characteristics did exist, the URL would be such and such.

In comparison, Summon and company need to have the article title indexed before it can be found.

If this method works flawlessly, the user enters sufficient metadata and the OpenURL resolver either brings him directly to the article (assuming one-click is on) or shows the resolver screen with options.
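To make the mechanism a little more concrete, here is a sketch of how a citation linker's form fields might be packed into a standard OpenURL (Z39.88 key/value) query string. The resolver base URL and the sample citation are made up; the key names are the standard journal-format OpenURL keys:

```python
# Sketch of citation-linker form fields -> OpenURL 1.0 (KEV) query string.
# The resolver host is hypothetical; the rft.* keys are the standard
# Z39.88-2004 journal metadata keys.
from urllib.parse import urlencode

RESOLVER_BASE = "https://resolver.example.edu/openurl"  # assumed host

def build_openurl(citation: dict) -> str:
    """Map citation fields to OpenURL keys, dropping anything missing;
    the resolver guesses the target URL from whatever is supplied."""
    params = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.jtitle": citation.get("journal", ""),
        "rft.issn": citation.get("issn", ""),
        "rft.volume": citation.get("volume", ""),
        "rft.issue": citation.get("issue", ""),
        "rft.spage": citation.get("start_page", ""),
        "rft.date": citation.get("year", ""),
    }
    params = {k: v for k, v in params.items() if v}
    return RESOLVER_BASE + "?" + urlencode(params)

print(build_openurl({"journal": "Nature", "issn": "0028-0836",
                     "volume": "471", "start_page": "161", "year": "2011"}))
```

Notice there is no article title anywhere in the lookup logic; the journal-level fields alone are enough for the resolver to construct a candidate link, which is exactly why this method can succeed where an article index has no record.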

The article finder has many weaknesses of course including

1) The citation needs to be pretty complete and accurate for direct linking to the article. Missing information can mean you are dropped only at the journal level, which can be frustrating.

2) For users it's often unclear how much or how little citation data to give. 

3) It's strictly OpenURL based and hence subject to all the problems of  OpenURL already mentioned

4) Time consuming, since you need the whole citation to maximize the chances

Bonus method - DOI resolution

Articles have unique identifiers like DOIs and PMIDs that can bring you to an article. The main problem with using these is that not all articles have them! Another problem is the failure to cope with the appropriate copy problem, unless paired with an OpenURL solution.

What is the best method?

If one were looking for pure accuracy and could not tolerate false negatives (i.e. missing the full text of a journal title when there is one), what is the best method?

If the same knowledgebase with holdings is used, accuracy in descending order (online only) seems to be

1. Journal A-Z list/ OPAC - Browse by Journal
2. Citation Linker
3. Discovery service eg. Summon

#2 may fail to get articles that manual browsing via #1 would, because of problems with OpenURL linking. #3, even if informed by the same knowledgebase and journal holdings, additionally requires the article to be indexed, on top of the issues related to linking via OpenURL (usually).

For my library the most accurate method typically involves the following algorithm 

1. Search journal title in OPAC (Our OPAC is loaded with the most accurate journal holdings)

2. Only if neither print nor online is available there will I try searching Google Scholar for free copies, and, as a lark, I might even try applying the proxy bookmarklet to online copies just in case it works. Sometimes an article might just happen to be free for a short period.

This method is often very cumbersome, and perhaps only library staff engaged in checking document delivery requests would do it; typically we just tell users to do the first step only.

But is that necessarily the most efficient way? Let's make it simple and assume we are looking only for online articles (a very common scenario for users who want quick access only). Let's also assume that the OPAC holdings are 100% correct and if you can't find it using that, it isn't available.

So which is on average faster?

A) Search Google Scholar by article title and use the proxy to access; if that fails, re-search by journal title in the OPAC

B) Search by journal title in OPAC

Say you knew, for example, that searching Google Scholar first and then applying the proxy worked 90% of the time to find the full text, and that this takes on average 30 seconds.

In the other 10% of cases you would fail to find the full text and need to search the catalogue by journal title to confirm whether it exists. Say that takes 10 minutes (600 seconds) on average.

Simple maths should show that it is more efficient on average to search using Google Scholar first, then re-search in the OPAC only if that fails, than to always search the OPAC.

Mean time for method A, using Google Scholar first + re-search if necessary = (0.9*30 + 0.1*(30+600)) = 90s

Mean time for method B, using the library catalogue first and only = (1*600) = 600s

Of course, I just pulled all the numbers out of the air. The average time taken for each method could be obtained by time studies, which isn't particularly hard, I think.
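The back-of-envelope arithmetic above can be written as a tiny function; the numbers are still the invented estimates, not measurements:

```python
# Expected-time comparison: Google Scholar first (with OPAC re-search on
# a miss) versus always searching the OPAC by journal title.
def mean_time_scholar_first(p_hit, t_scholar, t_opac):
    # On a hit you pay only the Scholar search; on a miss you pay
    # the Scholar search AND the follow-up OPAC search.
    return p_hit * t_scholar + (1 - p_hit) * (t_scholar + t_opac)

print(round(mean_time_scholar_first(0.9, 30, 600)))  # 90 seconds
print(600)  # OPAC-only always costs the full journal-title search
```

Playing with the function shows the intuition holds as long as the hit rate is reasonably high; as p_hit falls, the Scholar-first strategy degrades toward paying both costs every time.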

In any case, I would estimate, from fastest to slowest to complete/terminate a search (where terminate means either successfully finding the article or failing):

1. Pubget

2. Summon + OpenURL (one-click enabled or direct linking), Google Scholar + OpenURL (one-click enabled), or Google Scholar + proxy bookmarklet

3. Summon + OpenURL (normal) or Google Scholar + OpenURL (normal)

4. Journal title search using Journal A-Z

5. Journal title search using OPAC

But let's get back to the example where we compare searching Google Scholar first with a re-search in the OPAC by journal title, versus searching the OPAC by journal title only.

It's fairly easy to estimate the average time for each step; somewhat harder to estimate is the probability of getting a hit so that you stop the first time. This is a function of a) how big the article index is (for cases where you search by article title in Summon or Google Scholar) and b) how large your subscription/collection is (for both article title first and source title first approaches).

A) is intuitively true: the larger the article index you are searching, the more likely the step will terminate with success. B) is true as well, if you think about it: the larger your subscription/collection, the more likely you won't have to do a re-search.

E.g. if there are only 1,000 articles in the universe and Library A *really subscribes* to 990 of them, versus Library B that *really subscribes* to 10: no matter what method you use, searches at Library A will tend to terminate the first time with fewer re-searches, while Library B will almost always have to re-search and still fail anyway.

Better methods?

Perhaps a hybrid method might work? 

This article suggests enhancing citation linkers with article level indexes using Ajax. So if you enter an article title in the article finder and it matches, the system would find it without you even clicking the search button.

Think of Google's auto-complete/auto-suggest: as you type in the article title, it would suggest the closest matches drawn from its known article index. Of course, you could do the same to assist in entering other fields, say Author or Journal name, in the citation linker.

Talking about Google, how about a Google Instant version? Yes, I know it could be technically tough to display the full article due to authentication, though I wonder if it could just show the brief record with metadata.

Or how about a discovery search that could identify known item searches and, if it fails to match, automatically suggest a journal name browse instead?


I am not sure if anyone made it all the way to the bottom of this overly complicated post. But if you did, I thank you.

I am not a systems or even an eservices librarian, so my understanding of systems relating to journals might be off base; if so, do let me know. I am always trying to learn.



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.