Sunday, May 31, 2015

Rethinking Citation linkers & A-Z lists (I)

I am currently helping my institution move to a new library services platform and discovery service (Alma and Primo), and this has given me an opportunity to once again rethink traditional library tools like citation linkers and A-Z journal and database lists.

It's pretty obvious such tools need a refresh as they were created

  • before Google/Google Scholar and web scale discovery.
  • in an era where electronic was not yet hugely dominant.


For this post, I will discuss citation linkers and how some vendors and libraries have attempted to update them for the current discovery environment. A further post will cover ideas for updating the A-Z database and journal lists.


Citation linkers - an outdated tool?


The idea behind the citation linker (sometimes known as a citation finder or article finder) was meant to be straightforward. You entered a reference and the library would hopefully link you to the full text of the article via the library's OpenURL resolver.

Most link resolvers, such as Ex Libris's SFX, Innovative's WebBridge and EBSCO's LinkSource, offer a variant of such a tool.

Below we see some typical citation linkers across different vendors.





Typical citation linker from ProQuest's 360 Link




Typical SFX citation linker




Typical EBSCO LinkSource Citation finder




Typical Alma Uresolver Citation linker





I first encountered this tool fairly late, in 2012, when implementing the suite of then Serials Solutions (now ProQuest) services, including Summon and 360 Link, at my former institution.

Initially, I was totally confused by the fact that simply entering the article title alone would not work! You had to painstakingly enter various pieces of information which even then would often fail to work, depending on the accuracy of the citation fields you entered.

My confusion was understandable because I came upon this tool after the rise of web scale discovery, where entering an article title was usually sufficient to get to the full text.

Even after I grasped how it worked, I realized how unlikely it was that a user would be willing to use it, much less use it successfully, since it was much easier to just enter the article title in Google Scholar or a library discovery service.

Sure, as I discussed in "Different ways of finding a known article - Which is best?" way back in 2012, searching by article title in a discovery index has drawbacks (e.g. it can't find non-indexed items), but it is far easier and more convenient for the user, and if there is anything I have learnt in my years working in libraries, it is that convenience tends to trump everything else.


Can we improve on it? Autocomplete to the rescue


How would I create a citation linker 2.0?

An obvious improvement would be to work on the UX.

One study on the usability of the SFX citation linker noted that while users who tried finding articles via the Journal A-Z list had issues, it was even worse when using the citation linker.

They suggested improving the usability of the tool by removing unnecessary fields, such as the author and article title fields, which were usually not used for OpenURL resolution.

Georgia Tech Library seems to have followed this recommendation: unlike the default SFX citation linker, they have hidden the various author fields (first name, last name, initial, etc.).








A more interesting proposal to improve the tool was made by Peter Murray way back in 2006, entitled "A Known Citation Discovery Tool in a Library2.0 World":

"The page also has an HTML form with fields for citation elements. As the user keys information into the form fields, AJAX calls update the results area of the web page with relevant hits. For instance, if a user types the first few letters of the author’s last name, the results area of the web page shows articles by that author in the journal. (We could also help the user with form-field completion based on name authority records and other author tables so that even as the user types the first few letters of the last name he or she could then pick the full name out of a list.) With luck, the user might find the desired article without any additional data entry!"

Essentially, he is suggesting that each field in the citation linker would have autocomplete features via AJAX to help the user, along with a "results area" that displays the articles the user is likely searching for. He goes on to suggest similar ideas for other fields, such as volume and issue.

"Another path into the citation results via the link resolver: if a user types the volume into the form field, the AJAX calls cause links to appear to issues of that volume in addition to updating the results to a reverse chronological listing of articles. If a user then types the issue into the HTML form field or clicks the issue link, the results area displays articles from that issue in page number order. Selecting the link of an article would show the list of sources where the article can be found (as our OpenURL resolvers do now), and off the user goes."

At the time of the proposal, such a feature was not possible because it would require a large article index to draw results from. Today we of course have web scale discovery systems.
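To make the idea concrete, here is a minimal sketch of the kind of lookup such a form could call on every keystroke. The `Article` record, the tiny in-memory `INDEX` and the `live_results` function are all hypothetical stand-ins for a real discovery index and are not taken from Murray's proposal or from any vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Article:
    author_last: str
    title: str
    journal: str
    volume: int
    issue: int
    start_page: int
    year: int


# Tiny stand-in for a web scale discovery index (made-up records).
INDEX = [
    Article("Smith", "Link resolvers in practice", "Journal of ABC", 12, 1, 10, 2013),
    Article("Smythe", "Discovery and known items", "Journal of ABC", 12, 2, 45, 2013),
    Article("Jones", "Citation parsing at scale", "Journal of ABC", 13, 1, 5, 2014),
    Article("Smith", "OpenURL failure modes", "Journal of XYZ", 7, 3, 99, 2012),
]


def live_results(journal="", author_prefix="", volume=None, issue=None):
    """Return the articles matching whatever the user has typed so far."""
    hits = [
        a for a in INDEX
        if a.journal.lower().startswith(journal.lower())
        and a.author_last.lower().startswith(author_prefix.lower())
        and (volume is None or a.volume == volume)
        and (issue is None or a.issue == issue)
    ]
    if issue is not None:
        # Within a single issue, list articles in page number order.
        return sorted(hits, key=lambda a: a.start_page)
    # Otherwise list newest first (reverse chronological).
    return sorted(hits, key=lambda a: (a.year, a.volume, a.issue), reverse=True)
```

Typing "Sm" into the author field with the journal already chosen would call something like `live_results(journal="Journal of ABC", author_prefix="Sm")`, and each further keystroke would simply narrow the list.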




Auto parsing of citations 


One of the weaknesses of citation linkers is that they require the user to parse the citation and enter each piece of information one by one into various fields. Not all users are capable of that, or patient enough to do it.

Why not simply allow users to cut and paste the citation and let the software figure it all out?





Brown University's FreeCite tool allows you to toss in a citation, and it will try to parse out each citation field. I believe there are a few other similar tools out there. The logical next step, of course, is to use this parsed output to fill in the citation linker fields.

This is exactly what UIUC Journal and Article Linker tries to do.
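As a rough illustration of the "parse, then fill in the linker" idea, the sketch below uses a single regular expression for one APA-like journal style and turns the result into an OpenURL 1.0 query string. This is not how FreeCite or the UIUC tool works internally (real parsers handle many styles and are far more forgiving); the `APA_LIKE` pattern, the `to_openurl` function and the `resolver.example.edu` base URL are assumptions made up for the example. The `rft.*` keys are the standard OpenURL journal keys.

```python
import re
from urllib.parse import urlencode

# Matches one APA-like shape only, e.g.
# "Smith, J. (2013). Article title. Journal Name, 12(1), 10-25."
APA_LIKE = re.compile(
    r"^(?P<aulast>[^,]+),\s*(?P<auinit>[^()]+?)\s*"
    r"\((?P<date>\d{4})\)\.\s*"
    r"(?P<atitle>.+?)\.\s+"
    r"(?P<jtitle>.+?),\s*"
    r"(?P<volume>\d+)\((?P<issue>\d+)\),\s*"
    r"(?P<spage>\d+)\s*[-\u2013]\s*(?P<epage>\d+)\.?$"
)


def to_openurl(citation, base="https://resolver.example.edu/openurl"):
    """Parse a pasted citation and return an OpenURL, or None if it does not parse."""
    match = APA_LIKE.match(citation.strip())
    if match is None:
        return None
    fields = {
        "url_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
    }
    # Each named group maps directly onto the OpenURL key of the same name.
    fields.update({f"rft.{key}": value.strip() for key, value in match.groupdict().items()})
    return f"{base}?{urlencode(fields)}"
```

Note how brittle this is: a missing issue number or a differently punctuated author list is enough for the pattern to fail, which is exactly why every field has to be recognised correctly for the traditional citation linker route to work.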







An interesting variant of this is done by EBSCO.

EBSCO has an app called EBSCO Citation Resolver, available via its new Orbit platform, an Online Catalog of EBSCO Discovery Service™ Apps.



This uses the above-mentioned FreeCite from Brown University to parse references, but instead of passing the data to a traditional citation linker to try to get to the full text via OpenURL, as UIUC does, it passes the data to EDS itself.



As you can see above, the parsed information is sent to EDS for advanced searching using field searching.
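A fielded search built from parsed output might look something like the sketch below. I have not seen the app's code, so this is only a guess at the general shape: the `fielded_query` function and the input dictionary keys are hypothetical, and I am assuming the familiar TI (title), AU (author) and SO (source) field codes.

```python
def fielded_query(parsed):
    """Build a fielded search string from whichever parsed fields are present."""
    # Assumed field codes: TI = title, AU = author, SO = source (journal).
    field_codes = {"atitle": "TI", "aulast": "AU", "jtitle": "SO"}
    parts = []
    for key, code in field_codes.items():
        value = parsed.get(key, "").strip()
        if value:
            # Drop embedded quotes so the phrase stays well formed.
            cleaned = value.replace('"', "")
            parts.append(f'{code} "{cleaned}"')
    return " AND ".join(parts)
```

Volume, issue and page numbers are simply ignored here, so a parser error in those fields can no longer sink the search.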

We will get back to this example later.


Finding full text by text and voice recognition


Also, why restrict oneself to cutting and pasting citations? What about other input methods? There used to be an iOS app, I believe from Thomson Reuters' Web of Science, that allowed you to take a photo of a reference and, by the magic of OCR and text recognition combined with a citation parser, linked you to the full text.

Unfortunately I lost track of that app, but I recall it didn't work very well because it was limited to linking you to article entries in Web of Science, and the text recognition combined with the citation parser wasn't that good.

Still, as technology advances, I think the idea has legs. I have no doubt that if Google desired, they could easily set this up to work with Google Scholar.

Now imagine combining this with voice commands such as Google Now: "Ok Google, find me such and such article by so and so in journal of abc".

Output accuracy should improve too.


Making it easy to input the citation is just one part of the equation; making sure the full text can be reached is the other.

Coming back to the EBSCO Citation Resolver, an interesting point to note is that after parsing the reference, instead of passing it to a citation linker such as EBSCO's own LinkSource citation finder (see below), it dumps the information into the discovery service, EBSCO Discovery Service.


Parsed citation did not get passed to LinkSource's article finder


Why would one send the information to the discovery service and not the citation linker tool?

Part of the reason is that linking via OpenURL is often hit and miss in terms of linking to full text.

Some studies put full text linking success at around 80% of the time, due to well-known OpenURL issues which IOTA and KBART are trying to solve.

Summon and EDS provide more stable forms of linking (often called direct linking, which can work up to 95% of the time) that can be used whenever possible on top of OpenURL. (Note: 360 Link v2.0 provides the same type of direct linking as Summon.)

Add the fact that automatic citation parsers are going to be somewhat inaccurate, and it might be easier to employ strategies that involve just extracting the author and article title to work with the discovery service than trying to identify every citation field (e.g. volume, issue, page) to work with the full OpenURL resolver. The latter method is very error-prone, as it requires a large number of fields to be recognised correctly to work well.

For a third method that uses the CrossRef metadata search API, see "Resolving Citations (we don’t need no stinkin’ parser)".

That said, as more citation styles require DOIs to be added, the work of parsing citations becomes easier, as often the DOI alone is sufficient to get to the full text. I also suspect the increased use of citations created by reference managers (e.g. Mendeley, Zotero) and the increased support of Citation Style Language (CSL) for various styles may eventually make things more consistent and easier for the citation parser.
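Pulling a DOI out of a pasted reference needs far less machinery than full parsing. Here is a minimal sketch; the pattern is a commonly used heuristic rather than a complete DOI grammar, and the `find_doi` and `doi_link` names are made up for the example.

```python
import re

# Heuristic: "10." + registrant code + "/" + suffix up to the next whitespace.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_doi(citation):
    """Return the first DOI-looking string in the citation, or None."""
    match = DOI_PATTERN.search(citation)
    if match is None:
        return None
    # Trailing sentence punctuation is almost never part of the DOI itself.
    return match.group(0).rstrip(".,;)")


def doi_link(citation):
    """Return a doi.org link for the citation's DOI, or None if there is none."""
    doi = find_doi(citation)
    return f"https://doi.org/{doi}" if doi else None
```

Stripping a trailing ")" can occasionally truncate a DOI that really does end in a bracket, so a production version would want to be more careful there.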

I can go further and imagine a hybrid system for output that would combine Google Scholar for free PDFs, web scale discovery direct linking and OpenURL linking to give the best chance of reaching the full text.

You can see this hybrid, multiple-approach system somewhat in play in the Lazy Scholar extension (supports Chrome and Firefox), which checks Google Scholar for free full text and also offers OpenURL resolution.






This could work the same way link resolver menus work now and display various options, or there could be some intelligent system in the background deciding whether to use the discovery service or Google Scholar to find the full text (for example, how likely is it that the first result in Summon for a title-only phrase search is the right hit?) or to rely on traditional OpenURL resolution.
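One simple way such a background decision could be made is to score how closely the top discovery hit matches the title the user asked for, and fall back to OpenURL when the score is too low. The sketch below uses Python's standard `difflib` for the similarity score; the `title_confidence` and `choose_route` names and the 0.9 threshold are my own guesses, not anything Summon or any other vendor actually does.

```python
from difflib import SequenceMatcher


def title_confidence(wanted_title, top_hit_title):
    """Similarity between 0 and 1, ignoring case and extra whitespace."""
    wanted = " ".join(wanted_title.lower().split())
    found = " ".join(top_hit_title.lower().split())
    return SequenceMatcher(None, wanted, found).ratio()


def choose_route(wanted_title, top_hit_title, threshold=0.9):
    """Trust the discovery hit only when its title is a near-exact match."""
    if top_hit_title and title_confidence(wanted_title, top_hit_title) >= threshold:
        return "discovery"
    # No hit, or a doubtful one: fall back to the link resolver.
    return "openurl"
```

With real success statistics, the threshold could be tuned rather than guessed.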


Conclusion


All in all though, I don't see much of a future for a stand-alone citation linker sitting on your website.

Few people have the patience to use it.

Ideally a web scale discovery service - basically the big 4 of Summon, EDS, Primo and WorldCat - should be built to handle cases where users copy and paste the whole citation. (I understand Primo has enhancements that handle this.)

As it is, I notice the rise of such user behaviour in search logs of discovery services under my care. It's a small but significant amount, something noted in other studies that analyse discovery search logs.


Can Summon handle cutting and pasting full references?


Discovery services should definitely be trained to identify such cases and automatically call the citation linker function.

Perhaps the system would then try to

a) Recognise the likely type of material sought (book, book chapter, article etc)
b) Depending on material type, focus on identifying with high likelihood the title, DOI, author etc.
c) Use either the discovery index, DOI resolution or traditional OpenURL methods depending on a) and b)

I expect that usually the system would try a phrase search for the article title in the article index, perhaps further narrowed by author (the top match is usually highly likely to be the right one); sometimes it would resolve the DOI, and at other times it would try the traditional citation finder method.
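The steps above can be sketched as a small routing function sitting in front of the search box. Everything here is a crude heuristic of my own (the six-word cut-off, the patterns, the `plan` function and its return labels); a real system would be trained on search logs rather than hand-written rules.

```python
import re

YEAR = re.compile(r"\b(19|20)\d{2}\b")
DOI = re.compile(r"\b10\.\d{4,9}/\S+")
# "12(1)" style volume/issue, or a "10-25" style page range.
VOL_ISSUE_PAGES = re.compile(r"\b\d+\s*\(\d+\)|\b\d+\s*[-\u2013]\s*\d+\b")
CHAPTER = re.compile(r"\bin\b.*\(eds?\.\)", re.IGNORECASE)


def looks_like_citation(query):
    """Guess whether the user pasted a whole reference rather than keywords."""
    if DOI.search(query):
        return True
    return len(query.split()) >= 6 and bool(YEAR.search(query))


def guess_material_type(query):
    """Step a): guess the type of material from surface clues."""
    if CHAPTER.search(query):
        return "book chapter"
    if VOL_ISSUE_PAGES.search(query):
        return "article"
    return "book"


def plan(query):
    """Steps b) and c): pick a resolution method for the query."""
    if not looks_like_citation(query):
        return "keyword search"
    if DOI.search(query):
        return "doi resolution"
    if guess_material_type(query) == "article":
        return "title phrase search"
    return "citation linker"
```

For example, a short query like "open access" stays a normal keyword search, while a pasted journal reference with a year, volume and pages is routed to a title phrase search in the article index.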

With tons of statistics on success rates, it might be possible to get a reasonably accurate system.

Depending on how certain you are of the model you are using, it could show all the options (similar to how link resolver menus work now; Umlaut in particular is worth looking at), or it could just show the highest probability match.

Next up, do we really need A-Z database and A-Z Journal lists?


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.