Saturday, May 21, 2016

Does the type of Open Access matter for the future of academic libraries?

In Aug 2014, I wrote the speculative "How academic libraries may change when Open Access becomes the norm".

In it, I argue that the eventual triumph of open access will have far-reaching impacts on academic libraries, with practically no domain of librarianship escaping unscathed. The article predicts that in a mostly open access environment, the library's traditional role in fulfillment, and to some extent discovery, will diminish (arguably the library's role in some aspects of discovery is already mostly gone).



Given that faculty currently view academic libraries mainly in the role of purchasers, I suggest that, to survive, academic libraries will start shifting towards expertise-based services like research data management, GIS, information literacy, etc.

Libraries may move towards supporting the publishing of open access journals (perhaps via layered journals or similar) or focus on special collections, supporting Lorcan Dempsey's inside-out view.

I end by suggesting that the trick for academic libraries is to figure out the right way and time to shift resources away from current traditional roles. Perhaps the percentage of content your faculty uses/cites that is available for free could be a useful indicator of when to shift roles.

What about the nature of open access that emerges?

One thing I shied away from speculating on was the type of open access that would emerge, as well as how the transition would occur. When open access becomes the norm (defined as, say, 80% of yearly scholarly output freely available), would most of the open access be provided predominantly via Green Open Access, via Gold Open Access, or some fair mix of the two? Would it be provided via subject repositories or institutional repositories (or maybe even modules from CRIS systems like Pure or Converis)?

Heck, would it even matter if a Sci-Hub-like system prevails and everyone pirates articles? (That was a joke, I think....)

In other words, does it matter for the future of academic libraries how exactly articles are made freely available?


Elsevier, SSRN and the civil war in open access

What led to this article was, of course, the news that the very dominant social science & humanities subject repository SSRN (Social Science Research Network) was bought by Elsevier.

I knew institutional repositories in general were not gaining much traction, and if I were a betting man I would venture that faculty preference for open access, or rather for publicising their work by placing output online, generally goes in this order:

a) Gold Open Access (if payment is not an issue)
b) Green Open Access (via subject repository) - for disciplines with traditions such as RePEc, arXiv, SSRN, etc.
c) Commercial academic sharing networks (e.g. Academia.edu, ResearchGate)
d) Green Open Access (via institutional repository)

and that when (if?) open access becomes dominant, it would be provided mostly in this order.

Still, I must admit that until this happened it had never occurred to me that subject repositories could be bought by legacy publishers!

Barbara Fister and Roger Schonfeld as usual have very good takes on the situation.

Roger's article points out that Elsevier is likely to pursue a strategy very similar to the one that led it to purchase Mendeley (leveraging user information and analytics and getting into the user workflow):

"Given the nature of the emphasis that Elsevier has been making on data and analytics, we should expect to see over time other integrations between an article repository like SSRN and Elsevier’s other services. There is a wealth of information in the usage logs of services like SSRN that could help guide editors trying to acquire manuscripts for publication or that could assist business development efforts for journal acquisitions. Also important to watch are SciVal, Pure, and some of Elsevier’s other “research intelligence” offerings."

In addition, SSRN's strength in the social sciences nicely complements Mendeley's strength in STEM fields.

To me, though, this purchase of SSRN also shows how much of a force Elsevier now is in the open access arena.

First off, just 5 days ago it was announced that Elsevier is now the world's largest open access publisher; in terms of the number of gold open access journal titles, they are now in the lead.

Their acquisition of SSRN gives them a foothold in the social science preprint and postprint world. Will arXiv (which I remember had to resort to begging for donations a few years back) or other subject repositories be next? (RePEc, apparently, is safe.) Will other publishers or companies in the library space start doing the same?

On the GOAL (Global Open Access List) mailing list, I see talk that the distributed nature of institutional repositories is the best defense against such takeovers.

But one wonders if all this makes any difference if our institutional repositories fail to compete.

Given the large investments that Elsevier can pour into SSRN, plus the synergies it can create through its ownership of other parts of the ecosystem, can institutional repositories truly compete? Institutional repositories today are often mostly metadata rather than full text. Even as a librarian, I find uploading my papers to university institutional repositories extremely painful compared to commercial alternatives like ResearchGate and Academia.edu, due to the complicated online forms.

Sure, most universities running DSpace or EPrints can in theory fix the interface and add functionality that isn't in the standard set, but this would apply only to their own installations and not the base package. Compared to a centralised subject repository, researchers would find uploading their output a fragmented and uneven experience; for example, some institutional repositories send usage statistics to authors, some don't. Compare this to uploading to SSRN, which has a consistent set of data (institution, researcher, paper) available for comparison across the whole output posted there.

So much for my hope that one of the roles academic libraries could take on once the purchaser role was phased out would be that of a publisher, via institutional repositories or even overlay journals.

Also, as Jennifer Howard notes, we are slowly getting cut out of researcher workflows. In the past, publishers would still consult librarians to get a sense of how their material was used. In the digital era, they can see a lot more via web analytics. With the acquisition of tools used across the whole research cycle (e.g. citation managers, preprint servers, etc.), they can arguably be closer to and know more about faculty than any liaison librarian can hope to!


One bright spot exists though. Current research information systems (CRIS) (e.g. Thomson Reuters' Converis or Elsevier's Pure) do have the potential to sit in researcher workflows, and it's logical for institutions to leverage those systems to provide traditional institutional repository functions. But as noted here, such systems are mainly internally focused rather than externally focused (though this might change), and libraries are generally secondary partners in them, compared to institutional repositories, where they typically lead.

So it's hard to say how this will pan out, or what roles libraries will play if it does.

Conclusion



"Librarians certainly should be thinking about what we can contribute to an open access world – after all, we’ve been advocating for it for decades. We need to figure out how we can contribute to a more open, more accessible world of knowledge."

Let's start thinking seriously now...


Personal Note

I was recently awarded the LAS (Library Association of Singapore) Professional Service Award 2015 at a ceremony at the Singapore National Gallery last week.





I am truly humbled and thankful for this incredible honor. I truly did not expect this.

I would like to thank my University Librarian, Gulcin Gribb, for nominating me, as well as the awards panel.

I was cited for my contributions to the library profession through the sharing of knowledge and ideas, and this blog is definitely a very big part of that. So I thank everyone I have worked with, corresponded with and exchanged ideas with, including all of you dear readers who give me the motivation to blog.


Tuesday, April 26, 2016

A quick comparison of online infographics makers - Infogram, Piktochart and Venngage

When I was back in school, I dreaded art class as I was simply horrible at it. I was never a visual type of person, and even today I favor words and numbers and avoid most "artistic" endeavors. So you can understand why, when I decided to try creating an infographic for the library, I expected it to be a big disaster.

Fortunately, many tools have appeared that help even artistically impaired people like me not fail too badly.

Creating infographics, to me, involves three parts:

1) Pulling out the data you need from various library systems
2) Creating interesting infographic "objects" (images, charts, visualizations)
3) Organizing everything into an interesting structure

I am decently well versed in the first step and can happily pull library data from Google Analytics, Primo Analytics, Alma Analytics, etc., so this part wasn't the problem.

For the organization of the infographic, I kept it simple and used one of the numerous templates available.

So the remaining part involved doing charts and other visualizations of the data I had extracted. While Excel has become increasingly capable at creating all types of charts (Excel 2013 has donut charts, radar charts, combo charts, etc., while Excel 2016 adds histograms, treemaps, waterfall, sunburst, box & whisker, Pareto and more), there are still some visualizations typically used in infographics that Excel can't do, and this is where the online infographics makers come into play.

In particular, a very common visualization is one that shows "X in Y people" type statistics.



Another similar visualization often seen is one that represents a percentage by proportionally shading an icon.





While it's possible to create the above by hand using, say, PowerPoint, it can be pretty exhausting. This is where the free online tools help.

I tried the free versions of Infogr.am, Piktochart and Venngage, and these are my impressions.


Infogr.am - good for more than 2 categories


Infogram has the usual charts and visualizations you expect, and also some less commonly used ones like treemap, bubble, hierarchy, etc.

But it is the Pictorial ones that are interesting to me.




The pictorial bar is a downright easy way to visualize "X in Y" type images.

For example, if you want to show that, say, 1 in 4 history students visit the library daily, it looks like this.



You can easily change the colors by editing the data, then clicking Settings.





You can also change the shape of the icon to, say, a female icon or any of the preset ones. The selection in the free version is very limited, though, compared to the others.

What if you want to create a visualization of three or more categories? Say you want to show that, of 10 students who visit the library, three are from business, five are from history and two are from science?

For that you use the Pictorial chart.






I admit that I was puzzled that when I first entered the data, by default it gave me a grid of icons that is 12 by 24 = 288 icons.

If you fiddle around with other switches, such as turning off "round values" and using the "absolute distributions" option, you can see that some of the icons are partly filled.

But still, I wonder what the point is of such a weird 12x24 distribution. I may have missed something, but I can't change it to something saner like 10x10 to get a "For every 100 students..." layout.

In any case, you can always turn on the "actual" switch to get the exact number of icons you entered.



Do also check out the gauge, progress bar and size visualizations, but in general Infogram's visualizations are fairly simple compared to the ones below.


Piktochart - upload your own icons


Piktochart has roughly the same types of visualizations as Infogram, via its "icon matrix".




However, Piktochart seems to have far more options than Infogr.am. You can
a) choose from a far larger set of icons than in Infogr.am
b) change to an icon you uploaded (SVG file)
c) set the number of columns the icons are arranged in.



In the above example, I changed the data to Business = 20, History = 30 and Science = 50.

I also changed the columns to 10, so there are 10 icons per row.


You can of course use this to do various tricks. In general, I find Piktochart has slightly more options than Infogram, and the ability to upload your own icon is a big win.

Venngage - my favourite

Venngage is by far my favourite tool of the bunch, at least for the purposes I am using it for.

First off, if you just want to represent 2 categories (e.g. use/non-use), you select Pictograms.

Like Piktochart, you get a huge library of icons to select from. But unlike Piktochart, you can't upload your own (not in the free version, at least).

Still, with the wide variety of icons available, you can easily create high-quality, professional-looking stuff like this.



By default, you get a 5 by 5 grid of icons, with 13 of them colored blue.

You can easily change it to say 10 by 10 with a value of 35. I've also changed the color.



Besides the fact that this visualization can't handle more than 2 categories (say faculty/postgraduate/undergraduate), it also can't show partial shading of icons. So, for example, if you wanted a 5x2 grid with 2.5 icons shaded, it can't be done.

A fairly unique pair of visualizations that Venngage offers is the icon column and icon bar. Below is an example of an icon column that visualizes queries at the desk by source.

All you have to do is enter a table of values and choose the icons you want, and Venngage will automatically calculate and create icons scaled proportionally to your values.

Below is what I entered as values for the example above.



I also changed each icon to an appropriate one from the available set. It doesn't seem possible to upload your own, but fortunately there seem to be hundreds available.



Have you seen infographics with icons that are proportionally filled up to X%? Seems like a lot of work to create one? Venngage makes it easy.

In this example, I wanted to show that the library has an average occupancy rate of 80% at 10pm by creating an icon of a chair that is 80% filled.

The way to do so in Venngage is a bit hidden. First go to Charts (on the left), scroll down and select Icon chart.



Drag the icon chart (the partly filled Twitter icon) to the canvas on the right. But how do you change it from the Twitter icon to something else?

This is done by choosing Icons (again on the left), selecting one of the hundreds of icons available and then dragging it to the canvas. If you have done it correctly, clicking on the icon shows, at the top, a way of adjusting the colors and the percentage fill.






Other nice stuff to explore includes icons showing percentages (see below), bubble, stacked bubble and cloud bubble.


Canva

Canva has a very nice set of icons and other graphical elements, but it is relatively lacking in the pictorials that I have covered above. It is still worth looking at if you want to use its large number of templates and other graphical elements.



Conclusion 

This is just a quick overview of these online tools in one particular aspect that I was looking for.

Most of these tools are also capable of creating map visualizations, something I didn't try this time.

This is something I might cover in future posts, together with a quick comparison of desktop visualization/business intelligence tools including Qlik Sense Desktop, Tableau Public and Microsoft Power BI Desktop.

I am obviously still a beginner at this, so any corrections, comments and tips are welcome.

Sunday, March 20, 2016

Ezpaarse - an easier way to analyze ezproxy logs?

I've recently been trying to analyse ezproxy logs for various reasons (e.g. supplementing vendor usage reports, cost allocation, studying library impact, etc.), and those of you who have done so before will know it can be a pretty tricky task given the size of the files involved.

In fact, there is nothing really special or difficult about ezproxy logs other than their size; a typical log line will look something like this:

140.42.65.102 jRIuNWHATOzYTCI p9234212-1503252-1-0 [17/May/2011:10:01:44 
+1000] "GET
http://heinonline.org:80/HOL/ajaxcalls/get-section-id?base=js&handle=hein.journals/josf65&id=483
HTTP/1.1" 200 120

Your library's logs may capture slightly more detail, such as user login information, user-agent (i.e. the type of browser) and referrer (i.e. the URL the user was on before).

In fact, you could even import this into Excel using space as a delimiter to get perfectly analyzable data. The main issue is that you can't go very far this way, because Excel is limited to about 1 million rows.
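To make the structure concrete, here is a minimal Python sketch of pulling the fields out of one such line. The pattern simply mirrors the layout shown above, and the field labels are my own guesses; what each column actually means depends on your LogFormat directive.

import re

# Rough pattern for the layout shown above:
# host, session token, user/barcode, [timestamp], "request", status, bytes
LINE_RE = re.compile(
    r'(?P<host>\S+) (?P<session>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\S+)'
)

def parse_line(line):
    """Return a dict of fields, or None if the line doesn't match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None

sample = ('140.42.65.102 jRIuNWHATOzYTCI p9234212-1503252-1-0 '
          '[17/May/2011:10:01:44 +1000] '
          '"GET http://heinonline.org:80/HOL/... HTTP/1.1" 200 120')
print(parse_line(sample))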

Overcoming the size issue

So if one can't use Excel, what about exporting the data into a SQL database?

One idea is to use sed (a stream editor) to convert the files into CSV and import them into a SQL database, which is capable of managing a large number of records, though you may still come up against the memory limits of your machine.

In any case, I personally highly recommend sed; because it is a stream editor, it can find, replace and extract from even very large text files efficiently. For example, I can use it to go through 15 GB of ezproxy logs and extract the lines that contain a certain string (e.g. sciencedirect.com) in less than 10 minutes on a laptop with 4-8 GB of RAM.
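If you would rather stay in Python than learn sed, a rough equivalent of that extraction step is sketched below. It reads line by line, so memory use stays flat regardless of file size; the file names and the search string are just placeholders.

def filter_log(in_path, out_path, needle="sciencedirect.com"):
    """Stream a large ezproxy log and keep only the lines containing `needle`."""
    with open(in_path, "r", encoding="utf-8", errors="replace") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:            # one line at a time, never the whole file
            if needle in line:
                dst.write(line)

filter_log("ezproxy-2016.log", "sciencedirect-only.log")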

I messed around with sed for a day or two and found it relatively easy to use.

What if you don't want to use a SQL database and just want to quickly generate the statistics?

Typically, most methods involve either

a) Working with a homebrew Perl or Python script - e.g. see the ones shared by other libraries here or here

b) Using a standard web log analyzer like Sawmill, AWStats, analogx, etc.

These can run through your logs and generate statistics on any reasonable machine.

Still too big? Another alternative is to do the analysis over so-called SPUs (starting point URLs), which basically capture only the very first time a user logs in via ezproxy and creates a session. This results in much smaller files; depending on the size of your library, you will probably be able to analyse them even in Excel.

You may have to set up your ezproxy configuration to generate SPU logs (via the LogSPU directive, if I recall correctly), as they are not produced by default.

Session based analysis

But regardless of which method I studied, I realized that they fundamentally gave the same results: what I call session-based analysis.

Example output from this script

These methods will tell you how many sessions were generated and, combined with the domains in the HTTP requests, the number of sessions or users for each domain (say Scopus or JSTOR).

But sometimes sessions alone are not enough; if you want more in-depth analysis, like the number of PDFs downloaded or pages viewed from, say, EBSCO or ScienceDirect, you are stuck.

The difficulty lies in the fact that it isn't always obvious from the HTTP request whether the user is requesting a PDF download or even an HTML view on that platform.

Certainly, if you wanted to, you could do a quick ad hoc analysis of the URLs for one or two platforms, but to do it for every platform you subscribe to (and most libraries subscribe to hundreds) would be a huge task, especially if you started from scratch.

Is there a better way?


Going beyond session based analysis with ezpaarse

What if I told you there was a free open source tool, ezpaarse, that already has URL patterns for parsing over 60 commonly subscribed library platforms and can produce data-rich reports like the ones below?



















Starting out with Ezpaarse

Ezpaarse comes in two versions: a local version you can host and run on your own servers and, more interestingly, a cloud-based version.

The cloud-based version is perfectly serviceable and great to use if you don't have the resources or permission to run your own servers, but obviously one must weigh the risk of sending user data over the internet, even if you trust the people behind ezpaarse. (The ezproxy log you upload to the cloud version doesn't seem to be secured, I think.)

One can reduce the risk by anonymizing IP addresses, masking emails, cleaning HTTP requests, etc. before sending the logs off to the cloud, of course (I personally recommend using sed to clean the logs).
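As a rough illustration, here is a small Python sketch of what such masking could look like. The patterns, the salted-hash approach and the file names are my own assumptions, not anything ezpaarse requires, and you should check that your chosen log format still parses after masking.

import hashlib
import re

IP_RE = re.compile(r'\b\d{1,3}(?:\.\d{1,3}){3}\b')
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+\.[\w.-]+')

def mask_line(line, salt="change-me"):
    """Replace IPs with a short salted hash (sessions stay linkable) and blank out emails."""
    line = IP_RE.sub(
        lambda m: hashlib.sha1((salt + m.group()).encode()).hexdigest()[:12], line)
    return EMAIL_RE.sub("email-removed", line)

with open("ezproxy.log") as src, open("ezproxy-masked.log", "w") as dst:
    for raw in src:
        dst.write(mask_line(raw))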


Choosing the right log format 

Your logs might be in a slightly different format, so the first step after you sign in is to specify the format of your logs. You do so by clicking on the "Design my log format" tab, then throwing in a few lines of your logs to test.




If you are lucky, it may automatically recognise your log format; if not, you need to specify it yourself.

Typically you need to look into your ezproxy config file for the log directive. Look for something like:

LogFormat %h %l %u %t "%r" %s %b

If you did it correctly, it should interpret the sample lines nicely, like this (scroll down). For reference, in this format %h is the client host, %l the remote logname (usually just "-"), %u the username, %t the timestamp, %r the request, %s the HTTP status and %b the response size in bytes.



If you are having problems getting this to work, do let the people at Ezpaarse know; they will help you figure out the right syntax. My experience so far is that they are very helpful.

In fact, for ease of reuse, the ezpaarse people have already helped some institutions create preset parameter sets. Click on Parameters.



You can see some predefined parameters for various institutions. They are mostly from France and Europe in the screenshot, but as you scroll down you will see that libraries from the US and Australia are already included, showing that word of this tool is spreading.



You can look at other options, including the ability to have it email you when the process is complete, but the most intriguing to me is the ability to simulate COUNTER reports (JR1).


I haven't tried it yet, but it could be used to compare against vendor reports as a sanity check (differences are expected, of course, because of off-campus access, etc.).


Loading the file and analyzing

Once that's done, the rest is simple. Just click on the Logfiles tab and add the files you want to upload.




I haven't tried it with huge files (e.g. >4 GB), so there may be file limits, but it does seem to work for reasonably sized files, as it appears to read them line by line.



As the file is processed line by line, you can see the number of platforms recognized and the accesses recorded so far. My own experience was that it occasionally choked on the first line and refused to work, so it might be worthwhile clicking on "System traces" to see what error messages occur.

Downloading and reporting


Once the file is 100% processed you can just download the processed file.

It's a simple CSV file where the data is delimited by semicolons, which you can open with many tools such as Excel.



You can see the processed file below.



There is a ton of information that ezpaarse managed to extract from the ezproxy log, including but not limited to the following (see the short loading sketch after this list):

a) Platform
b) Resource type (Article, Book, Abstract, TOC etc)
c) File type (PDF, HTML, Misc)
d) Various identifiers - ISSN, DOIs, Subjects (extracted from DOIs) etc.
e) Geocoding - by country, etc.
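As a quick example of what you can do with the processed file outside of Excel, here is a minimal pandas sketch. The column names (platform, rtype, mime) and the file name come from my own output and are assumptions; yours may differ, so adjust accordingly.

import pandas as pd

# ezpaarse output is semicolon-delimited; the column names used below
# (platform, rtype, mime) come from my own processed file and may differ in yours.
events = pd.read_csv("ezpaarse-output.csv", sep=";", low_memory=False)

# Accesses per platform
print(events["platform"].value_counts().head(20))

# Resource type (article, book, TOC...) by file type (PDF, HTML...) per platform
print(events.groupby(["platform", "rtype", "mime"]).size().unstack(fill_value=0))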

It's not compulsory, but you can also download the Excel template and load the processed file through it to generate many beautiful charts.











Some disadvantages of using Ezpaarse

When I got the cloud-based version of Ezpaarse to work, I was amazed at how easy it was to create detailed, information-rich reports from my ezproxy logs.

Ezpaarse was capable of extracting very detailed information that I wouldn't have thought possible, thanks to the very capable parsers built in for each platform.

This is also its weakness, because ezpaarse will totally ignore lines in your logs for platforms it does not have parsers for.

You can see the current list of parsers available and the ones currently being worked on.

While over 60 platforms have parsers, such as Wiley, T&F, ScienceDirect, EBSCOhost, etc., many popular ones such as Factiva, Ebrary, Westlaw, LexisNexis, Marketline and Euromonitor are still not available, though they are in progress.

Of course, if you subscribe to obscure or local resources, the chances of them being covered are nil unless you contribute a parser yourself.

Overall, it seems to me that Ezpaarse currently has more parsers for traditional large journal publishers and fewer for business and law type databases, so institutions specializing in law or business may get less benefit from it.

In some ways, many of the parsers cover platforms that libraries typically get COUNTER statistics from anyway, but ezproxy log analysis goes beyond simple COUNTER statistics, allowing you, for example, to consider other factors like user group or discipline, as such data is available in your ezproxy logs.

A lot of the documentation is also in French, but nothing Google Translate can't handle.

Conclusion

Ezpaarse is a really interesting piece of software. The fact that it is open source and allows libraries to contribute parsers for each platform to the project, without everyone reinventing the wheel, is a potential game changer.

What do you think? I am a newbie at ezproxy log analysis with limited skills, so do let me know what I have missed or misstated. Are there alternative ways to do this?

Monday, February 29, 2016

Primo and Summon - Same but different? - (I)

Last month, I had the opportunity to attend VALA16 in Melbourne, and I capped off my visit down under by attending the ANZREG (Australia New Zealand Regional Ex Libris Group for users) seminar.

I presented there, and the following is part one of what I presented.

Introduction


One of the biggest pieces of news to hit the library automation space in 2015 was Proquest's acquisition of Ex Libris.

Obviously this has great implications for academic libraries, as both companies own leading library technology software in terms of link resolvers (360 Link vs SFX/Alma Uresolver), discovery services (Summon vs Primo), ILSes/library services platforms (Alma/Voyager/Aleph vs Intota) and a host of other library technology systems, including recommender systems, digital content management systems, ereading list software, etc.

I won't go into the implications of such a move, as others more qualified have done so, except to mention that Primo's selling point as a content-neutral discovery service is now shot, because it is owned by Proquest, which owns plenty of content that libraries subscribe to.

How things might play out in the discovery space

From the point of view of an academic librarian who has experience managing Summon and, more recently, Primo: while I recognize that such a merger has disadvantages in terms of fewer choices for libraries and the loss of content neutrality, I admit to salivating at the idea of a discovery service that can draw from the strengths of both Proquest and Ex Libris.

I don't have any insider information on how things will play out, but like many who have commented (see above), I think it seems very likely that Primo and Summon will eventually merge (though the time scale is unclear).

I was toying with a blog post on what I hoped the merger would bring, but didn't manage to get it out before Proquest's Ex Libris officially unveiled their strategy roadmap in 2016.

The webinar, which was overwhelmingly subscribed, pretty much fulfilled my wishlist of things I hoped to see from the merger by bringing the best of both together. But first, an overview of the changes.

Overview of changes

In brief, Intota work appears to be folded into Alma; as an institution that just switched to Alma, this pleases me of course. Proquest's excellent knowledge base will be included in Alma, and Alma Analytics will be boosted with Intota assessment features, bringing in additional sources like Books in Print, Ulrich's, etc.

While it was predictable that the combined company would favour the well-established Alma over the fledgling Intota, it was harder to guess what Ex Libris would do with Summon and Primo. Both are well-established services used by hundreds of libraries and were the flagship products of Proquest and Ex Libris respectively. We can't have two flagship discovery services, right?

Actually we can, as it was announced that both are going to be flagship services going forward! You might say this is nothing new in Ex Libris's playbook, having supported multiple ILSes at the same time. But arguably, front-end discovery services are a different animal from ILSes: ILSes often have unique quirks and are used by library staff who are sensitive to change, which makes them difficult to migrate.

Web-scale discovery services are used by end users, and while you can squint and see differences, the big 4 of Summon/Primo/EDS/WorldCat are pretty similar. Various studies have shown users don't seem to see much difference between them, interface-wise at least.

Also, it makes so much sense for the company to use one combined knowledge base and one unified index for both discovery services, which further reduces any difference between Summon and Primo.

As far as I can tell, Ex Libris's play, based on their roadmap for the next two years, is aimed at making the two even more similar, to the point that eventually there might be almost no difference between them.

Let's see, shall we, what was announced to be coming.


The interface is made similar


Even before the Proquest acquisition of Ex Libris, the latter was making the rounds in conferences like ALA annual showing off the new upcoming Primo interface.

When I first looked at it, it struck me how suspiciously similar it was to Summon, except for the position of the facets (I was told later that you could swap them to the left with just a click of a button). It could be my bias, but on Twitter people agreed it was similar too, so it wasn't just me.




                                                   New upcoming - Primo interface




Summon v2.0 UI - example from Arizona State University



Another similarity? Both Summon V2 UI and the new Primo will use AngularJS. 


The new Primo interface, like Summon's, is clean with plenty of white space; options appear only when needed, on mouseover. Infinite scroll, exactly as in Summon 2.0, is included. A recent talk I attended at VALA16 described how Bond University compared the Summon 2.0 interface with their well-customized Primo interface, and predictably most users couldn't decide between the two. They did note that infinite scroll in Summon 2.0 was nice, and now you can have that in the new upcoming Primo interface too!



Then again, it could just be general convergence, since I notice that many library databases like Scopus and Web of Science have been redesigned with the same clean, "lots of white space, hide details unless needed" aesthetic.

Still, there are other similarities. For example, Primo features like Featured Results and the new Collection Discovery can also duplicate Summon's content spotlighting, which allows libraries to visually distinguish valuable content by content type.











The index and knowledge base are equalized



Arguably, discovery service UIs are unlikely to differ too much; it is in the content surfaced that competitive advantage lies, and discovery vendors like to distinguish themselves from their competitors by boasting about the quantity and quality of their indexes.

As already mentioned, it was announced that the same knowledge base (a new one was announced) and unified index will be used for both Primo and Summon. This is great news for Primo users, as I think it is generally acknowledged that Summon has the better index and knowledge base, inherited from Serials Solutions (which was acquired by Proquest).



Features like personalization (Primo), article recommendations (bX recommender in Primo), the database recommender (Summon) and the Topic Explorer (Summon) are shared



Even the small differences in feature sets between Summon and Primo will diminish.

Primo gets the database recommender and, hopefully, the best bets feature from Summon. In fairness, some institutions like mine use the adwords plugin from the developer network to achieve the same effect.


Database recommender feature - Summon, University of Reading



Summon gets personalization (based on user status and degree) and the article recommender (I assume this means that if you subscribe to the bX recommender, the recommendations will appear in the Summon UI).



Primo personalization feature  



Primo personalization feature is also available on login selection


Currently, Primo libraries that also subscribe to the Ex Libris article recommender are able to integrate the two easily. Summon will eventually do the same.






There are also other things mentioned that Primo will get from Summon, including "topic exploration" and "synonym match" (I suspect these are Summon's Topic Explorer and automated query expansion respectively), which are relatively minor features.

Misc & Conclusion


Discovery is closely related to delivery, and another change in the works is that Proquest's IEDL (index-enhanced direct linking) feature, now available in Summon and 360 Link v2.0, will come to users of Primo's Uresolver (it's unclear if it will come to SFX). This is great, because IEDL has much better link reliability (near-100% reliability is claimed, though in reality it is closer to 95-98%) than plain OpenURL (which can have error rates as high as 30%).

Add the fact that from 2017 onwards users of Alma can choose between Summon and Primo as their discovery solution, and, assuming all the announced features are successfully implemented, one wonders if there is really going to be any significant difference between the two.

In part II, I argue that there are in fact a couple of differences between Primo and Summon that still remain (as far as I know) and that might have a significant impact on the decision to go for one or the other.


Tuesday, February 16, 2016

5 Alternative ways to get scholarly material that don't involve your library

I have always been fortunate to be associated with an institution with a good academic library, so I haven't really kept up with the illegal or semi-illegal ways of getting access to scholarly material.

Still, I've recently been thinking about the amount of free scholarly material available online, legally or otherwise, and how these routes stack up against library document delivery.

The 5 methods are

1. Search for free copies via web search (mainly Google or Google Scholar)

2. Requesting copies from authors (via Institutional Repository or Social networks like Academia.edu)

3. #icanhazpdf requests on Twitter

4. https://www.reddit.com/r/Scholar/ on reddit

5. Illegal sources like Libgen and/or sci-hub.org

How effective are each of these methods?

To read on.......


Sunday, January 24, 2016

Look back at 10 top posts on librarianship I am proudest of (2012-2015)

It's the beginning of 2016, and nostalgia once again makes me look back at my past posts to see how they have stood the test of time.

The last time I did this was in December 2011's Top 12 library blog posts I am proudest of, which covered the first 3 years of this blog, so this post will cover the period from 2012 to 2015.

Of the 80-odd posts since then, these are the ones I am happiest with.

How academic libraries may change when Open Access becomes the norm (Aug 2014)






Written in 2014, this still reflects my current (as of 2016) thinking about the future of academic libraries. In this article, I argue that the eventual triumph of open access will have far-reaching impacts on academic libraries, with practically no library area escaping unscathed.

The article predicts that in a mostly open access environment, the library's traditional role in fulfillment and to some extent discovery will diminish. 

Libraries may move towards supporting the publishing of open access journals (perhaps via layered journals or similar) or focus on special collections, supporting Lorcan Dempsey's inside-out view.

Given that faculty currently view academic libraries mainly in the role of purchasers, I suggest that, to survive, academic libraries will start shifting towards expertise-based services like research data management, GIS, information literacy, etc.

I end by suggesting that the trick for academic libraries is to figure out the right way and time to shift resources away from current traditional roles. Perhaps the percentage of content your faculty uses/cites that is available for free could be a useful indicator of when to shift roles.

I don't have much I would change in this article, as events since 2014 show that open access has continued to gain momentum. Perhaps if I wrote it now, I would mention a little about open educational resources (OER).

Also check out the companion piece How should academic library websites change in an open access world? (Oct 2014) and for another strategy type article, Library and Blue Ocean strategies (I) - the case of discovery services (Dec 2013)







This article, together with the others in the series - How are libraries designing their search boxes? (II), How are libraries designing their search boxes? (III) - Articles, Databases and Journals, and Branding library discovery services - what are libraries doing? - were massive surveys I did to study how ARL libraries using Summon were branding the Summon search and exposing it as search boxes on their library homepages.

Echoing the surveys I did on library mobile websites in the 2010s, it was done at the time when I was figuring out the testing and implementation of Summon. I spent a massive amount of time studying this; I remember being really fascinated by the topic.

I would like to think many other academic librarians found these articles interesting and useful as it is now the 8th most viewed article ever.



Written when I was getting confident that I had a mature understanding of library discovery services, I believe it was a pretty fair summary of the state of library discovery at the time.

This was a pretty popular article that presenters on discovery at conferences often pointed to when they wanted a concise summary of what was generally agreed upon in 2013.

Also catch the follow up - 6 things I am wondering about discovery (Oct 2013)



The most recent post on this list. In the early days of this blog, I would constantly post about various new online tools and web services that I found useful. For example, in 2010, after I had just acquired my first smartphone and then a tablet, my blog was full of posts on apps and Twitter services.

In recent years, I did fewer such posts, though I did dutifully write about history/memory-based apps, gamification, curation tools and presentation tools like Haiku and Storify.


But still, my favorite is the recent post on how libraries are using Trello. It's amazing how many ways libraries have used it for their purposes, from managing renewals and tracking troubleshooting requests to liaison work and more.



How a "Facebook for researchers" platform will disrupt almost everything (April 2012)


In this 2012 post, I wrote about the rise of sites like Mendeley, which I described as a "Facebook for researchers".

Back then I predicted they would start to occupy and then dominate a central part of the scholarly communication ecosystem and disrupt the following areas:

  • Discovery - Users would start to prefer searching in them for discovery purposes (partly due to the superb recommender systems made possible by capturing tons of user data)
  • Author IDs - Users would prefer their research profiles to other unique author IDs
  • Analytics - Due to the captive audience they gained, they would have a host of user analytics that could be used for their own benefit.
Writing today in 2016, I think I wasn't too far off the mark. Mendeley grew from strength to strength and was eventually acquired in 2013 by Elsevier, which quickly recognized its growing value. Today Mendeley stands with Academia.edu and ResearchGate as one of the three surviving contenders to the throne.

Other players followed Elsevier's lead: Springer acquired Papers (yet another reference manager) in Nov 2012, and Proquest started to push its cloud-based reference manager, Flow (now renamed RefWorks Flow), in 2014, trying to leverage its dominance in the library discovery and database business in the process.

This sudden interest in reference managers is no big surprise; companies are figuring out that being where the researchers are and owning their workflow is essential, as I set out in the article.

In terms of detailed predictions I was mostly right as well.

I used to receive comments from graduate students asking why our discovery service was not as good as searching in Mendeley, and with the implementation of recommender systems, I have no doubt there is a portion of users who have taken much of their discovery activity to such systems.

At the time I wrote the article, I failed to make the distinction between author profiles on the one hand and author identifiers on the other. A single unique author identifier like ORCID could and should happily live alongside multiple author profile systems in Mendeley, Google, CRIS systems, etc.

Today I am glad to report that while author profiles on Mendeley and ResearchGate, and in particular Google, remain popular, support for ORCID is at or nearing a tipping point, with publishers requiring authors to submit ORCIDs with their papers. This, coupled with Crossref's auto-update functionality, probably signals a bright future for ORCID.





As anyone who manages or leads a library discovery service team will tell you, much of one's responsibility as the lead is to answer to stakeholders (in particular other librarians) on relevancy issues in the discovery service.

I would write many times on relevancy ranking issues, but I am proudest of this post, which explains why nested Boolean searches of the form

(A1 OR A2 OR A3) AND (B1 OR B2 OR B3) AND (C1 OR C2 OR C3) are counterproductive in library discovery.

This is based on an understanding of how different today's environment is from the one in which Boolean searching started off. We no longer operate in an environment where there is no full text available for matching, where there are no big mega-indexes, and where users expect precise, exact matches as opposed to search systems with helpful features, such as stemming, that auto-expand the search.

The day library discovery died - 2035 (September 2013)


Yet another library discovery piece, but this one was different because it was written tongue in cheek.

"A tongue in a cheek, thought experiment or perhaps precautionary tale of the ultimate fate of library discovery services in 2035. 

With a sigh, Frank Murphy, head of library systems of Cambridge-Yale University made a quick gesture at his computing device and the system began to shut down the library discovery service for the last time. Or at least that was what he intended but the gesture based device - a distant descendant of the Kinect device refused to comply."

My first and so far only attempt at writing fiction on this blog; watch out for the little twist at the end.

This piece of fiction describes one of four possible fates I expect might happen in Four possible Web Scale Discovery Future Scenarios (Dec 2014). 

Also check out What would Steve Jobs say? Reinventing the library catalogue (Oct 2013) for another post written in a similar style. 






5 things Google Scholar does better than your library discovery service (July 2015)


Having spent the last 3 years thinking almost obsessively about library discovery services, it was natural that I eventually became fascinated with the similarities and differences between library discovery services and their closest rival - Google Scholar.

This "series" began with How is Google different from traditional Library OPACs & databases? (May 2012) and also included

However, it was this latest article, which directly pointed out the strengths of Google Scholar against library discovery services, that blew up, eventually being cited in various places from Marshall Breeding's NISO white paper to the Horizon Report: Library Edition.





I found readers of my blog were not just interested in library discovery services but more directly in Google Scholar.

The 8 surprising things I learnt about Google Scholar came about because I was tasked to "teach" Google Scholar to faculty, who basically wanted to know how to make their articles "rank high in Google Scholar".

That's an impossible task of course, as nobody knows exactly how Google Scholar ranks articles, and as far as I know there are no SEO (search engine optimization) experts for Google Scholar.

In any case, I tried to learn as much as I could about Google Scholar from various sources, including public documentation, and by pulling together details from articles written by others who had experimented with it.

I was surprised by the reaction to the article, as it seems that what surprised me about how Google Scholar works was new to many others too. As of today it's the 5th most viewed article!

Also check out the recent but popular 6 common misconceptions when doing advanced Google Searching (Oct 2015), which explores the common mistakes that advanced users accustomed to library database syntax (a.k.a. librarians) often make.





An all-time top 10 viewed post, this was a fun piece surveying how libraries were exploiting memes for marketing. I went on to run the wildly successful library memes contest that I eventually presented on at Internet Librarian in 2012.



See also:
More good library related video that spoofs movies or tv (April 2013)
What are library facebook pages using as cover photos? A survey (March 2012)


Conclusion

These 10 articles are, I think, a fair representation of my most-read articles from 2012 to 2015. Half of them relate to the issue of discovery, in both library and commercial systems, and this perhaps fairly reflects my obsession at the time.

Towards the later part of the period, perhaps disillusioned by the growing belief that in the long run libraries will be slowly pushed out of the discovery business, I became interested in open access and also started to play trend spotter or strategist with a couple of "strategy" management articles.

What I will be interested in next is anyone's guess, though I believe that article and book discovery, while not a 100% solved issue, is becoming increasingly easy, and the next challenge that awaits us is the handling of data.

Some people have asked me how much time I spend on my blog posts, and one even, perhaps not too kindly, suggested that blogging was my job.

In all seriousness, I really can't tell you exactly how much time I have spent on my blog. With over 216 posts since I started blogging in 2008, at a conservative 5 hours per post (including editing), I have easily spent over 1,000 hours blogging, mostly during weekends and often after work on weekdays. Add in the time spent researching and thinking, and it could be between 3,000 and 5,000 hours over the last 8 years.

Will I stop one day?  My average posting rate per year is trending down in a somewhat predictable fashion.

2009 - 4.0 per month
2010 - 4.0 per month
2011 - 3.1 per month
2012 - 2.9 per month
2013 - 1.7 per month
2014 - 1.3 per month
2015 - 1.0 per month


Most library bloggers who started before me have long since stopped blogging, so I may too one day.

Until then, I thank you all who continue to subscribe or read and share my posts. 


This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.