Sunday, August 23, 2015

Things I learnt at ALA Annual Conference 2015 - Or data is rising

I had the privilege of attending the ALA Annual Conference 2015 in San Francisco this summer. This was my second visit to the conference (see my post in 2011), and as usual I had lots of fun.

Presenting at the "Library Guides in an Era of Discovery Layers" session

My ex-colleague and I were kindly invited to present our work on a bento-style search we implemented for LibGuides.

For technical details, please refer to our joint paper, "Implementing a Bento-Style Search in LibGuides v2", in the July issue of the Code4Lib Journal.

See the Storify of the event at

Data is rising 

Before I attended ALA 2015, I was of course aware that research data management was an increasingly important service that academic librarians are, or should be, supporting.

To be perfectly frank though, it was a hazy kind of "aware".

I knew that grant-giving organizations like the NIH and other funders were increasingly requiring researchers to submit data sharing plans, so this was an area where academic librarians would provide support, particularly if open access takes hold, since that would make many traditional tasks obsolete.

I also knew there was all this talk about supporting Digital Humanities and GIS (geographic information systems) services, to the point that my former institution began appointing Digital Humanities and GIS librarians just before I left.

Perhaps closer to my wheelhouse, given my interest in library discovery, there was talk about linked data and BIBFRAME, which isn't research data management per se.

All three are emerging areas that I knew or strongly suspected would be important, but I was unsure about their timing or even their nature (see later).

Add the "stewardship's duty of libraries" towards the "Evolving Scholarly Record" (what counts as the scholarly record is now much expanded beyond just the final published article, and libraries need to collect and preserve that), and you can see why "data" is a word librarians are saying a lot more.

Still, attending ALA Annual 2015 made me wonder whether a tipping point has finally been reached and I should start looking at it more deeply.

Is Linked data finally on the horizon?

Marshall Breeding asked this question during his session, "The future of Library Resource Discovery: Creating new worlds for users (and Librarians)".

Breeding's observation was indeed apt, though one's choice of sessions to attend obviously has an impact; for example, this blogger wonders if the overdose of linked data is simply due to her interest.

Still, this year there seemed to be quite a lot of talk on linked data and BIBFRAME. Perhaps a tipping point has been reached?

I think part of it is due to the fact that ILS/LMS/LSP vendors have begun to support linked data.
This breaks the chicken-and-egg problem where there are no tools for linked data because no one is interested enough to make them worth building, and no one is interested in using linked data because there are no tools for it.

The biggest announcement was about Intota v2, ProQuest's cloud-based library services platform:

"Intota v2 will also deliver a next generation version of ProQuest's renowned Knowledgebase. Powered by a linked data metadata engine, Intota will allow libraries to participate in the revolutionary move from MARC records to linked data that can be discovered on the web, increasing the visibility of the library." - Press release

I actually was in attendance during the session but left before it was demoed (kicking myself for that). The tweet below is interesting as well.

Of course, we can also expect Summon to start taking advantage of linked data to enhance discovery via Intota.

Besides ProQuest, SirsiDynix announced plans to "produce BIBFRAME product in Q4 2015", while Innovative had pledged support for the Libhub Initiative a few months earlier.

OCLC, of course, has always been an early pioneer in linked data.

"Nobody comes to librarians for literature review?"

As part of my attempt to balance sessions in areas I was really interested in (and hence likely to be well versed in most of what was shown) against sessions in areas I was totally unfamiliar with (where most things would likely go over my head), I decided to go to some GIS sessions.

I accompanied my ex-colleague and co-presenter to a couple of sessions on GIS (Geographic Information Systems), an area he is passionate about and is currently tasked with trying to start up for the library.

I attended various sessions, including a round table session which focused more on what libraries were doing, as opposed to the more technical sessions. It was clear from the start that some academic libraries in the US were far more advanced than others. I believe a librarian from Princeton, for example, stated that libraries have been managing data for over 50 years and that it's not a new thing to them.

Much nodding of heads occurred when someone warned about jumping on the bandwagon simply because their University Librarian thought it was a shiny new thing.

Many talked about staffing models and how to fit liaison librarians versus specialist roles into these new areas, which is a perennial issue whenever a new area emerges (e.g. promoting open access was the last such area for many academic libraries).

One librarian stated that helping faculty handle research data is important because "nobody comes to us anymore for literature searches".

Of course, this immediately drew a response from, I believe, a social science (or was it medical?) librarian, who said that faculty do come to them for both literature reviews and data sets! :)

Why searching for data is the next challenge

ExLibris has been sharing the following diagram at various conferences recently, listing 5 things users expect to be able to do.

Of the five tasks above, I would say the greatest challenge right now is to "obtain data for a research project", which can be seen as a different class of problem from the other four tasks, which broadly speaking involve finding text-based material.

I think this is because improvements in search technology (from the "physical only" days, to the early days of online, to Google Scholar and web scale discovery today), coupled with easily over a century of effort and thinking on how to organize and handle text, have made searching for text, and scholarly text such as peer-reviewed articles in particular, if not a completely solved problem, then at least one that isn't so daunting that most academics would recoil in terror and ask for help.

Yet searching for data sets and statistics is, I would say, about as difficult as searching for articles was in the 1980s and 1990s. While the latter has improved by leaps and bounds, the former hasn't moved much.

Lack of competition from Google? 

Having worked in a business/management-oriented university for 5 months, I am starting to appreciate how much more difficult it is to get datasets in areas such as finance, and I know many librarians, myself included, get a sinking feeling in our stomachs when asked to find them.

Firstly, the interfaces for getting data out of these databases are horrendous. Even the better ones are roughly at the level of the worst article searching interfaces.

I suspect this is partly because, without Google to put pressure on these databases, there is no incentive to improve. Competition from Google has, I believe, driven the likes of EBSCO, ProQuest etc. to converge on pretty much the same usable design, or at least a Google-like design that takes little adjusting to.

Today, the UI you see in Summon, Web of Science, Scopus, EBSCO platforms etc. is pretty much the same, and you can practically use any of them without prior familiarity. (See my post on how library databases have evolved, mostly in terms of functionality and interface, to fit into the Google world.)

Google's relentless drive to improve user experience has benefited libraries by pushing vendors to keep up. You could say the EBSCOs of the world were practically forced to improve or die from irrelevance as students flocked to Google.

Of the databases that libraries subscribe to, the worst ones typically belong to either the smallest outfits or ones that primarily serve other, non-library sectors.

So the likes of Bloomberg, Capital IQ, T1 and even many law databases such as LexisNexis have comparatively harder-to-use designs.

They can get away with this because of the lack of competition from Google, and also because these are primarily work tools: professionals are proud of, say, their hard-earned Bloomberg skills, which give them a competitive advantage.

When it comes to non-financial data, it becomes even more challenging, since there aren't many well-known repositories of data (at least known to a typical librarian not immersed in data librarianship) that one should look at. Google is of limited help here, surfacing the usual well-known open data sources such as the World Bank, UN etc.

How researchers search for public data to use

A recent Nature survey asked researchers how they find data to use.

The article noted that no method predominated, with checking references in articles as common a method as searching databases. Arguably this points to the fact that:

a) databases on data are not so well known
b) databases on data are hard to use (due to lack of comprehensiveness of data or poor interface).

Of course, this survey question asks only about "public data" to reuse.

Researchers often approach me about using data from databases we license, such as newspaper and article databases, for content analysis. This seems to be yet another area that academic libraries can work on; leading libraries like NCSU Libraries have taken on the task of negotiating access to data from the likes of Adam Matthew and Gale.

Confusion over what libraries can or should do with data

As with any new area academic libraries are trying to get involved in (thanks to reports like NMC's Horizon Report - Library Edition listing this area as an increased focus), there is a lot of confusion over the skill sets, roles and responsibilities needed.

What a "data librarian" should do is not a simple question, as this can span many areas.

In "Hiring and Being Hired. Or, what to know about the everything data librarian", a librarian talked about how his responsibilities blew up and that "everything data librarians don’t actually exist".

He points out that many job ads for data librarians actually comprise 5 separate areas:
  •  Instruction and Liaison Librarian
  •  Data Reference and Outreach Librarian
  •  Campus Data Services Librarian (this job is most associated with scholarly communication)
  •  Data Viz Librarian (Learning Technologist)
  •  The Quantitative Data Librarian (Methods Prof)

I can smell the beginning of what the Library Loon dubs "new-hire messianism", where a new hire is expected to possess an impossible number of skill sets, work in an indifferent or even hostile environment, and almost singlehandedly push for change with limited or no resources or authority.

Obviously no single staff member should be "responsible for data". I've been reading about the concept of "tiers of data reference" and thinking about how to improve in this area.


Like most academic librarians, I am watching developments closely, and trying to learn more about the areas. Some sites


Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.