Thursday, August 11, 2016

Are institutional repositories a dead end?

Long-time readers of my blog know that I'm a bit of a worrier when it comes to libraries.

I've written tongue-in-cheek posts such as "The day library discovery died (for libraries)", I've been a skeptic of trends such as SMS reference, QR codes and mobile sites (nailed the first 2), Altmetrics and 3D printers (the jury is still out on them), and I've generally worried about the death of libraries like any good librarian.

Consider this post yet another part of the series where I play skeptic or devil's advocate. This time I focus on institutional repositories.

To avoid doubt, I define institutional repositories as failing, or a dead end, if they are unable to capture most of their institution's scholarly output and make it publicly available to all.

The one great strength of Institutional Repositories

Let me set the scene. It was May 2016, and the scholarly world was still reacting to the shock purchase of SSRN by Elsevier.

On the GOAL mailing list, it was pointed out that the distributed nature of institutional repositories, which are owned by individual universities, was a great defense against monopolistic take-overs, as no single commercial entity could buy up all the institutional repositories in the world. No one could do with IRs what Elsevier did by purchasing SSRN, namely take a big slice of the OA market (in certain disciplines) in one blow.

A response to that by a certain Eric F. Van de Velde caught my eye. He basically outlined why he thought institutional repositories would fail and why subject repositories or even commercial sites like ResearchGate were winning out.

It resonated with me because I was coming to the same conclusion.

Last month, I found he had expanded his short reply into a post provocatively titled "Let IR RIP".

How provocative? It begins "The Institutional Repository (IR) is obsolete. Its flawed foundation cannot be repaired. The IR must be phased out and replaced with viable alternatives."

Eric, as he explains, was an early believer in and advocate for institutional repositories (going way back to 1999). This is someone who has managed IRs, knows them well, and was hoping that they could eventually disrupt the scholarly communication system. Such a person now thinks IRs are a "dead end".

I don't have even a tenth of his experience in this field, but as a humble librarian working on the ground, I must concur with his points.

It seems to me that, no matter how hard we librarians try, most researchers don't have half the enthusiasm (assuming they had any in the first place) for depositing full text in institutional repositories that they have for subject repositories or even social networking sites like ResearchGate.

Why is this so? You should really read his post, but here's my rambling take from a librarian's point of view.

1. Institutional affiliations will change and control is lost when it happens.

Many faculty will move at least once in their career (twice if you include their time as a PhD student). As such, they have little incentive to learn how to use or manage their own local IR systems.

Compare this to someone who invests in setting up his profile and/or deposits in ResearchGate or SSRN. This is something they will own and control throughout their career no matter where they go.

ORCID helps solve part of this problem, but even in an ideal world where you update ORCID and it pushes to various profiles, the full text has to exist somewhere.

And if you upload it to an IR, the moment you leave, you lose control of everything there. Some progressive IRs include public statistics like downloads and views of your papers, which is all well and good (especially if you are smart enough to create metadata records in multiple venues but link back to your IR copy) until you leave the institution and can't bring those statistics over to aggregate with your future papers.

Why would someone devote so much time to something they may not fully own? Compare this to someone setting up an SSRN/ResearchGate profile, where all the work you do and all the statistics you accumulate in terms of downloads etc. will stay with you forever, centralized in one place.

SSRN Statistics

Incidentally that's also why I suspect implementing the "copy request button" idea on institutional repositories tends to not work so well.

STORRE: Stirling Online Research Repository

For those of you who are unaware, the idea here is that you can (legally?) circumvent embargoes by adding a "copy request button". Just list the record (with no full text) in the repository, and a visitor to the metadata-only record can click on a "Copy request" button to instantly request a copy from the author. You as the author get the email, and you can either reply with the file or, in some systems, simply give approval and the file will be released automatically to the individual.

This idea works very well in theory, but in practice, when you leave an institution, it is likely the IR will continue to list your old, invalid email!

Since I started my profile in ResearchGate, I've gotten requests for theses and papers written when I was an undergraduate and later as a library masters student.

I would not have seen these requests if I relied on my old institution's IR "Copy request" buttons!

2. Lack of consistency across IRs

Though most university IRs use a relatively small set of common software such as Digital Commons, DSpace and EPrints, they can differ quite greatly depending on customization and feature set, and this can be very off-putting to the researcher.

It's not just surface usability and features. Because there are no standards for metadata, content etc., it becomes, as Eric says, "a mishmash of formats" when you try to search across them using aggregator systems like CORE, BASE etc. Each IR will have its own system for classifying research, its own subjects, its own fields and so on. This is also something familiar to those of us who have tried to include IR contents in discovery services and found, to our dismay, that we often have to turn them off.
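To make the "mishmash" a little more concrete, here is a minimal Python sketch of what an aggregator has to do before it can search across two repositories. Everything in it is an assumption for illustration: the repository names, the field names and the shared schema are hypothetical and are not taken from CORE, BASE or any real IR.

```python
# Hypothetical per-repository field names mapped onto one shared schema.
# None of these names come from a real repository; they only illustrate
# that each IR labels the same information differently.
FIELD_MAPS = {
    "repo_a": {"title": "title", "dc.subject": "subjects", "type": "item_type"},
    "repo_b": {"Title": "title", "keywords": "subjects", "genre": "item_type"},
}


def normalize(record, repo):
    """Rename a repository's fields to the shared schema, dropping unknown ones."""
    mapping = FIELD_MAPS[repo]
    out = {}
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            continue  # field has no equivalent in the shared schema
        if target == "subjects" and isinstance(value, str):
            # Some repositories store subjects as one ";"-separated string.
            value = [s.strip() for s in value.split(";") if s.strip()]
        out[target] = value
    return out


record_a = normalize(
    {"title": "Open access uptake",
     "dc.subject": ["Open access", "Repositories"],
     "type": "Article"},
    "repo_a",
)
record_b = normalize(
    {"Title": "Open access uptake",
     "keywords": "Open access; Repositories",
     "genre": "journal-article"},
    "repo_b",
)

print(record_a["subjects"] == record_b["subjects"])
print(record_a["item_type"], "vs", record_b["item_type"])
```

Even after the field names are lined up, the values themselves ("Article" versus "journal-article") still disagree, which is the harder part of the problem: every extra repository means another mapping table somebody has to maintain.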

A researcher who wants to use the IR when he switches institutions will have to struggle with all this and why would he when he could use something more familiar that he has been using since his grad school days....

3. Subject/Discipline affiliations are stable while institution affiliations are not. 

@aarontay @lisalibrarian @helent13 another disadvantage is scholars think in fields/disciplines not institutions.
— R. David Lankes (@rdlankes) July 30, 2016

This is a complementary point to point 1.

Subject Repositories have the advantage of greater familiarity to scholars and can have systems custom built for each researcher's community.

4. IRs generally lag behind in terms of features and sophistication  

Not every institution is a rich top Tier 1 University that is capable of investing time and money to provide a useful and usable IR that can compete with the best in the commercial world.

For example, there's a belief floating around (which I think might be justified, but I have no evidence) that it's better to put your outputs in ResearchGate and similar sites than in IRs, because the former have greater visibility in Google.

I'm no expert, but I find systems like ResearchGate are just more usable. I've deposited to DSpace and Digital Commons systems before, and it easily takes me 30 minutes to get through a deposit, and I'm a librarian!

ResearchGate and company are also more aggressive in encouraging deposits. For example, if I list a metadata-only record, it will often check SHERPA/RoMEO automatically for me and encourage me to deposit when it's allowed.

Maybe there are DSpace, EPrints etc. systems out there with such features, but the few I have used don't seem to do that. (CRIS systems do that, I believe?)

While many find ResearchGate and similar sites annoying and intrusive, I think you can see that they try to work on human psychology to encourage the desired depositing behavior, through gamification techniques or just by evoking old-fashioned human curiosity.

For example, ResearchGate can tell you who viewed your record and who downloaded and read your paper (if they were signed in while doing so), and you can even respond to such information by asking the identified readers for a review!

Not everyone thinks such features are a positive (privacy!), but the point here is that they are innovating much more quickly, and IRs, at least the average IRs, are lagging. Often I feel it is akin to library vendors talking about bringing "social features" into catalogues in 2012 and expecting us librarians to cheer.

Others, such as Dorothea Salo in Innkeeper at the Roach Motel, have long pointed out the many shortcomings of IR software like DSpace. Under the section "Institutional repository software", she lists a depressing inventory of problems with IRs.

These include poor UX, a lack of usage statistics, siloed repositories that lack interoperability, and a lack of batch upload and download tools. The inability to support document versioning (something subject repositories do decently well) means faculty won't use IRs, not even for the final version.

Add outdated protocols like OAI-PMH (which Google Scholar ignores) and the reality that most IRs hold a mix of full-text and metadata-only records, rather than 100% full text as envisioned, and IRs have had an uphill task.

Most of the above was written back in 2007; I'm unsure if much has changed since then.

5. IRs lack mass  

When was the last time you went specifically to the IR homepage to do something besides deposit a paper?

How about the last time you decided to go to your IR homepage to search for a topic?

IRs simply don't have enough critical mass (one institution's output is insignificant even if it were all full text) to be worth visiting to browse and search, compared to, say, a typical subject repository.

As such, the most common way for a user to end up on an IR page, or more likely just a PDF download, is via Google Scholar.

Is this a problem? In a way it is, because the lack of reasons for authors to visit the IR means that any possible social networking effect is absent and, as the saying goes, out of sight, out of mind.


I would like to say here that I fully respect the efforts and achievements of my colleagues and librarians around the world who directly manage IRs. It can't be an easy task, particularly since many may be labouring under what The Loon calls the coordinator syndrome (though hopefully this problem has diminished over the years given that scholarly communication jobs are better understood; see also the tongue-in-cheek "How to Scuttle a Scholarly Communication ...").

Still, looking at my points, it seems that a big unifying theme is that economies of scale matter and that the institution isn't the right level for repositories to work at. Lorcan Dempsey would put it as researchers preferring to work at the network scale as opposed to the institution scale.

The point here is that while some IRs have achieved some success, e.g. MIT hitting 44% of total output deposited (and consider that MIT is an early pioneer and leader of the open access movement), many have failed to attract all but the most minimal amount of deposits.

Perhaps this is purely anecdotal, but my impression is that while you can find researchers who put their papers on subject repositories/social networking sites for researchers AND institutional repositories (aka researchers who just crave visibility and are willing to juggle multiple profiles and sites), or those who use only the former, it's rare to find those who only put things in the IR and nowhere else.

Various studies (e.g. this and this) are starting to show that more and more freely available full text resides on sites like ResearchGate than in institutional repositories.

This doesn't augur well.

I'm not saying though it's not possible to coerce researchers to deposit into IRs.

For example, an Immediate-Deposit/Optional-Access model like the one at the University of Liège seems to achieve much success by making researchers deposit all their papers on publication, whether or not they can be released as open access immediately or at all. This, coupled with an understanding that papers not submitted to the IR will not be considered for performance purposes, seems sufficient to cause high rates of compliance.

However, doing so goes against the wishes of the researchers, who naturally seem not to favor open access via IRs and, it seems to me, would rather do it via subject repositories, ResearchGate or even gold OA (if money is available).

A lot of the problems I have raised with IRs can have solutions. More standardization of IRs would be one. More resources poured into UX work to understand the needs and motivations of researchers is another. Librarians could push or pull full text to/from subject repositories on behalf of authors (via SWORD), or perhaps work out a way to aggregate statistics across repositories. I've read that COUNTER is working on standardising download counts, but I wonder if one could have an ORCID-like system that aggregates such COUNTER statistics for all papers registered to you?

But one wonders whether this is a space librarians should cede if other methods work better.
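To sketch what I mean by aggregating statistics across repositories, here is a minimal Python example. It is not an existing ORCID or COUNTER service: the record shape, the repository names, the DOIs and the placeholder ORCID iD are all hypothetical, and it assumes every repository counts downloads by the same rules, which is exactly what a common standard would have to guarantee.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageReport:
    """One repository's download count for one paper (hypothetical shape)."""
    orcid: str        # author's ORCID iD (placeholder value below)
    doi: str          # identifier shared by every copy of the paper
    repository: str   # where this particular copy lives
    downloads: int    # assumed to be counted identically everywhere


def aggregate_by_author(reports):
    """Sum downloads per author and per paper across all repositories."""
    totals = defaultdict(lambda: defaultdict(int))
    for report in reports:
        totals[report.orcid][report.doi] += report.downloads
    return {orcid: dict(papers) for orcid, papers in totals.items()}


reports = [
    UsageReport("0000-0000-0000-0001", "10.1234/example.1", "old-institution-ir", 120),
    UsageReport("0000-0000-0000-0001", "10.1234/example.1", "subject-repository", 300),
    UsageReport("0000-0000-0000-0001", "10.1234/example.2", "new-institution-ir", 45),
]

summary = aggregate_by_author(reports)
print(summary["0000-0000-0000-0001"])
# {'10.1234/example.1': 420, '10.1234/example.2': 45}
```

The arithmetic is trivial. The hard parts are the ones the code assumes away: every copy carrying the same identifier, every author being matched to the right iD, and every repository agreeing on what counts as a download.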

With the rise of solutions like SocArXiv, bioRxiv and engrXiv, perhaps institutions should start running or sharing responsibility for aggregating output at higher levels, such as via subject repositories or even national repositories?

Of course, we all agree "solutions" like ResearchGate and similar sites are not solutions at all, because they are owned by commercial entities and might disappear at any moment.

But is it possible to have both the advantage of scale and centralization and yet be immune if not resistant to take-overs by commercial entities? Can subject repositories be the solution?

In any case let me end off with Eric's words.

"The IR is not equivalent with Green Open Access. The IR is only one possible implementation of Green OA. With the IR at a dead end, Green OA must pivot towards alternatives that have viable paths forward: personal repositories, disciplinary repositories, social networks, and innovative combinations of all three."

What do you think? Are institutional repositories a dead end? Or are they needed as part of the ecosystem alongside subject repositories? I am frankly unsure myself.

Additional note: As I write this, there is some discussion about the idea of retiring IRs in favor of CRIS. The idea seems to be that instead of running two systems that barely talk to one another, one should opt for an all-in-one system. There is grave suspicion among some against such a move because of the entities who own the software. How this factors into my arguments above I am still mulling over.

On a personal note, I will be taking a month off my usual blogging schedule and will resume in Oct 2016. 
