Sunday, January 24, 2016

Look back at 10 top posts on librarianship I am proudest of (2012-2015)

It's the beginning of 2016, and nostalgia once again makes me look back at my past posts to see how they have stood the test of time.

The last time I did this was in December 2011's Top 12 library blog posts I am proudest of, which covered the first 3 years of this blog, so this post will cover the period from 2012-2015.

Of the 80-odd posts since then, these are the ones I am happiest with.

How academic libraries may change when Open Access becomes the norm (Aug 2014)






Written in 2014, this still reflects my current (as of 2016) thinking about the future of academic libraries. In this article, I argue that the eventual triumph of open access will have far reaching impacts on academic libraries with practically no library area escaping unscathed.

The article predicts that in a mostly open access environment, the library's traditional role in fulfillment and to some extent discovery will diminish. 

Libraries may move towards supporting the publishing of open access journals (perhaps via layered journals or similar) or focusing on special collections, supporting Lorcan Dempsey's inside-out view.

Given that faculty currently view academic libraries mainly in the role of purchasers, I suggest that to survive, academic libraries will start shifting towards expertise-based services like research data management, GIS, information literacy, etc.

I end by suggesting the trick for academic libraries is to figure out the right way and time to shift resources away from current traditional roles. Perhaps the percentage of content your faculty uses or cites that is available for free could be a useful indicator of when to shift roles.

I don't have much I would change in this article, as events since 2014 show that open access has continued to gain momentum. Perhaps if I wrote it now, I would mention a little about open education resources (OER).

Also check out the companion piece How should academic library websites change in an open access world? (Oct 2014) and for another strategy type article, Library and Blue Ocean strategies (I) - the case of discovery services (Dec 2013)







This article, together with the others in the series - How are libraries designing their search boxes? (II), How are libraries designing their search boxes? (III) - Articles, Databases and Journals, and Branding library discovery services - what are libraries doing? - was a massive survey I did to study how ARL libraries using Summon were branding the Summon search and exposing it as search boxes on their library homepages.

Echoing the surveys I did on library mobile websites in the 2010s, it was done at the time when I was figuring out the testing and implementation of Summon. I spent a massive amount of time studying this; I remember being really fascinated by the topic.

I would like to think many other academic librarians found these articles interesting and useful as it is now the 8th most viewed article ever.



Written when I was getting confident that I had a mature understanding of library discovery services, I believe it was a pretty fair summary of the state of library discovery at the time.

This was a pretty popular article that presenters on discovery at conferences often pointed to when they wanted a concise summary of what was generally agreed upon in 2013.

Also catch the follow up - 6 things I am wondering about discovery (Oct 2013)



The most recent post on this list. In the early days of this blog, I would constantly post about various new online tools and web services that I found useful. For example in 2010, after I acquired my first smartphone and then tablet, my posts were full of write-ups on apps and Twitter services.

In recent years, I have done fewer such posts, though I did dutifully write about memory/history-based apps, gamification, curation tools and presentation tools like Haiku and Storify.


But still my favorite is the recent post on how libraries are using Trello. It's amazing how many ways libraries have used it for their purposes, from managing renewals and tracking troubleshooting requests to liaison work and more.



How a "Facebook for researchers" platform will disrupt almost everything (April 2012)


Written in 2012, this post was about the rise of sites like Mendeley, which I described as "Facebook for researchers".

Back then I predicted they would start to occupy and then dominate a central part of the scholarly communication ecosystem and disrupt the following areas.

  • Discovery - Users would start to prefer searching in them for discovery purposes (partly due to superb recommender systems made possible by capturing tons of user data)
  • AuthorIDs - Users would prefer research profiles to other author unique IDs
  • Analytics - Due to the captive audience they gained, they would have a host of user analytics that could be used for their own benefit.
Writing today in 2016, I think I wasn't too far off the mark. Mendeley grew from strength to strength and was eventually acquired in 2013 by Elsevier, who quickly recognized its growing value. Today Mendeley stands alongside Academia.edu and ResearchGate as one of the three surviving contenders to the throne.

Other players like Springer followed the lead of Elsevier, acquiring Papers (yet another reference manager) in Nov 2012, and ProQuest started to push its cloud-based reference manager Flow (now renamed RefWorks Flow) in 2014, trying to leverage its dominance in the library discovery and database business in the process.
This sudden interest in reference managers is no big surprise: companies are figuring out that being where the researchers are and owning their workflow is essential, as I set out in the article.

In terms of detailed predictions I was mostly right as well.

I used to receive comments from graduate students asking why our discovery service was not as good as searching in Mendeley, and with the implementation of recommender systems, I have no doubt there is a portion of users who take much of their discovery activity to such systems.

At the time I wrote the article, I failed to make the distinction between author profiles on the one hand and author identifiers on the other. A single unique author identifier like ORCID could and should happily live alongside multiple author profile systems in Mendeley, Google, CRIS systems, etc.

Today I am glad to report that while author profiles on Mendeley, ResearchGate and in particular Google remain popular, support for ORCID is at or nearing a tipping point, with publishers requiring authors to submit ORCIDs with their papers. This, coupled with Crossref's auto-update functionality, probably signals a bright future for ORCID.





As anyone who manages or leads the library discovery service team will tell you, much of one's responsibility as the lead is to answer to stakeholders (in particular other librarians) on relevancy issues in the discovery service.

I have written many times on relevancy ranking issues, but I am proudest of this post, which explains why nested boolean searches of the form

(A1 OR A2 OR A3) AND (B1 OR B2 OR B3) AND (C1 OR C2 OR C3)

are counterproductive in library discovery.

This is based on an understanding that the environment in which boolean searching started off is a lot different from today's. We no longer operate in an environment where there is no full text available for matching, where there are no big mega-indexes, and where users expect precise, exact search matches as opposed to search systems with helpful features such as stemming that auto-expand the search.

The day library discovery died - 2035 (September 2013)


Yet another library discovery piece, but this one was different because it was written tongue in cheek.

"A tongue in a cheek, thought experiment or perhaps precautionary tale of the ultimate fate of library discovery services in 2035. 

With a sigh, Frank Murphy, head of library systems of Cambridge-Yale University made a quick gesture at his computing device and the system began to shut down the library discovery service for the last time. Or at least that was what he intended but the gesture based device - a distant descendant of the Kinect device refused to comply."

This was my first and so far only attempt at writing fiction on this blog; watch out for the little twist at the end.

This piece of fiction describes one of four possible fates I expect might happen in Four possible Web Scale Discovery Future Scenarios (Dec 2014). 

Also check out What would Steve Jobs say? Reinventing the library catalogue (Oct 2013) for another post written in a similar style. 






5 things Google Scholar does better than your library discovery service (July 2015)


Having spent the last 3 years thinking almost obsessively about library discovery services, it was natural that I eventually became fascinated with library discovery services' closest rival - Google Scholar - and the similarities and differences between them.

This "series" began with How is Google different from traditional Library OPACs & databases? (May 2012) and also included

However, it was this latest article, which directly pointed out the strengths of Google Scholar against library discovery services, that blew up, eventually being cited in various places from Marshall Breeding's NISO white paper to the Horizon Report: Library Edition.





I found readers of my blog were not just interested in library discovery services but more directly in Google Scholar.

The 8 surprising things I learnt about Google Scholar came about because I was tasked to "teach" Google Scholar to faculty who basically wanted to know how to "rank high in Google Scholar for their articles".

That's an impossible task of course, as nobody knows exactly how Google Scholar ranks articles, and as far as I know there are no SEO (search engine optimization) experts for Google Scholar.

In any case, I tried to learn as much as I could about Google Scholar from various sources, including public documentation, and by pulling together details from articles written by others who had experimented with it.

I was surprised by the reaction to the article; it seems what surprised me about how Google Scholar works was new to many others too. As of today it's the 5th most viewed article!

Also check out the recent but popular 6 common misconceptions when doing advanced Google Searching (Oct 2015), which explores the common mistakes that advanced users accustomed to library database syntax (aka librarians) often make.





An all-time top 10 viewed post, this was a fun piece surveying how libraries were exploiting memes for marketing. I went on to run the wildly successful library memes contest that I eventually presented on at Internet Librarian in 2012.



See also : 
More good library related video that spoofs movies or tv (April 2013)
What are library facebook pages using as cover photos? A survey  (March 2012)


Conclusion

These 10 articles are, I think, a fair representation of my most read articles from 2012-2015. Half of them relate to the issue of discovery, both library and commercial systems, and this perhaps fairly reflects my obsession at the time.

Towards the later part of the period, perhaps disillusioned by the growing belief that in the long run libraries will be slowly pushed out of the discovery business, I became interested in open access and also started to play trend spotter or strategist with a couple of "strategy" management articles.

What I will be interested in next is anyone's guess, though I believe that article and book discovery, while not a 100% solved issue, is becoming increasingly easier, and the next challenge that awaits us is the handling of data.

Some people have asked me how much time I spend on my blog posts, and one even, perhaps not too kindly, suggested that blogging was my job.

In all seriousness, I really can't honestly tell you how much time I have spent on my blog. With over 216 posts in all since I started blogging in 2008, and at a conservative 5 hours per post (including editing), I have easily spent over 1,000 hours blogging, mostly during weekends and often after work on weekdays. Add in the time spent researching and thinking, and it could be between 3,000 and 5,000 hours over the last 8 years.

Will I stop one day?  My average posting rate per year is trending down in a somewhat predictable fashion.

2009 - 4.0 per month
2010 - 4.0 per month
2011 - 3.1 per month
2012 - 2.9 per month
2013 - 1.7 per month
2014 - 1.3 per month
2015 - 1.0 per month


Most library bloggers who started before me have long since stopped blogging, so I may too one day.

Until then, I thank you all who continue to subscribe or read and share my posts. 

Thursday, December 31, 2015

A quick review of Moto 360 v1.0


Devices and workflow - A mini review of Moto 360

In the early years of my blog, I used to post about my new devices like the iPhone (2010) and iPad (2011), and the web services I used.

I've now switched over to the Android world; my current setup is an Android Note 4 + Nexus 7 (2013).

A few months back, in October, I got a Moto 360 v1 at a fairly cheap price to try out an Android Wear smartwatch.



I did my due diligence: being an older model, I was aware the processor was a bit slow and the battery life wasn't the best. It would still last me 20+ hours with ambient mode off.

I was okay with charging every night, and even with the speed, but in the end I found the smartwatch pretty much useless to me except as a timepiece (I usually don't wear watches).

The key thing to realise about Android Wear watches currently is that they have only very limited functionality, and their main purpose is to display notifications on your watch so you can see them without looking at your phone.

Any android app that has no particular android wear support will basically just show notifications on your watch.



If you swipe it, you will just get an option to open it in your phone.



Even apps with official Android Wear support, which sit on your watch but are managed via the Android Wear app on your phone, often aren't very useful. The main issue is that the watch is too small for a keyboard (there are some apps that include one, like a browser Android Wear app, but it's mostly unusable), so if you get, say, an email or a text via Gmail or Hangouts, there is limited functionality.





Typically you can either choose a canned reply or use voice input.




So if you are a big user of Google Now, and of giving commands by voice, Android Wear smartwatches are made for you.

If, on the other hand, you are like me and find it odd to give commands via voice (or if you find your voice tends to throw off the voice-to-text recognition software), you will find Android Wear watches lose much of their point.

To be fair, there are some nice things about having a smart watch.

If you are an exercise fan, Android Wear would be useful with Google Fit, but with the Moto it's confusing because you also get the various Moto apps.

I've been using Wear Mini Launcher to make the watch face somewhat Apple Watch-like: a swipe in from the right of the clock face and I see my favourite apps.



As you can see I do Foursquare check-ins with it


This can be useful if you want to check in quickly before returning to a conversation with a friend. A drawback of doing it this way compared to using your phone is that often the choices listed for checking in do not include where you are. If that happens you have no choice but to use your phone.

Also useful is the ability to quickly check Google Keep or your Calendar without opening your phone.



It is also no surprise that the more you are in Google's ecosystem, the more you benefit from using Android Wear watches. So if you are like me and use Google reminders, Google Keep, Google Calendar, etc. (especially now that Google reminders from Keep, Inbox, Google Now, etc. are all unified and displayed in Google Calendar), Google Now notifications can be amazing.

Other useful features include using your watch as a torchlight, calling for rides, or using maps.

When I first got my smartwatch, as is my normal fashion, I downloaded and played with many geeky and gimmicky features (e.g. controlling camera shots using the Moto 360, playing bubble shooter on the watch), but in the end they were just gimmicks.

The funny thing is, the thing I like best about the Moto 360 is the most mundane thing ever: the watch face!

I knew I wanted a round smartwatch and while Moto 360 gets complaints about the ugly "flat tire" at the bottom of the screen (see below), I barely notice it.


In exchange, the Moto 360 gets an ambient light sensor, so its brightness varies based on the amount of light. I've found it more than adequate even outdoors in sunlight.

My main complaint about the Moto 360 watch face is that to save power, it will turn itself off after a few seconds. There's an ambient mode, but in v1 not only is it power hungry, it doesn't keep the watch face "always on". In both modes you typically have to flick your wrist for the screen to turn on, and there's a slight delay that can be irritating if all you want is to look at the time.

Newer generation watches like the LG Urbane, I think, solve this issue, and the watch face can generally be left on all the time with reasonable battery life (recharge once a night).

When I first got the watch, I found the vibrations strong and obvious; these days I barely notice them. It's unclear if this is a software or hardware issue, but it seems common.

I really have no other complaints about the physical quality of the Moto 360. I have zero experience with watches, but to my eyes it looks pretty elegant, and the best thing is that if you are bored with the clock face, you can switch it easily via the watch or via an app on your phone.


There's a growing community of hobbyists producing and sharing free watch faces, so you will never be bored with the same old clock face! Some are classic watch faces, others are dynamic. I personally favour classic ones.

Overall though, if you want to get into android smartwatches it pays to temper your expectations.

While some of the flaws of the Moto 360 are down to inferior hardware that the newer generation of watches like the Moto 360 v2, Huawei Watch or LG Urbane solve, the Android Wear OS itself is currently limited in functionality. For example, even if your smartwatch has a speaker it currently cannot be used, though this is changing.

If you are getting one, do not expect a replacement for your phone, or for it to make a major change to your lifestyle, and you will be fine.

In the long term, I expect smartwatches will start to catch on, perhaps in 2016 or 2017, but somehow I can't shake the feeling that even if they do, they will be just a temporary stage before more generic "wear" software embedded in clothing, etc.


Conclusion

This has been a year of change for me, as I moved to a new institution in late Feb 2015 and I spent most of the year trying to learn and adapt to the new environment.

This probably deserves a post of its own (maybe in Feb 2016 when I complete a year of service), but the experience has been very interesting and not without its challenges as it has been a strange blend of "Seen it, done it" and the opposite "this is almost a 360 degree mirror image of what I am used to" feelings.

I always had a bit of impostor syndrome, and starting anew at a new institution obviously worsened it. But with time this has lessened as I get a grip on the situation.

I am now focusing on library analytics, a relatively new and unexplored area, and I am looking at early explorers like Libwebrarian with great interest while trying to sort out my thinking in this area.

As a result my blogging output has suffered somewhat, though I still try to maintain one post a month while maintaining quality. I anticipate the blogging rate should remain the same or even rise in 2016.

In this day and age, interest in blogs has waned and not many librarian blogs are still operating. As such for those of you who still subscribe to my blog whether via email or RSS, thank you for following me on my journey in librarianship.

Have a happy new year and I will see you in 2016!





Monday, December 7, 2015

Measuring the value of contributed metadata in web scale discovery services

One of the more interesting issues around the rise of web scale discovery service systems like Summon, Primo, Ebsco Discovery Service and Worldcat Local is the place of abstracting and indexing (A&I) databases like Scopus, Web of Science or more disciplinary specific databases like Psycinfo.

While publishers of full text like Sage, Springer and IEEE eventually realized it was to their benefit to contribute metadata to the index of web scale discovery services, because it increased the findability of their full text to users on discovery services (IEEE going so far as to study obstacles to getting their content indexed in discovery services) and hence demand for their content, it was less clear why abstracting and indexing (A&I) databases should contribute their metadata to the discovery index.

 So for example let's say a user searches in a discovery service like Primo and finds the following record.



As you can see above this record is contributed by the A&I database Web of Science.

The user then clicks on View Online to see where to get the full text.



As seen above, the user can click on either of the targets/destinations, ProQuest or DOAJ, to get access to the full text from those sites. (The links are generated using an OpenURL resolver.)

A&I services are left out in the cold


Let's recap the transaction.

The user is happy because he gets access to items he would otherwise have missed. Similarly, the discovery service vendor (Ex Libris with Primo, soon to be under ProQuest) gains from making more items discoverable.

The actual content provider of the item (in the above case ProQuest or DOAJ) is happy too: its content gets discovered, and usage of that content will go up and be recorded.

The only one left out from this happy transaction is the A&I database vendor - Web of Science. As the user never actually goes into the A&I database, he may not even realize he just benefited from the library's subscription of the A&I database.

Usage of the A&I may in fact fall, as some libraries have reported, particularly if users are aware, or dimly grasp, that the same records in the A&I database can be found in the discovery service.

This is an issue that is well recognized by NISO's Open Discovery Initiative (ODI). Of course, most A&I databases require that the library be a subscriber to both the A&I database and the discovery service before it can benefit from the metadata, so if the library values the metadata provided by the A&I, the A&I databases will continue to be subscribed to.

But here lies the rub: how do you know the metadata from the A&I database is making a difference in helping discovery? Particularly since many full-text providers are also giving away their metadata. Sure, the A&I may have more or better metadata, but how do you know it is making the difference?

Measuring the value of metadata/records contributed 


Until recently, I wasn't aware of any way to measure the value of the metadata contributed by a source (A&I, publisher, aggregator, etc.). However, while playing around with Ex Libris' Alma and Primo analytics and lurking on the mailing list, I noticed an interesting email by a UNSW librarian regarding the "Link resolver usage" subject area in Alma analytics.

Here's part of the message

"If the source has a colon in it, a user either was a staff member  testing the link within Alma, or got access to an article from within a database by being referred back to the uresolver to see if you have a subscription that covers it."

The first part is fairly straightforward, so you will see sources listed such as:

EBSCO:Business Source Complete  - 220 requests

ProQ:ABI/INFORM Global - 110 requests

info:sid/www.isinet.com:WoK:WOS - 55 requests

Elsevier:Scopus - 20 requests


Here we are talking about link resolver requests (typically branded Findit@xlibrary) from these databases. In the above example, we have link resolver requests from Business Source Complete, ProQuest ABI/INFORM Global, Web of Science on the Web of Knowledge platform, and Scopus.



So the above shows users searching in Scopus; when they click on Find it @ SMU Library, the clicks will be recorded with the source Elsevier:Scopus.


This is pretty much standard fare if you are familiar with link resolvers.

The part that left me quite excited was this

"If the source has an underscore or is just some letters eg “wj” (Wiley journals) then the user got access to the article from a PCI record in Primo". Note : PCI = Primo Central Index, the name of the discovery service index.

If I read this correctly it means not only can we see link resolver requests from databases and the discovery service, we can actually see which source contributed the record that appeared in the Primo discovery service! 



So in the above statistics we can see that there were 4,666 clicks on records in the discovery service Primo with metadata from Scopus (scopus). Similarly, we can see 9,362 clicks on records in Primo with metadata from Wiley (wj) and 11,268 with metadata from Web of Science (wos).
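To make the rule of thumb concrete, here is a minimal sketch in Python (my own illustration, not an Ex Libris tool) of how you could classify and tally source values exported from the Link resolver usage subject area. The mapping of short codes to collection names is an assumption for illustration, and the sample rows simply reuse the example figures quoted above.

# Toy classifier for Alma Analytics "Link resolver usage" source values, based on
# the rule of thumb quoted above: a colon means a link resolver request from a
# native database (or staff testing in Alma); a short code like "wj" means a click
# on a record contributed to Primo Central (PCI) by that source.
from collections import Counter

PCI_CODE_NAMES = {          # hypothetical mapping of PCI source codes to names
    "wj": "Wiley Journals",
    "wos": "Web of Science",
    "scopus": "Scopus",
}

def classify(source):
    """'database' = link resolver request from a database; 'pci' = click on a PCI record."""
    return "database" if ":" in source else "pci"

sample_rows = [             # (source, requests) - figures reused from the examples above
    ("EBSCO:Business Source Complete", 220),
    ("ProQ:ABI/INFORM Global", 110),
    ("info:sid/www.isinet.com:WoK:WOS", 55),
    ("Elsevier:Scopus", 20),
    ("wj", 9362),
    ("wos", 11268),
    ("scopus", 4666),
]

totals = Counter()
for source, requests in sample_rows:
    kind = classify(source)
    label = PCI_CODE_NAMES.get(source, source) if kind == "pci" else source
    totals[(kind, label)] += requests

for (kind, label), n in totals.most_common():
    print(f"{kind:8} {label:35} {n:>7,}")

Run over a real export, a tally like this would give you, per contributing source, how many discovery clicks its metadata was credited with.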




So going back to the above example, when a user clicks on a record that originated from Web of Science but was found in Primo, the click will be recorded as being from the source "wos".


So it seems at least with Primo, we can now measure the value of the metadata provided by different sources!

In fact, in my preliminary tests for my institution, when counting clicks on records in the Primo discovery service, Web of Science as a source of records/metadata came in 3rd compared to other sources. So it's pretty important.

Dealing with multiple versions of the same article/item


Discovery services of course often get more than one version of the same item from various sources. For any given article, they may get metadata from aggregators, full text providers, A&I databases or other sources.

The way Primo handles this is that it will attempt to match and group records it "thinks" are the same article and display only one in the search results initially (anyone know if this can be adjusted, or is it covered in the Primo technical manuals?). In the example below, the record from JSTOR is the main record showing.




I haven't tested this, but I assume if you click on "view online" without clicking on "view all versions", only one source (the source of the record that is displayed, in the above case JSTOR) will be credited.


Of course, you can click on "View all versions" to see other versions. This is very similar to how Google Scholar works.



Of course each of these records, while very similar, differs in small ways, as they come from different sources. In my example, the records from MEDLINE/PubMed have slightly better subject headings, and if I search with these subject headings, it is the record from MEDLINE/PubMed that appears as the main record in the search results, as the query no longer matches the record from JSTOR.

So far this makes a lot of sense, though there might be some squabbling over which source to "credit" discovery to if the search query matches more than one possible record.

Eg. If I do a search for a title combined with a subject search and records from two possible sources are matched should I credit discovery to both equally?

Grouping multiple records vs Single super record approach



The problem is that some discovery systems like ProQuest's Summon practice a "super record" approach.

"The index includes very rich metadata brought together from multiple sources using an innovative match-and-merge function. Match-and-merge allows Summon to take the best data from these sources and bring it together to complete a rich “super record.” - source

While this sounds like what Primo is doing, it's actually quite different. In Primo, while the system groups different versions of the same article, each version's record is still retained separately, as you can see by clicking on "view all versions".

In Summon, if multiple versions of the same article are available from multiple sources, a "match-and-merge" function will try to build a single merged/deduped "super record".

The super-record might include

a) Title/author/author supplied keywords from the publisher A and aggregator B
b) Subject headings from multiple A&I databases eg. Scopus and Pubmed
c) Table of contents from aggregator C

and so on.
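To illustrate the difference, here is a rough sketch in Python of what a match-and-merge step could look like. This is purely my own toy illustration (the field names, the DOI match key and the sample data are all made up), not how Summon is actually implemented.

# Toy "match-and-merge": several source records believed to be the same article
# (matched here simply by DOI) are merged into one "super record" that takes the
# union of their fields. All field names and sample values are invented.
from collections import defaultdict

records = [
    {"doi": "10.1000/xyz", "source": "publisher A", "title": "An Example Article",
     "keywords": ["example"]},
    {"doi": "10.1000/xyz", "source": "Scopus", "subjects": ["Library science"]},
    {"doi": "10.1000/xyz", "source": "PubMed", "subjects": ["Information storage and retrieval"]},
    {"doi": "10.1000/xyz", "source": "aggregator C", "toc": "1. Introduction ..."},
]

def merge(group):
    """Collapse a group of matched records into a single super record."""
    super_record = {"sources": []}
    for rec in group:
        super_record["sources"].append(rec["source"])
        for field, value in rec.items():
            if field == "source":
                continue
            if isinstance(value, list):
                super_record.setdefault(field, [])
                super_record[field].extend(v for v in value if v not in super_record[field])
            else:
                super_record.setdefault(field, value)    # keep the first value seen
    return super_record

groups = defaultdict(list)
for rec in records:
    groups[rec["doi"]].append(rec)    # Primo-style grouping would stop here, keeping
                                      # each version as a separate record in the group
super_records = [merge(g) for g in groups.values()]   # Summon-style merging collapses them
print(super_records[0])

Note how, once merged, the super record no longer tells you which source supplied the subject heading that matched a given search, which is exactly the measurement problem discussed below.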

I can see the attraction of such an approach, and from a user's point of view it's probably cleaner: the user doesn't care which source contributed to the item getting discovered, so all he wants to see is one "super record" with all the combined available data on it.

See for example the same article record in Summon below



Above you can see just a single record and the sources used to create the "super record" are listed. Under subjects you can see the combined entries drawn from various sources.

Because it's a single super record, you also increase the chances of discovery. So for example, if a person happens to be searching for the following three together in an advanced search:

a) Title
b) Subject from Source A
c) eISSN from Source B

It will match Summon's super record but not any of Primo's individual records because no single record has all 3 items.

But a combined super record presumably means that it's going to be harder to do the same as what Primo did. Since it's all one record, when an article is found in Summon, how do you know which source contributed to the discovery?

Of course it's not impossible that Summon could still retain individual source records similar to Primo and use that to give credit to sources for aiding discovery.....


Conclusion

I'll end here with a statement from an ODI paper.

"A&I services have a complex set of issues relative to the ecosystem of index-based discovery. The producers of these services naturally have an interest in preserving their value, especially in being assured that libraries will continue to maintain their subscriptions should they contribute to discovery services.

Decisions regarding whether to participate in discovery services are not straightforward. Discovery
services not tuned to make use of the specialized vocabularies, abstracts, and other mechanisms inherent in the native A&I product may underexpose the resource. Aggregators that license A&I content and fulltext resources from other providers may not have the rights to further distribute that content. 

Discovery services must limit access to proprietary content, such as abstracts and specialized vocabularies to authenticated users affiliated with mutually subscribing institutions. Given these factors, among many others, A&I resources must be treated with special consideration within the body of content providers that potentially contribute to discovery services."

The ability to credit discovery to particular sources, as found in Alma analytics, goes some way towards encouraging more A&I services to contribute content to discovery services, as their value can now be tracked.

Still this development does not completely solve all the concerns of A&I services.

For example, there is concern that relevancy algorithms in some discovery services may systematically under-privilege content contributed by A&Is (for example by weighting full text more than subject headings), leading to a devaluing of their content. See for example the exchange of letters between EBSCO, Ex Libris and Orbis Cascade Alliance.

In fact, the ability to track contributions to discovery from sources could backfire and lead libraries to undervalue A&I sources, now that they can finally see the impact of the metadata contributed by them.

This is a hard nut to crack. While one could come up with metrics that measure the % of top X results that contain A&I sources (e.g. the latest Primo analytics provide something along those lines for results from Primo/Primo Central/EBSCO API), it's still not possible to agree on what % is reasonable, as there is no gold standard for relevancy algorithms to compare against.














Sunday, November 15, 2015

Libraries and Trello - How are librarians using it?

I am always trying out new tools to organize and improve my work, and my current workflow involves the use of Google Keep, Dropbox and Google Now reminders (having moved away from Google Tasks).

I haven't had much experience with project management tools, but recently I began playing again with the lightweight tool Trello.

Formally, it is an electronic kanban tool. But if you are not familiar with the concept, you can see it as a digital version of a post-it notice board, and it is very helpful for collaboration purposes.

https://trello.com/b/PB9Cr94M/student-research-project-board

The idea is pretty simple. You set up boards (corresponding to projects), which contain lists. Lists (the vertical columns above) are broken up into individual cards. Each card typically corresponds to a task, and you can assign people to each card; their photos will appear on the card.

You can customize further by adding colored labels to each card. Cards can have checklists and comments, and can be created via email. Lastly, you can attach files via Dropbox, Google Drive, OneDrive, etc.

If you are into the GTD (Getting Things Done) methodology, a common idea is to have lists for "To do", "Doing", "Done", etc. and to drag each card/task between lists as needed.
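If you ever want to create cards programmatically rather than by hand, Trello also has a REST API. Here is a minimal sketch in Python; the API key and token come from your own Trello account, and the list ID and card details below are placeholders, not real values.

# Minimal sketch: create a card on a "To do" list via Trello's REST API.
# API_KEY / API_TOKEN / TODO_LIST_ID are placeholders - substitute your own.
import requests

API_KEY = "your-api-key"        # from https://trello.com/app-key
API_TOKEN = "your-api-token"
TODO_LIST_ID = "abc123"         # placeholder ID of the list the card should land in

response = requests.post(
    "https://api.trello.com/1/cards",
    params={
        "key": API_KEY,
        "token": API_TOKEN,
        "idList": TODO_LIST_ID,
        "name": "Renew Wiley package",                            # card title
        "desc": "Quote received; negotiation notes in Dropbox.",  # card description
    },
)
response.raise_for_status()
print("Created card:", response.json()["url"])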

As with most productivity tools, Lifehack has a great guide on how to use it on a personal basis, from planning a vacation, setting up a family status board or planning a wedding, to pretty much anything you can think of.

In the past two years, Trello has become very popular in libraries. There are many reasons, but the main one is that it's free with almost no limitations: there are no ads, no restrictions on the number of boards you can create or members you can add, etc.

Here are some of the ways libraries have been using it for tracking workflows and/or project management:


  • Package management & resource vendor negotiation
  • Electronic resource troubleshooting
  • Web site redesign project
  • Strategic planning, department planning
  • Marketing campaigns
  • Information literacy classes + faculty liaison work



For package management & resource negotiation 


I think given the nature of the tool, it's no surprise the technical service departments in libraries use it quite a bit.

Both NCSU and Duke University are examples of this and they recently held a webinar to talk about how they use it in technical services work.



I particularly like their package management board (see below). They color-code cards based on publisher (e.g. Sage, Elsevier), and you can then filter cards to see, for example, those to do with Wiley.



Another nice board is the one they set up for the license team for negotiating resources. There are as many as seven members on the team, and the negotiation process can get confusing.


There is clever use of checklists for negotiation that are all copied from a master template to help track the process.



For more see the article - Who's on First?: License Team Workflow Tracking With Trello


For Electronic Resources Troubleshooting


At Oakland University, Meghan Finch combines Trello with Zapier to organize tracking of requests involving electronic resources troubleshooting.

In the paper entitled "Using Zapier with Trello for Electronic Resources Troubleshooting Workflow", she explains that her board consists of the following lists

  • To Do
  • Tier II
  • Waiting
  • Completed
  • Get Done
  • How To
  • Honey Badger Tips

She has her own workflow setup on how to drag the cards from each list.

The issue here is: how does she handle troubleshooting reports submitted by users? Surely she doesn't want to manually create cards for them in Trello.

She solved it by using a combination of Zapier and Trello's built-in feature to create cards from emails.

In her library, the link resolver - 360 Link - has a link to a simple online form for users to submit reports of problems with e-resources.

The form once submitted sends an email to their e-resources mailing list.

She uses Zapier to pull the data out of the email sent to the e-resources mailing list and then send another email to Trello to populate the board, with the whole process automated by Zapier.

If you are not familiar with Zapier, it's similar to IFTTT, which allows you to automate workflows between a large number of apps by creating triggers and actions that fire when the trigger occurs. (See my past posts on Zapier and IFTTT.)
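For those who would rather not rely on Zapier, roughly the same shuffling can be scripted, since every Trello board has its own email-to-board address (found in the board's settings), and emails sent to it become cards, with the subject as the card title and the body as the description. The sketch below is my own rough illustration of what such a zap automates; the mail server details and the board address are placeholders, and it assumes the problem reports arrive as plain-text emails.

# Rough sketch: read unread problem-report emails from the e-resources mailbox and
# forward each one to the Trello board's email-to-board address, creating a card.
# Host names, credentials and the board address are placeholders.
import imaplib
import smtplib
import email
from email.mime.text import MIMEText

IMAP_HOST = "imap.example.edu"
SMTP_HOST = "smtp.example.edu"
USER, PASSWORD = "eresources@example.edu", "secret"
BOARD_EMAIL = "yourboard+abc123@boards.trello.com"   # placeholder board address

imap = imaplib.IMAP4_SSL(IMAP_HOST)
imap.login(USER, PASSWORD)
imap.select("INBOX")
_, data = imap.search(None, "(UNSEEN)")              # new, unread reports

with smtplib.SMTP(SMTP_HOST) as smtp:
    for num in data[0].split():
        _, msg_data = imap.fetch(num, "(RFC822)")
        report = email.message_from_bytes(msg_data[0][1])
        body = report.get_payload(decode=True) or b""     # assumes a plain-text email
        card = MIMEText(body.decode("utf-8", "ignore"))   # body becomes the card description
        card["Subject"] = report["Subject"]               # subject becomes the card title
        card["From"] = USER
        card["To"] = BOARD_EMAIL
        smtp.send_message(card)

imap.logout()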



For website redesign projects


Amanda L. Goodman, User Experience Librarian at Darien Library, uses it for keeping track of tasks for website redesign (among other things).

http://theproductivelibrarian.com/2015/02/20/using-trello-to-get-things-done/



https://trello.com/b/ebX9XX7E/university-of-illinois-library-prototyping-service-2015



For Strategic planning or department planning


Megan Hartline writes "Transparency is one of the more challenging aspects of leadership. Letting people in your group and across your organization know what you’re doing, what your priorities are, and what projects are up next takes a huge amount of conscious communication." 

She then suggests Trello as a way to visualize and draw attention to the major projects each library department/unit or even the whole library is focusing on. 

The idea of a "At the glance" view of the major projects going on in a department or even over the whole library is pretty common in fact.

Below is NCSU's Serials Unit Projects board



Here's University of Minnesota Trello Board






Librarians doing IL and faculty engagement

This book briefly suggests that teaching librarians "create a course board for all course projects and assign students to different groups". It goes on to say that because all actions on each card are recorded, students can see the contributions of their fellow classmates.

More concretely, Robert Heaton of Utah State University suggests that Trello can be used to keep track of work done by prior subject liaisons, so that a new librarian can benefit from a Trello board filled with information such as faculty CVs, prior relationships and more.



 http://digitalcommons.usu.edu/cgi/viewcontent.cgi?article=1066&context=lib_present


For tracking of marketing campaigns

Evelyn C. Shapiro of Champaign Public Library writes:

"I ran some experiments through the fall and winter and went full-on with this strategy for the spring and summer seasons. Now I have a visual "board" with every event and promotion, including a place to store—and serve up—all my content organized by event or promotion, with a separate "card" for each event. It's set up by season, with columns organized by month. Each card includes:

  • the marketing copy we're using in getting the word out
  • approved images (separate ones for website, lobby slide, e-news, Facebook, plus extras provided by presenters)
  • associated URLs (event bit.lys, related videos, subject-specific or book-specific "deep" links into our Polaris catalog, presenter websites)
  • collected notes from the presenter or the in-house staff sponsor of the event
  • any special acknowledgments that need to be included in promotions"




http://www.academiclibrarymarketing.com/blog/cool-tools-using-trello-to-manage-marketing-resources-processes


Conclusion

Regardless of the type of librarian job we do, we are constantly doing projects that potentially involve large numbers of collaborators. As such, Trello seems to be a useful tool that can be applied in many situations.

If you have been using Trello, how have you been using it? Is it easy to get buy-in to use the tool?


Friday, October 23, 2015

6 common misconceptions when doing advanced Google Searching

As librarians we are often called upon to teach not just library databases but also Google and Google Scholar.

Teaching Google is often tricky because, unlike library databases where we can get insider access through our friendly product support representatives, as librarians we have no more and no less insight into Google than anyone else, and Google is legendary for being secretive.

Still, given that Google has become synonymous with search we should be decently good at teaching it.

I've noticed, though, that when people teach Google, particularly advanced Google searching, they often fall prey to 2 main types of errors.

The first type of error involves not keeping up to date: given the rapid speed at which Google changes, we often end up teaching things that no longer work.

The second type of error is perhaps more common to us librarians. We often carry over the usual methods and assumptions from library databases, expecting them to work in Google, when sadly they don't.

Both types of error are very difficult to detect because Google seems to be designed to fail gracefully; for example, it may simply silently ignore symbols you add that don't work.

Also, the typical Google search brings back an estimated count of results (e.g. "about" X million), so it's hard to tell if your search worked as expected.

As I write this blog post in Oct 2015, what follows are some of the common errors and misconceptions about searching in Google that I've seen while doing research on the topic. Some of the misconceptions I knew about; a few surprised me. Of course, by the time you read this post, a lot is likely to be obsolete!

The 6 are

  • Using deprecated operators like tilde (~) and plus (+) in search strings
  • Believing that all terms in the search string will definitely be included (in some form)
  • Using AND in search strings works
  • Using NOT in search strings works
  • Using asterisk (*) as a character wildcard or truncation in search strings works
  • Using parentheses ( ) in search strings to control order of operators works



1. Using deprecated operators like tilde (~) and plus (+) in search strings


As of writing, this is the list of operators supported by Google; anything else is probably not supported, so if you are teaching people to use the tilde (~) or plus (+) operators, please stop.

About tilde (~)


Karen Blakeman explains here what it used to do.

"Although Google automatically looks for variations on your terms, placing a tilde before a word seemed to look for more variations and related terms. It meant that you didn’t have to think of all the possible permutations of a word. It was also very useful if you wanted Google to run your search exactly as you had typed it in except for one or two words.

The Verbatim option tells Google to run your search without dropping terms or looking for synonyms, but sometimes you might want variations on just one of the words. That was easily fixed by placing a tilde before the word"

However as of June 2013 tilde (~) no longer works. (See official explanation).

About plus operator (+)


Another discontinued operator often still taught is the plus (+) Operator.

The plus operator used to force Google to match the exact search term as you typed it. In other words, "it turned off synonymization and spell-correction". So for example if you searched +library, it would match library exactly and wouldn't substitute libraries or librarians.

However as of Oct 2011, it no longer works. (See official explanation)

According to the Google help page, the plus operator is now used for Google+ pages or blood types! (It can generally also see a plus at the end of a term, e.g. C++.)

If you want to force exact keywords, you should add quotes even around single words, e.g. "library".

Of course we librarians know double quotes also have another purpose: they force words to be matched as an exact phrase, say "library systems". This works in Google as normal.

Interestingly enough, in the latest Google Power Searching course (September 2015), Daniel Russell mentions that you can nest quotes within quotes to combine phrase searching with exact search on a single word.

For example he recommends searching "daniel "russell" " (note the nested quotes) because "daniel russell" alone gets him results with Daniel Russel (note only one 'L')




Another option, if you want results as near as possible to what you typed in, is to use verbatim mode (which is kind of like the + operator, but applied to everything typed).


       

As noted in the video above, even in that mode, the order of operations is not enforced, so you should use double quotes on top of verbatim mode for further control.

I believe even verbatim mode or quotes around single words don't absolutely stop Google from occasionally "helping" by dropping search terms if including them causes too many results to disappear - sometimes called a "soft AND"; more about that next.


2. Believing that all terms in the search string will definitely be included (in some form)


I've mentioned this in the past, but Google practices what some call a "soft AND": it will usually include all terms searched, but occasionally one of the search terms will be dropped.




In the above Power Searching Video, Daniel explains that when you search for term1 term2 term3 you might find some pages with only term1 term2 but not term3. He states that some pages rank so highly on just term1 and term2 that Google will drop term3.

What's the solution? He recommends using the intext operator. So for example term1 term2 intext:term3, where the intext operator forces term3 to be on the page.

Note you can do phrase search together with intext as well, eg. intext:"library technology"


3.  Using AND in search strings

Believe it or not, Google does not explicitly support the AND operator in searches.

For example, neither the official Google help nor the official Google Power Searching course mentions the AND operator!

Let me be clear: of course if you do something like library systems, Google will do an implicit AND and combine the terms (subject to the issue stated above).

But what I am saying is you shouldn't type something like library AND systems (whether AND, and, AnD, aNd, etc.), because at best it is ignored as too common a word (a stop word), though occasionally it may actually be searched and matched like a normal term!

To avoid such issues just drop the AND and do library systems

As an aside, OR works as per normal, and the power searching course states it's the only case sensitive operator.

4. Using NOT in search strings

Many of us librarians are used to literally typing NOT to exclude results. So for example we will automatically do libraries NOT systems, not knowing this fails.

What you should do of course to exclude terms is to use the minus (-) operators. For example, try libraries -systems


5. Using asterisk (*) as a character wildcard or truncation in search strings

Another thing that doesn't work: you can't find variants of a search term by putting * after a string of letters.

For example the following doesn't work , organ* 

I believe Google automatically decides on stemming already so you don't need to do this to find words with the root of organ.

What works is something entirely different like this

a * saved is a * earned

The official guide says * is used as "a placeholder for any unknown or wildcard terms" , so you can match things like a penny saved is a penny earned where * can stand for 1 or more words.

But see tip 7 for interaction with site operator. 



6. Using parenthesis (  (    ) ) in search strings to control order of operators


This one is perhaps most shocking if you are unaware. When we combine AND with OR operators, a common question to ponder is, which operator has precedence?

My testing with various library databases shows that there is no single standard: some databases favour OR first, others favour AND.

So it is a favourite trick of librarians to cut through the complication and just use parentheses to avoid having to memorise how it works in different databases.

So we love to do things like

(library AND technology) OR systems

First off, we already said in #3 that you shouldn't use AND in the search, so let's try

(library technology) OR systems

But I am sorry to inform you that doesn't work either. In fact, the parentheses are ignored; what Google actually sees is

library technology OR systems


Don't believe me? See here, here and here.

On Quora , a Google software engineer (search quality) says this


So what happens when you do something like library technology OR systems?
In fact it's the equivalent of a library database search for library AND (technology OR systems)



It looks to me like OR has precedence, which makes more sense to me than the other way around.

So what happens if you want (a b) OR (x y)? Typing that out won't work in Google, since it actually gives you a AND (b OR x) AND y, but here's a complicated untested idea.
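To make the grouping rule concrete, here is a toy interpreter in Python. It is purely my own illustration of the observed behaviour described above (OR binding more tightly than the implicit AND, parentheses ignored), not Google's actual query parser.

# Toy illustration: parentheses are stripped, and OR binds more tightly than the
# implicit AND between adjacent terms. This mimics the observed grouping only.
def interpret(query):
    tokens = [t for t in query.replace("(", " ").replace(")", " ").split() if t]
    groups, i = [], 0
    while i < len(tokens):
        group = [tokens[i]]                 # collect a run of terms joined by OR
        while i + 2 < len(tokens) and tokens[i + 1] == "OR":
            group.append(tokens[i + 2])
            i += 2
        groups.append("(" + " OR ".join(group) + ")" if len(group) > 1 else group[0])
        i += 1
    return " AND ".join(groups)

print(interpret("library technology OR systems"))   # library AND (technology OR systems)
print(interpret("(a b) OR (x y)"))                   # a AND (b OR x) AND y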

7. Bonus tips


Around operator

There is a semi-official operator known as the AROUND function. It allows you to match words that are within X words of each other. This seems to be the same as a proximity operator without order.



So for example you can do

"library technology" AROUND(9) "social"


As noted by Dan Russell , AROUND needs to be in caps. For more details.


Combining asterisks with site operator

I guess everyone knows about the useful site: function . But did you know it works with wildcards as spotted here?





There's a lot more detail here that I recommend you read for interaction between wildcards and site operators. Combine it with the minus (-) operator for more fun!



Conclusion

As you can see, while Google does loosely support Boolean searching (though it often does unexpected things like dropping terms, and may or may not include common words searched), the exact details are very different from library databases!

If you want to get more into the nuts and bolts of boolean operators in Google, I highly recommend




Thursday, October 15, 2015

Of full text , thick metadata , and discovery search

As my institution recently switched to Primo, nowadays I lurk in the Primo mailing list. I am amused to note that in many ways the conversation on it is very similar to what I experienced when lurking in the Summon mailing list. (One wonders if in time to come this difference might become moot but I digress).

Why don't the number of results make sense?


A common thread that occurs on such mailing lists from time to time and that often draws tons of responses is a game I call "Do the number of results make sense?".

Typically this would begin with some librarian (or technical person tasked to support librarians) bemoaning the fact that they (or their librarians) find the number of results shown to be not "logical".

For example someone would post an email with a subject like "Results don't make sense". The email would look like this (examples are made up):

a) Happy birthday - 4,894
b) Happy birth* - 3,623
c) Happy holidays - 20,591
d) Happy holid* - 8,455
e) Happy OR birthday - 4,323

The email would then point out that it made no sense that the number of results in b) and d) was lower than in a) and c) respectively, or that e) should have more results than a).

Other variants would include using quotes, or finding that after logging in (which usually produces more results, due to results appearing from mutually licensed content providers) the number of results actually fell, etc.

The "reason" often emerges that the web scale discovery service whether Summon Or Primo is doing something "clever" that isn't transparent to the user that results in a search that isn't strictly boolean logic.

In the past, I've seen cases such as

* Summon doing stemming by default but dropping it when boolean operators were used (might have changed now)
* Primo doing metadata-only search by default but expanding to matching full text if the number of results drops below a certain number.

I've discussed in the past How is Google different from traditional Library OPACs & databases?, and in this way web scale discovery services are somewhat similar to Google: they don't do strict boolean and can make various adjustments to try to "help the user", at the cost of predictability and often transparency if the user isn't warned.

Matching full text or not?

In the most recent case I encountered on the Primo mailing list, it was announced there would be an enhancement to add a displayed message indicating that the search was expanded to match full text.

This led to a discussion on why Primo couldn't simply match on full text all the time, or at least provide an option to do either, like EBSCO Discovery Service does.


MIT Libraries' EBSCO Discovery Service searches in full text by default, but you can turn it off.


An argument often made is that metadata-only matching improves relevancy, in particular for known item searching, which generally makes up about 40-60% of searches.

This certainly makes relevancy ranking much easier, since not considering matches in full text means the balancing act between ranking matches in full text vs metadata can be avoided.

In addition, unlike Google or Google Scholar, the discovery service index is extremely diverse, including some content that is available in metadata-only form while other content includes full text or is non-text (e.g. DVDs, videos).

Even when items contain full text, they range in length from a single page or paragraph to thousands of pages (for a book).

Not needing to consider this difference makes relevancy ranking much easier.


Metadata thick vs thin

Still, a metadata-only matching approach ignores potentially useful information from full text, and it's still not equally "fair", because content with "thick metadata" still has an advantage over content with "thin metadata".

I was not familiar with either term until EBSCO began to talk about it. See the abstract below.


https://www.ebscohost.com/discovery/content/indexing


Of course "other discovery services" here refer mainly to Proquest's Summon (and Exlibris's Primo), which has roughly the same articles in the index but because they obtain the metadata directly from the publisher have limited metadata basically , article title, author, author supplied keywords etc.

Thick metadata, by contrast, would generally include controlled vocabulary, table of contents, etc.


The 4 types of content in a discovery index


So when we think about it, we can classify content in a discovery service index along 2 dimensions

a) Full text vs Non-full text
b) Thick metadata vs Thin metadata


Some examples of the type of content in the 4 quadrants

A) Thick Metadata, No Full text - eg. Abstracting & Indexing (A&I) databases like Scopus, Web of Science, APA Psycinfo etc, MARC records

B) Thick Metadata, Full text - eg. Ebsco databases in Ebsco Discovery Service, combined super-records in Summon that include metadata from A&I databases like Scopus and full text from publishers

C) Thin metadata, No Full text - eg Publisher provided metadata with no full text, Online video collections, Institutional repository records?

D) Thin metadata, Full text - eg Many publisher provided content to Summon/Primo etc.


What are the different ways the discovery service could do ranking?


Type I - Use metadata only - Primo approach (does expand to full text match if number of results falls below a threshold)

Type II - Use metadata and full text - Summon approach

Type III - Use full text mostly plus limited metadata - Google Scholar approach?

Type IV - User selects either Type I or II as an option - Ebsco Discovery Service approach


The Primo approach of mainly using metadata (and matching full text only if the number of results falls below a certain threshold), as I said, privileges content that has thick metadata (Classes A and B) over thin metadata (Classes C and D), but is neutral with regard to whether full text is provided.

Still, compare this with an approach like Summon's that uses both metadata and full text. Here full text becomes important: regardless of whether you have thin metadata or thick metadata, it helps to have full text as well.

All things being equal, would a record that has thick metadata but no full text (Class A) rank higher than one that has thin metadata but has full text (Class D)?

It's hard to say: depending on the algorithm used to weight full text vs metadata fields, I could see it going either way.
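To see why it could go either way, here is a toy scoring sketch in Python. It is purely my own illustration (real discovery relevancy algorithms are far more sophisticated and not public); all field weights and sample records are made up. Nudging the full text weight up or down flips which record wins.

# Toy relevancy scorer: each record scores by how often query terms occur in each
# field, multiplied by a per-field weight. Record A has thick metadata but no full
# text; record D has thin metadata plus full text. All numbers are invented.
QUERY = {"singapore", "housing", "policy"}

def score(record, weights):
    total = 0.0
    for field, weight in weights.items():
        text = record.get(field, "").lower()
        total += weight * sum(text.count(term) for term in QUERY)
    return total

record_a = {   # Class A: thick metadata, no full text (e.g. from an A&I source)
    "title": "Public housing policy in Singapore",
    "subjects": "housing policy; public administration; Singapore",
}
record_d = {   # Class D: thin metadata, full text (e.g. publisher-supplied)
    "title": "Urban development in Southeast Asia",
    "fulltext": "... singapore ... housing ... policy ... " * 20,
}

for fulltext_weight in (0.1, 1.0):
    weights = {"title": 3.0, "subjects": 2.0, "fulltext": fulltext_weight}
    a, d = score(record_a, weights), score(record_d, weights)
    winner = "A (thick metadata)" if a > d else "D (full text)"
    print(f"full text weight {fulltext_weight}: A={a:.1f}  D={d:.1f}  -> {winner}")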

My own past experience with Summon seems to show that there are times when full text matches dominate metadata. For example, searching for Singapore AND a topic can sometimes yield plenty of generic books on Singapore that barely mention the topic, ranked above more specific items. I always attributed this to the overwhelming match of the word "Singapore" in such items.

The fear that the mass of full text overrides metadata is the reason why some A&I providers are generally reluctant to include their content in discovery services. This is worsened by the fact that currently there is no way to measure the additional benefit A&Is bring to the discovery experience, as their metadata, once contributed, will appear alongside other lower quality metadata in the discovery service results.

If by chance the library has access to full text via OpenURL resolution, users will just be sent to the full text provider, while the A&I database whose metadata contributed to the discovery of the item in the first place is not recognised and is bypassed. This is one of the points acknowledged in the Open Discovery Initiative reports and may be addressed in the future.

In fact, implementation of discovery services can indeed lead to a fall in usage of A&I databases in their native interfaces, as most users no longer need to go directly to the native UI. Add in the threat from Google Scholar, and you can understand why A&I providers are so wary.

I would add that this fear that discovery services (except for EBSCO, which already hosts content from A&Is like APA's PsycINFO) will not properly rank metadata from A&Is is not a theoretical one.

EBSCO, in the famous exchange between the Orbis Cascade Alliance and Ex Libris, claims that:

As you are likely aware, leading subject indexes such as PsycINFO, CAB Abstracts, Inspec, Proquest indexes, RILM Abstracts of Music Literature, and the overwhelming majority of others, do not provide their metadata for inclusion in Primo Central. Similarly, though we offer most of these databases via EBSCOhost, we do not have the rights to provide their metadata to Ex Libris. Our understanding is that these providers are concerned that the relevancy ranking algorithm in Primo Central does not take advantage of the value added elements of their products and thus would result in lower usage of their databases and a diminished user experience for researchers. They are also concerned that, if end users are led to believe that their database is available via Primo Central, they won't search the database directly and thus the database use will diminish even further.

Interestingly, EBSCO Discovery Service itself splits the difference between Primo and Summon: it allows librarians to set the default of whether to include matching in full text or metadata only, but allows users to override the default.

From my understanding, defaulting to metadata-only search is pretty popular among EDS libraries, because many librarians feel metadata-only searching provides more relevant results.

I find this curious because EBSCO is on record for stating that their relevancy ranking places the highest priority on their subject headings rather than title, as they are justly proud of the subject headings they have.

One would speculate that EBSCO, of all discovery services, would weight metadata more heavily than full text, yet librarians still feel relevancy can be improved by ignoring full text!


Content Neutrality?

With the merger of ProQuest and Ex Libris, we are now down to one "content neutral" discovery service.

One of the fears I've often heard is that EBSCOhost would "push up" its own content in its discovery service, and to some extent people fear the same might occur in Summon (and now Ex Libris) for ProQuest items.

Personally, I am skeptical of this view (though I wouldn't be surprised if I am wrong either). But I do note that for discovery vendors that are not content neutral, it's natural that their own content will have at the very least full text, if not thick metadata, while content from other sources is likely to have poorer quality metadata and possibly no full text unless efforts are taken to obtain them.

This itself would lead to their own content floating to the top even without any other evil doing.

To be frank, I don't see a way to "equalize" everything, unless one ignores full text and ranks only on the very limited set of thin metadata that every item has.


Ignoring metadata and going full text mostly?

Lastly, while there are discovery services that rank based on metadata but ignore full text, it's possible (if strange) to think of a type of search that is the exact opposite.

Basically such a system ranks only or mostly on full text and not on metadata (whether thick or thin)

The closest analogy I can think of for this is Google or Google Scholar.

All in all, Google Scholar, I guess, is a mix of mostly full text and thin metadata, so this helps make relevancy ranking easier since it is ranking across similar types of content.

Somehow, though, Google Scholar still manages to do okay... though as I mentioned before in 5 things Google Scholar does better than your library discovery service, it has a big advantage in that

"Google Scholar serves one particular use case very well - the need to locate recent articles and to provide a comprehensive search."

compared to the various roles library discovery services are expected to play, including known item search of non-article material.

Conclusion

Honestly, the idea that libraries would want to throw away available data such as full text to achieve better relevancy ranking is a very odd one to me.

That said, we librarians also carefully curate the collections that are searchable in our discovery index rather than just adding everything available or free, so this idea of not using everything is not a new concept, I guess.

