Friday, November 12, 2010

Using Blekko to search across thousands of library sites

Say you want to find out where libraries are placing their social media buttons on the portal. Is it at the bottom of the page? Top right? Elsewhere? How do you find out besides polling users? Can Google help?

Maybe I need to work on my Google-fu, but the best I can do is site:Edu facebook library, which isn't very good: it pulls up Facebook pages themselves or tips on how to handle Facebook, and it also excludes non-US sites.

This search is a lot better, don't you agree?

How about another search? This time I was tasked with revamping the library help pages. Browsing a few of those pages, I noticed most academic libraries had a "How do I find a book" section, and I wanted to find more.

Again, this search gives you a nice results page.

In this case, of course, a Google search of how do i find a book site:Edu gives you reasonable results, until you want to do it for public libraries AND academic libraries.

Of course, many of you recognise that the searches above use Blekko, which allows you to create "slashtags", which are curated lists of sites. You then run the search keyword over those sites/slashtags.

If you are already familiar with it and want to try it out right now go to Blekko and do the following search

You will need to enter /aarontay/library to indicate that you are using a slashtag created by me.

Other slashtags you can use created by me include

aarontay/academiclibrary - academic libraries in USA (2,224 sites)
aarontay/publiclibrary - public libraries in USA (2,969 sites)
aarontay/africamiddleeastlibrary - libraries in Africa/Middle-east (116 sites)
aarontay/asialibrary - libraries in Asia (345 sites)
aarontay/australialibrary - libraries in Australia (160 sites)
aarontay/canadalibrary - libraries in Canada (314 sites)
aarontay/southamericalibrary  - libraries in South America (157 sites)
aarontay/europelibrary - libraries in Europe (1,136 sites)

aarontay/library - covers all of the above slashtags

Site counts are as of Nov 13.

What follows describes how I created the slashtags, along with a little review of Blekko from a librarian's point of view.


Blekko, the hot new search engine, has been touted as yet another Google killer. Blekko comes from a long line of so-called Google killers, including Cuil and Wolfram Alpha.

Blekko is of course no such thing. The main gimmick is that you can create "slashtags" or listings of sites you want to run the search against. 

So, for example, you can put your favorite tech sites into a slashtag /tech, and then search humor /tech to look for the word "humor" across those sites.

This of course immediately reminds librarians of Google custom search engines and their cousins. The main difference is that Blekko allows you to combine two or more slashtags to further refine the search. So assuming you set up 2 slashtags, /humour (list of comic sites) and /tech (list of tech sites), you can do

cat /humour /tech
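To make the query format concrete, here is a minimal sketch that assembles a search string like the one above from a keyword and a list of slashtags. The helper name build_query is made up for illustration; it only builds text to type into the search box and does not talk to Blekko.

```python
def build_query(keyword, slashtags):
    """Join a keyword and slashtag names into one search string.

    Hypothetical helper: it only formats text, it does not call Blekko.
    """
    # Accept names with or without the leading slash.
    tags = ["/" + tag.lstrip("/") for tag in slashtags]
    return " ".join([keyword.strip()] + tags)


print(build_query("cat", ["/humour", "/tech"]))  # cat /humour /tech
```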

You can also use slashtags created by other people (if not private), and this is where Blekko can conceivably use crowdsourcing to become a sort of Wikipedia of search.

Librarians are of course no strangers to custom search engines, using them to offer topic searches of valuable resources. One that I use regularly is by David Oldenkamp, and it searches over 300 intergovernmental organizations (based on URLs).

Using Blekko

One thing I always wanted to do was to create a custom search over all library portals/websites. While others have done so for librarian or library blogs, to my knowledge no one has done this yet for library websites.

Inspired by this guide, I decided to try creating a slashtag/custom search that covers only library sites.

Blekko allows you to bulk import lists of sites using text files or OPML, or to do a search and pull in the results. But first I needed to find a source for a complete listing of library websites.

I immediately thought of lib-web-cats, but I couldn't figure out how to scrape the results efficiently, not to mention I wasn't sure about the legality. In the end, I settled on LibWebs (see below), maintained by Thomas Dowling, which is under a Creative Commons license and should allow remixing. Below is an example of one page of academic libraries in the USA (Northwest).

I put each webpage of library URLs (by country or, in the case of the US, by type) through Link Leacher.

This then yielded text files for each page of urls which I imported into Blekko.
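If you would rather script this step than use a link-extraction tool, a rough sketch using only Python's standard library might look like the following. This is not what I did; it is an assumed alternative. The class and function names and the sample page are made up for illustration, and a real directory page would need its own filtering.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            # Resolve relative links against the page address.
            self.links.append(urljoin(self.base_url, href))


def external_links(html, base_url):
    """Return de-duplicated http(s) links pointing away from base_url's host."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    own_host = urlparse(base_url).netloc
    seen, result = set(), []
    for link in collector.links:
        parts = urlparse(link)
        is_web = parts.scheme in ("http", "https")
        if is_web and parts.netloc != own_host and link not in seen:
            seen.add(link)
            result.append(link)
    return result


# Made-up page standing in for one directory page of library links.
sample = """
<a href="/about.html">About this list</a>
<a href="http://library.example.edu/">Example University Library</a>
<a href="http://lib.example.org/">Example Public Library</a>
<a href="http://library.example.edu/">Example University Library (again)</a>
"""
urls = external_links(sample, "http://directory.example.com/usa-academic.html")
print("\n".join(urls))  # one URL per line, ready to save as a text file
```

Dropping links that stay on the directory's own host is a crude way to skip navigation links; whether that is enough depends on how the directory page is laid out.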

Instead of putting them all into one big slashtag, I divided them into smaller slashtags

/academiclibrary - academic libraries in USA
/publiclibrary - public libraries in USA
/africamiddleeastlibrary - libraries in Africa/Middle-east
/asialibrary - libraries in Asia
/australialibrary - libraries in Australia
/canadalibrary - libraries in Canada
/southamericalibrary  - libraries in South America
/europelibrary - libraries in Europe

In fact, you could download by region for academic and public libraries in the USA, and by country for the non-US libraries, so you could create very granular slashtags such as /northwesternacademiclibraries and then combine them into bigger, higher-order slashtags (see later), but I wasn't interested in that level of granularity.

In addition, I set up a /libraryblog slashtag.

But what happens if you want to search them all at the same time? I created a slashtag /library which includes these other slashtags (except /libraryblog). Yes, you can create slashtags in slashtags, which makes them really flexible compared to Google custom search engines.

Library slashtag which includes other slashtags
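One way to picture slashtags inside slashtags is as a small lookup table where each entry lists either sites or other slashtags. The sketch below is purely illustrative: the expand function and the toy site names are made up, and it says nothing about how Blekko actually stores or resolves slashtags.

```python
def expand(slashtag, definitions, _seen=None):
    """Return the set of sites a slashtag covers, following nested slashtags."""
    seen = _seen if _seen is not None else set()
    if slashtag in seen:
        return set()  # already visited; guards against circular nesting
    seen.add(slashtag)
    sites = set()
    for member in definitions.get(slashtag, []):
        if member.startswith("/"):
            # The member is itself a slashtag, so expand it recursively.
            sites |= expand(member, definitions, seen)
        else:
            sites.add(member)
    return sites


# Toy definitions; the real slashtags hold hundreds or thousands of sites each.
definitions = {
    "/academiclibrary": ["library.example.edu", "lib2.example.edu"],
    "/publiclibrary": ["lib.example.org"],
    "/library": ["/academiclibrary", "/publiclibrary"],
}
print(sorted(expand("/library", definitions)))
```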

One thing to note is that each user can create their own slashtags. As I'm using my own slashtag I enter the following search in Blekko 

keyword /library 

Anybody else who wants to use my slashtag (which is possible as I didn't make it private), should enter in Blekko instead

keyword aarontay/library 

This is to signify that you are using a slashtag by user aarontay rather than your own /library slashtag.

Blekko vs Google Custom search

Why not use a Google custom search engine? To some extent Google custom search is more powerful than Blekko, as you can specify indexing periods, create synonyms, specify refinements (facets), set up URL patterns, wildcards etc.

One issue with Google custom search is that you can put in a maximum of 5,000 URLs. Blekko has the same limit (when I tried, there was a secondary error that prevented more than 1,000 URLs from being uploaded at one time), but as already mentioned you can circumvent this by putting slashtags in slashtags.

My current /library indeed has more than 5,000 URLs, built by combining several slashtags.
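If you hit the same upload error, splitting a long URL list into smaller files before importing is easy to script. A minimal sketch follows, assuming the 1,000-per-upload ceiling I ran into (it may not be an official limit); the batches helper and the placeholder URLs are made up for illustration.

```python
def batches(urls, size=1000):
    """Split a list of URLs into consecutive batches of at most `size`."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [urls[i:i + size] for i in range(0, len(urls), size)]


# 2,224 placeholder URLs, matching the count quoted for the US academic list.
urls = [f"http://library{i}.example.edu/" for i in range(2224)]
print([len(b) for b in batches(urls)])  # [1000, 1000, 224]
```

Each batch can then be written to its own text file and imported separately.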

The ability to put slashtags in slashtags is probably Blekko's best feature as you can create very flexible listings by combining various slashtags.

Why not Blekko? 

I notice Blekko does have some issues. I was wondering "How many libraries promote FourSquare on their webpages?", so I did

Surprisingly, I only got 3 hits. While FourSquare is still strictly an early-adopter feature, I'm pretty sure there are more than 3 libraries supporting it.

Doing a normal Google search, I found that, for instance, the University of Technology, Sydney (UTS) does promote Foursquare on its homepage, yet Blekko fails to surface it. I'm not sure what is wrong, since the site does exist in the slashtag; perhaps the page isn't indexed yet in Blekko?

I remember reading somewhere that Blekko indexes every 14 days, so perhaps this problem will disappear once it crawls those pages, but I'm guessing a Google custom search engine would have less of this problem thanks to a larger index in the first place. Or am I doing something wrong?

This isn't a complete review of Blekko (see this, this and this). I didn't talk about other Blekko features, like the ability to follow slashtags, add editors to collaborate on slashtags, look at detailed SEO (Search Engine Optimization) data, view results from a slashtag as RSS, and more.

The other thing I didn't compare was whether it is worthwhile to convert any existing Google custom search engines to Blekko slashtags. I'm not sure if doing so will improve or worsen results.

One issue, though, that makes Blekko IMHO unsuitable for users is that, while you can embed search boxes for each slashtag (see the button next to the RSS feed button), I was surprised that all it does is create a normal Blekko search box with the slashtag included! Below is how it looks when you embed the box.

This makes it undesirable for use with our users, who might just remove the slashtag and replace it with their own keywords, which would give them a normal Blekko search instead of the custom one we created.

Do give Blekko a try with my custom slashtags and tell me what you think.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.