Monday, July 14, 2014

Why Nested Boolean search statements may not work as well as they did

At library school, I was taught the concept of nested boolean. In particular, I was taught a particular search strategy which goes like this.

  • Think of a research topic
  • Break them up into major concepts - typically 3 or more - eg A, B, C
  • Identify synonyms for each concept (A1,A2, A3 ; B1, B2, B3 ; C1, C2, C3
  • Combine them in the following manner

(A1 OR A2 OR A3) AND (B1 OR B2 OR B3) AND (C1 OR C2 OR C3)

We like many libraries have created videos on it as well.




If you are a academic librarian who has even taught a bit of information literacy, I am sure this is something you show in classes. You probably jazzed it up by including wildcards (such as teen*) as well.






Databases also encourage this search pattern


I am not sure how old this technique is, but around 2000ish? databases also started to encourage this type of structured search.




Above we see Ebscohost platform and in my institution this "Advanced search" is set to default. You can see a similar UI (whether as default or advanced search) in JSTOR, Engineering Village, Proquest platforms etc.

A lecturer when I was in library school even claimed credit (perhaps jokingly) for encouraging databases into this type of interface.

Recently I noticed a slight variant on this theme where the default search would show only one search box (because "users like the Google one box" according to a webinar I attended), but if you clicked on "add field" or similar you would see a similar interface. Below shows Scopus.




After clicking Add search field, you get the familiar structured/guide search pattern




You see a similar idea in the latest refresh of Web of Science, a default single search box but with a option to expand it to a structured search pattern. Below we see Web of Science with "Add another field" selected twice.


Lastly even Summon 2.0 which generally has a philosophy of keeping things simple got into the act and from what I understand under pressure from librarians finally came up with a advanced search that brought tears of joy to power users. 





But are such search patterns really necessary or useful?


In the first few years of my librarianship career, I taught such searches in classes without thinking much of it. 

It feels so logical, so elegant, it had to be a good thing right? Then I began studying and working on web scale discovery services, and the first doubts began to appear. I also started noticing when I did my own research I rarely even did such structured searches.

I also admit to be influenced by Dave Pattern's tweets and blog posts, but I doubt I will ever be as strongly in the anti-boolean camp.

But I am going to throw caution to the wind and try to be controversial here and say that I believe increasingly such a search pattern of stringing together synonyms of concepts generally does not improve the search results and can even hurt them

There is of course value in doing this exercise of thinking through the concepts and figuring out the correct language used by Scholars in your discipline, but most of the time doing so does not improve the search results much especially if you are simply putting common variants of words eg different variants of say PREVENT or ECONOMIC which is what I see many searches do.

That's because many of the search systems we commonly use increasingly are no longer well adapted to such searches even though they used to be in the past


Our search tools in the past

Think back to the days of the dawn of the library databases. They were characterized by the following

  1. Metadata (including subject terms) + abstract only - did not including full text
  2. Precise searching - what you enter is what you get search
  3. low levels of aggregation - A "large database" would maybe have 1 million items if you were lucky
In such conditions, most searches you ran had very few results. If you were unlucky you would have zero results. 

Why?

Firstly the search matched only over metadata + abstract and not full text. So if you searched for "Youth" and it just happened that the abstract and title the author decided on using "Teenager", you were sunk.

Also this was compounded by the fact that in those days, searches were also very precise. There was no autostemming that automatically covered variants of words (including British vs American spelling), so you had to be careful to include all the variants such as plurals, and other related forms. 

Lastly, It is hard to imagine in the days of Google Scholar with estimated 100 million documents (and Web Scale discovery systems that could potentially match that) but in those days databases were much smaller and fragmented with much smaller indexes and as such the most common result would be zero hits or at best a few dozen hits.






Summon full index (Scholarly filter on) showing about 100 million results



This is why the (A1 OR A2 OR A3) AND (B1 OR B2 OR B3) AND (C1 OR C2 OR C3) nested boolean technique was critical to ensure you expanded the extremely precise search to increase recall.

Add the fact that search systems like Dialog were charged per search or on time - so it was extremely important to craft the near-perfect search statement in one go to do efficient searching.

I will also pause to note that relevancy ranking of results could be available but when you have so few results that you could reasonably look through say 100 or less, you would just scan all the results, so whether it was ranked by relevancy was moot really.

Today's search environment has changed

Fast forward to today.

Full-text databases are more common. In fact, to many of our users and younger librarians, "databases" would imply full-text databases and they would look in dismay when they realized they were using a abstract and indexing database and wonder why in the world people would use something that might not give them instant gratification of a full text item. I fully understood some old school librarians would consider this definition to be totally backwards but......

Also the fact you are searching full-text rather than just metadata changes a lot. If an article was about TEENAGERS, there is pretty good odds you could find TEENAGER and probably, YOUTH, ADOLESCENCE etc in the full text of the book or article as well, so you probably did not need to add such synonyms to pick them up in the result set anyway.

Moreover as I mentioned before , increasingly databases under the influence of Google are starting to be more "helpful", by autostemming by default and maybe even adding related synonyms, so there was no real need to add variants for color vs colour say or for plural forms anyway.

Even if you did a basic

A AND B AND C -  you would have a reasonable recall, thanks to autostemming, full text matching etc.

All this meant you get a lot of results now even with a basic search.


Effect of full-text searching + relative size of index + related words


Don't believe this change in search tools makes a difference? Let's try the ebscohost discovery service for a complicated boolean search because unlike Summon it makes it easy to isolate the effect of each factor.




MIT's EDS

Let's try this search for finding studies for a systematic review

depression treatment placebo (Antidepressant OR "Monoamine Oxidase Inhibitors"  OR "Selective Serotonin Reuptake Inhibitors" OR "Tricyclic Drugs") ("general  practice" OR "primary care") (randomized OR randomised OR random OR trial)


Option 1 : Apply related words + Searched full text of articles - 51k results

Option 2 : Searched full text of articles ONLY -  50K results

Option 3 : Apply related words ONLY - 606 results

Option 4 : Both off - 594 results 

The effect of apply related keywords is slight in this search example possibly because of the search terms used, but we can see full text matches make a huge difference.

Option 4 would be what you get for "old school databases". In fact, you would get less than 594 results in most databases, because Ebsco Discovery service has a huge index far larger than any such databases.





To check, I did an equivalent search in one of the largest traditional abstracting and indexing database Scopus and I found 163 results (better than you would expect based on the relative sizes of Scopus vs EDS).

But 163 is still manageable if you wanted to scan all results, so relevancy ranking can be poor and it doesn't matter as much really.


Web scale discovery services might give poor results with such searches 

I know many librarians will be saying, doing nested Boolean actually improves their search, and even if it doesn't what's the harm?

First, I am not convinced that people who say nested boolean improves the results of their search have actually done systematic objective comparisons or whether it is based on impression that I did something more complicated so the results must be better. I could be wrong.

But we do know that many librarians and experienced users are saying the more they try to carry out complicated boolean searches the worse the results seem to be in discovery services such as Summon.

Matt Borg of Sheffield Hallam University wrote of his experience implementing Summon.

He found that his colleagues reported "their searches were producing odd and unexpected results."

"My colleagues and I had been using hyper stylised searches, throwing in all the boolean that we could muster. Once I began to move away from the expert approach and treated Summon as I thought our first year undergrads might use it, and spent more time refining my results, then the experience was much more meaningful." - Shoshin

I am going to bet that those "hyper stylised searches" were the nested boolean statements.

Notice that Summon like Google Scholar actually fits all 3 characteristics of a modern search I mentioned above that are least suited for such searches
  • Full text search
  • High levels of aggregation (typical libraries implementing Summon at mid-size universities would have easily 300 million entries)
  • autosteming was on by default - quotes give a boost to results with exact matches.
All this combine to make complicated nested Boolean searches worse I believe.


Poor choices of synonyms and overliberal use of wildcards can make things worse

I will be first to say the proper use of keywords is the key to getting good results. So a list of drugs names combined by an OR function, or a listing of philosophers, concepts etc - association of concepts would possible give good results.

The problem here is that most novice searchers don't have an idea what are the keywords to list in the language of the field, so often because they are told to list keywords they may overstretch and add ones that make things worse.

Say you did

(A1 OR A2 OR A3) AND (B1 OR B2 OR B3) AND (C1 OR C2 OR C3)

Perhaps you added A3, B3, C3 though they aren't exactly what you are looking for but "just in case".

Or perhaps you decided it wouldn't hurt to be more liberal in the use of wildcards which led to matches of words you didn't intend. 

Or perhaps the keyword A3, B3, C3 might be used in a context that is less appropriate that you did not expect. Remember unlike typical databases, Summon is not discipline specific, so a keyword like "migration" could be used in different disciplines. 

The fact that web scale discovery searched through so much content, there would be a high chance of getting A3 AND B3 AND C3 entries that were not really that relevant when used in combination.

Even if all the terms you chose were appropriate, the fact that they could be matched in full text could throw off the result.

If A2 AND B2 AND C2 all appeared in the full text in an "incidental" way, they would be a match as well. Hence creating even more noise.

And when you think about it, the problems I mention will get even worse. as each of the keywords would be autostemmed (which may lead to results you don't expect depending on how aggressive autostemming is) exploding the results.

My own personal experience with Summon 2.0 is that often the culprit is the match in full-text. Poorly chosen "synonyms" could often surface and even be pushed up.

The "explosion" issues is worsen by full text matches in books

In Is Summon alone good enough for systematic reviews? Some thoughts.  , I was studying to see if Summon could be used for systematic reviews. A very important paper, pointed out that Google Scholar was a poor tool for doing systematic reviews, because of the lack of precision features like lack of wildcards, limited character length, inability to nest boolean more than 1 level etc, and I had speculated Summon lacking these issues would be a better tool.

Somewhat surprising to me was when I tried actually to do so.

Sometimes, when I did the exact same search statement in both Google Scholar and Summon, number of Summon results usually exploded, showing more results than Google Scholar!

Please note that when I say "exact same search statement" I mean that precisely.

So for example, one of the searches done in Google Scholar to look for studies was

depression treatment placebo (Antidepressant OR "Monoamine Oxidase Inhibitors" 
OR "Selective Serotonin Reuptake Inhibitors" OR "Tricyclic Drugs") ("general 
practice" OR "primary care") (randomized OR randomised OR random OR trial)

Google Scholar found 17k results, while Summon (with add results beyond library collection to get the full index) shows 35K. 

Why does Summon have more than double the number of results?  

This was extremely unexpected because we generally suspect Google Scholar has a larger index and Google Scholar is more liberal in interpreting search terms as they may substitute terms with synonyms, while Summon at best includes variant forms of keywords (plurals, british/amercian spelling etc

But If you look at the content types of the results of the 35k results you get a clue.





A full 22k of the 35k results (62%) are books! If you remove those than the number of results make more sense. 

Essentially books which can be indexed in full text have a high chance of been discovered since they contain many possible matches and this gets worse the more ORs you pile on. Beyond a certain point they might overwhelm your results.

It is of course possible some of the 22k books matched can be very relevant, but it is likely a high percentage of them would be glancing hits and if you are unlucky, other factors might push them up high. 

I did not even attempt to use wildcards to "improve" the results, even though they could work in Summon. When I did that the number of results exploded even more.

As an aside the Hathitrust people have a interesting series of posts on Practical Relevance Ranking for 11 Million Books, basically showing you can't rank books the same way you rank other materials due to the much longer length of the book.

The key to note is that you are no longer getting 50, 100 or even 200 results like in old traditional databases. You are getting thousands. So you can no longer look through all the results, you are totally at the mercy of the relevancy ranking...


The relevancy ranking is supposed to solve all this... and rank appropriately, but does it? Do you expect it to?

A extremely high recall but low precision (over all results), with a poor relevancy ranking makes a broken search. Do you expect the relevancy ranking to handle such result sets resulting from long strings of OR?

With so few users actually doing Boolean in web scale discovery (e.g this library found  0.07% of searches uses OR), should you expect discovery vendors to actually tune for such searches? 


Final thoughts

I am not going to say these types of searches are always useless in all situations, just that often they don't help particularly in cases like Google, Google Scholar, web scale discovery.

Precise searching using Boolean operators has it place in the right database. Such databases would include Pubmed - which is abstract only, allows power field searching, including a very precise MESH system to exploit. The fact that medical searches particularly systematic reviews require comprehensiveness and control is another factor consider.

I also think if you want to do such searches, you should think really hard on just adding one more OR or liberal use of wildcards "just in case". With web scale discovery services searching full-text, and autostemming, a very poor choice will lead to explosion of searches with combinations of keywords found that may not be what you expect.

A strategic use of keywords is the key here, though often for the novice searcher who doesn't know the area, he is as likely to come up with a keyword that might hurt as it might help initially. As such it is extremely important to stress the iterative nature of such searches, so as you figure out more of the subject headings etc you use them in your search.

Too often I find librarians like to give the impression they found the perfect search statement by magic on their first try, which intimidates users. 

I would also highly recommend doing field searches, or metadata only search options if available, if you try such searches and get weird results.

Systems like Ebsco discovery service give you the option to restrict searches to metadata only or not search in full text.




For Summon, if you expect a certain keyword to throw off the search a lot due to full-text matches, doing title/subject term/abstract etc only matches might overcome this.

Try for example


vs



So what do you think? Do you agree that increasingly you find doing a basic search is enough? Or am I understating the value of a nested boolean search? Are there studies showing they increase recall or precision.








Share this!

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Related Posts Plugin for WordPress, Blogger...