Publishers of full text like Sage, Springer, and IEEE eventually realized it was to their benefit to contribute metadata to the indexes of web scale discovery services: it increased the findability of their full text for users of those services (IEEE went so far as to study obstacles to getting its content indexed in discovery services) and hence increased demand for their content. It was less clear, however, why abstracting and indexing (A&I) databases should contribute their metadata to the discovery index.
So for example let's say a user searches in a discovery service like Primo and finds the following record.
As you can see above this record is contributed by the A&I database Web of Science.
The user then clicks on View Online to see where to get the full text.
As seen above, the user can click on either of two targets/destinations, ProQuest or DOAJ, to get access to the full text on those sites. (The links are generated using an OpenURL resolver.)
A&I services are left out in the cold
Let's recap the transaction.
The user is happy because he gets access to items he would otherwise have missed. Similarly, the discovery service (Primo's Ex Libris, soon to be under ProQuest) gains from making more items discoverable.
The actual content provider of the item (in the above case ProQuest or DOAJ) is happy too: his content gets discovered, and usage of his content goes up and is recorded.
The only one left out of this happy transaction is the A&I database vendor, Web of Science. As the user never actually goes into the A&I database, he may not even realize he has just benefited from the library's subscription to the A&I database.
Usage of the A&I database may in fact fall, as some libraries have reported, particularly if users are aware, or dimly grasp, that the same records in the A&I database can be found in the discovery service.
This is an issue that is well recognized by NISO's Open Discovery Initiative (ODI). Of course, most A&I databases require that the library be a mutual subscriber to both the A&I database and the discovery service before it can benefit from the metadata, so if the library values the metadata provided by the A&I database, the A&I subscription will continue.
But herein lies the rub: how do you know the metadata from the A&I database is making the difference in helping discovery? Particularly since many full-text providers also give away their metadata. Sure, the A&I database may have more or better metadata, but how do you know it is making the difference?
Measuring the value of metadata/records contributed
Until recently, I wasn't aware of any way to measure the value of the metadata contributed by a source (A&I, publisher, aggregator, etc.). However, while playing around with Ex Libris' Alma and Primo analytics, and lurking on the mailing list, I noticed an interesting email by a UNSW librarian regarding the "Link resolver usage subject area" in Alma analytics.
Here's part of the message
"If the source has a colon in it, a user either was a staff member testing the link within Alma, or got access to an article from within a database by being referred back to the uresolver to see if you have a subscription that covers it."
The first part is fairly straightforward, so you will see sources listed such as
EBSCO:Business Source Complete - 220 requests
ProQ:ABI/INFORM Global - 110 requests
info:sid/www.isinet.com:WoK:WOS - 55 requests
Elsevier:Scopus - 20 requests
Here we are talking about link resolver requests (typically branded Find it @ xlibrary) from these databases. In the above example, we have link resolver requests from Business Source Complete, ProQuest ABI/INFORM Global, Web of Science on the Web of Knowledge platform, and Scopus.
So the above shows users searching in Scopus: when they click on Find it @ SMU Library, the clicks are recorded as source Elsevier:Scopus.
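The colon rule from the mailing-list quote can be sketched in a few lines. This is my own illustration, not Alma's actual logic; the source strings are taken from the example counts above.

```python
# Hypothetical sketch: classifying link-resolver "source" values from Alma
# analytics using the colon rule described in the mailing-list quote.
def classify_source(source: str) -> str:
    """Return 'database' for colon-style sources (the user came from an
    external database via the link resolver) or 'discovery' for bare codes
    like 'wos' (the user clicked a record inside Primo itself)."""
    return "database" if ":" in source else "discovery"

requests = [
    ("EBSCO:Business Source Complete", 220),
    ("ProQ:ABI/INFORM Global", 110),
    ("info:sid/www.isinet.com:WoK:WOS", 55),
    ("Elsevier:Scopus", 20),
    ("wos", 55),  # clicks on Web of Science records found within Primo
]

for source, count in requests:
    print(f"{source}: {count} requests ({classify_source(source)})")
```

The split matters because only the bare codes like "wos" measure discovery happening inside Primo itself, which is the part discussed next.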
The part that left me quite excited was this
So going back to the above example, when a user clicks on a record that originated in Web of Science but found in Primo, the click will be recorded as from the source "wos".
So it seems at least with Primo, we can now measure the value of the metadata provided by different sources!
Dealing with multiple versions of the same article/item
How Primo handles it is that it will attempt to match and group records it "thinks" are the same article and display only one in the search results initially. (Anyone know if this can be adjusted, or is it covered in the Primo technical manuals?) In the example below, the record from JSTOR is the main record showing.
I haven't tested this, but I assume if you click on "View Online" without clicking on "View all versions", only one source (the source of the displayed record, in the above case JSTOR) will be credited.
Of course, while these records are very similar, they do differ in small ways, as they come from different sources. In my example, the record from MEDLINE/PubMed has slightly better subject headings, and if I search with those subject headings, it is the record from MEDLINE/PubMed that appears as the main record in the search results, as the query no longer matches the record from JSTOR.
So far this makes a lot of sense, though there might be some squabbling over which source to "credit" discovery to if the search query matches more than one possible record.
E.g. if I do a search for a title combined with a subject search and records from two possible sources are matched, should I credit discovery to both equally?
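One possible crediting policy for that situation can be sketched as follows. This is purely my own assumption about how one might split credit, not how Primo actually attributes it: each source record in the grouped set that matches the query gets an equal fraction of one "discovery".

```python
# Sketch of an equal-split crediting policy (my own assumption, not
# documented Primo behaviour).
from collections import defaultdict

def credit_discovery(matching_sources, credit=None):
    """Give each matching source an equal fraction of one discovery event."""
    credit = credit if credit is not None else defaultdict(float)
    share = 1.0 / len(matching_sources)
    for source in matching_sources:
        credit[source] += share
    return credit

# A title + subject query that matches both the JSTOR and the
# MEDLINE/PubMed versions of the grouped article:
credit = credit_discovery(["JSTOR", "MEDLINE/PubMed"])
print(dict(credit))  # {'JSTOR': 0.5, 'MEDLINE/PubMed': 0.5}
```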
Grouping multiple records vs Single super record approach
The problem is that some discovery systems, like ProQuest's Summon, take a "super record" approach.
"The index includes very rich metadata brought together from multiple sources using an innovative match-and-merge function. Match-and-merge allows Summon to take the best data from these sources and bring it together to complete a rich “super record.” - source
While this sounds like what Primo is doing, it's actually quite different. In Primo, while the system groups different versions of the same article, each version record is still retained separately, as you can see by clicking on "View all versions".
In Summon, if multiple versions of the same article are available from multiple sources, a "match-and-merge" function will try to build a single merged/deduped "super record".
The super-record might include
a) Title/author/author supplied keywords from the publisher A and aggregator B
b) Subject headings from multiple A&I databases eg. Scopus and Pubmed
c) Table of contents from aggregator C
and so on.
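The list above can be illustrated with a rough merge sketch. The matching key, field names, and merge rules here are all my own assumptions (Summon's actual match-and-merge function is proprietary): list fields like subject headings are unioned across sources, while the first non-empty scalar field wins.

```python
# Rough sketch of a "match-and-merge" step building a super record from
# several version records of the same article. Field names and merge
# rules are hypothetical.
def merge_records(records):
    """Merge version records of one article into a single super record."""
    merged = {"sources": [r["source"] for r in records]}
    for record in records:
        for field, value in record.items():
            if field == "source":
                continue
            if isinstance(value, list):
                # Union list fields (e.g. subject headings) across sources.
                merged.setdefault(field, [])
                merged[field] += [v for v in value if v not in merged[field]]
            else:
                # Keep the first non-empty scalar (e.g. title, TOC).
                merged.setdefault(field, value)
    return merged

publisher = {"source": "Publisher A", "title": "Some Article", "subjects": []}
scopus = {"source": "Scopus", "title": "Some Article", "subjects": ["Economics"]}
pubmed = {"source": "PubMed", "title": "Some Article", "subjects": ["Health Policy"]}

super_record = merge_records([publisher, scopus, pubmed])
print(super_record["subjects"])  # ['Economics', 'Health Policy']
```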
I can see the attraction of such an approach. From a user's point of view it's probably cleaner: the user doesn't care which source contributed to an item getting discovered, so all he wants to see is one "super record" with all the combined available data on it.
See for example the same article record in Summon below
Above you can see just a single record, with the sources used to create the "super record" listed. Under Subjects you can see the combined entries drawn from various sources.
Because it's a single super record, you also increase the chances of discovery. So for example, if a person happens to be searching for the following three together in an advanced search
b) Subject from Source A
c) eISSN from Source B
It will match Summon's super record but not any of Primo's individual records because no single record has all 3 items.
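This effect can be shown with a toy matcher. The records, field names, and eISSN value below are invented for illustration only; the point is just that the merged record satisfies a multi-field query that no single source record does.

```python
# Toy illustration of why a merged super record matches a multi-field
# query that no individual source record satisfies. All data is invented.
def matches(record, query):
    """True if every queried field value appears in the record."""
    return all(value in record.get(field, []) for field, value in query.items())

record_a = {"subjects": ["Health Policy"], "eissn": []}            # Source A
record_b = {"subjects": [], "eissn": ["1234-5678"]}                # Source B
super_record = {"subjects": ["Health Policy"], "eissn": ["1234-5678"]}

query = {"subjects": "Health Policy", "eissn": "1234-5678"}
print(matches(record_a, query))      # False
print(matches(record_b, query))      # False
print(matches(super_record, query))  # True
```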
But a combined super record presumably means that it's going to be harder to do what Primo does. Since it's all one record, when an article is found in Summon, how do you know which source contributed to the discovery?
Of course, it's not impossible that Summon could still retain individual source records, similar to Primo, and use them to give credit to sources for aiding discovery...
I'll end here with a statement from an ODI paper.
"A&I services have a complex set of issues relative to the ecosystem of index-based discovery. The producers of these services naturally have an interest in preserving their value, especially in being assured that libraries will continue to maintain their subscriptions should they contribute to discovery services.
Discovery services must limit access to proprietary content, such as abstracts and specialized vocabularies to authenticated users affiliated with mutually subscribing institutions. Given these factors, among many others, A&I resources must be treated with special consideration within the body of content providers that potentially contribute to discovery services."
Still this development does not completely solve all the concerns of A&I services.
For example, there is concern that relevancy algorithms in some discovery services may systematically under-privilege content contributed by A&Is (for example by weighting full text more than subject headings), leading to a devaluing of their content. See for example the exchange of letters between EBSCO, Ex Libris and Orbis Cascade Alliance.
In fact, the ability to track contributions to discovery from sources could backfire and lead libraries to undervalue A&I sources, now that they can finally see the impact of the metadata contributed by them.
This is a hard nut to crack. While one could come up with metrics that measure the percentage of the top X results that come from A&I sources (e.g. the latest Primo analytics provides something along those lines for results from Primo/Primo Central/EBSCO API), it's still not possible to agree on what percentage would be reasonable, as there is no gold standard for relevancy algorithms to compare against.
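The kind of metric mentioned above can be sketched simply. The A&I source labels and result lists below are invented; the calculation is just the share of the top-N results carrying at least one A&I source.

```python
# Minimal sketch of a "% of top-N results from A&I sources" metric.
# Source labels and results are invented for illustration.
A_AND_I_SOURCES = {"wos", "scopus", "medline"}

def pct_top_n_from_ai(results, n=10):
    """Share (%) of the top-n results whose sources include an A&I source."""
    top = results[:n]
    hits = sum(1 for r in top if A_AND_I_SOURCES & set(r["sources"]))
    return 100.0 * hits / len(top)

results = [
    {"sources": ["wos", "jstor"]},
    {"sources": ["proquest"]},
    {"sources": ["scopus"]},
    {"sources": ["doaj"]},
]
print(pct_top_n_from_ai(results, n=4))  # 50.0
```

Even with such a number in hand, the underlying problem remains: there is no agreed baseline to say whether 50% is too high, too low, or about right.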