INEX 2014 Social Book Search Track

Evaluation results for the 2014 Social Book Search Suggestion task

Version 2, 16 May 2014

The evaluation results shown below use the official INEX 2014 SBS topic set, which is derived from the LibraryThing discussion groups and from the user profiles and catalogues of the topic creators.

This year we manually labelled each book suggestion in the forum topic threads to determine its relevance value. The mapping of labels to relevance values is explained below.

These are the official Qrels:

inex14sbs.qrels: Qrels derived from the books recommended on the LibraryThing discussion threads of 680 topics. The mapping of labels to relevance values is explained below.

The Qrels use LibraryThing (LT) work IDs as document IDs. For evaluation, the ISBNs in the submitted runs are mapped to LT work IDs as well: the highest-ranked ISBN of a work is mapped to its work ID, and lower-ranked ISBNs that map to the same ID are removed from the results list.

Deduplication: mapping ISBNs to LibraryThing work IDs

Only the first ISBN of a work is counted as a result. Lower-ranked ISBNs of the same work are ignored. If an ISBN at rank 7 maps to the same work as an ISBN at rank 4, we ignore the ISBN at rank 7 and move the ISBN at rank 8 to rank 7.
The mapping of multiple ISBNs to the same work is based on the mapping from LibraryThing, from the file thingISBN.xml.gz (version of 27 January 2009). From this file we extracted only the mappings for ISBNs in the Amazon/LibraryThing collection, which are available here: amazon-lt.isbn.thingID.gz
A Perl script that maps the ISBNs in a run to work IDs is also available: deduplicate_simple.pl. This script requires amazon-lt.isbn.thingID.gz for the mappings.
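
The deduplication rule above can be sketched in a few lines of Python. This is an illustrative sketch, not a port of deduplicate_simple.pl: the function name and the in-memory dictionary are hypothetical, and keeping unmapped ISBNs under their own identifier is an assumption, since the handling of ISBNs missing from the mapping file is not described here.

```python
from typing import Dict, Iterable, List


def deduplicate_run(ranked_isbns: Iterable[str],
                    isbn_to_work: Dict[str, str]) -> List[str]:
    """Map a ranked ISBN list to work IDs, keeping the first ISBN of each work."""
    seen = set()
    work_ids = []
    for isbn in ranked_isbns:
        # Assumption: an ISBN without a mapping is kept under its own ID.
        work_id = isbn_to_work.get(isbn, isbn)
        if work_id in seen:
            continue  # lower-ranked ISBN of a work that is already listed
        seen.add(work_id)
        work_ids.append(work_id)
    return work_ids


# Example from the text: the ISBNs at ranks 4 and 7 map to the same work.
ranked = ["i1", "i2", "i3", "i4", "i5", "i6", "i7", "i8"]
mapping = {isbn: "w" + isbn[1:] for isbn in ranked}
mapping["i7"] = "w4"
deduplicated = deduplicate_run(ranked, mapping)
# -> ['w1', 'w2', 'w3', 'w4', 'w5', 'w6', 'w8']
```

In the example, the ISBN at rank 7 is dropped and the work of the ISBN at rank 8 moves up to rank 7.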

Evaluation results

The official evaluation measure is nDCG@10.

Run	nDCG@10	MRR	MAP	R@1000
USTB - run6.SimQuery1000.rerank_all.L2R_RandomForest	0.303	0.464	0.232	0.390
USTB - run4.newXml.rerank_all.L2R_RandomForest	0.142	0.258	0.102	0.390
HAFSI - 326	0.142	0.275	0.107	0.426
USTB - run3.newXml.rerank_all.L2R_Coordinate	0.138	0.256	0.101	0.390
USTB - run5.newXml.rerank_all.L2R_RankNet	0.133	0.246	0.098	0.390
USTB - run2.newXml.rerank_T	0.131	0.246	0.096	0.390
USTB - run1.newXml.feedback	0.128	0.246	0.095	0.390
LSIS - InL2	0.128	0.236	0.101	0.441
AAU - run1.all-plus-query.all-doc-fields	0.127	0.239	0.097	0.444
AAU - run3.all-plus-query.all-doc-fields	0.120	0.227	0.090	0.425
CYUT - Type2QTGN	0.119	0.246	0.086	0.340
CYUT - 0.95AverageType2QTGN	0.119	0.243	0.085	0.332
HAFSI - 328	0.117	0.226	0.088	0.392
HAFSI - 329	0.116	0.217	0.087	0.392
HAFSI - 325	0.115	0.214	0.087	0.392
LSIS - InL2Feedback	0.114	0.230	0.094	0.434
HAFSI - 324	0.112	0.214	0.086	0.392
LSIS - InL2tagFeedback	0.102	0.212	0.075	0.388
UvA - inex14.ti_qu.fb.10.50.5000	0.097	0.179	0.073	0.421
UMD - Full_TQG_fb.10.50_0.0000227_50.trec	0.097	0.188	0.069	0.328
UMD - Social_TQG_fb.10.50_0.0000222_50.trec	0.096	0.184	0.067	0.327
UMD - Full_TQG_fb.10.50_0.0000255_100.trec	0.096	0.188	0.068	0.328
UvA - inex14.ti_qu_gr.fb.10.50.5000	0.095	0.162	0.074	0.436
UvA - inex14.ti_qu.5000	0.095	0.173	0.073	0.412
UMD - Full_TQG_fb.10.50_traditional.trec	0.095	0.185	0.068	0.328
UvA - inex14.ti_qu_gr.5000	0.094	0.163	0.074	0.418
UMD - Full_TQ_fb.10.50_0.0000247_100.trec	0.092	0.176	0.064	0.321
UMD - Full_T_fb.10.50_0.0000260_100.trec	0.070	0.139	0.047	0.253
ISMD - 354	0.067	0.123	0.049	0.285
LSIS - sdm_Rating	0.062	0.120	0.047	0.314
LSIS - sdm_concept	0.056	0.118	0.039	0.253
ISMD - 341	0.056	0.106	0.042	0.236
LSIS - sdm_tag_feedback	0.055	0.112	0.040	0.267
HAFSI - 345	0.052	0.113	0.037	0.383
ISMD - 350	0.048	0.090	0.036	0.211
AAU - run2.query.all-doc-fields	0.047	0.090	0.035	0.304
ISMD - 355	0.038	0.089	0.026	0.124
CYUT - 0.95RatingType2QTGN	0.034	0.101	0.021	0.200
CYUT - 0.95WRType2QTGN	0.028	0.084	0.018	0.213
ISMD - 342	0.010	0.018	0.007	0.081
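
For readers who want to reproduce the official measure, nDCG@10 over graded relevance values can be sketched as follows. The gain and discount functions used for the official scores are not stated on this page; the sketch assumes a linear gain (the relevance value itself) and a log2(rank + 1) discount, and the function names are hypothetical.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain; ranks are 1-based, discount is log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(ranked_work_ids: List[str],
              qrels: Dict[str, float],
              k: int = 10) -> float:
    """nDCG@k for one topic; qrels maps work ID -> relevance value."""
    # Works that do not appear in the qrels count as non-relevant (gain 0).
    gains = [qrels.get(work_id, 0.0) for work_id in ranked_work_ids[:k]]
    ideal_gains = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal_gains)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy topic with three judged works (relevance values are made up).
toy_qrels = {"w1": 8, "w2": 4, "w3": 2}
perfect = ndcg_at_k(["w1", "w2", "w3"], toy_qrels)
reversed_order = ndcg_at_k(["w3", "w2", "w1"], toy_qrels)
```

The ranked list passed in is expected to be deduplicated to work IDs first, so that each work contributes at most once.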

Operationalisation of forum judgement labels

Students from Aalborg University Copenhagen, the Royal School of Library and Information Science (Copenhagen) and the Oslo and Akershus University College labelled the LibraryThing forum topic threads and the suggestions in those threads.

The topic labels were used to select topics for the 2014 SBS task. In total, 680 topics were selected.
The suggestion labels, in combination with the user profiles, were used to determine the relevance value of each book suggestion in the thread.

Forum members can mention books for many different reasons. We want the relevance values to distinguish between books that were mentioned as positive recommendations, negative recommendations (books to avoid), neutral suggestions (mentioned as possibly relevant but not necessarily recommended) and books mentioned for some other reason (not relevant at all).

Furthermore, we want to differentiate between recommendations from members who have read the book they recommend and members who haven't. We assume the recommendation to be of more value to the searcher if it comes from someone who has actually read the book.

Finally, we distinguish between suggestions of books that the user already had in their catalogue versus books that the user added after getting a suggestion from others.

Terminology

works added by the topic creator before they were suggested on the forum are *Pre-catalogued*
works added by the topic creator after they were suggested on the forum are *Post-catalogued*
the first mention of a work is a *suggestion*
subsequent mentions of a work are *replies*
has_read means the suggester has read the book
not_read means the suggester hasn't read the book
not_sure means the labeller couldn't tell whether the suggester has read the book

Simplifying assumptions

if creator has book catalogued before it's suggested: treated as known -> the creator already knows about this book, so suggesting it has little value.
if creator adds book to catalogue after it's suggested: treated as highly relevant -> this is a signal that the creator acted upon the suggestion, so the suggestion has value to the creator.
not sure if suggester has read book: treated as not read -> for the creator it's not clear why a not_sure recommendation should be valued more than a not_read recommendation.
not sure if suggester is positive/negative/neutral: treated as neutral -> for the creator there is no signal that this book is better/worse than neutral suggestions.
has read overrules not read (the opinion of someone who read the book is more valuable than that of someone who didn't).
pos + neg == neu (positive and negative cancel each other out).
topic creator has read: rv=0 (if the creator has already read the book, the suggestion has no value for search).
topic creator's judgements overrule others (we mainly care about the topic creator's opinion).
topic creator mentions a single work multiple times: last mention is used as judgement (the assumption is that the creator based the last judgement on the thread discussion or on having bought/read the book in the meantime).

Decision tree: determine which judgements to use

1 - Work mentioned once -> there is only one judgement, use that
2 - Work mentioned multiple times
  2.1 - topic creator mentions work
    2.1.1 - topic creator *suggests* neutral -> use replies (go to 2.2)
    2.1.2 - topic creator *suggests* pos/neg -> use creator judgement
    2.1.3 - topic creator *replies* -> use creator judgement only
  2.2 - topic creator doesn't mention work
    2.2.1 - there are some has_read suggestions/replies -> use has_read judgements
    2.2.2 - there are no has_read suggestions/replies -> use all judgements
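
The selection step above can be sketched in Python as follows. The Mention record and its field names are hypothetical; the sketch assumes mentions are given in thread order, that not_sure has already been folded into not_read, and that the creator's last mention is the one that counts (per the simplifying assumptions).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Mention:
    by_creator: bool  # True if written by the topic creator
    kind: str         # "suggestion" (first mention of the work) or "reply"
    read: str         # "has_read" or "not_read"
    sentiment: str    # "pos", "neg" or "neu"


def select_judgements(mentions: List[Mention]) -> List[Mention]:
    """Return the judgements to use for one work, given its mentions in thread order."""
    if len(mentions) == 1:
        return mentions                      # 1: only one judgement
    creator = [m for m in mentions if m.by_creator]
    others = [m for m in mentions if not m.by_creator]
    if creator:
        last = creator[-1]                   # the creator's last mention counts
        if last.kind == "reply":
            return [last]                    # 2.1.3
        if last.sentiment != "neu":
            return [last]                    # 2.1.2
        # 2.1.1: neutral suggestion by the creator, fall through to 2.2
    has_read = [m for m in others if m.read == "has_read"]
    return has_read if has_read else others  # 2.2.1, else 2.2.2


# The creator suggests a work neutrally; two other members reply.
thread = [
    Mention(True, "suggestion", "not_read", "neu"),
    Mention(False, "reply", "has_read", "pos"),
    Mention(False, "reply", "not_read", "neg"),
]
selected = select_judgements(thread)  # only the has_read reply is kept
```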

Decision tree: turn judgements into relevance values

When a work is mentioned, its base relevance value is rv=2.

1 - catalogued by topic creator
  1.1 - post-catalogued -> rv=8
  1.2 - pre-catalogued -> rv=0
2 - single judgement
  2.1 - creator has_read judgement
    2.1.1 - creator pos/neg/neu -> rv=0
  2.2 - creator not_read judgement
    2.2.1 - creator positive -> rv=8
    2.2.2 - creator neutral -> rv=2
    2.2.3 - creator negative -> rv=0
  2.3 - other has_read judgement
    2.3.1 - has_read positive -> rv=4
    2.3.2 - has_read neutral -> rv=2
    2.3.3 - has_read negative -> rv=0
  2.4 - other not_read judgement
    2.4.1 - not_read positive -> rv=3
    2.4.2 - not_read neutral -> rv=2
    2.4.3 - not_read negative -> rv=0
3 - multiple judgements
  3.1 - multi has_read judgements
    3.1.1 - some positive, no negative -> rv=6
    3.1.2 - #positive > #negative -> rv=4
    3.1.3 - #positive == #negative -> rv=2
    3.1.4 - all neutral -> rv=2
    3.1.5 - #positive < #negative -> rv=1
    3.1.6 - no positive, some negative -> rv=0
  3.2 - multi not_read judgements
    3.2.1 - some positive, no negative -> rv=4
    3.2.2 - #positive > #negative -> rv=3
    3.2.3 - #positive == #negative -> rv=2
    3.2.4 - all neutral -> rv=2
    3.2.5 - #positive < #negative -> rv=1
    3.2.6 - no positive, some negative -> rv=0
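
As a sketch of how the tree above turns judgements into a relevance value, the following Python function walks the branches in order. The Judgement record, the catalogued argument and the function name are hypothetical. Two interpretations are assumed: the input is the output of the judgement selection step, so multiple judgements are either all has_read or all not_read; and where branches overlap, "some positive, no negative" is checked before "#positive > #negative", and "no positive, some negative" takes precedence over "#positive < #negative".

```python
from typing import List, NamedTuple, Optional


class Judgement(NamedTuple):
    by_creator: bool  # True if the judgement comes from the topic creator
    read: str         # "has_read" or "not_read"
    sentiment: str    # "pos", "neg" or "neu"


def relevance_value(judgements: List[Judgement],
                    catalogued: Optional[str] = None) -> int:
    """catalogued is "pre", "post" or None (not in the creator's catalogue)."""
    if catalogued == "post":
        return 8                                   # 1.1
    if catalogued == "pre":
        return 0                                   # 1.2
    if len(judgements) == 1:
        j = judgements[0]
        if j.by_creator and j.read == "has_read":
            return 0                               # 2.1: creator already read it
        if j.sentiment == "neu":
            return 2                               # base value of a mention
        if j.sentiment == "neg":
            return 0
        if j.by_creator:
            return 8                               # 2.2.1
        return 4 if j.read == "has_read" else 3    # 2.3.1 / 2.4.1
    # Multiple judgements (branches 3.1 and 3.2).
    has_read = all(j.read == "has_read" for j in judgements)
    pos = sum(j.sentiment == "pos" for j in judgements)
    neg = sum(j.sentiment == "neg" for j in judgements)
    if pos > 0 and neg == 0:
        return 6 if has_read else 4                # 3.x.1
    if pos > neg:
        return 4 if has_read else 3                # 3.x.2
    if pos == neg:
        return 2                                   # 3.x.3 and 3.x.4 (all neutral)
    if pos > 0:
        return 1                                   # 3.x.5
    return 0                                       # 3.x.6


# Two members who read the book both recommend it.
rv = relevance_value([Judgement(False, "has_read", "pos"),
                      Judgement(False, "has_read", "pos")])  # -> 6
```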

If you have questions, please send an email to marijn.koolen@uva.nl.
