Results

Results of the 2012 round are available here.

Overview

The goal of the snippet retrieval track is to determine how best to generate informative snippets for search results. Such snippets should provide sufficient information to allow the user to determine the relevance of each document, without needing to view the document itself. Participating organisations will compare both the effectiveness of their focused retrieval systems as well as the effectiveness of their snippet generation systems, to others.

Changes from 2011

The following is an overview of the changes made to the track since last year:

Full document-based assessment will now be used in addition to snippet-based assessment, so that both the snippet and the full document are assessed by the same assessor.
To keep the assessment load manageable now that document-based assessment is included, the number of topics and snippets has been reduced:
- There are 35 topics in total (down from 50 in 2011)
- There will be 20 snippets per topic (down from 100 in 2011)
The document title will be shown alongside each snippet in the assessment software – it will not be necessary to include the document title in the snippet itself (although this is not forbidden).
Snippets are now limited to 180 characters (down from 300 in 2011).
There will be a baseline run included in the evaluation, consisting of the first 180 characters of each document in the reference run.

Data

The snippet retrieval track will use the INEX 2009 Wikipedia collection. Some topics will be recycled from previous ad hoc tracks, while others will be brand new.

Task

Participating organisations will submit a ranked list of documents, and corresponding snippets. For those organisations uninterested in developing their own focused retrieval system, a reference run will be provided, consisting of a ranked list of documents, for which snippets should be created.

Each run will be submitted in the form of an XML file (format described below). Each submission should contain at least 20 snippets per topic, with a maximum of 180 characters per snippet. The snippets themselves may be created in any way you wish – they may consist of summaries, passages from the document, or any other text at all. Note that the document title will be shown alongside the snippet in the assessment software, so it is not necessary to include it within the snippet itself (although it is not disallowed).

Participants may submit more than one submission; however submissions must be ranked in order of importance, as not all submissions may be evaluated. Please note that participants will have to assess an additional package for each submission received.

Submission format

The DTD for the submission format is as follows.


<!ELEMENT inex-snippet-submission (description,topic+)>
<!ATTLIST inex-snippet-submission
  participant-id CDATA #REQUIRED
  run-id CDATA #REQUIRED
>
<!ELEMENT description (#PCDATA)>
<!ELEMENT topic (snippet+)>
<!ATTLIST topic
  topic-id CDATA #REQUIRED
>
<!ELEMENT snippet (#PCDATA)>
<!ATTLIST snippet
  doc-id CDATA #REQUIRED
  rsv CDATA #REQUIRED
>

Each submission must contain the following:

participant-id: The participant number of the submitting institution (available here)
run-id: A run ID, which must be unique across all submissions sent from a single participating organisation.
description: A short description of the approach used.

Every run should contain the results for each topic, conforming to the following:

topic: Contains a ranked list of snippets, ordered by decreasing level of relevance of the document which they represent.
topic-id: The ID number of the topic.
snippet: A snippet representing a document (defined by doc-id). Must be 300 characters or less, and contain no line breaks.
doc-id: The document ID.
rsv: The document's score.

An example submission in the correct format is given below.


<?xml version="1.0"?>
<!DOCTYPE inex-snippet-submission SYSTEM "inex-snippet-submission.dtd">
<inex-snippet-submission participant-id="20" run-id="QUT_Snippet_Run_01">
    <description>A description of the approach used.</description>
    <topic topic-id="2011001">
        <snippet doc-id="16080300" rsv="0.9999">...</snippet>
        <snippet doc-id="16371300" rsv="0.9998">...</snippet>
        ...
    </topic>
    <topic topic-id="2011002">
        <snippet doc-id="1686300" rsv="0.9999">...</snippet>
        <snippet doc-id="1751300" rsv="0.9997">...</snippet>
        ...
    </topic>
    ...
</inex-snippet-submission>

Relevance assessments

Participating organisations will be required to perform assessment or other participants' submissions. For each submission received from a participating organisation, that organisation will be given an assessment package (the size of a single submission) to assess.

This year, both snippet-based and document-based assessment will be used. For each topic, the assessor will read through the details of the topic, after which they will read through each snippet, and determine whether or not the underlying document is relevant to the topic. After snippet-based assessment is complete, they will then assess the documents for relevance based on the full document text.

Because document-based assessment is being used in addition to snippet-based assessment this year, the number of topics and snippets has been reduced to keep the assessment load reasonable. There are 35 topics in total (down from 50 in 2011), and there will be 20 snippets/documents to assess for each topic (down from 100 snippets in 2011).

Evaluation

The primary evaluation metric, and the one which determines the ranking, is the geometric mean of recall and negative recall (GM), averaged over all topics: sqrt(TN/(TN+FP) * TP/(TP+FN)).

Other metrics include:

Mean prediction accuracy (MPA) — the percentage of results the assessor correctly assessed, averaged over all topics: (TP+TN)/(TP+FN+TN+FP)
Mean normalised prediction accuracy (MNPA) — the average of the relevant results correctly assessed, and the irrelevant results correctly assessed, averaged over all topics: 0.5*TP/(TP+FN) + 0.5*TN/(TN+FP)
Recall — the percentage of relevant documents correctly assessed, averaged over all topics: TP/(TP+FN)
Negative recall (NR) or specificity — the percentage of irrelevant documents correctly assessed, averaged over all topics: TN/(TN+FP)
Positive agreement (PA) — the conditional probability of agreement between snippet assessor and document assessor (i.e. ground truth), given that one of the two judged relevant. Equivalent to F1 score: 2*TP/(2*TP+FP+FN)
Negative agreement (NA) — the conditional probability of agreement between snippet assessor and document assessor (i.e. ground truth), given that one of the two judged irrelevant: 2*TN/(2*TN+FP+FN)

Schedule

Because the track has had a late start, it will not be possible for any evaluation to be completed before the submission deadline for pre-proceedings papers. To ensure that some evaluation results are made available before the workshop itself, there will be two rounds of submissions. Round 1 will be due one month before the workshop, with the results aimed to be released a week ahead of the workshop. Submissions for Round 2 will be due one month after the workshop. Results from Round 2 will be included in the final proceedings.

The schedule is as follows:

Release of final topics:	Jul 18
Submissions due:	Oct 19
Assessment phase:	November
Release of evaluation results:	Early December

Organizers

Matthew Trappett
QUT
matthew.trappett@qut.edu.au

Shlomo Geva
QUT
s.geva@qut.edu.au

Andrew Trotman
University of Otago
andrew@cs.otago.ac.nz

Falk Scholer
RMIT
falk.scholer@rmit.edu.au

Mark Sanderson
RMIT
mark.sanderson@rmit.edu.au

Contact someone about INEX