The goal of the snippet retrieval track is to determine how best to generate informative snippets for search results. Such snippets should provide sufficient information to allow the user to determine the relevance of each document to their query, without needing to view the document itself. Participating organisations will be able to compare the effectiveness of their snippet generation approaches against those of other participants.
For the convenience of those who have participated in the Snippet Retrieval track in either 2011 or 2012, there is a list of changes at the end of this document.
A set of topics (or queries) has been provided, and each one has a corresponding set of search results, taken from the document collection (described below). The task is to automatically generate text snippets for each of these search results, the goal being to provide sufficient information for the user to determine the relevance of the underlying document.
Each run will be submitted in the form of an XML file (format described below). Each submission should contain the exact same documents as the provided reference run.
Each snippet may contain a maximum of 180 characters; any snippet longer than this will be truncated. The snippets may be created in any way you wish: they may consist of summaries, passages from the document, or any other text at all. Note that the document title will be shown alongside the snippet in the assessment software, so it is not necessary to include it within the snippet itself (although this is not explicitly disallowed).
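As a minimal, unofficial sketch, a snippet generator might trim its candidate text to the limit at a word boundary rather than rely on the hard truncation described above; the function name and word-boundary heuristic below are purely illustrative.

# Illustrative sketch only: trim a candidate snippet to the 180-character
# limit at a word boundary, so the organisers' hard cut never splits a word.
def trim_snippet(text, limit=180):
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # Drop a trailing partial word, if there is one.
    return cut.rsplit(' ', 1)[0] if ' ' in cut else cut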
Participants may make more than one submission; however, submissions must be ranked in order of importance, as it may not be possible to evaluate them all. Please note that participants will be required to assess one additional snippet assessment package for each submission they make.
This year, the Snippet Retrieval track will be using the same document collection as the Tweet Contextualisation track, based on a dump of the English Wikipedia from November 2012. A link to the full dataset, along with a user name and password, is given here.
The set of topics is the same as used in 2012. There are 35 topics in total, and the reference run contains 20 results for each topic. Since the task is to generate snippets for the documents given in the reference run, a link to an archive containing only those 700 documents (as well as the reference run submission file itself) can be found here.
The documents are in a simple XML format. Documents consist of a title (title), an abstract (a) and sections (s). Each section has a sub-title (h). Abstract and sections are made of paragraphs (p) and each paragraph can have entities (t) that refer to Wikipedia pages. The DTD is given below:
<!ELEMENT xml (page)+>
<!ELEMENT page (ID, title, a, s*)>
<!ELEMENT ID (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT a (p+)>
<!ELEMENT s (h, p+)>
<!ATTLIST s o CDATA #REQUIRED>
<!ELEMENT h (#PCDATA)>
<!ELEMENT p (#PCDATA | t)*>
<!ATTLIST p o CDATA #REQUIRED>
<!ELEMENT t (#PCDATA)>
<!ATTLIST t e CDATA #IMPLIED>
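As an illustrative (unofficial) sketch, a document in this format can be read with Python's standard library; here the title and the opening paragraph of the abstract are extracted as raw material for a snippet. The function name and the choice of the first abstract paragraph are assumptions made for illustration only.

import xml.etree.ElementTree as ET

def title_and_abstract(path):
    # The root element is <xml>, containing one or more <page> elements.
    page = ET.parse(path).getroot().find('page')
    title = page.findtext('title', default='')
    # Paragraphs may contain nested <t> entity elements, so gather all text.
    first_p = page.find('./a/p')
    abstract = ''.join(first_p.itertext()).strip() if first_p is not None else ''
    return title, abstract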
Note: the reference run provided is already in the correct format. To generate a valid submission, only the snippet text itself needs to be modified, as well as the metadata describing the run (participant-id, run-id, and description).
The DTD for the submission format is as follows.
<!ELEMENT inex-snippet-submission (description,topic+)>
<!ATTLIST inex-snippet-submission
participant-id CDATA #REQUIRED
run-id CDATA #REQUIRED
>
<!ELEMENT description (#PCDATA)>
<!ELEMENT topic (snippet+)>
<!ATTLIST topic
topic-id CDATA #REQUIRED
>
<!ELEMENT snippet (#PCDATA)>
<!ATTLIST snippet
doc-id CDATA #REQUIRED
rsv CDATA #REQUIRED
>
Each submission must include the run metadata (participant-id, run-id, and a description of the approach), and every run must contain results for each topic, covering the same documents as the reference run.
A correct submission will look similar to this:
<?xml version="1.0"?>
<!DOCTYPE inex-snippet-submission SYSTEM "inex-snippet-submission.dtd">
<inex-snippet-submission participant-id="20" run-id="QUT_Snippet_Run_01">
<description>A description of the approach used.</description>
<topic topic-id="2013001">
<snippet doc-id="7286939" rsv="0.9999">...</snippet>
<snippet doc-id="1760504" rsv="0.9998">...</snippet>
...
</topic>
<topic topic-id="2013002">
<snippet doc-id="11733666" rsv="0.9999">...</snippet>
<snippet doc-id="3659889" rsv="0.9997">...</snippet>
...
</topic>
...
</inex-snippet-submission>
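The following is a minimal sketch (not an official tool) of producing a file in this format with Python's standard library; the function and argument names are illustrative, and the DOCTYPE declaration shown above would still need to be prepended to the output.

import xml.etree.ElementTree as ET

def write_run(path, participant_id, run_id, description, topics):
    """topics maps a topic-id to a list of (doc-id, rsv, snippet text) tuples."""
    root = ET.Element('inex-snippet-submission',
                      {'participant-id': participant_id, 'run-id': run_id})
    ET.SubElement(root, 'description').text = description
    for topic_id, snippets in topics.items():
        topic = ET.SubElement(root, 'topic', {'topic-id': topic_id})
        for doc_id, rsv, text in snippets:
            snippet = ET.SubElement(topic, 'snippet',
                                    {'doc-id': doc_id, 'rsv': str(rsv)})
            # Snippets longer than 180 characters will be truncated anyway.
            snippet.text = text[:180]
    ET.ElementTree(root).write(path, encoding='utf-8', xml_declaration=True)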
Participating organisations will be required to perform assessment on other participants' submissions. Both snippet-based and document-based assessment will be used, with evaluation based on comparing these two sets of assessments.
For each submission received from a participating organisation, that organisation will be given a snippet assessment package (the size of a single submission) to assess. For each topic, the assessor will read through the details of the topic, then read through each snippet and determine whether or not the underlying document is relevant to the topic. This is expected to take around 1-2 hours per package. Ideally, each package should be assessed by a different person, where feasible.
Additionally, each participating organisation will be required to complete one document assessment package. For each of the 35 topics, the assessor is shown the full text of each of the 20 documents, and must read through enough of each document to determine whether or not it is relevant to the topic. This is expected to take around 3-7 hours, depending on the assessor.
Only one set of document assessments needs to be completed by each participating group, although additional assessments are welcome. Please note, however, that if a given assessor is performing both snippet assessment and document assessment, the document assessment must be performed last, to avoid any bias caused by familiarity with the full documents.
Evaluation is based on comparing the snippet assessments with the consensus formed by all of the submitted document assessments, which is treated as the ground truth.
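Purely as an illustration (the official metrics are yet to be announced; see below), an agreement-style comparison for a single topic might look like the sketch below, where both inputs map a document id to a binary relevance judgement and unjudged documents are assumed non-relevant.

def snippet_agreement(snippet_judgements, ground_truth):
    # Fraction of truly relevant documents the snippet assessor also judged
    # relevant, and fraction of truly irrelevant documents the snippet
    # assessor also judged irrelevant.
    relevant = [d for d, rel in ground_truth.items() if rel]
    irrelevant = [d for d, rel in ground_truth.items() if not rel]
    found = sum(1 for d in relevant if snippet_judgements.get(d, False))
    rejected = sum(1 for d in irrelevant if not snippet_judgements.get(d, False))
    return (found / len(relevant) if relevant else 0.0,
            rejected / len(irrelevant) if irrelevant else 0.0)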
The final set of evaluation metrics to be used is still to be decided, but will include at least the following:
For the benefit of those who participated in the Snippet Retrieval track in 2011 or 2012, the following lists outline how this year's track differs from each of the previous years.