INEX 2011 Question Answering Track

Overview

The QA task to be performed by the participating groups of INEX 2011 is tweet contextualization, i.e. answering questions of the form "what is this tweet about?" using a recent cleaned dump of Wikipedia. The general process involves analyzing the tweet, retrieving relevant passages or XML elements from the dump, and aggregating them into an answer.

We regard as relevant those passage segments that both contain information pertinent to the tweet and contain as little non-relevant information as possible.

For evaluation purposes, we require that the answer uses ONLY elements or passages previously extracted from the document collection. The correctness of answers is established by the participants exclusively on the basis of the supporting passages and documents.

Participants are required to submit at least one fully automatic run; manual runs are nevertheless strongly encouraged. Runs that involve human intervention at any stage of the process are considered manual. Such interventions must be clearly stated and documented.

Task description

Motivation for the Task

The underlying scenario is a user who receives a tweet containing a URL on a small terminal such as a phone; the system must provide this user with synthetic contextual information gathered from a local XML dump of Wikipedia. The answer needs to be built by aggregating relevant XML elements or passages.

The aggregated answers will be evaluated according to how they overlap with relevant passages (how many of them, and which vocabulary and bi-grams are included or missing) and according to the "last point of interest" marked by evaluators. By combining these measures, we expect to take into account both the informative content and the readability of the aggregated answers.
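
As a rough illustration of what the overlap-based informativeness measures count, here is a minimal sketch in Python. It is not the official evaluation tool; the whitespace tokenization and the function names are assumptions made for the example.

def tokens(text):
    # Naive lowercase whitespace tokenization (a simplifying assumption).
    return text.lower().split()

def bigrams(toks):
    return set(zip(toks, toks[1:]))

def coverage(reference, summary):
    # Fraction of the reference vocabulary and bi-grams found in the summary.
    ref, summ = tokens(reference), tokens(summary)
    ref_vocab, summ_vocab = set(ref), set(summ)
    ref_bi, summ_bi = bigrams(ref), bigrams(summ)
    vocab_cov = len(ref_vocab & summ_vocab) / len(ref_vocab) if ref_vocab else 0.0
    bigram_cov = len(ref_bi & summ_bi) / len(ref_bi) if ref_bi else 0.0
    return vocab_cov, bigram_cov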

Results to Return

A short summary of fewer than 500 words, made exclusively of aggregated passages extracted from the Wikipedia corpus.
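
For illustration, a minimal sketch of enforcing this constraint, assuming passages have already been ranked and extracted upstream (the names below are illustrative):

def aggregate(passages, max_words=500):
    # Concatenate ranked passages, stopping before the word cap is exceeded.
    summary, count = [], 0
    for p in passages:
        n = len(p.split())
        if count + n > max_words:
            break
        summary.append(p)
        count += n
    return " ".join(summary)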

Extractive automatic summarization systems are strongly encouraged to participate.

Relevance assessments

Each assessor will have to evaluate a pool of answers of at most 500 words each. These answers will be agglomerations of Wikipedia passages.

Evaluators will have to mark:

  1. The "last point of interest", i.e. the first point after which the text becomes out of context because of:
  2. all relevant passages in the text, even if they are redundant.

Systems will be ranked according to both the informativeness of their answers (overlap with the relevant passages) and their readability (position of the last point of interest).

Document collection

The document collection has been rebuilt from a recent dump of the English Wikipedia from April 2011 (we left a copy of this dump here). Since we target a plain XML corpus allowing easy extraction of plain-text answers, we removed all notes and bibliographic references, which are difficult to handle, and kept only the 3,217,015 non-empty Wikipedia pages (pages having at least one section).

The resulting documents consist of a title (title), an abstract (a) and sections (s). Each section has a sub-title (h). Abstracts and sections are made of paragraphs (p), and each paragraph can contain entities (t) that refer to Wikipedia pages. The resulting corpus therefore has this simple DTD:

<!ELEMENT xml (page)+>
<!ELEMENT page (ID, title, a, s*)>
<!ELEMENT ID (#PCDATA)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT a (p+)>
<!ELEMENT s (h, p+)>
<!ATTLIST s o CDATA #REQUIRED>
<!ELEMENT h (#PCDATA)>
<!ELEMENT p (#PCDATA | t)*>
<!ATTLIST p o CDATA #REQUIRED>
<!ELEMENT t (#PCDATA)>
<!ATTLIST t e CDATA #IMPLIED>

This corpus is available in two file formats (2.7 GB each).
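
To make this structure concrete, here is a hedged sketch of iterating over the corpus with Python's standard library, following the DTD above (the file name is an assumption):

import xml.etree.ElementTree as ET

def iter_pages(path="corpus.xml"):  # file name is illustrative
    # Stream the corpus page by page; it is too large to load at once.
    for _, elem in ET.iterparse(path):
        if elem.tag != "page":
            continue
        pid = elem.findtext("ID")
        title = elem.findtext("title")
        # The abstract <a> holds <p> paragraphs; itertext() flattens nested <t> entities.
        abstract = ["".join(p.itertext()) for p in elem.find("a").findall("p")]
        sections = [(s.findtext("h"),
                     ["".join(p.itertext()) for p in s.findall("p")])
                    for s in elem.findall("s")]
        yield pid, title, abstract, sections
        elem.clear()  # free memory as we go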

A complementary list of non-Wikipedia entities extracted from the pages using LIMSI tools is also available here.

Baseline system

A baseline XML-element retrieval system powered by Indri is available online with a standard CGI interface. The index covers all words (no stop list, no stemming) and all XML tags. Participants who do not wish to build their own index can use this one, either by downloading it or by querying it online (more information here, or contact eric.sanjuan@univ-avignon.fr).

You can also query this baseline system in batch mode using this Perl program. It takes input files such as this one; see its synopsis for more details.

Topics

2011 and on ...

The 132 topics selected for 2011 are available here. Each topic consists of the title and the first sentence of a New York Times article that was tweeted at least two months after the Wikipedia dump we use. For each topic, we manually checked that related information exists in the document collection. We can provide the content of the articles to participants on an individual basis, but the objective of the task remains to contextualize only the tweeted information.

Past topics

The 2009-2010 topics are also available here, and anonymized best runs from the 2010 participants are available here. These runs can be used to tune new systems.

Result Submission

Fact sheet

Format for results

It is a variant of the familiar TREC format with additional fields:
<qid> Q0 <file> <rank> <rsv> <run_id> <column_7> <column_8> <column_9>

Here, the first six fields follow the usual TREC conventions: <qid> is the topic identifier, Q0 is a literal constant, <file> is the identifier of the Wikipedia article the passage comes from, <rank> is the passage rank, <rsv> is the retrieval status value (score), and <run_id> identifies the run. The remaining columns depend on the chosen format (text passage or offset).

Textual content:

Raw text is given without XML tags and without formatting characters (avoid "\n", "\r", "\l"). The resulting word sequence has to appear in the file indicated in the third field. Here is an example of such output:

1 Q0 3005204 1 0.9999 I10UniXRun1 The Alfred Noble Prize is an award presented by the combined engineering societies of the United States, given each year to a person not over thirty-five for a paper published in one of the journals of the participating societies. 
1 Q0 3005204 2 0.9998 I10UniXRun1 The prize was established in 1929 in honor of Alfred Noble, Past President of the American Society of Civil Engineers.
1 Q0 3005204 3 0.9997 I10UniXRun1 It has no connection to the Nobel Prize , although the two are often confused due to their similar spellings.
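
A minimal sketch of emitting such lines, assuming the passages have already been extracted (the helper name and the field values below are illustrative):

def run_line(qid, file_id, rank, rsv, run_id, passage):
    # Collapse whitespace so the passage stays on a single line,
    # free of "\n" and "\r" formatting characters.
    clean = " ".join(passage.split())
    return f"{qid} Q0 {file_id} {rank} {rsv:.4f} {run_id} {clean}"

print(run_line(1, 3005204, 1, 0.9999, "I10UniXRun1",
               "The Alfred Noble Prize is an award presented by "
               "the combined engineering societies of the United States."))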

File Offset Length format (FOL)

In this format, passages are given as an offset and a length, counted in characters with respect to the textual content (ignoring all tags) of the XML file. File offsets start counting at 0 (zero). The previous example would read as follows in FOL format:

1 Q0 3005204 1 0.9999 I10UniXRun1 256 230
1 Q0 3005204 2 0.9998 I10UniXRun1 488 118
1 Q0 3005204 3 0.9997 I10UniXRun1 609 109

The results are from article 3005204. The first passage starts at character offset 256 (that is, 256 characters beyond the first character) and has a length of 230 characters.
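
A hedged sketch of computing these two fields for a passage, assuming one article per XML file; exact counting conventions (e.g., whitespace between tags) should be checked against the official definition:

import xml.etree.ElementTree as ET

def fol_fields(xml_path, passage):
    # Textual content of the file with all tags ignored, offsets counted from 0.
    text = "".join(ET.parse(xml_path).getroot().itertext())
    offset = text.find(passage)
    if offset < 0:
        raise ValueError("passage does not occur verbatim in the file")
    return offset, len(passage)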

Schedule

18/Oct/2011: Release of the final set of questions, available here.
18/Nov/2011 28/Nov/2011 (extended): Submission deadline for results.
21/Nov/2011 10/Dec/2011: Release of QA semi-automatic evaluation results by the organizers.
10/Dec/2011: Release of manual evaluation by participants.

Results

The evaluation software is here.

Organizers

Patrice Bellot
LSIS - Aix-Marseille University

Josiane Mothe
IRIT, University of Toulouse

Véronique Moriceau
LIMSI-CNRS, University Paris-Sud 11

Eric SanJuan
LIA, University of Avignon
eric.sanjuan@univ-avignon.fr

Xavier Tannier
LIMSI-CNRS, University Paris-Sud 11
