The goal of the Linked Data track is to investigate retrieval techniques over a combination of textual and highly structured data, where RDF properties carry additional key information about semantic relations among data objects that cannot be captured by keywords alone. We intend to investigate if and how structural information can be exploited to improve ad-hoc retrieval performance, and how it can be used in combination with structured queries to help users navigate or explore large result sets via ad-hoc queries, or to address Jeopardy-style natural-language questions that are translated into a SPARQL-based query format.
The Linked Data track thus aims to close the gap between IR-style keyword search and Semantic-Web-style reasoning techniques. Our goal is to bring together different communities and to foster research at the intersection of Information Retrieval, Databases, and the Semantic Web.
The Linked Data track will use a subset of DBpedia and YAGO2s together with a recent dump of Wikipedia core articles. In addition to these reference collections, we will also provide two supplementary collections: (1) to lower the participation threshold for participants with IR engines, a fusion of XML-ified Wikipedia articles with RDF properties from both DBpedia and YAGO2s, and (2) to lower the participation threshold for participants with RDF engines, a dump of the textual content of Wikipedia articles in RDF. Participants are explicitly encouraged to make use of more RDF facts available from DBpedia and YAGO2s, in particular for processing the reasoning-related Jeopardy topics.
For INEX 2013, we will explore two retrieval tasks that continue from INEX 2012: the Ad-hoc task and the Jeopardy task.
The final set of 144 Ad-hoc task search topics for the INEX 2013 Linked Data track has been released and is now available for download.
You may want to consider the topics from the 2012 LOD track together with the qrels for training.
Run submissions for the Ad-hoc task must be in the familiar TREC format:
<qid> Q0 <file> <rank> <rsv> <run_id>
Here, <qid> is the ID of the topic, Q0 is a required constant, <file> is the Wikipedia page ID of the result, <rank> is the rank of the result, <rsv> is the retrieval score (relevance status value), and <run_id> is a unique identifier for the run.
An example submission is:
2013001 Q0 12 1 0.9999 2013UniXRun1
2013001 Q0 997 2 0.9998 2013UniXRun1
2013001 Q0 9989 3 0.9997 2013UniXRun1
It contains three results for topic 2013001. The first result is the Wikipedia page with ID "12", the second is the page with ID "997", and the third is the page with ID "9989". Mappings between DBpedia URIs and Wikipedia page IDs are available from the DBpedia page-IDs dataset, which is part of the reference collection. Please restrict your results to the DBpedia URIs provided in the list of valid DBpedia URIs.
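For illustration only, the following minimal Python sketch writes an Ad-hoc run file in the format described above. The per-topic ranking, scores, output file name, and run identifier are hypothetical placeholders taken from the example, not prescribed values.

# Minimal sketch: writing an Ad-hoc run file in the TREC format described above.
# The ranked results and the run identifier are illustrative placeholders.
results = {
    "2013001": [("12", 0.9999), ("997", 0.9998), ("9989", 0.9997)],
}
run_id = "2013UniXRun1"  # assumed run identifier

with open("adhoc_run.txt", "w") as out:
    for qid, ranked in results.items():
        for rank, (page_id, score) in enumerate(ranked, start=1):
            out.write(f"{qid} Q0 {page_id} {rank} {score} {run_id}\n")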
The Jeopardy task investigates retrieval techniques over a set of natural-language Jeopardy clues, which have been manually translated into SPARQL query patterns and enhanced with keyword-based filter conditions.
The final set of 105 Jeopardy task search topics for the INEX 2013 Linked Data track has been released and is now available for download.
You may want to consider the topics from 2012 together with the qrels for training.
We illustrate the topic format with the example of topic 2012374 from the set of 2012 topics. It is represented in XML format as follows:
<topic id="2012374" category="Politics">
<jeopardy_clue>Which German politician is a successor of another politician who stepped down before his or her actual term was over, and what is the name of their political ancestor?</jeopardy_clue>
<keyword_title>German politicians successor other stepped down before actual term name ancestor</keyword_title>
<sparql_ft>
SELECT ?s ?s1 WHERE {
?s rdf:type <http://dbpedia.org/class/yago/GermanPoliticians> .
?s1 <http://dbpedia.org/property/successor> ?s .
FILTER FTContains (?s, "stepped down early") .
}
</sparql_ft>
</topic>
The <jeopardy_clue> element contains the original Jeopardy clue as a natural-language sentence; the <keyword_title> element contains a set of keywords that has been manually extracted from this clue and is reused as part of the Ad-hoc task; and the <sparql_ft> element contains the result of a manual conversion of the natural-language sentence into a corresponding SPARQL query. The category attribute of the <topic> element may be used as an additional hint for disambiguating the query.
In the above query, ?s is a variable for an entity of type http://dbpedia.org/class/yago/GermanPoliticians (first triple pattern) that stands in a http://dbpedia.org/property/successor relationship with another entity ?s1 (second triple pattern). The FTContains filter condition restricts ?s to entities whose corresponding Wikipedia articles are associated with the keywords "stepped down early".
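For illustration only, the sketch below shows one possible (non-prescribed) way to evaluate such a topic: the structural triple patterns are run against a local RDF store (here rdflib over an assumed DBpedia subset file), and the FTContains condition is approximated by a plain keyword filter over the article text of candidate entities. The file name and the article_text() helper are hypothetical placeholders, not part of the track infrastructure.

# One possible (non-prescribed) way to process topic 2012374: evaluate the
# structural part with rdflib and approximate FTContains by a keyword filter.
from rdflib import Graph

g = Graph()
g.parse("dbpedia_subset.nt", format="nt")  # assumed local RDF dump

structural_query = """
SELECT ?s ?s1 WHERE {
  ?s a <http://dbpedia.org/class/yago/GermanPoliticians> .
  ?s1 <http://dbpedia.org/property/successor> ?s .
}
"""

def article_text(entity_uri):
    # Hypothetical helper: would look up the Wikipedia article text of the
    # entity in a local full-text index of the Wikipedia-LOD collection.
    return ""

keywords = ["stepped", "down", "early"]
candidates = []
for s, s1 in g.query(structural_query):
    text = article_text(str(s)).lower()
    # naive stand-in for FILTER FTContains(?s, "stepped down early")
    if all(k in text for k in keywords):
        candidates.append((str(s), str(s1)))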
Since this particular variant of SPARQL with full-text filter conditions cannot be run against a standard RDF collection (such as DBpedia or YAGO) alone, participants are encouraged to develop individual solutions that index both the RDF and textual contents of the Wikipedia-LOD collection in order to process these queries.
Each participating group may submit up to 3 runs. Each run can contain a maximum of 1000 results per topic, ordered by decreasing relevance (although we expect most topics to have just one or a few entities or sets of entities as targets). The results of one run must be contained in one submission file (i.e., up to 3 files can be submitted in total). For relevance assessment and evaluation, we require submission files to be in the familiar TREC format, where each query result is denoted by one row of target entities (identified by their Wikipedia page IDs, which are available in the reference collection through <http://dbpedia.org/ontology/wikiPageID> properties). Each row of target entities must reflect the order of the query variables as specified in the SELECT clause of the Jeopardy topic. If the SELECT clause contains more than one query variable, the row should consist of a comma- or semicolon-separated list of target entity IDs. The submission format is:
<qid> Q0 <file> <rank> <rsv> <run_id>
Here, <qid> is the ID of the topic, Q0 is a required constant, <file> is the comma- or semicolon-separated list of Wikipedia page IDs of the target entities (in the order of the SELECT variables), <rank> is the rank of the result, <rsv> is the retrieval score (relevance status value), and <run_id> is a unique identifier for the run.
An example submission is:
2012374 Q0 12;24 1 0.9999 2012UniXRun1
2012374 Q0 997;998 2 0.9998 2012UniXRun1
2012374 Q0 9989;12345 3 0.9997 2012UniXRun1
These are three results for topic 2012374; this topic requests two entities per result since it has two variables in the SELECT clause. The first result is the entity pair (denoted by their Wikipedia page IDs) with the IDs "12" and "24", the second is the pair with the IDs "997" and "998", and the third is the pair with the IDs "9989" and "12345". Mappings between DBpedia URIs and Wikipedia page IDs are available from the DBpedia-to-Wikipedia page-links dataset, which is part of the reference collection. Please restrict your results to the DBpedia URIs provided in the list of valid DBpedia URIs.
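For illustration only, a minimal Python sketch of how such multi-entity result rows could be written; the entity pairs, scores, output file name, and run identifier are hypothetical placeholders.

# Minimal sketch: writing a Jeopardy run with several target entities per row.
# Entity IDs are joined with ';' in the order of the SELECT variables.
results = {
    "2012374": [(("12", "24"), 0.9999), (("997", "998"), 0.9998), (("9989", "12345"), 0.9997)],
}
run_id = "2012UniXRun1"  # assumed run identifier

with open("jeopardy_run.txt", "w") as out:
    for qid, ranked in results.items():
        for rank, (entities, score) in enumerate(ranked, start=1):
            row = ";".join(entities)
            out.write(f"{qid} Q0 {row} {rank} {score} {run_id}\n")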
The effectiveness of the submitted retrieval results will be evaluated using classical IR metrics such as MAP, P@5, P@10, NDCG, and possibly others. All run submissions will be pooled, and we will set up a crowdsourcing interface on Amazon Mechanical Turk for the evaluation of these pools. Jeopardy runs will be evaluated in an entity-centric manner, i.e., only the one (or few) Wikipedia page IDs that point to the target entities demanded by a query will be considered relevant. Wikipedia pages that merely mention the actual target entities will not be considered relevant.
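For orientation only, a minimal Python sketch of how P@k and average precision (the per-topic component of MAP) can be computed from a ranked result list and a set of relevant IDs; this is not the official evaluation code of the track, and the example values are invented.

# Illustrative computation of P@k and average precision for a single topic;
# not the official evaluation code of the track.
def precision_at_k(ranked, relevant, k):
    return sum(1 for doc in ranked[:k] if doc in relevant) / k

def average_precision(ranked, relevant):
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

ranked = ["12", "997", "9989"]   # ranked page IDs from a run (example values)
relevant = {"12", "9989"}        # assessed relevant page IDs (example values)
print(precision_at_k(ranked, relevant, 5), average_precision(ranked, relevant))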
Date       | Milestone
January 17 | Reference collections available for download
February 1 | Supplementary collections available for download
March 15   | Topics for Ad-hoc & Jeopardy tasks distributed
May 15     | Run submission deadline
May 15-31  | Relevance assessments
June 8     | Release of assessments and results
June 15    | Submission of CLEF 2013 Working Notes papers
June 30    | Submission of CLEF 2013 Labs Overviews
Sep. 23-26 | CLEF 2013 Conference