Adhoc Track (2009-2010) and Wikipedia Collection (2009-)
Documents
This collection is a dump of 2,666,190 Wikipedia articles taken on 8 October 2008 and annotated with the 2008-w40-2 version of YAGO.
It is 50.7GB in size and was prepared by Ralf Schenkel. For details, please see (and cite): Ralf Schenkel, Fabian M. Suchanek,
Gjergji Kasneci (2007): YAWN: A semantically annotated Wikipedia XML corpus, 12. GI-Fachtagung für Datenbanksysteme in Business, Technologie
und Web (BTW 2007), Aachen, Germany, March 2007.
Books and Social Search / Social Book Search Track (2011, 2012, 2013, 2014)
Documents
2011 Book Collection (from Amazon and LibraryThing, 7.1GiB)
A Corpus License Agreement is needed; see here for further information on how to access the collection.
Official qrels for the 2014 Suggestion task.
Official qrels for the 2013 Social Book Search task.
Official qrels for the 2012 Social Book Search task.
Unofficial qrels over all 300 topics of the INEX 2012 Social Book Search task.
Official qrels for the INEX 2012 Prove It! task.
Official qrels for 2011, using the LibraryThing (LT) work IDs of books suggested in the LT discussion threads. The ISBNs in the submitted runs are mapped to LT work IDs as well: the highest-ranked ISBN is mapped to the work ID, and lower-ranked ISBNs that map to the same ID are removed from the results list.
Expanded qrels for 2011, with the work IDs expanded to all matching ISBNs. Multiple search results mapping to the same ID all contribute to the score.
Qrels for 2011 derived from the Mechanical Turk relevance judgements, for 24 of the 211 topics.
Perl script to map ISBNs to IDs in a run. This script requires amazon-lt.isbn.thingID.gz for the mappings.
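The two 2011 mapping conventions described above can be illustrated with a minimal Python sketch. This is not the official Perl script: the function names, the in-memory dictionary standing in for amazon-lt.isbn.thingID.gz, and the choice to drop ISBNs that have no mapping are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List


def map_run_to_work_ids(ranked_isbns: Iterable[str],
                        isbn_to_work: Dict[str, str]) -> List[str]:
    """Map a ranked ISBN list to work IDs, keeping only the best rank per work.

    `ranked_isbns` is assumed to be ordered from best rank to worst.
    """
    seen = set()
    work_ids = []
    for isbn in ranked_isbns:
        work_id = isbn_to_work.get(isbn)
        if work_id is None:
            # Assumption: ISBNs without a known mapping are dropped.
            # The official script may treat them differently.
            continue
        if work_id in seen:
            # A higher-ranked ISBN already claimed this work ID.
            continue
        seen.add(work_id)
        work_ids.append(work_id)
    return work_ids


def expand_qrels(work_qrels: Dict[str, int],
                 isbn_to_work: Dict[str, str]) -> Dict[str, int]:
    """Expand work-level judgements to every ISBN that maps to a judged work."""
    return {
        isbn: work_qrels[work_id]
        for isbn, work_id in isbn_to_work.items()
        if work_id in work_qrels
    }


# Hypothetical mapping: two ISBNs (editions) of work w1, one ISBN of work w2.
example_mapping = {"A1": "w1", "A2": "w1", "B1": "w2"}
deduplicated = map_run_to_work_ids(["A2", "B1", "A1", "Z9"], example_mapping)
expanded = expand_qrels({"w1": 1}, example_mapping)
```

With the first convention, each work is counted once at its best rank; with the expanded qrels, no deduplication of the run is needed, because every ISBN of a judged work carries that work's judgement.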
See the track page for the official datasets.
The auxiliary Wikipedia-LOD 2.0 collection is available from here (MPI Informatik server).
Wikipedia-LOD Collection V1.1 (2012)
The Wikipedia-LOD collection is available from here (MPI Informatik server).
It consists of 8 files in 7z format and contains approximately 2.7 million XML articles. The uncompressed size of the collection is 60 GB.
A DTD for the XML collection is available here.
2010 IMDB Collection (1.4GB), cleaned for use with the INEX 2010 toolset
Information courtesy of The Internet Movie Database (http://www.imdb.com). Used with permission.
This collection consists of the IMDB plain text files from the web site, dated 2010-4-23 and converted into XML.
It is available for personal and non-commercial use. See the IMDb Licence.