Ekaterini Ioannou

Software Technology and Network Applications Laboratory

Department of Electronic & Computer Engineering
Technical University of Crete
University Campus
73100, Crete, HELLAS

ioannou AT softnet.tuc.gr
EkateriniIoannou AT acm.org

Entity Request/Query Collections

This page provides our current collections of entity requests, which can be used to evaluate methodologies
for entity linkage as well as entity search. Each entity request is accompanied by a small set of URLs from the
systems in which the requested entities are described (e.g., Wikipedia URL, OKKAM id). We aim to collect requests
from various data sources (e.g., structured and unstructured data), so the requests of each collection have a
different format. The following paragraphs describe these collections.

(A) People Information

Entity requests (~7,000) derived from short descriptions of people on Wikipedia. The text describing these
people was crawled and then processed with the OpenCalais extractor to extract the entities it contains.
Some examples:
  - "Evelyn Dubrow" Country="US" Position="womenlabor advocate"
  - "Chris Harman" Position="activist" Position="journalist"
>>>  download

(B) News Events & Web Blog

Wikipedia contains small summaries of news events reported in various online systems, such as BBC and
Reuters. The OpenCalais extractor identified the entities in the events and for each entity it provided
the entity type along with a few name-value attributes describing the entity. We also used the OpenCalais
extractor to identify the entities described in a small set of blogs discussing political events, e.g.,
http://www.barackoblogger.com/. The following is a subset of ~1,500 entity requests that resulted from this process.
Some examples:
  - Alex Rodriguez
  - name="Charles Krauthammer"
>>>  download

(C) Historical Records

Web pages sometimes contain local lists of entities, for example members of organizations, or historic records
in online newspapers. The following 183 entity requests correspond to such data, i.e., no extraction process
was involved. Some examples:
  - Don Henderson British actor
  - George Wald American scientist recipient of the Nobel Prize in Physiology or Medicine
>>>  download

(D) DBPedia

Our last collection contains ~2,200 entity requests created from the structured DBPedia data. Some examples:
  - name="Jessica Di Cicco" occupation="Actress Voice actress" yearsactive="1989-present"
  - birthname="Jessica Dereschuk" eyeColor="Green" ethnicity="WhiteCaucasian"
       hairColor="Blonde" years="2004"
>>>  download
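
Since the requests of each collection have a different format, an evaluation script may want to bring them into one
common shape first. The following is only a minimal sketch in Python, written against the example requests shown on
this page. It assumes that each request is a single string mixing optional free text, an optional leading quoted name,
and key="value" pairs. The actual layout of the downloadable files is not described here, and the function name
parse_request and the returned dictionary layout are hypothetical choices made for illustration.

```python
import re
from collections import defaultdict

# Matches key="value" pairs such as Country="US".
_ATTR = re.compile(r'(\w+)="([^"]*)"')
# Matches a quoted name at the start of a request, such as "Evelyn Dubrow".
_LEADING_NAME = re.compile(r'^\s*"([^"]*)"')


def parse_request(request):
    """Split one request string into free text and a multi-valued attribute map.

    Hypothetical helper: the output layout is an assumption, not part of the collections.
    """
    attributes = defaultdict(list)
    for key, value in _ATTR.findall(request):
        # A key may repeat (e.g., two Position values), so keep every value.
        attributes[key].append(value)

    # Remove the attribute pairs; whatever remains is a name or free text.
    rest = _ATTR.sub("", request).strip()
    match = _LEADING_NAME.match(rest)
    text = match.group(1) if match else rest
    return {"text": text, "attributes": dict(attributes)}


if __name__ == "__main__":
    examples = [
        '"Chris Harman" Position="activist" Position="journalist"',
        'Alex Rodriguez',
        'name="Charles Krauthammer"',
        'Don Henderson British actor',
    ]
    for example in examples:
        print(parse_request(example))
```

Attribute values are kept as lists because a request may repeat an attribute name, as in the second example of
collection (A). Requests without any key="value" pair, such as those of collection (C), end up entirely in the
"text" field.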


Enabling Entity-Based Aggregators for Web 2.0 data   ::   pdf,   poster,   more info
Ekaterini Ioannou, Claudia Niederee, Yannis Velegrakis
In Proceedings of the 19th International World Wide Web Conference, April 26-30 2010, Raleigh, NC, USA.

Last modified: April 2011