 


Ekaterini Ioannou

Software Technology and Network Applications Laboratory

Department of Electronic & Computer Engineering
Technical University of Crete
University Campus
73100, Crete, HELLAS


Emails:
ioannou AT softnet.tuc.gr
EkateriniIoannou AT acm.org
 

News Articles Dataset

The dataset contains 94,829 news articles posted on the Google News website. The RDF data
was extracted from these articles using the OpenCalais web service, yielding a total of
2,711,217 RDF triples. Each article has two files: (a) the RDF file, and (b) the text file
(extension .txt). Download (1.2 Gbytes compressed, 12 Gbytes uncompressed)
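
Since every article comes as an RDF file plus a text file, a first step after unpacking is usually to match the two. The following is a minimal sketch of that pairing, not part of the dataset itself. It assumes the RDF files end in ".rdf" and that both files of an article share the same base name and directory; the page states only the .txt extension, so adjust `rdf_ext` to whatever the archive actually uses. The function name `pair_article_files` is hypothetical.

```python
from pathlib import Path


def pair_article_files(root, rdf_ext=".rdf", txt_ext=".txt"):
    """Return a sorted list of (rdf_path, txt_path) pairs found under root.

    Assumptions (not stated on the dataset page): RDF files use the
    extension given by rdf_ext, and the two files of an article sit in
    the same directory and differ only in their extension.
    """
    pairs = []
    for rdf_path in sorted(Path(root).rglob(f"*{rdf_ext}")):
        # Look for the text file with the same base name next to the RDF file.
        txt_path = rdf_path.with_suffix(txt_ext)
        if txt_path.is_file():
            pairs.append((rdf_path, txt_path))
    return pairs
```

Calling `pair_article_files("path/to/unpacked/dataset")` returns the matched pairs; RDF files without a text counterpart are skipped, so comparing the number of pairs with the number of articles stated above is a quick check that the assumed extension and layout are right.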

Acknowledgments

Efficient Semantic-Aware Detection of Near Duplicate Resources   ::   pdf,   more info
Ekaterini Ioannou, Odysseas Papapetrou, Dimitrios Skoutas, Wolfgang Nejdl
In Proceedings of the 7th Extended Semantic Web Conference (ESWC), 30 May - 03 June 2010, Heraklion, Greece.




 
Last modified: April 2011