HeisenData: Towards a Next-Generation Uncertain-Data Management System

Contract Information

Programme:	SEVENTH FRAMEWORK PROGRAMME
Programme Acronym:	FP7-PEOPLE
Contract Type:	MARIE CURIE ACTIONS-INTERNATIONAL RE-INTEGRATION GRANTS
Start Date:	2010-03-01
End Date:	2014-02-28
Contract No:	249217
Role for SoftNet:	prime
Funding for SoftNet:	100,000 Euros
Principal Investigator for SoftNet:	Minos Garofalakis

Project Information

Official Web Site: http://heisendata.softnet.tuc.gr/

Several real-world applications need to manage and reason about large amounts of data that are inherently uncertain. For instance, pervasive computing applications must constantly reason about volumes of noisy sensory readings, e.g., for motion prediction and human behavior modeling; information-extraction tools can assign different possible labels with varying degrees of confidence to segments of text, due to the uncertainties and noise present in free-text data. Such probabilistic data analyses require sophisticated machine-learning tools that can effectively model the complex correlation patterns present in real-life data.

Unfortunately, to date, approaches to Probabilistic Database Systems (PDBSs) have relied on somewhat simplistic models of uncertainty that can be easily mapped onto existing relational architectures: Probabilities are typically associated with individual data tuples, with little or no support for capturing data correlations. This research proposal aims to design and build a novel, extensible PDBS that supports a broad class of statistical models and probabilistic-reasoning tools as first-class system objects, alongside a traditional relational-table store. Our proposed architecture will employ statistical models to effectively encode data-correlation patterns, and promote probabilistic inference as part of the standard database operator repertoire to support efficient and sound query processing. This tight coupling of relational databases and statistical models represents a major departure from conventional database systems, and many of the core system components need to be revisited and fundamentally re-thought.
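
To make the limitation of tuple-level probabilities concrete, the following minimal sketch compares a tuple-independent model with a correlated joint model over the same two tuples. It is only an illustration and not part of the HeisenData system: the helper names (`independent_worlds`, `prob`), the two-tuple example, and all probability values are assumptions chosen for the example.

```python
from itertools import product

def independent_worlds(marginals):
    """Enumerate possible worlds, assuming each tuple is present independently."""
    worlds = {}
    for present in product([False, True], repeat=len(marginals)):
        p = 1.0
        for is_in, m in zip(present, marginals):
            p *= m if is_in else 1.0 - m
        worlds[present] = p
    return worlds

def prob(worlds, event):
    """Probability of an event: sum over the possible worlds where it holds."""
    return sum(p for world, p in worlds.items() if event(world))

# Two uncertain tuples, each present with marginal probability 0.5 (made-up values).
marginals = [0.5, 0.5]
indep = independent_worlds(marginals)

# Hypothetical correlated model: the two tuples always appear together or not at all
# (e.g., two readings from the same sensor). The marginals are still 0.5 each.
correlated = {
    (False, False): 0.5,
    (False, True): 0.0,
    (True, False): 0.0,
    (True, True): 0.5,
}

def both_present(world):
    return world[0] and world[1]

p_indep = prob(indep, both_present)
p_corr = prob(correlated, both_present)

print("P(both) under tuple independence:", p_indep)
print("P(both) under the correlated model:", p_corr)
```

Both models assign the same per-tuple probabilities, yet they give different answers to a query that touches both tuples. A system that stores only per-tuple probabilities cannot tell the two cases apart, which is the gap that a richer statistical model is meant to close.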

The proposed research will address several of the key challenges arising in this novel PDBS paradigm (including query processing, query optimization, data summarization, extensibility, and model learning and evolution), build usable prototypes, and investigate key application domains (e.g., information extraction).