As the scale of today’s networked techno-social systems continues to increase, analyzing their global phenomena becomes increasingly difficult: data are produced continuously as streams scattered across distributed, possibly resource-constrained nodes, and must be processed reliably in (near) real time. The goal of LIFT is to enable the local detection of global phenomena and the efficient and effective detection of phase changes in very large data streams, where it is impossible or ineffective to accumulate all data in a single place. In addition, this will give rise to new methods for analyzing privacy-sensitive data, where it is not desirable to move data away from the point where it is collected. This will be facilitated by developing a theory based on the novel Safe-Zone Approach and related methodologies.
Several real-world applications need to manage and reason about large amounts of data that are inherently uncertain. For instance, pervasive computing applications must constantly reason about large volumes of noisy sensor readings, e.g., for motion prediction and human behavior modeling; information-extraction tools can assign several possible labels, each with a different degree of confidence, to segments of text, owing to the uncertainty and noise present in free-text data. Such probabilistic data analyses require sophisticated machine-learning tools that can effectively model the complex correlation patterns present in real-life data. The proposed research will attack several of the key challenges arising in this novel probabilistic database system (PDBS) paradigm (including query processing, query optimization, data summarization, extensibility, and model learning and evolution), build usable prototypes, and investigate key application domains (e.g., information extraction).