Computer Science Colloquium, Fri 28 May 2010, 10:00
on Friday, 28 May 2010, at 10:00
in Room U151
Speaker: Prof. Dr. Jörg Sander, University of Alberta, Edmonton, Canada
Topic: Finding Non-Redundant, Statistically Significant Regions in High Dimensional Data: an Approach to Projected and Subspace Clustering
Abstract:
Projected and subspace clustering algorithms search for clusters of points in subsets of attributes. Projected clustering algorithms look for several disjoint clusters so that each cluster exists in its own subset of attributes. Subspace clustering algorithms search for all clusters in all subsets of attributes, typically producing a large number of overlapping clusters. A general problem with many existing approaches is that it is difficult to assess whether the reported clusters are an artefact of the algorithm or whether they actually represent non-random structure in the data. In this talk, I am going to present a different problem formulation and method for finding projected clusters that aims at extracting axis-parallel regions that stand out in the data in a statistical sense. The set of axis-parallel, statistically significant regions that exist in a given data set is typically highly redundant. Our method is based on the assumption that those regions are "generated" by a much smaller subset of non-redundant, statistically significant regions. Finding such a set of regions/projected clusters using an exhaustive search is infeasible, but it seems possible to design heuristic approaches that can work well in many scenarios. In the talk, I will present our method STATPC and a comprehensive experimental evaluation showing that STATPC can significantly outperform existing projected and subspace clustering algorithms in terms of accuracy.
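To illustrate what "standing out in a statistical sense" could mean for an axis-parallel region, here is a minimal sketch. It is not STATPC and not taken from the talk. It assumes the data lie in the unit hypercube and that, under the null hypothesis, points are uniformly distributed, so the number of points in a region follows a binomial distribution with success probability equal to the region's volume. The function names, the region encoding, and the threshold alpha are hypothetical choices made for this illustration.

```python
from math import comb


def binomial_tail(n, k, p):
    """Return P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def region_significance(points, region, alpha=0.001):
    """Test whether an axis-parallel region holds unexpectedly many points.

    points: list of tuples with coordinates in [0, 1] (assumption).
    region: dict mapping attribute index -> (low, high); attributes not
            listed are unconstrained, i.e. the region lives in a subspace.
    Returns (observed count, p-value, significant?).
    """
    n = len(points)

    # Under the assumed uniform null, the chance that a point falls into
    # the region equals the region's volume in the constrained attributes.
    volume = 1.0
    for low, high in region.values():
        volume *= high - low

    observed = sum(
        all(low <= pt[d] <= high for d, (low, high) in region.items())
        for pt in points
    )

    p_value = binomial_tail(n, observed, volume)
    return observed, p_value, p_value < alpha


# Toy data: 30 points concentrated in attributes 0 and 1, 70 spread out.
dense = [(0.45 + 0.001 * i, 0.55 - 0.001 * i, i / 30) for i in range(30)]
spread = [((i * 0.37) % 1.0, (i * 0.61) % 1.0, (i * 0.13) % 1.0) for i in range(70)]
points = dense + spread

region = {0: (0.4, 0.6), 1: (0.4, 0.6)}  # constrains 2 of the 3 attributes
observed, p_value, significant = region_significance(points, region)
print(observed, p_value, significant)
```

A test like this flags many overlapping regions around the same dense area, which is the redundancy the abstract refers to. Selecting a small non-redundant subset is a separate step that this sketch does not attempt.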
Everyone interested is cordially invited.