These are the archived web pages of the Chair of Programming and Software Engineering (PST).
The pages of the Software and Computational Systems Lab (SoSy) can be found at https://www.sosy-lab.org/.

Computer Science Colloquium, Thu, 7 April 2011, 10:00


Prof. Dr. Reynold Cheng, University of Hong Kong: Explore or Exploit? Effective Strategies for Disambiguating Large Databases

What
  • Colloquium
When 7 April 2011
from 10:00 to 12:00
Where Room 157

Invitation to a Talk

=========================================================

on Thursday, 7 April 2011, at 10:00
-------------------------------------------------
in Room 157, Oettingenstr. 67
-------------------------------------------------

Speaker: Prof. Dr. Reynold Cheng, University of Hong Kong

Topic: Explore or Exploit? Effective Strategies for Disambiguating Large Databases

Abstract:

Data ambiguity is inherent in applications such as data integration,
location-based services, and sensor monitoring. In many situations, it
is possible to “clean”, or remove, ambiguities from these databases. For
example, the GPS location of a user is inexact due to measurement
errors, but context information (e.g., what a user is doing) can be used
to reduce the imprecision of the location value. To obtain a
higher-quality database, we study how to disambiguate it by
appropriately selecting candidates to clean. This problem is
challenging because cleaning involves a cost, is limited by a budget,
may fail, and may not remove all ambiguities. Moreover, the statistical
information about how likely each database object is to be cleaned
successfully may not be precisely known. We tackle these challenges by
proposing two kinds of algorithms. The first kind uses greedy
heuristics to make sensible decisions; however, these algorithms do not
exploit information obtained during cleaning, and they require
user-supplied parameters to achieve high cleaning effectiveness. The
second is the Explore-Exploit (EE) algorithm, which gathers valuable
information during the cleaning process to determine how the remaining
cleaning budget should be invested.
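The abstract does not spell out the algorithms, so the following Python sketch is only an illustration of the general idea, not the EE algorithm presented in the talk. It contrasts a budget-limited greedy heuristic, which relies on user-supplied success probabilities, with a generic explore-then-exploit strategy that first spends part of the budget to estimate those probabilities. The `Candidate` fields, the grouping of objects by a shared success rate, the `explore_ratio` parameter and the Laplace-smoothed estimates are all hypothetical assumptions.

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    group: str           # hypothetical: objects in one group share a success rate
    cost: float          # cost of one cleaning attempt
    benefit: float       # ambiguity removed if the attempt succeeds
    success_prob: float  # true success rate; hidden from both algorithms


def attempt(c: Candidate, rng: random.Random) -> bool:
    """Simulate one cleaning attempt, which may fail."""
    return rng.random() < c.success_prob


def greedy(cands, budget, est_prob, rng):
    """Clean objects in order of estimated expected benefit per unit cost.

    est_prob maps group -> estimated success probability (e.g. a user guess).
    Returns (total benefit gained, budget left).
    """
    ranked = sorted(cands, key=lambda c: est_prob[c.group] * c.benefit / c.cost,
                    reverse=True)
    gain = 0.0
    for c in ranked:
        if c.cost > budget:
            continue  # cannot afford this object; a cheaper one may still fit
        budget -= c.cost
        if attempt(c, rng):
            gain += c.benefit
    return gain, budget


def explore_exploit(cands, budget, explore_ratio, rng):
    """Hypothetical explore-then-exploit strategy.

    Spends up to explore_ratio of the budget cleaning randomly chosen objects
    to estimate per-group success rates, then invests the rest greedily.
    """
    if not 0.0 <= explore_ratio <= 1.0:
        raise ValueError("explore_ratio must lie in [0, 1]")
    tries, wins = defaultdict(int), defaultdict(int)
    pool = list(cands)
    rng.shuffle(pool)
    explore_budget = explore_ratio * budget
    gain = 0.0
    rest = []
    for c in pool:
        if c.cost <= explore_budget:
            # Explore: clean this object and record the observed outcome.
            explore_budget -= c.cost
            budget -= c.cost
            tries[c.group] += 1
            if attempt(c, rng):
                wins[c.group] += 1
                gain += c.benefit
        else:
            rest.append(c)
    # Laplace-smoothed estimates; a group never probed defaults to 0.5.
    est = {c.group: (wins[c.group] + 1) / (tries[c.group] + 2) for c in rest}
    more, budget = greedy(rest, budget, est, rng)
    return gain + more, budget


if __name__ == "__main__":
    # Hypothetical instance: group "B" looks valuable but is rarely cleanable.
    cands = ([Candidate(f"a{i}", "A", 1.0, 1.0, 0.9) for i in range(20)]
             + [Candidate(f"b{i}", "B", 1.0, 2.0, 0.1) for i in range(20)])
    naive = {"A": 0.5, "B": 0.5}  # user-supplied guess, equal for both groups
    print("greedy:", greedy(cands, 15.0, naive, random.Random(1)))
    print("EE:    ", explore_exploit(cands, 15.0, 0.3, random.Random(1)))
```

With an equal user-supplied guess for both groups, the greedy baseline ranks the higher-benefit group B first regardless of its low success rate, whereas the exploration phase lets the sketch estimate the rates from observed outcomes before committing the remaining budget. How large `explore_ratio` should be is the kind of tuning question the abstract raises next.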

We also study how to fine-tune EE's parameters to achieve optimal
cleaning effectiveness. Experimental evaluations on real and
synthetic datasets validate the effectiveness and efficiency of our
approaches.