Are the Labels Informative Enough? - Semi–Supervised Probabilistic Distance Clustering and the Uncertainty of Classification
Department of Industrial Engineering, METU
In this study, we first discuss unsupervised and semi-supervised clustering and then focus on the latter. Semi-supervised clustering is an attempt to reconcile clustering (unsupervised learning) and classification (supervised learning, which uses prior information on the data). These two modes of data analysis are combined in a parameterized model. The results (cluster centers, classification rule) depend on the parameter θ. Insensitivity to θ indicates that the prior information is in agreement with the intrinsic cluster structure, and is otherwise redundant. This explains why some data sets in the literature give good results for all reasonable classification methods. The uncertainty of classification is represented here by the geometric mean of the membership probabilities, which is shown to be an entropic distance related to the Kullback–Leibler divergence.
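As a rough illustration of the uncertainty measure mentioned in the abstract, the sketch below computes the geometric mean of a point's membership probabilities. The abstract does not state how the probabilities are obtained; the sketch assumes they are inversely proportional to the point's distances from the cluster centers, and the function names `membership_probabilities` and `classification_uncertainty` are hypothetical, chosen only for this example.

```python
import math


def membership_probabilities(distances):
    """Return membership probabilities for one point.

    Assumption (not stated in the abstract): the probability of belonging
    to a cluster is inversely proportional to the distance from its center.
    All distances must be strictly positive.
    """
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [w / total for w in inverse]


def classification_uncertainty(probabilities):
    """Geometric mean of strictly positive membership probabilities."""
    log_sum = sum(math.log(p) for p in probabilities)
    return math.exp(log_sum / len(probabilities))


# A point equidistant from two centers versus one much closer to the first.
ambiguous = membership_probabilities([1.0, 1.0])
clear = membership_probabilities([1.0, 9.0])

print(classification_uncertainty(ambiguous))
print(classification_uncertainty(clear))
```

With K clusters, the geometric mean is largest (1/K) when all membership probabilities are equal and shrinks toward zero as one cluster dominates, which is why it can serve as a measure of how uncertain the classification of a point is.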
Cem Iyigun is an Associate Professor in the Industrial Engineering Department at Middle East Technical University (METU). Prior to joining METU in 2009, he worked as a visiting assistant professor in the Management Science and Information Systems department at Rutgers Business School. He received his Ph.D. in 2007 from the Rutgers Center for Operations Research (RUTCOR) at Rutgers University. His research interests lie primarily in data mining problems. He works on clustering and classification algorithms and on time series clustering and classification models, with applications in bioinformatics, climatology, and forecasting. His research also includes continuous facility location problems and hub location problems.
Friday, December 22, 2017 at 4.00 pm in IE03