Bag-level Representations for Multiple Instance Learning
Mustafa Gökçe Baydoğan
Department of Industrial Engineering, Boğaziçi University
Abstract
Classification, one of the most important classes of supervised learning problems, arises frequently in data mining tasks. In traditional classification, each object is represented by a single feature vector, and the aim is to predict the label of an object given some training data. However, this simple approach is weak when the data has internal structure. For example, in image classification, images are segmented into patches, and each image is represented by a set of feature vectors derived from the patches instead of a single feature vector. This way, important information about invariances such as location and scale can be taken into account. This change of object representation benefits a wide range of applications, such as bioinformatics, document retrieval, and computer vision. Such applications fit well into the Multiple Instance Learning (MIL) setting, where each object is referred to as a bag and each bag contains a number of instances.
In the MIL setting, instance label information is unavailable, which makes it difficult to apply regular supervised learning. To resolve this problem, researchers have devised methods that rely on certain assumptions about the instance labels. However, it is not trivial to determine which assumption holds for a new type of MIL problem. A bag-level representation based on instance characteristics does not require assumptions about the instance labels and has been shown to be successful in MIL tasks. These approaches mainly encode bags as vectors using bag-of-features type representations.
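As a rough illustration of a bag-of-features encoding, the sketch below maps bags of different sizes to fixed-length vectors by counting how many instances fall nearest to each prototype. The prototypes, the toy bags, and the function names are all hypothetical and chosen only for illustration; in practice the prototypes would typically be learned, for example by clustering the training instances. This is not the method presented in the talk.

```python
import math


def nearest_prototype(instance, prototypes):
    """Return the index of the prototype closest to the instance (Euclidean)."""
    distances = [math.dist(instance, p) for p in prototypes]
    return distances.index(min(distances))


def bag_of_features(bag, prototypes):
    """Encode a bag as a normalized histogram of nearest-prototype counts."""
    if not bag:
        raise ValueError("a bag must contain at least one instance")
    counts = [0] * len(prototypes)
    for instance in bag:
        counts[nearest_prototype(instance, prototypes)] += 1
    return [c / len(bag) for c in counts]


# Hypothetical toy data: two hand-picked prototypes and two bags of
# different sizes. No instance labels are used anywhere.
prototypes = [(0.0, 0.0), (5.0, 5.0)]
bag_a = [(0.1, -0.2), (0.3, 0.1), (4.8, 5.1)]
bag_b = [(5.2, 4.9), (4.7, 5.3)]

vec_a = bag_of_features(bag_a, prototypes)
vec_b = bag_of_features(bag_b, prototypes)
```

Both bags end up as vectors of the same length, so any standard classifier can then be trained on the bag labels alone.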
This talk will discuss two bag-level representation approaches. The first one implicitly learns a generalized Gaussian Mixture Model (GMM) on the instance feature space and transforms this information into a bag-level summary. The second one proposes linear programming approaches to solve the MIL classification problem. Our experiments on a large database of MIL problems (from the computer vision and text mining domains) show that both approaches are highly scalable and that their performance is competitive with state-of-the-art algorithms.
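To give a feel for what a mixture-based bag-level summary can look like, the sketch below averages the component posteriors of a bag's instances under a plain Gaussian mixture. It rests on strong assumptions: the mixture has fixed, hand-chosen means, a shared isotropic variance, and given weights, whereas the approach in the talk learns a generalized GMM implicitly. All names and values are hypothetical, and the code is not the speaker's algorithm.

```python
import math


def responsibilities(x, means, sigma, weights):
    """Posterior probability of each mixture component for instance x."""
    # All components share the same isotropic sigma, so the Gaussian
    # normalizing constants cancel and only the exponents matter.
    log_scores = [
        math.log(w) - math.dist(x, m) ** 2 / (2.0 * sigma ** 2)
        for m, w in zip(means, weights)
    ]
    top = max(log_scores)  # subtract the maximum for numerical stability
    exp_scores = [math.exp(s - top) for s in log_scores]
    total = sum(exp_scores)
    return [e / total for e in exp_scores]


def soft_bag_summary(bag, means, sigma, weights):
    """Average the component posteriors over all instances in the bag."""
    if not bag:
        raise ValueError("a bag must contain at least one instance")
    summary = [0.0] * len(means)
    for x in bag:
        for j, r in enumerate(responsibilities(x, means, sigma, weights)):
            summary[j] += r
    return [s / len(bag) for s in summary]


# Hypothetical mixture: two components with equal weights.
means = [(0.0, 0.0), (4.0, 0.0)]
weights = [0.5, 0.5]
sigma = 1.0

midpoint_bag = [(2.0, 0.0)]              # equidistant from both means
left_bag = [(0.0, 0.0), (0.5, 0.0)]      # close to the first mean

midpoint_summary = soft_bag_summary(midpoint_bag, means, sigma, weights)
left_summary = soft_bag_summary(left_bag, means, sigma, weights)
```

Unlike a hard histogram, this summary lets an instance lying between two components contribute partially to both.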
This is joint work with Emel Seyma Kucukasci and Z. Caner Taskin.
Short Bio
Mustafa Gökçe Baydoğan is an assistant professor in the Department of Industrial Engineering at Boğaziçi University, Istanbul, Turkey. Before joining Boğaziçi University, he worked as a postdoctoral research assistant in the Security and Defense Systems Initiative at Arizona State University (ASU) from 2012 to 2013. He received his Ph.D. degree in Industrial Engineering from ASU in 2012. He received his B.S. and M.S. degrees in Industrial Engineering from the Department of Industrial Engineering at Middle East Technical University, Ankara, Turkey, in 2006 and 2008, respectively. His current research interests focus on statistical learning, with applications in temporal data mining (time series and images) and data mining for massive, multivariate data sets.
More information about him and his work is available at www.mustafabaydogan.com.
Venue
Friday, November 3, 2017 at 4.00 pm in IE03