nips nips2011 nips2011-168 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hua Wang, Heng Huang, Farhad Kamangar, Feiping Nie, Chris H. Ding
Abstract: Multi-instance learning (MIL) considers input as bags of instances, in which labels are assigned to the bags. MIL is useful in many real-world applications. For example, in image categorization semantic meanings (labels) of an image mostly arise from its regions (instances) instead of the entire image (bag). Existing MIL methods typically build their models using the Bag-to-Bag (B2B) distance, which are often computationally expensive and may not truly reflect the semantic similarities. To tackle this, in this paper we approach MIL problems from a new perspective using the Class-to-Bag (C2B) distance, which directly assesses the relationships between the classes and the bags. Taking into account the two major challenges in MIL, high heterogeneity on data and weak label association, we propose a novel Maximum Margin Multi-Instance Learning (M3 I) approach to parameterize the C2B distance by introducing the class specific distance metrics and the locally adaptive significance coefficients. We apply our new approach to the automatic image categorization tasks on three (one single-label and two multilabel) benchmark data sets. Extensive experiments have demonstrated promising results that validate the proposed method.
Reference: text
sentIndex sentText sentNum sentScore
1 For example, in image categorization semantic meanings (labels) of an image mostly arise from its regions (instances) instead of the entire image (bag). [sent-8, score-0.732]
2 We apply our new approach to the automatic image categorization tasks on three (one single-label and two multilabel) benchmark data sets. [sent-12, score-0.267]
3 1 Introduction Traditional image categorization methods usually consider an image as one indiscrete entity, which, however, neglects an important fact that the semantic meanings (labels) of an image mostly arise from its constituent regions, but not the entire image. [sent-14, score-0.7]
4 For example, the labels “person” and “car” associated with the query image in Figure 1 are only characterized by the regions in two bounding boxes, respectively, rather than the whole image. [sent-15, score-0.367]
5 In recent years, image representation techniques using semi-local, or patch-based, features, such as SIFT, have demonstrated some of the best performance in image retrieval and object recognition applications. [sent-17, score-0.439]
6 Our task is to learn class specific distance metj rics Mk and significance coefficients wk from the training data, with which we compute the C2B distances from the classes to a query image X for classification. [sent-19, score-1.093]
7 Armed with these patch-based features, image categorization is recently formulated as a multi-instance learning (MIL) task [1–6]. [sent-22, score-0.241]
8 Under the framework of MIL, an image is viewed as a bag, which contains a number of instances corresponding to the regions in the image. [sent-23, score-0.299]
9 If any of these instances is related to a semantic concept, the image will be associated with the corresponding label. [sent-24, score-0.366]
10 1 Learning Class-to-Bag (C2B) distance for multi-instance data In MIL data objects are represented as bags of instances, therefore the distance between the objects is a set-to-set distance. [sent-27, score-0.807]
11 Compared to traditional single-instance data that use vector distance such as Euclidean distance, estimating the Bag-to-Bag (B2B) distance in MIL is more challenging [7, 8]. [sent-28, score-0.614]
12 Measuring the distance between images (bags) and classes was first introduced in [9] for object recognition, which used the Bag-to-Class (B2C) distance instead of the C2B distance. [sent-32, score-0.746]
13 Given a triplet constraint (i, p, n) that image i is more relevant to class p than it is to class n, the C2B distance formulates this as Dpi < Dni , while the B2C distance formulates this as Dip < Din . [sent-33, score-1.041]
14 To be more specific, for the C2B distance we only need to parameterize training instances, which are available during the training phase. [sent-35, score-0.423]
15 We first explore these challenges, as well as to find opportunities to enhance the C2B distance introduced above. [sent-40, score-0.317]
16 Due to the well-known semantic gap between low-level visual features and high-level semantic concepts [10], choosing an appropriate distance metric plays an important role in establishing an effective image categorization system, as well as other general MIL models. [sent-42, score-0.808]
17 Existing metric learning methods [5,6] for multi-instance data only learned one global metric for an entire data set. [sent-43, score-0.271]
18 However, multi-instance data by nature are highly heterogeneous, thus a homogeneous distance metric may not suffice to characterize different classes of objects in a same data set. [sent-44, score-0.55]
19 To this end, we consider to learn multiple distance metrics, one for each class, for a multi-instance data set to capture the correlations among the features within each object category. [sent-46, score-0.393]
20 The metrics are learned simultaneously by forming a maximum margin optimization problem with the constraints that the C2B distances from the correct classes of an training object to it should be less than the distances from other classes to it by a margin. [sent-47, score-0.7]
21 Figure 2: The learned SCs of the instances in a same image when they serve as training samples for different classes. [sent-64, score-0.385]
22 For example, in Figure 2(a) the SC of the horse instance in class “person” is 0. [sent-65, score-0.333]
23 As a result, the horse instance contributes a lot in the C2B distance from class “horse” to a query image, while having much less impact in the C2B distance from class “person” to a query image. [sent-68, score-1.194]
24 , a label is assigned to a bag as long as one of its instance belongs to the class. [sent-72, score-0.463]
25 As a result, although a bag is associated with a class, some, or even most, of its instances may not be truly related to the class. [sent-73, score-0.414]
26 For example, in the query image in Figure 1, the instance in the left bounding box does not contribute to the label “person”. [sent-74, score-0.442]
27 Intuitively, the instances in a super-bag of a class should not contribute equally in predicting labels for a query image. [sent-75, score-0.379]
28 Ideally, the SC of an instance with respect to its true belonging class should be large, whereas its SC with respect other classes should be small. [sent-79, score-0.333]
29 In Figure 2, we show the learned SCs for the instances in some images when they serve as training samples for their labeled classes. [sent-80, score-0.286]
30 Because the image in Figure 2(a) has two labels, this image, thereby its two instances, serves as a training sample for both class “person” (left) and “horse” (right). [sent-81, score-0.323]
31 Although the learned SC of the horse instance is very low when it is in the super-bag of “person” (0. [sent-82, score-0.265]
32 2 Learning C2B distance for multi-instance data via M3 I approach In this section, we first briefly formalize the MIL problem and the C2B distance for a multi-instance data set, where we provide the notations used in this paper. [sent-88, score-0.61]
33 Then we gradually develop the proposed M3 I approach to incorporate the class specific distance metrics and the locally adaptive SCs into the C2B distance, together with its learning algorithms. [sent-89, score-0.626]
34 , xni is a bag of ni i i j instances, where xi ∈ Rp is a vector of p dimensions. [sent-95, score-0.409]
35 The class assignment indicator yi ∈ {0, 1}K is a binary vector, with yi (k) = 1 indicating that bag Xi belongs to the k-th class and yi (k) = 0 T K otherwise. [sent-96, score-0.549]
36 , each bag belongs to exactly one K class, the data set is a single-label data set; if k=1 Yik ≥ 1, i. [sent-103, score-0.387]
37 , each bag may be associated with more than one class label, the data set is a multi-label data set [11–14]. [sent-105, score-0.429]
38 In the setting of MIL, we assume that (I) bag X is assigned to the k-th class ⇐⇒ at least one instance of X belongs to the k-th class; and (II) bag X is not assigned to the k-th class ⇐⇒ no instance of X belongs to the k-th class. [sent-106, score-1.048]
39 3 For convenience, we denote P (Xi ) as the classes that bag Xi belongs to (positive classes), and N (Xi ) as the classes that Xi does not belong to (negative classes). [sent-108, score-0.529]
40 , a set consisting of all the instances in every bag belonging to a class: Sk = s1 , . [sent-112, score-0.42]
41 , smk = xj | Yik = 1 , k i k (1) where sj is an instance of Sk that comes from one of the training bags belonging to the k-th class, k and mk = {i|Yik =1} ni is the total number of the instances in Sk . [sent-115, score-0.863]
42 Note that, in single-label data K where each bag belongs to only one class, we have Sk ∩ Sl = ∅ (∀ k = l) and k=1 mk = N i=1 ni . [sent-116, score-0.643]
43 In multi-label data where each bag (thereby each instance) may belong to more than one class [11–14], we have Sk ∩ Sl = ∅ (∀ k = l) and K mk ≥ N ni , i. [sent-117, score-0.685]
44 i The elementary distance from an instance in a super-bag to a bag is defined as the distance between this instance and its nearest neighbor instance in the bag: d sj , Xi = sj − ˜j sk k k 2 M , (2) where ˜j is the nearest neighbor instance of sj in Xi . [sent-120, score-1.727]
45 sk k Then we compute the C2B distance from a super-bag Sk to a data bag Xi as following: mk mk sj − ˜j sk k d s j , Xi = k D (Sk , Xi ) = j=1 2 . [sent-121, score-1.567]
46 1 Parameterized C2B distance of the M3 I approach Because the C2B distance defined in Eq. [sent-123, score-0.558]
47 In order to capture the second-order statistics of the input data that could potentially improve the subsequent classification [5, 6], we consider to use the Mahalanobis distance with an appropriate distance metric. [sent-129, score-0.584]
48 With the recognition of the high heterogeneity in multiinstance data, instead of learning a global distance metric as in existing works [5, 6], we propose to K learn K different class specific distance metrics {Mk }k=1 ⊂ Rp×p , one for each class. [sent-130, score-0.989]
49 Note that, using class specific distance metrics is only feasible with the distance between classes and bags (either C2B or B2C distance), because we are only concerned with intra-class distances. [sent-131, score-1.028]
50 In contrast, traditional B2B distance needs to compute the distances between bags belonging to different classes involving inter-class distance metrics, which inevitably complicates the problem. [sent-132, score-0.948]
51 (3), we compute C2B distance using the Mahalanobis distance as: mk sj − ˜j sk k D (Sk , Xi ) = T Mk sj − ˜j sk k . [sent-134, score-1.439]
52 Now we further develop the C2B distance defined in Eq. [sent-136, score-0.279]
53 We propose a locally adaptive C2B distance by weighting the instances in a super-bag upon their relevance to the corresponding class. [sent-138, score-0.523]
54 Due to the weak association between the instances and the bag labels, not every instance in a superbag of a class truly characterizes the corresponding semantic concept. [sent-139, score-0.767]
55 For example, in Figure 2(a) the region for the horse object is in the super-bag of “person” class, because the entire image is labeled with both “person” and “horse”. [sent-140, score-0.39]
56 As a result, intuitively, we should give a smaller, or even no, weight to the horse instance when determining whether to assign “person” label to a query 4 j image; and give it a higher weight when deciding “horse” label. [sent-141, score-0.403]
57 To be more precise, let wk be the weight associated with sj , we wish to learn the C2B distance as following: k mk j wk sj − ˜j sk k D (Sk , Xi ) = T Mk sj − ˜j sk k . [sent-142, score-1.676]
58 (5) j=1 j Because wk reflects the relative importance of instance sj when determining the label for the k-th k class, we call it as the “Significance Coefficient (SC)” of sj . [sent-143, score-0.555]
59 2 Procedures to learn Mk and wk Given the parameterized C2B distance defined in Eq. [sent-145, score-0.52]
60 In the following, we formulate two maximum margin optimization problems to learn the two sets of j target variables Mk and wk , one for each of them. [sent-149, score-0.275]
61 Let dM sj , Xi = sj − ˜j sk k k T m 1 we denote dki = dM s1 , Xi , . [sent-165, score-0.447]
62 T Mk sj − ˜j , sk k T , by the defini(8) In order to make use of the standard large-margin classification framework and simplify our derivaT T T tion, following [17] we expand our notations. [sent-175, score-0.323]
63 Similarly, we expand the distance vectors and let dipn be a vector of the same length as w, such that all its entries are 0 except the subranges corresponding to class p and class n, which are set to be −dpi and dni respectively. [sent-181, score-0.644]
64 It is straightforward to to verify that T T wn dni − wp dpi = wT dipn . [sent-182, score-0.25]
65 The positivity constraint on the elements of w is due to the fact that our goal is to define a distance function which, by definition, is a positive definite operator. [sent-188, score-0.279]
66 Upon solution, we obtain w, which can be decomposed into the expected j instance weights wk for every instance with respect to its labeled classes. [sent-194, score-0.371]
67 3 Label prediction using C2B distance Solving the optimization problems in Eq. [sent-196, score-0.279]
68 (10) for an input multi-instance data set D, we obtain the learned class specific distance metrics Mk (1 ≤ k ≤ K) and the significance coefficients j j wk (1 ≤ k ≤ K, 1 ≤ j ≤ mk ). [sent-198, score-0.998]
69 Given a query bag X, upon the learned Mk and wk we can compute the parameterized C2B distances D (Sk , X) (1 ≤ k ≤ K) from all the classes to the query bag using Eq. [sent-199, score-1.162]
70 For single-label multi-instance data, in which each bag belongs to one and only one class, we assign X to the class with the minimum C2B distance, i. [sent-202, score-0.475]
71 For multi-label multi-instance data, in which one bag can be associated with more than one class label, we need a threshold to make prediction. [sent-205, score-0.377]
72 For every class, we learn a threshold from the training N K data as bk = i=1 Yik D (Sk , Xi ) / i=1 Yik , which is the average of the C2B distances from the k-th class to all its training bags. [sent-206, score-0.354]
73 Then we determine the class membership for X using the following rule: assign X to the k-th class if D (Sk , X) < bk , and not otherwise. [sent-207, score-0.247]
74 Due to the unsatisfactory performance and high computational complexity of machine vision models using B2B distance, a new perspective to compute B2C distance was presented in [9]. [sent-209, score-0.314]
75 [18] further developed this method by introducing distance metrics, to achieve better results with a small amount of training. [sent-214, score-0.279]
76 1, B2C distance is hard to parameterize in may real world applications. [sent-216, score-0.319]
77 To tackle this, we propose to use C2B distance for multi-instance data. [sent-217, score-0.279]
78 As demonstrated in literature [5, 6], learning a distance metric from training data to maintain class information is beneficial for MIL. [sent-219, score-0.554]
79 Recognizing this, we propose to learn multiple distance metrics, one for each class. [sent-221, score-0.313]
80 Due to the weak label association in MIL, instead of considering all the instance equally important, we assign locally adaptive SCs to every instance in training data. [sent-224, score-0.427]
81 Locally adaptive distance was first introduced in [16, 17] for B2B distance. [sent-225, score-0.319]
82 Compared to it, the proposed C2B distance is more advantageous. [sent-226, score-0.279]
83 First, C2B distance measures the relevance between a class and a bag, hence label prediction can be naturally made upon the resulted distance, whereas an additional classification step [16] or transformation [17] is required when B2B distance is used. [sent-227, score-0.774]
84 Second, C2B distance directly assesses the relations between semantic concepts and image regions, hence it could narrow the gap between high-level semantic concepts and low-level visual features. [sent-228, score-0.701]
85 Last, but not least, our C2B distance requires significantly less computation. [sent-229, score-0.279]
86 Specifically, the triplet constraints used in C2B model are constructed between classes and bags whose number is O N K 2 , while those used in B2B model [16, 17] are constructed between bags with number of O N 3 . [sent-230, score-0.416]
87 1 Classification on single-label image data Because the proposed M3 I approach comprises two components, class specific metrics and significant coefficients, we implement four versions of our approach and evaluate their performances: (1) the simplest C2B distance, denoted as “C2B”, computed by Eq. [sent-309, score-0.43]
88 (3), in which no learning is involved; (2) C2B distance with class specific metrics, denoted as “C2B-M”, computed by Eq. [sent-310, score-0.386]
89 (4); (3) C2B distance with SCs, denoted as “C2B-SC” by Eq. [sent-311, score-0.279]
90 (5) and set Mk = I; and (4) the C2B distance computed by proposed M3 I approach using Eq. [sent-312, score-0.279]
91 These two methods are not MIL methods, therefore we consider each instance as an image descriptor following [9, 18]. [sent-318, score-0.246]
92 2 Classification on multi-label image data Multi-label data refers to data sets in which an image can be associated with more than one semantic concept, which is more challenging but closer to real world applications than single-label data [21]. [sent-329, score-0.531]
93 (3) Distance metric (DM) method [5] and (4) MildML method [6] learn a global distance metric from multi-instance data to compute B2B distances, therefore an additional classification step is 7 Table 3: Classification performance of comparison on PASCAL VOC 2010 data. [sent-334, score-0.519]
94 For example, the instance of the person in the inner bounding box of the image in Figure 2(b) has comparably higher SC than the car instance in the outer bounding box when considering “person” class. [sent-448, score-0.614]
95 These observations are consistent with our intuitions and theoretical analysis, because the person instance contribute considerably large in characterizing the “person” concept, whereas it contributes much less, or even possibly harmful, in characterizing the “car” concept. [sent-450, score-0.276]
96 These interesting results provide concrete evidences to support the proposed M3 I method’s capability in revealing the semantic insight of a multi-instance image data set. [sent-452, score-0.289]
97 5 Conclusions In this paper, we proposed a novel Maximum Margin Multi-Instance Learning (M3 I) method, which, instead of using the B2B distance as in many existing methods, approached MIL from a new perspective using the C2B distance to directly assess the relevance between classes and bags. [sent-453, score-0.724]
98 We applied the proposed M3 I method in image categorization tasks on three benchmark data sets, one for single-label classification and two for multi-label classification. [sent-455, score-0.267]
99 Multiple instance metric learning from automatically labeled bags of faces. [sent-500, score-0.333]
100 Learning globally-consistent local distance functions for shape-based image retrieval and classification. [sent-572, score-0.474]
wordName wordTfidf (topN-words)
[('mil', 0.35), ('distance', 0.279), ('bag', 0.27), ('scs', 0.261), ('ipn', 0.241), ('mk', 0.235), ('sk', 0.199), ('wk', 0.179), ('person', 0.169), ('image', 0.164), ('horse', 0.144), ('mimlboost', 0.141), ('mimlsvm', 0.141), ('bags', 0.133), ('metrics', 0.133), ('sj', 0.124), ('sc', 0.112), ('class', 0.107), ('instances', 0.103), ('semantic', 0.099), ('query', 0.098), ('classes', 0.097), ('xi', 0.092), ('yik', 0.091), ('metric', 0.09), ('arlington', 0.088), ('distances', 0.083), ('instance', 0.082), ('dipn', 0.08), ('categorization', 0.077), ('wang', 0.074), ('classi', 0.074), ('dni', 0.071), ('dpi', 0.071), ('locally', 0.067), ('belongs', 0.065), ('car', 0.063), ('dm', 0.062), ('margin', 0.062), ('assesses', 0.06), ('mildml', 0.06), ('object', 0.054), ('triplet', 0.053), ('sdp', 0.053), ('nie', 0.053), ('texas', 0.053), ('training', 0.052), ('ni', 0.047), ('belonging', 0.047), ('label', 0.046), ('labels', 0.046), ('cance', 0.043), ('truly', 0.041), ('heterogeneity', 0.041), ('kamangar', 0.04), ('miml', 0.04), ('smk', 0.04), ('superbag', 0.04), ('adaptive', 0.04), ('parameterize', 0.04), ('learned', 0.039), ('opportunities', 0.038), ('ding', 0.038), ('dd', 0.038), ('pascal', 0.038), ('eccv', 0.038), ('images', 0.037), ('voc', 0.036), ('heng', 0.035), ('perspective', 0.035), ('huang', 0.034), ('relevance', 0.034), ('learn', 0.034), ('assign', 0.033), ('meanings', 0.032), ('frome', 0.032), ('objects', 0.032), ('regions', 0.032), ('retrieval', 0.031), ('multilabel', 0.03), ('traditional', 0.03), ('sp', 0.03), ('resulted', 0.029), ('mahalanobis', 0.029), ('hua', 0.029), ('labeled', 0.028), ('wp', 0.028), ('parameterized', 0.028), ('bounding', 0.027), ('serve', 0.027), ('sl', 0.027), ('performances', 0.026), ('ck', 0.026), ('recognition', 0.026), ('formulates', 0.026), ('challenges', 0.026), ('data', 0.026), ('contribute', 0.025), ('weak', 0.025), ('cation', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 168 nips-2011-Maximum Margin Multi-Instance Learning
Author: Hua Wang, Heng Huang, Farhad Kamangar, Feiping Nie, Chris H. Ding
Abstract: Multi-instance learning (MIL) considers input as bags of instances, in which labels are assigned to the bags. MIL is useful in many real-world applications. For example, in image categorization semantic meanings (labels) of an image mostly arise from its regions (instances) instead of the entire image (bag). Existing MIL methods typically build their models using the Bag-to-Bag (B2B) distance, which are often computationally expensive and may not truly reflect the semantic similarities. To tackle this, in this paper we approach MIL problems from a new perspective using the Class-to-Bag (C2B) distance, which directly assesses the relationships between the classes and the bags. Taking into account the two major challenges in MIL, high heterogeneity on data and weak label association, we propose a novel Maximum Margin Multi-Instance Learning (M3 I) approach to parameterize the C2B distance by introducing the class specific distance metrics and the locally adaptive significance coefficients. We apply our new approach to the automatic image categorization tasks on three (one single-label and two multilabel) benchmark data sets. Extensive experiments have demonstrated promising results that validate the proposed method.
2 0.19869985 181 nips-2011-Multiple Instance Learning on Structured Data
Author: Dan Zhang, Yan Liu, Luo Si, Jian Zhang, Richard D. Lawrence
Abstract: Most existing Multiple-Instance Learning (MIL) algorithms assume data instances and/or data bags are independently and identically distributed. But there often exists rich additional dependency/structure information between instances/bags within many applications of MIL. Ignoring this structure information limits the performance of existing MIL algorithms. This paper explores the research problem as multiple instance learning on structured data (MILSD) and formulates a novel framework that considers additional structure information. In particular, an effective and efficient optimization algorithm has been proposed to solve the original non-convex optimization problem by using a combination of ConcaveConvex Constraint Programming (CCCP) method and an adapted Cutting Plane method, which deals with two sets of constraints caused by learning on instances within individual bags and learning on structured data. Our method has the nice convergence property, with specified precision on each set of constraints. Experimental results on three different applications, i.e., webpage classification, market targeting, and protein fold identification, clearly demonstrate the advantages of the proposed method over state-of-the-art methods. 1
3 0.16388264 165 nips-2011-Matrix Completion for Multi-label Image Classification
Author: Ricardo S. Cabral, Fernando Torre, Joao P. Costeira, Alexandre Bernardino
Abstract: Recently, image categorization has been an active research topic due to the urgent need to retrieve and browse digital images via semantic keywords. This paper formulates image categorization as a multi-label classification problem using recent advances in matrix completion. Under this setting, classification of testing data is posed as a problem of completing unknown label entries on a data matrix that concatenates training and testing features with training labels. We propose two convex algorithms for matrix completion based on a Rank Minimization criterion specifically tailored to visual data, and prove its convergence properties. A major advantage of our approach w.r.t. standard discriminative classification methods for image categorization is its robustness to outliers, background noise and partial occlusions both in the feature and label space. Experimental validation on several datasets shows how our method outperforms state-of-the-art algorithms, while effectively capturing semantic concepts of classes. 1
4 0.15141681 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
Author: Bin Zhao, Fei Li, Eric P. Xing
Abstract: Most previous research on image categorization has focused on medium-scale data sets, while large-scale image categorization with millions of images from thousands of categories remains a challenge. With the emergence of structured large-scale dataset such as the ImageNet, rich information about the conceptual relationships between images, such as a tree hierarchy among various image categories, become available. As human cognition of complex visual world benefits from underlying semantic relationships between object classes, we believe a machine learning system can and should leverage such information as well for better performance. In this paper, we employ such semantic relatedness among image categories for large-scale image categorization. Specifically, a category hierarchy is utilized to properly define loss function and select common set of features for related categories. An efficient optimization method based on proximal approximation and accelerated parallel gradient method is introduced. Experimental results on a subset of ImageNet containing 1.2 million images from 1000 categories demonstrate the effectiveness and promise of our proposed approach. 1
5 0.14763959 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
Author: Kristen Grauman, Fei Sha, Sung J. Hwang
Abstract: We introduce an approach to learn discriminative visual representations while exploiting external semantic knowledge about object category relationships. Given a hierarchical taxonomy that captures semantic similarity between the objects, we learn a corresponding tree of metrics (ToM). In this tree, we have one metric for each non-leaf node of the object hierarchy, and each metric is responsible for discriminating among its immediate subcategory children. Specifically, a Mahalanobis metric learned for a given node must satisfy the appropriate (dis)similarity constraints generated only among its subtree members’ training instances. To further exploit the semantics, we introduce a novel regularizer coupling the metrics that prefers a sparse disjoint set of features to be selected for each metric relative to its ancestor (supercategory) nodes’ metrics. Intuitively, this reflects that visual cues most useful to distinguish the generic classes (e.g., feline vs. canine) should be different than those cues most useful to distinguish their component fine-grained classes (e.g., Persian cat vs. Siamese cat). We validate our approach with multiple image datasets using the WordNet taxonomy, show its advantages over alternative metric learning approaches, and analyze the meaning of attribute features selected by our algorithm. 1
6 0.13706739 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
7 0.13424878 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
8 0.12188654 154 nips-2011-Learning person-object interactions for action recognition in still images
9 0.11728967 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
10 0.11667194 180 nips-2011-Multiple Instance Filtering
11 0.11194799 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
12 0.11121328 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
13 0.10766479 91 nips-2011-Exploiting spatial overlap to efficiently compute appearance distances between image windows
14 0.10346613 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling
15 0.094325483 171 nips-2011-Metric Learning with Multiple Kernels
16 0.090380967 290 nips-2011-Transfer Learning by Borrowing Examples for Multiclass Object Detection
17 0.087061115 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
18 0.08383669 231 nips-2011-Randomized Algorithms for Comparison-based Search
19 0.08322081 193 nips-2011-Object Detection with Grammar Models
20 0.081826977 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes
topicId topicWeight
[(0, 0.213), (1, 0.113), (2, -0.173), (3, 0.134), (4, 0.064), (5, 0.077), (6, 0.011), (7, -0.031), (8, -0.048), (9, 0.043), (10, -0.018), (11, 0.049), (12, -0.02), (13, 0.052), (14, -0.02), (15, -0.029), (16, -0.143), (17, 0.079), (18, -0.002), (19, 0.003), (20, 0.075), (21, 0.043), (22, 0.195), (23, -0.042), (24, 0.04), (25, 0.067), (26, 0.025), (27, -0.089), (28, 0.051), (29, -0.016), (30, -0.035), (31, 0.007), (32, -0.005), (33, 0.13), (34, 0.031), (35, 0.043), (36, 0.085), (37, 0.105), (38, 0.085), (39, -0.12), (40, 0.014), (41, -0.104), (42, 0.033), (43, 0.001), (44, -0.129), (45, -0.095), (46, 0.054), (47, 0.002), (48, -0.058), (49, -0.048)]
simIndex simValue paperId paperTitle
same-paper 1 0.95755237 168 nips-2011-Maximum Margin Multi-Instance Learning
Author: Hua Wang, Heng Huang, Farhad Kamangar, Feiping Nie, Chris H. Ding
Abstract: Multi-instance learning (MIL) considers input as bags of instances, in which labels are assigned to the bags. MIL is useful in many real-world applications. For example, in image categorization semantic meanings (labels) of an image mostly arise from its regions (instances) instead of the entire image (bag). Existing MIL methods typically build their models using the Bag-to-Bag (B2B) distance, which are often computationally expensive and may not truly reflect the semantic similarities. To tackle this, in this paper we approach MIL problems from a new perspective using the Class-to-Bag (C2B) distance, which directly assesses the relationships between the classes and the bags. Taking into account the two major challenges in MIL, high heterogeneity on data and weak label association, we propose a novel Maximum Margin Multi-Instance Learning (M3 I) approach to parameterize the C2B distance by introducing the class specific distance metrics and the locally adaptive significance coefficients. We apply our new approach to the automatic image categorization tasks on three (one single-label and two multilabel) benchmark data sets. Extensive experiments have demonstrated promising results that validate the proposed method.
2 0.70191741 91 nips-2011-Exploiting spatial overlap to efficiently compute appearance distances between image windows
Author: Bogdan Alexe, Viviana Petrescu, Vittorio Ferrari
Abstract: We present a computationally efficient technique to compute the distance of highdimensional appearance descriptor vectors between image windows. The method exploits the relation between appearance distance and spatial overlap. We derive an upper bound on appearance distance given the spatial overlap of two windows in an image, and use it to bound the distances of many pairs between two images. We propose algorithms that build on these basic operations to efficiently solve tasks relevant to many computer vision applications, such as finding all pairs of windows between two images with distance smaller than a threshold, or finding the single pair with the smallest distance. In experiments on the PASCAL VOC 07 dataset, our algorithms accurately solve these problems while greatly reducing the number of appearance distances computed, and achieve larger speedups than approximate nearest neighbour algorithms based on trees [18] and on hashing [21]. For example, our algorithm finds the most similar pair of windows between two images while computing only 1% of all distances on average. 1
3 0.6486609 181 nips-2011-Multiple Instance Learning on Structured Data
Author: Dan Zhang, Yan Liu, Luo Si, Jian Zhang, Richard D. Lawrence
Abstract: Most existing Multiple-Instance Learning (MIL) algorithms assume data instances and/or data bags are independently and identically distributed. But there often exists rich additional dependency/structure information between instances/bags within many applications of MIL. Ignoring this structure information limits the performance of existing MIL algorithms. This paper explores the research problem as multiple instance learning on structured data (MILSD) and formulates a novel framework that considers additional structure information. In particular, an effective and efficient optimization algorithm has been proposed to solve the original non-convex optimization problem by using a combination of ConcaveConvex Constraint Programming (CCCP) method and an adapted Cutting Plane method, which deals with two sets of constraints caused by learning on instances within individual bags and learning on structured data. Our method has the nice convergence property, with specified precision on each set of constraints. Experimental results on three different applications, i.e., webpage classification, market targeting, and protein fold identification, clearly demonstrate the advantages of the proposed method over state-of-the-art methods. 1
4 0.64632827 165 nips-2011-Matrix Completion for Multi-label Image Classification
Author: Ricardo S. Cabral, Fernando Torre, Joao P. Costeira, Alexandre Bernardino
Abstract: Recently, image categorization has been an active research topic due to the urgent need to retrieve and browse digital images via semantic keywords. This paper formulates image categorization as a multi-label classification problem using recent advances in matrix completion. Under this setting, classification of testing data is posed as a problem of completing unknown label entries on a data matrix that concatenates training and testing features with training labels. We propose two convex algorithms for matrix completion based on a Rank Minimization criterion specifically tailored to visual data, and prove its convergence properties. A major advantage of our approach w.r.t. standard discriminative classification methods for image categorization is its robustness to outliers, background noise and partial occlusions both in the feature and label space. Experimental validation on several datasets shows how our method outperforms state-of-the-art algorithms, while effectively capturing semantic concepts of classes. 1
5 0.64155078 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
Author: Yangqing Jia, Trevor Darrell
Abstract: Many applications in computer vision measure the similarity between images or image patches based on some statistics such as oriented gradients. These are often modeled implicitly or explicitly with a Gaussian noise assumption, leading to the use of the Euclidean distance when comparing image descriptors. In this paper, we show that the statistics of gradient based image descriptors often follow a heavy-tailed distribution, which undermines any principled motivation for the use of Euclidean distances. We advocate for the use of a distance measure based on the likelihood ratio test with appropriate probabilistic models that fit the empirical data distribution. We instantiate this similarity measure with the Gammacompound-Laplace distribution, and show significant improvement over existing distance measures in the application of SIFT feature matching, at relatively low computational cost. 1
6 0.63424158 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
7 0.61785597 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
8 0.59245729 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
9 0.57685113 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
10 0.56400371 150 nips-2011-Learning a Distance Metric from a Network
11 0.55604523 290 nips-2011-Transfer Learning by Borrowing Examples for Multiclass Object Detection
12 0.55362707 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation
13 0.50749254 169 nips-2011-Maximum Margin Multi-Label Structured Prediction
14 0.50640631 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance
15 0.50599968 279 nips-2011-Target Neighbor Consistent Feature Weighting for Nearest Neighbor Classification
16 0.49655992 193 nips-2011-Object Detection with Grammar Models
17 0.49252349 171 nips-2011-Metric Learning with Multiple Kernels
18 0.48443693 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
19 0.47319293 293 nips-2011-Understanding the Intrinsic Memorability of Images
20 0.466876 180 nips-2011-Multiple Instance Filtering
topicId topicWeight
[(0, 0.02), (4, 0.09), (14, 0.22), (20, 0.075), (26, 0.038), (31, 0.078), (33, 0.097), (43, 0.041), (45, 0.115), (57, 0.045), (65, 0.013), (74, 0.047), (83, 0.02), (99, 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.78529632 168 nips-2011-Maximum Margin Multi-Instance Learning
Author: Hua Wang, Heng Huang, Farhad Kamangar, Feiping Nie, Chris H. Ding
Abstract: Multi-instance learning (MIL) considers input as bags of instances, in which labels are assigned to the bags. MIL is useful in many real-world applications. For example, in image categorization semantic meanings (labels) of an image mostly arise from its regions (instances) instead of the entire image (bag). Existing MIL methods typically build their models using the Bag-to-Bag (B2B) distance, which are often computationally expensive and may not truly reflect the semantic similarities. To tackle this, in this paper we approach MIL problems from a new perspective using the Class-to-Bag (C2B) distance, which directly assesses the relationships between the classes and the bags. Taking into account the two major challenges in MIL, high heterogeneity on data and weak label association, we propose a novel Maximum Margin Multi-Instance Learning (M3 I) approach to parameterize the C2B distance by introducing the class specific distance metrics and the locally adaptive significance coefficients. We apply our new approach to the automatic image categorization tasks on three (one single-label and two multilabel) benchmark data sets. Extensive experiments have demonstrated promising results that validate the proposed method.
2 0.71823543 61 nips-2011-Contextual Gaussian Process Bandit Optimization
Author: Andreas Krause, Cheng S. Ong
Abstract: How should we design experiments to maximize performance of a complex system, taking into account uncontrollable environmental conditions? How should we select relevant documents (ads) to display, given information about the user? These tasks can be formalized as contextual bandit problems, where at each round, we receive context (about the experimental conditions, the query), and have to choose an action (parameters, documents). The key challenge is to trade off exploration by gathering data for estimating the mean payoff function over the context-action space, and to exploit by choosing an action deemed optimal based on the gathered data. We model the payoff function as a sample from a Gaussian process defined over the joint context-action space, and develop CGP-UCB, an intuitive upper-confidence style algorithm. We show that by mixing and matching kernels for contexts and actions, CGP-UCB can handle a variety of practical applications. We further provide generic tools for deriving regret bounds when using such composite kernel functions. Lastly, we evaluate our algorithm on two case studies, in the context of automated vaccine design and sensor management. We show that context-sensitive optimization outperforms no or naive use of context. 1
3 0.66999274 103 nips-2011-Generalization Bounds and Consistency for Latent Structural Probit and Ramp Loss
Author: Joseph Keshet, David A. McAllester
Abstract: We consider latent structural versions of probit loss and ramp loss. We show that these surrogate loss functions are consistent in the strong sense that for any feature map (finite or infinite dimensional) they yield predictors approaching the infimum task loss achievable by any linear predictor over the given features. We also give finite sample generalization bounds (convergence rates) for these loss functions. These bounds suggest that probit loss converges more rapidly. However, ramp loss is more easily optimized on a given sample. 1
4 0.66692996 127 nips-2011-Image Parsing with Stochastic Scene Grammar
Author: Yibiao Zhao, Song-chun Zhu
Abstract: This paper proposes a parsing algorithm for scene understanding which includes four aspects: computing 3D scene layout, detecting 3D objects (e.g. furniture), detecting 2D faces (windows, doors etc.), and segmenting background. In contrast to previous scene labeling work that applied discriminative classifiers to pixels (or super-pixels), we use a generative Stochastic Scene Grammar (SSG). This grammar represents the compositional structures of visual entities from scene categories, 3D foreground/background, 2D faces, to 1D lines. The grammar includes three types of production rules and two types of contextual relations. Production rules: (i) AND rules represent the decomposition of an entity into sub-parts; (ii) OR rules represent the switching among sub-types of an entity; (iii) SET rules represent an ensemble of visual entities. Contextual relations: (i) Cooperative “+” relations represent positive links between binding entities, such as hinged faces of a object or aligned boxes; (ii) Competitive “-” relations represents negative links between competing entities, such as mutually exclusive boxes. We design an efficient MCMC inference algorithm, namely Hierarchical cluster sampling, to search in the large solution space of scene configurations. The algorithm has two stages: (i) Clustering: It forms all possible higher-level structures (clusters) from lower-level entities by production rules and contextual relations. (ii) Sampling: It jumps between alternative structures (clusters) in each layer of the hierarchy to find the most probable configuration (represented by a parse tree). In our experiment, we demonstrate the superiority of our algorithm over existing methods on public dataset. In addition, our approach achieves richer structures in the parse tree. 1
5 0.66598386 202 nips-2011-On the Universality of Online Mirror Descent
Author: Nati Srebro, Karthik Sridharan, Ambuj Tewari
Abstract: We show that for a general class of convex online learning problems, Mirror Descent can always achieve a (nearly) optimal regret guarantee. 1
6 0.66324711 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling
7 0.65960073 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
8 0.65946597 154 nips-2011-Learning person-object interactions for action recognition in still images
9 0.65200889 227 nips-2011-Pylon Model for Semantic Segmentation
10 0.65119243 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation
11 0.64451569 303 nips-2011-Video Annotation and Tracking with Active Learning
12 0.63973844 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
13 0.63863003 165 nips-2011-Matrix Completion for Multi-label Image Classification
14 0.63654816 283 nips-2011-The Fixed Points of Off-Policy TD
15 0.63606608 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
16 0.63606334 231 nips-2011-Randomized Algorithms for Comparison-based Search
17 0.63174194 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
18 0.63041025 180 nips-2011-Multiple Instance Filtering
19 0.62920886 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
20 0.62481797 35 nips-2011-An ideal observer model for identifying the reference frame of objects