nips nips2011 nips2011-151 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kristen Grauman, Fei Sha, Sung J. Hwang
Abstract: We introduce an approach to learn discriminative visual representations while exploiting external semantic knowledge about object category relationships. Given a hierarchical taxonomy that captures semantic similarity between the objects, we learn a corresponding tree of metrics (ToM). In this tree, we have one metric for each non-leaf node of the object hierarchy, and each metric is responsible for discriminating among its immediate subcategory children. Specifically, a Mahalanobis metric learned for a given node must satisfy the appropriate (dis)similarity constraints generated only among its subtree members’ training instances. To further exploit the semantics, we introduce a novel regularizer coupling the metrics that prefers a sparse disjoint set of features to be selected for each metric relative to its ancestor (supercategory) nodes’ metrics. Intuitively, this reflects that visual cues most useful to distinguish the generic classes (e.g., feline vs. canine) should be different than those cues most useful to distinguish their component fine-grained classes (e.g., Persian cat vs. Siamese cat). We validate our approach with multiple image datasets using the WordNet taxonomy, show its advantages over alternative metric learning approaches, and analyze the meaning of attribute features selected by our algorithm. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We introduce an approach to learn discriminative visual representations while exploiting external semantic knowledge about object category relationships. [sent-6, score-0.512]
2 Given a hierarchical taxonomy that captures semantic similarity between the objects, we learn a corresponding tree of metrics (ToM). [sent-7, score-0.569]
3 In this tree, we have one metric for each non-leaf node of the object hierarchy, and each metric is responsible for discriminating among its immediate subcategory children. [sent-8, score-0.737]
4 Specifically, a Mahalanobis metric learned for a given node must satisfy the appropriate (dis)similarity constraints generated only among its subtree members’ training instances. [sent-9, score-0.389]
5 To further exploit the semantics, we introduce a novel regularizer coupling the metrics that prefers a sparse disjoint set of features to be selected for each metric relative to its ancestor (supercategory) nodes’ metrics. [sent-10, score-0.913]
6 We validate our approach with multiple image datasets using the WordNet taxonomy, show its advantages over alternative metric learning approaches, and analyze the meaning of attribute features selected by our algorithm. [sent-18, score-0.398]
7 In particular, recent work shows promising results when integrating powerful feature selection techniques, whether through kernel combination [1, 2], sparse coding dictionaries [3], structured sparsity regularization [4, 5], or metric learning approaches [6, 7, 8, 9, 10]. [sent-23, score-0.499]
8 However, typically the semantic information embedded in the learned features is restricted to the category labels on image exemplars. [sent-24, score-0.369]
9 For example, a learned metric generates (dis)similarity constraints using instances with the different/same class label; multiple kernel learning methods optimize feature weights to minimize class prediction errors; group sparsity regularizers exploit class labels to guide the selected dimensions. [sent-25, score-0.508]
10 Unfortunately, this means richer information about the meaning of the target object categories is withheld from the learned representations. [sent-26, score-0.218]
11 1 We propose a metric learning approach to learn discriminative visual representations while also exploiting external knowledge about the target objects’ semantic similarity. [sent-28, score-0.637]
12 First, we construct a tree of metrics (ToM) to directly capture the hierarchical structure. [sent-33, score-0.301]
13 In this tree, each metric is responsible for discriminating among its immediate object subcategories. [sent-34, score-0.396]
14 Specifically, we learn one metric for each non-leaf node, and require it to satisfy (dis)similarity constraints generated among its subtree members’ training instances. [sent-35, score-0.261]
15 We use a variant of the large-margin nearest neighbor objective [11], and augment it with a regularizer for sparsity in order to unify Mahalanobis parameter learning with a simple means of feature selection. [sent-36, score-0.376]
16 Second, rather than learn the metrics at each node independently, we introduce a novel regularizer for disjoint sparsity that couples each metric with those of its ancestors. [sent-37, score-1.029]
17 This regularizer specifies that a disjoint set of features should be selected for a given node and its ancestors, respectively. [sent-38, score-0.556]
18 Intuitively, this represents that the visual features most useful to distinguish the coarse-grained classes (e. [sent-39, score-0.27]
19 The ideas of exploiting label hierarchy and model sparsity are not completely new to computer vision and machine learning researchers. [sent-48, score-0.315]
20 Parameter sparsity is increasingly used to derive parsimonious models with informative features [4, 5, 3]. [sent-50, score-0.236]
21 Our novel contribution lies in the idea of ToM and disjoint sparsity together as a new strategy for visual feature learning. [sent-51, score-0.568]
22 Rather than relying on learners to discover both sparse features and a visual hierarchy fully automatically, we use external “real-world” knowledge expressed in hierarchical structures to bias which sparsity patterns we want the learned discriminative feature representations to exhibit. [sent-53, score-0.725]
23 We demonstrate that the proposed ToM outperforms both global and multiple-metric metric learning baselines that have similar objectives but lack the hierarchical structure and proposed disjoint sparsity regularizer. [sent-56, score-0.848]
24 In addition, we show that when the dimensions of the original feature space are interpretable (nameable) visual attributes, the disjoint features selected for super- and sub-classes by our method can be quite intuitive. [sent-57, score-0.532]
25 One way to regularize visual feature selection is to prefer that object categories share features, so as to speed up object detection [19]; more recent work uses group sparsity to impose some sharing among the (un)selected features within an object category or view [4, 5]. [sent-62, score-0.787]
26 We instead seek disjoint features between coarse and fine categories, such that the regularizer helps to focus on useful differences across levels. [sent-63, score-0.476]
27 Good visual metrics can be trained with boosting [20, 21], feature weight learning [6], or Mahalanobis metric learning methods [7, 8, 10]. [sent-65, score-0.589]
28 An array of Mahalanobis metric learners has been developed in the machine learning literature [22, 23, 11]. [sent-66, score-0.261]
29 The idea of using multiple “local” metrics to cover a complex feature space is not new [24, 9, 10, 25]; however, in contrast to our approach, existing methods resort to clustering or (flat) class labels to determine the partitioning of training instances to metrics. [sent-67, score-0.239]
30 No previous work explores mapping the semantic hierarchy to a ToM, nor couples metrics across the hierarchy levels, as we propose. [sent-70, score-0.527]
31 Previous metric learning work integrates feature learning and selection via a regularizer for sparsity [27], as we do here. [sent-72, score-0.556]
32 However, whereas that approach targets sparsity in the linear transformed space, ours targets sparsity in the original feature space, and, most importantly, also includes a disjoint sparsity regularizer. [sent-73, score-0.751]
33 The “orthogonal transfer” by [28] is most closely related in spirit to our goal of selecting disjoint features. [sent-76, score-0.28]
34 However, unlike [28], our regularizer will yield truly disjoint features when minimized—a property hinging on the metric-based classification scheme we have chosen. [sent-77, score-0.476]
35 External semantics beyond object class labels are rarely used in today’s object recognition systems, but recent work has begun to investigate new ways to integrate richer knowledge. [sent-81, score-0.292]
36 While semantic structure need not always translate into helping visual feature selection, the correlation between WordNet semantics and visual confusions observed in [32] supports our use of the knowledge base in this work. [sent-84, score-0.488]
37 Of this work, our goals most relate to [14], but our focus is on learning features discriminatively and biasing toward a disjoint feature set via regularization. [sent-88, score-0.443]
38 Beyond taxonomies, researchers are also injecting semantics by learning mid-level nameable “attributes” for object categorization (e. [sent-89, score-0.251]
39 We show that when our method is applied to attributes as base features, the disjoint sparsity effects appear to be fairly interpretable. [sent-92, score-0.523]
40 We then describe an ℓ1-norm based regularization for selecting a sparse set of features in the context of metric learning. [sent-94, score-0.4]
41 Building on that, we proceed to describe our main algorithmic contribution, that is, the design of a metric learning algorithm that prefers not only sparse but also disjoint features for discriminating different categories. [sent-95, score-0.683]
42 1 Distance metric learning Many learning algorithms depend on calculating distances between samples, notably k-nearest neighbor classifiers or clustering. [sent-97, score-0.302]
43 While the default is to use the Euclidean distance, the more general Mahalanobis metric is often more suitable. [sent-98, score-0.261]
44 For two data points $x_i, x_j \in \mathbb{R}^D$, their (squared) Mahalanobis distance is given by $d^2_M(x_i, x_j) = (x_i - x_j)^T M (x_i - x_j)$, (1) where $M$ is a positive semidefinite matrix, $M \succeq 0$. [sent-99, score-0.218]
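As a quick illustration of eq. (1), here is a minimal sketch (not from the paper; the function name and toy values are ours). With M equal to the identity it reduces to the squared Euclidean distance mentioned above.

```python
import numpy as np

def mahalanobis_sq(x_i, x_j, M):
    """Squared Mahalanobis distance of eq. (1): (x_i - x_j)^T M (x_i - x_j)."""
    diff = x_i - x_j
    return float(diff @ M @ diff)

x_i, x_j = np.array([1.0, 2.0]), np.array([0.0, 0.5])
M = np.diag([2.0, 0.5])              # a diagonal PSD metric that re-weights each feature
print(mahalanobis_sq(x_i, x_j, M))   # 2*1.0**2 + 0.5*1.5**2 = 3.125
```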
45 Most methods follow an intuitively appealing strategy: a good metric M should pull data points belonging to the same class closer and push away data points belonging to different classes. [sent-104, score-0.261]
46 LMNN identifies the optimal $M$ as the solution to $\min_{M \succeq 0} \epsilon(M) = \sum_i \sum_{j \in x_i^+} d^2_M(x_i, x_j) + \gamma \sum_{ijl} \xi_{ijl}$ (2), subject to $1 + d^2_M(x_i, x_j) - d^2_M(x_i, x_l) \le \xi_{ijl}$ and $\xi_{ijl} \ge 0$. [sent-108, score-0.23]
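For intuition, a hedged sketch of evaluating the eq. (2) objective for a fixed candidate M; the precomputed triplet list (each instance i with a same-class target neighbor j and a differently-labeled impostor l) is an assumption of this sketch, which illustrates the loss only, not the cited solvers.

```python
import numpy as np

def lmnn_objective(M, X, triplets, gamma=1.0):
    """Value of the eq. (2) objective for a fixed PSD matrix M.
    X: (n, D) array of training instances.
    triplets: iterable of (i, j, l) where j is a same-class target neighbor of i
              and l is a differently-labeled impostor."""
    def d2(a, b):
        diff = X[a] - X[b]
        return diff @ M @ diff

    pull = sum(d2(i, j) for i, j, _ in triplets)            # pull target neighbors close
    push = sum(max(0.0, 1.0 + d2(i, j) - d2(i, l))          # hinge = slack xi_ijl at the optimum
               for i, j, l in triplets)
    return pull + gamma * push
```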
47 Our approach extends previous work on metric learning in two aspects: i) we apply a sparsity-based regularization to identify informative features (Section 3. [sent-113, score-0.4]
48 2); ii) at the same time, we seek metrics that rely on disjoint subsets of features for categories at different semantic granularities (Section 3. [sent-114, score-0.774]
49 2 Sparse feature selection for metric learning How can we learn a metric such that only a sparse set of features are relevant? [sent-117, score-0.685]
50 Therefore, analogous to the use of the ℓ1-norm by the popular LASSO technique [34], we add the ℓ1-norm of $M$'s diagonal elements to the large margin metric learning criterion $\epsilon(M)$ in eq. (2). [sent-120, score-0.298]
51 Note that since the matrix trace Trace[·] is a linear function of its argument, this sparse feature metric learning problem remains a convex optimization. [sent-123, score-0.324]
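Putting sentences 50 and 51 together, the sparse-feature problem at a single node can be written as below; the regularization weight $\lambda$ is assumed notation for this summary, and the trace equals the ℓ1-norm of the diagonal because the diagonal of a positive semidefinite matrix is nonnegative:

$\min_{M \succeq 0} \;\; \epsilon(M) + \lambda\,\mathrm{Trace}(M) \;=\; \epsilon(M) + \lambda \sum_{d=1}^{D} M_{dd}, \qquad \lambda \ge 0.$

Penalizing the diagonal drives entire feature dimensions to zero weight, which is the feature-selection effect the disjoint regularizer described next builds on.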
52 3 Learning a tree of metrics (ToM) with disjoint visual features How can we learn a tree of metrics so each metric uses features disjoint from its ancestors’? [sent-125, score-1.562]
53 Using disjoint features To characterize the “disjointness” between two metrics $M_t$ and $M_{t'}$, we use the vectors of their nonnegative diagonal elements $v_t$ and $v_{t'}$ as proxies for which features are (more heavily) used. [sent-126, score-0.742]
54 If a feature dimension is used by both metrics heavily, then the competition is high. [sent-130, score-0.239]
55 Adding eq. (4) as a regularization term will encourage different metrics to use disjoint features. [sent-132, score-0.495]
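Eq. (4) is not reproduced in this excerpt, so the sketch below uses an assumed stand-in that matches the description in sentence 54: the inner product of the two nonnegative diagonals, which grows when a dimension is weighted heavily by both metrics and is exactly zero when the selected feature sets are disjoint (consistent with sentence 34's remark that the regularizer yields truly disjoint features when minimized).

```python
import numpy as np

def competition(M_t, M_t_prime):
    """Assumed stand-in for the disjointness penalty of eq. (4): the inner product of
    the two metrics' nonnegative diagonals (v_t and v_t').  It is large when both
    metrics lean heavily on the same feature dimensions and exactly zero when the
    sets of selected features are disjoint."""
    v_t, v_tp = np.diag(M_t), np.diag(M_t_prime)
    return float(v_t @ v_tp)
```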
56 Learning a tree of metrics Formally, assume we have a tree T where each node corresponds to a category. [sent-134, score-0.356]
57 We learn a metric Mt to differentiate its children categories c(t). [sent-136, score-0.338]
58 To learn our metrics $\{M_t\}_{t=1}^{T}$, we apply strategies similar to those used for learning metrics for large-margin nearest neighbor classifiers. [sent-138, score-0.433]
59 Each metric learning problem is in the style of the sparse feature metric learning eq. [sent-141, score-0.585]
60 However, more importantly, these metric learning problems are coupled together through the disjoint regularization. [sent-143, score-0.541]
61 Our disjoint regularization encourages a metric Mt to use different sets of features from its super-categories—categories on the tree path from the root. [sent-144, score-0.73]
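To make the coupling concrete, the per-node problems can be summarized schematically as follows; the weights $\lambda$ and $\mu$ and the name Competition$(\cdot,\cdot)$ for the eq. (4) term are assumed notation for this summary, not the paper's:

$\min_{\{M_t \succeq 0\}} \;\; \sum_{t=1}^{T} \big[\, \epsilon_t(M_t) + \lambda\, \mathrm{Trace}(M_t) \,\big] \;+\; \mu \sum_{t=1}^{T} \sum_{t' \in \mathrm{Anc}(t)} \mathrm{Competition}(M_t, M_{t'}),$

where $\epsilon_t$ contains only the (dis)similarity constraints generated among node $t$'s subtree members, and $\mathrm{Anc}(t)$ is the set of $t$'s ancestors on the path from the root.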
62 Thus, for the large-scale problems we focus on, we use a simpler and computationally more efficient strategy of Sequential Optimization (SO) by sequentially optimizing one metric at a time. [sent-149, score-0.261]
63 Specifically, we optimize the metric at the root node and then its children, assuming the metric at the root is fixed. [sent-150, score-0.602]
64 We then recursively (in breadth-first-search) optimize the rest of the metrics, always treating the metrics at the higher level of the hierarchy as fixed. [sent-151, score-0.281]
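A minimal sketch of the Sequential Optimization (SO) strategy in sentences 62-64; the dict-based tree representation and the per-node solver learn_node_metric are assumed interfaces, not the authors' implementation.

```python
from collections import deque

def learn_tree_of_metrics(root, children, learn_node_metric):
    """Sketch of the Sequential Optimization (SO) pass described above.
    root: id of the root category node.
    children: dict mapping each node to a list of its child nodes ([] for leaves).
    learn_node_metric(node, ancestor_metrics): assumed per-node solver that fits the
        sparse metric for `node` with the already-learned ancestor metrics held fixed.
    Metrics are learned top-down in breadth-first order, so each node only ever sees
    fixed ancestors, mirroring sentences 62-64."""
    metrics, ancestors = {}, {root: []}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        kids = children.get(node, [])
        if kids:                                        # only non-leaf nodes carry a metric
            fixed = [metrics[a] for a in ancestors[node]]
            metrics[node] = learn_node_metric(node, fixed)
        for child in kids:
            ancestors[child] = ancestors[node] + [node]
            queue.append(child)
    return metrics
```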
65 In addition, the SO procedure allows each metric to be optimized with different parameters and prevents a badly-learned low-level metric from influencing upper-level ones through the disjoint regularization terms. [sent-153, score-0.841]
66 Using a tree of metrics for classification Once the metrics at all nodes are learned, they can be used for several classification tasks (e. [sent-155, score-0.45]
67 In this work, we study two tasks in particular: 1) We consider “per-node classification”, where the metric at each node is used to discriminate its sub-categories. [sent-158, score-0.341]
68 Since decisions at higher-level nodes must span a variety of object sub-categories, these generic decisions are interesting to test the learned features in a broader context. [sent-159, score-0.289]
69 We classify an object from the root node down; the leaf node that terminates the path is the predicted label. [sent-162, score-0.306]
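A hedged sketch of the root-to-leaf prediction in sentence 69; the excerpt does not spell out the per-node decision rule, so a nearest-neighbor choice among each child subtree's training exemplars under the node's metric is assumed here.

```python
import numpy as np

def classify_hierarchically(x, root, children, metrics, exemplars):
    """Root-to-leaf prediction (sentence 69).  At each non-leaf node we descend to the
    child whose training exemplars are nearest to x under that node's learned metric;
    the nearest-neighbor rule is an assumption of this sketch.
    children: dict node -> list of child nodes ([] for leaves).
    metrics:  dict non-leaf node -> learned PSD matrix M_t.
    exemplars: dict child node -> (n, D) array of its subtree's training instances."""
    node = root
    while children.get(node):
        M = metrics[node]

        def nearest_d2(child):
            diffs = exemplars[child] - x
            return np.min(np.einsum('nd,de,ne->n', diffs, M, diffs))

        node = min(children[node], key=nearest_d2)      # descend to the closest child
    return node                                         # the leaf reached is the label
```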
70 We stress that our metric learning criterion of eq. [sent-163, score-0.261]
71 Our disjoint regularizer yields a sparse metric that only considers the feature dimension(s) necessary for discrimination at that given level. [sent-186, score-0.7]
72 To evaluate the influence of each aspect of our method, we test it under three variants: 1) ToM: ToM learning without any regularization terms, 2) ToM+Sparsity: ToM learning with the sparsity regularization term, and 3) ToM+Disjoint: ToM learning with the disjoint regularization term. [sent-189, score-0.533]
73 1 Proof of concept on synthetic dataset First we use synthetic data to clearly illustrate disjoint sparsity regularization. [sent-193, score-0.416]
74 Thus, as expected, disjoint sparsity does best, since it selects different features for the super- and sub-classes. [sent-205, score-0.516]
75 1(c)-(e)), we see that disjoint sparsity zeros out the unneeded features for the upper-level metric, shown as black squares in the figure (e). [sent-207, score-0.516]
76 For our third dataset VEHICLE-20, we take 20 vehicle classes and 26,624 images from ImageNet, and apply PCA to reduce the authors’ provided visual word features [32] to 50 dimensions per image (The dimensionality worked best for the Global LMNN baseline. [sent-217, score-0.352]
77 Then, we build a compact partial hierarchy over those nodes by 1) pruning out any node that has only one child (i. [sent-222, score-0.233]
78 We generally achieve a sizable accuracy gain relative to the Global LMNN baseline (dark left bar for each class), showing the advantage of exploiting external semantics with our ToM approach. [sent-238, score-0.255]
79 [Figure residue: VEHICLE-20 hierarchy node labels (vehicle, wheeled vehicle, craft, self-propelled vehicle, vessel, ship, aircraft, boat) and rotated per-class axis labels of the accuracy-improvement bar charts.]
82 Figure 3: Semantic hierarchy for VEHICLE-20 and the per-node accuracy gains, plotted as above. [sent-249, score-0.268]
83 1 Per-node accuracy and analysis of the learned representations Since our algorithm optimizes the metrics at every node, we first examine the resulting per-node decisions. [sent-257, score-0.271]
84 Multi-Metric LMNN is omitted here, since its metrics are only learned for the leaf node classes. [sent-261, score-0.357]
85 Furthermore, our results are usually strongest when including the novel disjoint sparsity regularizer. [sent-263, score-0.416]
86 While the ATTR variant exposes the semantic features directly to the learner, the PCA variant encapsulates an array of low-level descriptors into its dimensions. [sent-266, score-0.241]
87 Thus, while we can better interpret the meaning of disjoint sparsity on the attributes, our positive result on raw image features assures that disjoint feature selection is also amenable in the more general case. [sent-267, score-0.896]
88 Table 2: Multi-class hierarchical classification accuracy and semantic similarity on all three datasets. [sent-351, score-0.324]
89 2 Hierarchical multi-class classification accuracy Next we evaluate the complete multi-class classification accuracy, where we use all the learned ToM metrics together to predict the leaf-node label of the test points. [sent-360, score-0.308]
90 We score accuracy in two ways: Correct label records the percentage of examples assigned the correct (leaf) label, while Semantic similarity records the semantic similarity between the predicted and true labels. [sent-363, score-0.347]
91 Specifically, we calculate the semantic similarity between classes (nodes) i and j using the metric defined in [36], which counts the number of nodes shared by their two parent branches, divided by the length of the longest of the two branches. [sent-366, score-0.553]
92 In the spirit of other recent evaluations [37, 32, 36], this metric reflects that some errors are worse than others; for example, calling a Persian cat a Siamese cat is a less glaring error than calling a Persian cat a horse. [sent-367, score-0.513]
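A small sketch of the taxonomy-based scoring in sentences 91-92; the root-to-class path representation and the toy taxonomy are illustrative assumptions, not the datasets' actual WordNet branches.

```python
def semantic_similarity(path_i, path_j):
    """Taxonomy-based similarity described above [36]: the number of nodes shared by
    the two root-to-class branches, divided by the length of the longer branch.
    path_i / path_j are assumed to be lists of nodes from the root down to the class."""
    shared = len(set(path_i) & set(path_j))
    return shared / max(len(path_i), len(path_j))

# Toy taxonomy (illustrative only): confusing a Persian cat with a Siamese cat scores
# higher than confusing it with a horse, matching the example in sentence 92.
persian = ["animal", "feline", "cat", "persian_cat"]
siamese = ["animal", "feline", "cat", "siamese_cat"]
horse   = ["animal", "equine", "horse"]
print(semantic_similarity(persian, siamese))  # 3 shared / 4 -> 0.75
print(semantic_similarity(persian, horse))    # 1 shared / 4 -> 0.25
```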
93 This is especially relevant in our case, since our key motivation is to instill external semantics into the feature learning process. [sent-368, score-0.234]
94 Further, in all cases, we see that disjoint sparsity is an important addition to ToM. [sent-370, score-0.416]
95 Conclusion We presented a new metric learning approach for visual recognition that integrates external semantics about object hierarchy. [sent-379, score-0.614]
96 In future work, we are interested in exploring local features in this context, and considering ways to learn both the hierarchy and the useful features simultaneously. [sent-381, score-0.305]
97 Object bank: A high-level image representation for scene classification and semantic feature sparsification. [sent-413, score-0.241]
98 Distance metric learning for large margin nearest neighbor classification. [sent-456, score-0.379]
99 Sharing visual features for multiclass and multiview object detection. [sent-504, score-0.282]
100 Fast solvers and efficient implementations for distance metric learning. [sent-533, score-0.303]
wordName wordTfidf (topN-words)
[('tom', 0.405), ('lmnn', 0.352), ('disjoint', 0.28), ('metric', 0.261), ('metrics', 0.176), ('semantic', 0.141), ('awa', 0.139), ('sparsity', 0.136), ('mahalanobis', 0.125), ('mt', 0.113), ('ijl', 0.107), ('attributes', 0.107), ('semantics', 0.106), ('hierarchy', 0.105), ('features', 0.1), ('regularizer', 0.096), ('object', 0.093), ('visual', 0.089), ('equine', 0.087), ('cat', 0.084), ('vehicle', 0.084), ('node', 0.08), ('categories', 0.077), ('hierarchical', 0.075), ('wordnet', 0.072), ('classi', 0.07), ('persian', 0.07), ('superclass', 0.07), ('ungulate', 0.07), ('taxonomy', 0.066), ('external', 0.065), ('feature', 0.063), ('ra', 0.062), ('similarity', 0.061), ('co', 0.061), ('ca', 0.059), ('baselines', 0.058), ('sh', 0.058), ('dolphin', 0.056), ('whale', 0.056), ('se', 0.055), ('leaf', 0.053), ('attr', 0.052), ('feline', 0.052), ('impostor', 0.052), ('nameable', 0.052), ('oc', 0.052), ('od', 0.052), ('tcrijl', 0.052), ('tusks', 0.052), ('ea', 0.05), ('tree', 0.05), ('en', 0.048), ('learned', 0.048), ('nodes', 0.048), ('accuracy', 0.047), ('ba', 0.046), ('domestic', 0.046), ('siamese', 0.046), ('grauman', 0.046), ('canine', 0.046), ('imagenet', 0.044), ('discriminative', 0.044), ('xj', 0.044), ('category', 0.043), ('vt', 0.043), ('bo', 0.043), ('ee', 0.043), ('distance', 0.042), ('ol', 0.042), ('subclasses', 0.042), ('classes', 0.042), ('ot', 0.042), ('discriminating', 0.042), ('rs', 0.042), ('euclidean', 0.042), ('neighbor', 0.041), ('nearest', 0.04), ('regularization', 0.039), ('le', 0.039), ('distinguish', 0.039), ('global', 0.038), ('ru', 0.038), ('image', 0.037), ('margin', 0.037), ('ers', 0.037), ('un', 0.037), ('label', 0.037), ('exploiting', 0.037), ('ha', 0.036), ('xl', 0.035), ('cvpr', 0.035), ('awaattr', 0.035), ('baleen', 0.035), ('hairless', 0.035), ('killer', 0.035), ('lo', 0.035), ('longneck', 0.035), ('plankton', 0.035), ('stripes', 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
Author: Kristen Grauman, Fei Sha, Sung J. Hwang
Abstract: We introduce an approach to learn discriminative visual representations while exploiting external semantic knowledge about object category relationships. Given a hierarchical taxonomy that captures semantic similarity between the objects, we learn a corresponding tree of metrics (ToM). In this tree, we have one metric for each non-leaf node of the object hierarchy, and each metric is responsible for discriminating among its immediate subcategory children. Specifically, a Mahalanobis metric learned for a given node must satisfy the appropriate (dis)similarity constraints generated only among its subtree members’ training instances. To further exploit the semantics, we introduce a novel regularizer coupling the metrics that prefers a sparse disjoint set of features to be selected for each metric relative to its ancestor (supercategory) nodes’ metrics. Intuitively, this reflects that visual cues most useful to distinguish the generic classes (e.g., feline vs. canine) should be different than those cues most useful to distinguish their component fine-grained classes (e.g., Persian cat vs. Siamese cat). We validate our approach with multiple image datasets using the WordNet taxonomy, show its advantages over alternative metric learning approaches, and analyze the meaning of attribute features selected by our algorithm. 1
2 0.2655035 171 nips-2011-Metric Learning with Multiple Kernels
Author: Jun Wang, Huyen T. Do, Adam Woznica, Alexandros Kalousis
Abstract: Metric learning has become a very active research field. The most popular representative–Mahalanobis metric learning–can be seen as learning a linear transformation and then computing the Euclidean metric in the transformed space. Since a linear transformation might not always be appropriate for a given learning problem, kernelized versions of various metric learning algorithms exist. However, the problem then becomes finding the appropriate kernel function. Multiple kernel learning addresses this limitation by learning a linear combination of a number of predefined kernels; this approach can be also readily used in the context of multiple-source learning to fuse different data sources. Surprisingly, and despite the extensive work on multiple kernel learning for SVMs, there has been no work in the area of metric learning with multiple kernel learning. In this paper we fill this gap and present a general approach for metric learning with multiple kernel learning. Our approach can be instantiated with different metric learning algorithms provided that they satisfy some constraints. Experimental evidence suggests that our approach outperforms metric learning with an unweighted kernel combination and metric learning with cross-validation based kernel selection. 1
3 0.18592475 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
Author: Bin Zhao, Fei Li, Eric P. Xing
Abstract: Most previous research on image categorization has focused on medium-scale data sets, while large-scale image categorization with millions of images from thousands of categories remains a challenge. With the emergence of structured large-scale dataset such as the ImageNet, rich information about the conceptual relationships between images, such as a tree hierarchy among various image categories, become available. As human cognition of complex visual world benefits from underlying semantic relationships between object classes, we believe a machine learning system can and should leverage such information as well for better performance. In this paper, we employ such semantic relatedness among image categories for large-scale image categorization. Specifically, a category hierarchy is utilized to properly define loss function and select common set of features for related categories. An efficient optimization method based on proximal approximation and accelerated parallel gradient method is introduced. Experimental results on a subset of ImageNet containing 1.2 million images from 1000 categories demonstrate the effectiveness and promise of our proposed approach. 1
4 0.17270271 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
Author: Congcong Li, Ashutosh Saxena, Tsuhan Chen
Abstract: For most scene understanding tasks (such as object detection or depth estimation), the classifiers need to consider contextual information in addition to the local features. We can capture such contextual information by taking as input the features/attributes from all the regions in the image. However, this contextual dependence also varies with the spatial location of the region of interest, and we therefore need a different set of parameters for each spatial location. This results in a very large number of parameters. In this work, we model the independence properties between the parameters for each location and for each task, by defining a Markov Random Field (MRF) over the parameters. In particular, two sets of parameters are encouraged to have similar values if they are spatially close or semantically close. Our method is, in principle, complementary to other ways of capturing context such as the ones that use a graphical model over the labels instead. In extensive evaluation over two different settings, of multi-class object detection and of multiple scene understanding tasks (scene categorization, depth estimation, geometric labeling), our method beats the state-of-the-art methods in all the four tasks. 1
5 0.16868839 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
Author: Jia Deng, Sanjeev Satheesh, Alexander C. Berg, Fei Li
Abstract: We present a novel approach to efficiently learn a label tree for large scale classification with many classes. The key contribution of the approach is a technique to simultaneously determine the structure of the tree and learn the classifiers for each node in the tree. This approach also allows fine grained control over the efficiency vs accuracy trade-off in designing a label tree, leading to more balanced trees. Experiments are performed on large scale image classification with 10184 classes and 9 million images. We demonstrate significant improvements in test accuracy and efficiency with less training time and more balanced trees compared to the previous state of the art by Bengio et al. 1
6 0.14763959 168 nips-2011-Maximum Margin Multi-Instance Learning
7 0.13027181 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
8 0.11919285 261 nips-2011-Sparse Filtering
9 0.11539044 150 nips-2011-Learning a Distance Metric from a Network
10 0.09495227 258 nips-2011-Sparse Bayesian Multi-Task Learning
11 0.089757822 231 nips-2011-Randomized Algorithms for Comparison-based Search
12 0.087924182 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes
13 0.086511478 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
14 0.085470103 154 nips-2011-Learning person-object interactions for action recognition in still images
15 0.083765082 244 nips-2011-Selecting Receptive Fields in Deep Networks
16 0.083263047 165 nips-2011-Matrix Completion for Multi-label Image Classification
17 0.082737513 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms
18 0.081995517 157 nips-2011-Learning to Search Efficiently in High Dimensions
19 0.081495911 177 nips-2011-Multi-armed bandits on implicit metric spaces
20 0.081320815 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
topicId topicWeight
[(0, 0.254), (1, 0.124), (2, -0.156), (3, 0.098), (4, 0.04), (5, 0.027), (6, 0.062), (7, -0.024), (8, -0.029), (9, -0.101), (10, -0.057), (11, 0.086), (12, -0.048), (13, 0.173), (14, -0.033), (15, 0.034), (16, -0.067), (17, 0.125), (18, 0.048), (19, -0.087), (20, 0.055), (21, -0.033), (22, 0.076), (23, 0.037), (24, -0.007), (25, 0.021), (26, 0.014), (27, -0.06), (28, 0.017), (29, -0.043), (30, -0.052), (31, 0.074), (32, -0.098), (33, 0.034), (34, 0.099), (35, -0.06), (36, -0.038), (37, 0.085), (38, 0.166), (39, -0.037), (40, -0.008), (41, 0.003), (42, -0.125), (43, 0.059), (44, 0.037), (45, 0.088), (46, -0.058), (47, -0.185), (48, -0.009), (49, -0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.96215826 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
Author: Kristen Grauman, Fei Sha, Sung J. Hwang
Abstract: We introduce an approach to learn discriminative visual representations while exploiting external semantic knowledge about object category relationships. Given a hierarchical taxonomy that captures semantic similarity between the objects, we learn a corresponding tree of metrics (ToM). In this tree, we have one metric for each non-leaf node of the object hierarchy, and each metric is responsible for discriminating among its immediate subcategory children. Specifically, a Mahalanobis metric learned for a given node must satisfy the appropriate (dis)similarity constraints generated only among its subtree members’ training instances. To further exploit the semantics, we introduce a novel regularizer coupling the metrics that prefers a sparse disjoint set of features to be selected for each metric relative to its ancestor (supercategory) nodes’ metrics. Intuitively, this reflects that visual cues most useful to distinguish the generic classes (e.g., feline vs. canine) should be different than those cues most useful to distinguish their component fine-grained classes (e.g., Persian cat vs. Siamese cat). We validate our approach with multiple image datasets using the WordNet taxonomy, show its advantages over alternative metric learning approaches, and analyze the meaning of attribute features selected by our algorithm. 1
2 0.74576622 171 nips-2011-Metric Learning with Multiple Kernels
Author: Jun Wang, Huyen T. Do, Adam Woznica, Alexandros Kalousis
Abstract: Metric learning has become a very active research field. The most popular representative–Mahalanobis metric learning–can be seen as learning a linear transformation and then computing the Euclidean metric in the transformed space. Since a linear transformation might not always be appropriate for a given learning problem, kernelized versions of various metric learning algorithms exist. However, the problem then becomes finding the appropriate kernel function. Multiple kernel learning addresses this limitation by learning a linear combination of a number of predefined kernels; this approach can be also readily used in the context of multiple-source learning to fuse different data sources. Surprisingly, and despite the extensive work on multiple kernel learning for SVMs, there has been no work in the area of metric learning with multiple kernel learning. In this paper we fill this gap and present a general approach for metric learning with multiple kernel learning. Our approach can be instantiated with different metric learning algorithms provided that they satisfy some constraints. Experimental evidence suggests that our approach outperforms metric learning with an unweighted kernel combination and metric learning with cross-validation based kernel selection. 1
3 0.69868523 150 nips-2011-Learning a Distance Metric from a Network
Author: Blake Shaw, Bert Huang, Tony Jebara
Abstract: Many real-world networks are described by both connectivity information and features for every node. To better model and understand these networks, we present structure preserving metric learning (SPML), an algorithm for learning a Mahalanobis distance metric from a network such that the learned distances are tied to the inherent connectivity structure of the network. Like the graph embedding algorithm structure preserving embedding, SPML learns a metric which is structure preserving, meaning a connectivity algorithm such as k-nearest neighbors will yield the correct connectivity when applied using the distances from the learned metric. We show a variety of synthetic and real-world experiments where SPML predicts link patterns from node features more accurately than standard techniques. We further demonstrate a method for optimizing SPML based on stochastic gradient descent which removes the running-time dependency on the size of the network and allows the method to easily scale to networks of thousands of nodes and millions of edges. 1
4 0.65384108 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
Author: Bin Zhao, Fei Li, Eric P. Xing
Abstract: Most previous research on image categorization has focused on medium-scale data sets, while large-scale image categorization with millions of images from thousands of categories remains a challenge. With the emergence of structured large-scale dataset such as the ImageNet, rich information about the conceptual relationships between images, such as a tree hierarchy among various image categories, become available. As human cognition of complex visual world benefits from underlying semantic relationships between object classes, we believe a machine learning system can and should leverage such information as well for better performance. In this paper, we employ such semantic relatedness among image categories for large-scale image categorization. Specifically, a category hierarchy is utilized to properly define loss function and select common set of features for related categories. An efficient optimization method based on proximal approximation and accelerated parallel gradient method is introduced. Experimental results on a subset of ImageNet containing 1.2 million images from 1000 categories demonstrate the effectiveness and promise of our proposed approach. 1
5 0.62796718 168 nips-2011-Maximum Margin Multi-Instance Learning
Author: Hua Wang, Heng Huang, Farhad Kamangar, Feiping Nie, Chris H. Ding
Abstract: Multi-instance learning (MIL) considers input as bags of instances, in which labels are assigned to the bags. MIL is useful in many real-world applications. For example, in image categorization semantic meanings (labels) of an image mostly arise from its regions (instances) instead of the entire image (bag). Existing MIL methods typically build their models using the Bag-to-Bag (B2B) distance, which are often computationally expensive and may not truly reflect the semantic similarities. To tackle this, in this paper we approach MIL problems from a new perspective using the Class-to-Bag (C2B) distance, which directly assesses the relationships between the classes and the bags. Taking into account the two major challenges in MIL, high heterogeneity on data and weak label association, we propose a novel Maximum Margin Multi-Instance Learning (M3 I) approach to parameterize the C2B distance by introducing the class specific distance metrics and the locally adaptive significance coefficients. We apply our new approach to the automatic image categorization tasks on three (one single-label and two multilabel) benchmark data sets. Extensive experiments have demonstrated promising results that validate the proposed method.
6 0.62658447 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
7 0.58639514 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
8 0.56049746 293 nips-2011-Understanding the Intrinsic Memorability of Images
9 0.55873144 279 nips-2011-Target Neighbor Consistent Feature Weighting for Nearest Neighbor Classification
10 0.53365666 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
11 0.52781677 252 nips-2011-ShareBoost: Efficient multiclass learning with feature sharing
12 0.50575829 19 nips-2011-Active Classification based on Value of Classifier
13 0.50427973 274 nips-2011-Structure Learning for Optimization
14 0.49469909 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
15 0.49316508 247 nips-2011-Semantic Labeling of 3D Point Clouds for Indoor Scenes
16 0.49185026 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation
17 0.48665854 254 nips-2011-Similarity-based Learning via Data Driven Embeddings
18 0.48579562 157 nips-2011-Learning to Search Efficiently in High Dimensions
19 0.48034862 290 nips-2011-Transfer Learning by Borrowing Examples for Multiclass Object Detection
20 0.47608551 74 nips-2011-Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
topicId topicWeight
[(0, 0.025), (4, 0.063), (20, 0.06), (26, 0.021), (31, 0.047), (33, 0.073), (39, 0.26), (43, 0.038), (45, 0.107), (57, 0.066), (65, 0.022), (74, 0.061), (83, 0.028), (84, 0.014), (99, 0.044)]
simIndex simValue paperId paperTitle
same-paper 1 0.78698742 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
Author: Kristen Grauman, Fei Sha, Sung J. Hwang
Abstract: We introduce an approach to learn discriminative visual representations while exploiting external semantic knowledge about object category relationships. Given a hierarchical taxonomy that captures semantic similarity between the objects, we learn a corresponding tree of metrics (ToM). In this tree, we have one metric for each non-leaf node of the object hierarchy, and each metric is responsible for discriminating among its immediate subcategory children. Specifically, a Mahalanobis metric learned for a given node must satisfy the appropriate (dis)similarity constraints generated only among its subtree members’ training instances. To further exploit the semantics, we introduce a novel regularizer coupling the metrics that prefers a sparse disjoint set of features to be selected for each metric relative to its ancestor (supercategory) nodes’ metrics. Intuitively, this reflects that visual cues most useful to distinguish the generic classes (e.g., feline vs. canine) should be different than those cues most useful to distinguish their component fine-grained classes (e.g., Persian cat vs. Siamese cat). We validate our approach with multiple image datasets using the WordNet taxonomy, show its advantages over alternative metric learning approaches, and analyze the meaning of attribute features selected by our algorithm. 1
2 0.70884126 297 nips-2011-Universal low-rank matrix recovery from Pauli measurements
Author: Yi-kai Liu
Abstract: We study the problem of reconstructing an unknown matrix M of rank r and dimension d using O(rd poly log d) Pauli measurements. This has applications in quantum state tomography, and is a non-commutative analogue of a well-known problem in compressed sensing: recovering a sparse vector from a few of its Fourier coefficients. We show that almost all sets of O(rd log^6 d) Pauli measurements satisfy the rank-r restricted isometry property (RIP). This implies that M can be recovered from a fixed (“universal”) set of Pauli measurements, using nuclear-norm minimization (e.g., the matrix Lasso), with nearly-optimal bounds on the error. A similar result holds for any class of measurements that use an orthonormal operator basis whose elements have small operator norm. Our proof uses Dudley’s inequality for Gaussian processes, together with bounds on covering numbers obtained via entropy duality. 1
3 0.70579016 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure
Author: Fatma K. Karzan, Arkadi S. Nemirovski, Boris T. Polyak, Anatoli Juditsky
Abstract: We discuss new methods for the recovery of signals with block-sparse structure, based on ℓ1-minimization. Our emphasis is on the efficiently computable error bounds for the recovery routines. We optimize these bounds with respect to the method parameters to construct the estimators with improved statistical properties. We justify the proposed approach with an oracle inequality which links the properties of the recovery algorithms and the best estimation performance. 1
4 0.67555308 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons
Author: Yan Karklin, Eero P. Simoncelli
Abstract: Efficient coding provides a powerful principle for explaining early sensory coding. Most attempts to test this principle have been limited to linear, noiseless models, and when applied to natural images, have yielded oriented filters consistent with responses in primary visual cortex. Here we show that an efficient coding model that incorporates biologically realistic ingredients – input and output noise, nonlinear response functions, and a metabolic cost on the firing rate – predicts receptive fields and response nonlinearities similar to those observed in the retina. Specifically, we develop numerical methods for simultaneously learning the linear filters and response nonlinearities of a population of model neurons, so as to maximize information transmission subject to metabolic costs. When applied to an ensemble of natural images, the method yields filters that are center-surround and nonlinearities that are rectifying. The filters are organized into two populations, with On- and Off-centers, which independently tile the visual space. As observed in the primate retina, the Off-center neurons are more numerous and have filters with smaller spatial extent. In the absence of noise, our method reduces to a generalized version of independent components analysis, with an adapted nonlinear “contrast” function; in this case, the optimal filters are localized and oriented.
5 0.62126887 182 nips-2011-Nearest Neighbor based Greedy Coordinate Descent
Author: Inderjit S. Dhillon, Pradeep K. Ravikumar, Ambuj Tewari
Abstract: Increasingly, optimization problems in machine learning, especially those arising from high-dimensional statistical estimation, have a large number of variables. Modern statistical estimators developed over the past decade have statistical or sample complexity that depends only weakly on the number of parameters when there is some structure to the problem, such as sparsity. A central question is whether similar advances can be made in their computational complexity as well. In this paper, we propose strategies that indicate that such advances can indeed be made. In particular, we investigate the greedy coordinate descent algorithm, and note that performing the greedy step efficiently weakens the costly dependence on the problem size provided the solution is sparse. We then propose a suite of methods that perform these greedy steps efficiently by a reduction to nearest neighbor search. We also devise a more amenable form of greedy descent for composite non-smooth objectives, as well as several approximate variants of such greedy descent. We develop a practical implementation of our algorithm that combines greedy coordinate descent with locality sensitive hashing. Without tuning the latter data structure, we are not only able to significantly speed up the vanilla greedy method, but also outperform cyclic descent when the problem size becomes large. Our results indicate the effectiveness of our nearest neighbor strategies, and also point to many open questions regarding the development of computational geometric techniques tailored towards first-order optimization methods.
6 0.60642582 19 nips-2011-Active Classification based on Value of Classifier
7 0.56705076 209 nips-2011-Orthogonal Matching Pursuit with Replacement
8 0.56053066 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
9 0.55857664 127 nips-2011-Image Parsing with Stochastic Scene Grammar
10 0.5581125 154 nips-2011-Learning person-object interactions for action recognition in still images
11 0.55803579 223 nips-2011-Probabilistic Joint Image Segmentation and Labeling
12 0.55579776 227 nips-2011-Pylon Model for Semantic Segmentation
13 0.55554909 168 nips-2011-Maximum Margin Multi-Instance Learning
14 0.55339146 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation
15 0.55035347 157 nips-2011-Learning to Search Efficiently in High Dimensions
16 0.54911894 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation
17 0.54625458 303 nips-2011-Video Annotation and Tracking with Active Learning
18 0.54578125 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
19 0.5436787 103 nips-2011-Generalization Bounds and Consistency for Latent Structural Probit and Ramp Loss
20 0.54256254 263 nips-2011-Sparse Manifold Clustering and Embedding