nips nips2006 nips2006-78 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Frank Moosmann, Bill Triggs, Frederic Jurie
Abstract: Some of the most effective recent methods for content-based image classification work by extracting dense or sparse local image descriptors, quantizing them according to a coding rule such as k-means vector quantization, accumulating histograms of the resulting “visual word” codes over the image, and classifying these with a conventional classifier such as an SVM. Large numbers of descriptors and large codebooks are needed for good results and this becomes slow using k-means. We introduce Extremely Randomized Clustering Forests – ensembles of randomly created clustering trees – and show that these provide more accurate results, much faster training and testing and good resistance to background clutter in several state-of-the-art image classification tasks. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Large numbers of descriptors and large codebooks are needed for good results and this becomes slow using k-means. [sent-4, score-0.461]
2 We introduce Extremely Randomized Clustering Forests – ensembles of randomly created clustering trees – and show that these provide more accurate results, much faster training and testing and good resistance to background clutter in several state-of-the-art image classification tasks. [sent-5, score-0.858]
3 1 Introduction Many of the most popular current methods for image classification represent images as collections of independent patches characterized by local visual descriptors. [sent-6, score-0.573]
4 Various local descriptors exist with different degrees of geometric and photometric invariance, but all encode the local patch appearance as a numerical vector and the more discriminant ones tend to be high-dimensional. [sent-8, score-0.327]
5 The usual way to handle the resulting set of descriptor vectors is to vector-quantize them to produce so-called textons [12] or visual words [5, 22]. [sent-9, score-0.559]
6 The introduction of such visual codebooks has allowed significant advances in image classification, especially when combined with bag-of-words models inspired by text analysis [5, 7, 22, 24, 25]. [sent-10, score-0.556]
7 These methods are generative but some recent approaches focus on building more discriminative codebooks [20, 24]. [sent-13, score-0.286]
8 The above methods give impressive results but they are computationally expensive owing to the cost of assigning visual descriptors to visual words during training and use. [sent-14, score-0.777]
9 One observation is that (small) ensembles of trees eliminate many of the disadvantages of single-tree-based coders without losing the speed advantages of trees. [sent-18, score-0.659]
10 The second is that classification trees contain a lot of valuable information about locality in descriptor space that is not apparent in the final class labels. [sent-19, score-0.662]
11 One can exploit this by training them for classification then ignoring the class labels and using them as “clustering trees” – simple spatial partitioners that assign a distinct region label (visual word) to each leaf. [sent-20, score-0.168]
12 Figure 1: Using ERC-Forests as visual codebooks in bag-of-feature image classification. [sent-24, score-0.556]
13 We introduce Extremely Randomized Clustering Forests (ERC-Forests) – ensembles of randomly created clustering trees. [sent-25, score-0.148]
14 We show that these have good resistance to background clutter and that they provide much faster training and testing and more accurate results than conventional k-means in several state-of-the-art image classification tasks. [sent-26, score-0.359]
15 In the rest of the paper, we first explain how decision trees can provide good visual vocabularies, then we describe our approach and present experimental results and conclusions. [sent-27, score-0.608]
16 2 Tree Structured Visual Dictionaries Our overall goal is to classify images according to the object classes that they contain (see figure 1). [sent-28, score-0.182]
17 We will do this by selecting or sampling patches from the image, characterizing them by vectors of local visual descriptors, and coding (quantizing) the vectors using a learned visual dictionary, i.e. [sent-29, score-0.953]
18 a process that assigns discrete labels to descriptors, with similar descriptors having a high probability of being assigned the same label. [sent-31, score-0.303]
19 As in text categorization, the occurrences of each label (“visual word”) are then counted to build a global histogram (“bag of words”) summarizing the image (“document”) contents. [sent-32, score-0.29]
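The pipeline in the two sentences above (quantize each local descriptor against a learned codebook, then count visual-word occurrences into a per-image histogram) can be sketched as follows. This is a toy illustration rather than the authors' code: a k-means-style centroid codebook stands in for the learned dictionary, and the descriptor dimension and codebook size are arbitrary placeholders.

```python
import numpy as np

def bow_histogram(descriptors, centroids):
    """Quantize local descriptors against a centroid codebook and
    accumulate a bag-of-words histogram for one image.

    descriptors : (n, d) array of local descriptor vectors
    centroids   : (k, d) array of codebook centres (e.g. from k-means)
    returns     : (k,) histogram of visual-word counts
    """
    # Nearest-centroid assignment: this O(n*k*d) search is exactly the
    # cost that tree-based coders are meant to avoid.
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    words = dists.argmin(axis=1)
    return np.bincount(words, minlength=len(centroids))

# Toy usage with random data standing in for real image descriptors.
rng = np.random.default_rng(0)
descs = rng.normal(size=(200, 16))      # 200 patches, 16-D descriptors
codebook = rng.normal(size=(50, 16))    # 50 "visual words"
print(bow_histogram(descs, codebook).shape)   # (50,)
```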
20 Unlike text, visual ‘words’ are not intrinsic entities and different quantization methods can lead to very different performances. [sent-34, score-0.3]
21 Computational efficiency is important because a typical image yields 10^3–10^4 local descriptors and data sets often contain thousands of images. [sent-35, score-0.437]
22 Also, many of the descriptors generally lie on the background not the object being classified, so the coding method needs to be able to learn a discriminative labelling despite considerable background ‘noise’. [sent-36, score-0.654]
23 Visual coding based on K-means vector quantization is effective but slow because it relies on nearest neighbor search, which remains hard to accelerate in high dimensional descriptor spaces despite years of research on spatial data structures. [sent-38, score-0.588]
24 Component-wise decision trees offer logarithmic-time coding, but individual trees can rarely compete with a good K-means coding: each path through the tree typically accesses only a few of the feature dimensions, so there is little scope for producing a consensus over many different dimensions. [sent-42, score-0.938]
25 Ensembles of random trees [4] seem particularly suitable for visual dictionaries owing to their simplicity, speed and performance [11]. [sent-51, score-0.655]
26 Sufficiently diverse trees can be constructed using randomized data splits or samples [4]. [sent-52, score-0.513]
27 ER tree construction is rapid, depends only weakly on the dimensionality and requires relatively little memory. [sent-55, score-0.195]
28 Methods such as [11, 8] classify descriptors by majority voting over the tree-assigned class labels. [sent-57, score-0.329]
29 Our method works differently after the trees are built. [sent-59, score-0.351]
30 It uses the trees as spatial partitioning methods not classifiers, assigning each leaf of each tree a distinct region label (visual word). [sent-60, score-0.818]
31 For the overall image classification tasks studied here, histograms of these leaf labels are then accumulated over the whole image and a global SVM classifier is applied. [sent-61, score-0.574]
32 Our approach is thus related to clustering trees – decision trees whose leaves define a spatial partitioning or grouping [3, 13]. [sent-62, score-0.919]
33 Such trees are able to find natural clusters in high dimensional spaces. [sent-63, score-0.351]
34 They can be built without external class labels, but if labels are available they can be used to guide the tree construction. [sent-64, score-0.279]
35 Ensemble methods and particularly forests of extremely randomized trees again offer considerable performance advantages here. [sent-65, score-0.75]
36 The next section shows how such Extremely Randomized Clustering Forests can be used to produce efficient visual vocabularies for image classification tasks. [sent-66, score-0.449]
37 3 Extremely Randomized Clustering Forests (ERC-Forests) Our goal is to build a discriminative coding method. [sent-67, score-0.233]
38 Our method starts by building randomized decision trees that predict class labels y from visual descriptor vectors d = (f1, . . . , fD). [sent-68, score-1.153]
39 For notational simplicity we assume that all of the descriptors from a given image share the same label y. [sent-75, score-0.452]
40 We train the trees using a labeled (for now) training set L = {(dn, yn), n = 1, . . . , N}. [sent-76, score-0.388]
41 However we use the trees only for spatial coding, not classification per se. [sent-80, score-0.44]
42 During a query, for each descriptor tested, each tree is traversed from the root down to a leaf and the returned label is the unique leaf index, not the (set of) descriptor label(s) y associated with the leaf. [sent-81, score-1.226]
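A minimal sketch of the query-time traversal just described: the descriptor is routed from the root down to a leaf, and the code returned is the leaf index rather than any class label. The node layout and the single-feature threshold tests below are illustrative assumptions consistent with the surrounding text, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # Internal nodes carry a Boolean test "descriptor[feature] <= threshold";
    # leaves carry only a leaf index (the "visual word" returned at query time).
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    leaf_index: Optional[int] = None

def leaf_index(tree: Node, descriptor) -> int:
    """Descend from the root to a leaf and return the leaf index,
    ignoring any class labels stored during training."""
    node = tree
    while node.leaf_index is None:
        node = node.left if descriptor[node.feature] <= node.threshold else node.right
    return node.leaf_index

# Tiny hand-built tree over 2-D descriptors, for illustration only.
tree = Node(feature=0, threshold=0.5,
            left=Node(leaf_index=0),
            right=Node(feature=1, threshold=0.2,
                       left=Node(leaf_index=1), right=Node(leaf_index=2)))
print(leaf_index(tree, [0.7, 0.9]))   # -> 2
```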
43 At each node t corresponding to descriptor space region Rt, two children l, r are created by choosing a Boolean test Tt that divides Rt into two disjoint regions, Rt = Rl ∪ Rr with Rl ∩ Rr = ∅. [sent-84, score-0.352]
44 Large values of (Smin, Tmax) (e.g. (1, D) for normal ID3 decision tree learning) produce highly discriminant trees with little diversity, while Smin = 0 or Tmax = 1 produces completely random trees. [sent-95, score-0.651]
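The role of Smin and Tmax can be made concrete with a sketch of the split-selection loop: random Boolean tests are drawn one at a time, the search stops as soon as a test scores at least Smin or after Tmax trials, and the best test seen is kept. Everything specific below is an assumption for illustration: tests are single-feature thresholds drawn uniformly over the observed range, and a normalized information gain stands in for the node scoring function, which this excerpt does not define.

```python
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def choose_split(descriptors, labels, s_min=0.5, t_max=50, rng=None):
    """Extremely randomized split selection (sketch): try random
    (feature, threshold) tests, stop once one scores >= s_min or after
    t_max trials, and return the best test found with its score."""
    if rng is None:
        rng = np.random.default_rng()
    n, d = descriptors.shape
    base = entropy(labels)
    best_test, best_score = None, -np.inf
    for _ in range(t_max):
        f = rng.integers(d)                              # random feature
        lo, hi = descriptors[:, f].min(), descriptors[:, f].max()
        thr = rng.uniform(lo, hi)                        # random threshold
        go_left = descriptors[:, f] <= thr
        if go_left.all() or (~go_left).all():
            continue                                     # degenerate split
        n_l = go_left.sum()
        gain = base - (n_l * entropy(labels[go_left]) +
                       (n - n_l) * entropy(labels[~go_left])) / n
        score = gain / max(base, 1e-12)                  # normalize to [0, 1]
        if score > best_score:
            best_test, best_score = (f, thr), score
        if score >= s_min:
            break                                        # good enough: accept
    return best_test, best_score

# Toy usage on synthetic labeled descriptors.
X = np.random.default_rng(1).normal(size=(100, 8))
y = (X[:, 3] > 0).astype(int)
print(choose_split(X, y, s_min=0.5, t_max=25, rng=np.random.default_rng(2)))
```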
45 Compared to standard decision tree learning, the trees built using random decisions are larger and have higher variance. [sent-97, score-0.631]
46 Class label variance can be reduced by voting over the ensemble of trees (e.g. [sent-98, score-0.511]
47 [15]), but here, instead of voting we treat each leaf in each tree as a separate visual word and stack the leaf indices from each tree into an extended code vector for each input descriptor, leaving the integration of votes to the final classifier. [sent-100, score-1.134]
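The "extended code vector" idea can be pictured with the toy sketch below, which is not the authors' code: each tree is abstracted as a function mapping a descriptor to one of its leaf indices (hash-based stubs stand in for real trees), and each tree's leaves are offset into one concatenated vocabulary before the per-image histogram is accumulated.

```python
import numpy as np

def erc_code(descriptor, coders, leaves_per_tree):
    """Stack the leaf indices returned by each tree into one extended code.

    coders          : list of functions, each mapping a descriptor to a leaf
                      index in [0, leaves_per_tree[t]) for tree t
    leaves_per_tree : number of leaves of each tree
    returns         : indices into a single concatenated vocabulary whose
                      size is the total number of leaves over all trees
    """
    offsets = np.cumsum([0] + list(leaves_per_tree[:-1]))
    return [off + coder(descriptor) for coder, off in zip(coders, offsets)]

def image_histogram(descriptors, coders, leaves_per_tree):
    """Accumulate the bag-of-leaves histogram for one image: every descriptor
    contributes one count per tree; vote integration is left to the final SVM."""
    hist = np.zeros(sum(leaves_per_tree), dtype=np.int64)
    for d in descriptors:
        for idx in erc_code(d, coders, leaves_per_tree):
            hist[idx] += 1
    return hist

# Toy usage: two fake "trees", each a stub that hashes the descriptor to a leaf.
coders = [lambda d: int(abs(hash(tuple(d))) % 4),
          lambda d: int(abs(hash(tuple(reversed(d)))) % 4)]
descs = [(0.1, 0.9), (0.3, 0.3), (0.1, 0.9)]
print(image_histogram(descs, coders, leaves_per_tree=[4, 4]))   # length-8 histogram
```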
48 [10]), with each tree being responsible for distributing the data across its own set of clusters. [sent-103, score-0.195]
49 Classifier forests are characterized by Breiman’s bound on the asymptotic generalization error [4], PE* ≤ ρ(1 − s^2)/s^2, where s measures the strength of the individual trees and ρ measures the correlation between them in terms of the raw margin. [sent-104, score-0.523]
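For readability, the bound quoted above in display form (a transcription only; in Breiman's paper the correlation term is the mean pairwise correlation of the raw margins):

```latex
% Breiman's bound [4] on the asymptotic generalization error PE* of a
% classifier forest: s is the strength of the individual trees and \rho
% the raw-margin correlation between them.
PE^{*} \;\le\; \frac{\rho\,(1 - s^{2})}{s^{2}}
```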
50 Experimentally, the trees appear to be rather diverse while still remaining relatively strong, which should lead to good error bounds. [sent-106, score-0.351]
51 In the experiments below, local features are extracted from the training images by sampling sub-windows at random positions and scales, and coding them using a visual descriptor function. [sent-108, score-0.787]
52 Right: some test patches that were assigned to a particular ‘car’ leaf (left) and a particular ‘bike’ one (right). [sent-113, score-0.29]
53 One can also prune during construction, which is faster but does not allow the number of leaf nodes to be controlled directly. [sent-115, score-0.214]
54 In use, the trees transform each descriptor into a set of leaf node indices with one element from each tree. [sent-116, score-0.884]
55 Independently of the codebook, the denser the sampling, the better the results, so typically we sample images more densely during testing than during codebook training. [sent-118, score-0.198]
56 The worst-case complexity for building a tree is O(Tmax N k), where N is the number of patches and k is the number of clusters/leaf nodes before pruning. [sent-122, score-0.336]
57 With adversarial data the method cannot guarantee balanced trees, so it cannot do better than this, but in our experiments on real data we always obtained well-balanced trees at a practical complexity of around O(Tmax N log k). [sent-123, score-0.814]
58 The dependence on data dimensionality D is hidden in the constant Tmax, which needs to be set large enough to filter out irrelevant feature dimensions, thus providing better coding and more balanced trees. [sent-124, score-0.173]
59 In contrast, k-means has a complexity of O(DN k), which is more than 10^4 times larger for our 768-D wavelet descriptor with N = 20000 image patches and k = 5000 clusters, not counting the number of iterations that k-means has to perform. [sent-126, score-0.635]
60 Our method is also faster in use – a useful property given that reliable image classification requires large numbers of subwindows to be labelled [18, 24]. [sent-127, score-0.218]
61 Labeling a descriptor with a balanced tree requires O(log k) operations whereas k-means costs O(kD). [sent-128, score-0.562]
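The cost comparison in the last few sentences can be checked with rough operation counts. The figures below only reproduce the arithmetic; the Tmax value is an assumed placeholder, since the excerpt does not state the setting used, and k-means would additionally repeat its assignment pass over many iterations.

```python
from math import log2

# Rough operation counts for the comparison above.
D, N, k = 768, 20_000, 5_000      # descriptor dim, patches, clusters/leaves
T_max = 50                        # assumed number of candidate tests per node

kmeans_per_iter = D * N * k                   # one k-means assignment pass
erc_worst       = T_max * N * k               # worst-case tree build, O(Tmax N k)
erc_typical     = T_max * N * log2(k)         # observed ~O(Tmax N log k)

print(f"k-means pass:    {kmeans_per_iter:.2e}")   # ~7.7e10
print(f"ERC worst case:  {erc_worst:.2e}")         # ~5.0e09
print(f"ERC typical:     {erc_typical:.2e}")       # ~1.2e07

# Query-time cost per descriptor: balanced-tree descent vs. nearest centroid.
print(f"tree lookup ops: {log2(k):.1f}")           # ~12
print(f"k-means lookup:  {k * D:.2e}")             # ~3.8e06
```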
62 GRAZ-02 (figure 2-left) contains three object categories – bicycles (B), cars (C), persons (P) – and negatives (N, meaning that none of B,C,P are present). [sent-134, score-0.245]
63 Our color descriptor uses raw HSL color pixels to produce a 768-D feature vector (16×16 pixels × 3 colors). [sent-139, score-0.419]
64 Our color wavelet descriptor transforms this into another 768-D vector using a 16×16 Haar wavelet transform. [sent-140, score-0.495]
65 Finally, we tested the popular grayscale SIFT descriptor [14], which returns 128-D vectors (4×4 histograms of 8 orientations). [sent-141, score-0.418]
66 The method is randomized so we report means and variances over 10 learning runs. [sent-143, score-0.162]
67 In contrast, Tmax has a significant influence on performance. Figure 3: ‘Bike’ visual words for 4 different images. [sent-146, score-0.216]
68 The brightness denotes the posterior probability that the visual word at the given image position is labelled ‘bike’. [sent-147, score-0.419]
69 Our algorithm’s ability to produce meaningful visual words is illustrated in figure 3. [sent-150, score-0.248]
70 Each white dot corresponds to the center of an image sub-window that reached an unmixed leaf node for the given object category (i.e. [sent-153, score-0.534]
71 all of the training vectors belonging to the leaf are labeled with that category). [sent-155, score-0.218]
72 Note that even though they have been learned on entire images without object segmentation, the visual vocabulary is discriminative enough to detect local structures in the test images that correspond well with representative object fragments, as illustrated in figure 2(right). [sent-156, score-0.741]
73 The tests here were for individual object categories versus negatives (N). [sent-157, score-0.242]
74 We took 300 images from each category, using images with even numbers for training and ones with odd numbers for testing. [sent-158, score-0.185]
75 For Setting 1 tests we trained on the whole image as in [19], while for Setting 2 tests we used the segmentation masks provided with the images to train on the objects alone without background. [sent-159, score-0.328]
76 For the GRAZ-02 database the wavelet descriptors gave the best performance. [sent-160, score-0.336]
77 Remarkably, using segmentation masks during training does not improve the image classification performance. [sent-173, score-0.251]
78 Unless otherwise stated, 20 000 features (67 per image) were used to learn 1000 spatial bins per tree for 5 trees, and 8000 patches were sampled per image to build the resulting 5000-D histograms. [sent-176, score-0.685]
79 The histograms are binarized using trivial thresholding at count 1 before being fed to the global linear SVM image classifier. [sent-177, score-0.248]
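The final classification stage (binarize the bag-of-leaves histogram at count 1, then train a global linear SVM) might look roughly like this. It is a sketch only: scikit-learn's LinearSVC and the randomly generated toy histograms and labels are stand-ins, not the actual experimental setup.

```python
import numpy as np
from sklearn.svm import LinearSVC

def binarize(histograms, threshold=1):
    """Trivial thresholding at count 1: a visual word either occurs
    in the image or it does not."""
    return (histograms >= threshold).astype(np.float64)

# Toy stand-in data: 40 images, 5000-D bag-of-leaves histograms, binary labels.
rng = np.random.default_rng(0)
H_train = rng.poisson(0.4, size=(40, 5000))
y_train = rng.integers(0, 2, size=40)

clf = LinearSVC(C=1.0)              # stand-in for the paper's global linear SVM
clf.fit(binarize(H_train), y_train)
print(clf.predict(binarize(H_train[:5])))
```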
80 Note that we were not able to extend the k-means curve beyond 20 000 windows per image owing to prohibitive execution times. [sent-184, score-0.232]
81 The figure also shows results for ‘unsupervised trees’ – ERC-Forests built without using the class labels during tree construction. [sent-185, score-0.279]
82 The algorithm remains the same, but the node scoring function is defined as the ratio between the splits so as to encourage balanced trees similar to randomized KD-trees. [sent-186, score-0.61]
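To make the unsupervised scoring rule concrete: the "ratio between the splits" is not spelled out in this excerpt, so the min/max form below is only one plausible reading; it would simply replace the label-based score in the split-selection sketch given earlier.

```python
def balance_score(n_left, n_right):
    """Unsupervised node score when class labels are ignored: the ratio of the
    smaller child to the larger, which is 1 for a perfectly balanced split and
    0 for a degenerate one (one plausible reading of the text above)."""
    if n_left == 0 or n_right == 0:
        return 0.0
    return min(n_left, n_right) / max(n_left, n_right)

print(balance_score(480, 520))   # ~0.92: nearly balanced
print(balance_score(10, 990))    # ~0.01: very unbalanced
```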
83 Figure 4(b) shows that codebooks with around 5000 entries (1000 per tree) suffice for good results. [sent-192, score-0.243]
84 Figure 4(c) shows the effect of increasing the number of features used to build the codebooks. Figure 4: Evaluation of the parameters for B vs. N. [sent-194, score-0.258]
85 Also, if the trees are pruned too heavily they lose discriminative power: it is better to grow them fully and do without pruning. [sent-199, score-0.407]
86 Figure 4(d) shows that increasing the number of trees from 1 to 5 reduces the variance and increases the accuracy, with little improvement beyond this. [sent-201, score-0.351]
87 Here, the number of leaves per tree was kept constant at 1000, so doubling the number of trees effectively doubles the vocabulary size. [sent-202, score-0.711]
88 Just 73 patches per image (50 000 in total over the 684 training images) were used to build the codebook. [sent-208, score-0.393]
89 Building the histograms for the 684 training and 689 test images with 10 000 patches per image took only a few hours. [sent-217, score-0.476]
90 They use the same kind of tree structures to classify images directly, without introducing the vocabulary layer that we propose. [sent-221, score-0.342]
91 Using SIFT descriptors we get an EER classification rate of 85. [sent-227, score-0.263]
92 100 patches per image were used to build a codebook with 4 trees. [sent-229, score-0.48]
93 5 Conclusions Image classifiers based on bags of local descriptors give state-of-the-art results but require the quantization of large numbers of high-dimensional image descriptors into many label classes. [sent-231, score-1.021]
94 Extremely Randomized Clustering Forests provide a rapid and highly discriminative approach to this that outperforms k-means based coding in training time and memory, testing time, and classification accuracy. [sent-232, score-0.21]
95 It is also resistant to background clutter, giving relatively clean segmentation and “popout” of foreground classes even when trained on images that contain significantly more background features than foreground ones. [sent-234, score-0.287]
96 Although trained as classifiers, the trees are used as descriptor-space quantization rules with the final classification being handled by a separate SVM trained on the leaf indices. [sent-235, score-0.616]
97 This seems to be a promising approach for visual recognition, and may be beneficial in other areas such as object detection and segmentation. [sent-236, score-0.324]
98 Scalability of local image descriptors: A comparative study. [sent-303, score-0.174]
99 Representing and recognizing the visual appearance of materials using three-dimensional textons. [sent-317, score-0.216]
100 Object localization with boosting and weak supervision for generic object recognition. [sent-355, score-0.145]
wordName wordTfidf (topN-words)
[('trees', 0.351), ('descriptor', 0.311), ('descriptors', 0.263), ('tmax', 0.248), ('visual', 0.216), ('codebooks', 0.198), ('tree', 0.195), ('leaf', 0.181), ('forests', 0.172), ('randomized', 0.162), ('image', 0.142), ('codebook', 0.124), ('smin', 0.124), ('coding', 0.117), ('patches', 0.109), ('object', 0.108), ('eer', 0.108), ('classi', 0.104), ('clustering', 0.085), ('quantization', 0.084), ('images', 0.074), ('bikes', 0.074), ('jurie', 0.074), ('moosmann', 0.074), ('vocabulary', 0.073), ('wavelet', 0.073), ('histograms', 0.069), ('voting', 0.066), ('bike', 0.065), ('extremely', 0.065), ('ensembles', 0.063), ('category', 0.062), ('word', 0.061), ('categories', 0.061), ('build', 0.06), ('vocabularies', 0.059), ('tt', 0.059), ('balanced', 0.056), ('discriminative', 0.056), ('background', 0.055), ('sift', 0.055), ('clutter', 0.055), ('eccv', 0.054), ('coders', 0.05), ('csurka', 0.05), ('dance', 0.05), ('geurts', 0.05), ('goldstein', 0.05), ('keypoint', 0.05), ('opelt', 0.05), ('ensemble', 0.047), ('leaves', 0.047), ('iccv', 0.047), ('label', 0.047), ('owing', 0.045), ('per', 0.045), ('built', 0.044), ('spatial', 0.044), ('pascal', 0.043), ('bicycles', 0.043), ('subwindows', 0.043), ('winn', 0.043), ('dictionaries', 0.043), ('surviving', 0.043), ('thresholds', 0.042), ('decision', 0.041), ('histogram', 0.041), ('node', 0.041), ('bag', 0.041), ('labels', 0.04), ('tests', 0.04), ('dn', 0.04), ('nist', 0.039), ('votes', 0.039), ('quantizing', 0.039), ('cvpr', 0.039), ('tested', 0.038), ('color', 0.038), ('segmentation', 0.037), ('cation', 0.037), ('binarized', 0.037), ('rr', 0.037), ('resistance', 0.037), ('supervision', 0.037), ('training', 0.037), ('masks', 0.035), ('ijcv', 0.035), ('hc', 0.035), ('faster', 0.033), ('categorization', 0.033), ('unstable', 0.033), ('foreground', 0.033), ('negatives', 0.033), ('mar', 0.033), ('rt', 0.032), ('local', 0.032), ('produce', 0.032), ('index', 0.032), ('nearest', 0.032), ('building', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
Author: Frank Moosmann, Bill Triggs, Frederic Jurie
Abstract: Some of the most effective recent methods for content-based image classification work by extracting dense or sparse local image descriptors, quantizing them according to a coding rule such as k-means vector quantization, accumulating histograms of the resulting “visual word” codes over the image, and classifying these with a conventional classifier such as an SVM. Large numbers of descriptors and large codebooks are needed for good results and this becomes slow using k-means. We introduce Extremely Randomized Clustering Forests – ensembles of randomly created clustering trees – and show that these provide more accurate results, much faster training and testing and good resistance to background clutter in several state-of-the-art image classification tasks. 1
2 0.20019351 172 nips-2006-Scalable Discriminative Learning for Natural Language Parsing and Translation
Author: Joseph Turian, Benjamin Wellington, I. D. Melamed
Abstract: Parsing and translating natural languages can be viewed as problems of predicting tree structures. For machine learning approaches to these predictions, the diversity and high dimensionality of the structures involved mandate very large training sets. This paper presents a purely discriminative learning method that scales up well to problems of this size. Its accuracy was at least as good as other comparable methods on a standard parsing task. To our knowledge, it is the first purely discriminative learning algorithm for translation with treestructured models. Unlike other popular methods, this method does not require a great deal of feature engineering a priori, because it performs feature selection over a compound feature space as it learns. Experiments demonstrate the method’s versatility, accuracy, and efficiency. Relevant software is freely available at http://nlp.cs.nyu.edu/parser and http://nlp.cs.nyu.edu/GenPar. 1
3 0.17158766 66 nips-2006-Detecting Humans via Their Pose
Author: Alessandro Bissacco, Ming-Hsuan Yang, Stefano Soatto
Abstract: We consider the problem of detecting humans and classifying their pose from a single image. Specifically, our goal is to devise a statistical model that simultaneously answers two questions: 1) is there a human in the image? and, if so, 2) what is a low-dimensional representation of her pose? We investigate models that can be learned in an unsupervised manner on unlabeled images of human poses, and provide information that can be used to match the pose of a new image to the ones present in the training set. Starting from a set of descriptors recently proposed for human detection, we apply the Latent Dirichlet Allocation framework to model the statistics of these features, and use the resulting model to answer the above questions. We show how our model can efficiently describe the space of images of humans with their pose, by providing an effective representation of poses for tasks such as classification and matching, while performing remarkably well in human/non human decision problems, thus enabling its use for human detection. We validate the model with extensive quantitative experiments and comparisons with other approaches on human detection and pose matching. 1
4 0.16207875 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
Author: Andrea Frome, Yoram Singer, Jitendra Malik
Abstract: In this paper we introduce and experiment with a framework for learning local perceptual distance functions for visual recognition. We learn a distance function for each training image as a combination of elementary distances between patch-based visual features. We apply these combined local distance functions to the tasks of image retrieval and classification of novel images. On the Caltech 101 object recognition benchmark, we achieve 60.3% mean recognition across classes using 15 training images per class, which is better than the best published performance by Zhang, et al. 1
5 0.1410215 115 nips-2006-Learning annotated hierarchies from relational data
Author: Daniel M. Roy, Charles Kemp, Vikash K. Mansinghka, Joshua B. Tenenbaum
Abstract: The objects in many real-world domains can be organized into hierarchies, where each internal node picks out a category of objects. Given a collection of features and relations defined over a set of objects, an annotated hierarchy includes a specification of the categories that are most useful for describing each individual feature and relation. We define a generative model for annotated hierarchies and the features and relations that they describe, and develop a Markov chain Monte Carlo scheme for learning annotated hierarchies. We show that our model discovers interpretable structure in several real-world data sets.
6 0.12909082 41 nips-2006-Bayesian Ensemble Learning
7 0.12871331 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
8 0.1270896 185 nips-2006-Subordinate class recognition using relational object models
9 0.11149643 199 nips-2006-Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing
10 0.10864107 55 nips-2006-Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees
11 0.10586496 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
12 0.096016914 73 nips-2006-Efficient Methods for Privacy Preserving Face Detection
13 0.095257983 103 nips-2006-Kernels on Structured Objects Through Nested Histograms
14 0.094229236 50 nips-2006-Chained Boosting
15 0.085212633 122 nips-2006-Learning to parse images of articulated bodies
16 0.081261784 142 nips-2006-Mutagenetic tree Fisher kernel improves prediction of HIV drug resistance from viral genotype
17 0.078748748 42 nips-2006-Bayesian Image Super-resolution, Continued
18 0.07469213 34 nips-2006-Approximate Correspondences in High Dimensions
19 0.072503619 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
20 0.071183547 63 nips-2006-Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods
topicId topicWeight
[(0, -0.249), (1, 0.047), (2, 0.165), (3, -0.087), (4, 0.048), (5, 0.006), (6, -0.216), (7, -0.104), (8, 0.063), (9, -0.183), (10, 0.094), (11, 0.149), (12, -0.006), (13, 0.011), (14, 0.097), (15, 0.043), (16, 0.06), (17, 0.123), (18, 0.043), (19, 0.035), (20, -0.188), (21, -0.011), (22, -0.1), (23, -0.034), (24, -0.041), (25, -0.004), (26, -0.06), (27, -0.034), (28, -0.087), (29, 0.06), (30, 0.051), (31, 0.034), (32, 0.115), (33, 0.012), (34, -0.006), (35, 0.077), (36, 0.037), (37, 0.073), (38, -0.022), (39, -0.051), (40, -0.019), (41, -0.07), (42, -0.031), (43, 0.012), (44, 0.047), (45, -0.081), (46, -0.026), (47, 0.044), (48, 0.097), (49, 0.064)]
simIndex simValue paperId paperTitle
same-paper 1 0.95687175 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
Author: Frank Moosmann, Bill Triggs, Frederic Jurie
Abstract: Some of the most effective recent methods for content-based image classification work by extracting dense or sparse local image descriptors, quantizing them according to a coding rule such as k-means vector quantization, accumulating histograms of the resulting “visual word” codes over the image, and classifying these with a conventional classifier such as an SVM. Large numbers of descriptors and large codebooks are needed for good results and this becomes slow using k-means. We introduce Extremely Randomized Clustering Forests – ensembles of randomly created clustering trees – and show that these provide more accurate results, much faster training and testing and good resistance to background clutter in several state-of-the-art image classification tasks. 1
2 0.65461892 172 nips-2006-Scalable Discriminative Learning for Natural Language Parsing and Translation
Author: Joseph Turian, Benjamin Wellington, I. D. Melamed
Abstract: Parsing and translating natural languages can be viewed as problems of predicting tree structures. For machine learning approaches to these predictions, the diversity and high dimensionality of the structures involved mandate very large training sets. This paper presents a purely discriminative learning method that scales up well to problems of this size. Its accuracy was at least as good as other comparable methods on a standard parsing task. To our knowledge, it is the first purely discriminative learning algorithm for translation with treestructured models. Unlike other popular methods, this method does not require a great deal of feature engineering a priori, because it performs feature selection over a compound feature space as it learns. Experiments demonstrate the method’s versatility, accuracy, and efficiency. Relevant software is freely available at http://nlp.cs.nyu.edu/parser and http://nlp.cs.nyu.edu/GenPar. 1
3 0.6427809 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
Author: Andrea Frome, Yoram Singer, Jitendra Malik
Abstract: In this paper we introduce and experiment with a framework for learning local perceptual distance functions for visual recognition. We learn a distance function for each training image as a combination of elementary distances between patch-based visual features. We apply these combined local distance functions to the tasks of image retrieval and classification of novel images. On the Caltech 101 object recognition benchmark, we achieve 60.3% mean recognition across classes using 15 training images per class, which is better than the best published performance by Zhang, et al. 1
4 0.59375376 41 nips-2006-Bayesian Ensemble Learning
Author: Hugh A. Chipman, Edward I. George, Robert E. Mcculloch
Abstract: We develop a Bayesian “sum-of-trees” model, named BART, where each tree is constrained by a prior to be a weak learner. Fitting and inference are accomplished via an iterative backfitting MCMC algorithm. This model is motivated by ensemble methods in general, and boosting algorithms in particular. Like boosting, each weak learner (i.e., each weak tree) contributes a small amount to the overall model. However, our procedure is defined by a statistical model: a prior and a likelihood, while boosting is defined by an algorithm. This model-based approach enables a full and accurate assessment of uncertainty in model predictions, while remaining highly competitive in terms of predictive accuracy. 1
5 0.58552706 185 nips-2006-Subordinate class recognition using relational object models
Author: Aharon B. Hillel, Daphna Weinshall
Abstract: We address the problem of sub-ordinate class recognition, like the distinction between different types of motorcycles. Our approach is motivated by observations from cognitive psychology, which identify parts as the defining component of basic level categories (like motorcycles), while sub-ordinate categories are more often defined by part properties (like ’jagged wheels’). Accordingly, we suggest a two-stage algorithm: First, a relational part based object model is learnt using unsegmented object images from the inclusive class (e.g., motorcycles in general). The model is then used to build a class-specific vector representation for images, where each entry corresponds to a model’s part. In the second stage we train a standard discriminative classifier to classify subclass instances (e.g., cross motorcycles) based on the class-specific vector representation. We describe extensive experimental results with several subclasses. The proposed algorithm typically gives better results than a competing one-step algorithm, or a two stage algorithm where classification is based on a model of the sub-ordinate class. 1
6 0.5608694 115 nips-2006-Learning annotated hierarchies from relational data
7 0.55076033 52 nips-2006-Clustering appearance and shape by learning jigsaws
8 0.5455395 73 nips-2006-Efficient Methods for Privacy Preserving Face Detection
9 0.52914625 199 nips-2006-Unsupervised Learning of a Probabilistic Grammar for Object Detection and Parsing
10 0.51553261 66 nips-2006-Detecting Humans via Their Pose
11 0.51416099 174 nips-2006-Similarity by Composition
12 0.50660676 122 nips-2006-Learning to parse images of articulated bodies
13 0.48588955 142 nips-2006-Mutagenetic tree Fisher kernel improves prediction of HIV drug resistance from viral genotype
14 0.47868276 55 nips-2006-Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees
15 0.47712576 170 nips-2006-Robotic Grasping of Novel Objects
16 0.46952009 50 nips-2006-Chained Boosting
17 0.46368387 45 nips-2006-Blind Motion Deblurring Using Image Statistics
18 0.44905454 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
19 0.43908229 140 nips-2006-Multiple Instance Learning for Computer Aided Diagnosis
20 0.43073437 103 nips-2006-Kernels on Structured Objects Through Nested Histograms
topicId topicWeight
[(1, 0.119), (3, 0.023), (7, 0.09), (9, 0.053), (21, 0.295), (22, 0.077), (44, 0.042), (57, 0.122), (65, 0.054), (69, 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.8304134 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
Author: Frank Moosmann, Bill Triggs, Frederic Jurie
Abstract: Some of the most effective recent methods for content-based image classification work by extracting dense or sparse local image descriptors, quantizing them according to a coding rule such as k-means vector quantization, accumulating histograms of the resulting “visual word” codes over the image, and classifying these with a conventional classifier such as an SVM. Large numbers of descriptors and large codebooks are needed for good results and this becomes slow using k-means. We introduce Extremely Randomized Clustering Forests – ensembles of randomly created clustering trees – and show that these provide more accurate results, much faster training and testing and good resistance to background clutter in several state-of-the-art image classification tasks. 1
2 0.81891423 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
Author: Andrea Frome, Yoram Singer, Jitendra Malik
Abstract: In this paper we introduce and experiment with a framework for learning local perceptual distance functions for visual recognition. We learn a distance function for each training image as a combination of elementary distances between patch-based visual features. We apply these combined local distance functions to the tasks of image retrieval and classification of novel images. On the Caltech 101 object recognition benchmark, we achieve 60.3% mean recognition across classes using 15 training images per class, which is better than the best published performance by Zhang, et al. 1
3 0.8180961 124 nips-2006-Linearly-solvable Markov decision problems
Author: Emanuel Todorov
Abstract: We introduce a class of MPDs which greatly simplify Reinforcement Learning. They have discrete state spaces and continuous control spaces. The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. A control cost penalizing KL divergence between controlled and uncontrolled transition probabilities makes the minimization problem convex, and allows analytical computation of the optimal controls given the optimal value function. An exponential transformation of the optimal value function makes the minimized Bellman equation linear. Apart from their theoretical signi cance, the new MDPs enable ef cient approximations to traditional MDPs. Shortest path problems are approximated to arbitrary precision with largest eigenvalue problems, yielding an O (n) algorithm. Accurate approximations to generic MDPs are obtained via continuous embedding reminiscent of LP relaxation in integer programming. Offpolicy learning of the optimal value function is possible without need for stateaction values; the new algorithm (Z-learning) outperforms Q-learning. This work was supported by NSF grant ECS–0524761. 1
4 0.61695486 185 nips-2006-Subordinate class recognition using relational object models
Author: Aharon B. Hillel, Daphna Weinshall
Abstract: We address the problem of sub-ordinate class recognition, like the distinction between different types of motorcycles. Our approach is motivated by observations from cognitive psychology, which identify parts as the defining component of basic level categories (like motorcycles), while sub-ordinate categories are more often defined by part properties (like ’jagged wheels’). Accordingly, we suggest a two-stage algorithm: First, a relational part based object model is learnt using unsegmented object images from the inclusive class (e.g., motorcycles in general). The model is then used to build a class-specific vector representation for images, where each entry corresponds to a model’s part. In the second stage we train a standard discriminative classifier to classify subclass instances (e.g., cross motorcycles) based on the class-specific vector representation. We describe extensive experimental results with several subclasses. The proposed algorithm typically gives better results than a competing one-step algorithm, or a two stage algorithm where classification is based on a model of the sub-ordinate class. 1
5 0.61228544 34 nips-2006-Approximate Correspondences in High Dimensions
Author: Kristen Grauman, Trevor Darrell
Abstract: Pyramid intersection is an efficient method for computing an approximate partial matching between two sets of feature vectors. We introduce a novel pyramid embedding based on a hierarchy of non-uniformly shaped bins that takes advantage of the underlying structure of the feature space and remains accurate even for sets with high-dimensional feature vectors. The matching similarity is computed in linear time and forms a Mercer kernel. Whereas previous matching approximation algorithms suffer from distortion factors that increase linearly with the feature dimension, we demonstrate that our approach can maintain constant accuracy even as the feature dimension increases. When used as a kernel in a discriminative classifier, our approach achieves improved object recognition results over a state-of-the-art set kernel. 1
6 0.58777851 8 nips-2006-A Nonparametric Approach to Bottom-Up Visual Saliency
7 0.58667195 51 nips-2006-Clustering Under Prior Knowledge with Application to Image Segmentation
8 0.58600843 119 nips-2006-Learning to Rank with Nonsmooth Cost Functions
9 0.58474004 195 nips-2006-Training Conditional Random Fields for Maximum Labelwise Accuracy
10 0.584382 110 nips-2006-Learning Dense 3D Correspondence
11 0.58381712 72 nips-2006-Efficient Learning of Sparse Representations with an Energy-Based Model
12 0.5816263 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
13 0.5803169 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
14 0.57888037 76 nips-2006-Emergence of conjunctive visual features by quadratic independent component analysis
15 0.57864237 65 nips-2006-Denoising and Dimension Reduction in Feature Space
16 0.5776726 130 nips-2006-Max-margin classification of incomplete data
17 0.57708895 118 nips-2006-Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields
18 0.57556021 158 nips-2006-PG-means: learning the number of clusters in data
19 0.57471824 106 nips-2006-Large Margin Hidden Markov Models for Automatic Speech Recognition
20 0.57454753 20 nips-2006-Active learning for misspecified generalized linear models