jmlr jmlr2011 jmlr2011-31 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jianxin Wu, Wei-Chian Tan, James M. Rehg
Abstract: Common visual codebook generation methods used in a bag of visual words model, for example, k-means or Gaussian Mixture Model, use the Euclidean distance to cluster features into visual code words. However, most popular visual descriptors are histograms of image measurements. It has been shown that with histogram features, the Histogram Intersection Kernel (HIK) is more effective than the Euclidean distance in supervised learning tasks. In this paper, we demonstrate that HIK can be used in an unsupervised manner to significantly improve the generation of visual codebooks. We propose a histogram kernel k-means algorithm which is easy to implement and runs almost as fast as the standard k-means. The HIK codebooks have consistently higher recognition accuracy than k-means codebooks, by 2–4%, in several benchmark object and scene recognition data sets. The algorithm is also generalized to arbitrary additive kernels. Its speed is thousands of times faster than a naive implementation of the kernel k-means algorithm. In addition, we propose a one-class SVM formulation to create more effective visual code words. Finally, we show that the standard k-median clustering method can be used for visual codebook generation and can act as a compromise between the HIK / additive kernel and the k-means approaches. Keywords: visual codebook, additive kernel, histogram intersection kernel
Reference: text
sentIndex sentText sentNum sentScore
1 However, most popular visual descriptors are histograms of image measurements. [sent-8, score-0.604]
2 In this paper, we demonstrate that HIK can be used in an unsupervised manner to significantly improve the generation of visual codebooks. [sent-10, score-0.386]
3 We propose a histogram kernel k-means algorithm which is easy to implement and runs almost as fast as the standard k-means. [sent-11, score-0.395]
4 The HIK codebooks have consistently higher recognition accuracy than k-means codebooks, by 2–4%, in several benchmark object and scene recognition data sets. [sent-12, score-1.084]
5 In addition, we propose a one-class SVM formulation to create more effective visual code words. [sent-15, score-0.409]
6 Finally, we show that the standard k-median clustering method can be used for visual codebook generation and can act as a compromise between the HIK / additive kernel and the k-means approaches. [sent-16, score-0.963]
7 Keywords: visual codebook, additive kernel, histogram intersection kernel 1. [sent-17, score-0.942]
8 Introduction Bag of visual words (BOV) is currently a popular approach to object and scene recognition in computer vision. [sent-18, score-0.616]
9 Two typical visual descriptors are SIFT by Lowe (2004) and HOG by Dalal and Triggs (2005). [sent-23, score-0.393]
10 • Generate a codebook and map features to visual code words. [sent-31, score-0.769]
11 A visual codebook is a method that divides the space of visual descriptors into several regions. [sent-32, score-1.033]
12 Features in one region correspond to the same visual code word, which is represented by an integer between 1 and the size of the codebook. [sent-33, score-0.409]
13 An image is then encoded as a histogram of visual code words. [sent-34, score-0.778]
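Once features are mapped to word indices, this encoding is a single counting pass. A minimal NumPy sketch; the function name and toy values are illustrative, not from the paper's code:

```python
import numpy as np

def bov_histogram(word_indices, m):
    """Encode an image as an m-bin histogram of visual code words."""
    return np.bincount(np.asarray(word_indices), minlength=m)

# Toy example: 6 features quantized against a 4-word codebook.
print(bov_histogram([0, 2, 2, 1, 3, 2], m=4))  # [1 1 3 1]
```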
14 The quality of the visual codebook has a significant impact on the success of BOV-based methods. [sent-38, score-0.64]
15 Popular and successful methods for object and scene categorization typically employ unsupervised learning methods (for example, k-means clustering or Gaussian Mixture Model) to obtain a visual codebook. [sent-39, score-0.546]
16 First, most of the popular visual descriptors are based on histograms of various image measurements such as spatial gradients, optical flow, or color. [sent-43, score-0.63]
17 In this paper we demonstrate that HIK and other additive kernels can be used to generate better visual codebooks with unsupervised learning, in comparison to the popular k-means clustering method. [sent-50, score-0.863]
18 Specifically, this paper makes four contributions: First, we show that HIK generates better codebooks and thus improves recognition accuracy. [sent-53, score-0.426]
19 (2008), such that the generation and application of HIK codebook has the same theoretical complexity as the standard k-means. [sent-55, score-0.414]
20 Second, we show that all additive kernels can be used to efficiently generate visual codebooks. [sent-58, score-0.483]
21 Section 3 introduces the histogram intersection kernel and various other additive kernels, and presents a general kernel k-means based method for visual codebook generation. [sent-70, score-1.354]
22 In Section 4 we propose methods to efficiently generate visual codebooks for HIK and other additive kernels. [sent-71, score-0.793]
23 Related Works In this section we will briefly review two categories of related works: different kernels for comparing histograms, and various visual codebook generation methods. [sent-74, score-0.764]
24 The main point of this paper is that when histogram features are employed, the histogram intersection kernel or another additive kernel should be used to compare them. [sent-75, score-1.057]
25 (2008) in two ways: First, we demonstrate that the speedup of HIK can be extended to codebook generation (and unsupervised learning in general). [sent-85, score-0.441]
26 On the visual codebook side, k-means is the most widely used method for visual codebook generation (Sivic and Zisserman, 2003). [sent-87, score-1.36]
27 A method was presented in Tuytelaars and Schmid (2007), which divided the space of visual descriptors into a regular lattice instead of learning a division of the space from training data. [sent-94, score-0.393]
28 In this work, we propose a new alternative to k-means, based on the histogram intersection kernel and other additive kernels. [sent-100, score-0.636]
29 In k-means based methods, a visual code word is usually represented by the cluster center (that is, the average of all features that belong to this code word), which is simple and fast to compute. [sent-101, score-0.621]
30 It was discovered that assigning a feature to multiple code words (also termed soft-assignment) may improve the codebook quality (Philbin et al. [sent-102, score-0.491]
31 Another interesting representation is the hyperfeature in Agarwal and Triggs (2008), which considers the mapped code word indexes as a type of image feature and repeatedly generates new codebooks and code words into a hierarchy. [sent-109, score-0.728]
32 Nistér and Stewénius (2006) used a tree structure to divide the space of visual descriptors hierarchically, and Moosmann et al. [sent-111, score-0.393]
33 Both methods map visual features to code words much faster than k-means. [sent-113, score-0.459]
34 (2008) unified the codebook generation step with the classifier learning step seamlessly. [sent-116, score-0.414]
35 Let $x = (x_1, \ldots, x_d) \in \mathbb{R}_+^d$ be a histogram of non-negative real values with $d$ histogram bins, where $\mathbb{R}_+$ is the set of non-negative real numbers. [sent-129, score-0.634]
36 x could represent an image (for example, a histogram of visual code words in the bag of visual words model) or an image patch (for example, a SIFT visual descriptor). [sent-130, score-1.514]
37 The histogram intersection kernel $\kappa_{HI}$ is defined as follows (Swain and Ballard, 1991): $\kappa_{HI}(x_1, x_2) = \sum_{j=1}^{d} \min(x_{1,j}, x_{2,j})$. [sent-131, score-0.521]
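The kernel is an element-wise minimum followed by a sum; a minimal NumPy sketch with illustrative toy values:

```python
import numpy as np

def hik(x1, x2):
    """Histogram intersection kernel: sum of element-wise minima."""
    return np.minimum(x1, x2).sum()

# Toy 4-bin histograms.
a = np.array([0.2, 0.5, 0.1, 0.2])
b = np.array([0.3, 0.3, 0.2, 0.2])
print(hik(a, b))  # 0.2 + 0.3 + 0.1 + 0.2 = 0.8
```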
38 2 Additive Kernels Algorithm 1 is not restricted to work only with the histogram intersection kernel. [sent-150, score-0.443]
39 It is obvious that the histogram intersection kernel is an instance of the additive kernels. [sent-157, score-0.636]
40 If $g(\cdot)$ is a non-negative and non-decreasing function, then the generalized histogram intersection kernel, $\kappa(x_1, x_2) = \sum_{j=1}^{d} g(\min(x_{1,j}, x_{2,j}))$, is a valid additive kernel. [sent-159, score-0.558]
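As a hedged sketch, two members of this additive family: the generalized HIK with a user-supplied g, and one common form of the χ² similarity (the paper's exact χ² variant may differ; names and the eps guard are illustrative):

```python
import numpy as np

def generalized_hik(x1, x2, g=np.sqrt):
    # g must be non-negative and non-decreasing for the kernel to be valid.
    return g(np.minimum(x1, x2)).sum()

def chi2_kernel(x1, x2, eps=1e-12):
    # A common additive chi-square similarity: sum_j 2*x1j*x2j / (x1j + x2j).
    return (2.0 * x1 * x2 / (x1 + x2 + eps)).sum()
```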
41 2: {The output is a function that maps a histogram to its visual code word index, $w_1(x^*): \mathbb{R}_+^d \to \{1, \ldots, m\}$.} [sent-172, score-0.788]
42 5: repeat 6: For every input histogram $x_i$, find the visual code word that $x_i$ belongs to, and denote the index of this visual code word as $l_i$: $l_i \leftarrow \arg\min_{1 \le j \le m} \|\phi(x_i) - m_j\|^2$, $1 \le i \le n$. [sent-193, score-1.259]
43 7: For every visual code word $m_i$, find the set of the input histograms that belong to this visual code word, and denote this set as $\pi_i$: $\pi_i = \{ j \mid l_j = i, 1 \le j \le n \}$, $1 \le i \le m$. [sent-194, score-1.08]
44 8: For every visual code word $m_i$, update it to be the centroid of the input histograms that belong to this visual code word: $m_i \leftarrow \frac{\sum_{j \in \pi_i} \phi(x_j)}{|\pi_i|}$, $1 \le i \le m$. [sent-195, score-1.121]
45 11: Output: For any histogram $x^* \in \mathbb{R}_+^d$, its corresponding visual code word index is $w_1(x^*) = \arg\min_{1 \le i \le m} \|\phi(x^*) - m_i\|^2$. [sent-198, score-0.829]
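The loop above never materializes φ(x); distances to centroids expand into kernel sums. Below is a naive baseline sketch over a precomputed Gram matrix, i.e., the slow O(n²) variant that the paper's Section 4 accelerates for HIK; the function name and re-seeding policy are illustrative:

```python
import numpy as np

def kernel_kmeans(K, m, n_iter=100, seed=0):
    """Naive kernel k-means over a precomputed n x n Gram matrix K.

    For HIK, K[i, j] = kappa_HI(x_i, x_j). Centroids m_i (means of phi(x_j)
    over cluster i) are never formed; the squared distance expands as
      ||phi(x) - m_i||^2 = K(x,x) - 2*mean_j K(x,x_j) + mean_{j,k} K(x_j,x_k).
    """
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, m, size=n)
    for _ in range(n_iter):
        dist = np.empty((n, m))
        for i in range(m):
            idx = np.flatnonzero(labels == i)
            if idx.size == 0:  # re-seed an empty cluster with a random point
                idx = rng.integers(0, n, size=1)
            dist[:, i] = (np.diag(K)
                          - 2.0 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```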
46 3 K-median Codebook Generation Although k-means (or equivalently, using the ℓ2 distance) is the most popular codebook generation method, the histogram intersection kernel has a closer connection to the ℓ1 distance. [sent-205, score-0.935]
47 An online k-median algorithm has been used by Larlus and Jurie to create visual codebooks in the Pascal challenge (Everingham et al., 2006). [sent-216, score-0.66]
48 In Section 5, we empirically compare visual codebooks generated by the k-median algorithm to those generated by both the k-means algorithm and the proposed additive kernel k-means method. [sent-218, score-0.853]
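For reference, a minimal k-median sketch (ℓ1-distance assignment, coordinate-wise median update); initialization and empty-cluster handling are simplified for illustration:

```python
import numpy as np

def kmedian(X, m, n_iter=50, seed=0):
    """k-median clustering: l1 assignment, per-dimension median update.

    X: n x d array of histograms; returns centers and cluster labels.
    """
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=m, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        dists = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)  # n x m l1 distances
        labels = dists.argmin(axis=1)
        for i in range(m):
            pts = X[labels == i]
            if len(pts) > 0:
                centers[i] = np.median(pts, axis=0)
    return centers, labels
```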
49 In this section, we first propose an efficient kernel k-means algorithm for the histogram intersection kernel, and then generalize the algorithm to all additive kernels. [sent-222, score-0.636]
50 Thus, we need to compute this term only once for every visual code word in each iteration. [sent-231, score-0.471]
51 For example, if we use the histogram intersection kernel and compute this term literally using Equation 1, the complexity is $O(|\pi_i| d)$. [sent-234, score-0.521]
52 Maji et al. (2008) proposed fast methods to compute Equation 4 for the histogram intersection kernel to improve the testing speed of HIK SVM classifiers, which achieved an exact answer of Equation 4 in $O(d \log_2 |\pi|)$ steps and an approximate answer in $O(d)$ steps. [sent-247, score-0.521]
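The sorted-prefix-sum idea behind that speedup: per dimension, the set values at most q_j contribute themselves (a prefix sum) and the larger values each contribute q_j (a count), with a binary search finding the split. An illustrative reimplementation of the trick, not the paper's code (libHIK); class and method names are assumptions:

```python
import numpy as np

class FastHikSum:
    """Evaluate sum over x in P of kappa_HI(x, q) in O(d log |P|) per query."""

    def __init__(self, P):                      # P: |P| x d array of histograms
        self.sorted_vals = np.sort(P, axis=0)   # independent sort per dimension
        self.prefix = np.cumsum(self.sorted_vals, axis=0)

    def query(self, q):                         # q: d-dimensional histogram
        n, d = self.sorted_vals.shape
        total = 0.0
        for j in range(d):
            col = self.sorted_vals[:, j]
            k = int(np.searchsorted(col, q[j], side="right"))  # count of values <= q_j
            smaller = self.prefix[k - 1, j] if k > 0 else 0.0  # these contribute themselves
            total += smaller + q[j] * (n - k)                  # larger ones contribute q_j each
        return total
```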
53 A histogram of visual code word indexes has the property that every histogram component is a non-negative integer, that is, it is a vector in $\mathbb{N}^d$. [sent-249, score-1.105]
54 Similarly, a visual descriptor histogram can usually be transformed into the space $\mathbb{N}^d$. [sent-250, score-0.652]
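For histograms quantized into ℕ^d with components bounded by some v̄, the same sum reduces to table lookups with O(d) query cost. A hedged sketch: the simple O(|P| d v̄) build below is illustrative, and the paper describes a cheaper construction:

```python
import numpy as np

def build_hik_table(P, v_bar):
    """Precompute T[j, u] = sum over x in P of min(x_j, u), for u = 0..v_bar."""
    n, d = P.shape
    T = np.zeros((d, v_bar + 1))
    for u in range(v_bar + 1):
        T[:, u] = np.minimum(P, u).sum(axis=0)
    return T

def hik_sum_query(T, q):
    """O(d) query: sum over x in P of kappa_HI(x, q), for integer q in [0, v_bar]^d."""
    d = T.shape[0]
    return T[np.arange(d), q].sum()
```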
55 The naive implementation does not need additional storage during the visual codebook generation step. [sent-265, score-0.744]
56 4 One Class SVM Codebook Representation A codebook generated by the k-means algorithm first divides the space $\mathbb{R}^d$ into m regions, and then represents each code word (or region) by the centroid of the examples (histograms, feature vectors, etc.). [sent-275, score-0.529]
57 For example, if we model regions as Gaussian distributions with distinct covariance matrices, the generation of codebooks and mapping from visual features to code words will require much more storage and computational resources than we can afford. [sent-282, score-0.893]
58 The one-class SVM summarizes the distribution of histograms inside a visual code word. [sent-294, score-0.568]
59 We believe that this compact hypersphere better summarizes a visual code word. [sent-297, score-0.409]
60 We propose Algorithm 4 to use one-class SVM to generate visual code words. [sent-300, score-0.427]
61 In other words, a histogram is considered as belonging to the i-th visual word if it is inside the sphere (in the feature space $\Phi$) centered at $m_i$ with radius $r_i$. [sent-319, score-0.756]
62 A sphere in $\Phi$ is different from a usual k-means sphere because it respects the similarity measure $\kappa$, and its radius $r_i$ automatically adapts to the distribution of histograms in a visual word. [sent-320, score-0.465]
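A sketch of this idea using scikit-learn's OneClassSVM with a precomputed HIK Gram matrix; the ν value and the argmax assignment rule are illustrative stand-ins, not the paper's exact Algorithm 4:

```python
import numpy as np
from sklearn.svm import OneClassSVM

def hik_gram(A, B):
    """Pairwise HIK values between rows of A (a x d) and rows of B (b x d)."""
    return np.array([[np.minimum(u, v).sum() for v in B] for u in A])

def fit_word_models(clusters, nu=0.2):
    """Fit one one-class SVM per visual word (one cluster of histograms each)."""
    models = []
    for X in clusters:
        oc = OneClassSVM(kernel="precomputed", nu=nu)  # nu chosen for illustration
        oc.fit(hik_gram(X, X))
        models.append((oc, X))
    return models

def assign_word(models, q):
    """Assign histogram q to the word whose learned region fits it best."""
    scores = [oc.decision_function(hik_gram(q[None, :], X))[0]
              for oc, X in models]
    return int(np.argmax(scores))
```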
63 In each train/test splitting, a visual codebook is generated using the training images, and both training and testing images are transformed into histograms of code words. [sent-329, score-0.902]
64 The proposed algorithms can efficiently process a huge number of histogram features; for example, approximately 200k to 320k histograms are clustered across the first three data sets in less than 6 minutes. [sent-331, score-0.476]
65 We use two types of visual descriptors: SIFT for Caltech 101, CENTRIST (CENsus TRansform hISTogram, refer to Wu and Rehg (2011) for more details) for the scene, event, and indoor data sets. [sent-334, score-0.396]
66 The first step is to use visual descriptors from the training images to form a visual codebook, in which we use m = 200 to generate 200 visual code words. [sent-336, score-1.126]
67 Thus an image or image sub-window is represented by a histogram of code words in the specified image region. [sent-338, score-0.6]
68 Since we are classifying histograms, we modified both LIBSVM and BSVM so that they are able to use the histogram intersection kernel. [sent-348, score-0.443]
69 Mean / standard deviation values and paired t-tests are used to show the benefit of histogram kernel codebooks (Algorithm 1), while the Wilcoxon test is used for evaluating the one-class SVM code word generation method (Algorithm 4). [sent-369, score-0.994]
70 κHI and κLIN mean that a histogram intersection or a linear kernel visual codebook is used, respectively. [sent-372, score-1.161]
71 HIK codebooks also have advantages over k-median codebooks in most cases. [sent-383, score-0.708]
72 One-class SVM improves histogram intersection kernel code words (Algorithm 4). [sent-388, score-0.648]
73 In summary, using HIK codebooks and one-class SVM together generated the best results in almost all cases (best results are shown in boldface within each column of Table 2). [sent-394, score-0.354]
74 As shown in Table 2, HIK codebooks outperformed k-median codebooks in most cases. [sent-404, score-0.708]
75 3 Experimental Results for Additive Kernels Experiments with codebooks generated using the other two additive kernels (χ2 and exponential HIK) are shown in Table 4. [sent-409, score-0.513]
76 For ease of comparison, results of HIK (without one-class SVM) codebooks are also shown in Table 4. [sent-410, score-0.354]
77 However, all three additive kernel based codebooks generally have higher accuracies than codebooks from the standard k-means generation method. [sent-412, score-0.996]
78 Since the time complexity of additive kernel based codebooks is the same as that of the k-means method, it is advantageous to apply such kernels in generating visual codebooks. [sent-413, score-0.897]
79 For example, the χ2 kernel in some cases leads to higher accuracies than the histogram intersection kernel. [sent-414, score-0.556]
80 There is not an obvious kernel for the ℓ1 distance, so we did not use one-class SVM for codebooks generated by k-median. [sent-418, score-0.432]
81 [Table 2, flattened during extraction; the per-cell accuracy numbers are not recoverable. Columns: κHI with and without ocsvm, k-median, κLIN with and without ocsvm; rows vary B and s. Panels: (a) Caltech 101, 15 train, 20 test; (b) 15 class scene, 100 train, rest test; (c) 8 class sports, 70 train, 60 test; (d) 67 class indoor, 80 train, 20 test.]
85 Table 2: Results of HIK, k-median and k-means codebooks and one-class SVM code words. [sent-539, score-0.457]
86 Table 3: Comparison of three codebook generation methods (computation time / codebook storage size): HIK 2 / v̄; k-median 2 / 1; k-means 1 / 1. [sent-541, score-0.414]
87 HIK codebooks are used, with ocsvm, ¬B, and s = 8. [sent-635, score-0.574]
88 As shown in Table 6, if we use SIFT for scene recognition and CENTRIST for object recognition, the recognition accuracies are reduced. [sent-672, score-0.393]
89 The proposed method is significantly better than standard k-means codebooks with more visual code words. [sent-682, score-0.763]
90 Scene recognition requires a different type of feature (CENTRIST instead of SIFT) and less information (performance almost stabilizes when the step size is 8 and the codebook size is 200). [sent-685, score-0.432]
91 Conclusion In this article, we show that when the histogram intersection kernel is used as the similarity measure in clustering visual descriptors that are histograms, the generated visual codebooks produce better code words and as a consequence, improve the bag of visual words model. [sent-688, score-2.081]
92 We propose a HIK based codebook generation method which runs almost as fast as k-means and achieves consistently higher accuracy than k-means codebooks, by 2–4%, in several benchmark object and scene recognition data sets. [sent-689, score-1.089]
93 As an alternative to k-means, in which cluster centroids are used to represent code words, we proposed a one-class SVM formulation to generate better visual code words. [sent-690, score-0.551]
94 We also generalize the proposed visual codebook generation method to arbitrary additive kernels. [sent-691, score-0.835]
95 Although k-median is rarely used to generate codebooks, we empirically evaluated k-median codebooks and recommend it as a compromise between the proposed method and k-means. [sent-694, score-0.396]
96 K-median codebooks have lower accuracies than HIK codebooks but usually have higher accuracy than k-means codebooks. [sent-695, score-0.761]
97 The PASCAL visual object classes challenge 2006 (VOC 2006) results, 2006. [sent-740, score-0.358]
98 Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. [sent-742, score-0.358]
99 Beyond the Euclidean distance: Creating effective visual codebooks using the histogram intersection kernel. [sent-877, score-1.103]
100 Unifying discriminative visual codebook generation with classifier training for object category recognition. [sent-891, score-0.772]
wordName wordTfidf (topN-words)
[('hik', 0.431), ('codebooks', 0.354), ('codebook', 0.334), ('histogram', 0.317), ('visual', 0.306), ('ocsvm', 0.22), ('scene', 0.162), ('histograms', 0.159), ('hi', 0.134), ('intersection', 0.126), ('additive', 0.115), ('caltech', 0.106), ('code', 0.103), ('maji', 0.096), ('indoor', 0.09), ('svm', 0.088), ('descriptors', 0.087), ('dditive', 0.086), ('ehg', 0.086), ('isual', 0.086), ('odebook', 0.086), ('rehg', 0.086), ('sift', 0.081), ('generation', 0.08), ('kernel', 0.078), ('centrist', 0.077), ('recognition', 0.072), ('vision', 0.071), ('eneration', 0.066), ('word', 0.062), ('sports', 0.06), ('ernels', 0.06), ('jianxin', 0.057), ('wu', 0.056), ('tan', 0.053), ('object', 0.052), ('image', 0.052), ('sing', 0.05), ('ehi', 0.048), ('lin', 0.046), ('kernels', 0.044), ('zisserman', 0.044), ('lazebnik', 0.041), ('mi', 0.041), ('libsvm', 0.04), ('bsvm', 0.04), ('bov', 0.038), ('rd', 0.037), ('vedaldi', 0.037), ('accuracies', 0.035), ('triggs', 0.033), ('feature', 0.03), ('descriptor', 0.029), ('antonio', 0.029), ('nmd', 0.029), ('odone', 0.029), ('sobel', 0.029), ('swain', 0.029), ('berg', 0.027), ('speedup', 0.027), ('features', 0.026), ('clustering', 0.026), ('spatial', 0.026), ('james', 0.025), ('bill', 0.024), ('dalal', 0.024), ('densely', 0.024), ('jurie', 0.024), ('sivic', 0.024), ('bag', 0.024), ('naive', 0.024), ('words', 0.024), ('compromise', 0.024), ('distance', 0.023), ('ci', 0.023), ('gao', 0.022), ('accelerate', 0.022), ('cluster', 0.021), ('train', 0.019), ('ballard', 0.019), ('boiman', 0.019), ('boughorbel', 0.019), ('cordelia', 0.019), ('frederic', 0.019), ('gemert', 0.019), ('hog', 0.019), ('libhik', 0.019), ('moosmann', 0.019), ('philbin', 0.019), ('stew', 0.019), ('subhransu', 0.019), ('svetlana', 0.019), ('tuytelaars', 0.019), ('recognizing', 0.019), ('classi', 0.019), ('gradients', 0.019), ('euclidean', 0.019), ('accuracy', 0.018), ('generate', 0.018), ('ieee', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels
Author: Jianxin Wu, Wei-Chian Tan, James M. Rehg
Abstract: Common visual codebook generation methods used in a bag of visual words model, for example, k-means or Gaussian Mixture Model, use the Euclidean distance to cluster features into visual code words. However, most popular visual descriptors are histograms of image measurements. It has been shown that with histogram features, the Histogram Intersection Kernel (HIK) is more effective than the Euclidean distance in supervised learning tasks. In this paper, we demonstrate that HIK can be used in an unsupervised manner to significantly improve the generation of visual codebooks. We propose a histogram kernel k-means algorithm which is easy to implement and runs almost as fast as the standard k-means. The HIK codebooks have consistently higher recognition accuracy over k-means codebooks by 2–4% in several benchmark object and scene recognition data sets. The algorithm is also generalized to arbitrary additive kernels. Its speed is thousands of times faster than a naive implementation of the kernel k-means algorithm. In addition, we propose a one-class SVM formulation to create more effective visual code words. Finally, we show that the standard kmedian clustering method can be used for visual codebook generation and can act as a compromise between the HIK / additive kernel and the k-means approaches. Keywords: visual codebook, additive kernel, histogram intersection kernel
2 0.070056267 101 jmlr-2011-Variable Sparsity Kernel Learning
Author: Jonathan Aflalo, Aharon Ben-Tal, Chiranjib Bhattacharyya, Jagarlapudi Saketha Nath, Sankaran Raman
Abstract: This paper presents novel algorithms and applications for a particular class of mixed-norm regularization based Multiple Kernel Learning (MKL) formulations. The formulations assume that the given kernels are grouped and employ $l_1$ norm regularization for promoting sparsity within the RKHS norms of each group and $l_s$, $s \ge 2$, norm regularization for promoting non-sparse combinations across groups. Various sparsity levels in combining the kernels can be achieved by varying the grouping of kernels; hence we name the formulations Variable Sparsity Kernel Learning (VSKL) formulations. While previous attempts have a non-convex formulation, here we present a convex formulation which admits efficient Mirror-Descent (MD) based solving techniques. The proposed MD based algorithm optimizes over a product of simplices and has a computational complexity of $O(m^2 n_{tot} \log n_{max} / \varepsilon^2)$, where $m$ is the number of training data points, $n_{max}$ and $n_{tot}$ are the maximum number of kernels in any group and the total number of kernels respectively, and $\varepsilon$ is the error in approximating the objective. A detailed proof of convergence of the algorithm is also presented. Experimental results show that the VSKL formulations are well-suited for multi-modal learning tasks like object categorization. Results also show that the MD based algorithm outperforms state-of-the-art MKL solvers in terms of computational efficiency. Keywords: multiple kernel learning, mirror descent, mixed-norm, object categorization, scalability
3 0.045771465 55 jmlr-2011-Learning Multi-modal Similarity
Author: Brian McFee, Gert Lanckriet
Abstract: In many applications involving multi-media data, the definition of similarity between items is integral to several key tasks, including nearest-neighbor retrieval, classification, and recommendation. Data in such regimes typically exhibits multiple modalities, such as acoustic and visual content of video. Integrating such heterogeneous data to form a holistic similarity space is therefore a key challenge to be overcome in many real-world applications. We present a novel multiple kernel learning technique for integrating heterogeneous data into a single, unified similarity space. Our algorithm learns an optimal ensemble of kernel transformations which conform to measurements of human perceptual similarity, as expressed by relative comparisons. To cope with the ubiquitous problems of subjectivity and inconsistency in multimedia similarity, we develop graph-based techniques to filter similarity measurements, resulting in a simplified and robust training procedure. Keywords: multiple kernel learning, metric learning, similarity
4 0.04353033 66 jmlr-2011-Multiple Kernel Learning Algorithms
Author: Mehmet Gönen, Ethem Alpaydın
Abstract: In recent years, several methods have been proposed to combine multiple kernels instead of using a single one. These different kernels may correspond to using different notions of similarity or may be using information coming from multiple sources (different representations or different feature subsets). In trying to organize and highlight the similarities and differences between them, we give a taxonomy of and review several multiple kernel learning algorithms. We perform experiments on real data sets for better illustration and comparison of existing algorithms. We see that though there may not be large differences in terms of accuracy, there is difference between them in complexity as given by the number of stored support vectors, the sparsity of the solution as given by the number of used kernels, and training time complexity. We see that overall, using multiple kernels instead of a single one is useful and believe that combining kernels in a nonlinear or data-dependent way seems more promising than linear combination in fusing information provided by simple linear kernels, whereas linear methods are more reasonable when combining complex Gaussian kernels. Keywords: support vector machines, kernel machines, multiple kernel learning
5 0.04345401 103 jmlr-2011-Weisfeiler-Lehman Graph Kernels
Author: Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, Karsten M. Borgwardt
Abstract: In this article, we propose a family of efficient kernels for large graphs with discrete node labels. Key to our method is a rapid feature extraction scheme based on the Weisfeiler-Lehman test of isomorphism on graphs. It maps the original graph to a sequence of graphs, whose node attributes capture topological and label information. A family of kernels can be defined based on this Weisfeiler-Lehman sequence of graphs, including a highly efficient kernel comparing subtree-like patterns. Its runtime scales only linearly in the number of edges of the graphs and the length of the Weisfeiler-Lehman graph sequence. In our experimental evaluation, our kernels outperform state-of-the-art graph kernels on several graph classification benchmark data sets in terms of accuracy and runtime. Our kernels open the door to large-scale applications of graph kernels in various disciplines such as computational biology and social network analysis. Keywords: graph kernels, graph classification, similarity measures for graphs, Weisfeiler-Lehman algorithm
6 0.043260984 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels
7 0.038026966 57 jmlr-2011-Learning a Robust Relevance Model for Search Using Kernel Methods
8 0.036915813 98 jmlr-2011-Universality, Characteristic Kernels and RKHS Embedding of Measures
9 0.036677845 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning
10 0.036253642 58 jmlr-2011-Learning from Partial Labels
11 0.035054453 105 jmlr-2011-lp-Norm Multiple Kernel Learning
12 0.03313458 3 jmlr-2011-A Cure for Variance Inflation in High Dimensional Kernel Principal Component Analysis
13 0.032974675 61 jmlr-2011-Logistic Stick-Breaking Process
14 0.032324269 48 jmlr-2011-Kernel Analysis of Deep Networks
15 0.029932898 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination
16 0.027501648 68 jmlr-2011-Natural Language Processing (Almost) from Scratch
17 0.027203318 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models
18 0.026611034 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
19 0.023371406 79 jmlr-2011-Proximal Methods for Hierarchical Sparse Coding
20 0.023331176 16 jmlr-2011-Clustering Algorithms for Chains
topicId topicWeight
[(0, 0.129), (1, -0.064), (2, 0.053), (3, -0.128), (4, -0.013), (5, 0.006), (6, -0.006), (7, 0.001), (8, -0.035), (9, -0.067), (10, -0.014), (11, 0.029), (12, 0.03), (13, -0.017), (14, -0.064), (15, -0.061), (16, -0.009), (17, -0.018), (18, -0.021), (19, -0.076), (20, -0.026), (21, 0.007), (22, -0.095), (23, 0.049), (24, -0.062), (25, -0.005), (26, -0.115), (27, 0.123), (28, -0.176), (29, -0.119), (30, 0.002), (31, -0.257), (32, 0.08), (33, -0.114), (34, 0.146), (35, 0.273), (36, -0.111), (37, -0.065), (38, -0.197), (39, 0.219), (40, -0.096), (41, -0.131), (42, 0.213), (43, 0.139), (44, -0.003), (45, -0.225), (46, 0.17), (47, 0.048), (48, -0.057), (49, -0.107)]
simIndex simValue paperId paperTitle
same-paper 1 0.95734602 31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels
Author: Jianxin Wu, Wei-Chian Tan, James M. Rehg
Abstract: Common visual codebook generation methods used in a bag of visual words model, for example, k-means or Gaussian Mixture Model, use the Euclidean distance to cluster features into visual code words. However, most popular visual descriptors are histograms of image measurements. It has been shown that with histogram features, the Histogram Intersection Kernel (HIK) is more effective than the Euclidean distance in supervised learning tasks. In this paper, we demonstrate that HIK can be used in an unsupervised manner to significantly improve the generation of visual codebooks. We propose a histogram kernel k-means algorithm which is easy to implement and runs almost as fast as the standard k-means. The HIK codebooks have consistently higher recognition accuracy than k-means codebooks, by 2–4%, in several benchmark object and scene recognition data sets. The algorithm is also generalized to arbitrary additive kernels. Its speed is thousands of times faster than a naive implementation of the kernel k-means algorithm. In addition, we propose a one-class SVM formulation to create more effective visual code words. Finally, we show that the standard k-median clustering method can be used for visual codebook generation and can act as a compromise between the HIK / additive kernel and the k-means approaches. Keywords: visual codebook, additive kernel, histogram intersection kernel
2 0.32587889 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning
Author: Paramveer S. Dhillon, Dean Foster, Lyle H. Ungar
Abstract: We propose a framework MIC (Multiple Inclusion Criterion) for learning sparse models based on the information theoretic Minimum Description Length (MDL) principle. MIC provides an elegant way of incorporating arbitrary sparsity patterns in the feature space by using two-part MDL coding schemes. We present MIC based models for the problems of grouped feature selection (MIC-GROUP) and multi-task feature selection (MIC-MULTI). MIC-GROUP assumes that the features are divided into groups and induces two level sparsity, selecting a subset of the feature groups, and also selecting features within each selected group. MIC-MULTI applies when there are multiple related tasks that share the same set of potentially predictive features. It also induces two level sparsity, selecting a subset of the features, and then selecting which of the tasks each feature should be added to. Lastly, we propose a model, TRANSFEAT, that can be used to transfer knowledge from a set of previously learned tasks to a new task that is expected to share similar features. All three methods are designed for selecting a small set of predictive features from a large pool of candidate features. We demonstrate the effectiveness of our approach with experimental results on data from genomics and from word sense disambiguation problems. Keywords: feature selection, minimum description length principle, multi-task learning
3 0.31488168 103 jmlr-2011-Weisfeiler-Lehman Graph Kernels
Author: Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, Karsten M. Borgwardt
Abstract: In this article, we propose a family of efficient kernels for large graphs with discrete node labels. Key to our method is a rapid feature extraction scheme based on the Weisfeiler-Lehman test of isomorphism on graphs. It maps the original graph to a sequence of graphs, whose node attributes capture topological and label information. A family of kernels can be defined based on this Weisfeiler-Lehman sequence of graphs, including a highly efficient kernel comparing subtree-like patterns. Its runtime scales only linearly in the number of edges of the graphs and the length of the Weisfeiler-Lehman graph sequence. In our experimental evaluation, our kernels outperform state-of-the-art graph kernels on several graph classification benchmark data sets in terms of accuracy and runtime. Our kernels open the door to large-scale applications of graph kernels in various disciplines such as computational biology and social network analysis. Keywords: graph kernels, graph classification, similarity measures for graphs, Weisfeiler-Lehman algorithm
4 0.30945244 58 jmlr-2011-Learning from Partial Labels
Author: Timothee Cour, Ben Sapp, Ben Taskar
Abstract: We address the problem of partially-labeled multiclass classification, where instead of a single label per instance, the algorithm is given a candidate set of labels, only one of which is correct. Our setting is motivated by a common scenario in many image and video collections, where only partial access to labels is available. The goal is to learn a classifier that can disambiguate the partiallylabeled training instances, and generalize to unseen data. We define an intuitive property of the data distribution that sharply characterizes the ability to learn in this setting and show that effective learning is possible even when all the data is only partially labeled. Exploiting this property of the data, we propose a convex learning formulation based on minimization of a loss function appropriate for the partial label setting. We analyze the conditions under which our loss function is asymptotically consistent, as well as its generalization and transductive performance. We apply our framework to identifying faces culled from web news sources and to naming characters in TV series and movies; in particular, we annotated and experimented on a very large video data set and achieve 6% error for character naming on 16 episodes of the TV series Lost. Keywords: weakly supervised learning, multiclass classification, convex learning, generalization bounds, names and faces
5 0.27964967 61 jmlr-2011-Logistic Stick-Breaking Process
Author: Lu Ren, Lan Du, Lawrence Carin, David Dunson
Abstract: A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation
6 0.25802574 3 jmlr-2011-A Cure for Variance Inflation in High Dimensional Kernel Principal Component Analysis
7 0.24927698 100 jmlr-2011-Unsupervised Supervised Learning II: Margin-Based Classification Without Labels
8 0.20211282 98 jmlr-2011-Universality, Characteristic Kernels and RKHS Embedding of Measures
9 0.20030805 66 jmlr-2011-Multiple Kernel Learning Algorithms
10 0.19851746 55 jmlr-2011-Learning Multi-modal Similarity
11 0.19286706 101 jmlr-2011-Variable Sparsity Kernel Learning
12 0.1907997 83 jmlr-2011-Scikit-learn: Machine Learning in Python
13 0.18412198 68 jmlr-2011-Natural Language Processing (Almost) from Scratch
14 0.17807548 57 jmlr-2011-Learning a Robust Relevance Model for Search Using Kernel Methods
15 0.17536648 15 jmlr-2011-CARP: Software for Fishing Out Good Clustering Algorithms
16 0.16870846 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
17 0.15783481 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination
18 0.15724744 10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals
19 0.15328395 62 jmlr-2011-MSVMpack: A Multi-Class Support Vector Machine Package
20 0.15202205 105 jmlr-2011-lp-Norm Multiple Kernel Learning
topicId topicWeight
[(4, 0.023), (9, 0.016), (10, 0.027), (24, 0.044), (31, 0.103), (32, 0.013), (41, 0.014), (60, 0.022), (71, 0.529), (73, 0.042), (78, 0.031), (90, 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.84140903 31 jmlr-2011-Efficient and Effective Visual Codebook Generation Using Additive Kernels
Author: Jianxin Wu, Wei-Chian Tan, James M. Rehg
Abstract: Common visual codebook generation methods used in a bag of visual words model, for example, k-means or Gaussian Mixture Model, use the Euclidean distance to cluster features into visual code words. However, most popular visual descriptors are histograms of image measurements. It has been shown that with histogram features, the Histogram Intersection Kernel (HIK) is more effective than the Euclidean distance in supervised learning tasks. In this paper, we demonstrate that HIK can be used in an unsupervised manner to significantly improve the generation of visual codebooks. We propose a histogram kernel k-means algorithm which is easy to implement and runs almost as fast as the standard k-means. The HIK codebooks have consistently higher recognition accuracy than k-means codebooks, by 2–4%, in several benchmark object and scene recognition data sets. The algorithm is also generalized to arbitrary additive kernels. Its speed is thousands of times faster than a naive implementation of the kernel k-means algorithm. In addition, we propose a one-class SVM formulation to create more effective visual code words. Finally, we show that the standard k-median clustering method can be used for visual codebook generation and can act as a compromise between the HIK / additive kernel and the k-means approaches. Keywords: visual codebook, additive kernel, histogram intersection kernel
2 0.75195348 30 jmlr-2011-Efficient Structure Learning of Bayesian Networks using Constraints
Author: Cassio P. de Campos, Qiang Ji
Abstract: This paper addresses the problem of learning Bayesian network structures from data based on score functions that are decomposable. It describes properties that strongly reduce the time and memory costs of many known methods without losing global optimality guarantees. These properties are derived for different score criteria such as Minimum Description Length (or Bayesian Information Criterion), Akaike Information Criterion and Bayesian Dirichlet Criterion. Then a branch-andbound algorithm is presented that integrates structural constraints with data in a way to guarantee global optimality. As an example, structural constraints are used to map the problem of structure learning in Dynamic Bayesian networks into a corresponding augmented Bayesian network. Finally, we show empirically the benefits of using the properties with state-of-the-art methods and with the new algorithm, which is able to handle larger data sets than before. Keywords: Bayesian networks, structure learning, properties of decomposable scores, structural constraints, branch-and-bound technique
3 0.2845583 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing
Author: Jennifer Gillenwater, Kuzman Ganchev, João Graça, Fernando Pereira, Ben Taskar
Abstract: A strong inductive bias is essential in unsupervised grammar induction. In this paper, we explore a particular sparsity bias in dependency grammars that encourages a small number of unique dependency types. We use part-of-speech (POS) tags to group dependencies by parent-child types and investigate sparsity-inducing penalties on the posterior distributions of parent-child POS tag pairs in the posterior regularization (PR) framework of Graça et al. (2007). In experiments with 12 different languages, we achieve significant gains in directed attachment accuracy over the standard expectation maximization (EM) baseline, with an average accuracy improvement of 6.5%, outperforming EM by at least 1% for 9 out of 12 languages. Furthermore, the new method outperforms models based on standard Bayesian sparsity-inducing parameter priors with an average improvement of 5% and positive gains of at least 1% for 9 out of 12 languages. On English text in particular, we show that our approach improves performance over other state-of-the-art techniques.
4 0.27253011 84 jmlr-2011-Semi-Supervised Learning with Measure Propagation
Author: Amarnag Subramanya, Jeff Bilmes
Abstract: We describe a new objective for graph-based semi-supervised learning based on minimizing the Kullback-Leibler divergence between discrete probability measures that encode class membership probabilities. We show how the proposed objective can be efficiently optimized using alternating minimization. We prove that the alternating minimization procedure converges to the correct optimum and derive a simple test for convergence. In addition, we show how this approach can be scaled to solve the semi-supervised learning problem on very large data sets, for example, in one instance we use a data set with over $10^8$ samples. In this context, we propose a graph node ordering algorithm that is also applicable to other graph-based semi-supervised learning approaches. We compare the proposed approach against other standard semi-supervised learning algorithms on the semi-supervised learning benchmark data sets (Chapelle et al., 2007), and other real-world tasks such as text classification on Reuters and WebKB, speech phone classification on TIMIT and Switchboard, and linguistic dialog-act tagging on Dihana and Switchboard. In each case, the proposed approach outperforms the state-of-the-art. Lastly, we show that our objective can be generalized into a form that includes the standard squared-error loss, and we prove a geometric rate of convergence in that case. Keywords: graph-based semi-supervised learning, transductive inference, large-scale semi-supervised learning, non-parametric models
5 0.27143189 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series
Author: Graham W. Taylor, Geoffrey E. Hinton, Sam T. Roweis
Abstract: In this paper we develop a class of nonlinear generative models for high-dimensional time series. We first propose a model based on the restricted Boltzmann machine (RBM) that uses an undirected model with binary latent variables and real-valued “visible” variables. The latent and visible variables at each time step receive directed connections from the visible variables at the last few time-steps. This “conditional” RBM (CRBM) makes on-line inference efficient and allows us to use a simple approximate learning procedure. We demonstrate the power of our approach by synthesizing various sequences from a model trained on motion capture data and by performing on-line filling in of data lost during capture. We extend the CRBM in a way that preserves its most important computational properties and introduces multiplicative three-way interactions that allow the effective interaction weight between two variables to be modulated by the dynamic state of a third variable. We introduce a factoring of the implied three-way weight tensor to permit a more compact parameterization. The resulting model can capture diverse styles of motion with a single set of parameters, and the three-way interactions greatly improve its ability to blend motion styles or to transition smoothly among them. Videos and source code can be found at http://www.cs.nyu.edu/˜gwtaylor/publications/jmlr2011. Keywords: unsupervised learning, restricted Boltzmann machines, time series, generative models, motion capture
6 0.26747945 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
7 0.26714769 48 jmlr-2011-Kernel Analysis of Deep Networks
8 0.26676193 9 jmlr-2011-An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models
9 0.26579887 78 jmlr-2011-Producing Power-Law Distributions and Damping Word Frequencies with Two-Stage Language Models
10 0.2643964 95 jmlr-2011-Training SVMs Without Offset
11 0.26155162 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms
12 0.25847036 16 jmlr-2011-Clustering Algorithms for Chains
13 0.25794801 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning
14 0.25762045 7 jmlr-2011-Adaptive Exact Inference in Graphical Models
15 0.25445139 101 jmlr-2011-Variable Sparsity Kernel Learning
16 0.25210327 74 jmlr-2011-Operator Norm Convergence of Spectral Clustering on Level Sets
17 0.25100926 99 jmlr-2011-Unsupervised Similarity-Based Risk Stratification for Cardiovascular Events Using Long-Term Time-Series Data
18 0.24837716 66 jmlr-2011-Multiple Kernel Learning Algorithms
19 0.24752928 102 jmlr-2011-Waffles: A Machine Learning Toolkit
20 0.24582909 10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals