nips nips2006 nips2006-103 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Marco Cuturi, Kenji Fukumizu
Abstract: We propose a family of kernels for structured objects which is based on the bag-of-components paradigm. However, rather than decomposing each complex object into the single histogram of its components, we use for each object a family of nested histograms, where each histogram in this hierarchy describes the object seen from an increasingly granular perspective. We use this hierarchy of histograms to define elementary kernels which can detect coarse and fine similarities between the objects. We compute through an efficient averaging trick a mixture of such specific kernels, to propose a final kernel value which weights efficiently local and global matches. We propose experimental results on an image retrieval experiment which show that this mixture is an effective template procedure to be used with kernels on histograms.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We propose a family of kernels for structured objects which is based on the bag-of-components paradigm. [sent-3, score-0.483]
2 However, rather than decomposing each complex object into the single histogram of its components, we use for each object a family of nested histograms, where each histogram in this hierarchy describes the object seen from an increasingly granular perspective. [sent-4, score-0.44]
3 We use this hierarchy of histograms to define elementary kernels which can detect coarse and fine similarities between the objects. [sent-5, score-0.986]
4 We compute through an efficient averaging trick a mixture of such specific kernels, to propose a final kernel value which weights efficiently local and global matches. [sent-6, score-0.237]
5 We propose experimental results on an image retrieval experiment which show that this mixture is an effective template procedure to be used with kernels on histograms. [sent-7, score-0.352]
6 , in bioinformatics, image and text analysis or signal processing) requires more arbitrary choices, both to represent the objects in a malleable form, and to choose suitable kernels on these representations. [sent-12, score-0.424]
7 The challenge of using kernel methods on real-world data has thus recently fostered many proposals for kernels on complex objects, notably strings, trees, images or graphs to cite a few. [sent-13, score-0.559]
8 In common practice, most of these objects can be regarded as structured aggregates of smaller components, and the coarsest approach to study such aggregates is to consider them directly as bags of components. [sent-14, score-0.434]
9 In the field of kernel methods, such a representation has not only been widely adopted (Haussler, 1999; Joachims, 2002; Schölkopf et al. [sent-15, score-0.218]
10 , 2004), but it has also spurred the proposal of kernels better suited to the geometry of the underlying histograms (Kondor & Jebara, 2003; Lafferty & Lebanon, 2005; Hein & Bousquet, 2005; Cuturi et al. [sent-16, score-0.653]
11 While this viewpoint may translate into adequate properties for some learning tasks, such as translation or rotation invariance when using histograms of colors to manipulate images (Chapelle et al. [sent-19, score-0.589]
12 As one would expect, these histograms are usually sparse and need to be regularized using ad-hoc rules and prior knowledge (Leslie et al. [sent-22, score-0.394]
13 , 2003) before being directly compared using kernels on histograms. [sent-23, score-0.259]
14 ) kernels if particular care is taken to adapt them (Vert et al. [sent-27, score-0.305]
15 Figure 1: From the bag-of-components representation to a set of nested bags, using a set of labels. [sent-34, score-0.098]
16 in this paper new families of kernels which can be easily tuned to detect both coarse and fine similarities between the objects, in a range spanning from kernels which only consider coarse histograms to kernels which only detect strict local matches. [sent-35, score-1.545]
17 To size such types of similarities between two objects, we elaborate on the elementary bag-of-components perspective to consider instead families of nested histograms (indexed by a set of hierarchical labels to be defined) to describe each object. [sent-36, score-0.688]
18 In this framework, the root label corresponds to the global representation introduced before, while longer labels represent a specific condition under which the components have been sampled. [sent-37, score-0.092]
19 We then define kernels that take into account mixtures of similarities, spanning from detailed resolutions which only compare the smallest bags to the coarsest one. [sent-38, score-0.556]
20 This trade-off between fine and coarse perspectives sets an averaging framework to define kernels, which we introduce formally in Section 2. [sent-39, score-0.142]
21 This theoretical framework would not be tractable without an efficient factorization detailed in Section 3 which yields computations which grow linearly in time and space with respect to the number of labels to evaluate the value of the kernel. [sent-40, score-0.091]
22 We then provide experimental results in Section 4 on an image retrieval task which show that the methodology improves the performance of kernel-based state-of-the-art techniques in this field with a low extra computational cost. [sent-41, score-0.265]
23 2 Kernels Defined through Hierarchies of Histograms In the kernel literature, structured objects are usually represented as histograms of components, e. [sent-42, score-0.67]
24 , images as histograms of colors and/or features, texts as bags of words and sequences as histograms of letters or n-grams. [sent-44, score-1.017]
25 One may instead create families of histograms, indexed by specific sampling conditions: • In image analysis, create color or feature histograms following a prior partition of the image into predefined patches, as in (Grauman & Darrell, 2005). [sent-46, score-0.806]
26 Another possibility would be to define families of histograms, all for the same image, which would consider increasingly granular discretizations of the color space. [sent-47, score-0.21]
27 • In sequence analysis, extract local histograms which may correspond to predefined regions of the original sequence, as in (Matsuda et al. [sent-48, score-0.394]
28 by considering the 26 histograms of letters sampled just after the letters {A, B, · · · , Z}, or the 26 × 26 histograms of letters after contexts {AA, AB, · · · , ZZ} (see the sketch after this list). [sent-52, score-0.62]
29 • In text analysis, use histograms of words found after grammatical categories of increasing complexity, such as verbs, nouns, articles or adverbs. [sent-53, score-0.417]
30 an index or a specific gene) and decompose each of the subsequent series into histograms of values conditioned on the value of the reference series. [sent-58, score-0.388]
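As a concrete illustration of the contextual-histogram construction above (in particular the sequence-analysis item), the following sketch builds, from a toy sequence, the family of letter histograms indexed by single-letter contexts; the function and variable names are illustrative choices, not taken from the paper.

```python
from collections import Counter, defaultdict

def contextual_histograms(sequence, context_length=1):
    """Family of histograms indexed by contexts: for each context (the preceding
    `context_length` letters), count which letter is sampled just after it."""
    histograms = defaultdict(Counter)
    for i in range(context_length, len(sequence)):
        context = sequence[i - context_length:i]
        histograms[context][sequence[i]] += 1
    return histograms

if __name__ == "__main__":
    seq = "ACGTACGGACGTTACG"
    # 4 possible single-letter contexts for this toy alphabet; 26 for {A, ..., Z} text.
    for ctx, hist in sorted(contextual_histograms(seq).items()):
        print(ctx, dict(hist))
```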
31 Structured objects are thus represented as a family $\mu$ of $M_L(X) \overset{\text{def}}{=} (M_+^b(X))^L$, that is $\mu = \{\mu_t\}_{t \in L}$ where, for each $t \in L$, $\mu_t$ is a bounded measure of $M_+^b(X)$. [sent-60, score-0.349]
32 2.1 Local Similarities Between Measures To compare two objects under the light of any sampling condition $t$, that is comparing their respective decompositions as measures $\mu_t$ and $\mu'_t$, we make use of an arbitrary p. [sent-63, score-0.144]
33 kernel k on M+ (X ) to which we will refer as the base kernel throughout the paper. [sent-65, score-0.407]
34 For interpretation purposes only, we will assume in the following sections that k is an infinitely divisible kernel which can be written as $k = e^{-\frac{1}{\lambda}\psi}$, $\lambda > 0$, where $\psi$ is a negative definite (Berg et al. [sent-66, score-0.218]
35 , 1984) kernel on M+ (X ), or equivalently −ψ is a conditionally p. [sent-67, score-0.172]
36 For two elements $\mu, \mu'$ of $M_L(X)$ and a given element $t \in L$, the kernel $k_t(\mu, \mu') \overset{\text{def}}{=} k(\mu_t, \mu'_t)$ quantifies the similarity of $\mu$ and $\mu'$ by measuring how similarly their components were observed with respect to label $t$. [sent-73, score-0.623]
37 For two different labels s and t of L, ks and kt can be associated through polynomial combinations with positive coefficients to result in new kernels, notably their sum ks +kt or their product ks kt . [sent-74, score-0.668]
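To make the notions of base kernel and label-specific kernel concrete, here is a minimal sketch assuming a base kernel of the form k = exp(-psi/lambda) built from the squared Hellinger distance between histograms (one admissible Hilbertian metric, also used in the experimental section); the helper names are ours, not the authors'.

```python
import math

def hellinger_sq(p, q):
    """Squared Hellinger distance psi between two non-negative histograms (dicts)."""
    keys = set(p) | set(q)
    return sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys)

def base_kernel(p, q, lam=1.0):
    """Base kernel k = exp(-psi / lambda), one infinitely divisible choice."""
    return math.exp(-hellinger_sq(p, q) / lam)

def k_label(mu, mu_prime, t, lam=1.0):
    """Label-specific kernel k_t(mu, mu') = k(mu_t, mu'_t) for families of histograms."""
    return base_kernel(mu.get(t, {}), mu_prime.get(t, {}), lam)

if __name__ == "__main__":
    mu = {"can": {"go": 2, "see": 1}, "may": {"be": 3}}
    nu = {"can": {"go": 1, "do": 1}, "may": {"be": 2, "have": 1}}
    ks, kt = k_label(mu, nu, "can"), k_label(mu, nu, "may")
    # Sums and products of such kernels are again positive definite kernels.
    print(ks, kt, ks + kt, ks * kt)
```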
38 This is particularly adequate if some complementarity is assumed between s and t, so that their combination can provide new insights for a given learning task. [sent-75, score-0.092]
39 If on the contrary these labels are assumed to be similar, then they can be regarded as a grouped label $\{s\} \cup \{t\}$ and result in the kernel $k_{\{s\}\cup\{t\}}(\mu, \mu') \overset{\text{def}}{=} k(\mu_s + \mu_t, \mu'_s + \mu'_t)$, which will measure the similarity of $\mu$ and $\mu'$ under both the $s$ and $t$ labels. [sent-76, score-0.507]
40 Let us give an intuition for this definition by considering two texts A, B built up with words from a dictionary D. [sent-77, score-0.155]
41 As an alternative to the general histograms of words $\theta^A$ and $\theta^B$ of $M_+^b(D)$, one may consider for instance $\theta^A_{\text{can}}, \theta^A_{\text{may}}$ and $\theta^B_{\text{can}}, \theta^B_{\text{may}}$, the respective histograms of words that follow the words can and may in texts $A$ and $B$ respectively. [sent-78, score-0.883]
42 If one considers that can and may are different words, then the following kernel quantifies the similarity of $A$ and $B$ taking advantage of this difference: $k_{\{\text{can}\},\{\text{may}\}}(A, B) = k(\theta^A_{\text{can}}, \theta^B_{\text{can}}) \times k(\theta^A_{\text{may}}, \theta^B_{\text{may}})$. [sent-79, score-0.212]
43 If on the contrary one decides that can and may are equivalent, an adequate kernel would first merge the histograms, and then compare them: $k_{\{\text{can},\text{may}\}}(A, B) = k(\theta^A_{\text{can}} + \theta^A_{\text{may}}, \theta^B_{\text{can}} + \theta^B_{\text{may}})$. [sent-80, score-0.264]
44 The previous formula can be naturally extended to define kernels indexed on a set $T \subset L$ of grouped labels, through $k_T(\mu, \mu') \overset{\text{def}}{=} k(\mu_T, \mu'_T)$, where $\mu_T \overset{\text{def}}{=} \sum_{t \in T} \mu_t$ and $\mu'_T \overset{\text{def}}{=} \sum_{t \in T} \mu'_t$. [sent-81, score-0.838]
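A short sketch of the grouped-label kernel k_T: the measures attached to the labels of T are summed before the base kernel is applied, which is exactly what distinguishes the merged kernel from the product kernel in the can/may example above. The base kernel below is again a hedged Hellinger-based choice and the function names are ours.

```python
import math
from collections import Counter

def base_kernel(p, q, lam=1.0):
    # k = exp(-psi/lam), psi being the squared Hellinger distance (one possible choice).
    keys = set(p) | set(q)
    psi = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return math.exp(-psi / lam)

def merged(mu, labels):
    """mu_T: sum of the measures mu_t over the grouped labels t in T."""
    total = Counter()
    for t in labels:
        total.update(mu.get(t, {}))
    return total

def k_grouped(mu, mu_prime, labels, lam=1.0):
    """Grouped-label kernel k_T(mu, mu') = k(mu_T, mu'_T)."""
    return base_kernel(merged(mu, labels), merged(mu_prime, labels), lam)

if __name__ == "__main__":
    A = {"can": {"go": 2, "see": 1}, "may": {"be": 3}}
    B = {"can": {"go": 1, "do": 1}, "may": {"be": 2}}
    # 'can' and 'may' treated as equivalent: merge first, then compare.
    print(k_grouped(A, B, {"can", "may"}))
    # 'can' and 'may' treated as different: compare separately and multiply.
    print(k_grouped(A, B, {"can"}) * k_grouped(A, B, {"may"}))
```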
45 Let $P$ be a finite partition of $L$, that is a finite family $P = (T_1, \ldots, T_n)$ of disjoint subsets of $L$ whose union is $L$. [sent-84, score-0.212]
46 We write P(L) for the set of all partitions of L. [sent-88, score-0.163]
47 Consider now the kernel defined by a partition $P$ as $k_P(\mu, \mu') \overset{\text{def}}{=} \prod_{i=1}^{n} k_{T_i}(\mu, \mu')$. (1) [sent-89, score-0.475]
48 The kernel $k_P$ quantifies the similarity between two objects by detecting their joint similarity under all possible labels of $L$, assuming a priori that certain labels can be grouped together, following the subsets $T_i$ enumerated in the partition $P$. [sent-90, score-0.699]
49 Note that there is some arbitrariness in this definition, since a simple multiplication of the base kernels $k_{T_i}$ is used to define $k_P$, rather than any other polynomial combination. [sent-91, score-0.322]
50 We follow in that sense the convolution kernels (Haussler, 1999) approach, and indeed, for each partition P , kP can be regarded as a convolution kernel. [sent-92, score-0.541]
51 More precisely, the multiplicative structure of Equation (1) quantifies how similar two objects are given a partition $P$, in a way that requires the objects to be similar according to all subsets $T_i$. [sent-93, score-0.358]
52 Figure 2: A useful set of labels L for images which would focus on pixel localization can be represented by a grid, such as the 8 × 8 one represented above. [sent-95, score-0.115]
53 Any partition $P$ of the image which complies with the hierarchy $P_0$ in the example above can in turn be used to represent an image as a family of sub-probability measures, which reduces in the case of two-color images to binary histograms as illustrated in the right-most image. [sent-97, score-0.871]
54 For two images, these respective histograms can be directly compared through the kernel kP . [sent-98, score-0.52]
55 As illustrated in Figure 2, where images are summarized through histograms indexed by patches, a partition of L reflects a given belief on how patches may or may not be associated or split to focus on local dissimilarities. [sent-99, score-0.601]
56 Hence, all partitions contained in the set $P(L)$ of all possible partitions1 are not likely to be equally meaningful, given that some labels may admit a natural form of grouping. [sent-100, score-0.254]
57 If one uses a Markovian analysis, that is, considers histograms of components conditioned by contexts, a natural way to group contexts would be to group them according to their semantic or grammatical content for text analysis, or according to their suffix for sequence analysis. [sent-102, score-0.464]
58 Such meaningful partitions can be intuitively obtained when a hierarchical structure which groups elements of L together is known a priori. [sent-103, score-0.204]
59 A hierarchy on $L$, such as the triadic hierarchy shown in Figure 3, is a family $(P_d)_{d=0}^{D} = \{P_0 = \{L\}, P_1, \ldots, P_D\}$. [sent-104, score-0.396]
60 To provide hierarchical information, the family $(P_d)_{d=0}^{D}$ is such that any subset present in a partition $P_d$ is strictly included in a (unique, by definition of a partition) subset from the coarser partition $P_{d-1}$. [sent-107, score-0.409]
61 This is equivalent to stating that each subset T in a partition Pd is divided in Pd+1 as a partition of T which is not T itself. [sent-108, score-0.276]
62 Consider now the subset $\mathcal{P}_D \subset P(L)$ of all partitions of $L$ obtained by using only sets contained in the collection $\mathcal{P}_0^D \overset{\text{def}}{=} \bigcup_{d=0}^{D} P_d$, namely $\mathcal{P}_D \overset{\text{def}}{=} \{P \in P(L) \text{ s.t. } \forall T \in P,\ T \in \mathcal{P}_0^D\}$. [sent-112, score-0.493]
63 The set $\mathcal{P}_D$ contains both the coarsest and the finest resolutions, respectively $P_0$ and $P_D$, but also all variable resolutions for sets enumerated in $\mathcal{P}_0^D$, as can be seen for instance in the third image of Figure 2. [sent-115, score-0.357]
64 1 $P(L)$ is quite a big space, since if $L$ is a finite set of cardinality $r$, the cardinality of the set of partitions is known as the Bell number of order $r$, with $B_r = \frac{1}{e}\sum_{u=0}^{\infty}\frac{u^r}{u!} \underset{r\to\infty}{\sim} e^{r \ln r}$. [sent-116, score-0.249]
65 Figure 3: A hierarchy generated by two successive triadic partitions. [sent-118, score-0.186]
66 3 Averaging Resolution Specific Kernels Each partition P contained in PD provides a resolution to compare two objects, which generates a large family of kernels kP when P spans PD . [sent-120, score-0.51]
67 Some partitions are likely to be better suited for certain tasks, which may call for an efficient estimation scheme to select an optimal partition for a given task. [sent-121, score-0.301]
68 We take in this section a different direction which has a more Bayesian flavor by considering an averaging of such kernels based on a prior on the set of partitions. [sent-123, score-0.364]
69 In practice, this averaging favours objects which share similarities under a large collection of resolutions, and may also be interpreted as a Bayesian averaging of convolution kernels (Haussler, 1999). [sent-124, score-0.641]
70 Definition 1 Let $L$ be an index set endowed with a hierarchy $(P_d)_{d=0}^{D}$, $\pi$ be a prior measure on the corresponding set of partitions $\mathcal{P}_D$ and $k$ a base kernel on $M_+^b(X) \times M_+^b(X)$. [sent-125, score-0.609]
71 The averaged kernel $k_\pi$ on $M_L(X) \times M_L(X)$ is defined as $k_\pi(\mu, \mu') = \sum_{P \in \mathcal{P}_D} \pi(P)\, k_P(\mu, \mu')$. (2) [sent-126, score-0.172]
72 As can be observed in Equation (2), the kernel automatically detects in the range of all partitions the ones which provide a good match between the compared objects, to increase subsequently the resulting similarity score. [sent-127, score-0.375]
73 Also note that in an image-analysis context, the pyramid-matching kernel proposed in (Grauman & Darrell, 2005) only considers the original partitions of the hierarchy $(P_d)_{d=0}^{D}$, while Equation (2) considers all possible partitions of $\mathcal{P}_D$. [sent-128, score-0.634]
74 This can be carried out with little cost if an adequate set of priors $\pi$ is selected, as seen below. [sent-129, score-0.092]
75 3.1 Partitions Generated by Branching Processes All partitions $P$ of $\mathcal{P}_D$ can be generated through the following rule, starting from the initial root partition $P := P_0 = \{L\}$. [sent-132, score-0.301]
76 either replace it by its siblings in s(T ) with probability εT , and reapply this rule to each sibling unless they belong to the finest partition PD . [sent-135, score-0.231]
77 For a partition $P \in \mathcal{P}_D$, $\pi(P) = \prod_{T \in P} (1 - \varepsilon_T) \prod_{T \in \mathring{P}} \varepsilon_T$, where the set $\mathring{P} = \{T \in \mathcal{P}_0^D$ s. [sent-137, score-0.138]
78 t. $\exists V \in P,\ V \subsetneq T\}$ gathers all coarser sets belonging to coarser resolutions than $P$, and can be regarded as the set of all ancestors in $\mathcal{P}_0^D$ of sets enumerated in $P$. [sent-139, score-0.358]
79 $K_T = (1 - \varepsilon_T)\, k_T(\mu, \mu') + \varepsilon_T \prod_{U \in s(T)} K_U$. Proof The proof follows from a factorization which uses the branching process prior used for the tree generation, and can be derived from the proof of (Catoni, 2004, Proposition 5. [sent-145, score-0.101]
80 The figure opposite underlines the importance of incorporating into each node value $K_T$ a weighted product of the sibling kernel evaluations $K_U$. [sent-147, score-0.215]
81 The update rule for the computation of kπ takes into account the branching process prior by weighting the kernel kT with all values kti obtained for finer resolutions ti in s(T ). [sent-148, score-0.54]
82 This complexity is also upperbounded by the total amount of components considered in the compared objects, as in (Cuturi & Vert, 2005) for instance. [sent-151, score-0.107]
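The factorization above can be sketched as a single bottom-up recursion over the hierarchy: each node T combines its own coarse kernel value k_T with the product of the values returned by the finer sets it can be split into, weighted by a branching probability ε. With a uniform ε this matches the behaviour described in the experiments (ε = 0 keeps only the global histogram, ε = 1 only the finest ones). The tree encoding and names below are our own illustration, not the authors' code.

```python
import math
from collections import Counter

def base_kernel(p, q, lam=1.0):
    # Hellinger-based base kernel k = exp(-psi/lam), one admissible choice.
    keys = set(p) | set(q)
    psi = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return math.exp(-psi / lam)

def leaves(node):
    """Enumerate the finest labels gathered under a node of the hierarchy."""
    if isinstance(node, list):
        for child in node:
            yield from leaves(child)
    else:
        yield node

def merged(mu, node):
    """mu_T: sum of the leaf measures gathered under node T."""
    total = Counter()
    for t in leaves(node):
        total.update(mu.get(t, {}))
    return total

def averaged_kernel(node, mu, nu, eps=0.5, lam=1.0):
    """K_T = (1 - eps) * k_T(mu, nu) + eps * product of the children values K_U;
    the value returned at the root plays the role of the averaged kernel k_pi."""
    if not isinstance(node, list):  # finest resolution: no further split possible
        return base_kernel(mu.get(node, {}), nu.get(node, {}), lam)
    coarse = base_kernel(merged(mu, node), merged(nu, node), lam)
    fine = 1.0
    for child in node:
        fine *= averaged_kernel(child, mu, nu, eps, lam)
    return (1.0 - eps) * coarse + eps * fine

if __name__ == "__main__":
    tree = [["p0", "p1"], ["p2", "p3"]]  # a small dyadic hierarchy over four patch labels
    mu = {"p0": {"a": 2}, "p1": {"b": 1}, "p2": {"a": 1, "b": 1}, "p3": {"b": 2}}
    nu = {"p0": {"a": 1}, "p1": {"b": 2}, "p2": {"a": 2}, "p3": {"b": 1}}
    print(averaged_kernel(tree, mu, nu, eps=0.5))
```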
83 3.3 Choosing the Base Kernel Any kernel on $M_+^b(X)$ can be used to comply with the terms of Definition 1 and apply an averaging scheme on families of measures. [sent-153, score-0.275]
84 We also note that an even more general formulation can be obtained by using a different kernel kt for each label t of L, without altering the overall applicability of the factorization above. [sent-154, score-0.417]
85 First, one can note that kernels such as the information diffusion kernel (Lafferty & Lebanon, 2005) and variance based kernels (Kondor & Jebara, 2003; Cuturi et al. [sent-156, score-0.736]
86 The most adequate geometry of $M_+^b(X)$, following the denormalization scheme proposed in (Amari & Nagaoka, 2001, p. [sent-160, score-0.092]
87 More generally, one can consider the whole family of kernels for bounded measures described in (Hein & Bousquet, 2005) to choose the base kernel $k$, namely the family of Hilbertian metrics $\psi$ such that $k = e^{-\frac{1}{\lambda}\psi}$. [sent-162, score-0.676]
88 4 Experiments in Image Retrieval We present in this section experiments inspired by the image retrieval task first considered in (Chapelle et al. [sent-164, score-0.139]
89 Our dataset was also extracted from the Corel Stock database and includes 12 families of labeled images, each class containing [sent-166, score-0.103]
90 Figure 4: Misclassification rate on the Corel experiment, using the Hellinger H1 distance between histograms coupled with one-vs-all SVM classification (C = 100) as a function of λ and ε. [sent-177, score-0.387]
91 ε controls the granularity of the averaging kernel, ranging from the coarsest perspective (ε = 0) when only the global histogram is used, to the finest one (ε = 1) when only the finest histograms are considered. [sent-181, score-0.583]
92 100 color images of 256 × 384 pixels. [sent-186, score-0.122]
93 The families depict images of bears, African specialty animals, monkeys, cougars, fireworks, mountains, office interiors, bonsais, sunsets, clouds, apes and rocks and gems. [sent-187, score-0.168]
94 We used 9 bits for the color of each pixel to reduce the size of the RGB color space to $8^3 = 512$ from the original set of $256^3 = 16,777,216$ colors, and we defined centered grids of 4, $4^2 = 16$ and $4^3 = 64$ local patches. [sent-193, score-0.114]
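A hedged sketch of the image pre-processing described above: pixel colors are quantized to 9 bits (3 bits per RGB channel, 512 colors) and color histograms are collected over centered grids of 4, 16 and 64 patches; the array layout and function names are assumptions for illustration only.

```python
import numpy as np
from collections import Counter

def quantize_colors(image):
    """Map an (H, W, 3) uint8 RGB image to 9-bit color indices in 0..511."""
    levels = image.astype(np.uint16) // 32          # 8 levels per channel
    return levels[..., 0] * 64 + levels[..., 1] * 8 + levels[..., 2]

def patch_histograms(image, grid):
    """Family of color histograms over a (grid x grid) partition of the image."""
    colors = quantize_colors(image)
    h, w = colors.shape
    family = {}
    for i in range(grid):
        for j in range(grid):
            patch = colors[i * h // grid:(i + 1) * h // grid,
                           j * w // grid:(j + 1) * w // grid]
            family[(grid, i, j)] = Counter(patch.ravel().tolist())
    return family

if __name__ == "__main__":
    img = np.random.randint(0, 256, size=(256, 384, 3), dtype=np.uint8)
    families = {}
    for g in (2, 4, 8):        # 2x2 = 4, 4x4 = 16 and 8x8 = 64 local patches
        families.update(patch_histograms(img, g))
    print(len(families), "histograms in the nested family")
```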
95 We provide results for each of the 5 considered kernels and for each considered depth D ranging from 1 to 3. [sent-194, score-0.259]
96 By considering values of ε ranging from 0 to 1, we aim at giving a sketch of the robustness of the averaging approach, since the SVM’s seem to perform better when 0 < ε < 1 for a large span of λ values. [sent-197, score-0.105]
97 Finally, the Gaussian kernel was also tested but its very poor performance (with error rate above 22% for all parameters) illustrates once more that the Gaussian kernel is usually a poor choice to compare histograms directly. [sent-199, score-0.692]
98 , 2004), although we do not aim here at learning linear combinations of the kernels kT , but rather start from an hierarchical belief on them to propose an algebraic combination. [sent-205, score-0.259]
99 Figure 5: Error-rate results for different kernels (H1, H2, TV, χ2, JD) and depths (D = 1, 2, 3) are displayed in the same way as in Figure 4, using the same colorscale across experiments. [sent-207, score-0.259]
100 Hilbertian metrics and positive definite kernels on probability measures. [sent-265, score-0.259]
wordName wordTfidf (topN-words)
[('pd', 0.428), ('histograms', 0.348), ('kernels', 0.259), ('kt', 0.204), ('kernel', 0.172), ('def', 0.165), ('partitions', 0.163), ('cuturi', 0.149), ('kp', 0.147), ('partition', 0.138), ('hierarchy', 0.136), ('resolutions', 0.129), ('vert', 0.121), ('objects', 0.11), ('coarsest', 0.108), ('ti', 0.104), ('families', 0.103), ('similarities', 0.093), ('adequate', 0.092), ('nest', 0.086), ('texts', 0.079), ('coarse', 0.077), ('kti', 0.075), ('matsuda', 0.075), ('family', 0.074), ('ku', 0.073), ('darrell', 0.073), ('grauman', 0.073), ('ml', 0.065), ('lebanon', 0.065), ('enumerated', 0.065), ('upperbounded', 0.065), ('images', 0.065), ('averaging', 0.065), ('notably', 0.063), ('base', 0.063), ('histogram', 0.062), ('hein', 0.06), ('branching', 0.06), ('bags', 0.06), ('haussler', 0.06), ('coarser', 0.059), ('color', 0.057), ('nested', 0.056), ('kondor', 0.055), ('image', 0.055), ('bousquet', 0.053), ('jebara', 0.052), ('labels', 0.05), ('indexed', 0.05), ('akutsu', 0.05), ('catoni', 0.05), ('granular', 0.05), ('minato', 0.05), ('nagaoka', 0.05), ('saigo', 0.05), ('siblings', 0.05), ('triadic', 0.05), ('ks', 0.049), ('convolution', 0.049), ('chapelle', 0.049), ('regarded', 0.046), ('et', 0.046), ('protein', 0.044), ('letters', 0.043), ('hilbertian', 0.043), ('fukumizu', 0.043), ('cardinal', 0.043), ('sibling', 0.043), ('hellinger', 0.043), ('jd', 0.043), ('quanti', 0.042), ('components', 0.042), ('contexts', 0.041), ('factorization', 0.041), ('meaningful', 0.041), ('similarity', 0.04), ('structured', 0.04), ('considering', 0.04), ('index', 0.04), ('spans', 0.039), ('corel', 0.039), ('hierarchies', 0.039), ('retrieval', 0.038), ('elementary', 0.038), ('colors', 0.038), ('contextual', 0.037), ('sonnenburg', 0.037), ('lafferty', 0.037), ('words', 0.036), ('aggregates', 0.035), ('leslie', 0.035), ('tsuda', 0.035), ('endowed', 0.035), ('detect', 0.035), ('measures', 0.034), ('grouped', 0.034), ('alignment', 0.033), ('tokyo', 0.033), ('grammatical', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 103 nips-2006-Kernels on Structured Objects Through Nested Histograms
Author: Marco Cuturi, Kenji Fukumizu
Abstract: We propose a family of kernels for structured objects which is based on the bag-ofcomponents paradigm. However, rather than decomposing each complex object into the single histogram of its components, we use for each object a family of nested histograms, where each histogram in this hierarchy describes the object seen from an increasingly granular perspective. We use this hierarchy of histograms to define elementary kernels which can detect coarse and fine similarities between the objects. We compute through an efficient averaging trick a mixture of such specific kernels, to propose a final kernel value which weights efficiently local and global matches. We propose experimental results on an image retrieval experiment which show that this mixture is an effective template procedure to be used with kernels on histograms.
2 0.15085962 151 nips-2006-On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts
Author: Hariharan Narayanan, Mikhail Belkin, Partha Niyogi
Abstract: One of the intuitions underlying many graph-based methods for clustering and semi-supervised learning, is that class or cluster boundaries pass through areas of low probability density. In this paper we provide some formal analysis of that notion for a probability distribution. We introduce a notion of weighted boundary volume, which measures the length of the class/cluster boundary weighted by the density of the underlying probability distribution. We show that sizes of the cuts of certain commonly used data adjacency graphs converge to this continuous weighted volume of the boundary. keywords: Clustering, Semi-Supervised Learning 1
3 0.14323731 115 nips-2006-Learning annotated hierarchies from relational data
Author: Daniel M. Roy, Charles Kemp, Vikash K. Mansinghka, Joshua B. Tenenbaum
Abstract: The objects in many real-world domains can be organized into hierarchies, where each internal node picks out a category of objects. Given a collection of features and relations defined over a set of objects, an annotated hierarchy includes a specification of the categories that are most useful for describing each individual feature and relation. We define a generative model for annotated hierarchies and the features and relations that they describe, and develop a Markov chain Monte Carlo scheme for learning annotated hierarchies. We show that our model discovers interpretable structure in several real-world data sets.
4 0.13121384 65 nips-2006-Denoising and Dimension Reduction in Feature Space
Author: Mikio L. Braun, Klaus-Robert Müller, Joachim M. Buhmann
Abstract: We show that the relevant information about a classification problem in feature space is contained up to negligible error in a finite number of leading kernel PCA components if the kernel matches the underlying learning problem. Thus, kernels not only transform data sets such that good generalization can be achieved even by linear discriminant functions, but this transformation is also performed in a manner which makes economic use of feature space dimensions. In the best case, kernels provide efficient implicit representations of the data to perform classification. Practically, we propose an algorithm which enables us to recover the subspace and dimensionality relevant for good classification. Our algorithm can therefore be applied (1) to analyze the interplay of data set and kernel in a geometric fashion, (2) to help in model selection, and to (3) de-noise in feature space in order to yield better classification results. 1
5 0.11918677 152 nips-2006-Online Classification for Complex Problems Using Simultaneous Projections
Author: Yonatan Amit, Shai Shalev-shwartz, Yoram Singer
Abstract: We describe and analyze an algorithmic framework for online classification where each online trial consists of multiple prediction tasks that are tied together. We tackle the problem of updating the online hypothesis by defining a projection problem in which each prediction task corresponds to a single linear constraint. These constraints are tied together through a single slack parameter. We then introduce a general method for approximately solving the problem by projecting simultaneously and independently on each constraint which corresponds to a prediction sub-problem, and then averaging the individual solutions. We show that this approach constitutes a feasible, albeit not necessarily optimal, solution for the original projection problem. We derive concrete simultaneous projection schemes and analyze them in the mistake bound model. We demonstrate the power of the proposed algorithm in experiments with online multiclass text categorization. Our experiments indicate that a combination of class-dependent features with the simultaneous projection method outperforms previously studied algorithms. 1
6 0.11020688 55 nips-2006-Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees
7 0.095257983 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
8 0.085481636 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
9 0.083739288 66 nips-2006-Detecting Humans via Their Pose
10 0.082733646 77 nips-2006-Fast Computation of Graph Kernels
11 0.082443103 63 nips-2006-Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods
12 0.080413289 102 nips-2006-Kernel Maximum Entropy Data Transformation and an Enhanced Spectral Clustering Algorithm
13 0.078039549 34 nips-2006-Approximate Correspondences in High Dimensions
14 0.077796146 3 nips-2006-A Complexity-Distortion Approach to Joint Pattern Alignment
15 0.075410895 108 nips-2006-Large Scale Hidden Semi-Markov SVMs
16 0.074847654 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
17 0.073064677 175 nips-2006-Simplifying Mixture Models through Function Approximation
18 0.071065664 45 nips-2006-Blind Motion Deblurring Using Image Statistics
19 0.070130154 82 nips-2006-Gaussian and Wishart Hyperkernels
20 0.067863241 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
topicId topicWeight
[(0, -0.222), (1, 0.078), (2, 0.048), (3, 0.053), (4, 0.051), (5, 0.046), (6, -0.09), (7, -0.062), (8, 0.039), (9, -0.099), (10, -0.034), (11, 0.063), (12, 0.037), (13, -0.093), (14, 0.035), (15, -0.04), (16, -0.099), (17, 0.023), (18, -0.116), (19, 0.004), (20, -0.202), (21, 0.018), (22, -0.18), (23, -0.091), (24, 0.126), (25, 0.064), (26, 0.01), (27, -0.02), (28, 0.019), (29, 0.075), (30, 0.006), (31, -0.011), (32, -0.099), (33, -0.058), (34, 0.107), (35, -0.092), (36, 0.262), (37, -0.056), (38, -0.079), (39, -0.029), (40, -0.025), (41, -0.084), (42, 0.101), (43, -0.012), (44, -0.134), (45, 0.074), (46, 0.097), (47, 0.037), (48, -0.027), (49, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.94767797 103 nips-2006-Kernels on Structured Objects Through Nested Histograms
Author: Marco Cuturi, Kenji Fukumizu
Abstract: We propose a family of kernels for structured objects which is based on the bag-ofcomponents paradigm. However, rather than decomposing each complex object into the single histogram of its components, we use for each object a family of nested histograms, where each histogram in this hierarchy describes the object seen from an increasingly granular perspective. We use this hierarchy of histograms to define elementary kernels which can detect coarse and fine similarities between the objects. We compute through an efficient averaging trick a mixture of such specific kernels, to propose a final kernel value which weights efficiently local and global matches. We propose experimental results on an image retrieval experiment which show that this mixture is an effective template procedure to be used with kernels on histograms.
2 0.55409276 177 nips-2006-Sparse Kernel Orthonormalized PLS for feature extraction in large data sets
Author: Jerónimo Arenas-garcía, Kaare B. Petersen, Lars K. Hansen
Abstract: In this paper we are presenting a novel multivariate analysis method. Our scheme is based on a novel kernel orthonormalized partial least squares (PLS) variant for feature extraction, imposing sparsity constrains in the solution to improve scalability. The algorithm is tested on a benchmark of UCI data sets, and on the analysis of integrated short-time music features for genre prediction. The upshot is that the method has strong expressive power even with rather few features, is clearly outperforming the ordinary kernel PLS, and therefore is an appealing method for feature extraction of labelled data. 1
3 0.5505439 82 nips-2006-Gaussian and Wishart Hyperkernels
Author: Risi Kondor, Tony Jebara
Abstract: We propose a new method for constructing hyperkenels and define two promising special cases that can be computed in closed form. These we call the Gaussian and Wishart hyperkernels. The former is especially attractive in that it has an interpretable regularization scheme reminiscent of that of the Gaussian RBF kernel. We discuss how kernel learning can be used not just for improving the performance of classification and regression methods, but also as a stand-alone algorithm for dimensionality reduction and relational or metric learning. 1
4 0.48333645 115 nips-2006-Learning annotated hierarchies from relational data
Author: Daniel M. Roy, Charles Kemp, Vikash K. Mansinghka, Joshua B. Tenenbaum
Abstract: The objects in many real-world domains can be organized into hierarchies, where each internal node picks out a category of objects. Given a collection of features and relations defined over a set of objects, an annotated hierarchy includes a specification of the categories that are most useful for describing each individual feature and relation. We define a generative model for annotated hierarchies and the features and relations that they describe, and develop a Markov chain Monte Carlo scheme for learning annotated hierarchies. We show that our model discovers interpretable structure in several real-world data sets.
5 0.46112201 55 nips-2006-Computation of Similarity Measures for Sequential Data using Generalized Suffix Trees
Author: Konrad Rieck, Pavel Laskov, Sören Sonnenburg
Abstract: We propose a generic algorithm for computation of similarity measures for sequential data. The algorithm uses generalized suffix trees for efficient calculation of various kernel, distance and non-metric similarity functions. Its worst-case run-time is linear in the length of sequences and independent of the underlying embedding language, which can cover words, k-grams or all contained subsequences. Experiments with network intrusion detection, DNA analysis and text processing applications demonstrate the utility of distances and similarity coefficients for sequences as alternatives to classical kernel functions.
6 0.45780241 142 nips-2006-Mutagenetic tree Fisher kernel improves prediction of HIV drug resistance from viral genotype
7 0.44922245 63 nips-2006-Cross-Validation Optimization for Large Scale Hierarchical Classification Kernel Methods
8 0.44023791 151 nips-2006-On the Relation Between Low Density Separation, Spectral Clustering and Graph Cuts
9 0.41818139 102 nips-2006-Kernel Maximum Entropy Data Transformation and an Enhanced Spectral Clustering Algorithm
10 0.41763821 45 nips-2006-Blind Motion Deblurring Using Image Statistics
11 0.38507736 65 nips-2006-Denoising and Dimension Reduction in Feature Space
12 0.38417482 78 nips-2006-Fast Discriminative Visual Codebooks using Randomized Clustering Forests
13 0.38321385 169 nips-2006-Relational Learning with Gaussian Processes
14 0.36960819 77 nips-2006-Fast Computation of Graph Kernels
15 0.3580977 170 nips-2006-Robotic Grasping of Novel Objects
16 0.35762027 135 nips-2006-Modelling transcriptional regulation using Gaussian Processes
17 0.3444235 6 nips-2006-A Kernel Subspace Method by Stochastic Realization for Learning Nonlinear Dynamical Systems
18 0.3433117 84 nips-2006-Generalized Regularized Least-Squares Learning with Predefined Features in a Hilbert Space
19 0.33669791 94 nips-2006-Image Retrieval and Classification Using Local Distance Functions
20 0.33341765 152 nips-2006-Online Classification for Complex Problems Using Simultaneous Projections
topicId topicWeight
[(1, 0.078), (3, 0.014), (7, 0.061), (9, 0.026), (20, 0.016), (22, 0.038), (44, 0.047), (57, 0.062), (65, 0.543), (69, 0.014), (90, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.94517004 103 nips-2006-Kernels on Structured Objects Through Nested Histograms
Author: Marco Cuturi, Kenji Fukumizu
Abstract: We propose a family of kernels for structured objects which is based on the bag-ofcomponents paradigm. However, rather than decomposing each complex object into the single histogram of its components, we use for each object a family of nested histograms, where each histogram in this hierarchy describes the object seen from an increasingly granular perspective. We use this hierarchy of histograms to define elementary kernels which can detect coarse and fine similarities between the objects. We compute through an efficient averaging trick a mixture of such specific kernels, to propose a final kernel value which weights efficiently local and global matches. We propose experimental results on an image retrieval experiment which show that this mixture is an effective template procedure to be used with kernels on histograms.
2 0.89511287 146 nips-2006-No-regret Algorithms for Online Convex Programs
Author: Geoffrey J. Gordon
Abstract: Online convex programming has recently emerged as a powerful primitive for designing machine learning algorithms. For example, OCP can be used for learning a linear classifier, dynamically rebalancing a binary search tree, finding the shortest path in a graph with unknown edge lengths, solving a structured classification problem, or finding a good strategy in an extensive-form game. Several researchers have designed no-regret algorithms for OCP. But, compared to algorithms for special cases of OCP such as learning from expert advice, these algorithms are not very numerous or flexible. In learning from expert advice, one tool which has proved particularly valuable is the correspondence between no-regret algorithms and convex potential functions: by reasoning about these potential functions, researchers have designed algorithms with a wide variety of useful guarantees such as good performance when the target hypothesis is sparse. Until now, there has been no such recipe for the more general OCP problem, and therefore no ability to tune OCP algorithms to take advantage of properties of the problem or data. In this paper we derive a new class of no-regret learning algorithms for OCP. These Lagrangian Hedging algorithms are based on a general class of potential functions, and are a direct generalization of known learning rules like weighted majority and external-regret matching. In addition to proving regret bounds, we demonstrate our algorithms learning to play one-card poker. 1
3 0.88755387 156 nips-2006-Ordinal Regression by Extended Binary Classification
Author: Ling Li, Hsuan-tien Lin
Abstract: We present a reduction framework from ordinal regression to binary classification based on extended examples. The framework consists of three steps: extracting extended examples from the original examples, learning a binary classifier on the extended examples with any binary classification algorithm, and constructing a ranking rule from the binary classifier. A weighted 0/1 loss of the binary classifier would then bound the mislabeling cost of the ranking rule. Our framework allows not only to design good ordinal regression algorithms based on well-tuned binary classification approaches, but also to derive new generalization bounds for ordinal regression from known bounds for binary classification. In addition, our framework unifies many existing ordinal regression algorithms, such as perceptron ranking and support vector ordinal regression. When compared empirically on benchmark data sets, some of our newly designed algorithms enjoy advantages in terms of both training speed and generalization performance over existing algorithms, which demonstrates the usefulness of our framework. 1
4 0.87893826 170 nips-2006-Robotic Grasping of Novel Objects
Author: Ashutosh Saxena, Justin Driemeyer, Justin Kearns, Andrew Y. Ng
Abstract: We consider the problem of grasping novel objects, specifically ones that are being seen for the first time through vision. We present a learning algorithm that neither requires, nor tries to build, a 3-d model of the object. Instead it predicts, directly as a function of the images, a point at which to grasp the object. Our algorithm is trained via supervised learning, using synthetic images for the training set. We demonstrate on a robotic manipulation platform that this approach successfully grasps a wide variety of objects, such as wine glasses, duct tape, markers, a translucent box, jugs, knife-cutters, cellphones, keys, screwdrivers, staplers, toothbrushes, a thick coil of wire, a strangely shaped power horn, and others, none of which were seen in the training set. 1
5 0.87047642 102 nips-2006-Kernel Maximum Entropy Data Transformation and an Enhanced Spectral Clustering Algorithm
Author: Robert Jenssen, Torbjørn Eltoft, Mark Girolami, Deniz Erdogmus
Abstract: We propose a new kernel-based data transformation technique. It is founded on the principle of maximum entropy (MaxEnt) preservation, hence named kernel MaxEnt. The key measure is Renyi’s entropy estimated via Parzen windowing. We show that kernel MaxEnt is based on eigenvectors, and is in that sense similar to kernel PCA, but may produce strikingly different transformed data sets. An enhanced spectral clustering algorithm is proposed, by replacing kernel PCA by kernel MaxEnt as an intermediate step. This has a major impact on performance.
6 0.56875336 203 nips-2006-implicit Online Learning with Kernels
7 0.54191458 61 nips-2006-Convex Repeated Games and Fenchel Duality
8 0.53461206 79 nips-2006-Fast Iterative Kernel PCA
9 0.53381771 26 nips-2006-An Approach to Bounded Rationality
10 0.52502203 115 nips-2006-Learning annotated hierarchies from relational data
11 0.5047431 82 nips-2006-Gaussian and Wishart Hyperkernels
12 0.50137889 152 nips-2006-Online Classification for Complex Problems Using Simultaneous Projections
13 0.48943546 123 nips-2006-Learning with Hypergraphs: Clustering, Classification, and Embedding
14 0.48739859 117 nips-2006-Learning on Graph with Laplacian Regularization
15 0.48661989 65 nips-2006-Denoising and Dimension Reduction in Feature Space
16 0.48214722 83 nips-2006-Generalized Maximum Margin Clustering and Unsupervised Kernel Learning
17 0.479224 163 nips-2006-Prediction on a Graph with a Perceptron
18 0.47833061 109 nips-2006-Learnability and the doubling dimension
19 0.47788283 150 nips-2006-On Transductive Regression
20 0.47775263 125 nips-2006-Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning