nips nips2002 nips2002-114 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Martin Szummer, Tommi S. Jaakkola
Abstract: Classification with partially labeled data requires using a large number of unlabeled examples (or an estimated marginal P (x)), to further constrain the conditional P (y|x) beyond a few available labeled examples. We formulate a regularization approach to linking the marginal and the conditional in a general way. The regularization penalty measures the information that is implied about the labels over covering regions. No parametric assumptions are required and the approach remains tractable even for continuous marginal densities P (x). We develop algorithms for solving the regularization problem for finite covers, establish a limiting differential equation, and exemplify the behavior of the new regularization approach in simple cases.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Classification with partially labeled data requires using a large number of unlabeled examples (or an estimated marginal P (x)), to further constrain the conditional P (y|x) beyond a few available labeled examples. [sent-5, score-0.876]
2 We formulate a regularization approach to linking the marginal and the conditional in a general way. [sent-6, score-0.572]
3 The regularization penalty measures the information that is implied about the labels over covering regions. [sent-7, score-0.504]
4 No parametric assumptions are required and the approach remains tractable even for continuous marginal densities P (x). [sent-8, score-0.347]
5 We develop algorithms for solving the regularization problem for finite covers, establish a limiting differential equation, and exemplify the behavior of the new regularization approach in simple cases. [sent-9, score-0.689]
6 To benefit from such examples, we must exploit, either implicitly or explicitly, the link between the marginal density P (x) over examples x and the conditional P (y|x) representing the decision boundary for the labels y. [sent-11, score-0.51]
7 High density regions or clusters in the data, for example, can be expected to fall solely in one or another class. [sent-12, score-0.385]
8 Most discriminative methods do not attempt to explicitly model or incorporate information from the marginal density P (x). [sent-13, score-0.346]
9 However, many discriminative algorithms such as SVMs exploit the notion of margin that effectively relates P (x) to P (y|x); the decision boundary is biased to fall preferentially in low density regions of P (x) so that only a few points fall within the margin band. [sent-14, score-0.623]
10 In this paper we appeal to information theory to explicitly constrain P (y|x) on the basis of P (x) in a regularization framework. [sent-16, score-0.351]
11 Figure 1: Mutual information I(x; y) measured in bits for four regions with different configurations of labels y = {+, -}. [sent-20, score-0.428]
12 The marginal P (x) is discrete and uniform across the points. [sent-21, score-0.283]
13 The mutual information is low when the labels are homogeneous in the region, and high when labels vary. [sent-22, score-0.383]
14 The mutual information is invariant to the spatial configuration of points within the neighborhood. [sent-23, score-0.338]
15 2 Information Regularization We begin by showing how to regularize a small region of the domain X . [sent-24, score-0.34]
16 We will subsequently cover the domain (or any chosen subset) with multiple small regions, and describe criteria that ensure regularization of the whole domain on the basis of the individual regions. [sent-25, score-0.542]
17 2.1 Regularizing a Single Region Consider a small contiguous region Q in the domain X (e. [sent-27, score-0.254]
18 We will regularize the conditional probability P (y|x) by penalizing the amount of information the conditionals imply about the labels within the region. [sent-30, score-0.634]
19 The regularizer is a function of both P (y|x) and P (x), and will penalize changes in P (y|x) more in regions with high P (x). [sent-31, score-0.42]
20 Let L be the set of labeled points (size NL) and L ∪ U be the set of labeled and unlabeled points (size NLU). [sent-32, score-0.82]
21 The marginal P (x) is assumed to be given, and may be available directly in terms of a continuous density, or as an empirical density P (x) = 1/NLU · Σ_{i∈L∪U} δ(x − xi ) corresponding to a set of points {xi } that may not have labels (δ(·) is the Dirac delta function integrating to 1). [sent-33, score-0.59]
22 As a measure of information, we employ mutual information [5], which is the average number of bits that x contains about the label in region Q (see Figure 1). [sent-34, score-0.499]
23 The measure depends both on the marginal density P (x) (specifically its restriction to x ∈ Q, namely P (x|Q) = P (x)/∫_Q P (x′) dx′) and the conditional P (y|x). [sent-35, score-0.346]
24 More precisely, the mutual information in region Q is IQ(x; y) = Σ_y ∫_{x∈Q} P (x|Q) P (y|x) log [P (y|x) / P (y|Q)] dx, (1) where P (y|Q) = ∫_{x∈Q} P (x|Q) P (y|x) dx. [sent-38, score-0.455]
25 The densities conditioned on Q are normalized to integrate to 1 within the region Q. [sent-39, score-0.273]
26 Note that the mutual information is invariant to permutations of the elements of X within Q, which suggests that the regions must be small enough to preserve locality. [sent-40, score-0.554]
27 The regularization penalty has to further scale with the number of points in the region (or the probability mass). [sent-41, score-0.664]
28 The mutual information IQ (x; y) measures the information per point, and to obtain the total mutual information contained in a region, we must multiply by the probability mass MQ . [sent-44, score-0.587]
29 The regularization will be stronger in regions with high P (x). [sent-45, score-0.553]
30 When the region is small, the marginal will be close to uniform over the region and VQ ∝ R², where R is, e.g., the radius of the region. [sent-48, score-0.63]
31 2.2 Limiting Behavior for Vanishing Size Regions When the size of the region is scaled down, the mutual information will go to zero for any continuous P (y|x). [sent-54, score-0.537]
32 We derive here the appropriate regularization penalty in the limit of vanishing regions. [sent-55, score-0.394]
33 Within a small region Q we can (under mild continuity assumptions) approximate P (y|x) by a Taylor expansion around the mean point x0 ∈ Q, obtaining P (y|Q) ≈ P (y|x0 ) to first order. [sent-57, score-0.241]
34 The regularization penalty should not scale with the resolution at which we penalize information and we thus divide out the size-dependent part. [sent-59, score-0.425]
35 2.3 Regularizing the Domain We want to regularize the conditional P (y|x) across the domain X (or any subset of interest). [sent-63, score-0.252]
36 Since individual regions must be relatively small to preserve locality, we need multiple regions to cover the domain. [sent-64, score-0.764]
37 Since the regularization penalty is assigned to each region, the regions must overlap to ensure that the conditionals in different regions become functionally dependent. [sent-66, score-1.314]
38 In general, all areas with significant marginal density P (x) should be included in the cover; any area left out will not be regularized (areas of zero marginal need not be considered). [sent-68, score-0.614]
39 The cover should generally be connected (with respect to neighborhood relations of the regions) so that labeled points have the potential to influence all conditionals. [sent-69, score-0.499]
40 The amount of overlap between any two regions in the cover determines how strongly the corresponding conditionals are tied to each other. [sent-70, score-0.797]
41 On the other hand, the regions should be small to preserve locality. [sent-71, score-0.317]
42 The limit of a large number of small overlapping regions can be defined, and we ensure continuity of P (y|x) when the offset between regions vanishes relative to their size (in all dimensions). [sent-72, score-0.745]
43 3 Classification with Information Regularization Information regularization across multiple regions can be performed, for example, by minimizing the maximum information per region, subject to correct classification of the labeled points. [sent-73, score-0.851]
44 Specifically, we constrain each region in the cover (Q ∈ C) to carry at most γ units of information. [sent-74, score-0.388]
45 The resulting problem is: min over P (y|xk ) of γ (3a), subject to (MQ/VQ) IQ(x; y) ≤ γ for all Q ∈ C (3b), P (y|xk ) = δ(y, ỹk ) for k ∈ L (3c), and Σ_y P (y|xk ) = 1 with 0 ≤ P (y|xk ) ≤ 1 (3d). We have incorporated the labeled points by constraining their conditionals to the observed values (eq. 3c). [sent-78, score-0.692]
46 The solution P (y|x) to this optimization problem is unique in regions that achieve the information constraint with equality (as long as P (x) > 0). [sent-80, score-0.356]
47 Define an atomic subregion as a non-empty intersection of regions that cannot be further intersected by any region (Figure 2). [sent-82, score-0.848]
48 All unlabeled points in an atomic subregion belong to the same set of regions, and therefore participate in exactly the same constraints. [sent-83, score-0.646]
49 They will be regularized the same way, and since mutual information is a convex function, it will be minimized when the conditionals P (y|x) are equal in the atomic subregion. [sent-84, score-0.849]
50 We can therefore parsimoniously represent conditionals of atomic subregions, instead of individual points, merely by treating such atomic subregions as “merged points” and weighting the associated constraint by the probability mass contained in the subregion. [sent-85, score-1.215]
51 3.1 Incorporating Noisy Labels Labeled points participate in the information regularization in the same way as unlabeled points. [sent-87, score-0.596]
52 However, their conditionals have additional constraints, which incorporate the label information. [sent-88, score-0.391]
53 In equation 3c we used the constraint P (y|xk ) = δ(y, ỹk ) for all labeled points. [sent-89, score-0.378]
54 This constraint does not permit noise in the labels (and cannot be used when two points at the same location have disagreeing labels). [sent-90, score-0.254]
55 The conditionals of the labeled points are directly determined by their labels, and are treated as fixed constants. [sent-96, score-0.692]
56 As long as the permitted label error is below 0.5, the thresholded conditional classifies labeled points in the observed class. [sent-98, score-0.442]
57 In constraint (exp-lbl), the conditionals for labeled points can have an average error of at most b, where the average is over all labeled points. [sent-99, score-0.986]
58 Thus, a few points may have conditionals that deviate significantly from their observed labels, giving robustness against mislabeled points and outliers. [sent-100, score-0.591]
59 3.2 Continuous Densities Information regularization is also computationally feasible for continuous marginal densities, known or estimated. [sent-104, score-0.557]
60 For example, we may be given a continuous unlabeled data distribution P (x) and a few discrete labeled points, and regularize across a finite set of covering regions. [sent-105, score-0.636]
61 The conditionals are uniform inside atomic subregions (except at labeled points), requiring estimates of only a finite number of conditionals. [sent-106, score-1.079]
62 3.3 Implementation First, we choose appropriate regions forming a cover, and find the atomic subregions. [sent-108, score-0.555]
63 If the data is all discrete, create a spherical region centered at every labeled and unlabeled point (or over some reduced set still covering all the points). [sent-111, score-0.626]
64 We have used regions of fixed radius R, but the radius could also be set adaptively at each point to the distance to its K-th nearest neighbor. [sent-112, score-0.373]
65 The union of such regions is our cover, and we choose the radius R (or K) large enough to create a connected cover. [sent-113, score-0.351]
66 The cover induces a set of atomic subregions, and we merge the parameters P (y|x) of points inside individual atomic subregions (atomic subregions with no observed points can be ignored). [sent-114, score-1.366]
67 The marginal of each atomic subregion is proportional to the number of (merged) points it contains. [sent-115, score-0.677]
68 If continuous marginals are given, they will put probability mass in all atomic subregions where the marginal is non-zero. [sent-116, score-0.9]
69 To avoid considering an exponential number of subregions, we can limit the overlap between the regions by creating a sparser cover. [sent-117, score-0.37]
70 Given the cover, we now regularize the conditionals P (y|x) in the regions, according to eq. 3. [sent-118, score-0.423]
71 This is a convex minimization problem with a global minimum, since mutual information is convex in P (y|x). [sent-120, score-0.259]
72 For the optimization problem of eq. 3a, the required gradients of the constraints for the binary-class (y = {±1}) case (region Q, atomic subregion r) are: (MQ/VQ) · dIQ(x; y)/dP (y = 1|xr ) = (MQ/VQ) · P (xr |Q) · log[ P (y = 1|xr ) P (y = −1|Q) / (P (y = −1|xr ) P (y = 1|Q)) ]. [sent-123, score-0.401]
73 3.4 Minimize Average Information An alternative regularization criterion minimizes the average mutual information across regions. [sent-126, score-0.572]
74 When calculating the average, we must correct for the overlaps of intersecting regions to avoid double-counting (in contrast, the previous regularization criterion (eq. [sent-127, score-0.704]
75 3b) avoided double-counting by restricting information in each region individually). [sent-128, score-0.297]
76 The influence of a region is proportional to the probability mass MQ contained in it. [sent-129, score-0.308]
77 We can then minimize average mutual information according to min over P (y|xk ) of Σ_{Q∈C} (MQ*/VQ) IQ(x; y) (5a), where MQ* denotes the overlap-corrected probability mass of region Q, subject to the label and normalization constraints as in eq. 3. [sent-132, score-0.283]
78 3.4.1 Limiting Behavior The above average information criterion is a discrete version of a continuous regularization criterion. [sent-138, score-0.497]
79 In the limit of a large number of small regions in the cover (where the spacing of the regions vanishes relative to their size), we obtain a well-defined regularization criterion resulting in continuous P (y|x): min over P (y|x) of ∫ P (x) Σ_y P (y|x) (d log P (y|x)/dx)² dx, s.t. the conditionals at the labeled points match the observed labels. [sent-139, score-1.205]
80 More generally, we can formulate the regularization problem as a Tikhonov regularization, where the loss is the negative log-probability of labels: min over P (y|x) of −(1/NL) Σ_{k∈L} log P (ỹk |xk ) + λ ∫ P (x0 ) Σ_y P (y|x0 ) (d log P (y|x)/dx)² dx0 . [sent-144, score-0.509]
81 The minimizing conditional satisfies the Euler-Lagrange equation d/dx [∂F/∂(dP (y = 1|x)/dx)] − ∂F/∂P (y = 1|x) = 0, (8) where F denotes the integrand of the regularizer (natural boundary conditions apply since we can assume P (x) = 0 and P (y|x) = 0 at the boundary of the domain X ). [sent-151, score-0.24]
82 4 Results and Discussion We have experimentally studied the behavior of the regularizer with different marginal densities P (x). [sent-159, score-0.362]
83 Figure 3 shows the one-dimensional case with a continuous marginal density. [sent-160, score-0.354]
Figure 2: (Left) Three intersecting regions, and their atomic subregions (numbered). [sent-169, score-0.529]
86 P (y|x) for unlabeled points will be constant in atomic subregions. [sent-170, score-0.489]
Figure 3: (Right) The conditional (solid line) for a continuous marginal P (x) (dotted line) consisting of a mixture of two Gaussians, and two labeled points at (x = −0.… [sent-171, score-0.818]
The row of circles at the top depicts the region structure used (a rendering of overlapping one-dimensional intervals). [sent-174, score-0.251]
Figure 4: Conditionals (solid lines) for two continuous marginals (dotted lines) plus two labeled points. [sent-187, score-0.405]
90 Left: the marginal is uniform, and the conditional approaches a straight line. [sent-188, score-0.273]
The conditional changes slowly in regions of high density. [sent-190, score-0.364]
The marginal is a mixture of two Gaussians, and there are two discrete labeled points. [sent-191, score-0.263]
We choose NQ = 40 regions centered at uniform intervals of [−1, 1], overlapping each other half-way, creating NQ + 1 atomic subregions. [sent-192, score-0.655]
94 The conditional varies smoothly between the labeled points of opposite classes. [sent-197, score-0.442]
95 Note the dependence on the marginal density P (x). [sent-198, score-0.259]
Figure 5: Conditionals for two other continuous marginals plus two labeled points (marked as crosses and located at x = −1, 2 in the left figure and x = −2, 2 in the right), solved via the differential equation (eq. 8). [sent-228, score-0.644]
97 The conditionals are continuous but non-differentiable at the two labeled points (marked as crosses). [sent-230, score-0.787]
98 5 Conclusion We have presented an information theoretic regularization framework for combining conditional and marginal densities in a semi-supervised estimation setting. [sent-231, score-0.653]
99 The shape and structure of the regions give direct ways of imposing relations between particular variables or values of those variables. [sent-236, score-0.301]
100 The regions can be easily defined on low-dimensional data manifolds. [sent-237, score-0.277]
wordName wordTfidf (topN-words)
[('conditionals', 0.337), ('regions', 0.277), ('regularization', 0.276), ('atomic', 0.252), ('xk', 0.236), ('mq', 0.232), ('subregions', 0.232), ('labeled', 0.228), ('region', 0.207), ('marginal', 0.186), ('mutual', 0.173), ('cover', 0.144), ('vq', 0.143), ('iq', 0.134), ('points', 0.127), ('subregion', 0.112), ('unlabeled', 0.11), ('continuous', 0.095), ('dx', 0.089), ('conditional', 0.087), ('regularizer', 0.086), ('yk', 0.086), ('regularize', 0.086), ('labels', 0.086), ('xr', 0.076), ('tommi', 0.076), ('mass', 0.076), ('density', 0.073), ('szummer', 0.067), ('densities', 0.066), ('var', 0.061), ('differential', 0.059), ('marginals', 0.059), ('penalize', 0.057), ('penalty', 0.054), ('limiting', 0.054), ('boundary', 0.052), ('bfgs', 0.052), ('doublecounting', 0.052), ('covering', 0.05), ('radius', 0.048), ('domain', 0.047), ('min', 0.047), ('intersecting', 0.045), ('participate', 0.045), ('lab', 0.044), ('overlapping', 0.044), ('constraint', 0.041), ('preserve', 0.04), ('overlap', 0.039), ('calculus', 0.038), ('information', 0.038), ('constrain', 0.037), ('log', 0.037), ('merged', 0.036), ('vanishing', 0.036), ('jaakkola', 0.035), ('fall', 0.035), ('discrete', 0.035), ('fisher', 0.035), ('continuity', 0.034), ('regularizing', 0.034), ('eq', 0.034), ('sloan', 0.033), ('vanishes', 0.033), ('across', 0.032), ('spherical', 0.031), ('crosses', 0.03), ('uniform', 0.03), ('classi', 0.029), ('martin', 0.029), ('label', 0.029), ('ai', 0.029), ('limit', 0.028), ('criterion', 0.028), ('ensure', 0.028), ('bits', 0.027), ('partitioning', 0.027), ('must', 0.026), ('creating', 0.026), ('bottleneck', 0.026), ('choose', 0.026), ('average', 0.025), ('marked', 0.025), ('entropy', 0.025), ('regularized', 0.025), ('incorporate', 0.025), ('contained', 0.025), ('size', 0.024), ('behavior', 0.024), ('convex', 0.024), ('shape', 0.024), ('adjusted', 0.024), ('discriminative', 0.024), ('formulate', 0.023), ('substituting', 0.023), ('posterior', 0.023), ('xi', 0.023), ('equation', 0.023), ('plus', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 114 nips-2002-Information Regularization with Partially Labeled Data
Author: Martin Szummer, Tommi S. Jaakkola
Abstract: Classification with partially labeled data requires using a large number of unlabeled examples (or an estimated marginal P (x)), to further constrain the conditional P (y|x) beyond a few available labeled examples. We formulate a regularization approach to linking the marginal and the conditional in a general way. The regularization penalty measures the information that is implied about the labels over covering regions. No parametric assumptions are required and the approach remains tractable even for continuous marginal densities P (x). We develop algorithms for solving the regularization problem for finite covers, establish a limiting differential equation, and exemplify the behavior of the new regularization approach in simple cases.
2 0.1378981 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization
Author: Harald Steck, Tommi S. Jaakkola
Abstract: A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. In this paper we examine how Bayesian regularization using a product of independent Dirichlet priors over the model parameters affects the learned model structure in a domain with discrete variables. We show that a small scale parameter - often interpreted as
3 0.13627708 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
Author: Olivier Chapelle, Jason Weston, Bernhard SchĂślkopf
Abstract: We propose a framework to incorporate unlabeled data in kernel classifier, based on the idea that two points in the same cluster are more likely to have the same label. This is achieved by modifying the eigenspectrum of the kernel matrix. Experimental results assess the validity of this approach. 1
4 0.10979089 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation
Author: Peter Meinicke, Thorsten Twellmann, Helge Ritter
Abstract: We propose a framework for classifier design based on discriminative densities for representation of the differences of the class-conditional distributions in a way that is optimal for classification. The densities are selected from a parametrized set by constrained maximization of some objective function which measures the average (bounded) difference, i.e. the contrast between discriminative densities. We show that maximization of the contrast is equivalent to minimization of an approximation of the Bayes risk. Therefore using suitable classes of probability density functions, the resulting maximum contrast classifiers (MCCs) can approximate the Bayes rule for the general multiclass case. In particular for a certain parametrization of the density functions we obtain MCCs which have the same functional form as the well-known Support Vector Machines (SVMs). We show that MCC-training in general requires some nonlinear optimization but under certain conditions the problem is concave and can be tackled by a single linear program. We indicate the close relation between SVM- and MCC-training and in particular we show that Linear Programming Machines can be viewed as an approximate realization of MCCs. In the experiments on benchmark data sets, the MCC shows a competitive classification performance.
5 0.1094821 124 nips-2002-Learning Graphical Models with Mercer Kernels
Author: Francis R. Bach, Michael I. Jordan
Abstract: We present a class of algorithms for learning the structure of graphical models from data. The algorithms are based on a measure known as the kernel generalized variance (KGV), which essentially allows us to treat all variables on an equal footing as Gaussians in a feature space obtained from Mercer kernels. Thus we are able to learn hybrid graphs involving discrete and continuous variables of arbitrary type. We explore the computational properties of our approach, showing how to use the kernel trick to compute the relevant statistics in linear time. We illustrate our framework with experiments involving discrete and continuous data.
6 0.094014652 135 nips-2002-Learning with Multiple Labels
7 0.079017311 89 nips-2002-Feature Selection by Maximum Marginal Diversity
8 0.073449291 64 nips-2002-Data-Dependent Bounds for Bayesian Mixture Methods
9 0.071272708 93 nips-2002-Forward-Decoding Kernel-Based Phone Recognition
10 0.068653122 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers
11 0.066936061 106 nips-2002-Hyperkernels
12 0.066341504 19 nips-2002-Adapting Codes and Embeddings for Polychotomies
13 0.063905545 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks
14 0.063631296 53 nips-2002-Clustering with the Fisher Score
15 0.061909124 150 nips-2002-Multiple Cause Vector Quantization
16 0.061691642 151 nips-2002-Multiplicative Updates for Nonnegative Quadratic Programming in Support Vector Machines
17 0.058532022 90 nips-2002-Feature Selection in Mixture-Based Clustering
18 0.058464844 41 nips-2002-Bayesian Monte Carlo
19 0.058149431 83 nips-2002-Extracting Relevant Structures with Side Information
20 0.054747678 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
topicId topicWeight
[(0, -0.182), (1, -0.079), (2, 0.01), (3, 0.012), (4, 0.003), (5, 0.044), (6, -0.063), (7, -0.004), (8, -0.05), (9, 0.045), (10, 0.068), (11, -0.033), (12, 0.03), (13, -0.02), (14, -0.047), (15, -0.051), (16, -0.008), (17, 0.015), (18, 0.073), (19, 0.021), (20, 0.033), (21, 0.041), (22, -0.04), (23, -0.093), (24, 0.064), (25, -0.137), (26, -0.127), (27, 0.029), (28, -0.112), (29, 0.018), (30, -0.008), (31, -0.287), (32, -0.032), (33, -0.134), (34, 0.104), (35, -0.08), (36, 0.043), (37, 0.061), (38, 0.123), (39, 0.03), (40, 0.103), (41, 0.008), (42, 0.059), (43, -0.096), (44, 0.114), (45, -0.031), (46, 0.021), (47, 0.093), (48, -0.092), (49, -0.138)]
simIndex simValue paperId paperTitle
same-paper 1 0.97489864 114 nips-2002-Information Regularization with Partially Labeled Data
Author: Martin Szummer, Tommi S. Jaakkola
Abstract: Classification with partially labeled data requires using a large number of unlabeled examples (or an estimated marginal P (x)), to further constrain the conditional P (y|x) beyond a few available labeled examples. We formulate a regularization approach to linking the marginal and the conditional in a general way. The regularization penalty measures the information that is implied about the labels over covering regions. No parametric assumptions are required and the approach remains tractable even for continuous marginal densities P (x). We develop algorithms for solving the regularization problem for finite covers, establish a limiting differential equation, and exemplify the behavior of the new regularization approach in simple cases.
2 0.64828646 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization
Author: Harald Steck, Tommi S. Jaakkola
Abstract: A common objective in learning a model from data is to recover its network structure, while the model parameters are of minor interest. For example, we may wish to recover regulatory networks from high-throughput data sources. In this paper we examine how Bayesian regularization using a product of independent Dirichlet priors over the model parameters affects the learned model structure in a domain with discrete variables. We show that a small scale parameter - often interpreted as
3 0.524306 124 nips-2002-Learning Graphical Models with Mercer Kernels
Author: Francis R. Bach, Michael I. Jordan
Abstract: We present a class of algorithms for learning the structure of graphical models from data. The algorithms are based on a measure known as the kernel generalized variance (KGV), which essentially allows us to treat all variables on an equal footing as Gaussians in a feature space obtained from Mercer kernels. Thus we are able to learn hybrid graphs involving discrete and continuous variables of arbitrary type. We explore the computational properties of our approach, showing how to use the kernel trick to compute the relevant statistics in linear time. We illustrate our framework with experiments involving discrete and continuous data.
4 0.45391804 178 nips-2002-Robust Novelty Detection with Single-Class MPM
Author: Laurent E. Ghaoui, Michael I. Jordan, Gert R. Lanckriet
Abstract: In this paper we consider the problem of novelty detection, presenting an algorithm that aims to find a minimal region in input space containing a fraction α of the probability mass underlying a data set. This algorithm- the
5 0.4531939 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation
Author: Peter Meinicke, Thorsten Twellmann, Helge Ritter
Abstract: We propose a framework for classifier design based on discriminative densities for representation of the differences of the class-conditional distributions in a way that is optimal for classification. The densities are selected from a parametrized set by constrained maximization of some objective function which measures the average (bounded) difference, i.e. the contrast between discriminative densities. We show that maximization of the contrast is equivalent to minimization of an approximation of the Bayes risk. Therefore using suitable classes of probability density functions, the resulting maximum contrast classifiers (MCCs) can approximate the Bayes rule for the general multiclass case. In particular for a certain parametrization of the density functions we obtain MCCs which have the same functional form as the well-known Support Vector Machines (SVMs). We show that MCC-training in general requires some nonlinear optimization but under certain conditions the problem is concave and can be tackled by a single linear program. We indicate the close relation between SVM- and MCC-training and in particular we show that Linear Programming Machines can be viewed as an approximate realization of MCCs. In the experiments on benchmark data sets, the MCC shows a competitive classification performance.
6 0.43866956 93 nips-2002-Forward-Decoding Kernel-Based Phone Recognition
7 0.4073956 29 nips-2002-Analysis of Information in Speech Based on MANOVA
8 0.40378034 138 nips-2002-Manifold Parzen Windows
9 0.37573999 7 nips-2002-A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences
10 0.36348155 89 nips-2002-Feature Selection by Maximum Marginal Diversity
11 0.356837 41 nips-2002-Bayesian Monte Carlo
12 0.35419595 121 nips-2002-Knowledge-Based Support Vector Machine Classifiers
13 0.34884256 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks
14 0.33482367 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
15 0.31717691 172 nips-2002-Recovering Articulated Model Topology from Observed Rigid Motion
16 0.31605795 53 nips-2002-Clustering with the Fisher Score
17 0.31375152 111 nips-2002-Independent Components Analysis through Product Density Estimation
18 0.31318289 117 nips-2002-Intrinsic Dimension Estimation Using Packing Numbers
19 0.31306693 135 nips-2002-Learning with Multiple Labels
20 0.31271744 190 nips-2002-Stochastic Neighbor Embedding
topicId topicWeight
[(14, 0.023), (23, 0.017), (42, 0.054), (54, 0.093), (55, 0.024), (57, 0.011), (68, 0.021), (74, 0.579), (92, 0.024), (98, 0.062)]
simIndex simValue paperId paperTitle
1 0.98988843 57 nips-2002-Concurrent Object Recognition and Segmentation by Graph Partitioning
Author: Stella X. Yu, Ralph Gross, Jianbo Shi
Abstract: Segmentation and recognition have long been treated as two separate processes. We propose a mechanism based on spectral graph partitioning that readily combine the two processes into one. A part-based recognition system detects object patches, supplies their partial segmentations as well as knowledge about the spatial configurations of the object. The goal of patch grouping is to find a set of patches that conform best to the object configuration, while the goal of pixel grouping is to find a set of pixels that have the best low-level feature similarity. Through pixel-patch interactions and between-patch competition encoded in the solution space, these two processes are realized in one joint optimization problem. The globally optimal partition is obtained by solving a constrained eigenvalue problem. We demonstrate that the resulting object segmentation eliminates false positives for the part detection, while overcoming occlusion and weak contours for the low-level edge detection.
same-paper 2 0.97364444 114 nips-2002-Information Regularization with Partially Labeled Data
Author: Martin Szummer, Tommi S. Jaakkola
Abstract: Classification with partially labeled data requires using a large number of unlabeled examples (or an estimated marginal P (x)), to further constrain the conditional P (y|x) beyond a few available labeled examples. We formulate a regularization approach to linking the marginal and the conditional in a general way. The regularization penalty measures the information that is implied about the labels over covering regions. No parametric assumptions are required and the approach remains tractable even for continuous marginal densities P (x). We develop algorithms for solving the regularization problem for finite covers, establish a limiting differential equation, and exemplify the behavior of the new regularization approach in simple cases.
3 0.96669346 78 nips-2002-Efficient Learning Equilibrium
Author: Ronen I. Brafman, Moshe Tennenholtz
Abstract: We introduce efficient learning equilibrium (ELE), a normative approach to learning in non cooperative settings. In ELE, the learning algorithms themselves are required to be in equilibrium. In addition, the learning algorithms arrive at a desired value after polynomial time, and deviations from a prescribed ELE become irrational after polynomial time. We prove the existence of an ELE in the perfect monitoring setting, where the desired value is the expected payoff in a Nash equilibrium. We also show that an ELE does not always exist in the imperfect monitoring case. Yet, it exists in the special case of common-interest games. Finally, we extend our results to general stochastic games. 1
Author: Terry Elliott, Jörg Kramer
Abstract: A neurotrophic model for the co-development of topography and ocular dominance columns in the primary visual cortex has recently been proposed. In the present work, we test this model by driving it with the output of a pair of neuronal vision sensors stimulated by disparate moving patterns. We show that the temporal correlations in the spike trains generated by the two sensors elicit the development of refined topography and ocular dominance columns, even in the presence of significant amounts of spontaneous activity and fixed-pattern noise in the sensors.
5 0.95928389 90 nips-2002-Feature Selection in Mixture-Based Clustering
Author: Martin H. Law, Anil K. Jain, Mário Figueiredo
Abstract: There exist many approaches to clustering, but the important issue of feature selection, i.e., selecting the data attributes that are relevant for clustering, is rarely addressed. Feature selection for clustering is difficult due to the absence of class labels. We propose two approaches to feature selection in the context of Gaussian mixture-based clustering. In the first one, instead of making hard selections, we estimate feature saliencies. An expectation-maximization (EM) algorithm is derived for this task. The second approach extends Koller and Sahami’s mutual-informationbased feature relevance criterion to the unsupervised case. Feature selection is then carried out by a backward search scheme. This scheme can be classified as a “wrapper”, since it wraps mixture estimation in an outer layer that performs feature selection. Experimental results on synthetic and real data show that both methods have promising performance.
6 0.91019642 162 nips-2002-Parametric Mixture Models for Multi-Labeled Text
7 0.76880264 175 nips-2002-Reinforcement Learning to Play an Optimal Nash Equilibrium in Team Markov Games
8 0.73333853 39 nips-2002-Bayesian Image Super-Resolution
9 0.70521742 152 nips-2002-Nash Propagation for Loopy Graphical Games
10 0.70025104 89 nips-2002-Feature Selection by Maximum Marginal Diversity
11 0.69783133 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization
12 0.69673479 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
13 0.69294596 182 nips-2002-Shape Recipes: Scene Representations that Refer to the Image
14 0.68895251 74 nips-2002-Dynamic Structure Super-Resolution
15 0.67662799 122 nips-2002-Learning About Multiple Objects in Images: Factorial Learning without Factorial Search
16 0.66746759 173 nips-2002-Recovering Intrinsic Images from a Single Image
17 0.65053564 100 nips-2002-Half-Lives of EigenFlows for Spectral Clustering
18 0.64949715 135 nips-2002-Learning with Multiple Labels
19 0.64786601 2 nips-2002-A Bilinear Model for Sparse Coding
20 0.64237833 105 nips-2002-How to Combine Color and Shape Information for 3D Object Recognition: Kernels do the Trick