nips nips2004 nips2004-178 knowledge-graph by maker-knowledge-mining

178 nips-2004-Support Vector Classification with Input Data Uncertainty


Source: pdf

Author: Jinbo Bi, Tong Zhang

Abstract: This paper investigates a new learning model in which the input data is corrupted with noise. We present a general statistical framework to tackle this problem. Based on the statistical reasoning, we propose a novel formulation of support vector classification, which allows uncertainty in input data. We derive an intuitive geometric interpretation of the proposed formulation, and develop algorithms to efficiently solve it. Empirical results are included to show that the newly formed method is superior to the standard SVM for problems with noisy input. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: This paper investigates a new learning model in which the input data is corrupted with noise. [sent-8, score-0.191]

2 Based on the statistical reasoning, we propose a novel formulation of support vector classification, which allows uncertainty in input data. [sent-10, score-0.414]

3 We derive an intuitive geometric interpretation of the proposed formulation, and develop algorithms to efficiently solve it. [sent-11, score-0.179]

4 1 Introduction In the traditional formulation of supervised learning, we seek a predictor that maps input x to output y. [sent-13, score-0.254]

5 The predictor is constructed from a set of training examples {(xi , yi )}. [sent-14, score-0.38]

6 That is, the input data are not corrupted with noise; or even when noise is present in the data, its effect is ignored in the learning formulation. [sent-16, score-0.277]

7 Sampling errors, modeling errors and instrument errors may preclude the possibility of knowing the input data exactly. [sent-18, score-0.197]

8 For example, in the problem of classifying sentences from speech recognition outputs for call-routing applications, the speech recognition system may make errors so that the observed text is corrupted with noise. [sent-19, score-0.199]

9 Moreover, many systems can provide estimates for the reliability of their outputs, which measure how uncertain each element of the outputs is. [sent-22, score-0.128]

10 A plausible approach for dealing with noisy input is to use the standard learning formulation without modeling the underlying input uncertainty. [sent-24, score-0.429]

11 If we assume that the same noise is observed both in the training data and in the test data, then the noise will cause similar effects in the training and testing phases. [sent-25, score-0.226]

12 Based on this (non-rigorous) reasoning, one can argue that the issue of input noise may be ignored. [sent-26, score-0.167]

13 However, we show in this paper that by modeling input uncertainty, we can obtain more accurate predictors. [sent-27, score-0.135]

14 2 Statistical models for prediction problems with uncertain input. Consider (xi, yi), where the observed input xi is corrupted with noise; we write xi' for the corresponding (unobserved) true input. [sent-28, score-1.131]

15 The joint probability of (xi, xi', yi) can be written as: p(xi, xi', yi) = p(xi', yi|θ) p(xi|θ', σi, xi'). [sent-33, score-2.415]

16 The joint probability of (xi, yi) is obtained by integrating out the unobserved quantity xi': p(xi, yi) = ∫ p(xi', yi|θ) p(xi|θ', σi, xi') dxi'. [sent-34, score-1.933]

17 This model can be considered as a mixture model in which each mixture component corresponds to a possible true input xi' that is not observed. [sent-35, score-0.652]

18 In this framework, the unknown parameters (θ, θ') can be estimated from the data by maximum likelihood: max_{θ,θ'} Σi ln p(xi, yi|θ, θ') = max_{θ,θ'} Σi ln ∫ p(xi', yi|θ) p(xi|θ', σi, xi') dxi'. (1) [sent-36, score-1.276]

19 Although this is a principled approach under our data generation process, the integration over the unknown true input xi' often leads to a very complicated formulation that is difficult to solve. [sent-37, score-0.731]

20 In this method, we simply regard each xi' as a parameter of the probability model, so the maximum-likelihood problem becomes: max_{θ,θ'} Σi ln sup_{xi'} [p(xi', yi|θ) p(xi|θ', σi, xi')]. (2) [sent-41, score-1.455]

21 If our probability model is correctly specified, then (1) is the preferred formulation. [sent-42, score-0.509]

22 However, in practice we may not know the exact p(xi|θ', σi, xi') (for example, we may not be able to estimate the level of uncertainty σi accurately). [sent-43, score-0.638]

23 Intuitively, (1) and (2) have similar effects, since large values of p(xi', yi|θ) p(xi|θ', σi, xi') dominate the integral ∫ p(xi', yi|θ) p(xi|θ', σi, xi') dxi'. [sent-45, score-1.61]

24 That is, both methods prefer a parameter configuration such that the product p(xi', yi|θ) p(xi|θ', σi, xi') is large for some xi'. [sent-46, score-1.339]

25 If an observation xi is contaminated with large noise, so that p(xi|θ', σi, xi') has a flat shape, then we can pick an xi' that is very different from xi and that still predicts yi well. [sent-47, score-2.491]

26 On the other hand, if an observation xi is contaminated with very small noise, then (1) and (2) penalize a parameter θ for which p(xi, yi|θ) is small. [sent-48, score-0.903]

27 We focus on discriminative modeling in this paper since it usually leads to better prediction performance. [sent-51, score-0.12]

28 In discriminative modeling, we assume that p(xi', yi|θ) has the form p(xi', yi|θ) = p(xi') p(yi|θ, xi'). [sent-52, score-1.142]

29 As an example, we consider regression with Gaussian noise: p(xi', yi|θ) ∼ p(xi') exp(−(θ^T xi' − yi)²/(2σ²)), p(xi|θ', σi, xi') ∼ exp(−||xi' − xi||²/(2σi²)). [sent-53, score-2.156]

30 The method in (2) then becomes: θ̂ = arg min_θ Σi inf_{xi'} [ (θ^T xi' − yi)²/(2σ²) + ||xi' − xi||²/(2σi²) ]. (3) [sent-54, score-1.894]
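Formulation (3) can be attacked by alternating between the corrected inputs xi' and θ; for fixed θ the inner infimum has the closed form xi' = xi + θ σi²(yi − θ^T xi)/(σ² + σi²||θ||²), obtained by setting the gradient of the bracketed term to zero. The Python sketch below is only an illustration of this alternation under our assumptions (function name, iteration count, and stopping rule are ours), not the authors' implementation.

```python
import numpy as np

def tls_style_fit(X, y, sigma, sigmas, n_iter=50):
    """Alternating-minimization sketch for objective (3).

    X      : (n, d) observed (noisy) inputs x_i
    y      : (n,)   targets
    sigma  : output-noise scale (sigma in the text)
    sigmas : (n,)   per-example input-noise scales sigma_i
    """
    theta = np.linalg.lstsq(X, y, rcond=None)[0]        # ordinary least-squares start
    for _ in range(n_iter):
        # inner step: closed-form minimizer over x_i' for fixed theta
        resid = y - X @ theta
        scale = sigmas**2 / (sigma**2 + sigmas**2 * (theta @ theta))
        X_corr = X + np.outer(resid * scale, theta)
        # outer step: ordinary least squares on the corrected inputs
        theta = np.linalg.lstsq(X_corr, y, rcond=None)[0]
    return theta
```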

31 This formulation is closely related (but not identical) to the so-called total least squares (TLS) method [6, 5]. [sent-55, score-0.292]

32 The motivation for total least squares is the same as what we consider in this paper: input data are contaminated with noise. [sent-56, score-0.305]

33 Unlike the statistical modeling approach we adopted in this section, the total least squares algorithm is derived from a numerical computation point of view. [sent-57, score-0.288]

34 The resulting formulation is similar to (3), but its solution can be conveniently described by a singular value decomposition (SVD) of the data matrix. [sent-58, score-0.175]

35 The method has been widely applied in engineering applications, and is known to give better performance than the standard least squares method for problems with uncertain inputs. [sent-59, score-0.273]

36 In our framework, we can regard (3) as the underlying statistical model for total least squares. [sent-60, score-0.141]

37 For binary classification with yi ∈ {±1}, we consider a logistic conditional probability model for yi, while still assuming Gaussian noise in the input: p(xi', yi|θ) ∼ p(xi') / (1 + exp(−θ^T xi' yi)), p(xi|θ', σi, xi') ∼ exp(−||xi' − xi||²/(2σi²)). [sent-61, score-2.828]

38 Similar to the total least squares method (3), we obtain the following formulation from (2): θ̂ = arg min_θ Σi inf_{xi'} [ ln(1 + e^{−θ^T xi' yi}) + ||xi' − xi||²/(2σi²) ]. (4) [sent-62, score-1.677]
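Unlike (3), the inner infimum in (4) has no closed form. The sketch below evaluates it for a single example by plain gradient descent; it is only a hedged illustration of the objective (step size and iteration count are arbitrary choices of ours, not from the paper).

```python
import numpy as np

def inner_inf_logistic(theta, x_obs, y, sigma_i, n_steps=200, lr=0.1):
    """Approximate inf over x' of  ln(1 + exp(-y * theta^T x')) + ||x' - x||^2 / (2 sigma_i^2)."""
    x = x_obs.astype(float).copy()
    for _ in range(n_steps):
        margin = y * (theta @ x)
        # gradient of the logistic term w.r.t. x' is  -y * theta / (1 + exp(margin))
        grad = -y * theta / (1.0 + np.exp(margin)) + (x - x_obs) / sigma_i**2
        x -= lr * grad
    # stable evaluation of ln(1 + exp(-y theta^T x'))
    return np.logaddexp(0.0, -y * (theta @ x)) + 0.5 * np.sum((x - x_obs) ** 2) / sigma_i**2
```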

39 This problem can be remedied using the support vector machine formulation, which has attractive intuitive geometric interpretations for linearly separable problems. [sent-64, score-0.236]

40 3 Total support vector classification. Our formulation of support vector classification with uncertain input data is motivated by the total least squares regression method, which can be derived from the statistical model (3). [sent-66, score-0.568]

41 That is, xi' = xi + ∆xi, where the noise ∆xi follows a certain distribution. [sent-70, score-1.104]

42 Hence, instead of assuming Gaussian noise as in (3) and (4), we consider a simple bounded uncertainty model ||∆xi|| ≤ δi with uniform priors. [sent-72, score-0.252]

43 However, under the bounded uncertainty model, the squared penalty term ||xi' − xi||²/(2σi²) is replaced by the constraint ||∆xi|| ≤ δi. [sent-74, score-0.583]

44 Another reason for us to use the bounded uncertainty noise model is that the resulting formulation has a more intuitive geometric interpretation (see Section 4). [sent-75, score-0.521]

45 In the separable case, TSVC solves the following problem: min_{w,b,∆xi} (1/2)||w||² subject to yi(w^T(xi + ∆xi) + b) ≥ 1, ||∆xi|| ≤ δi, i = 1, ..., ℓ. (5) [sent-78, score-0.478]

46 In the non-separable case, TSVC solves: min_{w,b,ξ,∆xi} C Σ_{i=1}^ℓ ξi + (1/2)||w||² subject to yi(w^T(xi + ∆xi) + b) ≥ 1 − ξi, ξi ≥ 0, ||∆xi|| ≤ δi, i = 1, ..., ℓ. (6) [sent-81, score-0.395]

47 Problems with corrupted inputs are more difficult than problems with no input uncertainty. [sent-85, score-0.228]

48 By modifying the noisy input data as in (6), we reconstruct an easier problem, for which we may find a good linear separator. [sent-87, score-0.158]

49 Moreover, by modeling noise in the input data, TSVC becomes less sensitive to very uncertain data points, since we can find a choice of ∆xi such that xi + ∆xi is far from the decision boundary and will not be a support vector. [sent-88, score-0.862]

50 4 Geometric interpretation. Further investigation reveals an intuitive geometric interpretation of TSVC, which allows users to easily grasp the fundamentals of this new formulation. [sent-91, score-0.158]

51 For any given hyperplane (w, b), the solution ∆x̂i of problem (6) is ∆x̂i = yi δi w/||w||, i = 1, ..., ℓ. [sent-96, score-0.394]

52 The minimization of Σi ξi can be decoupled into subproblems of minimizing each ξi = max{0, 1 − yi(w^T(xi + ∆xi) + b)} = max{0, 1 − yi(w^T xi + b) − yi w^T ∆xi} over its corresponding ∆xi. [sent-99, score-1.397]

53 Since ||∆xi|| is bounded by δi, the optimal ∆x̂i = yi δi w/||w|| and the minimal ξ̂i = max{0, 1 − yi(w^T xi + b) − δi||w||}. [sent-101, score-1.169]
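This closed form makes the second step of the alternating algorithm described later trivial to compute. A minimal numpy sketch (the function name is ours, not from the paper):

```python
import numpy as np

def worst_case_correction(w, b, X, y, delta):
    """For a fixed hyperplane (w, b), apply the closed form of Lemma 1:
    shift each example by Delta x_i = y_i * delta_i * w / ||w|| and compute
    the minimal slacks  xi_i = max(0, 1 - y_i (w.x_i + b) - delta_i ||w||)."""
    w_norm = np.linalg.norm(w)
    dX = (y * delta)[:, None] * w[None, :] / w_norm
    slacks = np.maximum(0.0, 1.0 - y * (X @ w + b) - delta * w_norm)
    return dX, slacks
```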

54 Then Sw(X) is the set of points obtained by shifting each original point labeled +1 along w/||w|| and each point labeled −1 along −w/||w||, by its individual uncertainty bound δi. [sent-103, score-0.129]

55 Note that solving problem (5) is equivalent to maximizing ρ subject to the constraints yi(w^T(xi + ∆xi) + b) ≥ ρ||w|| and ||∆xi|| ≤ δi. [sent-112, score-0.308]
Figure 1: The separating hyperplanes obtained (left) by standard SVC and (middle) by total SVC (6).

56 To have the greatest ρ, we want to maximize yi(ŵ^T(xi + ∆xi) + b̂) over ∆xi for all i. [sent-116, score-0.337]

57 Hence, following arguments similar to those in Lemma 1, we have |yi ŵ^T ∆xi| ≤ ||ŵ|| ||∆xi|| = δi||ŵ||, and equality holds when ∆xi = yi δi ŵ/||ŵ||. [sent-117, score-0.328]

58 For example, the linearly non-separable problem (6) becomes min_{w,b,ξ} C Σ_{i=1}^ℓ ξi + (1/2)||w||² subject to yi(w^T xi + b) + δi||w|| ≥ 1 − ξi, ξi ≥ 0, i = 1, ..., ℓ. (7) [sent-123, score-0.961]

59 Solving problem (7) yields an optimal solution to problem (6), and problem (7) can be interpreted as finding (w, b) that separates Sw(X) with the maximal soft margin. [sent-124, score-0.176]

60 Moreover, the SOCP formulation involves a large number of redundant variables, so a typical SOCP solver will take much longer to reach an optimal solution. [sent-128, score-0.203]

61 The first step of Algorithm 1 solves nothing more than a standard SVM, treating xi + ∆xi as the training examples. [sent-134, score-0.598]

62 Similar to how SVMs are usually optimized, we can solve the dual [sent-135, score-0.133]

63 SVM formulation [8] for ŵ and b̂. The second step of Algorithm 1 solves the problem discussed in Lemma 1. [sent-136, score-0.205]
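A minimal sketch of this alternating procedure for linear TSVC, using scikit-learn's SVC as the step-1 solver. This is our illustration under stated assumptions (any standard soft-margin SVM trainer would do, labels y are assumed to be ±1), not the authors' code.

```python
import numpy as np
from sklearn.svm import SVC

def linear_tsvc(X, y, delta, C=1.0, n_iter=10):
    """Alternating sketch of Algorithm 1 for linear TSVC.

    Step 1: train a standard soft-margin SVM on the corrected points x_i + Delta x_i.
    Step 2: update Delta x_i = y_i * delta_i * w / ||w||  (closed form of Lemma 1).
    """
    dX = np.zeros_like(X, dtype=float)
    for _ in range(n_iter):
        clf = SVC(kernel="linear", C=C).fit(X + dX, y)                   # step 1
        w, b = clf.coef_.ravel(), clf.intercept_[0]
        dX = (y * delta)[:, None] * w[None, :] / np.linalg.norm(w)       # step 2
    return w, b, dX
```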

64 As analyzed in [5, 3], the Tikhonov regularization min C Σi ξi + (1/2)||w||² has an important equivalent formulation: min Σi ξi subject to ||w|| ≤ γ, where γ is a positive constant. [sent-141, score-0.315]

65 It can be shown that if γ ≤ ||w*||, where w* is the solution to problem (6) with the (1/2)||w||² term removed, then the solution of the constrained problem is identical to the solution of the Tikhonov regularization problem for an appropriately chosen C. [sent-142, score-0.216]

66 min_{w,b,ξ} Σ_{i=1}^ℓ ξi subject to yi(w^T xi + b) + γδi ≥ 1 − ξi, ξi ≥ 0, i = 1, ..., ℓ, ||w||² ≤ γ². (8) [sent-145, score-0.904]
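For a fixed γ, problem (8) is convex and can be handed to a generic conic solver. The sketch below uses CVXPY, which is an assumption of ours; the paper solves its own QCQP/SOCP formulations.

```python
import cvxpy as cp
import numpy as np

def solve_tsvc_constrained(X, y, delta, gamma):
    """Problem (8): minimize sum(xi) subject to
    y_i (w.x_i + b) + gamma*delta_i >= 1 - xi_i,  xi >= 0,  ||w|| <= gamma."""
    n, d = X.shape
    w, b, xi = cp.Variable(d), cp.Variable(), cp.Variable(n)
    constraints = [
        cp.multiply(y, X @ w + b) + gamma * delta >= 1 - xi,
        xi >= 0,
        cp.norm(w, 2) <= gamma,
    ]
    cp.Problem(cp.Minimize(cp.sum(xi)), constraints).solve()
    return w.value, b.value
```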

67 By a duality analysis similar to that adopted in [3], problem (8) has a dual formulation in the dual variables α: [sent-147, score-0.408]

68 min_α γ (Σ_{i,j=1}^ℓ αi αj yi yj xi^T xj)^{1/2} − Σ_{i=1}^ℓ (1 − γδi)αi subject to Σ_{i=1}^ℓ αi yi = 0, 0 ≤ αi ≤ 1, i = 1, ..., ℓ. (9) [sent-148, score-0.368]
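The dual (9) is likewise a small convex program over α. A sketch under the same CVXPY assumption:

```python
import cvxpy as cp
import numpy as np

def solve_tsvc_dual(X, y, delta, gamma):
    """Dual problem (9):
    min over alpha of  gamma * || sum_i alpha_i y_i x_i ||  -  sum_i (1 - gamma*delta_i) alpha_i
    subject to  sum_i alpha_i y_i = 0  and  0 <= alpha_i <= 1."""
    n = X.shape[0]
    alpha = cp.Variable(n)
    objective = gamma * cp.norm(X.T @ cp.multiply(alpha, y), 2) \
        - cp.sum(cp.multiply(1 - gamma * delta, alpha))
    constraints = [y @ alpha == 0, alpha >= 0, alpha <= 1]
    cp.Problem(cp.Minimize(objective), constraints).solve()
    return alpha.value
```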

69 TSVC with kernels. By using a kernel function k, the input vector xi is mapped to Φ(xi) in a (usually high-dimensional) feature space. [sent-149, score-0.657]

70 The uncertainty in the input data introduces uncertainties for images Φ(xi ) in the feature space. [sent-150, score-0.279]

71 TSVC can be generalized to construct separating hyperplanes in the feature space using the images of input vectors and the mapped uncertainties. [sent-151, score-0.216]

72 One possible generalization of TSVC is to assume that the images are still subject to additive noise and that the uncertainty model in the feature space can be represented as ||∆Φ(xi)|| ≤ δi. [sent-152, score-0.269]

73 We then obtain a problem identical to (8), only with xi replaced by Φ(xi) and ∆xi replaced by ∆Φ(xi); it can easily be kernelized by solving its dual formulation (9) with the inner products xi^T xj replaced by k(xi, xj). [sent-154, score-1.017]

74 It is more realistic, however, that we are only able to estimate uncertainties in the input space as bounded spheres ||∆xi|| ≤ δi. [sent-155, score-0.187]

75 When the uncertainty sphere is mapped to the feature space, the mapped uncertainty region may correspond to an irregular shape in the feature space, which creates difficulties for the optimization of TSVC. [sent-156, score-0.342]

76 The first-order Taylor expansion of k with respect to x is k(xi + ∆x, ·) = k(xi, ·) + ∆x^T k'(xi, ·), where k'(xi, ·) is the gradient of k with respect to x at the point xi. [sent-160, score-0.537]

77 Solving the dual SVM formulation in step 1 of Algorithm 1 with ∆xj fixed to ∆x̄j yields a solution (w̄ = Σj yj αj Φ(xj + ∆x̄j), b̄), and thus a predictor f(x) = Σj yj αj k(x, xj + ∆x̄j) + b̄. In step 2, we set (w, b) to (w̄, b̄) and minimize [sent-161, score-0.478]

78 Σi ξi over ∆xi, which, as discussed in Lemma 1, amounts to minimizing each ξi = max{0, 1 − yi(Σj yj αj k(xi + ∆xi, xj + ∆x̄j) + b̄)} over ∆xi. [sent-162, score-0.439]

79 Applying the Taylor expansion yields yi(Σj yj αj k(xi + ∆xi, xj + ∆x̄j) + b̄) = yi(Σj yj αj k(xi, xj + ∆x̄j) + b̄) + yi ∆xi^T Σj yj αj k'(xi, xj + ∆x̄j). [sent-163, score-1.345]

80 Table 1: Average test error percentages of TSVC and standard SVC algorithms on synthetic problems (left and middle) and digit classification problems (right). [sent-164, score-0.252]

81 The optimal ∆xi = yi δi vi, where vi = Σj yj αj k'(xi, xj + ∆x̄j) / ||Σj yj αj k'(xi, xj + ∆x̄j)||, by the Cauchy-Schwarz inequality. [sent-189, score-0.551]
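With the first-order expansion, the step-2 update in the kernel case needs only the kernel gradient. The sketch below uses the quadratic kernel (x^T z)², whose gradient with respect to x is 2(x^T z)z; the helper names are ours, and Z is assumed to hold the already-corrected points xj + ∆x̄j.

```python
import numpy as np

def quad_kernel_grad(x, z):
    # gradient of k(x, z) = (x.z)^2 with respect to x
    return 2.0 * (x @ z) * z

def kernel_correction(x_i, y_i, delta_i, alpha, y, Z):
    """Step-2 update for kernel TSVC via the first-order expansion:
    v_i = sum_j y_j alpha_j k'(x_i, x_j + Delta x_j), then
    Delta x_i = y_i * delta_i * v_i / ||v_i||."""
    v = sum(y[j] * alpha[j] * quad_kernel_grad(x_i, Z[j]) for j in range(len(alpha)))
    return y_i * delta_i * v / np.linalg.norm(v)
```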

82 We solved problems (8), (9) and the standard SVC dual problem that is part of Algorithm 1. [sent-193, score-0.199]

83 In the experiments with synthetic data in 2 dimensions, we generated ℓ (= 20, 30, 50, 100, 150) training examples xi from the uniform distribution on [−5, 5]². [sent-194, score-0.614]

84 Two binary classification problems were created with target separating functions x1 − x2 = 0 and x1² + x2² = 9, respectively. [sent-195, score-0.127]

85 We used TSVC with linear functions for the first problem and TSVC with the quadratic kernel (xi^T xj)² for the second problem. [sent-196, score-0.131]

86 The input vectors xi were contaminated by Gaussian noise with mean [0, 0] and covariance matrix Σi = σi I, where σi was randomly chosen from [0, ...]. [sent-197, score-0.749]
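A sketch of the synthetic-data generation described above. The range from which σi is drawn is truncated in the text, so the interval used below is only a placeholder assumption, as are the function and parameter names.

```python
import numpy as np

def make_synthetic(n, problem="linear", sigma_range=(0.5, 1.5), seed=0):
    """Inputs uniform on [-5, 5]^2, labels from x1 - x2 = 0 (linear) or
    x1^2 + x2^2 = 9 (circular), then per-example Gaussian input noise.
    sigma_range is a placeholder; the paper's interval is truncated in the text."""
    rng = np.random.default_rng(seed)
    X_true = rng.uniform(-5, 5, size=(n, 2))
    if problem == "linear":
        y = np.sign(X_true[:, 0] - X_true[:, 1])
    else:
        y = np.sign(X_true[:, 0] ** 2 + X_true[:, 1] ** 2 - 9.0)
    sigmas = rng.uniform(*sigma_range, size=n)
    X_noisy = X_true + rng.normal(size=(n, 2)) * sigmas[:, None]
    return X_noisy, y, sigmas
```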

87 The NIST database of handwritten digits does not contain any uncertainty information originally. [sent-214, score-0.227]

88 We used ℓ (= 100, 500) digits from the beginning of the database for training and 2000 digits from the end of the database for testing. [sent-218, score-0.173]

89 The uncertainty upper bounds δi can be regarded as tuning parameters. [sent-221, score-0.155]

90 7 Discussions We investigated a new learning model in which the observed input is corrupted with noise. [sent-230, score-0.191]

91 Based on a probability modeling approach, we derived a general statistical formulation where unobserved input is modeled as a hidden mixture component. [sent-231, score-0.363]

92 Under this framework, we were able to develop estimation methods that take input uncertainty into consideration. [sent-232, score-0.21]

93 Motivated by this probability modeling approach, we proposed a new SVM classification formulation that handles input uncertainty. [sent-233, score-0.276]

94 Moreover, we presented simple numerical algorithms which can be used to solve the resulting formulation efficiently. [sent-235, score-0.22]

95 Two empirical examples, one artificial and one with real data, were used to illustrate that the new method is superior to the standard SVM for problems with noisy input data. [sent-236, score-0.19]

96 Our work attempts to recover the original classifier from the corrupted training data, and hence we evaluated the performance on clean test data. [sent-238, score-0.137]

97 In our statistical modeling framework, rigorously speaking, the input uncertainty of test-data should be handled by a mixture model (or a voted classifier under the noisy input distribution). [sent-239, score-0.451]

98 The formulation in [2] was designed to separate the training data under the worst input noise configuration instead of the most likely configuration in our case. [sent-240, score-0.335]

99 The purpose is to directly handle test input uncertainty with a single linear classifier under the worst possible error setting. [sent-241, score-0.21]

100 A second order cone programming formulation for classifying missing data. [sent-256, score-0.168]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tsvc', 0.537), ('xi', 0.509), ('yi', 0.296), ('svc', 0.157), ('formulation', 0.141), ('wt', 0.139), ('uncertainty', 0.129), ('sw', 0.115), ('corrupted', 0.11), ('socp', 0.107), ('uncertain', 0.098), ('tikhonov', 0.086), ('noise', 0.086), ('input', 0.081), ('squares', 0.08), ('tls', 0.074), ('contaminated', 0.073), ('digits', 0.073), ('xj', 0.072), ('yj', 0.071), ('uncertainties', 0.069), ('dual', 0.057), ('dxi', 0.055), ('subject', 0.054), ('modeling', 0.054), ('synthetic', 0.053), ('solve', 0.051), ('geometric', 0.05), ('lemma', 0.049), ('hyperplanes', 0.049), ('classi', 0.049), ('intuitive', 0.048), ('separable', 0.047), ('siam', 0.047), ('svm', 0.047), ('target', 0.046), ('noisy', 0.046), ('min', 0.045), ('separating', 0.044), ('qcqp', 0.043), ('mapped', 0.042), ('xt', 0.041), ('regard', 0.041), ('discriminative', 0.041), ('max', 0.041), ('fix', 0.039), ('uncorrupted', 0.039), ('nist', 0.039), ('total', 0.039), ('replaced', 0.037), ('bounded', 0.037), ('problems', 0.037), ('solves', 0.036), ('hyperplane', 0.036), ('taylor', 0.035), ('support', 0.034), ('solution', 0.034), ('ln', 0.034), ('formulations', 0.033), ('predictor', 0.032), ('arguments', 0.032), ('least', 0.032), ('golub', 0.031), ('solver', 0.031), ('optimal', 0.031), ('easier', 0.031), ('quadratic', 0.031), ('errors', 0.031), ('mixture', 0.031), ('logistic', 0.031), ('alternating', 0.03), ('outputs', 0.03), ('regularization', 0.03), ('interpretation', 0.03), ('cation', 0.029), ('linearly', 0.029), ('statistical', 0.029), ('guration', 0.029), ('moreover', 0.029), ('numerical', 0.028), ('problem', 0.028), ('preprocessed', 0.028), ('expansion', 0.028), ('maximal', 0.027), ('vi', 0.027), ('cone', 0.027), ('unobserved', 0.027), ('solving', 0.027), ('training', 0.027), ('middle', 0.026), ('trials', 0.026), ('standard', 0.026), ('inf', 0.026), ('adopted', 0.026), ('separates', 0.026), ('regarded', 0.026), ('parameter', 0.025), ('examples', 0.025), ('usually', 0.025), ('handwritten', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 178 nips-2004-Support Vector Classification with Input Data Uncertainty

Author: Jinbo Bi, Tong Zhang

Abstract: This paper investigates a new learning model in which the input data is corrupted with noise. We present a general statistical framework to tackle this problem. Based on the statistical reasoning, we propose a novel formulation of support vector classification, which allows uncertainty in input data. We derive an intuitive geometric interpretation of the proposed formulation, and develop algorithms to efficiently solve it. Empirical results are included to show that the newly formed method is superior to the standard SVM for problems with noisy input. 1

2 0.32458037 11 nips-2004-A Second Order Cone programming Formulation for Classifying Missing Data

Author: Chiranjib Bhattacharyya, Pannagadatta K. Shivaswamy, Alex J. Smola

Abstract: We propose a convex optimization based strategy to deal with uncertainty in the observations of a classification problem. We assume that instead of a sample (xi , yi ) a distribution over (xi , yi ) is specified. In particular, we derive a robust formulation when the distribution is given by a normal distribution. It leads to Second Order Cone Programming formulation. Our method is applied to the problem of missing data, where it outperforms direct imputation. 1

3 0.25570238 187 nips-2004-The Entire Regularization Path for the Support Vector Machine

Author: Saharon Rosset, Robert Tibshirani, Ji Zhu, Trevor J. Hastie

Abstract: In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model. 1

4 0.2466252 67 nips-2004-Exponentiated Gradient Algorithms for Large-margin Structured Classification

Author: Peter L. Bartlett, Michael Collins, Ben Taskar, David A. McAllester

Abstract: We consider the problem of structured classification, where the task is to predict a label y from an input x, and y has meaningful internal structure. Our framework includes supervised training of Markov random fields and weighted context-free grammars as special cases. We describe an algorithm that solves the large-margin optimization problem defined in [12], using an exponential-family (Gibbs distribution) representation of structured objects. The algorithm is efficient—even in cases where the number of labels y is exponential in size—provided that certain expectations under Gibbs distributions can be calculated efficiently. The method for structured labels relies on a more general result, specifically the application of exponentiated gradient updates [7, 8] to quadratic programs. 1

5 0.15262367 151 nips-2004-Rate- and Phase-coded Autoassociative Memory

Author: Máté Lengyel, Peter Dayan

Abstract: Areas of the brain involved in various forms of memory exhibit patterns of neural activity quite unlike those in canonical computational models. We show how to use well-founded Bayesian probabilistic autoassociative recall to derive biologically reasonable neuronal dynamics in recurrently coupled models, together with appropriate values for parameters such as the membrane time constant and inhibition. We explicitly treat two cases. One arises from a standard Hebbian learning rule, and involves activity patterns that are coded by graded firing rates. The other arises from a spike timing dependent learning rule, and involves patterns coded by the phase of spike times relative to a coherent local field potential oscillation. Our model offers a new and more complete understanding of how neural dynamics may support autoassociation. 1

6 0.13924062 44 nips-2004-Conditional Random Fields for Object Recognition

7 0.13664766 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning

8 0.12959345 92 nips-2004-Kernel Methods for Implicit Surface Modeling

9 0.12675707 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation

10 0.12632416 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection

11 0.12619297 188 nips-2004-The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space

12 0.12382298 34 nips-2004-Breaking SVM Complexity with Cross-Training

13 0.11516642 19 nips-2004-An Application of Boosting to Graph Classification

14 0.11319377 82 nips-2004-Incremental Algorithms for Hierarchical Classification

15 0.11306422 17 nips-2004-Adaptive Manifold Learning

16 0.10763065 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment

17 0.10729741 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants

18 0.10569587 136 nips-2004-On Semi-Supervised Classification

19 0.1005161 115 nips-2004-Maximum Margin Clustering

20 0.095527574 42 nips-2004-Computing regularization paths for learning multiple kernels


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.301), (1, 0.12), (2, -0.029), (3, 0.237), (4, 0.077), (5, -0.038), (6, 0.081), (7, -0.013), (8, -0.179), (9, 0.084), (10, 0.319), (11, 0.116), (12, -0.164), (13, 0.038), (14, 0.07), (15, 0.133), (16, -0.237), (17, 0.112), (18, -0.026), (19, 0.104), (20, -0.083), (21, 0.023), (22, 0.021), (23, 0.048), (24, 0.012), (25, 0.018), (26, 0.115), (27, 0.015), (28, -0.035), (29, -0.02), (30, -0.053), (31, -0.029), (32, -0.082), (33, -0.03), (34, 0.039), (35, 0.05), (36, 0.053), (37, 0.038), (38, 0.019), (39, 0.023), (40, -0.042), (41, 0.033), (42, -0.071), (43, 0.019), (44, 0.03), (45, 0.042), (46, -0.006), (47, -0.096), (48, 0.044), (49, 0.006)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98805821 178 nips-2004-Support Vector Classification with Input Data Uncertainty

Author: Jinbo Bi, Tong Zhang

Abstract: This paper investigates a new learning model in which the input data is corrupted with noise. We present a general statistical framework to tackle this problem. Based on the statistical reasoning, we propose a novel formulation of support vector classification, which allows uncertainty in input data. We derive an intuitive geometric interpretation of the proposed formulation, and develop algorithms to efficiently solve it. Empirical results are included to show that the newly formed method is superior to the standard SVM for problems with noisy input. 1

2 0.88787609 11 nips-2004-A Second Order Cone programming Formulation for Classifying Missing Data

Author: Chiranjib Bhattacharyya, Pannagadatta K. Shivaswamy, Alex J. Smola

Abstract: We propose a convex optimization based strategy to deal with uncertainty in the observations of a classification problem. We assume that instead of a sample (xi , yi ) a distribution over (xi , yi ) is specified. In particular, we derive a robust formulation when the distribution is given by a normal distribution. It leads to Second Order Cone Programming formulation. Our method is applied to the problem of missing data, where it outperforms direct imputation. 1

3 0.8128258 67 nips-2004-Exponentiated Gradient Algorithms for Large-margin Structured Classification

Author: Peter L. Bartlett, Michael Collins, Ben Taskar, David A. McAllester

Abstract: We consider the problem of structured classification, where the task is to predict a label y from an input x, and y has meaningful internal structure. Our framework includes supervised training of Markov random fields and weighted context-free grammars as special cases. We describe an algorithm that solves the large-margin optimization problem defined in [12], using an exponential-family (Gibbs distribution) representation of structured objects. The algorithm is efficient—even in cases where the number of labels y is exponential in size—provided that certain expectations under Gibbs distributions can be calculated efficiently. The method for structured labels relies on a more general result, specifically the application of exponentiated gradient updates [7, 8] to quadratic programs. 1

4 0.71967363 187 nips-2004-The Entire Regularization Path for the Support Vector Machine

Author: Saharon Rosset, Robert Tibshirani, Ji Zhu, Trevor J. Hastie

Abstract: In this paper we argue that the choice of the SVM cost parameter can be critical. We then derive an algorithm that can fit the entire path of SVM solutions for every value of the cost parameter, with essentially the same computational cost as fitting one SVM model. 1

5 0.6169157 17 nips-2004-Adaptive Manifold Learning

Author: Jing Wang, Zhenyue Zhang, Hongyuan Zha

Abstract: Recently, there have been several advances in the machine learning and pattern recognition communities for developing manifold learning algorithms to construct nonlinear low-dimensional manifolds from sample data points embedded in high-dimensional spaces. In this paper, we develop algorithms that address two key issues in manifold learning: 1) the adaptive selection of the neighborhood sizes; and 2) better fitting the local geometric structure to account for the variations in the curvature of the manifold and its interplay with the sampling density of the data set. We also illustrate the effectiveness of our methods on some synthetic data sets. 1

6 0.61113793 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation

7 0.60019344 151 nips-2004-Rate- and Phase-coded Autoassociative Memory

8 0.59824145 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning

9 0.58268636 188 nips-2004-The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space

10 0.57233745 44 nips-2004-Conditional Random Fields for Object Recognition

11 0.5270102 41 nips-2004-Comparing Beliefs, Surveys, and Random Walks

12 0.51919615 19 nips-2004-An Application of Boosting to Graph Classification

13 0.49248725 34 nips-2004-Breaking SVM Complexity with Cross-Training

14 0.49149993 82 nips-2004-Incremental Algorithms for Hierarchical Classification

15 0.48013312 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform

16 0.47936603 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants

17 0.47047296 92 nips-2004-Kernel Methods for Implicit Surface Modeling

18 0.43742478 36 nips-2004-Class-size Independent Generalization Analsysis of Some Discriminative Multi-Category Classification

19 0.43338493 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition

20 0.43251961 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.123), (15, 0.209), (19, 0.147), (26, 0.053), (31, 0.038), (33, 0.184), (35, 0.025), (39, 0.05), (50, 0.046), (76, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.97103047 33 nips-2004-Brain Inspired Reinforcement Learning

Author: Françcois Rivest, Yoshua Bengio, John Kalaska

Abstract: Successful application of reinforcement learning algorithms often involves considerable hand-crafting of the necessary non-linear features to reduce the complexity of the value functions and hence to promote convergence of the algorithm. In contrast, the human brain readily and autonomously finds the complex features when provided with sufficient training. Recent work in machine learning and neurophysiology has demonstrated the role of the basal ganglia and the frontal cortex in mammalian reinforcement learning. This paper develops and explores new reinforcement learning algorithms inspired by neurological evidence that provides potential new approaches to the feature construction problem. The algorithms are compared and evaluated on the Acrobot task. 1

2 0.95091981 172 nips-2004-Sparse Coding of Natural Images Using an Overcomplete Set of Limited Capacity Units

Author: Eizaburo Doi, Michael S. Lewicki

Abstract: It has been suggested that the primary goal of the sensory system is to represent input in such a way as to reduce the high degree of redundancy. Given a noisy neural representation, however, solely reducing redundancy is not desirable, since redundancy is the only clue to reduce the effects of noise. Here we propose a model that best balances redundancy reduction and redundant representation. Like previous models, our model accounts for the localized and oriented structure of simple cells, but it also predicts a different organization for the population. With noisy, limited-capacity units, the optimal representation becomes an overcomplete, multi-scale representation, which, compared to previous models, is in closer agreement with physiological data. These results offer a new perspective on the expansion of the number of neurons from retina to V1 and provide a theoretical model of incorporating useful redundancy into efficient neural representations. 1

same-paper 3 0.92761821 178 nips-2004-Support Vector Classification with Input Data Uncertainty

Author: Jinbo Bi, Tong Zhang

Abstract: This paper investigates a new learning model in which the input data is corrupted with noise. We present a general statistical framework to tackle this problem. Based on the statistical reasoning, we propose a novel formulation of support vector classification, which allows uncertainty in input data. We derive an intuitive geometric interpretation of the proposed formulation, and develop algorithms to efficiently solve it. Empirical results are included to show that the newly formed method is superior to the standard SVM for problems with noisy input. 1

4 0.88407904 131 nips-2004-Non-Local Manifold Tangent Learning

Author: Yoshua Bengio, Martin Monperrus

Abstract: We claim and present arguments to the effect that a large class of manifold learning algorithms that are essentially local and can be framed as kernel learning algorithms will suffer from the curse of dimensionality, at the dimension of the true underlying manifold. This observation suggests to explore non-local manifold learning algorithms which attempt to discover shared structure in the tangent planes at different positions. A criterion for such an algorithm is proposed and experiments estimating a tangent plane prediction function are presented, showing its advantages with respect to local manifold learning algorithms: it is able to generalize very far from training data (on learning handwritten character image rotations), where a local non-parametric method fails. 1

5 0.88246411 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform

Author: Changjiang Yang, Ramani Duraiswami, Larry S. Davis

Abstract: The computation and memory required for kernel machines with N training samples is at least O(N 2 ). Such a complexity is significant even for moderate size problems and is prohibitive for large datasets. We present an approximation technique based on the improved fast Gauss transform to reduce the computation to O(N ). We also give an error bound for the approximation, and provide experimental results on the UCI datasets. 1

6 0.88076001 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees

7 0.87999022 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection

8 0.87725508 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods

9 0.87700665 201 nips-2004-Using the Equivalent Kernel to Understand Gaussian Process Regression

10 0.87678671 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning

11 0.87576044 168 nips-2004-Semigroup Kernels on Finite Sets

12 0.8752113 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications

13 0.87456119 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning

14 0.87237197 4 nips-2004-A Generalized Bradley-Terry Model: From Group Competition to Individual Skill

15 0.87079334 187 nips-2004-The Entire Regularization Path for the Support Vector Machine

16 0.86929816 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants

17 0.86806744 70 nips-2004-Following Curved Regularized Optimization Solution Paths

18 0.86563134 163 nips-2004-Semi-parametric Exponential Family PCA

19 0.8654505 148 nips-2004-Probabilistic Computation in Spiking Populations

20 0.86371738 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach