nips nips2007 nips2007-32 knowledge-graph by maker-knowledge-mining

32 nips-2007-Bayesian Co-Training


Source: pdf

Author: Shipeng Yu, Balaji Krishnapuram, Harald Steck, R. B. Rao, Rómer Rosales

Abstract: We propose a Bayesian undirected graphical model for co-training, or more generally for semi-supervised multi-view learning. This makes explicit the previously unstated assumptions of a large class of co-training type algorithms, and also clarifies the circumstances under which these assumptions fail. Building upon new insights from this model, we propose an improved method for co-training, which is a novel co-training kernel for Gaussian process classifiers. The resulting approach is convex and avoids local-maxima problems, unlike some previous multi-view learning methods. Furthermore, it can automatically estimate how much each view should be trusted, and thus accommodate noisy or unreliable views. Experiments on toy data and real world data sets illustrate the benefits of this approach. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We propose a Bayesian undirected graphical model for co-training, or more generally for semi-supervised multi-view learning. [sent-5, score-0.175]

2 This makes explicit the previously unstated assumptions of a large class of co-training type algorithms, and also clarifies the circumstances under which these assumptions fail. [sent-6, score-0.169]

3 Building upon new insights from this model, we propose an improved method for co-training, which is a novel co-training kernel for Gaussian process classifiers. [sent-7, score-0.108]

4 Furthermore, it can automatically estimate how much each view should be trusted, and thus accommodate noisy or unreliable views. [sent-9, score-0.322]

5 1 Introduction Data samples may sometimes be characterized in multiple ways, e.g. [sent-11, score-0.086]

6 web-pages can be described both in terms of the textual content in each page and the hyperlink structure between them. [sent-13, score-0.027]

7 [1] have shown that the error rate on unseen test samples can be upper bounded by the disagreement between the classification-decisions obtained from independent characterizations (i.e., views). [sent-14, score-0.404]

8 Thus, in the web-page example, misclassification rate can be indirectly minimized by reducing the rate of disagreement between hyperlink-based and content-based classifiers, provided these characterizations are independent conditional on the class. [sent-17, score-0.449]

9 In many application domains class labels can be expensive to obtain and hence scarce, whereas unlabeled data are often cheap and abundantly available. [sent-18, score-0.227]

10 Moreover, the disagreement between the class labels suggested by different views can be computed even when using unlabeled data. [sent-19, score-0.612]

11 Therefore, a natural strategy for using unlabeled data to minimize the misclassification rate is to enforce consistency between the classification decisions based on several independent characterizations of the unlabeled samples. [sent-20, score-0.598]

12 For brevity, unless otherwise specified, we shall use the term co-training to describe the entire genre of methods that rely upon this intuition, although strictly it should only refer to the original algorithm of [2]. [sent-21, score-0.092]

13 Some co-training algorithms jointly optimize an objective function which includes misclassification penalties (loss terms) for classifiers from each view and a regularization term that penalizes lack of agreement between the classification decisions of the different views. [sent-22, score-0.333]
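
To make the objective described in sentence 13 concrete, the following is a minimal, hedged sketch (not the algorithm of any particular paper): two linear classifiers, one per view, scored by a per-view loss on labeled data plus a penalty on their disagreement on unlabeled data. All function and variable names, and the squared-hinge/squared-disagreement choices, are illustrative assumptions.

```python
import numpy as np

def coregularized_objective(w1, w2, X1_lab, X2_lab, y, X1_unlab, X2_unlab, lam=1.0):
    """Illustrative co-regularization objective for two views (a sketch, not the paper's method)."""
    f1_lab, f2_lab = X1_lab @ w1, X2_lab @ w2            # per-view scores on labeled data
    loss1 = np.mean(np.maximum(0.0, 1.0 - y * f1_lab) ** 2)
    loss2 = np.mean(np.maximum(0.0, 1.0 - y * f2_lab) ** 2)
    f1_un, f2_un = X1_unlab @ w1, X2_unlab @ w2          # per-view scores on unlabeled data
    disagreement = np.mean((f1_un - f2_un) ** 2)         # agreement/consensus regularizer
    return loss1 + loss2 + lam * disagreement

# toy usage with random data
rng = np.random.default_rng(0)
X1_lab, X2_lab = rng.normal(size=(10, 3)), rng.normal(size=(10, 4))
y = rng.choice([-1.0, 1.0], size=10)
X1_un, X2_un = rng.normal(size=(50, 3)), rng.normal(size=(50, 4))
w1, w2 = rng.normal(size=3), rng.normal(size=4)
print(coregularized_objective(w1, w2, X1_lab, X2_lab, y, X1_un, X2_un))
```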

14 In recent times, this co-regularization approach has become the dominant strategy for exploiting the intuition behind multi-view consensus learning, rendering obsolete earlier alternating-optimization strategies. [sent-23, score-0.523]

15 We survey in Section 2 the major approaches to co-training, the theoretical guarantees that have spurred interest in the topic, and the previously published concerns about the applicability to certain domains. [sent-24, score-0.091]

16 We analyze the precise assumptions that have been made and the optimization criteria to better understand why these approaches succeed (or fail) in certain situations. [sent-25, score-0.06]

17 Then in Section 3 we propose a principled undirected graphical model for co-training which we call the Bayesian co-training, and show that co-regularization algorithms provide one way for maximum-likelihood (ML) learning under this probabilistic model. [sent-26, score-0.175]

18 By explicitly highlighting previously unstated assumptions, Bayesian co-training provides a deeper understanding of the co-regularization framework, and we are also able to discuss certain fundamental limitations of multi-view consensus learning. [sent-27, score-0.585]

19 In Section 4, we show that even simple and visually illustrated 2-D problems are sometimes not amenable to a co-training/co-regularization solution (no matter which specific model/algorithm is used – including ours). [sent-28, score-0.028]

20 Summarizing our algorithmic contributions, co-regularization is exactly equivalent to the use of a novel co-training kernel for support vector machines (SVMs) and Gaussian processes (GP), thus allowing one to leverage the large body of available literature for these algorithms. [sent-30, score-0.039]

21 The level of similarity between any pair of samples depends on all the available samples, whether labeled or unlabeled, thus promoting semi-supervised learning. [sent-33, score-0.085]

22 Furthermore, we can automatically estimate how much each view should be trusted, and thus accommodate noisy or unreliable views. [sent-35, score-0.322]

23 While these theoretical guarantees are intriguing, they are also rather unrealistic in many application domains. [sent-41, score-0.089]

24 The assumptions that classifiers do not make mistakes when they are confident, and that the views are class-conditionally independent, are rarely satisfied in practice. [sent-42, score-0.147]

25 Co-EM and Related Algorithms: The Co-EM algorithm of [4] extended the original bootstrap approach of the co-training algorithm to operate simultaneously on all unlabeled samples in an iterative batch mode. [sent-44, score-0.241]

26 This co-regularization framework improves upon the co-training and co-EM algorithms by maximizing a convex objective function; however the algorithm still depends on an alternating optimization that optimizes one view at a time. [sent-50, score-0.353]

27 Relationship to Current Work: The present work provides a probabilistic graphical model for multi-view consensus learning; alternating optimization based co-regularization is shown to be just one algorithm that accomplishes ML learning in this model. [sent-52, score-0.612]

28 A more efficient, alternative strategy is proposed here for fully Bayesian classification under the same model. [sent-53, score-0.042]

29 In practice, this strategy offers several advantages: it is easily extended to multiple views, it accommodates noisy views which are less predictive of class labels, and reduces run-time and memory requirements. [sent-54, score-0.433]

30 (Figure 1: Factor graph for (a) one-view and (b) two-view models.) [sent-55, score-0.092]

31 1 Single-View Learning with Gaussian Processes A Gaussian Process (GP) defines a nonparametric prior over functions in Bayesian statistics [9]. [sent-57, score-0.029]

32 For data points x_1, ..., x_n ∈ R^d, f = {f(x_i)}_{i=1}^n follows a multivariate Gaussian N(h, K) with mean h = {h(x_i)}_{i=1}^n and covariance K = {κ(x_i, x_j)}_{i,j=1}^n. [sent-61, score-0.091]

33 Normally we fix the mean function h ≡ 0, and take a parametric (and usually stationary) form for the kernel function κ, e.g. [sent-62, score-0.039]

34 the Gaussian kernel κ(x_k, x_ℓ) = exp(−ρ ‖x_k − x_ℓ‖²) with ρ > 0 a free parameter. [sent-64, score-0.095]
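
A small, hedged sketch of this construction, assuming the Gaussian kernel quoted in sentence 34 with free parameter ρ (all names are illustrative): build the covariance K over a set of inputs and draw functions from the zero-mean GP prior N(0, K).

```python
import numpy as np

def gaussian_kernel(X, Y, rho=1.0):
    """K[k, l] = exp(-rho * ||x_k - y_l||^2), the kernel form quoted above."""
    sq_dists = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-rho * np.maximum(sq_dists, 0.0))

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(50, 1))                  # n = 50 inputs in R^1
K = gaussian_kernel(X, X, rho=0.5)                    # covariance of f = {f(x_i)}
f_samples = rng.multivariate_normal(np.zeros(len(X)), K + 1e-8 * np.eye(len(X)), size=3)
print(f_samples.shape)                                # three draws from the GP prior N(0, K)
```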

35 In a single-view, supervised learning scenario, an output or target y_i is given for each observation x_i (e.g. [sent-65, score-0.368]

36 for regression y_i ∈ R and for classification y_i ∈ {−1, +1}). [sent-67, score-0.381]

37 In the GP model we assume there is a latent function f underlying the output, p(y_i | x_i) = ∫ p(y_i | f, x_i) p(f) df, with the GP prior p(f) = GP(h, κ). [sent-68, score-0.298]

38 Given the latent function f, p(y_i | f, x_i) = p(y_i | f(x_i)) takes a Gaussian noise model N(f(x_i), σ²) for regression, and a sigmoid function λ(y_i f(x_i)) for classification. [sent-69, score-0.269]
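
For the regression likelihood N(f(x_i), σ²), the latent function can be marginalized in closed form; the sketch below shows standard single-view GP-regression predictions under that noise model (an illustration, not code from the paper; any kernel such as the gaussian_kernel helper above can be passed in).

```python
import numpy as np

def gp_regression_predict(X_train, y_train, X_test, kernel, noise_var=0.1):
    """Standard single-view GP regression under y_i ~ N(f(x_i), sigma^2)."""
    K = kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_star = kernel(X_test, X_train)
    K_ss = kernel(X_test, X_test)
    alpha = np.linalg.solve(K, y_train)
    mean = K_star @ alpha                                  # posterior predictive mean
    cov = K_ss - K_star @ np.linalg.solve(K, K_star.T)    # posterior predictive covariance
    return mean, cov
```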

39 The dependency structure of the single-view GP model can be shown as an undirected graph as in Fig. 1(a). [sent-70, score-0.147]

40 The maximal cliques of the graphical model are the fully connected nodes (f(x_1), ... [sent-72, score-0.104]

41 2 Undirected Graphical Model for Multi-View Learning In multi-view learning, suppose we have m different views of a same set of n data samples. [sent-81, score-0.258]

42 Let x_i^{(j)} ∈ R^{d_j} be the features for the i-th sample obtained using the j-th view, where d_j is the dimensionality of the input space for view j. [sent-82, score-0.359]

43 x_i = (x_i^{(1)}, ..., x_i^{(m)}) is the complete representation of the i-th data sample, and x^{(j)} = (x_1^{(j)}, ... [sent-86, score-0.147]

44 ..., x_n^{(j)}) represents all sample observations for the j-th view. [sent-89, score-0.093]

45 The outputs are y = (y_1, ..., y_n), where y_i is the single output assigned to the i-th data point. [sent-93, score-0.267]

46 Looking at this problem from a GP perspective, let f_j denote the latent function for the j-th view (i.e. [sent-95, score-0.504]

47 using features only from view j), and let f_j ∼ GP(0, κ_j) be its GP prior in view j. [sent-97, score-0.594]

48 Since one data sample i has only one single label y_i even though it has multiple features from the multiple views (footnote 1: the definition of ψ in this paper has been overloaded to simplify notation, but its meaning should be clear from the function arguments) [sent-98, score-0.351]

49 (i.e., latent function value f_j(x_i) for view j), the label y_i should depend on all of these latent function values for data sample i. [sent-101, score-0.87]

50 The challenge here is to make this dependency explicit in a graphical model. [sent-102, score-0.126]

51 We tackle this problem by introducing a new latent function, the consensus function f_c, to ensure conditional independence between the output y and the m latent functions {f_j} for the m views (see Fig. 1(b)). [sent-103, score-1.432]

52 At the functional level, the output y depends only on fc , and latent functions {fj } depend on each other only via the consensus function fc . [sent-105, score-1.363]

53 That is, we have the joint probability: [sent-106, score-0.404]

54 p(y, f_c, f_1, ..., f_m) = (1/Z) ψ(y, f_c) ∏_{j=1}^m ψ(f_j, f_c), with some potential functions ψ. [sent-109, score-0.816]

55 The graphical model leads to the following factorization: [sent-111, score-0.077]

56 p(y, f_c, f_1, ..., f_m) = (1/Z) ∏_i ψ(y_i, f_c(x_i)) ∏_{j=1}^m ψ(f_j) ψ(f_j, f_c). (2) [sent-114, score-0.374]

57 Here the within-view potential ψ(f_j) specifies the dependency structure within each view j, and the consensus potential ψ(f_j, f_c) describes how the latent function in each view is related to the consensus function f_c. [sent-115, score-1.939]

58 With a GP prior for each of the views, we can define the following potentials: ψ(f_j) = exp(−½ f_j^T K_j^{-1} f_j), ψ(f_j, f_c) = exp(−‖f_j − f_c‖² / (2σ_j²)), (3) where K_j is the covariance matrix of view j, i.e. [sent-116, score-0.896]

59 K_j(x_k^{(j)}, x_ℓ^{(j)}) = κ_j(x_k^{(j)}, x_ℓ^{(j)}), and σ_j > 0 is a scalar which quantifies how far away the latent function f_j is from f_c. [sent-118, score-0.122]

60 The output potential ψ(yi , fc (xi )) is defined the same as that in (1) for regression or classification. [sent-119, score-0.522]
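
Putting (2) and (3) together, the unnormalized log joint can be evaluated directly. The following hedged sketch assumes a Gaussian output potential (the regression case) and illustrative variable names; it is a reading of those formulas, not the authors' code.

```python
import numpy as np

def log_joint_unnormalized(y, f_c, f_views, K_views, sigma_views, noise_var=0.1):
    """Unnormalized log of Eq. (2) with the potentials of Eq. (3) (illustrative sketch).

    y           : (n,) outputs
    f_c         : (n,) consensus latent function values
    f_views     : list of (n,) latent function values, one per view
    K_views     : list of (n, n) per-view covariance matrices K_j
    sigma_views : list of scalars sigma_j
    """
    # output potential psi(y_i, f_c(x_i)): Gaussian noise model for regression
    log_p = -0.5 * np.sum((y - f_c) ** 2) / noise_var
    for f_j, K_j, s_j in zip(f_views, K_views, sigma_views):
        log_p += -0.5 * f_j @ np.linalg.solve(K_j, f_j)        # within-view potential psi(f_j)
        log_p += -0.5 * np.sum((f_j - f_c) ** 2) / s_j**2      # consensus potential psi(f_j, f_c)
    return log_p
```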

61 Some more insight may be gained by taking a careful look at these definitions: 1) The within-view potentials only rely on the intrinsic structure of each view, i.e. [sent-120, score-0.085]

62 through the covariance K_j in a GP setting; 2) Each consensus potential actually defines a Gaussian over the difference of f_j and f_c, i.e. [sent-122, score-0.541]

63 f_j − f_c ∼ N(0, σ_j² I), and it can also be interpreted as assuming a conditional Gaussian for f_j with the consensus f_c being the mean. [sent-124, score-0.479]

64 Alternatively, if we focus on f_c, the joint consensus potentials effectively define a conditional Gaussian prior for f_c, [sent-125, score-0.594]

65 f_c | f_1, ..., f_m ∼ N(µ_c, σ_c² I), where µ_c = σ_c² Σ_j f_j/σ_j² and σ_c² = (Σ_j 1/σ_j²)⁻¹. [sent-128, score-0.199]

66 This indicates that the prior mean of the consensus function f c is a weighted combination of the latent functions from all the views, and the weight is given by the inverse variance of each consensus potential. [sent-130, score-1.043]

67 The higher the variance, the smaller the contribution to the consensus function. [sent-131, score-0.446]
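
A tiny numerical sketch of this inverse-variance weighting (values and names are illustrative): a view with small σ_j dominates the consensus prior mean, while a high-variance view contributes little.

```python
import numpy as np

def consensus_prior(f_views, sigma_views):
    """mu_c = sigma_c^2 * sum_j f_j / sigma_j^2,  sigma_c^2 = (sum_j 1 / sigma_j^2)^(-1)."""
    precisions = np.array([1.0 / s**2 for s in sigma_views])
    sigma_c2 = 1.0 / precisions.sum()
    mu_c = sigma_c2 * sum(p * f for p, f in zip(precisions, f_views))
    return mu_c, sigma_c2

f1 = np.array([1.0, 1.0, 1.0])       # a reliable view (small sigma_j)
f2 = np.array([5.0, -5.0, 0.0])      # a noisy view (large sigma_j)
mu_c, var_c = consensus_prior([f1, f2], sigma_views=[0.5, 2.0])
print(mu_c, var_c)                   # mu_c stays close to f1, the trusted view
```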

68 More insight into this undirected graphical model can be gained from the marginals, which we discuss in detail in the following subsections. [sent-132, score-0.211]

69 In addition, this Bayesian interpretation also helps us understand both the benefits and the limitations of co-training. [sent-134, score-0.06]

70 3 Marginal 1: Co-Regularized Multi-View Learning. By taking the integral of (2) over f_c (and ignoring the output potential for the moment), we obtain the joint marginal distribution of the m latent functions, which combines within-view terms of the form f_j^T K_j^{-1} f_j with pairwise disagreement terms of the form ‖f_j − f_k‖². [sent-136, score-0.466]
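
For the two-view case, integrating the two Gaussian consensus potentials of (3) over f_c yields a single pairwise coupling with weight 1/(2(σ_1² + σ_2²)); the sketch below evaluates the resulting negative log marginal up to constants. It is an illustration under that stated assumption, not the paper's exact equation.

```python
import numpy as np

def neg_log_marginal_two_views(f1, f2, K1, K2, sigma1, sigma2):
    """-log p(f_1, f_2) up to an additive constant, after integrating out f_c (two-view case).

    Combines the per-view GP terms with a pairwise disagreement penalty; the
    1 / (2 * (sigma1^2 + sigma2^2)) weighting follows from integrating the two
    Gaussian consensus potentials and is stated here as an assumption.
    """
    within = 0.5 * f1 @ np.linalg.solve(K1, f1) + 0.5 * f2 @ np.linalg.solve(K2, f2)
    disagreement = np.sum((f1 - f2) ** 2) / (2.0 * (sigma1**2 + sigma2**2))
    return within + disagreement
```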


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('consensus', 0.446), ('fc', 0.374), ('gp', 0.311), ('views', 0.258), ('fj', 0.199), ('view', 0.183), ('yi', 0.174), ('characterizations', 0.16), ('disagreement', 0.159), ('xi', 0.147), ('classi', 0.139), ('unlabeled', 0.135), ('ers', 0.124), ('latent', 0.122), ('misclassi', 0.111), ('dent', 0.102), ('undirected', 0.098), ('decisions', 0.091), ('cotraining', 0.08), ('unstated', 0.08), ('graphical', 0.077), ('kj', 0.071), ('potential', 0.068), ('xn', 0.064), ('unreliable', 0.059), ('trusted', 0.059), ('guarantees', 0.059), ('alternating', 0.057), ('bootstrap', 0.056), ('xk', 0.056), ('potentials', 0.056), ('bayesian', 0.055), ('mistakes', 0.053), ('samples', 0.05), ('accommodate', 0.049), ('dependency', 0.049), ('output', 0.047), ('yn', 0.046), ('exp', 0.042), ('penalized', 0.042), ('strategy', 0.042), ('label', 0.041), ('gaussian', 0.041), ('cation', 0.04), ('kernel', 0.039), ('conditionally', 0.037), ('multiple', 0.036), ('insights', 0.036), ('rate', 0.035), ('promoting', 0.035), ('accommodates', 0.035), ('bharat', 0.035), ('harald', 0.035), ('obsolete', 0.035), ('overloaded', 0.035), ('rosales', 0.035), ('shipeng', 0.035), ('svms', 0.035), ('ml', 0.034), ('upon', 0.033), ('regression', 0.033), ('conditional', 0.033), ('jointly', 0.032), ('cad', 0.032), ('abundantly', 0.032), ('accomplishes', 0.032), ('balaji', 0.032), ('concatenate', 0.032), ('krishnapuram', 0.032), ('scarce', 0.032), ('siemens', 0.032), ('spurred', 0.032), ('steck', 0.032), ('summarizing', 0.032), ('con', 0.032), ('understand', 0.031), ('noisy', 0.031), ('class', 0.031), ('joint', 0.03), ('independence', 0.03), ('intrinsically', 0.03), ('genre', 0.03), ('hereafter', 0.03), ('highlighting', 0.03), ('intriguing', 0.03), ('retraining', 0.03), ('rely', 0.029), ('prior', 0.029), ('sample', 0.029), ('assumptions', 0.029), ('labels', 0.029), ('limitations', 0.029), ('brevity', 0.028), ('visually', 0.028), ('clari', 0.028), ('covariance', 0.027), ('indirectly', 0.027), ('cliques', 0.027), ('penalties', 0.027), ('textual', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 32 nips-2007-Bayesian Co-Training

Author: Shipeng Yu, Balaji Krishnapuram, Harald Steck, R. B. Rao, Rómer Rosales

Abstract: We propose a Bayesian undirected graphical model for co-training, or more generally for semi-supervised multi-view learning. This makes explicit the previously unstated assumptions of a large class of co-training type algorithms, and also clarifies the circumstances under which these assumptions fail. Building upon new insights from this model, we propose an improved method for co-training, which is a novel co-training kernel for Gaussian process classifiers. The resulting approach is convex and avoids local-maxima problems, unlike some previous multi-view learning methods. Furthermore, it can automatically estimate how much each view should be trusted, and thus accommodate noisy or unreliable views. Experiments on toy data and real world data sets illustrate the benefits of this approach. 1

2 0.28303322 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning

Author: Kai Yu, Wei Chu

Abstract: This paper aims to model relational data on edges of networks. We describe appropriate Gaussian Processes (GPs) for directed, undirected, and bipartite networks. The inter-dependencies of edges can be effectively modeled by adapting the GP hyper-parameters. The framework suggests an intimate connection between link prediction and transfer learning, which were traditionally two separate research topics. We develop an efficient learning algorithm that can handle a large number of observations. The experimental results on several real-world data sets verify superior learning capacity. 1

3 0.19973743 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes

Author: Geoffrey E. Hinton, Ruslan Salakhutdinov

Abstract: We show how to use unlabeled data and a deep belief net (DBN) to learn a good covariance kernel for a Gaussian process. We first learn a deep generative model of the unlabeled data using the fast, greedy algorithm introduced by [7]. If the data is high-dimensional and highly-structured, a Gaussian kernel applied to the top layer of features in the DBN works much better than a similar kernel applied to the raw input. Performance at both regression and classification can then be further improved by using backpropagation through the DBN to discriminatively fine-tune the covariance kernel.

4 0.18464299 104 nips-2007-Inferring Neural Firing Rates from Spike Trains Using Gaussian Processes

Author: Maneesh Sahani, Byron M. Yu, John P. Cunningham, Krishna V. Shenoy

Abstract: Neural spike trains present challenges to analytical efforts due to their noisy, spiking nature. Many studies of neuroscientific and neural prosthetic importance rely on a smoothed, denoised estimate of the spike train’s underlying firing rate. Current techniques to find time-varying firing rates require ad hoc choices of parameters, offer no confidence intervals on their estimates, and can obscure potentially important single trial variability. We present a new method, based on a Gaussian Process prior, for inferring probabilistically optimal estimates of firing rate functions underlying single or multiple neural spike trains. We test the performance of the method on simulated data and experimentally gathered neural spike trains, and we demonstrate improvements over conventional estimators. 1

5 0.15849088 170 nips-2007-Robust Regression with Twinned Gaussian Processes

Author: Andrew Naish-guzman, Sean Holden

Abstract: We propose a Gaussian process (GP) framework for robust inference in which a GP prior on the mixing weights of a two-component noise model augments the standard process over latent function values. This approach is a generalization of the mixture likelihood used in traditional robust GP regression, and a specialization of the GP mixture models suggested by Tresp [1] and Rasmussen and Ghahramani [2]. The value of this restriction is in its tractable expectation propagation updates, which allow for faster inference and model selection, and better convergence than the standard mixture. An additional benefit over the latter method lies in our ability to incorporate knowledge of the noise domain to influence predictions, and to recover with the predictive distribution information about the outlier distribution via the gating process. The model has asymptotic complexity equal to that of conventional robust methods, but yields more confident predictions on benchmark problems than classical heavy-tailed models and exhibits improved stability for data with clustered corruptions, for which they fail altogether. We show further how our approach can be used without adjustment for more smoothly heteroscedastic data, and suggest how it could be extended to more general noise models. We also address similarities with the work of Goldberg et al. [3].

6 0.12520835 166 nips-2007-Regularized Boost for Semi-Supervised Learning

7 0.12222077 135 nips-2007-Multi-task Gaussian Process Prediction

8 0.11513191 69 nips-2007-Discriminative Batch Mode Active Learning

9 0.1044338 186 nips-2007-Statistical Analysis of Semi-Supervised Regression

10 0.09429127 97 nips-2007-Hidden Common Cause Relations in Relational Learning

11 0.093083411 205 nips-2007-Theoretical Analysis of Learning with Reward-Modulated Spike-Timing-Dependent Plasticity

12 0.090786643 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators

13 0.090600289 195 nips-2007-The Generalized FITC Approximation

14 0.090438545 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs

15 0.088802449 103 nips-2007-Inferring Elapsed Time from Stochastic Neural Processes

16 0.087779365 175 nips-2007-Semi-Supervised Multitask Learning

17 0.083676361 147 nips-2007-One-Pass Boosting

18 0.083188884 161 nips-2007-Random Projections for Manifold Learning

19 0.08225935 157 nips-2007-Privacy-Preserving Belief Propagation and Sampling

20 0.07876578 149 nips-2007-Optimal ROC Curve for a Combination of Classifiers


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.253), (1, 0.114), (2, -0.114), (3, 0.164), (4, -0.041), (5, 0.038), (6, -0.157), (7, -0.239), (8, 0.082), (9, 0.025), (10, -0.207), (11, -0.095), (12, -0.185), (13, -0.004), (14, -0.013), (15, 0.031), (16, -0.008), (17, -0.031), (18, -0.03), (19, 0.021), (20, 0.061), (21, -0.062), (22, 0.008), (23, 0.083), (24, -0.009), (25, 0.014), (26, -0.029), (27, -0.084), (28, -0.009), (29, 0.029), (30, 0.079), (31, 0.057), (32, 0.014), (33, 0.084), (34, -0.097), (35, -0.056), (36, -0.085), (37, -0.017), (38, -0.065), (39, 0.074), (40, 0.045), (41, 0.094), (42, -0.055), (43, -0.07), (44, 0.115), (45, 0.004), (46, -0.076), (47, -0.101), (48, 0.042), (49, 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94173712 32 nips-2007-Bayesian Co-Training

Author: Shipeng Yu, Balaji Krishnapuram, Harald Steck, R. B. Rao, Rómer Rosales

Abstract: We propose a Bayesian undirected graphical model for co-training, or more generally for semi-supervised multi-view learning. This makes explicit the previously unstated assumptions of a large class of co-training type algorithms, and also clarifies the circumstances under which these assumptions fail. Building upon new insights from this model, we propose an improved method for co-training, which is a novel co-training kernel for Gaussian process classifiers. The resulting approach is convex and avoids local-maxima problems, unlike some previous multi-view learning methods. Furthermore, it can automatically estimate how much each view should be trusted, and thus accommodate noisy or unreliable views. Experiments on toy data and real world data sets illustrate the benefits of this approach. 1

2 0.73909503 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning

Author: Kai Yu, Wei Chu

Abstract: This paper aims to model relational data on edges of networks. We describe appropriate Gaussian Processes (GPs) for directed, undirected, and bipartite networks. The inter-dependencies of edges can be effectively modeled by adapting the GP hyper-parameters. The framework suggests an intimate connection between link prediction and transfer learning, which were traditionally two separate research topics. We develop an efficient learning algorithm that can handle a large number of observations. The experimental results on several real-world data sets verify superior learning capacity. 1

3 0.62811816 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes

Author: Geoffrey E. Hinton, Ruslan Salakhutdinov

Abstract: We show how to use unlabeled data and a deep belief net (DBN) to learn a good covariance kernel for a Gaussian process. We first learn a deep generative model of the unlabeled data using the fast, greedy algorithm introduced by [7]. If the data is high-dimensional and highly-structured, a Gaussian kernel applied to the top layer of features in the DBN works much better than a similar kernel applied to the raw input. Performance at both regression and classification can then be further improved by using backpropagation through the DBN to discriminatively fine-tune the covariance kernel.

4 0.58203 104 nips-2007-Inferring Neural Firing Rates from Spike Trains Using Gaussian Processes

Author: Maneesh Sahani, Byron M. Yu, John P. Cunningham, Krishna V. Shenoy

Abstract: Neural spike trains present challenges to analytical efforts due to their noisy, spiking nature. Many studies of neuroscientific and neural prosthetic importance rely on a smoothed, denoised estimate of the spike train’s underlying firing rate. Current techniques to find time-varying firing rates require ad hoc choices of parameters, offer no confidence intervals on their estimates, and can obscure potentially important single trial variability. We present a new method, based on a Gaussian Process prior, for inferring probabilistically optimal estimates of firing rate functions underlying single or multiple neural spike trains. We test the performance of the method on simulated data and experimentally gathered neural spike trains, and we demonstrate improvements over conventional estimators. 1

5 0.57341349 170 nips-2007-Robust Regression with Twinned Gaussian Processes

Author: Andrew Naish-guzman, Sean Holden

Abstract: We propose a Gaussian process (GP) framework for robust inference in which a GP prior on the mixing weights of a two-component noise model augments the standard process over latent function values. This approach is a generalization of the mixture likelihood used in traditional robust GP regression, and a specialization of the GP mixture models suggested by Tresp [1] and Rasmussen and Ghahramani [2]. The value of this restriction is in its tractable expectation propagation updates, which allow for faster inference and model selection, and better convergence than the standard mixture. An additional benefit over the latter method lies in our ability to incorporate knowledge of the noise domain to influence predictions, and to recover with the predictive distribution information about the outlier distribution via the gating process. The model has asymptotic complexity equal to that of conventional robust methods, but yields more confident predictions on benchmark problems than classical heavy-tailed models and exhibits improved stability for data with clustered corruptions, for which they fail altogether. We show further how our approach can be used without adjustment for more smoothly heteroscedastic data, and suggest how it could be extended to more general noise models. We also address similarities with the work of Goldberg et al. [3].

6 0.53475159 97 nips-2007-Hidden Common Cause Relations in Relational Learning

7 0.51269495 166 nips-2007-Regularized Boost for Semi-Supervised Learning

8 0.48250261 195 nips-2007-The Generalized FITC Approximation

9 0.45307061 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition

10 0.43889207 201 nips-2007-The Value of Labeled and Unlabeled Examples when the Model is Imperfect

11 0.43060672 45 nips-2007-Classification via Minimum Incremental Coding Length (MICL)

12 0.42434284 175 nips-2007-Semi-Supervised Multitask Learning

13 0.41482824 186 nips-2007-Statistical Analysis of Semi-Supervised Regression

14 0.40672022 11 nips-2007-A Risk Minimization Principle for a Class of Parzen Estimators

15 0.40658066 69 nips-2007-Discriminative Batch Mode Active Learning

16 0.39255524 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning

17 0.3875891 139 nips-2007-Nearest-Neighbor-Based Active Learning for Rare Category Detection

18 0.38700196 135 nips-2007-Multi-task Gaussian Process Prediction

19 0.38502815 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine

20 0.37455592 157 nips-2007-Privacy-Preserving Belief Propagation and Sampling


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.03), (13, 0.022), (16, 0.015), (18, 0.016), (21, 0.561), (31, 0.016), (34, 0.021), (35, 0.03), (47, 0.055), (83, 0.088), (87, 0.014), (90, 0.048)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.95721847 103 nips-2007-Inferring Elapsed Time from Stochastic Neural Processes

Author: Misha Ahrens, Maneesh Sahani

Abstract: Many perceptual processes and neural computations, such as speech recognition, motor control and learning, depend on the ability to measure and mark the passage of time. However, the processes that make such temporal judgements possible are unknown. A number of different hypothetical mechanisms have been advanced, all of which depend on the known, temporally predictable evolution of a neural or psychological state, possibly through oscillations or the gradual decay of a memory trace. Alternatively, judgements of elapsed time might be based on observations of temporally structured, but stochastic processes. Such processes need not be specific to the sense of time; typical neural and sensory processes contain at least some statistical structure across a range of time scales. Here, we investigate the statistical properties of an estimator of elapsed time which is based on a simple family of stochastic process. 1

same-paper 2 0.95131469 32 nips-2007-Bayesian Co-Training

Author: Shipeng Yu, Balaji Krishnapuram, Harald Steck, R. B. Rao, Rómer Rosales

Abstract: We propose a Bayesian undirected graphical model for co-training, or more generally for semi-supervised multi-view learning. This makes explicit the previously unstated assumptions of a large class of co-training type algorithms, and also clarifies the circumstances under which these assumptions fail. Building upon new insights from this model, we propose an improved method for co-training, which is a novel co-training kernel for Gaussian process classifiers. The resulting approach is convex and avoids local-maxima problems, unlike some previous multi-view learning methods. Furthermore, it can automatically estimate how much each view should be trusted, and thus accommodate noisy or unreliable views. Experiments on toy data and real world data sets illustrate the benefits of this approach. 1

3 0.94372791 193 nips-2007-The Distribution Family of Similarity Distances

Author: Gertjan Burghouts, Arnold Smeulders, Jan-mark Geusebroek

Abstract: Assessing similarity between features is a key step in object recognition and scene categorization tasks. We argue that knowledge on the distribution of distances generated by similarity functions is crucial in deciding whether features are similar or not. Intuitively one would expect that similarities between features could arise from any distribution. In this paper, we will derive the contrary, and report the theoretical result that Lp -norms –a class of commonly applied distance metrics– from one feature vector to other vectors are Weibull-distributed if the feature values are correlated and non-identically distributed. Besides these assumptions being realistic for images, we experimentally show them to hold for various popular feature extraction algorithms, for a diverse range of images. This fundamental insight opens new directions in the assessment of feature similarity, with projected improvements in object and scene recognition algorithms. 1

4 0.90346086 19 nips-2007-Active Preference Learning with Discrete Choice Data

Author: Brochu Eric, Nando D. Freitas, Abhijeet Ghosh

Abstract: We propose an active learning algorithm that learns a continuous valuation model from discrete preferences. The algorithm automatically decides what items are best presented to an individual in order to find the item that they value highly in as few trials as possible, and exploits quirks of human psychology to minimize time and cognitive burden. To do this, our algorithm maximizes the expected improvement at each query without accurately modelling the entire valuation surface, which would be needlessly expensive. The problem is particularly difficult because the space of choices is infinite. We demonstrate the effectiveness of the new algorithm compared to related active learning methods. We also embed the algorithm within a decision making tool for assisting digital artists in rendering materials. The tool finds the best parameters while minimizing the number of queries. 1

5 0.61381942 172 nips-2007-Scene Segmentation with CRFs Learned from Partially Labeled Images

Author: Bill Triggs, Jakob J. Verbeek

Abstract: Conditional Random Fields (CRFs) are an effective tool for a variety of different data segmentation and labeling tasks including visual scene interpretation, which seeks to partition images into their constituent semantic-level regions and assign appropriate class labels to each region. For accurate labeling it is important to capture the global context of the image as well as local information. We introduce a CRF based scene labeling model that incorporates both local features and features aggregated over the whole image or large sections of it. Secondly, traditional CRF learning requires fully labeled datasets which can be costly and troublesome to produce. We introduce a method for learning CRFs from datasets with many unlabeled nodes by marginalizing out the unknown labels so that the log-likelihood of the known ones can be maximized by gradient ascent. Loopy Belief Propagation is used to approximate the marginals needed for the gradient and log-likelihood calculations and the Bethe free-energy approximation to the log-likelihood is monitored to control the step size. Our experimental results show that effective models can be learned from fragmentary labelings and that incorporating top-down aggregate features significantly improves the segmentations. The resulting segmentations are compared to the state-of-the-art on three different image datasets. 1

6 0.59453136 69 nips-2007-Discriminative Batch Mode Active Learning

7 0.59031874 93 nips-2007-GRIFT: A graphical model for inferring visual classification features from human data

8 0.58297592 56 nips-2007-Configuration Estimates Improve Pedestrian Finding

9 0.57965773 88 nips-2007-Fast and Scalable Training of Semi-Supervised CRFs with Application to Activity Recognition

10 0.57354945 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes

11 0.56646067 28 nips-2007-Augmented Functional Time Series Representation and Forecasting with Gaussian Processes

12 0.56605011 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning

13 0.56427366 175 nips-2007-Semi-Supervised Multitask Learning

14 0.56088972 136 nips-2007-Multiple-Instance Active Learning

15 0.55731416 97 nips-2007-Hidden Common Cause Relations in Relational Learning

16 0.55607533 157 nips-2007-Privacy-Preserving Belief Propagation and Sampling

17 0.54338562 83 nips-2007-Evaluating Search Engines by Modeling the Relationship Between Relevance and Clicks

18 0.5427677 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations

19 0.54219264 104 nips-2007-Inferring Neural Firing Rates from Spike Trains Using Gaussian Processes

20 0.5373022 100 nips-2007-Hippocampal Contributions to Control: The Third Way