nips nips2012 nips2012-5 knowledge-graph by maker-knowledge-mining

5 nips-2012-A Conditional Multinomial Mixture Model for Superset Label Learning


Source: pdf

Author: Liping Liu, Thomas G. Dietterich

Abstract: In the superset label learning problem (SLL), each training instance provides a set of candidate labels of which one is the true label of the instance. As in ordinary regression, the candidate label set is a noisy version of the true label. In this work, we solve the problem by maximizing the likelihood of the candidate label sets of training instances. We propose a probabilistic model, the Logistic Stick-Breaking Conditional Multinomial Model (LSB-CMM), to do the job. The LSB-CMM is derived from the logistic stick-breaking process. It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution. The mixture components can capture underlying structure in the data, which is very useful when the model is weakly supervised. This advantage comes at little cost, since the model introduces few additional parameters. Experimental tests on several real-world problems with superset labels show results that are competitive or superior to the state of the art. The discovered underlying structures also provide improved explanations of the classification predictions. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 In the superset label learning problem (SLL), each training instance provides a set of candidate labels of which one is the true label of the instance. [sent-6, score-1.117]

2 As in ordinary regression, the candidate label set is a noisy version of the true label. [sent-7, score-0.388]

3 In this work, we solve the problem by maximizing the likelihood of the candidate label sets of training instances. [sent-8, score-0.374]

4 It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution. [sent-11, score-0.448]

5 Experimental tests on several real-world problems with superset labels show results that are competitive or superior to the state of the art. [sent-14, score-0.445]

6 Fortunately, it is often possible to obtain a set of labels for each instance, where the correct label is one of the elements of the set. [sent-18, score-0.358]

7 Imprecisely-labeled training examples can be created by detecting each face in the image and defining a label set containing all of the names mentioned in the caption. [sent-21, score-0.274]

8 In this task, a field recording of multiple birds singing is divided into 10-second segments, and experts identify the species of all of the birds singing in each segment without localizing each species to a specific part of the spectrogram. [sent-23, score-0.262]

9 The superset label learning problem has been studied under two main formulations. [sent-26, score-0.559]

10 The assumption is that for every instance xi,j ∈ Bi , its true label yi,j ∈ Yi . [sent-31, score-0.314]

11 In the superset label formulation (which has sometimes been confusingly called the “partial label” problem) [7, 10, 8, 12, 4, 5], each instance xn has a candidate label set Yn that contains the unknown true label yn. [sent-35, score-1.712]

12 It is more general than the MIML formulation, since any MIML problem can be converted to a superset label problem (with loss of the bag information). [sent-37, score-0.591]

13 Furthermore, the superset label formulation is natural in many applications that do not involve bags of instances. [sent-38, score-0.559]

14 For example, in some applications, annotators may be unsure of the correct label, so permitting them to provide a superset of the correct label avoids the risk of mislabeling. [sent-39, score-0.559]

15 In this paper, we employ the superset label formulation. [sent-40, score-0.559]

16 [5] who extend SVMs to handle superset labeled data. [sent-43, score-0.323]

17 In the superset label problem, the label set Yn can be viewed as a corruption of the true label. [sent-44, score-0.828]

18 In standard supervised learning, it is common to assume that the observed label is sampled from a Bernoulli random variable whose most likely outcome is equal to the true label. [sent-46, score-0.284]

19 In the superset label problem, we will assume that the observed label set Yn is drawn from a set-valued distribution p(Yn |yn ) that depends only on the true label. [sent-48, score-0.849]

20 When computing the likelihood, this will allow us to treat the true label as a latent variable that can be marginalized away. [sent-49, score-0.291]

21 When the label information is imprecise, the learning algorithm has to depend more on underlying structure in the data. [sent-50, score-0.236]

22 This suggests that the underlying structure of the data should also play important role in the superset label problem. [sent-52, score-0.559]

23 In this paper, we propose the Logistic Stick-Breaking Conditional Multinomial Model (LSB-CMM) for the superset label learning problem. [sent-53, score-0.559]

24 Given an input xn , the mapping component maps xn to a region k. [sent-55, score-0.382]

25 Then the coding component generates the label according to a multinomial distribution associated with k. [sent-56, score-0.388]

26 LSB-CMM addresses the superset label problem in several aspects. [sent-59, score-0.559]

27 The fact that instances in the same region often have the same label is important for inferring the true label from noisy candidate label sets. [sent-61, score-0.998]

28 2 The Logistic Stick-Breaking Conditional Multinomial Model The superset label learning problem seeks to train a classifier f : R^d → {1, · · · , L} on a given dataset (x, Y) = {(xn, Yn)}_{n=1}^N, where each instance xn ∈ R^d has a candidate label set Yn ⊂ {1, · · · , L}. [sent-66, score-1.079]

29 The only information is that the true label yn of instance xn is in the candidate set Yn. [sent-68, score-0.917]

30 The extra labels {l | l ≠ yn, l ∈ Yn} causing ambiguity will be called the distractor labels. [sent-69, score-0.784]

31 For any test instance (xt, yt) drawn from the same distribution as {(xn, yn)}_{n=1}^N, the trained classifier f should be able to map xt to yt with high probability. [sent-70, score-0.521]

32 We require |Yn | < L for all n; that is, every candidate label set must provide at least some information about the true label of the instance. [sent-72, score-0.609]
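To make the problem setup concrete, the following is a minimal sketch (not from the paper) of how a superset-labeled dataset might be represented and validated; the names X, candidate_sets, and num_labels are illustrative assumptions.

```python
import numpy as np

def validate_superset_dataset(X, candidate_sets, num_labels):
    """Check that each instance has a candidate set Yn with 1 <= |Yn| < L.

    X              : (N, d) array of instances xn
    candidate_sets : list of N sets of integer labels drawn from {0, ..., L-1}
    num_labels     : L, the total number of classes
    """
    assert len(candidate_sets) == X.shape[0], "one candidate set per instance"
    for n, Y_n in enumerate(candidate_sets):
        # A candidate set equal to the full label set carries no information,
        # hence the requirement |Yn| < L stated above.
        if not (1 <= len(Y_n) < num_labels):
            raise ValueError(f"instance {n}: need 1 <= |Yn| < L, got {len(Y_n)}")
        if not all(0 <= l < num_labels for l in Y_n):
            raise ValueError(f"instance {n}: labels must lie in 0..L-1")
    return True
```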

33 1 The Model As stated in the introduction, the candidate label set is a noisy version of the true label. [sent-74, score-0.373]

34 The key to our approach is to write this as p(Yn|xn) = Σ_{yn=1}^L p(Yn|yn) p(yn|xn), where each term is the product of the underlying true classifier, p(yn|xn), and the noise model p(Yn|yn). [sent-76, score-0.417]

35 Assumption: All labels in the candidate label set Yn have the same probability of generating Yn, but no label outside of Yn can generate Yn: p(Yn|yn = l) = λ(Yn) if l ∈ Yn, and 0 otherwise. [sent-79, score-0.698]

36 First, the set of labels Yn is conditionally independent of the input xn given yn . [sent-81, score-0.621]

37 That is, suppose that the most likely label for a particular input xn is yn = l. [sent-85, score-0.735]

38 Because p(yn|xn) is a multinomial distribution, a different label yn ≠ l might be assigned to xn by the labeling process. [sent-86, score-0.83]

39 Then this label is further corrupted by adding distractor labels to produce Yn . [sent-87, score-0.573]

40 Given (1), we can marginalize away yn in the following optimization problem maximizing the likelihood of observed candidate labels. [sent-91, score-0.504]

41 f* = argmax_f Σ_{n=1}^N log Σ_{yn=1}^L p(yn|xn; f) p(Yn|yn) = argmax_f Σ_{n=1}^N log Σ_{yn∈Yn} p(yn|xn; f) + Σ_{n=1}^N log(λ(Yn)). [sent-92, score-0.768]
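Under the assumption above, the λ(Yn) term is constant in f, so maximizing the likelihood reduces to maximizing Σ_n log Σ_{yn∈Yn} p(yn|xn; f). Below is a hedged numpy sketch of that objective; it assumes the classifier's per-class probabilities are already available as an array.

```python
import numpy as np

def superset_log_likelihood(class_probs, candidate_sets):
    """Marginalized objective: sum over n of log( sum_{y in Yn} p(y | xn; f) ).

    class_probs    : (N, L) array whose row n holds p(yn = l | xn; f)
    candidate_sets : list of N iterables of candidate labels
    The constant term sum_n log(lambda(Yn)) is dropped because it does not
    depend on the classifier f.
    """
    total = 0.0
    for n, Y_n in enumerate(candidate_sets):
        total += np.log(np.sum(class_probs[n, list(Y_n)]))
    return total
```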

42 The mapping component maps each instance xn to a region zn, zn ∈ {1, . . . , K}. [sent-101, score-0.518]

43 Then the coding component draws a label yn from the multinomial distribution indexed by zn with parameter θzn . [sent-105, score-0.875]

44 We denote the region indexes of the training instances by z = (zn)_{n=1}^N. [sent-106, score-0.171]

45 The input to the k-th logistic function is the dot product of xn and a learned weight vector wk ∈ Rd+1 . [sent-110, score-0.344]

46 To regularize these logistic functions, we posit that each wk is drawn from a Gaussian distribution Normal(0, Σ), where Σ = diag(∞, σ^2, · · · , σ^2). [sent-112, score-0.25]

47 For each xn, a sequence of probabilities {vnk}_{k=1}^K is generated from logistic functions, where vnk = expit(wk^T xn) and expit(u) = 1/(1 + exp(−u)) is the logistic function. [sent-114, score-0.484]

48 Given the probabilities vn1, . . . , vnK computed from xn, we choose the region zn according to a stick-breaking procedure: p(zn = k) = vnk Π_{i=1}^{k−1} (1 − vni). [sent-120, score-0.305]
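A small sketch of this stick-breaking computation of the region probabilities, assuming inputs carry a leading 1 for the bias term and that the last stick is forced to 1 so the probabilities sum to one (the paper may handle the remaining mass differently):

```python
import numpy as np

def expit(u):
    """Logistic function expit(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + np.exp(-u))

def region_probabilities(x, W):
    """Stick-breaking probabilities p(zn = k) = vnk * prod_{i<k} (1 - vni).

    x : (d+1,) input with a leading 1 for the bias (assumption)
    W : (K, d+1) logistic weight vectors w1, ..., wK
    """
    v = expit(W @ x)
    v[-1] = 1.0                      # assumption: last stick takes all remaining mass
    probs = np.empty_like(v)
    remaining = 1.0
    for k in range(len(v)):
        probs[k] = v[k] * remaining  # vnk times prod_{i<k} (1 - vni)
        remaining *= (1.0 - v[k])
    return probs
```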

49 In the coding component of LSB-CMM, we first draw K L-dimensional multinomial probabilities θ = {θk}_{k=1}^K from the prior Dirichlet distribution with parameter α. [sent-124, score-0.168]

50 Then, for each instance xn with mixture zn, its label yn is drawn from the multinomial distribution with θzn. [sent-125, score-1.027]
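This coding step can be pictured as a generative sampler; the sketch below is only an illustration, and the shared concentration vector alpha and the function name are assumptions.

```python
import numpy as np

def sample_labels(z, alpha, num_regions, rng=None):
    """Coding component: draw theta_k ~ Dirichlet(alpha), then yn ~ Mult(theta_{zn}).

    z           : (N,) region assignments zn produced by the mapping component
    alpha       : (L,) Dirichlet concentration parameter
    num_regions : K, the number of regions
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.dirichlet(alpha, size=num_regions)   # (K, L) per-region label distributions
    return np.array([rng.choice(len(alpha), p=theta[k]) for k in z])
```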

51 However, in the SLL problem yn is not observed and Yn is generated from yn . [sent-127, score-0.768]

52 The resulting objective (9), a sum over n = 1, . . . , N of log Σ_{yn∈Yn}(·), cannot be solved directly, so we apply variational EM [1]. [sent-130, score-0.411]

53 Then we factorize q as q(z, y, θ | φ̂, α̂) = Π_{n=1}^N q(zn, yn | φ̂n) Π_{k=1}^K q(θk | α̂k), (10) where φ̂n is a K × L matrix and q(zn, yn | φ̂n) is a multinomial distribution in which p(zn = k, yn = l) = φ̂nkl. [sent-134, score-1.247]

54 This distribution is constrained by the candidate label set: if a label l ∉ Yn, then φ̂nkl = 0 for any value of k. [sent-135, score-0.576]
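The candidate-set constraint on the variational distribution can be pictured as a masking-and-renormalization step; the sketch below only illustrates that constraint and does not reproduce the paper's actual E-step update formulas.

```python
import numpy as np

def constrain_to_candidates(phi, candidate_sets):
    """Zero out phi[n, k, l] for labels l outside Yn and renormalize q(zn, yn).

    phi            : (N, K, L) nonnegative, unnormalized responsibilities
    candidate_sets : list of N iterables of candidate labels
    """
    N, K, L = phi.shape
    mask = np.zeros((N, L))
    for n, Y_n in enumerate(candidate_sets):
        mask[n, list(Y_n)] = 1.0
    phi = phi * mask[:, None, :]                    # forbid labels outside Yn
    phi /= phi.sum(axis=(1, 2), keepdims=True)      # each q(zn, yn) sums to one
    return phi
```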

55 If the instance xn wants to join region k (i.e. [sent-144, score-0.247]

56 Σ_l φ̂nkl is large), then it must be similar to wk as well as to instances in that region in order to make φ̂nk large. [sent-146, score-0.301]

57 Simultaneously, its candidate labels must fit the “label flavor” of region k, where the “label flavor” means region k prefers labels having large values in αk . [sent-147, score-0.522]

58 The update of α̂ in (12) can be interpreted as having each instance xn vote for the label l for region k with weight φ̂nkl. [sent-148, score-0.274]

59 max_{wk} −(1/2) wk^T Σ^{−1} wk + Σ_{n=1}^N [ φ̂nk log(expit(wk^T xn)) + ψ̂nk log(1 − expit(wk^T xn)) ], (13) where φ̂nk = Σ_{l=1}^L φ̂nkl and ψ̂nk = Σ_{j=k+1}^K φ̂nj. [sent-153, score-0.674]

60 Intuitively, the variable φ̂nk is the probability that instance xn belongs to region k, and ψ̂nk is the probability that xn belongs to regions {k + 1, · · · , K}. [sent-154, score-0.483]
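The M-step for each wk is therefore a weighted, regularized logistic regression. Below is a sketch of the objective in (13), leaving the bias weight unpenalized to mirror Σ = diag(∞, σ^2, · · · , σ^2); the negated value could be handed to any gradient-based optimizer (e.g. scipy.optimize.minimize).

```python
import numpy as np

def wk_objective(w_k, X, phi_k, psi_k, sigma2):
    """Per-region objective (13), to be maximized over w_k.

    X      : (N, d+1) inputs with a leading 1 for the bias (assumption)
    phi_k  : (N,) weights, phi_k[n] = sum_l phi_hat[n, k, l]
    psi_k  : (N,) weights, psi_k[n] = sum_{j>k} sum_l phi_hat[n, j, l]
    sigma2 : prior variance sigma^2 for the non-bias weights
    """
    p = 1.0 / (1.0 + np.exp(-X @ w_k))               # expit(wk^T xn)
    penalty = -0.5 * np.sum(w_k[1:] ** 2) / sigma2   # bias is unpenalized
    return penalty + np.sum(phi_k * np.log(p) + psi_k * np.log(1.0 - p))
```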

61 Therefore, the optimal wk discriminates instances in region k against instances in regions ≥ k. [sent-155, score-0.415]

62 Prediction: For a test instance xt, we predict the label with maximum posterior probability. [sent-157, score-0.308]

63 The test instance can be mapped to a region with w, but the coding matrix θ is marginalized out in the EM. [sent-158, score-0.187]

64 Given a test point xt, the prediction is the label l that maximizes the probability p(yt = l | xt, w, α̂) calculated as in (14). [sent-160, score-0.263]

65 The test instance goes to region k with probability φ̂tk, and its label is decided by the votes (α̂k) in that region. [sent-163, score-0.368]
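A sketch of this prediction rule, using the Dirichlet mean of each region's votes as its label distribution; it mirrors the description above rather than reproducing the exact expression (14).

```python
import numpy as np

def predict_label(phi_t, alpha_hat):
    """Predict argmax_l sum_k phi_t[k] * E[theta_kl | alpha_hat_k].

    phi_t     : (K,) region probabilities for the test point xt
                (e.g. from the stick-breaking sketch above)
    alpha_hat : (K, L) per-region Dirichlet parameters, the "votes"
    """
    theta_mean = alpha_hat / alpha_hat.sum(axis=1, keepdims=True)
    label_probs = phi_t @ theta_mean    # (L,) posterior label probabilities
    return int(np.argmax(label_probs))
```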

66 Complexity Analysis and Practical Issues: In the E step, for each region k, the algorithm iterates over all candidate labels of all instances, so the complexity is O(NKL). [sent-165, score-0.313]

71 Second, we perform controlled experiments on three synthetic datasets to study the robustness of LSB-CMM with respect to the degree of ambiguity of the label sets. [sent-210, score-0.374]

72 When the data is standardized, the regularization parameter σ^2 = 1 generally gives good results, so σ^2 is set to 1 in all superset label tasks. [sent-217, score-0.559]

73 We assign a label to each cluster so that the problem is linearly-inseparable (see (2)). [sent-231, score-0.266]

74 In the second task, we add a distractor label for two thirds of all instances (gray data points in the figure). [sent-233, score-0.501]

75 The distractor label is randomly chosen from the two labels other than the true label. [sent-234, score-0.59]

76 After injecting distractor labels, LSB-CMM still recovers the boundaries between classes. [sent-237, score-0.226]

77 For each training instance, we add distractor labels with controlled probability. [sent-242, score-0.371]

78 As in [5], we use p, q, and ε to control the ambiguity level of candidate label sets. [Figure 3: Three regions learned by the model on usps.] [sent-243, score-0.504]

79 The roles and values of these three variables are as follows: p is the probability that an instance has distractor labels (p = 1 for all controlled experiments); q ∈ {1, 2, 3, 4} is the number of distractor labels; and ε ∈ {0. [sent-244, score-0.597]

80 95} is the maximum probability that a distractor label co-occurs with the true label [5], also called the ambiguity degree. [sent-248, score-0.783]

81 In the first setting, we hold q = 1 and vary ε, that is, for each label l, we choose a specific label l′ ≠ l as the (unique) distractor label with probability ε or choose any other label with probability 1 − ε. [sent-250, score-1.2]
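A sketch of how such candidate sets could be generated for the controlled experiments; the pairing of each label with the designated distractor (y + 1) mod L is an illustrative assumption, and the exact protocol follows [5].

```python
import numpy as np

def add_distractors(y, num_labels, p=1.0, q=1, eps=None, rng=None):
    """Corrupt a true label y into a candidate set Yn.

    p   : probability that the instance receives distractor labels at all
    q   : number of random distractors when eps is None (second setting)
    eps : ambiguity degree for the first setting (q = 1): the designated
          distractor is chosen with probability eps, any other label otherwise
    """
    rng = np.random.default_rng() if rng is None else rng
    candidates = {y}
    if rng.random() < p:
        others = [l for l in range(num_labels) if l != y]
        if eps is not None:                       # first setting: vary eps, q = 1
            designated = (y + 1) % num_labels     # assumed pairing, needs L >= 3
            if rng.random() < eps:
                candidates.add(designated)
            else:
                candidates.add(int(rng.choice([l for l in others if l != designated])))
        else:                                     # second setting: vary q
            candidates.update(int(l) for l in rng.choice(others, size=q, replace=False))
    return candidates
```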

82 In the second setting, we vary q and pick distractor labels randomly for each candidate label set. [sent-252, score-0.718]

83 As the number of distractor labels increases, performance of both methods goes down, but not too much. [sent-255, score-0.321]

84 When the true label is combined with different distractor labels, the disambiguation is easy. [sent-256, score-0.468]

85 The small dataset (segment) suffers even more from large ambiguity degree, because there are fewer data points that can “break” the strong correlation between the true label and the distractors. [sent-259, score-0.368]

86 Recall that φnk is the probability that xn is sent to region k. [sent-261, score-0.202]

87 In each region k, the representative instances have large values of φnk . [sent-262, score-0.168]

88 We examined all φnk from the model trained on the usps dataset with 3 random distractor labels. [sent-263, score-0.256]

89 In order to analyze the performance of the classifier learned from data with either superset labels or fully observed labels, one traditional method is to compute the confusion matrix. [sent-271, score-0.462]

90 The labels of each recording are the bird species that were singing during that 10-second period, and these species become the candidate label set of each syllable in the recording. [sent-280, score-0.549]

91 The labels of all segmentations in an image are treated as candidate labels for each segmentation. [sent-283, score-0.368]

92 3) Lost dataset [5]: This dataset contains 1122 faces, and each face has the true label and a set of candidate labels. [sent-285, score-0.433]

[Figure legend: SVM; LSB-CMM, vary q; LSB-CMM, vary ε; CLPL, vary q; CLPL, vary ε.]

96 The dot-dash line is for different q values (number of distractor labels) as shown on the top x-axis. [sent-314, score-0.199]

97 Accuracies of the three superset label learning algorithms are compared using the paired t-test at the 95% confidence level. [sent-342, score-0.559]

98 4 Conclusions This paper introduced the Logistic Stick-Breaking Conditional Multinomial Model to address the superset label learning problem. [sent-348, score-0.559]

99 The mixture representation allows LSB-CMM to discover cluster structure that has predictive power for the superset labels in the training data. [sent-349, score-0.521]

100 Hence, if two labels co-occur, LSB-CMM is not forced to choose one of them to assign to the training example but instead can create a region that maps to both of them. [sent-350, score-0.243]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('qq', 0.599), ('yn', 0.384), ('superset', 0.323), ('label', 0.236), ('distractor', 0.199), ('clpl', 0.183), ('wk', 0.148), ('expit', 0.122), ('labels', 0.122), ('xn', 0.115), ('nkl', 0.108), ('candidate', 0.104), ('zn', 0.103), ('nk', 0.098), ('multinomial', 0.095), ('cmm', 0.092), ('lsb', 0.092), ('region', 0.087), ('logistic', 0.081), ('ambiguity', 0.079), ('vnk', 0.076), ('instances', 0.066), ('birdsong', 0.061), ('lsbp', 0.061), ('miml', 0.061), ('vary', 0.057), ('sim', 0.051), ('regions', 0.048), ('singing', 0.046), ('sll', 0.046), ('instance', 0.045), ('stick', 0.041), ('cour', 0.04), ('usps', 0.037), ('breaking', 0.035), ('dirichlet', 0.034), ('species', 0.034), ('coding', 0.033), ('lost', 0.033), ('true', 0.033), ('controlled', 0.032), ('bag', 0.032), ('em', 0.032), ('classi', 0.031), ('briggs', 0.031), ('syllable', 0.031), ('svm', 0.03), ('kl', 0.03), ('ambiguous', 0.03), ('cluster', 0.03), ('bird', 0.029), ('segment', 0.029), ('mixture', 0.028), ('xt', 0.027), ('boundaries', 0.027), ('degree', 0.027), ('ambiguously', 0.027), ('kv', 0.027), ('sapp', 0.027), ('variational', 0.027), ('recording', 0.027), ('mapping', 0.025), ('jie', 0.025), ('er', 0.024), ('materials', 0.024), ('component', 0.024), ('birds', 0.023), ('qqq', 0.023), ('pendigits', 0.022), ('yt', 0.022), ('marginalized', 0.022), ('drawn', 0.021), ('corvallis', 0.021), ('tk', 0.021), ('toy', 0.021), ('avor', 0.02), ('mult', 0.02), ('segmentations', 0.02), ('dataset', 0.02), ('face', 0.02), ('accuracies', 0.02), ('oregon', 0.019), ('training', 0.018), ('eecs', 0.017), ('nguyen', 0.017), ('conditional', 0.017), ('confusion', 0.017), ('insensitive', 0.017), ('belongs', 0.017), ('probabilities', 0.016), ('classes', 0.016), ('maps', 0.016), ('libsvm', 0.016), ('bernoulli', 0.016), ('corrupted', 0.016), ('rbf', 0.016), ('likelihood', 0.016), ('ordinary', 0.015), ('representative', 0.015), ('supervised', 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 5 nips-2012-A Conditional Multinomial Mixture Model for Superset Label Learning

Author: Liping Liu, Thomas G. Dietterich

Abstract: In the superset label learning problem (SLL), each training instance provides a set of candidate labels of which one is the true label of the instance. As in ordinary regression, the candidate label set is a noisy version of the true label. In this work, we solve the problem by maximizing the likelihood of the candidate label sets of training instances. We propose a probabilistic model, the Logistic Stick-Breaking Conditional Multinomial Model (LSB-CMM), to do the job. The LSB-CMM is derived from the logistic stick-breaking process. It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution. The mixture components can capture underlying structure in the data, which is very useful when the model is weakly supervised. This advantage comes at little cost, since the model introduces few additional parameters. Experimental tests on several real-world problems with superset labels show results that are competitive or superior to the state of the art. The discovered underlying structures also provide improved explanations of the classification predictions. 1

2 0.58297366 310 nips-2012-Semiparametric Principal Component Analysis

Author: Fang Han, Han Liu

Abstract: We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The according methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman’s rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on the synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012). 1

3 0.54677123 308 nips-2012-Semi-Supervised Domain Adaptation with Non-Parametric Copulas

Author: David Lopez-paz, Jose M. Hernández-lobato, Bernhard Schölkopf

Abstract: A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. Importantly, we introduce a novel vine copula model, which allows for this factorization in a non-parametric manner. Experimental results on regression problems with real-world data illustrate the efficacy of the proposed approach when compared to state-of-the-art techniques. 1

4 0.25105974 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting

Author: Amadou Ba, Mathieu Sinn, Yannig Goude, Pascal Pompey

Abstract: This paper proposes an efficient online learning algorithm to track the smoothing functions of Additive Models. The key idea is to combine the linear representation of Additive Models with a Recursive Least Squares (RLS) filter. In order to quickly track changes in the model and put more weight on recent data, the RLS filter uses a forgetting factor which exponentially weights down observations by the order of their arrival. The tracking behaviour is further enhanced by using an adaptive forgetting factor which is updated based on the gradient of the a priori errors. Using results from Lyapunov stability theory, upper bounds for the learning rate are analyzed. The proposed algorithm is applied to 5 years of electricity load data provided by the French utility company Electricité de France (EDF). Compared to state-of-the-art methods, it achieves a superior performance in terms of model tracking and prediction accuracy. 1

5 0.10750083 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

Author: Ashish Kapoor, Raajay Viswanathan, Prateek Jain

Abstract: In this paper, we present a Bayesian framework for multilabel classification using compressed sensing. The key idea in compressed sensing for multilabel classification is to first project the label vector to a lower dimensional space using a random transformation and then learn regression functions over these projections. Our approach considers both of these components in a single probabilistic model, thereby jointly optimizing over compression as well as learning tasks. We then derive an efficient variational inference scheme that provides joint posterior distribution over all the unobserved labels. The two key benefits of the model are that a) it can naturally handle datasets that have missing labels and b) it can also measure uncertainty in prediction. The uncertainty estimate provided by the model allows for active learning paradigms where an oracle provides information about labels that promise to be maximally informative for the prediction task. Our experiments show significant boost over prior methods in terms of prediction performance over benchmark datasets, both in the fully labeled and the missing labels case. Finally, we also highlight various useful active learning scenarios that are enabled by the probabilistic model. 1

6 0.10480824 142 nips-2012-Generalization Bounds for Domain Adaptation

7 0.092642158 127 nips-2012-Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression

8 0.091174357 262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L p$ Loss

9 0.090831511 280 nips-2012-Proper losses for learning from partial labels

10 0.079447486 200 nips-2012-Local Supervised Learning through Space Partitioning

11 0.079162009 21 nips-2012-A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes

12 0.073637091 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

13 0.072091609 60 nips-2012-Bayesian nonparametric models for ranked data

14 0.066370144 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms

15 0.064665779 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

16 0.062831573 256 nips-2012-On the connections between saliency and tracking

17 0.060614955 312 nips-2012-Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression

18 0.056510381 180 nips-2012-Learning Mixtures of Tree Graphical Models

19 0.056223199 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

20 0.050614022 361 nips-2012-Volume Regularization for Binary Classification


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.143), (1, 0.039), (2, 0.007), (3, 0.008), (4, -0.013), (5, -0.044), (6, 0.736), (7, 0.016), (8, -0.036), (9, 0.001), (10, 0.047), (11, 0.002), (12, -0.012), (13, 0.031), (14, 0.037), (15, 0.014), (16, -0.063), (17, 0.008), (18, 0.008), (19, -0.004), (20, -0.006), (21, 0.03), (22, 0.033), (23, 0.041), (24, -0.001), (25, 0.018), (26, 0.036), (27, 0.015), (28, -0.046), (29, 0.008), (30, -0.008), (31, 0.017), (32, 0.009), (33, -0.038), (34, -0.006), (35, -0.037), (36, 0.022), (37, -0.003), (38, 0.007), (39, 0.003), (40, -0.007), (41, 0.026), (42, -0.012), (43, 0.012), (44, 0.009), (45, -0.008), (46, -0.003), (47, 0.009), (48, 0.025), (49, -0.004)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.96374434 308 nips-2012-Semi-Supervised Domain Adaptation with Non-Parametric Copulas

Author: David Lopez-paz, Jose M. Hernández-lobato, Bernhard Schölkopf

Abstract: A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. Importantly, we introduce a novel vine copula model, which allows for this factorization in a non-parametric manner. Experimental results on regression problems with real-world data illustrate the efficacy of the proposed approach when compared to state-of-the-art techniques. 1

2 0.95331997 310 nips-2012-Semiparametric Principal Component Analysis

Author: Fang Han, Han Liu

Abstract: We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The according methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman’s rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on the synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012). 1

same-paper 3 0.91919792 5 nips-2012-A Conditional Multinomial Mixture Model for Superset Label Learning

Author: Liping Liu, Thomas G. Dietterich

Abstract: In the superset label learning problem (SLL), each training instance provides a set of candidate labels of which one is the true label of the instance. As in ordinary regression, the candidate label set is a noisy version of the true label. In this work, we solve the problem by maximizing the likelihood of the candidate label sets of training instances. We propose a probabilistic model, the Logistic Stick-Breaking Conditional Multinomial Model (LSB-CMM), to do the job. The LSB-CMM is derived from the logistic stick-breaking process. It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution. The mixture components can capture underlying structure in the data, which is very useful when the model is weakly supervised. This advantage comes at little cost, since the model introduces few additional parameters. Experimental tests on several real-world problems with superset labels show results that are competitive or superior to the state of the art. The discovered underlying structures also provide improved explanations of the classification predictions. 1

4 0.79219186 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting

Author: Amadou Ba, Mathieu Sinn, Yannig Goude, Pascal Pompey

Abstract: This paper proposes an efficient online learning algorithm to track the smoothing functions of Additive Models. The key idea is to combine the linear representation of Additive Models with a Recursive Least Squares (RLS) filter. In order to quickly track changes in the model and put more weight on recent data, the RLS filter uses a forgetting factor which exponentially weights down observations by the order of their arrival. The tracking behaviour is further enhanced by using an adaptive forgetting factor which is updated based on the gradient of the a priori errors. Using results from Lyapunov stability theory, upper bounds for the learning rate are analyzed. The proposed algorithm is applied to 5 years of electricity load data provided by the French utility company Electricité de France (EDF). Compared to state-of-the-art methods, it achieves a superior performance in terms of model tracking and prediction accuracy. 1

5 0.25770456 130 nips-2012-Feature-aware Label Space Dimension Reduction for Multi-label Classification

Author: Yao-nan Chen, Hsuan-tien Lin

Abstract: Label space dimension reduction (LSDR) is an efficient and effective paradigm for multi-label classification with many classes. Existing approaches to LSDR, such as compressive sensing and principal label space transformation, exploit only the label part of the dataset, but not the feature part. In this paper, we propose a novel approach to LSDR that considers both the label and the feature parts. The approach, called conditional principal label space transformation, is based on minimizing an upper bound of the popular Hamming loss. The minimization step of the approach can be carried out efficiently by a simple use of singular value decomposition. In addition, the approach can be extended to a kernelized version that allows the use of sophisticated feature combinations to assist LSDR. The experimental results verify that the proposed approach is more effective than existing ones to LSDR across many real-world datasets. 1

6 0.22268341 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

7 0.21786474 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

8 0.2016508 207 nips-2012-Mandatory Leaf Node Prediction in Hierarchical Multilabel Classification

9 0.19654454 127 nips-2012-Fast Bayesian Inference for Non-Conjugate Gaussian Process Regression

10 0.19595286 280 nips-2012-Proper losses for learning from partial labels

11 0.19545719 58 nips-2012-Bayesian models for Large-scale Hierarchical Classification

12 0.18278143 361 nips-2012-Volume Regularization for Binary Classification

13 0.17699313 142 nips-2012-Generalization Bounds for Domain Adaptation

14 0.17562807 359 nips-2012-Variational Inference for Crowdsourcing

15 0.17524496 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models

16 0.17274471 200 nips-2012-Local Supervised Learning through Space Partitioning

17 0.16940935 262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L p$ Loss

18 0.16834174 21 nips-2012-A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes

19 0.16552667 37 nips-2012-Affine Independent Variational Inference

20 0.16211154 226 nips-2012-Multiclass Learning Approaches: A Theoretical Comparison with Implications


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.043), (21, 0.043), (38, 0.138), (42, 0.021), (54, 0.019), (55, 0.038), (74, 0.062), (76, 0.124), (80, 0.104), (92, 0.038), (95, 0.26)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.80864632 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding

Author: Philip Sterne, Joerg Bornschein, Abdul-saboor Sheikh, Joerg Luecke, Jacquelyn A. Shelton

Abstract: Modelling natural images with sparse coding (SC) has faced two main challenges: flexibly representing varying pixel intensities and realistically representing lowlevel image components. This paper proposes a novel multiple-cause generative model of low-level image statistics that generalizes the standard SC model in two crucial points: (1) it uses a spike-and-slab prior distribution for a more realistic representation of component absence/intensity, and (2) the model uses the highly nonlinear combination rule of maximal causes analysis (MCA) instead of a linear combination. The major challenge is parameter optimization because a model with either (1) or (2) results in strongly multimodal posteriors. We show for the first time that a model combining both improvements can be trained efficiently while retaining the rich structure of the posteriors. We design an exact piecewise Gibbs sampling method and combine this with a variational method based on preselection of latent dimensions. This combined training scheme tackles both analytical and computational intractability and enables application of the model to a large number of observed and hidden dimensions. Applying the model to image patches we study the optimal encoding of images by simple cells in V1 and compare the model’s predictions with in vivo neural recordings. In contrast to standard SC, we find that the optimal prior favors asymmetric and bimodal activity of simple cells. Testing our model for consistency we find that the average posterior is approximately equal to the prior. Furthermore, we find that the model predicts a high percentage of globular receptive fields alongside Gabor-like fields. Similarly high percentages are observed in vivo. Our results thus argue in favor of improvements of the standard sparse coding model for simple cells by using flexible priors and nonlinear combinations. 1

same-paper 2 0.78048146 5 nips-2012-A Conditional Multinomial Mixture Model for Superset Label Learning

Author: Liping Liu, Thomas G. Dietterich

Abstract: In the superset label learning problem (SLL), each training instance provides a set of candidate labels of which one is the true label of the instance. As in ordinary regression, the candidate label set is a noisy version of the true label. In this work, we solve the problem by maximizing the likelihood of the candidate label sets of training instances. We propose a probabilistic model, the Logistic Stick-Breaking Conditional Multinomial Model (LSB-CMM), to do the job. The LSB-CMM is derived from the logistic stick-breaking process. It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution. The mixture components can capture underlying structure in the data, which is very useful when the model is weakly supervised. This advantage comes at little cost, since the model introduces few additional parameters. Experimental tests on several real-world problems with superset labels show results that are competitive or superior to the state of the art. The discovered underlying structures also provide improved explanations of the classification predictions. 1

3 0.76433682 26 nips-2012-A nonparametric variable clustering model

Author: Konstantina Palla, Zoubin Ghahramani, David A. Knowles

Abstract: Factor analysis models effectively summarise the covariance structure of high dimensional data, but the solutions are typically hard to interpret. This motivates attempting to find a disjoint partition, i.e. a simple clustering, of observed variables into highly correlated subsets. We introduce a Bayesian non-parametric approach to this problem, and demonstrate advantages over heuristic methods proposed to date. Our Dirichlet process variable clustering (DPVC) model can discover blockdiagonal covariance structures in data. We evaluate our method on both synthetic and gene expression analysis problems. 1

4 0.75567037 260 nips-2012-Online Sum-Product Computation Over Trees

Author: Mark Herbster, Stephen Pasteris, Fabio Vitale

Abstract: We consider the problem of performing efficient sum-product computations in an online setting over a tree. A natural application of our methods is to compute the marginal distribution at a vertex in a tree-structured Markov random field. Belief propagation can be used to solve this problem, but requires time linear in the size of the tree, and is therefore too slow in an online setting where we are continuously receiving new data and computing individual marginals. With our method we aim to update the data and compute marginals in time that is no more than logarithmic in the size of the tree, and is often significantly less. We accomplish this via a hierarchical covering structure that caches previous local sum-product computations. Our contribution is three-fold: we i) give a linear time algorithm to find an optimal hierarchical cover of a tree; ii) give a sum-productlike algorithm to efficiently compute marginals with respect to this cover; and iii) apply “i” and “ii” to find an efficient algorithm with a regret bound for the online allocation problem in a multi-task setting. 1

5 0.66242552 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

Author: Vasiliy Karasev, Alessandro Chiuso, Stefano Soatto

Abstract: We describe the tradeoff between the performance in a visual recognition problem and the control authority that the agent can exercise on the sensing process. We focus on the problem of “visual search” of an object in an otherwise known and static scene, propose a measure of control authority, and relate it to the expected risk and its proxy (conditional entropy of the posterior density). We show this analytically, as well as empirically by simulation using the simplest known model that captures the phenomenology of image formation, including scaling and occlusions. We show that a “passive” agent given a training set can provide no guarantees on performance beyond what is afforded by the priors, and that an “omnipotent” agent, capable of infinite control authority, can achieve arbitrarily good performance (asymptotically). In between these limiting cases, the tradeoff can be characterized empirically. 1

6 0.65691793 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

7 0.65534967 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

8 0.65350282 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

9 0.65274471 65 nips-2012-Cardinality Restricted Boltzmann Machines

10 0.65273064 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

11 0.65121996 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

12 0.6507532 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

13 0.64948881 284 nips-2012-Q-MKL: Matrix-induced Regularization in Multi-Kernel Learning with Applications to Neuroimaging

14 0.64925623 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

15 0.64916879 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models

16 0.64882338 56 nips-2012-Bayesian active learning with localized priors for fast receptive field characterization

17 0.64860255 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

18 0.64845002 252 nips-2012-On Multilabel Classification and Ranking with Partial Feedback

19 0.64823234 168 nips-2012-Kernel Latent SVM for Visual Recognition

20 0.64716417 48 nips-2012-Augmented-SVM: Automatic space partitioning for combining multiple non-linear dynamics