nips nips2004 nips2004-8 knowledge-graph by maker-knowledge-mining

8 nips-2004-A Machine Learning Approach to Conjoint Analysis


Source: pdf

Author: Olivier Chapelle, Zaïd Harchaoui

Abstract: Choice-based conjoint analysis builds models of consumer preferences over products with answers gathered in questionnaires. Our main goal is to bring tools from the machine learning community to solve this problem more efficiently. Thus, we propose two algorithms to quickly and accurately estimate consumer preferences. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: Choice-based conjoint analysis builds models of consumer preferences over products with answers gathered in questionnaires. [sent-6, score-1.117]

2 Thus, we propose two algorithms to quickly and accurately estimate consumer preferences. [sent-8, score-0.331]

3 1 Introduction. Conjoint analysis (also called trade-off analysis) is one of the most popular marketing research techniques, used to determine which features a new product should have by conjointly measuring consumers' trade-offs between discretized attributes. [sent-9, score-0.25]

4 In this paper, we will focus on the choice-based conjoint analysis (CBC) framework [11] since it is both widely used and realistic: at each question in the survey, the consumer is asked to choose one product from several. [sent-10, score-1.074]

5 The preferences of a consumer are modeled via a utility function representing how much a consumer likes a given product. [sent-11, score-0.713]

6 The utility u(x) of a product x is assumed to be the sum of the partial utilities (or partworths) for each attribute, i.e. u(x) = w · x. [sent-12, score-0.204]

7 However, instead of observing pairs (x_l, y_l), the training samples are of the form ({x_k^1, …, x_k^p}, y_k), [sent-15, score-0.057]

8 indicating that among the p products {x_k^1, …, x_k^p}, the y_k-th one was chosen. [sent-18, score-0.255]

9 Without noise, this is expressed mathematically by u(x_k^{y_k}) ≥ u(x_k^b), ∀ b ≠ y_k. [sent-22, score-0.09]

10 Let us now lay out the general framework of a regular conjoint analysis survey. [sent-23, score-0.575]

11 We have a population of n consumers available for the survey. [sent-24, score-0.125]

12 The survey consists of a questionnaire of q questions for each consumer, each asking to choose one product from a basket of p. [sent-25, score-0.357]

13 Each product profile is described through a attributes with l_1, …, l_a levels each, [sent-26, score-0.116]

14 via a vector of length m = Σ_{s=1}^{a} l_s, with 1 at the positions of the levels taken by each attribute and 0 elsewhere. [sent-29, score-0.147]
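
To make this encoding concrete, here is a minimal Python sketch; the attribute layout and the partworth vector below are illustrative assumptions, not values from the paper.

    import numpy as np

    def encode_profile(levels, level_counts):
        # One-hot encode a product profile: levels[s] is the (0-based) level taken by
        # attribute s, level_counts[s] is l_s; the result has length m = sum(level_counts).
        x = np.zeros(sum(level_counts))
        offset = 0
        for lev, ls in zip(levels, level_counts):
            x[offset + lev] = 1.0
            offset += ls
        return x

    # Hypothetical example: 4 attributes with 4 levels each, so m = 16.
    level_counts = [4, 4, 4, 4]
    x = encode_profile([0, 2, 1, 3], level_counts)

    # The utility is the sum of the partworths of the levels present in the profile: u(x) = w . x
    w = np.random.randn(len(x))   # illustrative partworth vector, not estimated from data
    u = float(w @ x)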

15 Marketing researchers are interested in estimating individual partworths in order to perform for instance a segmentation of the population afterwards. [sent-30, score-0.557]

16 But traditional conjoint estimation techniques are not reliable for this task since the number of parameters m to be estimated is usually larger than the number of answers q available for each consumer. [sent-31, score-0.753]

17 They estimate instead the partworths on the whole population (aggregated partworths). [sent-32, score-0.478]

18 (Footnote 1) For example, if the discretized attribute is weight, the levels would be light/heavy. [sent-35, score-0.139]

19 We also address adaptive questionnaire design with active learning heuristics. [sent-37, score-0.169]

20 2 Hierarchical Bayes Analysis. The main idea of HB is to estimate the individual utility functions under the constraint that their variance should not be too small. [sent-38, score-0.157]

21 By doing so, the estimation problem is no longer ill-posed, and the lack of information for one consumer can be compensated for by the answers of the other consumers. [sent-39, score-0.358]

22 This method aims at estimating the individual linear utility functions u_i(x) = w_i · x, for 1 ≤ i ≤ n. [sent-42, score-0.372]

23 The individual partworths w_i are drawn from a Gaussian distribution with mean α (representing the aggregated partworths) and covariance Σ (encoding the population's heterogeneity). [sent-44, score-0.847]

24 Given a basket of products (x^1, …, x^p), the probability that consumer i chooses the product x* is given by P(x* | w_i) = exp(w_i · x*) / Σ_{b=1}^{p} exp(w_i · x^b). (1) [sent-50, score-0.444]

25 2.2 Model estimation. We describe now the standard way of estimating α, w ≡ (w_1, …, w_n) [sent-52, score-0.131]

26 and Σ based on Gibbs sampling, and then propose a much faster algorithm that approximates the maximum a posteriori (MAP) solution. [sent-55, score-0.073]
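
Formula (1) is a standard multinomial logit over the basket; a minimal Python sketch, assuming the one-hot profile encoding described earlier (the max-subtraction is only for numerical stability):

    import numpy as np

    def choice_probabilities(w_i, basket):
        # basket: (p, m) array of product profiles; w_i: (m,) partworth vector.
        # Returns P(product b is chosen | w_i) for b = 1..p, as in equation (1).
        utilities = basket @ w_i
        utilities = utilities - utilities.max()   # numerical stability only
        expu = np.exp(utilities)
        return expu / expu.sum()

    # Hypothetical usage with a basket of 4 random profiles of length m = 16.
    rng = np.random.default_rng(0)
    basket = rng.integers(0, 2, size=(4, 16)).astype(float)
    w_i = rng.normal(size=16)
    probs = choice_probabilities(w_i, basket)   # sums to 1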

27 Gibbs sampling. As far as we know, all implementations of HB rely on a variant of Gibbs sampling [11]. [sent-56, score-0.12]

28 Sampling α and Σ is straightforward, whereas sampling from P(w | α, Σ, Y) ∝ P(Y | w) P(w | α, Σ) is more involved. [sent-58, score-0.076]

29 So far, HB implementations make predictions by evaluating (1) at the empirical mean of the samples, in contrast with the standard Bayesian approach, which would average the rhs of (1) over samples w drawn from the posterior. [sent-62, score-0.153]

30 In order to alleviate the computational issues associated with Gibbs sampling, we suggest considering the maximum of the posterior distribution (maximum a posteriori, MAP) rather than its mean. [sent-63, score-0.062]

31 (Footnote 2) Technical papers from Sawtooth Software [11], the world's leading company for conjoint analysis software, provide very useful and extensive references. [sent-64, score-0.631]

32 Using the model (1), the first term in the rhs of (3) is convex in w, but not the second term. [sent-68, score-0.074]

33 The objective W(α, w) in equation (4) is a sum over consumers i = 1, …, n; as in equation (3), it is minimized with respect to α when α is equal to the empirical mean of the w_i. [sent-71, score-0.215]

34 For a given α, minimize (4) with respect to each of the wi independently. [sent-73, score-0.24]
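
The alternating scheme just described can be sketched as follows. The per-consumer update is left abstract because the exact form of objective (4) is not reproduced in this summary, and the convergence test is our own assumption.

    import numpy as np

    def hb_map_alternating(update_w_i, w_init, n_iter=50, tol=1e-6):
        # update_w_i(i, alpha) must return the minimizer of objective (4) over w_i for
        # consumer i with alpha held fixed (its exact form follows the paper).
        w = np.array(w_init, dtype=float)          # shape (n, m)
        for _ in range(n_iter):
            alpha = w.mean(axis=0)                 # optimal alpha = empirical mean of the w_i
            w_new = np.stack([update_w_i(i, alpha) for i in range(len(w))])
            converged = np.max(np.abs(w_new - w)) < tol
            w = w_new
            if converged:
                break
        return w.mean(axis=0), w                   # aggregated and individual partworths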

35 3 Conjoint Analysis with Support Vector Machines Similarly to what has recently been proposed in [3], we are now investigating the use of Support Vector Machines (SVM) [1, 12] to solve the conjoint estimation problem. [sent-82, score-0.601]

36 3.1 Soft margin formulation of conjoint estimation. Let us recall the learning problem. [sent-84, score-0.667]

37 At the k-th question, the consumer chooses the y_k-th product from the basket {x_k^1, …, x_k^p}. [sent-85, score-0.678]

38 Our goal is to estimate the individual partworths w, with the individual utility function now being u(x) = w · x. [sent-89, score-0.647]

39 With a reordering of the products, we can actually suppose that yk = 1. [sent-90, score-0.09]

40 Then the above inequalities can be rewritten as a set of p − 1 constraints: w · (x_k^1 − x_k^b) ≥ 0, 2 ≤ b ≤ p. [sent-91, score-0.077]

41 Equation (5) shows that the conjoint estimation problem can be cast as a classification problem in the space of product-profile differences. [sent-93, score-0.601]
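
A minimal sketch of this reduction: each question contributes the p − 1 difference vectors x_k^1 − x_k^b, all of which must receive a positive score, so feeding the differences and their negatives to a linear classifier without intercept yields a partworth estimate. The use of scikit-learn's LinearSVC here is an illustrative choice on our part, not the paper's implementation.

    import numpy as np
    from sklearn.svm import LinearSVC

    def fit_partworths_from_choices(questions):
        # questions: list of (p, m) arrays whose first row is the chosen product
        # (products reordered so that y_k = 1). Returns an estimated partworth vector w.
        diffs = []
        for basket in questions:
            chosen, others = basket[0], basket[1:]
            diffs.extend(chosen - other for other in others)   # x_k^1 - x_k^b, b >= 2
        diffs = np.asarray(diffs)
        X = np.vstack([diffs, -diffs])                         # symmetrize the two classes so that
        y = np.concatenate([np.ones(len(diffs)), -np.ones(len(diffs))])  # no intercept is needed
        clf = LinearSVC(fit_intercept=False).fit(X, y)
        return clf.coef_.ravel()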

42 (Footnote 4) This choice is consistent with the L2 loss measuring the deviations of the w_i from α. [sent-95, score-0.215]

43 3.2 Estimation of individual utilities. It was proposed in [3] to train one SVM per consumer to get w_i, and to compute the individual partworths by regularizing with the aggregated partworths w̄ = (1/n) Σ_{i=1}^{n} w_i: w_i* = (w_i + w̄)/2. [sent-98, score-2.298]

44 Instead, to estimate the individual utility partworths w_i, we suggest the following optimization problem (the set Q_i contains the indices k such that consumer i was asked to choose between the products x_k^1, …, x_k^p): [sent-99, score-1.268]

45 Minimize ||w_i||² + (C/q_i) Σ_{k∈Q_i} Σ_{b=2}^{p} ξ_kb² + Σ_{j≠i} (C̃/q_j) Σ_{k∈Q_j} Σ_{b=2}^{p} ξ_kb², subject to w_i · (x_k^1 − x_k^b) ≥ 1 − ξ_kb, ∀k, ∀b ≥ 2. [sent-102, score-0.631]

46 Here the ratio C/C̃ determines the trade-off between the individual scale and the aggregated one. [sent-103, score-0.175]

47 For C/C̃ = 1, the population is modeled as if it were homogeneous. [sent-104, score-0.067]

48 For C/C̃ ≫ 1, the individual partworths are computed independently, without taking into account the aggregated partworths. [sent-107, score-0.586]
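
One simple way to mimic this C versus C̃ weighting in practice is through per-sample weights: consumer i's own difference vectors get a weight proportional to C, every other consumer's get a weight proportional to C̃, each normalized by that consumer's number of constraints. The sketch below uses scikit-learn's sample_weight argument purely as an illustration; it is not the solver used in the paper, and the normalization is an assumption.

    import numpy as np
    from sklearn.svm import LinearSVC

    def fit_individual_partworths(i, diffs_per_consumer, C=10.0, C_tilde=1.0):
        # diffs_per_consumer[j]: (n_j, m) array of differences x_k^1 - x_k^b for consumer j.
        # Consumer i's rows are weighted by C / n_i, all other rows by C_tilde / n_j.
        X_parts, w_parts = [], []
        for j, diffs in enumerate(diffs_per_consumer):
            weight = (C if j == i else C_tilde) / max(len(diffs), 1)
            X_parts.append(diffs)
            w_parts.append(np.full(len(diffs), weight))
        X = np.vstack(X_parts)
        sw = np.concatenate(w_parts)
        X = np.vstack([X, -X])                     # classification in the difference space
        y = np.concatenate([np.ones(len(sw)), -np.ones(len(sw))])
        sw = np.concatenate([sw, sw])
        clf = LinearSVC(fit_intercept=False).fit(X, y, sample_weight=sw)
        return clf.coef_.ravel()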

49 4 Related work. Ordinal regression: very recently, [2] explored the so-called ordinal regression task for ranking and derived two techniques for hyperparameter learning and model selection in a hierarchical Bayesian framework, namely Laplace approximation and Expectation Propagation, respectively. [sent-108, score-0.321]

50 Ordinal regression is similar to, yet distinct from, conjoint estimation: its training data are assumed to be rankings or ratings, whereas in conjoint estimation the training data are choice-based. [sent-109, score-1.359]

51 Large margin classifiers. Casting the preference problem in a classification framework, leading to learning by convex optimization, has long been known in the psychometrics community. [sent-111, score-0.092]

52 [5] pioneered the use of large margin classifiers for ranking tasks. [sent-112, score-0.066]

53 [3] introduced the kernel methods machinery for conjoint analysis on the individual scale. [sent-113, score-0.677]

54 Very recently, [10] proposed an alternative method for dealing with heterogeneity in conjoint analysis, which boils down to an optimization very similar to our HB-MAP approximation objective, but with large margin regularization and with minimum deviation from the aggregated partworths. [sent-114, score-0.882]

55 Collaborative filtering Collaborative filtering exploits similarity between ratings across a population. [sent-115, score-0.064]

56 The goal is to predict a person’s rating on new products given the person’s past ratings on similar products and the ratings of other people on all the products. [sent-116, score-0.314]

57 Again, collaborative filtering is designed for training samples that overlap across consumers, and usually for rating/ranking training data, whereas conjoint estimation usually deals with different questionnaires for each consumer and choice-based training data. [sent-117, score-1.083]

58 (Footnote 5) With C ≥ C̃, the directions for which the x_k, k ∈ Q_i, contain information are estimated accurately, whereas the other directions are estimated thanks to the answers of the other consumers. [sent-118, score-0.247]

59 The simulated product profiles consist of 4 attributes, each of them being discretized through 4 levels. [sent-120, score-0.095]

60 For each question, the consumer was asked to choose one product from a basket of 4. [sent-122, score-0.586]

61 A population of 100 consumers was simulated, each of them having to answer 4 questions. [sent-123, score-0.125]

62 The 100 true consumer partworths were generated from a Gaussian distribution with mean (−β, −β/3, β/3, β) (for each attribute) and with a diagonal covariance matrix σ 2 I. [sent-125, score-0.761]

63 Each answer is a choice from the basket of products, sampled from the discrete logit-type distribution (1). [sent-126, score-0.126]

64 Hence, when β (called the magnitude) is large, the consumer will choose with high probability the product with the highest utility, whereas when β is small, the answers will be less reliable. [sent-127, score-0.551]
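
The artificial data generation just described can be sketched as follows; the random product-profile design and the seed handling are illustrative choices on our part, not the paper's exact protocol.

    import numpy as np

    def simulate_survey(beta, sigma, n_consumers=100, n_questions=4, basket_size=4,
                        n_attributes=4, n_levels=4, seed=0):
        # True partworths: per attribute, mean (-beta, -beta/3, beta/3, beta) and
        # diagonal covariance sigma^2 I (assumes n_levels == 4, as in the simulations).
        rng = np.random.default_rng(seed)
        m = n_attributes * n_levels
        mean = np.tile([-beta, -beta / 3, beta / 3, beta], n_attributes)
        W = rng.normal(loc=mean, scale=sigma, size=(n_consumers, m))

        surveys = []
        for i in range(n_consumers):
            questions = []
            for _ in range(n_questions):
                basket = np.zeros((basket_size, m))
                for b in range(basket_size):                 # random one-hot product profiles
                    levels = rng.integers(0, n_levels, size=n_attributes)
                    basket[b, np.arange(n_attributes) * n_levels + levels] = 1.0
                utilities = basket @ W[i]
                probs = np.exp(utilities - utilities.max())
                probs /= probs.sum()
                choice = rng.choice(basket_size, p=probs)    # logit-type distribution (1)
                questions.append((basket, choice))
            surveys.append(questions)
        return W, surveys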

65 Finally, as in [3], the performances are computed using the mean of the L2 distances between the true and estimated individual partworths (also called RMSE). [sent-129, score-0.595]

66 Beforehand the partworths are translated such that the mean on each attribute is 0 and normalized to 1. [sent-130, score-0.488]
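
A sketch of this evaluation, assuming the partworths are stored attribute by attribute with equal level counts; reading "normalized to 1" as scaling to unit Euclidean norm is our interpretation.

    import numpy as np

    def standardize(w, n_attributes, n_levels):
        # Center each attribute's partworths to mean 0, then scale the whole vector to norm 1.
        w = np.asarray(w, dtype=float).reshape(n_attributes, n_levels)
        w = (w - w.mean(axis=1, keepdims=True)).ravel()
        norm = np.linalg.norm(w)
        return w / norm if norm > 0 else w

    def mean_l2_error(true_W, est_W, n_attributes=4, n_levels=4):
        # Mean L2 distance between standardized true and estimated individual partworths (RMSE).
        return float(np.mean([np.linalg.norm(standardize(t, n_attributes, n_levels)
                                             - standardize(e, n_attributes, n_levels))
                              for t, e in zip(true_W, est_W)]))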

67 Eleven one-choice-based conjoint survey datasets were used for the real experiments below. [sent-133, score-0.571]

68 The number of attributes ranged from 3 to 6 (hence the total number of levels from 13 to 28), the size of the basket from which one product was picked at each question ranged from 2 to 5, and the number of questions ranged from 6 to 15. [sent-134, score-0.413]

69 The numbers of respondents ranged roughly from 50 to 1200. [sent-135, score-0.141]

70 Since we did not address the issue of no-choice options in question answering, we removed questions where customers refused to choose a product from the basket and selected the no-choice option as an answer. [sent-136, score-0.324]

71 Finally, as in [16], the performances are computed using the hit rate, i.e. the fraction of hold-out choices that are correctly predicted. [sent-137, score-0.102]
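
A sketch of the hit-rate computation on hold-out questions; the hold-out split itself is assumed to exist and is not shown.

    import numpy as np

    def hit_rate(w_est, holdout_questions):
        # holdout_questions: list of (basket, chosen_index) pairs, basket of shape (p, m).
        # A hit means the product with the highest estimated utility is the one actually chosen.
        hits = [int(np.argmax(basket @ w_est) == chosen)
                for basket, chosen in holdout_questions]
        return float(np.mean(hits)) if hits else float("nan")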

72 The average training time for HB-S was 19 minutes (with 12000 iterations as suggested in [11]), whereas our implementation based on the approximation of the MAP solution took on average only 1. [sent-142, score-0.09]

73 Our goal of alleviating the complexity of the sampling phase was thus achieved, since we obtained a speed-up factor of the order of 1000. [sent-146, score-0.08]

74 Indeed, as shown in both Table 1 and Table 2, the performances achieved by HB-MAP were surprisingly often as good as HB-S’s, and sometimes even a bit better. [sent-148, score-0.071]

75 (Footnote 8) We limited ourselves to datasets in which respondents were asked to choose one product from a basket at each question. [sent-153, score-0.378]

76 (Footnote 11) When this procedure boiled down to an unreasonable number of questions for the hold-out evaluation of our algorithms, we simply removed the corresponding individuals. [sent-156, score-0.058]

77 This may be explained by the fact that assuming the covariance matrix to be quasi-diagonal is a reasonable approximation, and that the mode of the posterior distribution is actually roughly close to its mean for the real datasets considered. [sent-157, score-0.123]

78 The so-called span estimate of the leave-one-out prediction error [17] was used to select a suitable value of C, since it gave a quasi-convex estimate along the regularization path. [sent-169, score-0.054]

79 The performances of SV, compared to the HB methods in Table 2 and to logistic regression [3] in the case of the artificial experiments, are very satisfactory. [sent-171, score-0.113]

80 SV to heterogeneity in the number of training samples for each consumer. [sent-176, score-0.162]

81 Table 1: Average RMSE between estimated and true individual partworths, for the four combinations of magnitude (Mag: L/H) and heterogeneity (Het: L/H); the numerical entries for HB-S and the other methods are not recoverable from this summary. [sent-177, score-0.524]

82 Table 2: Hit rate performances on real datasets. [sent-194, score-0.095]

83 p in Datmp is the number of products from which respondents are asked to choose one at each question. [sent-233, score-0.254]

84 (Footnote 12) Since individual choice data are immersed in the rest of the population choice data via the optimization objective. (Footnote 13) We observed that the value of the constant C̃ was irrelevant, and that only the ratio C/C̃ mattered. [sent-234, score-0.146]

85 However, they are sub-optimal because they do not take into account the previous answers of the consumer. [sent-240, score-0.118]

86 Therefore adaptive conjoint analysis was proposed [11, 16] for adaptively designing questionnaires. [sent-241, score-0.613]

87 The adaptive design concept is often called active learning in machine learning, as the algorithm can actively select questions whose responses are likely to be informative. [sent-242, score-0.179]

88 Experiments. We implemented this heuristic for conjoint analysis by selecting, for each question, a set of products whose estimated utilities are as close as possible. [sent-244, score-0.799]
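
A sketch of this utility-balance heuristic: among a pool of candidate baskets, pick the one whose products have the most similar estimated utilities under the current estimate. Measuring closeness by the utility spread is our illustrative choice; other balance criteria would fit the same skeleton.

    import numpy as np

    def select_next_question(w_est, candidate_baskets):
        # candidate_baskets: list of (p, m) arrays of product profiles.
        # A smaller utility spread means a harder, hence more informative, question.
        def spread(basket):
            utilities = basket @ w_est
            return float(utilities.max() - utilities.min())
        return min(range(len(candidate_baskets)),
                   key=lambda idx: spread(candidate_baskets[idx]))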

89 To compare the different designs, we used the same artificial simulations as in section 5, but with 16 questions per consumer in order to fairly compare to the orthogonal design. [sent-245, score-0.389]

90 Results in Table 3 show that active learning produced an adaptive design which seems efficient, especially in the case of high magnitude, i.e. when the answers are reliable. [sent-259, score-0.144]

91 For large margin methods, [10, 3] give a way to use the kernel trick in the space of product-profile differences. [sent-266, score-0.089]

92 The approach of [9] would allow us to improve our approximate MAP solution by learning a variational approximation of a non-isotropic diagonal covariance matrix. [sent-268, score-0.113]

93 Estimating this model with a maximum likelihood type II (ML II) step, in contrast to sampling from the posterior, is known in the statistics community as Bayesian multinomial logistic regression. [sent-271, score-0.138]

94 [18] use Laplace approximation to compute integration over hyperparameters for multi-class classification, while [8] develop a variational approximation of the posterior distribution. [sent-272, score-0.181]

95 New insights on learning Gaussian process regression in an HB framework have just been given in [13], where a method combining an EM algorithm and a generalized Nyström approximation of the covariance matrix is proposed; it could be incorporated in the HB-MAP approximation above. [sent-273, score-0.123]

96 (Footnote 15) Since the bottom-line goal of conjoint analysis is not really to estimate the partworths but to design the "optimal" product, adaptive design can also be helpful by focusing on products which have a high estimated utility. [sent-274, score-1.229]

97 (Footnote 16) Indeed, noisy answers are neither informative nor reliable for selecting the next question. [sent-275, score-0.118]

98 (Footnote 17) A.k.a. evidence maximization or hyperparameter learning. 8 Conclusion. Choice-based conjoint analysis seems to be a very promising application field for machine learning techniques. [sent-276, score-0.646]

99 Further research includes fully Bayesian HB methods, extensions to non-linear models, as well as more elaborate and realistic active learning schemes. [sent-277, score-0.094]

100 The importance of utility balance in efficient choice designs. [sent-321, score-0.078]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('conjoint', 0.547), ('partworths', 0.411), ('consumer', 0.304), ('wi', 0.215), ('hb', 0.186), ('basket', 0.126), ('cw', 0.125), ('answers', 0.118), ('kb', 0.115), ('heterogeneity', 0.105), ('sawtooth', 0.097), ('aggregated', 0.096), ('marketing', 0.096), ('products', 0.093), ('yk', 0.09), ('ordinal', 0.08), ('individual', 0.079), ('utility', 0.078), ('xb', 0.077), ('attribute', 0.077), ('respondents', 0.073), ('xp', 0.072), ('performances', 0.071), ('product', 0.068), ('ranged', 0.068), ('population', 0.067), ('margin', 0.066), ('ratings', 0.064), ('rmse', 0.063), ('xk', 0.061), ('consumers', 0.058), ('utilities', 0.058), ('questions', 0.058), ('asked', 0.055), ('estimation', 0.054), ('qi', 0.052), ('designs', 0.051), ('bayesian', 0.05), ('het', 0.048), ('mag', 0.048), ('questionnaire', 0.048), ('toubia', 0.048), ('rhs', 0.048), ('sampling', 0.048), ('hyperparameters', 0.048), ('attributes', 0.048), ('covariance', 0.046), ('gibbs', 0.045), ('active', 0.044), ('regression', 0.041), ('collaborative', 0.041), ('logistic', 0.04), ('design', 0.039), ('question', 0.039), ('adaptive', 0.038), ('map', 0.037), ('approximation', 0.036), ('levels', 0.035), ('estimated', 0.034), ('thanks', 0.033), ('les', 0.033), ('choose', 0.033), ('table', 0.033), ('alleviate', 0.032), ('boils', 0.032), ('satisfactory', 0.032), ('magnitude', 0.031), ('variational', 0.031), ('samples', 0.031), ('classi', 0.031), ('hit', 0.031), ('posterior', 0.03), ('company', 0.03), ('laplace', 0.028), ('planck', 0.028), ('arti', 0.028), ('analysis', 0.028), ('whereas', 0.028), ('cial', 0.027), ('accurately', 0.027), ('orthogonal', 0.027), ('discretized', 0.027), ('preferences', 0.027), ('svm', 0.027), ('training', 0.026), ('software', 0.026), ('cybernetics', 0.026), ('jaakkola', 0.026), ('convex', 0.026), ('minimize', 0.025), ('posteriori', 0.025), ('hierarchical', 0.025), ('implementations', 0.024), ('real', 0.024), ('survey', 0.024), ('seems', 0.023), ('modelling', 0.023), ('datasets', 0.023), ('kernel', 0.023), ('irrelevant', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 8 nips-2004-A Machine Learning Approach to Conjoint Analysis

Author: Olivier Chapelle, Zaïd Harchaoui

Abstract: Choice-based conjoint analysis builds models of consumer preferences over products with answers gathered in questionnaires. Our main goal is to bring tools from the machine learning community to solve this problem more efficiently. Thus, we propose two algorithms to quickly and accurately estimate consumer preferences. 1

2 0.13550103 98 nips-2004-Learning Gaussian Process Kernels via Hierarchical Bayes

Author: Anton Schwaighofer, Volker Tresp, Kai Yu

Abstract: We present a novel method for learning with Gaussian process regression in a hierarchical Bayesian framework. In a first step, kernel matrices on a fixed set of input points are learned from data using a simple and efficient EM algorithm. This step is nonparametric, in that it does not require a parametric form of covariance function. In a second step, kernel functions are fitted to approximate the learned covariance matrix using a generalized Nystr¨ m method, which results in a complex, data o driven kernel. We evaluate our approach as a recommendation engine for art images, where the proposed hierarchical Bayesian method leads to excellent prediction performance. 1

3 0.13506615 195 nips-2004-Trait Selection for Assessing Beef Meat Quality Using Non-linear SVM

Author: Juan Coz, Gustavo F. Bayón, Jorge Díez, Oscar Luaces, Antonio Bahamonde, Carlos Sañudo

Abstract: In this paper we show that it is possible to model sensory impressions of consumers about beef meat. This is not a straightforward task; the reason is that when we are aiming to induce a function that maps object descriptions into ratings, we must consider that consumers’ ratings are just a way to express their preferences about the products presented in the same testing session. Therefore, we had to use a special purpose SVM polynomial kernel. The training data set used collects the ratings of panels of experts and consumers; the meat was provided by 103 bovines of 7 Spanish breeds with different carcass weights and aging periods. Additionally, to gain insight into consumer preferences, we used feature subset selection tools. The result is that aging is the most important trait for improving consumers’ appreciation of beef meat. 1

4 0.090892226 200 nips-2004-Using Random Forests in the Structured Language Model

Author: Peng Xu, Frederick Jelinek

Abstract: In this paper, we explore the use of Random Forests (RFs) in the structured language model (SLM), which uses rich syntactic information in predicting the next word based on words already seen. The goal in this work is to construct RFs by randomly growing Decision Trees (DTs) using syntactic information and investigate the performance of the SLM modeled by the RFs in automatic speech recognition. RFs, which were originally developed as classifiers, are a combination of decision tree classifiers. Each tree is grown based on random training data sampled independently and with the same distribution for all trees in the forest, and a random selection of possible questions at each node of the decision tree. Our approach extends the original idea of RFs to deal with the data sparseness problem encountered in language modeling. RFs have been studied in the context of n-gram language modeling and have been shown to generalize well to unseen data. We show in this paper that RFs using syntactic information can also achieve better performance in both perplexity (PPL) and word error rate (WER) in a large vocabulary speech recognition system, compared to a baseline that uses Kneser-Ney smoothing. 1

5 0.069867983 59 nips-2004-Efficient Kernel Discriminant Analysis via QR Decomposition

Author: Tao Xiong, Jieping Ye, Qi Li, Ravi Janardan, Vladimir Cherkassky

Abstract: Linear Discriminant Analysis (LDA) is a well-known method for feature extraction and dimension reduction. It has been used widely in many applications such as face recognition. Recently, a novel LDA algorithm based on QR Decomposition, namely LDA/QR, has been proposed, which is competitive in terms of classification accuracy with other LDA algorithms, but it has much lower costs in time and space. However, LDA/QR is based on linear projection, which may not be suitable for data with nonlinear structure. This paper first proposes an algorithm called KDA/QR, which extends the LDA/QR algorithm to deal with nonlinear data by using the kernel operator. Then an efficient approximation of KDA/QR called AKDA/QR is proposed. Experiments on face image data show that the classification accuracy of both KDA/QR and AKDA/QR are competitive with Generalized Discriminant Analysis (GDA), a general kernel discriminant analysis algorithm, while AKDA/QR has much lower time and space costs. 1

6 0.061818808 115 nips-2004-Maximum Margin Clustering

7 0.061095592 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning

8 0.060662914 136 nips-2004-On Semi-Supervised Classification

9 0.055015858 187 nips-2004-The Entire Regularization Path for the Support Vector Machine

10 0.05380287 173 nips-2004-Spike-timing Dependent Plasticity and Mutual Information Maximization for a Spiking Neuron Model

11 0.053372703 143 nips-2004-PAC-Bayes Learning of Conjunctions and Classification of Gene-Expression Data

12 0.053156704 105 nips-2004-Log-concavity Results on Gaussian Process Methods for Supervised and Unsupervised Learning

13 0.053034108 34 nips-2004-Breaking SVM Complexity with Cross-Training

14 0.052160136 50 nips-2004-Dependent Gaussian Processes

15 0.051635869 11 nips-2004-A Second Order Cone programming Formulation for Classifying Missing Data

16 0.051292975 164 nips-2004-Semi-supervised Learning by Entropy Minimization

17 0.050397892 100 nips-2004-Learning Preferences for Multiclass Problems

18 0.049372952 113 nips-2004-Maximum-Margin Matrix Factorization

19 0.048105925 27 nips-2004-Bayesian Regularization and Nonnegative Deconvolution for Time Delay Estimation

20 0.048082102 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.171), (1, 0.028), (2, -0.021), (3, 0.057), (4, -0.042), (5, 0.02), (6, 0.023), (7, 0.028), (8, 0.083), (9, -0.002), (10, 0.017), (11, 0.04), (12, 0.047), (13, 0.068), (14, 0.021), (15, 0.013), (16, -0.015), (17, -0.049), (18, 0.111), (19, -0.053), (20, 0.004), (21, -0.018), (22, -0.055), (23, -0.076), (24, 0.113), (25, -0.053), (26, 0.172), (27, 0.147), (28, 0.14), (29, -0.023), (30, -0.015), (31, 0.142), (32, 0.102), (33, 0.066), (34, -0.145), (35, -0.024), (36, -0.053), (37, -0.11), (38, -0.09), (39, 0.121), (40, 0.004), (41, 0.008), (42, 0.017), (43, -0.1), (44, 0.042), (45, -0.019), (46, 0.059), (47, 0.088), (48, 0.061), (49, 0.144)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92344636 8 nips-2004-A Machine Learning Approach to Conjoint Analysis

Author: Olivier Chapelle, Zaïd Harchaoui

Abstract: Choice-based conjoint analysis builds models of consumer preferences over products with answers gathered in questionnaires. Our main goal is to bring tools from the machine learning community to solve this problem more efficiently. Thus, we propose two algorithms to quickly and accurately estimate consumer preferences. 1

2 0.77229738 195 nips-2004-Trait Selection for Assessing Beef Meat Quality Using Non-linear SVM

Author: Juan Coz, Gustavo F. Bayón, Jorge Díez, Oscar Luaces, Antonio Bahamonde, Carlos Sañudo

Abstract: In this paper we show that it is possible to model sensory impressions of consumers about beef meat. This is not a straightforward task; the reason is that when we are aiming to induce a function that maps object descriptions into ratings, we must consider that consumers’ ratings are just a way to express their preferences about the products presented in the same testing session. Therefore, we had to use a special purpose SVM polynomial kernel. The training data set used collects the ratings of panels of experts and consumers; the meat was provided by 103 bovines of 7 Spanish breeds with different carcass weights and aging periods. Additionally, to gain insight into consumer preferences, we used feature subset selection tools. The result is that aging is the most important trait for improving consumers’ appreciation of beef meat. 1

3 0.57511842 98 nips-2004-Learning Gaussian Process Kernels via Hierarchical Bayes

Author: Anton Schwaighofer, Volker Tresp, Kai Yu

Abstract: We present a novel method for learning with Gaussian process regression in a hierarchical Bayesian framework. In a first step, kernel matrices on a fixed set of input points are learned from data using a simple and efficient EM algorithm. This step is nonparametric, in that it does not require a parametric form of covariance function. In a second step, kernel functions are fitted to approximate the learned covariance matrix using a generalized Nystr¨ m method, which results in a complex, data o driven kernel. We evaluate our approach as a recommendation engine for art images, where the proposed hierarchical Bayesian method leads to excellent prediction performance. 1

4 0.47600704 100 nips-2004-Learning Preferences for Multiclass Problems

Author: Fabio Aiolli, Alessandro Sperduti

Abstract: Many interesting multiclass problems can be cast in the general framework of label ranking defined on a given set of classes. The evaluation for such a ranking is generally given in terms of the number of violated order constraints between classes. In this paper, we propose the Preference Learning Model as a unifying framework to model and solve a large class of multiclass problems in a large margin perspective. In addition, an original kernel-based method is proposed and evaluated on a ranking dataset with state-of-the-art results. 1

5 0.41684034 200 nips-2004-Using Random Forests in the Structured Language Model

Author: Peng Xu, Frederick Jelinek

Abstract: In this paper, we explore the use of Random Forests (RFs) in the structured language model (SLM), which uses rich syntactic information in predicting the next word based on words already seen. The goal in this work is to construct RFs by randomly growing Decision Trees (DTs) using syntactic information and investigate the performance of the SLM modeled by the RFs in automatic speech recognition. RFs, which were originally developed as classifiers, are a combination of decision tree classifiers. Each tree is grown based on random training data sampled independently and with the same distribution for all trees in the forest, and a random selection of possible questions at each node of the decision tree. Our approach extends the original idea of RFs to deal with the data sparseness problem encountered in language modeling. RFs have been studied in the context of n-gram language modeling and have been shown to generalize well to unseen data. We show in this paper that RFs using syntactic information can also achieve better performance in both perplexity (PPL) and word error rate (WER) in a large vocabulary speech recognition system, compared to a baseline that uses Kneser-Ney smoothing. 1

6 0.39704993 38 nips-2004-Co-Validation: Using Model Disagreement on Unlabeled Data to Validate Classification Algorithms

7 0.39331019 50 nips-2004-Dependent Gaussian Processes

8 0.37622342 86 nips-2004-Instance-Specific Bayesian Model Averaging for Classification

9 0.37442178 190 nips-2004-The Rescorla-Wagner Algorithm and Maximum Likelihood Estimation of Causal Parameters

10 0.34776723 136 nips-2004-On Semi-Supervised Classification

11 0.34398124 156 nips-2004-Result Analysis of the NIPS 2003 Feature Selection Challenge

12 0.32938254 149 nips-2004-Probabilistic Inference of Alternative Splicing Events in Microarray Data

13 0.3287223 36 nips-2004-Class-size Independent Generalization Analsysis of Some Discriminative Multi-Category Classification

14 0.32760805 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition

15 0.30963552 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound

16 0.30143175 158 nips-2004-Sampling Methods for Unsupervised Learning

17 0.29371846 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants

18 0.29334658 130 nips-2004-Newscast EM

19 0.29256123 43 nips-2004-Conditional Models of Identity Uncertainty with Application to Noun Coreference

20 0.29219982 15 nips-2004-Active Learning for Anomaly and Rare-Category Detection


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.107), (15, 0.162), (17, 0.012), (26, 0.052), (31, 0.04), (32, 0.013), (33, 0.179), (35, 0.021), (39, 0.027), (50, 0.034), (73, 0.252)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84496415 8 nips-2004-A Machine Learning Approach to Conjoint Analysis

Author: Olivier Chapelle, Zaïd Harchaoui

Abstract: Choice-based conjoint analysis builds models of consumer preferences over products with answers gathered in questionnaires. Our main goal is to bring tools from the machine learning community to solve this problem more efficiently. Thus, we propose two algorithms to quickly and accurately estimate consumer preferences. 1

2 0.72930974 178 nips-2004-Support Vector Classification with Input Data Uncertainty

Author: Jinbo Bi, Tong Zhang

Abstract: This paper investigates a new learning model in which the input data is corrupted with noise. We present a general statistical framework to tackle this problem. Based on the statistical reasoning, we propose a novel formulation of support vector classification, which allows uncertainty in input data. We derive an intuitive geometric interpretation of the proposed formulation, and develop algorithms to efficiently solve it. Empirical results are included to show that the newly formed method is superior to the standard SVM for problems with noisy input. 1

3 0.72868907 131 nips-2004-Non-Local Manifold Tangent Learning

Author: Yoshua Bengio, Martin Monperrus

Abstract: We claim and present arguments to the effect that a large class of manifold learning algorithms that are essentially local and can be framed as kernel learning algorithms will suffer from the curse of dimensionality, at the dimension of the true underlying manifold. This observation suggests to explore non-local manifold learning algorithms which attempt to discover shared structure in the tangent planes at different positions. A criterion for such an algorithm is proposed and experiments estimating a tangent plane prediction function are presented, showing its advantages with respect to local manifold learning algorithms: it is able to generalize very far from training data (on learning handwritten character image rotations), where a local non-parametric method fails. 1

4 0.72837162 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees

Author: Ofer Dekel, Shai Shalev-shwartz, Yoram Singer

Abstract: Prediction suffix trees (PST) provide a popular and effective tool for tasks such as compression, classification, and language modeling. In this paper we take a decision theoretic view of PSTs for the task of sequence prediction. Generalizing the notion of margin to PSTs, we present an online PST learning algorithm and derive a loss bound for it. The depth of the PST generated by this algorithm scales linearly with the length of the input. We then describe a self-bounded enhancement of our learning algorithm which automatically grows a bounded-depth PST. We also prove an analogous mistake-bound for the self-bounded algorithm. The result is an efficient algorithm that neither relies on a-priori assumptions on the shape or maximal depth of the target PST nor does it require any parameters. To our knowledge, this is the first provably-correct PST learning algorithm which generates a bounded-depth PST while being competitive with any fixed PST determined in hindsight. 1

5 0.72071338 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection

Author: Koji Tsuda, Gunnar Rätsch, Manfred K. Warmuth

Abstract: We address the problem of learning a symmetric positive definite matrix. The central issue is to design parameter updates that preserve positive definiteness. Our updates are motivated with the von Neumann divergence. Rather than treating the most general case, we focus on two key applications that exemplify our methods: On-line learning with a simple square loss and finding a symmetric positive definite matrix subject to symmetric linear constraints. The updates generalize the Exponentiated Gradient (EG) update and AdaBoost, respectively: the parameter is now a symmetric positive definite matrix of trace one instead of a probability vector (which in this context is a diagonal positive definite matrix with trace one). The generalized updates use matrix logarithms and exponentials to preserve positive definiteness. Most importantly, we show how the analysis of each algorithm generalizes to the non-diagonal case. We apply both new algorithms, called the Matrix Exponentiated Gradient (MEG) update and DefiniteBoost, to learn a kernel matrix from distance measurements.

6 0.72021109 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning

7 0.71987152 4 nips-2004-A Generalized Bradley-Terry Model: From Group Competition to Individual Skill

8 0.71917033 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications

9 0.71912116 187 nips-2004-The Entire Regularization Path for the Support Vector Machine

10 0.71858525 67 nips-2004-Exponentiated Gradient Algorithms for Large-margin Structured Classification

11 0.71734226 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform

12 0.71727711 102 nips-2004-Learning first-order Markov models for control

13 0.71712983 127 nips-2004-Neighbourhood Components Analysis

14 0.71697515 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data

15 0.71666646 70 nips-2004-Following Curved Regularized Optimization Solution Paths

16 0.71631742 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images

17 0.71540505 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach

18 0.71443826 163 nips-2004-Semi-parametric Exponential Family PCA

19 0.71433973 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods

20 0.71378917 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters