nips nips2008 nips2008-138 knowledge-graph by maker-knowledge-mining

138 nips-2008-Modeling human function learning with Gaussian processes


Source: pdf

Author: Thomas L. Griffiths, Chris Lucas, Joseph Williams, Michael L. Kalish

Abstract: Accounts of how people learn functional relationships between continuous variables have tended to focus on two possibilities: that people are estimating explicit functions, or that they are performing associative learning supported by similarity. We provide a rational analysis of function learning, drawing on work on regression in machine learning and statistics. Using the equivalence of Bayesian linear regression and Gaussian processes, we show that learning explicit rules and using similarity can be seen as two views of one solution to this problem. We use this insight to define a Gaussian process model of human function learning that combines the strengths of both approaches. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Modeling human function learning with Gaussian processes (Thomas L. Griffiths et al.) [sent-1, score-0.397]

2 Abstract: Accounts of how people learn functional relationships between continuous variables have tended to focus on two possibilities: that people are estimating explicit functions, or that they are performing associative learning supported by similarity. [sent-7, score-0.883]

3 We provide a rational analysis of function learning, drawing on work on regression in machine learning and statistics. [sent-8, score-0.323]

4 Using the equivalence of Bayesian linear regression and Gaussian processes, we show that learning explicit rules and using similarity can be seen as two views of one solution to this problem. [sent-9, score-0.566]

5 We use this insight to define a Gaussian process model of human function learning that combines the strengths of both approaches. [sent-10, score-0.436]

6 1 Introduction Much research on how people acquire knowledge focuses on discrete structures, such as the nature of categories or the existence of causal relationships. [sent-11, score-0.229]

7 However, our knowledge of the world also includes relationships between continuous variables, such as the difference between linear and exponential growth, or the form of causal relationships, such as how pressing the accelerator of a car influences its velocity. [sent-12, score-0.216]

8 Research on how people learn relationships between two continuous variables – known in the psychological literature as function learning – has tended to emphasize two different ways in which people could be solving this problem. [sent-13, score-0.758]

9 One approach (e.g., [1, 2, 3]) suggests that people are learning an explicit function from a given class, such as the polynomials of degree k. [sent-16, score-0.387]

10 This approach attributes rich representations to human learners, but has traditionally given limited treatment to the question of how such representations could be acquired. [sent-17, score-0.234]

11 The other approach (e.g., [4, 5]) emphasizes the possibility that people learn by forming associations between observed values of input and output variables, and generalize based on the similarity of new inputs to old. [sent-20, score-0.51]

12 This approach has a clear account of the underlying learning mechanisms, but faces challenges in explaining how people generalize so broadly beyond their experience, making predictions about variable values that are significantly removed from their previous observations. [sent-21, score-0.374]

13 Hybrid approaches (e.g., [6, 7]) combine the two, with explicit functions being represented but acquired through associative learning. [sent-24, score-0.266]

14 Previous models of human function learning have been oriented towards understanding the psychological processes by which people solve this problem. [sent-25, score-0.721]

15 This rational analysis provides a way to understand the relationship between the two approaches that have dominated previous work – rules and similarity – and suggests how they might be combined. [sent-27, score-0.342]

16 The basic strategy we pursue is to consider the abstract computational problem involved in function learning, and then to explore optimal solutions to that problem with the goal of shedding light on human behavior. [sent-28, score-0.264]

17 In particular, the problem of learning a functional relationship between two continuous variables is an instance of regression, and has been extensively studied in machine learning and statistics. [sent-29, score-0.221]

18 There are a variety of solutions to regression problems, but we focus on methods related to Bayesian linear regression (e.g. [9]). [sent-30, score-0.355]

19 These methods allow us to make the expectations of learners about the form of functions explicit through a prior distribution. [sent-32, score-0.272]

20 Bayesian linear regression is also directly related to a nonparametric approach known as Gaussian process prediction (e.g. [10]). [sent-33, score-0.311]

21 In this approach, predictions about the values of an output variable are based on the similarity between values of an input variable. [sent-35, score-0.275]

22 We exploit this fact to define a rational model of human function learning that incorporates the strengths of both approaches. [sent-37, score-0.445]

23 2 Models of human function learning In this section we review the two traditional approaches to modeling human function learning – rules and similarity – and some more recent hybrid approaches that combine the two. [sent-38, score-0.87]

24 1 Representing functions with rules The idea that people might represent functions explicitly appears in one of the first papers on human function learning [1]. [sent-40, score-0.918]

25 This paper proposed that people assume a particular class of functions (such as polynomials of degree k) and use the available observations to estimate the parameters of those functions, forming a representation that goes beyond the observed values of the variables involved. [sent-41, score-0.52]

26 Consistent with this hypothesis, people learned linear and quadratic functions better than random pairings of values for two variables, and extrapolated appropriately. [sent-42, score-0.606]

27 Similar assumptions guided subsequent work exploring the ease with which people learn functions from different classes (e.g. [2]). [sent-43, score-0.42]

28 Other papers have tested statistical regression schemes as potential models of learning, examining how well human responses were described by different forms of nonlinear regression. [sent-45, score-0.65]

29 The first model to implement this approach was the Associative-Learning Model (ALM; [4, 5]), in which input and output arrays are used to represent a range of values for the two variables between which the functional relationship holds. [sent-50, score-0.187]

30 Presentation of an input activates input nodes close to that value, with activation falling off as a Gaussian function of distance, explicitly implementing a theory of similarity in the input space. [sent-51, score-0.336]

31 Learned weights determine the activation of the output nodes, which is a weighted linear function of the activation of the input nodes. [sent-52, score-0.279]

32 As a consequence, the same authors introduced the Extrapolation-Association Model (EXAM), which constructs a linear approximation to the output of the ALM when selecting responses, producing a bias towards linearity that better matches human judgments. [sent-55, score-0.356]

33 3 Hybrid approaches Several papers have explored methods for combining rule-like representations of functions with associative learning. [sent-57, score-0.269]

34 These models used the same kind of input representation as ALM and EXAM, with activation of a set of nodes similar to the input value. [sent-59, score-0.262]

35 The values of the hidden nodes – corresponding to the values of the rules they instantiate – are combined linearly to obtain output predictions, with the weight of each hidden node being learned through gradient descent (with a penalty for the curvature of the functions involved). [sent-61, score-0.414]

36 As a consequence, the model can learn non-linear functions by identifying a series of local linear approximations, and can even model situations in which people seem to learn different functions in different parts of the input space. [sent-63, score-0.682]

37 3 Rational solutions to regression problems The models outlined in the previous section all aim to describe the psychological processes involved in human function learning. [sent-64, score-0.631]

38 In this section, we consider the abstract computational problem underlying this task, using optimal solutions to this problem to shed light on both previous models and human learning. [sent-65, score-0.281]

39 Viewed abstractly, the computational problem behind function learning is to learn a function f mapping from x to y from a set of real-valued observations $\mathbf{x}_n = (x_1, \ldots, x_n)$ and $\mathbf{t}_n = (t_1, \ldots, t_n)$. [sent-66, score-0.53]

40 Here $t_i$ is assumed to be the true value $y_i = f(x_i)$ obscured by additive noise. [sent-72, score-0.321]

41 1 Bayesian linear regression Ideally, we would seek to solve our regression problem by combining some prior beliefs about the probability of encountering different kinds of functions in the world with the information provided by x and t. [sent-77, score-0.537]

42 Predictions about the value of the function f for a new input $x_{n+1}$ can be made by integrating over the posterior distribution, $p(y_{n+1} \mid x_{n+1}, \mathbf{t}_n, \mathbf{x}_n) = \int_f p(y_{n+1} \mid f, x_{n+1})\, p(f \mid \mathbf{x}_n, \mathbf{t}_n)\, df$ (2), where $p(y_{n+1} \mid f, x_{n+1})$ is a delta function placing all of its mass on $y_{n+1} = f(x_{n+1})$. [sent-80, score-1.126]

43 Performing the calculations outlined in the previous paragraph for a general hypothesis space F is challenging, but becomes straightforward if we limit the hypothesis space to certain specific classes of functions. [sent-81, score-0.178]

44 If we take F to be all linear functions of the form y = b0 + xb1 , then our problem takes the familiar form of linear regression. [sent-82, score-0.291]

45 To perform Bayesian linear regression, we need to define a prior p(f ) over all linear functions. [sent-83, score-0.247]

46 Since these functions are identified by the parameters b0 and b1 , it is sufficient to define a prior over b = (b0 , b1 ), which we can do by assuming that b follows a multivariate Gaussian distribution with mean zero and covariance Σb . [sent-84, score-0.233]

47 (Here $[1\ x_{n+1}]$ denotes a vector of ones horizontally concatenated with $x_{n+1}$.) Since $y_{n+1}$ is simply a linear function of b, applying Equation 2 yields a Gaussian predictive distribution, with $y_{n+1}$ having mean $[1\ x_{n+1}]\,E[b \mid \mathbf{x}_n, \mathbf{t}_n]$ and variance $[1\ x_{n+1}]\,\mathrm{cov}[b \mid \mathbf{x}_n, \mathbf{t}_n]\,[1\ x_{n+1}]^T$. [sent-87, score-0.761]
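
The predictive distribution just described has a simple closed form. Below is a minimal numpy sketch of that computation, assuming a Gaussian prior on b and Gaussian observation noise; the prior covariance, noise level, and example data are illustrative choices, not the paper's settings.

```python
import numpy as np

def blr_predict(x, t, x_new, Sigma_b=np.eye(2), sigma_t=0.1):
    """Posterior predictive mean and variance of y at x_new given (x, t)."""
    X = np.column_stack([np.ones_like(x), x])           # rows are [1, x_i]
    A = np.linalg.inv(Sigma_b) + X.T @ X / sigma_t**2   # posterior precision of b
    S = np.linalg.inv(A)                                # cov[b | x_n, t_n]
    m = S @ X.T @ t / sigma_t**2                        # E[b | x_n, t_n]
    phi = np.array([1.0, x_new])                        # the vector [1, x_{n+1}]
    return phi @ m, phi @ S @ phi                       # predictive mean, variance

# Example: noisy observations of a linear function, then prediction at x = 1.5.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = 0.5 + 2.0 * x + 0.1 * rng.standard_normal(10)
mean, var = blr_predict(x, t, x_new=1.5)
```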

48 While considering only linear functions might seem overly restrictive, linear regression actually gives us the basic tools we need to solve this problem for more general classes of functions. [sent-89, score-0.424]

49 Many classes of functions can be described as linear combinations of a small set of basis functions. [sent-90, score-0.305]

50 For example, all kth degree polynomials are linear combinations of functions of the form 1 (the constant function), x, x^2, ..., x^k. [sent-91, score-0.285]

51 Letting $\phi^{(1)}, \ldots, \phi^{(k)}$ denote a set of functions, we can define a prior on the class of functions that are linear combinations of this basis by expressing such functions in the form $f(x) = b_0 + \phi^{(1)}(x) b_1 + \ldots + \phi^{(k)}(x) b_k$. [sent-98, score-0.526]
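
To make the basis-function view concrete, the sketch below builds the design matrix whose columns are the polynomial basis functions and computes the posterior over the weights b; the degree, prior scale, and noise level are assumptions made for illustration.

```python
import numpy as np

def poly_design(x, k):
    """Design matrix whose columns are the basis functions 1, x, ..., x^k."""
    return np.column_stack([x**j for j in range(k + 1)])

def basis_blr_posterior(x, t, k=3, prior_scale=1.0, sigma_t=0.1):
    """Posterior mean and covariance of the weights b = (b_0, ..., b_k)."""
    Phi = poly_design(x, k)
    Sigma_b = prior_scale * np.eye(k + 1)               # prior covariance over b
    S = np.linalg.inv(np.linalg.inv(Sigma_b) + Phi.T @ Phi / sigma_t**2)
    m = S @ Phi.T @ t / sigma_t**2
    return m, S

# Example: fit a cubic basis to noisy samples of a quadratic function.
rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 15)
t = x**2 - 0.5 * x + 0.05 * rng.standard_normal(15)
m, S = basis_blr_posterior(x, t, k=3)
```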

52 2 Gaussian processes If our goal were merely to predict yn+1 from xn+1 , yn , and xn , we might consider a different approach, simply defining a joint distribution on yn+1 given xn+1 and conditioning on yn . [sent-113, score-0.803]

53 For example, we might take $\mathbf{y}_{n+1}$ to be jointly Gaussian, with covariance matrix $K_{n+1} = \begin{bmatrix} K_n & k_{n,n+1} \\ k_{n,n+1}^T & k_{n+1} \end{bmatrix}$ (5), where $K_n$ depends on the values of $\mathbf{x}_n$, $k_{n,n+1}$ depends on $\mathbf{x}_n$ and $x_{n+1}$, and $k_{n+1}$ depends only on $x_{n+1}$. [sent-114, score-0.79]

54 If we condition on $\mathbf{y}_n$, the distribution of $y_{n+1}$ is Gaussian with mean $k_{n,n+1}^T K_n^{-1} \mathbf{y}_n$ and variance $k_{n+1} - k_{n,n+1}^T K_n^{-1} k_{n,n+1}$. [sent-115, score-0.177]

55 This approach can also be extended to allow us to predict $y_{n+1}$ from $x_{n+1}$, $\mathbf{t}_n$, and $\mathbf{x}_n$ by adding $\sigma_t^2 I_n$ to $K_n$, where $I_n$ is the n × n identity matrix, to take into account the additional variance associated with $\mathbf{t}_n$. [sent-117, score-0.995]
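
A sketch of this conditioning step is given below, using a radial basis kernel with placeholder parameters; it returns the posterior mean and variance of y at new inputs given the noisy observations t.

```python
import numpy as np

def rbf(a, b, theta1=1.0, theta2=0.3):
    """Radial basis covariance between the points in a and in b."""
    return theta1**2 * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / theta2**2)

def gp_predict(x, t, x_new, sigma_t=0.1, kernel=rbf):
    """Condition the joint Gaussian on t to get the mean and variance at x_new."""
    x_new = np.atleast_1d(np.asarray(x_new, dtype=float))
    K_n = kernel(x, x) + sigma_t**2 * np.eye(len(x))    # K_n plus noise variance
    k_cross = kernel(x, x_new)                          # k_{n,n+1}
    k_new = kernel(x_new, x_new)                        # k_{n+1}
    K_inv = np.linalg.inv(K_n)
    mean = k_cross.T @ K_inv @ t
    cov = k_new - k_cross.T @ K_inv @ k_cross
    return mean, np.diag(cov)

# Example: interpolation inside the training range and extrapolation beyond it.
rng = np.random.default_rng(2)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(10)
mean, var = gp_predict(x, t, x_new=[0.25, 1.2])
```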

56 Common kinds of kernels include radial basis functions, e.g. [sent-120, score-0.248]

57 $K(x_i, x_j) = \theta_1^2 \exp\!\left(-\tfrac{1}{2\theta_2^2}(x_i - x_j)^2\right)$ (6), with values of y for which values of x are close being correlated, and periodic functions, e.g. [sent-122, score-0.228]

58 $K(x_i, x_j) = \theta_3^2 \exp\!\left(\theta_4^2 \cos\!\left(\tfrac{2\pi}{\theta_5}[x_i - x_j]\right)\right)$ (7), indicating that values of y for which values of x are close relative to the period θ3 are likely to be highly correlated. [sent-124, score-0.198]
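
Written as code, the two kernels look as follows; the exact placement of the θ parameters is an assumption, reconstructed from the standard forms of these kernels rather than read directly off the paper.

```python
import numpy as np

def rbf_kernel(xi, xj, theta1=1.0, theta2=0.3):
    """Radial basis kernel: nearby x values receive highly correlated y values."""
    return theta1**2 * np.exp(-0.5 * (xi - xj)**2 / theta2**2)

def periodic_kernel(xi, xj, theta3=1.0, theta4=1.0, theta5=2.0):
    """Periodic kernel: x values close modulo the period get correlated y values."""
    return theta3**2 * np.exp(theta4**2 * np.cos(2 * np.pi * (xi - xj) / theta5))

# Two x values a full period apart are as correlated as identical x values.
print(periodic_kernel(0.0, 2.0), periodic_kernel(0.0, 0.0))
```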

59 Gaussian processes thus provide a flexible approach to prediction, with the kernel defining which values of x are likely to have similar values of y. [sent-125, score-0.256]

60 3 Two views of regression Bayesian linear regression and Gaussian processes appear to be quite different approaches. [sent-127, score-0.505]

61 Showing that Bayesian linear regression corresponds to Gaussian process prediction is straightforward. [sent-130, score-0.311]

62 Bayesian linear regression thus corresponds to prediction using Gaussian processes, with this covariance matrix playing the role of $K_{n+1}$ above. [sent-133, score-0.309]

63 That is, it uses the kernel function $K(x_i, x_j) = [1\ x_i][1\ x_j]^T$. [sent-134, score-0.316]

64 Using a richer set of basis functions corresponds to taking $K_{n+1} = \Phi_{n+1} \Sigma_b \Phi_{n+1}^T$, i.e. using the kernel induced by those basis functions and the prior covariance $\Sigma_b$. [sent-135, score-0.186]

65 It is also possible to show that Gaussian process prediction can always be interpreted as Bayesian linear regression, albeit with potentially infinitely many basis functions. [sent-143, score-0.251]

66 Just as we can express a covariance matrix in terms of its eigenvectors and eigenvalues, we can express a given kernel $K(x_i, x_j)$ in terms of its eigenfunctions φ and eigenvalues λ, with $K(x_i, x_j) = \sum_{k=1}^{\infty} \lambda_k \phi^{(k)}(x_i)\, \phi^{(k)}(x_j)$ (8) for any $x_i$ and $x_j$. [sent-144, score-0.438]

67 Using the results from the previous paragraph, any kernel can be viewed as the result of performing Bayesian linear regression with a set of basis functions corresponding to its eigenfunctions, and a prior with covariance matrix Σb = diag(λ). [sent-145, score-0.657]

68 These results establish an important duality between Bayesian linear regression and Gaussian processes: for every prior on functions, there exists a corresponding kernel, and for every kernel, there exists a corresponding prior on functions. [sent-146, score-0.36]

69 Bayesian linear regression and prediction with Gaussian processes are thus just two views of the same solution to regression problems. [sent-147, score-0.541]
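
A small numerical check of this duality: predicting with the weight-space view (Bayesian linear regression) and with the function-space view (a GP using the kernel K(x_i, x_j) = [1 x_i][1 x_j]^T mentioned above, which corresponds to taking Σ_b to be the identity) gives the same predictive mean. The data and noise level are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 8)
t = 1.0 + 3.0 * x + 0.05 * rng.standard_normal(8)
Sigma_b, sigma_t = np.eye(2), 0.05
x_new = 2.0

# Weight-space view: posterior over b, projected onto [1, x_new].
X = np.column_stack([np.ones_like(x), x])
S = np.linalg.inv(np.linalg.inv(Sigma_b) + X.T @ X / sigma_t**2)
m = S @ X.T @ t / sigma_t**2
blr_mean = np.array([1.0, x_new]) @ m

# Function-space view: GP prediction with the corresponding linear kernel.
def lin_kernel(a, b):
    A = np.column_stack([np.ones_like(a), a])
    B = np.column_stack([np.ones_like(b), b])
    return A @ Sigma_b @ B.T

K_n = lin_kernel(x, x) + sigma_t**2 * np.eye(len(x))
k_cross = lin_kernel(x, np.array([x_new]))
gp_mean = (k_cross.T @ np.linalg.solve(K_n, t))[0]

assert np.allclose(blr_mean, gp_mean)   # the two views give the same prediction
```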

70 4 Combining rules and similarity through Gaussian processes The results outlined in the previous section suggest that learning rules and generalizing based on similarity should not be viewed as conflicting accounts of human function learning. [sent-148, score-0.952]

71 In this section, we briefly highlight how previous accounts of function learning connect to statistical models, and then use this insight to define a model that combines the strengths of both approaches. [sent-149, score-0.225]

72 1 Reinterpreting previous accounts of human function learning The models presented above were chosen because the contrast between rules and similarity in function learning is analogous to the difference between Bayesian linear regression and Gaussian processes. [sent-151, score-0.899]

73 The idea that human function learning can be viewed as a kind of statistical regression [1, 3] clearly connects directly to Bayesian linear regression. [sent-152, score-0.601]

74 While there is no direct formal correspondence, the basic ideas behind Gaussian process regression with a radial basis kernel and similarity-based models such as ALM are closely related. [sent-153, score-0.487]

75 Gaussian processes with radial-basis kernels can thus be viewed as implementing a simple kind of similarity-based generalization, predicting similar y values for stimuli with similar x values. [sent-155, score-0.333]

76 Finally, the hybrid approach to rule learning taken in [6] is also closely related to Bayesian linear regression. [sent-156, score-0.179]

77 The rules represented by the hidden units serve as a basis set that specifies a class of functions, and applying penalized gradient descent on the weights assigned to those basis elements serves as an online algorithm for finding the function with highest posterior probability [12]. [sent-157, score-0.369]
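
This connection can be made concrete: gradient descent on basis-function weights with a quadratic penalty converges to the MAP estimate of Bayesian linear regression under a Gaussian prior. The basis, step size, and penalty below are illustrative simplifications (a plain weight penalty rather than the hybrid model's curvature penalty), not the actual settings of [6].

```python
import numpy as np

def map_by_gradient_descent(Phi, t, lam, lr=0.01, steps=5000):
    """Minimise ||t - Phi b||^2 + lam * ||b||^2 by gradient descent."""
    b = np.zeros(Phi.shape[1])
    for _ in range(steps):
        grad = -2 * Phi.T @ (t - Phi @ b) + 2 * lam * b
        b -= lr * grad
    return b

rng = np.random.default_rng(4)
x = np.linspace(-1, 1, 20)
t = np.sin(np.pi * x) + 0.1 * rng.standard_normal(20)
Phi = np.column_stack([np.ones_like(x), x, x**2, x**3])   # a small basis set
lam = 0.1                                                 # quadratic weight penalty
b_gd = map_by_gradient_descent(Phi, t, lam)
b_map = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ t)
print(np.allclose(b_gd, b_map, atol=1e-3))                # True: same solution
```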

78 2 Mixing functions in a Gaussian process model The relationship between Gaussian processes and Bayesian linear regression suggests that we can define a single model that exploits both similarity and rules in forming predictions. [sent-159, score-0.766]

79 In particular, we can do this by taking a prior that covers a broad class of functions – including those consistent with a radial basis kernel – or, equivalently, modeling y as being produced by a Gaussian process with a kernel corresponding to one of a small number of types. [sent-160, score-0.583]
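
A sketch of that mixture idea: score each candidate kernel type by its GP marginal likelihood, combine with the prior over types, and obtain a posterior over kernels (and hence a mixture of predictions). The kernel set, their parameters, and the uniform prior here are illustrative assumptions; the paper's model may differ in its exact specification.

```python
import numpy as np

def linear_k(a, b):
    return 1.0 + np.outer(a, b)                           # bias plus linear term

def quadratic_k(a, b):
    return (1.0 + np.outer(a, b))**2                      # quadratic polynomial kernel

def rbf_k(a, b):
    return np.exp(-0.5 * (a[:, None] - b[None, :])**2 / 0.3**2)

def log_marginal(t, K):
    """Log marginal likelihood of t under a zero-mean Gaussian with covariance K."""
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (t @ np.linalg.solve(K, t) + logdet + len(t) * np.log(2 * np.pi))

def kernel_posterior(x, t, kernels, prior, sigma_t=0.1):
    """Posterior probability of each kernel type given the observations."""
    log_post = np.log(prior) + np.array(
        [log_marginal(t, k(x, x) + sigma_t**2 * np.eye(len(x))) for k in kernels])
    log_post -= log_post.max()                            # stabilise before exponentiating
    post = np.exp(log_post)
    return post / post.sum()

x = np.linspace(0, 1, 10)
t = 2.0 * x + 0.5                                         # data from a linear function
post = kernel_posterior(x, t, [linear_k, quadratic_k, rbf_k],
                        prior=np.array([1 / 3, 1 / 3, 1 / 3]))
```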

80 As indicated earlier, there is a large literature consisting of both models and data concerning human function learning, and these simulations are intended to demonstrate the potential of the Gaussian process model rather than to provide an exhaustive test of its performance. [sent-173, score-0.364]

81 1 Difficulty of learning A necessary criterion for a theory of human function learning is accounting for which functions people learn readily and which they find difficult – the relative difficulty of learning various functions. [sent-175, score-0.766]

82 Each entry in the table is the mean absolute deviation (MAD) of human or model responses from the actual value of the function, evaluated over the stimuli presented in training. [sent-177, score-0.272]

83 The MAD provides a measure of how difficult it is for people or a given model to learn a function. [sent-178, score-0.278]
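
For reference, the MAD measure itself is a one-line computation; the stimuli and responses below are made-up numbers used only to show the calculation.

```python
import numpy as np

def mean_absolute_deviation(responses, true_values):
    """Mean absolute deviation of responses from the true function values."""
    return np.mean(np.abs(np.asarray(responses) - np.asarray(true_values)))

# Example: a learner's responses to five training stimuli from y = x^2.
stimuli = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
true_y = stimuli**2
responses = true_y + np.array([0.02, -0.05, 0.01, 0.04, -0.03])
mad = mean_absolute_deviation(responses, true_y)          # 0.03
```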

84 The seven GP models incorporated different kernel functions by adjusting their prior probability. [sent-181, score-0.372]

85 Six other GP models were examined by assigning certain kernel functions zero prior probability and re-normalizing the modified value of π so that the prior probabilities summed to one. [sent-187, score-0.392]

86 The last two rows of Table 1 give the correlations between human and model performance across functions, expressing quantitatively how well each model captured the pattern of human function learning behavior. [sent-189, score-0.618]

87 The GP models perform well according to this metric, providing a closer match to the human data than any of the models considered in [6], with the quadratic kernel and the mixture-of-kernels models matching human behavior most closely. [sent-190, score-0.933]

88 This capacity is assessed in the way in which people extrapolate, making judgments about stimuli they have not encountered before. [sent-193, score-0.309]

89 Figure 1 shows mean human predictions for a linear, exponential, and quadratic function (from [4]), together with the predictions of the most comprehensive GP model (with Linear, Quadratic and Nonlinear kernel functions). [sent-194, score-0.68]

90 Columns give the mean absolute deviation (MAD) from the true functions for human learners and different models (Gaussian process models with multiple kernels are denoted by the initials of their kernels, e.g. LQ for Linear and Quadratic). [sent-398, score-0.634]

91 The last two rows give the linear and rank-order correlations of the human and model MAD values, providing an indication of how well the model matches the difficulty people have in learning different functions. [sent-402, score-0.633]

92 (a)-(b) Mean predictions on linear, exponential, and quadratic functions for (a) human participants (from [4]) and (b) a Gaussian process model with Linear, Quadratic, and Nonlinear kernels. [sent-412, score-0.617]

93 Training data were presented in the region between the vertical lines, and extrapolation performance was evaluated outside this region. [sent-413, score-0.236]

94 Both people and the model extrapolate near optimally on the linear function, and reasonably accurate extrapolation also occurs for the exponential and quadratic function. [sent-417, score-0.787]

95 However, there is a bias towards a linear slope in the extrapolation of the exponential and quadratic functions, with extreme values of the quadratic and exponential function being overestimated. [sent-418, score-0.77]

96 Quantitative measures of extrapolation performance are shown in Figure 1 (c), which gives the correlation between human and model predictions for EXAM [4, 5] and the seven GP models. [sent-419, score-0.594]

97 While none of the GP models produce quite as high a correlation as EXAM on all three functions, all of the models except that with just the linear kernel produce respectable correlations. [sent-420, score-0.277]

98 Our Gaussian process model combines the strengths of both approaches, using a mixture of kernels to allow systematic extrapolation as well as sensitive non-linear interpolation. [sent-423, score-0.459]

99 Tests of the performance of this model on benchmark datasets show that it can capture some of the basic phenomena of human function learning, and is competitive with existing process models. [sent-424, score-0.317]

100 Prediction with Gaussian processes: From linear regression to linear prediction and beyond. [sent-495, score-0.347]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('xn', 0.353), ('tn', 0.321), ('extrapolation', 0.236), ('human', 0.234), ('people', 0.229), ('mad', 0.21), ('kn', 0.19), ('yn', 0.177), ('alm', 0.157), ('exam', 0.157), ('quadratic', 0.142), ('regression', 0.133), ('rules', 0.121), ('associative', 0.115), ('gaussian', 0.114), ('functions', 0.113), ('byun', 0.105), ('gp', 0.1), ('processes', 0.096), ('similarity', 0.094), ('kernel', 0.094), ('rational', 0.092), ('linear', 0.089), ('kernels', 0.088), ('radial', 0.087), ('busemeyer', 0.079), ('delosh', 0.079), ('expt', 0.079), ('kalish', 0.079), ('bayesian', 0.078), ('predictions', 0.075), ('basis', 0.073), ('prior', 0.069), ('xj', 0.066), ('nonlinear', 0.062), ('culty', 0.062), ('activation', 0.06), ('xi', 0.06), ('paragraph', 0.059), ('views', 0.054), ('hybrid', 0.053), ('process', 0.053), ('polynomials', 0.053), ('mcdaniel', 0.052), ('learners', 0.052), ('strengths', 0.052), ('covariance', 0.051), ('functional', 0.05), ('exponential', 0.049), ('learn', 0.049), ('seven', 0.049), ('psychological', 0.048), ('accounts', 0.047), ('models', 0.047), ('lafayette', 0.046), ('lqr', 0.046), ('quad', 0.046), ('relationships', 0.045), ('correlations', 0.044), ('outlined', 0.043), ('kind', 0.043), ('extrapolate', 0.042), ('judgments', 0.042), ('hidden', 0.041), ('papers', 0.041), ('kt', 0.041), ('input', 0.04), ('expressing', 0.039), ('stimuli', 0.038), ('explicit', 0.038), ('hypothesis', 0.038), ('lq', 0.037), ('learning', 0.037), ('prediction', 0.036), ('lr', 0.035), ('eigenfunctions', 0.035), ('viewed', 0.035), ('relationship', 0.035), ('continuous', 0.033), ('values', 0.033), ('associations', 0.033), ('linearity', 0.033), ('df', 0.033), ('nodes', 0.032), ('forming', 0.032), ('cov', 0.032), ('observations', 0.031), ('psychology', 0.031), ('posterior', 0.031), ('drawing', 0.031), ('function', 0.03), ('combinations', 0.03), ('dif', 0.03), ('combines', 0.03), ('comprehensive', 0.03), ('periodic', 0.03), ('variables', 0.029), ('tended', 0.029), ('guided', 0.029), ('connect', 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 138 nips-2008-Modeling human function learning with Gaussian processes

Author: Thomas L. Griffiths, Chris Lucas, Joseph Williams, Michael L. Kalish

Abstract: Accounts of how people learn functional relationships between continuous variables have tended to focus on two possibilities: that people are estimating explicit functions, or that they are performing associative learning supported by similarity. We provide a rational analysis of function learning, drawing on work on regression in machine learning and statistics. Using the equivalence of Bayesian linear regression and Gaussian processes, we show that learning explicit rules and using similarity can be seen as two views of one solution to this problem. We use this insight to define a Gaussian process model of human function learning that combines the strengths of both approaches. 1

2 0.19214158 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes

Author: Ben Calderhead, Mark Girolami, Neil D. Lawrence

Abstract: Identification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields, however current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible without solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state of the art methods. 1

3 0.1493154 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations

Author: Pierre Garrigues, Laurent E. Ghaoui

Abstract: It has been shown that the problem of 1 -penalized least-square regression commonly referred to as the Lasso or Basis Pursuit DeNoising leads to solutions that are sparse and therefore achieves model selection. We propose in this paper RecLasso, an algorithm to solve the Lasso with online (sequential) observations. We introduce an optimization problem that allows us to compute an homotopy from the current solution to the solution after observing a new data point. We compare our method to Lars and Coordinate Descent, and present an application to compressive sensing with sequential observations. Our approach can easily be extended to compute an homotopy from the current solution to the solution that corresponds to removing a data point, which leads to an efficient algorithm for leave-one-out cross-validation. We also propose an algorithm to automatically update the regularization parameter after observing a new data point. 1

4 0.12634395 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference

Author: Thomas L. Griffiths, Joseph L. Austerweil

Abstract: Almost all successful machine learning algorithms and cognitive models require powerful representations capturing the features that are relevant to a particular problem. We draw on recent work in nonparametric Bayesian statistics to define a rational model of human feature learning that forms a featural representation from raw sensory data without pre-specifying the number of features. By comparing how the human perceptual system and our rational model use distributional and category information to infer feature representations, we seek to identify some of the forces that govern the process by which people separate and combine sensory primitives to form features. 1

5 0.12013741 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction

Author: Jing Xu, Thomas L. Griffiths

Abstract: Many human interactions involve pieces of information being passed from one person to another, raising the question of how this process of information transmission is affected by the capacities of the agents involved. In the 1930s, Sir Frederic Bartlett explored the influence of memory biases in “serial reproduction” of information, in which one person’s reconstruction of a stimulus from memory becomes the stimulus seen by the next person. These experiments were done using relatively uncontrolled stimuli such as pictures and stories, but suggested that serial reproduction would transform information in a way that reflected the biases inherent in memory. We formally analyze serial reproduction using a Bayesian model of reconstruction from memory, giving a general result characterizing the effect of memory biases on information transmission. We then test the predictions of this account in two experiments using simple one-dimensional stimuli. Our results provide theoretical and empirical justification for the idea that serial reproduction reflects memory biases. 1

6 0.11816907 249 nips-2008-Variational Mixture of Gaussian Process Experts

7 0.1086015 75 nips-2008-Estimating vector fields using sparse basis field expansions

8 0.10745534 107 nips-2008-Influence of graph construction on graph-based clustering measures

9 0.10733917 101 nips-2008-Human Active Learning

10 0.10689114 32 nips-2008-Bayesian Kernel Shaping for Learning Control

11 0.1046053 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

12 0.092013575 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics

13 0.085711531 185 nips-2008-Privacy-preserving logistic regression

14 0.085402943 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data

15 0.083653226 216 nips-2008-Sparse probabilistic projections

16 0.082365423 170 nips-2008-Online Optimization in X-Armed Bandits

17 0.079903029 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables

18 0.079656348 10 nips-2008-A rational model of preference learning and choice prediction by children

19 0.079482868 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

20 0.078501672 231 nips-2008-Temporal Dynamics of Cognitive Control


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.261), (1, 0.005), (2, 0.044), (3, 0.061), (4, 0.12), (5, -0.08), (6, -0.053), (7, 0.154), (8, 0.072), (9, 0.093), (10, 0.101), (11, -0.025), (12, 0.046), (13, -0.068), (14, 0.135), (15, 0.052), (16, -0.028), (17, -0.013), (18, 0.008), (19, 0.073), (20, -0.093), (21, -0.138), (22, 0.183), (23, 0.114), (24, 0.04), (25, -0.021), (26, 0.09), (27, -0.073), (28, -0.019), (29, -0.065), (30, -0.009), (31, 0.006), (32, -0.081), (33, 0.1), (34, -0.071), (35, 0.014), (36, -0.056), (37, -0.097), (38, -0.053), (39, -0.034), (40, -0.213), (41, -0.057), (42, -0.042), (43, 0.043), (44, 0.13), (45, 0.09), (46, -0.042), (47, 0.016), (48, 0.056), (49, 0.06)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96712363 138 nips-2008-Modeling human function learning with Gaussian processes

Author: Thomas L. Griffiths, Chris Lucas, Joseph Williams, Michael L. Kalish

Abstract: Accounts of how people learn functional relationships between continuous variables have tended to focus on two possibilities: that people are estimating explicit functions, or that they are performing associative learning supported by similarity. We provide a rational analysis of function learning, drawing on work on regression in machine learning and statistics. Using the equivalence of Bayesian linear regression and Gaussian processes, we show that learning explicit rules and using similarity can be seen as two views of one solution to this problem. We use this insight to define a Gaussian process model of human function learning that combines the strengths of both approaches. 1

2 0.71934009 249 nips-2008-Variational Mixture of Gaussian Process Experts

Author: Chao Yuan, Claus Neubauer

Abstract: Mixture of Gaussian processes models extended a single Gaussian process with ability of modeling multi-modal data and reduction of training complexity. Previous inference algorithms for these models are mostly based on Gibbs sampling, which can be very slow, particularly for large-scale data sets. We present a new generative mixture of experts model. Each expert is still a Gaussian process but is reformulated by a linear model. This breaks the dependency among training outputs and enables us to use a much faster variational Bayesian algorithm for training. Our gating network is more flexible than previous generative approaches as inputs for each expert are modeled by a Gaussian mixture model. The number of experts and number of Gaussian components for an expert are inferred automatically. A variety of tests show the advantages of our method. 1

3 0.71890587 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction

Author: Jing Xu, Thomas L. Griffiths

Abstract: Many human interactions involve pieces of information being passed from one person to another, raising the question of how this process of information transmission is affected by the capacities of the agents involved. In the 1930s, Sir Frederic Bartlett explored the influence of memory biases in “serial reproduction” of information, in which one person’s reconstruction of a stimulus from memory becomes the stimulus seen by the next person. These experiments were done using relatively uncontrolled stimuli such as pictures and stories, but suggested that serial reproduction would transform information in a way that reflected the biases inherent in memory. We formally analyze serial reproduction using a Bayesian model of reconstruction from memory, giving a general result characterizing the effect of memory biases on information transmission. We then test the predictions of this account in two experiments using simple one-dimensional stimuli. Our results provide theoretical and empirical justification for the idea that serial reproduction reflects memory biases. 1

4 0.62948811 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes

Author: Ben Calderhead, Mark Girolami, Neil D. Lawrence

Abstract: Identification and comparison of nonlinear dynamical system models using noisy and sparse experimental data is a vital task in many fields, however current methods are computationally expensive and prone to error due in part to the nonlinear nature of the likelihood surfaces induced. We present an accelerated sampling procedure which enables Bayesian inference of parameters in nonlinear ordinary and delay differential equations via the novel use of Gaussian processes (GP). Our method involves GP regression over time-series data, and the resulting derivative and time delay estimates make parameter inference possible without solving the dynamical system explicitly, resulting in dramatic savings of computational time. We demonstrate the speed and statistical accuracy of our approach using examples of both ordinary and delay differential equations, and provide a comprehensive comparison with current state of the art methods. 1

5 0.60559547 32 nips-2008-Bayesian Kernel Shaping for Learning Control

Author: Jo-anne Ting, Mrinal Kalakrishnan, Sethu Vijayakumar, Stefan Schaal

Abstract: In kernel-based regression learning, optimizing each kernel individually is useful when the data density, curvature of regression surfaces (or decision boundaries) or magnitude of output noise varies spatially. Previous work has suggested gradient descent techniques or complex statistical hypothesis methods for local kernel shaping, typically requiring some amount of manual tuning of meta parameters. We introduce a Bayesian formulation of nonparametric regression that, with the help of variational approximations, results in an EM-like algorithm for simultaneous estimation of regression and kernel parameters. The algorithm is computationally efficient, requires no sampling, automatically rejects outliers and has only one prior to be specified. It can be used for nonparametric regression with local polynomials or as a novel method to achieve nonstationary regression with Gaussian processes. Our methods are particularly useful for learning control, where reliable estimation of local tangent planes is essential for adaptive controllers and reinforcement learning. We evaluate our methods on several synthetic data sets and on an actual robot which learns a task-level control law. 1

6 0.60289973 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations

7 0.58085757 101 nips-2008-Human Active Learning

8 0.55135632 233 nips-2008-The Gaussian Process Density Sampler

9 0.54834658 213 nips-2008-Sparse Convolved Gaussian Processes for Multi-output Regression

10 0.5301652 185 nips-2008-Privacy-preserving logistic regression

11 0.51996946 146 nips-2008-Multi-task Gaussian Process Learning of Robot Inverse Dynamics

12 0.50348341 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC

13 0.50219095 111 nips-2008-Kernel Change-point Analysis

14 0.49658036 75 nips-2008-Estimating vector fields using sparse basis field expansions

15 0.47442672 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

16 0.45794603 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference

17 0.45348772 216 nips-2008-Sparse probabilistic projections

18 0.44558012 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data

19 0.43442801 211 nips-2008-Simple Local Models for Complex Dynamical Systems

20 0.43227601 10 nips-2008-A rational model of preference learning and choice prediction by children


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.011), (6, 0.062), (7, 0.132), (12, 0.056), (15, 0.013), (28, 0.186), (57, 0.113), (59, 0.015), (63, 0.028), (77, 0.059), (78, 0.014), (83, 0.071), (87, 0.162)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.89657921 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees

Author: Alexandre Bouchard-côté, Dan Klein, Michael I. Jordan

Abstract: Accurate and efficient inference in evolutionary trees is a central problem in computational biology. While classical treatments have made unrealistic site independence assumptions, ignoring insertions and deletions, realistic approaches require tracking insertions and deletions along the phylogenetic tree—a challenging and unsolved computational problem. We propose a new ancestry resampling procedure for inference in evolutionary trees. We evaluate our method in two problem domains—multiple sequence alignment and reconstruction of ancestral sequences—and show substantial improvement over the current state of the art. 1

same-paper 2 0.8684538 138 nips-2008-Modeling human function learning with Gaussian processes

Author: Thomas L. Griffiths, Chris Lucas, Joseph Williams, Michael L. Kalish

Abstract: Accounts of how people learn functional relationships between continuous variables have tended to focus on two possibilities: that people are estimating explicit functions, or that they are performing associative learning supported by similarity. We provide a rational analysis of function learning, drawing on work on regression in machine learning and statistics. Using the equivalence of Bayesian linear regression and Gaussian processes, we show that learning explicit rules and using similarity can be seen as two views of one solution to this problem. We use this insight to define a Gaussian process model of human function learning that combines the strengths of both approaches. 1

3 0.83300143 200 nips-2008-Robust Kernel Principal Component Analysis

Author: Minh H. Nguyen, Fernando Torre

Abstract: Kernel Principal Component Analysis (KPCA) is a popular generalization of linear PCA that allows non-linear feature extraction. In KPCA, data in the input space is mapped to higher (usually) dimensional feature space where the data can be linearly modeled. The feature space is typically induced implicitly by a kernel function, and linear PCA in the feature space is performed via the kernel trick. However, due to the implicitness of the feature space, some extensions of PCA such as robust PCA cannot be directly generalized to KPCA. This paper presents a technique to overcome this problem, and extends it to a unified framework for treating noise, missing data, and outliers in KPCA. Our method is based on a novel cost function to perform inference in KPCA. Extensive experiments, in both synthetic and real data, show that our algorithm outperforms existing methods. 1

4 0.83226275 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations

Author: Yen-yu Lin, Tyng-luh Liu, Chiou-shann Fuh

Abstract: In solving complex visual learning tasks, adopting multiple descriptors to more precisely characterize the data has been a feasible way for improving performance. These representations are typically high dimensional and assume diverse forms. Thus finding a way to transform them into a unified space of lower dimension generally facilitates the underlying tasks, such as object recognition or clustering. We describe an approach that incorporates multiple kernel learning with dimensionality reduction (MKL-DR). While the proposed framework is flexible in simultaneously tackling data in various feature representations, the formulation itself is general in that it is established upon graph embedding. It follows that any dimensionality reduction techniques explainable by graph embedding can be generalized by our method to consider data in multiple feature representations.

5 0.83151013 66 nips-2008-Dynamic visual attention: searching for coding length increments

Author: Xiaodi Hou, Liqing Zhang

Abstract: A visual attention system should respond placidly when common stimuli are presented, while at the same time keep alert to anomalous visual inputs. In this paper, a dynamic visual attention model based on the rarity of features is proposed. We introduce the Incremental Coding Length (ICL) to measure the perspective entropy gain of each feature. The objective of our model is to maximize the entropy of the sampled visual features. In order to optimize energy consumption, the limit amount of energy of the system is re-distributed amongst features according to their Incremental Coding Length. By selecting features with large coding length increments, the computational system can achieve attention selectivity in both static and dynamic scenes. We demonstrate that the proposed model achieves superior accuracy in comparison to mainstream approaches in static saliency map generation. Moreover, we also show that our model captures several less-reported dynamic visual search behaviors, such as attentional swing and inhibition of return. 1

6 0.82917017 62 nips-2008-Differentiable Sparse Coding

7 0.82775974 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables

8 0.82756066 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization

9 0.82565713 194 nips-2008-Regularized Learning with Networks of Features

10 0.82550025 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

11 0.82456517 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification

12 0.82302189 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

13 0.82242256 118 nips-2008-Learning Transformational Invariants from Natural Movies

14 0.82238084 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation

15 0.82209057 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization

16 0.82181001 4 nips-2008-A Scalable Hierarchical Distributed Language Model

17 0.82158571 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC

18 0.81636715 248 nips-2008-Using matrices to model symbolic relationship

19 0.81625724 219 nips-2008-Spectral Hashing

20 0.81545788 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks