nips nips2000 nips2000-16 knowledge-graph by maker-knowledge-mining

16 nips-2000-Active Inference in Concept Learning


Source: pdf

Author: Jonathan D. Nelson, Javier R. Movellan

Abstract: People are active experimenters, not just passive observers, constantly seeking new information relevant to their goals. A reasonable approach to active information gathering is to ask questions and conduct experiments that maximize the expected information gain, given current beliefs (Fedorov 1972, MacKay 1992, Oaksford & Chater 1994). In this paper we present results on an exploratory experiment designed to study people's active information gathering behavior on a concept learning task (Tenenbaum 2000). The results of the experiment are analyzed in terms of the expected information gain of the questions asked by subjects. In scientific inquiry and in everyday life, people seek out information relevant to perceptual and cognitive tasks. Scientists perform experiments to uncover causal relationships; people saccade to informative areas of visual scenes, turn their head towards surprising sounds, and ask questions to understand the meaning of concepts. Consider a person learning a foreign language, who notices that a particular word,

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Active inference in concept learning. Jonathan D. Nelson, Javier R. Movellan. [sent-1, score-0.313]

2 A reasonable approach to active information gathering is to ask questions and conduct experiments that maximize the expected information gain, given current beliefs (Fedorov 1972, MacKay 1992, Oaksford & Chater 1994). [sent-8, score-0.617]

3 In this paper we present results on an exploratory experiment designed to study people's active information gathering behavior on a concept learning task (Tenenbaum 2000). [sent-9, score-0.577]

4 The results of the experiment are analyzed in terms of the expected information gain of the questions asked by subjects. [sent-10, score-0.561]

5 Scientists perform experiments to uncover causal relationships; people saccade to informative areas of visual scenes, turn their head towards surprising sounds, and ask questions to understand the meaning of concepts. [sent-12, score-0.537]

6 Consider a person learning a foreign language, who notices that a particular word, "tikos," is used for baby moose, baby penguins, and baby cheetahs. [sent-13, score-0.276]

7 For instance, tikos could mean baby animals, or simply animals, or even baby animals and antique telephones. [sent-16, score-0.324]

8 Suppose you can point to a baby duck, an adult duck, or an antique telephone, to inquire whether that object is "tikos." [sent-18, score-0.158]

9 When the goal is to learn as much as possible about a set of concepts, a reasonable strategy is to choose those questions which maximize the expected information gain, given current beliefs (Fedorov 1972, MacKay 1992, Oaksford & Chater 1994). [sent-22, score-0.522]

10 In this paper we present preliminary results on an experiment designed to quantify the information value of the questions asked by subjects on a concept learning task. [sent-23, score-1.245]

11 1 Tenenbaum's number concept task. Tenenbaum (2000) developed a Bayesian model of number concept learning. [sent-25, score-0.722]

12 The model describes the intuitive beliefs shared by humans about simple number concepts, and how those beliefs change as new information is obtained, in terms of subjective probabilities. [sent-26, score-0.393]

13 Suppose a subject has been told that the number 16 is consistent with some unknown number concept. [sent-27, score-0.199]

14 With its current parameters, the model predicts that the subjective probability that the number 8 will also be consistent with that concept is about 0. [sent-28, score-0.594]

15 Tenenbaum (2000) included both mathematical and interval concepts in his number concept space. [sent-30, score-0.549]

16 Interval concepts were sets of numbers between n and m, where 1 ≤ n ≤ 100 and n ≤ m ≤ 100, such as numbers between 5 and 8, and numbers between 10 and 35. [sent-31, score-0.997]

17 There were 33 mathematical concepts: odd numbers, even numbers, square numbers, cube numbers, prime numbers, multiples of n (3 ≤ n ≤ 12), powers of n (2 ≤ n ≤ 10), and numbers ending in n (1 ≤ n ≤ 9). [sent-32, score-0.470]
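Because the space is this small, it can be enumerated directly. The sketch below constructs it from the description above; details such as whether "powers of n" includes n itself are illustrative assumptions, and Tenenbaum's (2000) actual model additionally places a non-uniform prior over these concepts, which is omitted here.

```python
def build_concepts(limit=100):
    """Map concept names to the sets of numbers in 1..limit they contain.

    A sketch of the hypothesis space described above (priors omitted).
    """
    concepts = {}
    # Interval concepts: numbers between n and m, with 1 <= n <= m <= limit.
    for n in range(1, limit + 1):
        for m in range(n, limit + 1):
            concepts[f"numbers between {n} and {m}"] = set(range(n, m + 1))
    # The 33 mathematical concepts.
    concepts["odd numbers"] = set(range(1, limit + 1, 2))
    concepts["even numbers"] = set(range(2, limit + 1, 2))
    concepts["square numbers"] = {i * i for i in range(1, limit + 1) if i * i <= limit}
    concepts["cube numbers"] = {i ** 3 for i in range(1, limit + 1) if i ** 3 <= limit}
    concepts["prime numbers"] = {p for p in range(2, limit + 1)
                                 if all(p % d for d in range(2, int(p ** 0.5) + 1))}
    for n in range(3, 13):        # multiples of n, 3 <= n <= 12
        concepts[f"multiples of {n}"] = set(range(n, limit + 1, n))
    for n in range(2, 11):        # powers of n, 2 <= n <= 10 (k up to 7 covers limit=100)
        concepts[f"powers of {n}"] = {n ** k for k in range(1, 8) if n ** k <= limit}
    for n in range(1, 10):        # numbers ending in n, 1 <= n <= 9
        concepts[f"numbers ending in {n}"] = {i for i in range(1, limit + 1)
                                              if i % 10 == n}
    return concepts
```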

18 Tenenbaum conducted a number concept learning experiment with 8 subjects. [sent-33, score-0.827]

19 He found a correlation of 0.99 between the average probability judgments made by subjects and the model predictions. [sent-34, score-0.482]

20 Based on these results we decided to extend Tenenbaum's experiment, and allow subjects to actively ask questions about number concepts, instead of just observing examples given to them. [sent-38, score-0.841]

21 We used Tenenbaum's model to obtain estimates of the subjective probabilities of the different concepts given the examples at hand. [sent-39, score-0.402]

22 We hypothesized that the questions asked by subjects would have high information value, when information value was calculated according to the probability estimates produced by Tenenbaum's model. [sent-40, score-0.99]

23 A subject is given examples of numbers that are consistent with a particular concept, but is not told the concept itself. [sent-43, score-0.825]

24 Then the subject is allowed to pick a number, to test whether it follows the same concept as the examples given. [sent-44, score-0.494]

25 For example, the subject may be given the numbers 2, 6 and 4 as examples of the underlying concept and she may then choose to ask whether the number 8 is also consistent with the concept. [sent-45, score-0.947]

26 The random variable C represents the correct concept on a given trial. [sent-48, score-0.415]

27 We represent the examples given to the subjects by the random vector X. [sent-50, score-0.529]

28 The subject's beliefs about which concepts are probable prior to the presentation of any examples are represented by the probability function p(C = c). [sent-51, score-0.375]

29 The subject's beliefs after the examples are presented are represented by p(C = c | X = x). [sent-52, score-0.217]

30 For example, if c is the concept even numbers and x the numbers "2, 6, 4", then p(C = c | X = x) represents subjects' posterior probability that the correct concept is even numbers, given that 2, 6, and 4 are positive examples of that concept. [sent-53, score-1.429]
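In code, this posterior is a direct application of Bayes' rule over the enumerated concepts. The sketch below assumes "strong sampling" (examples drawn uniformly at random from the concept, giving the size-principle likelihood (1/|c|)^len(x) that Tenenbaum's model uses) and takes the prior as an argument; the model's actual prior over mathematical versus interval concepts is not reproduced here.

```python
def posterior(concepts, prior, examples):
    """Compute p(C = c | X = x) by Bayes' rule over the enumerated concepts.

    Likelihood: (1 / |c|) ** len(examples) if every example is in c, else 0
    (the size principle under strong sampling). `prior` maps names to p(C = c).
    """
    unnormalized = {}
    for name, members in concepts.items():
        if all(e in members for e in examples):
            unnormalized[name] = prior[name] * (1.0 / len(members)) ** len(examples)
        else:
            unnormalized[name] = 0.0
    z = sum(unnormalized.values())
    return {name: p / z for name, p in unnormalized.items()}
```

The size principle is what makes additional examples sharpen the posterior: smaller consistent concepts make the observed examples more likely, and the model's prior then arbitrates between, say, even numbers and a narrow interval that also happens to contain the examples.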

31 The binary random variable Y_n represents whether the number n is a member of the correct concept. [sent-54, score-0.166]

32 For example, Y_8 = 1 represents the event that 8 is an element of the correct concept, and Y_8 = 0 the event that 8 is not. [sent-55, score-0.170]

33 In our experiment subjects are allowed to ask a question of the form "Is the number n an element of the concept?" [sent-56, score-1.016]

34 We evaluate how good a question is in terms of the information about the correct concept expected for that question, given the example vector X=x. [sent-59, score-0.57]

35 The expected information gain for the question "Is the number n an element of the concept?" is the mutual information I(C; Y_n | X = x) between the answer and the correct concept. [sent-60, score-0.251]

36 Because the answer Y_n is binary, the gain is bounded by H(Y_n | X = x) ≤ 1 bit, so the maximum information value of any question in our experiment is one bit. [sent-63, score-0.203]
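Because the answer is a deterministic function of the concept, this quantity reduces to the binary entropy of the predicted probability of a "yes" answer, which makes the one-bit ceiling explicit. A minimal sketch, assuming a posterior in the form produced above:

```python
import math

def expected_information_gain(n, post, concepts):
    """Expected information gain, in bits, of asking "is n in the concept?".

    Given the concept, Y_n is deterministic, so
    I(C; Y_n | X = x) = H(Y_n | X = x): the binary entropy of p(Y_n = 1).
    """
    p_yes = sum(p for name, p in post.items() if n in concepts[name])
    if p_yes <= 0.0 or p_yes >= 1.0:
        return 0.0  # the answer is already certain, so the question is worthless
    return -(p_yes * math.log2(p_yes) + (1.0 - p_yes) * math.log2(1.0 - p_yes))
```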

37 Here we approximate these subjective probabilities using Tenenbaum's (2000) number concept model. [sent-65, score-0.488]

38 An information-maximizing strategy (infomax) prescribes asking the question with the highest expected information gain. [sent-66, score-0.302]

39 Another strategy of interest is confirmatory sampling, which consists of asking questions whose answers are most likely to confirm current beliefs. [sent-69, score-0.618]

40 In other domains it has been proposed that subjects have a bias to use confirmatory strategies regardless of their information value (Klayman & Ha 1987, Popper 1959, Wason 1960). [sent-70, score-0.724]

41 Thus, it is interesting to see whether people use a confirmatory strategy on our concept learning task and to evaluate how informative such a strategy would be. [sent-71, score-0.968]

42 2 Human sampling in the number concept game. Twenty-nine undergraduate students, recruited from Cognitive Science Department classes at the University of California, San Diego, participated in the experiment. [sent-72, score-0.406]

43 These examples will be randomly chosen from the numbers between 1 and 100 that follow the rule. [sent-77, score-0.379]

44 Finally, you will be asked to give your best estimation of what the true hidden rule is, and the chances that you are right. [sent-80, score-0.207]

45 If you thought the rule were "multiples of 11", but also possibly "even numbers", you could test a number of your choice, between 1-100, to see if it also follows the rule. [sent-82, score-0.465]

46 On each trial subjects first saw a set of examples from the correct concept. [sent-86, score-0.693]

47 For instance, if the concept were even numbers, subjects might see the numbers "2, 6, 4" as examples. [sent-87, score-1.110]

48 Subjects were given feedback on whether the number they tested was an element of the correct concept. [sent-89, score-0.197]

49 We wrote a computer program that uses the probability estimates provided by Tenenbaum's (2000) model to compute the information value of any possible question in the number concept task. [sent-90, score-0.585]

50 We used this program to evaluate the information value of the questions asked by subjects, the questions asked by an infomax strategy, the questions asked by a confirmatory strategy, and the questions asked by a random sampling strategy. [sent-91, score-2.12]

51 The random strategy consisted of randomly testing a number between 1 and 100 with equal probability. [sent-93, score-0.223]

52 The confirmatory strategy consisted of testing the number (excluding the examples) that had the highest posterior probability, as given by Tenenbaum's model, of being consistent with the correct concept. [sent-94, score-0.571]
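Under these definitions, the three comparison strategies can be sketched as below, building on the earlier functions (the function names are ours; the paper's actual program used the probability estimates from Tenenbaum's model rather than the simplified posterior sketched here).

```python
import random

def infomax_question(post, concepts, examples, limit=100):
    """Ask about the number with the highest expected information gain."""
    candidates = [n for n in range(1, limit + 1) if n not in examples]
    return max(candidates,
               key=lambda n: expected_information_gain(n, post, concepts))

def confirmatory_question(post, concepts, examples, limit=100):
    """Ask about the non-example number most likely to be in the concept."""
    candidates = [n for n in range(1, limit + 1) if n not in examples]
    return max(candidates,
               key=lambda n: sum(p for c, p in post.items() if n in concepts[c]))

def random_question(limit=100):
    """Test a number between 1 and limit with equal probability."""
    return random.randint(1, limit)
```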

53 The trials are grouped into three types, according to the posterior beliefs of Tenenbaum's model, after the example numbers have been seen. [sent-96, score-0.572]

54 The specific questions subjects asked are considered in Sections 3.1-3.3. [sent-98, score-0.794]

55 Information value, as assessed using the subjective probabilities in Tenenbaum's number concept model, of several sampling strategies. [sent-138, score-0.548]

56 3.1 Single example, high uncertainty trials. On these trials Tenenbaum's model is relatively uncertain about the correct concept and gives some probability to many concepts. [sent-141, score-0.627]

57 Interestingly, the confirmatory strategy is identical to the infomax strategy on each of these trials, suggesting that a confirmatory sampling strategy may be optimal under conditions of high uncertainty. [sent-142, score-1.105]

58 On this trial, the concepts powers of 4, powers of 2, and square numbers each have moderate posterior probability. [sent-144, score-0.565]

59 These trials provided the best qualitative agreement between infomax predictions and subjects' sampling behavior. [sent-148, score-0.407]

60 Unfortunately the results are inconclusive, since on these trials both the infomax and confirmatory strategies make the same predictions. [sent-149, score-0.685]

61 On the trial with the example number 16, subjects' modal response (8 of 29 subjects) was to test the number 4. [sent-150, score-0.238]

62 Several subjects (8 of 29) tested other square numbers, such as 49, 36, or 25, which also have high information value, relative to Tenenbaum's number concept model (Figure 1). [sent-152, score-0.924]

63 Subjects' questions also had a high information value on the trial with the example number 37, and the trial with the example number 60. [sent-153, score-0.673]

64 Information value of sampling each number, in bits, given that the number 16 is consistent with the correct concept. [sent-163, score-0.244]

65 3.2 Multiple example, low uncertainty trials. On these trials Tenenbaum's model gives a single concept very high posterior probability. [sent-165, score-0.741]

66 When there is little or no information value in any question, infomax makes no particular predictions regarding which questions are best. [sent-166, score-0.483]

67 Most subjects tested numbers that were consistent with the most likely concept, but not specifically given as examples. [sent-167, score-0.809]

68 On the trial with the examples 81, 25, 4, and 36, the model gave probability 1 to the concept square numbers. [sent-169, score-0.326]

69 On this trial, the most commonly tested numbers were 49 (11 of 29 subjects) and 9 (4 of 29 subjects). [sent-171, score-0.332]

70 On the trial with the example numbers 60, 80, 10, and 30, the model gave probability .94 to the concept multiples of 10. [sent-173, score-0.552]

71 The model gave most of the remaining probability to the concept multiples of 5. [sent-174, score-0.444]

72 On this trial, infomax tested odd multiples of 5, such as 15, each of which had expected information gain of 0. [sent-176, score-0.464]

73 The confirmatory strategy tested non-example multiples of 10, such as 50, and had an information value of 0 bits. [sent-178, score-0.570]

74 Most subjects (17 of 29) followed the confirmatory strategy; some subjects (5 of 29) followed the infomax strategy. [sent-179, score-1.247]
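As an illustrative run of the sketches above on this trial (the uniform prior is our simplifying assumption, so the probabilities only roughly track the .94 quoted above):

```python
# Reuses build_concepts, posterior, and the strategy sketches defined earlier.
concepts = build_concepts()
prior = {name: 1.0 / len(concepts) for name in concepts}  # uniform, for illustration
examples = [60, 80, 10, 30]
post = posterior(concepts, prior, examples)

# Even with a uniform prior, the size principle concentrates most posterior
# mass on "multiples of 10", with smaller mass on "multiples of 5" and on
# wide intervals, so the two strategies diverge:
print(infomax_question(post, concepts, examples))       # an odd multiple of 5, e.g. 15
print(confirmatory_question(post, concepts, examples))  # a non-example number that is
                                                        # almost surely in the concept
```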

75 3.3 Interval trials. It is desirable to consider situations in which: (1) the questions asked by the infomax strategy are different from the questions asked by the confirmatory strategy, and (2) the choice of questions matters, because some questions have high information value. [sent-181, score-1.941]

76 Trials in which the correct concept is an interval of numbers provide such situations. [sent-182, score-0.744]

77 Consider the trial with the example numbers 16, 23, 19, and 20. [sent-183, score-0.44]

78 On this trial, and the other "interval" trials, the concept learning model is certain that the correct concept is of the form numbers between m and n, because the observed examples rule out all the other concepts. [sent-184, score-1.153]

79 However, the model is not certain of the precise endpoints, such as whether the concept is numbers between 16 and 23, or numbers between 16 and 24, etc. [sent-185, score-0.973]

80 Infomax tests numbers near to, but outside of, the range spanned by the examples, such as 14 or 26, in this example (See Figure 2 at left). [sent-186, score-0.424]

81 The first is to test numbers outside of, but near to, the range of observed examples. [sent-189, score-0.375]

82 On the trial with example numbers between 16 and 23, a total of 15 of 29 subjects tested numbers in the ranges 10-15 or 24-30. [sent-190, score-1.190]

83 The second pattern of behavior, which is shown by about one third of the subjects, consists of testing (non-example) numbers within the range spanned by the observed examples. [sent-192, score-0.425]

84 If one is certain that the concept at hand is an interval then asking about numbers within the range spanned by the observed examples provides no information (Figure 2 at left). [sent-193, score-0.978]
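In the notation introduced earlier, this zero-information property is immediate; a one-line derivation, assuming the posterior puts mass only on interval concepts that contain the spanned range:

```latex
% For any n inside the range spanned by the examples, every concept c with
% nonzero posterior contains n, so the answer is certain:
p(Y_n = 1 \mid X = x) \;=\; \sum_{c\,:\,n \in c} p(C = c \mid X = x) \;=\; 1
\quad\Longrightarrow\quad
H(Y_n \mid X = x) = 0
\quad\Longrightarrow\quad
I(C;\, Y_n \mid X = x) = 0 .
```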

85 Based on this surprising result, we went back to the results of Experiment 1, and reanalyzed the accuracy of Tenenbaum's model on trials in which the model gave high probability to interval concepts. [sent-195, score-0.394]

86 We found that on such trials the model significantly deviated from the subjects' beliefs. [sent-196, score-0.178]

87 In particular, subjects gave probability lower than one that non-example numbers within the range spanned by observed examples were consistent with the true concept. [sent-197, score-1.051]

88 The model, however, gives all numbers within the range spanned by the examples probability 1. [sent-198, score-0.507]

89 See Figure 2, at right, and note the difference between subjective probabilities (points) and the model's estimate of these probabilities (solid line). [sent-199, score-0.178]

90 We hypothesize that the apparent uninformativeness of the questions asked by subjects on these trials is due to imperfections in the current version of Tenenbaum's model, and are working to improve the model's descriptive accuracy, to test this hypothesis. [sent-200, score-1.019]

91 Information value, relative to Tenenbaum's model, of sampling each number, given the example numbers 16, 23, 19, and 20 (left). [sent-203, score-0.374]

92 In this case the model is certain that the correct concept is some interval of numbers; thus, it is not informative to ask questions about numbers within the range spanned by the examples. [sent-204, score-1.233]

93 At right, the probability that each number is consistent with the correct concept. [sent-205, score-0.187]

94 First we performed a large-scale replication of Tenenbaum's number concept experiment (Tenenbaum, 2000), in which subjects estimated the probability that each of several test numbers was consistent with the same concept as some example numbers. [sent-209, score-1.861]

95 We then extended Tenenbaum's experiment by allowing subjects to ask questions about the concepts at hand. [sent-212, score-0.907]

96 Our goal was to evaluate the information value of the questions asked by subjects. [sent-213, score-0.481]

97 We found that in some situations, a simple confirmatory strategy maximizes information gain. [sent-214, score-0.401]

98 We also found that the current version of Tenenbaum's number concept model has significant imperfections, which limit its ability to estimate the informativeness of subjects' questions. [sent-215, score-0.395]

99 We expect that modifications to Tenenbaum's model will enable infomax to predict sampling behavior in the number concept domain. [sent-216, score-0.467]

100 We are also working to generalize the infomax analysis of active inference to more complex and natural problems. [sent-218, score-0.229]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tenenbaum', 0.459), ('subjects', 0.418), ('concept', 0.313), ('numbers', 0.291), ('confirmatory', 0.231), ('questions', 0.209), ('infomax', 0.18), ('asked', 0.167), ('trials', 0.148), ('strategy', 0.126), ('trial', 0.126), ('concepts', 0.124), ('subjective', 0.106), ('multiples', 0.097), ('ask', 0.093), ('baby', 0.092), ('beliefs', 0.09), ('cix', 0.089), ('examples', 0.088), ('interval', 0.079), ('chater', 0.071), ('oaksford', 0.071), ('tikos', 0.071), ('spanned', 0.068), ('question', 0.065), ('people', 0.065), ('experiment', 0.063), ('correct', 0.061), ('sampling', 0.06), ('consistent', 0.059), ('uncertainty', 0.057), ('fedorov', 0.053), ('active', 0.049), ('gave', 0.048), ('informative', 0.046), ('gain', 0.044), ('information', 0.044), ('tested', 0.041), ('subject', 0.039), ('powers', 0.038), ('probabilities', 0.036), ('antique', 0.035), ('confirmation', 0.035), ('duck', 0.035), ('gathering', 0.035), ('imperfections', 0.035), ('klayman', 0.035), ('popper', 0.035), ('told', 0.035), ('wason', 0.035), ('diego', 0.034), ('animals', 0.034), ('expected', 0.034), ('probability', 0.034), ('number', 0.033), ('asking', 0.033), ('element', 0.031), ('whether', 0.031), ('behavior', 0.031), ('value', 0.031), ('evaluate', 0.03), ('event', 0.03), ('model', 0.03), ('credit', 0.028), ('jolla', 0.028), ('rational', 0.028), ('mackay', 0.027), ('range', 0.026), ('ha', 0.026), ('high', 0.025), ('movellan', 0.024), ('odd', 0.024), ('exploratory', 0.024), ('ve', 0.024), ('california', 0.023), ('example', 0.023), ('random', 0.023), ('test', 0.023), ('cognitive', 0.022), ('psychological', 0.022), ('testing', 0.021), ('rule', 0.021), ('consisted', 0.02), ('square', 0.02), ('posterior', 0.02), ('observed', 0.019), ('current', 0.019), ('hidden', 0.019), ('predictions', 0.019), ('anderson', 0.018), ('san', 0.018), ('estimates', 0.018), ('study', 0.018), ('represents', 0.018), ('yn', 0.017), ('certain', 0.017), ('program', 0.017), ('scientific', 0.017), ('situations', 0.017), ('outside', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999881 16 nips-2000-Active Inference in Concept Learning

Author: Jonathan D. Nelson, Javier R. Movellan

Abstract: People are active experimenters, not just passive observers, constantly seeking new information relevant to their goals. A reasonable approach to active information gathering is to ask questions and conduct experiments that maximize the expected information gain, given current beliefs (Fedorov 1972, MacKay 1992, Oaksford & Chater 1994). In this paper we present results on an exploratory experiment designed to study people's active information gathering behavior on a concept learning task (Tenenbaum 2000). The results of the experiment are analyzed in terms of the expected information gain of the questions asked by subjects. In scientific inquiry and in everyday life, people seek out information relevant to perceptual and cognitive tasks. Scientists perform experiments to uncover causal relationships; people saccade to informative areas of visual scenes, turn their head towards surprising sounds, and ask questions to understand the meaning of concepts. Consider a person learning a foreign language, who notices that a particular word,

2 0.08217866 103 nips-2000-Probabilistic Semantic Video Indexing

Author: Milind R. Naphade, Igor Kozintsev, Thomas S. Huang

Abstract: We propose a novel probabilistic framework for semantic video indexing. We define probabilistic multimedia objects (multijects) to map low-level media features to high-level semantic labels. A graphical network of such multijects (multinet) captures scene context by discovering intra-frame as well as inter-frame dependency relations between the concepts. The main contribution is a novel application of a factor graph framework to model this network. We model relations between semantic concepts in terms of their co-occurrence as well as the temporal dependencies between these concepts within video shots. Using the sum-product algorithm [1] for approximate or exact inference in these factor graph multinets, we attempt to correct errors made during isolated concept detection by forcing high-level constraints. This results in a significant improvement in the overall detection performance.

3 0.071894176 127 nips-2000-Structure Learning in Human Causal Induction

Author: Joshua B. Tenenbaum, Thomas L. Griffiths

Abstract: We use graphical models to explore the question of how people learn simple causal relationships from data. The two leading psychological theories can both be seen as estimating the parameters of a fixed graph. We argue that a complete account of causal induction should also consider how people learn the underlying causal graph structure, and we propose to model this inductive process as a Bayesian inference. Our argument is supported through the discussion of three data sets.

4 0.065508261 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks

Author: Simon Tong, Daphne Koller

Abstract: Bayesian networks are graphical representations of probability distributions. In virtually all of the work on learning these networks, the assumption is that we are presented with a data set consisting of randomly generated instances from the underlying distribution. In many situations, however, we also have the option of active learning, where we have the possibility of guiding the sampling process by querying for certain types of samples. This paper addresses the problem of estimating the parameters of Bayesian networks in an active learning setting. We provide a theoretical framework for this problem, and an algorithm that chooses which active learning queries to generate based on the model learned so far. We present experimental results showing that our active learning algorithm can significantly reduce the need for training data in many situations.

5 0.055793867 43 nips-2000-Dopamine Bonuses

Author: Sham Kakade, Peter Dayan

Abstract: Substantial data support a temporal difference (TD) model of dopamine (DA) neuron activity in which the cells provide a global error signal for reinforcement learning. However, in certain circumstances, DA activity seems anomalous under the TD model, responding to non-rewarding stimuli. We address these anomalies by suggesting that DA cells multiplex information about reward bonuses, including Sutton's exploration bonuses and Ng et al's non-distorting shaping bonuses. We interpret this additional role for DA in terms of the unconditional attentional and psychomotor effects of dopamine, having the computational role of guiding exploration.

6 0.049433198 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks

7 0.049069934 49 nips-2000-Explaining Away in Weight Space

8 0.048606668 92 nips-2000-Occam's Razor

9 0.046118516 37 nips-2000-Convergence of Large Margin Separable Linear Classification

10 0.04604787 61 nips-2000-Generalizable Singular Value Decomposition for Ill-posed Datasets

11 0.042619646 10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure

12 0.040659562 2 nips-2000-A Comparison of Image Processing Techniques for Visual Speech Recognition Applications

13 0.037020788 130 nips-2000-Text Classification using String Kernels

14 0.036802512 76 nips-2000-Learning Continuous Distributions: Simulations With Field Theoretic Priors

15 0.035371073 74 nips-2000-Kernel Expansions with Unlabeled Examples

16 0.034675371 129 nips-2000-Temporally Dependent Plasticity: An Information Theoretic Account

17 0.032186996 75 nips-2000-Large Scale Bayes Point Machines

18 0.031469144 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach

19 0.031353768 88 nips-2000-Multiple Timescales of Adaptation in a Neural Code

20 0.031173598 52 nips-2000-Fast Training of Support Vector Classifiers


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.118), (1, -0.025), (2, 0.018), (3, -0.014), (4, -0.013), (5, 0.006), (6, 0.059), (7, 0.025), (8, 0.004), (9, 0.007), (10, 0.035), (11, 0.028), (12, 0.064), (13, 0.033), (14, 0.083), (15, 0.005), (16, -0.094), (17, -0.02), (18, -0.049), (19, 0.096), (20, 0.08), (21, 0.073), (22, 0.006), (23, -0.178), (24, 0.13), (25, -0.015), (26, 0.122), (27, 0.018), (28, 0.002), (29, 0.01), (30, 0.074), (31, -0.007), (32, -0.282), (33, 0.005), (34, 0.062), (35, 0.187), (36, 0.164), (37, 0.14), (38, -0.159), (39, -0.125), (40, 0.165), (41, 0.145), (42, -0.181), (43, 0.031), (44, 0.004), (45, 0.041), (46, 0.004), (47, 0.058), (48, -0.272), (49, -0.141)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97831774 16 nips-2000-Active Inference in Concept Learning

Author: Jonathan D. Nelson, Javier R. Movellan

Abstract: People are active experimenters, not just passive observers, constantly seeking new information relevant to their goals. A reasonable approach to active information gathering is to ask questions and conduct experiments that maximize the expected information gain, given current beliefs (Fedorov 1972, MacKay 1992, Oaksford & Chater 1994). In this paper we present results on an exploratory experiment designed to study people's active information gathering behavior on a concept learning task (Tenenbaum 2000). The results of the experiment are analyzed in terms of the expected information gain of the questions asked by subjects. In scientific inquiry and in everyday life, people seek out information relevant to perceptual and cognitive tasks. Scientists perform experiments to uncover causal relationships; people saccade to informative areas of visual scenes, turn their head towards surprising sounds, and ask questions to understand the meaning of concepts. Consider a person learning a foreign language, who notices that a particular word,

2 0.60370559 127 nips-2000-Structure Learning in Human Causal Induction

Author: Joshua B. Tenenbaum, Thomas L. Griffiths

Abstract: We use graphical models to explore the question of how people learn simple causal relationships from data. The two leading psychological theories can both be seen as estimating the parameters of a fixed graph. We argue that a complete account of causal induction should also consider how people learn the underlying causal graph structure, and we propose to model this inductive process as a Bayesian inference. Our argument is supported through the discussion of three data sets.

3 0.37380013 103 nips-2000-Probabilistic Semantic Video Indexing

Author: Milind R. Naphade, Igor Kozintsev, Thomas S. Huang

Abstract: We propose a novel probabilistic framework for semantic video indexing. We define probabilistic multimedia objects (multijects) to map low-level media features to high-level semantic labels. A graphical network of such multijects (multinet) captures scene context by discovering intra-frame as well as inter-frame dependency relations between the concepts. The main contribution is a novel application of a factor graph framework to model this network. We model relations between semantic concepts in terms of their co-occurrence as well as the temporal dependencies between these concepts within video shots. Using the sum-product algorithm [1] for approximate or exact inference in these factor graph multinets, we attempt to correct errors made during isolated concept detection by forcing high-level constraints. This results in a significant improvement in the overall detection performance.

4 0.32786062 61 nips-2000-Generalizable Singular Value Decomposition for Ill-posed Datasets

Author: Ulrik Kjems, Lars Kai Hansen, Stephen C. Strother

Abstract: We demonstrate that statistical analysis of ill-posed data sets is subject to a bias, which can be observed when projecting independent test set examples onto a basis defined by the training examples. Because the training examples in an ill-posed data set do not fully span the signal space the observed training set variances in each basis vector will be too high compared to the average variance of the test set projections onto the same basis vectors. On basis of this understanding we introduce the Generalizable Singular Value Decomposition (GenSVD) as a means to reduce this bias by re-estimation of the singular values obtained in a conventional Singular Value Decomposition, allowing for a generalization performance increase of a subsequent statistical model. We demonstrate that the algorithm succesfully corrects bias in a data set from a functional PET activation study of the human brain. 1 Ill-posed Data Sets An ill-posed data set has more dimensions in each example than there are examples. Such data sets occur in many fields of research typically in connection with image measurements. The associated statistical problem is that of extracting structure from the observed high-dimensional vectors in the presence of noise. The statistical analysis can be done either supervised (Le. modelling with target values: classification, regresssion) or unsupervised (modelling with no target values: clustering, PCA, ICA). In both types of analysis the ill-posedness may lead to immediate problems if one tries to apply conventional statistical methods of analysis, for example the empirical covariance matrix is prohibitively large and will be rank-deficient. A common approach is to use Singular Value Decomposition (SVD) or the analogue Principal Component Analysis (PCA) to reduce the dimensionality of the data. Let the N observed i-dimensional samples Xj, j = L .N, collected in the data matrix X = [Xl ... XN] of size I x N, I> N . The SVD-theorem states that such a matrix can be decomposed as (1) where U is a matrix of the same size as X with orthogonal basis vectors spanning the space of X, so that UTU = INxN. The square matrix A contains the singular values in the diagonal, A = diag( AI, ... , >w), which are ordered and positive Al ~ A2 ~ ... ~ AN ~ 0, and V is N x N and orthogonal V TV = IN. If there is a mean value significantly different from zero it may at times be advantageous to perform the above analysis on mean-subtracted data, i.e. X - X = U A V T where columns of X all contain the mean vector x = Lj xj/N. Each observation Xj can be expressed in coordinates in the basis defined by the vectors of U with no loss of information[Lautrup et al., 1995]. A change of basis is obtained by qj = U T Xj as the orthogonal basis rotation Q = [ql ... qN] = U T X = UTUAV T = AVT . (2) Since Q is only N x Nand N « I, Q is a compact representation of the data. Having now N examples of N dimension we have reduced the problem to a marginally illposed one. To further reduce the dimensionality, it is common to retain only a subset of the coordinates, e.g. the top P coordinates (P < N) and the supervised or unsupervised model can be formed in this smaller but now well-posed space. So far we have considered the procedure for modelling from a training set. Our hope is that the statistical description generalizes well to new examples proving that is is a good description of the generating process. 
The model should, in other words, be able to perform well on a new example, x*, and in the above framework this would mean the predictions based on q* = U T x* should generalize well. We will show in the following, that in general, the distribution of the test set projection q* is quite different from the statistics of the projections of the training examples qj. It has been noted in previous work [Hansen and Larsen, 1996, Roweis, 1998, Hansen et al., 1999] that PCA/SVD of ill-posed data does not by itself represent a probabilistic model where we can assign a likelihood to a new test data point, and procedures have been proposed which make this possible. In [Bishop, 1999] PCA has been considered in a Bayesian framework, but does not address the significant bias of the variance in training set projections in ill-posed data sets. In [Jackson, 1991] an asymptotic expression is given for the bias of eigen-values in a sample covariance matrix, but this expression is valid only in the well-posed case and is not applicable for ill-posed data. 1.1 Example Let the signal source be I-dimensional multivariate Gaussian distribution N(O,~) with a covariance matrix where the first K eigen-values equal u 2 and the last 1- K are zero, so that the covariance matrix has the decomposition ~=u2YDyT, D=diag(1, ... ,1,0, ... ,0), yTY=I (3) Our N samples of the distribution are collected in the matrix X = [Xij] with the SVD (4) A = diag(Al, ... , AN) and the representation ofthe N examples in the N basis vector coordinates defined by U is Q = [%] = U T X = A V T. The total variance per training example is ~ LX;j ~Tr(XTX) = ~Tr(VAUTUAVT) = ~Tr(VA2VT) i,j = ~ Tr(VVT A2) = ~ Tr(A2) = ~L A; i (5) Note that this variance is the same in the U-basis coordinates: 1 L...J 2 N '

5 0.27842209 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks

Author: Simon Tong, Daphne Koller

Abstract: Bayesian networks are graphical representations of probability distributions. In virtually all of the work on learning these networks, the assumption is that we are presented with a data set consisting of randomly generated instances from the underlying distribution. In many situations, however, we also have the option of active learning, where we have the possibility of guiding the sampling process by querying for certain types of samples. This paper addresses the problem of estimating the parameters of Bayesian networks in an active learning setting. We provide a theoretical framework for this problem, and an algorithm that chooses which active learning queries to generate based on the model learned so far. We present experimental results showing that our active learning algorithm can significantly reduce the need for training data in many situations.

6 0.26966685 49 nips-2000-Explaining Away in Weight Space

7 0.24148193 43 nips-2000-Dopamine Bonuses

8 0.22348189 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks

9 0.19563209 92 nips-2000-Occam's Razor

10 0.19322333 37 nips-2000-Convergence of Large Margin Separable Linear Classification

11 0.18576071 30 nips-2000-Bayesian Video Shot Segmentation

12 0.18113472 25 nips-2000-Analysis of Bit Error Probability of Direct-Sequence CDMA Multiuser Demodulators

13 0.17494181 109 nips-2000-Redundancy and Dimensionality Reduction in Sparse-Distributed Representations of Natural Objects in Terms of Their Local Features

14 0.17337768 10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure

15 0.17048049 76 nips-2000-Learning Continuous Distributions: Simulations With Field Theoretic Priors

16 0.16281837 65 nips-2000-Higher-Order Statistical Properties Arising from the Non-Stationarity of Natural Signals

17 0.15893726 1 nips-2000-APRICODD: Approximate Policy Construction Using Decision Diagrams

18 0.15563303 74 nips-2000-Kernel Expansions with Unlabeled Examples

19 0.15054069 22 nips-2000-Algorithms for Non-negative Matrix Factorization

20 0.14250387 82 nips-2000-Learning and Tracking Cyclic Human Motion


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.013), (17, 0.081), (32, 0.021), (33, 0.043), (36, 0.011), (55, 0.029), (62, 0.052), (65, 0.011), (67, 0.043), (76, 0.022), (79, 0.037), (81, 0.031), (90, 0.049), (99, 0.416)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.83477002 25 nips-2000-Analysis of Bit Error Probability of Direct-Sequence CDMA Multiuser Demodulators

Author: Toshiyuki Tanaka

Abstract: We analyze the bit error probability of multiuser demodulators for direct-sequence binary phase-shift-keying (DS/BPSK) CDMA channel with additive gaussian noise. The problem of multiuser demodulation is cast into the finite-temperature decoding problem, and replica analysis is applied to evaluate the performance of the resulting MPM (Marginal Posterior Mode) demodulators, which include the optimal demodulator and the MAP demodulator as special cases. An approximate implementation of demodulators is proposed using analog-valued Hopfield model as a naive mean-field approximation to the MPM demodulators, and its performance is also evaluated by the replica analysis. Results of the performance evaluation show effectiveness of the optimal demodulator and the mean-field demodulator compared with the conventional one, especially in the cases of small information bit rate and low noise level.

same-paper 2 0.8165831 16 nips-2000-Active Inference in Concept Learning

Author: Jonathan D. Nelson, Javier R. Movellan

Abstract: People are active experimenters, not just passive observers, constantly seeking new information relevant to their goals. A reasonable approach to active information gathering is to ask questions and conduct experiments that maximize the expected information gain, given current beliefs (Fedorov 1972, MacKay 1992, Oaksford & Chater 1994). In this paper we present results on an exploratory experiment designed to study people's active information gathering behavior on a concept learning task (Tenenbaum 2000). The results of the experiment are analyzed in terms of the expected information gain of the questions asked by subjects. In scientific inquiry and in everyday life, people seek out information relevant to perceptual and cognitive tasks. Scientists perform experiments to uncover causal relationships; people saccade to informative areas of visual scenes, turn their head towards surprising sounds, and ask questions to understand the meaning of concepts. Consider a person learning a foreign language, who notices that a particular word,

3 0.71096689 93 nips-2000-On Iterative Krylov-Dogleg Trust-Region Steps for Solving Neural Networks Nonlinear Least Squares Problems

Author: Eiji Mizutani, James Demmel

Abstract: This paper describes a method of dogleg trust-region steps, or restricted Levenberg-Marquardt steps, based on a projection process onto the Krylov subspaces for neural networks nonlinear least squares problems. In particular, the linear conjugate gradient (CG) method works as the inner iterative algorithm for solving the linearized Gauss-Newton normal equation, whereas the outer nonlinear algorithm repeatedly takes so-called

4 0.29695877 43 nips-2000-Dopamine Bonuses

Author: Sham Kakade, Peter Dayan

Abstract: Substantial data support a temporal difference (TD) model of dopamine (DA) neuron activity in which the cells provide a global error signal for reinforcement learning. However, in certain circumstances, DA activity seems anomalous under the TD model, responding to non-rewarding stimuli. We address these anomalies by suggesting that DA cells multiplex information about reward bonuses, including Sutton's exploration bonuses and Ng et al's non-distorting shaping bonuses. We interpret this additional role for DA in terms of the unconditional attentional and psychomotor effects of dopamine, having the computational role of guiding exploration.

5 0.29341644 131 nips-2000-The Early Word Catches the Weights

Author: Mark A. Smith, Garrison W. Cottrell, Karen L. Anderson

Abstract: The strong correlation between the frequency of words and their naming latency has been well documented. However, as early as 1973, the Age of Acquisition (AoA) of a word was alleged to be the actual variable of interest, but these studies seem to have been ignored in most of the literature. Recently, there has been a resurgence of interest in AoA. While some studies have shown that frequency has no effect when AoA is controlled for, more recent studies have found independent contributions of frequency and AoA. Connectionist models have repeatedly shown strong effects of frequency, but little attention has been paid to whether they can also show AoA effects. Indeed, several researchers have explicitly claimed that they cannot show AoA effects. In this work, we explore these claims using a simple feed forward neural network. We find a significant contribution of AoA to naming latency, as well as conditions under which frequency provides an independent contribution.

6 0.29121473 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

7 0.28704572 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm

8 0.2863248 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics

9 0.28465971 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks

10 0.28317234 74 nips-2000-Kernel Expansions with Unlabeled Examples

11 0.28120005 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks

12 0.27837956 133 nips-2000-The Kernel Gibbs Sampler

13 0.27814591 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition

14 0.27766529 92 nips-2000-Occam's Razor

15 0.27729523 4 nips-2000-A Linear Programming Approach to Novelty Detection

16 0.27583879 69 nips-2000-Incorporating Second-Order Functional Knowledge for Better Option Pricing

17 0.27561402 111 nips-2000-Regularized Winnow Methods

18 0.27507815 142 nips-2000-Using Free Energies to Represent Q-values in a Multiagent Reinforcement Learning Task

19 0.27501073 146 nips-2000-What Can a Single Neuron Compute?

20 0.27433398 79 nips-2000-Learning Segmentation by Random Walks