nips nips2008 nips2008-101 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Rui M. Castro, Charles Kalish, Robert Nowak, Ruichen Qian, Tim Rogers, Xiaojin Zhu
Abstract: We investigate a topic at the interface of machine learning and cognitive science. Human active learning, where learners can actively query the world for information, is contrasted with passive learning from random examples. Furthermore, we compare human active learning performance with predictions from statistical learning theory. We conduct a series of human category learning experiments inspired by a machine learning task for which active and passive learning error bounds are well understood, and dramatically distinct. Our results indicate that humans are capable of actively selecting informative queries, and in doing so learn better and faster than if they are given random training data, as predicted by learning theory. However, the improvement over passive learning is not as dramatic as that achieved by machine active learning algorithms. To the best of our knowledge, this is the first quantitative study comparing human category learning in active versus passive settings. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Human active learning, where learners can actively query the world for information, is contrasted with passive learning from random examples. [sent-4, score-1.19]
2 Furthermore, we compare human active learning performance with predictions from statistical learning theory. [sent-5, score-0.738]
3 We conduct a series of human category learning experiments inspired by a machine learning task for which active and passive learning error bounds are well understood, and dramatically distinct. [sent-6, score-1.668]
4 However, the improvement over passive learning is not as dramatic as that achieved by machine active learning algorithms. [sent-8, score-1.158]
5 To the best of our knowledge, this is the first quantitative study comparing human category learning in active versus passive settings. [sent-9, score-1.456]
6 In contrast, passive learning is a paradigm in which the learner has no control over the labeled examples it is given. [sent-12, score-0.788]
7 In machine learning, active learning has been a topic of intense interest. [sent-13, score-0.421]
8 In certain machine learning problems it has been shown that active learning algorithms perform much better than passive learning, with superior convergence bounds (see [1, 4] and references therein) and/or superior empirical performance [5, 19]. [sent-14, score-1.134]
9 In this paper we focus on the application of active learning to classification, in both machines and humans. [sent-15, score-0.421]
10 To our knowledge, no previous work has attempted to quantify human active learning performance in probabilistic category learning (i.e., [sent-16, score-0.869]
11 classification), contrast human active and passive learning, and compare against theoretically optimal bounds. [sent-18, score-1.262]
12 Theories of human category learning often cast the learner as a passive learner, who observes some object (typically represented as a feature vector), is presented with the object’s category label, and does some statistical processing to determine how the label should generalize. [sent-19, score-1.41]
13 Active querying provides children with information that they would otherwise be less likely to encounter through passive observation; and so, presumably, such active querying has important implications for category learning. [sent-22, score-1.298]
14 Early research in human concept attainment suggested that learners do benefit from the opportunity to actively select examples during learning [11]. [sent-23, score-0.503]
15 Figure 1: The two-category learning task with boundary θ and noise level ε. [sent-27, score-0.481]
16 criteria for assessing the magnitude of the active learning benefit (e.g. [sent-30, score-0.455]
17 Partly as a result, nearly all contemporary research in classification and categorization has ignored active learning. [sent-33, score-0.358]
18 Several authors have argued that pessimistic views of the human ability to choose relevant queries are based on faulty task analyses; and that, when the learning task is properly construed, humans do an excellent, even optimal job of selection [7, 14]. [sent-36, score-0.665]
19 Specification of the theoretical benefits of active learning in this context allows us to address the following questions regarding human performance: [Q1] Do humans perform better when they can select their own examples for labeling, compared to passive observation of labeled examples? [sent-39, score-1.622]
20 [Q2] If so, do they achieve the full benefit of active learning suggested by statistical learning theory? [sent-40, score-0.484]
21 [Q3] If they do not, can machine learning be used to enhance human performance? [sent-41, score-0.354]
22 The goal of this paper is to answer these questions in a quantitative way by studying human and machine performance in one well-understood classification task. [sent-43, score-0.363]
23 Answers to these questions have important theoretical and practical implications for our understanding of human learning and cognition. [sent-44, score-0.45]
24 As previously noted, most theories of human category learning assume passive sampling of the environment. [sent-45, score-1.121]
25 Some researchers have argued that the environment provides little information regarding the category structure of the world, and so conclude that human category learning must be subject to strong initial constraints [6, 3, 9]. [sent-46, score-0.666]
26 If, however, human learning benefits from active querying of the environment, it is not clear that such conclusions are justified. [sent-47, score-0.746]
27 From an applied perspective, if machines can be shown to aid human learning in certain predictable circumstances, this has clear implications for the design of intelligent tutoring systems and other machine-human hybrid applications. [sent-48, score-0.352]
28 2 A Two-Category Learning Task
For the study in this paper we consider learning in a relatively simple setting, where there is a good theoretical understanding of both active and passive machine learning, offering an ideal test-bed for assessing active learning in humans. [sent-49, score-1.56]
29 The label Yi is related to the sample Xi in the following noisy way: Yi is equal to the category of Xi with probability 1 − ε and equal to the other category with probability ε, where 0 ≤ ε < 1/2. [sent-55, score-0.413]
30 At this point we have not specified how the sample locations Xi are generated, and in this lies the major difference between passive and active learning. [sent-60, score-1.056]
31 In the passive learning setting the sample locations are randomly distributed, independent of the labels. [sent-61, score-0.761]
32 On the other hand, in the active learning setting the learner can choose the sample locations in a sequential way depending on the past, that is Xi = h(X1, …, Xi−1, Y1, …, Yi−1) for some function h of the previous queries and labels. [sent-62, score-0.517]
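As a concrete illustration of this sampling model, here is a minimal Python sketch of the noisy labeling process and the two sampling regimes. This is our own illustration, not code from the paper, and all names are hypothetical:

```python
import random

def noisy_label(x, theta, eps, rng):
    """True category is 0 left of theta and 1 right of it;
    the label is flipped with probability eps (0 <= eps < 1/2)."""
    y = 0 if x < theta else 1
    return 1 - y if rng.random() < eps else y

def passive_sample(n, theta, eps, seed=0):
    """Passive learning: sample locations X_i are drawn uniformly
    on [0, 1], independently of all labels."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    return [(x, noisy_label(x, theta, eps, rng)) for x in xs]

def active_sample(n, theta, eps, h, seed=0):
    """Active learning: X_i = h(history), i.e., each query may depend
    on all past queries and their noisy labels."""
    rng = random.Random(seed)
    history = []
    for _ in range(n):
        x = h(history)
        history.append((x, noisy_label(x, theta, eps, rng)))
    return history
```

A passive learner consumes the output of `passive_sample`, while any query strategy (bisection included) plugs in as the function `h`.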
33 If ε = 0, that is, when there is no label noise, the optimal methodologies for passive and active learning are quite obvious. [sent-69, score-1.161]
34 In passive learning, the optimal inference is that θ lies somewhere between the rightmost location where a label of zero was observed and the leftmost location where a label of one was observed. [sent-70, score-0.894]
35 On the other hand, in active learning the optimal strategy is a deterministic binary bisection: begin by taking X1 = 1/2. [sent-72, score-0.421]
36 Therefore after n samples the error of the active learning inference is at most 2^{−(n+1)}. [sent-76, score-0.462]
37 Clearly active learning, where the error decays exponentially with the number of samples, is much better than passive learning, where the error can decay only polynomially. [sent-77, score-1.313]
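To see this gap concretely in the noiseless case, the following self-contained sketch (ours, with hypothetical names) implements both optimal strategies: the passive estimate is the midpoint of the interval bracketed by the observed labels, and bisection halves the feasible interval with every query.

```python
import random

def passive_estimate(pairs):
    """Optimal noiseless passive inference: theta lies between the
    rightmost location labeled 0 and the leftmost location labeled 1;
    take the midpoint of that interval as the estimate."""
    lo = max((x for x, y in pairs if y == 0), default=0.0)
    hi = min((x for x, y in pairs if y == 1), default=1.0)
    return (lo + hi) / 2

def bisection_estimate(n, theta):
    """Optimal noiseless active inference: deterministic binary
    bisection; after n queries the feasible interval has width 2**-n,
    so the midpoint is within 2**-(n+1) of theta."""
    lo, hi = 0.0, 1.0
    for _ in range(n):
        x = (lo + hi) / 2          # query the midpoint
        if x < theta:              # label 0: theta lies to the right
            lo = x
        else:                      # label 1: theta lies to the left
            hi = x
    return (lo + hi) / 2

rng = random.Random(1)
theta, n = rng.random(), 20
pairs = [(x, 0 if x < theta else 1) for x in (rng.random() for _ in range(n))]
print("passive error:", abs(passive_estimate(pairs) - theta))       # O(1/n)
print("active error: ", abs(bisection_estimate(n, theta) - theta))  # <= 2**-(n+1)
```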
38 Under passive learning, the maximum likelihood estimator yields the optimal rate of error convergence. [sent-79, score-0.691]
39 Furthermore, it is possible to show a performance lower bound that clarifies the best possible performance of any passive learning algorithm. [sent-80, score-0.741]
40 $\inf_{\hat{\theta}_n} \sup_{\theta\in[0,1]} \mathbb{E}\big[|\hat{\theta}_n - \theta|\big] \;\ge\; \frac{1}{4}\left(\frac{1+2\epsilon}{1-2\epsilon}\right)^{2}\frac{1}{n+1}$, (1), where $\hat{\theta}_n$ is the estimate of θ obtained after n observations, and the infimum is taken over all possible passive learning procedures. [sent-82, score-0.713]
41 This is a so-called minimax lower bound, and gives an indication of the best achievable performance of any passive learning algorithm. [sent-83, score-0.779]
42 That is, no passive algorithm can learn more rapidly. [sent-84, score-0.65]
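A sketch of the maximum-likelihood estimator mentioned above (again our own illustration, not the paper's code). For ε < 1/2, each agreement between a label and a candidate threshold contributes log(1 − ε) to the log-likelihood and each disagreement contributes log(ε), so maximizing the likelihood is equivalent to maximizing the number of agreements; since the objective is piecewise constant, it suffices to search over the sample locations:

```python
def ml_threshold(pairs):
    """Maximum-likelihood threshold estimate under symmetric label
    noise eps < 1/2: choose the threshold agreeing with as many noisy
    labels as possible. Candidates are restricted to the endpoints and
    sample locations, where the piecewise-constant objective changes."""
    candidates = [0.0, 1.0] + [x for x, _ in pairs]

    def agreements(theta):
        return sum(int((0 if x < theta else 1) == y) for x, y in pairs)

    return max(candidates, key=agreements)
```

For example, feeding it `passive_sample(100, theta=0.37, eps=0.1)` from the earlier sketch returns an estimate whose error shrinks at the O(1/n) passive rate.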
43 For active learning, deterministic bisection cannot be used due to the label noise. [sent-87, score-0.55]
44 Nevertheless active learning is still extremely beneficial in this setting. [sent-88, score-0.421]
45 In [2] a slightly modified method was introduced, which is more amenable to analysis; the major difference involves … (Footnote 1: we use a constant noise level ε because the theoretical distinction between active and passive learning is dramatic in this case.) [sent-106, score-1.329]
46 Note that the expected estimation error in (2) decays exponentially with the number of observations, as opposed to the polynomial decay achievable using passive learning (1). [sent-119, score-1.031]
47 This shows that the accuracy of active learning is significantly better than that of passive learning, even in the presence of uncertainty. [sent-120, score-1.071]
48 Furthermore, no active (or passive) learning algorithm can have its expected error decay faster than exponentially with the number of samples, as in (2). [sent-121, score-0.513]
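The modified bisection analyzed in [2] is not reproduced in this excerpt, so as a stand-in here is a standard probabilistic-bisection sketch in the Horstein/Burnashev-Zigangirov spirit: maintain a posterior over θ, query the posterior median, and reweight by the noise model. This is our own illustrative code under those assumptions, not the authors' algorithm:

```python
import random

def probabilistic_bisection(n, theta, eps, bins=4096, seed=0):
    """Noisy binary search: keep a discretized posterior over theta,
    query the posterior median, and apply a Bayes update with the
    known noise level eps. The error of the posterior-mean estimate
    decays exponentially in the number of queries n."""
    rng = random.Random(seed)
    post = [1.0 / bins] * bins                  # uniform prior on [0, 1]
    for _ in range(n):
        acc, k = 0.0, 0                         # locate the posterior median
        while acc < 0.5 and k < bins:
            acc += post[k]
            k += 1
        x = k / bins                            # next query point
        y = 0 if x < theta else 1               # true category of x
        if rng.random() < eps:
            y = 1 - y                           # label flipped w.p. eps
        for j in range(bins):                   # Bayes update per bin
            cat = 0 if x < (j + 0.5) / bins else 1
            post[j] *= (1 - eps) if y == cat else eps
        z = sum(post)
        post = [p / z for p in post]
    # posterior mean as the estimate of theta
    return sum((j + 0.5) / bins * p for j, p in enumerate(post))

print(abs(probabilistic_bisection(60, theta=0.3, eps=0.1) - 0.3))
```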
49 3 Human Passive and Active Learning Experiments
Equipped with the theoretical performance of passive learning (1) and active learning (2), we now describe a behavioral study designed to answer Q1-Q4 posed earlier. [sent-122, score-1.213]
50 This is the passive learning condition where the human subject cannot select the queries, and is instead presented sequentially with examples {Xi}_{i=1}^{n} sampled uniformly at random from [0, 1], and their noisy labels {Yi}_{i=1}^{n}. [sent-125, score-1.141]
51 The subject is regularly asked to guess the boundary from these observations (without feedback). [sent-126, score-0.339]
52 If humans are capable of learning from passive observation of random samples, their boundary estimates should approach the true boundary at this polynomial rate too. [sent-128, score-1.228]
53 This is the active learning condition where the human subject, at iteration i, selects a query Xi based on her previous queries and their noisy labels {(Xj, Yj)}_{j=1}^{i−1}. [sent-130, score-0.905]
54 It is motivated by question Q3: Can machine learning assist human category learning? [sent-135, score-0.448]
55 Spiky eggs (X close to 0) most likely hatch alien snakes (category zero), and smooth eggs (X close to 1) most likely hatch alien birds (category one), but there could be exceptions (label noise). [sent-147, score-0.419]
56 For each session and participant the true decision boundary θ was randomly set in [1/16, 15/16] to avoid dependencies on the location of the true boundary. [sent-159, score-0.403]
57 Once the participant found the shape she wished to query (Xi+1), she clicked a “hatch” button and observed the outcome (bird or snake, corresponding to the noisy label), followed by a “Continue” button to move on to the next query. [sent-163, score-0.455]
58 In all conditions, the computer generated the noisy label Yi+1 according to the true boundary θ and noise level ε, and displayed it to the participant with either a snake picture (Yi+1 = 0) or a bird picture (Yi+1 = 1). [sent-166, score-0.765]
59 The participant was asked to guess the decision boundary (θ̂) after every three iterations. [sent-168, score-0.438]
60 In these “boundary queries,” the computer began by displaying the shape at X = 1/2, and the participant used the mouse wheel to change the shape until it matched her current best guess about the boundary shape. [sent-169, score-0.566]
61 These boundary estimates allowed us to compute mean (across subjects) human estimation errors |θ̂n − θ| for different n, under different conditions and different noise levels. [sent-175, score-0.629]
62 4 Experimental Results
Figure 4 shows, for each condition and noise level, how every participant’s boundary guesses approach the true boundary θ. [sent-177, score-0.615]
63 Qualitatively, human active learning (Human-Active) appears better than passive learning (Random) because the curves are more concentrated around zero. [sent-178, score-1.388]
64 [Q1] Do humans perform better when they can actively select samples for labeling compared to passive observation of randomly-selected samples? [sent-188, score-0.886]
65 To support our answer, we show that the human estimation error |θ̂n − θ| is smaller in the Human-Active condition than in the Random condition. [sent-191, score-0.342]
66 The x-axis is iteration n; the y-axis is the (signed) difference between the human boundary guess and the true boundary, θ̂n − θ. [sent-229, score-0.676]
67 Each curve shows performance from one human subject (though they overlap, it is sufficient to note the trends). [sent-230, score-0.318]
68 Overall, human active learning (Human-Active) is better than passive learning (Random), and machine-assisted human learning (Machine-Yoked) is even better. [sent-231, score-1.705]
69 Human-Active is better than Random when noise is low; Machine-Yoked is better than Human-Active when noise is high. [sent-251, score-0.338]
70 That is, with active learning the subjects quickly come up with better guesses and maintain this advantage until the end. [sent-252, score-0.542]
71 Human-Active performance deteriorates with higher noise levels, however, and at the highest noise levels it appears indistinguishable from performance in the Random condition. [sent-253, score-0.371]
72 [Q2] Can humans achieve the full benefit of active learning suggested by learning theory? [sent-254, score-0.627]
73 [A2] Human active learning does exhibit exponential convergence, but with a smaller decay constant (i.e., slower decay) than the upper bound in (2). [sent-255, score-0.615]
74 Human passive learning, on the other hand, sometimes does not even achieve polynomial convergence as predicted in (1), and in no condition does the rate approach optimal performance. [sent-256, score-0.749]
75 To support these conclusions, consider that, for active learning, the theoretical estimation error bound in (2) has the form 2e^{−λn} and decays exponentially with n. [sent-257, score-0.563]
76 The decay constant λ is determined by the noise level ε. [sent-258, score-0.321]
77 To determine whether human error decays exponentially as predicted, and with a comparable slope, one can similarly plot the logarithm of human active learning estimation error vs. n. [sent-262, score-1.113]
78 If human active learning decreases error exponentially (which is desirable), this relationship is linear, as Figure 6 (Upper) shows it to be. [sent-264, score-0.767]
79 This exponential decay of error offers further evidence that human active learning exceeds passive learning performance, where error can only decay polynomially (Figure 6, Lower). [sent-265, score-1.712]
80 The speed (decay constant) of the exponential decay in human active learning is, however, slower than the theoretical upper bound (2). [sent-266, score-0.903]
81 To see this, we fit one line per noise level in Figure 6. [sent-267, score-0.369]
82 (Lower) Human passive learning in the Random condition is slower than O(1/n), since the slopes are shallower than −1 on log(|θ̂n − θ|) (the y-axis) versus log(n) (the x-axis). [sent-288, score-0.846]
83 Table 1: The exponential decay of human active learning is slower than predicted by statistical learning theory, especially at lower noise levels. [sent-303, score-1.099]
84 We use the negative slope of the fitted lines as the estimate of the decay constant in human active learning. [sent-304, score-0.798]
85 It is clear that human active learning’s error decays at a slower rate, especially when the noise is low. [sent-307, score-0.918]
86 For passive learning, the minimax lower bound (1) has a polynomial decay of O(1/n), which is a line with slope −1 on a plot of log(|θ̂n − θ|) vs. log(n). [sent-308, score-0.928]
87 As shown in Figure 6 (Lower), the analogous log-log plot from human passive learning in the Random condition does seem to fit a line, but the slope is much shallower than -1. [sent-310, score-1.12]
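The slope-fitting procedure behind Figure 6 and Table 1 amounts to two least-squares regressions; here is a minimal sketch (ours) with synthetic error curves standing in for the human data:

```python
import math

def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Active learning: exponential decay 2*exp(-lam*n) is a line of slope
# -lam on a log(error)-vs-n plot. Passive learning: polynomial decay
# c/n is a line of slope -1 on a log(error)-vs-log(n) plot; shallower
# slopes indicate slower-than-O(1/n) decay.
ns = list(range(1, 31))
active_err = [2 * math.exp(-0.2 * n) for n in ns]   # synthetic placeholder
passive_err = [0.25 / n for n in ns]                # synthetic placeholder

lam = -slope(ns, [math.log(e) for e in active_err])
poly = slope([math.log(n) for n in ns], [math.log(e) for e in passive_err])
print(lam, poly)   # ~0.2 and ~-1.0 for these synthetic curves
```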
88 [Q3] Can machine learning be used to enhance human learning? [sent-315, score-0.354]
89 As shown in Figure 5, the Machine-Yoked curve is no different from Human-Active at low noise levels, but substantially better at high noise levels. [sent-317, score-0.369]
90 It is important to remember that Machine-Yoked is human performance, not that of the machine learning algorithm. [sent-318, score-0.317]
91 The results seem to indicate that humans can utilize the training data chosen by a machine active learning algorithm to enhance their performance in settings where humans are not generally performing well. [sent-319, score-0.744]
92 [A4] One form of difficulty, the label noise level ε, has profound effects on human learning. [sent-327, score-0.544]
93 Specifically, the advantage of active learning diminishes with noise; and at high noise levels active learning arguably has no advantage over passive learning for humans in this setting. [sent-328, score-1.9]
94 Formal analysis suggests that the advantage of active over passive sampling should diminish with increasing noise; but it also suggests that some benefit to active sampling should always be obtained. [sent-329, score-1.402]
95 5 Conclusions and Future Work
We have conducted behavioral experiments to compare active versus passive learning by humans in a simple classification task, and compared human performance to that predicted by statistical learning theory. [sent-331, score-1.557]
96 In short, humans are able to actively select queries and use them to achieve faster category learning; but the advantages of active learning diminish under higher noise conditions and do not approach theoretical bounds. [sent-332, score-0.729]
97 One important conclusion from this work is that passive learning may not be a very good model for how human beings learn to categorize. [sent-333, score-0.967]
98 The benefit of the current work is that it capitalizes on a simple learning task for which passive and active performance has been formally characterized. [sent-335, score-1.116]
99 Bayesian approaches to associative learning: From passive to active learning. [sent-391, score-1.008]
100 Combining active and semi-supervised learning for spoken language understanding. [sent-445, score-0.421]
wordName wordTfidf (topN-words)
[('passive', 0.65), ('active', 0.358), ('human', 0.254), ('boundary', 0.173), ('noise', 0.169), ('participant', 0.16), ('humans', 0.143), ('category', 0.131), ('decay', 0.121), ('bisection', 0.102), ('label', 0.09), ('queries', 0.09), ('guess', 0.076), ('subjects', 0.068), ('slope', 0.065), ('questions', 0.064), ('actively', 0.064), ('learning', 0.063), ('participants', 0.061), ('alien', 0.061), ('hatch', 0.061), ('query', 0.055), ('button', 0.053), ('guesses', 0.053), ('bene', 0.052), ('exponentially', 0.051), ('decays', 0.051), ('shape', 0.05), ('querying', 0.049), ('psychology', 0.048), ('learner', 0.048), ('condition', 0.047), ('shapes', 0.047), ('clicked', 0.046), ('task', 0.045), ('slower', 0.045), ('answer', 0.045), ('xi', 0.044), ('yi', 0.044), ('observes', 0.043), ('error', 0.041), ('attainment', 0.041), ('eggs', 0.041), ('prt', 0.041), ('shallower', 0.041), ('snakes', 0.041), ('causal', 0.04), ('noisy', 0.038), ('session', 0.038), ('minimax', 0.038), ('cognitive', 0.037), ('enhance', 0.037), ('diminish', 0.036), ('snake', 0.036), ('yoked', 0.036), ('displayed', 0.035), ('implications', 0.035), ('theoretical', 0.034), ('assessing', 0.034), ('subject', 0.033), ('levels', 0.033), ('conditions', 0.033), ('rationality', 0.033), ('bird', 0.033), ('pilot', 0.033), ('location', 0.032), ('curve', 0.031), ('level', 0.031), ('sessions', 0.031), ('interventions', 0.031), ('select', 0.029), ('asked', 0.029), ('environment', 0.029), ('wheel', 0.029), ('answers', 0.029), ('continue', 0.029), ('bound', 0.028), ('psychological', 0.028), ('incorrect', 0.028), ('achievable', 0.028), ('mouse', 0.028), ('regularly', 0.028), ('examples', 0.027), ('told', 0.026), ('likely', 0.026), ('predicted', 0.026), ('polynomial', 0.026), ('argued', 0.025), ('locations', 0.025), ('interval', 0.025), ('opportunity', 0.025), ('ever', 0.025), ('people', 0.024), ('culty', 0.024), ('dramatic', 0.024), ('sample', 0.023), ('theories', 0.023), ('really', 0.023), ('smoothly', 0.023), ('conclusions', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 101 nips-2008-Human Active Learning
Author: Rui M. Castro, Charles Kalish, Robert Nowak, Ruichen Qian, Tim Rogers, Xiaojin Zhu
Abstract: We investigate a topic at the interface of machine learning and cognitive science. Human active learning, where learners can actively query the world for information, is contrasted with passive learning from random examples. Furthermore, we compare human active learning performance with predictions from statistical learning theory. We conduct a series of human category learning experiments inspired by a machine learning task for which active and passive learning error bounds are well understood, and dramatically distinct. Our results indicate that humans are capable of actively selecting informative queries, and in doing so learn better and faster than if they are given random training data, as predicted by learning theory. However, the improvement over passive learning is not as dramatic as that achieved by machine active learning algorithms. To the best of our knowledge, this is the first quantitative study comparing human category learning in active versus passive settings. 1
2 0.21546483 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition
Author: Sudheendra Vijayanarasimhan, Kristen Grauman
Abstract: We introduce a framework for actively learning visual categories from a mixture of weakly and strongly labeled image examples. We propose to allow the categorylearner to strategically choose what annotations it receives—based on both the expected reduction in uncertainty as well as the relative costs of obtaining each annotation. We construct a multiple-instance discriminative classifier based on the initial training data. Then all remaining unlabeled and weakly labeled examples are surveyed to actively determine which annotation ought to be requested next. After each request, the current classifier is incrementally updated. Unlike previous work, our approach accounts for the fact that the optimal use of manual annotation may call for a combination of labels at multiple levels of granularity (e.g., a full segmentation on some images and a present/absent flag on others). As a result, it is possible to learn more accurate category models with a lower total expenditure of manual annotation effort. 1
3 0.1091277 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations
Author: Pierre Garrigues, Laurent E. Ghaoui
Abstract: It has been shown that the problem of 1 -penalized least-square regression commonly referred to as the Lasso or Basis Pursuit DeNoising leads to solutions that are sparse and therefore achieves model selection. We propose in this paper RecLasso, an algorithm to solve the Lasso with online (sequential) observations. We introduce an optimization problem that allows us to compute an homotopy from the current solution to the solution after observing a new data point. We compare our method to Lars and Coordinate Descent, and present an application to compressive sensing with sequential observations. Our approach can easily be extended to compute an homotopy from the current solution to the solution that corresponds to removing a data point, which leads to an efficient algorithm for leave-one-out cross-validation. We also propose an algorithm to automatically update the regularization parameter after observing a new data point. 1
4 0.10788127 123 nips-2008-Linear Classification and Selective Sampling Under Low Noise Conditions
Author: Giovanni Cavallanti, Nicolò Cesa-bianchi, Claudio Gentile
Abstract: We provide a new analysis of an efficient margin-based algorithm for selective sampling in classification problems. Using the so-called Tsybakov low noise condition to parametrize the instance distribution, we show bounds on the convergence rate to the Bayes risk of both the fully supervised and the selective sampling versions of the basic algorithm. Our analysis reveals that, excluding logarithmic factors, the average risk of the selective sampler converges to the Bayes risk at rate N −(1+α)(2+α)/2(3+α) where N denotes the number of √ queried labels, and α > 0 is the exponent in the low noise condition. For all α > 3 − 1 ≈ 0.73 this convergence rate is asymptotically faster than the rate N −(1+α)/(2+α) achieved by the fully supervised version of the same classifier, which queries all labels, and for α → ∞ the two rates exhibit an exponential gap. Experiments on textual data reveal that simple variants of the proposed selective sampler perform much better than popular and similarly efficient competitors. 1
5 0.10733917 138 nips-2008-Modeling human function learning with Gaussian processes
Author: Thomas L. Griffiths, Chris Lucas, Joseph Williams, Michael L. Kalish
Abstract: Accounts of how people learn functional relationships between continuous variables have tended to focus on two possibilities: that people are estimating explicit functions, or that they are performing associative learning supported by similarity. We provide a rational analysis of function learning, drawing on work on regression in machine learning and statistics. Using the equivalence of Bayesian linear regression and Gaussian processes, we show that learning explicit rules and using similarity can be seen as two views of one solution to this problem. We use this insight to define a Gaussian process model of human function learning that combines the strengths of both approaches. 1
6 0.099950641 231 nips-2008-Temporal Dynamics of Cognitive Control
7 0.096158244 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
8 0.081365675 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction
9 0.077996708 73 nips-2008-Estimating Robust Query Models with Convex Optimization
10 0.074326403 74 nips-2008-Estimating the Location and Orientation of Complex, Correlated Neural Activity using MEG
11 0.072043911 206 nips-2008-Sequential effects: Superstition or rational behavior?
12 0.069982409 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
13 0.06729579 249 nips-2008-Variational Mixture of Gaussian Process Experts
14 0.066107549 223 nips-2008-Structure Learning in Human Sequential Decision-Making
15 0.065447234 172 nips-2008-Optimal Response Initiation: Why Recent Experience Matters
16 0.063101418 24 nips-2008-An improved estimator of Variance Explained in the presence of noise
17 0.058043532 6 nips-2008-A ``Shape Aware'' Model for semi-supervised Learning of Objects and its Context
18 0.055542611 247 nips-2008-Using Bayesian Dynamical Systems for Motion Template Libraries
19 0.05514897 179 nips-2008-Phase transitions for high-dimensional joint support recovery
20 0.054804504 228 nips-2008-Support Vector Machines with a Reject Option
topicId topicWeight
[(0, -0.185), (1, -0.004), (2, 0.04), (3, -0.047), (4, -0.015), (5, 0.008), (6, -0.029), (7, 0.022), (8, 0.082), (9, 0.112), (10, -0.016), (11, 0.059), (12, -0.097), (13, -0.141), (14, 0.037), (15, 0.128), (16, 0.012), (17, 0.029), (18, 0.104), (19, 0.028), (20, 0.025), (21, -0.051), (22, 0.175), (23, -0.007), (24, 0.01), (25, -0.095), (26, 0.099), (27, -0.111), (28, 0.027), (29, -0.015), (30, -0.043), (31, 0.114), (32, -0.029), (33, -0.167), (34, -0.127), (35, 0.08), (36, -0.085), (37, -0.096), (38, 0.062), (39, -0.113), (40, -0.124), (41, -0.098), (42, 0.012), (43, -0.018), (44, 0.188), (45, -0.044), (46, 0.003), (47, -0.146), (48, 0.055), (49, -0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.9658429 101 nips-2008-Human Active Learning
Author: Rui M. Castro, Charles Kalish, Robert Nowak, Ruichen Qian, Tim Rogers, Xiaojin Zhu
Abstract: We investigate a topic at the interface of machine learning and cognitive science. Human active learning, where learners can actively query the world for information, is contrasted with passive learning from random examples. Furthermore, we compare human active learning performance with predictions from statistical learning theory. We conduct a series of human category learning experiments inspired by a machine learning task for which active and passive learning error bounds are well understood, and dramatically distinct. Our results indicate that humans are capable of actively selecting informative queries, and in doing so learn better and faster than if they are given random training data, as predicted by learning theory. However, the improvement over passive learning is not as dramatic as that achieved by machine active learning algorithms. To the best of our knowledge, this is the first quantitative study comparing human category learning in active versus passive settings. 1
2 0.65792835 142 nips-2008-Multi-Level Active Prediction of Useful Image Annotations for Recognition
Author: Sudheendra Vijayanarasimhan, Kristen Grauman
Abstract: We introduce a framework for actively learning visual categories from a mixture of weakly and strongly labeled image examples. We propose to allow the categorylearner to strategically choose what annotations it receives—based on both the expected reduction in uncertainty as well as the relative costs of obtaining each annotation. We construct a multiple-instance discriminative classifier based on the initial training data. Then all remaining unlabeled and weakly labeled examples are surveyed to actively determine which annotation ought to be requested next. After each request, the current classifier is incrementally updated. Unlike previous work, our approach accounts for the fact that the optimal use of manual annotation may call for a combination of labels at multiple levels of granularity (e.g., a full segmentation on some images and a present/absent flag on others). As a result, it is possible to learn more accurate category models with a lower total expenditure of manual annotation effort. 1
3 0.58131123 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction
Author: Jing Xu, Thomas L. Griffiths
Abstract: Many human interactions involve pieces of information being passed from one person to another, raising the question of how this process of information transmission is affected by the capacities of the agents involved. In the 1930s, Sir Frederic Bartlett explored the influence of memory biases in “serial reproduction” of information, in which one person’s reconstruction of a stimulus from memory becomes the stimulus seen by the next person. These experiments were done using relatively uncontrolled stimuli such as pictures and stories, but suggested that serial reproduction would transform information in a way that reflected the biases inherent in memory. We formally analyze serial reproduction using a Bayesian model of reconstruction from memory, giving a general result characterizing the effect of memory biases on information transmission. We then test the predictions of this account in two experiments using simple one-dimensional stimuli. Our results provide theoretical and empirical justification for the idea that serial reproduction reflects memory biases. 1
4 0.54929519 41 nips-2008-Breaking Audio CAPTCHAs
Author: Jennifer Tam, Jiri Simsa, Sean Hyde, Luis V. Ahn
Abstract: CAPTCHAs are computer-generated tests that humans can pass but current computer systems cannot. CAPTCHAs provide a method for automatically distinguishing a human from a computer program, and therefore can protect Web services from abuse by so-called “bots.” Most CAPTCHAs consist of distorted images, usually text, for which a user must provide some description. Unfortunately, visual CAPTCHAs limit access to the millions of visually impaired people using the Web. Audio CAPTCHAs were created to solve this accessibility issue; however, the security of audio CAPTCHAs was never formally tested. Some visual CAPTCHAs have been broken using machine learning techniques, and we propose using similar ideas to test the security of audio CAPTCHAs. Audio CAPTCHAs are generally composed of a set of words to be identified, layered on top of noise. We analyzed the security of current audio CAPTCHAs from popular Web sites by using AdaBoost, SVM, and k-NN, and achieved correct solutions for test samples with accuracy up to 71%. Such accuracy is enough to consider these CAPTCHAs broken. Training several different machine learning algorithms on different types of audio CAPTCHAs allowed us to analyze the strengths and weaknesses of the algorithms so that we could suggest a design for a more robust audio CAPTCHA. 1
5 0.50472665 138 nips-2008-Modeling human function learning with Gaussian processes
Author: Thomas L. Griffiths, Chris Lucas, Joseph Williams, Michael L. Kalish
Abstract: Accounts of how people learn functional relationships between continuous variables have tended to focus on two possibilities: that people are estimating explicit functions, or that they are performing associative learning supported by similarity. We provide a rational analysis of function learning, drawing on work on regression in machine learning and statistics. Using the equivalence of Bayesian linear regression and Gaussian processes, we show that learning explicit rules and using similarity can be seen as two views of one solution to this problem. We use this insight to define a Gaussian process model of human function learning that combines the strengths of both approaches. 1
6 0.49983066 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations
7 0.46440676 123 nips-2008-Linear Classification and Selective Sampling Under Low Noise Conditions
8 0.4130989 73 nips-2008-Estimating Robust Query Models with Convex Optimization
9 0.40637487 249 nips-2008-Variational Mixture of Gaussian Process Experts
10 0.39309388 172 nips-2008-Optimal Response Initiation: Why Recent Experience Matters
11 0.38487229 185 nips-2008-Privacy-preserving logistic regression
12 0.36741742 5 nips-2008-A Transductive Bound for the Voted Classifier with an Application to Semi-supervised Learning
13 0.36177757 124 nips-2008-Load and Attentional Bayes
14 0.36023426 231 nips-2008-Temporal Dynamics of Cognitive Control
15 0.34855911 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
16 0.33965614 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
17 0.33414149 94 nips-2008-Goal-directed decision making in prefrontal cortex: a computational framework
18 0.3202334 74 nips-2008-Estimating the Location and Orientation of Complex, Correlated Neural Activity using MEG
19 0.31943029 75 nips-2008-Estimating vector fields using sparse basis field expansions
20 0.31204611 121 nips-2008-Learning to Use Working Memory in Partially Observable Environments through Dopaminergic Reinforcement
topicId topicWeight
[(6, 0.086), (7, 0.078), (12, 0.042), (15, 0.02), (18, 0.192), (28, 0.215), (57, 0.07), (59, 0.02), (63, 0.024), (64, 0.02), (71, 0.014), (77, 0.042), (78, 0.024), (83, 0.066)]
simIndex simValue paperId paperTitle
1 0.98439747 33 nips-2008-Bayesian Model of Behaviour in Economic Games
Author: Debajyoti Ray, Brooks King-casas, P. R. Montague, Peter Dayan
Abstract: Classical game theoretic approaches that make strong rationality assumptions have difficulty modeling human behaviour in economic games. We investigate the role of finite levels of iterated reasoning and non-selfish utility functions in a Partially Observable Markov Decision Process model that incorporates game theoretic notions of interactivity. Our generative model captures a broad class of characteristic behaviours in a multi-round Investor-Trustee game. We invert the generative process for a recognition model that is used to classify 200 subjects playing this game against randomly matched opponents. 1
2 0.94570214 186 nips-2008-Probabilistic detection of short events, with application to critical care monitoring
Author: Norm Aleks, Stuart Russell, Michael G. Madden, Diane Morabito, Kristan Staudenmayer, Mitchell Cohen, Geoffrey T. Manley
Abstract: We describe an application of probabilistic modeling and inference technology to the problem of analyzing sensor data in the setting of an intensive care unit (ICU). In particular, we consider the arterial-line blood pressure sensor, which is subject to frequent data artifacts that cause false alarms in the ICU and make the raw data almost useless for automated decision making. The problem is complicated by the fact that the sensor data are averaged over fixed intervals whereas the events causing data artifacts may occur at any time and often have durations significantly shorter than the data collection interval. We show that careful modeling of the sensor, combined with a general technique for detecting sub-interval events and estimating their duration, enables detection of artifacts and accurate estimation of the underlying blood pressure values. Our model’s performance identifying artifacts is superior to two other classifiers’ and about as good as a physician’s. 1
3 0.93659872 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data
Author: Zhengdong Lu, Jeffrey Kaye, Todd K. Leen
Abstract: We develop new techniques for time series classification based on hierarchical Bayesian generative models (called mixed-effect models) and the Fisher kernel derived from them. A key advantage of the new formulation is that one can compute the Fisher information matrix despite varying sequence lengths and varying sampling intervals. This avoids the commonly-used ad hoc replacement of the Fisher information matrix with the identity which destroys the geometric invariance of the kernel. Our construction retains the geometric invariance, resulting in a kernel that is properly invariant under change of coordinates in the model parameter space. Experiments on detecting cognitive decline show that classifiers based on the proposed kernel out-perform those based on generative models and other feature extraction routines, and on Fisher kernels that use the identity in place of the Fisher information.
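For reference, the construction this abstract discusses is the standard Fisher kernel (textbook definitions, not anything specific to this paper): examples are compared through the gradient of the model's log-likelihood, whitened by the Fisher information.

```latex
\begin{align*}
U_x &= \nabla_\theta \log p(x \mid \theta) && \text{(Fisher score)} \\
\mathcal{I} &= \mathbb{E}_x\!\left[\, U_x U_x^\top \right] && \text{(Fisher information)} \\
K(x, x') &= U_x^\top\, \mathcal{I}^{-1}\, U_{x'} && \text{(Fisher kernel)}
\end{align*}
```

The common shortcut $K(x, x') = U_x^\top U_{x'}$ drops $\mathcal{I}^{-1}$, and the resulting kernel changes under a reparameterization $\theta \to \phi(\theta)$; retaining $\mathcal{I}^{-1}$ is what makes the kernel invariant to the choice of coordinates, which is the geometric invariance the abstract refers to.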
same-paper 4 0.86406088 101 nips-2008-Human Active Learning
5 0.8041653 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
Author: Francis R. Bach
Abstract: For supervised and unsupervised learning, positive definite kernels allow the use of large and potentially infinite-dimensional feature spaces with a computational cost that depends only on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1-norm or the block ℓ1-norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to nonlinear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
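As a generic illustration of the penalty in question (a standard multiple-kernel-learning formulation; the regularization weight $\lambda$ and per-kernel weights $d_v$ are assumed notation, not taken from the paper), the block ℓ1-norm over $V$ basis kernels penalizes the unsquared RKHS norms:

```latex
\min_{f_1, \dots, f_V}\;
\sum_{i=1}^{n} \ell\!\Big( y_i,\, \sum_{v=1}^{V} f_v(x_i) \Big)
\;+\; \lambda \sum_{v=1}^{V} d_v\, \lVert f_v \rVert_{\mathcal{H}_v}
```

Because the norms appear unsquared, whole components $f_v$ are driven exactly to zero at the optimum; this is how the penalization performs kernel, and hence variable, selection.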
7 0.79915947 231 nips-2008-Temporal Dynamics of Cognitive Control
8 0.79883027 62 nips-2008-Differentiable Sparse Coding
9 0.79795885 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
10 0.79732823 4 nips-2008-A Scalable Hierarchical Distributed Language Model
11 0.79713947 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
12 0.79626054 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations
13 0.79622233 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models
14 0.79532754 194 nips-2008-Regularized Learning with Networks of Features
15 0.79514617 196 nips-2008-Relative Margin Machines
16 0.79513824 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features
17 0.79436046 226 nips-2008-Supervised Dictionary Learning
18 0.79381043 151 nips-2008-Non-parametric Regression Between Manifolds
19 0.79372007 202 nips-2008-Robust Regression and Lasso
20 0.79371494 113 nips-2008-Kernelized Sorting