nips nips2012 nips2012-97 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yung-kyun Noh, Frank Park, Daniel D. Lee
Abstract: This paper sheds light on some fundamental connections of the diffusion decision making model of neuroscience and cognitive psychology with k-nearest neighbor classification. We show that conventional k-nearest neighbor classification can be viewed as a special problem of the diffusion decision model in the asymptotic situation. By applying the optimal strategy associated with the diffusion decision model, an adaptive rule is developed for determining appropriate values of k in k-nearest neighbor classification. Making use of the sequential probability ratio test (SPRT) and Bayesian analysis, we propose five different criteria for adaptively acquiring nearest neighbors. Experiments with both synthetic and real datasets demonstrate the effectiveness of our classification criteria. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper sheds light on some fundamental connections of the diffusion decision making model of neuroscience and cognitive psychology with k-nearest neighbor classification. [sent-8, score-1.04]
2 We show that conventional k-nearest neighbor classification can be viewed as a special problem of the diffusion decision model in the asymptotic situation. [sent-9, score-0.996]
3 By applying the optimal strategy associated with the diffusion decision model, an adaptive rule is developed for determining appropriate values of k in k-nearest neighbor classification. [sent-10, score-1.155]
4 Making use of the sequential probability ratio test (SPRT) and Bayesian analysis, we propose five different criteria for adaptively acquiring nearest neighbors. [sent-11, score-0.628]
5 1 Introduction The recent interest in understanding human perception and behavior from the perspective of neuroscience and cognitive psychology has spurred a revival of interest in mathematical decision theory. [sent-13, score-0.394]
6 One of the standard interpretations of this theory is that when there is a continuous input of noisy information, a decision becomes certain only after accumulating sufficient information. [sent-14, score-0.326]
7 Among the many theoretical explanations for this phenomenon, the diffusion decision model offers a particularly appealing explanation of how information is accumulated and how the time involved in making a decision affects overall accuracy. [sent-16, score-1.076]
8 The diffusion decision model considers the diffusion of accumulated evidence toward one of the competing choices, and reaches a decision when the evidence meets a pre-defined confidence level. [sent-17, score-1.715]
9 The diffusion decision model successfully explains the distribution of decision times for humans [13, 14, 15]. [sent-18, score-0.96]
10 More recently, this model offers a compelling explanation of the neuronal decision making process in the lateral intraparietal (LIP) area of the brain for perceptual decision making based on visual evidence [2, 11, 16]. [sent-19, score-0.891]
11 The fundamental premise behind this model is that there is a tradeoff between decision times and accuracy, and that both are controlled by the confidence level. [sent-20, score-0.306]
12 Now shifting our attention to machine learning, the well-known k-nearest neighbor classification uses a simple majority voting strategy that, at least in the asymptotic case, implicitly involves a similar tradeoff between time and accuracy. [sent-23, score-0.495]
13 At the same time, there is a natural preference to use fewer resources, or equivalently, a smaller number of nearest neighbors. [sent-25, score-0.379]
14 If one seeks to maximize the accuracy for a given number of total nearest neighbors... Figure 1: Diffusion decision model. [sent-26, score-0.698]
15 The evidence of decision making is accumulated, and it diffuses over time (to the right). [sent-27, score-0.509]
16 Once the accumulated evidence reaches one of the confidence levels of either choice, z or −z, the model stops collecting any more evidence and makes a decision. [sent-28, score-0.443]
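As an illustration of this stopping behavior, the following minimal Python sketch (not from the paper; the drift, noise, and threshold values are illustrative assumptions) accumulates noisy evidence until it reaches one of the confidence levels +z or −z:

    import random

    def diffusion_decision(drift=0.1, noise=1.0, z=5.0, dt=0.01, rng=random):
        # Accumulate noisy evidence until it reaches the confidence level +z or -z.
        x, t = 0.0, 0.0
        while abs(x) < z:
            x += drift * dt + noise * (dt ** 0.5) * rng.gauss(0.0, 1.0)
            t += dt
        return (1 if x > 0 else 2), t  # chosen alternative and decision time

    choice, decision_time = diffusion_decision()

Raising z trades longer decision times for higher accuracy, which is the speed-accuracy tradeoff described above.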
17 In this work, we present a set of simple, theoretically sound criteria for adaptive k-nearest neighbor classification. [sent-31, score-0.466]
18 We first show that the conventional majority voting rule is identical to the diffusion decision model when applied to data from two different Poisson processes. [sent-32, score-1.061]
19 Depending on how the accumulating evidence is defined, it is possible to construct five different criteria based on different statistical tests. [sent-33, score-0.312]
20 Our five criteria are then used as diffusing evidence; once the evidence exceeds a certain confidence level, collection of information can cease and a decision can be made immediately. [sent-36, score-0.637]
21 In particular, all criteria can be cast as a function of the information of only one nearest neighbor in each class. [sent-39, score-0.74]
22 In Section 2, a particular form of the diffusion decision model is reviewed for Poisson processes, and two simple tests based on SPRT are derived. [sent-43, score-0.678]
23 The relationship between k-nearest neighbor classification and diffusion decision making is explained in Section 3. [sent-44, score-0.954]
24 In Section 4, we describe the adaptive k-nearest neighbor classification procedure in terms of the diffusion decision model, and we introduce five different criteria within this context. [sent-45, score-1.144]
25 2 Diffusion Decision Model for Two Poisson Processes The diffusion decision model is a stochastic model for decision making. [sent-47, score-0.96]
26 The model considers the diffusion of an evidence in favor of either of two possible choices by continuously accumulating the information. [sent-48, score-0.586]
27 After initial wavering between the two choices, the evidence finally reaches a level of confidence where a decision is made as in Fig. [sent-49, score-0.476]
28 In mathematical modeling of this diffusion process, Gaussian noise has been predominantly used as a model for zigzagging upon a constant drift toward a choice [3, 13]. [sent-51, score-0.422]
29 In the studies of decision making in the lateral intraparietal (LIP) area of the brain [2, 11], two Poisson processes are assumed to have rate parameters of either λ+ or λ−, where we know that λ+ > λ−, but the exact values are unknown. [sent-53, score-0.457]
30 When determining which Poisson process has the larger rate λ+, a sequential probability ratio test (SPRT) can be used to explain a diffusion decision model [6, 21]. [sent-54, score-0.778]
31 Here, we can consider ∆N = N1 − N2 and ∆T = T1 − T2 as two different forms of evidence in the diffusion decision model. [sent-67, score-0.678]
32 The evidence diffuses as we collect more information, and we come to make a decision once the evidence reaches the confidence levels, ±zN for ∆N , and ±zT for ∆T . [sent-68, score-0.633]
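A minimal sketch of the ∆N evidence for two Poisson processes, assuming illustrative rates and a small-time-step Bernoulli approximation of the Poisson events (z_n stands in for the confidence level zN):

    import random

    def delta_n_sprt(rate1=12.0, rate2=8.0, z_n=5, dt=0.001, rng=random):
        # Count events from two Poisson processes (approximated by one Bernoulli
        # trial per small time step) until the evidence |N1 - N2| reaches z_n.
        n1 = n2 = 0
        while abs(n1 - n2) < z_n:
            n1 += rng.random() < rate1 * dt
            n2 += rng.random() < rate2 * dt
        return 1 if n1 > n2 else 2  # which process is judged to have rate lambda+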
33 Although the ∆N rule has been previously derived and used [21], we propose four more test criteria in this paper, including Eq. [sent-70, score-0.314]
34 Later, we show that the diffusion decision making with these five criteria is related to different methods for k-nearest neighbor classification. [sent-72, score-1.103]
35 3 Equivalence of Diffusion Decision Model and k-Nearest Neighbor Classification Conventional k-nearest neighbor (k-NN) classification takes a majority voting strategy using k nearest neighbors. [sent-73, score-0.87]
36 According to Cover and Hart [4], in the limit of infinite sampling, this simple majority voting rule can produce a fairly low expected error and, furthermore, this error decreases further as a larger k is used. [sent-74, score-0.32]
37 This theoretical result is obtained from the relationship between the k-NN classification error and the optimal Bayes error: the expected error with one nearest neighbor is always less than twice the Bayes error, and the error decreases with k, asymptotically approaching the Bayes error [4]. [sent-75, score-0.591]
38 In this situation, we can claim that the k-NN classification actually performs the aforementioned diffusion decision making for Poisson processes. [sent-76, score-0.742]
39 The identity comes from two equivalence relationships: first, the logical equivalence between two decision rules; second, the equivalence of the distribution of nearest neighbors to the Poisson distribution in the asymptotic situation. [sent-77, score-1.197]
40 1 Equivalent Strategy of Majority Voting Here, we first show an equivalence between the conventional k-NN classification and a novel comparison algorithm. Theorem: For two-class data, we consider the N-th nearest datum of each class from the testing point. [sent-79, score-0.593]
41 With an odd number k, the majority voting rule in k-NN classification is equivalent to the rule of picking the class whose datum has the smaller distance to the testing point, for k = 2N − 1. [sent-80, score-0.537]
42 Proof: Among the k-NNs of a test point, if there are more than or equal to N data having label C, for C ∈ {1, 2}, the test point is classified as class C according to the majority voting because N = (k + 1)/2 > k/2. [sent-81, score-0.289]
43 If we consider three distances: dk to the k-th nearest neighbor among all data, dN,C to the N-th nearest neighbor in class C, and dN,¬C to the N-th nearest neighbor in class non-C, then both dN,C ≤ dk and dN,¬C > dk are satisfied in this case. [sent-82, score-1.93]
44 Therefore, instead of counting the number of nearest neighbors, we can classify a test point by taking the N-th nearest neighbor of each of the two classes and comparing their distances. [sent-85, score-1.085]
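The theorem can be checked numerically; the sketch below assumes synthetic two-class Gaussian data, and all names and parameter values are illustrative rather than taken from the paper:

    import numpy as np

    def knn_majority(X, y, q, k):
        # Conventional k-NN: majority vote among the k nearest training points.
        order = np.argsort(np.linalg.norm(X - q, axis=1))
        votes = y[order[:k]]
        return 1 if np.sum(votes == 1) > np.sum(votes == 2) else 2

    def nth_neighbor_rule(X, y, q, n):
        # Pick the class whose N-th nearest neighbor lies closer to the query.
        d = np.linalg.norm(X - q, axis=1)
        d1 = np.sort(d[y == 1])[n - 1]
        d2 = np.sort(d[y == 2])[n - 1]
        return 1 if d1 < d2 else 2

    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(1, 1, (50, 2))])
    y = np.array([1] * 50 + [2] * 50)
    q = rng.normal(0.5, 1.0, 2)
    N = 3
    print(knn_majority(X, y, q, 2 * N - 1), nth_neighbor_rule(X, y, q, N))  # agree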
45 2 Nearest neighbors as Poisson processes The random generation of data from a particular underlying density function induces a density function of distance to the nearest neighbors. [sent-88, score-0.783]
46 Γ(N + 1) (6) This equation shows that the appearance of nearest neighbors can be approximated with Poisson processes. [sent-93, score-0.646]
47 In other words, with a hypersphere growing at a constant rate in volume, the occurrence of new points within the hypersphere will follow a Poisson distribution. [sent-94, score-0.268]
48 (6) explain data correctly, we can expect the equivalence between the diffusion decision making and k-NN classification. [sent-101, score-0.808]
49 In this case, the nearest neighbors are samples of a Poisson process whose rate parameter λ is the probability density at the test point. [sent-102, score-0.732]
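A quick Monte Carlo check of this Poisson view, under the simplifying assumption of a homogeneous (locally uniform) density in two dimensions; all values are illustrative:

    import numpy as np

    rng = np.random.default_rng(1)
    density, v = 200.0, 0.05              # points per unit area; disk area ("volume")
    r = np.sqrt(v / np.pi)                # radius of a 2-D hypersphere of volume v
    counts = []
    for _ in range(2000):
        pts = rng.random((rng.poisson(density), 2))   # homogeneous points in unit square
        counts.append(int(np.sum(np.linalg.norm(pts - 0.5, axis=1) < r)))
    # For a Poisson count, both mean and variance should be near lambda * v.
    print(np.mean(counts), np.var(counts), density * v)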
50 This connection naturally extends the conventional k-NN classification to an adaptive method that uses different values of k according to the confidence level in the diffusion decision model. [sent-105, score-0.906]
51 4 Criteria for Adaptive k-NN Classification Using the equivalence between the diffusion decision model and k-NN classification, we can extend the conventional majority voting strategy to more sophisticated adaptive strategies. [sent-106, score-1.128]
52 First, the SPRT criteria from the previous section, the ∆N rule and the ∆T rule, can be used. [sent-107, score-0.413]
53 (3), we can use the numbers of nearest neighbors N1 and N2 within a fixed distance d, then compare |∆N | = |N1 − N2 | with a pre-defined confidence level zN . [sent-109, score-0.679]
54 Instead of making an immediate decision, we can collect more nearest neighbors by increasing d until Eq. [sent-110, score-0.358]
55 (4), using the correspondence of time in the original SPRT to the volume within the hypersphere in k-NN classification, we can make two different criteria for adaptive k-NN classification. [sent-114, score-0.423]
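Before turning to the volume-based criteria, here is a minimal sketch of the adaptive ∆N stopping rule just described; scanning training points outward by distance plays the role of increasing d, and z_n is a hypothetical confidence level:

    import numpy as np

    def adaptive_delta_n(X, y, q, z_n=3):
        # Visit training points from nearest to farthest (i.e., grow d) and stop
        # once the class-count difference |N1 - N2| reaches the confidence level.
        order = np.argsort(np.linalg.norm(X - q, axis=1))
        n1 = n2 = 0
        for idx in order:
            n1 += y[idx] == 1
            n2 += y[idx] == 2
            if abs(n1 - n2) >= z_n:
                break
        return (1 if n1 > n2 else 2), n1 + n2   # decision and neighbors consumed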
56 First, we consider the two volumes, V1 and V2, of the hyperspheres containing the N-th nearest neighbors of each class, and the criterion can be rewritten as |V1 − V2| > zV. [sent-115, score-0.47]
57 An additional criterion for the ∆T rule is a more conservative rule using the volume of the (N + 1)-th nearest neighbor hypersphere. [sent-117, score-1.002]
58 Since a hypersphere slightly smaller than this one still contains N nearest neighbors, we can make the same test harder to stop diffusing by replacing the smaller volume in the ∆V rule with the volume of the (N + 1)-th nearest neighbor hypersphere of that class. [sent-118, score-1.665]
59 We refer to this rule as the “Conservative ∆V rule” because it is more cautious in making a decision with this strategy. [sent-119, score-0.478]
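A sketch of the ∆V and Conservative ∆V evidences, assuming sorted per-class neighbor distances are available; the helper names below are our own, not the paper's:

    from math import pi, gamma

    def ball_volume(r, dim):
        # Volume of a dim-dimensional hypersphere of radius r.
        return pi ** (dim / 2) / gamma(dim / 2 + 1) * r ** dim

    def delta_v(d1_sorted, d2_sorted, n, dim, conservative=False):
        # d1_sorted, d2_sorted: ascending distances to neighbors of each class
        # (at least n + 1 entries each when conservative=True).
        v1 = ball_volume(d1_sorted[n - 1], dim)
        v2 = ball_volume(d2_sorted[n - 1], dim)
        if conservative:
            # Replace the smaller volume by that of the (N+1)-th neighbor's
            # hypersphere, making the stopping test harder to satisfy.
            if v1 < v2:
                v1 = ball_volume(d1_sorted[n], dim)
            else:
                v2 = ball_volume(d2_sorted[n], dim)
        return abs(v1 - v2)   # to be compared against the confidence level z_V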
60 In the following section, we show how we can derive these probabilities and how these probabilities can be used as evidence in the diffusion decision making model. [sent-122, score-0.961]
61 (6) are obtained easily:
p(λ|u) = ((u + b)^(N+a) / Γ(N + a)) λ^(N+a−1) exp(−λ(u + b))   (8)
p(λ|N) = ((b + 1)^(N+a) / Γ(N + a)) λ^(N+a−1) exp(−λ(b + 1))   (9)
First, we derive P(λ1 > λ2 | u1, u2) for u1 and u2 obtained using the N-th nearest neighbors in class 1 and class 2. [sent-131, score-0.74]
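Under the Gamma posterior of Eq. (8), P(λ1 > λ2 | u1, u2) can be estimated by direct sampling; a sketch with illustrative prior parameters a and b:

    import numpy as np

    def prob_lambda1_gt_lambda2(u1, u2, n, a=1.0, b=1.0, m=100000, seed=0):
        # Each rate posterior is Gamma with shape n + a and rate u_i + b, per
        # Eq. (8); numpy's gamma sampler takes a scale, i.e. 1 / rate.
        rng = np.random.default_rng(seed)
        l1 = rng.gamma(n + a, 1.0 / (u1 + b), m)
        l2 = rng.gamma(n + a, 1.0 / (u2 + b), m)
        return float(np.mean(l1 > l2))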
62 This probability from the Bayesian approach can be efficiently computed... Figure 2: Decision making process for the nearest neighbor classification with (a) 80% and (b) 90% confidence levels. [sent-134, score-0.655]
63 As the N-th nearest neighbors of the two classes are incrementally collected, the criterion probabilities P(λ1 > λ2 | u1, u2) and P(λ1 < λ2 | u1, u2) are calculated and compared with the confidence level. [sent-138, score-0.779]
64 Unless the probability exceeds the confidence level, the next (N + 1)-th nearest neighbors are collected and the criterion probabilities are calculated again. [sent-139, score-0.769]
65 In this figure, the diffusion of the criterion probability P(λ1 > λ2 | u1, u2) is displayed for different realizations, where the evidence stops diffusing once the criterion passes the threshold, i.e., once enough evidence has accumulated. [sent-140, score-0.867]
66 Using a larger confidence results in less error, but with a concomitant increase in the number of nearest neighbors used. [sent-142, score-0.646]
67 in an incremental fashion, and the nearest neighbor computation can be adaptively stopped once there is enough confidence in the evidence probability. [sent-143, score-0.71]
68 The second probability P (λ1 > λ2 |N1 , N2 ) for the number of nearest neighbors N1 and N2 within a particular distance can be similarly derived. [sent-144, score-0.646]
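The same sampling idea covers this count-based probability, using the Gamma posterior of Eq. (9); again a and b are illustrative prior parameters:

    import numpy as np

    def prob_from_counts(n1, n2, a=1.0, b=1.0, m=100000, seed=0):
        # Per Eq. (9) each rate posterior is Gamma with shape N_i + a and rate
        # b + 1, so P(lambda1 > lambda2 | N1, N2) follows by direct sampling.
        rng = np.random.default_rng(seed)
        l1 = rng.gamma(n1 + a, 1.0 / (b + 1.0), m)
        l2 = rng.gamma(n2 + a, 1.0 / (b + 1.0), m)
        return float(np.mean(l1 > l2))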
69 Stopping criteria for diffusion can be derived using these probabilities. [sent-150, score-0.545]
70 2 Adaptive k-NN Classification Of interest in the diffusion decision model is the relationship between the accuracy and the amount of resources needed to obtain the accuracy. [sent-152, score-0.77]
71 In a diffusion decision setting for k-NN classification, we can control the amount of resources using the confidence level. [sent-153, score-0.733]
72 Fig. 2 shows the decision results of the classification with incrementing N for 1000 realizations, and a few diffusion examples of the evidence probability in Eq. [sent-164, score-0.836]
73 According to the confidence level, the average number of nearest neighbors used differs. [sent-166, score-0.646]
74 Figure 3, panels (a)–(d); legend: PN, DN, PV, CDV, DV, kNN, CNN, Race, minRace, MinMaxRatio, Jigang; axes: classification accuracy versus number of nearest neighbors used. [sent-180, score-0.646]
77 Figure 3: Classification accuracy (vertical axis) versus the average number of nearest neighbors used (horizontal axis) for adaptive k-NN classification. [sent-202, score-1.434]
78 Performance between strategies can be compared using the accuracies as well as the average number of nearest neighbors used. [sent-206, score-0.674]
79 5 Experiments In the experiments, we compare the accuracy of the algorithms to the number of nearest neighbors used, for various confidence levels of the criteria. [sent-207, score-0.707]
80 Adaptive classification includes the comparison rule of the N-th nearest neighbors using three criteria: the ∆V rule (DV), the Conservative ∆V rule (CDV), and the Bayesian probability in Eq. [sent-209, score-1.042]
81 (11) (PV), as well as the comparison rule of the N1-th and N2-th nearest neighbors at a given volume using two rules: the ∆N rule (DN) and the Bayesian probability in Eq. [sent-210, score-0.299]
82 We present the average accuracies resulting from k-NN classification and the five adaptive rules with respect to the average number of nearest neighbors used. [sent-212, score-0.779]
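A sketch of the evaluation loop behind such accuracy-versus-neighbors curves; the stopping rule can be any function returning a label and the number of neighbors consumed, e.g. the adaptive ∆N sketch above (the function and variable names are ours):

    import numpy as np

    def accuracy_vs_neighbors(rule, confidences, X_tr, y_tr, X_te, y_te):
        # For each confidence level, record the average number of nearest
        # neighbors consumed and the resulting classification accuracy.
        curve = []
        for z in confidences:
            out = [rule(X_tr, y_tr, q, z) for q in X_te]
            labels = np.array([o[0] for o in out])
            used = np.array([o[1] for o in out])
            curve.append((used.mean(), float(np.mean(labels == y_te))))
        return curve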
83 2 in 100-dimensional space, and we classified a test point based on the nearest neighbors. [sent-217, score-0.412]
84 In this figure, all algorithms are expected to approach the Bayes performance based on Cover and Hart's approach as the average number of nearest neighbors increases. [sent-218, score-0.646]
85 Except for Jigang's method, the competing algorithms perform poorly, while our five algorithms perform equally well even though they use different information, probably because the performance produced by diffusion decision making algorithms is optimal. [sent-223, score-0.742]
86 Because the underlying density is non-uniform here, the result shows a performance decrease when algorithms use nearest neighbors that are not close to the test point. [sent-230, score-0.432]
87 6 Conclusions In this work, we showed that k-NN classification in the asymptotic limit is equivalent to the diffusion decision model for decision making. [sent-240, score-1.003]
88 Nearest neighbor classification and the diffusion decision model are both very well known models in machine learning and cognitive science, respectively, but the intimate connection between them has not been studied before. [sent-241, score-0.916]
89 Using analysis of Poisson processes, we showed how classification using incrementally increasing nearest neighbors can be mapped to a simple threshold based decision model. [sent-242, score-0.928]
90 In the diffusion decision model, the confidence level plays a key role in determining the tradeoff between speed and accuracy. [sent-243, score-0.735]
91 The notion of confidence can also be applied to nearest neighbor classification to adapt the number of nearest neighbors used in making the classification decision. [sent-244, score-1.301]
92 We presented several different criteria for choosing the appropriate number of nearest neighbors based on the sequential probability ratio test in addition to Bayesian inference. [sent-245, score-0.895]
93 Estimating the posterior probabilities using the k-nearest neighbor rule. [sent-254, score-0.25]
94 The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. [sent-280, score-0.282]
95 A probabilistic nearest neighbour method for statistical pattern recognition. [sent-308, score-0.379]
96 A Bayesian approach to diffusion models of decision-making and response time. [sent-317, score-0.396]
97 Adaptive k-nearest-neighbor classification using a dynamic number of nearest neighbors. [sent-343, score-0.379]
98 The diffusion decision model: theory and data for two-choice decision tasks. [sent-348, score-0.96]
99 A diffusion model account of masking in two-choice letter identification. [sent-354, score-0.396]
100 Optimal decision making on the basis of evidence represented in spike trains. [sent-400, score-0.465]
wordName wordTfidf (topN-words)
[('diffusion', 0.396), ('nearest', 0.379), ('decision', 0.282), ('neighbors', 0.267), ('neighbor', 0.212), ('sprt', 0.197), ('poisson', 0.18), ('jigang', 0.153), ('criteria', 0.149), ('dence', 0.146), ('hypersphere', 0.134), ('rule', 0.132), ('minmaxratio', 0.131), ('minrace', 0.131), ('evidence', 0.119), ('voting', 0.11), ('cdv', 0.109), ('classi', 0.109), ('adaptive', 0.105), ('race', 0.1), ('pv', 0.1), ('cnn', 0.095), ('dn', 0.088), ('erlang', 0.088), ('dv', 0.085), ('hart', 0.08), ('majority', 0.078), ('con', 0.069), ('equivalence', 0.066), ('making', 0.064), ('conventional', 0.063), ('stops', 0.063), ('ve', 0.06), ('diffusing', 0.058), ('criterion', 0.056), ('knn', 0.053), ('density', 0.053), ('accumulated', 0.052), ('cation', 0.052), ('datum', 0.05), ('zn', 0.049), ('pn', 0.049), ('wald', 0.048), ('accumulating', 0.044), ('shadlen', 0.044), ('diffuses', 0.044), ('asymptotic', 0.043), ('cover', 0.042), ('reaches', 0.042), ('bayesian', 0.039), ('incrementing', 0.039), ('bogacz', 0.039), ('ratcliff', 0.039), ('seoul', 0.039), ('probabilities', 0.038), ('bayes', 0.037), ('accuracy', 0.037), ('sequential', 0.036), ('zt', 0.036), ('lip', 0.036), ('volume', 0.035), ('class', 0.035), ('realizations', 0.034), ('churchland', 0.033), ('hanks', 0.033), ('kiani', 0.033), ('intraparietal', 0.033), ('test', 0.033), ('level', 0.033), ('latham', 0.032), ('memoryless', 0.032), ('processes', 0.031), ('psychology', 0.031), ('ratio', 0.031), ('decisions', 0.03), ('exceeds', 0.029), ('conservative', 0.029), ('dk', 0.029), ('resources', 0.029), ('neuroscience', 0.029), ('signals', 0.029), ('bars', 0.029), ('logical', 0.028), ('strategy', 0.028), ('accuracies', 0.028), ('ks', 0.027), ('classes', 0.027), ('collect', 0.027), ('considers', 0.027), ('mathematical', 0.026), ('cognitive', 0.026), ('amount', 0.026), ('densities', 0.025), ('holmes', 0.025), ('collecting', 0.024), ('lateral', 0.024), ('derive', 0.024), ('levels', 0.024), ('tradeoff', 0.024), ('brain', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 97 nips-2012-Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification
Author: Yung-kyun Noh, Frank Park, Daniel D. Lee
Abstract: This paper sheds light on some fundamental connections of the diffusion decision making model of neuroscience and cognitive psychology with k-nearest neighbor classification. We show that conventional k-nearest neighbor classification can be viewed as a special problem of the diffusion decision model in the asymptotic situation. By applying the optimal strategy associated with the diffusion decision model, an adaptive rule is developed for determining appropriate values of k in k-nearest neighbor classification. Making use of the sequential probability ratio test (SPRT) and Bayesian analysis, we propose five different criteria for adaptively acquiring nearest neighbors. Experiments with both synthetic and real datasets demonstrate the effectiveness of our classification criteria. 1
2 0.20507306 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes
Author: Simon Lyons, Amos J. Storkey, Simo Särkkä
Abstract: Stochastic differential equations (SDE) are a natural tool for modelling systems that are inherently noisy or contain uncertainties that can be modelled as stochastic processes. Crucial to the process of using SDE to build mathematical models is the ability to estimate parameters of those models from observed data. Over the past few decades, significant progress has been made on this problem, but we are still far from having a definitive solution. We describe a novel method of approximating a diffusion process that we show to be useful in Markov chain Monte-Carlo (MCMC) inference algorithms. We take the ‘white’ noise that drives a diffusion process and decompose it into two terms. The first is a ‘coloured noise’ term that can be deterministically controlled by a set of auxilliary variables. The second term is small and enables us to form a linear Gaussian ‘small noise’ approximation. The decomposition allows us to take a diffusion process of interest and cast it in a form that is amenable to sampling by MCMC methods. We explain why many state-of-the-art inference methods fail on highly nonlinear inference problems, and we demonstrate experimentally that our method performs well in such situations. Our results show that this method is a promising new tool for use in inference and parameter estimation problems. 1
3 0.14474438 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction
Author: Christoph H. Lampert
Abstract: We study the problem of maximum marginal prediction (MMP) in probabilistic graphical models, a task that occurs, for example, as the Bayes optimal decision rule under a Hamming loss. MMP is typically performed as a two-stage procedure: one estimates each variable’s marginal probability and then forms a prediction from the states of maximal probability. In this work we propose a simple yet effective technique for accelerating MMP when inference is sampling-based: instead of the above two-stage procedure we directly estimate the posterior probability of each decision variable. This allows us to identify the point of time when we are sufficiently certain about any individual decision. Whenever this is the case, we dynamically prune the variables we are confident about from the underlying factor graph. Consequently, at any time only samples of variables whose decision is still uncertain need to be created. Experiments in two prototypical scenarios, multi-label classification and image inpainting, show that adaptive sampling can drastically accelerate MMP without sacrificing prediction accuracy. 1
4 0.13016556 182 nips-2012-Learning Networks of Heterogeneous Influence
Author: Nan Du, Le Song, Ming Yuan, Alex J. Smola
Abstract: Information, disease, and influence diffuse over networks of entities in both natural systems and human society. Analyzing these transmission networks plays an important role in understanding the diffusion processes and predicting future events. However, the underlying transmission networks are often hidden and incomplete, and we observe only the time stamps when cascades of events happen. In this paper, we address the challenging problem of uncovering the hidden network only from the cascades. The structure discovery problem is complicated by the fact that the influence between networked entities is heterogeneous, which can not be described by a simple parametric model. Therefore, we propose a kernelbased method which can capture a diverse range of different types of influence without any prior assumption. In both synthetic and real cascade data, we show that our model can better recover the underlying diffusion network and drastically improve the estimation of the transmission functions among networked entities. 1
5 0.1168644 338 nips-2012-The Perturbed Variation
Author: Maayan Harel, Shie Mannor
Abstract: We introduce a new discrepancy score between two distributions that gives an indication on their similarity. While much research has been done to determine if two samples come from exactly the same distribution, much less research considered the problem of determining if two finite samples come from similar distributions. The new score gives an intuitive interpretation of similarity; it optimally perturbs the distributions so that they best fit each other. The score is defined between distributions, and can be efficiently estimated from samples. We provide convergence bounds of the estimated score, and develop hypothesis testing procedures that test if two data sets come from similar distributions. The statistical power of this procedures is presented in simulations. We also compare the score’s capacity to detect similarity with that of other known measures on real data. 1
6 0.11625334 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model
7 0.10753005 242 nips-2012-Non-linear Metric Learning
8 0.098090857 200 nips-2012-Local Supervised Learning through Space Partitioning
9 0.08697705 81 nips-2012-Context-Sensitive Decision Forests for Object Detection
10 0.082005195 279 nips-2012-Projection Retrieval for Classification
11 0.080889791 148 nips-2012-Hamming Distance Metric Learning
12 0.075317748 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking
13 0.071809255 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering
14 0.071332753 140 nips-2012-Fusion with Diffusion for Robust Visual Tracking
15 0.067883886 265 nips-2012-Parametric Local Metric Learning for Nearest Neighbor Classification
16 0.060045294 171 nips-2012-Latent Coincidence Analysis: A Hidden Variable Model for Distance Metric Learning
17 0.057697512 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes
18 0.056789536 75 nips-2012-Collaborative Ranking With 17 Parameters
19 0.055932526 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC
20 0.054098181 262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L p$ Loss
topicId topicWeight
[(0, 0.171), (1, -0.0), (2, -0.043), (3, 0.011), (4, -0.028), (5, 0.001), (6, 0.01), (7, 0.109), (8, -0.03), (9, 0.034), (10, -0.019), (11, -0.087), (12, 0.101), (13, -0.044), (14, -0.111), (15, -0.013), (16, -0.081), (17, -0.025), (18, 0.026), (19, 0.044), (20, 0.076), (21, 0.048), (22, 0.023), (23, 0.057), (24, 0.048), (25, -0.117), (26, 0.023), (27, -0.028), (28, -0.013), (29, -0.073), (30, -0.047), (31, -0.001), (32, 0.006), (33, -0.054), (34, 0.059), (35, 0.102), (36, 0.074), (37, 0.137), (38, 0.042), (39, 0.24), (40, -0.184), (41, -0.06), (42, 0.033), (43, 0.055), (44, 0.043), (45, -0.008), (46, -0.016), (47, -0.05), (48, -0.05), (49, 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.94409734 97 nips-2012-Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification
Author: Yung-kyun Noh, Frank Park, Daniel D. Lee
Abstract: This paper sheds light on some fundamental connections of the diffusion decision making model of neuroscience and cognitive psychology with k-nearest neighbor classification. We show that conventional k-nearest neighbor classification can be viewed as a special problem of the diffusion decision model in the asymptotic situation. By applying the optimal strategy associated with the diffusion decision model, an adaptive rule is developed for determining appropriate values of k in k-nearest neighbor classification. Making use of the sequential probability ratio test (SPRT) and Bayesian analysis, we propose five different criteria for adaptively acquiring nearest neighbors. Experiments with both synthetic and real datasets demonstrate the effectiveness of our classification criteria. 1
2 0.63296413 140 nips-2012-Fusion with Diffusion for Robust Visual Tracking
Author: Yu Zhou, Xiang Bai, Wenyu Liu, Longin J. Latecki
Abstract: A weighted graph is used as an underlying structure of many algorithms like semisupervised learning and spectral clustering. If the edge weights are determined by a single similarity measure, then it hard if not impossible to capture all relevant aspects of similarity when using a single similarity measure. In particular, in the case of visual object matching it is beneficial to integrate different similarity measures that focus on different visual representations. In this paper, a novel approach to integrate multiple similarity measures is proposed. First pairs of similarity measures are combined with a diffusion process on their tensor product graph (TPG). Hence the diffused similarity of each pair of objects becomes a function of joint diffusion of the two original similarities, which in turn depends on the neighborhood structure of the TPG. We call this process Fusion with Diffusion (FD). However, a higher order graph like the TPG usually means significant increase in time complexity. This is not the case in the proposed approach. A key feature of our approach is that the time complexity of the diffusion on the TPG is the same as the diffusion process on each of the original graphs. Moreover, it is not necessary to explicitly construct the TPG in our framework. Finally all diffused pairs of similarity measures are combined as a weighted sum. We demonstrate the advantages of the proposed approach on the task of visual tracking, where different aspects of the appearance similarity between the target object in frame t − 1 and target object candidates in frame t are integrated. The obtained method is tested on several challenge video sequences and the experimental results show that it outperforms state-of-the-art tracking methods. 1
3 0.63115764 336 nips-2012-The Coloured Noise Expansion and Parameter Estimation of Diffusion Processes
Author: Simon Lyons, Amos J. Storkey, Simo Särkkä
Abstract: Stochastic differential equations (SDE) are a natural tool for modelling systems that are inherently noisy or contain uncertainties that can be modelled as stochastic processes. Crucial to the process of using SDE to build mathematical models is the ability to estimate parameters of those models from observed data. Over the past few decades, significant progress has been made on this problem, but we are still far from having a definitive solution. We describe a novel method of approximating a diffusion process that we show to be useful in Markov chain Monte-Carlo (MCMC) inference algorithms. We take the ‘white’ noise that drives a diffusion process and decompose it into two terms. The first is a ‘coloured noise’ term that can be deterministically controlled by a set of auxilliary variables. The second term is small and enables us to form a linear Gaussian ‘small noise’ approximation. The decomposition allows us to take a diffusion process of interest and cast it in a form that is amenable to sampling by MCMC methods. We explain why many state-of-the-art inference methods fail on highly nonlinear inference problems, and we demonstrate experimentally that our method performs well in such situations. Our results show that this method is a promising new tool for use in inference and parameter estimation problems. 1
4 0.56066811 182 nips-2012-Learning Networks of Heterogeneous Influence
Author: Nan Du, Le Song, Ming Yuan, Alex J. Smola
Abstract: Information, disease, and influence diffuse over networks of entities in both natural systems and human society. Analyzing these transmission networks plays an important role in understanding the diffusion processes and predicting future events. However, the underlying transmission networks are often hidden and incomplete, and we observe only the time stamps when cascades of events happen. In this paper, we address the challenging problem of uncovering the hidden network only from the cascades. The structure discovery problem is complicated by the fact that the influence between networked entities is heterogeneous, which can not be described by a simple parametric model. Therefore, we propose a kernelbased method which can capture a diverse range of different types of influence without any prior assumption. In both synthetic and real cascade data, we show that our model can better recover the underlying diffusion network and drastically improve the estimation of the transmission functions among networked entities. 1
5 0.54264569 242 nips-2012-Non-linear Metric Learning
Author: Dor Kedem, Stephen Tyree, Fei Sha, Gert R. Lanckriet, Kilian Q. Weinberger
Abstract: In this paper, we introduce two novel metric learning algorithms, χ2 -LMNN and GB-LMNN, which are explicitly designed to be non-linear and easy-to-use. The two approaches achieve this goal in fundamentally different ways: χ2 -LMNN inherits the computational benefits of a linear mapping from linear metric learning, but uses a non-linear χ2 -distance to explicitly capture similarities within histogram data sets; GB-LMNN applies gradient-boosting to learn non-linear mappings directly in function space and takes advantage of this approach’s robustness, speed, parallelizability and insensitivity towards the single additional hyperparameter. On various benchmark data sets, we demonstrate these methods not only match the current state-of-the-art in terms of kNN classification error, but in the case of χ2 -LMNN, obtain best results in 19 out of 20 learning settings. 1
6 0.5046435 279 nips-2012-Projection Retrieval for Classification
7 0.47204608 171 nips-2012-Latent Coincidence Analysis: A Hidden Variable Model for Distance Metric Learning
8 0.45155457 73 nips-2012-Coding efficiency and detectability of rate fluctuations with non-Poisson neuronal firing
9 0.45135406 338 nips-2012-The Perturbed Variation
10 0.4453105 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis
11 0.43307504 148 nips-2012-Hamming Distance Metric Learning
12 0.42978454 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model
13 0.42802489 200 nips-2012-Local Supervised Learning through Space Partitioning
14 0.42720196 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction
15 0.42648694 219 nips-2012-Modelling Reciprocating Relationships with Hawkes Processes
16 0.424207 47 nips-2012-Augment-and-Conquer Negative Binomial Processes
17 0.41483939 175 nips-2012-Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data
18 0.41185716 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates
19 0.40190175 265 nips-2012-Parametric Local Metric Learning for Nearest Neighbor Classification
20 0.39350343 81 nips-2012-Context-Sensitive Decision Forests for Object Detection
topicId topicWeight
[(0, 0.035), (17, 0.354), (21, 0.047), (38, 0.095), (42, 0.033), (54, 0.014), (55, 0.02), (74, 0.034), (76, 0.154), (80, 0.101), (92, 0.03)]
simIndex simValue paperId paperTitle
Author: C. M. Niu, Sirish Nandyala, Won J. Sohn, Terence Sanger
Abstract: Our central goal is to quantify the long-term progression of pediatric neurological diseases, such as a typical 10-15 years progression of child dystonia. To this purpose, quantitative models are convincing only if they can provide multi-scale details ranging from neuron spikes to limb biomechanics. The models also need to be evaluated in hyper-time, i.e. significantly faster than real-time, for producing useful predictions. We designed a platform with digital VLSI hardware for multiscale hyper-time emulations of human motor nervous systems. The platform is constructed on a scalable, distributed array of Field Programmable Gate Array (FPGA) devices. All devices operate asynchronously with 1 millisecond time granularity, and the overall system is accelerated to 365x real-time. Each physiological component is implemented using models from well documented studies and can be flexibly modified. Thus the validity of emulation can be easily advised by neurophysiologists and clinicians. For maximizing the speed of emulation, all calculations are implemented in combinational logic instead of clocked iterative circuits. This paper presents the methodology of building FPGA modules emulating a monosynaptic spinal loop. Emulated activities are qualitatively similar to real human data. Also discussed is the rationale of approximating neural circuitry by organizing neurons with sparse interconnections. In conclusion, our platform allows emulating pathological abnormalities such that motor symptoms will emerge and can be analyzed. It compels us to test the origins of childhood motor disorders and predict their long-term progressions. 1 Challenges of studying developmental motor disorders There is currently no quantitative model of how a neuropathological condition, which mainly affects the function of neurons, ends up causing the functional abnormalities identified in clinical examinations. The gap in knowledge is particularly evident for disorders in developing human nervous systems, i.e. childhood neurological diseases. In these cases, the ultimate clinical effect of cellu1 lar injury is compounded by a complex interplay among the child’s injury, development, behavior, experience, plasticity, etc. Qualitative insight has been provided by clinical experiences into the association between particular types of injury and particular types of outcome. Their quantitative linkages, nevertheless, have yet to be created – neither in clinic nor in cellular physiological tests. This discrepancy is significantly more prominent for individual child patients, which makes it very difficult to estimate the efficacy of treatment plans. In order to understand the consequence of injury and discover new treatments, it is necessary to create a modeling toolset with certain design guidelines, such that child neurological diseases can be quantitatively analyzed. Perhaps more than any other organ, the brain necessarily operates on multiple spatial and temporal scales. On the one hand, it is the neurons that perform fundamental computations, but neurons have to interact with large-scale organs (ears, eyes, skeletal muscles, etc.) to achieve global functions. This multi-scale nature worths more attention in injuries, where the overall deficits depend on both the cellular effects of injuries and the propagated consequences. On the other hand, neural processes in developmental diseases usually operate on drastically different time scales, e.g. spinal reflex in milliseconds versus learning in years. 
Thus when studying motor nervous systems, mathematical modeling is convincing only if it can provide multi-scale details, ranging from neuron spikes to limb biomechanics; also the models should be evaluated with time granularity as small as 1 millisecond, meanwhile the evaluation needs to continue trillions of cycles in order to cover years of life. It is particularly challenging to describe the multi-scale nature of human nervous system when modeling childhood movement disorders. Note that for a child who suffered brain injury at birth, the full development of all motor symptoms may easily take more than 10 years. Therefore the millisecondbased model needs to be evaluated significantly faster than real-time, otherwise the model will fail to produce any useful predictions in time. We have implemented realistic models for spiking motoneurons, sensory neurons, neural circuitry, muscle fibers and proprioceptors using VLSI and programmable logic technologies. All models are computed in Field Programmable Gate Array (FPGA) hardware in 365 times real-time. Therefore one year of disease progression can be assessed after one day of emulation. This paper presents the methodology of building the emulation platform. The results demonstrate that our platform is capable of producing physiologically realistic multi-scale signals, which are usually scarce in experiments. Successful emulations enabled by this platform will be used to verify theories of neuropathology. New treatment mechanisms and drug effects can also be emulated before animal experiments or clinical trials. 2 Methodology of multi-scale neural emulation A. Human arm B. Monosynaptic spinal loop C. Inner structure of muscle spindle Gamma Secondary dynamic Gamma output input static Primary input output Bag 1 αMN Bag 2 Chain Figure 1: Illustration of the multi-scale nature of motor nervous system. The motor part of human nervous system is responsible for maintaining body postures and generating voluntary movements. The multi-scale nature of motor nervous system is demonstrated in Fig.1. When the elbow (Fig.1A) is maintaining a posture or performing a movement, a force is established by the involved muscle based on how much spiking excitation the muscle receives from its αmotoneurons (Fig.1B). The α-motoneurons are regulated by a variety of sensory input, part of which comes directly from the proprioceptors in the muscle. As the primary proprioceptor found in skeletal muscles, a muscle spindle is another complex system that has its own microscopic Multiple-InputMultiple-Output structure (Fig.1C). Spindles continuously provide information about the length and lengthening speed of the muscle fiber. A muscle with its regulating motoneurons, sensory neurons and proprioceptors constitutes a monosynaptic spinal loop. This minimalist neurophysiological 2 structure is used as an example for explaining the multi-scale hyper-time emulation in hardware. Additional structures can be added to the backbone set-up using similar methodologies. 2.1 Modularized architecture for multi-scale models Decades of studies on neurophysiology provided an abundance of models characterizing different components of the human motor nervous system. The informational characteristics of physiological components allowed us to model them as functional structures, i.e. each of which converting input signals to certain outputs. 
In particular, within a monosynaptic spinal loop illustrated in Fig.1B, stretching the muscle will elicit a chain of physiological activities in: muscle stretch ⇒ spindle ⇒ sensory neuron ⇒ synapse ⇒ motoneuron ⇒ muscle contraction. The adjacent components must have compatible interfaces, and the interfacing variables must also be physiologically realistic. In our design, each component is mathematically described in Table 1: Table 1: Functional definition of neural models COMPONENT Neuron Synapse Muscle Spindle MATHEMATICAL DEFINITION S(t) = fneuron (I, t) I(t) = fsynapse (S, t) ˙ T (t) = fmuscle (S, L, L, t) ˙ Γdynamic , Γstatic , t) A(t) = fspindle (L, L, all components are modeled as black-box functions that map the inputs to the outputs. The meanings of these mathematical definitions are explained below. This design allows existing physiological models to be easily inserted and switched. In all models the input signals are time-varying, e.g. I = I(t), L = L(t) , etc. The argument of t in input signals are omitted throughout this paper. 2.2 Selection of models for emulation Models were selected in consideration of their computational cost, physiological verisimilitude, and whether it can be adapted to the mathematical form defined in Table 1. Model of Neuron The informational process for a neuron is to take post-synaptic current I as the input, and produce a binary spike train S in the output. The neuron model adopted in the emulation was developed by Izhikevich [1]: = 0.04v 2 + 5v + 140 − u + I = a(bv − u) v u (1) (2) if v = 30 mV, then v ← c, u ← u + d where a, b, c, d are free parameters tuned to achieve certain firing patterns. Membrane potential v directly determines a binary spike train S(t) that S(t) = 1 if v ≥ 30, otherwise S(t) = 0. Note that v in Izhikevich model is in millivolts and time t is in milliseconds. Therefore the coefficients in eq.1 need to be adjusted in correspondence to SI units. Model of Synapse When a pre-synaptic neuron spikes, i.e. S(0) = 1, an excitatory synapse subsequently issues an Excitatory Post-Synaptic Current (EPSC) that drives the post-synaptic neuron. Neural recording of hair cells in rats [2] provided evidence that the time profile of EPSC can be well characterized using the equations below: I(t) = Vm × e t d Vm −τ 0 t − e− τr Vm if t ≥ 0 (3) otherwise The key parameters in a synapse model is the time constants for rising (τr ) and decaying (τd ). In our emulation τr = 0.001 s and τr = 0.003 s. 3 Model of Muscle force and electromyograph (EMG) The primary effect of skeletal muscle is to convert α-motoneuron spikes S into force T , depending ˙ on the muscle’s instantaneous length L and lengthening speed L. We used Hill’s muscle model in the emulation with parameter tuning described in [3]. Another measurable output of muscle is electromyograph (EMG). EMG is the small skin current polarized by motor unit action potential (MUAP) when it travels along muscle fibers. Models exist to describe the typical waveform picked by surface EMG electrodes. In this project we chose to implement the one described in [4]. Model of Proprioceptor Spindle is a sensory organ that provides the main source of proprioceptive information. As can be seen in Fig.1C, a spindle typically produces two afferent outputs (primary Ia and secondary II) ˙ according to its gamma fusimotor drives (Γdynamic and Γstatic ) and muscle states (L and L). There is currently no closed-form models describing spindle functions due to spindle’s significant nonlinearity. 
On representative model that numerically approximates the spindle dynamics was developed by Mileusnic et al. [5]. The model used differential equations to characterize a typical cat soleus spindle. Eqs.4-10 present a subset of this model for one type of spindle fiber (bag1): Γdynamic − x0 /τ Γdynamic + Ω2 bag1 x0 ˙ = x1 ˙ = x2 1 = [TSR − TB − TP R − Γ1 x0 ] M x2 ˙ (4) (5) (6) where TSR TB TP R CSS = KSR (L − x1 − LSR0 ) (7) 0.3 = (B0 + B1 x0 ) · (x1 − R) · CSS · |x2 | = KP R (x1 − LP R0 ) 2 = −1 −1000x2 1+e (8) (9) (10) Eq.8 and 10 suggest that evaluating the spindle model requires multiplication, division as well as more complex arithmetics like polynomials and exponentials. The implementation details are described in Section 3. 2.3 Neuron connectivity with sparse interconnections Although the number of spinal neurons (~1 billion) is significantly less compared to that of cortical neurons (~100 billion), a fully connected spinal network still means approximately 2 trillion synaptic endings [6]. Implementing such a huge number of synapses imposes a major challenge, if not impossible, given limited hardware resource. In this platform we approximated the neural connectivity by sparsely connecting sensory neurons to motoneurons as parallel pathways. We do not attempt to introduce the full connectivity. The rationale is that in a neural control system, the effect of a single neuron can be considered as mapping current state x to change in state x through a band-limited channel. Therefore when a collection of ˙ neurons are firing stochastically, the probability of x depends on both x and the firing behavior s ˙ (s = 1 when spiking, otherwise s = 0) of each neuron, as such: p(x|x, s) = p(x|s = 1)p(s = 1|x) + p(x|s = 0)p(s = 0|x) ˙ ˙ ˙ (11) Eq.11 is a master equation that determines a probability flow on the state. From the Kramers-Moyal expansion we can associate this probability flow with a partial differential equation: ∂ p(x, t) ∂t ∞ − = i=1 ∂ ∂x i D(i) (x)p(x, t) (12) where D(i) (x) is a time-invariant term that modifies the change of probability density based on its i-th gradient. 4 Under certain conditions [7, 8], D(i) (x) for i > 2 all vanish and therefore the probability flow can be described deterministically using a linear operator L: ∂ ∂ ∂ 2 (2) D (x) p(x, t) = Lp(x, t) (13) p(x, t) = − D(1) (x) + ∂t ∂x ∂x2 This means that various Ls can be superimposed to achieve complex system dynamics (illustrated in Fig.2A). B. Equivalent network with sparse interconnections A. Neuron function as superimposed linear operators SN Sensory Input + SN SN SN αMN αMN αMN Motor Output αMN Figure 2: Functions of neuron population can be described as the combination of linear operators (A). Therefore the original neural function can be equivalently produced by sparsely connected neurons formalizing parallel pathways (B). As a consequence, the statistical effect of two fully connected neuron populations is equivalent to ones that are only sparsely connected, as long as the probability flow can be described by the same L. For a movement task, in particular, it is the statistical effect from the neuron ensemble onto skeletal muscles that determines the global behavior. Therefore we argue that it is feasible to approximate the spinal cord connectivity by sparsely interconnecting sensory and motor neurons (Fig.2B). Here a pool of homogenous sensory neurons projects to another pool of homogeneous α-motoneurons. Pseudorandom noise is added to the input of all homogeneous neurons within a population. 
It is worth noting that this approximation significantly reduces the number of synapses that need to be implemented in hardware. 3 Hardware implementation on FPGA We select FPGA as the implementation device due to its inherent parallelism that resembles the nervous system. FPGA is favored over GPU or clustered CPUs because it is relatively easy to network hundreds of nodes under flexible protocols. The platform is distributed on multiple nodes of Xilinx Spartan-6 devices. The interfacing among FPGAs and computers is created using OpalKelly development board XEM6010. The dynamic range of variables is tight in models of Izhikevich neuron, synapse and EMG. This helps maintaining the accuracy of models even when they are evaluated in 32-bit fixed-point arithmetics. The spindle model, in contrast, requires floating-point arithmetics due to its wide dynamic range and complex calculations (see eq.4-10). Hyper-time computations with floating-point numbers are resource consuming and therefore need to be implemented with special attentions. 3.1 Floating-point arithmetics in combinational logic Our arithmetic implementations are compatible with IEEE-754 standard. Typical floating-point arithmetic IP cores are either pipe-lined or based on iterative algorithms such as CORDIC, all of which require clocks to schedule the calculation. In our platform, no clock is provided for model evaluations thus all arithmetics need to be executed in pure combinational logic. Taking advantage of combinational logic allows all model evaluations to be 1) fast, the evaluation time depends entirely on the propagating and settling time of signals, which is on the order of microseconds, and 2) parallel, each model is evaluated on its own circuit without waiting for any other results. Our implementations of adder and multiplier are inspired by the open source project “Free FloatingPoint Madness”, available at http://www.hmc.edu/chips/. Please contact the authors of this paper if the modified code is needed. 5 Fast combinational floating-point division Floating-point division is even more resource demanding than multiplications. We avoided directly implementing the dividing algorithm by approximating it with additions and multiplications. Our approach is inspired by an algorithm described in [9], which provides a good approximation of the inverse square root for any positive number x within one Newton-Raphson iteration: 1 x Q(x) = √ ≈ x(1.5 − · x2 ) 2 x (x > 0) (14) Q(x) can be implemented only using floating-point adders and multipliers. Thereby any division with a positive divisor can be achieved if two blocks of Q(x) are concatenated: a a (15) = √ √ = a · Q(b) · Q(b) (b > 0) b b· b This algorithm has been adjusted to also work with negative divisors (b < 0). Numerical integrators for differential equations Evaluating the instantaneous states of differential equation models require a fixed-step numerical integrator. Backward Euler’s Method was chosen to balance the numerical error and FPGA usage: x ˙ xn+1 = f (x, t) = xn + T f (xn+1 , tn+1 ) (16) (17) where T is the sampling interval. f (x, t) is the derivative function for state variable x. 3.2 Asynchronous spike-based communication between FPGA chips Clock Spike clean count Counter 1 1 2 1 2 3 Figure 3: Timing diagram of asynchronous spike-based communication FPGA nodes are networked by transferring 1-bit binary spikes to each other. Our design allowed the sender and the receiver to operate on independent clocks without having to synchronize. 
The timing diagram of the spike-based communication is shown in Fig.3. The sender issues Spike with a pulse width of 1/(365 × Femu ) second. Each Spike then triggers a counting event on the receiver, meanwhile each Clock first reads the accumulated spike count and subsequently cleans the counter. Note that the phase difference between Spike and Clock is not predictable due to asynchronicity. 3.3 Serialize neuron evaluations within a homogeneous population Different neuron populations are instantiated as standalone circuits. Within in each population, however, homogeneous neurons mentioned in Section 2.3 are evaluated in series in order to optimize FPGA usage. Within each FPGA node all modules operate with a central clock, which is the only source allowed to trigger any updating event. Therefore the maximal number of neurons that can be serialized (Nserial ) is restrained by the following relationship: Ffpga = C × Nserial × 365 × Femu (18) Here Ffpga is the fastest clock rate that a FPGA can operate on; C = 4 is the minimal clock cycles needed for updating each state variable in the on-chip memory; Femu = 1 kHz is the time granularity of emulation (1 millisecond), and 365 × Femu represents 365x real-time. Consider that Xilinx 6 Spartan-6 FPGA devices peaks at 200MHz central clock frequency, the theoretical maximum of neurons that can be serialized is Nserial 200 MHz/(4 × 365 × 1 kHz) ≈ 137 (19) In the current design we choose Nserial = 128. 4 Results: emulated activities of motor nervous system Figure 4 shows the implemented monosynaptic spinal loop in schematics and in operation. Each FPGA node is able to emulate monosynaptic spinal loops consisting of 1,024 sensory and 1,024 motor neurons, i.e. 2,048 neurons in total. The spike-based asynchronous communication is successful between two FPGA nodes. Note that the emulation has to be significantly slowed down for on-line plotting. When the emulation is at full speed (365x real-time) the software front-end is not able to visualize the signals due to limited data throughput. 128 SNs 128 αMNs SN αMN 128 SNs 128 αMNs SN αMN ... 8 parallel pathways 2,048 neurons Figure 4: The neural emulation platform in operation. Left: Neural circuits implemented for each FPGA node including 2,048 neurons. SN = Sensory Neuron; αMN = α-motoneuron. Center: One working FPGA node. Right: Two FPGA nodes networked using asynchronous spiking protocol. The emulation platform successfully created multi-scale information when the muscle is externally stretched (Fig.5A). We also tested if our emulated motor system is able to produce the recruitment order and size principles observed in real physiological data. It has been well known that when a voluntary motor command is sent to the α-motoneuron pool, the motor units are recruited in an order that small ones get recruited first, followed by the big ones [10]. The comparison between our results and real data are shown in Fig.5B, where the top panel shows 20 motor unit activities emulated using our platform, and the bottom panel shows decoded motor unit activities from real human EMG [11]. No qualitative difference was found. 5 Discussion and future work We designed a hardware platform for emulating the multi-scale motor nervous activities in hypertime. We managed to use one node of single Xilinx Spartan-6 FPGA to emulate monosynaptic spinal loops consisting of 2,048 neurons, associated muscles and proprioceptors. The neurons are organized as parallel pathways with sparse interconnections. 
3.2 Asynchronous spike-based communication between FPGA chips

Figure 3: Timing diagram of asynchronous spike-based communication.

FPGA nodes are networked by transferring 1-bit binary spikes to each other. Our design allows the sender and the receiver to operate on independent clocks without having to synchronize. The timing diagram of the spike-based communication is shown in Fig. 3. The sender issues Spike with a pulse width of 1/(365 × F_emu) seconds. Each Spike then triggers a counting event on the receiver; meanwhile, each Clock first reads the accumulated spike count and subsequently cleans the counter. Note that the phase difference between Spike and Clock is not predictable due to asynchronicity.
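Below is a minimal software sketch of this read-then-clean counting handshake, assuming a single sender and receiver; spike_count stands in for the hardware counter, and the function names are illustrative rather than taken from the design.

    #include <stdio.h>

    static unsigned spike_count = 0;      /* receiver-side counter */

    /* Sender side: each 1-bit Spike pulse triggers one counting event. */
    static void on_spike_edge(void)
    {
        ++spike_count;
    }

    /* Receiver side, on each Clock tick: first read the accumulated
       count, then clean the counter. Nothing is assumed about the
       phase between Spike and Clock. */
    static unsigned on_receiver_clock(void)
    {
        unsigned n = spike_count;         /* read */
        spike_count = 0;                  /* clean */
        return n;                         /* spikes since the last tick */
    }

    int main(void)
    {
        on_spike_edge();                  /* two spikes, then a tick */
        on_spike_edge();
        printf("tick 1: %u spike(s)\n", on_receiver_clock());
        on_spike_edge();                  /* one more spike, another tick */
        printf("tick 2: %u spike(s)\n", on_receiver_clock());
        return 0;
    }

In hardware, the read and clean must act atomically with respect to incoming spikes; a threaded software emulation would likewise need the counter updates synchronized.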
3.3 Serializing neuron evaluations within a homogeneous population

Different neuron populations are instantiated as standalone circuits. Within each population, however, the homogeneous neurons mentioned in Section 2.3 are evaluated in series in order to optimize FPGA usage. Within each FPGA node, all modules operate on a central clock, which is the only source allowed to trigger any updating event. Therefore the maximal number of neurons that can be serialized (N_serial) is constrained by the following relationship:

F_fpga = C × N_serial × 365 × F_emu,   (18)

where F_fpga is the fastest clock rate at which an FPGA can operate; C = 4 is the minimal number of clock cycles needed to update each state variable in the on-chip memory; F_emu = 1 kHz is the time granularity of emulation (1 millisecond); and 365 × F_emu represents 365x real-time. Given that Xilinx Spartan-6 FPGA devices peak at a 200 MHz central clock frequency, the theoretical maximum number of neurons that can be serialized is

N_serial ≤ 200 MHz / (4 × 365 × 1 kHz) ≈ 137.   (19)

In the current design we choose N_serial = 128.

4 Results: emulated activities of the motor nervous system

Figure 4 shows the implemented monosynaptic spinal loop in schematics and in operation. Each FPGA node is able to emulate monosynaptic spinal loops consisting of 1,024 sensory and 1,024 motor neurons, i.e. 2,048 neurons in total. The spike-based asynchronous communication is successful between two FPGA nodes. Note that the emulation has to be significantly slowed down for on-line plotting; when the emulation runs at full speed (365x real-time), the software front-end is not able to visualize the signals due to limited data throughput.

Figure 4: The neural emulation platform in operation. Left: Neural circuits implemented for each FPGA node, comprising 2,048 neurons in 8 parallel pathways of 128 sensory neurons and 128 α-motoneurons each. SN = sensory neuron; αMN = α-motoneuron. Center: One working FPGA node. Right: Two FPGA nodes networked using the asynchronous spiking protocol.

The emulation platform successfully created multi-scale information when the muscle is externally stretched (Fig. 5A). We also tested whether our emulated motor system is able to reproduce the recruitment order and size principle observed in real physiological data. It is well known that when a voluntary motor command is sent to the α-motoneuron pool, the motor units are recruited in order of size, with small units recruited first, followed by the big ones [10]. The comparison between our results and real data is shown in Fig. 5B, where the top panel shows 20 motor unit activities emulated using our platform and the bottom panel shows decoded motor unit activities from real human EMG [11]. No qualitative difference was found.

5 Discussion and future work

We designed a hardware platform for emulating multi-scale motor nervous activities in hyper-time. We used one node of a single Xilinx Spartan-6 FPGA to emulate monosynaptic spinal loops consisting of 2,048 neurons, with associated muscles and proprioceptors. The neurons are organized as parallel pathways with sparse interconnections. The emulation is successfully accelerated to 365x real-time. The platform can be scaled by networking multiple FPGA nodes, enabled by the asynchronous spike-based communication protocol. The emulated monosynaptic spinal loops are capable of producing reflex-like activities in response to muscle stretch. Our results on motor unit recruitment order are compatible with physiological data collected from real human subjects.

There is a question of whether this stochastic system turns out to be chaotic, especially with accumulated errors from the backward Euler integrator. Note that the firing property of a neuron population is usually stable even with explicit noise [8], and spindle inputs are measured from real robots, so the integrator errors are corrected at every iteration. To our knowledge, the system is not critically sensitive to initial conditions or integrator errors. This question, however, is both interesting and important for in-depth investigation in the future.

It has been shown [12] that replicating the classic types of spinal interneurons (propriospinal, Ia-excitatory, Ia-inhibitory, Renshaw, etc.) is sufficient to produce stabilizing responses and rapid reaching movements in a wrist. Our platform will introduce those interneurons to describe the known spinal circuitry in further detail. Physiological models will also be refined as needed. For the purpose of modeling movement behavior or diseases, the Izhikevich model strikes a good balance between verisimilitude and computational cost. Nevertheless, when testing drug effects along disease progression, neuron models are expected to cover sufficient molecular detail, including how neurotransmitters affect various ion channels. As programmable semiconductor technology advances, we expect to upgrade our neuron model to the Hodgkin-Huxley type. For the muscle models, Hill-type models do not fit muscle properties accurately enough when the muscle is being shortened, so alternative models will be tested. Other studies have shown that the functional dexterity of human limbs, especially the hands, is critically enabled by tendon configurations and joint geometry [13]. As a result, if our platform is to be used to understand whether known neurophysiology and biomechanics are sufficient to produce able and pathological movements, it will be necessary to use it to control human-like limbs. Since the emulation speed can be flexibly adjusted from arbitrarily slow to 365x real-time, the platform, when set to exactly 1x real-time, will function as a digital controller with a 1 kHz refresh rate.

The main purpose of the emulation is to learn how certain motor disorders progress during childhood development. This first requires the platform to reproduce motor symptoms that are compatible with clinical observations. For example, it has been suggested that muscle spasticity in rats is associated with decreased soma size of α-motoneurons [14], which presumably reduces the firing threshold of those neurons. Thus, when a lower firing threshold is introduced to the emulated motoneuron pool, EMG patterns similar to those in [15] should be observed. It is also necessary for the symptoms to evolve with neural plasticity. In the current version we presume that the structure of each component remains time-invariant. In future work, spike-timing-dependent plasticity (STDP) will be introduced so that all components are subject to temporal modification.

Figure 5: A) Physiological activity emulated by each model (stretch, spindle Ia, sensory post-synaptic current, motoneurons, muscle force, EMG) when the muscle is sinusoidally stretched. B) Comparison of the emulated motor unit recruitment pattern with real experimental data.

Acknowledgments
The authors thank Dr. Gerald Loeb for helping set up the emulation of the spindle models. This project is supported by NIH NINDS grant R01NS069214-02.

References
[1] Izhikevich, E. M. Simple model of spiking neurons. IEEE Transactions on Neural Networks 14, 1569–1572 (2003).
[2] Glowatzki, E. & Fuchs, P. A. Transmitter release at the hair cell ribbon synapse. Nature Neuroscience 5, 147–154 (2002).
[3] Shadmehr, R. & Wise, S. P. A mathematical muscle model. In Supplementary documents for "Computational Neurobiology of Reaching and Pointing", 1–18 (MIT Press, Cambridge, MA, 2005).
[4] Fuglevand, A. J., Winter, D. A. & Patla, A. E. Models of recruitment and rate coding organization in motor-unit pools. Journal of Neurophysiology 70, 2470–2488 (1993).
[5] Mileusnic, M. P., Brown, I. E., Lan, N. & Loeb, G. E. Mathematical models of proprioceptors. I. Control and transduction in the muscle spindle. Journal of Neurophysiology 96, 1772–1788 (2006).
[6] Gelfan, S., Kao, G. & Ruchkin, D. S. The dendritic tree of spinal neurons. The Journal of Comparative Neurology 139, 385–411 (1970).
[7] Sanger, T. D. Neuro-mechanical control using differential stochastic operators. In Engineering in Medicine and Biology Society (EMBC), 2010 Annual International Conference of the IEEE, 4494–4497 (2010).
[8] Sanger, T. D. Distributed control of uncertain systems using superpositions of linear operators. Neural Computation 23, 1911–1934 (2011).
[9] Lomont, C. Fast inverse square root (2003). URL http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf.
[10] Henneman, E. Relation between size of neurons and their susceptibility to discharge. Science 126, 1345–1347 (1957).
[11] De Luca, C. J. & Hostage, E. C. Relationship between firing rate and recruitment threshold of motoneurons in voluntary isometric contractions. Journal of Neurophysiology 104, 1034–1046 (2010).
[12] Raphael, G., Tsianos, G. A. & Loeb, G. E. Spinal-like regulator facilitates control of a two-degree-of-freedom wrist. The Journal of Neuroscience 30, 9431–9444 (2010).
[13] Valero-Cuevas, F. J. et al. The tendon network of the fingers performs anatomical computation at a macroscopic scale. IEEE Transactions on Bio-Medical Engineering 54, 1161–1166 (2007).
[14] Brashear, A. & Elovic, E. Spasticity: Diagnosis and Management (Demos Medical, 2010), 1st edn.
[15] Levin, M. F. & Feldman, A. G. The role of stretch reflex threshold regulation in normal and impaired motor control. Brain Research 657, 23–30 (1994).
2 0.79383934 62 nips-2012-Burn-in, bias, and the rationality of anchoring
Author: Falk Lieder, Thomas Griffiths, Noah Goodman
3 0.79383934 116 nips-2012-Emergence of Object-Selective Features in Unsupervised Feature Learning
Author: Adam Coates, Andrej Karpathy, Andrew Y. Ng
Abstract: Recent work in unsupervised feature learning has focused on the goal of discovering high-level features from unlabeled images. Much progress has been made in this direction, but in most cases it is still standard to use a large amount of labeled data in order to construct detectors sensitive to object classes or other complex patterns in the data. In this paper, we aim to test the hypothesis that unsupervised feature learning methods, provided with only unlabeled data, can learn high-level, invariant features that are sensitive to commonly-occurring objects. Though a handful of prior results suggest that this is possible when each object class accounts for a large fraction of the data (as in many labeled datasets), it is unclear whether something similar can be accomplished when dealing with completely unlabeled data. A major obstacle to this test, however, is scale: we cannot expect to succeed with small datasets or with small numbers of learned features. Here, we propose a large-scale feature learning system that enables us to carry out this experiment, learning 150,000 features from tens of millions of unlabeled images. Based on two scalable clustering algorithms (K-means and agglomerative clustering), we find that our simple system can discover features sensitive to a commonly occurring object class (human faces) and can also combine these into detectors invariant to significant global distortions like large translations and scale. 1
same-paper 4 0.79168975 97 nips-2012-Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification
Author: Yung-kyun Noh, Frank Park, Daniel D. Lee
Abstract: This paper sheds light on some fundamental connections of the diffusion decision making model of neuroscience and cognitive psychology with k-nearest neighbor classification. We show that conventional k-nearest neighbor classification can be viewed as a special problem of the diffusion decision model in the asymptotic situation. By applying the optimal strategy associated with the diffusion decision model, an adaptive rule is developed for determining appropriate values of k in knearest neighbor classification. Making use of the sequential probability ratio test (SPRT) and Bayesian analysis, we propose five different criteria for adaptively acquiring nearest neighbors. Experiments with both synthetic and real datasets demonstrate the effectiveness of our classification criteria. 1
5 0.76674068 283 nips-2012-Putting Bayes to sleep
Author: Dmitry Adamskiy, Manfred K. Warmuth, Wouter M. Koolen
Abstract: We consider sequential prediction algorithms that are given the predictions from a set of models as inputs. If the nature of the data is changing over time in that different models predict well on different segments of the data, then adaptivity is typically achieved by mixing into the weights in each round a bit of the initial prior (kind of like a weak restart). However, what if the favored models in each segment are from a small subset, i.e. the data is likely to be predicted well by models that predicted well before? Curiously, fitting such “sparse composite models” is achieved by mixing in a bit of all the past posteriors. This self-referential updating method is rather peculiar, but it is efficient and gives superior performance on many natural data sets. Also it is important because it introduces a long-term memory: any model that has done well in the past can be recovered quickly. While Bayesian interpretations can be found for mixing in a bit of the initial prior, no Bayesian interpretation is known for mixing in past posteriors. We build atop the “specialist” framework from the online learning literature to give the Mixing Past Posteriors update a proper Bayesian foundation. We apply our method to a well-studied multitask learning problem and obtain a new intriguing efficient update that achieves a significantly better bound. 1
6 0.66719311 135 nips-2012-Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach
7 0.60989368 23 nips-2012-A lattice filter model of the visual pathway
8 0.60310239 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding
9 0.56976527 341 nips-2012-The topographic unsupervised learning of natural sounds in the auditory cortex
10 0.56515038 345 nips-2012-Topic-Partitioned Multinetwork Embeddings
11 0.56290084 190 nips-2012-Learning optimal spike-based representations
12 0.55842143 153 nips-2012-How Prior Probability Influences Decision Making: A Unifying Probabilistic Model
13 0.55748719 77 nips-2012-Complex Inference in Neural Circuits with Probabilistic Population Codes and Topic Models
14 0.55738449 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes
15 0.55617476 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations
16 0.55454981 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization
17 0.55049062 279 nips-2012-Projection Retrieval for Classification
18 0.54969591 238 nips-2012-Neurally Plausible Reinforcement Learning of Working Memory Tasks
19 0.54823476 273 nips-2012-Predicting Action Content On-Line and in Real Time before Action Onset – an Intracranial Human Study
20 0.54764485 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model