nips nips2001 nips2001-83 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shun-ichi Amari, Hyeyoung Park, Tomoko Ozeki
Abstract: Singularities are ubiquitous in the parameter space of hierarchical models such as multilayer perceptrons. At singularities, the Fisher information matrix degenerates and the Cramér-Rao paradigm no longer holds, implying that classical model selection theory such as AIC and MDL cannot be applied. It is important to study the relation between the generalization error and the training error at singularities. The present paper demonstrates a method of analyzing these errors, both for the maximum likelihood estimator and for the Bayesian predictive distribution, in terms of Gaussian random fields, using simple models. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Singularities are ubiquitous in the parameter space of hierarchical models such as multilayer perceptrons. [sent-4, score-0.374]
2 At singularities, the Fisher information matrix degenerates and the Cramér-Rao paradigm no longer holds, implying that classical model selection theory such as AIC and MDL cannot be applied. [sent-5, score-0.172]
3 It is important to study the relation between the generalization error and the training error at singularities. [sent-6, score-0.093]
4 The present paper demonstrates a method of analyzing these errors, both for the maximum likelihood estimator and for the Bayesian predictive distribution, in terms of Gaussian random fields, using simple models. [sent-7, score-0.294]
5 1 Introduction A neural network is specified by a number of parameters which are synaptic weights and biases. [sent-8, score-0.067]
6 Learning takes place by modifying these parameters on the basis of observed input-output examples. [sent-9, score-0.073]
7 Then, a network is represented by a point in the parameter space $S$, where $\theta$ plays the role of a coordinate system. [sent-14, score-0.124]
8 A learning process is represented by a trajectory in the neuromanifold. [sent-16, score-0.026]
9 The dynamical behavior of learning is known to be very slow, because of the plateau phenomenon. [sent-17, score-0.105]
10 The statistical-physics method [1] has made it clear that plateaus are ubiquitous in large-scale perceptrons. [sent-18, score-0.22]
11 In order to improve the dynamics of learning, the natural gradient learning method has been introduced, which takes the Riemannian geometrical structure of the neuromanifold into account [2, 3]. [sent-19, score-0.468]
12 Its adaptive version, in which the inverse of the Fisher information matrix is estimated adaptively, has been shown by computer simulations to perform excellently [4, 5]. [sent-20, score-0.145]
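To make the idea concrete, here is a minimal sketch of natural gradient learning for a toy regression model. The one-hidden-unit tanh network, the running Fisher estimate built from score outer products, and all step sizes are assumptions made for this illustration; this is not the adaptive algorithm of [4, 5] itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3
theta_true = np.array([1.0, -0.5, 0.8, 1.2])   # (w, v) of the teacher network

def predict(theta, x):
    w, v = theta[:-1], theta[-1]
    return v * np.tanh(w @ x)

def score(theta, x, y):
    """Gradient of the per-sample log-likelihood (unit noise variance)."""
    w, v = theta[:-1], theta[-1]
    u = np.tanh(w @ x)
    r = y - v * u
    return np.concatenate([r * v * (1.0 - u**2) * x, [r * u]])

theta = 0.1 * rng.normal(size=d + 1)
G = np.eye(d + 1)            # running estimate of the Fisher information
eta, rho = 0.05, 0.01        # learning rate and averaging rate (assumed values)
for t in range(5000):
    x = rng.normal(size=d)
    y = predict(theta_true, x) + rng.normal()
    g = score(theta, x, y)
    G = (1.0 - rho) * G + rho * np.outer(g, g)                    # adapt the Fisher estimate
    theta += eta * np.linalg.solve(G + 1e-6 * np.eye(d + 1), g)   # natural gradient step
```

The only change relative to ordinary stochastic gradient ascent is the preconditioning of the score by the (regularized) inverse of the estimated Fisher information.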
13 Because of the symmetry in the architecture of multilayer perceptrons, the parameter space of the MLP admits an equivalence relation [6, 7]. [sent-21, score-0.541]
14 The quotient of the parameter space by this equivalence relation gives rise to singularities in the neuromanifold, and plateaus exist at such singularities [8]. [sent-22, score-1.49]
15 The Fisher information matrix becomes singular at singularities, so that the neuromanifold is strongly curved, like a spacetime containing black holes. [sent-23, score-0.595]
16 In the neighborhood of singularities, the Fisher-Cramér-Rao paradigm does not hold, and the estimator is no longer subject to a Gaussian distribution, even asymptotically. [sent-24, score-0.224]
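The degeneracy is easy to observe numerically. The sketch below uses an assumed one-hidden-unit tanh regression model with unit Gaussian noise and estimates the Fisher information $G = E[\nabla f \, \nabla f^\top]$ at a generic point and at a point with $v = 0$; at the latter, the eigenvalues corresponding to the directions of $w$ collapse to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3

def grad_f(theta, x):
    """Gradient of f(x; theta) = v * tanh(w @ x) with respect to (w, v)."""
    w, v = theta[:-1], theta[-1]
    u = np.tanh(w @ x)
    return np.concatenate([v * (1.0 - u**2) * x, [u]])

def empirical_fisher(theta, n=20000):
    # With unit Gaussian output noise, the Fisher information is E[grad_f grad_f^T].
    X = rng.normal(size=(n, d))
    J = np.array([grad_f(theta, x) for x in X])
    return J.T @ J / n

regular  = np.concatenate([rng.normal(size=d), [1.0]])   # generic parameter point
singular = np.concatenate([rng.normal(size=d), [0.0]])   # on the singular set v = 0
print(np.linalg.eigvalsh(empirical_fisher(regular)))     # all eigenvalues positive
print(np.linalg.eigvalsh(empirical_fisher(singular)))    # d eigenvalues (numerically) vanish
```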
17 The AIC and MDL criteria for model selection rely on the Gaussian paradigm, so they are not appropriate here. [sent-26, score-0.028]
18 The problem was first pointed out by Hagiwara et al. [sent-27, score-0.034]
19 Watanabe [10] applied algebraic geometry to elucidate the behavior of the Bayesian predictive estimator in MLPs, showing a sharp difference between regular and singular cases. [sent-29, score-0.664]
20 Fukumizu [11] gives a general analysis of the maximum likelihood estimator in singular statistical models, including multilayer perceptrons. [sent-30, score-0.478]
21 The present paper is a first step toward elucidating the effects of singularities in the neuromanifold of multilayer perceptrons. [sent-31, score-1.447]
22 We use a simple cone model to elucidate how the behaviors of the maximum likelihood estimator and the Bayes predictive distribution differ from those in the regular case. [sent-32, score-0.852]
23 To this end, we introduce the Gaussian random field [11, 12, 13], and analyze the generalization error and training error for both the MLE (maximum likelihood estimator) and the Bayes estimator. [sent-33, score-0.09]
24 2 Topology of the neuromanifold Let us consider an MLP with $h$ hidden units and one output unit, $y = \sum_{i=1}^{h} v_i\,\varphi(w_i \cdot x) + n$. (1) [sent-34, score-0.52]
25 Let us summarize all the parameters in a single parameter vector $\theta = (w_1, \ldots, w_h;\, v_1, \ldots, v_h)$ and write $f(x; \theta) = \sum_{i=1}^{h} v_i\,\varphi(w_i \cdot x)$. (2) [sent-36, score-0.073]
26 Then, $\theta$ is a coordinate system of the neuromanifold. [sent-37, score-0.04]
27 Because of the noise, the input-output relation is stochastic, given by the conditional probability distribution $p(y \mid x, \theta) = \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}\bigl(y - f(x;\theta)\bigr)^2\right\}$, (3) where the scale of the noise is normalized to 1. [sent-38, score-0.093]
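For concreteness, equations (1)-(3) can be written down directly in code. The choice $\varphi = \tanh$ and the array shapes below are assumptions made for the example, not something fixed by the paper.

```python
import numpy as np

def f(x, W, v, phi=np.tanh):
    """Eq. (2): f(x; theta) = sum_i v_i * phi(w_i . x); W stacks the w_i as rows."""
    return v @ phi(W @ x)

def log_p(y, x, W, v):
    """Eq. (3): log p(y | x, theta) with unit-variance Gaussian noise."""
    r = y - f(x, W, v)
    return -0.5 * r**2 - 0.5 * np.log(2.0 * np.pi)

# Tiny usage example with h = 2 hidden units and 3 inputs (assumed sizes).
rng = np.random.default_rng(0)
W, v = rng.normal(size=(2, 3)), rng.normal(size=2)
x = rng.normal(size=3)
y = f(x, W, v) + rng.normal()      # Eq. (1): observed output = f(x; theta) + noise
print(log_p(y, x, W, v))
```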
28 Each point in the neuromanifold represents a neural network or its probability distribution. [sent-39, score-0.443]
29 It is known that the behavior of an MLP is invariant under 1) permutations of the hidden units, and 2) a simultaneous sign change of both $w_i$ and $v_i$. [sent-40, score-0.24]
30 Two networks are equivalent when one is mapped to the other by any of the above operations, which form a group. [sent-41, score-0.029]
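A quick numerical check of these two invariances (a sketch only; it assumes an odd activation $\varphi = \tanh$, for which the simultaneous sign flip leaves $v_i\varphi(w_i \cdot x)$ unchanged):

```python
import numpy as np

rng = np.random.default_rng(2)
h, d = 3, 4
W, v = rng.normal(size=(h, d)), rng.normal(size=h)
X = rng.normal(size=(100, d))

def f(X, W, v, phi=np.tanh):
    return phi(X @ W.T) @ v

perm = rng.permutation(h)                 # 1) permute the hidden units
sign = np.array([1.0, -1.0, 1.0])         # 2) flip the sign of (w_i, v_i) for unit i = 1
assert np.allclose(f(X, W, v), f(X, W[perm], v[perm]))
assert np.allclose(f(X, W, v), f(X, sign[:, None] * W, sign * v))
```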
31 Hence, it is natural to treat the residual space $S/\!\sim$, where $\sim$ is the equivalence relation. [sent-42, score-0.138]
32 There are some points which are invariant under some nontrivial isotropy subgroup, and singularities occur at such points. [sent-43, score-0.692]
33 When $v_i = 0$, $v_i\,\varphi(w_i \cdot x) = 0$, so that all the points on the submanifold $v_i = 0$ are equivalent, whatever $w_i$ is. [sent-44, score-0.161]
34 Hence, in $M = S/\!\sim$, all of these points are reduced to one and the same point. [sent-46, score-0.029]
35 When $w_i = w_j$ holds, these two units may be merged into one, and when $v_i + v_j$ is the same, the two points are equivalent even when they differ in $v_i - v_j$. [sent-47, score-0.16]
36 Hence, the dimension reduction takes place in the subspace satisfying $w_i = w_j$. [sent-48, score-0.038]
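The two kinds of non-identifiability can likewise be checked numerically. The sketch below, with $\varphi = \tanh$ and a two-hidden-unit network as assumptions, shows that on $w_1 = w_2$ only the sum $v_1 + v_2$ matters, and that on $v_2 = 0$ the weight $w_2$ is irrelevant.

```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
X = rng.normal(size=(200, d))

def f(X, W, v, phi=np.tanh):
    return phi(X @ W.T) @ v

w = rng.normal(size=d)

# On the submanifold w_1 = w_2, only v_1 + v_2 is identifiable:
W = np.stack([w, w])
assert np.allclose(f(X, W, np.array([0.7, 0.3])),
                   f(X, W, np.array([0.2, 0.8])))   # same sum v_1 + v_2 = 1

# On the submanifold v_2 = 0, the weight w_2 is irrelevant:
W1 = np.stack([w, rng.normal(size=d)])
W2 = np.stack([w, rng.normal(size=d)])
assert np.allclose(f(X, W1, np.array([1.0, 0.0])),
                   f(X, W2, np.array([1.0, 0.0])))
```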
37 Such singularities occur on the critical submanifolds of the two types, $\{v_i = 0\}$ and $\{w_i = w_j\}$. (4) 3 Simple toy models Given training data, the parameters of the neural network are estimated or trained by learning. [sent-49, score-0.674]
38 It is important to elucidate the effects of singularities on learning or estimation. [sent-50, score-0.79]
39 One is a very simple multilayer perceptron having only one hidden unit. [sent-52, score-0.042]
40 The other is a simple cone model: let $x$ be a Gaussian random variable, $x \in \mathbf{R}^{d+2}$, with mean $\mu$ and identity covariance matrix $I$, (5) and let $S = \{\mu \mid \mu \in \mathbf{R}^{d+2}\}$ be the parameter space. [sent-53, score-0.334]
41 The cone model $M$ is a subset of $S$, embedded as $M:\ \mu = \frac{\xi}{\sqrt{1+c^2}}\,\bigl(1,\ c\,a(\omega)\bigr)$, (6) where $c$ is a constant, $\|a(\omega)\| = 1$, $\omega \in S^d$, and $S^d$ is a $d$-dimensional unit sphere. [sent-54, score-0.287]
42 When $d = 1$, $S^1$ is a circle, so that $\omega$ is replaced by the angle $\theta$, and we have $\mu = \frac{\xi}{\sqrt{1+c^2}}\,(1,\ c\cos\theta,\ c\sin\theta)$. See Figure 1. [sent-55, score-0.029]
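A small sketch of the cone model for $d = 1$, using the parametrization reconstructed above; the constant $c = 2$ and the finite-difference Jacobian are assumptions made for the illustration. At the apex $\xi = 0$ all angles $\theta$ map to the same mean, and the Fisher information $J^\top J$ of the Gaussian family $N(\mu, I)$ loses rank there.

```python
import numpy as np

c = 2.0    # assumed constant

def mu(xi, theta):
    """Cone embedding for d = 1: mu(xi, theta) lies in R^3 = R^(d+2)."""
    return xi / np.sqrt(1.0 + c**2) * np.array([1.0, c * np.cos(theta), c * np.sin(theta)])

def jacobian(xi, theta, eps=1e-6):
    """Finite-difference Jacobian of mu with respect to (xi, theta)."""
    return np.stack([(mu(xi + eps, theta) - mu(xi - eps, theta)) / (2 * eps),
                     (mu(xi, theta + eps) - mu(xi, theta - eps)) / (2 * eps)], axis=1)

print(mu(0.0, 0.0), mu(0.0, 1.5))                 # identical points: the singularity at xi = 0
print(np.linalg.matrix_rank(jacobian(1.0, 0.3)))  # rank 2 away from the apex
print(np.linalg.matrix_rank(jacobian(0.0, 0.3)))  # rank 1 at the apex
```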
wordName wordTfidf (topN-words)
[('singularities', 0.575), ('neuromanifold', 0.403), ('multilayer', 0.254), ('cone', 0.228), ('vi', 0.191), ('elucidate', 0.182), ('wi', 0.157), ('mlp', 0.152), ('singular', 0.122), ('estimator', 0.117), ('plateaus', 0.115), ('tomoko', 0.115), ('relation', 0.093), ('behaviors', 0.085), ('mdl', 0.085), ('equivalence', 0.082), ('paradigm', 0.08), ('fisher', 0.076), ('ubiquitous', 0.076), ('predictive', 0.072), ('perceptrons', 0.07), ('hold', 0.067), ('geometrical', 0.065), ('regular', 0.063), ('hidden', 0.061), ('toy', 0.059), ('units', 0.056), ('amari', 0.055), ('watanabe', 0.05), ('residue', 0.05), ('hirosawa', 0.05), ('subgroup', 0.05), ('si', 0.047), ('nontrivial', 0.046), ('merged', 0.046), ('ylx', 0.046), ('mle', 0.046), ('ron', 0.046), ('ies', 0.046), ('wako', 0.046), ('wl', 0.046), ('parameter', 0.044), ('likelihood', 0.044), ('whatever', 0.042), ('percept', 0.042), ('permutations', 0.042), ('vh', 0.042), ('riemannian', 0.042), ('adaptively', 0.042), ('invariant', 0.042), ('gaussian', 0.04), ('network', 0.04), ('coordinate', 0.04), ('plateau', 0.04), ('vl', 0.04), ('wj', 0.04), ('attack', 0.04), ('saitama', 0.04), ('behavior', 0.039), ('curved', 0.038), ('algebraic', 0.038), ('admits', 0.038), ('riken', 0.038), ('place', 0.038), ('wh', 0.036), ('modifying', 0.035), ('vj', 0.035), ('pointed', 0.034), ('sub', 0.034), ('topology', 0.034), ('effects', 0.033), ('bayes', 0.033), ('implying', 0.032), ('matrix', 0.032), ('park', 0.031), ('sharp', 0.031), ('circle', 0.031), ('maximum', 0.031), ('japan', 0.03), ('symmetry', 0.03), ('simple', 0.03), ('perceptron', 0.03), ('residual', 0.03), ('equivalent', 0.029), ('summarize', 0.029), ('fields', 0.029), ('angle', 0.029), ('physical', 0.029), ('points', 0.029), ('hence', 0.029), ('unit', 0.028), ('selection', 0.028), ('excellent', 0.028), ('estimators', 0.027), ('synaptic', 0.027), ('manifold', 0.027), ('neighborhood', 0.027), ('treat', 0.026), ('dynamical', 0.026), ('trajectory', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 83 nips-2001-Geometrical Singularities in the Neuromanifold of Multilayer Perceptrons
Author: Shun-ichi Amari, Hyeyoung Park, Tomoko Ozeki
Abstract: Singularities are ubiquitous in the parameter space of hierarchical models such as multilayer perceptrons. At singularities, the Fisher information matrix degenerates and the Cramér-Rao paradigm no longer holds, implying that classical model selection theory such as AIC and MDL cannot be applied. It is important to study the relation between the generalization error and the training error at singularities. The present paper demonstrates a method of analyzing these errors, both for the maximum likelihood estimator and for the Bayesian predictive distribution, in terms of Gaussian random fields, using simple models. 1
2 0.055724356 76 nips-2001-Fast Parameter Estimation Using Green's Functions
Author: K. Wong, F. Li
Abstract: We propose a method for the fast estimation of hyperparameters in large networks, based on the linear response relation in the cavity method, and an empirical measurement of the Green's function. Simulation results show that it is efficient and precise, when compared with cross-validation and other techniques which require matrix inversion. 1
3 0.04832188 35 nips-2001-Analysis of Sparse Bayesian Learning
Author: Anita C. Faul, Michael E. Tipping
Abstract: The recent introduction of the 'relevance vector machine' has effectively demonstrated how sparsity may be obtained in generalised linear models within a Bayesian framework. Using a particular form of Gaussian parameter prior, 'learning' is the maximisation, with respect to hyperparameters, of the marginal likelihood of the data. This paper studies the properties of that objective function, and demonstrates that conditioned on an individual hyperparameter, the marginal likelihood has a unique maximum which is computable in closed form. It is further shown that if a derived 'sparsity criterion' is satisfied, this maximum is exactly equivalent to 'pruning' the corresponding parameter from the model. 1
4 0.046625286 16 nips-2001-A Parallel Mixture of SVMs for Very Large Scale Problems
Author: Ronan Collobert, Samy Bengio, Yoshua Bengio
Abstract: Support Vector Machines (SVMs) are currently the state-of-the-art models for many classification problems, but they suffer from the complexity of their training algorithm, which is at least quadratic with respect to the number of examples. Hence, it is hopeless to try to solve real-life problems having more than a few hundreds of thousands of examples with SVMs. The present paper proposes a new mixture of SVMs that can be easily implemented in parallel and where each SVM is trained on a small subset of the whole dataset. Experiments on a large benchmark dataset (Forest) as well as a difficult speech database yielded significant time improvement (time complexity appears empirically to grow locally linearly with the number of examples). In addition, and that is a surprise, a significant improvement in generalization was observed on Forest. 1
5 0.045577236 72 nips-2001-Exact differential equation population dynamics for integrate-and-fire neurons
Author: Julian Eggert, Berthold Bäuml
Abstract: Mesoscopical, mathematical descriptions of dynamics of populations of spiking neurons are getting increasingly important for the understanding of large-scale processes in the brain using simulations. In our previous work, integral equation formulations for population dynamics have been derived for a special type of spiking neurons. For Integrate- and- Fire type neurons , these formulations were only approximately correct. Here, we derive a mathematically compact, exact population dynamics formulation for Integrate- and- Fire type neurons. It can be shown quantitatively in simulations that the numerical correspondence with microscopically modeled neuronal populations is excellent. 1 Introduction and motivation The goal of the population dynamics approach is to model the time course of the collective activity of entire populations of functionally and dynamically similar neurons in a compact way, using a higher descriptionallevel than that of single neurons and spikes. The usual observable at the level of neuronal populations is the populationaveraged instantaneous firing rate A(t), with A(t)6.t being the number of neurons in the population that release a spike in an interval [t, t+6.t). Population dynamics are formulated in such a way, that they match quantitatively the time course of a given A(t), either gained experimentally or by microscopical, detailed simulation. At least three main reasons can be formulated which underline the importance of the population dynamics approach for computational neuroscience. First, it enables the simulation of extensive networks involving a massive number of neurons and connections, which is typically the case when dealing with biologically realistic functional models that go beyond the single neuron level. Second, it increases the analytical understanding of large-scale neuronal dynamics , opening the way towards better control and predictive capabilities when dealing with large networks. Third, it enables a systematic embedding of the numerous neuronal models operating at different descriptional scales into a generalized theoretic framework, explaining the relationships, dependencies and derivations of the respective models. Early efforts on population dynamics approaches date back as early as 1972, to the work of Wilson and Cowan [8] and Knight [4], which laid the basis for all current population-averaged graded-response models (see e.g. [6] for modeling work using these models). More recently, population-based approaches for spiking neurons were developed, mainly by Gerstner [3, 2] and Knight [5]. In our own previous work [1], we have developed a theoretical framework which enables to systematize and simulate a wide range of models for population-based dynamics. It was shown that the equations of the framework produce results that agree quantitatively well with detailed simulations using spiking neurons, so that they can be used for realistic simulations involving networks with large numbers of spiking neurons. Nevertheless, for neuronal populations composed of Integrate-and-Fire (I&F;) neurons, this framework was only correct in an approximation. In this paper, we derive the exact population dynamics formulation for I&F; neurons. This is achieved by reducing the I&F; population dynamics to a point process and by taking advantage of the particular properties of I&F; neurons. 
2 2.1 Background: Integrate-and-Fire dynamics Differential form We start with the standard Integrate- and- Fire (I&F;) model in form of the wellknown differential equation [7] (1) which describes the dynamics of the membrane potential Vi of a neuron i that is modeled as a single compartment with RC circuit characteristics. The membrane relaxation time is in this case T = RC with R being the membrane resistance and C the membrane capacitance. The resting potential v R est is the stationary potential that is approached in the no-input case. The input arriving from other neurons is described in form of a current ji. In addition to eq. (1), which describes the integrate part of the I&F; model, the neuronal dynamics are completed by a nonlinear step. Every time the membrane potential Vi reaches a fixed threshold () from below, Vi is lowered by a fixed amount Ll > 0, and from the new value of the membrane potential integration according to eq. (1) starts again. if Vi(t) = () (from below) . (2) At the same time, it is said that the release of a spike occurred (i.e., the neuron fired), and the time ti = t of this singular event is stored. Here ti indicates the time of the most recent spike. Storing all the last firing times , we gain the sequence of spikes {t{} (spike ordering index j, neuronal index i). 2.2 Integral form Now we look at the single neuron in a neuronal compound. We assume that the input current contribution ji from presynaptic spiking neurons can be described using the presynaptic spike times tf, a response-function ~ and a connection weight W¡ . ',J ji(t) = Wi ,j ~(t - tf) (3) l: l: j f Integrating the I&F; equation (1) beginning at the last spiking time tT, which determines the initial condition by Vi(ti) = vi(ti - 0) - 6., where vi(ti - 0) is the membrane potential just before the neuron spikes, we get 1 Vi(t) = v Rest + fj(t - t:) + l: Wi ,j l: a(t - t:; t - tf) , j - Vi(t:)) e- S / T (4) f with the refractory function fj(s) = - (v Rest (5) and the alpha-function r ds
6 0.043157954 15 nips-2001-A New Discriminative Kernel From Probabilistic Models
7 0.042754564 84 nips-2001-Global Coordination of Local Linear Models
8 0.040999509 67 nips-2001-Efficient Resources Allocation for Markov Decision Processes
9 0.040672231 155 nips-2001-Quantizing Density Estimators
10 0.038830034 13 nips-2001-A Natural Policy Gradient
11 0.037191298 52 nips-2001-Computing Time Lower Bounds for Recurrent Sigmoidal Neural Networks
12 0.037073482 98 nips-2001-Information Geometrical Framework for Analyzing Belief Propagation Decoder
13 0.036699187 186 nips-2001-The Noisy Euclidean Traveling Salesman Problem and Learning
14 0.036464997 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
15 0.03630767 88 nips-2001-Grouping and dimensionality reduction by locally linear embedding
16 0.03577397 43 nips-2001-Bayesian time series classification
17 0.035676051 86 nips-2001-Grammatical Bigrams
18 0.035560925 21 nips-2001-A Variational Approach to Learning Curves
19 0.034342486 96 nips-2001-Information-Geometric Decomposition in Spike Analysis
20 0.034106281 9 nips-2001-A Generalization of Principal Components Analysis to the Exponential Family
topicId topicWeight
[(0, -0.115), (1, -0.018), (2, -0.01), (3, -0.041), (4, 0.004), (5, -0.019), (6, 0.029), (7, -0.019), (8, -0.017), (9, -0.007), (10, -0.017), (11, 0.036), (12, 0.024), (13, -0.056), (14, 0.04), (15, 0.002), (16, -0.009), (17, -0.056), (18, 0.07), (19, 0.001), (20, -0.002), (21, 0.035), (22, 0.079), (23, 0.052), (24, 0.0), (25, 0.002), (26, -0.004), (27, 0.002), (28, -0.08), (29, 0.015), (30, 0.103), (31, -0.016), (32, 0.048), (33, -0.042), (34, -0.019), (35, 0.032), (36, 0.02), (37, 0.026), (38, 0.022), (39, -0.059), (40, 0.066), (41, 0.025), (42, 0.058), (43, 0.157), (44, 0.026), (45, -0.168), (46, 0.094), (47, -0.01), (48, 0.018), (49, -0.113)]
simIndex simValue paperId paperTitle
same-paper 1 0.90604275 83 nips-2001-Geometrical Singularities in the Neuromanifold of Multilayer Perceptrons
Author: Shun-ichi Amari, Hyeyoung Park, Tomoko Ozeki
Abstract: Singularities are ubiquitous in the parameter space of hierarchical models such as multilayer perceptrons. At singularities, the Fisher information matrix degenerates and the Cramér-Rao paradigm no longer holds, implying that classical model selection theory such as AIC and MDL cannot be applied. It is important to study the relation between the generalization error and the training error at singularities. The present paper demonstrates a method of analyzing these errors, both for the maximum likelihood estimator and for the Bayesian predictive distribution, in terms of Gaussian random fields, using simple models. 1
2 0.50829035 76 nips-2001-Fast Parameter Estimation Using Green's Functions
Author: K. Wong, F. Li
Abstract: We propose a method for the fast estimation of hyperparameters in large networks, based on the linear response relation in the cavity method, and an empirical measurement of the Green's function. Simulation results show that it is efficient and precise, when compared with cross-validation and other techniques which require matrix inversion. 1
3 0.45675102 26 nips-2001-Active Portfolio-Management based on Error Correction Neural Networks
Author: Hans-Georg Zimmermann, Ralph Neuneier, Ralph Grothmann
Abstract: This paper deals with a neural network architecture which establishes a portfolio management system similar to the Black / Litterman approach. This allocation scheme distributes funds across various securities or financial markets while simultaneously complying with specific allocation constraints which meet the requirements of an investor. The portfolio optimization algorithm is modeled by a feedforward neural network. The underlying expected return forecasts are based on error correction neural networks (ECNN), which utilize the last model error as an auxiliary input to evaluate their own misspecification. The portfolio optimization is implemented such that (i.) the allocations comply with the investor's constraints and that (ii.) the risk of the portfolio can be controlled. We demonstrate the profitability of our approach by constructing internationally diversified portfolios across different financial markets of the G7 countries. It turns out that our approach is superior to a preset benchmark portfolio. 1 Introduction: Portfolio-Management We integrate the portfolio optimization algorithm suggested by Black / Litterman [1] into a neural network architecture. Combining the mean-variance theory [5] with the capital asset pricing model (CAPM) [7], this approach utilizes excess returns of the CAPM equilibrium to define a neutral, well balanced benchmark portfolio. Deviations from the benchmark allocation are only allowed within preset boundaries. Hence, as an advantage, there are no unrealistic solutions (e.g. large short positions, huge portfolio changes). Moreover, there is no need to formulate return expectations for all assets. In contrast to Black / Litterman, excess return forecasts are estimated by time-delay recurrent error correction neural networks [8]. Investment decisions which comply with given allocation constraints are derived from these predictions. The risk exposure of the portfolio is implicitly controlled by a parameter-optimizing task over time (sec. 3 and 5). Our approach consists of the following three steps: (i.) Construction of forecast models on the basis of error correction neural networks (ECNN) for all assets (sec. 2). (To whom correspondence should be addressed: Georg.Zimmermann@mchp.siemens.de.) (ii.) Computation of excess returns by a higher-level feedforward network (sec. 3 and 4). By this, the profitability of an asset with respect to all others is measured. (iii.) Optimization of the investment proportions on the basis of the excess returns. Allocation constraints ensure that the investment proportions may deviate from a given benchmark only within predefined intervals (sec. 3 and 4). Finally, we apply our neural network based portfolio management system to an asset allocation problem concerning the G7 countries (sec. 6). 2 Forecasting by Error Correction Neural Networks Most dynamical systems are driven by a superposition of autonomous development and external influences [8]. For discrete time grids, such a dynamics can be described by a recurrent state transition and an output equation (Eq. 1).
4 0.43191656 155 nips-2001-Quantizing Density Estimators
Author: Peter Meinicke, Helge Ritter
Abstract: We suggest a nonparametric framework for unsupervised learning of projection models in terms of density estimation on quantized sample spaces. The objective is not to optimally reconstruct the data but instead the quantizer is chosen to optimally reconstruct the density of the data. For the resulting quantizing density estimator (QDE) we present a general method for parameter estimation and model selection. We show how projection sets which correspond to traditional unsupervised methods like vector quantization or PCA appear in the new framework. For a principal component quantizer we present results on synthetic and realworld data, which show that the QDE can improve the generalization of the kernel density estimator although its estimate is based on significantly lower-dimensional projection indices of the data.
5 0.37972093 79 nips-2001-Gaussian Process Regression with Mismatched Models
Author: Peter Sollich
Abstract: Learning curves for Gaussian process regression are well understood when the 'student' model happens to match the 'teacher' (true data generation process). I derive approximations to the learning curves for the more generic case of mismatched models, and find very rich behaviour: For large input space dimensionality, where the results become exact, there are universal (student-independent) plateaux in the learning curve, with transitions in between that can exhibit arbitrarily many over-fitting maxima; over-fitting can occur even if the student estimates the teacher noise level correctly. In lower dimensions, plateaux also appear, and the learning curve remains dependent on the mismatch between student and teacher even in the asymptotic limit of a large number of training examples. Learning with excessively strong smoothness assumptions can be particularly dangerous: For example, a student with a standard radial basis function covariance function will learn a rougher teacher function only logarithmically slowly. All predictions are confirmed by simulations. 1
6 0.37595028 85 nips-2001-Grammar Transfer in a Second Order Recurrent Neural Network
7 0.36525515 96 nips-2001-Information-Geometric Decomposition in Spike Analysis
8 0.36496845 131 nips-2001-Neural Implementation of Bayesian Inference in Population Codes
9 0.35262832 37 nips-2001-Associative memory in realistic neuronal networks
10 0.34757027 120 nips-2001-Minimax Probability Machine
11 0.33558393 35 nips-2001-Analysis of Sparse Bayesian Learning
12 0.33287895 154 nips-2001-Products of Gaussians
13 0.331655 177 nips-2001-Switch Packet Arbitration via Queue-Learning
14 0.32473683 116 nips-2001-Linking Motor Learning to Function Approximation: Learning in an Unlearnable Force Field
15 0.31480777 186 nips-2001-The Noisy Euclidean Traveling Salesman Problem and Learning
16 0.31429788 84 nips-2001-Global Coordination of Local Linear Models
17 0.31369439 197 nips-2001-Why Neuronal Dynamics Should Control Synaptic Learning Rules
18 0.31194675 95 nips-2001-Infinite Mixtures of Gaussian Process Experts
19 0.31118202 38 nips-2001-Asymptotic Universality for Learning Curves of Support Vector Machines
20 0.31100678 52 nips-2001-Computing Time Lower Bounds for Recurrent Sigmoidal Neural Networks
topicId topicWeight
[(14, 0.013), (17, 0.018), (19, 0.534), (27, 0.087), (30, 0.051), (38, 0.026), (59, 0.02), (72, 0.031), (79, 0.037), (91, 0.065), (93, 0.02)]
simIndex simValue paperId paperTitle
1 0.94189852 93 nips-2001-Incremental A*
Author: S. Koenig, M. Likhachev
Abstract: Incremental search techniques find optimal solutions to series of similar search tasks much faster than is possible by solving each search task from scratch. While researchers have developed incremental versions of uninformed search methods, we develop an incremental version of A*. The first search of Lifelong Planning A* is the same as that of A* but all subsequent searches are much faster because it reuses those parts of the previous search tree that are identical to the new search tree. We then present experimental results that demonstrate the advantages of Lifelong Planning A* for simple route planning tasks. 1 Overview Artificial intelligence has investigated knowledge-based search techniques that allow one to solve search tasks in large domains. Most of the research on these methods has studied how to solve one-shot search problems. However, search is often a repetitive process, where one needs to solve a series of similar search tasks, for example, because the actual situation turns out to be slightly different from the one initially assumed or because the situation changes over time. An example for route planning tasks is changing traffic conditions. Thus, one needs to replan for the new situation, for example if one always wants to display the least time-consuming route from the airport to the conference center on a web page. In these situations, most search methods replan from scratch, that is, solve the search problems independently. Incremental search techniques share with case-based planning, plan adaptation, repair-based planning, and learning search-control knowledge the property that they find solutions to series of similar search tasks much faster than is possible by solving each search task from scratch. Incremental search techniques, however, differ from the other techniques in that the quality of their solutions is guaranteed to be as good as the quality of the solutions obtained by replanning from scratch. Although incremental search methods are not widely known in artificial intelligence and control, different researchers have developed incremental search versions of uninformed search methods in the algorithms literature. An overview can be found in [FMSN00]. We, on the other hand, develop an incremental version of A*, thus combining ideas from the algorithms literature and the artificial intelligence literature. We call the algorithm Lifelong Planning A* (LPA*), in analogy to “lifelong learning” [Thr98], because it reuses information from previous searches. (We thank Anthony Stentz for his support. The Intelligent Decision-Making Group is partly supported by NSF awards under contracts IIS-9984827, IIS-0098807, and ITR/AP-0113881. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the sponsoring organizations and agencies or the U.S. government.)
2 The Route Planning Task Lifelong Planning A* (LPA*) solves the following search task: It applies to finite graph search problems on known graphs whose edge costs can increase or decrease over time. denotes the finite set of vertices of the graph. denotes the set of successors of vertex . Similarly, denotes the set of predecessors of vertex . denotes the cost of moving from vertex to vertex . LPA* always determines a shortest path from a given start vertex to a given goal vertex , knowing both the topology of the graph and the current edge costs. We use to denote the start distance of vertex , that is, the length of a shortest path from to . ¨ ¨¦ £ £ ¡ ©§¥¤¢ FP HFE TSRQIGD¨ ¨¦ £ £ ¡ 4 ©D¥CBA@!¨ ¨ ¨¦
2 0.90904659 124 nips-2001-Modeling the Modulatory Effect of Attention on Human Spatial Vision
Author: Laurent Itti, Jochen Braun, Christof Koch
Abstract: We present new simulation results, in which a computational model of interacting visual neurons simultaneously predicts the modulation of spatial vision thresholds by focal visual attention, for five dual-task human psychophysics experiments. This new study complements our previous findings that attention activates a winner-take-all competition among early visual neurons within one cortical hypercolumn.
same-paper 3 0.90061969 83 nips-2001-Geometrical Singularities in the Neuromanifold of Multilayer Perceptrons
Author: Shun-ichi Amari, Hyeyoung Park, Tomoko Ozeki
Abstract: Singularities are ubiquitous in the parameter space of hierarchical models such as multilayer perceptrons. At singularities, the Fisher information matrix degenerates and the Cramér-Rao paradigm no longer holds, implying that classical model selection theory such as AIC and MDL cannot be applied. It is important to study the relation between the generalization error and the training error at singularities. The present paper demonstrates a method of analyzing these errors, both for the maximum likelihood estimator and for the Bayesian predictive distribution, in terms of Gaussian random fields, using simple models. 1
4 0.80854106 119 nips-2001-Means, Correlations and Bounds
Author: Martijn Leisink, Bert Kappen
Abstract: The partition function for a Boltzmann machine can be bounded from above and below. We can use this to bound the means and the correlations. For networks with small weights, the values of these statistics can be restricted to non-trivial regions (i.e. a subset of [-1, 1]). Experimental results show that reasonable bounding occurs for weight sizes where mean field expansions generally give good results. 1
5 0.76401764 109 nips-2001-Learning Discriminative Feature Transforms to Low Dimensions in Low Dimentions
Author: Kari Torkkola
Abstract: The marriage of Renyi entropy with Parzen density estimation has been shown to be a viable tool in learning discriminative feature transforms. However, it suffers from computational complexity proportional to the square of the number of samples in the training data. This sets a practical limit to using large databases. We suggest immediate divorce of the two methods and remarriage of Renyi entropy with a semi-parametric density estimation method, such as a Gaussian Mixture Models (GMM). This allows all of the computation to take place in the low dimensional target space, and it reduces computational complexity proportional to square of the number of components in the mixtures. Furthermore, a convenient extension to Hidden Markov Models as commonly used in speech recognition becomes possible.
6 0.45460081 1 nips-2001-(Not) Bounding the True Error
7 0.40898201 13 nips-2001-A Natural Policy Gradient
8 0.40500617 103 nips-2001-Kernel Feature Spaces and Nonlinear Blind Souce Separation
9 0.40225309 186 nips-2001-The Noisy Euclidean Traveling Salesman Problem and Learning
10 0.39938655 192 nips-2001-Tree-based reparameterization for approximate inference on loopy graphs
11 0.39837405 52 nips-2001-Computing Time Lower Bounds for Recurrent Sigmoidal Neural Networks
12 0.37256238 138 nips-2001-On the Generalization Ability of On-Line Learning Algorithms
13 0.37055916 147 nips-2001-Pranking with Ranking
14 0.3686133 155 nips-2001-Quantizing Density Estimators
15 0.36842787 114 nips-2001-Learning from Infinite Data in Finite Time
16 0.36228892 68 nips-2001-Entropy and Inference, Revisited
17 0.3619079 54 nips-2001-Contextual Modulation of Target Saliency
18 0.36069262 137 nips-2001-On the Convergence of Leveraging
19 0.36004001 64 nips-2001-EM-DD: An Improved Multiple-Instance Learning Technique