nips nips2000 knowledge-graph by maker-knowledge-mining
1 nips-2000-APRICODD: Approximate Policy Construction Using Decision Diagrams
Author: Robert St-Aubin, Jesse Hoey, Craig Boutilier
Abstract: We propose a method of approximate dynamic programming for Markov decision processes (MDPs) using algebraic decision diagrams (ADDs). We produce near-optimal value functions and policies with much lower time and space requirements than exact dynamic programming. Our method reduces the sizes of the intermediate value functions generated during value iteration by replacing the values at the terminals of the ADD with ranges of values. Our method is demonstrated on a class of large MDPs (with up to 34 billion states), and we compare the results with the optimal value functions.
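As an illustration of the approximation step described above (replacing exact terminal values with ranges of values), here is a minimal sketch in Python. It uses a flat dictionary of leaf values rather than a real ADD package, and the merging rule and `epsilon` tolerance are illustrative assumptions, not the authors' implementation.

```python
# A toy sketch (not the authors' ADD code): collapse nearby leaf values of a
# value function into ranges so that states whose values differ by at most
# `epsilon` share one leaf, shrinking the diagram.

def merge_into_ranges(leaf_values, epsilon):
    """Group sorted leaf values into ranges of width <= epsilon.

    Returns a list of (lo, hi) ranges and a mapping value -> range index.
    """
    ranges, assignment = [], {}
    for v in sorted(set(leaf_values)):
        if ranges and v - ranges[-1][0] <= epsilon:
            lo, _ = ranges[-1]
            ranges[-1] = (lo, v)          # extend the current range
        else:
            ranges.append((v, v))         # start a new range
        assignment[v] = len(ranges) - 1
    return ranges, assignment

# Example: exact leaves 0.0, 0.1, 0.15, 5.0 collapse to two ranges with eps=0.2.
ranges, assign = merge_into_ranges([0.0, 0.1, 0.15, 5.0], epsilon=0.2)
print(ranges)   # [(0.0, 0.15), (5.0, 5.0)]
```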
2 nips-2000-A Comparison of Image Processing Techniques for Visual Speech Recognition Applications
Author: Michael S. Gray, Terrence J. Sejnowski, Javier R. Movellan
Abstract: We examine eight different techniques for developing visual representations in machine vision tasks. In particular we compare different versions of principal component and independent component analysis in combination with stepwise regression methods for variable selection. We found that local methods, based on the statistics of image patches, consistently outperformed global methods based on the statistics of entire images. This result is consistent with previous work on emotion and facial expression recognition. In addition, the use of a stepwise regression technique for selecting variables and regions of interest substantially boosted performance. 1
3 nips-2000-A Gradient-Based Boosting Algorithm for Regression Problems
Author: Richard S. Zemel, Toniann Pitassi
Abstract: In adaptive boosting, several weak learners trained sequentially are combined to boost the overall algorithm performance. Recently adaptive boosting methods for classification problems have been derived as gradient descent algorithms. This formulation justifies key elements and parameters in the methods, all chosen to optimize a single common objective function. We propose an analogous formulation for adaptive boosting of regression problems, utilizing a novel objective function that leads to a simple boosting algorithm. We prove that this method reduces training error, and compare its performance to other regression methods. The aim of boosting algorithms is to
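The general recipe the abstract refers to, stagewise fitting of weak learners to the gradient of a loss, can be sketched for squared error as follows. The stump learner, learning rate, and round count are illustrative choices, not the specific objective or algorithm proposed in the paper.

```python
# A generic sketch of boosting for regression as stagewise gradient descent on
# squared loss: each weak learner (a decision stump) is fit to the current
# residuals, which are the negative gradient of the loss.
import numpy as np

def fit_stump(x, r):
    """Best single-split stump (threshold, left mean, right mean) for residuals r."""
    best = None
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= t, left.mean(), right.mean())
        err = ((r - pred) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, left.mean(), right.mean())
    return best[1:]

def boost(x, y, n_rounds=50, lr=0.1):
    pred, stumps = np.zeros_like(y), []
    for _ in range(n_rounds):
        t, lo, hi = fit_stump(x, y - pred)          # residuals = negative gradient
        pred += lr * np.where(x <= t, lo, hi)
        stumps.append((t, lo, hi))
    return stumps, pred

x = np.linspace(0, 1, 100)
y = np.sin(2 * np.pi * x)
_, pred = boost(x, y)
print(float(np.mean((y - pred) ** 2)))   # training error shrinks with more rounds
```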
4 nips-2000-A Linear Programming Approach to Novelty Detection
Author: Colin Campbell, Kristin P. Bennett
Abstract: Novelty detection involves modeling the normal behaviour of a system, hence enabling detection of any divergence from normality. It has potential applications in many areas such as detection of machine damage or highlighting abnormal features in medical data. One approach is to build a hypothesis estimating the support of the normal data, i.e. constructing a function which is positive in the region where the data is located and negative elsewhere. Recently kernel methods have been proposed for estimating the support of a distribution and they have performed well in practice - training involves solution of a quadratic programming problem. In this paper we propose a simpler kernel method for estimating the support based on linear programming. The method is easy to implement and can learn large datasets rapidly. We demonstrate the method on medical and fault detection datasets. 1 Introduction. An important classification task is the ability to distinguish between new instances similar to members of the training set and all other instances that can occur. For example, we may want to learn the normal running behaviour of a machine and highlight any significant divergence from normality which may indicate onset of damage or faults. This issue is a generic problem in many fields. For example, an abnormal event or feature in medical diagnostic data typically leads to further investigation. Novel events can be highlighted by constructing a real-valued density estimation function. However, here we will consider the simpler task of modelling the support of a data distribution, i.e. creating a binary-valued function which is positive in those regions of input space where the data predominantly lies and negative elsewhere. Recently kernel methods have been applied to this problem [4]. In this approach data is implicitly mapped to a high-dimensional space called feature space [13]. Suppose the data points in input space are $x_i$ (with $i = 1, \ldots, m$) and the mapping is $x_i \rightarrow \phi(x_i)$; then, in the span of $\{\phi(x_i)\}$, we can expand a vector $w = \sum_j \alpha_j \phi(x_j)$. Hence we can define separating hyperplanes in feature space by $w \cdot \phi(x_i) + b = 0$. We will refer to $w \cdot \phi(x_i) + b$ as the margin, which will be positive on one side of the separating hyperplane and negative on the other. Thus we can also define a decision function $f(z) = \mathrm{sign}(w \cdot \phi(z) + b)$ (1), where $z$ is a new data point. The data appears in the form of an inner product in feature space, so we can implicitly define feature space by our choice of kernel function $K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)$ (2). A number of choices for the kernel are possible, for example RBF kernels $K(x_i, x_j) = e^{-\|x_i - x_j\|^2 / 2\sigma^2}$ (3). With the given kernel the decision function is therefore given by $f(z) = \mathrm{sign}\left(\sum_i \alpha_i K(z, x_i) + b\right)$ (4). One approach to novelty detection is to find a hypersphere in feature space with a minimal radius $R$ and centre $a$ which contains most of the data: novel test points lie outside the boundary of this hypersphere [3, 12]. This approach to novelty detection was proposed by Tax and Duin [10] and successfully used on real-life applications [11]. The effect of outliers is reduced by using slack variables $\xi_i$ to allow for datapoints outside the sphere, and the task is to minimise the volume of the sphere and the number of datapoints outside it, i.e. $\min \; R^2 + \lambda \sum_i \xi_i$ subject to $(x_i - a) \cdot (x_i - a) \le R^2 + \xi_i$, $\xi_i \ge 0$ (5). Since the data appears in the form of inner products, kernel substitution can be applied and the learning task can be reduced to a quadratic programming problem. An alternative approach has been developed by Schölkopf et al. [7].
Suppose we restrict our attention to RBF kernels (3); then the data lies on the surface of a hypersphere in feature space, since $\phi(x) \cdot \phi(x) = K(x, x) = 1$. The objective is therefore to separate off the surface region containing data from the region containing no data. This is achieved by constructing a hyperplane which is maximally distant from the origin, with all datapoints lying on the opposite side from the origin and such that the margin is positive. The learning task in dual form involves minimisation of $W(\alpha) = \frac{1}{2} \sum_{i,k=1}^{m} \alpha_i \alpha_k K(x_i, x_k)$ subject to $0 \le \alpha_i \le C$, $\sum_{i=1}^{m} \alpha_i = 1$ (6). However, the origin plays a special role in this model. As the authors point out [9], this is a disadvantage since the origin effectively acts as a prior for where the class of abnormal instances is assumed to lie. In this paper we avoid this problem: rather than repelling the hyperplane away from an arbitrary point outside the data distribution, we instead try to attract the hyperplane towards the centre of the data distribution. In this paper we will outline a new algorithm for novelty detection which can be easily implemented using linear programming (LP) techniques. As we illustrate in section 3, it performs well in practice on datasets involving the detection of abnormalities in medical data and fault detection in condition monitoring. 2 The Algorithm. For the hard margin case (see Figure 1) the objective is to find a surface in input space which wraps around the data clusters: anything outside this surface is viewed as abnormal. This surface is defined as the level set $f(z) = 0$ of some nonlinear function. In feature space, $f(z) = \sum_i \alpha_i K(z, x_i) + b$, this corresponds to a hyperplane which is pulled onto the mapped datapoints with the restriction that the margin always remains positive or zero. We make the fit of this nonlinear function or hyperplane as tight as possible by minimizing the mean value of the output of the function, i.e. $\sum_i f(x_i)$. This is achieved by minimising $\sum_{i=1}^{m} \left( \sum_{j=1}^{m} \alpha_j K(x_i, x_j) + b \right)$ (7) subject to $\sum_{j=1}^{m} \alpha_j K(x_i, x_j) + b \ge 0$ (8) and $\sum_{i=1}^{m} \alpha_i = 1$, $\alpha_i \ge 0$ (9). The bias $b$ is just treated as an additional parameter in the minimisation process, though unrestricted in sign. The added constraints (9) on $\alpha$ bound the class of models to be considered - we don't want to consider simple linear rescalings of the model. These constraints amount to a choice of scale for the weight vector normal to the hyperplane in feature space and hence do not impose a restriction on the model. Also, these constraints ensure that the problem is well-posed and that an optimal solution with $\alpha \neq 0$ exists. Other constraints on the class of functions are possible, e.g. $\|\alpha\|_1 = 1$ with no restriction on the sign of $\alpha_i$. Many real-life datasets contain noise and outliers. To handle these we can introduce a soft margin, in analogy to the usual approach used with support vector machines. In this case we minimise $\sum_{i=1}^{m} \left( \sum_{j=1}^{m} \alpha_j K(x_i, x_j) + b \right) + \lambda \sum_{i=1}^{m} \xi_i$ (10) subject to $\sum_{j=1}^{m} \alpha_j K(x_i, x_j) + b \ge -\xi_i$, $\xi_i \ge 0$ (11) and constraints (9). The parameter $\lambda$ controls the extent of margin errors (a larger $\lambda$ means fewer outliers are ignored: $\lambda \rightarrow \infty$ corresponds to the hard margin limit). The above problem can be easily solved for problems with thousands of points using standard simplex or interior point algorithms for linear programming. With the addition of column generation techniques, these same approaches can be adopted for very large problems in which the kernel matrix exceeds the capacity of main memory.
Column generation algorithms incrementally add and drop columns, each corresponding to a single kernel function, until optimality is reached. Such approaches have been successfully applied to other support vector problems [6, 2]. Basic simplex algorithms were sufficient for the problems considered in this paper, so we defer a listing of the code for column generation to a later paper together with experiments on large datasets [1]. 3 Experiments. Artificial datasets. Before considering experiments on real-life data we will first illustrate the performance of the algorithm on some artificial datasets. In Figure 1 the algorithm places a boundary around two data clusters in input space: a hard margin was used with RBF kernels and $\sigma$
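Since basic simplex solvers suffice for moderate problem sizes, the soft-margin program (7)-(11) can be prototyped directly with a generic LP routine. The sketch below uses `scipy.optimize.linprog` with an RBF kernel; the variable layout and the values of `sigma` and `lam` are illustrative assumptions, not the authors' code.

```python
# A minimal sketch of the soft-margin linear program (7)-(11): variables are
# z = [alpha (m), b (1), xi (m)]; novelty is flagged when f(z) < 0.
import numpy as np
from scipy.optimize import linprog

def rbf_kernel(X, sigma=1.0):
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def lp_novelty(X, lam=1.0, sigma=1.0):
    m = len(X)
    K = rbf_kernel(X, sigma)
    c = np.concatenate([K.sum(axis=0), [m], lam * np.ones(m)])        # objective (10)
    A_ub = np.hstack([-K, -np.ones((m, 1)), -np.eye(m)])              # K a + b >= -xi  (11)
    b_ub = np.zeros(m)
    A_eq = np.concatenate([np.ones(m), [0.0], np.zeros(m)])[None, :]  # sum alpha = 1   (9)
    bounds = [(0, None)] * m + [(None, None)] + [(0, None)] * m       # alpha, xi >= 0, b free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0], bounds=bounds)
    alpha, b = res.x[:m], res.x[m]
    return alpha, b, K

# Score new points with f(z) = sum_i alpha_i K(z, x_i) + b; negative scores are novel.
X = np.random.RandomState(0).randn(60, 2)
alpha, b, K = lp_novelty(X, lam=5.0, sigma=1.5)
print((K @ alpha + b < 0).mean())   # fraction of training points outside the boundary
```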
5 nips-2000-A Mathematical Programming Approach to the Kernel Fisher Algorithm
Author: Sebastian Mika, Gunnar Rätsch, Klaus-Robert Müller
Abstract: We investigate a new kernel-based classifier: the Kernel Fisher Discriminant (KFD). A mathematical programming formulation based on the observation that KFD maximizes the average margin permits an interesting modification of the original KFD algorithm yielding the sparse KFD. We find that both KFD and the proposed sparse KFD can be understood in a unifying probabilistic context. Furthermore, we show connections to Support Vector Machines and Relevance Vector Machines. From this understanding, we are able to outline an interesting kernel-regression technique based upon the KFD algorithm. Simulations support the usefulness of our approach.
6 nips-2000-A Neural Probabilistic Language Model
Author: Yoshua Bengio, Réjean Ducharme, Pascal Vincent
Abstract: A goal of statistical language modeling is to learn the joint probability function of sequences of words. This is intrinsically difficult because of the curse of dimensionality: we propose to fight it with its own weapons. In the proposed approach one learns simultaneously (1) a distributed representation for each word (i.e. a similarity between words) along with (2) the probability function for word sequences, expressed with these representations. Generalization is obtained because a sequence of words that has never been seen before gets high probability if it is made of words that are similar to words forming an already seen sentence. We report on experiments using neural networks for the probability function, showing on two text corpora that the proposed approach very significantly improves on a state-of-the-art trigram model.
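A stripped-down illustration of the idea of learning word feature vectors jointly with the next-word probability: a tiny bigram-style model in plain numpy. The corpus, dimensionality, and learning rate are toy assumptions, and the model is far smaller than the network studied in the paper.

```python
# Learn a distributed representation per word (C) and a next-word softmax (W)
# at the same time, by SGD on the log-likelihood of observed word pairs.
import numpy as np

rng = np.random.RandomState(0)
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
pairs = [(idx[corpus[t]], idx[corpus[t + 1]]) for t in range(len(corpus) - 1)]

V, d, lr = len(vocab), 8, 0.1
C = rng.randn(V, d) * 0.1   # distributed representation (one feature vector per word)
W = rng.randn(d, V) * 0.1   # maps a feature vector to logits over the next word

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(200):
    for w, nxt in pairs:
        p = softmax(C[w] @ W)              # P(next word | current word)
        g = p.copy(); g[nxt] -= 1.0        # gradient of -log p[nxt] w.r.t. the logits
        gC, gW = W @ g, np.outer(C[w], g)
        C[w] -= lr * gC                    # the representation itself is learned
        W -= lr * gW

print(vocab[int(np.argmax(C[idx["the"]] @ W))])   # most probable word after "the"
```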
7 nips-2000-A New Approximate Maximal Margin Classification Algorithm
Author: Claudio Gentile
Abstract: A new incremental learning algorithm is described which approximates the maximal margin hyperplane w.r.t. norm $p \ge 2$ for a set of linearly separable data. Our algorithm, called ALMA$_p$ (Approximate Large Margin algorithm w.r.t. norm $p$), takes $O\!\left(\frac{(p-1)\,X^2}{\alpha^2 \gamma^2}\right)$ corrections to separate the data with $p$-norm margin larger than $(1 - \alpha)\gamma$, where $\gamma$ is the $p$-norm margin of the data and $X$ is a bound on the $p$-norm of the instances. ALMA$_p$ avoids quadratic (or higher-order) programming methods. It is very easy to implement and is as fast as on-line algorithms, such as Rosenblatt's perceptron. We report on some experiments comparing ALMA$_p$ to two incremental algorithms: Perceptron and Li and Long's ROMMA. Our algorithm seems to perform notably better than both. The accuracy levels achieved by ALMA$_p$ are slightly inferior to those obtained by Support Vector Machines (SVMs). On the other hand, ALMA$_p$ is considerably faster and easier to implement than standard SVM training algorithms.
8 nips-2000-A New Model of Spatial Representation in Multimodal Brain Areas
Author: Sophie Denève, Jean-René Duhamel, Alexandre Pouget
Abstract: Most models of spatial representations in the cortex assume cells with limited receptive fields that are defined in a particular egocentric frame of reference. However, cells outside of primary sensory cortex are either gain modulated by postural input or partially shifting. We show that solving classical spatial tasks, like sensory prediction, multi-sensory integration, sensory-motor transformation and motor control, requires more complicated intermediate representations that are not invariant in one frame of reference. We present an iterative basis function map that performs these spatial tasks optimally with gain modulated and partially shifting units, and test it against neurophysiological and neuropsychological data. In order to perform an action directed toward an object, it is necessary to have a representation of its spatial location. The brain must be able to use spatial cues coming from different modalities (e.g. vision, audition, touch, proprioception), combine them to infer the position of the object, and compute the appropriate movement. These cues are in different frames of reference corresponding to different sensory or motor modalities. Visual inputs are primarily encoded in retinotopic maps, auditory inputs are encoded in head-centered maps and tactile cues are encoded in skin-centered maps. Going from one frame of reference to the other might seem easy. For example, the head-centered position of an object can be approximated by the sum of its retinotopic position and the eye position. However, positions are represented by population codes in the brain, and computing a head-centered map from a retinotopic map is a more complex computation than the underlying sum. Moreover, as we get closer to sensory-motor areas it seems reasonable to assume [Figure 1: Response of a VIP cell to visual stimuli appearing in different parts of the screen, for three different eye positions. The level of grey represents the frequency of discharge (in spikes per second). The white cross is the fixation point (the head is fixed). The cell's receptive field moves with the eyes, but only partially; here the receptive field shift is 60% of the total gaze shift. Moreover, this cell is gain modulated by eye position (adapted from Duhamel et al.).] that the representations should be useful for sensory-motor transformations, rather than encode an
9 nips-2000-A PAC-Bayesian Margin Bound for Linear Classifiers: Why SVMs work
Author: Ralf Herbrich, Thore Graepel
Abstract: We present a bound on the generalisation error of linear classifiers in terms of a refined margin quantity on the training set. The result is obtained in a PAC-Bayesian framework and is based on geometrical arguments in the space of linear classifiers. The new bound constitutes an exponential improvement over the previously tightest margin bound, by Shawe-Taylor et al. [8], and scales logarithmically in the inverse margin. Even in the case of fewer training examples than input dimensions, sufficiently large margins lead to non-trivial bound values and - for maximum margins - to a vanishing complexity term. Furthermore, the classical margin is too coarse a measure for the essential quantity that controls the generalisation error: the volume ratio between the whole hypothesis space and the subset of consistent hypotheses. The practical relevance of the result lies in the fact that the well-known support vector machine is optimal w.r.t. the new bound only if the feature vectors are all of the same length. As a consequence we recommend using SVMs on normalised feature vectors only - a recommendation that is well supported by our numerical experiments on two benchmark data sets.
10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure
Author: Shimon Edelman, Nathan Intrator
Abstract: We describe a unified framework for the understanding of structure representation in primate vision. A model derived from this framework is shown to be effectively systematic in that it has the ability to interpret and associate together objects that are related through a rearrangement of common
11 nips-2000-A Silicon Primitive for Competitive Learning
Author: David Hsu, Miguel Figueroa, Chris Diorio
Abstract: Competitive learning is a technique for training classification and clustering networks. We have designed and fabricated an 11-transistor primitive, which we term an automaximizing bump circuit, that implements competitive learning dynamics. The circuit performs a similarity computation, affords nonvolatile storage, and implements simultaneous local adaptation and computation. We show that our primitive is suitable for implementing competitive learning in VLSI, and demonstrate its effectiveness in a standard clustering task.
12 nips-2000-A Support Vector Method for Clustering
Author: Asa Ben-Hur, David Horn, Hava T. Siegelmann, Vladimir Vapnik
Abstract: We present a novel method for clustering using the support vector machine approach. Data points are mapped to a high dimensional feature space, where support vectors are used to define a sphere enclosing them. The boundary of the sphere forms in data space a set of closed contours containing the data. Data points enclosed by each contour are defined as a cluster. As the width parameter of the Gaussian kernel is decreased, these contours fit the data more tightly and splitting of contours occurs. The algorithm works by separating clusters according to valleys in the underlying probability distribution, and thus clusters can take on arbitrary geometrical shapes. As in other SV algorithms, outliers can be dealt with by introducing a soft margin constant leading to smoother cluster boundaries. The structure of the data is explored by varying the two parameters. We investigate the dependence of our method on these parameters and apply it to several data sets.
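The sphere-in-feature-space step can be prototyped by solving the corresponding dual with a generic constrained optimiser; the sketch below uses scipy's SLSQP and a Gaussian kernel. The kernel width `q`, the soft-margin constant `C`, and the solver choice are assumptions for illustration, and the contour/connectivity step that actually labels the clusters is omitted.

```python
# Fit the minimal enclosing sphere in feature space (Gaussian kernel), then
# expose the squared distance from the sphere centre, whose level sets form
# the cluster contours in data space.
import numpy as np
from scipy.optimize import minimize

def svc_sphere(X, q=3.0, C=1.0):
    m = len(X)
    K = np.exp(-q * ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # dual: maximize sum_i b_i K_ii - b^T K b,  s.t. 0 <= b_i <= C, sum_i b_i = 1
    obj = lambda b: -(b @ np.diag(K) - b @ K @ b)
    cons = {"type": "eq", "fun": lambda b: b.sum() - 1.0}
    res = minimize(obj, np.full(m, 1.0 / m), bounds=[(0.0, C)] * m, constraints=cons)
    beta = res.x

    def dist2(z):
        """Squared feature-space distance of point z from the sphere centre."""
        k = np.exp(-q * ((X - z) ** 2).sum(-1))
        return 1.0 - 2.0 * beta @ k + beta @ K @ beta

    return beta, dist2

# Two well-separated blobs: points in either blob sit at similar distances from
# the centre, and shrinking the kernel width q tightens the contours around them.
rng = np.random.RandomState(1)
X = np.vstack([rng.randn(20, 2) * 0.1, rng.randn(20, 2) * 0.1 + 3.0])
beta, dist2 = svc_sphere(X, q=3.0, C=1.0)
print(round(dist2(X[0]), 3), round(dist2(X[25]), 3))
```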
13 nips-2000-A Tighter Bound for Graphical Models
Author: Martijn A. R. Leisink, Hilbert J. Kappen
Abstract: We present a method to bound the partition function of a Boltzmann machine neural network with any odd-order polynomial. This is a direct extension of the mean field bound, which is first order. We show that the third-order bound is strictly better than mean field. Additionally, we give a rough outline of how this bound is applicable to sigmoid belief networks. Numerical experiments indicate that an error reduction by a factor of two is easily reached in the region where expansion-based approximations are useful.
14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks
Author: Chiranjib Bhattacharyya, S. Sathiya Keerthi
Abstract: A variational derivation of Plefka's mean-field theory is presented. This theory is then applied to sigmoidal belief networks with the aid of further approximations. Empirical evaluation on small-scale networks shows that the proposed approximations are quite competitive.
15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation
Author: Brendan J. Frey, Anitha Kannan
Abstract: One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. probability propagation algorithm), while ignoring the fact that the graph has cycles. The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions - including the sigmoid function - direct application of the sum-product algorithm is not possible. We introduce
16 nips-2000-Active Inference in Concept Learning
Author: Jonathan D. Nelson, Javier R. Movellan
Abstract: People are active experimenters, not just passive observers, constantly seeking new information relevant to their goals. A reasonable approach to active information gathering is to ask questions and conduct experiments that maximize the expected information gain, given current beliefs (Fedorov 1972, MacKay 1992, Oaksford & Chater 1994). In this paper we present results on an exploratory experiment designed to study people's active information gathering behavior on a concept learning task (Tenenbaum 2000). The results of the experiment are analyzed in terms of the expected information gain of the questions asked by subjects. In scientific inquiry and in everyday life, people seek out information relevant to perceptual and cognitive tasks. Scientists perform experiments to uncover causal relationships; people saccade to informative areas of visual scenes, turn their head towards surprising sounds, and ask questions to understand the meaning of concepts . Consider a person learning a foreign language, who notices that a particular word,
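The query-selection criterion referenced above, choosing the question with the highest expected information gain under current beliefs, can be made concrete with a small worked example. The hypotheses, prior, and answer likelihoods below are illustrative, not the stimuli from the experiment.

```python
# Score a yes/no question by how much it is expected to reduce entropy over a
# set of hypotheses, given the current posterior over those hypotheses.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def expected_info_gain(prior, likelihood_yes):
    """prior[h] = P(h); likelihood_yes[h] = P(answer = yes | h)."""
    p_yes = prior @ likelihood_yes
    post_yes = prior * likelihood_yes / p_yes
    post_no = prior * (1 - likelihood_yes) / (1 - p_yes)
    return entropy(prior) - (p_yes * entropy(post_yes) + (1 - p_yes) * entropy(post_no))

prior = np.array([0.5, 0.3, 0.2])        # current beliefs over three candidate concepts
question_a = np.array([1.0, 1.0, 0.0])   # P(yes | h) for each hypothesis
question_b = np.array([1.0, 0.0, 0.0])
print(expected_info_gain(prior, question_a), expected_info_gain(prior, question_b))
```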
17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks
Author: Simon Tong, Daphne Koller
Abstract: Bayesian networks are graphical representations of probability distributions. In virtually all of the work on learning these networks, the assumption is that we are presented with a data set consisting of randomly generated instances from the underlying distribution. In many situations, however, we also have the option of active learning, where we have the possibility of guiding the sampling process by querying for certain types of samples. This paper addresses the problem of estimating the parameters of Bayesian networks in an active learning setting. We provide a theoretical framework for this problem, and an algorithm that chooses which active learning queries to generate based on the model learned so far. We present experimental results showing that our active learning algorithm can significantly reduce the need for training data in many situations.
18 nips-2000-Active Support Vector Machine Classification
Author: Olvi L. Mangasarian, David R. Musicant
Abstract: An active set strategy is applied to the dual of a simple reformulation of the standard quadratic program of a linear support vector machine. This application generates a fast new dual algorithm that consists of solving a finite number of linear equations, with a typically large dimensionality equal to the number of points to be classified. However, by making novel use of the Sherman-Morrison-Woodbury formula, a much smaller matrix of the order of the original input space is inverted at each step. Thus, a problem with a 32-dimensional input space and 7 million points required inverting positive definite symmetric matrices of size 33 x 33, with a total running time of 96 minutes on a 400 MHz Pentium II. The algorithm requires no specialized quadratic or linear programming code, but merely a linear equation solver which is publicly available.
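The matrix identity behind the reported speed-up can be checked numerically: for m points in an n-dimensional input space, the m x m system is handled via the inverse of an (n+1) x (n+1) matrix. The sketch below is a generic verification of the Sherman-Morrison-Woodbury identity with made-up dimensions and penalty parameter, not the authors' ASVM code.

```python
# Verify that (I/nu + H H^T)^{-1} v  ==  nu * (v - H (I/nu + H^T H)^{-1} H^T v),
# so only the small (n+1) x (n+1) matrix ever needs to be inverted.
import numpy as np

rng = np.random.RandomState(0)
m, n, nu = 2000, 32, 0.1          # many points, low-dimensional input space
H = rng.randn(m, n + 1)           # e.g. [data, -ones] in an ASVM-style formulation
v = rng.randn(m)

small = np.linalg.inv(np.eye(n + 1) / nu + H.T @ H)   # (n+1) x (n+1) inverse
fast = nu * (v - H @ (small @ (H.T @ v)))             # SMW applied to v

slow = np.linalg.solve(np.eye(m) / nu + H @ H.T, v)   # direct m x m solve
print(np.allclose(fast, slow))                         # True
```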
19 nips-2000-Adaptive Object Representation with Hierarchically-Distributed Memory Sites
Author: Bosco S. Tjan
Abstract: Theories of object recognition often assume that only one representation scheme is used within one visual-processing pathway. Versatility of the visual system comes from having multiple visual-processing pathways, each specialized in a different category of objects. We propose a theoretically simpler alternative, capable of explaining the same set of data and more. A single primary visual-processing pathway, loosely modular, is assumed. Memory modules are attached to sites along this pathway. Object-identity decision is made independently at each site. A site's response time is a monotonic-decreasing function of its confidence regarding its decision. An observer's response is the first-arriving response from any site. The effective representation(s) of such a system, determined empirically, can appear to be specialized for different tasks and stimuli, consistent with recent clinical and functional-imaging findings. This, however, merely reflects a decision being made at its appropriate level of abstraction. The system itself is intrinsically flexible and adaptive.
20 nips-2000-Algebraic Information Geometry for Learning Machines with Singularities
Author: Sumio Watanabe
Abstract: Algebraic geometry is essential to learning theory. In hierarchical learning machines such as layered neural networks and Gaussian mixtures, asymptotic normality does not hold, since the Fisher information matrices are singular. In this paper, the rigorous asymptotic form of the stochastic complexity is clarified based on resolution of singularities, and two different problems are studied. (1) If the prior is positive, then the stochastic complexity is far smaller than BIC, resulting in a smaller generalization error than regular statistical models, even when the true distribution is not contained in the parametric model. (2) If Jeffreys' prior, which is coordinate-free and equal to zero at singularities, is employed, then the stochastic complexity has the same form as BIC. It is useful for model selection, but not for generalization.
21 nips-2000-Algorithmic Stability and Generalization Performance
22 nips-2000-Algorithms for Non-negative Matrix Factorization
23 nips-2000-An Adaptive Metric Machine for Pattern Classification
24 nips-2000-An Information Maximization Approach to Overcomplete and Recurrent Representations
25 nips-2000-Analysis of Bit Error Probability of Direct-Sequence CDMA Multiuser Demodulators
26 nips-2000-Automated State Abstraction for Options using the U-Tree Algorithm
27 nips-2000-Automatic Choice of Dimensionality for PCA
28 nips-2000-Balancing Multiple Sources of Reward in Reinforcement Learning
29 nips-2000-Bayes Networks on Ice: Robotic Search for Antarctic Meteorites
30 nips-2000-Bayesian Video Shot Segmentation
33 nips-2000-Combining ICA and Top-Down Attention for Robust Speech Recognition
34 nips-2000-Competition and Arbors in Ocular Dominance
35 nips-2000-Computing with Finite and Infinite Networks
36 nips-2000-Constrained Independent Component Analysis
37 nips-2000-Convergence of Large Margin Separable Linear Classification
38 nips-2000-Data Clustering by Markovian Relaxation and the Information Bottleneck Method
41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach
42 nips-2000-Divisive and Subtractive Mask Effects: Linking Psychophysics and Biophysics
44 nips-2000-Efficient Learning of Linear Perceptrons
46 nips-2000-Ensemble Learning and Linear Response Theory for ICA
47 nips-2000-Error-correcting Codes on a Bethe-like Lattice
48 nips-2000-Exact Solutions to Time-Dependent MDPs
49 nips-2000-Explaining Away in Weight Space
51 nips-2000-Factored Semi-Tied Covariance Matrices
52 nips-2000-Fast Training of Support Vector Classifiers
53 nips-2000-Feature Correspondence: A Markov Chain Monte Carlo Approach
54 nips-2000-Feature Selection for SVMs
55 nips-2000-Finding the Key to a Synapse
56 nips-2000-Foundations for a Circuit Complexity Theory of Sensory Processing
58 nips-2000-From Margin to Sparsity
59 nips-2000-From Mixtures of Mixtures to Adaptive Transform Coding
61 nips-2000-Generalizable Singular Value Decomposition for Ill-posed Datasets
62 nips-2000-Generalized Belief Propagation
63 nips-2000-Hierarchical Memory-Based Reinforcement Learning
64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data
65 nips-2000-Higher-Order Statistical Properties Arising from the Non-Stationarity of Natural Signals
66 nips-2000-Hippocampally-Dependent Consolidation in a Hierarchical Model of Neocortex
67 nips-2000-Homeostasis in a Silicon Integrate and Fire Neuron
68 nips-2000-Improved Output Coding for Classification Using Continuous Relaxation
69 nips-2000-Incorporating Second-Order Functional Knowledge for Better Option Pricing
70 nips-2000-Incremental and Decremental Support Vector Machine Learning
71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script
72 nips-2000-Keeping Flexible Active Contours on Track using Metropolis Updates
74 nips-2000-Kernel Expansions with Unlabeled Examples
75 nips-2000-Large Scale Bayes Point Machines
76 nips-2000-Learning Continuous Distributions: Simulations With Field Theoretic Priors
77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations
78 nips-2000-Learning Joint Statistical Models for Audio-Visual Fusion and Segregation
79 nips-2000-Learning Segmentation by Random Walks
80 nips-2000-Learning Switching Linear Models of Human Motion
82 nips-2000-Learning and Tracking Cyclic Human Motion
83 nips-2000-Machine Learning for Video-Based Rendering
84 nips-2000-Minimum Bayes Error Feature Selection for Continuous Speech Recognition
85 nips-2000-Mixtures of Gaussian Processes
86 nips-2000-Model Complexity, Goodness of Fit and Diminishing Returns
87 nips-2000-Modelling Spatial Recall, Mental Imagery and Neglect
88 nips-2000-Multiple Timescales of Adaptation in a Neural Code
89 nips-2000-Natural Sound Statistics and Divisive Normalization in the Auditory System
90 nips-2000-New Approaches Towards Robust and Adaptive Speech Recognition
94 nips-2000-On Reversing Jensen's Inequality
95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling
96 nips-2000-One Microphone Source Separation
97 nips-2000-Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping
98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks
100 nips-2000-Permitted and Forbidden Sets in Symmetric Threshold-Linear Networks
102 nips-2000-Position Variance, Recurrence and Perceptual Learning
103 nips-2000-Probabilistic Semantic Video Indexing
104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics
105 nips-2000-Programmable Reinforcement Learning Agents
106 nips-2000-Propagation Algorithms for Variational Bayesian Learning
107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition
108 nips-2000-Recognizing Hand-written Digits Using Hierarchical Products of Experts
110 nips-2000-Regularization with Dot-Product Kernels
111 nips-2000-Regularized Winnow Methods
112 nips-2000-Reinforcement Learning with Function Approximation Converges to a Region
113 nips-2000-Robust Reinforcement Learning
114 nips-2000-Second Order Approximations for Probability Models
115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks
116 nips-2000-Sex with Support Vector Machines
117 nips-2000-Shape Context: A New Descriptor for Shape Matching and Object Recognition
118 nips-2000-Smart Vision Chip Fabricated Using Three Dimensional Integration Technology
119 nips-2000-Some New Bounds on the Generalization Error of Combined Classifiers
120 nips-2000-Sparse Greedy Gaussian Process Regression
121 nips-2000-Sparse Kernel Principal Component Analysis
122 nips-2000-Sparse Representation for Gaussian Process Models
123 nips-2000-Speech Denoising and Dereverberation Using Probabilistic Models
124 nips-2000-Spike-Timing-Dependent Learning for Oscillatory Networks
125 nips-2000-Stability and Noise in Biochemical Switches
126 nips-2000-Stagewise Processing in Error-correcting Codes and Image Restoration
127 nips-2000-Structure Learning in Human Causal Induction
128 nips-2000-Support Vector Novelty Detection Applied to Jet Engine Vibration Spectra
129 nips-2000-Temporally Dependent Plasticity: An Information Theoretic Account
130 nips-2000-Text Classification using String Kernels
131 nips-2000-The Early Word Catches the Weights
132 nips-2000-The Interplay of Symbolic and Subsymbolic Processes in Anagram Problem Solving
133 nips-2000-The Kernel Gibbs Sampler
134 nips-2000-The Kernel Trick for Distances
136 nips-2000-The Missing Link - A Probabilistic Model of Document Content and Hypertext Connectivity
137 nips-2000-The Unscented Particle Filter
138 nips-2000-The Use of Classifiers in Sequential Inference
139 nips-2000-The Use of MDL to Select among Computational Models of Cognition
140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles
141 nips-2000-Universality and Individuality in a Neural Code
142 nips-2000-Using Free Energies to Represent Q-values in a Multiagent Reinforcement Learning Task
143 nips-2000-Using the Nyström Method to Speed Up Kernel Machines
144 nips-2000-Vicinal Risk Minimization
145 nips-2000-Weak Learners and Improved Rates of Convergence in Boosting
146 nips-2000-What Can a Single Neuron Compute?
147 nips-2000-Who Does What? A Novel Algorithm to Determine Function Localization
148 nips-2000-`N-Body' Problems in Statistical Learning