nips nips2003 nips2003-187 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Bob Ricks, Dan Ventura
Abstract: Most proposals for quantum neural networks have skipped over the problem of how to train the networks. The mechanics of quantum computing are different enough from classical computing that the issue of training should be treated in detail. We propose a simple quantum neural network and a training method for it. It can be shown that this algorithm works in quantum systems. Results on several real-world data sets show that this algorithm can train the proposed quantum neural networks, and that it has some advantages over classical learning algorithms. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Most proposals for quantum neural networks have skipped over the problem of how to train the networks. [sent-5, score-0.96]
2 The mechanics of quantum computing are different enough from classical computing that the issue of training should be treated in detail. [sent-6, score-1.079]
3 We propose a simple quantum neural network and a training method for it. [sent-7, score-1.057]
4 It can be shown that this algorithm works in quantum systems. [sent-8, score-0.904]
5 Results on several real-world data sets show that this algorithm can train the proposed quantum neural networks, and that it has some advantages over classical learning algorithms. [sent-9, score-0.982]
6 1 Introduction Many quantum neural networks have been proposed [1], but very few of these proposals have attempted to provide an in-depth method of training them. [sent-10, score-1.032]
7 Most either do not mention how the network will be trained or simply state that they use a standard gradient descent algorithm. [sent-11, score-0.15]
8 This assumes that training a quantum neural network will be straightforward and analogous to classical methods. [sent-12, score-1.133]
9 While some quantum neural networks seem quite similar to classical networks [2], others have proposed quantum networks that are vastly different [3, 4, 5]. [sent-13, score-1.976]
10 Several of these networks also employ methods which are speculative or difficult to do in quantum systems [7, 8]. [sent-15, score-0.922]
11 These significant differences between classical networks and quantum neural networks, as well as the problems associated with quantum computation itself, require us to look more deeply at the issue of training quantum neural networks. [sent-16, score-2.882]
12 It is an open question what advantages a quantum neural network (QNN) would have over a classical network. [sent-18, score-1.061]
13 Other results have shown that QNNs may work best with some classical components as well as quantum components [2]. [sent-20, score-0.962]
14 This paper details such a network and how training could be done on it. [sent-23, score-0.151]
15 2 Quantum Computation Several necessary ideas that form the basis for the study of quantum computation are briefly reviewed here. [sent-25, score-0.903]
16 The Hilbert space has a set of states, |φᵢ⟩, that form a basis, and the system is described by a quantum state |ψ⟩ = Σᵢ cᵢ|φᵢ⟩. [sent-30, score-0.938]
17 |ψ⟩ is said to be coherent, or to be in a linear superposition of the basis states |φᵢ⟩, and in general the coefficients cᵢ are complex. [sent-31, score-0.194]
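A minimal classical sketch of this notation (illustrative only, not a construction from the paper): a state over a small basis is just a normalized vector of complex amplitudes cᵢ, measurement probabilities are |cᵢ|², and measuring collapses the superposition to a single basis state.

```python
import numpy as np

# A two-qubit register has 4 basis states |00>, |01>, |10>, |11>.
# A coherent state is a normalized vector of complex amplitudes c_i.
rng = np.random.default_rng(0)
c = rng.normal(size=4) + 1j * rng.normal(size=4)
c /= np.linalg.norm(c)                # enforce sum_i |c_i|^2 = 1

probs = np.abs(c) ** 2                # measurement probabilities per basis state
print(probs.round(3), probs.sum())    # the probabilities sum to 1

# Measurement destroys the superposition: the register collapses to one
# basis state drawn according to these probabilities.
collapsed = int(rng.choice(4, p=probs))
print(f"collapsed to |{collapsed:02b}>")
```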
18 A postulate of quantum mechanics is that if a coherent system interacts in any way with its environment (by being measured, for example), the superposition is destroyed. [sent-32, score-1.059]
19 A two-state quantum system is used as the basic unit of quantum computation. [sent-37, score-1.772]
20 Such a system is referred to as a quantum bit, or qubit; naming the two states |0⟩ and |1⟩, it is easy to see why this is so. [sent-38, score-0.959]
21 2 Operators Operators on a Hilbert space describe how one wave function is changed into another, and they may be represented as matrices acting on vectors (the notation |·⟩ indicates a column vector and ⟨·| a [complex conjugate] row vector). [sent-40, score-0.153]
22 In the quantum formalism, all properties are represented as operators whose eigenstates are the basis for the Hilbert space associated with that property and whose eigenvalues are the quantum allowed values for that property. [sent-44, score-1.89]
23 It is important to note that operators in quantum mechanics must be linear and, further, that they must be unitary. [sent-45, score-1.039]
24 This is a phenomenon common to all kinds of wave mechanics from water waves to optics. [sent-49, score-0.164]
25 The well-known double-slit experiment demonstrates empirically that, at the quantum level, interference also applies to the probability waves of quantum mechanics. [sent-50, score-1.839]
26 The wave function interferes with itself through the action of an operator – the different parts of the wave function interfere constructively or destructively according to their relative phases just like any other kind of wave. [sent-51, score-0.238]
27 4 Entanglement Entanglement is the potential for quantum systems to exhibit correlations that cannot be accounted for classically. [sent-53, score-0.902]
28 From a computational standpoint, entanglement seems intuitive enough – it is simply the fact that correlations can exist between different qubits – for example, if one qubit is in the |1⟩ state, another will be in the |1⟩ state. [sent-54, score-0.228]
29 However, from a physical standpoint, entanglement is little understood. [sent-55, score-0.154]
30 What makes it so powerful (and so little understood) is the fact that since quantum states exist as superpositions, these correlations exist in superposition as well. [sent-57, score-1.032]
31 There are different degrees of entanglement and much work has been done on better understanding and quantifying it [10, 11]. [sent-67, score-0.148]
32 Finally, it should be mentioned that while interference is a quantum property that has a classical cousin, entanglement is a completely quantum phenomenon for which there is no classical analog. [sent-68, score-2.113]
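A concrete two-qubit example of such a correlation-in-superposition is a standard Bell state, written out here as a plain amplitude vector (an illustration, not a circuit from the paper):

```python
import numpy as np

# Amplitudes over the two-qubit basis |00>, |01>, |10>, |11>.
bell = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)   # (|00> + |11>)/sqrt(2)

probs = np.abs(bell) ** 2
print(dict(zip(["00", "01", "10", "11"], probs.round(2))))
# {'00': 0.5, '01': 0.0, '10': 0.0, '11': 0.5}

# Only |00> and |11> ever occur: if one qubit is measured as 1, so is the other.
# Yet neither qubit alone has a definite value beforehand (each marginal is 50/50),
# which is the sense in which the correlation itself exists in superposition.
print(probs[2] + probs[3])   # P(first qubit = 1) = 0.5
```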
33 5 An Example – Quantum Search One of the best known quantum algorithms searches an unordered database quadratically faster than any classical method [12, 13]. [sent-72, score-1.005]
34 The algorithm begins with a superposition of all N data items and depends upon an oracle that can recognize the target of the search. [sent-73, score-0.247]
35 Classically, searching such a database requires O(N) oracle calls; however, on a quantum computer, the task requires only O(√N) oracle calls. [sent-74, score-0.252]
36 Each oracle call consists of a quantum operator that inverts the phase of the search target. [sent-75, score-1.099]
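The effect of the phase-inverting oracle plus the "inversion about the mean" diffusion step can be simulated classically on small instances; a minimal sketch follows (the iteration count ⌊(π/4)√N⌋ and the function name are standard choices, not taken from this paper):

```python
import numpy as np

def grover_search(n_items: int, target: int) -> int:
    """Classically simulate the quantum search over an unordered database."""
    amps = np.full(n_items, 1.0 / np.sqrt(n_items))       # uniform superposition
    for _ in range(int(np.pi / 4 * np.sqrt(n_items))):    # ~ (pi/4) * sqrt(N) oracle calls
        amps[target] *= -1.0                              # oracle: invert the target's phase
        amps = 2.0 * amps.mean() - amps                   # diffusion: inversion about the mean
    return int(np.argmax(np.abs(amps) ** 2))              # most probable measurement outcome

print(grover_search(n_items=64, target=37))               # 37, with probability ~0.997
```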
37 3 A Simple Quantum Neural Network We would like a QNN with features that make it easy for us to model, yet powerful enough to leverage quantum physics. [sent-78, score-0.907]
38 The output layer does the same thing as the hidden layer(s), except that it also checks its accuracy against the target output of the network. [sent-83, score-0.241]
39 The network as a whole computes a function by checking which output bit is high. [sent-84, score-0.182]
40 The output layer works similarly, taking a weighted sum of the hidden nodes and checking against a threshold. [sent-92, score-0.224]
41 At the quantum gate level, the network will require O(blm + m²) gates for each node of the network. [sent-95, score-1.059]
42 Here b is the number of bits used for floating-point arithmetic in |β⟩, l is the number of bits for each weight, and m is the number of inputs to the node [14]-[15]. [sent-96, score-0.307]
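Each node therefore implements simple threshold arithmetic over its inputs; a classical sketch of the per-node computation the circuit realizes in superposition is shown below (the layer sizes and the hand-picked weights solving XOR are illustrative assumptions, not the paper's trained values):

```python
import numpy as np

def node(inputs: np.ndarray, weights: np.ndarray, threshold: float) -> int:
    """One hidden/output node: weighted sum of inputs checked against a threshold."""
    return int(inputs @ weights >= threshold)

def network(x: np.ndarray, hidden, outputs) -> np.ndarray:
    """Hidden nodes feed thresholded output nodes; the network's answer is read
    off by checking which output bit is high."""
    h = np.array([node(x, w, t) for (w, t) in hidden], dtype=float)
    return np.array([node(h, w, t) for (w, t) in outputs])

# Illustrative 2-input network with hand-picked weights that computes XOR.
hidden = [(np.array([1.0, 1.0]), 0.5),    # OR  of the inputs
          (np.array([1.0, 1.0]), 1.5)]    # AND of the inputs
outputs = [(np.array([1.0, -1.0]), 0.5)]  # OR and not AND  ->  XOR
for x in ([0., 0.], [0., 1.], [1., 0.], [1., 1.]):
    print(x, network(np.array(x), hidden, outputs))
```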
43 The overall network works as follows on a training set. [sent-97, score-0.169]
44 In our example, the network has two input parameters, so all n training examples will have two input registers. [sent-98, score-0.173]
45 Each hidden or output node has a weight vector, represented by |ψᵢ⟩, each vector containing weights for each of its inputs. [sent-101, score-0.35]
46 After classifying a training example, the registers |ϕ₁⟩ and |ϕ₂⟩ reflect the network's ability to classify that training example. [sent-102, score-0.269]
47 When all training examples have been classified (Figure 2: QNN Training), |ρ⟩ will be the sum of the output nodes that have the correct answer throughout the training set, and will range between zero and the number of training examples times the number of output nodes. [sent-104, score-0.417]
48 4 Using Quantum Search to Learn Network Weights One possibility for training this kind of network is to search through the possible weight vectors for one which is consistent with the training data. [sent-105, score-0.464]
49 Quantum searches have been used already in quantum learning [16] and many of the problems associated with them have already been explored [17]. [sent-106, score-0.913]
50 We would like to find a solution which classifies all training examples correctly; in other words, we would like |ρ⟩ = n ∗ m, where n is the number of training examples and m is the number of output nodes. [sent-107, score-0.238]
51 Since we generally do not know how many weight vectors will do this, we use a generalization of the original search algorithm [18], intended for problems where the number of solutions t is unknown. [sent-108, score-0.241]
52 The basic idea is that we will put |ψ⟩ into a superposition of all possible weight vectors and search for one which classifies all training examples correctly. [sent-109, score-0.446]
53 We start out with |ψ⟩ as a superposition of all possible weight vectors. [sent-110, score-0.234]
54 By using a superposition we classify the training examples with respect to every possible weight vector simultaneously. [sent-113, score-0.38]
55 Each weight vector is now entangled with |ρ⟩ in such a way that |ρ⟩ corresponds with how well every weight vector classifies all the training data. [sent-114, score-0.413]
56 In this case, the oracle for the quantum search is |ρ⟩ = n ∗ m, which corresponds to searching for a weight vector which correctly classifies the entire set. [sent-115, score-1.284]
57 Unfortunately, searching the weight vectors while entangled with |ρ⟩ would cause unwanted weight vectors, entangled with the performance metric we are looking for, to grow. [sent-116, score-0.47]
58 The solution is to disentangle |ψ⟩ from the other registers after inverting the phase of those weights which match the search criteria, based on |ρ⟩. [sent-117, score-0.18]
59 This means that the network will need to be recomputed each time we make an oracle call and after each measurement. [sent-119, score-0.177]
60 First, some training data may have no solution network that correctly classifies all training instances. [sent-121, score-0.248]
61 This means that nothing will be marked by the search oracle, so every weight vector will have an equal chance of being measured. [sent-122, score-0.225]
62 Second, the amount of time needed to find a vector which correctly classifies the training set is O(√(2^b/t)), which has exponential complexity with respect to the number of bits in the weight vector. [sent-124, score-0.306]
63 One way to deal with the first problem is to search until we find a solution which covers an acceptable percentage, p, of the training data. [sent-125, score-0.155]
64 In other words, the search oracle is modified to be |ρ⟩ ≥ n ∗ m ∗ p. [sent-126, score-0.181]
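For any single weight assignment the quantity accumulated in |ρ⟩, and the relaxed acceptance test, can be written as an ordinary predicate; a sketch is below (the function names, and `forward(weights, x)` returning the vector of output bits as in the network sketch above, are assumptions for illustration, not the paper's circuit):

```python
import numpy as np

def rho(weights, examples, targets, forward):
    """Count, over all training examples and all output nodes, how many output
    bits match their targets -- the quantity the |rho> register accumulates."""
    return sum(int(np.sum(forward(weights, x) == t))
               for x, t in zip(examples, targets))

def oracle_accepts(weights, examples, targets, forward, p=1.0):
    """Relaxed search criterion: accept weight vectors covering at least a
    fraction p of the maximum score n * m (p = 1.0 is exact consistency)."""
    n, m = len(examples), len(targets[0])
    return rho(weights, examples, targets, forward) >= n * m * p

# Tiny demo with a single-output threshold "network": output = [x @ w >= 0.5].
fwd = lambda w, x: np.array([int(x @ w >= 0.5)])
X = [np.array(v, dtype=float) for v in ([0, 0], [0, 1], [1, 0], [1, 1])]
T = [np.array([v]) for v in (0, 1, 1, 1)]                        # OR targets
print(rho(np.array([1.0, 1.0]), X, T, fwd))                      # 4: all correct
print(oracle_accepts(np.array([1.0, 0.0]), X, T, fwd, p=0.75))   # True: 3 of 4 correct
```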
65 5 Piecewise Weight Learning Our quantum search algorithm gives us a good polynomial speed-up to the exponential task of finding a solution to the QNN. [sent-128, score-0.991]
66 This algorithm does not scale well; in fact, it is exponential in the total number of weights in the network and the bits per weight. [sent-129, score-0.161]
67 Therefore, we propose a randomized training algorithm which searches each node’s weight vector independently. [sent-130, score-0.295]
68 The network starts off, once again, with training examples in |α⟩, the corresponding answers in |Ω⟩, and zeros in all the other registers. [sent-131, score-0.173]
69 A node is randomly selected and its weight vector, |ψᵢ⟩, is put into superposition. [sent-132, score-0.197]
70 All other weight vectors start with random classical initial weights. [sent-133, score-0.234]
71 We then search for a weight vector for this node that causes the entire network to classify a certain percentage, p, of the training examples correctly. [sent-134, score-0.505]
72 That weight is fixed classically and the process is repeated randomly for the other nodes. [sent-136, score-0.148]
73 Searching each node’s weight vector separately is, in effect, a random search through the weight space where we select weight vectors which give a good level of performance for each node. [sent-137, score-0.506]
74 Each node takes on weight vectors that tend to increase performance with some amount of randomness that helps keep it out of local minima. [sent-138, score-0.253]
75 First, to ensure that hidden nodes find weight vectors that compute something useful, a small performance penalty is added to weight vectors which cause a hidden node to output the same value for all training examples. [sent-141, score-0.63]
76 This helps select weight vectors which contain useful information for the output nodes. [sent-142, score-0.229]
77 Since each output node's performance is independent of the performance of the other output nodes, the algorithm considers only the accuracy of the node currently being trained when training an output node. [sent-143, score-0.346]
78 Each of the hidden and output nodes is a thresholded node with three weights: one for each input and one for the threshold. [sent-145, score-0.166]
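A classical analogue of this per-node procedure can be sketched as follows (brute force over a small discrete weight grid stands in for the quantum search; the network shape, weight levels, and stopping rule are illustrative, and the hidden-node penalty mentioned above is omitted for brevity):

```python
import itertools
import random
import numpy as np

def piecewise_train(forward, n_nodes, n_weights, X, T, p=0.9,
                    levels=(-1.0, 0.0, 1.0), max_epochs=50, seed=0):
    """Repeatedly pick a node at random and search ONLY its weight vector for a
    setting that lets the whole network cover a fraction p of the
    (example, output-bit) pairs, then fix that vector classically."""
    rng = random.Random(seed)
    score = lambda W: sum(int(np.sum(forward(W, x) == t)) for x, t in zip(X, T))
    goal = p * len(X) * len(T[0])
    # Random classical initial weights for every node.
    W = [np.array([rng.choice(levels) for _ in range(n_weights)]) for _ in range(n_nodes)]
    for epoch in range(1, max_epochs + 1):
        i = rng.randrange(n_nodes)            # this node's weights "go into superposition"
        options = [np.array(c) for c in itertools.product(levels, repeat=n_weights)]
        # Brute force stands in for the quantum search over this node's weight space.
        W[i] = max(options, key=lambda c: score(W[:i] + [c] + W[i + 1:]))
        if score(W) >= goal:                  # acceptable coverage reached
            return W, epoch
    return W, max_epochs
```

With `forward` playing the role of something like the `network` sketch above (three weights per node: one per input plus a threshold), this reproduces the flavor of the algorithm: each node tends toward weight vectors that raise whole-network performance, with randomness in the node order helping it escape local minima.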
79 After an average of 58 weight updates, the algorithm was able to correctly classify the training data. [sent-150, score-0.263]
80 Since this is a randomized algorithm, both in the number of iterations of the search algorithm before measuring and in the order in which nodes update their weight vectors, the standard deviation for this method was much higher, but still reasonable. [sent-151, score-0.298]
81 In the randomized search algorithm, an epoch refers to finding and fixing the weight of a single node. [sent-152, score-0.26]
82 We also tried the randomized search algorithm for a few real-world machine learning problems: lenses, Hayes-Roth and the iris datasets [19]. [sent-153, score-0.176]
83 The lenses data set tries to predict whether people will need soft contact lenses, hard contact lenses, or no contacts. [sent-154, score-0.214]
84 [Table 1: Training Results – fragment: 98%, Backprop 96% 92% 83%] The lenses data set can be solved with a network that has three hidden nodes. [sent-163, score-0.204]
85 The randomized search algorithm found the correct target for 97. [sent-168, score-0.175]
86 We used four hidden nodes with two-bit weights for the hidden nodes. [sent-171, score-0.169]
87 We had to normalize the inputs to range from zero to one once again so the larger inputs would not dominate the weight vectors. [sent-172, score-0.159]
88 86% of the training examples correctly since we are checking each output node for accuracy on each training example. [sent-176, score-0.352]
89 7 Conclusions and Future Work This paper proposes a simple quantum neural network and a method of training it which works well in quantum systems. [sent-179, score-1.961]
90 By using a quantum search we are able to use a well-known algorithm for quantum systems which has already been used for quantum learning. [sent-180, score-2.741]
91 The algorithm is able to search for solutions that cover an arbitrary percentage of the training set. [sent-181, score-0.172]
92 This algorithm is exponential in the number of qubits of each node’s weight vector instead of in the composite weight vector of the entire network. [sent-185, score-0.38]
93 There may be quantum methods which could be used to improve current gradient descent and other learning algorithms. [sent-189, score-0.935]
94 It may also be possible to combine some of these with a quantum search. [sent-190, score-0.886]
95 An example would be to use gradient descent to try to refine a composite weight vector found by quantum search. [sent-191, score-1.094]
96 Conversely, a quantum search could start with the weight vector of a gradient descent search. [sent-192, score-1.16]
97 This would allow the search to start with an accurate weight vector and search locally for weight vectors which improve overall performance. [sent-193, score-0.466]
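One way to make the locality concrete (an illustrative assumption about how such a combination might be set up, not a procedure from the paper): encode the gradient-descent weights as a bit string and restrict the searched superposition to encodings within a small Hamming distance of that seed, which shrinks the search space from 2^n to the sum of C(n, r) for r up to the chosen radius.

```python
import itertools

def neighbors_within_hamming(seed_bits: str, radius: int):
    """All bit strings within the given Hamming distance of a seed encoding."""
    n = len(seed_bits)
    for r in range(radius + 1):
        for idx in itertools.combinations(range(n), r):
            flipped = list(seed_bits)
            for i in idx:
                flipped[i] = "1" if flipped[i] == "0" else "0"
            yield "".join(flipped)

seed = "10110100"                        # bit encoding of a gradient-descent weight vector
local = list(neighbors_within_hamming(seed, radius=2))
print(len(local), "of", 2 ** len(seed))  # 37 of 256 candidate encodings to search
```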
98 Other types of QNNs may be able to use a quantum search as well since the algorithm only requires a weight space which can be searched in superposition. [sent-195, score-1.092]
99 In addition, more traditional gradient descent techniques might benefit from a quantum speed-up themselves. [sent-196, score-0.935]
100 Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. [sent-299, score-0.886]
wordName wordTfidf (topN-words)
[('quantum', 0.886), ('entanglement', 0.127), ('weight', 0.123), ('qnn', 0.113), ('superposition', 0.111), ('oracle', 0.098), ('lenses', 0.085), ('search', 0.083), ('wave', 0.08), ('network', 0.079), ('classical', 0.076), ('node', 0.074), ('training', 0.072), ('entangled', 0.057), ('qubits', 0.057), ('registers', 0.056), ('randomized', 0.054), ('operators', 0.054), ('output', 0.05), ('xor', 0.049), ('interference', 0.045), ('mechanics', 0.045), ('brigham', 0.042), ('qnns', 0.042), ('searching', 0.04), ('hidden', 0.04), ('dan', 0.039), ('iris', 0.039), ('target', 0.038), ('nodes', 0.038), ('ventura', 0.037), ('hilbert', 0.036), ('networks', 0.036), ('bits', 0.035), ('vectors', 0.035), ('correctly', 0.035), ('interfere', 0.034), ('classify', 0.033), ('descent', 0.032), ('layer', 0.032), ('checks', 0.031), ('ci', 0.03), ('alexandr', 0.028), ('behrman', 0.028), ('constructively', 0.028), ('eigenstates', 0.028), ('ezhov', 0.028), ('provo', 0.028), ('qubit', 0.028), ('vedral', 0.028), ('young', 0.028), ('volume', 0.028), ('physical', 0.027), ('searches', 0.027), ('checking', 0.027), ('factorized', 0.027), ('bit', 0.026), ('weights', 0.025), ('classically', 0.025), ('steck', 0.025), ('pages', 0.023), ('arithmetic', 0.022), ('contact', 0.022), ('waves', 0.022), ('lov', 0.022), ('state', 0.022), ('examples', 0.022), ('sciences', 0.022), ('exponential', 0.022), ('helps', 0.021), ('leverage', 0.021), ('quantifying', 0.021), ('associative', 0.02), ('standpoint', 0.02), ('coherence', 0.02), ('gates', 0.02), ('neural', 0.02), ('vector', 0.019), ('sum', 0.019), ('states', 0.019), ('classi', 0.019), ('ut', 0.019), ('represented', 0.019), ('amplitudes', 0.018), ('register', 0.018), ('proposals', 0.018), ('inputs', 0.018), ('works', 0.018), ('percentage', 0.017), ('coherent', 0.017), ('gradient', 0.017), ('basis', 0.017), ('phenomenon', 0.017), ('composite', 0.017), ('governed', 0.017), ('http', 0.016), ('operator', 0.016), ('phase', 0.016), ('database', 0.016), ('correlations', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 187 nips-2003-Training a Quantum Neural Network
Author: Bob Ricks, Dan Ventura
Abstract: Most proposals for quantum neural networks have skipped over the problem of how to train the networks. The mechanics of quantum computing are different enough from classical computing that the issue of training should be treated in detail. We propose a simple quantum neural network and a training method for it. It can be shown that this algorithm works in quantum systems. Results on several real-world data sets show that this algorithm can train the proposed quantum neural networks, and that it has some advantages over classical learning algorithms. 1
2 0.079345725 104 nips-2003-Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks
Author: Justin Werfel, Xiaohui Xie, H. S. Seung
Abstract: Gradient-following learning methods can encounter problems of implementation in many applications, and stochastic variants are frequently used to overcome these difficulties. We derive quantitative learning curves for three online training methods used with a linear perceptron: direct gradient descent, node perturbation, and weight perturbation. The maximum learning rate for the stochastic methods scales inversely with the first power of the dimensionality of the noise injected into the system; with sufficiently small learning rate, all three methods give identical learning curves. These results suggest guidelines for when these stochastic methods will be limited in their utility, and considerations for architectures in which they will be effective. 1
3 0.045047753 136 nips-2003-New Algorithms for Efficient High Dimensional Non-parametric Classification
Author: Ting liu, Andrew W. Moore, Alexander Gray
Abstract: This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties about that set of datapoints. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the prediction phase of SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN. These results include datasets with up to 106 dimensions and 105 records, and show non-trivial speedups while giving exact answers. 1
4 0.044900138 142 nips-2003-On the Concentration of Expectation and Approximate Inference in Layered Networks
Author: Xuanlong Nguyen, Michael I. Jordan
Abstract: We present an analysis of concentration-of-expectation phenomena in layered Bayesian networks that use generalized linear models as the local conditional probabilities. This framework encompasses a wide variety of probability distributions, including both discrete and continuous random variables. We utilize ideas from large deviation analysis and the delta method to devise and evaluate a class of approximate inference algorithms for layered Bayesian networks that have superior asymptotic error bounds and very fast computation time. 1
5 0.039210483 183 nips-2003-Synchrony Detection by Analogue VLSI Neurons with Bimodal STDP Synapses
Author: Adria Bofill-i-petit, Alan F. Murray
Abstract: We present test results from spike-timing correlation learning experiments carried out with silicon neurons with STDP (Spike Timing Dependent Plasticity) synapses. The weight change scheme of the STDP synapses can be set to either weight-independent or weight-dependent mode. We present results that characterise the learning window implemented for both modes of operation. When presented with spike trains with different types of synchronisation the neurons develop bimodal weight distributions. We also show that a 2-layered network of silicon spiking neurons with STDP synapses can perform hierarchical synchrony detection. 1
6 0.036780123 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data
7 0.035444394 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
8 0.034150485 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs
9 0.033996195 123 nips-2003-Markov Models for Automated ECG Interval Analysis
10 0.030453114 42 nips-2003-Bounded Finite State Controllers
11 0.029704597 55 nips-2003-Distributed Optimization in Adaptive Networks
12 0.029455887 166 nips-2003-Reconstructing MEG Sources with Unknown Correlations
13 0.028873267 182 nips-2003-Subject-Independent Magnetoencephalographic Source Localization by a Multilayer Perceptron
14 0.028762799 112 nips-2003-Learning to Find Pre-Images
15 0.027247012 102 nips-2003-Large Scale Online Learning
16 0.026356108 96 nips-2003-Invariant Pattern Recognition by Semi-Definite Programming Machines
17 0.026351249 119 nips-2003-Local Phase Coherence and the Perception of Blur
18 0.026163951 175 nips-2003-Sensory Modality Segregation
19 0.025351971 97 nips-2003-Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian
20 0.025226211 110 nips-2003-Learning a World Model and Planning with a Self-Organizing, Dynamic Neural System
topicId topicWeight
[(0, -0.101), (1, 0.017), (2, 0.021), (3, 0.017), (4, 0.004), (5, -0.028), (6, -0.012), (7, -0.002), (8, 0.016), (9, 0.042), (10, 0.028), (11, -0.014), (12, 0.026), (13, -0.009), (14, -0.015), (15, -0.038), (16, -0.043), (17, 0.01), (18, 0.093), (19, 0.03), (20, 0.038), (21, -0.017), (22, -0.075), (23, 0.076), (24, -0.065), (25, 0.026), (26, -0.032), (27, 0.089), (28, -0.066), (29, -0.029), (30, -0.068), (31, 0.064), (32, 0.027), (33, -0.018), (34, 0.001), (35, 0.005), (36, 0.023), (37, -0.02), (38, 0.005), (39, 0.052), (40, -0.016), (41, -0.008), (42, -0.041), (43, 0.007), (44, 0.062), (45, 0.105), (46, -0.093), (47, -0.057), (48, -0.098), (49, 0.063)]
simIndex simValue paperId paperTitle
same-paper 1 0.93459004 187 nips-2003-Training a Quantum Neural Network
Author: Bob Ricks, Dan Ventura
Abstract: Most proposals for quantum neural networks have skipped over the problem of how to train the networks. The mechanics of quantum computing are different enough from classical computing that the issue of training should be treated in detail. We propose a simple quantum neural network and a training method for it. It can be shown that this algorithm works in quantum systems. Results on several real-world data sets show that this algorithm can train the proposed quantum neural networks, and that it has some advantages over classical learning algorithms. 1
2 0.70482135 104 nips-2003-Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks
Author: Justin Werfel, Xiaohui Xie, H. S. Seung
Abstract: Gradient-following learning methods can encounter problems of implementation in many applications, and stochastic variants are frequently used to overcome these difficulties. We derive quantitative learning curves for three online training methods used with a linear perceptron: direct gradient descent, node perturbation, and weight perturbation. The maximum learning rate for the stochastic methods scales inversely with the first power of the dimensionality of the noise injected into the system; with sufficiently small learning rate, all three methods give identical learning curves. These results suggest guidelines for when these stochastic methods will be limited in their utility, and considerations for architectures in which they will be effective. 1
Author: Eiji Mizutani, James Demmel
Abstract: The online incremental gradient (or backpropagation) algorithm is widely considered to be the fastest method for solving large-scale neural-network (NN) learning problems. In contrast, we show that an appropriately implemented iterative batch-mode (or block-mode) learning method can be much faster. For example, it is three times faster in the UCI letter classification problem (26 outputs, 16,000 data items, 6,066 parameters with a two-hidden-layer multilayer perceptron) and 353 times faster in a nonlinear regression problem arising in color recipe prediction (10 outputs, 1,000 data items, 2,210 parameters with a neuro-fuzzy modular network). The three principal innovative ingredients in our algorithm are the following: First, we use scaled trust-region regularization with inner-outer iteration to solve the associated “overdetermined” nonlinear least squares problem, where the inner iteration performs a truncated (or inexact) Newton method. Second, we employ Pearlmutter’s implicit sparse Hessian matrix-vector multiply algorithm to construct the Krylov subspaces used to solve for the truncated Newton update. Third, we exploit sparsity (for preconditioning) in the matrices resulting from the NNs having many outputs. 1
4 0.51871306 136 nips-2003-New Algorithms for Efficient High Dimensional Non-parametric Classification
Author: Ting liu, Andrew W. Moore, Alexander Gray
Abstract: This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties about that set of datapoints. This offers a small amount of computational leeway, and we investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the prediction phase of SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN. These results include datasets with up to 106 dimensions and 105 records, and show non-trivial speedups while giving exact answers. 1
5 0.482923 175 nips-2003-Sensory Modality Segregation
Author: Virginia Sa
Abstract: Why are sensory modalities segregated the way they are? In this paper we show that sensory modalities are well designed for self-supervised cross-modal learning. Using the Minimizing-Disagreement algorithm on an unsupervised speech categorization task with visual (moving lips) and auditory (sound signal) inputs, we show that very informative auditory dimensions actually harm performance when moved to the visual side of the network. It is better to throw them away than to consider them part of the “visual input”. We explain this finding in terms of the statistical structure in sensory inputs. 1
6 0.46953684 196 nips-2003-Wormholes Improve Contrastive Divergence
7 0.44861105 165 nips-2003-Reasoning about Time and Knowledge in Neural Symbolic Learning Systems
8 0.4437336 3 nips-2003-AUC Optimization vs. Error Rate Minimization
9 0.4386422 178 nips-2003-Sparse Greedy Minimax Probability Machine Classification
10 0.43254372 181 nips-2003-Statistical Debugging of Sampled Programs
11 0.42056286 25 nips-2003-An MCMC-Based Method of Comparing Connectionist Models in Cognitive Science
12 0.41866511 184 nips-2003-The Diffusion-Limited Biochemical Signal-Relay Channel
13 0.40799198 102 nips-2003-Large Scale Online Learning
14 0.39962775 99 nips-2003-Kernels for Structured Natural Language Data
15 0.39893264 44 nips-2003-Can We Learn to Beat the Best Stock
16 0.39023763 55 nips-2003-Distributed Optimization in Adaptive Networks
17 0.38374263 6 nips-2003-A Fast Multi-Resolution Method for Detection of Significant Spatial Disease Clusters
18 0.38137007 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
19 0.36972043 45 nips-2003-Circuit Optimization Predicts Dynamic Networks for Chemosensory Orientation in Nematode C. elegans
20 0.36185572 142 nips-2003-On the Concentration of Expectation and Approximate Inference in Layered Networks
topicId topicWeight
[(0, 0.038), (9, 0.3), (11, 0.028), (29, 0.016), (30, 0.021), (35, 0.053), (48, 0.014), (53, 0.098), (71, 0.079), (76, 0.033), (85, 0.084), (91, 0.103), (99, 0.015)]
simIndex simValue paperId paperTitle
1 0.86854041 61 nips-2003-Entrainment of Silicon Central Pattern Generators for Legged Locomotory Control
Author: Francesco Tenore, Ralph Etienne-Cummings, M. A. Lewis
Abstract: We have constructed a second generation CPG chip capable of generating the necessary timing to control the leg of a walking machine. We demonstrate improvements over a previous chip by moving toward a significantly more versatile device. This includes a larger number of silicon neurons, more sophisticated neurons including voltage dependent charging and relative and absolute refractory periods, and enhanced programmability of neural networks. This chip builds on the basic results achieved on a previous chip and expands its versatility to get closer to a self-contained locomotion controller for walking robots. 1
2 0.79605734 106 nips-2003-Learning Non-Rigid 3D Shape from 2D Motion
Author: Lorenzo Torresani, Aaron Hertzmann, Christoph Bregler
Abstract: This paper presents an algorithm for learning the time-varying shape of a non-rigid 3D object from uncalibrated 2D tracking data. We model shape motion as a rigid component (rotation and translation) combined with a non-rigid deformation. Reconstruction is ill-posed if arbitrary deformations are allowed. We constrain the problem by assuming that the object shape at each time instant is drawn from a Gaussian distribution. Based on this assumption, the algorithm simultaneously estimates 3D shape and motion for each time frame, learns the parameters of the Gaussian, and robustly fills-in missing data points. We then extend the algorithm to model temporal smoothness in object shape, thus allowing it to handle severe cases of missing data. 1
same-paper 3 0.76819569 187 nips-2003-Training a Quantum Neural Network
Author: Bob Ricks, Dan Ventura
Abstract: Most proposals for quantum neural networks have skipped over the problem of how to train the networks. The mechanics of quantum computing are different enough from classical computing that the issue of training should be treated in detail. We propose a simple quantum neural network and a training method for it. It can be shown that this algorithm works in quantum systems. Results on several real-world data sets show that this algorithm can train the proposed quantum neural networks, and that it has some advantages over classical learning algorithms. 1
4 0.54105979 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
Author: Yu-han Chang, Tracey Ho, Leslie P. Kaelbling
Abstract: In large multiagent games, partial observability, coordination, and credit assignment persistently plague attempts to design good learning algorithms. We provide a simple and efficient algorithm that in part uses a linear system to model the world from a single agent’s limited perspective, and takes advantage of Kalman filtering to allow an agent to construct a good training signal and learn an effective policy. 1
5 0.53569436 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images
Author: Sanjiv Kumar, Martial Hebert
Abstract: In this paper we present Discriminative Random Fields (DRF), a discriminative framework for the classification of natural image regions by incorporating neighborhood spatial dependencies in the labels as well as the observed data. The proposed model exploits local discriminative models and allows to relax the assumption of conditional independence of the observed data given the labels, commonly used in the Markov Random Field (MRF) framework. The parameters of the DRF model are learned using penalized maximum pseudo-likelihood method. Furthermore, the form of the DRF model allows the MAP inference for binary classification problems using the graph min-cut algorithms. The performance of the model was verified on the synthetic as well as the real-world images. The DRF model outperforms the MRF model in the experiments. 1
6 0.53500307 126 nips-2003-Measure Based Regularization
7 0.53403497 78 nips-2003-Gaussian Processes in Reinforcement Learning
8 0.5333482 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
9 0.53275716 113 nips-2003-Learning with Local and Global Consistency
10 0.53273231 143 nips-2003-On the Dynamics of Boosting
11 0.53188574 3 nips-2003-AUC Optimization vs. Error Rate Minimization
12 0.53166163 47 nips-2003-Computing Gaussian Mixture Models with EM Using Equivalence Constraints
13 0.53109598 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition
14 0.53061521 107 nips-2003-Learning Spectral Clustering
15 0.52992707 104 nips-2003-Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks
16 0.52971697 102 nips-2003-Large Scale Online Learning
17 0.52956867 93 nips-2003-Information Dynamics and Emergent Computation in Recurrent Circuits of Spiking Neurons
18 0.52931595 30 nips-2003-Approximability of Probability Distributions
19 0.5290038 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates
20 0.52852136 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction