nips nips2008 nips2008-248 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ilya Sutskever, Geoffrey E. Hinton
Abstract: We describe a way of learning matrix representations of objects and relationships. The goal of learning is to allow multiplication of matrices to represent symbolic relationships between objects and symbolic relationships between relationships, which is the main novelty of the method. We demonstrate that this leads to excellent generalization in two different domains: modular arithmetic and family relationships. We show that the same system can learn first-order propositions such as (2, 5) ∈ +3 or (Christopher, Penelope) ∈ has wife, and higher-order propositions such as (3, +3) ∈ plus and (+3, −3) ∈ inverse or (has husband, has wife) ∈ higher oppsex. We further demonstrate that the system understands how higher-order propositions are related to first-order ones by showing that it can correctly answer questions about first-order propositions involving the relations +3 or has wife even though it has not been trained on any first-order examples involving these relations. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Using matrices to model symbolic relationships Ilya Sutskever and Geoffrey Hinton University of Toronto {ilya, hinton}@cs. [sent-1, score-0.159]
2 ca Abstract We describe a way of learning matrix representations of objects and relationships. [sent-3, score-0.226]
3 The goal of learning is to allow multiplication of matrices to represent symbolic relationships between objects and symbolic relationships between relationships, which is the main novelty of the method. [sent-4, score-0.451]
4 We demonstrate that this leads to excellent generalization in two different domains: modular arithmetic and family relationships. [sent-5, score-0.444]
5 We show that the same system can learn first-order propositions such as (2, 5) ∈ +3 or (Christopher, Penelope) ∈ has wife, and higher-order propositions such as (3, +3) ∈ plus and (+3, −3) ∈ inverse or (has husband, has wife) ∈ higher oppsex. [sent-6, score-0.49]
6 1 Introduction It is sometimes possible to find a way of mapping objects in a “data” domain into objects in a “target” domain so that operations in the data domain can be modelled by operations in the target domain. [sent-8, score-0.308]
7 When the objects in the data and target domains are more complicated than single numbers, it may be difficult to find good mappings using inspiration alone. [sent-10, score-0.188]
8 Paccanaro and Hinton [10] introduced a method called “Linear Relational Embedding” (LRE) that uses multiplication of vectors by matrices in the target domain to model pairwise relations between objects in the data domain. [sent-12, score-0.686]
9 LRE applies to a finite set of objects Ω and a finite set of relations R where every relation R ∈ R is a set of pairs of objects, so R ⊆ Ω × Ω. [sent-13, score-0.635]
The cost function is “discriminative” because it compares the distance from RA to each correct answer with the distances from RA to all possible answers. [sent-16, score-0.131]
The cost function then represents the sum of the negative log probabilities of picking the correct answers to questions of the form (A, ?) ∈ R if we pick answers stochastically in proportion to their probability densities under the spherical Gaussian centered at RA. [sent-20, score-0.164]
We say that LRE accurately models a set of objects and relations if its answers to queries of the form (A, ?) ∈ R are correct, which means that for each object A and relation R such that there are k objects X satisfying (A, X) ∈ R, the vector representation of each such object X must be among the k closest vector representations to RA. [sent-22, score-0.442]
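As a concrete illustration, the sketch below (a toy reconstruction, not the authors’ code; the dimensions, random initialization, and variable names are our own assumptions) computes the discriminative answer probabilities for a query (A, ?) ∈ R and checks the k-closest correctness criterion just described.

```python
import numpy as np

rng = np.random.default_rng(0)
n_objects, dim = 10, 2
objects = rng.normal(size=(n_objects, dim))   # one vector per object
R = rng.normal(size=(dim, dim))               # one matrix per relation

def answer_probabilities(a_idx):
    """p(X | A, R) is proportional to the density of X's vector under a
    spherical Gaussian centered at R @ a (the discriminative formulation)."""
    ra = R @ objects[a_idx]
    sq_dists = ((objects - ra) ** 2).sum(axis=1)
    logits = -0.5 * sq_dists
    logits -= logits.max()                    # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def models_accurately(a_idx, correct_idxs):
    """Correctness criterion: every correct object must be among the k closest
    vector representations to R @ a, where k is the number of correct answers."""
    ra = R @ objects[a_idx]
    sq_dists = ((objects - ra) ** 2).sum(axis=1)
    k_closest = set(np.argsort(sq_dists)[:len(correct_idxs)].tolist())
    return set(correct_idxs) <= k_closest

# Training minimizes the negative log probability of each correct answer, e.g.:
loss = -np.log(answer_probabilities(a_idx=3)[5])
```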
If (A, B) was removed from R during LRE’s learning, so that the query (A, ?) ∈ R has k correct answers of which B is one, yet LRE answers the query (A, ?) ∈ R correctly by placing B among the k closest object representations to RA, then we can claim that LRE’s representation generalizes. [sent-30, score-0.164]
If the representation is high-dimensional, then LRE can easily represent any set of relations that is not too large, so its inductive bias finds all sets of relations plausible, which prevents good generalization. [sent-34, score-0.905]
Paccanaro and Hinton [10] show that low-dimensional LRE exhibits excellent generalization on datasets such as the family relations task. [sent-36, score-0.479]
19 A drawback of LRE is that the square matrices it uses to represent relations are quadratically more cumbersome than the vectors it uses to represent objects. [sent-39, score-0.567]
More importantly, this asymmetry means that relations cannot themselves be treated as objects. [sent-41, score-0.372]
21 In this paper we describe “Matrix Relational Embedding” (MRE), which is a version of LRE that uses matrices as the representation for objects as well as for relations. [sent-43, score-0.299]
22 1 We have also experimented with a version of LRE that learns to generate a learned matrix representation of a relation from a learned vector representation of the relation. [sent-49, score-0.324]
23 This too makes it possible to treat relations as objects because they both have vector representations. [sent-50, score-0.516]
24 However, it is less straightforward than simply representing objects by matrices and it does not generalize quite as well. [sent-51, score-0.267]
25 It can also represent relations involving an object and a relation, for instance (3, +3) ∈ plus. [sent-53, score-0.486]
Formally, we are given a finite set of higher-order relations R̃, where a higher-order relation R̃ ∈ R̃ is a relation whose arguments can be relations as well as objects, which we formalize as R̃ ⊆ R × R or R̃ ⊆ Ω × R (R is the set of the basic relations). [sent-54, score-0.679]
The matrix representation of MRE allows it to treat relations in (almost) the same way it treats basic objects, so there is no difficulty representing relations whose arguments are also relations. [sent-55, score-0.92]
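A minimal sketch of this uniform treatment, assuming 4 × 4 matrices and Gaussian scoring of candidates as described for LRE (the entity names, the multiplication order, and the candidate sets are illustrative assumptions, not the paper’s code):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4                                        # the paper reports 4 x 4 matrices work best
# Every entity -- numbers and relations alike -- gets a d x d matrix.
reps = {name: rng.normal(size=(d, d)) for name in ["2", "3", "5", "+3", "plus"]}

def neg_log_prob(arg, relation, answer, candidates):
    """Negative log probability of a proposition (arg, answer) ∈ relation.
    The relation matrix multiplies the argument matrix (the ordering is our
    assumption), and the product is compared with every candidate answer
    under a spherical Gaussian."""
    pred = reps[relation] @ reps[arg]
    sq = {c: ((reps[c] - pred) ** 2).sum() for c in candidates}
    log_z = np.log(sum(np.exp(-0.5 * s) for s in sq.values()))
    return 0.5 * sq[answer] + log_z

# A basic proposition, (2, 5) ∈ +3, where the candidates are objects:
basic_loss = neg_log_prob("2", "+3", "5", candidates=["2", "3", "5"])
# A higher-order proposition, (3, +3) ∈ plus, handled by exactly the same code
# because the relation +3 is itself represented by a matrix:
higher_loss = neg_log_prob("3", "plus", "+3", candidates=["+3", "plus"])
```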
For example, MRE can correctly answer queries of the form (A, ?) ∈ +3 even though the training set contains no examples of the basic relation +3. [sent-57, score-0.215]
29 It is told that (3, +3) ∈ plus and it figures out what plus means from higher-order examples of the form (2, +2) ∈ plus and basic examples of the form (3, 5) ∈ +2. [sent-59, score-0.316]
This enables MRE to understand a relation from an “analogical definition”: if it is told that has father is to has mother as has brother is to has sister, etc., then MRE can answer queries involving has father based on this analogical information alone. [sent-61, score-0.232]
32 Finally, we show that MRE can learn new relations after an initial set of objects and relations has already been learned and the learned matrices have been fixed. [sent-62, score-1.069]
33 This shows that MRE can add new knowledge to previously acquired propositions without the need to relearn the original propositions. [sent-63, score-0.185]
34 We believe that MRE is the first gradient-descent learning system that can learn new relations from definitions, including learning the meanings of the terms used in the definitions. [sent-64, score-0.394]
35 Some of the existing connectionist models for representing and learning relations and analogies [2, 4] are able to detect new relations and to represent hierarchical relations of high complexity. [sent-66, score-1.175]
They differ by using temporal synchrony to explicitly represent the binding of relations to objects and, more importantly, by not using distributed representations for the relations themselves. [sent-67, score-0.854]
37 2 The modular arithmetic task Paccanaro and Hinton [10] describe a very simple modular arithmetic task in which the 10 objects are the numbers from 0 to 9 and the 9 relations are +0 to +4 and −1 to −4. [sent-68, score-1.274]
38 Linear Relational Embedding easily learns this task using two-dimensional vectors for the numbers and 2 × 2 matrices for the relations. [sent-69, score-0.167]
39 It arranges the numbers in a circle centered at the origin and uses rotation matrices to implement the relations. [sent-70, score-0.164]
40 We used base 12 modular arithmetic, thus there are 12 objects, and made the task much more difficult by using both the twelve relations +0 to +11 and the twelve relations ×0 to ×11. [sent-71, score-0.98]
41 We did not include subtraction and division because in modular arithmetic every proposition involving subtraction or division is equivalent to one involving addition or multiplication. [sent-72, score-0.487]
There are 288 propositions in the modular arithmetic task. [sent-73, score-0.522]
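The proposition set is easy to enumerate; this sketch (our reconstruction of the task definition, not code from the paper) confirms the count of 288.

```python
# Mod-12 arithmetic task: 12 objects (0..11), 12 addition relations and 12
# multiplication relations, giving 12 objects x 24 relations = 288 propositions.
propositions = []
for k in range(12):
    for a in range(12):
        propositions.append((a, (a + k) % 12, f"+{k}"))   # (a, b) ∈ +k
        propositions.append((a, (a * k) % 12, f"x{k}"))   # (a, b) ∈ ×k
assert len(propositions) == 288
```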
43 We tried matrices of various sizes and discovered that 4 × 4 matrices gave the best generalization when some of the cases are held-out. [sent-74, score-0.259]
We held-out 30, 60, or 90 test cases chosen at random and used the remaining cases to learn the real-valued entries of the 12 matrices that represent numbers and the 24 matrices that represent relations. [sent-75, score-0.414]
We computed the gradient of the cost function on all of the training cases before updating the parameters, and initialized the parameters by a random sample from a spherical Gaussian with unit variance on each dimension. [sent-81, score-0.169]
We added a weight-decay penalty of 0.01 ∑i wi² to the cost function, where i indexes all of the entries in the matrices for objects and relations. [sent-83, score-0.277]
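A sketch of that full-batch procedure follows; the 0.01 weight-decay coefficient is taken from the text, while the learning rate, step count, and the generic per-case gradient function are illustrative placeholders.

```python
import numpy as np

def train(params, grad_of_case_loss, training_cases, lr=0.01, weight_decay=0.01, steps=1000):
    """Full-batch gradient descent: accumulate the gradient over every training case
    before each update, and add the gradient of the penalty weight_decay * sum_i w_i**2."""
    for _ in range(steps):
        grad = np.zeros_like(params)
        for case in training_cases:
            grad += grad_of_case_loss(params, case)   # summed over all cases
        grad += 2.0 * weight_decay * params           # d/dw of weight_decay * sum_i w_i^2
        params = params - lr * grad
    return params

# Parameters are initialized from a spherical Gaussian with unit variance, e.g.:
# params = np.random.default_rng(0).normal(size=n_params)
```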
Table 1: Test results on the basic modular arithmetic task. Columns: errors on 5 test sets, mean test error. Recoverable row: 30 held-out cases, errors 0 0 0 0 0, mean 0.0.
49 Each test query has 12 possible answers of which 1 is correct, so random guessing should be incorrect on at least 90% of the test cases. [sent-98, score-0.255]
Separate experiments showed that 2 × 2 matrices were sufficient for learning either the mod 3 or the mod 4 version of our modular arithmetic task, so the mod 12 version can clearly be done using a pair of 2 × 2 matrices for each number or relation (arithmetic mod 12 decomposes into independent arithmetic mod 3 and mod 4). [sent-103, score-0.612]
Notice that for the last four relations there are people in the families in figure 1(a) for whom there are two different correct answers to the question (A, ?) ∈ R. [sent-106, score-0.523]
52 When there are N correct answers, the best way to maximize the sum of the log probabilities of picking the correct answer on each of the N cases is to produce an output matrix that is equidistant from the N correct answers and far from all other answers. [sent-108, score-0.436]
If we count cases with two correct answers as two different cases, the family trees task has 112 cases. [sent-110, score-0.422]
54 We used precisely the same learning procedure and weight-decay as for the modular arithmetic task. [sent-111, score-0.337]
55 We held-out 10, 20, or 30 randomly selected cases as test cases, and we repeated the random selection of the test cases five times. [sent-112, score-0.184]
56 Table 2 shows the number of errors on the test cases when 4 × 4 matrices are learned for each person and for each relation. [sent-113, score-0.292]
Table 2: Test results on the basic family trees task. Columns: errors on 5 test sets, mean test error. Recoverable row: 10 held-out cases, errors 0 0 0 0 2. Each test query has 24 possible answers, of which at most 2 objects are considered correct.
MRE generalizes much better than the feedforward neural network used by [3], which typically gets one or two test cases wrong even when only four test cases are held-out. [sent-124, score-0.24]
62 It also generalizes much better than all of the many variations of the learning algorithms used by [8] for the family trees task. [sent-125, score-0.183]
63 These variations cannot achieve zero test errors even when only four test cases are held-out and the cases are chosen to facilitate generalization. [sent-126, score-0.259]
5 The higher-order modular arithmetic task We used a version of the modular arithmetic task in which the only basic relations were {+0, +1, . . . , +11}, but we also included the higher-order relations plus, minus, and inverse, consisting of 36 propositions, examples of which are (3, +3) ∈ plus; (3, +9) ∈ minus; (+3, +9) ∈ inverse. [sent-130, score-0.372]
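The 36 higher-order propositions can be reconstructed from the examples just given; the enumeration below is our reading of the task, not code from the paper.

```python
# plus pairs a number n with the relation +n; minus pairs n with the additive
# inverse relation; inverse pairs each +n with +((12 - n) mod 12), e.g. (+3, +9).
higher_order = []
for n in range(12):
    higher_order.append((n, f"+{n}", "plus"))                       # (3, +3) ∈ plus
    higher_order.append((n, f"+{(12 - n) % 12}", "minus"))          # (3, +9) ∈ minus
    higher_order.append((f"+{n}", f"+{(12 - n) % 12}", "inverse"))  # (+3, +9) ∈ inverse
assert len(higher_order) == 36
```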
66 We then held-out all of the examples of one of the basic relations and trained 4 × 4 matrices on all of the other basic relations plus all of the higher-order relations. [sent-131, score-1.041]
67 Our first attempt to demonstrate that MRE could generalize from higher-order relations to basic relations failed: the generalization was only slightly better than chance. [sent-132, score-0.848]
68 When learning the higher-order training case (3, +3) ∈ plus it is not necessary for the product of the matrix representing 3 and the matrix representing plus to be exactly equal to the matrix representing +3. [sent-135, score-0.367]
69 In cases like the one shown in figure 1(b), the relative probability of the point B under a Gaussian centered at RA is increased by moving RA up, because this lowers the unnormalized probabilities of C and D by a greater proportion than it lowers the unnormalized probability of B. [sent-137, score-0.178]
70 The discriminative objective function prevents all of the representations collapsing to the same point, but it does not force the matrix products to be exactly equal to the correct answer. [sent-138, score-0.183]
The remedy was to train the higher-order propositions with a non-discriminative cost that directly penalizes the difference between the matrix product and the matrix representing the correct answer. Even when using this non-discriminative cost function for training the higher-order relations, the matrices could not all collapse to zero because the discriminative cost function was still being used for training the basic relations. [sent-141, score-0.325]
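The contrast between the two costs can be made explicit. In this sketch (our own formulation; the exact form used by the authors is not recoverable from the text above), basic propositions keep the discriminative softmax-style cost, while higher-order propositions use a plain squared error that does force the matrix product onto the correct answer.

```python
import numpy as np

def discriminative_cost(pred, answer, candidates):
    """Negative log probability of the correct answer under a softmax over
    negative halved squared distances; this is the cost used for basic propositions."""
    sq = np.array([((c - pred) ** 2).sum() for c in candidates])
    log_z = np.log(np.exp(-0.5 * sq).sum())
    return 0.5 * ((answer - pred) ** 2).sum() + log_z

def squared_error_cost(pred, answer):
    """Non-discriminative cost for higher-order propositions: it pushes the product
    of, e.g., the matrices for 3 and plus to equal the matrix for +3 exactly."""
    return ((answer - pred) ** 2).sum()

# Example with random 4 x 4 matrices (purely illustrative values):
rng = np.random.default_rng(3)
pred = rng.normal(size=(4, 4))      # product of two learned matrices
answer = rng.normal(size=(4, 4))    # matrix of the correct answer
others = [rng.normal(size=(4, 4)) for _ in range(3)]
d_cost = discriminative_cost(pred, answer, [answer] + others)
n_cost = squared_error_cost(pred, answer)
```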
Table 3: Test results on the higher-order arithmetic task. Columns: errors on 5 test sets, mean test error. Recoverable row: +1 (12 test cases), errors 5 0 0 0 0.
Each row shows the number of incorrectly answered queries involving a relation (i.e., +1, +4, +6, or +10) all of whose basic examples were removed from MRE’s training data, so MRE’s knowledge of this relation was derived entirely from the higher-order relations.
Table 4: Test results for the higher-order family trees task. Columns: errors on 5 test sets, mean test error. Recoverable row: has father (12 test cases), errors 0 12 0 0 0.
In each row, all basic propositions involving a single relation are held-out.
79 Each row shows the number of errors MRE makes on these held-out propositions on 5 different learning runs from different initial random parameters. [sent-164, score-0.26]
80 The only information MRE has on these relations is in the form of a single higher-order relation, higher oppsex. [sent-165, score-0.402]
6 The higher-order family trees task To demonstrate that similar performance is obtained on the family trees task when higher-order relations are used, we included, in addition to the 112 basic propositions, the higher-order relation higher oppsex. [sent-168, score-1.336]
To define higher oppsex we observe that many relations have natural male and natural female versions, as in: mother-father, nephew-niece, uncle-aunt, brother-sister, husband-wife, and son-daughter. [sent-169, score-0.517]
83 We say that (A, B) ∈ higher oppsex for relations A and B if A and B can be seen as natural counterparts in this sense. [sent-170, score-0.517]
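Under the natural reading that each male/female counterpart pair contributes an ordered proposition in both directions, the twelve propositions of higher oppsex can be enumerated as follows (the specific relation names, e.g. has son, are our guesses at the paper’s naming scheme):

```python
# Six male/female counterpart pairs from the text; each yields two ordered
# propositions, giving the twelve examples of higher oppsex.
counterparts = [("has father", "has mother"), ("has nephew", "has niece"),
                ("has uncle", "has aunt"), ("has brother", "has sister"),
                ("has husband", "has wife"), ("has son", "has daughter")]
higher_oppsex = [(a, b) for a, b in counterparts] + [(b, a) for a, b in counterparts]
assert len(higher_oppsex) == 12   # e.g. (has sister, has brother) ∈ higher oppsex
```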
Examples of the twelve propositions in higher oppsex include (has sister, has brother) ∈ higher oppsex. [sent-171, score-0.178]
We performed an analogous test to that in the previous section on the higher-order modular arithmetic task, using exactly the same learning procedure and learning parameters. [sent-175, score-0.562]
86 The family trees task and its higher-order variant may appear difficult for systems such as MRE or LRE because of the logical nature of the task, which is made apparent by hard rules such as (A, B) ∈ has father, (A, C) ∈ has brother ⇒ (C, B) ∈ has father. [sent-177, score-0.3]
MRE does not perform this kind of explicit logical inference; instead, it “precomputes the answers” to all queries during training, by finding the matrix representation that models its training set. [sent-181, score-0.138]
Table 5: Test results for the higher-order arithmetic task (top) and the sequential higher-order family trees task (bottom) when a held-out basic relation is learned from higher-order propositions after the rest of the objects and relations have been learned and fixed. Columns: errors on 5 test sets, mean test error. Recoverable rows: +1 (12 test cases), errors 0 0 0 2 4 (arithmetic); has father (12 test cases), errors 0 0 0 10 0, mean 2.0 (family trees).
92 Each entry shows the number of test errors, and the number of test cases is written in brackets. [sent-198, score-0.142]
93 This does not mean that MRE can deal with general logical data of this kind, because MRE will fail when there are many relations that have many special cases. [sent-201, score-0.403]
The special cases will prevent MRE from finding low-dimensional matrices that fit the data well, causing it to generalize much more poorly. [sent-202, score-0.133]
95 7 Adding knowledge incrementally The previous section shows that MRE can learn to apply a basic relation correctly even though the training set only contains higher-order propositions about the relation. [sent-203, score-0.444]
After learning some objects, basic relations, and higher-order relations, we freeze the weights in all of the matrices and learn the matrix for a new relation from a few higher-order propositions. [sent-205, score-0.337]
97 Table 5 shows that this works about as well as learning all of the propositions at the same time. [sent-206, score-0.185]
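In optimization terms this amounts to holding every previously learned matrix fixed and taking gradient steps only on the new relation’s matrix; a sketch under that assumption (the function names, learning rate, and step count are illustrative placeholders):

```python
import numpy as np

def learn_new_relation(frozen_reps, new_name, higher_order_cases, loss_grad,
                       d=4, lr=0.01, steps=1000, seed=0):
    """Learn one new d x d matrix from higher-order propositions while every
    previously learned matrix in frozen_reps stays fixed."""
    new_matrix = np.random.default_rng(seed).normal(size=(d, d))
    for _ in range(steps):
        grad = np.zeros_like(new_matrix)
        for case in higher_order_cases:
            # loss_grad returns the gradient of the cost w.r.t. the new matrix only;
            # gradients w.r.t. the frozen matrices are simply never applied.
            grad += loss_grad(frozen_reps, new_name, new_matrix, case)
        new_matrix -= lr * grad
    return new_matrix
```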
98 The input vectors R and A represent a relation and an object using a one-of-N encoding. [sent-208, score-0.183]
99 If the outgoing weights from the two active input units are set to R and A, these localist representations are converted into activity patterns in the first hidden layer that represent the matrices R and A. [sent-209, score-0.24]
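Viewed this way, the conversion from localist inputs to matrix-valued hidden activities is just a matrix-vector product with the flattened matrices stored as outgoing weights; a small sketch of that view (the shapes and the mod-12 task sizes are our assumptions):

```python
import numpy as np

n_relations, n_objects, d = 24, 12, 4
rng = np.random.default_rng(2)
# Outgoing weights: one flattened d x d matrix per relation and per object.
W_rel = rng.normal(size=(n_relations, d * d))
W_obj = rng.normal(size=(n_objects, d * d))

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

# Activating a single relation unit and a single object unit reproduces the
# corresponding matrices R and A as activity patterns in the first hidden layer.
R = (one_hot(3, n_relations) @ W_rel).reshape(d, d)
A = (one_hot(7, n_objects) @ W_obj).reshape(d, d)
```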
100 The fact that MRE generalizes much better than a standard feedforward neural network on the family trees task is due to two features. [sent-217, score-0.27]
wordName wordTfidf (topN-words)
[('lre', 0.445), ('mre', 0.433), ('relations', 0.372), ('ra', 0.334), ('arithmetic', 0.198), ('propositions', 0.185), ('objects', 0.144), ('modular', 0.139), ('relation', 0.119), ('father', 0.115), ('oppsex', 0.115), ('sister', 0.101), ('answers', 0.1), ('matrices', 0.091), ('paccanaro', 0.087), ('relational', 0.087), ('trees', 0.084), ('brother', 0.082), ('answer', 0.08), ('errors', 0.075), ('family', 0.072), ('basic', 0.069), ('plus', 0.068), ('aunt', 0.066), ('nephew', 0.066), ('wife', 0.066), ('mother', 0.058), ('hinton', 0.057), ('correct', 0.051), ('involving', 0.05), ('test', 0.05), ('ij', 0.049), ('representations', 0.046), ('symbolic', 0.045), ('husband', 0.043), ('told', 0.043), ('cost', 0.042), ('cases', 0.042), ('representation', 0.039), ('spherical', 0.038), ('inductive', 0.037), ('bij', 0.037), ('object', 0.037), ('queries', 0.036), ('matrix', 0.036), ('generalization', 0.035), ('multiplication', 0.034), ('learned', 0.034), ('twelve', 0.033), ('analogical', 0.033), ('ather', 0.033), ('penelope', 0.033), ('feedforward', 0.032), ('logic', 0.032), ('query', 0.032), ('representing', 0.032), ('task', 0.031), ('mod', 0.031), ('logical', 0.031), ('higher', 0.03), ('alberto', 0.029), ('hummel', 0.029), ('lowers', 0.029), ('ilya', 0.029), ('layer', 0.028), ('units', 0.028), ('generalizes', 0.027), ('represent', 0.027), ('training', 0.027), ('discriminative', 0.027), ('embedding', 0.027), ('centered', 0.026), ('minus', 0.026), ('unnormalized', 0.026), ('subtraction', 0.025), ('picking', 0.025), ('uses', 0.025), ('mappings', 0.024), ('explicit', 0.024), ('network', 0.024), ('guessing', 0.023), ('prevents', 0.023), ('humans', 0.023), ('relationships', 0.023), ('learns', 0.023), ('numbers', 0.022), ('cij', 0.022), ('correctly', 0.022), ('learn', 0.022), ('table', 0.021), ('softmax', 0.021), ('causes', 0.021), ('outgoing', 0.02), ('gradient', 0.02), ('closest', 0.02), ('target', 0.02), ('questions', 0.019), ('representational', 0.019), ('christopher', 0.019), ('novelty', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 248 nips-2008-Using matrices to model symbolic relationship
Author: Ilya Sutskever, Geoffrey E. Hinton
Abstract: We describe a way of learning matrix representations of objects and relationships. The goal of learning is to allow multiplication of matrices to represent symbolic relationships between objects and symbolic relationships between relationships, which is the main novelty of the method. We demonstrate that this leads to excellent generalization in two different domains: modular arithmetic and family relationships. We show that the same system can learn first-order propositions such as (2, 5) ∈ +3 or (Christopher, Penelope) ∈ has wife, and higher-order propositions such as (3, +3) ∈ plus and (+3, −3) ∈ inverse or (has husband, has wife) ∈ higher oppsex. We further demonstrate that the system understands how higher-order propositions are related to first-order ones by showing that it can correctly answer questions about first-order propositions involving the relations +3 or has wife even though it has not been trained on any first-order examples involving these relations. 1
2 0.20657167 144 nips-2008-Multi-resolution Exploration in Continuous Spaces
Author: Ali Nouri, Michael L. Littman
Abstract: The essence of exploration is acting to try to decrease uncertainty. We propose a new methodology for representing uncertainty in continuous-state control problems. Our approach, multi-resolution exploration (MRE), uses a hierarchical mapping to identify regions of the state space that would benefit from additional samples. We demonstrate MRE’s broad utility by using it to speed up learning in a prototypical model-based and value-based reinforcement-learning method. Empirical results show that MRE improves upon state-of-the-art exploration approaches. 1
3 0.12375181 61 nips-2008-Diffeomorphic Dimensionality Reduction
Author: Christian Walder, Bernhard Schölkopf
Abstract: This paper introduces a new approach to constructing meaningful lower dimensional representations of sets of data points. We argue that constraining the mapping between the high and low dimensional spaces to be a diffeomorphism is a natural way of ensuring that pairwise distances are approximately preserved. Accordingly we develop an algorithm which diffeomorphically maps the data near to a lower dimensional subspace and then projects onto that subspace. The problem of solving for the mapping is transformed into one of solving for an Eulerian flow field which we compute using ideas from kernel methods. We demonstrate the efficacy of our approach on various real world data sets. 1
4 0.087884165 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
Author: Thomas L. Griffiths, Joseph L. Austerweil
Abstract: Almost all successful machine learning algorithms and cognitive models require powerful representations capturing the features that are relevant to a particular problem. We draw on recent work in nonparametric Bayesian statistics to define a rational model of human feature learning that forms a featural representation from raw sensory data without pre-specifying the number of features. By comparing how the human perceptual system and our rational model use distributional and category information to infer feature representations, we seek to identify some of the forces that govern the process by which people separate and combine sensory primitives to form features. 1
5 0.06986443 134 nips-2008-Mixed Membership Stochastic Blockmodels
Author: Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, Eric P. Xing
Abstract: In many settings, such as protein interactions and gene regulatory networks, collections of author-recipient email, and social networks, the data consist of pairwise measurements, e.g., presence or absence of links between pairs of objects. Analyzing such data with probabilistic models requires non-standard assumptions, since the usual independence or exchangeability assumptions no longer hold. In this paper, we introduce a class of latent variable models for pairwise measurements: mixed membership stochastic blockmodels. Models in this class combine a global model of dense patches of connectivity (blockmodel) with a local model to instantiate node-specific variability in the connections (mixed membership). We develop a general variational inference algorithm for fast approximate posterior inference. We demonstrate the advantages of mixed membership stochastic blockmodel with applications to social networks and protein interaction networks. 1
6 0.06380178 10 nips-2008-A rational model of preference learning and choice prediction by children
7 0.061709858 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
8 0.053695578 236 nips-2008-The Mondrian Process
9 0.046837877 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
10 0.041456297 73 nips-2008-Estimating Robust Query Models with Convex Optimization
11 0.040946394 113 nips-2008-Kernelized Sorting
12 0.038160264 48 nips-2008-Clustering via LP-based Stabilities
13 0.036114208 138 nips-2008-Modeling human function learning with Gaussian processes
14 0.036110204 36 nips-2008-Beyond Novelty Detection: Incongruent Events, when General and Specific Classifiers Disagree
15 0.035733934 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
16 0.035262927 118 nips-2008-Learning Transformational Invariants from Natural Movies
17 0.035213117 103 nips-2008-Implicit Mixtures of Restricted Boltzmann Machines
18 0.035079949 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations
19 0.034885861 23 nips-2008-An ideal observer model of infant object perception
20 0.034462661 4 nips-2008-A Scalable Hierarchical Distributed Language Model
topicId topicWeight
[(0, -0.123), (1, -0.01), (2, 0.044), (3, -0.045), (4, 0.038), (5, -0.04), (6, 0.033), (7, 0.003), (8, -0.005), (9, -0.01), (10, -0.012), (11, 0.002), (12, -0.042), (13, 0.014), (14, 0.04), (15, 0.017), (16, 0.031), (17, -0.059), (18, 0.111), (19, 0.08), (20, -0.043), (21, -0.044), (22, 0.12), (23, 0.005), (24, -0.031), (25, -0.063), (26, -0.007), (27, 0.076), (28, -0.007), (29, 0.125), (30, -0.039), (31, 0.125), (32, -0.094), (33, 0.085), (34, 0.049), (35, 0.118), (36, 0.139), (37, 0.242), (38, -0.035), (39, -0.159), (40, 0.058), (41, -0.049), (42, 0.051), (43, -0.119), (44, 0.03), (45, 0.019), (46, -0.016), (47, -0.246), (48, -0.059), (49, -0.054)]
simIndex simValue paperId paperTitle
same-paper 1 0.9410516 248 nips-2008-Using matrices to model symbolic relationship
Author: Ilya Sutskever, Geoffrey E. Hinton
Abstract: We describe a way of learning matrix representations of objects and relationships. The goal of learning is to allow multiplication of matrices to represent symbolic relationships between objects and symbolic relationships between relationships, which is the main novelty of the method. We demonstrate that this leads to excellent generalization in two different domains: modular arithmetic and family relationships. We show that the same system can learn first-order propositions such as (2, 5) ∈ +3 or (Christopher, Penelope) ∈ has wife, and higher-order propositions such as (3, +3) ∈ plus and (+3, −3) ∈ inverse or (has husband, has wife) ∈ higher oppsex. We further demonstrate that the system understands how higher-order propositions are related to first-order ones by showing that it can correctly answer questions about first-order propositions involving the relations +3 or has wife even though it has not been trained on any first-order examples involving these relations. 1
2 0.61534917 144 nips-2008-Multi-resolution Exploration in Continuous Spaces
Author: Ali Nouri, Michael L. Littman
Abstract: The essence of exploration is acting to try to decrease uncertainty. We propose a new methodology for representing uncertainty in continuous-state control problems. Our approach, multi-resolution exploration (MRE), uses a hierarchical mapping to identify regions of the state space that would benefit from additional samples. We demonstrate MRE’s broad utility by using it to speed up learning in a prototypical model-based and value-based reinforcement-learning method. Empirical results show that MRE improves upon state-of-the-art exploration approaches. 1
3 0.57452136 61 nips-2008-Diffeomorphic Dimensionality Reduction
Author: Christian Walder, Bernhard Schölkopf
Abstract: This paper introduces a new approach to constructing meaningful lower dimensional representations of sets of data points. We argue that constraining the mapping between the high and low dimensional spaces to be a diffeomorphism is a natural way of ensuring that pairwise distances are approximately preserved. Accordingly we develop an algorithm which diffeomorphically maps the data near to a lower dimensional subspace and then projects onto that subspace. The problem of solving for the mapping is transformed into one of solving for an Eulerian flow field which we compute using ideas from kernel methods. We demonstrate the efficacy of our approach on various real world data sets. 1
4 0.47522607 134 nips-2008-Mixed Membership Stochastic Blockmodels
Author: Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, Eric P. Xing
Abstract: In many settings, such as protein interactions and gene regulatory networks, collections of author-recipient email, and social networks, the data consist of pairwise measurements, e.g., presence or absence of links between pairs of objects. Analyzing such data with probabilistic models requires non-standard assumptions, since the usual independence or exchangeability assumptions no longer hold. In this paper, we introduce a class of latent variable models for pairwise measurements: mixed membership stochastic blockmodels. Models in this class combine a global model of dense patches of connectivity (blockmodel) with a local model to instantiate node-specific variability in the connections (mixed membership). We develop a general variational inference algorithm for fast approximate posterior inference. We demonstrate the advantages of mixed membership stochastic blockmodel with applications to social networks and protein interaction networks. 1
5 0.44519961 10 nips-2008-A rational model of preference learning and choice prediction by children
Author: Christopher G. Lucas, Thomas L. Griffiths, Fei Xu, Christine Fawcett
Abstract: Young children demonstrate the ability to make inferences about the preferences of other agents based on their choices. However, there exists no overarching account of what children are doing when they learn about preferences or how they use that knowledge. We use a rational model of preference learning, drawing on ideas from economics and computer science, to explain the behavior of children in several recent experiments. Specifically, we show how a simple econometric model can be extended to capture two- to four-year-olds’ use of statistical information in inferring preferences, and their generalization of these preferences. 1
6 0.42135811 23 nips-2008-An ideal observer model of infant object perception
7 0.42049673 236 nips-2008-The Mondrian Process
8 0.39982945 26 nips-2008-Analyzing human feature learning as nonparametric Bayesian inference
9 0.34809718 225 nips-2008-Supervised Bipartite Graph Inference
10 0.31530538 3 nips-2008-A Massively Parallel Digital Learning Processor
11 0.30656189 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization
12 0.30276418 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms
13 0.29660201 31 nips-2008-Bayesian Exponential Family PCA
14 0.27894485 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
15 0.27612501 83 nips-2008-Fast High-dimensional Kernel Summations Using the Monte Carlo Multipole Method
16 0.26302209 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence
17 0.25674042 101 nips-2008-Human Active Learning
18 0.25475743 124 nips-2008-Load and Attentional Bayes
19 0.2540153 211 nips-2008-Simple Local Models for Complex Dynamical Systems
20 0.25391686 41 nips-2008-Breaking Audio CAPTCHAs
topicId topicWeight
[(6, 0.037), (7, 0.103), (12, 0.015), (28, 0.122), (57, 0.1), (59, 0.013), (63, 0.03), (74, 0.335), (77, 0.045), (78, 0.017), (83, 0.064)]
simIndex simValue paperId paperTitle
same-paper 1 0.71388018 248 nips-2008-Using matrices to model symbolic relationship
Author: Ilya Sutskever, Geoffrey E. Hinton
Abstract: We describe a way of learning matrix representations of objects and relationships. The goal of learning is to allow multiplication of matrices to represent symbolic relationships between objects and symbolic relationships between relationships, which is the main novelty of the method. We demonstrate that this leads to excellent generalization in two different domains: modular arithmetic and family relationships. We show that the same system can learn first-order propositions such as (2, 5) ∈ +3 or (Christopher, Penelope) ∈ has wife, and higher-order propositions such as (3, +3) ∈ plus and (+3, −3) ∈ inverse or (has husband, has wife) ∈ higher oppsex. We further demonstrate that the system understands how higher-order propositions are related to first-order ones by showing that it can correctly answer questions about first-order propositions involving the relations +3 or has wife even though it has not been trained on any first-order examples involving these relations. 1
2 0.69816422 172 nips-2008-Optimal Response Initiation: Why Recent Experience Matters
Author: Matt Jones, Sachiko Kinoshita, Michael C. Mozer
Abstract: In most cognitive and motor tasks, speed-accuracy tradeoffs are observed: Individuals can respond slowly and accurately, or quickly yet be prone to errors. Control mechanisms governing the initiation of behavioral responses are sensitive not only to task instructions and the stimulus being processed, but also to the recent stimulus history. When stimuli can be characterized on an easy-hard dimension (e.g., word frequency in a naming task), items preceded by easy trials are responded to more quickly, and with more errors, than items preceded by hard trials. We propose a rationally motivated mathematical model of this sequential adaptation of control, based on a diffusion model of the decision process in which difficulty corresponds to the drift rate for the correct response. The model assumes that responding is based on the posterior distribution over which response is correct, conditioned on the accumulated evidence. We derive this posterior as a function of the drift rate, and show that higher estimates of the drift rate lead to (normatively) faster responding. Trial-by-trial tracking of difficulty thus leads to sequential effects in speed and accuracy. Simulations show the model explains a variety of phenomena in human speeded decision making. We argue this passive statistical mechanism provides a more elegant and parsimonious account than extant theories based on elaborate control structures. 1
3 0.66537982 114 nips-2008-Large Margin Taxonomy Embedding for Document Categorization
Author: Kilian Q. Weinberger, Olivier Chapelle
Abstract: Applications of multi-class classification, such as document categorization, often appear in cost-sensitive settings. Recent work has significantly improved the state of the art by moving beyond “flat” classification through incorporation of class hierarchies [4]. We present a novel algorithm that goes beyond hierarchical classification and estimates the latent semantic space that underlies the class hierarchy. In this space, each class is represented by a prototype and classification is done with the simple nearest neighbor rule. The optimization of the semantic space incorporates large margin constraints that ensure that for each instance the correct class prototype is closer than any other. We show that our optimization is convex and can be solved efficiently for large data sets. Experiments on the OHSUMED medical journal data base yield state-of-the-art results on topic categorization. 1
4 0.51233697 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
Author: Indraneel Mukherjee, David M. Blei
Abstract: Hierarchical probabilistic modeling of discrete data has emerged as a powerful tool for text analysis. Posterior inference in such models is intractable, and practitioners rely on approximate posterior inference methods such as variational inference or Gibbs sampling. There has been much research in designing better approximations, but there is yet little theoretical understanding of which of the available techniques are appropriate, and in which data analysis settings. In this paper we provide the beginnings of such understanding. We analyze the improvement that the recently proposed collapsed variational inference (CVB) provides over mean field variational inference (VB) in latent Dirichlet allocation. We prove that the difference in the tightness of the bound on the likelihood of a document decreases as O(k − 1) + log m/m, where k is the number of topics in the model and m is the number of words in a document. As a consequence, the advantage of CVB over VB is lost for long documents but increases with the number of topics. We demonstrate empirically that the theory holds, using simulated text data and two text corpora. We provide practical guidelines for choosing an approximation. 1
5 0.49841535 200 nips-2008-Robust Kernel Principal Component Analysis
Author: Minh H. Nguyen, Fernando Torre
Abstract: Kernel Principal Component Analysis (KPCA) is a popular generalization of linear PCA that allows non-linear feature extraction. In KPCA, data in the input space is mapped to higher (usually) dimensional feature space where the data can be linearly modeled. The feature space is typically induced implicitly by a kernel function, and linear PCA in the feature space is performed via the kernel trick. However, due to the implicitness of the feature space, some extensions of PCA such as robust PCA cannot be directly generalized to KPCA. This paper presents a technique to overcome this problem, and extends it to a unified framework for treating noise, missing data, and outliers in KPCA. Our method is based on a novel cost function to perform inference in KPCA. Extensive experiments, in both synthetic and real data, show that our algorithm outperforms existing methods. 1
6 0.495116 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
7 0.49451944 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization
8 0.49319977 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables
9 0.49239314 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification
10 0.49234903 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
11 0.49199986 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
12 0.49098504 66 nips-2008-Dynamic visual attention: searching for coding length increments
13 0.49049184 62 nips-2008-Differentiable Sparse Coding
14 0.48980999 27 nips-2008-Artificial Olfactory Brain for Mixture Identification
15 0.48975435 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction
16 0.48894194 194 nips-2008-Regularized Learning with Networks of Features
17 0.48741305 138 nips-2008-Modeling human function learning with Gaussian processes
18 0.48737368 221 nips-2008-Stochastic Relational Models for Large-scale Dyadic Data using MCMC
19 0.48599547 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
20 0.48512891 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks