nips nips2010 nips2010-155 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dan Navarro
Abstract: This paper outlines a hierarchical Bayesian model for human category learning that learns both the organization of objects into categories, and the context in which this knowledge should be applied. The model is fit to multiple data sets, and provides a parsimonious method for describing how humans learn context specific conceptual representations.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper outlines a hierarchical Bayesian model for human category learning that learns both the organization of objects into categories, and the context in which this knowledge should be applied. [sent-5, score-0.952]
2 The model is fit to multiple data sets, and provides a parsimonious method for describing how humans learn context specific conceptual representations. [sent-6, score-0.519]
3 In part, this context specificity reflects the tendency for people to organize knowledge into independent “bundles” which may contain contradictory information, and which may be deemed appropriate to different contexts. [sent-10, score-0.677]
4 This phenomenon is called knowledge partitioning [2–6], and is observed in artificial category learning experiments as well as real world situations. [sent-11, score-0.51]
5 Context induced knowledge partitioning poses a challenge to models of human learning. [sent-13, score-0.374]
6 This paper explores the possibility that Bayesian models of human category learning can provide the missing explanation. [sent-15, score-0.268]
7 This model is then shown to provide a parsimonious and psychologically appealing account of the knowledge partitioning effect. [sent-17, score-0.375]
8 Following this, a hierarchical extension is introduced to the model, which allows it to acquire abstract knowledge about the context specificity of the categories, in a manner that is consistent with the data on human learning. [sent-18, score-0.673]
9 2 Learning categories in context This section outlines a Bayesian model that is sensitive to the learning context. [sent-19, score-0.592]
10 It extends Anderson’s [7] rational model of categorization (RMC) by allowing the model to track the context in which observations are made, and draw inferences about the role that context plays. [sent-20, score-1.002]
11 If zi denotes the cluster to which the ith observation is assigned, then the joint prior distribution over zn = (z1, . . . , zn) is given by Equation 1. [sent-23, score-0.523]
12 Each cluster of observations is mapped onto a distribution over features. [sent-27, score-0.283]
13 While independence is reasonable when stimulus dimensions are separable [9], knowledge partitioning can occur regardless of whether dimensions are separable or integral (see [6] for details), so the more general formulation is useful. [sent-36, score-0.358]
14 Each cluster is associated with a distribution over category labels. [sent-38, score-0.434]
15 If ℓi denotes the label given to the ith observation, then ℓi | zi = k, θk ∼ Bernoulli(θk) and θk | β ∼ Beta(β, β) (3). The β parameter describes the extent to which items in the same cluster are allowed to have different labels. [sent-39, score-0.758]
16 The extension to handle context dependence is straightforward: contextual information is treated as an auxiliary feature, and so each cluster is linked to a distribution over contexts. [sent-42, score-0.656]
17 In the experiments considered later, each observation is assigned to a context individually, which allows us to apply the exact same model for contextual features as regular ones. [sent-43, score-0.468]
18 Thus a very simple context model is sufficient: ci | zi = k, φk ∼ Bernoulli(φk) and φk | γ ∼ Beta(γ, γ) (4). The context specificity parameter γ is analogous to β and controls the extent to which clusters can include observations made in different contexts. [sent-44, score-1.226]
19 In more general contexts, a richer model would be required to capture the manner in which context can vary. [sent-45, score-0.418]
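To make the generative assumptions above concrete, the following is a minimal Python sketch of sampling from a context-sensitive RMC-style model: clusters are drawn from a simple CRP with concentration α (standing in for the prior of Equation 1), features from a per-cluster spherical Gaussian, and labels and contexts from the Bernoulli–Beta models of Equations 3 and 4. The function name, the feature prior, and the default parameter values are illustrative assumptions, not the paper's actual specification.

import numpy as np

def generate(n=100, alpha=1.0, beta=1e-6, gamma=0.5, sigma=0.5, seed=0):
    # Illustrative generative sketch; beta near 0 approximates the beta = 0 limit used in the paper.
    rng = np.random.default_rng(seed)
    z, x, labels, contexts, params = [], [], [], [], []
    for i in range(n):
        counts = np.bincount(z) if z else np.zeros(0)
        weights = np.append(counts, alpha).astype(float)
        k = rng.choice(len(weights), p=weights / weights.sum())   # CRP-style cluster draw
        if k == len(params):                                      # new cluster: sample its parameters
            params.append((rng.normal(0.0, 2.0, size=2),          # placeholder feature prior
                           rng.beta(beta, beta),                  # label probability (Eq. 3)
                           rng.beta(gamma, gamma)))               # context probability (Eq. 4)
        mean, theta, phi = params[k]
        z.append(k)
        x.append(rng.normal(mean, sigma))                         # spherical Gaussian features (placeholder)
        labels.append(rng.binomial(1, theta))
        contexts.append(rng.binomial(1, phi))
    return np.array(z), np.array(x), np.array(labels), np.array(contexts)

With β near 0 each cluster is effectively tied to a single label, and small γ similarly ties each cluster to a single context, mirroring the parameter settings discussed below.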
20 Firstly, since the categories do not overlap in the experiments discussed here, it makes sense to set β = 0, which has the effect of forcing each cluster to be associated with only one category. [sent-47, score-0.288]
21 Secondly, human learners rarely have strong prior knowledge about the features used in artificial category learning experiments; this is expressed by setting κ0 = 1 and ν0 = 3 (ν0 is larger to ensure that the prior over features always has a well defined covariance structure). [sent-48, score-0.458]
22 Having made these choices, we may restrict our attention to α (the bias to introduce new clusters) and γ (the bias to treat clusters as context general). [sent-50, score-0.482]
23 2 Inference in the model Inference is performed via a collapsed Gibbs sampler, integrating out φ, θ, µ and Σ and defining a sampler only over the cluster assignments z. [sent-52, score-0.375]
24 To do so, note that P(zi = k | x, ℓ, c, z−i) ∝ P(xi, ℓi, ci | x−i, ℓ−i, c−i, z−i, zi = k) P(zi = k | z−i) (5) = P(xi | x−i, z−i, zi = k) P(ℓi | ℓ−i, z−i, zi = k) P(ci | c−i, z−i, zi = k) P(zi = k | z−i) (6), where the dependence on the parameters that describe the prior (i.e. [sent-53, score-0.746]
25 In this expression z−i denotes the set of all cluster assignments except the ith, and the normalizing term is calculated by summing Equation 6 over all possible cluster assignments k, including the possibility that the ith item is assigned to an entirely new cluster. [sent-56, score-0.713]
26 The conditional prior probability P(zi = k | z−i) is nk/(n − 1 + α) if k is an old cluster and α/(n − 1 + α) if k is new (7), where nk counts the number of items (not including the ith) that have been assigned to the kth cluster. [sent-57, score-0.802]
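As a sketch, Equation 7 can be computed directly from the cluster counts; the helper below (illustrative names, not from any released code) returns the full vector of conditional prior probabilities, with the final entry corresponding to a new cluster.

import numpy as np

def crp_conditional(z_minus_i, alpha):
    # P(z_i = k | z_{-i}) from Equation 7: proportional to n_k for an old cluster
    # and to alpha for a new one, normalised by n - 1 + alpha.
    counts = np.bincount(z_minus_i)               # n_k for each existing cluster
    n = len(z_minus_i) + 1                        # total number of items including item i
    return np.append(counts, alpha) / (n - 1 + alpha)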
27 A similar result applies to the labelling scheme: P(ℓi | ℓ−i, z−i, zi = k) = ∫₀¹ P(ℓi | θk, zi = k) P(θk | ℓ−i, z−i) dθk = (nk^(ℓi) + β)/(nk + 2β) (9), where nk^(ℓi) counts the number of observations that have been assigned to cluster k and given the same label as observation i. [sent-59, score-1.506]
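The same collapsed beta-Bernoulli form reappears for the context term (with γ in place of β), so a single helper covers both; this is a sketch with hypothetical names.

def beta_bernoulli_predictive(n_same, n_total, b):
    # Equation 9 for labels (b = beta) and its context analogue (b = gamma):
    # (n_k^(match) + b) / (n_k + 2b). For an empty cluster with b > 0 this equals 1/2;
    # the b = 0 case must be handled separately by the caller.
    return (n_same + b) / (n_total + 2.0 * b)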
28 Taken together, Equations 6, 8, 9 and 11 suggest a simple Gibbs sampler over the cluster assignments z. [sent-65, score-0.345]
29 Cluster assignments zi are initialized randomly, and are then sequentially redrawn from the conditional posterior distribution in Equation 6. [sent-66, score-0.286]
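Putting the pieces together, one sweep of the collapsed Gibbs sampler might look like the sketch below. The marginalized Gaussian feature term is left as a user-supplied hook (feature_loglik), and all names and the numerical guards are illustrative assumptions rather than the paper's implementation; x, labels and contexts are assumed to be NumPy arrays and z an integer NumPy array.

import numpy as np

def gibbs_sweep(z, x, labels, contexts, alpha, beta, gamma, feature_loglik, rng=None):
    # One collapsed sweep over cluster assignments (Equation 6), as a sketch.
    # feature_loglik(x_i, x_members) should return log P(x_i | cluster members).
    if rng is None:
        rng = np.random.default_rng()
    n = len(z)
    for i in range(n):
        keep = np.arange(n) != i
        _, z_rest = np.unique(z[keep], return_inverse=True)    # relabel remaining clusters 0..K-1
        K = z_rest.max() + 1 if len(z_rest) else 0
        counts = np.bincount(z_rest, minlength=K)
        idx_rest = np.where(keep)[0]
        log_p = np.empty(K + 1)
        for k in range(K + 1):
            members = idx_rest[z_rest == k] if k < K else np.array([], dtype=int)
            n_k = counts[k] if k < K else 0
            prior = (n_k if k < K else alpha) / (n - 1 + alpha)          # Equation 7
            if n_k + 2 * beta > 0:                                       # Equation 9 (labels)
                lab = ((labels[members] == labels[i]).sum() + beta) / (n_k + 2 * beta)
            else:
                lab = 0.5
            if n_k + 2 * gamma > 0:                                      # context analogue (Equation 11)
                ctx = ((contexts[members] == contexts[i]).sum() + gamma) / (n_k + 2 * gamma)
            else:
                ctx = 0.5
            log_p[k] = (np.log(prior) + np.log(lab + 1e-300) + np.log(ctx + 1e-300)
                        + feature_loglik(x[i], x[members]))
        p = np.exp(log_p - log_p.max())
        z[keep] = z_rest                                                 # keep labels contiguous
        z[i] = rng.choice(K + 1, p=p / p.sum())
    return z

Repeating such sweeps yields samples from the posterior over partitions once the chain has burned in.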
30 3 Application to knowledge partitioning experiments To illustrate the behavior of the model, consider the most typical example of a knowledge partitioning experiment [3, 4, 6]. [sent-69, score-0.616]
31 There are two categories organized into an “inside-outside” structure, with one category (black circles/squares) occupying a region along either side of the other one (white circles/squares). [sent-73, score-0.258]
32 In Figure 1a, squares correspond to items presented in one context, and circles to items presented in the other context. [sent-75, score-0.366]
33 Participants are trained on these items in a standard supervised categorization experiment: stimuli are presented one at a time (with the context variable), and participants are asked to predict the category label. [sent-76, score-1.052]
34 Percentages refer to the probability of selecting category label A. [sent-79, score-0.264]
35 At this point, participants are shown transfer items (the crosses in Figure 1a), and asked what category label these items should be given. [sent-81, score-0.857]
36 Critically, each transfer item is presented in both contexts, to determine whether people generalize in a context specific way. [sent-83, score-0.61]
37 Some participants are context insensitive (lower two panels) and their predictions about the transfer items do not change as a function of context. [sent-86, score-0.848]
38 However, other participants are context sensitive (upper panels) and adopt a very different strategy depending on which context the transfer item is presented in. [sent-87, score-1.129]
39 This is taken to imply [3, 4, 6] that the context sensitive participants have learned a conceptual representation in which knowledge is “partitioned” into different bundles, each associated with a different context. [sent-88, score-0.811]
40 1 Learning the knowledge partition The initial investigation focused on what category representations the model learns, as a function of α and γ. [sent-90, score-0.368]
41 In the four cluster solution (panel b, small γ), the clusters never aggregate across items observed in different contexts. [sent-92, score-0.509]
42 In contrast, the three cluster solution (panel a, larger γ) is more context general, and collapses category B into a single cluster. [sent-93, score-0.822]
43 As a result, for α > 1 the model tends not to produce the three cluster solution. [sent-95, score-0.262]
44 The next aim was to quantify the extent to which γ influences the relative prevalence of the four cluster solution versus the three cluster solution. [sent-100, score-0.53]
45 Since the adjusted Rand index measures the extent to which any given pair of items are classified in the same way by the two solutions, it is a natural measure of how close a model-generated solution is to one of the two idealized solutions. [sent-102, score-0.34]
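As an illustration of how such a comparison can be computed, the adjusted Rand index is available in scikit-learn; the item partitions below are invented for the example and are not the paper's actual stimuli or solutions.

from sklearn.metrics import adjusted_rand_score

# Hypothetical idealized partitions over eight training items (labels made up for illustration).
four_cluster_solution  = [0, 0, 1, 1, 2, 2, 3, 3]
three_cluster_solution = [0, 0, 1, 1, 2, 2, 2, 2]
sampled_partition      = [0, 0, 1, 1, 2, 2, 3, 3]

print(adjusted_rand_score(four_cluster_solution, sampled_partition))    # 1.0: identical partitions
print(adjusted_rand_score(three_cluster_solution, sampled_partition))   # < 1: partial agreement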
46 Figure 2: The two different clustering schemes produced by the context sensitive RMC, and the values of γ that produce them (for α fixed at 0.7). [sent-107, score-0.474]
47 At smaller values of γ the four cluster solution is extremely dominant, whereas at larger values the three cluster solution is preferred. [sent-114, score-0.464]
48 One of the most desirable characteristics is the fact that the partitioning of the learner's knowledge is made explicit. [sent-118, score-0.366]
49 That is, the model learns a much more differentiated and context bound representation when γ is small, and a more context general and less differentiated representation when γ is large. [sent-119, score-0.948]
50 During training, the model learns to weight each of the rule modules differently depending on context, thereby producing context specific generalizations. [sent-122, score-0.536]
51 As such, ATRIUM learns the context dependence, but not the knowledge partition itself. [sent-126, score-0.592]
52 2 Generalizing in context-specific and context-general ways The discussion to this point shows how the value of γ shapes the conceptual knowledge that the model acquires, but has not looked at what generalizations the model makes. [sent-128, score-0.294]
53 However, it is straightforward to show that varying γ does allow the context sensitive RMC to capture the two generalization patterns in Figure 1. [sent-129, score-0.474]
54 With this in mind, Figure 3 plots the generalizations made by the model for two different levels of context specificity (γ = 0 and γ = 10) and for the two different clustering solutions. [sent-130, score-0.495]
55 As is clear from inspection – and verified by the squared correlations listed in the Figure caption – when γ is small the model generalizes in a context specific manner, but when γ is large the generalizations are the same in all contexts. [sent-132, score-0.53]
56 This happens for both clustering solutions, which implies that γ plays two distinct but related roles, insofar as it influences the context specificity of both the learned knowledge partition and the generalizations to new observations. [sent-133, score-0.601]
57 4 Acquiring abstract knowledge about context specificity One thing missing from both ATRIUM and the RMC is an explanation for how the learner decides whether context specific or context general representations are appropriate. [sent-134, score-1.257]
58 In both cases, the model has free parameters that govern the switch between the two cases, and these parameters must be set by hand. [sent-135, score-1.958]
59 Figure 3: Generalizations made by the model, for γ = 0 and γ = 10 and for the four cluster and three cluster solutions (panels a–d); the caption lists squared correlations with the human data, for example that only 35.6% of the variance in the context sensitive data is explained by one solution, whereas 67.1% of the context insensitive data can be accounted for. [sent-137, score-0.474] [sent-143, score-0.474] [sent-144, score-0.438]
62 To answer this, note that if the context varies in a systematic fashion, an intelligent learner might come to suspect that the context matters, and would be more likely to decide to generalize in a context specific way. [sent-152, score-1.433]
63 On the other hand, if there are no systematic patterns to the way that observations are distributed across contexts, then the learner should deem the context to be irrelevant and hence decide to generalize broadly across contexts. [sent-153, score-0.678]
64 One condition of this experiment was a standard knowledge partitioning experiment, identical in every meaningful respect to the data described earlier in this paper. [sent-156, score-0.358]
65 As is typical for such experiments, knowledge partitioning was observed for at least some of the participants. [sent-157, score-0.308]
66 In the other condition, however, the context variable was randomized: each of the training items was assigned to a randomly chosen context. [sent-158, score-0.621]
67 What this implies is that human learners use the systematicity of the context as a cue to determine how broadly to generalize. [sent-160, score-0.512]
68 As such, the model should learn that γ is small when the context varies systematically; and similarly should learn that γ is large if the context is random. [sent-161, score-0.836]
69 1 A hierarchical context-sensitive RMC Extending the statistical model is straightforward: we place priors over γ, and allow the model to infer a joint posterior distribution over the cluster assignments z and the context specificity γ. [sent-164, score-0.873]
70 This is closely related to other hierarchical Bayesian models of category learning [15–19]. [sent-165, score-0.275]
71 Figure 4: Learned distributions over γ (frequency plotted against log10(γ)) in the systematic context (dark rectangles) and randomized context (light rectangles) conditions, plotted on a logarithmic scale. [sent-170, score-1.102]
72 The acceptance probabilities for the Metropolis sampler may be calculated by observing that P(γ | x, ℓ, c, z) ∝ P(x, ℓ, c | z, γ) P(γ) (14) ∝ P(c | z, γ) P(γ) (15) = [∫ P(c | z, φ) P(φ | γ) dφ] P(γ) (16) = [∏_{k=1}^{K} ∫₀¹ P(c^(k) | φk) P(φk | γ) dφk] P(γ) (17) ∝ exp(−λγ) ∏_{k=1}^{K} B(nk^(c=1) + γ, nk^(c=2) + γ)/B(γ, γ) [sent-176, score-0.307]
73 where B(a, b) = Γ(a)Γ(b)/Γ(a + b) denotes the beta function, and nk^(c=j) counts the items in cluster k that appeared in context j. [sent-179, score-1.838]
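A sketch of this computation in Python, assuming (as the exp(−λγ) factor suggests) an exponential prior on γ; the function names, the proposal scale, and the data layout of context_counts are illustrative assumptions.

import numpy as np
from scipy.special import betaln            # betaln(a, b) = log B(a, b)

def log_target(gamma, context_counts, lam=1.0):
    # Unnormalised log P(gamma | c, z) from Equations 14-17, assuming an Exponential(lam) prior.
    # context_counts is a list of (n_k^(c=1), n_k^(c=2)) pairs, one per cluster.
    if gamma <= 0:
        return -np.inf
    lp = -lam * gamma
    for n1, n2 in context_counts:
        lp += betaln(n1 + gamma, n2 + gamma) - betaln(gamma, gamma)
    return lp

def metropolis_step(gamma, context_counts, step=0.5, rng=None):
    # One random-walk Metropolis update for gamma (proposal scale is arbitrary).
    if rng is None:
        rng = np.random.default_rng()
    proposal = gamma + rng.normal(0.0, step)
    log_a = log_target(proposal, context_counts) - log_target(gamma, context_counts)
    return proposal if np.log(rng.uniform()) < log_a else gamma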
74 2 Application of the extended model To explore the performance of the hierarchical extension of the context sensitive RMC, the model was trained on both the original, systematic version of the knowledge partitioning experiments, and on a version with the context variables randomly permuted. [sent-181, score-1.42]
75 As expected, in the systematic condition the model notices the fact that the context varies systematically as a function of the feature values x, and learns to form context specific clusters. [sent-183, score-1.109]
76 Indeed, 97% of the posterior distribution over z is absorbed by the four cluster solution (or other solutions that are sufficiently similar in the sense discussed earlier). [sent-184, score-0.344]
77 In the process, the model infers that γ is small and generalizes in a context specific way (as per Figure 3). [sent-185, score-0.497]
78 Nevertheless, without changing any parameter values, the same model in the randomized condition infers that there is no pattern to the context variable, which ends up being randomly scattered across the clusters. [sent-186, score-0.558]
79 For this condition 57% of the posterior mass is approximately equivalent to the three cluster solution. [sent-187, score-0.347]
80 As a result, the model infers that γ is large, and generalizes in the context general fashion. [sent-188, score-0.497]
81 When considering the implications of Figure 4, it is clear that the model captures the critical feature of the experiment: the ability to learn when to make context specific generalizations and when not to. [sent-190, score-0.495]
82 Inspection of Figure 4 reveals that in the randomized context condition the posterior distribution over γ does not move all that far above the prior median of 3. [sent-193, score-0.588]
83 If one were to suppose that people had no inherent prior biases to prefer to generalize one way or the other, it should follow that the less informative condition (i. [sent-196, score-0.3]
84 Empirically, the reverse is true: in the less informative condition, all participants generalize in a context general fashion, whereas in the more informative condition (i.e., systematic context) some but not all participants learn to generalize more narrowly. [sent-199, score-0.678] [sent-201, score-0.357]
86 This does not pose any inherent difficulty for the model, but it does suggest that the “unbiased” prior chosen for this demonstration is not quite right: people do appear to have strong prior biases to prefer context general representations. [sent-202, score-0.617]
87 5 Discussion The hierarchical Bayesian model outlined in this paper explains how human conceptual learning can be context general in some situations, and context sensitive in others. [sent-204, score-1.095]
88 This success leads to an interesting question: why does ALCOVE [21] not account for knowledge partitioning (see [4])? [sent-206, score-0.308]
89 On the basis of these similarities, one might expect similar behavior from ALCOVE and the context sensitive RMC. [sent-208, score-0.474]
90 In ALCOVE, as in many connectionist models, the dimensional biases are chosen to optimize the ability to predict the category label. [sent-211, score-0.277]
91 Since the context variable is not correlated with the label in these experiments (by construction), ALCOVE learns to ignore the context variable in all cases. [sent-212, score-0.906]
92 The approach taken by the RMC is qualitatively different: it looks for clusters of items where the label, the context and the feature values are all similar to one another. [sent-213, score-0.665]
93 Knowledge partitioning experiments more or less require that such clusters exist, so the RMC can learn that the context variable is not distributed randomly. [sent-214, score-0.697]
94 In short, ALCOVE treats context as important only if it can predict the label; the RMC treats the context as important if it helps the learner infer the structure of the world. [sent-215, score-0.9]
95 If fire fighters observe a very different distribution of fires in the context of back-burns than in the context of to-be-controlled fires, then it should be no surprise that they acquire two distinct theories of “fires”, each bound to a different context. [sent-220, score-0.829]
96 Although this particular example is a case in which the learned context specificity is incorrect, it takes only a minor shift to make the behavior correct. [sent-221, score-0.388]
97 If the distinction were between fires observed in a forest context and fires observed in a tyre yard, context specific category representations suddenly seem very sensible. [sent-223, score-0.978]
98 Similarly, social categories such as “polite behavior” are necessarily highly context dependent, so it makes sense that the learner would construct different rules for different contexts. [sent-224, score-0.506]
99 If the world presents the learner with observations that vary systematically across contexts, partitioning knowledge by context would seem to be a rational learning strategy. [sent-225, score-0.916]
100 Rational approximations to rational models: Alternative algorithms for category learning. [sent-377, score-0.271]
wordName wordTfidf (topN-words)
[('context', 0.388), ('rmc', 0.367), ('nk', 0.249), ('cluster', 0.232), ('partitioning', 0.215), ('category', 0.202), ('items', 0.183), ('participants', 0.18), ('alcove', 0.171), ('zi', 0.166), ('atrium', 0.122), ('systematic', 0.117), ('city', 0.102), ('clusters', 0.094), ('knowledge', 0.093), ('res', 0.093), ('sensitive', 0.086), ('generalizations', 0.077), ('people', 0.075), ('psychological', 0.075), ('ghters', 0.073), ('lewandowsky', 0.073), ('sanborn', 0.073), ('hierarchical', 0.073), ('rational', 0.069), ('learns', 0.068), ('contexts', 0.067), ('human', 0.066), ('extent', 0.066), ('posterior', 0.065), ('conceptual', 0.064), ('learner', 0.062), ('label', 0.062), ('cognition', 0.062), ('panel', 0.06), ('generalize', 0.06), ('sampler', 0.058), ('learners', 0.058), ('categories', 0.056), ('cognitive', 0.055), ('assignments', 0.055), ('idealized', 0.055), ('psychology', 0.054), ('stimuli', 0.053), ('acquire', 0.053), ('observations', 0.051), ('stimulus', 0.05), ('exemplar', 0.05), ('modules', 0.05), ('rand', 0.05), ('assigned', 0.05), ('insensitive', 0.05), ('bayesian', 0.05), ('condition', 0.05), ('bundles', 0.049), ('unsatisfying', 0.049), ('ith', 0.049), ('solutions', 0.047), ('transfer', 0.047), ('xk', 0.046), ('randomized', 0.046), ('categorization', 0.046), ('panels', 0.045), ('deemed', 0.045), ('infers', 0.044), ('ci', 0.043), ('partition', 0.043), ('biases', 0.043), ('adelaide', 0.043), ('perfors', 0.043), ('sensible', 0.042), ('item', 0.04), ('contradictory', 0.039), ('prior', 0.039), ('beta', 0.039), ('systematically', 0.038), ('differentiated', 0.037), ('parsimonious', 0.037), ('organize', 0.037), ('zn', 0.037), ('memory', 0.037), ('adjusted', 0.036), ('linked', 0.036), ('generalizes', 0.035), ('altering', 0.035), ('prefer', 0.033), ('fashion', 0.033), ('re', 0.032), ('counts', 0.032), ('neutral', 0.032), ('outlines', 0.032), ('metropolis', 0.032), ('connectionist', 0.032), ('editors', 0.032), ('partitions', 0.032), ('treats', 0.031), ('rectangles', 0.031), ('varies', 0.03), ('model', 0.03), ('coded', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999893 155 nips-2010-Learning the context of a category
Author: Dan Navarro
Abstract: This paper outlines a hierarchical Bayesian model for human category learning that learns both the organization of objects into categories, and the context in which this knowledge should be applied. The model is fit to multiple data sets, and provides a parsimonious method for describing how humans learn context specific conceptual representations.
2 0.11033624 67 nips-2010-Dynamic Infinite Relational Model for Time-varying Relational Data Analysis
Author: Katsuhiko Ishiguro, Tomoharu Iwata, Naonori Ueda, Joshua B. Tenenbaum
Abstract: We propose a new probabilistic model for analyzing dynamic evolutions of relational data, such as additions, deletions and split & merge, of relation clusters like communities in social networks. Our proposed model abstracts observed timevarying object-object relationships into relationships between object clusters. We extend the infinite Hidden Markov model to follow dynamic and time-sensitive changes in the structure of the relational data and to estimate a number of clusters simultaneously. We show the usefulness of the model through experiments with synthetic and real-world data sets.
3 0.10406904 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
Author: Ziv Bar-joseph, Hai-son P. Le
Abstract: Recent studies compare gene expression data across species to identify core and species specific genes in biological systems. To perform such comparisons researchers need to match genes across species. This is a challenging task since the correct matches (orthologs) are not known for most genes. Previous work in this area used deterministic matchings or reduced multidimensional expression data to binary representation. Here we develop a new method that can utilize soft matches (given as priors) to infer both, unique and similar expression patterns across species and a matching for the genes in both species. Our method uses a Dirichlet process mixture model which includes a latent data matching variable. We present learning and inference algorithms based on variational methods for this model. Applying our method to immune response data we show that it can accurately identify common and unique response patterns by improving the matchings between human and mouse genes. 1
4 0.10300761 153 nips-2010-Learning invariant features using the Transformed Indian Buffet Process
Author: Joseph L. Austerweil, Thomas L. Griffiths
Abstract: Identifying the features of objects becomes a challenge when those features can change in their appearance. We introduce the Transformed Indian Buffet Process (tIBP), and use it to define a nonparametric Bayesian model that infers features that can transform across instantiations. We show that this model can identify features that are location invariant by modeling a previous experiment on human feature learning. However, allowing features to transform adds new kinds of ambiguity: Are two parts of an object the same feature with different transformations or two unique features? What transformations can features undergo? We present two new experiments in which we explore how people resolve these questions, showing that the tIBP model demonstrates a similar sensitivity to context to that shown by human learners when determining the invariant aspects of features. 1
5 0.095149234 114 nips-2010-Humans Learn Using Manifolds, Reluctantly
Author: Tim Rogers, Chuck Kalish, Joseph Harrison, Xiaojin Zhu, Bryan R. Gibson
Abstract: When the distribution of unlabeled data in feature space lies along a manifold, the information it provides may be used by a learner to assist classification in a semi-supervised setting. While manifold learning is well-known in machine learning, the use of manifolds in human learning is largely unstudied. We perform a set of experiments which test a human’s ability to use a manifold in a semisupervised learning task, under varying conditions. We show that humans may be encouraged into using the manifold, overcoming the strong preference for a simple, axis-parallel linear boundary. 1
6 0.092419505 223 nips-2010-Rates of convergence for the cluster tree
7 0.086750925 263 nips-2010-Switching state space model for simultaneously estimating state transitions and nonstationary firing rates
8 0.086362042 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
9 0.076882891 63 nips-2010-Distributed Dual Averaging In Networks
10 0.07593789 62 nips-2010-Discriminative Clustering by Regularized Information Maximization
11 0.075127542 261 nips-2010-Supervised Clustering
12 0.071962833 151 nips-2010-Learning from Candidate Labeling Sets
13 0.071374021 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems
14 0.069552332 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data
15 0.06760744 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations
16 0.06577006 241 nips-2010-Size Matters: Metric Visual Search Constraints from Monocular Metadata
17 0.06572511 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
18 0.063543409 121 nips-2010-Improving Human Judgments by Decontaminating Sequential Dependencies
19 0.06191396 100 nips-2010-Gaussian Process Preference Elicitation
20 0.060608555 150 nips-2010-Learning concept graphs from text with stick-breaking priors
topicId topicWeight
[(0, 0.189), (1, 0.051), (2, -0.015), (3, 0.012), (4, -0.082), (5, 0.033), (6, 0.01), (7, -0.013), (8, 0.057), (9, 0.003), (10, 0.049), (11, -0.032), (12, -0.024), (13, -0.072), (14, 0.156), (15, -0.065), (16, -0.011), (17, 0.104), (18, 0.018), (19, -0.005), (20, -0.014), (21, -0.017), (22, 0.117), (23, 0.016), (24, 0.038), (25, 0.132), (26, -0.078), (27, -0.038), (28, -0.052), (29, 0.058), (30, 0.031), (31, 0.04), (32, -0.005), (33, 0.046), (34, -0.086), (35, -0.06), (36, -0.039), (37, 0.079), (38, -0.036), (39, 0.073), (40, -0.073), (41, 0.123), (42, -0.006), (43, 0.003), (44, 0.084), (45, 0.045), (46, 0.014), (47, 0.045), (48, 0.113), (49, -0.004)]
simIndex simValue paperId paperTitle
same-paper 1 0.97068375 155 nips-2010-Learning the context of a category
Author: Dan Navarro
Abstract: This paper outlines a hierarchical Bayesian model for human category learning that learns both the organization of objects into categories, and the context in which this knowledge should be applied. The model is fit to multiple data sets, and provides a parsimonious method for describing how humans learn context specific conceptual representations.
2 0.68447316 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
Author: Ziv Bar-joseph, Hai-son P. Le
Abstract: Recent studies compare gene expression data across species to identify core and species specific genes in biological systems. To perform such comparisons researchers need to match genes across species. This is a challenging task since the correct matches (orthologs) are not known for most genes. Previous work in this area used deterministic matchings or reduced multidimensional expression data to binary representation. Here we develop a new method that can utilize soft matches (given as priors) to infer both, unique and similar expression patterns across species and a matching for the genes in both species. Our method uses a Dirichlet process mixture model which includes a latent data matching variable. We present learning and inference algorithms based on variational methods for this model. Applying our method to immune response data we show that it can accurately identify common and unique response patterns by improving the matchings between human and mouse genes. 1
3 0.66126168 67 nips-2010-Dynamic Infinite Relational Model for Time-varying Relational Data Analysis
Author: Katsuhiko Ishiguro, Tomoharu Iwata, Naonori Ueda, Joshua B. Tenenbaum
Abstract: We propose a new probabilistic model for analyzing dynamic evolutions of relational data, such as additions, deletions and split & merge, of relation clusters like communities in social networks. Our proposed model abstracts observed timevarying object-object relationships into relationships between object clusters. We extend the infinite Hidden Markov model to follow dynamic and time-sensitive changes in the structure of the relational data and to estimate a number of clusters simultaneously. We show the usefulness of the model through experiments with synthetic and real-world data sets.
4 0.56535894 62 nips-2010-Discriminative Clustering by Regularized Information Maximization
Author: Andreas Krause, Pietro Perona, Ryan G. Gomes
Abstract: Is there a principled way to learn a probabilistic discriminative classifier from an unlabeled data set? We present a framework that simultaneously clusters the data and trains a discriminative classifier. We call it Regularized Information Maximization (RIM). RIM optimizes an intuitive information-theoretic objective function which balances class separation, class balance and classifier complexity. The approach can flexibly incorporate different likelihood functions, express prior assumptions about the relative size of different classes and incorporate partial labels for semi-supervised learning. In particular, we instantiate the framework to unsupervised, multi-class kernelized logistic regression. Our empirical evaluation indicates that RIM outperforms existing methods on several real data sets, and demonstrates that RIM is an effective model selection method. 1
5 0.54959333 120 nips-2010-Improvements to the Sequence Memoizer
Author: Jan Gasthaus, Yee W. Teh
Abstract: The sequence memoizer is a model for sequence data with state-of-the-art performance on language modeling and compression. We propose a number of improvements to the model and inference algorithm, including an enlarged range of hyperparameters, a memory-efficient representation, and inference algorithms operating on the new representation. Our derivations are based on precise definitions of the various processes that will also allow us to provide an elementary proof of the “mysterious” coagulation and fragmentation properties used in the original paper on the sequence memoizer by Wood et al. (2009). We present some experimental results supporting our improvements. 1
6 0.52074647 267 nips-2010-The Multidimensional Wisdom of Crowds
8 0.48530492 261 nips-2010-Supervised Clustering
9 0.48321518 223 nips-2010-Rates of convergence for the cluster tree
10 0.46339896 214 nips-2010-Probabilistic Belief Revision with Structural Constraints
11 0.45729923 251 nips-2010-Sphere Embedding: An Application to Part-of-Speech Induction
12 0.4413408 129 nips-2010-Inter-time segment information sharing for non-homogeneous dynamic Bayesian networks
13 0.4400087 273 nips-2010-Towards Property-Based Classification of Clustering Paradigms
14 0.43247697 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations
15 0.43154597 114 nips-2010-Humans Learn Using Manifolds, Reluctantly
16 0.42286178 215 nips-2010-Probabilistic Deterministic Infinite Automata
17 0.40923724 2 nips-2010-A Bayesian Approach to Concept Drift
18 0.40349862 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
19 0.40252891 68 nips-2010-Effects of Synaptic Weight Diffusion on Learning in Decision Making Networks
20 0.38497829 151 nips-2010-Learning from Candidate Labeling Sets
topicId topicWeight
[(13, 0.043), (17, 0.029), (27, 0.11), (30, 0.117), (35, 0.027), (45, 0.197), (50, 0.078), (52, 0.034), (60, 0.05), (69, 0.145), (77, 0.056), (90, 0.04)]
simIndex simValue paperId paperTitle
same-paper 1 0.89055645 155 nips-2010-Learning the context of a category
Author: Dan Navarro
Abstract: This paper outlines a hierarchical Bayesian model for human category learning that learns both the organization of objects into categories, and the context in which this knowledge should be applied. The model is fit to multiple data sets, and provides a parsimonious method for describing how humans learn context specific conceptual representations.
2 0.85100806 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
Author: Dahua Lin, Eric Grimson, John W. Fisher
Abstract: We present a novel method for constructing dependent Dirichlet processes. The approach exploits the intrinsic relationship between Dirichlet and Poisson processes in order to create a Markov chain of Dirichlet processes suitable for use as a prior over evolving mixture models. The method allows for the creation, removal, and location variation of component models over time while maintaining the property that the random measures are marginally DP distributed. Additionally, we derive a Gibbs sampling algorithm for model inference and test it on both synthetic and real data. Empirical results demonstrate that the approach is effective in estimating dynamically varying mixture models. 1
3 0.84420484 268 nips-2010-The Neural Costs of Optimal Control
Author: Samuel Gershman, Robert Wilson
Abstract: Optimal control entails combining probabilities and utilities. However, for most practical problems, probability densities can be represented only approximately. Choosing an approximation requires balancing the benefits of an accurate approximation against the costs of computing it. We propose a variational framework for achieving this balance and apply it to the problem of how a neural population code should optimally represent a distribution under resource constraints. The essence of our analysis is the conjecture that population codes are organized to maximize a lower bound on the log expected utility. This theory can account for a plethora of experimental data, including the reward-modulation of sensory receptive fields, GABAergic effects on saccadic movements, and risk aversion in decisions under uncertainty. 1
4 0.84408039 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
Author: Hongbo Zhou, Qiang Cheng
Abstract: Regularization technique has become a principled tool for statistics and machine learning research and practice. However, in most situations, these regularization terms are not well interpreted, especially on how they are related to the loss function and data. In this paper, we propose a robust minimax framework to interpret the relationship between data and regularization terms for a large class of loss functions. We show that various regularization terms are essentially corresponding to different distortions to the original data matrix. This minimax framework includes ridge regression, lasso, elastic net, fused lasso, group lasso, local coordinate coding, multiple kernel learning, etc., as special cases. Within this minimax framework, we further give mathematically exact definition for a novel representation called sparse grouping representation (SGR), and prove a set of sufficient conditions for generating such group level sparsity. Under these sufficient conditions, a large set of consistent regularization terms can be designed. This SGR is essentially different from group lasso in the way of using class or group information, and it outperforms group lasso when there appears group label noise. We also provide some generalization bounds in a classification setting. 1
5 0.84401679 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
Author: Surya Ganguli, Haim Sompolinsky
Abstract: Recent proposals suggest that large, generic neuronal networks could store memory traces of past input sequences in their instantaneous state. Such a proposal raises important theoretical questions about the duration of these memory traces and their dependence on network size, connectivity and signal statistics. Prior work, in the case of gaussian input sequences and linear neuronal networks, shows that the duration of memory traces in a network cannot exceed the number of neurons (in units of the neuronal time constant), and that no network can out-perform an equivalent feedforward network. However a more ethologically relevant scenario is that of sparse input sequences. In this scenario, we show how linear neural networks can essentially perform compressed sensing (CS) of past inputs, thereby attaining a memory capacity that exceeds the number of neurons. This enhanced capacity is achieved by a class of “orthogonal” recurrent networks and not by feedforward networks or generic recurrent networks. We exploit techniques from the statistical physics of disordered systems to analytically compute the decay of memory traces in such networks as a function of network size, signal sparsity and integration time. Alternately, viewed purely from the perspective of CS, this work introduces a new ensemble of measurement matrices derived from dynamical systems, and provides a theoretical analysis of their asymptotic performance. 1
6 0.84264648 117 nips-2010-Identifying graph-structured activation patterns in networks
7 0.83363569 270 nips-2010-Tight Sample Complexity of Large-Margin Learning
8 0.83240813 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts
9 0.83140451 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
10 0.83118463 194 nips-2010-Online Learning for Latent Dirichlet Allocation
11 0.83103609 197 nips-2010-Optimal Bayesian Recommendation Sets and Myopically Optimal Choice Query Sets
12 0.82954097 21 nips-2010-Accounting for network effects in neuronal responses using L1 regularized point process models
13 0.82749575 161 nips-2010-Linear readout from a neural population with partial correlation data
14 0.82732832 80 nips-2010-Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs
15 0.82664585 280 nips-2010-Unsupervised Kernel Dimension Reduction
16 0.82575089 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
17 0.8251617 148 nips-2010-Learning Networks of Stochastic Differential Equations
18 0.82505888 220 nips-2010-Random Projection Trees Revisited
19 0.82385123 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
20 0.82337946 63 nips-2010-Distributed Dual Averaging In Networks