nips nips2013 nips2013-164 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nathaniel J. Smith, Noah Goodman, Michael Frank
Abstract: Language users are remarkably good at making inferences about speakers’ intentions in context, and children learning their native language also display substantial skill in acquiring the meanings of unknown words. These two cases are deeply related: Language users invent new terms in conversation, and language learners learn the literal meanings of words based on their pragmatic inferences about how those words are used. While pragmatic inference and word learning have both been independently characterized in probabilistic terms, no current work unifies these two. We describe a model in which language learners assume that they jointly approximate a shared, external lexicon and reason recursively about the goals of others in using this lexicon. This model captures phenomena in word learning and pragmatic inference; it additionally leads to insights about the emergence of communicative systems in conversation and the mechanisms by which pragmatic inferences become incorporated into word meanings. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Learning and using language via recursive pragmatic reasoning about other agents Nathaniel J. [sent-1, score-0.688]
2 Frank Stanford University Abstract Language users are remarkably good at making inferences about speakers’ intentions in context, and children learning their native language also display substantial skill in acquiring the meanings of unknown words. [sent-4, score-0.376]
3 These two cases are deeply related: Language users invent new terms in conversation, and language learners learn the literal meanings of words based on their pragmatic inferences about how those words are used. [sent-5, score-0.912]
4 While pragmatic inference and word learning have both been independently characterized in probabilistic terms, no current work unifies these two. [sent-6, score-0.554]
5 We describe a model in which language learners assume that they jointly approximate a shared, external lexicon and reason recursively about the goals of others in using this lexicon. [sent-7, score-0.636]
6 This model captures phenomena in word learning and pragmatic inference; it additionally leads to insights about the emergence of communicative systems in conversation and the mechanisms by which pragmatic inferences become incorporated into word meanings. [sent-8, score-1.273]
7 Theories of pragmatics frame the process of language comprehension as inference about the generating goal of an utterance given a rational speaker [14, 8, 9]. [sent-12, score-0.587]
8 For example, a listener might reason, “if she had wanted me to think ‘all’ of the cookies, she would have said ‘all’—but she didn’t, so she must not have eaten all of them.” [sent-13, score-0.397]
9 But pragmatic reasoning about meaning-in-context relies on stable literal meanings that must themselves be learned. [sent-16, score-0.685]
10 In both adults and children, uncertainty about word meanings is common, and often considering speakers’ pragmatic goals can help to resolve this uncertainty. [sent-17, score-0.755]
11 For example, if a novel word is used in a context containing both a novel and a familiar object, young children can make the inference that the novel word refers to the novel object [22]. [sent-18, score-0.821]
12 For adults who are proficient language users, there are also a variety of intriguing cases in which listeners seem to create situation- and task-specific ways of referring to particular objects. [sent-19, score-0.271]
13 Despite this intersection, there is relatively little work that takes pragmatic reasoning into account when considering language learning in context. [sent-29, score-0.484]
14 Recent work on grounded language learning has attempted to learn large sets of (sometimes relatively complex) word meanings from noisy and ambiguous input (e. [sent-30, score-0.467]
15 And a number of models have begun to formalize the consequences of pragmatic reasoning in situations where limited learning takes place [12, 9, 3, 13]. [sent-33, score-0.376]
16 The goal of our current work is to investigate the possibilities for integrating models of recursive pragmatic reasoning with models of language learning, with the hope of capturing phenomena in both domains. [sent-35, score-0.54]
17 We next simulate findings on pragmatic inference in one-shot games (replicating previous work). [sent-37, score-0.405]
18 We then build on these results to simulate the results of pragmatic learning in the language acquisition setting where one communicator is uncertain about the lexicon and in iterated communication games where both communicators are uncertain about the lexicon. [sent-38, score-1.182]
19 This agent has a lexicon of associations between words and meanings; specifically, it assigns each word w a vector of numbers in (0, 1) describing the extent to which this word provides evidence for each possible object. [sent-49, score-0.99]
20 To interpret a word, the literal listener simply re-weights their prior expectation about what is referred to using their lexicon’s entry for this word: P_L0(object | word, lexicon) ∝ lexicon(word, object) × P_prior(object).   (1) [sent-50, score-0.584]
21 Because of the normalization in this equation, there is a systematic but unimportant symmetry among lexicons; we remove this by assuming the lexicon sums to 1 over objects for each word. [sent-51, score-0.506]
22 Our simplification is without loss of generality, however, because we can interpret our model as marginalizing over such a representation, with our literal P_lexicon(object | word) = Σ_features P(object | features) × P_lexicon(features | word). [sent-61, score-0.211]
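To make Eq. (1) concrete, the following is a minimal sketch of the literal listener in a toy two-word, two-object domain. The word labels, lexicon values, and prior here are illustrative assumptions, not the paper's actual parameters.

```python
import numpy as np

WORDS = ["dax", "blicket"]      # hypothetical word labels
OBJECTS = ["OBJ_A", "OBJ_B"]

# lexicon[w, o]: evidence word w provides for object o; each row sums to 1 over objects
lexicon = np.array([[0.9, 0.1],
                    [0.2, 0.8]])
prior = np.array([0.5, 0.5])    # P_prior(object)

def literal_listener(word, lexicon, prior):
    """P_L0(object | word, lexicon) ∝ lexicon(word, object) * P_prior(object)."""
    scores = lexicon[word] * prior
    return scores / scores.sum()

print(literal_listener(WORDS.index("dax"), lexicon, prior))  # -> [0.9, 0.1]
```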
23 There is a great deal of evidence that humans do not use such equilibrium strategies; their behavior in language games (and in other games [5]) can be well-modeled as implementing S_k or L_k for some small k [9]. [sent-62, score-0.266]
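The recursive soft-max agents S_k and L_k mentioned here can be sketched as below. This follows the standard rational-speech-acts recursion (soft-max of log-informativity minus word cost); the exact utility, cost handling, and indexing convention used in the paper may differ, so treat this as an illustrative formulation rather than the paper's implementation.

```python
import numpy as np

def softmax(utilities, lam):
    e = np.exp(lam * (utilities - utilities.max()))
    return e / e.sum()

def listener_k(k, lexicon, prior, lam, cost):
    """L_k[word, object]: literal re-weighting at k = 0, Bayesian inversion of S_k above that."""
    if k == 0:
        scores = lexicon * prior                      # broadcasts the object prior
        return scores / scores.sum(axis=1, keepdims=True)
    S = speaker_k(k, lexicon, prior, lam, cost)       # S[object, word]
    scores = S.T * prior                              # P(o | w) ∝ S(w | o) P(o)
    return scores / scores.sum(axis=1, keepdims=True)

def speaker_k(k, lexicon, prior, lam, cost):
    """S_k[object, word]: soft-max of log L_{k-1}(object | word) minus word cost."""
    L_prev = listener_k(k - 1, lexicon, prior, lam, cost)
    n_words, n_objects = lexicon.shape
    S = np.zeros((n_objects, n_words))
    for o in range(n_objects):
        S[o] = softmax(np.log(L_prev[:, o] + 1e-12) - cost, lam)
    return S
```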
24 This resolves one problem, but as soon as we attempt to add uncertainty about the meanings of words to such a model, a new paradox arises. [sent-65, score-0.22]
25 Suppose the listener is a young child who is uncertain about the lexicon their partner is using. [sent-66, score-1.023]
26 This basic structure is captured in previous models of Bayesian word learning [10]. [sent-68, score-0.214]
27 But when combined with the recursive pragmatic model, a new question arises: Given such a listener, what model should the speaker use? [sent-69, score-0.656]
28 But if they do this, then their utterances will provide no data about their lexicon, and there is nothing for the rational listener to learn from observing them. [sent-71, score-0.548]
29 One final problem is that under this model, when agents switch roles between listener and speaker, there is nothing constraining them to continue using the same language. [sent-72, score-0.597]
30 Optimizing task performance requires my lexicon as a speaker to match your lexicon as a listener and vice versa, but there is nothing that relates my lexicon as a speaker to my lexicon as a listener, because these never interact. [sent-73, score-2.938]
31 We resolve the problems described above by assuming that speakers and listeners deviate from normative behavior in one key way: they assume a conventional lexicon. [sent-77, score-0.249]
32 Specifically, our final convention-based agents assume: (a) There is some single, specific literal lexicon which everyone should be using, (b) and everyone else knows this lexicon, and believes that I know it as well, (c) but in fact I don’t. [sent-78, score-0.885]
33 These assumptions instantiate a kind of “social anxiety” in which agents are all trying to learn the correct lexicon that they assume everyone else knows. [sent-79, score-0.682]
34 Assumption (a) corresponds to the lexicographer’s illusion: Naive language users will argue vociferously that words have specific meanings, even though these meanings are unobservable to everyone who purportedly uses them. [sent-80, score-0.362]
35 It also explains why learners speak the language they hear (rather than some private language that they assume listeners will eventually learn): Under assumption (a), observing other speakers’ behavior provides data about not just that speaker’s idiosyncratic lexicon, but the consensus lexicon. [sent-81, score-0.364]
36 Assumption (b) avoids the explosion of hyper^n-distributions described above: If agent n knows the lexicon, they assume that all lower agents do as well, reducing to the original tractable model without uncertainty. [sent-82, score-0.249]
37 To the extent that a child’s interlocutors do use a stable lexicon and do not fully adapt their speech to accommodate the child’s limitations, these assumptions make a reasonable approximation for the child language learning case. [sent-84, score-0.604]
38 Formally, let an unadorned L and S denote the listener and speaker who follow the above assumptions. [sent-86, score-0.714]
39 If we start from an uncertain listener with a prior over lexicons, then a first-level uncertain speaker needs a prior over priors on lexicons, a second-level uncertain listener needs a prior over priors over priors, etc. [sent-88, score-1.249]
40 WL refers to the word learning model of [10]; PI refers to the recursive pragmatic inference model of [9]; PI+U refers to the pragmatic inference model of [3] which includes lexical uncertainty, marginalizes it out, and then recurses. [sent-105, score-1.034]
41 Our current model is referred to here as PI+WL, and combines pragmatic inference with word learning. [sent-106, score-0.554]
42 In particular, in the iterated games explored here, it consists of S’s previous utterances together with whatever other information L may have about their intended referents (e. [sent-108, score-0.224]
43 By assumption (b), L treats these utterances as samples from the knowledgeable speaker S_{n−2}, not S, and thus as being informative about the lexicon. [sent-111, score-0.435]
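A sketch of how such a convention-based listener might put this together: each candidate lexicon is scored by how well the knowledgeable speaker S_{n−2} would explain the observed (object, word) pairs, and the listener's interpretation marginalizes over the resulting posterior. This reuses the speaker_k/listener_k helpers sketched earlier; the discrete grid of candidate lexicons and the exact recursion levels used are simplifying assumptions on my part.

```python
import numpy as np
# assumes speaker_k and listener_k from the earlier sketch are in scope

def lexicon_posterior(data, candidates, lex_prior, prior, lam, cost, n=3):
    """P(lexicon | data) ∝ P(lexicon) * prod_t S_{n-2}(word_t | object_t, lexicon)."""
    log_post = np.log(np.asarray(lex_prior, dtype=float))
    for i, lex in enumerate(candidates):
        S = speaker_k(n - 2, lex, prior, lam, cost)   # knowledgeable speaker
        for obj, word in data:                        # data: list of (object, word) pairs
            log_post[i] += np.log(S[obj, word] + 1e-12)
    post = np.exp(log_post - log_post.max())
    return post / post.sum()

def uncertain_listener(word, data, candidates, lex_prior, prior, lam, cost, n=3):
    """P_L(object | word, data) = sum_lex P(lex | data) * L_{n-1}(object | word, lex)."""
    post = lexicon_posterior(data, candidates, lex_prior, prior, lam, cost, n)
    interp = np.zeros(len(prior))
    for p, lex in zip(post, candidates):
        interp += p * listener_k(n - 1, lex, prior, lam, cost)[word]
    return interp
```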
44 In the remainder of the paper, we apply the model described above to a set of one-shot pragmatic inference games that have been well-studied in linguistics [14, 15] and are addressed by previous one-shot models of pragmatic inference [9, 3]. [sent-115, score-0.745]
45 In our simulations throughout, we somewhat arbitrarily set the recursion depth n = 3 (the minimal value that produces all the qualitative phenomena), λ = 3, and assume that all agents have shared priors on the lexicon and full knowledge of the cost function. [sent-118, score-0.672]
46 While “I ate some …” (Footnote 4: An alternative model would have the speaker take the expectation over informativity, instead of the informativity of the expectation, which would correspond to slightly different utility functions.) [sent-124, score-0.398]
47 So although “I ate some of the cookies” could in principle be compatible with eating ALL of them, the listener is led to believe that SOME-BUT-NOT-ALL is the likely state of affairs. [sent-129, score-0.442]
48 The recursive pragmatic reasoning portions of our model capture findings on scalar implicature in the same manner as previous models [3, 13]. [sent-130, score-0.609]
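As a worked example, the sketch below runs the recursive agents from the earlier sketch on a minimal “some”/“all” domain and recovers the implicature: the literal lexicon lets “some” cover both states, yet the pragmatic listener concentrates its interpretation on SOME-BUT-NOT-ALL. The particular lexicon values, λ, and recursion depth are illustrative choices, not the paper's.

```python
import numpy as np
# assumes listener_k from the earlier sketch is in scope

# states: 0 = SOME-BUT-NOT-ALL, 1 = ALL;  words: 0 = "some", 1 = "all"
lexicon = np.array([[0.5, 0.5],   # "some" is literally compatible with both states
                    [0.0, 1.0]])  # "all" is only compatible with ALL
prior = np.array([0.5, 0.5])
cost = np.zeros(2)

L = listener_k(2, lexicon, prior, lam=3.0, cost=cost)
print(L[0])   # interpretation of "some": most of the mass lands on SOME-BUT-NOT-ALL
```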
49 One word is expensive to use, and one is cheap (call them “expensive” and “cheap” for short). [sent-134, score-0.304]
50 Intuitively, there are two possible communicative systems here: a good system where “cheap” refers to COMMON and “expensive” refers to RARE, and a bad system where the opposite holds. [sent-136, score-0.314]
51 (The classic example: a longer paraphrase such as “Lee got the car to stop” suggests some nonstandard means, i.e. not the brakes, because, had he used the brakes, the speaker would have chosen the simpler and shorter (less costly) expression, “Lee stopped the car” [15].) [sent-142, score-0.317]
52 If a listener assigns equal probability to their partner using the good system or the bad system, then their best bet is to estimate P_S(word | object) as the average of P_S(word | object, good system) and P_S(word | object, bad system). [sent-145, score-0.607]
53 In the good system, the utilities of the speaker’s actions are relatively strongly separated compared to the bad system; therefore, a soft-max agent in the bad system has noisier behavior than in the good system, and the behavior in the good system dominates the average. [sent-147, score-0.31]
54 The symmetry breaks in the appropriate way: Despite total ignorance about the conventional system, our modeled speakers prefer to use simple words for common referents (P_S(“cheap” | COMMON) = 0. [sent-152, score-0.201]
55 [3] report a much stronger preference, which they accomplish by applying further layers of pragmatic recursion on top of these marginal distributions. [sent-160, score-0.337]
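The sketch below illustrates the symmetry-breaking argument: averaging the soft-max speaker distribution over the ‘good’ and ‘bad’ lexicons (as in the estimate of P_S described above) leaves the cheaper word weakly preferred for the COMMON referent. All numbers (lexicon strengths, costs, prior, λ) are illustrative assumptions, and the effect is deliberately small, matching the weak preference described in the text.

```python
import numpy as np
# assumes speaker_k from the earlier sketch is in scope

prior = np.array([0.8, 0.2])            # 0 = COMMON, 1 = RARE
cost = np.array([0.1, 0.4])             # 0 = "cheap", 1 = "expensive"

good = np.array([[0.9, 0.1],            # "cheap" -> COMMON, "expensive" -> RARE
                 [0.1, 0.9]])
bad = good[::-1].copy()                 # the reverse mapping

S_good = speaker_k(1, good, prior, lam=3.0, cost=cost)
S_bad = speaker_k(1, bad, prior, lam=3.0, cost=cost)
S_marginal = 0.5 * S_good + 0.5 * S_bad   # best guess under total ignorance of the convention

print(S_marginal[0])   # P(word | COMMON): "cheap" comes out weakly preferred
```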
56 4 Pragmatics in learning from a knowledgeable speaker [sent-162, score-0.317]
57 The acquisition of quantifiers like “some” provides a puzzle for most models of word learning: given that in many contexts, the word “some” is used to mean SOME-BUT-NOT-ALL, how do children learn that SOME-BUT-NOT-ALL is not in fact its literal meaning? [sent-164, score-0.679]
58 Our model is able to take scalar implicatures into account when learning, and thus provides a potential solution, congruent with the observation that no known language in fact lexicalizes SOME-BUT-NOT-ALL [21]. [sent-165, score-0.275]
59 Essentially, the model reasoned that although it had unambiguous evidence for “some” being used to refer to SOME-BUT-NOT-ALL, this was nonetheless consistent with a literal meaning of SOME-BUT-NOT-ALL-OR-ALL which had then been pragmatically strengthened. [sent-170, score-0.239]
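A sketch of the corresponding learning computation: the learner compares two candidate literal meanings for “some” (a strengthened SOME-BUT-NOT-ALL meaning versus the weaker SOME-BUT-NOT-ALL-OR-ALL meaning) against observed uses of “some” in SOME-BUT-NOT-ALL states. Because the pragmatic speaker would produce those uses under either hypothesis, the data do not force the strengthened literal meaning. The candidate lexicons and data here are illustrative, and the helpers from the earlier sketches are assumed to be in scope.

```python
import numpy as np
# assumes lexicon_posterior from the earlier sketch is in scope

# states: 0 = SOME-BUT-NOT-ALL, 1 = ALL;  words: 0 = "some", 1 = "all"
weak_some   = np.array([[0.5, 0.5],   # "some" literally covers both states
                        [0.0, 1.0]])
strong_some = np.array([[1.0, 0.0],   # "some" literally means SOME-BUT-NOT-ALL
                        [0.0, 1.0]])
candidates = [weak_some, strong_some]
prior = np.array([0.5, 0.5])
cost = np.zeros(2)

# Observed usage: "some" repeatedly used when the state is SOME-BUT-NOT-ALL.
data = [(0, 0)] * 5

post = lexicon_posterior(data, candidates, [0.5, 0.5], prior, lam=3.0, cost=cost, n=3)
print(post)   # close to uniform: strengthened uses don't rule out the weaker literal meaning
```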
60 Figure 1: Simulations of two pragmatic agents playing a naming game (two runs each of a 2-word/2-object game and a 3-word/3-object game, plotted by dialogue turn). [sent-174, score-0.629]
61 From these posteriors we derive the probability P (L understands S) (marginalizing over target objects and word choices), and also depict graphically S’s model of the listener (top row), and L’s actual model (bottom row). [sent-177, score-0.694]
62 Simple probabilistic word learning models can produce a similar pattern of findings [10], but all such models assume that learners retain the mapping between novel word and novel object demonstrated in the experimental situation. [sent-187, score-0.642]
63 The model concludes with high probability (75%) that the speaker is referring to the novel object. [sent-191, score-0.406]
64 Nevertheless, this inference is not accompanied by an increased belief that the novel word literally refers to this object. [sent-192, score-0.36]
65 Nevertheless, on repeated exposure to the same novel-word/novel-object situation, the learner does learn the mapping as part of the lexicon (congruent with other data on repeated training in disambiguation situations [4]). [sent-194, score-0.711]
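The same machinery gives a sketch of these disambiguation results: a pragmatic listener can resolve the novel word to the novel object even under a lexicon hypothesis in which that word is literally uninformative, so the posterior on the literal mapping moves only slowly across repeated exposures. The object and word labels, the two candidate lexicons, and the number of exposures are all illustrative assumptions.

```python
import numpy as np
# assumes uncertain_listener and lexicon_posterior from the earlier sketches are in scope

# objects: 0 = FAMILIAR, 1 = NOVEL;  words: 0 = "ball" (known), 1 = "dax" (novel)
dax_uninformative = np.array([[0.9, 0.1],    # "dax" carries no literal information
                              [0.5, 0.5]])
dax_means_novel   = np.array([[0.9, 0.1],    # "dax" literally maps to the novel object
                              [0.1, 0.9]])
candidates = [dax_uninformative, dax_means_novel]
prior = np.array([0.5, 0.5])
cost = np.zeros(2)

# One-shot: even with no learning data, "dax" is pragmatically resolved to the novel object.
print(uncertain_listener(1, [], candidates, [0.5, 0.5], prior, lam=3.0, cost=cost, n=3))

# Repeated exposure: the posterior on the dax -> NOVEL literal mapping rises only slowly,
# because the pragmatic speaker explains the usage under either hypothesis.
data = []
for trial in range(5):
    data.append((1, 1))                      # "dax" used with the novel object as target
    print(trial + 1, lexicon_posterior(data, candidates, [0.5, 0.5], prior,
                                       lam=3.0, cost=cost, n=3))
```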
66 5 Pragmatic reasoning in the absence of conventional meanings [sent-195, score-0.259]
67 For example, adults playing a communication game using only novel symbols with no conventional meaning will typically converge on a set of new conventions which allow them to accomplish their task [11]. [sent-198, score-0.234]
68 From a pure learning perspective this behavior is anomalous, however: Since both agents know perfectly well that there is no existing convention to discover, there is nothing to learn from the other’s behavior. [sent-202, score-0.228]
69 In the first run, speaker and listener converge on a sparse and efficient communicative equilibrium, in which “cheap” means COMMON and “expensive” means RARE, while in the second they reach a sub-optimal equilibrium. [sent-209, score-0.844]
70 Right: Proportion of dyads in the Horn implicature game (§5.2) who have converged on the ‘good’ or ‘bad’ lexicons and believe that these are literal meanings. [sent-227, score-0.243] [sent-228, score-0.31]
72 To model such phenomena, we imagine two agents playing the simple referential game introduced in § 2. [sent-229, score-0.273]
73 On each turn the speaker is assigned a target object, utters some word referring to this object, the listener makes a guess at the object, and then, critically, the speaker observes the listener’s guess and the listener receives feedback indicating the correct answer (i.e., the target object). [sent-230, score-1.74]
74 Both agents then update their posterior over lexicons before proceeding to the next trial. [sent-233, score-0.319]
75 As in [19, 7], the speaker and listener remain fixed in the same role throughout. [sent-234, score-0.714]
76 Each agent effectively uses their partner’s behavior as a basis for forming weak beliefs about the underlying lexicon that they assume must exist. [sent-238, score-0.55]
77 And unlike some previous models of emergence across multiple generations of agents [18, 25], this occurs within individual agents in a single dialogue. [sent-240, score-0.379]
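Below is a sketch of a single turn of this iterated game, with both agents maintaining posteriors over the same small set of candidate lexicons and updating them from what they observe on the turn. The sampling scheme, the recursion levels used in the updates, and the candidate-lexicon grid are illustrative assumptions, and the helpers from the earlier sketches are assumed to be in scope.

```python
import numpy as np
# assumes speaker_k and listener_k from the earlier sketch are in scope
rng = np.random.default_rng(0)

def play_turn(target, speaker_post, listener_post, candidates, prior, lam, cost, n=3):
    """One dialogue turn: speaker utters, listener guesses, both update their posteriors."""
    # Speaker marginalizes over its lexicon posterior to choose a word for the target.
    S_marg = sum(p * speaker_k(n - 1, lex, prior, lam, cost)
                 for p, lex in zip(speaker_post, candidates))
    p_word = S_marg[target] / S_marg[target].sum()
    word = int(rng.choice(len(cost), p=p_word))

    # Listener marginalizes over its posterior to interpret the word.
    L_marg = sum(p * listener_k(n - 1, lex, prior, lam, cost)
                 for p, lex in zip(listener_post, candidates))
    guess = int(np.argmax(L_marg[word]))

    def reweight(post, likelihoods):
        post = post * likelihoods
        return post / post.sum()

    # Listener sees the true target, so it scores each lexicon by how well the
    # knowledgeable speaker would explain (target, word) under that lexicon.
    listener_post = reweight(listener_post, np.array(
        [speaker_k(n - 2, lex, prior, lam, cost)[target, word] for lex in candidates]))
    # Speaker sees the guess, so it scores each lexicon by the listener that lexicon implies.
    speaker_post = reweight(speaker_post, np.array(
        [listener_k(n - 2, lex, prior, lam, cost)[word, guess] for lex in candidates]))
    return word, guess, speaker_post, listener_post
```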
78 A stronger example of how pragmatics can create biases in emerging lexicons can be observed by considering a version of this game played in the “cheap”/“expensive”/COMMON/RARE domain introduced in our discussion of Horn implicature (§3. [sent-243, score-0.424]
79 Here, a uniform prior over lexicons, combined with pragmatic reasoning, causes each agent to start out weakly biased towards the associations “cheap” ↔ COMMON, “expensive” ↔ RARE. [sent-245, score-0.36]
80 A fully rational listener who observed an uncertain speaker using words in this manner would therefore discount it as arising from this bias, and conclude that the speaker was, in fact, highly uncertain. [sent-246, score-1.147]
81 When they succeed, they take their success as evidence that the listener was in fact using the good system all along. [sent-249, score-0.452]
82 As a result, dyads in this game end up converging onto a stable system at a rate far above chance, and preferentially onto the ‘good’ system (Figs. [sent-250, score-0.199]
83 In this model, Horn implicatures depend on uncertainty about literal meaning. [sent-253, score-0.294]
84 As the agents gather more data, their uncertainty is reduced, and thus through the course of a dialogue, the implicature is replaced by a belief that “cheap” literally means COMMON (and did all along). [sent-254, score-0.407]
85 To demonstrate this phenomenon, we queried each agent in each simulated dyad about how they would refer to or interpret each object and word, if the two objects were equally common, which cancels the Horn implicature. [sent-255, score-0.207]
86 Depending on the details of the input, it is possible for our convention-based agents to observe pragmatically strengthened uses of scalar terms (e. [sent-259, score-0.26]
87 This occurs because scalar implicature depends only on recursive pragmatic reasoning (§2. [sent-263, score-0.609]
88 But, while our agents are able to use Horn implicatures in their own behavior (§3.2), this happens implicitly as a result of their uncertainty, and our agents do not model the uncertainty of other agents; thus, when they observe other agents using Horn implicatures, they cannot interpret this behavior as arising from an implicature. [sent-265, score-0.268] [sent-266, score-0.432]
90 Our model therefore makes the interesting prediction that all else being equal, uncertainty-based implicatures should over time be more prone to lexicalizing and becoming part of literal meaning than recursion-based implicatures are. [sent-269, score-0.39]
91 6 Conclusion Language learners and language users must consider word meanings both within and across contexts. [sent-270, score-0.533]
92 In the current work we treat agents communicating with one another as assuming that there is a shared conventional lexicon which they both rely on, but with differing degrees of knowledge. [sent-272, score-0.689]
93 They then reason recursively about how this lexicon should be used to convey particular meanings in context. [sent-273, score-0.615]
94 In particular, we consider new explanations of disambiguation in early word learning and the acquisition of quantifiers, and demonstrate that our model is capable of developing novel and efficient communicative systems through iterated learning within the context of a single simulated conversation. [sent-275, score-0.524]
95 Our assumptions produce a tractable model, but because they deviate from pure rationality, they must introduce biases, of which we identify two: a tendency for pragmatic speakers and listeners to accentuate useful, sparse patterns in their communicative systems (§5. [sent-276, score-0.613]
96 Our work here takes a first step towards joining disparate strands of research that have treated language acquisition and language use as distinct. [sent-282, score-0.282]
97 Accessing the unsaid: The role of scalar alternatives in children’s pragmatic inference. [sent-294, score-0.356]
98 That’s what she (could have) said: How alternative utterances affect language use. [sent-301, score-0.202]
99 Using speakers’ referential intentions to model early cross-situational word learning. [sent-347, score-0.261]
100 Toward a new taxonomy for pragmatic inference: Q-based and r-based implicature. [sent-371, score-0.308]