nips nips2003 nips2003-191 knowledge-graph by maker-knowledge-mining

191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus


Source: pdf

Author: Zach Solan, David Horn, Eytan Ruppin, Shimon Edelman

Abstract: We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. The results are encouraging: the model attains a level of performance considered to be “intermediate” for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed speech of parents directed to small children. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Unsupervised context sensitive language acquisition from a large corpus. Zach Solan, David Horn, Eytan Ruppin; Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv, Israel 69978; {rsolan,horn,ruppin}@post. [sent-1, score-0.517]

2 Abstract: We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. [sent-5, score-0.404]

3 This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. [sent-6, score-0.301]

4 The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). [sent-7, score-0.342]

5 Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. [sent-9, score-0.52]

6 An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. [sent-10, score-0.045]

7 The results are encouraging: the model attains a level of performance considered to be “intermediate” for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed speech of parents directed to small children. [sent-11, score-0.322]

8 1 Introduction. A central tenet of generative linguistics is that extensive innate knowledge of grammar is essential to explain the acquisition of language from positive-only data [1, 2]. [sent-12, score-0.553]

9 Here, we explore an alternative hypothesis, according to which syntax is an abstraction that emerges from exposure to language [3], coexisting with the corpus data within the same representational mechanism. [sent-13, score-0.576]

10 Far from parsimonious, the representation we introduce allows partial overlap of linguistic patterns or constructions [4]. [sent-14, score-0.346]

11 The incremental process of acquisition of patterns is driven both by structural similarities and by statistical information inherent in the data, so that frequent strings of similar composition come to be represented by the same pattern. [sent-15, score-0.386]

12 The degree of abstraction of a pattern varies: it may be high, as in the case of a frame with several slots, each occupied by a member of an equivalence class associated with it, or low, as in the extreme case of idioms or formulaic language snippets, where there is no abstraction at all [5, 6]. [sent-16, score-0.85]

13 The acquired patterns represent fully the original data, and, crucially, enable structure-sensitive generalization in the production and the assimilation of unseen examples. [sent-17, score-0.277]

14 Previous approaches to the acquisition of linguistic knowledge, such as n-gram Hidden Markov Models (HMMs) that use raw data, aimed not at grammar induction but rather at expressing the probability of a sentence in terms of the conditional probabilities of its constituents. [sent-18, score-0.657]

15 In comparison, statistical grammar induction methods aim to identify the most probable grammar, given a corpus [7, 8]. [sent-19, score-0.45]

16 Grammar induction methods that do attempt unsupervised learning can be categorized into two classes: those that use corpora tagged with part-of-speech information, and those that work with raw, untagged data. [sent-21, score-0.273]

17 The present work extends an earlier study [13] which offered preliminary results demonstrating the feasibility of unsupervised learning of linguistic knowledge from raw data. [sent-23, score-0.247]

18 Here, we describe a new learning model and its implementation and extensive testing on a large corpus of transcribed spoken language from the CHILDES collection [14] (the larger corpora used in many other computational studies do not focus on children-directed language). [sent-24, score-0.532]

19 Our new results suggest that useful patterns embodying syntactic and semantic knowledge of language can indeed be extracted from untagged corpora in an unsupervised manner. [sent-25, score-0.63]

20 2 The ADIOS model. The ADIOS (Automatic DIstillation Of Structure) model has two components: (1) a Representational Data Structure (RDS) graph, and (2) a Pattern Acquisition (PA) algorithm that progressively refines the RDS in an unsupervised fashion. [sent-26, score-0.084]
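To make this concrete, here is a minimal Python sketch of the RDS idea: each sentence becomes a path of word vertices between BEGIN and END markers, and every traversed edge is labeled with the sentence number and within-sentence position, as in Figure 1a. The class and field names are illustrative and are not taken from the ADIOS implementation.

```python
from collections import defaultdict

class RDSGraph:
    """Minimal multigraph: vertices are words, sentences are labeled paths."""
    def __init__(self):
        self.vertices = set()
        # edge (u, v) -> list of (sentence_id, position) labels
        self.edges = defaultdict(list)
        self.paths = {}          # sentence_id -> list of vertices

    def add_sentence(self, sentence_id, words):
        path = ["BEGIN"] + list(words) + ["END"]
        self.vertices.update(path)
        for pos, (u, v) in enumerate(zip(path, path[1:]), start=1):
            self.edges[(u, v)].append((sentence_id, pos))
        self.paths[sentence_id] = path
        return path

g = RDSGraph()
g.add_sentence(101, "where is the cat".split())
g.add_sentence(102, "is that a dog".split())
g.add_sentence(103, "is that a horse".split())
print(len(g.vertices), "vertices,",
      sum(len(labels) for labels in g.edges.values()), "edge traversals")
```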

21 The PA algorithm aims to detect significant patterns (SP): similarly structured sequences of primitives that recur in the corpus. [sent-27, score-0.223]

22 Each SP has an associated equivalence class (EC), which is a set of alternative primitives that may fit into the slot in the SP to construct a given path through the graph (see Figure 1a). [sent-28, score-0.416]

23 The manner whereby the model supports generalization is exemplified in Figure 1c. [sent-29, score-0.064]

24 The algorithm requires neither prior classification of the primitives into syntactic categories, nor even a pre-setting of their scope: it can bootstrap itself from a corpus in which all the words have been broken down into their constituent characters. [sent-30, score-0.297]

25 One of the few free parameters in the earlier version of the model, ADIOS1, was the length L of the typical pattern the system was expected to acquire. [sent-31, score-0.083]

26 Although presetting the value of L sufficed to learn simple artificial grammars, it proved to be problematic for natural language corpora. [sent-32, score-0.202]

27 The ADIOS2 model addresses this issue by first identifying long significant paths (SPATH) in the graph, then analyzing their k-gram statistics to identify short significant patterns (SPs). [sent-35, score-0.35]

28–30 2.1 Step 1: identifying a significant path. For each path_i (a sequence of elements e1 → e2 → … → ek) longer than a given threshold, the algorithm constructs a set P = {p1, …} and assigns it the score S(P) = Σ_j s(path_j), with s(·) defined by eq. [sent-37, score-0.284] [sent-40, score-0.172] [sent-46, score-0.075]

31 This score assesses the likelihood that P captures a significant regularity rather than a random fluctuation in the data. [sent-48, score-0.075]

32 The set with the maximal score in a given pass over the corpus is the SPATH. [sent-49, score-0.253]
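Eqs. (1)–(3) did not survive extraction, so the sketch below only assumes that s(path) is built from k-gram conditional probabilities P(ek | ek−1) estimated from the corpus, and that the candidate set with the maximal total score S(P) is kept as the SPATH. The function names and the exact form of s(·) are assumptions, not the paper's definitions.

```python
import math
from collections import Counter

def kgram_counts(paths):
    """Unigram and bigram counts over all stored paths."""
    uni, bi = Counter(), Counter()
    for path in paths:
        uni.update(path)
        bi.update(zip(path, path[1:]))
    return uni, bi

def path_score(path, uni, bi):
    """Assumed form of s(path): sum of log conditional probabilities
    P(e_k | e_{k-1}); the paper's eqs. (1)-(3) differ in detail."""
    return sum(math.log(bi[(u, v)] / uni[u]) for u, v in zip(path, path[1:]))

def find_spath(candidate_sets, corpus_paths):
    """Return the candidate set P with maximal S(P) = sum_j s(path_j)."""
    uni, bi = kgram_counts(corpus_paths)
    return max(candidate_sets,
               key=lambda P: sum(path_score(p, uni, bi) for p in P))

corpus = [["BEGIN", "is", "that", "a", "dog", "END"],
          ["BEGIN", "is", "that", "a", "cat", "END"]]
print(find_spath([corpus], corpus))   # a single, trivial candidate set here
```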

33–35 [Figure 1(a,b) residue: panels depicting sentences as labeled paths over word vertices (BEGIN, where, is, that, a, the, and, cat, dog, horse, ?, END), with edges numbered by sentence and within-sentence position; the shared span is abstracted into pattern 140: “is that a _____ ?”] [sent-50, score-0.105] [sent-51, score-0.691] [sent-52, score-0.263]

36 Each sentence is depicted by a solid colored line; edge direction is marked by arrows and is labeled by the sentence number and within-sentence index. [sent-58, score-0.17]

37 The sentences in this example join a pattern “is that a {dog, cat, horse} ?” [sent-59, score-0.239]

38 The abstracted pattern and the equivalence class associated with it are highlighted (edges that belong to sequences not subsumed by this pattern, e. [sent-62, score-0.484]

39 (c) The identification of new significant patterns is done using the acquired equivalence classes (e. [sent-65, score-0.498]

40 In this manner, the system “bootstraps” itself, recursively distilling more and more complex patterns. [sent-68, score-0.045]

41 This kind of abstraction also supports generalization: the original three sentences (shaded paths) form a pattern with two equivalence classes, which can then potentially generate six new sentences (e. [sent-69, score-0.737]

42 [residue of eqs. (1)–(3), involving P(path_i) and the conditional probabilities P(ek | ek−1)] The algorithm estimates the probabilities of different paths from the respective k-gram statistics (k being the length of the paths in the set under consideration), as per eq. [sent-81, score-0.272]

43 , ek without taking into account their sequential order along the path. [sent-86, score-0.172]

44 Eq. (3) is a better candidate for identifying significant strings, as opposed to mere sets of nodes, because it takes into account the sequence of nodes along the path. [sent-92, score-0.107]

45 2.2 Step 2: identifying a significant pattern. Once the SPATH set is determined, the algorithm calculates the degree of cohesion c_ij for each of its member sub-paths, according to eq. [sent-94, score-0.213]

46 The sub-path with the highest cohesion score is now tagged as a Significant Pattern. [sent-98, score-0.039]
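Eq. (4) for the cohesion c_ij is not recoverable from this extraction, so the stand-in below uses an average pointwise-mutual-information cohesion; only the control flow (score every contiguous sub-path of the SPATH and tag the best one as the Significant Pattern) follows the text above.

```python
import math
from collections import Counter

def cohesion(subpath, uni, bi, total):
    """Placeholder cohesion: average pointwise mutual information of adjacent
    pairs in the sub-path (NOT the paper's eq. 4). Assumes every adjacent
    pair is attested in the counts, as it is for sub-paths of the SPATH."""
    pmis = []
    for u, v in zip(subpath, subpath[1:]):
        p_uv = bi[(u, v)] / total
        p_u, p_v = uni[u] / total, uni[v] / total
        pmis.append(math.log(p_uv / (p_u * p_v)))
    return sum(pmis) / len(pmis)

def best_subpath(spath_elements, uni, bi, total, min_len=2):
    """Score every contiguous sub-path and return the most cohesive one."""
    best, best_score = None, float("-inf")
    n = len(spath_elements)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            score = cohesion(spath_elements[i:j], uni, bi, total)
            if score > best_score:
                best, best_score = spath_elements[i:j], score
    return best, best_score
```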

47 Our experience shows that the two-stage mechanism just described induces coherent equivalence classes, leading to the formation of meaningful short patterns. [sent-99, score-0.274]

48 The new pattern is added as a new vertex to the RDS graph, replacing the elements and edges it subsumes (Figure 1(b)). [sent-100, score-0.124]

49 Note that only those edges of the multi-graph that belong to the detected pattern are rewired; edges that belong to sequences not subsumed by the pattern are left intact. [sent-101, score-0.363]
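The rewiring step just described can be sketched as follows: occurrences of the detected pattern inside stored paths are collapsed into a single new pattern vertex, while paths that do not contain the pattern keep their original word-level edges. The representation (plain Python lists keyed by sentence id) is illustrative only.

```python
def rewire_paths(paths, pattern):
    """Replace every occurrence of `pattern` (a sequence of vertices) inside a
    stored path by one new pattern vertex; other paths are left intact."""
    pattern = tuple(pattern)
    pattern_vertex = "P[" + " ".join(pattern) + "]"
    new_paths = {}
    for sid, path in paths.items():
        rewired, i = [], 0
        while i < len(path):
            if tuple(path[i:i + len(pattern)]) == pattern:
                rewired.append(pattern_vertex)   # subsumed span -> one vertex
                i += len(pattern)
            else:
                rewired.append(path[i])
                i += 1
        new_paths[sid] = rewired
    return pattern_vertex, new_paths

paths = {101: ["BEGIN", "is", "that", "a", "dog", "?", "END"],
         102: ["BEGIN", "where", "is", "the", "cat", "?", "END"]}
vertex, paths = rewire_paths(paths, ["is", "that", "a"])
print(paths[101])   # ['BEGIN', 'P[is that a]', 'dog', '?', 'END']
print(paths[102])   # unchanged: not subsumed by the pattern
```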

50 This highly context-sensitive method of pattern abstraction, which is unique to our approach, allows ADIOS to achieve a high degree of representational parsimony without sacrificing generalization power. [sent-102, score-0.186]

51 [residue of eq. (4), involving p(ek)] During the pass over the corpus, the list of equivalence sets is updated continuously; new significant patterns are found using the current equivalence classes. [sent-145, score-0.726]

52 For each set of candidate paths, the algorithm tries to fit one or more equivalence classes from the pool it maintains. [sent-146, score-0.355]

53 Because an element can appear in several classes, the algorithm must check different combinations of equivalence classes. [sent-147, score-0.274]

54 When not all the members appear in an existing set, the algorithm creates a new equivalence class containing only those members that do. [sent-149, score-0.535]
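The bookkeeping described in the last few sentences can be sketched as below; this is one reading of the rule, with hypothetical names: reuse an existing class that covers all the words observed in a candidate slot, otherwise create a new class from the overlapping members (falling back to the observed slot members themselves).

```python
def fit_equivalence_class(slot_members, class_pool):
    """Reuse an existing class containing every observed slot member;
    otherwise create a new class from the members that overlap an existing
    class (an interpretation of the text, not the exact ADIOS rule)."""
    slot = set(slot_members)
    covering = [ec for ec in class_pool if slot <= ec]
    if covering:
        return min(covering, key=len)              # tightest existing class
    best = max(class_pool, key=lambda ec: len(slot & ec), default=set())
    new_class = (slot & best) or slot              # fall back to the slot itself
    class_pool.append(new_class)
    return new_class

pool = [{"dog", "cat", "horse"}, {"he", "she"}]
print(fit_equivalence_class({"dog", "cat"}, pool))   # reuses {dog, cat, horse}
print(fit_equivalence_class({"dog", "cow"}, pool))   # creates a new class
```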

55 Thus, as the algorithm processes more and more text, it bootstraps itself and enriches the RDS graph structure with new SPs and their accompanying equivalence sets. [sent-150, score-0.357]

56 The relationships among the distilled patterns can be visualized in a tree format, with tree depth corresponding to the level of recursion (e. [sent-152, score-0.306]

57 The reciprocal of their ratio, η, is the generalization factor, which can be calculated for each pattern in the RDS graph (e. [sent-157, score-0.197]

58 Patterns whose significance score S and generalization factor η are beneath certain thresholds are rejected. [sent-161, score-0.139]

59 The algorithm halts if it processes a given amount of text without finding a new significant pattern or equivalence set (in real language acquisition this process may never stop). [sent-162, score-0.696]
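The acceptance thresholds on S and η and the halting condition described above amount to a simple driver loop; the skeleton below is a toy in which extract_candidate stands in for Steps 1–2 and the threshold values are placeholders.

```python
def run_pa(corpus_paths, extract_candidate, s_threshold, eta_threshold,
           patience=10_000):
    """Toy driver loop: accept a candidate pattern only if both its
    significance score S and generalization factor eta clear their
    thresholds; halt after `patience` sentences yield nothing new."""
    patterns, since_last_find = [], 0
    for path in corpus_paths:
        candidate = extract_candidate(path)     # user-supplied step-1/step-2 logic
        if candidate is not None:
            S, eta = candidate["S"], candidate["eta"]
            if S >= s_threshold and eta >= eta_threshold:
                patterns.append(candidate)
                since_last_find = 0
                continue
        since_last_find += 1
        if since_last_find >= patience:
            break     # processed enough text without a new SP or equivalence set
    return patterns
```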

60 3 The test module. A collection of patterns distilled from a corpus can be seen as a kind of empirically determined construction grammar; cf. [sent-164, score-0.54]

61 The patterns can eventually become highly abstract, thus endowing the model with an ability to generalize to unseen inputs. [sent-167, score-0.213]

62 Hundreds of such patterns and equivalence classes (underscored) together constitute a concise representation of the raw data. [sent-173, score-0.561]

63 Some of the phrases that can be described/generated by patterns #16555 and #16543 are: let’s change her. [sent-174, score-0.178]

64 None of these sentences appear in the training data, illustrating the ability of ADIOS to generalize. [sent-184, score-0.156]

65 The numbers in parentheses denote the generalization factor η of the patterns and their components (e. [sent-185, score-0.242]

66 , pattern #16555 generates 86% new strings, while pattern #16543 generates 75% new strings). [sent-187, score-0.166]

67 For each non-terminal, the children are scanned from left to right; for each equivalence class (underscored), one member is chosen. [sent-189, score-0.463]
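That generation procedure is easy to state as code: walk the pattern tree from left to right and, at each equivalence-class node, choose one member. The tuple-based node encoding below is purely illustrative.

```python
import random

def generate(node):
    """Produce one string from a pattern tree: scan children left to right,
    picking one member from each equivalence class ('EC' nodes)."""
    kind = node[0]
    if kind == "WORD":
        return [node[1]]
    if kind == "EC":                       # equivalence class: choose one member
        return generate(random.choice(node[1]))
    if kind == "PATTERN":                  # significant pattern: concatenate children
        words = []
        for child in node[1]:
            words.extend(generate(child))
        return words
    raise ValueError(kind)

pattern_140 = ("PATTERN", [("WORD", "is"), ("WORD", "that"), ("WORD", "a"),
                           ("EC", [("WORD", "dog"), ("WORD", "cat"),
                                   ("WORD", "horse")]),
                           ("WORD", "?")])
print(" ".join(generate(pattern_140)))     # e.g. "is that a horse ?"
```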

68 In production, generalization is possible, for example, when two equivalence classes are placed next to each other in a pattern, creating new paths among the members of the equivalence classes. [sent-191, score-0.901]

69 In comprehension, generalization can also ensue from partial activation of existing patterns by novel inputs. [sent-192, score-0.309]

70 This function is supported by the test module, designed to process a novel sentence by forming its distributed representation in terms of activities of existing patterns (a similar approach has been proposed for novel object and scene representation in vision [15]). [sent-193, score-0.339]

71 These values, which can be used to support grammaticality judgment, are computed by propagating activation from bottom (the terminals) to top (the patterns) of the RDS. [sent-194, score-0.119]

72 The initial activities aj of the terminals ej are calculated given the novel stimulus s1, …, sk. [sent-195, score-0.176]

73 The activity aj is obtained from terms of the form P(sl, ej) log [ P(sl, ej) / (P(sl) P(ej)) ], aggregated over l = 1, …, k (eq. 5), where P(sl, ej) is the joint probability of sl and ej appearing in the same equivalence class, and P(sl) and P(ej) are the probabilities of sl and ej appearing in any equivalence class. [sent-200, score-1.308]

74 For an equivalence class, the value propagated upwards is the strongest non-zero activation of its members; for a pattern, it is the average weight of the children nodes, on the condition that all the children were activated by adjacent inputs. [sent-201, score-0.465]

75 Activity propagation continues until it reaches the top nodes of the pattern lattice. [sent-202, score-0.119]
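The bottom-up propagation rule described above can be sketched as follows, with one simplification: "activated by adjacent inputs" is reduced to a boolean coverage flag. The node encoding matches the generation sketch earlier and is not the actual ADIOS data structure.

```python
def propagate(node, terminal_activity):
    """Return (activation, covered) for a node.
    Equivalence class: strongest non-zero member activation.
    Pattern: average of child activations, but only if every child was
    activated (a stand-in for 'activated by adjacent inputs')."""
    kind = node[0]
    if kind == "WORD":
        a = terminal_activity.get(node[1], 0.0)
        return a, a > 0.0
    if kind == "EC":
        acts = [propagate(m, terminal_activity)[0] for m in node[1]]
        best = max(acts) if acts else 0.0
        return best, best > 0.0
    if kind == "PATTERN":
        results = [propagate(c, terminal_activity) for c in node[1]]
        if results and all(ok for _, ok in results):
            return sum(a for a, _ in results) / len(results), True
        return 0.0, False
    raise ValueError(kind)

node = ("PATTERN", [("WORD", "is"), ("WORD", "that"), ("WORD", "a"),
                    ("EC", [("WORD", "dog"), ("WORD", "cat")]), ("WORD", "?")])
print(propagate(node, {"is": 1.0, "that": 1.0, "a": 1.0, "dog": 0.8, "?": 1.0}))
```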

76 When the algorithm encounters a novel word, all the members of the terminal equivalence class contribute a value of = 0. [sent-203, score-0.466]

77 This enables the model to make an educated guess as to the meaning of the unfamiliar word, by considering the patterns that become active. [sent-205, score-0.178]

78 3.1 Working with real data: the CHILDES’ parents. To illustrate the scalability of our method, we describe here briefly the outcome of applying the PA algorithm to a subset of the CHILDES collection [14], which consists of transcribed speech produced by, or directed at, children. [sent-207, score-0.181]

79 Working at a rate of 250 patterns per day, the algorithm identified 3400 patterns and 3200 equivalence classes, representing the corpus in terms of these elements. [sent-211, score-0.808]

80 The outcome (for some examples, see Figure 2) was encouraging: the algorithm found intuitively significant SPs and produced semantically adequate corresponding equivalence sets. [sent-212, score-0.274]

81 3.2 Novel inputs. We have assessed the ability of the ADIOS model to deal with novel inputs by training it on the CHILDES collection and then subjecting it to a grammaticality judgment test, in the form of multiple-choice questions used in English as a Second Language (ESL) classes. [sent-222, score-0.271]

82 The test consists of 100 three-choice questions; a score lower than 50% is considered pre-intermediate, 50%−70% intermediate, and a score greater than 70% – advanced, with 65% being the average score for the population mentioned. [sent-227, score-0.225]

83 For each of the three choices in a given question, our algorithm provided a grammaticality score. [sent-228, score-0.09]

84 The choice with the highest score was declared as the winner; if two choices received the same top score, the answer was “don’t know”. [sent-229, score-0.075]
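The decision rule just described is a few lines of code; grammaticality_score stands in for the activation-based score discussed earlier.

```python
def answer_question(choices, grammaticality_score):
    """Pick the choice with the highest grammaticality score;
    if the top score is tied, answer "don't know"."""
    scores = [grammaticality_score(c) for c in choices]
    top = max(scores)
    if scores.count(top) > 1:
        return "don't know"
    return choices[scores.index(top)]

print(answer_question(["is that a dog ?", "is that a goes ?", "is dog a that ?"],
                      lambda s: {"is that a dog ?": 0.9}.get(s, 0.2)))
```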

85 The algorithm’s performance in this test at different stages of learning is plotted in Figure 3 versus the number of corpus sentences that have been processed. [sent-230, score-0.334]

86 Over the course of training, the proportion of questions that received a definite answer grew (solid curve), while the proportion of correct answers remained around 60% (dashed curve). [sent-231, score-0.056]

87 The best results were achieved with the ensemble of patterns distilled from two separate runs (two different generalization factors were applied in each run: 0. [sent-232, score-0.31]

88 As a benchmark, we compared the performance of ADIOS in this test with that of a word bi-gram model. [sent-235, score-0.04]

89 The latter was tested using the same procedure as ADIOS, except that significant patterns in the bi-gram model were defined as all the word pairs in the corpus (we emphasize that there is no training phase in the bi-gram model, as all the “patterns” are already available in the raw data). [sent-236, score-0.459]
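As noted above, the bi-gram baseline needs no training beyond counting word pairs; since the extracted text does not give its exact scoring rule, the fraction-of-attested-pairs score below is an assumption.

```python
from collections import Counter

def bigram_model(corpus_sentences):
    """The baseline's 'patterns' are simply all word pairs in the corpus."""
    pairs = Counter()
    for sentence in corpus_sentences:
        words = sentence.split()
        pairs.update(zip(words, words[1:]))
    return pairs

def bigram_score(sentence, pairs):
    """Assumed scoring: fraction of adjacent word pairs attested in the corpus."""
    words = sentence.split()
    adjacent = list(zip(words, words[1:]))
    return sum(1 for p in adjacent if pairs[p] > 0) / max(len(adjacent), 1)

pairs = bigram_model(["is that a dog", "where is the cat"])
print(bigram_score("is that a cat", pairs))   # 2 of 3 pairs attested -> 0.66...
```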

90 ADIOS outperformed the bi-gram model by answering 60% of the questions with 60% hits, compared to 20% of the questions with only 45% hits for the latter (note that chance performance in this test is 33%). [sent-237, score-0.112]

91 Figure 3: The performance of ADIOS2 in an ESL test based on grammaticality judgment, plotted against the number of sentences (paths) scanned during training. [sent-238, score-0.277]

92 The solid curve represents the percentage of questions with a valid answer; the dashed curve shows the percentage of correct answers. [sent-239, score-0.056]

93 4 Concluding remarks. The ADIOS model incrementally learns the (morpho)syntax of English from “raw” input by distilling structural regularities (which can be thought of as constructions [16, 4]) from the accrued statistical co-occurrence and contextual cues. [sent-240, score-0.148]

94 On the one hand, this results in larger – but not unmanageable – demands on memory (more patterns need to be stored); on the other hand, crucially, it leads to efficient unsupervised probabilistic learning, and subsequent judicious use, of linguistic knowledge. [sent-245, score-0.362]

95 The ultimate goal of this project is to address the entire spectrum of English syntax-related phenomena (and, eventually, semantics, which, as the construction grammarians hold, is intimately connected to syntax [16, 4]). [sent-246, score-0.123]

96 With respect to some of these, the ADIOS model is already known to behave reasonably: for example, subject-verb agreement (even longrange) is captured properly, due to the conservative structured pattern abstraction. [sent-247, score-0.113]

97 In particular, the treatment of many aspects of syntax such as anaphora, auxiliaries, wh-questions, passive, control, etc. [sent-249, score-0.089]

98 The estimation of stochastic context-free grammars using the Inside-Outside algorithm. [sent-287, score-0.039]

99 Comparing two unsupervised grammar induction systems: Alignment-based learning vs. [sent-310, score-0.356]

100 Learning syntax and meanings through optimization and distributional analysis. [sent-325, score-0.089]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('adios', 0.338), ('equivalence', 0.274), ('pathi', 0.248), ('grammar', 0.214), ('dog', 0.203), ('language', 0.202), ('childes', 0.18), ('horse', 0.18), ('patterns', 0.178), ('corpus', 0.178), ('ek', 0.172), ('sentences', 0.156), ('acquisition', 0.137), ('paths', 0.136), ('rds', 0.135), ('sl', 0.125), ('gon', 0.113), ('members', 0.107), ('cat', 0.105), ('ej', 0.102), ('linguistic', 0.1), ('grammaticality', 0.09), ('syntax', 0.089), ('sentence', 0.085), ('unsupervised', 0.084), ('pattern', 0.083), ('english', 0.083), ('score', 0.075), ('syntactic', 0.074), ('strings', 0.071), ('constructions', 0.068), ('abstraction', 0.068), ('distilled', 0.068), ('esl', 0.068), ('spath', 0.068), ('sps', 0.068), ('transcribed', 0.068), ('cant', 0.065), ('generalization', 0.064), ('member', 0.063), ('raw', 0.063), ('induction', 0.058), ('questions', 0.056), ('na', 0.053), ('go', 0.05), ('judgment', 0.05), ('graph', 0.05), ('children', 0.048), ('class', 0.047), ('corpora', 0.047), ('classes', 0.046), ('distilling', 0.045), ('eat', 0.045), ('formulaic', 0.045), ('ruppin', 0.045), ('schab', 0.045), ('solan', 0.045), ('subsumed', 0.045), ('underscored', 0.045), ('untagged', 0.045), ('module', 0.045), ('primitives', 0.045), ('edges', 0.041), ('word', 0.04), ('representational', 0.039), ('parents', 0.039), ('sp', 0.039), ('signi', 0.039), ('tagged', 0.039), ('grammars', 0.039), ('doe', 0.039), ('aviv', 0.039), ('novel', 0.038), ('collection', 0.037), ('speech', 0.037), ('nodes', 0.036), ('identifying', 0.036), ('ing', 0.036), ('terminals', 0.036), ('ta', 0.036), ('upwards', 0.036), ('tel', 0.036), ('erlbaum', 0.036), ('thought', 0.035), ('candidate', 0.035), ('unseen', 0.035), ('belong', 0.035), ('construction', 0.034), ('winner', 0.033), ('bootstraps', 0.033), ('chicago', 0.033), ('scanned', 0.031), ('cij', 0.031), ('stanford', 0.031), ('tree', 0.03), ('propagated', 0.03), ('conservative', 0.03), ('students', 0.03), ('activation', 0.029), ('wa', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000011 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus

Author: Zach Solan, David Horn, Eytan Ruppin, Shimon Edelman

Abstract: We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. The results are encouraging: the model attains a level of performance considered to be “intermediate” for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed speech of parents directed to small children. 1

2 0.14516844 8 nips-2003-A Holistic Approach to Compositional Semantics: a connectionist model and robot experiments

Author: Yuuya Sugita, Jun Tani

Abstract: We present a novel connectionist model for acquiring the semantics of a simple language through the behavioral experiences of a real robot. We focus on the “compositionality” of semantics, a fundamental characteristic of human language, which is the ability to understand the meaning of a sentence as a combination of the meanings of words. We also pay much attention to the “embodiment” of a robot, which means that the robot should acquire semantics which matches its body, or sensory-motor system. The essential claim is that an embodied compositional semantic representation can be self-organized from generalized correspondences between sentences and behavioral patterns. This claim is examined and confirmed through simple experiments in which a robot generates corresponding behaviors from unlearned sentences by analogy with the correspondences between learned sentences and behaviors. 1

3 0.10483308 47 nips-2003-Computing Gaussian Mixture Models with EM Using Equivalence Constraints

Author: Noam Shental, Aharon Bar-hillel, Tomer Hertz, Daphna Weinshall

Abstract: Density estimation with Gaussian Mixture Models is a popular generative technique used also for clustering. We develop a framework to incorporate side information in the form of equivalence constraints into the model estimation procedure. Equivalence constraints are defined on pairs of data points, indicating whether the points arise from the same source (positive constraints) or from different sources (negative constraints). Such constraints can be gathered automatically in some learning problems, and are a natural form of supervision in others. For the estimation of model parameters we present a closed form EM procedure which handles positive constraints, and a Generalized EM procedure using a Markov net which handles negative constraints. Using publicly available data sets we demonstrate that such side information can lead to considerable improvement in clustering tasks, and that our algorithm is preferable to two other suggested methods using the same type of side information.

4 0.066924244 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition

Author: Xavier Carreras, Lluís Màrquez

Abstract: This work presents an architecture based on perceptrons to recognize phrase structures, and an online learning algorithm to train the perceptrons together and dependently. The recognition strategy applies learning in two layers: a filtering layer, which reduces the search space by identifying plausible phrase candidates, and a ranking layer, which recursively builds the optimal phrase structure. We provide a recognition-based feedback rule which reflects to each local function its committed errors from a global point of view, and allows to train them together online as perceptrons. Experimentation on a syntactic parsing problem, the recognition of clause hierarchies, improves state-of-the-art results and evinces the advantages of our global training method over optimizing each function locally and independently. 1

5 0.064520389 99 nips-2003-Kernels for Structured Natural Language Data

Author: Jun Suzuki, Yutaka Sasaki, Eisaku Maeda

Abstract: This paper devises a novel kernel function for structured natural language data. In the field of Natural Language Processing, feature extraction consists of the following two steps: (1) syntactically and semantically analyzing raw data, i.e., character strings, then representing the results as discrete structures, such as parse trees and dependency graphs with part-of-speech tags; (2) creating (possibly high-dimensional) numerical feature vectors from the discrete structures. The new kernels, called Hierarchical Directed Acyclic Graph (HDAG) kernels, directly accept DAGs whose nodes can contain DAGs. HDAG data structures are needed to fully reflect the syntactic and semantic structures that natural language data inherently have. In this paper, we define the kernel function and show how it permits efficient calculation. Experiments demonstrate that the proposed kernels are superior to existing kernel functions, e.g., sequence kernels, tree kernels, and bag-of-words kernels. 1

6 0.064014047 188 nips-2003-Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects

7 0.063587591 156 nips-2003-Phonetic Speaker Recognition with Support Vector Machines

8 0.061603956 121 nips-2003-Log-Linear Models for Label Ranking

9 0.057987116 168 nips-2003-Salient Boundary Detection using Ratio Contour

10 0.05741943 83 nips-2003-Hierarchical Topic Models and the Nested Chinese Restaurant Process

11 0.051886171 185 nips-2003-The Doubly Balanced Network of Spiking Neurons: A Memory Model with High Capacity

12 0.050783195 75 nips-2003-From Algorithmic to Subjective Randomness

13 0.048720252 112 nips-2003-Learning to Find Pre-Images

14 0.046987724 118 nips-2003-Link Prediction in Relational Data

15 0.046605442 25 nips-2003-An MCMC-Based Method of Comparing Connectionist Models in Cognitive Science

16 0.044271585 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms

17 0.044271573 90 nips-2003-Increase Information Transfer Rates in BCI by CSP Extension to Multi-class

18 0.040358573 145 nips-2003-Online Classification on a Budget

19 0.04008228 5 nips-2003-A Classification-based Cocktail-party Processor

20 0.038952693 150 nips-2003-Out-of-Sample Extensions for LLE, Isomap, MDS, Eigenmaps, and Spectral Clustering


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.151), (1, -0.023), (2, 0.017), (3, 0.014), (4, -0.045), (5, -0.047), (6, 0.015), (7, -0.012), (8, 0.015), (9, 0.045), (10, 0.137), (11, -0.036), (12, -0.008), (13, -0.043), (14, 0.121), (15, -0.01), (16, -0.085), (17, -0.063), (18, 0.062), (19, -0.016), (20, 0.031), (21, -0.042), (22, -0.164), (23, -0.018), (24, -0.114), (25, -0.038), (26, -0.018), (27, -0.106), (28, -0.094), (29, 0.122), (30, 0.211), (31, -0.02), (32, 0.032), (33, 0.061), (34, -0.034), (35, -0.2), (36, -0.048), (37, 0.016), (38, -0.013), (39, -0.165), (40, 0.211), (41, 0.051), (42, -0.081), (43, 0.144), (44, -0.073), (45, -0.077), (46, -0.026), (47, 0.066), (48, 0.015), (49, -0.121)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96119839 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus

Author: Zach Solan, David Horn, Eytan Ruppin, Shimon Edelman

Abstract: We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. The results are encouraging: the model attains a level of performance considered to be “intermediate” for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed speech of parents directed to small children. 1

2 0.83036619 8 nips-2003-A Holistic Approach to Compositional Semantics: a connectionist model and robot experiments

Author: Yuuya Sugita, Jun Tani

Abstract: We present a novel connectionist model for acquiring the semantics of a simple language through the behavioral experiences of a real robot. We focus on the “compositionality” of semantics, a fundamental characteristic of human language, which is the ability to understand the meaning of a sentence as a combination of the meanings of words. We also pay much attention to the “embodiment” of a robot, which means that the robot should acquire semantics which matches its body, or sensory-motor system. The essential claim is that an embodied compositional semantic representation can be self-organized from generalized correspondences between sentences and behavioral patterns. This claim is examined and confirmed through simple experiments in which a robot generates corresponding behaviors from unlearned sentences by analogy with the correspondences between learned sentences and behaviors. 1

3 0.64490348 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition

Author: Xavier Carreras, Lluís Màrquez

Abstract: This work presents an architecture based on perceptrons to recognize phrase structures, and an online learning algorithm to train the perceptrons together and dependently. The recognition strategy applies learning in two layers: a filtering layer, which reduces the search space by identifying plausible phrase candidates, and a ranking layer, which recursively builds the optimal phrase structure. We provide a recognition-based feedback rule which reflects to each local function its committed errors from a global point of view, and allows to train them together online as perceptrons. Experimentation on a syntactic parsing problem, the recognition of clause hierarchies, improves state-of-the-art results and evinces the advantages of our global training method over optimizing each function locally and independently. 1

4 0.53788215 75 nips-2003-From Algorithmic to Subjective Randomness

Author: Thomas L. Griffiths, Joshua B. Tenenbaum

Abstract: We explore the phenomena of subjective randomness as a case study in understanding how people discover structure embedded in noise. We present a rational account of randomness perception based on the statistical problem of model selection: given a stimulus, inferring whether the process that generated it was random or regular. Inspired by the mathematical definition of randomness given by Kolmogorov complexity, we characterize regularity in terms of a hierarchy of automata that augment a finite controller with different forms of memory. We find that the regularities detected in binary sequences depend upon presentation format, and that the kinds of automata that can identify these regularities are informative about the cognitive processes engaged by different formats. 1

5 0.4905991 25 nips-2003-An MCMC-Based Method of Comparing Connectionist Models in Cognitive Science

Author: Woojae Kim, Daniel J. Navarro, Mark A. Pitt, In J. Myung

Abstract: Despite the popularity of connectionist models in cognitive science, their performance can often be difficult to evaluate. Inspired by the geometric approach to statistical model selection, we introduce a conceptually similar method to examine the global behavior of a connectionist model, by counting the number and types of response patterns it can simulate. The Markov Chain Monte Carlo-based algorithm that we constructed Þnds these patterns efficiently. We demonstrate the approach using two localist network models of speech perception. 1

6 0.41477549 99 nips-2003-Kernels for Structured Natural Language Data

7 0.31914556 188 nips-2003-Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects

8 0.31814504 30 nips-2003-Approximability of Probability Distributions

9 0.31452212 168 nips-2003-Salient Boundary Detection using Ratio Contour

10 0.30585882 83 nips-2003-Hierarchical Topic Models and the Nested Chinese Restaurant Process

11 0.29719457 185 nips-2003-The Doubly Balanced Network of Spiking Neurons: A Memory Model with High Capacity

12 0.28155261 47 nips-2003-Computing Gaussian Mixture Models with EM Using Equivalence Constraints

13 0.26613703 156 nips-2003-Phonetic Speaker Recognition with Support Vector Machines

14 0.26607981 24 nips-2003-An Iterative Improvement Procedure for Hierarchical Clustering

15 0.25924811 132 nips-2003-Multiple Instance Learning via Disjunctive Programming Boosting

16 0.25048584 118 nips-2003-Link Prediction in Relational Data

17 0.24270821 175 nips-2003-Sensory Modality Segregation

18 0.24171819 165 nips-2003-Reasoning about Time and Knowledge in Neural Symbolic Learning Systems

19 0.23937827 90 nips-2003-Increase Information Transfer Rates in BCI by CSP Extension to Multi-class

20 0.23793375 43 nips-2003-Bounded Invariance and the Formation of Place Fields


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.039), (11, 0.026), (30, 0.021), (35, 0.035), (53, 0.095), (55, 0.037), (66, 0.096), (71, 0.085), (76, 0.04), (81, 0.011), (82, 0.012), (85, 0.064), (91, 0.08), (99, 0.248)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90510118 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices

Author: Arnab Nilim, Laurent El Ghaoui

Abstract: Optimal solutions to Markov Decision Problems (MDPs) are very sensitive with respect to the state transition probabilities. In many practical problems, the estimation of those probabilities is far from accurate. Hence, estimation errors are limiting factors in applying MDPs to realworld problems. We propose an algorithm for solving finite-state and finite-action MDPs, where the solution is guaranteed to be robust with respect to estimation errors on the state transition probabilities. Our algorithm involves a statistically accurate yet numerically efficient representation of uncertainty, via Kullback-Leibler divergence bounds. The worst-case complexity of the robust algorithm is the same as the original Bellman recursion. Hence, robustness can be added at practically no extra computing cost.

2 0.8812499 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles

Author: Michael I. Jordan, Martin J. Wainwright

Abstract: We present a new method for calculating approximate marginals for probability distributions defined by graphs with cycles, based on a Gaussian entropy bound combined with a semidefinite outer bound on the marginal polytope. This combination leads to a log-determinant maximization problem that can be solved by efficient interior point methods [8]. As with the Bethe approximation and its generalizations [12], the optimizing arguments of this problem can be taken as approximations to the exact marginals. In contrast to Bethe/Kikuchi approaches, our variational problem is strictly convex and so has a unique global optimum. An additional desirable feature is that the value of the optimal solution is guaranteed to provide an upper bound on the log partition function. In experimental trials, the performance of the log-determinant relaxation is comparable to or better than the sum-product algorithm, and by a substantial margin for certain problem classes. Finally, the zero-temperature limit of our log-determinant relaxation recovers a class of well-known semidefinite relaxations for integer programming [e.g., 3]. 1

same-paper 3 0.86706698 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus

Author: Zach Solan, David Horn, Eytan Ruppin, Shimon Edelman

Abstract: We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. The results are encouraging: the model attains a level of performance considered to be “intermediate” for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed speech of parents directed to small children. 1

4 0.69649851 158 nips-2003-Policy Search by Dynamic Programming

Author: J. A. Bagnell, Sham M. Kakade, Jeff G. Schneider, Andrew Y. Ng

Abstract: We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem. 1

5 0.67332435 195 nips-2003-When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?

Author: David Donoho, Victoria Stodden

Abstract: We interpret non-negative matrix factorization geometrically, as the problem of finding a simplicial cone which contains a cloud of data points and which is contained in the positive orthant. We show that under certain conditions, basically requiring that some of the data are spread across the faces of the positive orthant, there is a unique such simplicial cone. We give examples of synthetic image articulation databases which obey these conditions; these require separated support and factorial sampling. For such databases there is a generative model in terms of ‘parts’ and NMF correctly identifies the ‘parts’. We show that our theoretical results are predictive of the performance of published NMF code, by running the published algorithms on one of our synthetic image articulation databases. 1

6 0.6517328 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias

7 0.64950323 98 nips-2003-Kernel Dimensionality Reduction for Supervised Learning

8 0.63952208 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation

9 0.63255453 188 nips-2003-Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects

10 0.63046485 78 nips-2003-Gaussian Processes in Reinforcement Learning

11 0.62609708 103 nips-2003-Learning Bounds for a Generalized Family of Bayesian Posterior Distributions

12 0.62606657 121 nips-2003-Log-Linear Models for Label Ranking

13 0.62384808 189 nips-2003-Tree-structured Approximations by Expectation Propagation

14 0.62345856 30 nips-2003-Approximability of Probability Distributions

15 0.62110466 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes

16 0.61058629 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates

17 0.60881114 107 nips-2003-Learning Spectral Clustering

18 0.60746145 48 nips-2003-Convex Methods for Transduction

19 0.60669929 63 nips-2003-Error Bounds for Transductive Learning via Compression and Clustering

20 0.60139704 151 nips-2003-PAC-Bayesian Generic Chaining