nips nips2003 nips2003-147 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xavier Carreras, Lluís Màrquez
Abstract: This work presents an architecture based on perceptrons to recognize phrase structures, and an online learning algorithm to train the perceptrons together and dependently. The recognition strategy applies learning in two layers: a filtering layer, which reduces the search space by identifying plausible phrase candidates, and a ranking layer, which recursively builds the optimal phrase structure. We provide a recognition-based feedback rule which conveys to each local function the errors it has committed, seen from a global point of view, and allows the functions to be trained together online as perceptrons. Experimentation on a syntactic parsing problem, the recognition of clause hierarchies, improves state-of-the-art results and evinces the advantages of our global training method over optimizing each function locally and independently.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This work presents an architecture based on perceptrons to recognize phrase structures, and an online learning algorithm to train the perceptrons together and dependently. [sent-3, score-0.919]
2 The recognition strategy applies learning in two layers: a filtering layer, which reduces the search space by identifying plausible phrase candidates, and a ranking layer, which recursively builds the optimal phrase structure. [sent-4, score-1.453]
3 We provide a recognition-based feedback rule which conveys to each local function the errors it has committed, seen from a global point of view, and allows the functions to be trained together online as perceptrons. [sent-5, score-0.242]
4 Experimentation on a syntactic parsing problem, the recognition of clause hierarchies, improves state-of-the-art results and evinces the advantages of our global training method over optimizing each function locally and independently. [sent-6, score-0.364]
5 1 Introduction Over the past few years, many machine learning methods have been successfully applied to Natural Language tasks in which phrases of some type have to be recognized. [sent-7, score-0.621]
6 Generally, given an input sentence —as a sequence of words— the task is to predict a bracketing for the sentence representing a structure of phrases, either sequential or hierarchical. [sent-8, score-0.334]
7 For instance, syntactic analysis of Natural Language provides several problems of this type, such as partial parsing tasks [1, 2], or even full parsing [3]. [sent-9, score-0.161]
8 The general approach consists of decomposing the global phrase recognition problem into a number of local learnable subproblems, and inferring the global solution from the outcomes of the local subproblems. [sent-10, score-0.923]
9 For chunking problems —in which phrases are sequentially structured— the approach is typically to perform a tagging. [sent-11, score-0.619]
10 In this case, local subproblems include learning whether a word opens, closes, or is inside a phrase of some type (noun phrase, verb phrase, etc.). [sent-12, score-0.898]
11 When hierarchical structure has to be recognized, additional local decisions are required to determine the embedding of phrases, resulting in a more complex inference process which recursively builds the global solution [3, 2, 6, 7]. [sent-16, score-0.157]
12 However, when performing the phrase recognition task, the classifiers are used together and dependently, in the sense that one classifier's predictions may affect the predictions of another. [sent-20, score-0.739]
13 Indeed, the global performance of a system is measured in terms of precision and recall of the recognized phrases, which, although related, is not the local classification accuracy measure for which the local classifiers are usually trained. [sent-21, score-0.223]
14 The general idea consists of moving the learning strategy from the binary classification setting to a general ranking context into which the global problem can be cast. [sent-23, score-0.204]
15 Crammer and Singer [8] present a label-ranking algorithm, in which several perceptrons receive feedback from the ranking they produce over a training instance. [sent-24, score-0.182]
16 For structured outputs, and motivating this work, Collins [10] introduces a variant of the perceptron for tagging tasks, in which the learning feedback is globally given from the output of the Viterbi decoding algorithm. [sent-27, score-0.153]
17 In this paper we present a global learning strategy for the general task of recognizing phrases in a sentence. [sent-28, score-0.734]
18 We adopt the general phrase recognition strategy of our previous work [6]. [sent-29, score-0.741]
19 Given a sentence, learning is first applied at word level to identify phrase candidates of the solution. [sent-30, score-0.895]
20 Then, learning is applied at a higher-order level in which phrase candidates are scored to discriminate among competing ones. [sent-31, score-0.778]
21 The overall strategy infers the global solution by using the learning components to explore a number of plausible solutions. [sent-32, score-0.152]
22 When visiting a sentence, the perceptrons are first used to recognize the set of phrases, and then updated according to the correctness of the global solution. [sent-35, score-0.169]
23 Furthermore, following [11], the final model incorporates voted prediction methods for the perceptrons and the use of kernel functions. [sent-37, score-0.153]
24 1 Phrase Recognition Formalization Let x be a sentence formed by n words xi , with i ranging from 0 to n − 1, belonging to the sentence space X . [sent-40, score-0.409]
25 For instance, in syntactic parsing K may include noun phrases, verb phrases, prepositional phrases and clauses, among others. [sent-42, score-0.693]
26 A phrase, denoted as (s, e)k , is the sequence of consecutive words spanning from word xs to word xe , having s ≤ e, with category k ∈ K. [sent-43, score-0.46]
27 A solution for a phrase recognition problem is a set y of phrases which is coherent with respect to some constraints. [sent-48, score-1.326]
28 For the problem of recognizing sequentially organized phrases, often referred to as chunking, phrases are not allowed to overlap or embed. [sent-50, score-0.604]
29 More generally, for the problem of recognizing phrases organized hierarchically, a solution is a set of phrases which do not overlap but may be embedded. [sent-52, score-1.176]
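To make the coherence constraint concrete, here is a minimal Python sketch (our own illustration, not code from the paper): phrases are represented as (s, e, k) triples, and a set is coherent if no two phrases partially overlap; when embedding is not allowed, nesting is rejected as well.

```python
# Coherence check for a set of phrases, each an (s, e, k) triple with s <= e.
# Illustrative helper only; names and representation are our assumptions.

def overlaps_partially(a, b):
    """True if a and b overlap without one being nested inside the other."""
    (s1, e1, _), (s2, e2, _) = a, b
    disjoint = e1 < s2 or e2 < s1
    nested = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
    return not disjoint and not nested

def is_coherent(phrases, allow_embedding=True):
    """No two phrases may partially overlap; if allow_embedding is False
    (sequential chunking), any overlap at all, including nesting, is rejected."""
    ps = list(phrases)
    for i in range(len(ps)):
        for j in range(i + 1, len(ps)):
            a, b = ps[i], ps[j]
            if overlaps_partially(a, b):
                return False
            if not allow_embedding:
                (s1, e1, _), (s2, e2, _) = a, b
                if not (e1 < s2 or e2 < s1):
                    return False
    return True
```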
30 In order to evaluate a phrase recognition system we use the standard measures for recognition tasks: precision (p) —the ratio of recognized phrases that are correct—, recall (r) —the ratio of correct phrases that are recognized— and their harmonic mean F1 = 2pr/(p+r). [sent-54, score-1.988]
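For concreteness, a small sketch of these measures computed over sets of (s, e, k) triples; a phrase counts as correct only if both span and category match. The function name is ours.

```python
def prf1(predicted, gold):
    """Precision, recall and F1 of a predicted phrase set against the gold set."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    p = correct / len(predicted) if predicted else 0.0
    r = correct / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f1
```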
31 2 Recognizing Phrases The mechanism to recognize phrases is described here as a function which, given a sentence x, identifies the set of phrases y of x: R : X → Y. [sent-56, score-1.293]
32 First, we assume a function P which, given a sentence x, identifies a set of candidate phrases, not necessarily coherent, for the sentence, P(x) ⊆ P. [sent-58, score-0.193]
33 Second, we assume a score function which, given a phrase, produces a real-valued prediction indicating the plausibility of the phrase for the sentence. [sent-59, score-0.78]
34 Note that the R function constructs the optimal phrase set by evaluating scores of phrase candidates, and that, with respect to the length of the sentence, there is a quadratic number of possible phrases, that is, the set P. [sent-62, score-1.274]
35 Thus, straightforwardly considering all phrases in P would result in a very expensive exploration. [sent-63, score-0.55]
36 The function P is intended to filter the set P down to a set of plausible phrase candidates by applying decisions at word level. [sent-64, score-0.916]
37 A simple setting for this function is a start-end classification for each phrase type: each word of the sentence is tested as k-start —if it is likely to start phrases of type k— and as k-end —if it is likely to end phrases of type k. [sent-65, score-2.14]
38 Each k-start word xs, paired with each k-end word xe having s ≤ e, forms a phrase candidate (s, e)k . [sent-66, score-1.136]
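A sketch of this start-end candidate generation, assuming the per-word boolean decisions have already been computed (the argument layout is our assumption):

```python
def phrase_candidates(starts, ends, categories):
    """Generate candidate phrases (s, e, k) from per-word start/end decisions.
    starts[k][i] (resp. ends[k][i]) is True if word i is classified as k-start
    (resp. k-end). Every k-start paired with every k-end at or to its right
    yields a candidate."""
    candidates = []
    for k in categories:
        start_positions = [i for i, flag in enumerate(starts[k]) if flag]
        end_positions = [i for i, flag in enumerate(ends[k]) if flag]
        for s in start_positions:
            for e in end_positions:
                if s <= e:
                    candidates.append((s, e, k))
    return candidates
```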
39 In general, each classifier will be applied to each word in the sentence, and the best strategy for identifying phrase candidates will depend on the sparseness of phrases in a sentence, the length of phrases and the number of categories. [sent-68, score-2.037]
40 Once the phrase candidates are identified, the optimal coherent phrase set is selected according to (1). [sent-69, score-1.449]
41 Due to its nature, there is no need to explicitly enumerate each possible coherent phrase set, which would result in an exponential exploration. [sent-70, score-0.693]
42 Instead, by guiding the exploration through the problem constraints and using dynamic programming, the optimal coherent phrase set can be found in time polynomial in the sentence length. [sent-71, score-0.879]
43 When embedding of phrases is allowed, a cubic-time bottom-up exploration is required [6]. [sent-73, score-0.569]
44 Summarizing, phrase recognition proceeds in two layers: the identification layer, which filters the set of phrase candidates in linear time, and the scoring layer, which selects the optimal coherent phrase set in quadratic or cubic time. [sent-75, score-2.206]
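To illustrate the sequential (non-embedding) case, here is a Viterbi-style sketch of the scoring-layer search over already-scored candidates. It is our own illustration of the idea rather than the paper's implementation; it leaves words uncovered when that scores better, and the cubic-time bottom-up variant for embedded phrases is not shown.

```python
from collections import defaultdict

def best_chunking(n_words, scored_candidates):
    """Select the non-overlapping phrase set with maximal total score.
    scored_candidates maps (s, e, k) -> score. Indexing candidates by their end
    position makes the search quadratic in the sentence length."""
    ends_at = defaultdict(list)
    for (s, e, k), sc in scored_candidates.items():
        ends_at[e].append(((s, e, k), sc))

    best = [0.0] * (n_words + 1)      # best[i]: best total score over words [0, i)
    choice = [None] * (n_words + 1)   # phrase chosen to end at word i-1, if any
    for i in range(1, n_words + 1):
        best[i], choice[i] = best[i - 1], None      # word i-1 left outside any phrase
        for (s, e, k), sc in ends_at[i - 1]:
            if best[s] + sc > best[i]:
                best[i], choice[i] = best[s] + sc, (s, e, k)

    solution, i = [], n_words                       # recover the selected phrases
    while i > 0:
        if choice[i] is None:
            i -= 1
        else:
            s, e, k = choice[i]
            solution.append((s, e, k))
            i = s
    return best[n_words], list(reversed(solution))
```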
45 3 Additive Online Learning via Recognition Feedback In this section we describe an online learning strategy for training the learning components of the Phrase Recognizer, namely the start-end classifiers in P and the score function. [sent-76, score-0.242]
46 The learning challenge consists in approximating the functions so as to maximize the global F1 measure on the problem, taking into account that the functions interact. [sent-77, score-0.162]
47 The function P consists of two classifiers per phrase type: the start classifier (hk_S) and the end classifier (hk_E). [sent-80, score-0.715]
48 Thus, the P function is formed by a prediction vector for each classifier, noted as wk_S or wk_E, and a unique shared representation function φw which maps a word in context into a feature vector. [sent-81, score-0.567]
49 A prediction is computed as hk_S(x) = wk_S · φw(x), and similarly for hk_E, and the sign is taken as the binary classification. [sent-82, score-0.402]
50 The score function computes a real-valued score for a phrase candidate (s, e)k . [sent-83, score-0.867]
51 We implement this function with a prediction vector wk for each type k ∈ K, and also a shared representation function φp which maps a phrase into a feature vector. [sent-84, score-0.925]
52 The score prediction is then given by the expression: score((s, e)k , x, y) = wk · φp ((s, e)k , x, y). [sent-85, score-0.335]
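With sparse binary features, all of these predictions reduce to summing weights over the active feature indices. A minimal sketch of that layout (the dictionary-based storage and the function names are our assumptions):

```python
from collections import defaultdict

def dot(weights, active_features):
    """w · φ(x) for a sparse binary feature vector given as a set of feature ids."""
    return sum(weights[f] for f in active_features)

# One weight vector per classifier: wk_S and wk_E for the filtering layer,
# and wk for the scoring layer, for every phrase category k.
w_start = defaultdict(lambda: defaultdict(float))
w_end = defaultdict(lambda: defaultdict(float))
w_score = defaultdict(lambda: defaultdict(float))

def is_start(k, phi_word):    # sign of wk_S · φw(xi)
    return dot(w_start[k], phi_word) > 0

def is_end(k, phi_word):      # sign of wk_E · φw(xi)
    return dot(w_end[k], phi_word) > 0

def score(k, phi_phrase):     # real-valued score of a candidate (s, e)k
    return dot(w_score[k], phi_phrase)
```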
53 We give the algorithm the name FR-Perceptron since it is a Perceptron-based learning algorithm that approximates the prediction vectors in P as Filters of words, and the score vectors as Rankers of phrases. [sent-88, score-0.205]
54 Given a sentence, it predicts its optimal phrase solution as specified in (1) using the current vectors. [sent-90, score-0.659]
55 As in the traditional Perceptron algorithm, if the predicted phrase set is not perfect, the vectors responsible for the incorrect prediction are updated additively. [sent-91, score-0.717]
56 Input: a training set {(x1, y1), . . . , (xm, ym)}, where the xi are sentences and the yi are solutions in Y. • Define: W = {wk_S, wk_E, wk | k ∈ K}. [sent-95, score-0.405]
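The outer online loop of the algorithm can then be summarized as follows; `recognize` and `feedback` stand for the components described in this section, and the sketch is our paraphrase rather than a transcription of the paper's pseudocode:

```python
def fr_perceptron(training_set, W, recognize, feedback, epochs=20):
    """Schematic FR-Perceptron outer loop. W holds the prediction vectors
    {wk_S, wk_E, wk | k in K} and is updated in place by the recognition-based
    feedback whenever the predicted phrase set is not perfect."""
    for _ in range(epochs):
        for x, y_gold in training_set:       # training sentences visited online
            y_hat = recognize(x, W)          # optimal phrase set under current W
            if y_hat != y_gold:
                feedback(x, y_gold, y_hat, W)
    return W
```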
57 By analyzing the dependencies between each function and a solution, we derive a feedback rule which naturally fits the phrase recognition setting. [sent-106, score-0.77]
58 Let y∗ be the gold set of phrases for a sentence x, and ŷ the set predicted by the R function. [sent-107, score-0.738]
59 Let goldS(xi , k) and goldE(xi , k) be, respectively, the perfect indicator functions for start and end boundaries of phrases of type k. [sent-108, score-0.667]
60 That is, they return 1 if word xi starts/ends some k-phrase in y ∗ and -1 otherwise. [sent-109, score-0.138]
61 We differentiate three kinds of phrases in order to give feedback to the functions being learned: • Phrases correctly identified: ∀(s, e)k ∈ y∗ ∩ ŷ: do nothing, since they are correct. [sent-110, score-0.65]
62 • Missed phrases: ∀(s, e)k ∈ y∗\ŷ: 1. Update misclassified boundary words: if (wk_S · φw(xs) ≤ 0) then wk_S = wk_S + φw(xs); if (wk_E · φw(xe) ≤ 0) then wk_E = wk_E + φw(xe). [sent-112, score-0.787]
63 2. Update score function, if applied: if (wk_S · φw(xs) > 0 ∧ wk_E · φw(xe) > 0) then wk = wk + φp((s, e)k, x, y). • Over-predicted phrases: ∀(s, e)k ∈ ŷ\y∗: [sent-113, score-0.678]
64 1. Update score function: wk = wk − φp((s, e)k, x, y). [sent-114, score-0.486]
65 2. Update words misclassified as S or E: if (goldS(xs, k) = −1) then wk_S = wk_S − φw(xs); if (goldE(xe, k) = −1) then wk_E = wk_E − φw(xe). This feedback models the interaction between the two layers of the recognition process. [sent-115, score-0.975]
66 The start-end layer filters out phrase candidates for the scoring layer. [sent-116, score-0.856]
67 Thus, misclassifying the boundary words of a correct phrase blocks the generation of the candidate and produces a missed phrase. [sent-117, score-0.796]
68 Therefore, we move the start or end prediction vectors toward the misclassified boundary words of a missed phrase. [sent-118, score-0.228]
69 When an incorrect phrase is predicted, we move the prediction vectors away from the start or end words, provided that they are not boundary words of a phrase in the gold solution. [sent-119, score-1.507]
70 Regarding the scoring layer, each category prediction vector is moved toward missed phrases and moved away from over-predicted phrases. [sent-121, score-0.698]
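Putting the three cases together, a sketch of the feedback rule in code. It assumes the per-category weight dictionaries and feature extractors from the earlier sketches; the reading of the "if applied" condition (the candidate was generated, so the scorer saw it) and all names are our assumptions.

```python
def dot(weights, features):
    # w · φ for a sparse binary feature set (dict of feature id -> weight).
    return sum(weights.get(f, 0.0) for f in features)

def add(weights, features, sign=1.0):
    # Additive perceptron update, w = w ± φ.
    for f in features:
        weights[f] = weights.get(f, 0.0) + sign

def feedback(x, y_gold, y_hat, W, phi_w, phi_p, goldS, goldE):
    """Recognition-based feedback over the predicted solution y_hat.
    y_gold and y_hat are sets of (s, e, k) triples; W["start"][k], W["end"][k]
    and W["score"][k] are weight dicts; goldS/goldE return +1 or -1."""
    w_start, w_end, w_score = W["start"], W["end"], W["score"]

    # Missed phrases: promote misclassified boundary words, and promote the
    # score vector only if the candidate had actually been generated.
    for (s, e, k) in set(y_gold) - set(y_hat):
        fs, fe = phi_w(x, s), phi_w(x, e)
        start_ok = dot(w_start[k], fs) > 0
        end_ok = dot(w_end[k], fe) > 0
        if not start_ok:
            add(w_start[k], fs, +1.0)
        if not end_ok:
            add(w_end[k], fe, +1.0)
        if start_ok and end_ok:
            add(w_score[k], phi_p(x, s, e, k, y_hat), +1.0)

    # Over-predicted phrases: demote the score vector, and demote the start/end
    # vectors of words that are not true boundaries of a gold k-phrase.
    for (s, e, k) in set(y_hat) - set(y_gold):
        add(w_score[k], phi_p(x, s, e, k, y_hat), -1.0)
        if goldS(x, s, k) == -1:
            add(w_start[k], phi_w(x, s), -1.0)
        if goldE(x, e, k) == -1:
            add(w_end[k], phi_w(x, e), -1.0)
```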
71 It is important to note that this feedback operates only on the basis of the predicted solution ŷ, avoiding making updates for every prediction the function has made. [sent-122, score-0.135]
72 Thus, the learning strategy is taking advantage of the recognition process, and concentrates on (i) assigning high scores to the correct phrases and (ii) making the incorrect competing phrases score lower than the correct ones. [sent-123, score-1.418]
73 As a consequence, this feedback rule tends to approximate the desired behavior of the global R function, that is, to make the summation of the scores of the correct phrase set maximal with respect to other phrase set candidates. [sent-124, score-1.456]
74 This learning strategy is closely related to other recent works on learning ranking functions [10, 8, 9]. [sent-125, score-0.147]
75 4 Experiments on Clause Identification Clause Identification is the problem of recognizing the clauses of a sentence. [sent-127, score-0.173]
76 A clause can be roughly defined as a phrase with a subject, possibly implicit, and a predicate. [sent-128, score-0.782]
77 Clauses in a sentence form a hierarchical structure which constitutes the skeleton of the full syntactic tree. [sent-129, score-0.211]
78 The problem consists of recognizing the set of clauses on the basis of words, part-of-speech tags (PoS), and syntactic base phrases (or chunks). [sent-132, score-0.807]
79 There is only one category of phrases to be considered, namely the clauses. [sent-133, score-0.576]
80 Representation Functions We now describe the representation functions φw and φp , which respectively map a word or a phrase and its local context into a feature vector in {0, 1}^n . [sent-135, score-0.809]
81 For the function φw (xi ) we capture the form, PoS and chunk tags of words in a window around xi , that is, words xi+l with l ∈ [−Lw , +Lw ]. [sent-137, score-0.169]
82 Each attribute type, together with each relative position l and each returned value forms a final binary indicator feature (for instance, “the word at position -2 is that” is a binary feature). [sent-138, score-0.163]
83 Also, we consider the word decisions of the words to the left of xi , that is, binary flags indicating whether the [−Lw , −1] words in the window are starts and/or ends of a phrase. [sent-139, score-0.331]
84 For the function φp (s, e) we represent the context of the phrase by capturing a [−Lp , 0] window of forms, PoS and chunks at the s word, and a separate [0, +Lp ] window at the e word. [sent-140, score-0.703]
85 Furthermore, we represent the (s, e) phrase by evaluating a pattern from s to e which captures the relevant elements in the sentence fragment from word s to word e. [sent-141, score-1.056]
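As an illustration of how such binary indicator features can be assembled, a sketch of a φw-style extractor over word, PoS and chunk windows; the feature-name scheme and the default window size are ours, and the left start/end decision flags and the φp pattern component are omitted.

```python
def phi_word(words, pos_tags, chunk_tags, i, window=2):
    """Binary indicator features for word i: form, PoS tag and chunk tag of every
    position in a [-window, +window] context, encoded as feature-name strings.
    Positions outside the sentence get a padding value."""
    features = set()
    n = len(words)
    for l in range(-window, window + 1):
        j = i + l
        form = words[j] if 0 <= j < n else "<pad>"
        pos = pos_tags[j] if 0 <= j < n else "<pad>"
        chunk = chunk_tags[j] if 0 <= j < n else "<pad>"
        features.add("form@%+d=%s" % (l, form))
        features.add("pos@%+d=%s" % (l, pos))
        features.add("chunk@%+d=%s" % (l, chunk))
    return features
```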
86 The system to train was composed of the start and end functions, which identify clause candidates, and a score function for clauses. [sent-144, score-0.359]
87 To train the score classifier, we generated only the phrase candidates formed with all pairs of correct phrase boundaries. [sent-148, score-1.545]
88 The alternative of generating all possible phrases as examples would be more realistic, but infeasible for the learning algorithm since it would produce 1,377,843 examples, with a 98. [sent-150, score-0.572]
89 That is, the training sentences are visited online as in the FR-Perceptron: first, the start-end functions are applied to each word, and according to their positive decisions, phrase examples are generated to train the score function. [sent-153, score-0.895]
90 In this way, the input of the score function is dynamically adapted to the start-end behavior, but a classification feedback is given to each function for each decision taken. [sent-154, score-0.174]
91 We trained the perceptron models for up to 20 epochs via the FR-Perceptron algorithm and via classification feedback, either online (CO-VP) or batch (CB-VP). [sent-159, score-0.159]
92 Bottom: given the start-end filters, upper bound on the global F1 (left) and number of proposed phrase candidates (right). [sent-172, score-0.821]
93 The FR-Perceptron model exhibits the desirable filtering behavior for this local decision, which consists in maintaining a very high recall (so that no correct candidates are blocked) while increasing the precision over the epochs. [sent-177, score-0.27]
94 The left plot shows the maximum achievable global F1 , assuming a perfect scorer, given the phrases proposed by the start-end functions. [sent-181, score-0.615]
95 Additionally, the right plot depicts the filtering capabilities in terms of the number of phrase candidates produced, out of a total number of 300,511 possible phrases. [sent-182, score-0.756]
96 The FR-Perceptron behavior in the filtering layer is clear: while it maintains a high recall on identifying correct phrases (above 95%), it substantially reduces the number of phrase candidates to explore in the scoring layer, and thus, it progressively simplifies the input to the score function. [sent-183, score-1.605]
97 In contrast to this behavior, the classification-based models are not sensitive to the global performance in the filtering layer and, although they aggressively reduce the search space, they provide only a moderate upper bound on the global F1 . [sent-184, score-0.184]
98 5 Conclusion We have presented a global learning strategy for the general problem of recognizing structures of phrases, in which, typically, several different learning functions interact to explore and recognize the structure. [sent-225, score-0.26]
99 The effectiveness of our method has been empirically demonstrated in the problem of clause identification, where we have shown that a considerable improvement can be obtained by exploiting high-order global dependencies in learning, in contrast to concentrating only on the local subproblems. [sent-226, score-0.237]
100 These results suggest scaling up global learning strategies to more complex problems found in the natural language area (such as full parsing or machine translation), or other structured domains. [sent-227, score-0.175]
wordName wordTfidf (topN-words)
[('phrase', 0.637), ('phrases', 0.55), ('wk', 0.192), ('sentence', 0.167), ('clause', 0.145), ('clauses', 0.119), ('candidates', 0.119), ('word', 0.117), ('score', 0.102), ('xe', 0.084), ('perceptrons', 0.078), ('hk', 0.073), ('feedback', 0.072), ('classi', 0.07), ('chunking', 0.069), ('global', 0.065), ('xs', 0.062), ('recognition', 0.061), ('coherent', 0.056), ('layer', 0.054), ('words', 0.054), ('recognizing', 0.054), ('online', 0.053), ('sentences', 0.05), ('parsing', 0.049), ('scoring', 0.046), ('syntactic', 0.044), ('recognized', 0.044), ('strategy', 0.043), ('identi', 0.043), ('decisions', 0.043), ('prediction', 0.041), ('perceptron', 0.04), ('carreras', 0.04), ('lw', 0.039), ('epochs', 0.038), ('missed', 0.035), ('subproblems', 0.034), ('voted', 0.034), ('ltering', 0.032), ('ranking', 0.032), ('ers', 0.032), ('verb', 0.031), ('pos', 0.031), ('recall', 0.031), ('start', 0.03), ('type', 0.03), ('end', 0.029), ('precision', 0.029), ('lters', 0.029), ('cation', 0.028), ('batch', 0.028), ('functions', 0.028), ('chunks', 0.028), ('local', 0.027), ('development', 0.027), ('recognize', 0.026), ('golde', 0.026), ('golds', 0.026), ('kudo', 0.026), ('punyakanok', 0.026), ('rquez', 0.026), ('sang', 0.026), ('tjong', 0.026), ('upc', 0.026), ('xavier', 0.026), ('candidate', 0.026), ('category', 0.026), ('correct', 0.025), ('shared', 0.025), ('train', 0.025), ('er', 0.025), ('binary', 0.023), ('solution', 0.022), ('learning', 0.022), ('xi', 0.021), ('identifying', 0.021), ('concentrates', 0.021), ('gold', 0.021), ('tags', 0.021), ('svm', 0.021), ('language', 0.021), ('behavior', 0.02), ('layers', 0.02), ('vectors', 0.02), ('consists', 0.019), ('recognizer', 0.019), ('noun', 0.019), ('tagging', 0.019), ('incorrect', 0.019), ('exploration', 0.019), ('lp', 0.019), ('window', 0.019), ('boundary', 0.019), ('tasks', 0.019), ('misclassi', 0.019), ('fragment', 0.018), ('strategies', 0.018), ('collins', 0.017), ('crammer', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition
Author: Xavier Carreras, Lluís Màrquez
Abstract: This work presents an architecture based on perceptrons to recognize phrase structures, and an online learning algorithm to train the perceptrons together and dependently. The recognition strategy applies learning in two layers: a filtering layer, which reduces the search space by identifying plausible phrase candidates, and a ranking layer, which recursively builds the optimal phrase structure. We provide a recognition-based feedback rule which conveys to each local function the errors it has committed, seen from a global point of view, and allows the functions to be trained together online as perceptrons. Experimentation on a syntactic parsing problem, the recognition of clause hierarchies, improves state-of-the-art results and evinces the advantages of our global training method over optimizing each function locally and independently.
2 0.085145742 59 nips-2003-Efficient and Robust Feature Extraction by Maximum Margin Criterion
Author: Haifeng Li, Tao Jiang, Keshu Zhang
Abstract: A new feature extraction criterion, maximum margin criterion (MMC), is proposed in this paper. This new criterion is general in the sense that, when combined with a suitable constraint, it can actually give rise to the most popular feature extractor in the literature, linear discriminate analysis (LDA). We derive a new feature extractor based on MMC using a different constraint that does not depend on the nonsingularity of the within-class scatter matrix Sw . Such a dependence is a major drawback of LDA especially when the sample size is small. The kernelized (nonlinear) counterpart of this linear feature extractor is also established in this paper. Our preliminary experimental results on face images demonstrate that the new feature extractors are efficient and stable. 1
3 0.077579632 188 nips-2003-Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects
Author: Xuerui Wang, Rebecca Hutchinson, Tom M. Mitchell
Abstract: We consider learning to classify cognitive states of human subjects, based on their brain activity observed via functional Magnetic Resonance Imaging (fMRI). This problem is important because such classifiers constitute “virtual sensors” of hidden cognitive states, which may be useful in cognitive science research and clinical applications. In recent work, Mitchell, et al. [6,7,9] have demonstrated the feasibility of training such classifiers for individual human subjects (e.g., to distinguish whether the subject is reading an ambiguous or unambiguous sentence, or whether they are reading a noun or a verb). Here we extend that line of research, exploring how to train classifiers that can be applied across multiple human subjects, including subjects who were not involved in training the classifier. We describe the design of several machine learning approaches to training multiple-subject classifiers, and report experimental results demonstrating the success of these methods in learning cross-subject classifiers for two different fMRI data sets. 1
4 0.066924244 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus
Author: Zach Solan, David Horn, Eytan Ruppin, Shimon Edelman
Abstract: We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. The results are encouraging: the model attains a level of performance considered to be “intermediate” for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed speech of parents directed to small children. 1
5 0.057699062 132 nips-2003-Multiple Instance Learning via Disjunctive Programming Boosting
Author: Stuart Andrews, Thomas Hofmann
Abstract: Learning from ambiguous training data is highly relevant in many applications. We present a new learning algorithm for classification problems where labels are associated with sets of pattern instead of individual patterns. This encompasses multiple instance learning as a special case. Our approach is based on a generalization of linear programming boosting and uses results from disjunctive programming to generate successively stronger linear relaxations of a discrete non-convex problem. 1
6 0.054233618 8 nips-2003-A Holistic Approach to Compositional Semantics: a connectionist model and robot experiments
7 0.053534437 24 nips-2003-An Iterative Improvement Procedure for Hierarchical Clustering
8 0.050638679 99 nips-2003-Kernels for Structured Natural Language Data
9 0.048653554 121 nips-2003-Log-Linear Models for Label Ranking
10 0.047990054 102 nips-2003-Large Scale Online Learning
11 0.046890102 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications
12 0.0455507 112 nips-2003-Learning to Find Pre-Images
13 0.04365531 164 nips-2003-Ranking on Data Manifolds
14 0.041434586 124 nips-2003-Max-Margin Markov Networks
15 0.040579978 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms
16 0.040556174 186 nips-2003-Towards Social Robots: Automatic Evaluation of Human-Robot Interaction by Facial Expression Classification
17 0.039403219 145 nips-2003-Online Classification on a Budget
18 0.039215457 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
19 0.039007083 100 nips-2003-Laplace Propagation
20 0.037260834 65 nips-2003-Extending Q-Learning to General Adaptive Multi-Agent Systems
topicId topicWeight
[(0, -0.124), (1, -0.026), (2, -0.005), (3, -0.081), (4, -0.001), (5, -0.041), (6, -0.038), (7, -0.001), (8, 0.002), (9, 0.034), (10, 0.092), (11, -0.044), (12, -0.039), (13, -0.037), (14, 0.099), (15, -0.012), (16, -0.07), (17, -0.021), (18, 0.016), (19, -0.041), (20, 0.083), (21, -0.049), (22, -0.127), (23, -0.013), (24, -0.046), (25, 0.058), (26, -0.006), (27, -0.086), (28, -0.107), (29, -0.01), (30, 0.051), (31, 0.024), (32, 0.028), (33, 0.059), (34, -0.104), (35, -0.153), (36, -0.051), (37, 0.028), (38, 0.075), (39, -0.139), (40, 0.033), (41, -0.129), (42, -0.107), (43, 0.145), (44, -0.049), (45, -0.135), (46, -0.008), (47, -0.02), (48, 0.041), (49, 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.91102743 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition
Author: Xavier Carreras, Lluís Màrquez
Abstract: This work presents an architecture based on perceptrons to recognize phrase structures, and an online learning algorithm to train the perceptrons together and dependently. The recognition strategy applies learning in two layers: a filtering layer, which reduces the search space by identifying plausible phrase candidates, and a ranking layer, which recursively builds the optimal phrase structure. We provide a recognition-based feedback rule which conveys to each local function the errors it has committed, seen from a global point of view, and allows the functions to be trained together online as perceptrons. Experimentation on a syntactic parsing problem, the recognition of clause hierarchies, improves state-of-the-art results and evinces the advantages of our global training method over optimizing each function locally and independently.
2 0.69649708 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus
Author: Zach Solan, David Horn, Eytan Ruppin, Shimon Edelman
Abstract: We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. The results are encouraging: the model attains a level of performance considered to be “intermediate” for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed speech of parents directed to small children. 1
3 0.62580943 8 nips-2003-A Holistic Approach to Compositional Semantics: a connectionist model and robot experiments
Author: Yuuya Sugita, Jun Tani
Abstract: We present a novel connectionist model for acquiring the semantics of a simple language through the behavioral experiences of a real robot. We focus on the “compositionality” of semantics, a fundamental characteristic of human language, which is the ability to understand the meaning of a sentence as a combination of the meanings of words. We also pay much attention to the “embodiment” of a robot, which means that the robot should acquire semantics which matches its body, or sensory-motor system. The essential claim is that an embodied compositional semantic representation can be self-organized from generalized correspondences between sentences and behavioral patterns. This claim is examined and confirmed through simple experiments in which a robot generates corresponding behaviors from unlearned sentences by analogy with the correspondences between learned sentences and behaviors. 1
4 0.46911073 59 nips-2003-Efficient and Robust Feature Extraction by Maximum Margin Criterion
Author: Haifeng Li, Tao Jiang, Keshu Zhang
Abstract: A new feature extraction criterion, maximum margin criterion (MMC), is proposed in this paper. This new criterion is general in the sense that, when combined with a suitable constraint, it can actually give rise to the most popular feature extractor in the literature, linear discriminate analysis (LDA). We derive a new feature extractor based on MMC using a different constraint that does not depend on the nonsingularity of the within-class scatter matrix Sw . Such a dependence is a major drawback of LDA especially when the sample size is small. The kernelized (nonlinear) counterpart of this linear feature extractor is also established in this paper. Our preliminary experimental results on face images demonstrate that the new feature extractors are efficient and stable. 1
5 0.42793718 99 nips-2003-Kernels for Structured Natural Language Data
Author: Jun Suzuki, Yutaka Sasaki, Eisaku Maeda
Abstract: This paper devises a novel kernel function for structured natural language data. In the field of Natural Language Processing, feature extraction consists of the following two steps: (1) syntactically and semantically analyzing raw data, i.e., character strings, then representing the results as discrete structures, such as parse trees and dependency graphs with part-of-speech tags; (2) creating (possibly high-dimensional) numerical feature vectors from the discrete structures. The new kernels, called Hierarchical Directed Acyclic Graph (HDAG) kernels, directly accept DAGs whose nodes can contain DAGs. HDAG data structures are needed to fully reflect the syntactic and semantic structures that natural language data inherently have. In this paper, we define the kernel function and show how it permits efficient calculation. Experiments demonstrate that the proposed kernels are superior to existing kernel functions, e.g., sequence kernels, tree kernels, and bag-of-words kernels. 1
6 0.42616874 188 nips-2003-Training fMRI Classifiers to Detect Cognitive States across Multiple Human Subjects
7 0.35794893 24 nips-2003-An Iterative Improvement Procedure for Hierarchical Clustering
8 0.35471141 112 nips-2003-Learning to Find Pre-Images
9 0.34903604 132 nips-2003-Multiple Instance Learning via Disjunctive Programming Boosting
10 0.34186253 25 nips-2003-An MCMC-Based Method of Comparing Connectionist Models in Cognitive Science
11 0.33090541 75 nips-2003-From Algorithmic to Subjective Randomness
12 0.30123895 90 nips-2003-Increase Information Transfer Rates in BCI by CSP Extension to Multi-class
13 0.30016804 102 nips-2003-Large Scale Online Learning
14 0.28612339 100 nips-2003-Laplace Propagation
15 0.28326377 178 nips-2003-Sparse Greedy Minimax Probability Machine Classification
16 0.26284188 156 nips-2003-Phonetic Speaker Recognition with Support Vector Machines
17 0.25987968 3 nips-2003-AUC Optimization vs. Error Rate Minimization
18 0.25954416 173 nips-2003-Semi-supervised Protein Classification Using Cluster Kernels
19 0.25806072 137 nips-2003-No Unbiased Estimator of the Variance of K-Fold Cross-Validation
20 0.25538325 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms
topicId topicWeight
[(0, 0.062), (11, 0.025), (26, 0.024), (30, 0.019), (33, 0.015), (35, 0.033), (49, 0.01), (53, 0.098), (71, 0.092), (76, 0.057), (85, 0.071), (88, 0.25), (91, 0.109), (99, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.79893422 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition
Author: Xavier Carreras, Lluís Màrquez
Abstract: This work presents an architecture based on perceptrons to recognize phrase structures, and an online learning algorithm to train the perceptrons together and dependently. The recognition strategy applies learning in two layers: a filtering layer, which reduces the search space by identifying plausible phrase candidates, and a ranking layer, which recursively builds the optimal phrase structure. We provide a recognition-based feedback rule which conveys to each local function the errors it has committed, seen from a global point of view, and allows the functions to be trained together online as perceptrons. Experimentation on a syntactic parsing problem, the recognition of clause hierarchies, improves state-of-the-art results and evinces the advantages of our global training method over optimizing each function locally and independently.
2 0.7847507 28 nips-2003-Application of SVMs for Colour Classification and Collision Detection with AIBO Robots
Author: Michael J. Quinlan, Stephan K. Chalup, Richard H. Middleton
Abstract: This article addresses the issues of colour classification and collision detection as they occur in the legged league robot soccer environment of RoboCup. We show how the method of one-class classification with support vector machines (SVMs) can be applied to solve these tasks satisfactorily using the limited hardware capacity of the prescribed Sony AIBO quadruped robots. The experimental evaluation shows an improvement over our previous methods of ellipse fitting for colour classification and the statistical approach used for collision detection.
3 0.77385902 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias
Author: Alan Fern, Sungwook Yoon, Robert Givan
Abstract: We explore approximate policy iteration, replacing the usual costfunction learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs. 1
4 0.6193707 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images
Author: Sanjiv Kumar, Martial Hebert
Abstract: In this paper we present Discriminative Random Fields (DRF), a discriminative framework for the classification of natural image regions by incorporating neighborhood spatial dependencies in the labels as well as the observed data. The proposed model exploits local discriminative models and allows to relax the assumption of conditional independence of the observed data given the labels, commonly used in the Markov Random Field (MRF) framework. The parameters of the DRF model are learned using penalized maximum pseudo-likelihood method. Furthermore, the form of the DRF model allows the MAP inference for binary classification problems using the graph min-cut algorithms. The performance of the model was verified on the synthetic as well as the real-world images. The DRF model outperforms the MRF model in the experiments. 1
5 0.61114377 113 nips-2003-Learning with Local and Global Consistency
Author: Dengyong Zhou, Olivier Bousquet, Thomas N. Lal, Jason Weston, Bernhard Schölkopf
Abstract: We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data. 1
6 0.61001313 112 nips-2003-Learning to Find Pre-Images
7 0.60955238 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
8 0.60877401 172 nips-2003-Semi-Supervised Learning with Trees
9 0.60841519 107 nips-2003-Learning Spectral Clustering
10 0.60675943 126 nips-2003-Measure Based Regularization
11 0.606736 3 nips-2003-AUC Optimization vs. Error Rate Minimization
12 0.6053099 143 nips-2003-On the Dynamics of Boosting
13 0.60480845 78 nips-2003-Gaussian Processes in Reinforcement Learning
14 0.60441673 47 nips-2003-Computing Gaussian Mixture Models with EM Using Equivalence Constraints
15 0.60241514 81 nips-2003-Geometric Analysis of Constrained Curves
16 0.60118079 189 nips-2003-Tree-structured Approximations by Expectation Propagation
17 0.60113961 41 nips-2003-Boosting versus Covering
18 0.60083073 138 nips-2003-Non-linear CCA and PCA by Alignment of Local Models
19 0.59933335 132 nips-2003-Multiple Instance Learning via Disjunctive Programming Boosting
20 0.59799737 48 nips-2003-Convex Methods for Transduction