emnlp emnlp2011 emnlp2011-49 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Moshe Dubiner ; Yoram Singer
Abstract: We discuss and analyze the problem of finding a distribution that minimizes the relative entropy to a prior distribution while satisfying max-norm constraints with respect to an observed distribution. This setting generalizes the classical maximum entropy problems as it relaxes the standard constraints on the observed values. We tackle the problem by introducing a re-parametrization in which the unknown distribution is distilled to a single scalar. We then describe a homotopy between the relaxation parameter and the distribution characterizing parameter. The homotopy also reveals an aesthetic symmetry between the prior distribution and the observed distribution. We then use the reformulated problem to describe a space and time efficient algorithm for tracking the entire relaxation path. Our derivations are based on a compact geomet- ric view of the relaxation path as a piecewise linear function in a two dimensional space of the relaxation-characterization parameters. We demonstrate the usability of our approach by applying the problem to Zipfian distributions over a large alphabet.
Reference: text
sentIndex sentText sentNum sentScore
1 Entire Relaxation Path for Maximum Entropy Problems Moshe Dubiner Google moshe@google.com [sent-1, score-0.057]
2 Abstract We discuss and analyze the problem of finding a distribution that minimizes the relative entropy to a prior distribution while satisfying max-norm constraints with respect to an observed distribution. [sent-2, score-0.491]
3 This setting generalizes the classical maximum entropy problems as it relaxes the standard constraints on the observed values. [sent-3, score-0.275]
4 We tackle the problem by introducing a re-parametrization in which the unknown distribution is distilled to a single scalar. [sent-4, score-0.256]
5 We then describe a homotopy between the relaxation parameter and the distribution characterizing parameter. [sent-5, score-0.662]
6 The homotopy also reveals an aesthetic symmetry between the prior distribution and the observed distribution. [sent-6, score-0.64]
7 We then use the reformulated problem to describe a space and time efficient algorithm for tracking the entire relaxation path. [sent-7, score-0.532]
8 Our derivations are based on a compact geomet- ric view of the relaxation path as a piecewise linear function in a two dimensional space of the relaxation-characterization parameters. [sent-8, score-0.62]
9 1 Introduction Maximum entropy (max-ent) models and their dual counterpart, logistic regression, are popular and effective tools in numerous natural language processing tasks. [sent-10, score-0.346]
10 The principle of maximum entropy was spelled out explicitly by E. T. Jaynes. [sent-11, score-0.227]
11 Applications of the maximum entropy approach to natural language processing are numerous. [sent-14, score-0.156]
12 A notable example and probably one of the earliest usages and Yoram Singer Google singer@google.com [sent-15, score-0.145]
13 generalizations of the maximum entropy principle to language processing is the work of Berger, Della Pietra (x2), and Lafferty (Berger et al. [sent-16, score-0.248]
14 , 1996). The original formulation of max-ent cast the problem as the task of finding the distribution attaining the highest entropy subject to equality constraints. [sent-20, score-0.334]
15 While this formalism is aesthetic and paves the way to a simple dual in the form of a unique Gibbs distribution (Della Pietra et al. [sent-21, score-0.363]
16 To mitigate these issues, numerous schemes for relaxing the equality constraints have been proposed. [sent-23, score-0.443]
17 A notable recent work by Dudik, Phillips, and Schapire (2007) provided a general constraint-relaxation framework. [sent-24, score-0.052]
18 See also the references therein for an in depth overview of other approaches and generalizations of max-ent. [sent-25, score-0.094]
19 The constraint relaxation surfaces a natural parameter, namely, a relaxation value. [sent-26, score-0.638]
20 The dual form of this free parameter is the regularization value of penalized logistic regression problems. [sent-27, score-0.217]
21 The relaxed maximum-entropy problem setting is the starting point of this paper. [sent-29, score-0.075]
22 In this paper we describe and analyze a framework for efficiently tracking the entire relaxation path of constrained max-ent problems. [sent-30, score-0.502]
23 2 with a generalization in which we discuss the problem of finding a distribution that minimizes the relative entropy to a given prior distribution while satisfying max-norm constraints with respect to an observed distribution. [sent-32, score-0.491]
24 3 we tackle the problem by introducing a re-parametrization in which the unknown distribution is distilled to a single scalar. [sent-34, score-0.088]
25 Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 941–948, Edinburgh, Scotland, UK, July 27–31, 2011. ©2011 Association for Computational Linguistics. [sent-36, score-0.168]
26 4 a homotopy between the relaxation parameter and the distribution characterizing parameter. [sent-38, score-0.662]
27 This formulation also reveals an aesthetic symmetry between the prior distribution and the observed distribution. [sent-39, score-0.409]
28 We use the reformulated problem to describe in Secs. [sent-40, score-0.08]
29 5-6 space and time efficient algorithms for tracking the entire relaxation path. [sent-41, score-0.452]
30 Our derivations are based on a compact geometric view of the relaxation path as a piecewise linear function in a two dimensional space of the relaxation-characterization parameters. [sent-42, score-0.584]
31 In contrast to common homotopy methods for the Lasso (Osborne et al. [sent-43, score-0.231]
32 2000), our procedure for tracking the max-ent homotopy results in uncharacteristically low complexity bounds, thus rendering the approach applicable to large alphabets. [sent-44, score-0.419]
33 We use the shorthand [n] to denote the set of integers {1, . . . , n}. [sent-55, score-0.036]
34 The simplex consists of all n-dimensional vectors p such that Σ_{j=1}^n p_j = 1 and, for all j ∈ [n], p_j ≥ 0. [sent-65, score-1.046]
35 We generalize this notion to multiplicity-weighted vectors. [sent-66, score-0.08]
36 Formally, we say that a vector p with multiplicity m is in the simplex, (p, m) ∈ ∆, if Σ_{j=1}^n m_j p_j = 1 and, for all j ∈ [n], p_j ≥ 0 and m_j ≥ 0. [sent-67, score-0.793]
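The weighted-simplex condition above can be checked directly; a small sketch (the function name and tolerance are ours, for illustration only):

```python
import numpy as np

# Membership test for the multiplicity-weighted simplex:
# (p, m) is in Delta iff sum_j m_j p_j = 1, p_j >= 0 and m_j >= 0.
def in_weighted_simplex(p, m, tol=1e-9):
    p = np.asarray(p, dtype=float)
    m = np.asarray(m, dtype=float)
    return (abs(float(m @ p) - 1.0) <= tol
            and bool((p >= -tol).all())
            and bool((m >= -tol).all()))

# Three symbols, each with multiplicity 2 and probability 1/6:
# sum_j m_j p_j = 6 * (1/6) = 1, so (p, m) is in Delta.
ok = in_weighted_simplex([1/6, 1/6, 1/6], [2.0, 2.0, 2.0])
```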
37 The generalized relaxed maximum-entropy problem is concerned with obtaining an estimate p, given a prior distribution u and an observed distribution q such that the relative entropy between p and u is as small as possible while p and q are within a given max-norm tolerance. [sent-68, score-0.423]
38 Formally, we cast the following constrained optimization problem: min_p Σ_{j=1}^n m_j p_j log(p_j / u_j) subject to (p, m) ∈ ∆ and, for all j ∈ [n], |p_j − q_j| ≤ 1/ν. (1) [sent-69, score-0.044]
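Problem (1) can be seen concretely on a toy instance. For n = 2 with unit multiplicities, writing p = (t, 1 − t) keeps p on the simplex, so the single free coordinate t can simply be grid-searched; the values of u, q and ν below are made up for illustration:

```python
import numpy as np

u = np.array([0.5, 0.5])      # prior distribution
q = np.array([0.8, 0.2])      # observed distribution
nu = 10.0                     # relaxation parameter; box half-width 1/nu

t = np.linspace(1e-6, 1.0 - 1e-6, 200001)
p = np.stack([t, 1.0 - t])                           # candidate distributions
feasible = np.all(np.abs(p - q[:, None]) <= 1.0 / nu, axis=0)
kl = np.sum(p * np.log(p / u[:, None]), axis=0)      # objective of (1)
best = int(np.argmin(np.where(feasible, kl, np.inf)))
p_star = p[:, best]
```

Here the unconstrained minimizer would be p = u = (0.5, 0.5), but the max-norm box pins the solution to the edge nearest the prior, p* ≈ (0.7, 0.3).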
39 We derive the dual by introducing Lagrange-Legendre multipliers for each of the constraints appearing in (1). [sent-76, score-0.269]
40 Let α_j^+ ≥ 0 denote the multiplier for the constraint q_j − p_j ≤ 1/ν and α_j^− ≥ 0 the multiplier for the constraint q_j − p_j ≥ −1/ν. [sent-77, score-2.092]
41 In addition, we use γ as the multiplier for the constraint Σ_{j=1}^n m_j p_j = 1. [sent-78, score-0.108]
42 After some routine algebraic manipulations we get that the Lagrangian is L = Σ_{j=1}^n m_j [p_j log(p_j/u_j) + α_j^+(q_j − p_j − 1/ν) + α_j^−(p_j − q_j − 1/ν)] + γ(Σ_{j=1}^n m_j p_j − 1). [sent-80, score-0.121]
43 We equate the derivative of the Lagrangian with respect to p_j to zero, and get that log(p_j/u_j) + 1 − α_j + γ = 0 with α_j = α_j^+ − α_j^−, which implies that p_j ∝ u_j e^{α_j}. [sent-93, score-0.56]
44 We now employ the fact that (p, m) ∈ ∆ to get the exact form for p_j. [sent-94, score-0.56]
45 p_j = u_j e^{α_j} / Z. (3) Using (3) in the compact form of the Lagrangian we obtain the following dual problem: max_α −log(Z) + Σ_{j=1}^n m_j q_j α_j − (1/ν) Σ_{j=1}^n m_j |α_j|, (4) where Z = Σ_{j=1}^n m_j u_j e^{α_j}. [sent-97, score-0.148]
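A quick numerical sanity check of the pairing between (1) and (4) is weak duality: for any α, the dual objective must lower-bound the primal objective at every feasible p. We use p = q, which is trivially feasible, unit multiplicities, and random problem data of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(1)
n, nu = 6, 10.0
m = np.ones(n)                        # multiplicities
u = rng.random(n); u /= u.sum()       # prior distribution
q = rng.random(n); q /= q.sum()       # observed distribution

def primal(p):                        # objective of (1)
    return float(np.sum(m * p * np.log(p / u)))

def dual(alpha):                      # objective of (4)
    Z = float(np.sum(m * u * np.exp(alpha)))
    return -np.log(Z) + float(np.sum(m * q * alpha)) \
           - float(np.sum(m * np.abs(alpha))) / nu

# dual(alpha) <= primal(q) must hold for every alpha.
gap_ok = all(dual(rng.normal(size=n)) <= primal(q) + 1e-9 for _ in range(100))
```

At α = 0 we have Z = Σ_j m_j u_j = 1, so the dual value is exactly 0, matching the fact that the relative entropy in (1) is nonnegative.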
46 However, the complementary slackness conditions that are necessary for optimality to hold play an important role in the next section in which we present a reformulation of the relaxed maximum entropy problem. [sent-99, score-0.34]
47 3 Problem Reformulation First note that the primal objective is a strictly convex function over a compact convex domain. [sent-100, score-0.138]
48 Let us now characterize the form of the solution. [sent-102, score-0.048]
49 We partition the set of indices in [n] into three disjoint sets depending on whether the constraint |p_j − q_j| ≤ 1/ν is active and, if so, with which sign. [sent-103, score-0.433]
50 Concretely, we define
I− = {1 ≤ j ≤ n | p_j = q_j − 1/ν}
I0 = {1 ≤ j ≤ n | |p_j − q_j| < 1/ν}
I+ = {1 ≤ j ≤ n | p_j = q_j + 1/ν} . [sent-104, score-1.912]
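This partition can be computed mechanically from a candidate p; a small sketch (the function name, tolerance, and numeric example are ours, not the paper's):

```python
import numpy as np

# Which side of the box |p_j - q_j| <= 1/nu is active for each index j.
def partition(p, q, nu, tol=1e-9):
    d = np.asarray(p, dtype=float) - np.asarray(q, dtype=float)
    I_minus = np.flatnonzero(d <= -1.0 / nu + tol)   # p_j = q_j - 1/nu
    I_plus = np.flatnonzero(d >= 1.0 / nu - tol)     # p_j = q_j + 1/nu
    I_zero = np.flatnonzero((d > -1.0 / nu + tol) & (d < 1.0 / nu - tol))
    return I_minus, I_zero, I_plus

# With nu = 10 the box half-width is 0.1: the first coordinate sits on
# the lower edge, the other two are strictly inside the box.
Im, I0, Ip = partition([0.10, 0.30, 0.60], [0.20, 0.25, 0.55], nu=10.0)
```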
51 We next use the complementary slPackness conditions (see for instance (Boyd and Vandenberghe, 2004)) to further characterize the solution. [sent-108, score-0.09]
52 For any j ∈ I− we must have α_j^− = 0 and so α_j^+ ≥ 0. [sent-109, score-0.04]
53 Therefore α_j ≥ 0, which immediately implies that p_j ≥ u_j/Z. [sent-110, score-0.505]
54 By definition we have that p_j = q_j − 1/ν for j ∈ I−. [sent-111, score-0.974]
55 Combining these two facts we get that u_j/Z ≤ q_j − 1/ν for j ∈ I−. [sent-112, score-0.488]
56 An analogous derivation yields that u_j/Z ≥ q_j + 1/ν for j ∈ I+. [sent-113, score-0.433]
57 Resorting again to the definition of p from (3), and noting that α_j = 0 for j ∈ I0, we get that p_j = u_j/Z for j ∈ I0. [sent-115, score-0.05]
58 Since |p_j − q_j| < 1/ν for j ∈ I0 we get that |u_j/Z − q_j| < 1/ν. [sent-116, score-0.921]
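Taken together, the three cases say that the optimum has the form p_j = clip(u_j/Z, q_j − 1/ν, q_j + 1/ν) for a single scalar Z, which is exactly the "distilled to a single scalar" reformulation. A sketch with unit multiplicities, finding Z by bisection so that Σ_j p_j = 1; this is only a fixed-ν solve, not the paper's path-tracking algorithm, and the Zipfian q and uniform u are our own illustrative choices:

```python
import numpy as np

def solve_relaxed_maxent(u, q, nu, iters=200):
    """Solve (1) with m_j = 1 via p_j = clip(u_j / Z, q_j - 1/nu, q_j + 1/nu),
    bisecting on the single scalar Z until sum_j p_j = 1."""
    u = np.asarray(u, dtype=float)
    q = np.asarray(q, dtype=float)
    lo = np.maximum(q - 1.0 / nu, 0.0)   # p_j must also stay nonnegative
    hi = q + 1.0 / nu

    def p_of(Z):
        return np.clip(u / Z, lo, hi)

    # sum(p_of(Z)) is nonincreasing in Z; bisect in log space.
    Zl, Zh = 1e-12, 1e12
    for _ in range(iters):
        Zm = np.sqrt(Zl * Zh)
        if p_of(Zm).sum() > 1.0:
            Zl = Zm
        else:
            Zh = Zm
    Z = np.sqrt(Zl * Zh)
    return p_of(Z), Z

n, nu = 1000, 2000.0
u = np.full(n, 1.0 / n)                  # uniform prior
q = 1.0 / np.arange(1, n + 1)            # Zipfian observed distribution
q /= q.sum()
p, Z = solve_relaxed_maxent(u, q, nu)
```

The resulting p is feasible for (1) and has no larger relative entropy to the prior than q itself, since q is also feasible.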
wordName wordTfidf (topN-words)
[('pj', 0.505), ('qj', 0.433), ('relaxation', 0.299), ('homotopy', 0.231), ('pjn', 0.216), ('dual', 0.148), ('aesthetic', 0.139), ('entropy', 0.119), ('tracking', 0.119), ('multiplier', 0.108), ('lagrangian', 0.1), ('distilled', 0.092), ('mjpj', 0.092), ('mjuje', 0.092), ('pujj', 0.092), ('multiplicity', 0.08), ('piecewise', 0.08), ('reformulated', 0.08), ('distribution', 0.076), ('relaxed', 0.075), ('dimensional', 0.075), ('reformulation', 0.067), ('symmetry', 0.067), ('jx', 0.063), ('equality', 0.059), ('google', 0.057), ('berger', 0.056), ('satisfying', 0.056), ('characterizing', 0.056), ('della', 0.056), ('get', 0.055), ('generalizations', 0.054), ('notable', 0.052), ('path', 0.05), ('reveals', 0.05), ('characterize', 0.048), ('letters', 0.048), ('introducing', 0.047), ('minimizes', 0.046), ('convex', 0.046), ('compact', 0.046), ('cast', 0.044), ('numerous', 0.044), ('complementary', 0.042), ('tackle', 0.041), ('constraints', 0.041), ('dne', 0.04), ('haen', 0.04), ('iasn', 0.04), ('iat', 0.04), ('thre', 0.04), ('jaynes', 0.04), ('dubiner', 0.04), ('hme', 0.04), ('nll', 0.04), ('opf', 0.04), ('paint', 0.04), ('ppj', 0.04), ('qk', 0.04), ('relaxes', 0.04), ('spect', 0.04), ('surfaces', 0.04), ('therein', 0.04), ('tihthe', 0.04), ('vandenberghe', 0.04), ('prior', 0.039), ('principle', 0.038), ('observed', 0.038), ('maximum', 0.037), ('pietra', 0.036), ('fo', 0.036), ('hofe', 0.036), ('merits', 0.036), ('ric', 0.036), ('ur', 0.036), ('qe', 0.036), ('ther', 0.036), ('dee', 0.036), ('zipf', 0.036), ('attaining', 0.036), ('bounds', 0.036), ('earliest', 0.036), ('equate', 0.036), ('integers', 0.036), ('lasso', 0.036), ('uj', 0.036), ('wit', 0.036), ('zipfian', 0.036), ('gibbs', 0.035), ('logistic', 0.035), ('entire', 0.034), ('regression', 0.034), ('derivations', 0.034), ('routine', 0.033), ('thi', 0.033), ('renders', 0.033), ('spelled', 0.033), ('algebraic', 0.033), ('boyd', 0.033), ('multipliers', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 49 emnlp-2011-Entire Relaxation Path for Maximum Entropy Problems
2 0.20232347 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
Author: Yin-Wen Chang ; Michael Collins
Abstract: This paper describes an algorithm for exact decoding of phrase-based translation models, based on Lagrangian relaxation. The method recovers exact solutions, with certificates of optimality, on over 99% of test examples. The method is much more efficient than approaches based on linear programming (LP) or integer linear programming (ILP) solvers: these methods are not feasible for anything other than short sentences. We compare our method to MOSES (Koehn et al., 2007), and give precise estimates of the number and magnitude of search errors that MOSES makes.
Author: Chao Shen ; Tao Li
Abstract: In active dual supervision, not only informative examples but also features are selected for labeling to build a high quality classifier with low cost. However, how to measure the informativeness for both examples and feature on the same scale has not been well solved. In this paper, we propose a non-negative matrix factorization based approach to address this issue. We first extend the matrix factorization framework to explicitly model the corresponding relationships between feature classes and examples classes. Then by making use of the reconstruction error, we propose a unified scheme to determine which feature or example a classifier is most likely to benefit from having labeled. Empirical results demonstrate the effectiveness of our proposed methods.
4 0.10866226 45 emnlp-2011-Dual Decomposition with Many Overlapping Components
Author: Andre Martins ; Noah Smith ; Mario Figueiredo ; Pedro Aguiar
Abstract: Dual decomposition has been recently proposed as a way of combining complementary models, with a boost in predictive power. However, in cases where lightweight decompositions are not readily available (e.g., due to the presence of rich features or logical constraints), the original subgradient algorithm is inefficient. We sidestep that difficulty by adopting an augmented Lagrangian method that accelerates model consensus by regularizing towards the averaged votes. We show how first-order logical constraints can be handled efficiently, even though the corresponding subproblems are no longer combinatorial, and report experiments in dependency parsing, with state-of-the-art results. 1
5 0.068539731 92 emnlp-2011-Minimally Supervised Event Causality Identification
Author: Quang Do ; Yee Seng Chan ; Dan Roth
Abstract: This paper develops a minimally supervised approach, based on focused distributional similarity methods and discourse connectives, for identifying of causality relations between events in context. While it has been shown that distributional similarity can help identifying causality, we observe that discourse connectives and the particular discourse relation they evoke in context provide additional information towards determining causality between events. We show that combining discourse relation predictions and distributional similarity methods in a global inference procedure provides additional improvements towards determining event causality.
6 0.047104385 126 emnlp-2011-Structural Opinion Mining for Graph-based Sentiment Representation
7 0.045156747 100 emnlp-2011-Optimal Search for Minimum Error Rate Training
8 0.03990148 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances
9 0.038480308 129 emnlp-2011-Structured Sparsity in Structured Prediction
10 0.037981167 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction
11 0.030275166 96 emnlp-2011-Multilayer Sequence Labeling
12 0.029656135 5 emnlp-2011-A Fast Re-scoring Strategy to Capture Long-Distance Dependencies
13 0.029289864 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
14 0.028494153 106 emnlp-2011-Predicting a Scientific Communitys Response to an Article
15 0.025604047 81 emnlp-2011-Learning General Connotation of Words using Graph-based Algorithms
16 0.023995325 66 emnlp-2011-Hierarchical Phrase-based Translation Representations
17 0.022616306 143 emnlp-2011-Unsupervised Information Extraction with Distributional Prior Knowledge
18 0.022245869 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
19 0.019649258 3 emnlp-2011-A Correction Model for Word Alignments
20 0.019156408 30 emnlp-2011-Compositional Matrix-Space Models for Sentiment Analysis
topicId topicWeight
[(0, 0.092), (1, -0.013), (2, -0.011), (3, -0.002), (4, 0.099), (5, -0.026), (6, 0.029), (7, -0.284), (8, -0.073), (9, -0.023), (10, -0.01), (11, -0.105), (12, 0.185), (13, -0.229), (14, -0.059), (15, -0.218), (16, -0.026), (17, -0.039), (18, -0.124), (19, -0.051), (20, -0.066), (21, -0.067), (22, -0.01), (23, 0.0), (24, -0.091), (25, 0.078), (26, 0.017), (27, -0.076), (28, 0.012), (29, 0.02), (30, -0.002), (31, -0.009), (32, 0.093), (33, 0.003), (34, -0.086), (35, -0.188), (36, -0.024), (37, -0.089), (38, -0.091), (39, -0.041), (40, -0.006), (41, 0.022), (42, -0.025), (43, 0.001), (44, -0.05), (45, 0.082), (46, 0.061), (47, -0.159), (48, -0.014), (49, 0.045)]
simIndex simValue paperId paperTitle
same-paper 1 0.98487645 49 emnlp-2011-Entire Relaxation Path for Maximum Entropy Problems
2 0.66650432 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
3 0.665335 45 emnlp-2011-Dual Decomposition with Many Overlapping Components
5 0.23940262 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction
Author: Sebastian Riedel ; Andrew McCallum
Abstract: Extracting biomedical events from literature has attracted much recent attention. The bestperforming systems so far have been pipelines of simple subtask-specific local classifiers. A natural drawback of such approaches are cascading errors introduced in early stages of the pipeline. We present three joint models of increasing complexity designed to overcome this problem. The first model performs joint trigger and argument extraction, and lends itself to a simple, efficient and exact inference algorithm. The second model captures correlations between events, while the third model ensures consistency between arguments of the same event. Inference in these models is kept tractable through dual decomposition. The first two models outperform the previous best joint approaches and are very competitive with respect to the current state-of-theart. The third model yields the best results reported so far on the BioNLP 2009 shared task, the BioNLP 2011 Genia task and the BioNLP 2011Infectious Diseases task.
7 0.19302411 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances
8 0.18506069 100 emnlp-2011-Optimal Search for Minimum Error Rate Training
9 0.16968803 126 emnlp-2011-Structural Opinion Mining for Graph-based Sentiment Representation
10 0.15941074 76 emnlp-2011-Language Models for Machine Translation: Original vs. Translated Texts
11 0.15810837 92 emnlp-2011-Minimally Supervised Event Causality Identification
12 0.12830676 67 emnlp-2011-Hierarchical Verb Clustering Using Graph Factorization
13 0.1271947 12 emnlp-2011-A Weakly-supervised Approach to Argumentative Zoning of Scientific Documents
14 0.12447619 112 emnlp-2011-Refining the Notions of Depth and Density in WordNet-based Semantic Similarity Measures
15 0.12123748 143 emnlp-2011-Unsupervised Information Extraction with Distributional Prior Knowledge
16 0.12106523 16 emnlp-2011-Accurate Parsing with Compact Tree-Substitution Grammars: Double-DOP
17 0.11544265 19 emnlp-2011-Approximate Scalable Bounded Space Sketch for Large Data NLP
18 0.11059942 31 emnlp-2011-Computation of Infix Probabilities for Probabilistic Context-Free Grammars
19 0.10970724 39 emnlp-2011-Discovering Morphological Paradigms from Plain Text Using a Dirichlet Process Mixture Model
20 0.10654036 91 emnlp-2011-Literal and Metaphorical Sense Identification through Concrete and Abstract Context
topicId topicWeight
[(23, 0.061), (36, 0.015), (37, 0.033), (45, 0.052), (54, 0.016), (59, 0.448), (62, 0.029), (64, 0.045), (66, 0.029), (69, 0.053), (79, 0.042), (82, 0.037), (90, 0.014), (96, 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.79531223 49 emnlp-2011-Entire Relaxation Path for Maximum Entropy Problems
2 0.24922767 31 emnlp-2011-Computation of Infix Probabilities for Probabilistic Context-Free Grammars
Author: Mark-Jan Nederhof ; Giorgio Satta
Abstract: The notion of infix probability has been introduced in the literature as a generalization of the notion of prefix (or initial substring) probability, motivated by applications in speech recognition and word error correction. For the case where a probabilistic context-free grammar is used as language model, methods for the computation of infix probabilities have been presented in the literature, based on various simplifying assumptions. Here we present a solution that applies to the problem in its full generality.
3 0.24911214 109 emnlp-2011-Random Walk Inference and Learning in A Large Scale Knowledge Base
Author: Ni Lao ; Tom Mitchell ; William W. Cohen
Abstract: t om . We consider the problem of performing learning and inference in a large scale knowledge base containing imperfect knowledge with incomplete coverage. We show that a soft inference procedure based on a combination of constrained, weighted, random walks through the knowledge base graph can be used to reliably infer new beliefs for the knowledge base. More specifically, we show that the system can learn to infer different target relations by tuning the weights associated with random walks that follow different paths through the graph, using a version of the Path Ranking Algorithm (Lao and Cohen, 2010b). We apply this approach to a knowledge base of approximately 500,000 beliefs extracted imperfectly from the web by NELL, a never-ending language learner (Carlson et al., 2010). This new system improves significantly over NELL’s earlier Horn-clause learning and inference method: it obtains nearly double the precision at rank 100, and the new learning method is also applicable to many more inference tasks.
4 0.2472088 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
5 0.24595137 73 emnlp-2011-Improving Bilingual Projections via Sparse Covariance Matrices
Author: Jagadeesh Jagarlamudi ; Raghavendra Udupa ; Hal Daume III ; Abhijit Bhole
Abstract: Mapping documents into an interlingual representation can help bridge the language barrier of cross-lingual corpora. Many existing approaches are based on word co-occurrences extracted from aligned training data, represented as a covariance matrix. In theory, such a covariance matrix should represent semantic equivalence, and should be highly sparse. Unfortunately, the presence of noise leads to dense covariance matrices which in turn leads to suboptimal document representations. In this paper, we explore techniques to recover the desired sparsity in covariance matrices in two ways. First, we explore word association measures and bilingual dictionaries to weigh the word pairs. Later, we explore different selection strategies to remove the noisy pairs based on the association scores. Our experimental results on the task of aligning comparable documents shows the efficacy of sparse covariance matrices on two data sets from two different language pairs.
6 0.24155566 66 emnlp-2011-Hierarchical Phrase-based Translation Representations
7 0.23919272 107 emnlp-2011-Probabilistic models of similarity in syntactic context
8 0.23517497 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction
9 0.2345956 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
10 0.23270851 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
11 0.23254959 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing
12 0.23114096 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
13 0.22965343 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming
14 0.22832111 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
15 0.22684869 45 emnlp-2011-Dual Decomposition with Many Overlapping Components
16 0.22658816 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation
17 0.22628374 128 emnlp-2011-Structured Relation Discovery using Generative Models
18 0.22622994 111 emnlp-2011-Reducing Grounded Learning Tasks To Grammatical Inference
19 0.22525957 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
20 0.22520767 68 emnlp-2011-Hypotheses Selection Criteria in a Reranking Framework for Spoken Language Understanding