acl acl2010 acl2010-243 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yang Liu ; Liang Huang
Abstract: unkown-abstract
Reference: text
sentIndex sentText sentNum sentScore
1 1 Introduction The past several years have witnessed rapid advances in syntax-based machine translation, which exploits natural language syntax to guide translation. [sent-3, score-0.329]
2 Depending on the type of input, most of these efforts can be divided into two broad categories: (a) string-based systems whose input is a string, which is simultaneously parsed and translated by a synchronous grammar (Wu, 1997; Chiang, 2005; Galley et al. [sent-4, score-0.377]
3 , 2006), and (b) tree-based systems whose input is already a parse tree to be directly converted into a target tree or string (Lin, 2004; Ding and Palmer, 2005; Quirk et al. [sent-5, score-0.421]
4 Compared with their string-based counterparts, tree-based systems offer many attractive features: they are much faster in decoding (linear time vs. [sent-9, score-0.656]
5 cubic time), do not require sophisticated binarization (Zhang et al. [sent-10, score-0.207]
6 , 2006), and can use separate grammars for parsing and translation (e. [sent-11, score-0.383]
7 g., a context-free grammar for the former and a tree substitution grammar for the latter). [sent-13, score-0.344]
8 However, despite these advantages, most tree-based systems suffer from a major drawback: they only use 1-best parse trees to direct translation, which potentially introduces translation mistakes due to parsing errors (Quirk and Corston-Oliver, 2006). [sent-14, score-0.996]
9 This situation becomes worse for resource-poor source languages without enough treebank data to train a high-accuracy parser. [sent-15, score-0.049]
10 This problem can be alleviated elegantly by using packed forests (Huang, 2008), which encode exponentially many parse trees in polynomial space (a minimal forest sketch is given after this sentence list). [sent-16, score-1.059]
11 , 2008; Mi and Huang, 2008) thus take a packed forest instead of a parse tree as an input. [sent-18, score-0.698]
12 In addition, packed forests can also be used for translation rule extraction, which helps alleviate the propagation of parsing errors into the rule set. [sent-19, score-1.305]
13 Forest-based translation can be regarded as a compromise between the string-based and tree-based methods, while combining the advantages of both: [sent-20, score-0.435]
14 decoding is still fast, yet does not commit to a single parse. [sent-21, score-0.647]
15 Surprisingly, translating a forest of millions of trees is even faster than translating 30 individual trees, and offers significantly better translation quality. [sent-22, score-1.078]
16 2 Content Overview This tutorial surveys tree-based and forest-based translation methods. [sent-24, score-0.465]
17 For each approach, we will discuss the two fundamental tasks: decoding, which performs the actual translation, and rule extraction, which learns translation rules from real-world data automatically. [sent-25, score-0.603]
18 Finally, we will introduce some more recent developments in tree-based and forest-based translation, such as tree-sequence-based models, tree-to-tree models, joint parsing and translation, and faster decoding algorithms. [sent-26, score-1.125]
19 We will conclude our talk by pointing out some directions for future work. [sent-27, score-0.165]
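The introduction above rests on two concrete points: a packed forest is a hypergraph that stores exponentially many parse trees in polynomial space, and extracting the best tree from it is cheap. The Python sketch below is illustrative only and is not taken from the tutorial or from any cited system; the Forest class, the node labels, spans, and scores are all hypothetical, and a real forest-based decoder would additionally apply translation rules with pruning (e.g., cube pruning) rather than plain Viterbi tree extraction.

    # Hypothetical toy packed forest: a hypergraph whose nodes share subtrees,
    # so polynomially many nodes/hyperedges can encode exponentially many trees.
    from collections import defaultdict

    class Forest:
        def __init__(self):
            # node -> list of incoming hyperedges; each hyperedge is (tail_nodes, score)
            self.incoming = defaultdict(list)

        def add_edge(self, head, tails, score):
            self.incoming[head].append((tuple(tails), score))

        def count_trees(self, node):
            """Number of distinct trees rooted at `node` (leaves count as 1)."""
            edges = self.incoming[node]
            if not edges:
                return 1
            total = 0
            for tails, _ in edges:
                prod = 1
                for t in tails:
                    prod *= self.count_trees(t)
                total += prod
            return total

        def viterbi(self, node):
            """Return (best score, best tree) rooted at `node`, summing hyperedge scores."""
            edges = self.incoming[node]
            if not edges:
                return 0.0, node
            best = None
            for tails, score in edges:
                subs = [self.viterbi(t) for t in tails]
                total = score + sum(s for s, _ in subs)
                tree = (node, [t for _, t in subs])
                if best is None or total > best[0]:
                    best = (total, tree)
            return best

    if __name__ == "__main__":
        f = Forest()
        # Two readings of a PP-attachment ambiguity share the same leaf nodes.
        f.add_edge("VP[1,5]", ["V[1,2]", "NP[2,3]", "PP[3,5]"], score=1.0)  # high attachment
        f.add_edge("NP[2,5]", ["NP[2,3]", "PP[3,5]"], score=0.5)
        f.add_edge("VP[1,5]", ["V[1,2]", "NP[2,5]"], score=0.8)            # low attachment
        print(f.count_trees("VP[1,5]"))  # -> 2 trees packed into one small hypergraph
        print(f.viterbi("VP[1,5]"))      # -> (1.3, ...): the low-attachment reading wins

In this toy example the two readings share their leaf nodes; that sharing is what lets a forest encoding millions of trees stay compact enough to translate faster than a short list of individual trees.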
wordName wordTfidf (topN-words)
[('decoding', 0.355), ('packed', 0.316), ('translation', 0.301), ('treebased', 0.202), ('faster', 0.191), ('forest', 0.182), ('huang', 0.173), ('forests', 0.153), ('quirk', 0.133), ('rule', 0.116), ('elegantly', 0.11), ('lhuang', 0.11), ('yliu', 0.11), ('alleviated', 0.11), ('troduce', 0.11), ('bining', 0.11), ('tree', 0.106), ('tutorial', 0.105), ('advantages', 0.103), ('overview', 0.102), ('trees', 0.102), ('mi', 0.098), ('translating', 0.097), ('realworld', 0.095), ('parse', 0.094), ('asscolcia', 0.09), ('cgeom', 0.09), ('jtuulytor', 0.09), ('witnessed', 0.085), ('binarization', 0.085), ('parsing', 0.082), ('pra', 0.082), ('sciences', 0.08), ('developments', 0.079), ('motivations', 0.079), ('commit', 0.079), ('compromise', 0.076), ('cubic', 0.076), ('ofthese', 0.076), ('liu', 0.075), ('sweden', 0.074), ('mistakes', 0.072), ('southern', 0.072), ('drawback', 0.072), ('uppsala', 0.072), ('grammar', 0.071), ('pointing', 0.07), ('extraction', 0.069), ('propagation', 0.066), ('string', 0.066), ('attractive', 0.065), ('af', 0.065), ('academy', 0.065), ('exponentially', 0.065), ('alleviate', 0.063), ('rapid', 0.062), ('ding', 0.062), ('counterparts', 0.062), ('galley', 0.059), ('sc', 0.059), ('surveys', 0.059), ('regarded', 0.058), ('talk', 0.057), ('polynomial', 0.057), ('millions', 0.056), ('institute', 0.056), ('synchronous', 0.055), ('substitution', 0.055), ('offers', 0.052), ('encodes', 0.052), ('exploits', 0.052), ('suffer', 0.052), ('surprisingly', 0.051), ('chiang', 0.051), ('situation', 0.049), ('converted', 0.049), ('pruning', 0.049), ('errors', 0.049), ('efforts', 0.048), ('learns', 0.047), ('palmer', 0.046), ('sophisticated', 0.046), ('ct', 0.046), ('extensions', 0.046), ('offer', 0.045), ('california', 0.044), ('fundamental', 0.044), ('helps', 0.043), ('guide', 0.043), ('simultaneously', 0.043), ('liang', 0.043), ('broad', 0.042), ('introduces', 0.042), ('translated', 0.042), ('past', 0.041), ('former', 0.041), ('fo', 0.04), ('yang', 0.039), ('popular', 0.039), ('conclude', 0.038)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 243 acl-2010-Tree-Based and Forest-Based Translation
Author: Yang Liu ; Liang Huang
Abstract: unkown-abstract
2 0.28440502 69 acl-2010-Constituency to Dependency Translation with Forests
Author: Haitao Mi ; Qun Liu
Abstract: Tree-to-string systems (and their forest-based extensions) have gained steady popularity thanks to their simplicity and efficiency, but there is a major limitation: they are unable to guarantee the grammaticality of the output, which is explicitly modeled in string-to-tree systems via target-side syntax. We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality. Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with fewer rules. This is also the first time that a tree-to-tree model can surpass tree-to-string counterparts.
3 0.26038891 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
Author: Xianchao Wu ; Takuya Matsuzaki ; Jun'ichi Tsujii
Abstract: Tree-to-string translation rules are widely used in linguistically syntax-based statistical machine translation systems. In this paper, we propose to use deep syntactic information for obtaining fine-grained translation rules. A head-driven phrase structure grammar (HPSG) parser is used to obtain the deep syntactic information, which includes a fine-grained description of the syntactic property and a semantic representation of a sentence. We extract fine-grained rules from aligned HPSG tree/forest-string pairs and use them in our tree-to-string and string-to-tree systems. Extensive experiments on large-scale bidirectional Japanese-English translations testified to the effectiveness of our approach.
4 0.21988545 71 acl-2010-Convolution Kernel over Packed Parse Forest
Author: Min Zhang ; Hui Zhang ; Haizhou Li
Abstract: This paper proposes a convolution forest kernel to effectively explore rich structured features embedded in a packed parse forest. As opposed to the convolution tree kernel, the proposed forest kernel does not have to commit to a single best parse tree, and is thus able to explore very large object spaces and much more structured features embedded in a forest. This makes the proposed kernel more robust against parsing errors and data sparseness issues than the convolution tree kernel. The paper presents the formal definition of the convolution forest kernel and also illustrates the algorithm to efficiently compute the proposed convolution forest kernel. Experimental results on two NLP applications, relation extraction and semantic role labeling, show that the proposed forest kernel significantly outperforms the baseline of the convolution tree kernel.
5 0.21571934 169 acl-2010-Learning to Translate with Source and Target Syntax
Author: David Chiang
Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a simple approach that uses both source and target syntax for significant improvements in translation accuracy.
6 0.20147851 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
7 0.15364327 54 acl-2010-Boosting-Based System Combination for Machine Translation
8 0.12914643 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
9 0.12820192 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future
10 0.12192748 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
11 0.12014322 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
13 0.10848286 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
14 0.10623505 133 acl-2010-Hierarchical Search for Word Alignment
15 0.10163417 31 acl-2010-Annotation
16 0.10162408 260 acl-2010-Wide-Coverage NLP with Linguistically Expressive Grammars
17 0.098400764 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
18 0.097624779 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction
19 0.096080638 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
20 0.091739014 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar
topicId topicWeight
[(0, -0.208), (1, -0.243), (2, 0.05), (3, 0.008), (4, -0.128), (5, -0.074), (6, 0.126), (7, 0.022), (8, -0.258), (9, -0.009), (10, 0.124), (11, -0.058), (12, 0.143), (13, -0.02), (14, -0.016), (15, 0.125), (16, -0.067), (17, -0.009), (18, -0.039), (19, 0.009), (20, 0.091), (21, -0.04), (22, -0.192), (23, -0.088), (24, -0.156), (25, 0.014), (26, 0.02), (27, 0.043), (28, 0.051), (29, -0.071), (30, -0.07), (31, -0.116), (32, -0.074), (33, -0.045), (34, -0.077), (35, -0.17), (36, -0.033), (37, 0.082), (38, 0.085), (39, 0.047), (40, -0.026), (41, 0.009), (42, -0.009), (43, -0.075), (44, 0.008), (45, 0.005), (46, -0.058), (47, 0.017), (48, 0.001), (49, 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.96689606 243 acl-2010-Tree-Based and Forest-Based Translation
Author: Yang Liu ; Liang Huang
Abstract: unkown-abstract
2 0.81202203 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
Author: Xianchao Wu ; Takuya Matsuzaki ; Jun'ichi Tsujii
Abstract: Tree-to-string translation rules are widely used in linguistically syntax-based statistical machine translation systems. In this paper, we propose to use deep syntactic information for obtaining fine-grained translation rules. A head-driven phrase structure grammar (HPSG) parser is used to obtain the deep syntactic information, which includes a fine-grained description of the syntactic property and a semantic representation of a sentence. We extract fine-grained rules from aligned HPSG tree/forest-string pairs and use them in our tree-to-string and string-to-tree systems. Extensive experiments on large-scale bidirectional Japanese-English translations testified to the effectiveness of our approach.
3 0.73510814 69 acl-2010-Constituency to Dependency Translation with Forests
Author: Haitao Mi ; Qun Liu
Abstract: Tree-to-string systems (and their forest-based extensions) have gained steady popularity thanks to their simplicity and efficiency, but there is a major limitation: they are unable to guarantee the grammaticality of the output, which is explicitly modeled in string-to-tree systems via target-side syntax. We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality. Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with fewer rules. This is also the first time that a tree-to-tree model can surpass tree-to-string counterparts.
4 0.70792365 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
Author: Chris Dyer ; Adam Lopez ; Juri Ganitkevitch ; Jonathan Weese ; Ferhan Ture ; Phil Blunsom ; Hendra Setiawan ; Vladimir Eidelman ; Philip Resnik
Abstract: We present cdec, an open source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models based on synchronous context-free grammars. Using a single unified internal representation for translation forests, the decoder strictly separates model-specific translation logic from general rescoring, pruning, and inference algorithms. From this unified representation, the decoder can extract not only the 1- or k-best translations, but also alignments to a reference, or the quantities necessary to drive discriminative training using gradient-based or gradient-free optimization techniques. Its efficient C++ implementation means that memory use and runtime performance are significantly better than comparable decoders.
5 0.5485068 169 acl-2010-Learning to Translate with Source and Target Syntax
Author: David Chiang
Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a simple approach that uses both source and target syntax for significant improvements in translation accuracy.
6 0.52836406 71 acl-2010-Convolution Kernel over Packed Parse Forest
7 0.49998057 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
8 0.48384684 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
9 0.47726303 54 acl-2010-Boosting-Based System Combination for Machine Translation
10 0.47050813 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
11 0.4411158 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
12 0.40784261 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web
13 0.40729949 260 acl-2010-Wide-Coverage NLP with Linguistically Expressive Grammars
15 0.39651409 67 acl-2010-Computing Weakest Readings
16 0.39245802 98 acl-2010-Efficient Staggered Decoding for Sequence Labeling
17 0.39155838 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
18 0.38501504 97 acl-2010-Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices
19 0.3610667 21 acl-2010-A Tree Transducer Model for Synchronous Tree-Adjoining Grammars
20 0.35021076 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
topicId topicWeight
[(25, 0.13), (42, 0.011), (44, 0.473), (59, 0.111), (73, 0.023), (78, 0.029), (83, 0.065), (98, 0.067)]
simIndex simValue paperId paperTitle
same-paper 1 0.8827495 243 acl-2010-Tree-Based and Forest-Based Translation
Author: Yang Liu ; Liang Huang
Abstract: unkown-abstract
2 0.74701309 165 acl-2010-Learning Script Knowledge with Web Experiments
Author: Michaela Regneri ; Alexander Koller ; Manfred Pinkal
Abstract: We describe a novel approach to unsupervised learning of the events that make up a script, along with constraints on their temporal ordering. We collect natural-language descriptions of script-specific event sequences from volunteers over the Internet. Then we compute a graph representation of the script’s temporal structure using a multiple sequence alignment algorithm. The evaluation of our system shows that we outperform two informed baselines.
3 0.70705408 210 acl-2010-Sentiment Translation through Lexicon Induction
Author: Christian Scheible
Abstract: The translation of sentiment information is a task from which sentiment analysis systems can benefit. We present a novel, graph-based approach using SimRank, a well-established vertex similarity algorithm to transfer sentiment information between a source language and a target language graph. We evaluate this method in comparison with SO-PMI.
4 0.46110272 86 acl-2010-Discourse Structure: Theory, Practice and Use
Author: Bonnie Webber ; Markus Egg ; Valia Kordoni
Abstract: unkown-abstract
5 0.42735898 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future
Author: Rohit J. Kate ; Yuk Wah Wong
Abstract: unkown-abstract
6 0.40800485 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking
7 0.39712381 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features
8 0.39644843 31 acl-2010-Annotation
9 0.39328057 69 acl-2010-Constituency to Dependency Translation with Forests
10 0.38748339 260 acl-2010-Wide-Coverage NLP with Linguistically Expressive Grammars
11 0.38272268 71 acl-2010-Convolution Kernel over Packed Parse Forest
12 0.38058174 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection
13 0.37915736 225 acl-2010-Temporal Information Processing of a New Language: Fast Porting with Minimal Resources
14 0.37405825 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
15 0.37352899 106 acl-2010-Event-Based Hyperspace Analogue to Language for Query Expansion
16 0.3727026 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System
17 0.37260208 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
18 0.3722398 44 acl-2010-BabelNet: Building a Very Large Multilingual Semantic Network
19 0.3692334 224 acl-2010-Talking NPCs in a Virtual Game World
20 0.36919156 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar