emnlp emnlp2010 emnlp2010-14 knowledge-graph by maker-knowledge-mining

14 emnlp-2010-A Tree Kernel-Based Unified Framework for Chinese Zero Anaphora Resolution


Source: pdf

Author: Fang Kong ; Guodong Zhou

Abstract: This paper proposes a unified framework for zero anaphora resolution, which can be divided into three sub-tasks: zero anaphor detection, anaphoricity determination and antecedent identification. In particular, all the three sub-tasks are addressed using tree kernel-based methods with appropriate syntactic parse tree structures. Experimental results on a Chinese zero anaphora corpus show that the proposed tree kernel-based methods significantly outperform the feature-based ones. This indicates the critical role of the structural information in zero anaphora resolution and the necessity of tree kernel-based methods in modeling such structural information. To our best knowledge, this is the first systematic work dealing with all the three sub-tasks in Chinese zero anaphora resolution via a unified framework. Moreover, we release a Chinese zero anaphora corpus of 100 documents, which adds a layer of annotation to the manu- ally-parsed sentences in the Chinese Treebank (CTB) 6.0.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 This paper proposes a unified framework for zero anaphora resolution, which can be divided into three sub-tasks: zero anaphor detection, anaphoricity determination and antecedent identification. [sent-3, score-2.385]

2 Experimental results on a Chinese zero anaphora corpus show that the proposed tree kernel-based methods significantly outperform the feature-based ones. [sent-5, score-0.9]

3 This indicates the critical role of the structural information in zero anaphora resolution and the necessity of tree kernel-based methods in modeling such structural information. [sent-6, score-1.188]

4 To the best of our knowledge, this is the first systematic work dealing with all the three sub-tasks in Chinese zero anaphora resolution via a unified framework. [sent-7, score-0.994]

5 This indicates the prevalence of zero anaphors in Chinese and the necessity of zero anaphora resolution in Chinese anaphora resolution. [sent-24, score-2.209]

6 Since zero anaphors give few hints (e.g. number or gender) about their possible antecedents, zero anaphora resolution is much more challenging than traditional anaphora resolution. [sent-27, score-1.346]

7 Although Chinese zero anaphora has been widely studied in the linguistics research (Li and Thompson 1979; Li 2004), only a small body of prior work in computational linguistics deals with Chinese zero anaphora resolution (Converse 2006; Zhao and Ng 2007). [sent-28, score-1.773]

8 Moreover, zero anaphor detection, as a critical component for real applications of zero anaphora resolution, has been largely ignored. [sent-29, score-1.77]

9 This is done by assigning anaphoric/non-anaphoric zero anaphora labels to the null constituents in a parse tree. [sent-36, score-0.879]

10 Finally, this paper illustrates the critical role of the structural information in zero anaphora resolution and the necessity of tree kernel-based methods in modeling such structural information. [sent-37, score-1.187]

11 Section 2 briefly describes the related work on both zero anaphora resolution and tree kernelbased anaphora resolution. [sent-39, score-1.457]

12 Section 3 introduces the overwhelming problem of zero anaphora in Chinese and our developed Chinese zero anaphora corpus, which is available for research purposes. [sent-40, score-1.624]

13 Section 4 presents our tree kernel-based unified framework in zero anaphora resolution. [sent-41, score-0.939]

14 2 Related Work This section briefly overviews the related work on both zero anaphora resolution and tree kernelbased anaphora resolution. [sent-44, score-1.457]

15 1 Zero anaphora resolution Although zero anaphors are prevalent in many languages, such as Chinese, Japanese and Spanish, there are only a few works on zero anaphora resolution. [sent-46, score-2.156]

16 Zero anaphora resolution in Chinese Converse (2006) developed a Chinese zero anaphora corpus which only deals with zero anaphora category “-NONE- *pro*” for dropped subjects/objects and ignores other categories, such as “-NONE- *PRO*” for non-overt subjects in nonfinite clauses. [sent-47, score-2.198]

17 Besides, Converse (2006) proposed a rule-based method to resolve the anaphoric zero anaphors only. [sent-48, score-0.961]

18 The method did not consider zero anaphor detection and anaphoric identification, and performed zero anaphora resolution using the Hobbs algorithm (Hobbs, 1978), assuming the availability of golden anaphoric zero anaphors and golden parse trees. [sent-49, score-3.318]

19 Instead, Zhao and Ng (2007) proposed feature-based methods for zero anaphora resolution on the same corpus from Converse (2006). [sent-50, score-1.002]

20 However, they only considered zero anaphors with explicit noun phrase referents and discarded those with split antecedents or referring to events. [sent-51, score-0.879]

21 For zero anaphor detection, a simple heuristic rule was employed. [sent-53, score-0.959]

22 Although this rule can recover almost all the zero anaphors, it suffers from very low precision by introducing too many false zero anaphors and thus leads to low performance in anaphoricity determination, much due to the imbalance between positive and negative training examples. [sent-54, score-1.507]

23 They did not perform zero anaphor detection, assuming the availability of golden zero anaphors. [sent-57, score-1.456]

24 However, they assumed that zero anaphors were already detected and each zero anaphor’s grammatical case was already determined by a zero anaphor detector. [sent-60, score-2.156]

25 (2006) explored a machine learning method for the sub-task of antecedent identification using rich syntactic pattern features, assuming the availability of golden anaphoric zero anaphors. [sent-62, score-0.929]

26 (2008) proposed a fully-lexicalized probabilistic model for zero anaphora resolution, which estimated case assignments for the overt case components and the antecedents of zero anaphors simultaneously. [sent-64, score-1.66]

27 For Japanese zero anaphora, we do not see any reports about zero anaphora categories. [sent-66, score-1.209]

28 Moreover, all the above related works we can find on Japanese zero anaphora resolution ignore zero anaphor detection, focusing on either anaphoricity determination or antecedent identification. [sent-67, score-2.495]

29 Zero anaphora resolution in Spanish As the only work we can find, Ferrandez and Peral (2000) proposed a hand-engineered rule-based method for both anaphoricity determination and antecedent identification. [sent-70, score-1.149]

30 (2008) proposed a dynamic-expansion scheme to automatically construct a proper parse tree structure for anaphora resolution of pronouns by taking predicate- and antecedent competitor-related information into consideration. [sent-89, score-0.921]

31 Evaluation on the ACE 2003 corpus showed that the dynamic-expansion scheme can well cover 884 necessary structural information in the parse tree for anaphora resolution of pronouns and the context-sensitive convolution tree kernel much outperformed other tree kernels. [sent-91, score-0.999]
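The convolution tree kernel these results rest on counts the subtree fragments two parse trees share. Below is a minimal sketch of the classic Collins-Duffy subset-tree kernel, a simpler relative of the context-sensitive variant discussed here; the nested-tuple tree encoding and the decay value `LAMBDA` are illustrative assumptions, not the paper's implementation.

```python
# Collins-Duffy subset-tree kernel sketch. Trees are nested tuples:
# (label, child1, child2, ...); a leaf word is a bare string.
LAMBDA = 0.5  # decay factor down-weighting large subtree fragments

def production(node):
    """The CFG production rooted at a node, e.g. ('NP', 'N')."""
    return (node[0],) + tuple(c if isinstance(c, str) else c[0]
                              for c in node[1:])

def nodes(tree):
    """All internal nodes of a tree, in pre-order."""
    result = [tree]
    for child in tree[1:]:
        if not isinstance(child, str):
            result.extend(nodes(child))
    return result

def delta(n1, n2):
    """Weighted count of common subtree fragments rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = LAMBDA
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):      # string children contribute nothing extra
            score *= 1.0 + delta(c1, c2)
    return score

def tree_kernel(t1, t2):
    """K(T1, T2) = sum of delta over all internal node pairs."""
    return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))
```

Plugged into an SVM in place of a dot product, this lets the classifier compare the structured parse-tree instances directly, which is the role tree kernels play in all three sub-tasks below.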

32 3 Task Definition This section introduces the phenomenon of zero anaphora in Chinese and our developed Chinese zero anaphora corpus. [sent-92, score-1.626]

33 1 Zero anaphora in Chinese A zero anaphor is a gap in a sentence, which refers to an entity that supplies the necessary information for interpreting the gap. [sent-94, score-1.322]

34 A zero anaphor can be classified into either anaphoric or non-anaphoric, depending on whether it has an antecedent in the discourse. [sent-99, score-1.147]

35 Typically, a zero anaphor is non-anaphoric when it refers to an extralinguistic entity (e. [sent-100, score-0.935]

36 Among the four anaphors in Figure 1, zero anaphors Φ1 and Φ4 are non-anaphoric while zero anaphors Φ2 and Φ3 are anaphoric, referring to noun phrase “建筑行为/building action” and noun phrase “新区管委会/new district managing committee” respectively. [sent-103, score-2.11]

37 Chinese zero anaphora resolution is very difficult due to the following reasons: 1) Zero anaphors give few hints (e. [sent-104, score-1.374]

38 2) A zero anaphor can be either anaphoric or non-anaphoric. [sent-108, score-1.086]

39 This indicates the necessity of zero anaphor detection, which has been largely ignored in previous research and has proved to be difficult in our later experiments. [sent-113, score-1.001]

40 2 Zero anaphora corpus in Chinese Due to the lack of an available zero anaphora corpus for research purposes, we develop a Chinese zero anaphora corpus of 100 documents from CTB 6.0. [sent-117, score-2.041]

41 We hope the public availability of this corpus can advance research on zero anaphora resolution in Chinese and other languages. [sent-119, score-0.984]

42 (AZA and ZA indicate anaphoric zero anaphor and zero anaphor, respectively.) Figure 2 illustrates an example sentence annotated in CTB 6.0. [sent-120, score-2.441]

43 In our developed corpus, we annotate anaphoric zero anaphors using the null constituents with the special tag “-NONE-”. [sent-122, score-1.006]

44 This suggests the importance of anaphoricity determination in zero anaphora resolution. [sent-126, score-1.187]

45 Table 3 further shows that, among 712 anaphoric zero anaphors, 598 (84%) are intra-sentential and no anaphoric zero anaphors have their antecedents occurring two or more sentences before (Table 3: distribution of sentence distances between anaphoric zero anaphors and their antecedents). [sent-127, score-1.551]

46 For a non-anaphoric zero anaphor, we replace the null constituent with “E-i NZA”, where i indicates the category of zero anaphora, with “1” referring to “-NONE- *T*” etc. [sent-129, score-0.883]

47 4 Tree Kernel-based Framework This section presents the tree kernel-based unified framework for all the three sub-tasks in zero anaphora resolution. [sent-131, score-0.939]

48 In the tree kernel-based framework, we perform the three sub-tasks (zero anaphor detection, anaphoricity determination and antecedent identification) in a pipeline manner. [sent-135, score-1.671]

49 That is, given a zero anaphor candidate Z, the zero anaphor detector is first called to determine whether Z is a zero anaphor or not. [sent-136, score-2.84]
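The pipeline described above can be sketched as follows. The classifier objects (`detector`, `anaphoricity_clf`, `antecedent_clf`) are hypothetical stand-ins for the paper's tree kernel-based SVM classifiers, and the method names are illustrative.

```python
# Three-stage pipeline sketch: detection -> anaphoricity -> antecedent.
def resolve(candidate, detector, anaphoricity_clf, antecedent_clf, antecedents):
    """Run one zero anaphor candidate Z through the pipeline."""
    if not detector.is_zero_anaphor(candidate):
        return None              # stage 1: Z is not a zero anaphor
    if not anaphoricity_clf.is_anaphoric(candidate):
        return None              # stage 2: non-anaphoric, nothing to resolve
    # stage 3: pick the antecedent candidate the classifier scores highest
    return max(antecedents, key=lambda a: antecedent_clf.score(candidate, a))
```

Because each stage gates the next, errors in detection propagate downstream, which is exactly the degradation the experiments in Section 5 quantify.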

50 1 Zero anaphor detection At first glance, it seems that a zero anaphor can occur between any two constituents in a parse tree. [sent-143, score-1.575]

51 Fortunately, an exploration of our corpus shows that a zero anaphor always occurs just before a predicate phrase node (e. [sent-144, score-0.992]

52 This phenomenon has also been employed in Zhao and Ng (2007) in generating zero anaphor candidates. [sent-147, score-0.963]

53 As shown in Figure 1, zero anaphors may occur immediately to the left of 规范/guide, 防止/avoid, 出现/appear, 根据/according to, 结合/combine, 出台/promulgate, which cover the four true zero anaphors. [sent-149, score-1.221]

54 Therefore, applying the above heuristic rule to generate zero anaphor candidates is simple but reliable. [sent-150, score-0.949]
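The heuristic above (propose a gap immediately before every predicate phrase node) can be sketched over a nested-tuple parse tree. Treating nodes whose label starts with `VP` as predicate phrases, and the tuple encoding itself, are both illustrative assumptions; the paper also covers noun and preposition predicates.

```python
# Candidate generation sketch: a zero anaphor gap is proposed just before
# every predicate phrase node. Trees are (label, child, ...) tuples with
# string leaves.
def candidate_positions(tree, path=()):
    """Return tree paths at which a zero anaphor candidate is proposed."""
    positions = []
    for i, child in enumerate(tree[1:]):
        if isinstance(child, str):
            continue                       # skip lexical leaves
        if child[0].startswith('VP'):      # gap just before a predicate phrase
            positions.append(path + (i,))
        positions.extend(candidate_positions(child, path + (i,)))
    return positions
```

The returned paths index child positions from the root, so each candidate can later be mapped back to a gap site in the sentence.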

55 Given a zero anaphor candidate, it is critical to construct a proper parse tree structure for tree kernel-based zero anaphor detection. [sent-151, score-2.125]

56 The intuition behind our parse tree structure for zero anaphor detection is to keep the competitive information (the predicate in Chinese can be categorized into verb predicate, noun predicate and preposition predicate). [sent-152, score-1.181]

57 about the predicate phrase node and the zero anaphor candidate as much as possible. [sent-155, score-1.041]

58 Figure 4 shows an example of the parse tree structure corresponding to Figure 1 with the zero anaphor candidate Φ2 in consideration. [sent-158, score-1.093]

59 During training, if a zero anaphor candidate has a counterpart in the same position in the golden standard corpus (either anaphoric or non-anaphoric), a positive instance is generated. [sent-159, score-1.212]

60 During testing, each zero anaphor candidate is presented to the learned zero anaphor detector to determine whether it is a zero anaphor or not. [sent-161, score-2.84]

61 Besides, since a zero anaphor candidate is generated when a predicate phrase node appears, there may be two or more zero anaphor candidates in the same position. [sent-162, score-1.976]

62 However, there is normally only one zero anaphor in a given position. [sent-163, score-0.935]

63 Therefore, if multiple zero anaphor candidates occur in the same position, we select the one with maximal confidence as the zero anaphor in that position and ignore the others. [sent-164, score-1.87]
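The collision rule above can be sketched as a per-position argmax; the confidence values here are hypothetical detector scores.

```python
# Dedupe sketch: keep only the highest-confidence candidate per position.
def dedupe_by_position(scored):
    """scored: iterable of (position, candidate, confidence) triples."""
    best = {}
    for pos, cand, conf in scored:
        if pos not in best or conf > best[pos][1]:
            best[pos] = (cand, conf)
    return {pos: cand for pos, (cand, _) in best.items()}
```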

64 Figure 4: An example parse tree structure for zero anaphor detection with the predicate phrase node and the zero anaphor candidate Φ2 in black. [sent-165, score-2.171]

65 2 Anaphoricity determination To determine whether a zero anaphor is anaphoric or not, we limit the parse tree structure to the span between the previous predicate phrase node and the following predicate phrase node. [sent-166, score-1.54]

66 Figure 5 illustrates an example of the parse tree structure for anaphoricity determination, corresponding to Figure 1 with the zero anaphor Φ2 in consideration. [sent-168, score-1.32]

67 Figure 5: An example parse tree structure for anaphoricity determination with the zero anaphor Φ2 in consideration. [sent-169, score-1.48]

68 Figure 6 illustrates an example parse tree structure for antecedent identification, corresponding to Figure 1 with the anaphoric zero anaphor Φ2 and the antecedent candidate “建筑行为/building action” in consideration. [sent-172, score-1.684]

69 Figure 6: An example parse tree structure for antecedent identification with the anaphoric zero anaphor Φ2 and the antecedent candidate “建筑行为/building action” in consideration. In this paper, we adopt a similar procedure to that of Soon et al. [sent-173, score-1.713]

70 Besides, since all the anaphoric zero anaphors have their antecedents at most one sentence away, we only consider antecedent candidates which are at most one sentence away. [sent-175, score-1.201]
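The distance constraint above can be sketched as a filter over mentions; representing a mention as a hypothetical (text, sentence_index) pair:

```python
# Antecedent candidate filtering sketch: since all annotated antecedents
# are at most one sentence away, drop candidates further back.
def filter_candidates(anaphor_sent, mentions):
    """Keep mentions in the same or the immediately preceding sentence."""
    return [m for m in mentions if 0 <= anaphor_sent - m[1] <= 1]
```

Restricting the candidate window this way keeps the antecedent classifier's search space small without losing any gold antecedents on this corpus.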

71 5 Experimentation and Discussion We have systematically evaluated our tree kernelbased unified framework on our developed Chinese zero anaphora corpus, as described in Section 3. [sent-178, score-0.975]

72 Besides, in order to focus on zero anaphora resolution itself and compare with related work, all the experiments are done on golden parse trees provided by CTB 6.0. [sent-180, score-1.244]

73 1 Experimental results Zero anaphor detection Table 4 gives the performance of zero anaphor detection, which achieves 70. [sent-184, score-1.524]

74 Table 4: Performance of zero anaphor detection. Anaphoricity determination: Table 5 gives the performance of anaphoricity determination. [sent-195, score-1.378]

75 It shows that anaphoricity determination on golden zero anaphors achieves very good performance of 89. [sent-196, score-1.307]

76 It also shows that anaphoricity determination on automatic zero anaphor detection achieves 77. [sent-201, score-1.389]

77 In comparison with anaphoricity determination on golden zero anaphors, anaphoricity determination on automatic zero anaphor detection lowers the performance by about 23 in F-measure. [sent-205, score-2.275]

78 This indicates the importance and the necessity for further research in zero anaphor detection. [sent-206, score-0.988]

79 Table 5: Performance of anaphoricity determination. Antecedent identification: Table 6 gives the performance of antecedent identification given golden zero anaphors. [sent-210, score-1.199]

80 It shows that antecedent identification on golden anaphoric zero anaphors achieves 88. [sent-211, score-1.326]

81 24 in precision, recall and F-measure, respectively, with a decrease of about 8% in precision, about 21% in recall and about 18% in F-measure, in comparison with antecedent identification on golden anaphoric zero anaphors. [sent-218, score-0.938]

82 (Table 6: Performance of antecedent identification given golden zero anaphors.) Overall zero anaphora resolution: Table 7 gives the performance of overall zero anaphora resolution with automatic zero anaphor detection, anaphoricity determination and antecedent identification. [sent-223, score-4.361]

83 Compared with Table 6, it shows that the errors caused by automatic zero anaphor detection decrease the performance of overall zero anaphora resolution by about 14 in F-measure relative to golden zero anaphors. [sent-228, score-2.456]

84 Table 7: Performance of zero anaphora resolution. Figure 7 shows the learning curve of zero anaphora resolution as the number of documents increases, where the horizontal axis is the number of documents used and the vertical axis is the F-measure. [sent-232, score-1.986]

85 Table 9 shows the detailed performance of zero anaphora resolution over the two major zero anaphora categories, “-NONE- *PRO*” and “-NONE- *pro*”. [sent-241, score-1.757]

86 Table 9: Performance of zero anaphora resolution over major zero anaphora categories. [sent-246, score-1.757]

87 2 Comparison with previous work As representative work in Chinese zero anaphora resolution, Zhao and Ng (2007) focused on anaphoricity determination and antecedent identification using feature-based methods. [sent-247, score-1.216]

88 0, it only deals with the zero anaphors under the zero anaphora category of “-NONE- *pro*” for dropped subjects/objects. [sent-251, score-1.624]

89 Furthermore, Zhao and Ng (2007) only considered zero anaphors with explicit noun phrase referents and discarded zero anaphors with split antecedents (i. [sent-252, score-1.692]

90 As a result, their corpus is only about half of our corpus in the number of zero anaphors and anaphoric zero anaphors. [sent-255, score-1.396]

91 Besides, our corpus deals with all the types of zero anaphors and all the categories of zero anaphora except zero cataphora. [sent-256, score-2.047]

92 For zero anaphor detection, they used a very simple heuristic rule to generate zero anaphor candidates. [sent-258, score-1.894]

93 In comparison, we propose a tree kernel-based unified framework for all the three sub-tasks in zero anaphora resolution. [sent-260, score-0.939]

94 For fair comparison with Zhao and Ng (2007), we duplicate their system and evaluate it on our developed Chinese zero anaphora corpus, using the same J48 decision tree learning algorithm in Weka and the same feature sets for anaphoricity determination and antecedent identification. [sent-263, score-1.504]

95 It also shows that, when our tree kernel-based zero anaphor detector is employed, the feature-based method gets much lower precision with a gap of about 31%, although it achieves slightly higher recall. [sent-268, score-1.088]

96 6 Conclusion and Further Work This paper proposes a tree kernel-based unified framework for zero anaphora resolution, which can be divided into three sub-tasks: zero anaphor detection, anaphoricity determination and antecedent identification. [sent-277, score-2.475]

97 2) To the best of our knowledge, this is the first systematic work dealing with all the three sub-tasks in Chinese zero anaphora resolution via a unified framework. [sent-280, score-0.994]

98 3) Employment of tree kernel-based methods indicates the critical role of the structural information in zero anaphora resolution and the necessity of tree kernel methods in modeling such structural information. [sent-281, score-1.308]

99 In future work, we will systematically evaluate our framework on automatically-generated parse trees, construct more effective parse tree structures for different sub-tasks of zero anaphora resolution, and explore joint learning among the three sub-tasks. [sent-282, score-1.017]

100 Besides, we only consider zero anaphors driven by a verb predicate phrase node in this paper. [sent-283, score-0.919]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('anaphor', 0.524), ('zero', 0.411), ('anaphors', 0.399), ('anaphora', 0.387), ('anaphoricity', 0.228), ('antecedent', 0.212), ('determination', 0.161), ('resolution', 0.161), ('anaphoric', 0.151), ('golden', 0.097), ('tree', 0.09), ('chinese', 0.065), ('zhao', 0.058), ('detection', 0.054), ('parse', 0.051), ('ctb', 0.048), ('identification', 0.045), ('predicate', 0.044), ('convolution', 0.042), ('besides', 0.041), ('ng', 0.04), ('necessity', 0.036), ('unified', 0.035), ('zhou', 0.032), ('featurebased', 0.031), ('kernel', 0.03), ('antecedents', 0.028), ('converse', 0.027), ('node', 0.026), ('structural', 0.026), ('kong', 0.026), ('japanese', 0.025), ('referring', 0.025), ('pro', 0.025), ('critical', 0.024), ('overt', 0.024), ('documents', 0.022), ('precision', 0.021), ('kernelbased', 0.021), ('pronoun', 0.02), ('driven', 0.02), ('yang', 0.02), ('pronouns', 0.02), ('phrase', 0.019), ('null', 0.019), ('verbal', 0.019), ('detector', 0.018), ('layer', 0.018), ('candidate', 0.017), ('indicates', 0.017), ('imbalance', 0.017), ('framework', 0.016), ('action', 0.016), ('deals', 0.016), ('illustrates', 0.016), ('aza', 0.016), ('cbt', 0.016), ('ferrandez', 0.016), ('hints', 0.016), ('iida', 0.016), ('seki', 0.016), ('constructed', 0.015), ('phenomenon', 0.015), ('developed', 0.015), ('attaching', 0.015), ('adds', 0.015), ('id', 0.015), ('heuristic', 0.014), ('soon', 0.014), ('noun', 0.014), ('hobbs', 0.013), ('overwhelming', 0.013), ('standardize', 0.013), ('largely', 0.013), ('availability', 0.013), ('employed', 0.013), ('axis', 0.012), ('qian', 0.012), ('gender', 0.012), ('nominal', 0.012), ('corpus', 0.012), ('coreference', 0.011), ('referents', 0.011), ('contextsensitive', 0.011), ('fmeasure', 0.011), ('isozaki', 0.011), ('largescale', 0.011), ('zelenko', 0.011), ('structures', 0.011), ('recall', 0.011), ('achieves', 0.011), ('experimentation', 0.011), ('subjects', 0.011), ('release', 0.011), ('constituents', 0.011), ('treebank', 0.01), ('negative', 0.01), ('rule', 
0.01), ('role', 0.01), ('committee', 0.01)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 14 emnlp-2010-A Tree Kernel-Based Unified Framework for Chinese Zero Anaphora Resolution

Author: Fang Kong ; Guodong Zhou

Abstract: This paper proposes a unified framework for zero anaphora resolution, which can be divided into three sub-tasks: zero anaphor detection, anaphoricity determination and antecedent identification. In particular, all the three sub-tasks are addressed using tree kernel-based methods with appropriate syntactic parse tree structures. Experimental results on a Chinese zero anaphora corpus show that the proposed tree kernel-based methods significantly outperform the feature-based ones. This indicates the critical role of the structural information in zero anaphora resolution and the necessity of tree kernel-based methods in modeling such structural information. To our best knowledge, this is the first systematic work dealing with all the three sub-tasks in Chinese zero anaphora resolution via a unified framework. Moreover, we release a Chinese zero anaphora corpus of 100 documents, which adds a layer of annotation to the manu- ally-parsed sentences in the Chinese Treebank (CTB) 6.0.

2 0.081937626 8 emnlp-2010-A Multi-Pass Sieve for Coreference Resolution

Author: Karthik Raghunathan ; Heeyoung Lee ; Sudarshan Rangarajan ; Nate Chambers ; Mihai Surdeanu ; Dan Jurafsky ; Christopher Manning

Abstract: Most coreference resolution models determine if two mentions are coreferent using a single function over a set of constraints or features. This approach can lead to incorrect decisions as lower precision features often overwhelm the smaller number of high precision ones. To overcome this problem, we propose a simple coreference architecture based on a sieve that applies tiers of deterministic coreference models one at a time from highest to lowest precision. Each tier builds on the previous tier’s entity cluster output. Further, our model propagates global information by sharing attributes (e.g., gender and number) across mentions in the same cluster. This cautious sieve guarantees that stronger features are given precedence over weaker ones and that each decision is made using all of the information available at the time. The framework is highly modular: new coreference modules can be plugged in without any change to the other modules. In spite of its simplicity, our approach outperforms many state-of-the-art supervised and unsupervised models on several standard corpora. This suggests that sievebased approaches could be applied to other NLP tasks.

3 0.043526009 93 emnlp-2010-Resolving Event Noun Phrases to Their Verbal Mentions

Author: Bin Chen ; Jian Su ; Chew Lim Tan

Abstract: unkown-abstract

4 0.04207914 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

Author: Hui Zhang ; Min Zhang ; Haizhou Li ; Eng Siong Chng

Abstract: This paper studies two issues, non-isomorphic structure translation and target syntactic structure usage, for statistical machine translation in the context of forest-based tree to tree sequence translation. For the first issue, we propose a novel non-isomorphic translation framework to capture more non-isomorphic structure mappings than traditional tree-based and tree-sequence-based translation methods. For the second issue, we propose a parallel space searching method to generate hypothesis using tree-to-string model and evaluate its syntactic goodness using tree-to-tree/tree sequence model. This not only reduces the search complexity by merging spurious-ambiguity translation paths and solves the data sparseness issue in training, but also serves as a syntax-based target language model for better grammatical generation. Experiment results on the benchmark data show our proposed two solutions are very effective, achieving significant performance improvement over baselines when applying to different translation models.

5 0.040872499 20 emnlp-2010-Automatic Detection and Classification of Social Events

Author: Apoorv Agarwal ; Owen Rambow

Abstract: In this paper we introduce the new task of social event extraction from text. We distinguish two broad types of social events depending on whether only one or both parties are aware of the social contact. We annotate part of Automatic Content Extraction (ACE) data, and perform experiments using Support Vector Machines with Kernel methods. We use a combination of structures derived from phrase structure trees and dependency trees. A characteristic of our events (which distinguishes them from ACE events) is that the participating entities can be spread far across the parse trees. We use syntactic and semantic insights to devise a new structure derived from dependency trees and show that this plays a role in achieving the best performing system for both social event detection and classification tasks. We also use three data sampling approaches to solve the problem of data skewness. Sampling methods improve the F1-measure for the task of relation detection by over 20% absolute over the baseline.

6 0.040710002 40 emnlp-2010-Effects of Empty Categories on Machine Translation

7 0.039861072 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

8 0.036807686 59 emnlp-2010-Identifying Functional Relations in Web Text

9 0.036182173 27 emnlp-2010-Clustering-Based Stratified Seed Sampling for Semi-Supervised Relation Classification

10 0.032481331 53 emnlp-2010-Fusing Eye Gaze with Speech Recognition Hypotheses to Resolve Exophoric References in Situated Dialogue

11 0.030668795 15 emnlp-2010-A Unified Framework for Scope Learning via Simplified Shallow Semantic Parsing

12 0.029762983 114 emnlp-2010-Unsupervised Parse Selection for HPSG

13 0.02632946 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

14 0.02555106 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks

15 0.024540931 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning

16 0.02438581 67 emnlp-2010-It Depends on the Translation: Unsupervised Dependency Parsing via Word Alignment

17 0.023544298 62 emnlp-2010-Improving Mention Detection Robustness to Noisy Input

18 0.021972384 95 emnlp-2010-SRL-Based Verb Selection for ESL

19 0.021794841 121 emnlp-2010-What a Parser Can Learn from a Semantic Role Labeler and Vice Versa

20 0.021628849 42 emnlp-2010-Efficient Incremental Decoding for Tree-to-String Translation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.078), (1, 0.031), (2, 0.036), (3, 0.09), (4, 0.006), (5, -0.043), (6, 0.035), (7, -0.014), (8, -0.019), (9, -0.06), (10, 0.02), (11, -0.046), (12, -0.032), (13, -0.011), (14, -0.013), (15, -0.002), (16, 0.103), (17, 0.016), (18, 0.05), (19, -0.023), (20, -0.077), (21, 0.06), (22, -0.053), (23, 0.031), (24, -0.11), (25, 0.035), (26, 0.069), (27, 0.096), (28, -0.124), (29, 0.142), (30, -0.034), (31, -0.211), (32, 0.128), (33, 0.068), (34, -0.21), (35, -0.177), (36, 0.3), (37, 0.075), (38, -0.126), (39, 0.123), (40, 0.199), (41, -0.086), (42, -0.341), (43, -0.187), (44, 0.02), (45, 0.13), (46, 0.118), (47, -0.011), (48, -0.376), (49, 0.038)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99041694 14 emnlp-2010-A Tree Kernel-Based Unified Framework for Chinese Zero Anaphora Resolution

Author: Fang Kong ; Guodong Zhou

Abstract: This paper proposes a unified framework for zero anaphora resolution, which can be divided into three sub-tasks: zero anaphor detection, anaphoricity determination and antecedent identification. In particular, all the three sub-tasks are addressed using tree kernel-based methods with appropriate syntactic parse tree structures. Experimental results on a Chinese zero anaphora corpus show that the proposed tree kernel-based methods significantly outperform the feature-based ones. This indicates the critical role of the structural information in zero anaphora resolution and the necessity of tree kernel-based methods in modeling such structural information. To our best knowledge, this is the first systematic work dealing with all the three sub-tasks in Chinese zero anaphora resolution via a unified framework. Moreover, we release a Chinese zero anaphora corpus of 100 documents, which adds a layer of annotation to the manu- ally-parsed sentences in the Chinese Treebank (CTB) 6.0.

2 0.30374873 8 emnlp-2010-A Multi-Pass Sieve for Coreference Resolution

Author: Karthik Raghunathan ; Heeyoung Lee ; Sudarshan Rangarajan ; Nate Chambers ; Mihai Surdeanu ; Dan Jurafsky ; Christopher Manning

Abstract: Most coreference resolution models determine if two mentions are coreferent using a single function over a set of constraints or features. This approach can lead to incorrect decisions as lower precision features often overwhelm the smaller number of high precision ones. To overcome this problem, we propose a simple coreference architecture based on a sieve that applies tiers of deterministic coreference models one at a time from highest to lowest precision. Each tier builds on the previous tier’s entity cluster output. Further, our model propagates global information by sharing attributes (e.g., gender and number) across mentions in the same cluster. This cautious sieve guarantees that stronger features are given precedence over weaker ones and that each decision is made using all of the information available at the time. The framework is highly modular: new coreference modules can be plugged in without any change to the other modules. In spite of its simplicity, our approach outperforms many state-of-the-art supervised and unsupervised models on several standard corpora. This suggests that sievebased approaches could be applied to other NLP tasks.

3 0.29682356 53 emnlp-2010-Fusing Eye Gaze with Speech Recognition Hypotheses to Resolve Exophoric References in Situated Dialogue

Author: Zahar Prasov ; Joyce Y. Chai

Abstract: In situated dialogue humans often utter linguistic expressions that refer to extralinguistic entities in the environment. Correctly resolving these references is critical yet challenging for artificial agents partly due to their limited speech recognition and language understanding capabilities. Motivated by psycholinguistic studies demonstrating a tight link between language production and human eye gaze, we have developed approaches that integrate naturally occurring human eye gaze with speech recognition hypotheses to resolve exophoric references in situated dialogue in a virtual world. In addition to incorporating eye gaze with the best recognized spoken hypothesis, we developed an algorithm to also handle multiple hypotheses modeled as word confusion networks. Our empirical results demonstrate that incorporating eye gaze with recognition hypotheses consistently outperforms the results obtained from processing recognition hypotheses alone. Incorporating eye gaze with word confusion networks further improves performance.

4 0.22039914 59 emnlp-2010-Identifying Functional Relations in Web Text

Author: Thomas Lin ; Mausam ; Oren Etzioni

Abstract: Determining whether a textual phrase denotes a functional relation (i.e., a relation that maps each domain element to a unique range element) is useful for numerous NLP tasks such as synonym resolution and contradiction detection. Previous work on this problem has relied on either counting methods or lexico-syntactic patterns. However, determining whether a relation is functional, by analyzing mentions of the relation in a corpus, is challenging due to ambiguity, synonymy, anaphora, and other linguistic phenomena. We present the LEIBNIZ system that overcomes these challenges by exploiting the synergy between the Web corpus and freely available knowledge resources such as Freebase. It first computes multiple typed-functionality scores, representing functionality of the relation phrase when its arguments are constrained to specific types. It then aggregates these scores to predict the global functionality for the phrase. LEIBNIZ outperforms previous work, increasing area under the precision-recall curve from 0.61 to 0.88. We utilize LEIBNIZ to generate the first public repository of automatically-identified functional relations.
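The score-then-aggregate idea can be illustrated with a toy functionality estimator: per argument type, count how often a domain element maps to exactly one range element, then combine the per-type scores. This is not the LEIBNIZ scoring function; the triple format and the coverage-weighted aggregation are assumptions of this sketch.

```python
from collections import defaultdict

def typed_functionality(triples):
    """Estimate per-type and global functionality of a relation.
    Each triple is (arg_type, domain_element, range_element)."""
    by_type = defaultdict(lambda: defaultdict(set))
    for arg_type, domain, rng in triples:
        by_type[arg_type][domain].add(rng)
    # Per-type score: fraction of domain elements with a single range value.
    scores = {}
    for arg_type, mapping in by_type.items():
        functional = sum(1 for vals in mapping.values() if len(vals) == 1)
        scores[arg_type] = functional / len(mapping)
    # Toy aggregate: weight each type by how many domain elements it covers.
    total = sum(len(m) for m in by_type.values())
    global_score = sum(scores[t] * len(by_type[t]) / total for t in by_type)
    return scores, global_score
```

Typing the arguments matters because a relation can be functional for one type but not another (e.g. a person has one birth year, while a film franchise has many release years), which a single untyped count would blur together.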

5 0.18773067 93 emnlp-2010-Resolving Event Noun Phrases to Their Verbal Mentions

Author: Bin Chen ; Jian Su ; Chew Lim Tan

Abstract: (not available)

6 0.1752457 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

7 0.1717748 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

8 0.15583117 15 emnlp-2010-A Unified Framework for Scope Learning via Simplified Shallow Semantic Parsing

9 0.14038803 40 emnlp-2010-Effects of Empty Categories on Machine Translation

10 0.13708711 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

11 0.11925521 27 emnlp-2010-Clustering-Based Stratified Seed Sampling for Semi-Supervised Relation Classification

12 0.1183247 114 emnlp-2010-Unsupervised Parse Selection for HPSG

13 0.10794065 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

14 0.10518508 84 emnlp-2010-NLP on Spoken Documents Without ASR

15 0.10186251 96 emnlp-2010-Self-Training with Products of Latent Variable Grammars

16 0.097875088 110 emnlp-2010-Turbo Parsers: Dependency Parsing by Approximate Variational Inference

17 0.096217126 43 emnlp-2010-Enhancing Domain Portability of Chinese Segmentation Model Using Chi-Square Statistics and Bootstrapping

18 0.086704731 20 emnlp-2010-Automatic Detection and Classification of Social Events

19 0.081269979 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

20 0.080409408 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.011), (3, 0.034), (12, 0.037), (29, 0.044), (30, 0.022), (52, 0.025), (56, 0.038), (62, 0.01), (66, 0.065), (70, 0.416), (72, 0.041), (76, 0.049), (82, 0.028), (87, 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83864927 14 emnlp-2010-A Tree Kernel-Based Unified Framework for Chinese Zero Anaphora Resolution

Author: Fang Kong ; Guodong Zhou

Abstract: This paper proposes a unified framework for zero anaphora resolution, which can be divided into three sub-tasks: zero anaphor detection, anaphoricity determination and antecedent identification. In particular, all three sub-tasks are addressed using tree kernel-based methods with appropriate syntactic parse tree structures. Experimental results on a Chinese zero anaphora corpus show that the proposed tree kernel-based methods significantly outperform the feature-based ones. This indicates the critical role of structural information in zero anaphora resolution and the necessity of tree kernel-based methods for modeling such structural information. To the best of our knowledge, this is the first systematic work dealing with all three sub-tasks in Chinese zero anaphora resolution via a unified framework. Moreover, we release a Chinese zero anaphora corpus of 100 documents, which adds a layer of annotation to the manually parsed sentences in the Chinese Treebank (CTB) 6.0.

2 0.65343976 19 emnlp-2010-Automatic Analysis of Rhythmic Poetry with Applications to Generation and Translation

Author: Erica Greene ; Tugba Bodrumlu ; Kevin Knight

Abstract: We employ statistical methods to analyze, generate, and translate rhythmic poetry. We first apply unsupervised learning to reveal word-stress patterns in a corpus of raw poetry. We then use these word-stress patterns, in addition to rhyme and discourse models, to generate English love poetry. Finally, we translate Italian poetry into English, choosing target realizations that conform to desired rhythmic patterns.

3 0.25736132 21 emnlp-2010-Automatic Discovery of Manner Relations and its Applications

Author: Eduardo Blanco ; Dan Moldovan

Abstract: This paper presents a method for the automatic discovery of MANNER relations from text. An extended definition of MANNER is proposed, including restrictions on the sorts of concepts that can be part of its domain and range. The connections with other relations and the lexico-syntactic patterns that encode MANNER are analyzed. A new feature set specialized on MANNER detection is depicted and justified. Experimental results show improvement over previous attempts to extract MANNER. Combinations of MANNER with other semantic relations are also discussed.

4 0.25494996 40 emnlp-2010-Effects of Empty Categories on Machine Translation

Author: Tagyoung Chung ; Daniel Gildea

Abstract: We examine effects that empty categories have on machine translation. Empty categories are elements in parse trees that lack corresponding overt surface forms (words) such as dropped pronouns and markers for control constructions. We start by training machine translation systems with manually inserted empty elements. We find that inclusion of some empty categories in training data improves the translation result. We expand the experiment by automatically inserting these elements into a larger data set using various methods and training on the modified corpus. We show that even when automatic prediction of null elements is not highly accurate, it nevertheless improves the end translation result.

5 0.25487712 26 emnlp-2010-Classifying Dialogue Acts in One-on-One Live Chats

Author: Su Nam Kim ; Lawrence Cavedon ; Timothy Baldwin

Abstract: We explore the task of automatically classifying dialogue acts in 1-on-1 online chat forums, an increasingly popular means of providing customer service. In particular, we investigate the effectiveness of various features and machine learners for this task. While a simple bag-of-words approach provides a solid baseline, we find that adding information from dialogue structure and inter-utterance dependency provides some increase in performance; learners that account for sequential dependencies (CRFs) show the best performance. We report our results from testing using a corpus of chat dialogues derived from online shopping customer-feedback data.

6 0.25432408 55 emnlp-2010-Handling Noisy Queries in Cross Language FAQ Retrieval

7 0.25069728 20 emnlp-2010-Automatic Detection and Classification of Social Events

8 0.2480614 35 emnlp-2010-Discriminative Sample Selection for Statistical Machine Translation

9 0.24452487 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

10 0.2440571 32 emnlp-2010-Context Comparison of Bursty Events in Web Search and Online Media

11 0.24377683 62 emnlp-2010-Improving Mention Detection Robustness to Noisy Input

12 0.24221267 61 emnlp-2010-Improving Gender Classification of Blog Authors

13 0.24124792 27 emnlp-2010-Clustering-Based Stratified Seed Sampling for Semi-Supervised Relation Classification

14 0.24100901 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

15 0.24082749 31 emnlp-2010-Constraints Based Taxonomic Relation Classification

16 0.23992912 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

17 0.23940727 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice

18 0.23887643 120 emnlp-2010-What's with the Attitude? Identifying Sentences with Attitude in Online Discussions

19 0.23870264 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

20 0.23840952 69 emnlp-2010-Joint Training and Decoding Using Virtual Nodes for Cascaded Segmentation and Tagging Tasks