acl acl2013 acl2013-245 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hen-Hsen Huang ; Kai-Chun Chang ; Hsin-Hsi Chen
Abstract: This paper aims at understanding what humans think in the textual entailment (TE) recognition process and at modeling their thinking process to deal with this problem. We first analyze a labeled RTE-5 test set and find that the negative entailment phenomena are very effective features for TE recognition. Then, a method is proposed to extract this kind of phenomena from text-hypothesis pairs automatically. We evaluate the performance of using the negative entailment phenomena on both the English RTE-5 dataset and the Chinese NTCIR-9 RITE dataset, and reach the same findings on both.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper aims at understanding what humans think in the textual entailment (TE) recognition process and at modeling their thinking process to deal with this problem. [sent-5, score-0.763]
2 We first analyze a labeled RTE-5 test set and find that the negative entailment phenomena are very effective features for TE recognition. [sent-6, score-1.27]
3 Then, a method is proposed to extract this kind of phenomena from text-hypothesis pairs automatically. [sent-7, score-0.607]
4 We evaluate the performance of using the negative entailment phenomena on both the English RTE-5 dataset and the Chinese NTCIR-9 RITE dataset, and reach the same findings on both. [sent-8, score-1.242]
5 1 Introduction Textual Entailment (TE) is a directional relationship between pairs of text expressions, text (T) and hypothesis (H). [sent-9, score-0.097]
6 If humans would agree that the meaning of H can be inferred from the meaning of T, we say that T entails H (Dagan et al., 2006). [sent-10, score-0.119]
7 Research on textual entailment has attracted much attention in recent years due to its potential applications (Androutsopoulos and Malakasiotis, 2010). [sent-12, score-0.629]
8 The Recognizing Textual Entailment (RTE) challenges (Bentivogli et al., 2011), a series of evaluations of English TE recognition technologies, have been held seven times up to 2011. [sent-14, score-0.074]
9 Meanwhile, TE recognition technologies in other languages are also underway (Shima et al., 2011). [sent-15, score-0.102]
10 Sammons et al. (2010) propose an evaluation metric to examine the characteristics of a TE recognition system. [sent-18, score-0.074]
11 They annotate text-hypothesis pairs selected from the RTE-5 test set with a series of linguistic phenomena required in the human inference process. [sent-19, score-0.735]
12 The RTE systems are evaluated by the new indicators, such as how many T-H pairs annotated with a particular phenomenon are correctly recognized. [sent-20, score-0.109]
13 The indicators can tell developers which systems deal better with T-H pairs in which a given phenomenon appears. [sent-25, score-0.135]
14 That would give developers a direction to enhance their RTE systems. [sent-26, score-0.038]
15 Such linguistic phenomena are regarded as important in the human inference process by the annotators. [sent-27, score-0.666]
16 We aim at knowing the ultimate performance of TE recognition systems which embody human knowledge in the inference process. [sent-29, score-0.233]
17 The experiments show that five negative entailment phenomena are strong features for TE recognition, and this finding confirms the previous study of Vanderwende et al. [sent-30, score-1.315]
18 We propose a method to acquire the linguistic phenomena automatically and use them in TE recognition. [sent-32, score-0.565]
19 In Section 2, we introduce linguistic phenomena used by annotators in the inference process and point out five significant negative entailment phenomena. [sent-34, score-1.413]
20 Section 3 proposes a method to extract them from T-H pairs automatically and discusses their effects on TE recognition. [sent-35, score-0.104]
21 Section 4 applies the same methodology to the Chinese NTCIR-9 RITE dataset (Shima et al., 2011) and discusses its effects on TE recognition in Chinese. [sent-37, score-0.074]
22 2 Human Inference Process in TE We regard the human-annotated phenomena as features in recognizing the binary entailment relation between the given T-H pairs, i.e., ENTAILMENT or NO ENTAILMENT. [sent-39, score-1.362]
23 In total, 210 T-H pairs are chosen from the RTE-5 test set by Sammons et al. [sent-42, score-0.045]
24 (2010), and a total of 39 linguistic phenomena, divided into the 5 aspects of knowledge domains, hypothesis structures, inference phenomena, negative entailment phenomena, [sent-43, score-1.368]
25 and knowledge resources, are annotated on the selected dataset. [sent-45, score-0.037]
26 2.1 Five aspects as features We train SVM classifiers to evaluate the performance of the five aspects of phenomena as features for TE recognition. [sent-47, score-0.757]
27 In the dataset of Sammons et al. (2010), two annotators are involved in labeling the above 39 linguistic phenomena on the T-H pairs. [sent-51, score-0.59]
28 Schemes “Annotator A” and “Annotator B” mean that the phenomena labelled by annotator A and by annotator B, respectively, are used as features. [sent-56, score-0.695]
29 The “A AND B” scheme, a strict criterion, denotes that a phenomenon exists in a T-H pair only if both annotators agree on its appearance. [sent-57, score-0.17]
30 In contrast, the “A OR B” scheme, a looser criterion, denotes that a phenomenon exists in a T-H pair if at least one annotator marks its appearance. [sent-58, score-0.207]
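As a rough illustration of how these schemes turn annotations into classifier input, the Python sketch below builds a binary feature vector from two annotators' labels; the data layout (each annotator's judgements given as a set of phenomenon IDs per T-H pair) is a hypothetical one, not the paper's actual format.

    def feature_vector(labels_a, labels_b, n_phenomena=39, scheme="A OR B"):
        """Build a 0/1 feature vector from two annotators' phenomenon labels."""
        if scheme == "A AND B":        # strict: both annotators must agree
            marked = labels_a & labels_b
        elif scheme == "A OR B":       # loose: one annotator is enough
            marked = labels_a | labels_b
        elif scheme == "Annotator A":
            marked = labels_a
        else:                          # "Annotator B"
            marked = labels_b
        return [1 if i in marked else 0 for i in range(n_phenomena)]

    # Annotator A marks phenomena {3, 7}, annotator B marks {3, 5}:
    # "A AND B" sets only ID 3, while "A OR B" sets IDs 3, 5 and 7.
    vec = feature_vector({3, 7}, {3, 5}, scheme="A AND B")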
31 We can see that the aspect of negative entailment phenomena is the most significant feature among the five aspects. [sent-59, score-1.327]
32 With only the 9 phenomena in this aspect, the SVM classifier achieves accuracy above 90% no matter which labeling scheme is adopted. [sent-60, score-0.642]
33 Comparatively, the best reported accuracy in the RTE-5 task is about 73%. [sent-61, score-0.048]
34 In the negative entailment phenomena aspect, the “A OR B” scheme achieves the best accuracy. [sent-63, score-1.288]
35 2.2 Negative entailment phenomena There is a large gap between using the negative entailment phenomena and using the second most effective aspect as features. [sent-66, score-2.309]
36 Moreover, using only the negative entailment phenomena as features is even better than using all 39 linguistic phenomena. [sent-69, score-1.276]
37 We further analyze which negative entailment phenomena are more significant. [sent-70, score-1.243]
38 There are nine linguistic phenomena in the aspect of negative entailment. [sent-71, score-0.787]
39 We take each phenomenon as a single feature to do the two-way textual entailment recognition. [sent-72, score-0.704]
40 The 1st column is the phenomenon ID, the 2nd column is the phenomenon, and the 3rd column is the accuracy of using that phenomenon in the binary classification. [sent-77, score-0.279]
41 Compared with the accuracy of 90.62% shown in Table 1, the highest accuracy in Table 2 is only about 69%. [sent-79, score-0.048]
42 It shows that each phenomenon is suitable for some T-H pairs, and merging all negative entailment phenomena together achieves the best performance. [sent-81, score-1.327]
43 We consider all possible combinations of these 9 negative entailment phenomena, i.e., [sent-82, score-0.684]
44 C(9,1) + C(9,2) + ... + C(9,9) = 511 feature settings, and use each feature setting to do 2-way entailment relation recognition with LIBSVM. [sent-84, score-0.75]
45 The model using all nine phenomena achieves the best accuracy, about 97%. [sent-86, score-0.649]
46 Examining the combination sets, we find that phenomenon IDs 3, 4, 5, 7 and 8 appear quite often in the top 4 feature settings of each combination set. [sent-88, score-0.532]
47 In fact, the (3, 4, 5, 7, 8) setting alone achieves an accuracy of about 95%. [sent-89, score-0.084]
48 Adding more phenomena to the (3, 4, 5, 7, 8) setting does not make much difference in performance. [sent-91, score-0.532]
49 In the above experiments, we do all the analyses on the corpus annotated with linguistic phenomena by humans. [sent-92, score-0.627]
50 We aim at knowing the ultimate performance of TE recognition systems embodying human knowledge in the inference. [sent-93, score-0.138]
51 The human knowledge in the inference cannot be fully and correctly captured by TE recognition systems. [sent-94, score-0.175]
52 In the later experiments, we explore the five critical features, (3, 4, 5, 7, 8), and examine how the performance is affected if they are extracted automatically. [sent-95, score-0.072]
53 The results in Section 2.2 show that disconnected relation, exclusive argument, exclusive relation, missing argument, and missing relation are significant. [sent-97, score-0.7]
54 Disconnected Relation: the arguments and the relations in Hypothesis (H) are all matched by counterparts in Text (T). [sent-101, score-0.176]
55 None of the arguments in T is connected to the matching relation. [sent-102, score-0.123]
56 Exclusive Argument: there is a relation common to both the hypothesis and the text, but one argument is matched in a way that makes H contradict T. [sent-104, score-0.308]
57 Exclusive Relation: there are two or more arguments in the hypothesis that are also related in the text, but by a relation that means H contradicts T. [sent-106, score-0.317]
58 Missing Argument: entailment fails because an argument in the Hypothesis is not present in the Text, either explicitly or implicitly. [sent-108, score-0.115]
59 Missing Relation: entailment fails because a relation in the Hypothesis is not present in the Text, either explicitly or implicitly. [sent-110, score-0.169]
60 To model the annotator’s inference process, we must first determine the arguments and the relations existing in T and H, and then align the arguments and relations in H to the related ones in T. [sent-111, score-0.419]
61 It is easy for humans to find the important parts of a text description in the inference process, but it is challenging for a machine to determine which words are important and which are not, and to detect the boundaries of arguments and relations. [sent-112, score-0.25]
62 Moreover, two arguments (relations) of strong semantic relatedness are not always literally identical. [sent-113, score-0.156]
63 In the following, a method is proposed to extract the phenomena from T-H pairs automatically. [sent-114, score-0.607]
64 Before extraction, the English T-H pairs are pre-processed by numerical character transformation, POS tagging, and dependency parsing with the Stanford Parser (de Marneffe et al., 2006). [sent-115, score-0.045]
65 3.1 A feature extraction method Given a T-H pair, we first extract 4 sets of noun phrases based on their POS tags, including {noun in H}, {named entity (nnp) in H}, {compound noun (cnn) in T}, and {compound noun (cnn) in H}. [sent-118, score-0.159]
66 Then, we extract 2 sets of relations, including {relation in H} and {relation in T}, where each relation in the sets is in the form Predicate(Argument1, Argument2). [sent-119, score-0.172]
67 Some typical examples of relations are verb(subject, object) for verb phrases, neg(A, B) for negations, num(Noun, number) for numeric modifiers, and tmod(C, temporal argument) for temporal modifiers. [sent-120, score-0.053]
68 A predicate has only 2 arguments in this representation. [sent-121, score-0.123]
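The sketch below illustrates one way this representation could be built; the compound-noun heuristic and the set of dependency labels handled are illustrative assumptions, with the parser output taken as (head, label, dependent) triples and (word, POS) pairs.

    from collections import namedtuple

    Relation = namedtuple("Relation", ["header", "arg1", "arg2"])

    def noun_sets(tagged_tokens):
        """Collect the noun, named-entity (nnp) and compound-noun (cnn) sets."""
        nouns = {w for w, t in tagged_tokens if t.startswith("NN")}
        nnps = {w for w, t in tagged_tokens if t in ("NNP", "NNPS")}
        cnns = {a + " " + b                     # crude heuristic: adjacent nouns
                for (a, ta), (b, tb) in zip(tagged_tokens, tagged_tokens[1:])
                if ta.startswith("NN") and tb.startswith("NN")}
        return nouns, nnps, cnns

    def relation_set(dep_triples):
        """Build Predicate(Argument1, Argument2) relations, e.g.
        nsubj(bought, John) + dobj(bought, car) -> bought(John, car)."""
        subj, obj, rels = {}, {}, set()
        for head, label, dep in dep_triples:
            if label == "nsubj":
                subj[head] = dep
            elif label in ("dobj", "obj"):
                obj[head] = dep
            elif label in ("neg", "num", "tmod"):   # modifiers kept as relations
                rels.add(Relation(label, head, dep))
        for verb in subj.keys() & obj.keys():       # pair subject with object
            rels.add(Relation(verb, subj[verb], obj[verb]))
        return rels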
69 Instead of measuring the relatedness of T-H pairs by comparing T and H on the predicate-argument structure (Wang and Zhang, 2009), our method tries to find the five negative entailment phenomena based on a similar representation. [sent-123, score-1.366]
70 Each of the five negative entailment phenomena is extracted as follows according to their definitions. [sent-124, score-1.288]
71 Furthermore, we introduce WordNet to align arguments in H to T. [sent-126, score-0.123]
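A sketch of how such WordNet lookups might be done with NLTK; the paper does not name its WordNet interface, so this choice is an assumption (a one-time nltk.download("wordnet") is required).

    from nltk.corpus import wordnet as wn

    def aligned(a, b):
        """Words align if they are literally equal or share a WordNet synset."""
        return a == b or bool(set(wn.synsets(a)) & set(wn.synsets(b)))

    def antonymous(a, b):
        """True if some lemma of a lists some lemma of b as an antonym."""
        for syn in wn.synsets(a):
            for lemma in syn.lemmas():
                if b in {ant.name() for ant in lemma.antonyms()}:
                    return True
        return False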
72 If (1) for each a ∈ {noun in H} ∪ {nnp in H} ∪ {cnn in H}, we can find a ∈ T too, and (2) for each r1 = h(a1, a2) ∈ {relation in H}, we can find a relation r2 = h(a3, a4) ∈ {relation in T} with the same header h, but with different arguments, i.e., [sent-128, score-0.188]
73 a3 ≠ a1 and a4 ≠ a2, then we say the T-H pair has the “Disconnected Relation” phenomenon. [sent-130, score-0.091]
74 If there exist a relation r1 = h(a1, a2) ∈ {relation in H} and a relation r2 = h(a3, a4) ∈ {relation in T}, where both relations have the same header h, but either the pair (a1, a3) or the pair (a2, a4) is an antonym pair by looking up WordNet, then we say the T-H pair has the “Exclusive Argument” phenomenon. [sent-132, score-0.59]
75 If there exist a relation r1 = h1(a1, a2) ∈ {relation in T} and a relation r2 = h2(a1, a2) ∈ {relation in H}, where both relations have the same arguments, but h1 and h2 have opposite meanings by consulting WordNet, then we say that the T-H pair has the “Exclusive Relation” phenomenon. [sent-134, score-0.468]
76 For each argument a1 ∈ {noun in H} ∪ {nnp in H} ∪ {cnn in H}, if there does not exist an argument a2 ∈ T such that a1 = a2, then we say that the T-H pair has the “Missing Argument” phenomenon. [sent-136, score-0.307]
77 For each relation r1 = h1(a1, a2) ∈ {relation in H}, if there does not exist a relation r2 = h2(a3, a4) ∈ {relation in T} such that h1 = h2, then we say that the T-H pair has the “Missing Relation” phenomenon. [sent-138, score-0.415]
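Putting the five definitions together, a sketch of the detectors over the pieces built above: h_nouns is {noun in H} ∪ {nnp in H} ∪ {cnn in H}, t_words the token set of T, rel_h/rel_t the Relation sets of H and T, and aligned/antonymous are the WordNet helpers sketched earlier. All of this data layout is an assumption, not the paper's exact code.

    def disconnected_relation(h_nouns, t_words, rel_h, rel_t):
        """H's arguments all match T, but no matched relation shares arguments."""
        if not rel_h:
            return False
        args_matched = all(any(aligned(a, t) for t in t_words) for a in h_nouns)
        disconnected = all(
            any(r1.header == r2.header and r1.arg1 != r2.arg1 and r1.arg2 != r2.arg2
                for r2 in rel_t)
            for r1 in rel_h)
        return args_matched and disconnected

    def exclusive_argument(rel_h, rel_t):
        """Same header, but an argument pair is antonymous."""
        return any(r1.header == r2.header and
                   (antonymous(r1.arg1, r2.arg1) or antonymous(r1.arg2, r2.arg2))
                   for r1 in rel_h for r2 in rel_t)

    def exclusive_relation(rel_h, rel_t):
        """Same arguments, but the headers are antonymous."""
        return any(r1.arg1 == r2.arg1 and r1.arg2 == r2.arg2 and
                   antonymous(r1.header, r2.header)
                   for r1 in rel_h for r2 in rel_t)

    def missing_argument(h_nouns, t_words):
        """Some argument of H has no counterpart anywhere in T."""
        return any(not any(aligned(a, t) for t in t_words) for a in h_nouns)

    def missing_relation(rel_h, rel_t):
        """Some relation header of H has no counterpart in T."""
        return any(all(r1.header != r2.header for r2 in rel_t) for r1 in rel_h)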
78 3.2 Experiments and discussion The following two datasets are used in the English TE recognition experiments. [sent-140, score-0.074]
79 The 210 T-H pairs are annotated with the linguistic phenomena by human annotators. [sent-142, score-0.681]
80 They are selected from the 600 pairs in RTE-5 test set, including 51% ENTAILMENT and 49% NO ENTAILMENT. [sent-143, score-0.045]
81 The “Machine-annotated” and “Human-annotated” columns denote that the phenomena annotated by machine and by human, respectively, are used in the evaluation. [sent-147, score-0.603]
82 Using the “Human-annotated” phenomena can be seen as the upper bound of the experiments. [sent-148, score-0.532]
83 Though the performance of using the phenomena extracted automatically by machine is not comparable to that of using the human-annotated ones, the accuracy achieved by using only 5 features (59.17%) [sent-152, score-0.678]
84 is just a little lower than the average accuracy of all RTE-5 formal runs (about 60%). [sent-153, score-0.112]
85 It shows that the significant phenomena are really effective in dealing with entailment recognition. [sent-156, score-1.093]
86 If we can improve the performance of the automatic phenomena extraction, it may bring great progress on textual entailment. [sent-157, score-0.627]
88 (Table: accuracy of TE recognition using the extracted phenomena as features.) [sent-161, score-0.532]
89 4 Negative Entailment Phenomena in Chinese RITE Dataset To examine whether negative entailment phenomena also exist in other languages, we apply the methodologies of Sections 2 and 3 to the RITE dataset in NTCIR-9. [sent-162, score-1.282]
89 We annotate all the 9 negative entailment phenomena on Chinese T-H pairs according to the definitions by Sammons et al. [sent-163, score-1.285]
90 (2010) and analyze the effects of various combinations of the phenomena on the newly annotated Chinese data. [sent-164, score-0.625]
91 The accuracy of using all the 9 phenomena as features is also high on the Chinese data. [sent-165, score-0.607]
92 The significant negative entailment phenomena on the Chinese data are identified in the same way. [sent-170, score-1.216]
93 The model using only 5 phenomena achieves an accuracy of about 90%. [sent-173, score-0.616]
94 We also classify the entailment relation using the phenomena extracted automatically by a method similar to that shown in Section 3. [sent-175, score-1.208]
95 The accuracy achieved by using the five automatically extracted phenomena as features is 57.11%, [sent-177, score-0.679]
96 and the average accuracy of all runs in the NTCIR-9 RITE task is about 59%. [sent-178, score-0.08]
97 5 Conclusion In this paper we conclude that the negative entailment phenomena have a great effect on TE recognition. [sent-183, score-1.243]
98 Systems with human-annotated knowledge achieve very good performance. [sent-184, score-0.071]
99 Though the automatic extraction of the negative entailment phenomena still needs a lot of effort, it gives us a new direction to deal with the TE problem. [sent-186, score-1.242]
100 Besides, learning-based approaches to extract phenomena and multi-class TE recognition will be explored in the future. [sent-188, score-0.636]
wordName wordTfidf (topN-words)
[('entailment', 0.534), ('phenomena', 0.532), ('te', 0.227), ('rite', 0.154), ('negative', 0.15), ('exclusive', 0.143), ('relation', 0.142), ('sammons', 0.138), ('arguments', 0.123), ('cnn', 0.097), ('textual', 0.095), ('shima', 0.092), ('disconnected', 0.092), ('missing', 0.09), ('argument', 0.088), ('bentivogli', 0.077), ('phenomenon', 0.075), ('recognition', 0.074), ('gaithersburg', 0.072), ('five', 0.072), ('taiwan', 0.069), ('annotator', 0.068), ('inference', 0.067), ('nnp', 0.06), ('dagan', 0.059), ('chinese', 0.058), ('pascal', 0.056), ('recognizing', 0.056), ('iftene', 0.056), ('rte', 0.054), ('relations', 0.053), ('say', 0.053), ('hypothesis', 0.052), ('maryland', 0.051), ('tac', 0.05), ('androutsopoulos', 0.048), ('accuracy', 0.048), ('ido', 0.047), ('header', 0.046), ('pairs', 0.045), ('noun', 0.043), ('vanderwende', 0.04), ('exist', 0.04), ('aspect', 0.039), ('nltk', 0.039), ('levy', 0.039), ('danilo', 0.039), ('luisa', 0.039), ('pair', 0.038), ('developers', 0.038), ('annotated', 0.037), ('trang', 0.037), ('achieves', 0.036), ('scheme', 0.036), ('libsvm', 0.036), ('aspects', 0.036), ('bernardo', 0.034), ('human', 0.034), ('relatedness', 0.033), ('linguistic', 0.033), ('nine', 0.033), ('dang', 0.032), ('runs', 0.032), ('hoa', 0.032), ('agree', 0.032), ('extract', 0.03), ('compound', 0.03), ('knowing', 0.03), ('chang', 0.029), ('effects', 0.029), ('tmod', 0.028), ('embody', 0.028), ('nomenon', 0.028), ('prodromos', 0.028), ('traction', 0.028), ('underway', 0.028), ('marneffe', 0.027), ('fails', 0.027), ('features', 0.027), ('column', 0.027), ('dealing', 0.027), ('performances', 0.027), ('tw', 0.027), ('analyze', 0.027), ('fifth', 0.026), ('indicators', 0.026), ('dataset', 0.026), ('deal', 0.026), ('boundary', 0.026), ('schemes', 0.026), ('contradict', 0.026), ('uaic', 0.026), ('shuming', 0.026), ('looser', 0.026), ('malakasiotis', 0.026), ('annotators', 0.025), ('analyses', 0.025), ('vydiswaran', 0.024), ('annotate', 0.024), ('criterion', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 245 acl-2013-Modeling Human Inference Process for Textual Entailment Recognition
Author: Hen-Hsen Huang ; Kai-Chun Chang ; Hsin-Hsi Chen
Abstract: This paper aims at understanding what humans think in the textual entailment (TE) recognition process and at modeling their thinking process to deal with this problem. We first analyze a labeled RTE-5 test set and find that the negative entailment phenomena are very effective features for TE recognition. Then, a method is proposed to extract this kind of phenomena from text-hypothesis pairs automatically. We evaluate the performance of using the negative entailment phenomena on both the English RTE-5 dataset and the Chinese NTCIR-9 RITE dataset, and reach the same findings on both.
2 0.38790402 297 acl-2013-Recognizing Partial Textual Entailment
Author: Omer Levy ; Torsten Zesch ; Ido Dagan ; Iryna Gurevych
Abstract: Textual entailment is an asymmetric relation between two text fragments that describes whether one fragment can be inferred from the other. It thus cannot capture the notion that the target fragment is “almost entailed” by the given text. The recently suggested idea of partial textual entailment may remedy this problem. We investigate partial entailment under the faceted entailment model and the possibility of adapting existing textual entailment methods to this setting. Indeed, our results show that these methods are useful for recognizing partial entailment. We also provide a preliminary assessment of how partial entailment may be used for recognizing (complete) textual entailment.
3 0.21760887 75 acl-2013-Building Japanese Textual Entailment Specialized Data Sets for Inference of Basic Sentence Relations
Author: Kimi Kaneko ; Yusuke Miyao ; Daisuke Bekki
Abstract: This paper proposes a methodology for generating specialized Japanese data sets for textual entailment, which consists of pairs decomposed into basic sentence relations. We experimented with our methodology over a number of pairs taken from the RITE-2 data set. We compared our methodology with existing studies in terms of agreement, frequencies and times, and we evaluated its validity by investigating recognition accuracy.
4 0.16259769 202 acl-2013-Is a 204 cm Man Tall or Small ? Acquisition of Numerical Common Sense from the Web
Author: Katsuma Narisawa ; Yotaro Watanabe ; Junta Mizuno ; Naoaki Okazaki ; Kentaro Inui
Abstract: This paper presents novel methods for modeling numerical common sense: the ability to infer whether a given number (e.g., three billion) is large, small, or normal for a given context (e.g., number of people facing a water shortage). We first discuss the necessity of numerical common sense in solving textual entailment problems. We explore two approaches for acquiring numerical common sense. Both approaches start with extracting numerical expressions and their context from the Web. One approach estimates the distribution of numbers co-occurring within a context and examines whether a given value is large, small, or normal, based on the distribution. Another approach utilizes textual patterns with which speakers explicitly express their judgment about the value of a numerical expression. Experimental results demonstrate the effectiveness of both approaches.
5 0.09850011 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
Author: Mikhail Kozhevnikov ; Ivan Titov
Abstract: Semantic Role Labeling (SRL) has become one of the standard tasks of natural language processing and proven useful as a source of information for a number of other applications. We address the problem of transferring an SRL model from one language to another using a shared feature representation. This approach is then evaluated on three language pairs, demonstrating competitive performance as compared to a state-of-the-art unsupervised SRL system and a cross-lingual annotation projection baseline. We also consider the contribution of different aspects of the feature representation to the performance of the model and discuss practical applicability of this method. 1 Background and Motivation Semantic role labeling has proven useful in many natural language processing tasks, such as question answering (Shen and Lapata, 2007; Kaisser and Webber, 2007), textual entailment (Sammons et al., 2009), machine translation (Wu and Fung, 2009; Liu and Gildea, 2010; Gao and Vogel, 2011) and dialogue systems (Basili et al., 2009; van der Plas et al., 2009). Multiple models have been designed to automatically predict semantic roles, and a considerable amount of data has been annotated to train these models, if only for a few more popular languages. As the annotation is costly, one would like to leverage existing resources to minimize the human effort required to construct a model for a new language. A number of approaches to the construction of semantic role labeling models for new languages have been proposed. On one end of the scale is unsupervised SRL, such as Grenager and Manning (2006), which requires some expert knowledge, but no labeled data. It clusters together arguments that should bear the same semantic role, but does not assign a particular role to each cluster. On the other end is annotating a new dataset from scratch. There are also intermediate options, which often make use of similarities between languages. This way, if an accurate model exists for one language, it should help simplify the construction of a model for another, related language. The approaches in this third group often use parallel data to bridge the gap between languages. Cross-lingual annotation projection systems (Pad o´ and Lapata, 2009), for example, propagate information directly via word alignment links. However, they are very sensitive to the quality of parallel data, as well as the accuracy of a sourcelanguage model on it. An alternative approach, known as cross-lingual model transfer, or cross-lingual model adaptation, consists of modifying a source-language model to make it directly applicable to a new language. This usually involves constructing a shared feature representation across the two languages. McDonald et al. (201 1) successfully apply this idea to the transfer of dependency parsers, using part-of- speech tags as the shared representation of words. A later extension of T ¨ackstr o¨m et al. (2012) enriches this representation with cross-lingual word clusters, considerably improving the performance. In the case of SRL, a shared representation that is purely syntactic is likely to be insufficient, since structures with different semantics may be realized by the same syntactic construct, for example “in August” vs “in Britain”. However with the help of recently introduced cross-lingual word represen1190 Proce dingsS o f ita h,e B 5u1lgsta Arinan,u Aaulg Musete 4ti-n9g 2 o0f1 t3h.e ? 
Ac s2s0o1ci3a Atiosnso fcoirat Cio nm foprut Caotimonpaulta Lti nognuails Lti cnsg,u piasgteics 1 90–120 , tations, such as the cross-lingual clustering mentioned above or cross-lingual distributed word representations of Klementiev et al. (2012), we may be able to transfer models of shallow semantics in a similar fashion. In this work we construct a shared feature representation for a pair of languages, employing crosslingual representations of syntactic and lexical information, train a semantic role labeling model on one language and apply it to the other one. This approach yields an SRL model for a new language at a very low cost, effectively requiring only a source language model and parallel data. We evaluate on five (directed) language pairs EN-ZH, ZH-EN, EN-CZ, CZ-EN and EN-FR, where EN, FR, CZ and ZH denote English, French, Czech and Chinese, respectively. The transferred model is compared against two baselines: an unsupervised SRL system and a model trained on the output of a cross-lingual annotation projection system. In the next section we will describe our setup, then in section 3 present the shared feature representation we use, discuss the evaluation data and other technical aspects in section 4, present the results and conclude with an overview of related work. – 2 Setup The purpose of the study is not to develop a yet another semantic role labeling system any existing SRL system can (after some modification) be used in this setup but to assess the practical applicability of cross-lingual model transfer to this – – problem, compare it against the alternatives and identify its strong/weak points depending on a particular setup. 2.1 Semantic Role Labeling Model We consider the dependency-based version of semantic role labeling as described in Haji cˇ et al. (2009) and transfer an SRL model from one language to another. We only consider verbal predicates and ignore the predicate disambiguation stage. We also assume that the predicate identification information is available in most languages it can be obtained using a relatively simple heuristic based on part-of-speech tags. The model performs argument identification and classification (Johansson and Nugues, 2008) separately in a pipeline first each candidate is classified as being or not being a head of an argument phrase with respect to the predicate in question and then each of the arguments is assigned a role from a given inventory. The model is factorized over arguments the decisions regarding the classification of different arguments are made in– – – dependently of each other. With respect to the use of syntactic annotation we consider two options: using an existing dependency parser for the target language and obtaining one by means of cross-lingual transfer (see section 4.2). Following McDonald et al. (201 1), we assume that a part-of-speech tagger is available for the target language. 2.2 SRL in the Low-resource Setting Several approaches have been proposed to obtain an SRL model for a new language with little or no manual annotation. Unsupervised SRL models (Lang and Lapata, 2010) cluster the arguments of predicates in a given corpus according to their semantic roles. The performance of such models can be impressive, especially for those languages where semantic roles correlate strongly with syntactic relation of the argument to its predicate. However, assigning meaningful role labels to the resulting clusters requires additional effort and the model’s parameters generally need some adjustment for every language. 
If the necessary resources are already available for a closely related language, they can be utilized to facilitate the construction of a model for the target language. This can be achieved either by means of cross-lingual annotation projection (Yarowsky et al., 2001) or by cross-lingual model transfer (Zeman and Resnik, 2008). This last approach is the one we are considering in this work, and the other two options are treated as baselines. The unsupervised model will be further referred to as UNSUP and the projection baseline as PROJ. 2.3 Evaluation Measures We use the F1 measure as a metric for the argument identification stage and accuracy as an aggregate measure of argument classification performance. When comparing to the unsupervised SRL system the clustering evaluation measures are used instead. These are purity and collocation 1191 N1Ximajx|Gj∩ Ci| CO =N1Xjmiax|Gj∩ Ci|, PU = where Ci is the set of arguments in the i-th induced cluster, Gj is the set of arguments in the jth gold cluster and N is the total number of arguments. We report the harmonic mean ofthe two (Lang and Lapata, 2011) and denote it F1c to avoid confusing it with the supervised metric. 3 Model Transfer The idea of this work is to abstract the model away from the particular source language and apply it to a new one. This setup requires that we use the same feature representation for both languages, for example part-of-speech tags and dependency relation labels should be from the same inventory. Some features are not applicable to certain lan- guages because the corresponding phenomena are absent in them. For example, consider a strongly inflected language and an analytic one. While the latter can usually convey the information encoded in the word form in the former one (number, gender, etc.), finding a shared feature representation for such information is non-trivial. In this study we will confine ourselves to those features that are applicable to all languages in question, namely: part-of-speech tags, syntactic dependency structures and representations of the word’s identity. 3.1 Lexical Information We train a model on one language and apply it to a different one. In order for this to work, the words of the two languages have to be mapped into a common feature space. It is also desirable that closely related words from both languages have similar representations in this space. Word mapping. The first option is simply to use the source language words as the shared representation. Here every source language word would have itself as its representation and every target word would map into a source word that corresponds to it. In other words, we supply the model with a gloss of the target sentence. The mapping (bilingual dictionary) we use is derived from a word-aligned parallel corpus, by identifying, for each word in the target language, the word in the source language it is most often aligned to. Cross-lingual clusters. There is no guarantee that each of the words in the evaluation data is present in our dictionary, nor that the corresponding source-language word is present in the training data, so the model would benefit from the ability to generalize over closely related words. This can, for example, be achieved by using cross-lingual word clusters induced in T ¨ackstr o¨m et al. (2012). We incorporate these clusters as features into our model. 3.2 Syntactic Information Part-of-speech Tags. We map part-of-speech tags into the universal tagset following Petrov et al. (2012). 
This may have a negative effect on the performance of a monolingual model, since most part-of-speech tagsets are more fine-grained than the universal POS tags considered here. For example Penn Treebank inventory contains 36 tags and the universal POS tagset only 12. Since the finergrained POS tags often reflect more languagespecific phenomena, however, they would only be useful for very closely related languages in the cross-lingual setting. The universal part-of-speech tags used in evaluation are derived from gold-standard annotation for all languages except French, where predicted ones had to be used instead. Dependency Structure. Another important aspect of syntactic information is the dependency structure. Most dependency relation inventories are language-specific, and finding a shared representation for them is a challenging problem. One could map dependency relations into a simplified form that would be shared between languages, as it is done for part-of-speech tags in Petrov et al. (2012). The extent to which this would be useful, however, depends on the similarity of syntactic-semantic in– terfaces of the languages in question. In this work we discard the dependency relation labels where the inventories do not match and only consider the unlabeled syntactic dependency graph. Some discrepancies, such as variations in attachment order, may be present even there, but this does not appear to be the case with the datasets we use for evaluation. If a target language is poor in resources, one can obtain a dependency parser for the target language by means of cross-lingual model transfer (Zeman and Resnik, 2008). We 1192 take this into account and evaluate both using the original dependency structures and the ones obtained by means of cross-lingual model transfer. 3.3 The Model The model we use is based on that of Bj ¨orkelund et al. (2009). It is comprised of a set of linear classifiers trained using Liblinear (Fan et al., 2008). The feature model was modified to accommodate the cross-lingual cluster features and the reranker component was not used. We do not model the interaction between different argument roles in the same predicate. While this has been found useful, in the cross-lingual setup one has to be careful with the assumptions made. For example, modeling the sequence of roles using a Markov chain (Thompson et al., 2003) may not work well in the present setting, especially between distant languages, as the order or arguments is not necessarily preserved. Most constraints that prove useful for SRL (Chang et al., 2007) also require customization when applied to a new language, and some rely on languagespecific resources, such as a valency lexicon. Taking into account the interaction between different arguments of a predicate is likely to improve the performance of the transferred model, but this is outside the scope of this work. 3.4 Feature Selection Compatibility of feature representations is necessary but not sufficient for successful model transfer. We have to make sure that the features we use are predictive of similar outcomes in the two languages as well. Depending on the pair of languages in question, different aspects of the feature representation will retain or lose their predictive power. We can be reasonably certain that the identity of an argument word is predictive of its semantic role in any language, but it might or might not be true of, for example, the word directly preceding the argument word. 
It is therefore important to pre- SCPDGylOespoSntreslTabunc1lra:obsFel-daitnguplrdoaeusntpagd-elronwfu-dcsopeyrnsd c.eylafguhtorsia mepgnrhs vent the model from capturing overly specific aspects of the source language, which we do by confining the model to first-order features. We also avoid feature selection, which, performed on the source language, is unlikely to help the model to better generalize to the target one. The experiments confirm that feature selection and the use of second-order features degrade the performance of the transferred model. 3.5 Feature Groups For each word, we use its part-of-speech tag, cross-lingual cluster id, word identity (glossed, when evaluating on the target language) and its dependency relation to its parent. Features associated with an argument word include the attributes of the predicate word, the argument word, its parent, siblings and children, and the words directly preceding and following it. Also included are the sequences of part-of-speech tags and dependency relations on the path between the predicate and the argument. Since we are also interested in the impact of different aspects of the feature representation, we divide the features into groups as summarized in table 1 and evaluate their respective contributions to the performance of the model. If a feature group is enabled the model has access to the corre– sponding source of information. For example, if only POS group is enabled, the model relies on the part-of-speech tags of the argument, the predicate and the words to the right and left of the argument word. If Synt is enabled too, it also uses the POS tags of the argument’s parent, children and siblings. Word order information constitutes an implicit group that is always available. It includes the Pos it ion feature, which indicates whether the argument is located to the left or to the right of the predicate, and allows the model to look up the attributes of the words directly preceding and following the argument word. The model we compare against the baselines uses all applicable feature groups (Deprel is only used in EN-CZ and CZ-EN experiments with original syntax). 4 Evaluation 4.1 Datasets and Preprocessing Evaluation of the cross-lingual model transfer requires a rather specific kind of dataset. Namely, the data in both languages has to be annotated 1193 with the same set of semantic roles following the same (or compatible) guidelines, which is seldom the case. We have identified three language pairs for which such resources are available: EnglishChinese, English-Czech and English-French. The evaluation datasets for English and Chinese are those from the CoNLL Shared Task 2009 (Haji ˇc et al., 2009) (henceforth CoNLL-ST). Their annotation in the CoNLL-ST is not identical, but the guidelines for “core” semantic roles are similar (Kingsbury et al., 2004), so we evaluate only on core roles here. The data for the second language pair is drawn from the Prague Czech-English Dependency Treebank 2.0 (Haji ˇc et al., 2012), which we converted to a format similar to that of CoNLL-ST1 . The original annotation uses the tectogrammatical representation (Haji ˇc, 2002) and an inventory of semantic roles (or functors), most of which are interpretable across various predicates. Also note that the syntactic anno- tation of English and Czech in PCEDT 2.0 is quite similar (to the extent permitted by the difference in the structure of the two languages) and we can use the dependency relations in our experiments. 
For English-French, the English CoNLL-ST dataset was used as a source and the model was evaluated on the manually annotated dataset from van der Plas et al. (201 1). The latter contains one thousand sentences from the French part ofthe Europarl (Koehn, 2005) corpus, annotated with semantic roles following an adapted version of PropBank (Palmer et al., 2005) guidelines. The authors perform annotation projection from English to French, using a joint model of syntax and semantics and employing heuristics for filtering. We use a model trained on the output of this projection system as one of the baselines. The evaluation dataset is relatively small in this case, so we perform the transfer only one-way, from English to French. The part-of-speech tags in all datasets were replaced with the universal POS tags of Petrov et al. (2012). For Czech, we have augmented the map- pings to account for the tags that were not present in the datasets from which the original mappings were derived. Namely, tag “t” is mapped to “VERB” and “Y” to “PRON”. We use parallel data to construct a bilingual dictionary used in word mapping, as well as in the projection baseline. For English-Czech – 1see http://www.ml4nlp.de/code-and-data/treex2conll and English-French, the data is drawn from Europarl (Koehn, 2005), for English-Chinese from MultiUN (Eisele and Chen, 2010). The word alignments were obtained using GIZA++ (Och and Ney, 2003) and the intersection heuristic. – 4.2 Syntactic Transfer In the low-resource setting, we cannot always rely on the availability of an accurate dependency parser for the target language. If one is not available, the natural solution would be to use crosslingual model transfer to obtain it. Unfortunately, the models presented in the previous work, such as Zeman and Resnik (2008), McDonald et al. (201 1) and T ¨ackstr o¨m et al. (2012), were not made available, so we reproduced the direct transfer algorithm of McDonald et al. (201 1), using Malt parser (Nivre, 2008) and the same set of features. We did not reimplement the projected transfer algorithm, however, and used the default training procedure instead of perceptron-based learning. The dependency structure thus obtained is, of course, only a rough approximation even a much more sophisticated algorithm may not perform well when transferring syntax between such languages as Czech and English, given the inherent difference in their structure. The scores are shown in table 2. We will henceforth refer to the syntactic annotations that were provided with the datasets as original, as opposed to the annotations obtained by means of syntactic transfer. – 4.3 Baselines Unsupervised Baseline: We are using a version of the unsupervised semantic role induction system of Titov and Klementiev (2012a) adapted to SetupUAS, % Table2:SyntaciE C ZcN HNt- rE ZaCFnN HZRsfer34 692567acuracy,unlabe dat- tachment score (percent). Note that in case of French we evaluate against the output of a supervised system, since manual annotation is not available for this dataset. This score does not reflect the true performance of syntactic transfer. 1194 the shared feature representation considered in order to make the scores comparable with those of the transfer model and, more importantly, to enable evaluation on transferred syntax. Note that the original system, tailored to a more expressive language-specific syntactic representation and equipped with heuristics to identify active/passive voice and other phenomena, achieves higher scores than those we report here. 
Projection Baseline: The projection baseline we use for English-Czech and English-Chinese is a straightforward one: we label the source side of a parallel corpus using the source-language model, then identify those verbs on the target side that are aligned to a predicate, mark them as predicates and propagate the argument roles in the same fashion. A model is then trained on the resulting training data and applied to the test set. For English-French we instead use the output of a fully featured projection model of van der Plas et al. (201 1), published in the CLASSiC project. 5 Results In order to ensure that the results are consistent, the test sets, except for the French one, were partitioned into five equal parts (of 5 to 10 thousand sentences each, depending on the dataset) and the evaluation performed separately on each one. All evaluation figures for English, Czech or Chinese below are the average values over the five subsets. In case of French, the evaluation dataset is too small to split it further, so instead we ran the evaluation five times on a randomly selected 80% sample of the evaluation data and averaged over those. In both cases the results are consistent over the subsets, the standard deviation does not exceed 0.5% for the transfer system and projection baseline and 1% for the unsupervised system. 5.1 Argument Identification We summarize the results in table 3. Argument identification is known to rely heavily on syntactic information, so it is unsurprising that it proves inaccurate when transferred syntax is used. Our simple projection baseline suffers from the same problem. Even with original syntactic information available, the performance of argument identification is moderate. Note that the model of (van der Plas et al., 2011), though relying on more expressive syntax, only outperforms the transferred system by 3% (F1) on this task. SetupSyntaxTRANSPROJ ZEC NH Z- EFCZNRHt r a n s 3462 1. 536 142 35. 4269 Table3EZ C:N H- CFEZANHZRrgumeon rt ig identf56 7ic13 a. t27903ion,21569t10ra. 3976nsferd model vs. projection baseline, F1. Most unsupervised SRL approaches assume that the argument identification is performed by some external means, for example heuristically (Lang and Lapata, 2011). Such heuristics or unsupervised approaches to argument identification (Abend et al., 2009) can also be used in the present setup. 5.2 Argument Classification In the following tables, TRANS column contains the results for the transferred system, UNSUP for the unsupervised baseline and PROJ for projection baseline. We highlight in bold the higher score where the difference exceeds twice the maximum of the standard deviation estimates of the two results. Table 4 presents the unsupervised evaluation results. Note that the unsupervised model performs as well as the transferred one or better where the – – SetupSyntaxTRANSUNSUP ZEC NH Z- EFCZNRHt r a n s 768 93648. 34627 6 5873. 1769 TableEZ C4NHZ:- FCEZANHZRrgumoe nr itg clasi78 fi94 3c. a25136tion,8 7 r9a4263n. 07 sferd model vs. unsupervised baseline in terms of the clustering metric F1c (see section 2.3). 1195 SetupSyntaxTRANSPROJ ZEC NH Z- EFCZNRHt r a n s 657 053. 1 36456419. 372 Table5EZ C:N H- CFEZANHZRrgumeon rt ig clasif657ic1936a. t170 ion,65 9t3804ra. 20847nsferd model vs. projection baseline, accuracy. original syntactic dependencies are available. In the more realistic scenario with transferred syn- tax, however, the transferred model proves more accurate. In table 5 we compare the transferred system with the projection baseline. 
It is easy to see that the scores vary strongly depending on the language pair, due to both the difference in the annotation scheme used and the degree of relatedness between the languages. The drop in performance when transferring the model to another language is large in every case, though, see table 6. SetupTargetSource Table6:MoCEZdHeNZ l- FECaZNRcH urac67 y53169o. 017nthes87 o25670u. r1245ceandtrge language using original syntax. The source language scores for English vary between language pairs because of the difference in syntactic annotation and role subset used. We also include the individual F1 scores for the top-10 most frequent labels for EN-CZ transfer with original syntax in table 7. The model provides meaningful predictions here, despite low overall accuracy. Most of the labels2 are self-explanatory: Patient (PAT), Actor (ACT), Time (TWHEN), Effect (EFF), Location (LOC), Manner (MANN), Addressee (ADDR), Extent (EXT). CPHR marks the 2http://ufal.mff.cuni.cz/∼toman/pcedt/en/functors.html LabelFreq.F1Re.Pr. recall and precision for the top-10 most frequent roles. nominal part of a complex predicate, as in “to have [a plan]CPHR”, and DIR3 indicates destination. 5.3 Additional Experiments We now evaluate the contribution of different aspects of the feature representation to the performance of the model. Table 8 contains the results for English-French. FeaturesOrigTrans ferent feature subsets, using original and transferred syntactic information. The fact that the model performs slightly better with transferred syntax may be explained by two factors. Firstly, as we already mentioned, the original syntactic annotation is also produced automatically. Secondly, in the model transfer setup it is more important how closely the syntacticsemantic interface on the target side resembles that on the source side than how well it matches the “true” structure of the target language, and in this respect a transferred dependency parser may have an advantage over one trained on target-language data. The high impact of the Glos s features here 1196 may be partly attributed to the fact that the mapping is derived from the same corpus as the evaluation data Europarl (Koehn, 2005) and partly by the similarity between English and French in terms of word order, usage of articles and prepositions. The moderate contribution of the crosslingual cluster features are likely due to the insufficient granularity of the clustering for this task. For more distant language pairs, the contributions of individual feature groups are less interpretable, so we only highlight a few observations. First of all, both EN-CZ and CZ-EN benefit noticeably from the use of the original syntactic annotation, including dependency relations, but not from the transferred syntax, most likely due to the low syntactic transfer performance. Both perform better when lexical information is available, although – – the improvement is not as significant as in the case of French only up to 5%. The situation with Chinese is somewhat complicated in that adding lexical information here fails to yield an improvement in terms of the metric considered. This is likely due to the fact that we consider only the core roles, which can usually be predicted with high accuracy based on syntactic information alone. – 6 Related Work Development of robust statistical models for core NLP tasks is a challenging problem, and adaptation of existing models to new languages presents a viable alternative to exhaustive annotation for each language. 
Although the models thus obtained are generally imperfect, they can be further refined for a particular language and domain using techniques such as active learning (Settles, 2010; Chen et al., 2011). Cross-lingual annotation projection (Yarowsky et al., 2001) approaches have been applied ex- tensively to a variety of tasks, including POS tagging (Xi and Hwa, 2005; Das and Petrov, 2011), morphology segmentation (Snyder and Barzilay, 2008), verb classification (Merlo et al., 2002), mention detection (Zitouni and Florian, 2008), LFG parsing (Wr o´blewska and Frank, 2009), information extraction (Kim et al., 2010), SRL (Pad o´ and Lapata, 2009; van der Plas et al., 2011; Annesi and Basili, 2010; Tonelli and Pianta, 2008), dependency parsing (Naseem et al., 2012; Ganchev et al., 2009; Smith and Eisner, 2009; Hwa et al., 2005) or temporal relation prediction (Spreyer and Frank, 2008). Interestingly, it has also been used to propagate morphosyntactic information between old and modern versions of the same language (Meyer, 2011). Cross-lingual model transfer methods (McDonald et al., 2011; Zeman and Resnik, 2008; Durrett et al., 2012; Søgaard, 2011; Lopez et al., 2008) have also been receiving much attention recently. The basic idea behind model transfer is similar to that of cross-lingual annotation projection, as we can see from the way parallel data is used in, for example, McDonald et al. (201 1). A crucial component of direct transfer approaches is the unified feature representation. There are at least two such representations of lexical information (Klementiev et al., 2012; T ¨ackstr o¨m et al., 2012), but both work on word level. This makes it hard to account for phenomena that are expressed differently in the languages considered, for example the syntactic function of a certain word may be indicated by a preposition, inflection or word order, depending on the language. Accurate representation of such information would require an extra level of abstraction (Haji ˇc, 2002). A side-effect ofusing adaptation methods is that we are forced to use the same annotation scheme for the task in question (SRL, in our case), which in turn simplifies the development of cross-lingual tools for downstream tasks. Such representations are also likely to be useful in machine translation. Unsupervised semantic role labeling methods (Lang and Lapata, 2010; Lang and Lapata, 2011; Titov and Klementiev, 2012a; Lorenzo and Cerisara, 2012) also constitute an alternative to cross-lingual model transfer. For an overview of of semi-supervised approaches we refer the reader to Titov and Klementiev (2012b). 7 Conclusion We have considered the cross-lingual model transfer approach as applied to the task of semantic role labeling and observed that for closely related languages it performs comparably to annotation projection approaches. It allows one to quickly construct an SRL model for a new language without manual annotation or language-specific heuristics, provided an accurate model is available for one of the related languages along with a certain amount of parallel data for the two languages. While an1197 notation projection approaches require sentenceand word-aligned parallel data and crucially depend on the accuracy of the syntactic parsing and SRL on the source side of the parallel corpus, cross-lingual model transfer can be performed using only a bilingual dictionary. 
Unsupervised SRL approaches have their advantages, in particular when no annotated data is available for any of the related languages and there is a syntactic parser available for the target one, but the annotation they produce is not always sufficient. In applications such as Information Retrieval it is preferable to have precise labels, rather than just clusters of arguments, for example. Also note that when applying cross-lingual model transfer in practice, one can improve upon the performance of the simplistic model we use for evaluation, for example by picking the features manually, taking into account the properties of the target language. Domain adaptation techniques can also be employed to adjust the model to the target language. Acknowledgments The authors would like to thank Alexandre Klementiev and Ryan McDonald for useful suggestions and T ¨ackstr o¨m et al. (2012) for sharing the cross-lingual word representations. This research is supported by the MMCI Cluster of Excellence. References Omri Abend, Roi Reichart, and Ari Rappoport. 2009. Unsupervised argument identification for semantic role labeling. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL ’09, pages 28–36, Stroudsburg, PA, USA. Association for Computational Linguistics. Paolo Annesi and Roberto Basili. 2010. Cross-lingual alignment of FrameNet annotations through hidden Markov models. In Proceedings of the 11th international conference on Computational Linguistics and Intelligent Text Processing, CICLing’ 10, pages 12– 25, Berlin, Heidelberg. Springer-Verlag. Roberto Basili, Diego De Cao, Danilo Croce, Bonaventura Coppola, and Alessandro Moschitti. 2009. Cross-language frame semantics transfer in bilingual corpora. In Alexander F. Gelbukh, editor, Proceedings of the 10th International Conference on Computational Linguistics and Intelligent Text Pro- cessing, pages 332–345. Anders Bj ¨orkelund, Love Hafdell, and Pierre Nugues. 2009. Multilingual semantic role labeling. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, pages 43–48, Boulder, Colorado, June. Association for Computational Linguistics. Ming-Wei Chang, Lev Ratinov, and Dan Roth. 2007. Guiding semi-supervision with constraint-driven learning. In ACL. Chenhua Chen, Alexis Palmer, and Caroline Sporleder. 2011. Enhancing active learning for semantic role labeling via compressed dependency trees. In Proceedings of 5th International Joint Conference on Natural Language Processing, pages 183–191, Chiang Mai, Thailand, November. Asian Federation of Natural Language Processing. Dipanjan Das and Slav Petrov. 2011. Unsupervised part-of-speech tagging with bilingual graph-based projections. Proceedings of the Association for Computational Linguistics. Greg Durrett, Adam Pauls, and Dan Klein. 2012. Syntactic transfer using a bilingual lexicon. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1–1 1, Jeju Island, Korea, July. Association for Computational Linguistics. Andreas Eisele and Yu Chen. 2010. MultiUN: A multilingual corpus from United Nation documents. In Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC’10). European Language Resources Association (ELRA). Rong-En Fan, Kai-Wei Chang, Cho-Jui Hsieh, XiangRui Wang, and Chih-Jen Lin. 2008. 
LIBLINEAR: A library for large linear classification. Journal of Machine Learning Research, 9: 1871–1874. Kuzman Ganchev, Jennifer Gillenwater, and Ben Taskar. 2009. Dependency grammar induction via bitext projection constraints. In Proceedings of the 47th Annual Meeting of the ACL, pages 369–377, Stroudsburg, PA, USA. Association for Computational Linguistics. Qin Gao and Stephan Vogel. 2011. Corpus expansion for statistical machine translation with semantic role label substitution rules. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 294–298, Portland, Oregon, USA. Trond Grenager and Christopher D. Manning. 2006. Unsupervised discovery of a statistical verb lexicon. In Proceedings of EMNLP. Jan Haji cˇ. 2002. Tectogrammatical representation: Towards a minimal transfer in machine translation. In Robert Frank, editor, Proceedings of the 6th International Workshop on Tree Adjoining Grammars 1198 and Related Frameworks (TAG+6), pages 216— 226, Venezia. Universita di Venezia. Jan Haji cˇ, Massimiliano Ciaramita, Richard Johansson, Daisuke Kawahara, Maria Ant o`nia Mart ı´, Llu ı´s M `arquez, Adam Meyers, Joakim Nivre, Sebastian Pad o´, Jan Sˇt eˇp a´nek, Pavel Stra nˇ a´k, Mihai Surdeanu, Nianwen Xue, and Yi Zhang. 2009. The CoNLL2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, pages 1–18, Boulder, Colorado. Jan Haji cˇ, Eva Haji cˇov a´, Jarmila Panevov a´, Petr Sgall, Ond ˇrej Bojar, Silvie Cinkov´ a, Eva Fuˇ c ´ıkov a´, Marie Mikulov a´, Petr Pajas, Jan Popelka, Ji ˇr´ ı Semeck´ y, Jana Sˇindlerov a´, Jan Sˇt eˇp a´nek, Josef Toman, Zde nˇka Ure sˇov a´, and Zden eˇk Zˇabokrtsk y´. 2012. Announcing Prague Czech-English dependency treebank 2.0. In Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Mehmet U gˇur Doˇ gan, Bente Maegaard, Joseph Mariani, Jan Odijk, and Stelios Piperidis, editors, Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC’12), Istanbul, Turkey, May. European Language Resources Association (ELRA). Rebecca Hwa, Philip Resnik, Amy Weinberg, Clara Cabezas, and Okan Kolak. 2005. Bootstrapping parsers via syntactic projection across parallel text. Natural Language Engineering, 11(3):3 11–325. Richard Johansson and Pierre Nugues. 2008. Dependency-based semantic role labeling of PropBank. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 69–78, Honolulu, Hawaii. Michael Kaisser and Bonnie Webber. 2007. Question answering based on semantic roles. In ACL Workshop on Deep Linguistic Processing. Seokhwan Kim, Minwoo Jeong, Jonghoon Lee, and Gary Geunbae Lee. 2010. A cross-lingual annotation projection approach for relation detection. In Proceedings of the 23rd International Conference on Computational Linguistics, COLING ’ 10, pages 564–571, Stroudsburg, PA, USA. Association for Computational Linguistics. Paul Kingsbury, Nianwen Xue, and Martha Palmer. 2004. Propbanking in parallel. In In Proceedings of the Workshop on the Amazing Utility of Parallel and Comparable Corpora, in conjunction with LREC’04. Alexandre Klementiev, Ivan Titov, and Binod Bhattarai. 2012. Inducing crosslingual distributed representations of words. In Proceedings of the International Conference on Computational Linguistics (COLING), Bombay, India. Philipp Koehn. 2005. 
Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In Conference Proceedings: the Tenth Machine Translation Summit, pages 79–86, Phuket, Thailand. AAMT.
Joel Lang and Mirella Lapata. 2010. Unsupervised induction of semantic roles. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 939–947, Los Angeles, California, June. Association for Computational Linguistics.
Joel Lang and Mirella Lapata. 2011. Unsupervised semantic role induction via split-merge clustering. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL).
Ding Liu and Daniel Gildea. 2010. Semantic role features for machine translation. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), Beijing, China.
Adam Lopez, Daniel Zeman, Michael Nossal, Philip Resnik, and Rebecca Hwa. 2008. Cross-language parser adaptation between related languages. In IJCNLP-08 Workshop on NLP for Less Privileged Languages, pages 35–42, Hyderabad, India, January.
Alejandra Lorenzo and Christophe Cerisara. 2012. Unsupervised frame based semantic role induction: application to French and English. In Proceedings of the ACL 2012 Joint Workshop on Statistical Parsing and Semantic Processing of Morphologically Rich Languages, pages 30–35, Jeju, Republic of Korea, July. Association for Computational Linguistics.
Ryan McDonald, Slav Petrov, and Keith Hall. 2011. Multi-source transfer of delexicalized dependency parsers. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '11), pages 62–72, Stroudsburg, PA, USA. Association for Computational Linguistics.
Paola Merlo, Suzanne Stevenson, Vivian Tsang, and Gianluca Allaria. 2002. A multi-lingual paradigm for automatic verb classification. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL'02), pages 207–214, Philadelphia, PA.
Roland Meyer. 2011. New wine in old wineskins? Tagging Old Russian via annotation projection from modern translations. Russian Linguistics, 35(2):267(15).
Tahira Naseem, Regina Barzilay, and Amir Globerson. 2012. Selective sharing for multilingual dependency parsing. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 629–637, Jeju Island, Korea, July. Association for Computational Linguistics.
Joakim Nivre. 2008. Algorithms for deterministic incremental dependency parsing. Computational Linguistics, 34(4):513–553, December.
Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1).
Sebastian Padó and Mirella Lapata. 2009. Crosslingual annotation projection for semantic roles. Journal of Artificial Intelligence Research, 36:307–340.
Martha Palmer, Daniel Gildea, and Paul Kingsbury. 2005. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31:71–105.
Slav Petrov, Dipanjan Das, and Ryan McDonald. 2012. A universal part-of-speech tagset. In Proceedings of LREC, May.
Mark Sammons, Vinod Vydiswaran, Tim Vieira, Nikhil Johri, Ming-Wei Chang, Dan Goldwasser, Vivek Srikumar, Gourab Kundu, Yuancheng Tu, Kevin Small, Joshua Rule, Quang Do, and Dan Roth. 2009. Relation alignment for textual entailment recognition. In Text Analysis Conference (TAC).
Burr Settles. 2010. Active learning literature survey. Computer Sciences Technical Report 1648.
Dan Shen and Mirella Lapata. 2007. Using semantic roles to improve question answering. In EMNLP.
David A. Smith and Jason Eisner. 2009. Parser adaptation and projection with quasi-synchronous grammar features. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 822–831. Association for Computational Linguistics.
Benjamin Snyder and Regina Barzilay. 2008. Cross-lingual propagation for morphological analysis. In Proceedings of the 23rd National Conference on Artificial Intelligence.
Anders Søgaard. 2011. Data point selection for cross-language adaptation of dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, volume 2 of HLT '11, pages 682–686, Stroudsburg, PA, USA. Association for Computational Linguistics.
Kathrin Spreyer and Anette Frank. 2008. Projection-based acquisition of a temporal labeller. In Proceedings of IJCNLP 2008.
Oscar Täckström, Ryan McDonald, and Jakob Uszkoreit. 2012. Cross-lingual word clusters for direct transfer of linguistic structure. In Proceedings of the Annual Meeting of the North American Association of Computational Linguistics (NAACL), pages 477–487, Montréal, Canada.
Cynthia A. Thompson, Roger Levy, and Christopher D. Manning. 2003. A generative model for semantic role labeling. In Proceedings of the 14th European Conference on Machine Learning (ECML 2003), pages 397–408, Dubrovnik, Croatia.
Ivan Titov and Alexandre Klementiev. 2012a. A Bayesian approach to unsupervised semantic role induction. In Proceedings of the European Chapter of the Association for Computational Linguistics (EACL).
Ivan Titov and Alexandre Klementiev. 2012b. Semi-supervised semantic role labeling: Approaching from an unsupervised perspective. In Proceedings of the International Conference on Computational Linguistics (COLING), Bombay, India, December.
Sara Tonelli and Emanuele Pianta. 2008. Frame information transfer from English to Italian. In Proceedings of LREC 2008.
Lonneke van der Plas, James Henderson, and Paola Merlo. 2009. Domain adaptation with artificial data for semantic parsing of speech. In Proceedings of the 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 125–128, Boulder, Colorado.
Lonneke van der Plas, Paola Merlo, and James Henderson. 2011. Scaling up automatic cross-lingual semantic role annotation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (HLT '11), pages 299–304, Stroudsburg, PA, USA. Association for Computational Linguistics.
Alina Wróblewska and Anette Frank. 2009. Crosslingual projection of LFG F-structures: Building an F-structure bank for Polish. In Eighth International Workshop on Treebanks and Linguistic Theories, page 209.
Dekai Wu and Pascale Fung. 2009. Can semantic role labeling improve SMT? In Proceedings of the 13th Annual Conference of the European Association for Machine Translation (EAMT 2009), Barcelona.
Chenhai Xi and Rebecca Hwa. 2005. A backoff model for bootstrapping resources for non-English languages. In Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 851–858, Stroudsburg, PA, USA.
David Yarowsky, Grace Ngai, and Richard Wicentowski. 2001. Inducing multilingual text analysis tools via robust projection across aligned corpora. In Proceedings of the Human Language Technology Conference.
Daniel Zeman and Philip Resnik. 2008. Cross-language parser adaptation between related languages. In Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages, pages 35–42, Hyderabad, India, January. Asian Federation of Natural Language Processing.
Imed Zitouni and Radu Florian. 2008. Mention detection crossing the language barrier. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
6 0.097004168 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates
7 0.094305657 269 acl-2013-PLIS: a Probabilistic Lexical Inference System
8 0.089282006 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction
9 0.08545541 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules
10 0.076869279 352 acl-2013-Towards Accurate Distant Supervision for Relational Facts Extraction
11 0.075205594 376 acl-2013-Using Lexical Expansion to Learn Inference Rules from Sparse Data
12 0.073196478 304 acl-2013-SEMILAR: The Semantic Similarity Toolkit
13 0.072129384 145 acl-2013-Exploiting Qualitative Information from Automatic Word Alignment for Cross-lingual NLP Tasks
14 0.069347523 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
15 0.06803859 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
16 0.06734743 41 acl-2013-Aggregated Word Pair Features for Implicit Discourse Relation Disambiguation
17 0.066390634 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner
18 0.066094145 189 acl-2013-ImpAr: A Deterministic Algorithm for Implicit Semantic Role Labelling
19 0.06604214 267 acl-2013-PARMA: A Predicate Argument Aligner
20 0.064961039 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing
simIndex simValue paperId paperTitle
same-paper 1 0.97418642 245 acl-2013-Modeling Human Inference Process for Textual Entailment Recognition
2 0.93510306 297 acl-2013-Recognizing Partial Textual Entailment
Author: Omer Levy ; Torsten Zesch ; Ido Dagan ; Iryna Gurevych
Abstract: Textual entailment is an asymmetric relation between two text fragments that describes whether one fragment can be inferred from the other. It thus cannot capture the notion that the target fragment is “almost entailed” by the given text. The recently suggested idea of partial textual entailment may remedy this problem. We investigate partial entailment under the faceted entailment model and the possibility of adapting existing textual entailment methods to this setting. Indeed, our results show that these methods are useful for recognizing partial entailment. We also provide a preliminary assessment of how partial entailment may be used for recognizing (complete) textual entailment.
3 0.91131997 75 acl-2013-Building Japanese Textual Entailment Specialized Data Sets for Inference of Basic Sentence Relations
Author: Kimi Kaneko ; Yusuke Miyao ; Daisuke Bekki
Abstract: This paper proposes a methodology for generating specialized Japanese data sets for textual entailment, which consist of pairs decomposed into basic sentence relations. We experimented with our methodology over a number of pairs taken from the RITE-2 data set. We compared our methodology with existing studies in terms of agreement, frequencies and times, and we evaluated its validity by investigating recognition accuracy.
4 0.7519325 202 acl-2013-Is a 204 cm Man Tall or Small? Acquisition of Numerical Common Sense from the Web
Author: Katsuma Narisawa ; Yotaro Watanabe ; Junta Mizuno ; Naoaki Okazaki ; Kentaro Inui
Abstract: This paper presents novel methods for modeling numerical common sense: the ability to infer whether a given number (e.g., three billion) is large, small, or normal for a given context (e.g., number of people facing a water shortage). We first discuss the necessity of numerical common sense in solving textual entailment problems. We explore two approaches for acquiring numerical common sense. Both approaches start with extracting numerical expressions and their context from the Web. One approach estimates the distribution of numbers co-occurring within a context and examines whether a given value is large, small, or normal, based on the distribution. Another approach utilizes textual patterns with which speakers explicitly express their judgment about the value of a numerical expression. Experimental results demonstrate the effectiveness of both approaches.
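The first, distribution-based approach described in this abstract can be illustrated with a short sketch: collect the numbers observed with a context and judge a new value by where it falls in their empirical distribution. This is a hedged illustration, not the authors' code; NumPy is assumed, the sample counts are invented, and working in log space is an assumption motivated by such quantities spanning orders of magnitude.

```python
# Toy sketch of distribution-based numerical common sense: a value is "large"
# or "small" if it falls outside central quantiles of numbers previously seen
# in the same context. Counts are invented; log space tames the wide range.
import numpy as np

observed = np.array([1.2e9, 8.0e8, 2.5e9, 9.9e8, 1.8e9])  # invented context samples

def judge(value, samples, low=0.05, high=0.95):
    logs = np.log10(samples)
    lo, hi = np.quantile(logs, [low, high])
    x = np.log10(value)
    return "small" if x < lo else "large" if x > hi else "normal"

print(judge(3e9, observed))  # -> "large" for this toy sample
```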
5 0.61418444 269 acl-2013-PLIS: a Probabilistic Lexical Inference System
Author: Eyal Shnarch ; Erel Segal-haLevi ; Jacob Goldberger ; Ido Dagan
Abstract: This paper presents PLIS, an open source Probabilistic Lexical Inference System which combines two functionalities: (i) a tool for integrating lexical inference knowledge from diverse resources, and (ii) a framework for scoring textual inferences based on the integrated knowledge. We provide PLIS with two probabilistic implementations of this framework. PLIS is available for download and developers of text processing applications can use it as an off-the-shelf component for injecting lexical knowledge into their applications. PLIS is easily configurable, components can be extended or replaced with user generated ones to enable system customization and further research. PLIS includes an online interactive viewer, which is a powerful tool for investigating lexical inference processes.
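The PLIS paper describes probabilistic models that assign each knowledge resource a learned reliability prior and give longer inference chains lower validity. The toy sketch below illustrates that idea only; it is not PLIS's actual API, and the resource names and prior values are invented.

```python
# Toy chain scoring: each link's resource has a reliability prior, and a
# chain's validity is the product of its links' priors, so longer chains and
# weaker resources score lower. Priors here are invented, not learned.
RELIABILITY = {"wordnet_hypernym": 0.9, "wikipedia_link": 0.6}  # assumed priors

def chain_score(chain):
    score = 1.0
    for _source, resource, _target in chain:
        score *= RELIABILITY[resource]
    return score

chain = [("Columbus", "wikipedia_link", "navigator"),
         ("navigator", "wordnet_hypernym", "explorer")]
print(chain_score(chain))  # 0.54
```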
6 0.42611799 145 acl-2013-Exploiting Qualitative Information from Automatic Word Alignment for Cross-lingual NLP Tasks
7 0.3966319 365 acl-2013-Understanding Tables in Context Using Standard NLP Toolkits
8 0.36885655 339 acl-2013-Temporal Signals Help Label Temporal Relations
9 0.36733651 61 acl-2013-Automatic Interpretation of the English Possessive
10 0.36560073 237 acl-2013-Margin-based Decomposed Amortized Inference
11 0.35658315 28 acl-2013-A Unified Morpho-Syntactic Scheme of Stanford Dependencies
12 0.34199327 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
13 0.32647243 242 acl-2013-Mining Equivalent Relations from Linked Data
14 0.31029984 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals
15 0.30923983 189 acl-2013-ImpAr: A Deterministic Algorithm for Implicit Semantic Role Labelling
16 0.30916476 41 acl-2013-Aggregated Word Pair Features for Implicit Discourse Relation Disambiguation
17 0.30561808 367 acl-2013-Universal Conceptual Cognitive Annotation (UCCA)
18 0.30273616 265 acl-2013-Outsourcing FrameNet to the Crowd
19 0.29809114 280 acl-2013-Plurality, Negation, and Quantification:Towards Comprehensive Quantifier Scope Disambiguation
20 0.29744384 321 acl-2013-Sign Language Lexical Recognition With Propositional Dynamic Logic
topicId topicWeight
[(0, 0.073), (2, 0.097), (6, 0.035), (11, 0.256), (15, 0.016), (24, 0.034), (26, 0.064), (35, 0.078), (42, 0.037), (48, 0.054), (70, 0.051), (88, 0.025), (90, 0.027), (95, 0.063)]
simIndex simValue paperId paperTitle
1 0.94348276 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy
Author: Francesco Sartorio ; Giorgio Satta ; Joakim Nivre
Abstract: We present a novel transition-based, greedy dependency parser which implements a flexible mix of bottom-up and top-down strategies. The new strategy allows the parser to postpone difficult decisions until the relevant information becomes available. The novel parser has a ∼12% error reduction in unlabeled attachment score over an arc-eager parser, with a slow-down factor of 2.8.
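For orientation, a generic greedy transition-based parsing loop (arc-standard style) is sketched below. The paper's parser uses a richer, dynamic transition system, so this is background only, not a rendering of its method; the oracle argument stands in for a trained classifier.

```python
# Generic arc-standard parsing loop: repeatedly ask a classifier ("oracle")
# for the next transition and apply it. Arcs are (head, dependent) pairs of
# word indices. Illustrates transition-based parsing in general only.
def parse(num_words, oracle):
    stack, buffer, arcs = [], list(range(num_words)), []
    while buffer or len(stack) > 1:
        action = oracle(stack, buffer)
        if action == "SHIFT" and buffer:
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC" and len(stack) >= 2:
            dep = stack.pop(-2)            # second-from-top attaches to top
            arcs.append((stack[-1], dep))
        elif action == "RIGHT-ARC" and len(stack) >= 2:
            dep = stack.pop()              # top attaches to second-from-top
            arcs.append((stack[-1], dep))
        else:
            break  # guard against an ill-formed action sequence
    return arcs
```

At test time a learned model plays the oracle role, which is what makes such parsers greedy and fast.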
Author: Tirthankar Dasgupta
Abstract: In this work we present psycholinguistically motivated computational models for the organization and processing of Bangla morphologically complex words in the mental lexicon. Our goal is to identify whether morphologically complex words are stored as a whole or organized along morphological lines. For this, we have conducted a series of psycholinguistic experiments to build up hypotheses on the possible organizational structure of the mental lexicon. Next, we develop computational models based on the collected dataset. We observed that derivationally suffixed Bangla words are in general decomposed during processing, and compositionality between the stem and the suffix plays an important role in the decomposition process. We observed the same phenomena for Bangla verb sequences, where experiments showed that non-compositional verb sequences are in general stored as a whole in the mental lexicon and low traces of compositional verbs are found in the mental lexicon.
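Among the compositionality signals this paper examines are entropy-based measures and pointwise mutual information (PMI) between a stem and a suffix, estimated from corpus counts. The sketch below is an illustration of the PMI computation only, not the authors' code; all counts are invented.

```python
# Toy PMI between a stem and a suffix: high PMI means the pair co-occurs far
# more often than chance would predict, one cue for stem-suffix association.
import math

N = 100_000                           # assumed corpus size in tokens
c_stem, c_suffix, c_pair = 1200, 800, 300  # invented counts

pmi = math.log2((c_pair / N) / ((c_stem / N) * (c_suffix / N)))
print(round(pmi, 2))  # ~4.97 for these toy counts
```

Fuller details of the frequency-based and information-theoretic models appear in the paper itself.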
However, we do not know of any such investigations for Indian languages, which 123 Sofia, BuPrlgoacreiead, iAngusgu osft 4h-e9 A 2C01L3 S.tu ?c d2en0t1 3Re Ases aorc hiat Wio nrk fsohro Cp,om papguesta 1ti2o3n–a1l2 L9in,guistics are morphologically richer than many of their Indo-European cousins. Moreover, Indian languages show some distinct phenomena like, compound and composite verbs for which no such investigations have been conducted yet. On the other hand, experiments indicate that mental representation and processing of morphologically complex words are not quite language independent (Taft, 2004). Therefore, the findings from experiments in one language cannot be generalized to all languages making it important to conduct similar experimentations in other languages. This work aims to design cognitively motivated computational models that can explain the organization and processing of Bangla morphologically complex words in the mental lexicon. Presently we will concentrate on the following two aspects: OOrrggaanniizzaattiioonn aanndd pprroocceessssiinngg ooff BBaannggllaa PPo o l yy-mmoorrpphheemmiicc wwoorrddss:: our objective here is to determine whether the mental lexicon decomposes morphologically complex words into its constituent morphemes or does it represent the unanalyzed surface form of a word. OOrrggaanniizzaattiioonn aanndd pprroocceessssiinngg ooff BBaannggllaa ccoomm-ppoouunndd vveerrbbss ((CCVV)) :: compound verbs are the subject of much debate in linguistic theory. No consensus has been reached yet with respect to the issue that whether to consider them as unitary lexical units or are they syntactically assembled combinations of two independent lexical units. As linguistic arguments have so far not led to a consensus, we here use cognitive experiments to probe the brain signatures of verb-verb combinations and propose cognitive as well as computational models regarding the possible organization and processing of Bangla CVs in the mental lexicon (ML). With respect to this, we apply the different priming and other lexical decision experiments, described in literature (Marslen-Wilson et al., 1994; Bentin, S. and Feldman, 1990) specifically for derivationally suffixed polymorphemic words and compound verbs of Bangla. Our cross-modal and masked priming experiment on Bangla derivationally suffixed words shows that morphological relatedness between lexical items triggers a significant priming effect, even when the forms are phonologically/orthographically unrelated. These observations are similar to those reported for English and indicate that derivationally suffixed words in Bangla are in general accessed through decomposition of the word into its constituent morphemes. Further, based on the experimental data we have developed a series of computational models that can be used to predict the decomposition of Bangla polymorphemic words. Our evaluation result shows that decom- position of a polymorphemic word depends on several factors like, frequency, productivity of the suffix and the compositionality between the stem and the suffix. The organization of the paper is as follows: Sec. 2 presents related works; Sec. 3 describes experiment design and procedure; Sec. 4 presents the processing of CVs; and finally, Sec. 5 concludes the paper by presenting the future direction of the work. 2 RReellaatteedd WWoorrkkss 2. . 
11 RReepprreesseennttaattiioonn ooff ppoollyymmoorrpphheemmiicc wwoorrddss Over the last few decades many studies have attempted to understand the representation and processing of morphologically complex words in the brain for various languages. Most of the studies are designed to support one of the two mutually exclusive paradigms: the full-listing and the morphemic model. The full-listing model claims that polymorphic words are represented as a whole in the human mental lexicon (Bradley, 1980; Butterworth, 1983). On the other hand, morphemic model argues that morphologically complex words are decomposed and represented in terms of the smaller morphemic units. The affixes are stripped away from the root form, which in turn are used to access the mental lexicon (Taft and Forster, 1975; Taft, 1981 ; MacKay, 1978). Intermediate to these two paradigms is the partial decomposition model that argues that different types of morphological forms are processed separately. For instance, the derived morphological forms are believed to be represented as a whole, whereas the representation of the inflected forms follows the morphemic model (Caramazza et al., 1988). Traditionally, priming experiments have been used to study the effects of morphology in language processing. Priming is a process that results in increase in speed or accuracy of response to a stimulus, called the target, based on the occurrence of a prior exposure of another stimulus, called the prime (Tulving et al., 1982). Here, subjects are exposed to a prime word for a short duration, and are subsequently shown a target word. The prime and target words may be morphologically, phonologically or semantically re124 lated. An analysis of the effect of the reaction time of subjects reveals the actual organization and representation of the lexicon at the relevant level. See Pulvermüller (2002) for a detailed account of such phenomena. It has been argued that frequency of a word influences the speed of lexical processing and thus, can serve as a diagnostic tool to observe the nature and organization of lexical representations. (Taft, 1975) with his experiment on English inflected words, argued that lexical decision responses of polymorphemic words depends upon the base word frequency. Similar observation for surface word frequency was also observed by (Bertram et al., 2000;Bradley, 1980;Burani et al., 1987;Burani et al., 1984;Schreuder et al., 1997; Taft 1975;Taft, 2004) where it has been claimed that words having low surface frequency tends to decompose. Later, Baayen(2000) proposed the dual processing race model that proposes that a specific morphologically complex form is accessed via its parts if the frequency of that word is above a certain threshold of frequency, then the direct route will win, and the word will be accessed as a whole. If it is below that same threshold of frequency, the parsing route will win, and the word will be accessed via its parts. 2. . 22 RReepprreesseennttaattiioonn ooff CCoommppoouunndd A compound verb (CV) consists of two verbs (V1 and V2) acting as and expresses a single expression For example, in the sentence VVeerrbbss a sequence of a single verb of meaning. রুটিগুল ো খেল খেল ো (/ruTigulo kheYe phela/) ―bread-plural-the eat and drop-pres. Imp‖ ―Eat the breads‖ the verb sequence “খেল খেল ো (eat drop)” is an example of CV. Compound verbs are a special phenomena that are abundantly found in IndoEuropean languages like Indian languages. 
A plethora of works has been done to provide linguistic explanations on the formation of such word, yet none so far has led to any consensus. Hook (1981) considers the second verb V2 as an aspectual complex comparable to the auxiliaries. Butt (1993) argues CV formations in Hindi and Urdu are either morphological or syntactical and their formation take place at the argument struc- ture. Bashir (1993) tried to construct a semantic analysis based on “prepared” and “unprepared mind”. Similar findings have been proposed by Pandharipande (1993) that points out V1 and V2 are paired on the basis of their semantic compatibility, which is subject to syntactic constraints. Paul (2004) tried to represent Bangla CVs in terms of HPSG formalism. She proposes that the selection of a V2 by a V1 is determined at the semantic level because the two verbs will unify if and only if they are semantically compatible. Since none of the linguistic formalism could satisfactorily explain the unique phenomena of CV formation, we here for the first time drew our attention towards psycholinguistic and neurolinguistic studies to model the processing of verb-verb combinations in the ML and compare these responses with that of the existing models. 3 TThhee PPrrooppoosseedd AApppprrooaacchheess 3. . 11 TThhee ppssyycchhoolliinngguuiissttiicc eexxppeerriimmeennttss We apply two different priming experiments namely, the cross modal priming and masked priming experiment discussed in (Forster and Davis, 1984; Rastle et al., 2000;Marslen-Wilson et al., 1994; Marslen-Wilson et al., 2008) for Bangla morphologically complex words. Here, the prime is morphologically derived form of the target presented auditorily (for cross modal priming) or visually (for masked priming). The subjects were asked to make a lexical decision whether the given target is a valid word in that language. The same target word is again probed but with a different audio or visual probe called the control word. The control shows no relationship with the target. For example, baYaska (aged) and baYasa (age) is a prime-target pair, for which the corresponding control-target pair could be naYana (eye) and baYasa (age). Similar to (Marslen-Wilson et al., 2008) the masked priming has been conducted for three different SOA (Stimulus Onset Asynchrony), 48ms, 72ms and 120ms. The SOA is measured as the amount of time between the start the first stimulus till the start of the next stimulus. TCM abl-’+ Sse-+ O1 +:-DatjdgnmAshielbatArDu)f(osiAMrawnteihmsgcdaoe)lEx-npgmAchebamr)iD-gnatmprhdiYlbeaA(n ftrTsli,ae(+gnrmdisc)phroielctn)osrelated, and - implies unrelated. There were 500 prime-target and controltarget pairs classified into five classes. Depending on the class, the prime is related to the target 125 either in terms of morphology, semantics, orthography and/or Phonology (See Table 1). The experiments were conducted on 24 highly educated native Bangla speakers. Nineteen of them have a graduate degree and five hold a post graduate degree. The age of the subjects varies between 22 to 35 years. RReessuullttss:: The RTs with extreme values and incorrect decisions were excluded from the data. The data has been analyzed using two ways ANOVA with three factors: priming (prime and control), conditions (five classes) and prime durations (three different SOA). 
We observe strong priming effects (p<0.05) when the target word is morphologically derived and has a recognizable suffix, semantically and orthographically related with respect to the prime; no priming effects are observed when the prime and target words are orthographically related but share no morphological or semantic relationship; although not statistically significant (p>0.07), but weak priming is observed for prime target pairs that are only semantically related. We see no significant difference between the prime and control RTs for other classes. We also looked at the RTs for each of the 500 target words. We observe that maximum priming occurs for words in [M+S+O+](69%), some priming is evident in [M+S+O-](51%) and [M'+S-O+](48%), but for most of the words in [M-S+O-](86%) and [M-S-O+](92%) no priming effect was observed. 3. . 22 FFrreeqquueennccyy DDiissttrriibbuuttiioonn MMooddeellss ooff MMoo rrpphhoo-llooggiiccaall PPrroocceessssiinngg From the above results we saw that not all polymorphemic words tend to decompose during processing, thus we need to further investigate the processing phenomena of Bangla derived words. One notable means is to identify whether the stem or suffix frequency is involved in the processing stage of that word. For this, we apply different frequency based models to the Bangla polymorphemic words and try to evaluate their performance by comparing their predicted results with the result obtained through the priming experiment. MMooddeell --11:: BBaassee aanndd SSuurrffaaccee wwoorrdd ffrreeqquueennccyy ee ff-ffeecctt -- It states that the probability of decomposition of a Bangla polymorphemic word depends upon the frequency of its base word. Thus, if the stem frequency of a polymorphemic word crosses a given threshold value, then the word will decomposed into its constituent morpheme. Similar claim has been made for surface word frequency model where decomposition depends upon the frequency of the surface word itself. We have evaluated both the models with the 500 words used in the priming experiments discussed above. We have achieved an accuracy of 62% and 49% respectively for base and surface word frequency models. MMooddeell --22:: CCoommbbiinniinngg tthhee bbaassee aanndd ssuurrffaaccee wwoorrdd ffrreeq quueennccyy -- In a pursuit towards an extended model, we combine model 1 and 2 together. We took the log frequencies of both the base and the derived words and plotted the best-fit regression curve over the given dataset. The evaluation of this model over the same set of 500 target words returns an accuracy of 68% which is better than the base and surface word frequency models. However, the proposed model still fails to predict processing of around 32% of words. This led us to further enhance the model. For this, we analyze the role of suffixes in morphological processing. MMooddeell -- 33:: DDeeggrreeee ooff AAffffiixxaattiioonn aanndd SSuuffffiixx PPrroodd-uuccttiivviittyy:: we examine whether the regression analysis between base and derived frequency of Bangla words varies between suffixes and how these variations affect morphological decomposition. With respect to this, we try to compute the degree of affixation between the suffix and the base word. For this, we perform regression analysis on sixteen different Bangla suffixes with varying degree of type and token frequencies. For each suffix, we choose 100 different derived words. 
We observe that those suffixes having high value of intercept are forming derived words whose base frequencies are substantially high as compared to their derived forms. Moreover we also observe that high intercept value for a given suffix indicates higher inclination towards decomposition. Next, we try to analyze the role of suffix type/token ratio and compare them with the base/derived frequency ratio model. This has been done by regression analysis between the suffix type-token ratios with the base-surface frequency ratio. We further tried to observe the role of suffix productivity in morphological processing. For this, we computed the three components of productivity P, P* and V as discussed in (Hay and Plag, 2004). P is the “conditioned degree of productivity” and is the probability that we are encountering a word with an affix and it is representing a new type. P* is the “hapaxedconditioned degree of productivity”. It expresses the probability that when an entirely new word is 126 encountered it will contain the suffix. V is the “type frequency”. Finally, we computed the productivity of a suffix through its P, P* and V values. We found that decomposition of Bangla polymorphemic word is directly proportional to the productivity of the suffix. Therefore, words that are composed of productive suffixes (P value ranges between 0.6 and 0.9) like “-oYAlA”, “-giri”, “-tba” and “-panA” are highly decomposable than low productive suffixes like “-Ani”, “-lA”, “-k”, and “-tama”. The evaluation of the proposed model returns an accuracy of 76% which comes to be 8% better than the preceding models. CCoommbbiinniinngg MMooddeell --22 aanndd MMooddeell -- 33:: One important observation that can be made from the above results is that, model-3 performs best in determining the true negative values. It also possesses a high recall value of (85%) but having a low precision of (50%). In other words, the model can predict those words for which decomposition will not take place. On the other hand, results of Model-2 posses a high precision of 70%. Thus, we argue that combining the above two models can better predict the decomposition of Bangla polymorphemic words. Hence, we combine the two models together and finally achieved an overall accuracy of 80% with a precision of 87% and a recall of 78%. This surpasses the performance of the other models discussed earlier. However, around 22% of the test words were wrongly classified which the model fails to justify. Thus, a more rigorous set of experiments and data analysis are required to predict access mechanisms of such Bangla polymorphemic words. 3. . 33 SStteemm- -SSuuffffiixx CCoommppoossiittiioonnaalliittyy Compositionality refers to the fact that meaning of a complex expression is inferred from the meaning of its constituents. Therefore, the cost of retrieving a word from the secondary memory is directly proportional to the cost of retrieving the individual parts (i.e the stem and the suffix). Thus, following the work of (Milin et al., 2009) we define the compositionality of a morphologically complex word (We) as: C(We)=α 1H(We)+α α2H(e)+α α3H(W|e)+ α4H(e|W) Where, H(x) is entropy of an expression x, H(W|e) is the conditional entropy between the stem W and suffix e and is the proportionality factor whose value is computed through regression analysis. Next, we tried to compute the compositionality of the stem and suffixes in terms of relative entropy D(W||e) and Point wise mutual information (PMI). 
The relative entropy measures the distance between the probability distributions of the stem W and the suffix e. The PMI measures the amount of information that one random variable (the stem) contains about the other (the suffix). We compared these three techniques against the actual reaction time data collected through the priming and lexical decision experiments, and observed that all three information-theoretic models predict the decomposability of Bangla polymorphemic words much better than the frequency-based models discussed in the previous section. However, we think it is still premature to draw firm conclusions at this stage of our work; much more rigorous experimentation is needed to validate the proposed models. Moreover, the present paper does not consider factors such as age of acquisition and word familiarity, which play important roles in the processing of morphologically complex words. It would also be interesting to examine how the stacking of multiple suffixes in a word is processed by the human brain.

4 Organization and Processing of Compound Verbs in the Mental Lexicon

Compound verbs (CVs), as discussed above, are a special type of verb sequence consisting of two or more verbs that act as a single verb and express a single unit of meaning. The verb V1 is known as the pole and V2 as the vector. For example, "ওঠে পড়া" (getting up) is a compound verb in which the individual words do not entirely reflect the meaning of the whole expression. However, not all V1+V2 combinations are CVs. For example, expressions like "নিঠে য়াও" (take and then go) and "নিঠে আঠ ়া" (return back) are verb sequences whose meaning can be derived from the meanings of their individual components, and such sequences are therefore not considered CVs. A question that linguists have long debated is whether CVs should be treated as single lexical units or as two separate units. Since linguistic rules fail to settle the question, we have, for the first time, performed cognitive experiments to understand the organization and processing of such verb sequences in the human mind. A clear understanding of these phenomena may help us classify or extract actual CVs from other verb sequences. To this end, we have applied three different techniques to collect user data. In the first, a group of three linguists (the expert subjects) annotated 4500 V1+V2 sequences, along with their example sentences, classifying each verb sequence into one of three classes: CV, not a CV, and not sure. Each linguist received 2000 verb pairs with their respective example sentences; 1500 of these were unique to each annotator and the remaining 500 overlapped. We measured inter-annotator agreement with Fleiss' kappa (Fleiss et al., 1981), obtaining κ ≈ 0.79. Next, from the 500 common verb sequences annotated by all three linguists, we randomly chose 300 V1+V2 pairs and presented them to 36 native Bangla speakers, asking each subject to rate the compositionality of each verb sequence on a 1-10 scale, with 10 highly compositional and 1 non-compositional. The agreement among these subjects was κ = 0.69 (a sketch of the kappa computation appears below).
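For reference, the agreement figures above follow the standard Fleiss' kappa computation. A minimal sketch, using a small hypothetical ratings matrix in place of our actual annotations, is given below.

```python
import numpy as np

def fleiss_kappa(ratings):
    """Fleiss' kappa (Fleiss et al., 1981) for an items x categories
    matrix, where ratings[i][j] counts the annotators who assigned
    item i to category j; assumes the same number of raters per item."""
    ratings = np.asarray(ratings, dtype=float)
    n_items = ratings.shape[0]
    n_raters = ratings[0].sum()

    p_j = ratings.sum(axis=0) / (n_items * n_raters)  # category shares
    P_i = ((ratings ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))

    P_bar = P_i.mean()        # mean observed agreement
    P_e = (p_j ** 2).sum()    # expected chance agreement
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical counts for four verb sequences rated by three linguists
# into the classes (CV, not a CV, not sure).
ratings = [
    [3, 0, 0],
    [2, 1, 0],
    [0, 3, 0],
    [1, 1, 1],
]
print(f"kappa = {fleiss_kappa(ratings):.2f}")
```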
We also observe a continuum of compositionality scores among the verb sequences, which suggests that it is difficult to classify Bangla verb sequences discretely into CVs and non-CVs. We then compared the compositionality scores with the expert annotations and found a significant correlation between the two. Verb sequences annotated as CVs (like খেঠে খিল, কঠে খি, ওঠে পড) have low compositionality scores (average scores between 1 and 4), whereas sequences with high compositionality scores are generally tagged as not a CV (নিঠে য়া (come and get), নিঠে আে (return back), তুঠল খেঠেনি (kept), গনিঠে পিল (roll on floor)). This reflects that verb sequences which are not CVs show a high degree of compositionality; in other words, non-CV sequences can be interpreted directly from their constituent verbs. This raises the possibility that compositional verb sequences require the individual verbs to be recognized separately, so the time to recognize such expressions should be greater than for non-compositional verbs, which map to a single unit of meaning. To validate this claim, we performed a lexical decision experiment with 32 native Bangla speakers and 92 different verb sequences, following the experimental procedure of Taft (2004) for English polymorphemic words. However, rather than derived words, the subjects were shown a verb sequence and asked whether they recognized it as a valid combination, and the reaction time (RT) of each subject was recorded. Our preliminary analysis shows that, as predicted, the RTs for verb sequences with high compositionality values are significantly higher than the RTs for low- or non-compositional verbs. This supports our hypothesis that Bangla compound verbs showing low compositionality are stored as a whole in the mental lexicon, following the full-listing model, whereas compositional verb sequences are parsed into their individual verbs. However, our experiment uses a very small dataset, and it is premature to draw firm conclusions from the current results alone.

5 Future Directions

In the next phase of our work we will focus on the following aspects of Bangla morphologically complex words:

The word familiarity effect: Our aim is to study the role of a word's familiarity in its processing. We define the familiarity of a word in terms of its corpus frequency, age of acquisition, the level of language exposure of a person, and the word's RT.

Role of suffix types in morphological decomposition: For native Bangla speakers, which morphological suffixes are internalized and which are merely learnt in school but never internalized? We can compare the representation of native, Sanskrit-derived, and foreign suffixes in Bangla words.

Computational models of the organization and processing of Bangla compound verbs: So far we have performed a small set of experiments to study the processing of compound verbs in the mental lexicon. In the next phase we will extend these experiments and apply further techniques, such as crowdsourcing and language games, to collect more RT and compositionality data.
Finally, based on the collected data, we will develop computational models that can explain the possible organizational structure and processing mechanisms of morphologically complex Bangla words in the mental lexicon.

References

Aitchison, J. (1987). Words in the Mind: An Introduction to the Mental Lexicon. Wiley-Blackwell.
Baayen, R. H. (2000). On frequency, transparency and productivity. In G. Booij and J. van Marle (eds.), Yearbook of Morphology, pages 181-208.
Baayen, R. H. (2003). Probabilistic approaches to morphology. In Probabilistic Linguistics, pages 229-287.
Baayen, R. H., Dijkstra, T., and Schreuder, R. (1997). Singulars and plurals in Dutch: Evidence for a parallel dual-route model. Journal of Memory and Language, 37(1):94-117.
Bashir, E. (1993). Causal chains and compound verbs. In M. K. Verma (ed.) (1993).
Bentin, S., and Feldman, L. B. (1990). The contribution of morphological and semantic relatedness to repetition priming at short and long lags: Evidence from Hebrew. Quarterly Journal of Experimental Psychology, 42, pp. 693-711.
Bradley, D. (1980). Lexical representation of derivational relation. Juncture, Saratoga, CA: Anma Libri, pp. 37-55.
Butt, M. (1993). Conscious choice and some light verbs in Urdu. In M. K. Verma (ed.) (1993).
Butterworth, B. (1983). Lexical representation. Language Production, Vol. 2, pp. 257-294. San Diego, CA: Academic Press.
Caramazza, A., Laudanna, A., and Romani, C. (1988). Lexical access and inflectional morphology. Cognition, 28, pp. 297-332.
Drews, E., and Zwitserlood, P. (1995). Morphological and orthographic similarity in visual word recognition. Journal of Experimental Psychology: Human Perception and Performance, 21, 1098-1116.
Fellbaum, C. (ed.) (1998). WordNet: An Electronic Lexical Database. MIT Press.
Fleiss, J. L., Levin, B., and Paik, M. C. (1981). The measurement of interrater agreement. Statistical Methods for Rates and Proportions, 2:212-236.
Forster, K. I., and Davis, C. (1984). Repetition priming and frequency attenuation in lexical access. Journal of Experimental Psychology: Learning, Memory, and Cognition, 10, 680-698.
Frost, R., Forster, K. I., and Deutsch, A. (1997). What can we learn from the morphology of Hebrew? A masked-priming investigation of morphological representation. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 829-856.
Grainger, J., Cole, P., and Segui, J. (1991). Masked morphological priming in visual word recognition. Journal of Memory and Language, 30, 370-384.
Hay, J., and Plag, I. (2004). What constrains possible suffix combinations? On the interaction of grammatical and processing restrictions in derivational morphology. Natural Language and Linguistic Theory, 22(3), 565-596.
Hook, P. E. (1981). Hindi Structures: Intermediate Level. Michigan Papers on South and Southeast Asia, The University of Michigan Center for South and Southeast Studies, Ann Arbor, Michigan.
MacKay, D. G. (1978). Derivational rules and the internal lexicon. Journal of Verbal Learning and Verbal Behavior, 17, pp. 61-71.
Marslen-Wilson, W. D., and Tyler, L. K. (1997). Dissociating types of mental computation. Nature, 387, pp. 592-594.
Marslen-Wilson, W. D., and Tyler, L. K. (1998). Rules, representations, and the English past tense. Trends in Cognitive Sciences, 2, pp. 428-435.
Marslen-Wilson, W. D., Tyler, L. K., Waksler, R., and Older, L. (1994). Morphology and meaning in the English mental lexicon. Psychological Review, 101, pp. 3-33.
Marslen-Wilson, W. D., and Zhou, X. (1999). Abstractness, allomorphy, and lexical architecture. Language and Cognitive Processes, 14, 321-352.
Milin, P., Kuperman, V., Kostić, A., and Baayen, R. H. (2009). Paradigms bit by bit: An information-theoretic approach to the processing of paradigmatic structure in inflection and derivation. In Analogy in Grammar: Form and Acquisition, pp. 214-252.
Pandharipande, R. (1993). Serial verb construction in Marathi. In M. K. Verma (ed.) (1993).
Paul, S. (2004). An HPSG Account of Bangla Compound Verbs with LKB Implementation. Ph.D. dissertation, CALT, University of Hyderabad.
Pulvermüller, F. (2002). The Neuroscience of Language. Cambridge University Press.
Stolz, J. A., and Feldman, L. B. (1995). The role of orthographic and semantic transparency of the base morpheme in morphological processing. In L. B. Feldman (ed.), Morphological Aspects of Language Processing. Hillsdale, NJ: Lawrence Erlbaum Associates.
Taft, M., and Forster, K. I. (1975). Lexical storage and retrieval of prefixed words. Journal of Verbal Learning and Verbal Behavior, 14, pp. 638-647.
Taft, M. (1988). A morphological decomposition model of lexical access. Linguistics, 26, pp. 657-667.
Taft, M. (2004). Morphological decomposition and the reverse base frequency effect. Quarterly Journal of Experimental Psychology, 57A, pp. 745-765.
Tulving, E., Schacter, D. L., and Stark, H. A. (1982). Priming effects in word-fragment completion are independent of recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 8(4).
same-paper 3 0.93503267 245 acl-2013-Modeling Human Inference Process for Textual Entailment Recognition
Author: Hen-Hsen Huang ; Kai-Chun Chang ; Hsin-Hsi Chen
Abstract: This paper aims at understanding what human think in textual entailment (TE) recognition process and modeling their thinking process to deal with this problem. We first analyze a labeled RTE-5 test set and find that the negative entailment phenomena are very effective features for TE recognition. Then, a method is proposed to extract this kind of phenomena from text-hypothesis pairs automatically. We evaluate the performance of using the negative entailment phenomena on both the English RTE-5 dataset and Chinese NTCIR-9 RITE dataset, and conclude the same findings.
4 0.93312854 376 acl-2013-Using Lexical Expansion to Learn Inference Rules from Sparse Data
Author: Oren Melamud ; Ido Dagan ; Jacob Goldberger ; Idan Szpektor
Abstract: Automatic acquisition of inference rules for predicates is widely addressed by computing distributional similarity scores between vectors of argument words. In this scheme, prior work typically refrained from learning rules for low frequency predicates associated with very sparse argument vectors due to expected low reliability. To improve the learning of such rules in an unsupervised way, we propose to lexically expand sparse argument word vectors with semantically similar words. Our evaluation shows that lexical expansion significantly improves performance in comparison to state-of-the-art baselines.
5 0.92901796 50 acl-2013-An improved MDL-based compression algorithm for unsupervised word segmentation
Author: Ruey-Cheng Chen
Abstract: We study the mathematical properties of a recently proposed MDL-based unsupervised word segmentation algorithm, called regularized compression. Our analysis shows that its objective function can be efficiently approximated using the negative empirical pointwise mutual information. The proposed extension improves the baseline performance in both efficiency and accuracy on a standard benchmark.
6 0.92685997 61 acl-2013-Automatic Interpretation of the English Possessive
7 0.92325211 71 acl-2013-Bootstrapping Entity Translation on Weakly Comparable Corpora
8 0.92309582 170 acl-2013-GlossBoot: Bootstrapping Multilingual Domain Glossaries from the Web
9 0.91516465 75 acl-2013-Building Japanese Textual Entailment Specialized Data Sets for Inference of Basic Sentence Relations
10 0.8537845 156 acl-2013-Fast and Adaptive Online Training of Feature-Rich Translation Models
11 0.85060585 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching
12 0.84712565 242 acl-2013-Mining Equivalent Relations from Linked Data
13 0.83135724 261 acl-2013-Nonparametric Bayesian Inference and Efficient Parsing for Tree-adjoining Grammars
14 0.82304877 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules
15 0.79943633 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
16 0.79661298 4 acl-2013-A Context Free TAG Variant
17 0.79581171 154 acl-2013-Extracting bilingual terminologies from comparable corpora
18 0.79483414 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing
19 0.78739649 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
20 0.78665674 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars