acl acl2012 acl2012-48 knowledge-graph by maker-knowledge-mining

48 acl-2012-Classifying French Verbs Using French and English Lexical Resources

Source: pdf

Author: Ingrid Falk ; Claire Gardent ; Jean-Charles Lamirel

Abstract: We present a novel approach to the automatic acquisition of a Verbnet-like classification of French verbs which involves the use (i) of a neural clustering method which associates clusters with features, (ii) of several supervised and unsupervised evaluation metrics and (iii) of various existing syntactic and semantic lexical resources. We evaluate our approach on an established test set and show that it outperforms previous related work with an F-measure of 0.70.

Reference: text

Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 From the theoretical viewpoint, they permit capturing syntactic and/or semantic generalisations about verbs (Levin, 1993; Kipper Schuler, 2006). [sent-7, score-0.443]

2 While there has been much work on automatically acquiring verb classes for English (Sun et al. [sent-9, score-0.293]

3 They exploit features extracted from a large-scale subcategorisation lexicon (LexSchem (Messiant, 2008)) acquired fully automatically from the Le Monde newspaper corpus and show that, as for English, syntactic frames and verb selectional preferences perform better than lexical co-occurrence features. [sent-19, score-0.387]

4 1 on 116 verbs occurring at least 150 times in LexSchem. [sent-21, score-0.383]

5 The best performance is achieved when restricting the approach to verbs occurring at least 4000 times (43 verbs) with an F-measure of 65. [sent-22, score-0.383]

6 On the other hand, Falk and Gardent (2011) present a classification approach for French verbs based on the use of Formal Concept Analysis (FCA). [sent-24, score-0.387]

7 In this paper, we describe a novel approach to the clustering of French verbs which (i) gives good results on the established benchmark used in (Sun et al. [sent-29, score-0.555]

8 , 2010) and (ii) associates verbs with a feature profile describing their syntactic and semantic properties. [sent-30, score-0.562]

9 , 2011b)) which uses the features characterising each cluster both to guide the clustering process and to label the output clusters. [sent-32, score-0.491]

10 We show that the approach yields promising results (F-measure of 70%) and that the clustering produced systematically associates verbs with syntactic frames and thematic grids thereby providing an interesting basis for the creation and evaluation of a Verbnet-like classification. [sent-37, score-1.053]

11 2 Lexical Resources Used: Our aim is to acquire a classification which covers the core verbs of French, could be used to support semantic role labelling and is similar in spirit to the English Verbnet. [sent-41, score-0.49]

12 , locative argument, concrete object) or thematic role information (e. [sent-51, score-0.219]

13 We use the English Verbnet as a resource for associating French verbs with thematic grids as follows. [sent-54, score-0.63]

14 We translate the verbs in the English Verbnet classes to French using English-French dictionaries. [sent-55, score-0.509]

15 We first map French verbs with English Verbnet classes: A French verb is associated with an English Verbnet class if, according to our dictionaries, it is a translation of an English verb in this class. [sent-63, score-0.704]

16 The features we use are similar to those used in (Mouton, 2010): they are numeric and are derived for example from the number of translations an English or French verb had, the size of the Verbnet classes, the number of classes a verb is a member of etc. [sent-67, score-0.458]

17 We select 6000 pairs with highest probability estimates and obtain the translated classes by assigning each verb in a selected pair to the pair’s class. [sent-69, score-0.293]

18 This way French verbs are effectively associated with one or more English Verbnet thematic grids. [sent-70, score-0.56]
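The translation step in sentences 14 to 18 can be sketched as two stages: pair each French verb with every Verbnet class containing one of its English translations, then keep only the highest-scoring pairs. This is a minimal sketch, assuming dictionaries are plain Python dicts; `score` is a hypothetical stand-in for the trained classifier's probability estimate (the source keeps 6000 pairs), and the toy data below is invented for illustration.

```python
from collections import defaultdict


def candidate_pairs(verbnet_classes, en_to_fr):
    """Collect (French verb, Verbnet class) pairs: a French verb is paired with a
    class if it translates some English member of that class."""
    pairs = set()
    for cls, en_verbs in verbnet_classes.items():
        for en in en_verbs:
            for fr in en_to_fr.get(en, ()):
                pairs.add((fr, cls))
    return pairs


def translated_classes(pairs, score, k):
    """Keep the k best-scoring pairs and assign each French verb to its pair's class.

    `score(fr, cls)` is a hypothetical stand-in for the classifier's probability.
    """
    best = sorted(pairs, key=lambda p: score(*p), reverse=True)[:k]
    classes = defaultdict(set)
    for fr, cls in best:
        classes[cls].add(fr)
    return dict(classes)


# Toy data for illustration only (not the paper's resources).
vn = {"amuse-31.1": ["amuse", "annoy"], "light_emission-43.1": ["glow", "shine"]}
dico = {"amuse": ["amuser", "divertir"], "annoy": ["agacer", "ennuyer"],
        "glow": ["luire"], "shine": ["briller", "luire"]}
pairs = candidate_pairs(vn, dico)
print(translated_classes(pairs, lambda fr, cls: 0.1 if fr == "ennuyer" else 0.9, 5))
```

A French verb kept in several pairs ends up in several classes, which is how a verb can be associated with more than one thematic grid.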

19 1 Clustering Methods: The IGNGF clustering method is an incremental neural “winner-take-most” clustering method belonging to the family of free-topology neural clustering methods. [sent-72, score-0.828]

20 (Footnote 3) The training data consists of the verbs and Verbnet classes used in the gold standard presented in (Sun et al. [sent-78, score-0.62]

21 Feature maximisation is a cluster quality metric which associates each cluster with maximal features i. [sent-84, score-0.581]

22 A feature is then said to be maximal for a given cluster iff its Feature F-measure is higher for that cluster than for any other cluster. [sent-88, score-0.464]
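Feature maximisation (sentences 21 and 22) can be illustrated on binary verb-feature data. This is a minimal sketch, assuming the Feature F-measure is the harmonic mean of the local recall and local precision defined in sentence 39 (as in Lamirel et al.'s formulation; the original method may use weighted features), and that ties between clusters leave a feature maximal for no cluster.

```python
def feature_fmeasure(clusters, verb_features):
    """Return FF[c][f] = harmonic mean of local recall R_c^f and precision P_c^f.

    clusters: dict cluster id -> set of verbs
    verb_features: dict verb -> set of binary features
    """
    # V^f: every clustered verb carrying feature f
    verbs_with = {}
    for verbs in clusters.values():
        for v in verbs:
            for f in verb_features[v]:
                verbs_with.setdefault(f, set()).add(v)
    ff = {}
    for c, verbs in clusters.items():
        ff[c] = {}
        for f, vf in verbs_with.items():
            n_cf = len(verbs & vf)  # |v_c^f|
            if n_cf:
                recall = n_cf / len(vf)        # R_c^f
                precision = n_cf / len(verbs)  # P_c^f
                ff[c][f] = 2 * recall * precision / (recall + precision)
    return ff


def maximal_features(ff):
    """f is maximal for c iff FF[c][f] is strictly higher than in every other cluster."""
    result = {c: set() for c in ff}
    for f in {f for scores in ff.values() for f in scores}:
        scores = {c: ff[c].get(f, 0.0) for c in ff}
        best = max(scores.values())
        winners = [c for c, s in scores.items() if s == best]
        if len(winners) == 1:  # ties: no cluster gets the feature
            result[winners[0]].add(f)
    return result


clusters = {"C1": {"amuser", "agacer"}, "C2": {"briller", "luire"}}
feats = {"amuser": {"a", "b"}, "agacer": {"a"}, "briller": {"c"}, "luire": {"b", "c"}}
print(maximal_features(feature_fmeasure(clusters, feats)))  # {'C1': {'a'}, 'C2': {'c'}}
```

Feature `b` is shared evenly by both clusters, so its Feature F-measure ties and it labels neither cluster.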

23 The IGNGF method was shown to outperform other standard neural and non-neural methods for clustering tasks on relatively clean data (Lamirel et al. [sent-89, score-0.4]

24 Since we use features extracted from manually validated sources, this clustering technique seems a good fit for our application. [sent-91, score-0.29]

25 In addition, the feature maximisation and cluster labeling performed by the IGNGF method have proved promising both for visualising clustering results (Lamirel et al. [sent-92, score-0.549]

26 We make use of these processes in all our experiments and systematically compute cluster labelling and feature maximisation on the output clusterings. [sent-95, score-0.386]

27 This facilitates clustering interpretation in that cluster labeling clearly indicates the association between clusters (verbs) and their prevalent features. [sent-98, score-0.696]

28 And this supports the creation of a Verbnet style classification in that cluster labeling directly provides classes grouping together verbs, thematic grids and subcategorisation frames. [sent-99, score-0.78]

29 Each induced cluster is assigned the gold class (its prevalent class, prev(C)) to which most of its member verbs belong. [sent-106, score-0.851]

30 A verb is then said to be correct if the gold standard associates it with the prevalent class of the cluster it is in. [sent-107, score-0.696]

31 Given this, purity is the ratio between the number of correct gold verbs in the clustering and the total number of gold verbs in the clustering (footnote 6): $mPUR = \frac{\sum_{C \in Clustering, |prev(C)|>1} |prev(C) \cap C|}{|Verbs_{Gold \cap Clustering}|}$, where $|Verbs_{Gold \cap Clustering}|$ is the total number of gold verbs in the clustering. [sent-108, score-1.604]
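The purity definition in sentences 29 to 31 translates directly into a counting procedure. This is a minimal sketch, assuming each gold verb has exactly one gold class and reading footnote 6 as "skip clusters in which the prevalent class is represented by a single verb"; the toy data is invented.

```python
from collections import Counter


def mpur(clusters, gold_of):
    """Modified purity (mPUR).

    clusters: list of sets of verbs
    gold_of: dict verb -> gold class (assumes one gold class per verb)
    """
    # denominator: |Verbs_{Gold ∩ Clustering}|
    total_gold = sum(1 for c in clusters for v in c if v in gold_of)
    correct = 0
    for c in clusters:
        counts = Counter(gold_of[v] for v in c if v in gold_of)
        if not counts:
            continue
        _, n_prev = counts.most_common(1)[0]  # |prev(C) ∩ C|
        if n_prev > 1:  # footnote 6: ignore clusters whose prevalent class has one member
            correct += n_prev
    return correct / total_gold


gold = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "b3": "B", "c1": "C"}
clusters = [{"a1", "a2", "b1"}, {"b2", "b3"}, {"c1"}]
print(mpur(clusters, gold))  # 4/6, about 0.667
```

The singleton cluster `{"c1"}` is excluded from the numerator but its verb still counts in the denominator, which is what keeps mPUR from rewarding many tiny clusters.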

32 Accuracy represents the proportion of gold verbs in those clusters which are associated with a gold class, compared to all the gold verbs in the clustering. [sent-109, score-1.211]

33 the cluster $dom(C_{Gold})$ which has the most verbs in common with the gold class. [sent-111, score-0.655]

34 To assess the extent to which a clustering matches the gold classification, we additionally compute the coverage of each clustering, that is, the proportion of gold classes that are prevalent classes in the clustering. [sent-114, score-1.103]
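Accuracy (sentences 32 and 33) and coverage (sentence 34) can be sketched in the same style. This is a minimal sketch, assuming accuracy sums, over gold classes, the overlap with the dominant cluster $dom(C_{Gold})$, and assuming (following Sun et al.'s evaluation setup, not stated in this summary) that the reported F-measure is the harmonic mean of mPUR and accuracy.

```python
from collections import Counter


def accuracy(clusters, gold_of):
    """ACC: sum over gold classes of |dom(C_gold) ∩ C_gold|, over all gold verbs clustered."""
    total_gold = sum(1 for c in clusters for v in c if v in gold_of)
    matched = 0
    for g in set(gold_of.values()):
        # overlap with dom(C_gold), the cluster sharing the most verbs with g
        matched += max(sum(1 for v in c if gold_of.get(v) == g) for c in clusters)
    return matched / total_gold


def coverage(clusters, gold_of):
    """Proportion of gold classes that are the prevalent class of some cluster."""
    prevalent = set()
    for c in clusters:
        counts = Counter(gold_of[v] for v in c if v in gold_of)
        if counts:
            prevalent.add(counts.most_common(1)[0][0])
    return len(prevalent) / len(set(gold_of.values()))


def f_measure(mpur, acc):
    """Harmonic mean of mPUR and ACC (assumed combination)."""
    return 2 * mpur * acc / (mpur + acc)


gold = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "b3": "B", "c1": "C"}
clusters = [{"a1", "a2", "b1"}, {"b2", "b3"}, {"c1"}]
print(accuracy(clusters, gold), coverage(clusters, gold))
```

On this toy clustering, accuracy is 5/6 and coverage is 1.0, since each of the three gold classes is prevalent in some cluster.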

35 , 2006), unsupervised evaluation metrics based on cluster labelling and feature maximisation can prove very useful for identifying the best clustering strategy. [sent-118, score-0.6]

36 Computed on the clustering results, this metric evaluates the quality of a clustering w. [sent-121, score-0.428]

37 (Footnote 6: Clusters for which the prevalent class has only one element are ignored.) ... to a gold standard. [sent-127, score-0.307]

38 , 2010) to be effective in detecting degenerate clustering results, including a small number of large, heterogeneous “garbage” clusters and a large number of small “chunk” clusters. [sent-129, score-0.376]

39 First, the local Recall ($R_c^f$) and the local Precision ($P_c^f$) of a feature $f$ in a cluster $c$ are defined as follows: $R_c^f = \frac{|v_c^f|}{|V^f|}$, $P_c^f = \frac{|v_c^f|}{|V_c|}$, where $v_c^f$ is the set of verbs having feature $f$ in $c$, $V_c$ the set of verbs in $c$ and $V^f$ the set of verbs with feature $f$. [sent-130, score-1.4]

40 Cumulative Micro-Precision (CMP) is then defined as follows: $CMP = \frac{\sum_{i=|C_{inf}|}^{|C_{sup}|} \frac{1}{|C_{i+}|^2} \sum_{c \in C_{i+}, f \in F_c} P_c^f}{\sum_{i=|C_{inf}|}^{|C_{sup}|} \frac{1}{|C_{i+}|}}$, where $C_{i+}$ represents the subset of clusters of $C$ for which the number of associated verbs is greater than $i$, and $C_{inf} = \arg\min_{c_i \in C} |c_i|$, $C_{sup} = \arg\max_{c_i \in C} |c_i|$. [sent-131, score-0.537]
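The CMP formula in sentence 40 can be computed by sweeping the size threshold $i$ from the smallest to the largest cluster size. This is a minimal sketch under two assumptions: $F_c$ is supplied by the caller (for example the cluster's maximal features), and a level with an empty $C_{i+}$ (which always happens at $i = |C_{sup}|$ under the strict "greater than" reading) contributes nothing rather than dividing by zero.

```python
def local_precision(cluster, verb_features, f):
    """P_c^f = |v_c^f| / |V_c|."""
    return sum(1 for v in cluster if f in verb_features[v]) / len(cluster)


def cmp_score(clusters, verb_features, cluster_features):
    """Cumulative Micro-Precision.

    clusters: dict cluster id -> non-empty set of verbs
    verb_features: dict verb -> set of features
    cluster_features: dict cluster id -> F_c (assumed: the cluster's maximal features)
    """
    sizes = [len(c) for c in clusters.values()]
    num = den = 0.0
    for i in range(min(sizes), max(sizes) + 1):  # i = |C_inf| .. |C_sup|
        c_plus = [cid for cid, c in clusters.items() if len(c) > i]  # C_{i+}
        if not c_plus:
            continue  # assumption: an empty C_{i+} contributes nothing
        inner = sum(local_precision(clusters[cid], verb_features, f)
                    for cid in c_plus for f in cluster_features[cid])
        num += inner / len(c_plus) ** 2
        den += 1 / len(c_plus)
    return num / den if den else 0.0


clusters = {"C1": {"a", "b", "c"}, "C2": {"d"}}
feats = {"a": {"x"}, "b": {"x"}, "c": {"y"}, "d": {"z"}}
print(cmp_score(clusters, feats, {"C1": {"x"}, "C2": {"z"}}))
```

The $1/|C_{i+}|$ weights give more influence to thresholds at which only a few large clusters remain, so a big heterogeneous "garbage" cluster with low feature precision drags the score down.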

41 3 Cluster display, Feature F-measure and confidence score: To facilitate interpretation, clusters are displayed as illustrated in Table 1. [sent-132, score-0.22]

42 1) and features whose Feature F-measure is under the average Feature F-measure of the overall clustering are clearly delineated from others. [sent-135, score-0.254]

43 In addition, for each verb in a cluster, a confidence score is displayed, which is the ratio of the sum of the F-measures of the verb's own cluster-maximised features to the sum of the F-measures of all the cluster's maximised features. [sent-136, score-0.639]
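The confidence score of sentence 43, and the notion of orphan verbs used later (verbs whose confidence score is zero), can be sketched as follows. This is a minimal sketch, assuming `maximal_ff` already holds the Feature F-measure of each maximal feature of the cluster; the verbs and feature names are illustrative.

```python
def confidence_scores(cluster, verb_features, maximal_ff):
    """Confidence of each verb in a cluster.

    maximal_ff: dict feature -> Feature F-measure, restricted to the cluster's
    maximal features. A verb's score is the share of that total F-measure
    contributed by the maximal features the verb itself carries.
    """
    total = sum(maximal_ff.values())
    return {
        v: (sum(ff for f, ff in maximal_ff.items() if f in verb_features[v]) / total
            if total else 0.0)
        for v in cluster
    }


def orphans(scores):
    """Orphan verbs: confidence score of zero."""
    return sorted(v for v, s in scores.items() if s == 0.0)


feats = {"amuser": {"x", "y"}, "agacer": {"x"}, "luire": {"z"}}
scores = confidence_scores({"amuser", "agacer", "luire"}, feats, {"x": 0.5, "y": 0.25})
print(orphans(scores))  # ['luire']
```

A verb carrying all of the cluster's maximal features scores 1.0; one carrying none of them is an orphan, signalling that its membership is not supported by the cluster's label.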

44 Table 1: Sample output for a cluster produced with the grid-scf-sem feature set and the IGNGF clustering method. [sent-163, score-0.475]

45 For each clustering method (K-Means and IGNGF), we let the number of clusters vary between 1 and 30 to obtain a partition that reaches an optimum F-measure and a number of clusters that is in the same order of magnitude as the initial number of Gold classes (i. [sent-165, score-0.706]
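The model-selection loop of sentence 45 amounts to running the clusterer for each candidate number of clusters and keeping the best-scoring admissible partition. This is a minimal sketch: `cluster_fn` and `score_fn` are hypothetical wrappers (a K-Means or IGNGF run, and the F-measure against the gold standard), and "same order of magnitude" is given a hypothetical reading as a size ratio below 10.

```python
def select_partition(cluster_fn, score_fn, n_gold, k_values=range(1, 31), max_ratio=10):
    """Return (F-measure, k, clustering) for the best admissible number of clusters k.

    cluster_fn(k): hypothetical wrapper running K-Means or IGNGF with k clusters
    score_fn(clustering): F-measure of a clustering against the gold standard
    Admissible k (hypothetical reading of "same order of magnitude"):
    max(k, n_gold) / min(k, n_gold) < max_ratio.
    """
    best = None
    for k in k_values:
        if max(k, n_gold) / min(k, n_gold) >= max_ratio:
            continue
        clustering = cluster_fn(k)
        f = score_fn(clustering)
        if best is None or f > best[0]:
            best = (f, k, clustering)
    return best


# Toy stand-ins: the "clustering" is just k, and F peaks at 13 clusters.
print(select_partition(lambda k: k, lambda c: -abs(c - 13), n_gold=11)[1])  # 13
```

With 11 gold classes, this reading rules out the trivial one-cluster partition, which could otherwise score deceptively well on accuracy.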

46 4 Features and Data. Features: In the simplest case the features are the subcategorisation frames (scf) associated with the verbs by our lexicon. [sent-168, score-0.587]

47 We also experiment with different combinations of additional syntactic (synt) and semantic features (sem) extracted from the lexicon and with the thematic grids (grid) extracted from the English Verbnet. [sent-169, score-0.431]

48 The thematic grid information is derived from the English Verbnet as explained in Section 2. [sent-170, score-0.296]

49 features are meant to help identify specific Verbnet classes and thematic roles. [sent-197, score-0.393]

50 These indicate whether a verb takes a locative or an asset argument and whether it requires a concrete object (non-human role) or a plural role. [sent-199, score-0.249]

51 This resource consists of 16 fine-grained Levin classes with 12 verbs each, whose predominant sense in English belongs to that class. [sent-203, score-0.509]

52 (2010)’s Gold Standard to 11 Verbnet classes thereby associating each class with a thematic grid. [sent-205, score-0.432]

53 Verbs For our clustering experiments we use the 2183 French verbs occurring in the translations of the 11 classes in the gold standard (cf. [sent-208, score-0.876]

54 Since we ignore verbs with only one feature, the number of verbs and ⟨verb, feature⟩ pairs considered may vary slightly across experiments. [sent-210, score-0.74]
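Building the input for one feature set (sentences 46, 47 and 54) amounts to taking the union of the selected feature groups per verb and dropping verbs left with a single feature. This is a minimal sketch, assuming a hypothetical lexicon layout of verb to group name (`scf`, `synt`, `sem`, `grid`) to feature values; the entries and labels below are made up for the example.

```python
def build_features(lexicon, groups):
    """Combine the selected feature groups per verb; drop verbs with at most one feature.

    lexicon: dict verb -> dict group ('scf', 'synt', 'sem', 'grid') -> set of values
    groups: the feature set to use, e.g. ("grid", "scf", "sem")
    """
    data = {}
    for verb, by_group in lexicon.items():
        # prefix each value with its group so equal strings from different groups stay distinct
        feats = {f"{g}:{x}" for g in groups for x in by_group.get(g, ())}
        if len(feats) > 1:
            data[verb] = feats
    return data


# Illustrative entries only; frame and grid labels are made up for the example.
lexicon = {
    "amuser": {"scf": {"NP_V_NP"}, "grid": {"Stimulus_V_Experiencer"}},
    "briller": {"scf": {"NP_V"}},
}
print(build_features(lexicon, ("grid", "scf")))
```

Because the one-feature filter is applied after the groups are combined, richer feature sets keep more verbs, which matches the varying verb counts reported per feature set.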

55 55 for verbs occurring at least 150 times in the training data and 0. [sent-219, score-0.383]

56 65 for verbs occurring at least 4000 times in this training data. [sent-220, score-0.383]

57 The results are not directly comparable, however, since the gold data is slightly different due to the grouping of Verbnet classes through their thematic grids. [sent-221, score-0.464]

58 Section 3) highlight strong cluster cohesion with a number of clusters close to the number of gold classes (13 clusters for 11 gold classes); a low number of orphan verbs (i. [sent-225, score-1.258]

59 , verbs whose confidence score is zero); and a high Cumulative Micro-Precision (CMP = 0. [sent-227, score-0.341]

60 72 indicates that approximately 8 out of the 11 gold classes could be matched to a prevalent label. [sent-230, score-0.396]

61 That is, 8 clusters were labelled with a prevalent label corresponding to 8 distinct gold classes. [sent-231, score-0.39]
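The reading of the value 0.72 as roughly 8 of the 11 gold classes follows from the coverage definition in sentence 34 (gold classes that are prevalent classes, over all gold classes):

```latex
\text{coverage} = \frac{|\{\text{gold classes that are the prevalent class of some cluster}\}|}{|\text{gold classes}|} = \frac{8}{11} \approx 0.727
```

This is close to the reported 0.72, hence "approximately" in the text.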

62 In contrast, the classification obtained using the scf-synt-sem feature set has a higher CMP for the clustering with optimal mPUR (0. [sent-232, score-0.318]

63 61), a larger number of classes (16) and a higher number of orphans (156). [sent-234, score-0.236]

64 That is, this clustering has many clusters with strong feature cohesion but a class structure that markedly differs from the gold. [sent-235, score-0.513]

65 Since there might be differences in structure between the English Verbnet and the thematic classification for French we are building, this is not necessarily incorrect, however. [sent-236, score-0.231]

66 Further investigation on a larger data set would be required to assess which clustering is in fact better given the data used and the classification searched for. [sent-237, score-0.26]

67 (2010) are selectional preferences while ours are thematic grids and a restricted set of manually encoded selectional preferences. [sent-242, score-0.289]

68 Feature set: scf | grid, scf | grid, scf, sem | grid, scf, synt | grid, scf, synt, sem | scf, sem | scf, synt | scf, synt, sem [sent-263, score-0.758]

69 Nbr. verbs (same column order): 2085 | 2085 | 2183 | 2150 | 2201 | 2183 | 2150 | 2101; mPUR 0. [sent-266, score-0.341]

70 Cumulative micro precision (CMP) is given for the clustering at the mPUR optimum and, in parentheses, for the 13-class clustering. [sent-359, score-0.416]

71 2 Qualitative Analysis: We carried out a manual analysis of the clusters, examining both the semantic coherence of each cluster (do the verbs in that cluster share a semantic component? [sent-362, score-1.013]

72 ) and the association between the thematic grids, the verbs and the syntactic frames provided by clustering. [sent-363, score-0.674]

73 Semantic homogeneity: To assess semantic homogeneity, we examined each cluster and sought to identify one or more Verbnet labels characterising the verbs contained in that cluster. [sent-364, score-0.63]

74 Of the 13 clusters produced, 11 could be labelled. [sent-365, score-0.324]

75 Table 6 shows these eleven clusters, the associated labels (abbreviated Verbnet class names), some example verbs, a sample subcategorisation frame drawn from the cluster maximising features and an illustrating sentence. [sent-366, score-0.43]

76 As can be seen, some clusters group together several subclasses and conversely, some Verbnet classes are spread over several clusters. [sent-367, score-0.33]

77 To start with, recall that we are aiming for a classification which groups together verbs with the same thematic grid. [sent-369, score-0.572]

78 Given this, cluster C2 correctly groups together two Verbnet classes (other cos-45. [sent-370, score-0.371]

79 In addition, the features associated with this cluster indicate that verbs in these two classes are transitive, select a concrete object, and can be pronominalised, which again is correct for most verbs in that cluster. [sent-374, score-1.127]

80 Similarly, cluster C11 groups together verbs from two Verbnet classes with identical theta grid (light emission-43. [sent-375, score-0.823]

81 The third cluster grouping together verbs from two Verbnet classes is C7 which contains mainly judgement verbs (to applaud, bless, compliment, punish) but also some verbs from the (very large) other cos-45. [sent-378, score-1.394]

82 In this case, a prevalent shared feature is that both types of verbs accept a de-object, that is, a prepositional object introduced by “de” (Jean applaudit Marie d’avoir dansé / Jean applauds Marie for having danced; Jean dégage le sable de la route / Jean clears the sand off the road). [sent-380, score-0.759]

83 Interestingly, clustering also highlights classes which are semantically homogeneous but syntactically distinct. [sent-382, score-0.382]

84 While clusters C6 and C10 both contain mostly verbs from the amuse-31. [sent-383, score-0.503]

85 1 class (amuser, agacer, énerver, déprimer), their features indicate that verbs in C10 accept the pronominal form (e. [sent-384, score-0.46]

86 In this case, clustering highlights a syntactic distinction which is present in French but not in English. [sent-389, score-0.264]

87 In contrast, the dispersion of verbs from the other cos-45. [sent-390, score-0.341]

88 4 class over clusters C2 and C7 has no obvious explanation. [sent-391, score-0.241]

89 One reason might be that this class is rather large (361 verbs) and thus might contain French verbs that do not necessarily share properties with the original Verbnet class. [sent-392, score-0.42]

90 We examined whether the prevalent syntactic features labelling each cluster were compatible with the verbs and with the semantic class(es) manually assigned to the clusters. [sent-394, score-0.854]

91 3) correctly indicates that verbs in that cluster subcategorise for a sentential argument and an AOBJ (prepositional object introduced by “à”) (e. [sent-397, score-0.6]

92 , Jean bafouille à Marie qu’il est amoureux / Jean stammers to Mary that he is in love); and that verbs in the C9 class (characterize-29. [sent-399, score-0.42]

93 In general, we found that the prevalent frames associated with each cluster adequately characterise the syntax of that verb class. [sent-401, score-0.577]

94 6 Conclusion: We presented an approach to the automatic classification of French verbs which showed good results on an established test set and associates verb clusters with syntactic and semantic features. [sent-402, score-1.019]

95 Whether the features associated by the IGNGF clustering with the verb clusters appropriately characterise these clusters remains an open question. [sent-403, score-0.737]

96 This suggests that overlapping clustering techniques need to [...] Table 6: Relations between clusters, syntactic frames and Verbnet-like classes. [sent-405, score-0.362]

97 We are also investigating how the approach scales up to the full set of verbs present in the lexicon. [sent-407, score-0.341]

98 We intend to tap into that potential and explore how well the various semantic features that can be extracted from these resources support automatic verb classification for the full set of verbs present in our lexicon. [sent-409, score-0.604]

99 Mesures de qualité de clustering de documents : prise en compte de la distribution des mots clés [Document clustering quality measures: taking into account the distribution of keywords]. [sent-468, score-0.419]

100 Variations to incremental growing neural gas algorithm based on label maximization. [sent-519, score-0.265]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('verbs', 0.341), ('verbnet', 0.308), ('igngf', 0.307), ('clustering', 0.214), ('cluster', 0.203), ('thematic', 0.185), ('french', 0.182), ('cmp', 0.171), ('lamirel', 0.171), ('classes', 0.168), ('clusters', 0.162), ('scf', 0.14), ('verb', 0.125), ('gas', 0.12), ('mpur', 0.12), ('prevalent', 0.117), ('grid', 0.111), ('gold', 0.111), ('grids', 0.104), ('frames', 0.098), ('jean', 0.098), ('synt', 0.095), ('neural', 0.093), ('schulte', 0.089), ('sun', 0.086), ('falk', 0.085), ('idf', 0.084), ('class', 0.079), ('maximisation', 0.074), ('subcategorisation', 0.074), ('sem', 0.072), ('dicovalence', 0.068), ('orphans', 0.068), ('marie', 0.063), ('associates', 0.061), ('fmeasure', 0.06), ('feature', 0.058), ('object', 0.056), ('cumulative', 0.055), ('gardent', 0.054), ('semantic', 0.052), ('growing', 0.052), ('attik', 0.051), ('cuxac', 0.051), ('ethodes', 0.051), ('eynde', 0.051), ('normalisation', 0.051), ('labelling', 0.051), ('universit', 0.051), ('syntactic', 0.05), ('acc', 0.048), ('im', 0.048), ('classification', 0.046), ('la', 0.045), ('ladl', 0.045), ('walde', 0.045), ('occurring', 0.042), ('nancy', 0.042), ('features', 0.04), ('de', 0.04), ('english', 0.036), ('den', 0.036), ('levin', 0.036), ('nicolas', 0.036), ('validated', 0.036), ('france', 0.036), ('weighting', 0.035), ('associated', 0.034), ('micro', 0.034), ('purity', 0.034), ('applaudit', 0.034), ('asset', 0.034), ('barbut', 0.034), ('brew', 0.034), ('cgold', 0.034), ('characterising', 0.034), ('cheval', 0.034), ('cinf', 0.034), ('classi', 0.034), ('dans', 0.034), ('fca', 0.034), ('fille', 0.034), ('gallops', 0.034), ('galope', 0.034), ('ghribi', 0.034), ('glows', 0.034), ('hfrench', 0.034), ('ijcnn', 0.034), ('jeune', 0.034), ('kup', 0.034), ('locative', 0.034), ('martinetz', 0.034), ('maximised', 0.034), ('mertens', 0.034), ('oishi', 0.034), ('pcf', 0.034), ('primer', 0.034), ('prudent', 0.034), ('pvp', 0.034), ('pwvf', 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999928 48 acl-2012-Classifying French Verbs Using French and English Lexical Resources

Author: Ingrid Falk ; Claire Gardent ; Jean-Charles Lamirel

2 0.26167253 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models

Author: Thomas Lippincott ; Anna Korhonen ; Diarmuid O Seaghdha

Abstract: We present a novel approach for verb subcategorization lexicons using a simple graphical model. In contrast to previous methods, we show how the model can be trained without parsed input or a predefined subcategorization frame inventory. Our method outperforms the state-of-the-art on a verb clustering task, and is easily trained on arbitrary domains. This quantitative evaluation is complemented by a qualitative discussion of verbs and their frames. We discuss the advantages of graphical models for this task, in particular the ease of integrating semantic information about verbs and arguments in a principled fashion. We conclude with future work to augment the approach.

3 0.21372321 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

Author: Danilo Croce ; Alessandro Moschitti ; Roberto Basili ; Martha Palmer

Abstract: In this paper, we propose innovative representations for automatic classification of verbs according to mainstream linguistic theories, namely VerbNet and FrameNet. First, syntactic and semantic structures capturing essential lexical and syntactic properties of verbs are defined. Then, we design advanced similarity functions between such structures, i.e., semantic tree kernel functions, for exploiting distributional and grammatical information in Support Vector Machines. The extensive empirical analysis on VerbNet class and frame detection shows that our models capture meaningful syntactic/semantic structures, which allows for improving the state-of-the-art.

4 0.12720555 64 acl-2012-Crosslingual Induction of Semantic Roles

Author: Ivan Titov ; Alexandre Klementiev

Abstract: We argue that multilingual parallel data provides a valuable source of indirect supervision for induction of shallow semantic representations. Specifically, we consider unsupervised induction of semantic roles from sentences annotated with automatically-predicted syntactic dependency representations and use a state-of-the-art generative Bayesian non-parametric model. At inference time, instead of only seeking the model which explains the monolingual data available for each language, we regularize the objective by introducing a soft constraint penalizing for disagreement in argument labeling on aligned sentences. We propose a simple approximate learning algorithm for our set-up which results in efficient inference. When applied to German-English parallel data, our method obtains a substantial improvement over a model trained without using the agreement signal, when both are tested on non-parallel sentences.

5 0.1268522 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

Author: Limin Yao ; Sebastian Riedel ; Andrew McCallum

Abstract: To discover relation types from text, most methods cluster shallow or syntactic patterns of relation mentions, but consider only one possible sense per pattern. In practice this assumption is often violated. In this paper we overcome this issue by inducing clusters of pattern senses from feature representations of patterns. In particular, we employ a topic model to partition entity pairs associated with patterns into sense clusters using local and global features. We merge these sense clusters into semantic relations using hierarchical agglomerative clustering. We compare against several baselines: a generative latent-variable model, a clustering method that does not disambiguate between path senses, and our own approach but with only local features. Experimental results show our proposed approach discovers dramatically more accurate clusters than models without sense disambiguation, and that incorporating global features, such as the document theme, is crucial.

6 0.10555319 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging

7 0.081966467 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes

8 0.078368574 120 acl-2012-Information-theoretic Multi-view Domain Adaptation

9 0.071407937 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations

10 0.070142165 58 acl-2012-Coreference Semantics from Web Features

11 0.065136343 16 acl-2012-A Nonparametric Bayesian Approach to Acoustic Model Discovery

12 0.062901169 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities

13 0.060916144 101 acl-2012-Fully Abstractive Approach to Guided Summarization

14 0.06052411 91 acl-2012-Extracting and modeling durations for habits and events from Twitter

15 0.058954522 192 acl-2012-Tense and Aspect Error Correction for ESL Learners Using Global Context

16 0.057418428 189 acl-2012-Syntactic Annotations for the Google Books NGram Corpus

17 0.05342409 187 acl-2012-Subgroup Detection in Ideological Discussions

18 0.052792687 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

19 0.050534479 17 acl-2012-A Novel Burst-based Text Representation Model for Scalable Event Detection

20 0.050162476 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.162), (1, 0.066), (2, -0.059), (3, -0.001), (4, 0.023), (5, 0.055), (6, -0.024), (7, 0.036), (8, -0.002), (9, -0.012), (10, -0.016), (11, -0.116), (12, 0.183), (13, 0.043), (14, -0.241), (15, 0.083), (16, 0.03), (17, -0.002), (18, -0.072), (19, 0.024), (20, -0.12), (21, -0.038), (22, -0.032), (23, 0.051), (24, 0.147), (25, -0.159), (26, 0.136), (27, -0.121), (28, 0.298), (29, 0.104), (30, -0.061), (31, -0.077), (32, -0.175), (33, 0.166), (34, -0.032), (35, -0.086), (36, 0.26), (37, 0.077), (38, -0.03), (39, -0.125), (40, -0.108), (41, 0.055), (42, 0.047), (43, -0.051), (44, 0.151), (45, -0.096), (46, 0.042), (47, 0.103), (48, -0.031), (49, 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97294474 48 acl-2012-Classifying French Verbs Using French and English Lexical Resources

Author: Ingrid Falk ; Claire Gardent ; Jean-Charles Lamirel

2 0.90899038 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models

Author: Thomas Lippincott ; Anna Korhonen ; Diarmuid O Seaghdha

3 0.52360779 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

Author: Danilo Croce ; Alessandro Moschitti ; Roberto Basili ; Martha Palmer

4 0.35741931 120 acl-2012-Information-theoretic Multi-view Domain Adaptation

Author: Pei Yang ; Wei Gao ; Qi Tan ; Kam-Fai Wong

Abstract: We use multiple views for cross-domain document classification. The main idea is to strengthen the views’ consistency for target data with source training data by identifying the correlations of domain-specific features from different domains. We present an Information-theoretic Multi-view Adaptation Model (IMAM) based on a multi-way clustering scheme, where word and link clusters can draw together seemingly unrelated domain-specific features from both sides and iteratively boost the consistency between document clusterings based on word and link views. Experiments show that IMAM significantly outperforms state-of-the-art baselines.

5 0.34508008 189 acl-2012-Syntactic Annotations for the Google Books NGram Corpus

Author: Yuri Lin ; Jean-Baptiste Michel ; Erez Aiden Lieberman ; Jon Orwant ; Will Brockman ; Slav Petrov

Abstract: We present a new edition of the Google Books Ngram Corpus, which describes how often words and phrases were used over a period of five centuries, in eight languages; it reflects 6% of all books ever published. This new edition introduces syntactic annotations: words are tagged with their part-of-speech, and head-modifier relationships are recorded. The annotations are produced automatically with statistical models that are specifically adapted to historical text. The corpus will facilitate the study of linguistic trends, especially those related to the evolution of syntax.

6 0.33128664 166 acl-2012-Qualitative Modeling of Spatial Prepositions and Motion Expressions

7 0.32472229 64 acl-2012-Crosslingual Induction of Semantic Roles

8 0.32203579 75 acl-2012-Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing

9 0.30812678 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes

10 0.3060357 58 acl-2012-Coreference Semantics from Web Features

11 0.305338 91 acl-2012-Extracting and modeling durations for habits and events from Twitter

12 0.30208173 16 acl-2012-A Nonparametric Bayesian Approach to Acoustic Model Discovery

13 0.29437253 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities

14 0.28698674 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

15 0.26111406 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging

16 0.24344094 192 acl-2012-Tense and Aspect Error Correction for ESL Learners Using Global Context

17 0.2348672 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study

18 0.22272009 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

19 0.21490081 186 acl-2012-Structuring E-Commerce Inventory

20 0.21280707 56 acl-2012-Computational Approaches to Sentence Completion

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(17, 0.347), (25, 0.021), (26, 0.073), (28, 0.033), (30, 0.033), (37, 0.045), (39, 0.038), (45, 0.011), (57, 0.014), (59, 0.026), (74, 0.028), (84, 0.02), (85, 0.045), (90, 0.06), (92, 0.072), (94, 0.017), (99, 0.037)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.73692197 48 acl-2012-Classifying French Verbs Using French and English Lexical Resources

Author: Ingrid Falk ; Claire Gardent ; Jean-Charles Lamirel

2 0.57191908 211 acl-2012-Using Rejuvenation to Improve Particle Filtering for Bayesian Word Segmentation

Author: Benjamin Borschinger ; Mark Johnson

Abstract: We present a novel extension to a recently proposed incremental learning algorithm for the word segmentation problem originally introduced in Goldwater (2006). By adding rejuvenation to a particle filter, we are able to considerably improve its performance, both in terms of finding higher probability and higher accuracy solutions.

3 0.54097432 31 acl-2012-Authorship Attribution with Author-aware Topic Models

Author: Yanir Seroussi ; Fabian Bohnert ; Ingrid Zukerman

Abstract: Authorship attribution deals with identifying the authors of anonymous texts. Building on our earlier finding that the Latent Dirichlet Allocation (LDA) topic model can be used to improve authorship attribution accuracy, we show that employing a previously-suggested Author-Topic (AT) model outperforms LDA when applied to scenarios with many authors. In addition, we define a model that combines LDA and AT by representing authors and documents over two disjoint topic sets, and show that our model outperforms LDA, AT and support vector machines on datasets with many authors.

4 0.35561308 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

Author: Bevan Jones ; Mark Johnson ; Sharon Goldwater

Abstract: Many semantic parsing models use tree transformations to map between natural language and meaning representation. However, while tree transformations are central to several state-of-the-art approaches, little use has been made of the rich literature on tree automata. This paper makes the connection concrete with a tree transducer based semantic parsing model and suggests that other models can be interpreted in a similar framework, increasing the generality of their contributions. In particular, this paper further introduces a variational Bayesian inference algorithm that is applicable to a wide class of tree transducers, producing state-of-the-art semantic parsing results while remaining applicable to any domain employing probabilistic tree transducers.

5 0.34790316 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

Author: Danilo Croce ; Alessandro Moschitti ; Roberto Basili ; Martha Palmer

6 0.34761554 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars

7 0.34329408 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment

8 0.34278381 130 acl-2012-Learning Syntactic Verb Frames using Graphical Models

9 0.34266037 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

10 0.34265643 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition

11 0.3424609 198 acl-2012-Topic Models, Latent Space Models, Sparse Coding, and All That: A Systematic Understanding of Probabilistic Semantic Extraction in Large Corpus

12 0.33726645 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

13 0.33687237 167 acl-2012-QuickView: NLP-based Tweet Search

14 0.33625415 91 acl-2012-Extracting and modeling durations for habits and events from Twitter

15 0.33601299 209 acl-2012-Unsupervised Semantic Role Induction with Global Role Ordering

16 0.33539519 83 acl-2012-Error Mining on Dependency Trees

17 0.33526203 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence

18 0.33349651 139 acl-2012-MIX Is Not a Tree-Adjoining Language

19 0.33300951 41 acl-2012-Bootstrapping a Unified Model of Lexical and Phonetic Acquisition

20 0.33046481 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities