acl acl2013 acl2013-192 knowledge-graph by maker-knowledge-mining

192 acl-2013-Improved Lexical Acquisition through DPP-based Verb Clustering


Source: pdf

Author: Roi Reichart ; Anna Korhonen

Abstract: Subcategorization frames (SCFs), selectional preferences (SPs) and verb classes capture related aspects of the predicate-argument structure. We present the first unified framework for unsupervised learning of these three types of information. We show how to utilize Determinantal Point Processes (DPPs), elegant probabilistic models that are defined over the possible subsets of a given dataset and give higher probability mass to high quality and diverse subsets, for clustering. Our novel clustering algorithm constructs a joint SCF-SP DPP kernel matrix and utilizes the efficient sampling algorithms of DPPs to cluster together verbs with similar SCFs and SPs. We evaluate the induced clusters in the context of the three tasks and show results that are superior to strong baselines for each task.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: Subcategorization frames (SCFs), selectional preferences (SPs) and verb classes capture related aspects of the predicate-argument structure. [sent-3, score-0.425]

2 We show how to utilize Determinantal Point Processes (DPPs), elegant probabilistic models that are defined over the possible subsets of a given dataset and give higher probability mass to high quality and diverse subsets, for clustering. [sent-5, score-0.148]

3 Our novel clustering algorithm constructs a joint SCF-SP DPP kernel matrix and utilizes the efficient sampling algorithms of DPPs to cluster together verbs with similar SCFs and SPs. [sent-6, score-0.667]

4 We evaluate the induced clusters in the context of the three tasks and show results that are superior to strong baselines for each task. [sent-7, score-0.156]

5 1 Introduction: Verb classes (VCs), subcategorization frames (SCFs) and selectional preferences (SPs) capture different aspects of predicate-argument structure. [sent-8, score-0.366]

6 These three types of information have proved useful for Natural Language Processing (NLP). (Footnote 1: The source code of the clustering algorithms and evaluation is submitted with this paper and will be made publicly available upon acceptance of the paper.) [sent-10, score-0.138]

7 The task of SCF induction involves identifying the arguments of a verb lemma and generalizing about the frames (i.e., ...). [sent-24, score-0.378]

8 For example, in (1), the verb "show" takes the frame SUBJ-DOBJ-CCOMP (subject, direct object, and clausal complement). [sent-27, score-0.213]

9 (1) [A number of SCF acquisition papers]SUBJ [show]VERB [their readers]DOBJ [which features are most valuable for the acquisition process]CCOMP. [sent-28, score-0.224]

10 In sentence (2), for example, the verb "show" takes the frame SUBJ-DOBJ. [sent-30, score-0.213]

11 Finally, VC induction involves clustering together verbs with similar meaning, reflected in similar SCFs and SPs. [sent-35, score-0.274]

12 Ó Séaghdha (2010) presented a "dual-topic" model for SPs that also induces verb clusters. [sent-41, score-0.192]

13 (2012) presented a joint model for inducing simple syntactic frames and VCs. [sent-44, score-0.155]

14 Our framework is based on Determinantal Point Processes (DPPs; Kulesza, 2012; Kulesza and Taskar, 2012c), elegant probabilistic models that are defined over the possible subsets of a given dataset and give higher probability mass to high quality and diverse subsets. [sent-50, score-0.148]
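
To make the DPP definition concrete, here is a minimal numpy sketch (ours, not from the paper): for a positive-semidefinite kernel L over n items, a DPP assigns each subset Y the probability det(L_Y) / det(L + I), where L_Y is the submatrix of L indexed by Y, so subsets of similar items get determinants, and hence probabilities, close to zero.

import numpy as np
from itertools import combinations

# Toy PSD kernel over 4 items: L = B^T B is positive semidefinite by construction.
rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
L = B.T @ B

# DPP normalizer: the sum of det(L_Y) over all subsets Y equals det(L + I).
Z = np.linalg.det(L + np.eye(4))

for size in range(1, 5):
    for Y in combinations(range(4), size):
        # P(Y) = det(L_Y) / Z; near-duplicate items shrink the determinant,
        # which is how the DPP favors high-quality, diverse subsets.
        print(Y, np.linalg.det(L[np.ix_(Y, Y)]) / Z)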

15 We first show how individual-task DPP kernel matrices can be naturally combined to construct a joint kernel. [sent-51, score-0.207]

16 We then introduce a novel clustering algorithm based on iterative DPP sampling which can (contrary to other probabilistic frameworks such as Markov random fields) be performed both accurately and efficiently. [sent-53, score-0.22]

17 We also contribute by evaluating the value of the clusters induced by our model for the acquisition of the three information types. [sent-55, score-0.268]

18 Our evaluation against a well-known VC gold standard shows that our clustering model outperforms the state-of-the-art verb clustering algorithm of Sun and Korhonen (2009), in our setup where no manually created SCF or SP data is available. [sent-56, score-0.499]

19 2 Previous Work. SCF acquisition: Most current works induce SCFs from the output of an unlexicalized parser (i.e., ...). [sent-58, score-0.139]

20 (2012) presented a Bayesian network model for syntactic frame induction that identifies SPs on argument types. [sent-68, score-0.141]

21 (2012) do not capture sets of arguments for verbs, so they are far simpler than traditional SCFs. [sent-70, score-0.145]

22 Current approaches to SCF acquisition suffer from lack of semantic information which is needed to guide the purely syntax-driven acquisition process. [sent-71, score-0.224]

23 SP acquisition: Considerable research has been conducted on SP acquisition, with a variety of unsupervised models proposed for this task that use no hand-crafted information during training. [sent-74, score-0.145]

24 Previous unsupervised VC acquisition approaches clustered a variety of linguistic features using different (e. [sent-86, score-0.145]

25 The linguistic features included SCFs and SPs, but these were induced separately and then fed as features to the clustering algorithm. [sent-93, score-0.192]

26 Our framework combines SCF-motivated and SP-motivated kernel matrices, and uses the joint kernel to induce verb clusters which are likely to be highly relevant for both tasks. [sent-94, score-0.618]

27 Importantly, no manual or automatic system for SCF or SP acquisition has been utilized when constructing the kernel matrices; we only consider features extracted from the output of an unlexicalized parser. [sent-95, score-0.236]

28 (2012) presented a joint unsupervised model of SCF and SP acquisition based on non-negative tensor factorization. [sent-104, score-0.225]

29 Our idea is to utilize DPPs for verb clustering that informs both SCF and SP acquisition. [sent-110, score-0.296]

30 Individual-task DPP kernels represent (i) the quality of a data point (verb) as its average feature-based similarity with the other points in the data set and (ii) the divergence between a pair of points as the inverse similarity between them. [sent-114, score-0.166]

31 The high quality and diverse subsets sampled from the DPP model are considered good cluster seeds as they are likely to be relatively uniformly spread and to provide good coverage of the data set. [sent-116, score-0.26]

32 The algorithm induces a hierarchical clustering, which is particularly suitable for semantic tasks, where a set of clusters that share a parent consists of pure members (i.e. [sent-117, score-0.164]

33 most of the points in each member cluster belong to the same gold cluster) and together provide good coverage of the verb space. [sent-119, score-0.369]

34 1), we discuss the construction of the joint DPP kernel, given a kernel for each individual task. In section 3. [sent-121, score-0.162]

35 2 Constructing a Joint Kernel Matrix: DPPs are particularly suitable for joint modeling as they come with various simple and intuitive ways to combine individual model kernel matrices into a joint kernel. [sent-145, score-0.245]

36 This stems from the fact that every positive-semidefinite matrix forms a legal DPP kernel (equation 1). [sent-146, score-0.174]

37 In practice we construct the joint kernel in the following way. [sent-149, score-0.148]

38 Let L1 and L2 be the individual kernels, defined by L1 = A1^T A1 and L2 = A2^T A2; we construct the joint kernel L12 as L12 = L1 L2 L2 L1 = C C^T, where C = A1^T A1 A2^T A2 = L1 L2. [sent-152, score-0.162]
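
A short numpy sketch of this combination under the reconstruction above (L12 = L1 L2 L2 L1 = C C^T with C = L1 L2); the feature matrices below are random stand-ins for the SCF and SP features, not the paper's data.

import numpy as np

def joint_kernel(L1, L2):
    # L12 = L1 L2 L2 L1 = C C^T with C = L1 L2. Since L1 and L2 are
    # symmetric, (L1 L2)^T = L2 L1, so L12 is positive semidefinite by
    # construction and therefore a legal DPP kernel (cf. equation 1).
    C = L1 @ L2
    return C @ C.T

rng = np.random.default_rng(1)
A1 = rng.normal(size=(12, 6))  # stand-in for SCF features of 6 verbs
A2 = rng.normal(size=(20, 6))  # stand-in for SP features of the same verbs
L1, L2 = A1.T @ A1, A2.T @ A2  # individual kernels L = A^T A, as above
L12 = joint_kernel(L1, L2)
assert np.all(np.linalg.eigvalsh(L12) >= -1e-8)  # PSD sanity check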

39 Features: Our algorithm builds two DPP kernel matrices (the GenKernelMatrix function), in which the rows and columns correspond to the verbs in the data set, such that the (i, j)-th entry corresponds to verbs number i and j. [sent-156, score-0.375]

40 Following equations 2 and 3 one matrix is built for SCF and one for SP, and they are then combined into the joint kernel matrix. (Footnote 2: Note that we do not take a product of the individual models but only of their kernel matrices.) [sent-157, score-0.174]

41 The joint kernel matrix (the GenJointMat function) is constructed following equation 5. [sent-161, score-0.212]

42 Each kernel matrix requires a proper feature representation φ and quality score q. [sent-162, score-0.21]

43 In both kernels we represent a verb by the counts of the grammatical relations (GRs) it participates in. [sent-163, score-0.192]

44 In the SCF kernel a GR is represented by the GR type and the POS tags of the verb and its arguments. [sent-164, score-0.282]

45 In the SP kernels the GRs are represented by the POS tags of the verb and its arguments as well as by the argument head word. [sent-165, score-0.287]

46 The quality score qi of the i-th verb is the average similarity of this verb with the other verbs in the dataset. [sent-167, score-0.471]
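
A minimal numpy sketch of one way to realize the GenKernelMatrix construction described above, using the standard quality-diversity decomposition of Kulesza and Taskar, L_ij = q_i (phi_i . phi_j) q_j. Treating the GR counts as a plain feature matrix with cosine similarity is our assumption; the quality score follows the definition just given.

import numpy as np

def gen_kernel_matrix(F):
    # Rows of F are per-verb GR-count feature vectors.
    # Diversity: cosine similarity between feature vectors (an assumption;
    # the text defines divergence as inverse similarity).
    phi = F / np.linalg.norm(F, axis=1, keepdims=True)
    S = phi @ phi.T
    # Quality q_i: the average similarity of verb i with the other verbs.
    n = S.shape[0]
    q = (S.sum(axis=1) - 1.0) / (n - 1)
    # Quality-diversity decomposition: L_ij = q_i * (phi_i . phi_j) * q_j,
    # which is PSD by construction and hence a legal DPP kernel.
    return (q[:, None] * S) * q[None, :]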

47 Cluster set construction: In its while loop, the algorithm iteratively generates fixed-size cluster sets such that each data point belongs to exactly one cluster in one set. [sent-168, score-0.36]

48 These cluster sets form the leaf level of the tree in Figure 1. [sent-169, score-0.146]

49 It does so by extracting the T highest-probability K-point samples from a set of M subsets, each of which is sampled from the joint DPP model, and clustering them with the cluster procedure. [sent-170, score-0.356]

50 The cluster procedure first seeds a K-cluster set with the highest probability sample. [sent-172, score-0.146]

51 Then, it gradually extends the clusters by iteratively mapping the samples, in decreasing order of probability, to the existing clusters (the m1Mapping function). [sent-173, score-0.204]

52 Mapping is done by attaching every point in the mapped subset to its closest cluster, where the distance between a point and the cluster is the maximum over the distances between the point and each of the points in the cluster. [sent-174, score-0.294]

53 Based on the DPP properties, the higher the probability of a sampled subset, the more likely it is to consist of distinct points that provide good coverage of the verb set. [sent-176, score-0.186]

54 By iteratively extending the clusters with high probability subsets, we thus expect each cluster set to consist of clusters that demonstrate these properties. [sent-177, score-0.35]
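
A compact Python sketch of the cluster procedure as described above: seed with the highest-probability sample, then map the remaining samples in decreasing order of probability. How the point-to-point distance dist is derived from the kernel is left abstract here, since the text does not pin it down.

def cluster(samples, dist):
    # `samples`: the T highest-probability K-point subsets, sorted by
    # decreasing DPP probability; dist(p, r) is a point-to-point distance.
    clusters = [{p} for p in samples[0]]  # seed with the best sample
    assigned = set(samples[0])
    for sample in samples[1:]:            # the m1Mapping step
        for p in sample:
            if p in assigned:             # each point joins exactly one cluster
                continue
            # Point-to-cluster distance: the maximum over distances to the
            # cluster's points; attach p to its closest cluster.
            best = min(clusters, key=lambda c: max(dist(p, r) for r in c))
            best.add(p)
            assigned.add(p)
    return clusters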

55 The iterative DPP-samples clustering (the While loop) generates the lowest level of the tree, by dividing the data set into cluster sets, each of which consists of K clusters. [sent-185, score-0.284]

56 Each point in the data set belongs to exactly one cluster in exactly one set. [sent-186, score-0.186]

57 The agglomerative clustering then iteratively combines cluster sets such that in each iteration two sets are combined into one set with K clusters. [sent-187, score-0.388]

58 Agglomerative Clustering: Finally, the AgglomerativeClustering function builds a hierarchy of cluster sets by iteratively combining cluster set pairs. [sent-188, score-0.292]

59 In each iteration it computes the similarity between any such pair, defined to be the lowest similarity between their cluster members, which is in turn defined to be the lowest cosine similarity between their point members. [sent-189, score-0.231]

60 The most similar cluster sets are combined such that each of the clusters in one set is mapped to its most similar cluster in the other set. [sent-190, score-0.394]
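
A sketch of one such agglomerative iteration, following the two definitions above (set similarity as the lowest cluster similarity, cluster similarity as the lowest cosine similarity between points); the function and variable names are ours.

def merge_most_similar(cluster_sets, sim):
    # sim(p, r): cosine similarity between two points.
    def cluster_sim(c1, c2):
        return min(sim(p, r) for p in c1 for r in c2)

    def set_sim(s1, s2):
        return min(cluster_sim(c1, c2) for c1 in s1 for c2 in s2)

    # Find the most similar pair of cluster sets ...
    pairs = [(i, j) for i in range(len(cluster_sets))
             for j in range(i + 1, len(cluster_sets))]
    i, j = max(pairs, key=lambda ij: set_sim(cluster_sets[ij[0]],
                                             cluster_sets[ij[1]]))
    # ... and combine them: each cluster in one set is mapped to its most
    # similar cluster in the other, so the result again has K clusters.
    merged = [set(c) for c in cluster_sets[i]]
    for c in cluster_sets[j]:
        max(merged, key=lambda m: cluster_sim(c, m)).update(c)
    return [s for k, s in enumerate(cluster_sets) if k not in (i, j)] + [merged]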

61 In this step the algorithm generates data partitions at different granularity levels, from the finest (from the iterative sampling step) to the coarsest set (generated by the last agglomerative clustering iteration and consisting of exactly K clusters). [sent-191, score-0.324]

62 4 Evaluation. Data sets and gold standards: We evaluated the SCFs and verb clusters on gold standard datasets. [sent-193, score-0.334]

63 3 SCF types per verb) obtained by annotating 250 corpus occurrences per verb with the SCF types of (de Cruys et al. [sent-197, score-0.158]

64 Where a verb has more than one VerbNet class, we assign it to the one supported by the highest number of member verbs. [sent-200, score-0.158]

65 Table 2: Performance of the Corpus Statistics SP baseline (non-filtered, NF) as well as for three filtering methods: frequency based (filter-baseline, B), DPP-cluster based (DPP) and AC cluster based (AC). [sent-238, score-0.19]

66 ...sufficient representation of each class, we collected from VerbNet the verbs for which at least one of the possible classes is represented in the 183-verb set by at least one and at most seven verbs. [sent-243, score-0.178]

67 Since 176 out of the 183 initial verbs are represented in this corpus, our final gold standard consists of 34 classes containing 277 verbs, of which 176 have an SCF gold standard and have been evaluated for this task. [sent-247, score-0.163]

68 Clustering Evaluation: We first evaluate the quality of the clusters induced by our algorithm (DPP-cluster) compared to the gold standard VCs (Table 1). [sent-249, score-0.257]

69 We also compare to the state-of-the-art spectral clustering method of Sun and Korhonen (2009). (Footnote 4: Importantly, the kernel matrix L used in the agglomerative clustering process is also used by AC.) [sent-251, score-0.539]

70 Our kernel matrix is used for the distance between data points (SC). [sent-252, score-0.202]

71 We evaluated the unified cluster set induced in each iteration of our algorithm and of the AC baseline, and induced the same number of clusters as in each iteration of our algorithm using the SC baseline. [sent-253, score-0.552]

72 Since the number of clusters in each iteration is not an argument for our algorithm or for the AC baseline, the number of clusters slightly differs between the two. [sent-254, score-0.316]

73 We do this by gathering the GR combinations for each of the verbs in our gold standard, assuming these combinations are frames, and gathering their frequencies. [sent-275, score-0.243]

74 To remedy this imbalance, we apply a cluster based filtering method on top of the maximum-recall frequency filter. [sent-286, score-0.226]

75 This filter excludes a candidate frame from a verb’s lexicon only if it meets the frequency filter criterion and appears in no more than N other members of the cluster of the verb in question. [sent-287, score-0.391]

76 The filter utilizes the clustering produced by the seventh-to-last iteration of DPP-cluster, which contains seven clusters with approximately 30 members each. [sent-288, score-0.285]
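
A minimal sketch of this cluster-based filter. Modelling the frequency criterion as a relative-frequency threshold is our assumption; the cluster condition follows the description above.

def cluster_based_filter(lexicon, rel_freq, cluster_of, threshold, N):
    # A candidate frame is excluded from a verb's lexicon only if it BOTH
    # meets the frequency filter criterion (here: relative frequency below
    # `threshold`) AND appears in no more than N other members of the
    # verb's cluster. `lexicon`: verb -> set of candidate frames.
    filtered = {}
    for verb, frames in lexicon.items():
        peers = [v for v in cluster_of[verb] if v != verb]
        filtered[verb] = {
            f for f in frames
            if rel_freq[verb][f] >= threshold
            or sum(f in lexicon.get(v, ()) for v in peers) > N
        }
    return filtered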

77 Table 3 clearly demonstrates that cluster based filtering (DPP-cluster and AC) is the only method that provides a good balance between the recall and the precision of the SCF lexicon. [sent-291, score-0.258]

78 Moreover, the lexicon induced by this method includes a substantially higher number of frames per verb compared to the other filtering methods. [sent-292, score-0.405]

79 This clearly demonstrates that the clustering serves to provide SCF acquisition with semantic information needed for improved performance. [sent-294, score-0.25]

80 All methods except cluster based filtering (DPP-cluster and AC) induce lexicons with a strong recall/precision imbalance. [sent-324, score-0.217]

81 Cluster based filtering keeps a larger number of frames in the lexicon compared to the frequency thresholding baseline, while keeping similar F-score levels. [sent-325, score-0.193]

82 We therefore propose to measure both aspects of the SP task by computing both the recall and the precision between the list of possible arguments a verb can take according to the model and the corresponding test corpus list. [sent-330, score-0.282]
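
A small helper showing the proposed measure, treating both argument lists as sets.

def sp_recall_precision(model_args, corpus_args):
    # Recall and precision between the list of possible arguments a verb
    # can take according to the model and the corresponding test-corpus list.
    model, gold = set(model_args), set(corpus_args)
    overlap = len(model & gold)
    recall = overlap / len(gold) if gold else 0.0
    precision = overlap / len(model) if model else 0.0
    return recall, precision

# Toy usage: recall = 2/3, precision = 2/4.
print(sp_recall_precision({"door", "result", "movie", "graph"},
                          {"door", "result", "trend"}))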

83 We evaluate the value of our clustering for SP acquisition in the particularly challenging scenario of domain adaptation. [sent-331, score-0.25]

84 Using the verb clusters, we create a filtered version of the BNC argument lexicon, which includes in the noun argument list of a verb only those nouns that appear in the BNC as arguments of that verb and of one of its cluster members. [sent-337, score-0.888]
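
A sketch of this filtering step; the dictionary layout is our assumption.

def filter_argument_lexicon(bnc_args, cluster_of):
    # A noun stays in a verb's argument list only if it appears in the BNC
    # as an argument of that verb AND of at least one of the verb's cluster
    # members. `bnc_args`: verb -> set of nouns seen as its arguments.
    filtered = {}
    for verb, nouns in bnc_args.items():
        peers = [v for v in cluster_of[verb] if v != verb]
        filtered[verb] = {n for n in nouns
                          if any(n in bnc_args.get(v, ()) for v in peers)}
    return filtered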

85 The table presents the results for verbs with up to 200, 600 and 1000 noun arguments in the training data. [sent-345, score-0.145]

86 In all cases, the relative error reduction of the DPP cluster filter is substantially higher than that of the frequency baseline. [sent-346, score-0.174]

87 Note that for this task the baseline AC clusters are of low quality, which is reflected in an error reduction ratio of up to 0. [sent-347, score-0.166]

88 5 Conclusions and Future Work: In this paper we have presented the first unified framework for the induction of verb clusters, subcategorization frames and selectional preferences from corpus data. [sent-349, score-0.621]

89 Our key idea is to cluster together verbs with similar SCFs and SPs and to use the resulting clusters for SCF and SP induction. [sent-350, score-0.337]

90 To implement our idea we presented a novel method which involves constructing a product DPP model for SCFs and SPs and introduced a new algorithm that utilizes the efficient DPP sampling algorithms to cluster together verbs with similar SCFs and SPs. [sent-351, score-0.317]

91 The induced clusters performed well in evaluation against a VerbNet-based gold standard and proved useful in improving the quality of SCFs and SPs over strong baselines. [sent-352, score-0.229]

92 Not only can the acquisition of different types of information (syntactic and semantic) support and inform each other, but a unified framework can also be useful for NLP tasks and applications which require rich information about predicate-argument structure. [sent-355, score-0.162]

93 IRASubcat, a highly customizable, language independent tool for the acquisition of verbal subcategorization information from corpus. [sent-363, score-0.24]

94 Inferring semantic roles using subcategorization frames and maximum entropy model. [sent-382, score-0.245]

95 Unsupervised acquisition of verb subcategorization frames from shallow-parsed corpora. [sent-505, score-0.515]

96 Which are the best features for automatic verb classification. [sent-513, score-0.158]

97 A system for large-scale acquisition of verbal, nominal and adjectival subcategorization frames from corpora. [sent-550, score-0.357]

98 Combining EM training and the MDL principle for an automatic verb classification incorporating selectional preferences. [sent-576, score-0.239]

99 Experiments on the automatic induction of German semantic verb classes. [sent-580, score-0.205]

100 Unsupervised and constrained dirichlet process mixture models for verb clustering. [sent-612, score-0.158]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('scf', 0.442), ('dpp', 0.344), ('scfs', 0.289), ('sps', 0.229), ('sp', 0.179), ('korhonen', 0.172), ('verb', 0.158), ('kulesza', 0.152), ('cluster', 0.146), ('vcs', 0.14), ('cruys', 0.14), ('clustering', 0.138), ('subcategorization', 0.128), ('determinantal', 0.127), ('kernel', 0.124), ('frames', 0.117), ('acquisition', 0.112), ('dpps', 0.11), ('clusters', 0.102), ('lippincott', 0.093), ('verbs', 0.089), ('anna', 0.081), ('selectional', 0.081), ('subsets', 0.078), ('schulte', 0.072), ('taskar', 0.071), ('verbnet', 0.071), ('bnc', 0.069), ('walde', 0.063), ('topsample', 0.062), ('topsamples', 0.062), ('agglomerative', 0.059), ('eaghdha', 0.057), ('arguments', 0.056), ('donovan', 0.055), ('frame', 0.055), ('induced', 0.054), ('sampling', 0.054), ('vc', 0.052), ('sun', 0.051), ('matrix', 0.05), ('unified', 0.05), ('rooth', 0.048), ('briscoe', 0.048), ('induction', 0.047), ('dppcluster', 0.047), ('genkernelmatrix', 0.047), ('im', 0.046), ('carroll', 0.045), ('matrices', 0.045), ('iteration', 0.045), ('filtering', 0.044), ('tensor', 0.042), ('ac', 0.041), ('preferences', 0.04), ('point', 0.04), ('gr', 0.039), ('argument', 0.039), ('joint', 0.038), ('gold', 0.037), ('levin', 0.036), ('quality', 0.036), ('preiss', 0.036), ('recall', 0.036), ('kernels', 0.034), ('rasp', 0.034), ('elegant', 0.034), ('sy', 0.034), ('induces', 0.034), ('samples', 0.034), ('unsupervised', 0.033), ('diarmuid', 0.033), ('processes', 0.032), ('lexicon', 0.032), ('precision', 0.032), ('basili', 0.031), ('agglomerativeclustering', 0.031), ('altamirano', 0.031), ('chesley', 0.031), ('cholakov', 0.031), ('grs', 0.031), ('ienco', 0.031), ('messiant', 0.031), ('minors', 0.031), ('nant', 0.031), ('swier', 0.031), ('zapirain', 0.031), ('van', 0.031), ('reisinger', 0.03), ('spectral', 0.03), ('qi', 0.03), ('sc', 0.03), ('subj', 0.029), ('predicateargument', 0.029), ('reduction', 0.028), ('de', 0.028), ('algorithm', 0.028), ('points', 0.028), ('joanis', 0.028), ('induce', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999893 192 acl-2013-Improved Lexical Acquisition through DPP-based Verb Clustering

Author: Roi Reichart ; Anna Korhonen

Abstract: Subcategorization frames (SCFs), selectional preferences (SPs) and verb classes capture related aspects of the predicate-argument structure. We present the first unified framework for unsupervised learning of these three types of information. We show how to utilize Determinantal Point Processes (DPPs), elegant probabilistic models that are defined over the possible subsets of a given dataset and give higher probability mass to high quality and diverse subsets, for clustering. Our novel clustering algorithm constructs a joint SCF-SP DPP kernel matrix and utilizes the efficient sampling algorithms of DPPs to cluster together verbs with similar SCFs and SPs. We evaluate the induced clusters in the context of the three tasks and show results that are superior to strong baselines for each task.

2 0.2743215 119 acl-2013-Diathesis alternation approximation for verb clustering

Author: Lin Sun ; Diana McCarthy ; Anna Korhonen

Abstract: Although diathesis alternations have been used as features for manual verb classification, and there is recent work on incorporating such features in computational models of human language acquisition, work on large scale verb classification has yet to examine the potential for using diathesis alternations as input features to the clustering process. This paper proposes a method for approximating diathesis alternation behaviour in corpus data and shows, using a state-of-the-art verb clustering system, that features based on alternation approximation outperform those based on independent subcategorization frames. Our alternation-based approach is particularly adept at leveraging information from less frequent data.

3 0.13490492 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation

Author: Zhenhua Tian ; Hengheng Xiang ; Ziqi Liu ; Qinghua Zheng

Abstract: This paper presents an unsupervised random walk approach to alleviate data sparsity for selectional preferences. Based on the measure of preferences between predicates and arguments, the model aggregates all the transitions from a given predicate to its nearby predicates, and propagates their argument preferences as the given predicate’s smoothed preferences. Experimental results show that this approach outperforms several state-of-the-art methods on the pseudo-disambiguation task, and it better correlates with human plausibility judgements.

4 0.10665427 213 acl-2013-Language Acquisition and Probabilistic Models: keeping it simple

Author: Aline Villavicencio ; Marco Idiart ; Robert Berwick ; Igor Malioutov

Abstract: Hierarchical Bayesian Models (HBMs) have been used with some success to capture empirically observed patterns of under- and overgeneralization in child language acquisition. However, as is well known, HBMs are "ideal" learning systems, assuming access to unlimited computational resources that may not be available to child language learners. Consequently, it remains crucial to carefully assess the use of HBMs along with alternative, possibly simpler, candidate models. This paper presents such an evaluation for a language acquisition domain where explicit HBMs have been proposed: the acquisition of English dative constructions. In particular, we present a detailed, empirically grounded model-selection comparison of HBMs vs. a simpler alternative based on clustering along with maximum likelihood estimation that we call linear competition learning (LCL). Our results demonstrate that LCL can match HBM model performance without incurring the high computational costs associated with HBMs.

5 0.10416804 344 acl-2013-The Effects of Lexical Resource Quality on Preference Violation Detection

Author: Jesse Dunietz ; Lori Levin ; Jaime Carbonell

Abstract: Lexical resources such as WordNet and VerbNet are widely used in a multitude of NLP tasks, as are annotated corpora such as treebanks. Often, the resources are used as-is, without question or examination. This practice risks missing significant performance gains and even entire techniques. This paper addresses the importance of resource quality through the lens of a challenging NLP task: detecting selectional preference violations. We present DAVID, a simple, lexical resource-based preference violation detector. With as-is lexical resources, DAVID achieves an F1-measure of just 28.27%. When the resource entries and parser outputs for a small sample are corrected, however, the F1-measure on that sample jumps from 40% to 61.54%, and performance on other examples rises, suggesting that the algorithm becomes practical given refined resources. More broadly, this paper shows that resource quality matters tremendously, sometimes even more than algorithmic improvements.

6 0.10275497 345 acl-2013-The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis

7 0.094040476 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation

8 0.092192978 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction

9 0.089678138 366 acl-2013-Understanding Verbs based on Overlapping Verbs Senses

10 0.089199089 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering

11 0.089105032 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates

12 0.085179791 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context

13 0.082831427 378 acl-2013-Using subcategorization knowledge to improve case prediction for translation to German

14 0.081333451 281 acl-2013-Post-Retrieval Clustering Using Third-Order Similarity Measures

15 0.080709733 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models

16 0.078122579 29 acl-2013-A Visual Analytics System for Cluster Exploration

17 0.076070175 248 acl-2013-Modelling Annotator Bias with Multi-task Gaussian Processes: An Application to Machine Translation Quality Estimation

18 0.073416084 224 acl-2013-Learning to Extract International Relations from Political Context

19 0.070563905 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages

20 0.070391744 116 acl-2013-Detecting Metaphor by Contextual Analogy


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.177), (1, 0.034), (2, -0.002), (3, -0.103), (4, -0.066), (5, -0.062), (6, -0.055), (7, 0.079), (8, -0.013), (9, -0.016), (10, 0.009), (11, -0.028), (12, 0.004), (13, 0.024), (14, -0.068), (15, -0.031), (16, -0.009), (17, 0.025), (18, 0.143), (19, 0.057), (20, 0.116), (21, 0.014), (22, 0.058), (23, -0.076), (24, 0.148), (25, -0.063), (26, -0.04), (27, -0.007), (28, 0.056), (29, 0.126), (30, 0.026), (31, 0.02), (32, -0.057), (33, -0.1), (34, -0.166), (35, 0.035), (36, -0.177), (37, -0.14), (38, -0.074), (39, 0.173), (40, 0.016), (41, -0.008), (42, -0.027), (43, 0.046), (44, -0.066), (45, -0.069), (46, 0.009), (47, 0.006), (48, 0.068), (49, -0.067)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93552476 192 acl-2013-Improved Lexical Acquisition through DPP-based Verb Clustering

Author: Roi Reichart ; Anna Korhonen

Abstract: Subcategorization frames (SCFs), selectional preferences (SPs) and verb classes capture related aspects of the predicate-argument structure. We present the first unified framework for unsupervised learning of these three types of information. We show how to utilize Determinantal Point Processes (DPPs), elegant probabilistic models that are defined over the possible subsets of a given dataset and give higher probability mass to high quality and diverse subsets, for clustering. Our novel clustering algorithm constructs a joint SCF-SP DPP kernel matrix and utilizes the efficient sampling algorithms of DPPs to cluster together verbs with similar SCFs and SPs. We evaluate the induced clusters in the context of the three tasks and show results that are superior to strong baselines for each task.

2 0.91008437 119 acl-2013-Diathesis alternation approximation for verb clustering

Author: Lin Sun ; Diana McCarthy ; Anna Korhonen

Abstract: Although diathesis alternations have been used as features for manual verb classification, and there is recent work on incorporating such features in computational models of human language acquisition, work on large scale verb classification has yet to examine the potential for using diathesis alternations as input features to the clustering process. This paper proposes a method for approximating diathesis alternation behaviour in corpus data and shows, using a state-of-the-art verb clustering system, that features based on alternation approximation outperform those based on independent subcategorization frames. Our alternation-based approach is particularly adept at leveraging information from less frequent data.

3 0.76237714 213 acl-2013-Language Acquisition and Probabilistic Models: keeping it simple

Author: Aline Villavicencio ; Marco Idiart ; Robert Berwick ; Igor Malioutov

Abstract: Hierarchical Bayesian Models (HBMs) have been used with some success to capture empirically observed patterns of under- and overgeneralization in child language acquisition. However, as is well known, HBMs are "ideal" learning systems, assuming access to unlimited computational resources that may not be available to child language learners. Consequently, it remains crucial to carefully assess the use of HBMs along with alternative, possibly simpler, candidate models. This paper presents such an evaluation for a language acquisition domain where explicit HBMs have been proposed: the acquisition of English dative constructions. In particular, we present a detailed, empirically grounded model-selection comparison of HBMs vs. a simpler alternative based on clustering along with maximum likelihood estimation that we call linear competition learning (LCL). Our results demonstrate that LCL can match HBM model performance without incurring the high computational costs associated with HBMs.

4 0.60385168 344 acl-2013-The Effects of Lexical Resource Quality on Preference Violation Detection

Author: Jesse Dunietz ; Lori Levin ; Jaime Carbonell

Abstract: Lexical resources such as WordNet and VerbNet are widely used in a multitude of NLP tasks, as are annotated corpora such as treebanks. Often, the resources are used as-is, without question or examination. This practice risks missing significant performance gains and even entire techniques. This paper addresses the importance of resource quality through the lens of a challenging NLP task: detecting selectional preference violations. We present DAVID, a simple, lexical resource-based preference violation detector. With as-is lexical resources, DAVID achieves an F1-measure of just 28.27%. When the resource entries and parser outputs for a small sample are corrected, however, the F1-measure on that sample jumps from 40% to 61.54%, and performance on other examples rises, suggesting that the algorithm becomes practical given refined resources. More broadly, this paper shows that resource quality matters tremendously, sometimes even more than algorithmic improvements.

5 0.54710186 310 acl-2013-Semantic Frames to Predict Stock Price Movement

Author: Boyi Xie ; Rebecca J. Passonneau ; Leon Wu ; German G. Creamer

Abstract: Semantic frames are a rich linguistic resource. There has been much work on semantic frame parsers, but less that applies them to general NLP problems. We address a task to predict change in stock price from financial news. Semantic frames help to generalize from specific sentences to scenarios, and to detect the (positive or negative) roles of specific companies. We introduce a novel tree representation, and use it to train predictive models with tree kernels using support vector machines. Our experiments test multiple text representations on two binary classification tasks, change of price and polarity. Experiments show that features derived from semantic frame parsing have significantly better performance across years on the polarity task.

6 0.53381175 366 acl-2013-Understanding Verbs based on Overlapping Verbs Senses

7 0.52029598 29 acl-2013-A Visual Analytics System for Cluster Exploration

8 0.51841468 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation

9 0.48493579 186 acl-2013-Identifying English and Hungarian Light Verb Constructions: A Contrastive Approach

10 0.48107931 281 acl-2013-Post-Retrieval Clustering Using Third-Order Similarity Measures

11 0.47370207 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context

12 0.4625845 220 acl-2013-Learning Latent Personas of Film Characters

13 0.44803649 89 acl-2013-Computerized Analysis of a Verbal Fluency Test

14 0.43932012 265 acl-2013-Outsourcing FrameNet to the Crowd

15 0.43539351 224 acl-2013-Learning to Extract International Relations from Political Context

16 0.43277675 286 acl-2013-Psycholinguistically Motivated Computational Models on the Organization and Processing of Morphologically Complex Words

17 0.42648107 149 acl-2013-Exploring Word Order Universals: a Probabilistic Graphical Model Approach

18 0.40055171 345 acl-2013-The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis

19 0.39948499 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction

20 0.39650947 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.055), (6, 0.029), (11, 0.084), (14, 0.018), (15, 0.012), (17, 0.252), (24, 0.039), (26, 0.046), (28, 0.017), (35, 0.068), (42, 0.07), (48, 0.054), (70, 0.056), (88, 0.026), (90, 0.026), (95, 0.068)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.78470671 192 acl-2013-Improved Lexical Acquisition through DPP-based Verb Clustering

Author: Roi Reichart ; Anna Korhonen

Abstract: Subcategorization frames (SCFs), selectional preferences (SPs) and verb classes capture related aspects of the predicate-argument structure. We present the first unified framework for unsupervised learning of these three types of information. We show how to utilize Determinantal Point Processes (DPPs), elegant probabilistic models that are defined over the possible subsets of a given dataset and give higher probability mass to high quality and diverse subsets, for clustering. Our novel clustering algorithm constructs a joint SCF-SP DPP kernel matrix and utilizes the efficient sampling algorithms of DPPs to cluster together verbs with similar SCFs and SPs. We evaluate the induced clusters in the context of the three tasks and show results that are superior to strong baselines for each task.

2 0.75826001 167 acl-2013-Generalizing Image Captions for Image-Text Parallel Corpus

Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi

Abstract: The ever growing amount of web images and their associated texts offers new opportunities for integrative models bridging natural language processing and computer vision. However, the potential benefits of such data are yet to be fully realized due to the complexity and noise in the alignment between image content and text. We address this challenge with contributions in two folds: first, we introduce the new task of image caption generalization, formulated as visually-guided sentence compression, and present an efficient algorithm based on dynamic beam search with dependency-based constraints. Second, we release a new large-scale corpus with 1 million image-caption pairs achieving tighter content alignment between images and text. Evaluation results show the intrinsic quality of the generalized captions and the extrinsic utility of the new imagetext parallel corpus with respect to a concrete application of image caption transfer.

3 0.59588957 119 acl-2013-Diathesis alternation approximation for verb clustering

Author: Lin Sun ; Diana McCarthy ; Anna Korhonen

Abstract: Although diathesis alternations have been used as features for manual verb classification, and there is recent work on incorporating such features in computational models of human language acquisition, work on large scale verb classification has yet to examine the potential for using diathesis alternations as input features to the clustering process. This paper proposes a method for approximating diathesis alternation behaviour in corpus data and shows, using a state-of-the-art verb clustering system, that features based on alternation approximation outperform those based on independent subcategorization frames. Our alternation-based approach is particularly adept at leveraging information from less frequent data.

4 0.5878644 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing

Author: Muhua Zhu ; Yue Zhang ; Wenliang Chen ; Min Zhang ; Jingbo Zhu

Abstract: Shift-reduce dependency parsers give comparable accuracies to their chartbased counterparts, yet the best shiftreduce constituent parsers still lag behind the state-of-the-art. One important reason is the existence of unary nodes in phrase structure trees, which leads to different numbers of shift-reduce actions between different outputs for the same input. This turns out to have a large empirical impact on the framework of global training and beam search. We propose a simple yet effective extension to the shift-reduce process, which eliminates size differences between action sequences in beam-search. Our parser gives comparable accuracies to the state-of-the-art chart parsers. With linear run-time complexity, our parser is over an order of magnitude faster than the fastest chart parser.

5 0.58581686 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search

Author: Ji Ma ; Jingbo Zhu ; Tong Xiao ; Nan Yang

Abstract: In this paper, we combine easy-first dependency parsing and POS tagging algorithms with beam search and structured perceptron. We propose a simple variant of "early-update" to ensure valid update in the training process. The proposed solution can also be applied to combine beam search and structured perceptron with other systems that exhibit spurious ambiguity. On CTB, we achieve 94.01% tagging accuracy and 86.33% unlabeled attachment score with a relatively small beam width. On PTB, we also achieve state-of-the-art performance.

6 0.58361554 111 acl-2013-Density Maximization in Context-Sense Metric Space for All-words WSD

7 0.58169276 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching

8 0.58045626 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing

9 0.57977998 80 acl-2013-Chinese Parsing Exploiting Characters

10 0.57822675 275 acl-2013-Parsing with Compositional Vector Grammars

11 0.57669973 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing

12 0.57520604 123 acl-2013-Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study

13 0.57499576 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation

14 0.57419515 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation

15 0.57332265 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model

16 0.5732789 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation

17 0.57314301 225 acl-2013-Learning to Order Natural Language Texts

18 0.57313776 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation

19 0.57148755 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction

20 0.57059389 280 acl-2013-Plurality, Negation, and Quantification:Towards Comprehensive Quantifier Scope Disambiguation