acl acl2013 acl2013-283 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jackie Chi Kit Cheung ; Gerald Penn
Abstract: Generative probabilistic models have been used for content modelling and template induction, and are typically trained on small corpora in the target domain. In contrast, vector space models of distributional semantics are trained on large corpora, but are typically applied to domain-general lexical disambiguation tasks. We introduce Distributional Semantic Hidden Markov Models, a novel variant of a hidden Markov model that integrates these two approaches by incorporating contextualized distributional semantic vectors into a generative model as observed emissions. Experiments in slot induction show that our approach yields improvements in learning coherent entity clusters in a domain. In a subsequent extrinsic evaluation, we show that these improvements are also reflected in multi-document summarization.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Generative probabilistic models have been used for content modelling and template induction, and are typically trained on small corpora in the target domain. [sent-4, score-0.227]
2 In contrast, vector space models of distributional semantics are trained on large corpora, but are typically applied to domain-general lexical disambiguation tasks. [sent-5, score-0.308]
3 We introduce Distributional Semantic Hidden Markov Models, a novel variant of a hidden Markov model that integrates these two approaches by incorporating contextualized distributional semantic vectors into a generative model as observed emissions. [sent-6, score-0.637]
4 Experiments in slot induction show that our approach yields improvements in learning coherent entity clusters in a domain. [sent-7, score-0.476]
5 For example, in information extraction, a list of slots in the target domain is given to the system, and in natural language generation, content models are trained to learn the content structure of texts in the target domain for information structuring and automatic summarization. [sent-10, score-0.372]
6 (2013) formalized a domain as a set of frames consisting of prototypical sequences of events, slots, and slot fillers or entities, inspired by classical AI work such as Schank and Abelson’s (1977) scripts. [sent-20, score-0.407]
7 These would be indicated by event heads such as kill, arrest, charge, plead. [sent-23, score-0.267]
8 Relevant slots would include VICTIM, SUSPECT, AUTHORITIES, PLEA, etc. [sent-24, score-0.144]
9 By contrast, distributional semantic models are trained on large, domain-general corpora. [sent-27, score-0.25]
10 In this paper, we propose to inject contextualized distributional semantic vectors into generative probabilistic models, in order to combine their complementary strengths for domain modelling. [sent-36, score-0.716]
11 There are a number of potential advantages that distributional semantic models offer. [sent-37, score-0.25]
12 Second, the contextualization process allows the semantic vectors to implicitly encode disambiguated word sense and syntactic information, without further adding to the complexity of the generative model. [sent-39, score-0.193]
13 Our model, the Distributional Semantic Hidden Markov Model (DSHMM), incorporates contextualized distributional semantic vectors into a generative probabilistic model as observed emissions. [sent-40, score-0.653]
14 First, we apply it to slot induction on guided summarization data over five different domains. [sent-42, score-0.609]
15 We show that our model outperforms a baseline version of our method that does not use distributional semantic vectors, as well as a recent state-of-the-art template induction method. [sent-43, score-0.346]
16 Then, we perform an extrinsic evaluation using multi-document summarization, wherein we show that our model is able to learn event and slot topics that are appropriate to include in a summary. [sent-44, score-0.571]
17 From a modelling perspective, these results show that probabilistic models for content modelling and template induction benefit from distributional semantics trained on a much larger corpus. [sent-45, score-0.586]
18 From the perspective of distributional semantics, this work broadens the variety of problems to which distributional semantics can be applied, and proposes methods to perform inference in a probabilistic setting beyond geometric measures such as cosine similarity. [sent-46, score-0.443]
19 2 Related Work Probabilistic content models were proposed by Barzilay and Lee (2004), and related models have since become popular for summarization (Fung and Ngai, 2006; Haghighi and Vanderwende, 2009), and information ordering (Elsner et al. [sent-47, score-0.223]
20 Other related generative models include topic models and structured versions thereof (Blei et al. [sent-49, score-0.165]
21 In terms of domain learning in the form of template induction, heuristic methods involving multiple clustering steps have been proposed (Filatova et al. [sent-52, score-0.146]
22 , 1975), but only recently have distributional models been proposed that are compositional (Mitchell and Lapata, 2008; Clark et al. [sent-59, score-0.208]
23 We recently showed how such models can be evaluated for their ability to support semantic inference for use in complex NLP tasks like question answering or automatic summarization (Cheung and Penn, 2012). [sent-62, score-0.196]
24 Combining distributional information and probabilistic models has actually been explored in previous work. [sent-63, score-0.259]
25 This model can be thought of as an HMM with two layers of latent variables, representing events and slots in the domain. [sent-71, score-0.205]
26 Given a document consisting of a sequence of T clauses headed by propositional heads (verbs or event nouns) with their arguments, a DSHMM models the joint probability of the observed heads H~ and arguments A~, and of latent random variables E~ and S~ representing domain events and slots respectively; i. [sent-72, score-0.632]
27 More specifically, it generates the event heads and arguments which are crucial in identifying events and slots. [sent-77, score-0.366]
28 We assume that event heads are verbs or event nouns, while arguments are the head words of their syntactically dependent noun phrases. [sent-78, score-0.604]
29 Within each clause, a hierarchy of latent and observed variables maps to corresponding elements in the clause (Table 1), as follows: Event Variables At the top-level, a categorical latent variable Et with NE possible states represents the event that is described by clause t. [sent-80, score-0.466]
30 The internal structure of the clause is generated by conditioning on the state of Et, including the head of the clause, and the slots for each argument in the clause. [sent-83, score-0.423]
31 Slot Variables Categorical latent variables with NS possible states represent the slot that an argument fills, and are conditioned on the event variable in the clause, Et (i. [sent-84, score-0.632]
32 Head and Argument Emissions The head of the clause Ht is conditionally dependent on Et, and each argument Ata is likewise conditioned on its slot variable Sta. [sent-88, score-0.603]
33 Unlike in most applications of HMMs in text processing, in which the representation of a token is simply its word or lemma identity, tokens in DSHMM are also associated with a vector representation of their meaning in context according to a distributional semantic model (Section 3. [sent-89, score-0.338]
34 Thus, the emissions can be decomposed into pairs Ht = (lemma(Ht) , sem(Ht)) and Ata = (lemma(Ata) , sem(Ata)), where lemma and sem are functions that return the lemma identity and the semantic vector respectively. [sent-91, score-0.365]
35 The probability of the head of a clause is thus: P_H(Ht | Et) = P_lemm^H(lemma(Ht) | Et) × P_sem^H(sem(Ht) | Et), (1) and the probability of a clausal argument is likewise: P_A(Ata | Sta) = P_lemm^A(lemma(Ata) | Sta) × P_sem^A(sem(Ata) | Sta). [sent-92, score-0.223]
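The factored emission above can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the parameter containers (lemma_probs, gauss_means, gauss_covs) are assumed names for the per-state categorical and Gaussian parameters described in this section and in Section 3.

```python
import numpy as np
from scipy.stats import multivariate_normal

def head_emission_logprob(lemma, sem_vec, event_state,
                          lemma_probs, gauss_means, gauss_covs):
    """log P_H(Ht | Et) = log P_lemm^H(lemma(Ht) | Et) + log P_sem^H(sem(Ht) | Et).

    Assumed layout (hypothetical, for illustration only):
      lemma_probs[e] : dict mapping lemma -> probability under event state e
      gauss_means[e] : mean vector (k,) of the Gaussian for event state e
      gauss_covs[e]  : covariance matrix (k, k) for event state e
    """
    log_p_lemma = np.log(lemma_probs[event_state].get(lemma, 1e-12))
    log_p_sem = multivariate_normal.logpdf(
        sem_vec, mean=gauss_means[event_state], cov=gauss_covs[event_state])
    return log_p_lemma + log_p_sem

# The argument emission P_A(Ata | Sta) has the same form, with slot-state
# parameters in place of the event-state parameters.
```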
36 1 Vector Space Models of Semantics In this section, we describe several methods for producing the semantic vectors associated with each event head or argument; i. [sent-98, score-0.403]
37 We start with a description of the training of a basic model without any contextualization, then describe several contextualized models based on recent work. [sent-102, score-0.273]
38 Component-wise Operators Mitchell and Lapata (2008) investigate using component-wise operators to combine the vectors of verbs and their intransitive subjects. [sent-109, score-0.142]
39 We use component-wise operators to contextualize our vectors, but by combining with all of the arguments, and regardless of the event head’s category. [sent-110, score-0.267]
40 Let event head h be the syntactic head of a number of arguments a1, a2, . [sent-111, score-0.369]
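A minimal sketch of this component-wise contextualization, under the assumption that the head vector is combined with the vectors of all of its arguments by element-wise multiplication or addition (the specific operator is an experimental choice):

```python
import numpy as np

def contextualize_componentwise(head_vec, arg_vecs, op="multiply"):
    """Combine an event head's vector with the vectors of its arguments
    a1 ... an using a component-wise operator (Mitchell & Lapata style).
    The same scheme can be applied in reverse to contextualize an argument
    with its head."""
    context = np.array(head_vec, dtype=float)
    for a in arg_vecs:
        if op == "multiply":
            context = context * a   # element-wise product
        elif op == "add":
            context = context + a   # element-wise sum
        else:
            raise ValueError("unknown operator: %s" % op)
    return context
```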
41 Selectional Preferences Erk and Padó (2008) (E&P) incorporate inverse selectional preferences into their contextualization function. [sent-130, score-0.154]
42 The intuition is that a word should be contextualized such that its vector representation becomes more similar to the vectors of other words that its dependency neighbours often take in the same syntactic position. [sent-131, score-0.389]
43 For example, suppose catch is the head of the noun ball, in the relation of a direct object. [sent-132, score-0.149]
44 Then, the vector for ball would be contextualized to become similar to the vectors for other frequent direct objects of catch, such as baseball, or cold. [sent-133, score-0.389]
45 Likewise, the vector for catch would be contextualized to become similar to the vectors for throw, hit, etc. [sent-134, score-0.43]
46 Afterwards, the contextualized vector c in the original space can be transformed into a vector in the reduced space, c_R, by c_R = Σ_k^{-1} V_k^T c. [sent-140, score-0.341]
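This projection can be written directly against the truncated SVD of the word-context co-occurrence matrix; a sketch with assumed variable names:

```python
import numpy as np

def project_to_reduced_space(c, V_k, sigma_k):
    """Map a contextualized vector c (in the original co-occurrence space)
    into the k-dimensional reduced space: c_R = Sigma_k^{-1} V_k^T c.

    V_k     : (num_context_features, k) top-k right singular vectors
    sigma_k : (k,) top-k singular values
    """
    return (V_k.T @ c) / sigma_k
```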
47 Distributional semantic vectors are traditionally compared by measures which ignore vector magnitudes, such as cosine similarity, but a multivariate Gaussian is sensitive to magnitudes. [sent-141, score-0.194]
48 We model the emission of these contextualized vectors in DSHMM as multivariate Gaussian distributions, so the semantic vector emissions can be written as P_sem^H, P_sem^A ∼ N(µ, Σ), where µ ∈ R^k is the mean and Σ ∈ R^{k×k} is the covariance matrix. [sent-143, score-0.488]
49 Train a distributional semantic model on a large, domain-general corpus. [sent-153, score-0.214]
50 Preprocess and generate contextualized vectors of event heads and arguments in the small corpus in the target domain. [sent-155, score-0.642]
51 Draw categorical distributions P_init^E, P_E, P_S, P_lemm^H (one per event state), and P_lemm^A (one per slot state) from Dirichlet priors, and [sent-159, score-0.646]
52 multivariate Gaussians P_sem^H, P_sem^A for each event and slot state, respectively. [sent-161, score-0.535]
53 Generate the event head components lemm(Ht) ∼ P_lemm^H, sem(Ht) ∼ P_sem^H. [sent-172, score-0.191]
54 lemm(Ata) ∼ P_lemm^A, sem(Ata) ∼ P_sem^A. 4 Experiments We trained the distributional semantic models using the Annotated Gigaword corpus (Napoles et al. [sent-177, score-0.25]
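A compact sketch of the generative story laid out in the items above; the state counts, vocabulary size, hyperparameters, and identity covariance are placeholders, and only one argument is generated per clause for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
N_E, N_S, V, k = 10, 5, 1000, 50   # event states, slot states, vocab size, vector dim

# 1. Draw categorical distributions from Dirichlet priors, and a Gaussian
#    (mean, covariance) per event state and per slot state.
P_E_init  = rng.dirichlet(np.ones(N_E))
P_E_trans = rng.dirichlet(np.ones(N_E), size=N_E)    # event-to-event transitions
P_S       = rng.dirichlet(np.ones(N_S), size=N_E)    # slot given event
P_lemm_H  = rng.dirichlet(np.ones(V), size=N_E)      # head lemma given event state
P_lemm_A  = rng.dirichlet(np.ones(V), size=N_S)      # argument lemma given slot state
mu_H = rng.normal(size=(N_E, k))
mu_A = rng.normal(size=(N_S, k))

# 2.-3. Generate a document of T clauses.
T = 4
e = rng.choice(N_E, p=P_E_init)
for t in range(T):
    head_lemma = rng.choice(V, p=P_lemm_H[e])
    head_sem   = rng.multivariate_normal(mu_H[e], np.eye(k))
    s          = rng.choice(N_S, p=P_S[e])
    arg_lemma  = rng.choice(V, p=P_lemm_A[s])
    arg_sem    = rng.multivariate_normal(mu_A[s], np.eye(k))
    e = rng.choice(N_E, p=P_E_trans[e])               # transition to the next event state
```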
55 We then trained DSHMM and conducted our evaluations on the TAC 2010 guided summarization data set (Owczarzak and Dang, 2010). [sent-182, score-0.183]
56 Lemmatization and extraction of event heads and arguments are done by preprocessing with the Stanford CoreNLP tool suite (Toutanova et al. [sent-183, score-0.305]
57 This data set contains 46 topic clusters of 20 articles each, grouped into five topic categories or domains. [sent-186, score-0.134]
58 Each topic cluster contains eight human-written “model” summaries (“model” here meaning a gold standard). [sent-188, score-0.216]
59 Half of the articles and model summaries in a topic cluster are used in the guided summarization task, and the rest are used in the update summarization task. [sent-189, score-0.517]
60 First, templates for the domains are provided, and the model summaries are annotated with slots from the template, allowing for an intrinsic evaluation of slot induction (Section 5). [sent-191, score-0.69]
61 Second, it contains multiple domain instances for each of the domains, and each domain instance comes annotated with eight model summaries, allowing for an extrinsic evaluation of our system (Section 6). [sent-192, score-0.162]
62 As part of the official TAC evaluation procedure, model summaries were manually segmented into contributors, and labelled with the slot in the TAC template that the contributor expresses. [sent-195, score-0.514]
63 For example, a summary fragment such as On 20 April 1999, a massacre occurred at Columbine High School is segmented into the contributors: (On 20 April 1999, WHEN); (a massacre occurred, WHAT); and (at Columbine High School, WHERE). [sent-196, score-0.149]
64 In the slot induction evaluation, this annotation is used as follows. [sent-197, score-0.426]
65 First, the maximal noun phrases are extracted from the contributors and clustered based on the TAC slot of the contributor. [sent-198, score-0.436]
66 These clusters of noun phrases then become the gold standard clusters against which automatic systems are compared. [sent-199, score-0.138]
67 This accounts for the fact that human annotators often only label the first occurrence of a word that belongs to a slot in a summary, and follows the standard evaluation procedure in previous information extraction tasks, such as MUC-4. [sent-201, score-0.344]
68 The same system cluster may be mapped multiple times, because several TAC slots can overlap. [sent-206, score-0.198]
69 For example, in the NATURAL DISASTERS domain, an earthquake may fit both the WHAT slot as well as the CAUSE slot, because it generated a tsunami. [sent-207, score-0.344]
70 We then extracted noun phrase clusters from the model summaries according to the slot labels produced by running the Viterbi algorithm on them. [sent-209, score-0.552]
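A sketch of the cluster-mapping step of this evaluation, under the stated many-to-one assumption (each gold TAC slot is mapped to the induced cluster that overlaps it most, and a cluster may be reused); the data layout is hypothetical:

```python
def map_gold_slots_to_clusters(gold_slots, system_clusters):
    """gold_slots:      dict slot_name -> set of gold noun phrases
       system_clusters: dict cluster_id -> set of system noun phrases"""
    mapping = {}
    for slot_name, gold_nps in gold_slots.items():
        best = max(system_clusters,
                   key=lambda cid: len(gold_nps & system_clusters[cid]))
        mapping[slot_name] = best   # the same cluster may be chosen for several slots
    return mapping

def cluster_f1(gold_nps, system_nps):
    """Precision/recall/F1 of one mapped cluster against its gold slot."""
    tp = len(gold_nps & system_nps)
    p = tp / len(system_nps) if system_nps else 0.0
    r = tp / len(gold_nps) if gold_nps else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```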
71 Table 2: Slot induction results on the TAC guided summarization data set. [sent-228, score-0.265]
72 DSHMM with contextualized semantic vectors achieves the highest F1 scores, and is significantly better than PROFINDER. [sent-239, score-0.379]
73 Importantly, the contextualization of the vectors seems to be beneficial, at least with the M&L component-wise operators. [sent-245, score-0.254]
74 In the next section, we show that the improvement from contextualization transfers to multi-document summarization results. [sent-246, score-0.272]
75 1), and a supervised variant that makes use of in-domain summaries to learn the salient slots and events in the domain (Section 6. [sent-252, score-0.388]
76 1 A KL-based Criterion There are four main component distributions from our model that should be considered during extraction: (1) the distribution of events, (2) the distribution of slots, (3) the distribution of event heads, and (4) the distribution of arguments. [sent-255, score-0.253]
77 We estimate (1) as the context-independent probability ofbeing in a certain event state, which can be calculated using the Inside-Outside algorithm. [sent-256, score-0.191]
78 The slot distributions for (2) are defined analogously, where the summation occurs over all the slot variables. [sent-259, score-0.406]
79 For (3) and (4), we simply use the MLE estimates of the lemma emissions, where the estimates are made over the source text and the candidate summary instead of over the entire training set. [sent-260, score-0.143]
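A hedged sketch of a KL-based selection score over these four distributions; the uniform summation and the smoothing constant are assumptions, since the exact weighting and search procedure are not spelled out in these excerpts:

```python
import numpy as np

def kl(p, q, eps=1e-10):
    """KL(p || q) for discrete distributions given as aligned numpy arrays."""
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def kl_score(source_dists, summary_dists):
    """Sum of KL divergences between source-text and candidate-summary estimates of
    the (1) event, (2) slot, (3) head-lemma and (4) argument-lemma distributions.
    Lower is better: the summary should mirror the source along all four."""
    return sum(kl(source_dists[name], summary_dists[name])
               for name in ("events", "slots", "head_lemmas", "arg_lemmas"))
```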
80 2 Supervised Learning The above unsupervised method results in summaries that closely mirror the source text in terms of the event and slot distributions, but this ignores the fact that not all such topics should be included in a summary. [sent-266, score-0.655]
81 Thus, we implemented a second method, in which we modify the KL criterion above by estimating P̂_E and P̂_S from other model summaries that are drawn from the same domain (i. [sent-269, score-0.219]
82 topic category), except for those summaries that are written for the specific topic cluster to be used for evaluation. [sent-271, score-0.258]
83 3 Method and Results We used the best performing models from the slot induction task and the above unsupervised and supervised methods based on KL-divergence to produce 100-word summaries of the guided summarization source text clusters. [sent-273, score-0.765]
84 For instance, top performing systems in TAC 2010 make use of manually constructed lists of entities known to fit the slots in the provided templates and sample topic statements, which our method automatically learns. [sent-324, score-0.186]
85 Because the unsupervised method mimics the distributions in the source text at all levels, the method may negate the benefit of learning and simply produce summaries that match the source text in the word distributions, thus being an approximation of KLSUM. [sent-330, score-0.182]
86 As in the previous evaluation, the models with contextualized semantic vectors provide the best performance. [sent-332, score-0.415]
87 M&L× performs very well, as in slot induction, but E&P also performs well, unlike in the previous evaluation. [sent-333, score-0.344]
88 This result reinforces the importance of the contextualization procedure for distributional semantic models. [sent-334, score-0.368]
89 For each event state, we calculated the ratio P̂_summ^E(e)/P̂_source^E(e), for the probability of an event state e as estimated from the training summaries and the source text respectively. [sent-336, score-0.558]
90 We selected the most preferred and dispreferred event and slot for each document cluster, and took the three most probable lemmata from the associated lemma distribution (Table 4). [sent-339, score-0.701]
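The ratio analysis can be reproduced in a few lines; p_summ and p_source are assumed to be the state distributions estimated from the training summaries and from the source text, respectively:

```python
import numpy as np

def preference_ratios(p_summ, p_source, eps=1e-10):
    """Ratio P_summ(x) / P_source(x) per event (or slot) state; values above 1
    mark states preferred in summaries, values below 1 dispreferred states."""
    return (np.asarray(p_summ) + eps) / (np.asarray(p_source) + eps)

def most_and_least_preferred(p_summ, p_source):
    r = preference_ratios(p_summ, p_source)
    return int(np.argmax(r)), int(np.argmin(r))
```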
91 It seems that supervision is beneficial because it picks out important event heads and arguments in the domain, such as charge, trial, and murder in the TRIALS domain. [sent-340, score-0.337]
92 (The analogous ratio for slot states is P̂_summ^S(s)/P̂_source^S(s).) 7 Conclusion We have shown that contextualized distributional semantic vectors can be successfully integrated into a generative probabilistic model for domain modelling, as demonstrated by improvements in slot induction and multi-document summarization. [sent-342, score-1.142]
93 The effectiveness of our model stems from the use of a large domain-general corpus to train the distributional semantic vectors, and the implicit syntactic and word sense information provided by the contextualization process. (Table 4 caption fragment: most preferred and dispreferred (−) events and slots after supervised training.) [sent-343, score-0.419]
94 While we have focused on the overall clustering of entities and the distribution of event and slot topics in this work, we would also like to investigate discourse modelling and content structuring. [sent-346, score-0.658]
95 Finally, our work shows that the application of distributional semantics to NLP tasks need not be confined to lexical disambiguation. [sent-347, score-0.22]
96 We would like to see modern distributional semantic methods incorporated into an even greater variety of applications. [sent-348, score-0.214]
97 2 The event head say happens to appear in both the most preferred and dispreferred events in the ATTACKS domain. [sent-359, score-0.356]
98 Evaluating distributional models of semantics for syntactically invariant inference. [sent-379, score-0.256]
99 Combining optimal clustering and hidden markov models for extractive summarization. [sent-419, score-0.19]
100 Experimental support for a categorical compositional distributional model of meaning. [sent-424, score-0.221]
wordName wordTfidf (topN-words)
[('dshmm', 0.485), ('slot', 0.344), ('contextualized', 0.237), ('profinder', 0.194), ('event', 0.191), ('distributional', 0.172), ('ata', 0.156), ('contextualization', 0.154), ('slots', 0.144), ('cheung', 0.142), ('ht', 0.122), ('summaries', 0.12), ('summarization', 0.118), ('sta', 0.102), ('tac', 0.102), ('vectors', 0.1), ('clause', 0.094), ('induction', 0.082), ('pienit', 0.078), ('heads', 0.076), ('lemma', 0.072), ('summary', 0.071), ('head', 0.07), ('sem', 0.07), ('hmm', 0.069), ('dkl', 0.069), ('guided', 0.065), ('domain', 0.063), ('distributions', 0.062), ('events', 0.061), ('lemmata', 0.06), ('argument', 0.059), ('columbine', 0.058), ('plaemm', 0.058), ('plhemm', 0.058), ('modelling', 0.057), ('emissions', 0.057), ('state', 0.056), ('contributors', 0.054), ('cluster', 0.054), ('toronto', 0.052), ('vector', 0.052), ('probabilistic', 0.051), ('generative', 0.051), ('clusters', 0.05), ('template', 0.05), ('categorical', 0.049), ('semantics', 0.048), ('yt', 0.046), ('jackie', 0.045), ('cr', 0.044), ('extractive', 0.043), ('markov', 0.043), ('vh', 0.042), ('semantic', 0.042), ('operators', 0.042), ('topic', 0.042), ('catch', 0.041), ('covariance', 0.041), ('fung', 0.04), ('klscore', 0.039), ('lemm', 0.039), ('massacre', 0.039), ('ormoneit', 0.039), ('psaem', 0.039), ('pshem', 0.039), ('variables', 0.038), ('noun', 0.038), ('arguments', 0.038), ('vanderwende', 0.037), ('hyperparameters', 0.036), ('likewise', 0.036), ('criterion', 0.036), ('models', 0.036), ('louis', 0.036), ('thater', 0.036), ('extrinsic', 0.036), ('hidden', 0.035), ('contextualize', 0.034), ('timestep', 0.034), ('dispreferred', 0.034), ('haghighi', 0.034), ('rouge', 0.034), ('content', 0.033), ('lapata', 0.033), ('clustering', 0.033), ('barzilay', 0.033), ('kit', 0.032), ('chi', 0.032), ('wx', 0.032), ('asterisks', 0.032), ('oront', 0.032), ('murder', 0.032), ('filatova', 0.032), ('schank', 0.032), ('gruber', 0.032), ('owczarzak', 0.03), ('chambers', 0.029), ('kl', 0.029), ('mitchell', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
Author: Jackie Chi Kit Cheung ; Gerald Penn
Abstract: Generative probabilistic models have been used for content modelling and template induction, and are typically trained on small corpora in the target domain. In contrast, vector space models of distributional semantics are trained on large corpora, but are typically applied to domain-general lexical disambiguation tasks. We introduce Distributional Semantic Hidden Markov Models, a novel variant of a hidden Markov model that integrates these two approaches by incorporating contextualized distributional semantic vectors into a generative model as observed emissions. Experiments in slot induction show that our approach yields improvements in learning coherent entity clusters in a domain. In a subsequent extrinsic evaluation, we show that these improvements are also reflected in multi-document summarization.
2 0.17757891 376 acl-2013-Using Lexical Expansion to Learn Inference Rules from Sparse Data
Author: Oren Melamud ; Ido Dagan ; Jacob Goldberger ; Idan Szpektor
Abstract: Automatic acquisition of inference rules for predicates is widely addressed by computing distributional similarity scores between vectors of argument words. In this scheme, prior work typically refrained from learning rules for low frequency predicates associated with very sparse argument vectors due to expected low reliability. To improve the learning of such rules in an unsupervised way, we propose to lexically expand sparse argument word vectors with semantically similar words. Our evaluation shows that lexical expansion significantly improves performance in comparison to state-of-the-art baselines.
3 0.17592771 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules
Author: Oren Melamud ; Jonathan Berant ; Ido Dagan ; Jacob Goldberger ; Idan Szpektor
Abstract: Automatic acquisition of inference rules for predicates has been commonly addressed by computing distributional similarity between vectors of argument words, operating at the word space level. A recent line of work, which addresses context sensitivity of rules, represented contexts in a latent topic space and computed similarity over topic vectors. We propose a novel two-level model, which computes similarities between word-level vectors that are biased by topic-level context representations. Evaluations on a naturallydistributed dataset show that our model significantly outperforms prior word-level and topic-level models. We also release a first context-sensitive inference rule set.
4 0.15660112 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction
Author: Peifeng Li ; Qiaoming Zhu ; Guodong Zhou
Abstract: As a paratactic language, sentence-level argument extraction in Chinese suffers much from the frequent occurrence of ellipsis with regard to inter-sentence arguments. To resolve such problem, this paper proposes a novel global argument inference model to explore specific relationships, such as Coreference, Sequence and Parallel, among relevant event mentions to recover those intersentence arguments in the sentence, discourse and document layers which represent the cohesion of an event or a topic. Evaluation on the ACE 2005 Chinese corpus justifies the effectiveness of our global argument inference model over a state-of-the-art baseline.
5 0.14345756 22 acl-2013-A Structured Distributional Semantic Model for Event Co-reference
Author: Kartik Goyal ; Sujay Kumar Jauhar ; Huiying Li ; Mrinmaya Sachan ; Shashank Srivastava ; Eduard Hovy
Abstract: In this paper we present a novel approach to modelling distributional semantics that represents meaning as distributions over relations in syntactic neighborhoods. We argue that our model approximates meaning in compositional configurations more effectively than standard distributional vectors or bag-of-words models. We test our hypothesis on the problem of judging event coreferentiality, which involves compositional interactions in the predicate-argument structure of sentences, and demonstrate that our model outperforms both state-of-the-art window-based word embeddings as well as simple approaches to compositional semantics previously employed in the literature.
7 0.13160309 206 acl-2013-Joint Event Extraction via Structured Prediction with Global Features
8 0.12514512 238 acl-2013-Measuring semantic content in distributional vectors
9 0.12514363 5 acl-2013-A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art
10 0.11704054 296 acl-2013-Recognizing Identical Events with Graph Kernels
11 0.11644334 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization
12 0.11506508 333 acl-2013-Summarization Through Submodularity and Dispersion
13 0.11140093 224 acl-2013-Learning to Extract International Relations from Political Context
14 0.10233575 153 acl-2013-Extracting Events with Informal Temporal References in Personal Histories in Online Communities
15 0.10062008 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
16 0.099727884 65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation
17 0.09751007 59 acl-2013-Automated Pyramid Scoring of Summaries using Distributional Semantics
18 0.095628418 32 acl-2013-A relatedness benchmark to test the role of determiners in compositional distributional semantics
19 0.090100169 87 acl-2013-Compositional-ly Derived Representations of Morphologically Complex Words in Distributional Semantics
20 0.086240508 319 acl-2013-Sequential Summarization: A New Application for Timely Updated Twitter Trending Topics
topicId topicWeight
[(0, 0.225), (1, 0.061), (2, -0.0), (3, -0.158), (4, -0.025), (5, 0.055), (6, 0.061), (7, 0.109), (8, -0.199), (9, -0.02), (10, 0.007), (11, 0.011), (12, 0.081), (13, -0.003), (14, -0.038), (15, 0.048), (16, 0.026), (17, -0.174), (18, 0.111), (19, 0.048), (20, -0.01), (21, -0.116), (22, 0.045), (23, 0.01), (24, 0.007), (25, -0.076), (26, 0.026), (27, 0.006), (28, 0.009), (29, 0.026), (30, -0.013), (31, 0.032), (32, 0.019), (33, -0.017), (34, -0.027), (35, 0.106), (36, -0.013), (37, -0.003), (38, -0.026), (39, -0.027), (40, -0.01), (41, -0.035), (42, -0.006), (43, -0.095), (44, 0.015), (45, -0.002), (46, 0.045), (47, 0.053), (48, 0.004), (49, 0.091)]
simIndex simValue paperId paperTitle
same-paper 1 0.92169267 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
Author: Jackie Chi Kit Cheung ; Gerald Penn
Abstract: Generative probabilistic models have been used for content modelling and template induction, and are typically trained on small corpora in the target domain. In contrast, vector space models of distributional semantics are trained on large corpora, but are typically applied to domain-general lexical disambiguation tasks. We introduce Distributional Semantic Hidden Markov Models, a novel variant of a hidden Markov model that integrates these two approaches by incorporating contextualized distributional semantic vectors into a generative model as observed emissions. Experiments in slot induction show that our approach yields improvements in learning coherent entity clusters in a domain. In a subsequent extrinsic evaluation, we show that these improvements are also reflected in multi-document summarization.
2 0.69067234 353 acl-2013-Towards Robust Abstractive Multi-Document Summarization: A Caseframe Analysis of Centrality and Domain
Author: Jackie Chi Kit Cheung ; Gerald Penn
Abstract: In automatic summarization, centrality is the notion that a summary should contain the core parts of the source text. Current systems use centrality, along with redundancy avoidance and some sentence compression, to produce mostly extractive summaries. In this paper, we investigate how summarization can advance past this paradigm towards robust abstraction by making greater use of the domain of the source text. We conduct a series of studies comparing human-written model summaries to system summaries at the semantic level of caseframes. We show that model summaries (1) are more abstractive and make use of more sentence aggregation, (2) do not contain as many topical caseframes as system summaries, and (3) cannot be reconstructed solely from the source text, but can be if texts from in-domain documents are added. These results suggest that substantial improvements are unlikely to result from better optimizing centrality-based criteria, but rather more domain knowledge is needed.
3 0.67838979 22 acl-2013-A Structured Distributional Semantic Model for Event Co-reference
Author: Kartik Goyal ; Sujay Kumar Jauhar ; Huiying Li ; Mrinmaya Sachan ; Shashank Srivastava ; Eduard Hovy
Abstract: In this paper we present a novel approach to modelling distributional semantics that represents meaning as distributions over relations in syntactic neighborhoods. We argue that our model approximates meaning in compositional configurations more effectively than standard distributional vectors or bag-of-words models. We test our hypothesis on the problem of judging event coreferentiality, which involves compositional interactions in the predicate-argument structure of sentences, and demonstrate that our model outperforms both state-of-the-art window-based word embeddings as well as simple approaches to compositional semantics previously employed in the literature.
4 0.63551742 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization
Author: Lu Wang ; Claire Cardie
Abstract: We address the challenge of generating natural language abstractive summaries for spoken meetings in a domain-independent fashion. We apply Multiple-Sequence Alignment to induce abstract generation templates that can be used for different domains. An Overgenerate-and-Rank strategy is utilized to produce and rank candidate abstracts. Experiments using in-domain and out-of-domain training on disparate corpora show that our system uniformly outperforms state-of-the-art supervised extract-based approaches. In addition, human judges rate our system summaries significantly higher than compared systems in fluency and overall quality.
5 0.61395139 59 acl-2013-Automated Pyramid Scoring of Summaries using Distributional Semantics
Author: Rebecca J. Passonneau ; Emily Chen ; Weiwei Guo ; Dolores Perin
Abstract: The pyramid method for content evaluation of automated summarizers produces scores that are shown to correlate well with manual scores used in educational assessment of students’ summaries. This motivates the development of a more accurate automated method to compute pyramid scores. Of three methods tested here, the one that performs best relies on latent semantics.
6 0.61227489 333 acl-2013-Summarization Through Submodularity and Dispersion
7 0.59474635 142 acl-2013-Evolutionary Hierarchical Dirichlet Process for Timeline Summarization
8 0.59206504 376 acl-2013-Using Lexical Expansion to Learn Inference Rules from Sparse Data
9 0.57887417 224 acl-2013-Learning to Extract International Relations from Political Context
10 0.57729036 178 acl-2013-HEADY: News headline abstraction through event pattern clustering
11 0.5674364 5 acl-2013-A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art
12 0.55741262 377 acl-2013-Using Supervised Bigram-based ILP for Extractive Summarization
13 0.54802412 206 acl-2013-Joint Event Extraction via Structured Prediction with Global Features
14 0.54295146 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction
15 0.53675622 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules
16 0.53664011 32 acl-2013-A relatedness benchmark to test the role of determiners in compositional distributional semantics
17 0.53194493 238 acl-2013-Measuring semantic content in distributional vectors
18 0.52981663 296 acl-2013-Recognizing Identical Events with Graph Kernels
19 0.51510644 157 acl-2013-Fast and Robust Compressive Summarization with Dual Decomposition and Multi-Task Learning
20 0.50796598 103 acl-2013-DISSECT - DIStributional SEmantics Composition Toolkit
topicId topicWeight
[(0, 0.047), (6, 0.053), (11, 0.077), (14, 0.011), (15, 0.017), (24, 0.041), (26, 0.049), (28, 0.02), (35, 0.129), (42, 0.049), (48, 0.071), (58, 0.165), (64, 0.016), (70, 0.063), (88, 0.02), (90, 0.027), (95, 0.065)]
simIndex simValue paperId paperTitle
1 0.92994416 1 acl-2013-"Let Everything Turn Well in Your Wife": Generation of Adult Humor Using Lexical Constraints
Author: Alessandro Valitutti ; Hannu Toivonen ; Antoine Doucet ; Jukka M. Toivanen
Abstract: We propose a method for automated generation of adult humor by lexical replacement and present empirical evaluation results of the obtained humor. We propose three types of lexical constraints as building blocks of humorous word substitution: constraints concerning the similarity of sounds or spellings of the original word and the substitute, a constraint requiring the substitute to be a taboo word, and constraints concerning the position and context of the replacement. Empirical evidence from extensive user studies indicates that these constraints can increase the effectiveness of humor generation significantly.
same-paper 2 0.85379398 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
Author: Jackie Chi Kit Cheung ; Gerald Penn
Abstract: Generative probabilistic models have been used for content modelling and template induction, and are typically trained on small corpora in the target domain. In contrast, vector space models of distributional semantics are trained on large corpora, but are typically applied to domain-general lexical disambiguation tasks. We introduce Distributional Semantic Hidden Markov Models, a novel variant of a hidden Markov model that integrates these two approaches by incorporating contextualized distributional semantic vectors into a generative model as observed emissions. Experiments in slot induction show that our approach yields improvements in learning coherent entity clusters in a domain. In a subsequent extrinsic evaluation, we show that these improvements are also reflected in multi-document summarization.
3 0.82385767 351 acl-2013-Topic Modeling Based Classification of Clinical Reports
Author: Efsun Sarioglu ; Kabir Yadav ; Hyeong-Ah Choi
Abstract: Electronic health records (EHRs) contain important clinical information about patients. Some of these data are in the form of free text and require preprocessing to be able to be used in automated systems. Efficient and effective use of this data could be vital to the speed and quality of health care. As a case study, we analyzed classification of CT imaging reports into binary categories. In addition to regular text classification, we utilized topic modeling of the entire dataset in various ways. Topic modeling of the corpora provides interpretable themes that exist in these reports. Representing reports according to their topic distributions is more compact than bag-of-words representation and can be processed faster than raw text in subsequent automated processes. A binary topic model was also built as an unsupervised classification approach with the assumption that each topic corresponds to a class. And, finally, an aggregate topic classifier was built where reports are classified based on a single discriminative topic that is determined from the training dataset. Our proposed topic based classifier system is shown to be competitive with existing text classification techniques and provides a more efficient and interpretable representation.
4 0.81640238 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing
Author: Guangyou Zhou ; Jun Zhao
Abstract: This paper is concerned with the problem of heterogeneous dependency parsing. In this paper, we present a novel joint inference scheme, which is able to leverage the consensus information between heterogeneous treebanks in the parsing phase. Different from stacked learning methods (Nivre and McDonald, 2008; Martins et al., 2008), which process the dependency parsing in a pipelined way (e.g., a second level uses the first level outputs), in our method, multiple dependency parsing models are coordinated to exchange consensus information. We conduct experiments on Chinese Dependency Treebank (CDT) and Penn Chinese Treebank (CTB); experimental results show that joint inference can bring significant improvements to all state-of-the-art dependency parsers.
5 0.76874846 275 acl-2013-Parsing with Compositional Vector Grammars
Author: Richard Socher ; John Bauer ; Christopher D. Manning ; Ng Andrew Y.
Abstract: Natural language parsing has typically been done with small sets of discrete categories such as NP and VP, but this representation does not capture the full syntactic nor semantic richness of linguistic phrases, and attempts to improve on this by lexicalizing phrases or splitting categories only partly address the problem at the cost of huge feature spaces and sparseness. Instead, we introduce a Compositional Vector Grammar (CVG), which combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations. The CVG improves the PCFG of the Stanford Parser by 3.8% to obtain an F1 score of 90.4%. It is fast to train and, implemented approximately as an efficient reranker, it is about 20% faster than the current Stanford factored parser. The CVG learns a soft notion of head words and improves performance on the types of ambiguities that require semantic information such as PP attachments.
6 0.76455599 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
7 0.76403874 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
8 0.75992972 347 acl-2013-The Role of Syntax in Vector Space Models of Compositional Semantics
9 0.75988829 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
10 0.75780642 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
11 0.75776088 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering
12 0.75690746 158 acl-2013-Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval
13 0.75512505 22 acl-2013-A Structured Distributional Semantic Model for Event Co-reference
14 0.75369513 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation
15 0.75302339 176 acl-2013-Grounded Unsupervised Semantic Parsing
16 0.75292397 172 acl-2013-Graph-based Local Coherence Modeling
17 0.75277078 175 acl-2013-Grounded Language Learning from Video Described with Sentences
19 0.75209773 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
20 0.75157851 224 acl-2013-Learning to Extract International Relations from Political Context