acl acl2012 acl2012-145 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Weiwei Guo ; Mona Diab
Abstract: Sentence Similarity is the process of computing a similarity score between two sentences. Previous sentence similarity work finds that latent semantics approaches to the problem do not perform well due to insufficient information in single sentences. In this paper, we show that by carefully handling words that are not in the sentences (missing words), we can train a reliable latent variable model on sentences. In the process, we propose a new evaluation framework for sentence similarity: Concept Definition Retrieval. The new framework allows for large scale tuning and testing of Sentence Similarity models. Experiments on the new task and previous data sets show significant improvement of our model over baselines and other traditional latent variable models. Our results indicate comparable and even better performance than current state of the art systems addressing the problem of sentence similarity.
Reference: text
sentIndex sentText sentNum sentScore
1 Previous sentence similarity work finds that latent semantics approaches to the problem do not perform well due to insufficient information in single sentences. [sent-4, score-0.442]
2 In this paper, we show that by carefully handling words that are not in the sentences (missing words), we can train a reliable latent variable model on sentences. [sent-5, score-0.286]
3 Experiments on the new task and previous data sets show significant improvement of our model over baselines and other traditional latent variable models. [sent-8, score-0.206]
4 1 Introduction Identifying the degree of semantic similarity [SS] between two sentences is at the core of many NLP applications that focus on sentence level semantics such as Machine Translation (Kauchak and Barzilay, 2006), Summarization (Zhou et al. [sent-10, score-0.333]
5 , 2003) can solve the two issues naturally by modeling the semantics of words and sentences simultaneously in the low-dimensional latent space. [sent-20, score-0.385]
6 We believe that the latent semantics approaches applied to date to the SS problem have not yielded positive results due to the deficient modeling of the sparsity in the semantic space. [sent-24, score-0.341]
7 SS operates in a very limited contextual setting where the sentences are typically very short to derive robust latent semantics. [sent-25, score-0.252]
8 Apart from the SS setting, robust modeling of the latent semantics of short sentences/texts is becoming a pressing need due to the pervasive presence of more bursty data sets such as Twitter feeds and SMS where short contexts are an inherent characteristic of the data. [sent-26, score-0.39]
9 In this paper, we propose to model the missing words (words that are not in the sentence), a feature that is typically overlooked in the text modeling literature, to address the sparseness issue for the SS task. [sent-27, score-0.442]
10 We define the missing words of a sentence as the whole vocabulary in a corpus minus the observed words in the sentence. [sent-28, score-0.592]
11 Our intuition is since observed words in a sentence are too few to tell us what the sentence is about, missing words can be used to tell us what the sentence is not about. [sent-29, score-0.72]
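The definition of missing words above amounts to a set difference. A minimal sketch, assuming a toy vocabulary and sentence invented for illustration (the function name `missing_words` is hypothetical, not from the paper):

```python
def missing_words(vocabulary, sentence_tokens):
    """Return the vocabulary words that do not occur in the sentence."""
    return set(vocabulary) - set(sentence_tokens)


# Toy corpus vocabulary and one short sentence (illustrative only).
vocabulary = {"bank", "money", "loan", "river", "stock", "deposit"}
sentence = ["bank", "money", "deposit"]

missing = missing_words(vocabulary, sentence)
print(sorted(missing))
```

With a realistic vocabulary the missing set is far larger than the observed set, which is the imbalance the rest of the paper addresses.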
12 (©2012 Association for Computational Linguistics, pages 864–872) and missing words make up the complete semantics profile of a sentence. [sent-32, score-0.569]
13 After analyzing the way traditional latent variable models (LSA, PLSA/LDA) handle missing words, we decide to model sentences using a weighted matrix factorization approach (Srebro and Jaakkola, 2003), which allows us to treat observed words and missing words differently. [sent-33, score-1.288]
14 We handle missing words using a weighting scheme that distinguishes missing words from observed words yielding robust latent vectors for sentences. [sent-34, score-1.176]
15 2 Limitations of Topic Models and LSA for Modeling Sentences Usually latent variable models aim to find a latent semantic profile for a sentence that is most relevant to the observed words. [sent-43, score-0.622]
16 By explicitly modeling missing words, we set another criterion to the latent semantics profile: it should not be related to the missing words in the sentence. [sent-44, score-0.733]
17 However, missing words are not as informative as observed words, hence the need for a model that does a good job of modeling missing words at the right level of emphasis/impact is central to completing the semantic picture for a sentence. [sent-45, score-0.983]
18 Given a corpus, the row entries of the matrix are the unique M words in the corpus, and the N columns are the sentence ids. [sent-47, score-0.215]
19 The yielded M × N co-occurrence matrix X comprises the TF-IDF values in each Xij cell, namely the TF-IDF value of word wi in sentence sj. [sent-48, score-0.254]
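As a rough sketch of how such a word-by-sentence matrix could be built: the extract does not say which TF-IDF variant is used, so the raw term count and the smoothed `log(N / df) + 1` inverse document frequency below are assumptions, and `tfidf_matrix` is a hypothetical helper name.

```python
import math
from collections import Counter


def tfidf_matrix(sentences):
    """Build an M x N matrix: rows are unique words, columns are sentences.

    Assumed weighting: raw term count times (log(N / df) + 1).
    Cells for words that do not occur in a sentence stay at 0.
    """
    vocab = sorted({w for s in sentences for w in s})
    n = len(sentences)
    df = Counter(w for s in sentences for w in set(s))
    counts = [Counter(s) for s in sentences]
    X = [[counts[j][w] * (math.log(n / df[w]) + 1.0) for j in range(n)]
         for w in vocab]
    return vocab, X


# Two toy "sentences" (illustrative only).
vocab, X = tfidf_matrix([["bank", "money"], ["river", "bank"]])
for word, row in zip(vocab, X):
    print(word, row)
```

The `+ 1` keeps a word that appears in every sentence from collapsing to 0, which would otherwise make it indistinguishable from a missing word.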
20 For ease of exposition, we will illustrate the problem using a special case of the SS framework where the sentences are concept definitions in a dictionary such as WordNet (Fellbaum, 1998) (WN). [sent-49, score-0.299]
21 Therefore, the sentence corresponding to the concept definition of bank#n#1 is a sparse vector in X containing the following observed words where Xij ≠ 0: the 0. [sent-50, score-0.41]
22 ) in matrix X that do not occur in the concept definition are considered missing words for the concept entry bank#n#1, thereby their Xij = 0 . [sent-64, score-0.76]
23 Topic models (PLSA/LDA) do not explicitly model missing words. [sent-65, score-0.374]
24 Therefore, PLSA finds a topic distribution for each concept definition that maximizes the log likelihood of the corpus X (LDA has a similar form): $\sum_i \sum_j X_{ij} \log \sum_k P(z_k \mid d_j)\, P(w_i \mid z_k)$ (1) In this formulation, missing words do not contribute to the estimation of sentence semantics, i. [sent-67, score-0.696]
25 , excluding missing words (Xij = 0) in equation 1 does not make a difference. [sent-69, score-0.442]
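The claim about equation 1 can be checked numerically: every cell with Xij = 0 multiplies its log term by zero. The sketch below uses invented toy probabilities and a hypothetical function name (`plsa_log_likelihood`); it evaluates the likelihood only and is not a PLSA trainer.

```python
import math


def plsa_log_likelihood(X, p_z_given_d, p_w_given_z, skip_zeros=False):
    """Evaluate sum_i sum_j X[i][j] * log sum_k P(z_k|d_j) P(w_i|z_k).

    p_z_given_d[j][k] holds P(z_k | d_j); p_w_given_z[k][i] holds P(w_i | z_k).
    """
    total = 0.0
    topics = range(len(p_w_given_z))
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            if skip_zeros and x == 0:
                continue  # drop missing words entirely
            mix = sum(p_z_given_d[j][k] * p_w_given_z[k][i] for k in topics)
            total += x * math.log(mix)
    return total


# Toy example: 3 words, 2 definitions, 2 topics (all values invented).
X = [[2, 0], [1, 0], [0, 3]]
p_w_given_z = [[0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
p_z_given_d = [[0.9, 0.1], [0.2, 0.8]]

full = plsa_log_likelihood(X, p_z_given_d, p_w_given_z)
observed_only = plsa_log_likelihood(X, p_z_given_d, p_w_given_z, skip_zeros=True)
print(full, observed_only)
```

Both calls return the same value, so the objective carries no signal about what a definition is not about.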
26 However, empirical results show that given a small number of observed words, usually topic models can only find one topic (most evident topic) for a sentence, e. [sent-70, score-0.186]
27 , the concept definitions of bank#n#1 and stock#n#1 are assigned the financial topic only, without any further discernibility. [sent-72, score-0.33]
28 As a result, many sentences are assigned exactly the same semantics profile as long as they pertain to, or are mentioned within, the same domain/topic. [sent-73, score-0.193]
29 The reason is that topic models try to learn a 100-dimension latent vector (assuming dimension K = 100) from very few data points (10 observed words on average). [sent-74, score-0.446]
30 It would be desirable if topic models could exploit missing words (a lot more data than observed words) to render more nuanced latent semantics, so that pairs of sentences in the same domain can be differentiated. [sent-75, score-0.796]
31 On the other hand, LSA explicitly models missing words but not at the right level of emphasis. [sent-76, score-0.428]
32 Table 1: Three possible latent vector hypotheses for the definition of bank#n#1. Frobenius norm of the difference between the two matrices (X and its low-rank approximation X̂) is minimized: $\sqrt{\sum_i \sum_j (\hat{X}_{ij} - X_{ij})^2}$ (2) [sent-83, score-0.306]
33 In effect, LSA allows missing and observed words to equally impact the objective function. [sent-85, score-0.503]
34 Given the inherent short length of the sentences, LSA (equation 2) allows for much more potential influence from missing words rather than observed words (99. [sent-86, score-0.558]
35 Moreover, the true semantics of the concept definitions is actually related to some missing words, but such true semantics will not be favored by the objective function, since equation 2 allows for too strong an impact by X̂ij = 0 for any missing word. [sent-89, score-1.226]
36 Therefore the LSA model, in the context of short texts, is allowing missing words to have a significant “uncontrolled” impact on the model. [sent-90, score-0.461]
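One way to make the imbalance explicit, as a sketch that assumes only equation 2 and the convention Xij = 0 for missing words (X̂ denotes the LSA reconstruction of X): the squared error separates into an observed part and a missing part.

```latex
\sum_{i}\sum_{j} \bigl(\hat{X}_{ij} - X_{ij}\bigr)^{2}
  \;=\; \sum_{(i,j):\, X_{ij} \neq 0} \bigl(\hat{X}_{ij} - X_{ij}\bigr)^{2}
  \;+\; \sum_{(i,j):\, X_{ij} = 0} \hat{X}_{ij}^{\,2}
```

Every term carries the same unit weight, and for short sentences the second sum ranges over almost all cells, so pushing X̂ij toward 0 on missing words dominates the fit to the few observed words.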
37 1 An Example The three latent semantics profiles in table 1 illustrate our analysis for topic models and LSA. [sent-92, score-0.322]
38 We use Rov to denote the sum of relatedness between latent vector v and all observed words; similarly, Rmv is the sum of relatedness between the vector v and all missing words. [sent-94, score-0.757]
39 The first latent vector (generated by topic models) is chosen by maximizing Robs = 600. [sent-95, score-0.275]
40 The second latent vector (found by LSA) has the maximum value of Robs − Rmiss = 95, but obviously the latent vector is not related to bank#n#1 at all. [sent-97, score-0.463]
41 This is because LSA treats observed words and missing words equally, and due to the large number of missing words, the information of observed words is lost: Robs − Rmiss ≈ −Rmiss. [sent-98, score-1.002]
42 1 Weighted Matrix Factorization The weighted matrix factorization [WMF] approach is very similar to SVD, except that it allows for direct control on each matrix cell Xij. [sent-106, score-0.304]
43 P and Q) are optimized by minimizing the objective function: $\sum_i \sum_j W_{ij} \left(P_{\cdot,i} \cdot Q_{\cdot,j} - X_{ij}\right)^2 + \lambda \|P\|_2^2 + \lambda \|Q\|_2^2$ (3) where λ is a free regularization factor, and the weight matrix W defines a weight for each cell in X. [sent-109, score-0.196]
44 Accordingly, P·,i is a K-dimension latent semantics vector profile for word wi; similarly, Q·,j is the K-dimension vector profile that represents the sentence sj. [sent-110, score-0.569]
45 (3) we can compute the similarity of two sentences sj and sj′ using the cosine similarity between Q·,j and Q·,j′. [sent-112, score-0.252]
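The cosine similarity between two sentence vectors can be sketched as follows. The vectors are invented placeholders standing in for two columns of Q, and returning 0.0 for an all-zero vector is an assumption made here to avoid division by zero.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for zero vectors
    return dot / (norm_u * norm_v)


# Placeholder latent vectors for two sentences (illustrative only).
q_j = [0.2, -0.1, 0.4]
q_j_prime = [0.1, 0.0, 0.5]
print(cosine_similarity(q_j, q_j_prime))
```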
46 The latent vectors in P and Q are first randomly initialized, then can be computed iteratively by the following equations (derivation is omitted due to limited space, which can be found in (Srebro and Jaakkola, 2003)): QP· ,j i= ? [sent-113, score-0.221]
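The update equations are cut off in this extract. As a hedged sketch, the closed-form updates below are derived directly from equation 3 by setting the gradient with respect to one latent vector to zero while all others are held fixed (standard alternating least squares). The function names (`solve`, `update`, `objective`, `wmf`), the toy matrix, and the toy values of K and λ are illustrative assumptions, not the authors' code.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [A[r][:] + [b[r]] for r in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - dot(M[r][r + 1:n], x[r + 1:])) / M[r][r]
    return x


def update(vectors, weights, targets, lam):
    """Minimize sum_c w_c (p . v_c - x_c)^2 + lam |p|^2 over p.

    Solves (sum_c w_c v_c v_c^T + lam I) p = sum_c w_c x_c v_c.
    """
    K = len(vectors[0])
    A = [[lam if a == b else 0.0 for b in range(K)] for a in range(K)]
    rhs = [0.0] * K
    for v, w, x in zip(vectors, weights, targets):
        for a in range(K):
            rhs[a] += w * x * v[a]
            for b in range(K):
                A[a][b] += w * v[a] * v[b]
    return solve(A, rhs)


def objective(X, W, P, Q, lam):
    """Value of equation 3 for word vectors P and sentence vectors Q."""
    err = sum(W[i][j] * (dot(P[i], Q[j]) - X[i][j]) ** 2
              for i in range(len(P)) for j in range(len(Q)))
    reg = lam * (sum(dot(p, p) for p in P) + sum(dot(q, q) for q in Q))
    return err + reg


def wmf(X, W, K, lam, iterations, seed=0):
    """Alternate between updating word vectors and sentence vectors."""
    rng = random.Random(seed)
    M, N = len(X), len(X[0])
    P = [[rng.uniform(-0.1, 0.1) for _ in range(K)] for _ in range(M)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(K)] for _ in range(N)]
    for _ in range(iterations):
        for i in range(M):
            P[i] = update(Q, W[i], X[i], lam)
        for j in range(N):
            col_w = [W[i][j] for i in range(M)]
            col_x = [X[i][j] for i in range(M)]
            Q[j] = update(P, col_w, col_x, lam)
    return P, Q


# Toy 2-word x 3-sentence matrix; weights and hyperparameters are invented.
X = [[1.0, 0.0, 0.5],
     [0.0, 2.0, 0.0]]
W = [[1.0 if x != 0 else 0.01 for x in row] for row in X]
P0, Q0 = wmf(X, W, K=2, lam=0.1, iterations=0)   # random initialization only
P, Q = wmf(X, W, K=2, lam=0.1, iterations=10)
print(objective(X, W, P0, Q0, 0.1), objective(X, W, P, Q, 0.1))
```

Because each update is the exact minimizer for its own vector, the objective cannot increase from one step to the next. A practical implementation would use a linear algebra library and sparse storage rather than nested lists.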
47 1 (choosing a latent vector that maximizes Robs − 0. [sent-119, score-0.218]
48 01 × Rmiss) in the WMF framework, by assigning a small weight for all the missing words and minimizing equation 3: Wi,j=? [sent-120, score-0.478]
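The definition of W is cut off in this extract. A minimal sketch, assuming observed cells get weight 1 (consistent with the later remark that wm = 1 gives missing words equal weights as observed words) and missing cells get the small weight wm; the helper name `weight_matrix` is hypothetical.

```python
def weight_matrix(X, w_m=0.01):
    """Weight 1 for observed cells (X != 0), w_m for missing cells (X == 0)."""
    return [[1.0 if x != 0 else w_m for x in row] for row in X]


# Toy TF-IDF matrix: rows are words, columns are sentences (values invented).
X_toy = [[0.7, 0.0, 0.0],
         [0.0, 1.2, 0.3]]
W_toy = weight_matrix(X_toy, w_m=0.01)
print(W_toy)
```

Setting `w_m=1.0` recovers the LSA-style equal weighting, and `w_m=0.0` ignores missing words altogether.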
49 it explicitly tells the model that in general all missing words should not be related to the sentence; 2. [sent-123, score-0.428]
50 Typically, a user rates only some of the items, hence, the RS system needs to predict the missing ratings. [sent-127, score-0.348]
51 Steck (2010) guesses a value for all the missing cells, and sets a small weight for those cells. [sent-128, score-0.411]
52 We have a full matrix X where missing words have a 0 value, while the missing ratings in RS are unavailable (the values are unknown), hence R is not complete. [sent-130, score-0.954]
53 The LI06 data set consists of 65 pairs of noun definitions selected from the Collins Cobuild Dictionary. [sent-141, score-0.226]
54 A subset of 30 pairs is further selected by LI06 to render the similarity scores evenly distributed. [sent-142, score-0.181]
55 We believe that the ratings on a data set for SS should accommodate variable degrees of similarity with various ratings, however such a large scale set does not exist yet. [sent-148, score-0.197]
56 Therefore, for purposes of evaluating our proposed approach, we devise a new framework inspired by the LI06 data set in that it comprises concept definitions but on a large scale. [sent-149, score-0.309]
57 The intuition is that two definitions in different dic2http://www. [sent-152, score-0.185]
58 The SS algorithm has access to all the definitions in WordNet (WN). [sent-157, score-0.185]
59 , 2006), the SS algorithm should rank the equivalent WN definition as high as possible based on sentence similarity. [sent-159, score-0.193]
60 After preprocessing we obtain 13669 ON definitions mapped to 19655 WN definitions. [sent-162, score-0.185]
61 Clearly, it is very difficult to rank the one correct definition as highest out of all WN definitions (110,000 in total), hence we use ATOPd, area under the TOPKd(k) recall curve for an ON definition d, to measure the performance. [sent-165, score-0.43]
62 Let Nd be the number of aligned WN definitions for the ON definition d, and Ndk be the number of aligned WN definitions in the top-k list. [sent-168, score-0.455]
63 Then with a normalized k ∈ [0,1], TOPKd(k) and ATOPd are defined as: $\mathrm{TOPK}_d(k) = N_d^k / N_d$, $\mathrm{ATOP}_d = \int_0^1 \mathrm{TOPK}_d(k)\, dk$ (6) ATOPd computes the normalized rank (in the range of [0, 1]) of aligned WN definitions among all WN definitions, with value 0. [sent-169, score-0.256]
64 The similarity of two sentences is computed by cosine similarity (except N-gram). [sent-177, score-0.252]
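A discrete sketch of the ATOP measure in equation 6, assuming the integral is approximated by a right-endpoint Riemann sum over the cutoffs k = 1/T, 2/T, ..., 1 for T candidate definitions; the function name `atop` and the example ranks are illustrative.

```python
def atop(aligned_ranks, total):
    """Approximate the area under the TOPK recall curve.

    aligned_ranks: 1-based ranks of the aligned definitions in the ranked list.
    total: number of candidate definitions being ranked.
    """
    n_d = len(aligned_ranks)
    area = 0.0
    for t in range(1, total + 1):
        hits = sum(1 for r in aligned_ranks if r <= t)  # aligned items in top-t
        area += hits / n_d                              # TOPK at k = t / total
    return area / total


print(atop([1], 10))      # aligned definition ranked first
print(atop([10], 10))     # aligned definition ranked last
print(atop([1, 10], 10))  # two aligned definitions
```

Under this discretization a single aligned definition at rank r scores (T − r + 1) / T, so higher placement moves the score toward 1.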
65 The latent vector of a sentence is computed by: (1) using equation 4 in WTMF, or (2) summing up the latent vectors of all the constituent words weighted by Xij in LSA and LDA, similar to the work reported in (Mihalcea et al. [sent-182, score-0.628]
66 For LDA the latent vector of a word is computed by P(z|w). [sent-184, score-0.218]
67 For all dictionaries, we only keep the definitions without examples, and discard the mapping between sense ids and definitions. [sent-189, score-0.185]
68 All the latent variable models (LSA, LDA, WTMF) are built on the same corpus: WN+Wik+Brown (393,666 sentences and 4,262,026 words). [sent-200, score-0.232]
69 2 Concept Definition Retrieval Among the 13669 ON definitions, 1000 definitions are randomly selected as a development set (dev) for picking best parameters in the models, and the rest is used as a test set (test). [sent-204, score-0.185]
70 The performance of each model is evaluated by the average ATOPd value over the 12669 definitions (test). [sent-205, score-0.212]
71 To compute ATOPd for an ON definition efficiently, we use the rank of the aligned WN definition among a random sample (size=1000) of WN definitions, to approximate its rank among all WN definitions. [sent-210, score-0.258]
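The sampling shortcut can be sketched as below. The normalization (fraction of sampled candidates that do not outscore the aligned definition), the fixed seed, and the name `sampled_normalized_rank` are assumptions for illustration; the extract does not specify these details.

```python
import random


def sampled_normalized_rank(aligned_score, candidate_scores, sample_size, seed=0):
    """Estimate the normalized rank of the aligned definition from a sample.

    Returns a value in [0, 1]; 1.0 means no sampled candidate scores higher.
    """
    rng = random.Random(seed)
    size = min(sample_size, len(candidate_scores))
    sample = rng.sample(candidate_scores, size)
    better = sum(1 for s in sample if s > aligned_score)
    return 1.0 - better / size


# Invented similarity scores for the other candidate definitions.
candidates = [i / 100 for i in range(100)]
print(sampled_normalized_rank(0.95, candidates, sample_size=20))
```

Scoring only the sample replaces a comparison against every candidate definition with a fixed number of comparisons per query.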
72 weight wm for missing words and λ for regularization. [sent-230, score-0.361]
73 WTMF that models missing words using a small weight (model 7b with wm = 0. [sent-235, score-0.709]
74 This is because LDA only uses 10 observed words to infer a 100 dimension vector for a sentence, while WTMF takes advantage of much more missing words to learn more robust latent semantics vectors. [sent-237, score-0.91]
75 modeling missing words with equal weights as observed words (wm = 1) (LSA manner), and 2. [sent-243, score-0.568]
76 not modeling missing words at all (wm = 0) (LDA manner) in the context of WTMF model. [sent-244, score-0.442]
77 Both LDA and model 6 ignore missing words, with better ATOPtest scores achieved by LDA. [sent-246, score-0.348]
78 Model 5 and LSA are comparable, where missing words are used with a large weight. [sent-248, score-0.402]
79 [Figure 2: missing words weight wm in WTMF] [Figure 3: dimension K in WTMF and LDA] that allowing for equal impact of both observed and missing words is not the correct characterization of the semantic space. [sent-261, score-1.564]
80 2 Analysis In these latent variable models, there are several essential parameters: weight of missing words wm, and dimension K. [sent-264, score-0.689]
81 Figure 2 shows the influence of wm on ATOPtest values. [sent-266, score-0.271]
82 We also measure the influence of the dimension K = {50, 75, 100, 125, 150} on LDA and WTMF in Figure 3, where parameters for WTMF are wm = 0. [sent-270, score-0.316]
83 As we can see in Figure 2, choosing the appropriate parameter wm could boost the performance significantly. [sent-280, score-0.271]
84 Since we do not have any tuning data for this task, we present Pearson’s correlation r for different values of wm in Table 3. [sent-281, score-0.356]
85 In addition, to demonstrate that wm does not overfit the 30 data points, we also evaluate on … (Table 3) [sent-282, score-0.271]
86 , 2010), we also include Spearman’s rank order correlation ρ, which is correlation of ranks of similarity values. [sent-289, score-0.24]
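Both correlation measures can be sketched with the standard library: Spearman's ρ is Pearson's r computed on ranks. The score lists are invented, and averaging the ranks of tied values is the usual convention, assumed here.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient r between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rank order correlation rho."""
    return pearson(ranks(x), ranks(y))


# Invented human ratings and model similarity scores.
human = [1.0, 2.0, 3.0, 4.0]
model = [1.0, 4.0, 9.0, 16.0]
print(pearson(human, model), spearman(human, model))
```

Here the model scores are monotone in the human ratings but not linear, so ρ is 1 while r falls slightly below 1.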
87 Note that r and ρ are much lower for 35 pairs set, since most of the sentence pairs have a very low similarity (the average similarity value is 0. [sent-290, score-0.399]
88 01 gives the best results on 30 pairs while on 35 pairs the peak values of r and ρ happen when wm = 0. [sent-294, score-0.38]
89 In general, the correlations in 30 pairs and in 35 pairs are consistent, which indicates wm = 0. [sent-296, score-0.353]
90 Using a smaller wm means the similarity score is computed mainly from semantics of the observed words. [sent-301, score-0.55]
91 In fact, from Figure 2 and Table 2 we see that wm = 0. [sent-303, score-0.271]
92 It indicates that the latent vectors induced by WTMF are able to not only identify same/similar sentences, but also identify the “correct” degree of dissimilar sentences. [sent-311, score-0.257]
93 We use the same parameter setting used for the LI06 evaluation setting since both sets are human-rated sentence pairs (λ = 20, wm = 0. [sent-319, score-0.376]
94 Although there has been work modeling latent semantics for short texts (tweets) in LDA, the focus has been on exploiting additional features in Twitter, hence restricted to Twitter data. [sent-342, score-0.366]
95 In contrast, our approach relies solely on the information in the texts by modeling local missing words, and does not need any additional data, which renders our approach much more widely applicable. [sent-347, score-0.388]
96 7 Conclusions We explicitly model missing words to alleviate the sparsity problem in modeling short texts. [sent-348, score-0.498]
97 We also propose a new evaluation framework for sentence similarity that allows large scale tuning and testing. [sent-349, score-0.207]
98 Semantic text similarity using corpus-based word similarity and string similarity. [sent-398, score-0.226]
99 A comparative study of two short text semantic similarity measures. [sent-434, score-0.179]
100 Training and testing of recommender systems on data missing not at random. [sent-446, score-0.375]
wordName wordTfidf (topN-words)
[('wtmf', 0.476), ('missing', 0.348), ('wm', 0.271), ('ss', 0.239), ('lsa', 0.228), ('definitions', 0.185), ('lda', 0.175), ('latent', 0.171), ('cdr', 0.17), ('wn', 0.17), ('atopd', 0.136), ('xij', 0.114), ('similarity', 0.113), ('atop', 0.102), ('atoptest', 0.102), ('robs', 0.102), ('matrix', 0.097), ('semantics', 0.094), ('concept', 0.088), ('definition', 0.085), ('profile', 0.073), ('observed', 0.072), ('rmiss', 0.068), ('steck', 0.068), ('tsatsaronis', 0.068), ('sentence', 0.064), ('islam', 0.059), ('srebro', 0.059), ('wik', 0.059), ('shea', 0.059), ('zk', 0.059), ('bank', 0.058), ('topic', 0.057), ('words', 0.054), ('factorization', 0.052), ('topkd', 0.051), ('vectors', 0.05), ('ratings', 0.049), ('rs', 0.047), ('vector', 0.047), ('spearman', 0.045), ('dimension', 0.045), ('mihalcea', 0.045), ('inkpen', 0.044), ('wmf', 0.044), ('rank', 0.044), ('pairs', 0.041), ('paraphrase', 0.04), ('equation', 0.04), ('modeling', 0.04), ('semantic', 0.036), ('weight', 0.036), ('dissimilar', 0.036), ('relatedness', 0.036), ('comprises', 0.036), ('variable', 0.035), ('plsa', 0.034), ('jaakkola', 0.034), ('financial', 0.034), ('articial', 0.034), ('ballmer', 0.034), ('loan', 0.034), ('threat', 0.034), ('ramage', 0.033), ('hence', 0.031), ('weighted', 0.031), ('wi', 0.03), ('cells', 0.03), ('short', 0.03), ('tuning', 0.03), ('dj', 0.03), ('bandar', 0.03), ('hashtag', 0.03), ('zuhair', 0.03), ('pearson', 0.029), ('impact', 0.029), ('jin', 0.028), ('correlation', 0.028), ('tweets', 0.027), ('cell', 0.027), ('value', 0.027), ('kauchak', 0.027), ('render', 0.027), ('qp', 0.027), ('recommender', 0.027), ('keeley', 0.027), ('values', 0.027), ('explicitly', 0.026), ('sentences', 0.026), ('tiny', 0.025), ('diag', 0.025), ('zhou', 0.025), ('twitter', 0.025), ('wordnet', 0.025), ('robust', 0.025), ('ideal', 0.024), ('ij', 0.024), ('odni', 0.024), ('dev', 0.024), ('iarpa', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000013 145 acl-2012-Modeling Sentences in the Latent Space
Author: Weiwei Guo ; Mona Diab
Abstract: Sentence Similarity is the process of computing a similarity score between two sentences. Previous sentence similarity work finds that latent semantics approaches to the problem do not perform well due to insufficient information in single sentences. In this paper, we show that by carefully handling words that are not in the sentences (missing words), we can train a reliable latent variable model on sentences. In the process, we propose a new evaluation framework for sentence similarity: Concept Definition Retrieval. The new framework allows for large scale tuning and testing of Sentence Similarity models. Experiments on the new task and previous data sets show significant improvement of our model over baselines and other traditional latent variable models. Our results indicate comparable and even better performance than current state of the art systems addressing the problem of sentence similarity.
2 0.25938118 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition
Author: Weiwei Guo ; Mona Diab
Abstract: In this paper we study unsupervised word sense disambiguation (WSD) based on sense definition. We learn low-dimensional latent semantic vectors of concept definitions to construct a more robust sense similarity measure wmfvec. Experiments on four all-words WSD data sets show significant improvement over the baseline WSD systems and LDA based similarity measures, achieving results comparable to state of the art WSD systems.
3 0.14635584 56 acl-2012-Computational Approaches to Sentence Completion
Author: Geoffrey Zweig ; John C. Platt ; Christopher Meek ; Christopher J.C. Burges ; Ainur Yessenalina ; Qiang Liu
Abstract: This paper studies the problem of sentence-level semantic coherence by answering SAT-style sentence completion questions. These questions test the ability of algorithms to distinguish sense from nonsense based on a variety of sentence-level phenomena. We tackle the problem with two approaches: methods that use local lexical information, such as the n-grams of a classical language model; and methods that evaluate global coherence, such as latent semantic analysis. We evaluate these methods on a suite of practice SAT questions, and on a recently released sentence completion task based on data taken from five Conan Doyle novels. We find that by fusing local and global information, we can exceed 50% on this task (chance baseline is 20%), and we suggest some avenues for further research.
4 0.09838403 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
Author: Xinyan Xiao ; Deyi Xiong ; Min Zhang ; Qun Liu ; Shouxun Lin
Abstract: Previous work using topic model for statistical machine translation (SMT) explore topic information at the word level. However, SMT has been advanced from word-based paradigm to phrase/rule-based paradigm. We therefore propose a topic similarity model to exploit topic information at the synchronous rule level for hierarchical phrase-based translation. We associate each synchronous rule with a topic distribution, and select desirable rules according to the similarity of their topic distributions with given documents. We show that our model significantly improves the translation performance over the baseline on NIST Chinese-to-English translation experiments. Our model also achieves a better performance and a faster speed than previous approaches that work at the word level.
Author: Eric Xing
Abstract: Probabilistic topic models have recently gained much popularity in informational retrieval and related areas. Via such models, one can project high-dimensional objects such as text documents into a low dimensional space where their latent semantics are captured and modeled; can integrate multiple sources of information to "share statistical strength" among components of a hierarchical probabilistic model; and can structurally display and classify the otherwise unstructured object collections. However, to many practitioners, how topic models work, what to and not to expect from a topic model, how is it different from and related to classical matrix algebraic techniques such as LSI, NMF in NLP, how to empower topic models to deal with complex scenarios such as multimodal data, contractual text in social media, evolving corpus, or presence of supervision such as labeling and rating, how to make topic modeling computationally tractable even on webscale data, etc., in a principled way, remain unclear. In this tutorial, I will demystify the conceptual, mathematical, and computational issues behind all such problems surrounding the topic models and their applications by presenting a systematic overview of the mathematical foundation of topic modeling, and its connections to a number of related methods popular in other fields such as the LDA, admixture model, mixed membership model, latent space models, and sparse coding. I will offer a simple and unifying view of all these techniques under the framework multi-view latent space embedding, and outline the roadmap of model extension and algorithmic design toward different applications in IR and NLP. A main theme of this tutorial that ties together a wide range of issues and problems will build on the "probabilistic graphical model" formalism, a formalism that exploits the conjoined talents of graph theory and probability theory to build complex models out of simpler pieces.
I will use this formalism as a main aid to discuss both the mathematical underpinnings for the models and the related computational issues in a unified, simplistic, transparent, and actionable fashion. Jeju, Republic of Korea, 8 July 2012. Tutorial Abstracts of ACL 2012, page 3. ©2012 Association for Computational Linguistics
6 0.080245152 79 acl-2012-Efficient Tree-Based Topic Modeling
7 0.077477887 199 acl-2012-Topic Models for Dynamic Translation Model Adaptation
8 0.075244531 18 acl-2012-A Probabilistic Model for Canonicalizing Named Entity Mentions
9 0.074616216 86 acl-2012-Exploiting Latent Information to Predict Diffusions of Novel Topics on Social Networks
10 0.070806399 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes
11 0.066777498 205 acl-2012-Tweet Recommendation with Graph Co-Ranking
12 0.066514157 31 acl-2012-Authorship Attribution with Author-aware Topic Models
14 0.061973099 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
15 0.059532098 98 acl-2012-Finding Bursty Topics from Microblogs
16 0.059343144 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
17 0.056402884 110 acl-2012-Historical Analysis of Legal Opinions with a Sparse Mixed-Effects Latent Variable Model
18 0.055899225 181 acl-2012-Spectral Learning of Latent-Variable PCFGs
19 0.055636227 125 acl-2012-Joint Learning of a Dual SMT System for Paraphrase Generation
20 0.054664962 171 acl-2012-SITS: A Hierarchical Nonparametric Model using Speaker Identity for Topic Segmentation in Multiparty Conversations
topicId topicWeight
[(0, -0.172), (1, 0.071), (2, 0.082), (3, 0.017), (4, -0.113), (5, 0.085), (6, -0.012), (7, 0.031), (8, 0.002), (9, 0.027), (10, 0.039), (11, 0.003), (12, 0.129), (13, 0.083), (14, -0.009), (15, -0.001), (16, 0.085), (17, 0.029), (18, -0.056), (19, 0.034), (20, 0.145), (21, -0.14), (22, -0.109), (23, -0.002), (24, 0.03), (25, -0.012), (26, -0.18), (27, -0.065), (28, -0.057), (29, 0.046), (30, -0.091), (31, -0.069), (32, -0.059), (33, -0.102), (34, 0.116), (35, 0.012), (36, -0.062), (37, -0.122), (38, -0.025), (39, 0.113), (40, 0.105), (41, -0.086), (42, -0.082), (43, -0.321), (44, 0.119), (45, -0.071), (46, -0.119), (47, 0.172), (48, -0.066), (49, -0.049)]
simIndex simValue paperId paperTitle
same-paper 1 0.94568658 145 acl-2012-Modeling Sentences in the Latent Space
Author: Weiwei Guo ; Mona Diab
Abstract: Sentence Similarity is the process of computing a similarity score between two sentences. Previous sentence similarity work finds that latent semantics approaches to the problem do not perform well due to insufficient information in single sentences. In this paper, we show that by carefully handling words that are not in the sentences (missing words), we can train a reliable latent variable model on sentences. In the process, we propose a new evaluation framework for sentence similarity: Concept Definition Retrieval. The new framework allows for large scale tuning and testing of Sentence Similarity models. Experiments on the new task and previous data sets show significant improvement of our model over baselines and other traditional latent variable models. Our results indicate comparable and even better performance than current state of the art systems addressing the problem of sentence similarity.
2 0.78448892 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition
Author: Weiwei Guo ; Mona Diab
Abstract: In this paper we study unsupervised word sense disambiguation (WSD) based on sense definition. We learn low-dimensional latent semantic vectors of concept definitions to construct a more robust sense similarity measure wmfvec. Experiments on four all-words WSD data sets show significant improvement over the baseline WSD systems and LDA based similarity measures, achieving results comparable to state of the art WSD systems.
3 0.68682951 56 acl-2012-Computational Approaches to Sentence Completion
Author: Geoffrey Zweig ; John C. Platt ; Christopher Meek ; Christopher J.C. Burges ; Ainur Yessenalina ; Qiang Liu
Abstract: This paper studies the problem of sentence-level semantic coherence by answering SAT-style sentence completion questions. These questions test the ability of algorithms to distinguish sense from nonsense based on a variety of sentence-level phenomena. We tackle the problem with two approaches: methods that use local lexical information, such as the n-grams of a classical language model; and methods that evaluate global coherence, such as latent semantic analysis. We evaluate these methods on a suite of practice SAT questions, and on a recently released sentence completion task based on data taken from five Conan Doyle novels. We find that by fusing local and global information, we can exceed 50% on this task (chance baseline is 20%), and we suggest some avenues for further research.
4 0.49468413 117 acl-2012-Improving Word Representations via Global Context and Multiple Word Prototypes
Author: Eric Huang ; Richard Socher ; Christopher Manning ; Andrew Ng
Abstract: Unsupervised word representations are very useful in NLP tasks both as inputs to learning algorithms and as extra word features in NLP systems. However, most of these models are built with only local context and one representation per word. This is problematic because words are often polysemous and global context can also provide useful information for learning word meanings. We present a new neural network architecture which 1) learns word embeddings that better capture the semantics of words by incorporating both local and global document context, and 2) accounts for homonymy and polysemy by learning multiple embeddings per word. We introduce a new dataset with human judgments on pairs of words in sentential context, and evaluate our model on it, showing that our model outperforms competitive baselines and other neural language models.
5 0.38228899 181 acl-2012-Spectral Learning of Latent-Variable PCFGs
Author: Shay B. Cohen ; Karl Stratos ; Michael Collins ; Dean P. Foster ; Lyle Ungar
Abstract: We introduce a spectral learning algorithm for latent-variable PCFGs (Petrov et al., 2006). Under a separability (singular value) condition, we prove that the method provides consistent parameter estimates.
6 0.37954378 110 acl-2012-Historical Analysis of Legal Opinions with a Sparse Mixed-Effects Latent Variable Model
7 0.35400689 86 acl-2012-Exploiting Latent Information to Predict Diffusions of Novel Topics on Social Networks
8 0.35050023 31 acl-2012-Authorship Attribution with Author-aware Topic Models
10 0.31914335 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction
11 0.31047073 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval
12 0.30693316 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
13 0.29655287 18 acl-2012-A Probabilistic Model for Canonicalizing Named Entity Mentions
15 0.27422121 165 acl-2012-Probabilistic Integration of Partial Lexical Information for Noise Robust Haptic Voice Recognition
16 0.2670761 163 acl-2012-Prediction of Learning Curves in Machine Translation
17 0.26447022 79 acl-2012-Efficient Tree-Based Topic Modeling
18 0.26437843 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation
19 0.2633318 133 acl-2012-Learning to "Read Between the Lines" using Bayesian Logic Programs
20 0.26062077 2 acl-2012-A Broad-Coverage Normalization System for Social Media Language
topicId topicWeight
[(25, 0.055), (26, 0.023), (28, 0.028), (30, 0.036), (37, 0.04), (39, 0.068), (57, 0.013), (58, 0.249), (59, 0.012), (74, 0.029), (82, 0.02), (85, 0.042), (90, 0.122), (91, 0.011), (92, 0.089), (94, 0.026), (99, 0.041)]
simIndex simValue paperId paperTitle
same-paper 1 0.7613467 145 acl-2012-Modeling Sentences in the Latent Space
Author: Weiwei Guo ; Mona Diab
Abstract: Sentence Similarity is the process of computing a similarity score between two sentences. Previous sentence similarity work finds that latent semantics approaches to the problem do not perform well due to insufficient information in single sentences. In this paper, we show that by carefully handling words that are not in the sentences (missing words), we can train a reliable latent variable model on sentences. In the process, we propose a new evaluation framework for sentence similarity: Concept Definition Retrieval. The new framework allows for large scale tuning and testing of Sentence Similarity models. Experiments on the new task and previous data sets show significant improvement of our model over baselines and other traditional latent variable models. Our results indicate comparable and even better performance than current state of the art systems addressing the problem of sentence similarity.
2 0.73068583 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars
Author: Elif Yamangil ; Stuart Shieber
Abstract: We present a Bayesian nonparametric model for estimating tree insertion grammars (TIG), building upon recent work in Bayesian inference of tree substitution grammars (TSG) via Dirichlet processes. Under our general variant of TIG, grammars are estimated via the Metropolis-Hastings algorithm that uses a context free grammar transformation as a proposal, which allows for cubic-time string parsing as well as tree-wide joint sampling of derivations in the spirit of Cohn and Blunsom (2010). We use the Penn treebank for our experiments and find that our proposed Bayesian TIG model not only has competitive parsing performance but also finds compact yet linguistically rich TIG representations of the data.
3 0.66314209 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition
Author: Weiwei Guo ; Mona Diab
Abstract: In this paper we study unsupervised word sense disambiguation (WSD) based on sense definition. We learn low-dimensional latent semantic vectors of concept definitions to construct a more robust sense similarity measure wmfvec. Experiments on four all-words WSD data sets show significant improvement over the baseline WSD systems and LDA based similarity measures, achieving results comparable to state of the art WSD systems.
4 0.56700701 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment
Author: Asher Stern ; Ido Dagan
Abstract: This paper introduces BIUTEE, an open-source system for recognizing textual entailment. Its main advantages are its ability to utilize various types of knowledge resources, and its extensibility by which new knowledge resources and inference components can be easily integrated. These abilities make BIUTEE an appealing RTE system for two research communities: (1) researchers of end applications, who can benefit from generic textual inference, and (2) RTE researchers, who can integrate their novel algorithms and knowledge resources into our system, saving the time and effort of developing a complete RTE system from scratch. Notable assistance for these researchers is provided by a visual tracing tool, by which researchers can refine and “debug” their knowledge resources and inference components.
5 0.56607872 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers
Author: Bevan Jones ; Mark Johnson ; Sharon Goldwater
Abstract: Many semantic parsing models use tree transformations to map between natural language and meaning representation. However, while tree transformations are central to several state-of-the-art approaches, little use has been made of the rich literature on tree automata. This paper makes the connection concrete with a tree transducer based semantic parsing model and suggests that other models can be interpreted in a similar framework, increasing the generality of their contributions. In particular, this paper further introduces a variational Bayesian inference algorithm that is applicable to a wide class of tree transducers, producing state-of-the-art semantic parsing results while remaining applicable to any domain employing probabilistic tree transducers.
6 0.5649361 167 acl-2012-QuickView: NLP-based Tweet Search
7 0.56473643 31 acl-2012-Authorship Attribution with Author-aware Topic Models
8 0.56463403 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning
9 0.56427687 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing
10 0.56316662 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
11 0.56010407 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information
12 0.55894274 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation
14 0.55639118 10 acl-2012-A Discriminative Hierarchical Model for Fast Coreference at Large Scale
15 0.55322289 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle
16 0.55253369 146 acl-2012-Modeling Topic Dependencies in Hierarchical Text Categorization
17 0.55163968 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
18 0.55140978 217 acl-2012-Word Sense Disambiguation Improves Information Retrieval
19 0.55116057 191 acl-2012-Temporally Anchored Relation Extraction
20 0.55114949 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons