emnlp emnlp2013 emnlp2013-44 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ikumi Suzuki ; Kazuo Hara ; Masashi Shimbo ; Marco Saerens ; Kenji Fukumizu
Abstract: The performance of nearest neighbor methods is degraded by the presence of hubs, i.e., objects in the dataset that are similar to many other objects. In this paper, we show that the classical method of centering, the transformation that shifts the origin of the space to the data centroid, provides an effective way to reduce hubs. We show analytically why hubs emerge and why they are suppressed by centering, under a simple probabilistic model of data. To further reduce hubs, we also move the origin more aggressively towards hubs, through weighted centering. Our experimental results show that (weighted) centering is effective for natural language data; it improves the performance of the k-nearest neighbor classifiers considerably in word sense disambiguation and document classification tasks.
Reference: text
sentIndex sentText sentNum sentScore
1 i.e., objects in the dataset that are similar to many other objects. [sent-11, score-0.368]
2 We show analytically why hubs emerge and why they are suppressed by centering, under a simple probabilistic model of data. [sent-13, score-0.417]
3 Our experimental results show that (weighted) centering is effective for natural language data; it improves the performance of the k-nearest neighbor classifiers considerably in word sense disambiguation and document classification tasks. [sent-15, score-0.69]
4 The unknown class label of a test object is predicted by a majority vote of the classes of its k most similar objects in the labeled training set. [sent-24, score-0.532]
5 Radovanović et al. (2010b) have shown that if the feature space is high-dimensional, some objects in the dataset emerge as hubs. [sent-27, score-0.413]
6 That is, these objects frequently appear in the k nearest neighbors of other objects. [sent-29, score-0.549]
7 The emergence of hubs may deteriorate the performance of kNN classification and nearest neighbor search in general: • If hub objects exist in the training set, they have a strong chance to be in the kNN of many test objects. [sent-30, score-1.121]
8 • In information retrieval, nearest neighbor search finds objects in the database that are most relevant, or similar, to user-provided queries. [sent-32, score-0.519]
9 The distance between data objects is not changed by centering, but their inner product and cosine are affected; see Section 3 for detail. [sent-42, score-0.589]
10 Specifically, we propose to measure the similarity of objects by the inner product (not distance or cosine) in the centered feature space. [sent-44, score-0.723]
11 Our approach is motivated by the observation that the objects similar to the data centroid tend to become hubs (Radovanović et al. [sent-45, score-0.877]
12 This observation suggests that the number of hubs may be reduced if we can define a similarity measure that makes all objects in a dataset equally similar to the centroid (Suzuki et al. [sent-47, score-1.019]
13 In Section 4, we analyze why hubs emerge under a simple probabilistic model of data, and also give an account of why they are suppressed by centering. [sent-50, score-0.417]
14 Using both synthetic and real datasets, we show that objects similar to the centroid also emerge as hubs in multi-cluster data (Section 5), so the application of centering is wider than expected. [sent-51, score-1.477]
15 To further reduce hubs, we also propose to move the origin of the space more aggressively towards hubs, through weighted centering (Section 6). [sent-52, score-0.792]
16 In Section 7, we show that centering and weighted centering are effective for natural language data. [sent-53, score-1.112]
17 For instance, centering forms a preprocessing step in principal component analysis and Fisher linear discriminant analysis. [sent-56, score-0.519]
18 In NLP, however, centering is seldom used; the use of cosine and inner product similarities is quite common, but they are nearly always used uncentered. [sent-57, score-0.789]
19 Schnitzer et al. (2012) proposed the Mutual Proximity transformation that rescales distance measures to decrease hubs in a dataset. [sent-66, score-0.397]
20 In Section 7, we evaluate centering, Mutual Proximity, and Laplacian kernels in NLP tasks, and demonstrate that centering is equally or even more effective. [sent-70, score-0.565]
21 Section 4 presents a theoretical justification for using centering to reduce hubs, but this kind of analysis is missing for the Laplacian kernels. [sent-71, score-0.544]
22 Throughout this paper, we use the inner product ⟨xi, xj⟩ as a measure of similarity between xi and xj. [sent-76, score-0.38]
23 Centering is a transformation in which the origin of the feature space is shifted to the data centroid x̄ = (1/n) Σ_{i=1}^n xi, (1) and object x is mapped to the centered feature vector xcent = x − x̄. [sent-82, score-0.668]
24 (2) The similarity between two objects x and x′ is now measured by ⟨xcent, x′cent⟩ = ⟨x − x̄, x′ − x̄⟩. [sent-83, score-0.412]
25 After centering, the inner product between any object and the data centroid (which is the zero vector, because x̄cent = x̄ − x̄ = 0) is uniformly 0; in other words, all objects in the dataset have an equal similarity to the centroid. [sent-84, score-0.877]
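The following sketch is illustrative only (not the authors' code); it assumes numpy and toy random data, and the function name is hypothetical. It implements the centering of Eqs. (1)-(2) and measures similarity by the inner product in the centered space; every row of the resulting similarity matrix sums to 0, reflecting that all objects have similarity 0 to the centered centroid.

```python
import numpy as np

def centered_inner_product_similarity(X):
    """Shift the origin to the data centroid (Eqs. 1-2) and return the
    matrix of pairwise inner products in the centered feature space.
    X : (n, m) array with one m-dimensional feature vector per row."""
    centroid = X.mean(axis=0)      # x_bar = (1/n) * sum_i x_i
    X_cent = X - centroid          # x_cent = x - x_bar
    return X_cent @ X_cent.T       # <x_i^cent, x_j^cent> for all pairs

# Toy check: the centered vectors sum to the zero vector, so each row of the
# similarity matrix sums to 0 -- every object is "equally similar"
# (similarity 0) to the centroid.
X = np.random.default_rng(0).normal(size=(5, 3))
S = centered_inner_product_similarity(X)
print(np.allclose(S.sum(axis=1), 0.0))   # True
```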
26 According to the observation that the objects similar to the centroid become hubs (Radovanović et al. [sent-85, score-0.877]
27 , 2010a), we can expect hubs to be reduced after centering. [sent-86, score-0.372]
28 Intuitively, centering reduces hubs because it makes the length of the feature vector small for (hub) objects x that lie close to the data centroid x̄; see Eq. [sent-87, score-1.396]
29 And since we measure object similarity by inner product, shorter vectors tend to produce smaller similarity scores. [sent-89, score-0.518]
30 Hence objects close to the data centroid become less similar to other objects after centering, and are no longer hubs. [sent-90, score-0.824]
31 In Section 4, we analyze the effect of centering on hubness in more detail. [sent-91, score-0.641]
32 The symmetric matrix H = I − (1/n)11^T is called the centering matrix, because the centered data matrix Xcent = [x1cent, · · · , xncent] can be computed by Xcent = XH (Mardia et al. [sent-94, score-0.585]
33 The Gram matrix Kcent of the centered feature vectors, whose (i, j) element holds the inner product ⟨xicent, xjcent⟩, can be calculated from the original Gram matrix K by Kcent = HKH. [sent-96, score-0.443]
34 Thus, centering can be applied even if the data matrix X is not available but the similarity of objects can be measured by a kernel function in an implicit feature space. [sent-105, score-1.031]
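As a concrete illustration of the kernel case, the sketch below (hypothetical, using numpy; the identity Kcent = HKH follows from Xcent = XH) centers a Gram matrix without access to the explicit feature vectors.

```python
import numpy as np

def center_gram_matrix(K):
    """Center a Gram (kernel) matrix K, with K[i, j] = <x_i, x_j>, using the
    centering matrix H = I - (1/n) 1 1^T, i.e. K_cent = H K H."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

# Sanity check against explicit centering of the feature vectors.
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 4))          # rows are objects here
K = X @ X.T
X_cent = X - X.mean(axis=0)
print(np.allclose(center_gram_matrix(K), X_cent @ X_cent.T))   # True
```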
35 4 Theoretical analysis of the effect of centering on hubness We now analyze why objects most similar to the centroid tend to be hubs in the dataset, and give an explanation as to why centering may suppress the emergence of hubs. [sent-106, score-2.105]
36 4.1 Before centering Consider a dataset of m-dimensional feature vectors, with each vector x ∈ Rm generated independently
37 from a distribution with a finite mean vector µ. In other words, objects x in this dataset are drawn from a distribution P(x). [sent-109, score-0.451]
38 Let h and ℓ be two fixed objects in the dataset, such that the inner product to the true mean µ is higher for h than for ℓ, i.e. [sent-121, score-0.592]
39 (5) We are interested in which of h and ℓ is more similar to x (in terms of inner product), or in other words, in the difference of the two inner products z = ⟨h, x⟩ − ⟨ℓ, x⟩ = ⟨h − ℓ, x⟩. [sent-124, score-0.537]
40 Now, if we let h be the object in a given dataset with the highest similarity (inner product) to the mean µ and let ℓ be any other object in the set, then we see from the above discussion that h is likely to have higher similarity to x, a test sample drawn from the distribution P(x). [sent-139, score-0.618]
41 4.2 After centering Next let us investigate what happens if the dataset is centered.
42 After centering, the similarity of x with each of the two fixed objects h and ℓ is evaluated by ⟨h − x̄, x − x̄⟩ and ⟨ℓ − x̄, x − x̄⟩, respectively. [sent-145, score-0.563]
43 Their difference zcent is given by zcent = ⟨h − x̄, x − x̄⟩ − ⟨ℓ − x̄, x − x̄⟩ = ⟨h − ℓ, x − x̄⟩ = ⟨h − ℓ, x⟩ − ⟨h − ℓ, x̄⟩ = z − ⟨h − ℓ, x̄⟩. [sent-146, score-0.918]
44 By definition we have z ∼ Q(z), and since ⟨h − ℓ, x̄⟩ is a constant, zcent = z − ⟨h − ℓ, x̄⟩ ∼ Q(z + ⟨h − ℓ, x̄⟩). [sent-149, score-0.514]
45 In other words, the shape of the distribution does not change, but its mean is shifted to E[zcent] = E[z] − ⟨h − ℓ, x̄⟩ = ⟨h − ℓ, µ⟩ − ⟨h − ℓ, x̄⟩ = ⟨h − ℓ, µ − x̄⟩, where E[z] is given by Eq. [sent-150, score-0.714]
46 5 Hubs in multi-cluster data In this section, we discuss emergence of hubs when the data consists of multiple clusters. [sent-159, score-0.44]
47 However, one might still argue that objects similar to the data centroid should hardly occur in that case. [sent-161, score-0.505]
48 Using both synthetic and real datasets, we demonstrate below that even in multi-cluster data, objects that are only slightly more similar to the data mean (centroid) may emerge as hubs. [sent-162, score-0.451]
49 (a), (d): scatter plot of the N10 value of objects and their similarity to centroid. [sent-167, score-0.475]
50 (c), (f): the number of times (y-axis) an object (whose ID is on the x-axis) appears in the 10 nearest neighbors of objects of the same cluster (black bars), and those of different clusters (magenta). [sent-170, score-0.729]
51 The von Mises-Fisher distribution is a distribution of unit vectors (it can roughly be thought of as a normal distribution on a unit hypersphere), so for objects (feature vectors) sampled from this distribution, inner product reduces to cosine similarity. [sent-172, score-0.826]
52 We sampled 100 objects from each of the ten distributions (clusters), and made a dataset of 1,000 objects in total. [sent-173, score-0.716]
53 Note that even though all sampled objects reside on the surface of the unit hypersphere, the data centroid lies not on the surface but inside the hypersphere. [sent-189, score-0.529]
54 , object similarity is measured by raw inner product, not by cosine. [sent-192, score-0.395]
55 2 Correlation between hubness and centroid similarity The scatter plot in Figure 1(a) shows the correlation between the degree of hubness (N10) of an object and its inner product similarity to the data centroid. [sent-195, score-1.051]
56 The N10 value of an object is defined as the number of times the object appears in the 10 nearest neighbors of other objects in the dataset. [sent-196, score-0.849]
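For readers who want to reproduce this statistic, a minimal sketch of the Nk count is given below (hypothetical helper, not the authors' code; it assumes a precomputed similarity matrix in which larger values mean more similar).

```python
import numpy as np

def nk_counts(S, k=10):
    """N_k(x): how many times each object appears among the k most similar
    neighbors of the other objects.
    S : (n, n) similarity matrix (larger = more similar)."""
    n = S.shape[0]
    counts = np.zeros(n, dtype=int)
    for i in range(n):
        sims = S[i].copy()
        sims[i] = -np.inf                    # an object is not its own neighbor
        counts[np.argsort(-sims)[:k]] += 1   # credit the k most similar objects
    return counts
```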
57 The similarity to the centroid is uniformly 0 as a result of centering, and no objects have an N10 value greater than 33. [sent-203, score-0.598]
58 3 Influence of hubs on objects in different clusters The kNN matrix of Figure 1(b) depicts the kNN relations with k = 10 among objects before centering. [sent-206, score-1.076]
59 If object x is in the 10 nearest neighbors of object y, a point is plotted at coordinates (x, y). [sent-208, score-0.53]
60 In this matrix, object IDs are sorted by the cluster the objects belong to. [sent-211, score-0.499]
61 Hence in the ideal case in which the k nearest neighbors of every object consist genuinely of objects from the same cluster, only the diagonal blocks would be colored, and off-diagonal areas would be left blank. [sent-212, score-0.699]
62 The presence of many warm colored vertical lines suggests that many hub objects appear in the 10 nearest neighbors of other objects that are not in the same cluster as the hubs. [sent-214, score-1.059]
63 Thus these hubs may have a strong influence on the kNN prediction of other objects. [sent-215, score-0.372]
64 Recall that N10 is the number of times an object appears in the 10 nearest neighbors of other objects. [sent-219, score-0.38]
65 The bar for each object is broken down by whether the object and its neighbors belong to the same cluster (black bar) or in different clusters (magenta bar). [sent-220, score-0.458]
66 In terms of kNN classification, having a large number of nearest neighbors with the same class improves the classification performance, so longer black bars and shorter magenta bars are more desirable. [sent-221, score-0.55]
67 Before centering (Figure 1(c)), hub objects with large N10 values are similar not only to objects belonging to the same cluster (as indicated by black bars), but also to objects belonging to different clusters (magenta bars). [sent-222, score-1.661]
68 After centering (Figure 1(f)), the number of tall magenta bars decreases. [sent-223, score-0.661]
69 7% of the 10 nearest neighbors of an object have the same class label as the object (as indicated by the ratio of the total height of black bars relative to that of all bars in Figure 1(c)). [sent-225, score-0.749]
70 We can observe the same trends as we saw in Figure 1 for the synthetic data: positive correlation between hubness (N10) and inner product with the data centroid before centering; hubs appearing in the nearest neighbors of many objects of different classes; and both are reduced after centering. [sent-238, score-1.385]
71 6 Hubness weighted centering Centering shifts the origin of the space to the data centroid, and objects similar to the centroid tend to become hubs. [sent-242, score-1.247]
72 Thus in a sense, centering can be interpreted as an operation that shifts the origin towards hubs. [sent-243, score-0.668]
73 In this section, we extrapolate this interpretation. (Figure 2: Reuters Transcribed data; x-axes show inner product similarity to the data centroid.) [sent-244, score-0.698]
74 To this end, we consider weighted centering, a variation of centering in which each object is associated with a weight, and the origin is shifted to the weighted mean of the data. [sent-246, score-1.013]
75 Specifically, we define the weight of an object as the sum of the similarities (inner products) between the object and all objects, regarding this sum as the index of how likely the object can be a hub. [sent-247, score-0.45]
76 6.1 Weighted centering In weighted centering, we associate a weight wi with each object i in the dataset, and move the origin to the weighted centroid x̄weighted = Σ_{i=1}^n wi xi, where Σ_{i=1}^n wi = 1 and 0 ≤ wi ≤ 1 for i = 1, . [sent-249, score-1.146]
77 Notice that the original centering formula (2) is recovered by letting wi = 1/n for all i= 1, . [sent-254, score-0.519]
78 Weighted centering can also be kernelized by using the weighted centering matrix H(w) = I − 1w^T in place of H in Eq. [sent-258, score-1.178]
79 6.2 Similarity-dependent weighting To move the origin towards hubs more aggressively, we place more weight on objects that are more likely to become hubs. [sent-262, score-0.834]
80 This likelihood is estimated by the similarity of individual objects to all objects in the data set. [sent-263, score-0.731]
81 Let di be the sum of the similarity between object xi and all objects in the dataset. [sent-264, score-0.603]
82 When γ > 0, weighted centering moves the origin closer to the objects with a large di than normal centering would. [sent-269, score-1.576]
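A sketch of weighted centering with similarity-dependent weights follows. It is illustrative only: the exact normalization of the weights is not spelled out in this excerpt, so the choice wi ∝ di^γ (with γ = 0 recovering ordinary centering) is an assumption that is merely consistent with the text, and the function name is hypothetical.

```python
import numpy as np

def hubness_weighted_similarity(X, gamma=1.0):
    """Shift the origin to a weighted centroid pulled towards likely hubs,
    then return inner-product similarities in the shifted space.
    Assumption: weight w_i is proportional to d_i ** gamma, where d_i is the
    sum of inner-product similarities of object i to all objects."""
    K = X @ X.T                       # inner-product similarities
    d = K.sum(axis=1)                 # d_i = sum_j <x_i, x_j>
    w = np.maximum(d, 0.0) ** gamma   # clip negatives; gamma = 0 gives w_i = 1
    w = w / w.sum()                   # normalize so the weights sum to 1
    weighted_centroid = w @ X         # x_bar_weighted = sum_i w_i x_i
    X_shift = X - weighted_centroid
    return X_shift @ X_shift.T
```

With gamma = 0 this reduces to ordinary centering; larger gamma moves the origin closer to objects with large di, as described above.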
83 7 Experiments We evaluated the effect of centering in two natural language tasks: word sense disambiguation (WSD) and document classification. [sent-270, score-0.578]
84 We are interested in whether hubs are actually reduced after centering, and whether the performance of kNN classification is improved. [sent-271, score-0.412]
85 , inner product of feature vectors normalized to unit length; Kcent denotes the centered similarity matrix computed by Eq. [sent-274, score-0.524]
86 Since the Laplacian kernels are defined for graph nodes, we computed them by taking the cosine similarity matrix K as the weighted adjacency (affinity) matrix of a graph. [sent-281, score-0.393]
87 2 Compared methods We applied kNN classification using cosine similarity K, and its four transformed similarity measures: centered similarity Kcent, its weighted variant Kweighted, Mutual Proximity and graph Laplacian kernels. [sent-299, score-0.53]
88 The sense of a test object was predicted by voting from the k training objects most similar to the test object, as measured by the respective similarity measures. [sent-300, score-0.596]
89 , either (unweighted) majority vote, or weighted vote in which votes from individual objects are weighted by their similarity score to the test objects. [sent-303, score-0.599]
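The two voting schemes described here can be sketched as follows (hypothetical helper, not the authors' code; the similarities to the training objects would come from a row of K, Kcent, or Kweighted).

```python
import numpy as np
from collections import Counter

def knn_predict(sims_to_train, train_labels, k=10, weighted=False):
    """Predict a test object's class from its similarities to the training set.
    weighted=False: plain majority vote among the k most similar objects.
    weighted=True : each neighbor's vote is weighted by its similarity score."""
    sims = np.asarray(sims_to_train, dtype=float)
    knn = np.argsort(-sims)[:k]                  # k most similar training objects
    if not weighted:
        return Counter(train_labels[i] for i in knn).most_common(1)[0][0]
    votes = {}
    for i in knn:
        votes[train_labels[i]] = votes.get(train_labels[i], 0.0) + sims[i]
    return max(votes, key=votes.get)
```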
90 Following Radovanović et al. (2010a), we counted Nk(x), the number of times object x occurs in the kNN lists of other objects in the dataset (we fix k = 10 below). [sent-315, score-0.518]
91 The emergence of hubs in a dataset can then be quantified with skewness, defined as follows: SNk = E[(Nk − E[Nk])^3] / σ_{Nk}^3, where σ_{Nk} is the standard deviation of Nk. [sent-316, score-0.489]
92 When hubs exist in a dataset, the distribution of Nk is expected to skew to the right and yield a large SNk (Radovanović et al. [sent-320, score-0.404]
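Given the Nk counts (e.g., from the nk_counts sketch above), the skewness can be computed as the standardized third moment; this is a minimal sketch under that assumption, not the authors' code.

```python
import numpy as np

def nk_skewness(counts):
    """S_{N_k}: skewness of the N_k distribution, i.e. the standardized third
    moment E[(N_k - E[N_k])^3] / std(N_k)^3.  Large positive values indicate
    that a few hub objects appear in many kNN lists."""
    c = np.asarray(counts, dtype=float)
    return ((c - c.mean()) ** 3).mean() / c.std() ** 3

# Example: nk_skewness(nk_counts(S, k=10)) for a similarity matrix S.
```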
93 After centering (Kcent and Kweighted) skewness became markedly smaller than that of the noncentered cosine K. [sent-338, score-0.738]
94 In particular, weighted centering (Kweighted) slightly outperformed GAMBL, though the difference was small. [sent-340, score-0.593]
95 kNN classification was done in a standard way: The class of object x is predicted by the majority vote from k = 10 objects most similar to x, measured by a specified similarity measure. [sent-364, score-0.665]
96 Weighted centering (Kweighted) further improved F1 on the Mini Newsgroups data. [sent-372, score-0.519]
97 8 Conclusion We have shown that centering similarity matrices reduces the emergence of hubs in the data, and consequently improves the accuracy of nearest neighbor classification. [sent-373, score-1.252]
98 We have theoretically analyzed why objects most similar to the mean tend to become hubs, and also proved that centering cancels the bias in the distribution of inner products, and thus is expected to reduce hubs. [sent-374, score-1.098]
99 Weighted centering shifts the origin towards hubs more aggressively, and further improved the classification performance in some cases. [sent-376, score-1.08]
100 In future work, we plan to exploit the class distribution in the dataset to make more effective similarity measures; notice that the hubness weighted centering of Section 6 is an unsupervised method, in the sense that class information was not used for determining weights. [sent-377, score-0.971]
wordName wordTfidf (topN-words)
[('centering', 0.519), ('hubs', 0.372), ('objects', 0.319), ('centroid', 0.186), ('knn', 0.181), ('skewness', 0.171), ('inner', 0.152), ('hh', 0.151), ('object', 0.15), ('nearest', 0.128), ('hub', 0.122), ('hubness', 0.122), ('kweighted', 0.122), ('laplacian', 0.118), ('origin', 0.118), ('kcent', 0.11), ('radovanovi', 0.11), ('neighbors', 0.102), ('similarity', 0.093), ('centered', 0.089), ('bars', 0.081), ('weighted', 0.074), ('gambl', 0.073), ('xcent', 0.073), ('neighbor', 0.072), ('product', 0.07), ('emergence', 0.068), ('matrix', 0.066), ('magenta', 0.061), ('mardia', 0.061), ('saerens', 0.061), ('zcent', 0.061), ('mean', 0.051), ('dataset', 0.049), ('nk', 0.049), ('cosine', 0.048), ('kernels', 0.046), ('emerge', 0.045), ('suzuki', 0.045), ('xi', 0.041), ('classification', 0.04), ('proximity', 0.039), ('vote', 0.039), ('colored', 0.039), ('wsd', 0.039), ('chebotarev', 0.037), ('hypersphere', 0.037), ('ikumi', 0.037), ('kazuo', 0.037), ('mini', 0.037), ('scatter', 0.037), ('schnitzer', 0.037), ('snk', 0.037), ('von', 0.036), ('synthetic', 0.036), ('kernel', 0.034), ('reuters', 0.034), ('sense', 0.034), ('els', 0.034), ('black', 0.033), ('shimbo', 0.032), ('distribution', 0.032), ('mutual', 0.031), ('shifts', 0.031), ('aggressively', 0.031), ('vectors', 0.03), ('cluster', 0.03), ('mihalcea', 0.029), ('fisher', 0.029), ('ten', 0.029), ('breakdown', 0.029), ('normal', 0.027), ('shifted', 0.027), ('transcribed', 0.027), ('bar', 0.026), ('plot', 0.026), ('transformation', 0.025), ('disambiguation', 0.025), ('reduce', 0.025), ('median', 0.025), ('move', 0.025), ('alexandros', 0.024), ('decadt', 0.024), ('eriksson', 0.024), ('fukumizu', 0.024), ('ivanovi', 0.024), ('lenz', 0.024), ('masand', 0.024), ('mirjana', 0.024), ('mishima', 0.024), ('nanopoulos', 0.024), ('qamar', 0.024), ('shamis', 0.024), ('shizuoka', 0.024), ('warmer', 0.024), ('xji', 0.024), ('navigli', 0.024), ('unit', 0.024), ('rada', 0.024), ('class', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 44 emnlp-2013-Centering Similarity Measures to Reduce Hubs
Author: Ikumi Suzuki ; Kazuo Hara ; Masashi Shimbo ; Marco Saerens ; Kenji Fukumizu
Abstract: The performance of nearest neighbor methods is degraded by the presence of hubs, i.e., objects in the dataset that are similar to many other objects. In this paper, we show that the classical method of centering, the transformation that shifts the origin of the space to the data centroid, provides an effective way to reduce hubs. We show analytically why hubs emerge and why they are suppressed by centering, under a simple probabilistic model of data. To further reduce hubs, we also move the origin more aggressively towards hubs, through weighted centering. Our experimental results show that (weighted) centering is effective for natural language data; it improves the performance of the k-nearest neighbor classi- fiers considerably in word sense disambiguation and document classification tasks.
2 0.098652817 78 emnlp-2013-Exploiting Language Models for Visual Recognition
Author: Dieu-Thu Le ; Jasper Uijlings ; Raffaella Bernardi
Abstract: The problem of learning language models from large text corpora has been widely studied within the computational linguistic community. However, little is known about the performance of these language models when applied to the computer vision domain. In this work, we compare representative models: a window-based model, a topic model, a distributional memory and a commonsense knowledge database, ConceptNet, in two visual recognition scenarios: human action recognition and object prediction. We examine whether the knowledge extracted from texts through these models are compatible to the knowledge represented in images. We determine the usefulness of different language models in aiding the two visual recognition tasks. The study shows that the language models built from general text corpora can be used instead of expensive annotated images and even outperform the image model when testing on a big general dataset.
3 0.094457947 119 emnlp-2013-Learning Distributions over Logical Forms for Referring Expression Generation
Author: Nicholas FitzGerald ; Yoav Artzi ; Luke Zettlemoyer
Abstract: We present a new approach to referring expression generation, casting it as a density estimation problem where the goal is to learn distributions over logical expressions identifying sets of objects in the world. Despite an extremely large space of possible expressions, we demonstrate effective learning of a globally normalized log-linear distribution. This learning is enabled by a new, multi-stage approximate inference technique that uses a pruning model to construct only the most likely logical forms. We train and evaluate the approach on a new corpus of references to sets of visual objects. Experiments show the approach is able to learn accurate models, which generate over 87% of the expressions people used. Additionally, on the previously studied special case of single object reference, we show a 35% relative error reduction over previous state of the art.
Author: William Yang Wang ; Edward Lin ; John Kominek
Abstract: We propose a Laplacian structured sparsity model to study computational branding analytics. To do this, we collected customer reviews from Starbucks, Dunkin’ Donuts, and other coffee shops across 38 major cities in the Midwest and Northeastern regions of USA. We study the brand related language use through these reviews, with focuses on the brand satisfaction and gender factors. In particular, we perform three tasks: automatic brand identification from raw text, joint brand-satisfaction prediction, and joint brandgender-satisfaction prediction. This work extends previous studies in text classification by incorporating the dependency and interaction among local features in the form of structured sparsity in a log-linear model. Our quantitative evaluation shows that our approach which combines the advantages of graphical modeling and sparsity modeling techniques significantly outperforms various standard and stateof-the-art text classification algorithms. In addition, qualitative analysis of our model reveals important features of the language uses associated with the specific brands.
5 0.077502087 185 emnlp-2013-Towards Situated Dialogue: Revisiting Referring Expression Generation
Author: Rui Fang ; Changsong Liu ; Lanbo She ; Joyce Y. Chai
Abstract: In situated dialogue, humans and agents have mismatched capabilities of perceiving the shared environment. Their representations of the shared world are misaligned. Thus referring expression generation (REG) will need to take this discrepancy into consideration. To address this issue, we developed a hypergraph-based approach to account for group-based spatial relations and uncertainties in perceiving the environment. Our empirical results have shown that this approach outperforms a previous graph-based approach with an absolute gain of 9%. However, while these graph-based approaches perform effectively when the agent has perfect knowledge or perception of the environment (e.g., 84%), they perform rather poorly when the agent has imperfect perception of the environment (e.g., 45%). This big performance gap calls for new solutions to REG that can mediate a shared perceptual basis in situated dialogue.
6 0.077178165 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations
7 0.064169616 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors
8 0.060957931 37 emnlp-2013-Automatically Identifying Pseudepigraphic Texts
9 0.050587967 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
11 0.04833208 153 emnlp-2013-Predicting the Resolution of Referring Expressions from User Behavior
12 0.046283297 134 emnlp-2013-Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks
13 0.042655874 148 emnlp-2013-Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching
14 0.040858895 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification
15 0.04066686 98 emnlp-2013-Image Description using Visual Dependency Representations
16 0.038848471 11 emnlp-2013-A Multimodal LDA Model integrating Textual, Cognitive and Visual Modalities
17 0.037837055 67 emnlp-2013-Easy Victories and Uphill Battles in Coreference Resolution
18 0.037249643 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing
19 0.036492173 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering
20 0.036317814 138 emnlp-2013-Naive Bayes Word Sense Induction
topicId topicWeight
[(0, -0.122), (1, 0.04), (2, -0.048), (3, 0.011), (4, -0.009), (5, 0.126), (6, -0.004), (7, -0.047), (8, -0.09), (9, -0.039), (10, -0.131), (11, -0.017), (12, -0.009), (13, -0.0), (14, -0.007), (15, -0.049), (16, -0.058), (17, -0.001), (18, -0.042), (19, 0.048), (20, -0.003), (21, -0.042), (22, 0.051), (23, -0.013), (24, -0.144), (25, -0.057), (26, -0.093), (27, -0.043), (28, 0.002), (29, 0.018), (30, 0.058), (31, 0.123), (32, -0.042), (33, -0.088), (34, 0.115), (35, 0.027), (36, -0.017), (37, 0.048), (38, -0.071), (39, -0.054), (40, 0.079), (41, 0.235), (42, 0.071), (43, -0.022), (44, 0.077), (45, -0.033), (46, -0.044), (47, 0.089), (48, 0.016), (49, 0.107)]
simIndex simValue paperId paperTitle
same-paper 1 0.9508338 44 emnlp-2013-Centering Similarity Measures to Reduce Hubs
Author: Ikumi Suzuki ; Kazuo Hara ; Masashi Shimbo ; Marco Saerens ; Kenji Fukumizu
Abstract: The performance of nearest neighbor methods is degraded by the presence of hubs, i.e., objects in the dataset that are similar to many other objects. In this paper, we show that the classical method of centering, the transformation that shifts the origin of the space to the data centroid, provides an effective way to reduce hubs. We show analytically why hubs emerge and why they are suppressed by centering, under a simple probabilistic model of data. To further reduce hubs, we also move the origin more aggressively towards hubs, through weighted centering. Our experimental results show that (weighted) centering is effective for natural language data; it improves the performance of the k-nearest neighbor classi- fiers considerably in word sense disambiguation and document classification tasks.
2 0.59035277 185 emnlp-2013-Towards Situated Dialogue: Revisiting Referring Expression Generation
Author: Rui Fang ; Changsong Liu ; Lanbo She ; Joyce Y. Chai
Abstract: In situated dialogue, humans and agents have mismatched capabilities of perceiving the shared environment. Their representations of the shared world are misaligned. Thus referring expression generation (REG) will need to take this discrepancy into consideration. To address this issue, we developed a hypergraph-based approach to account for group-based spatial relations and uncertainties in perceiving the environment. Our empirical results have shown that this approach outperforms a previous graph-based approach with an absolute gain of 9%. However, while these graph-based approaches perform effectively when the agent has perfect knowledge or perception of the environment (e.g., 84%), they perform rather poorly when the agent has imperfect perception of the environment (e.g., 45%). This big performance gap calls for new solutions to REG that can mediate a shared perceptual basis in situated dialogue.
Author: William Yang Wang ; Edward Lin ; John Kominek
Abstract: We propose a Laplacian structured sparsity model to study computational branding analytics. To do this, we collected customer reviews from Starbucks, Dunkin’ Donuts, and other coffee shops across 38 major cities in the Midwest and Northeastern regions of USA. We study the brand related language use through these reviews, with focuses on the brand satisfaction and gender factors. In particular, we perform three tasks: automatic brand identification from raw text, joint brand-satisfaction prediction, and joint brandgender-satisfaction prediction. This work extends previous studies in text classification by incorporating the dependency and interaction among local features in the form of structured sparsity in a log-linear model. Our quantitative evaluation shows that our approach which combines the advantages of graphical modeling and sparsity modeling techniques significantly outperforms various standard and stateof-the-art text classification algorithms. In addition, qualitative analysis of our model reveals important features of the language uses associated with the specific brands.
4 0.47021899 153 emnlp-2013-Predicting the Resolution of Referring Expressions from User Behavior
Author: Nikos Engonopoulos ; Martin Villalba ; Ivan Titov ; Alexander Koller
Abstract: We present a statistical model for predicting how the user of an interactive, situated NLP system resolved a referring expression. The model makes an initial prediction based on the meaning of the utterance, and revises it continuously based on the user’s behavior. The combined model outperforms its components in predicting reference resolution and when to give feedback.
5 0.4303017 17 emnlp-2013-A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations
Author: Shashank Srivastava ; Dirk Hovy ; Eduard Hovy
Abstract: In this paper, we propose a walk-based graph kernel that generalizes the notion of treekernels to continuous spaces. Our proposed approach subsumes a general framework for word-similarity, and in particular, provides a flexible way to incorporate distributed representations. Using vector representations, such an approach captures both distributional semantic similarities among words as well as the structural relations between them (encoded as the structure of the parse tree). We show an efficient formulation to compute this kernel using simple matrix operations. We present our results on three diverse NLP tasks, showing state-of-the-art results.
6 0.42717254 119 emnlp-2013-Learning Distributions over Logical Forms for Referring Expression Generation
8 0.39963722 78 emnlp-2013-Exploiting Language Models for Visual Recognition
9 0.38615423 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes
10 0.38613078 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors
11 0.38151437 137 emnlp-2013-Multi-Relational Latent Semantic Analysis
12 0.366932 37 emnlp-2013-Automatically Identifying Pseudepigraphic Texts
13 0.36524758 195 emnlp-2013-Unsupervised Spectral Learning of WCFG as Low-rank Matrix Completion
15 0.31542927 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
16 0.29129454 182 emnlp-2013-The Topology of Semantic Knowledge
17 0.29010507 98 emnlp-2013-Image Description using Visual Dependency Representations
18 0.28802016 165 emnlp-2013-Scaling to Large3 Data: An Efficient and Effective Method to Compute Distributional Thesauri
19 0.2838755 134 emnlp-2013-Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks
20 0.27461651 95 emnlp-2013-Identifying Multiple Userids of the Same Author
topicId topicWeight
[(3, 0.031), (6, 0.01), (9, 0.01), (18, 0.029), (22, 0.034), (30, 0.057), (50, 0.027), (51, 0.126), (57, 0.405), (66, 0.03), (71, 0.04), (75, 0.031), (96, 0.044), (97, 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.71715212 44 emnlp-2013-Centering Similarity Measures to Reduce Hubs
Author: Ikumi Suzuki ; Kazuo Hara ; Masashi Shimbo ; Marco Saerens ; Kenji Fukumizu
Abstract: The performance of nearest neighbor methods is degraded by the presence of hubs, i.e., objects in the dataset that are similar to many other objects. In this paper, we show that the classical method of centering, the transformation that shifts the origin of the space to the data centroid, provides an effective way to reduce hubs. We show analytically why hubs emerge and why they are suppressed by centering, under a simple probabilistic model of data. To further reduce hubs, we also move the origin more aggressively towards hubs, through weighted centering. Our experimental results show that (weighted) centering is effective for natural language data; it improves the performance of the k-nearest neighbor classi- fiers considerably in word sense disambiguation and document classification tasks.
2 0.61338454 191 emnlp-2013-Understanding and Quantifying Creativity in Lexical Composition
Author: Polina Kuznetsova ; Jianfu Chen ; Yejin Choi
Abstract: Why do certain combinations of words such as “disadvantageous peace ” or “metal to the petal” appeal to our minds as interesting expressions with a sense of creativity, while other phrases such as “quiet teenager”, or “geometrical base ” not as much? We present statistical explorations to understand the characteristics of lexical compositions that give rise to the perception of being original, interesting, and at times even artistic. We first examine various correlates of perceived creativity based on information theoretic measures and the connotation of words, then present experiments based on supervised learning that give us further insights on how different aspects of lexical composition collectively contribute to the perceived creativity.
3 0.36648238 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
Author: Zhongqing Wang ; Shoushan LI ; Fang Kong ; Guodong Zhou
Abstract: Personal profile information on social media like LinkedIn.com and Facebook.com is at the core of many interesting applications, such as talent recommendation and contextual advertising. However, personal profiles usually lack organization confronted with the large amount of available information. Therefore, it is always a challenge for people to find desired information from them. In this paper, we address the task of personal profile summarization by leveraging both personal profile textual information and social networks. Here, using social networks is motivated by the intuition that, people with similar academic, business or social connections (e.g. co-major, co-university, and cocorporation) tend to have similar experience and summaries. To achieve the learning process, we propose a collective factor graph (CoFG) model to incorporate all these resources of knowledge to summarize personal profiles with local textual attribute functions and social connection factors. Extensive evaluation on a large-scale dataset from LinkedIn.com demonstrates the effectiveness of the proposed approach. 1
4 0.35922024 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
Author: Jason Weston ; Antoine Bordes ; Oksana Yakhnenko ; Nicolas Usunier
Abstract: This paper proposes a novel approach for relation extraction from free text which is trained to jointly use information from the text and from existing knowledge. Our model is based on scoring functions that operate by learning low-dimensional embeddings of words, entities and relationships from a knowledge base. We empirically show on New York Times articles aligned with Freebase relations that our approach is able to efficiently use the extra information provided by a large subset of Freebase data (4M entities, 23k relationships) to improve over methods that rely on text features alone.
5 0.35911813 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs
Author: Xinjie Zhou ; Xiaojun Wan ; Jianguo Xiao
Abstract: Microblog messages pose severe challenges for current sentiment analysis techniques due to some inherent characteristics such as the length limit and informal writing style. In this paper, we study the problem of extracting opinion targets of Chinese microblog messages. Such fine-grained word-level task has not been well investigated in microblogs yet. We propose an unsupervised label propagation algorithm to address the problem. The opinion targets of all messages in a topic are collectively extracted based on the assumption that similar messages may focus on similar opinion targets. Topics in microblogs are identified by hashtags or using clustering algorithms. Experimental results on Chinese microblogs show the effectiveness of our framework and algorithms.
6 0.35841691 69 emnlp-2013-Efficient Collective Entity Linking with Stacking
7 0.3580144 143 emnlp-2013-Open Domain Targeted Sentiment
8 0.35799077 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types
10 0.35760865 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
11 0.3571406 80 emnlp-2013-Exploiting Zero Pronouns to Improve Chinese Coreference Resolution
12 0.3569364 130 emnlp-2013-Microblog Entity Linking by Leveraging Extra Posts
13 0.35637566 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction
14 0.35618705 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
15 0.35512424 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes
17 0.3546426 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery
18 0.35455909 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors
19 0.35424629 16 emnlp-2013-A Unified Model for Topics, Events and Users on Twitter
20 0.35324302 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction