emnlp emnlp2012 emnlp2012-53 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Gemma Boleda ; Eva Maria Vecchi ; Miquel Cornudella ; Louise McNally
Abstract: Adjectival modification, particularly by expressions that have been treated as higher-order modifiers in the formal semantics tradition, raises interesting challenges for semantic composition in distributional semantic models. We contrast three types of adjectival modifiers: intersectively used color terms (as in white towel, clearly first-order), subsectively used color terms (white wine, which have been modeled as both first- and higher-order), and intensional adjectives (former bassist, clearly higher-order), and test the ability of different composition strategies to model their behavior. In addition to opening up a new empirical domain for research on distributional semantics, our observations concerning the attested vectors for the different types of adjectives, the nouns they modify, and the resulting noun phrases yield insights into modification that have been little evident in the formal semantics literature to date.
Reference: text
sentIndex sentText sentNum sentScore
1 higher-order modification in distributional semantics Gemma Boleda Linguistics Department University of Texas at Austin gemma . [sent-2, score-0.283]
2 Abstract Adjectival modification, particularly by expressions that have been treated as higher-order modifiers in the formal semantics tradition, raises interesting challenges for semantic composition in distributional semantic models. [sent-8, score-0.461]
3 Higher-order modification (that is, modification that cannot obviously be modeled as property intersection, in contrast to first-order modification, which can) presents one such challenge, as we will detail in the next section. [sent-20, score-0.367]
4 be given a well-motivated first-order or higher-order analysis; and 3) intensional adjectives (e. [sent-27, score-0.519]
5 Second, we test how five different composition functions that have been proposed in recent literature fare in predicting the attested properties of nominals modified by each type of adjective. [sent-32, score-0.284]
6 Section 4 describes the characteristics of the different types of adjectival modification, and Section 5, the results of the composition operations. [sent-37, score-0.265]
7 2 The semantics of adjectival modification. Accounting for inference in language is an important concern of semantic theory. [sent-39, score-0.321]
8 Perhaps for this reason, within the formal semantics tradition the most influential classification of adjectives is based on the inferences they license (see (Parsons, 1970) and (Kamp, 1975) for early discussion). [sent-40, score-0.33]
9 First, so-called intersective adjectives, such as (the literally used) white in white dress, yield the inference that both the property contributed by the adjective and that contributed by the noun hold of the individual described; in other words, a white dress is white and is a dress. [sent-42, score-1.55]
10 On the other extreme, intensional adjectives, such as former or alleged in former/alleged criminal, do not license the inference that either of the properties holds of the individual to which the modified nominal is ascribed. [sent-44, score-0.415]
11 Finally, subsective adjectives such as (the nonliterally-used) white in white wine constitute an intermediate case: they license the inference that the property denoted by the noun holds of the individual being described, but not the property contributed by the adjective. [sent-51, score-1.108]
12 That is, white wine is not white but rather a color that we would probably call some shade of yellow. [sent-52, score-0.814]
13 This use of color terms, in general, is distinguished primarily by the fact that color serves as a proxy for another property that is related to color (e. [sent-53, score-1.001]
14 type of grape), though the color in question may or may not match the color identified by the adjective on the intersective use (see (Gärdenfors, 2000) and (Kennedy and McNally, 2010) for discussion and analysis). [sent-55, score-1.266]
15 This use of color terms can be modeled by property intersection in formal semantic models only if the term is previously disambiguated or allowed to depend on context for its precise denotation. [sent-58, score-0.479]
16 However, it is easily modeled if the adjective denotes a (higher-order) function from properties (e. [sent-59, score-0.252]
17 that denoted by wine) to properties (that denoted by white wine), since the output of the function denoted by the color term can be made to depend on the input it receives from the noun meaning. [sent-61, score-0.614]
18 Nonetheless, there is ample evidence in natural language that a first-order analysis of the subsective color terms would be preferable, as they share more features with predicative adjectives such as happy than they do with adjectives such as former. [sent-62, score-1.151]
19 The trio of intersective color terms, subsective color terms, and intensional adjectives provides fertile ground for exploring the different composition functions that have been proposed for distributional semantic representations. [sent-63, score-2.318]
20 Most of these functions start from the assumption that composition takes pairs of vectors (e. [sent-64, score-0.347]
21 Such functions, insofar as they yield representations which strengthen distributional features shared by the component vectors, would be expected to model intersective modification. [sent-69, score-0.555]
22 Combining the two vectors with an additive or multiplicative operation should rightly yield a vector for white dress which assigns a higher frequency to wedding than to funeral. [sent-76, score-0.55]
23 Additive and multiplicative functions might also be expected to handle subsective modification with some success because these operations provide a natural account for how polysemy is resolved in meaning composition. [sent-77, score-0.7]
24 Thus, the vector that results from adding or multiplying the vector for white with that for dress should differ in crucial features from the one that results from combining the same vector for white with that for wine. [sent-78, score-0.618]
25 In contrast, it is not immediately obvious how these operations would fare with intensional adjectives such as former. [sent-80, score-0.578]
26 In particular, it is not clear what specific distributional features of the adjective would capture the effect that the adjective has on the meaning of the resulting modified nominal. [sent-81, score-0.292]
27 On such models, the distributional properties of observed occurrences of adjective-noun pairs are used to induce the effect of adjectives on nouns. [sent-83, score-0.329]
28 There is also no a priori reason to think that it would fare more poorly at modeling the intersective and subsective adjectives than would additive or multiplicative analyses, given its generality. [sent-85, score-1.118]
29 3 Method. We built a semantic space and tested the composition functions as specified in what follows. [sent-87, score-0.288]
30 2 Vocabulary. The core vocabulary of the semantic space consists of the 8K most frequent nouns and the 4K most frequent adjectives from the corpus. [sent-108, score-0.292]
31 For each function, we define p as the composition of the adjective vector, u, and the noun vector, v, a nomenclature that follows Mitchell and Lapata (2010). [sent-127, score-0.432]
32 Additive (add) AN vectors were obtained by summing the corresponding adjective and noun vectors. [sent-128, score-0.383]
33 We also explored the effects of the additive model with normalized component adjective and noun vectors (addn). [sent-129, score-0.458]
34 p = u + v (2). Multiplicative (mult) AN vectors were obtained by component-wise multiplication of the adjective and noun vectors in the non-reduced semantic space. [sent-130, score-0.571]
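As a concrete illustration of the add, addn, and mult functions just defined, here is a minimal sketch in Python/numpy; the function names and toy vectors are ours, not taken from the paper's implementation.

    import numpy as np

    def add(u, v):
        # Additive composition: p = u + v
        return u + v

    def addn(u, v):
        # Normalized addition: each component vector is length-normalized before summing
        return u / np.linalg.norm(u) + v / np.linalg.norm(v)

    def mult(u, v):
        # Multiplicative composition: component-wise product in the non-reduced space
        return u * v

    u = np.array([2.0, 0.5, 1.0])  # stand-in adjective vector (e.g., "white")
    v = np.array([1.0, 3.0, 0.0])  # stand-in noun vector (e.g., "dress")
    print(add(u, v), addn(u, v), mult(u, v))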
35 An AN vector is obtained by multiplying the weight matrix by the concatenation of the adjective and noun vectors, so that each dimension of the generated AN vector is a linear combination of dimensions of the corresponding adjective and noun vectors. [sent-146, score-0.707]
36 Coefficient matrix estimation is performed by feeding PLSR a set of input-output examples, where the input is given by concatenated adjective and noun vectors, and the output is the vector of the corresponding AN directly extracted from our semantic space. [sent-150, score-0.382]
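A hedged sketch of the lim estimation step described above: a single weight matrix is learned by partial least squares regression from concatenated adjective-noun vectors to the corpus-observed AN vectors. The dimensionalities, the number of latent components, and the random stand-in data below are assumptions for illustration only.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression

    n_pairs, dim = 500, 300                      # placeholder sizes, not the paper's
    rng = np.random.default_rng(0)
    X = rng.normal(size=(n_pairs, 2 * dim))      # stand-in for concatenated [u; v] training inputs
    Y = rng.normal(size=(n_pairs, dim))          # stand-in for observed AN vectors from the space

    pls = PLSRegression(n_components=50)         # latent dimensionality is an assumption
    pls.fit(X, Y)

    def lim(u, v):
        # Composed AN = learned weight matrix applied to the concatenation of u and v
        return pls.predict(np.concatenate([u, v])[None, :])[0]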
37 The linear equation coefficients are estimated again using PLSR, and in the present implementation we use ridge regression with generalized cross-validation (GCV) to automatically choose the optimal ridge parameter for each adjective (Golub et al. [sent-155, score-0.263]
38 The model is trained on all N-AN vector pairs available in the semantic space for each adjective; these range from 100 to over 1K items across the adjectives we tested. [sent-158, score-0.295]
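By contrast with lim, alm fits one linear map per adjective from noun vectors to the observed AN vectors for that adjective. The sketch below uses scikit-learn's RidgeCV, whose efficient leave-one-out selection plays the role of the GCV step; the alpha grid and function names are assumptions, not values from the paper.

    import numpy as np
    from sklearn.linear_model import RidgeCV

    def train_alm(noun_vectors, an_vectors, alphas=(0.1, 1.0, 10.0, 100.0)):
        # One adjective-specific map; rows are the 100 to 1K+ N-AN training pairs for this adjective
        model = RidgeCV(alphas=alphas)
        model.fit(noun_vectors, an_vectors)
        return model

    def alm(model, v):
        # Composed AN for a new noun vector v under this adjective's map
        return model.predict(v[None, :])[0]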
39 3 Datasets. We built two datasets of adjective-noun phrases for the present research, one with color terms and one with intensional adjectives. [sent-160, score-0.682]
40 white photograph, for black and white photograph) or because the head noun was semantically transparent (white variety). [sent-166, score-0.482]
41 The remaining 369 ANs were tagged independently by the second and fourth authors of this paper, both native English-speaking linguists, as intersective (e. [sent-167, score-0.417]
42 (to appear) for an analysis of the color term dataset from a multimodal perspective. [sent-183, score-0.326]
43 7 There were too few instances of idioms (17) for a quantitative analysis of the sort presented here, so these are collapsed with the subsective class in what follows. [sent-196, score-0.395]
44 8 The dataset as used here consists of 239 intersective and 130 subsective ANs. [sent-197, score-0.812]
45 The intensional dataset contains all ANs in the semantic space with a preselected list of 10 intensional adjectives, manually pruned by one of the authors of the paper to eliminate erroneous examples and to ensure that the adjective was being intensionally used. [sent-199, score-0.918]
46 9 Alleged, one of the most prototypical intensional adjectives, is not considered here because it was not among the 700 most frequent adjectives in the space. [sent-220, score-0.547]
47 Table 1: Example ANs in the datasets. Intersective / Subsective / Intensional: white towel / white wine / artificial leg; black sack / black athlete / former bassist; green coat / green politics / likely suspect; red disc / red ant / possible delay; blue square / blue state / theoretical limit. [sent-222, score-0.745]
48 4 Observed vectors. We began by exploring the empirically observed vectors for the adjectives (A), nouns (N), and adjective-noun phrases (AN) in the datasets, as they are represented in the semantic space. [sent-223, score-0.546]
49 Note that we are working with the AN vectors directly harvested from the corpora (that is, based on the cooccurrence of, say, the phrase white towel with each of the 10K words in the space dimensions), without doing any composition. [sent-224, score-0.371]
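A minimal sketch of what "directly harvested" means here: counting how often the phrase co-occurs with each context word in a fixed window over a tokenized corpus. The window size and tokenization are assumptions, and the paper applies an association weighting to such counts rather than using them raw.

    import numpy as np

    def harvest_phrase_vector(tokens, phrase, context_words, window=5):
        # Raw co-occurrence counts of a two-word phrase (e.g., ("white", "towel"))
        # with each context word, within a symmetric window around the phrase
        idx = {w: i for i, w in enumerate(context_words)}
        vec = np.zeros(len(context_words))
        first, second = phrase
        for i in range(len(tokens) - 1):
            if tokens[i] == first and tokens[i + 1] == second:
                lo, hi = max(0, i - window), min(len(tokens), i + 2 + window)
                for w in tokens[lo:i] + tokens[i + 2:hi]:
                    if w in idx:
                        vec[idx[w]] += 1
        return vec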
50 AN vectors obtained by composition will be examined in the following section. [sent-225, score-0.289]
51 Figure 1 shows the distribution of the cosines between A, N, and AN vectors with intensional adjectives (I, white box), intersective uses of color terms (IE, lighter gray box), and subsective uses of color terms (S, darker gray box). [sent-227, score-2.414]
52 We find significant differences between the three types of adjectives in the similarity between AN and A vectors (middle graph of Figure 1). [sent-231, score-0.309]
53 The adjective and adjective-noun phrase vectors are nearer for (footnote 10: The frequency of the adjectives in the datasets range from 3.) [sent-232, score-0.506]
54 We report the cosines between the component adjective and noun vectors (cos(A,N)), between the observed AN and adjective vectors (cos(AN,A)), and between the observed AN and noun vectors (cos(AN,N)). [sent-241, score-1.063]
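A small sketch of the three measures reported in Figure 1, assuming a, n, and an are the corpus-harvested adjective, noun, and AN vectors (the names are ours).

    import numpy as np

    def cos(x, y):
        # Cosine similarity between two context vectors
        return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

    def an_similarities(an, a, n):
        # The three comparisons behind Figure 1
        return {"cos(A,N)": cos(a, n), "cos(AN,A)": cos(an, a), "cos(AN,N)": cos(an, n)}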
55 Each chart contains three boxplots with the distribution of the cosine scores (y-axis) for the intensional (I), intersective (IE), and subsective (S) types of ANs. [sent-242, score-1.187]
56 intersective uses than for subsective uses of color terms, a pattern that parallels the difference in the distance between component A and N vectors. [sent-249, score-1.171]
57 As for intensional adjectives, the middle graph shows that their AN vectors are quite distant from the corresponding A vectors, in sharp contrast to what we find with both intersective and subsective color terms. [sent-252, score-1.588]
58 We hypothesize that the results for the intensional adjectives are due to the fact that they cannot plausibly be modeled as first order attributes (i. [sent-253, score-0.545]
59 being potential or apparent is not a property in the same sense that being white or yellow is) and thus typically do not restrict the nominal description per se, but rather provide information about whether or when the nominal description applies. [sent-255, score-0.308]
60 The result is that intensional adjectives should be even weaker than subsectively used adjectives, in comparison with the nouns with which they combine, in their ability to “pull” the AN vector in their direction. [sent-256, score-0.628]
61 An examination of the average distances among the nearest neighbors of the intensional and of the color adjectives in the distributional space supports our hypothesized account of their contrasting behaviors. [sent-259, score-0.996]
62 We predict that the nearest neighbors are more dispersed for adjectives that cannot be modeled as first-order properties (i. [sent-260, score-0.298]
63 , intensional adjectives), than for those that can (here, the color terms). [sent-262, score-0.656]
64 We find that the average cosine distance among the nearest ten neighbors of the intensional adjectives is 0. [sent-263, score-0.596]
65 001) than the average similarity among the nearest neighbors of the color adjectives, 0. [sent-266, score-0.38]
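A sketch of the dispersion measure used in this comparison: the average pairwise cosine among a word's ten nearest neighbors in the space (lower average similarity means a more dispersed neighborhood). The row-indexed space matrix and variable names are assumptions.

    import numpy as np

    def neighbor_dispersion(space, target_idx, k=10):
        # Average pairwise cosine among the k nearest neighbors of row target_idx
        normed = space / np.linalg.norm(space, axis=1, keepdims=True)
        sims = normed @ normed[target_idx]
        sims[target_idx] = -np.inf                 # exclude the target itself
        nn = np.argsort(sims)[-k:]                 # indices of the k nearest neighbors
        pair_sims = normed[nn] @ normed[nn].T
        iu = np.triu_indices(k, k=1)               # each unordered neighbor pair once
        return float(pair_sims[iu].mean())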
66 Finally, with respect to the distances between the adjective-noun and head noun vectors (right graph of Figure 1), there is no significant difference for the intersective vs. [sent-269, score-0.603]
67 This can be explained by the fact that both kinds of modifiers are subsective, that is, the fact that a white dress is a dress and that white wine is wine. [sent-271, score-0.701]
68 In contrast, intensional ANs are closer to their component Ns than are color ANs (the difference is qualitatively quite small, but significant even for the intersective vs. [sent-272, score-1.106]
69 intensional ANs according to a t-test, p-value = 0. [sent-273, score-0.33]
70 This effect, the inverse of what we find with the AN-A vectors, can similarly be explained by the fact that intensional adjectives do not restrict the descriptive content of the noun they modify, in contrast to both the intersective and subsective color ANs. [sent-275, score-1.723]
71 Finally, note that, contrary to predictions from some approaches in formal semantics, subsective color ANs and intensional ANs do not pattern together: subsective ANs are closer to their component As, and intensional ANs closer to their component Ns. [sent-279, score-1.884]
72 5 Composed vectors. Since intersective modification is the point of comparison for both subsective and intensional modification, we first discuss the composed vectors for the intersective vs. [sent-282, score-1.972]
73 subsective uses of color terms, and then turn to intersective vs. [sent-283, score-1.138]
74 Table 2 provides a summary of the results with the observed data (obs) and the composition functions discussed in Section 3. [sent-287, score-0.266]
75 It is computed by finding the cosine between the composed AN vectors and all rows in the semantic space and then determining the rank in which the observed ANs are found. [sent-290, score-0.27]
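A sketch of the rank-of-observed-equivalent (ROE) computation just described: all rows of the semantic space are ranked by cosine to the composed AN vector, and the rank of the row holding the corpus-observed AN is reported. Names are illustrative.

    import numpy as np

    def rank_of_observed_equivalent(composed_an, space, observed_idx):
        # Rank (1 = nearest) of the observed AN row when space rows are
        # sorted by cosine similarity to the composed AN vector
        normed = space / np.linalg.norm(space, axis=1, keepdims=True)
        q = composed_an / np.linalg.norm(composed_an)
        sims = normed @ q
        return int(1 + np.sum(sims > sims[observed_idx]))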
76 11 The remaining columns report the differences in standardized (z-score) cosines between the vector built with each of the composition functions and the observed AN, A, and N vectors. [sent-291, score-0.377]
77 A positive value means that the cosines for intersective uses are higher, while a negative value means that the cosines for subsective uses are higher. [sent-292, score-0.944]
78 The first column reports the rank of the observed equivalent (ROE), the rest report the differences (∆) between the intersective and subsective uses of color terms when comparing the composed AN with the observed vectors for: AN, adjective (A), noun (N). [sent-296, score-1.326]
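One plausible reading of the ∆ columns, as a hedged sketch: the relevant cosines are z-scored and the mean for subsective items is subtracted from the mean for intersective items, so a positive value means higher similarity for the intersective uses. The exact standardization used in the paper may differ.

    import numpy as np

    def delta_zscore(cosines, is_intersective):
        # Difference in standardized cosines: intersective mean minus subsective mean
        cosines = np.asarray(cosines, dtype=float)
        z = (cosines - cosines.mean()) / cosines.std()
        mask = np.asarray(is_intersective, dtype=bool)
        return float(z[mask].mean() - z[~mask].mean())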
79 In both cases, we find that these functions yield higher similarities for AN-A for the intersective than for the subsective uses of color terms, and a very slight (though still mildly significant) difference for the distance to the head noun. [sent-303, score-1.196]
80 This suggests that, for adjectival modification, providing a vector that is in the middle of the two component vectors (which is what normalized addition does) is a reasonable approximation of the observed vectors. [sent-305, score-0.333]
81 The non-normalized version also cannot account for these effects because the adjective vector, being much longer (as color terms are very frequent), totally dominates the AN, which results in no difference across uses when comparing to the adjective or to the noun. [sent-307, score-0.746]
82 A possible explanation for the AN-A results is that lim learns from such a broad range of AN pairs that the impact of the distance between intersective vs. [sent-310, score-0.478]
83 subsective uses of color terms from their component adjectives is dampened. [sent-311, score-0.969]
84 All composition functions except for alm find intersective uses easier to model. [sent-313, score-0.798]
85 This is shown in the positive values in column ∆:AN, which mean that the similarity between observed and composed AN vectors is greater for intersective than for subsective ANs. [sent-314, score-0.998]
86 The subsective uses are specific to the nouns with which the color terms combine, and the exact interpretation of the adjective varies across those nouns. [sent-316, score-1.011]
87 In contrast, the interpretation associated with intersective use is consistent across a larger variety of nouns, and in that sense should be predominantly reflected in the adjective’s vector. [sent-317, score-0.442]
88 And indeed, alm is the only function that shows no difference in difficulty (distance) between the predicted and observed AN vectors for intersective vs. [sent-319, score-0.73]
89 Both mult and alm seem to account for the observed patterns in color terms. [sent-321, score-0.545]
90 However, an examination of the nearest neighbors of the composed ANs suggest that alm captures the semantics of adjective composition in this case to a larger extent than mult. [sent-322, score-0.644]
91 For instance, the nearest neighbors (NN) for blue square (intersective) are the following according to mult: blue, red, official colour, traditional colour, blue number, yellow; while alm yields the following: blue square, red square, blue circle, blue triangle, blue pattern, yellow circle. [sent-323, score-0.543]
92 Similarly, for green politics (subsective) mult yields: pleasant land, green business, green politics, green issue, green strategy, green product, while alm yields: green politics, green movement, political agenda, environmental movement, progressive government, political initiative. [sent-324, score-0.851]
93 2 Intensional modification. Table 3 contains the results of the composition functions comparing the behavior of intersective color ANs and intensional ANs. [sent-326, score-1.446]
94 As noted above, we expect more difficulty in modeling intensional modification vs. [sent-334, score-0.476]
95 The difference with the results in the previous subsection is that in this case the alm function does present a higher difficulty in modeling intensional ANs, unlike with the color terms. [sent-337, score-0.81]
96 This points to a qualitative difference between subsective and intensional adjectives that could be evidence for a first-order analysis of subsective color terms. [sent-338, score-1.635]
97 Again, alm seems to be capturing relevant semantic aspects of composition with intensional adjectives. [sent-344, score-0.689]
98 Our results also show that alm performs better than lim, but it is worth observing that it does so at the expense of modeling each adjective as a completely different function. [sent-348, score-0.351]
99 However, the linguistic literature and the present results suggest that it might be useful to try a compromise between alm and lim, training one matrix for each subclass of adjectives under analysis. [sent-352, score-0.381]
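As a hedged sketch of that compromise (an illustration of the suggestion, not the authors' implementation): pool the N-AN training pairs of all adjectives in a subclass, such as intersective, subsective, or intensional, and fit a single map per subclass instead of one per adjective.

    import numpy as np
    from sklearn.linear_model import RidgeCV

    def train_subclass_maps(pairs_by_adjective, subclass_of):
        # pairs_by_adjective: {adj: (noun_matrix, an_matrix)}; subclass_of: {adj: class label}
        pooled = {}
        for adj, (N, AN) in pairs_by_adjective.items():
            Ns, ANs = pooled.setdefault(subclass_of[adj], ([], []))
            Ns.append(N)
            ANs.append(AN)
        # one ridge map per adjective subclass, trained on the pooled pairs
        return {label: RidgeCV(alphas=(0.1, 1.0, 10.0)).fit(np.vstack(Ns), np.vstack(ANs))
                for label, (Ns, ANs) in pooled.items()}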
100 Beyond the new data it offers regarding the comparative ability of the different composition functions to account for different kinds of adjectival modification, the study presented here underscores the complexity of modification as a semantic phenomenon. [sent-353, score-0.527]
wordName wordTfidf (topN-words)
[('intersective', 0.417), ('subsective', 0.395), ('intensional', 0.33), ('color', 0.326), ('ans', 0.221), ('adjective', 0.197), ('white', 0.193), ('adjectives', 0.189), ('composition', 0.169), ('alm', 0.154), ('modification', 0.146), ('vectors', 0.12), ('wine', 0.102), ('adjectival', 0.096), ('roe', 0.088), ('baroni', 0.081), ('green', 0.08), ('dress', 0.075), ('distributional', 0.072), ('noun', 0.066), ('cosines', 0.066), ('modifiers', 0.063), ('lim', 0.061), ('zamparelli', 0.06), ('functions', 0.058), ('politics', 0.057), ('guevara', 0.055), ('blue', 0.053), ('multiplicative', 0.047), ('vector', 0.045), ('bassist', 0.044), ('dilation', 0.044), ('lmi', 0.044), ('plsr', 0.044), ('semantics', 0.043), ('additive', 0.042), ('formal', 0.042), ('nouns', 0.042), ('cos', 0.04), ('yellow', 0.04), ('observed', 0.039), ('matrix', 0.038), ('mitchell', 0.038), ('median', 0.037), ('semantic', 0.036), ('lapata', 0.034), ('component', 0.033), ('addn', 0.033), ('insofar', 0.033), ('leg', 0.033), ('ridge', 0.033), ('towel', 0.033), ('multiplication', 0.032), ('erk', 0.032), ('dimensions', 0.031), ('operations', 0.031), ('red', 0.031), ('former', 0.03), ('black', 0.03), ('tradition', 0.03), ('properties', 0.029), ('fare', 0.028), ('prototypical', 0.028), ('wedding', 0.028), ('compositional', 0.028), ('composed', 0.027), ('neighbors', 0.027), ('nearest', 0.027), ('modeled', 0.026), ('license', 0.026), ('mult', 0.026), ('firstorder', 0.026), ('nominal', 0.026), ('terms', 0.026), ('matrices', 0.025), ('interpretation', 0.025), ('space', 0.025), ('cosine', 0.023), ('property', 0.023), ('meaning', 0.023), ('grefenstette', 0.022), ('dl', 0.022), ('multiplying', 0.022), ('boleda', 0.022), ('boxplots', 0.022), ('evert', 0.022), ('gemma', 0.022), ('geometry', 0.022), ('golub', 0.022), ('intersectively', 0.022), ('kennedy', 0.022), ('mcnally', 0.022), ('mevik', 0.022), ('obs', 0.022), ('photograph', 0.022), ('rumor', 0.022), ('subsectively', 0.022), ('synthese', 0.022), ('underscores', 0.022), ('vecchi', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999928 53 emnlp-2012-First Order vs. Higher Order Modification in Distributional Semantics
Author: Gemma Boleda ; Eva Maria Vecchi ; Miquel Cornudella ; Louise McNally
Abstract: Adjectival modification, particularly by expressions that have been treated as higher-order modifiers in the formal semantics tradition, raises interesting challenges for semantic composition in distributional semantic models. We contrast three types of adjectival modifiers: intersectively used color terms (as in white towel, clearly first-order), subsectively used color terms (white wine, which have been modeled as both first- and higher-order), and intensional adjectives (former bassist, clearly higher-order), and test the ability of different composition strategies to model their behavior. In addition to opening up a new empirical domain for research on distributional semantics, our observations concerning the attested vectors for the different types of adjectives, the nouns they modify, and the resulting noun phrases yield insights into modification that have been little evident in the formal semantics literature to date.
2 0.18884805 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition
Author: William Blacoe ; Mirella Lapata
Abstract: In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. We experiment with several possible combinations of representation and composition, exhibiting varying degrees of sophistication. Some are shallow while others operate over syntactic structure, rely on parameter learning, or require access to very large corpora. We find that shallow approaches are as good as more computationally intensive alternatives with regards to two particular tests: (1) phrase similarity and (2) paraphrase detection. The sizes of the involved training corpora and the generated vectors are not as important as the fit between the meaning representation and compositional method.
3 0.13704175 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces
Author: Richard Socher ; Brody Huval ; Christopher D. Manning ; Andrew Y. Ng
Abstract: Single-word vector space models have been very successful at learning lexical information. However, they cannot capture the compositional meaning of longer phrases, preventing them from a deeper understanding of language. We introduce a recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. Our model assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases. This matrix-vector RNN can learn the meaning of operators in propositional logic and natural language. The model obtains state of the art performance on three different experiments: predicting fine-grained sentiment distributions of adverb-adjective pairs; classifying sentiment labels of movie reviews and classifying semantic relationships such as cause-effect or topic-message between nouns using the syntactic path between them.
4 0.056264952 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context
Author: Mehmet Ali Yatbaz ; Enis Sert ; Deniz Yuret
Abstract: We investigate paradigmatic representations of word context in the domain of unsupervised syntactic category acquisition. Paradigmatic representations of word context are based on potential substitutes of a word in contrast to syntagmatic representations based on properties of neighboring words. We compare a bigram based baseline model with several paradigmatic models and demonstrate significant gains in accuracy. Our best model based on Euclidean co-occurrence embedding combines the paradigmatic context representation with morphological and orthographic features and achieves 80% many-to-one accuracy on a 45-tag 1M word corpus.
5 0.050092552 10 emnlp-2012-A Statistical Relational Learning Approach to Identifying Evidence Based Medicine Categories
Author: Mathias Verbeke ; Vincent Van Asch ; Roser Morante ; Paolo Frasconi ; Walter Daelemans ; Luc De Raedt
Abstract: Evidence-based medicine is an approach whereby clinical decisions are supported by the best available findings gained from scientific research. This requires efficient access to such evidence. To this end, abstracts in evidence-based medicine can be labeled using a set of predefined medical categories, the socalled PICO criteria. This paper presents an approach to automatically annotate sentences in medical abstracts with these labels. Since both structural and sequential information are important for this classification task, we use kLog, a new language for statistical relational learning with kernels. Our results show a clear improvement with respect to state-of-the-art systems.
6 0.047639858 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP
7 0.046049178 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
8 0.042433891 88 emnlp-2012-Minimal Dependency Length in Realization Ranking
9 0.04238857 80 emnlp-2012-Learning Verb Inference Rules from Linguistically-Motivated Evidence
10 0.037628297 61 emnlp-2012-Grounded Models of Semantic Representation
11 0.037326414 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering
12 0.033417307 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
13 0.032395504 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
14 0.03211385 29 emnlp-2012-Concurrent Acquisition of Word Meaning and Lexical Categories
15 0.030490879 81 emnlp-2012-Learning to Map into a Universal POS Tagset
16 0.029900223 98 emnlp-2012-No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities
17 0.028424701 12 emnlp-2012-A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing
18 0.028377978 49 emnlp-2012-Exploring Topic Coherence over Many Models and Many Topics
19 0.027464248 112 emnlp-2012-Resolving Complex Cases of Definite Pronouns: The Winograd Schema Challenge
20 0.025790211 34 emnlp-2012-Do Neighbours Help? An Exploration of Graph-based Algorithms for Cross-domain Sentiment Classification
topicId topicWeight
[(0, 0.124), (1, 0.032), (2, -0.002), (3, 0.067), (4, 0.057), (5, 0.108), (6, 0.078), (7, 0.071), (8, 0.158), (9, 0.065), (10, -0.28), (11, 0.039), (12, -0.011), (13, -0.055), (14, 0.051), (15, -0.122), (16, 0.104), (17, -0.093), (18, 0.069), (19, -0.015), (20, 0.008), (21, 0.03), (22, -0.128), (23, 0.013), (24, -0.031), (25, 0.054), (26, -0.064), (27, 0.05), (28, -0.054), (29, 0.052), (30, -0.004), (31, 0.015), (32, -0.001), (33, 0.091), (34, -0.021), (35, -0.032), (36, -0.08), (37, 0.064), (38, 0.073), (39, 0.053), (40, 0.196), (41, -0.054), (42, -0.044), (43, 0.092), (44, -0.079), (45, 0.019), (46, -0.125), (47, 0.067), (48, -0.131), (49, -0.043)]
simIndex simValue paperId paperTitle
same-paper 1 0.95909828 53 emnlp-2012-First Order vs. Higher Order Modification in Distributional Semantics
Author: Gemma Boleda ; Eva Maria Vecchi ; Miquel Cornudella ; Louise McNally
Abstract: Adjectival modification, particularly by expressions that have been treated as higher-order modifiers in the formal semantics tradition, raises interesting challenges for semantic composition in distributional semantic models. We contrast three types of adjectival modifiers: intersectively used color terms (as in white towel, clearly first-order), subsectively used color terms (white wine, which have been modeled as both first- and higher-order), and intensional adjectives (former bassist, clearly higher-order), and test the ability of different composition strategies to model their behavior. In addition to opening up a new empirical domain for research on distributional semantics, our observations concerning the attested vectors for the different types of adjectives, the nouns they modify, and the resulting noun phrases yield insights into modification that have been little evident in the formal semantics literature to date.
2 0.73485374 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition
Author: William Blacoe ; Mirella Lapata
Abstract: In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. We experiment with several possible combinations of representation and composition, exhibiting varying degrees of sophistication. Some are shallow while others operate over syntactic structure, rely on parameter learning, or require access to very large corpora. We find that shallow approaches are as good as more computationally intensive alternatives with regards to two particular tests: (1) phrase similarity and (2) paraphrase detection. The sizes of the involved training corpora and the generated vectors are not as important as the fit between the meaning representation and compositional method.
3 0.66991192 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces
Author: Richard Socher ; Brody Huval ; Christopher D. Manning ; Andrew Y. Ng
Abstract: Single-word vector space models have been very successful at learning lexical information. However, they cannot capture the compositional meaning of longer phrases, preventing them from a deeper understanding of language. We introduce a recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. Our model assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases. This matrix-vector RNN can learn the meaning of operators in propositional logic and natural language. The model obtains state of the art performance on three different experiments: predicting fine-grained sentiment distributions of adverb-adjective pairs; classifying sentiment labels of movie reviews and classifying semantic relationships such as cause-effect or topic-message between nouns using the syntactic path between them.
4 0.42268297 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context
Author: Mehmet Ali Yatbaz ; Enis Sert ; Deniz Yuret
Abstract: We investigate paradigmatic representations of word context in the domain of unsupervised syntactic category acquisition. Paradigmatic representations of word context are based on potential substitutes of a word in contrast to syntagmatic representations based on properties of neighboring words. We compare a bigram based baseline model with several paradigmatic models and demonstrate significant gains in accuracy. Our best model based on Euclidean co-occurrence embedding combines the paradigmatic context representation with morphological and orthographic features and achieves 80% many-to-one accuracy on a 45-tag 1M word corpus.
5 0.37122372 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
Author: Wen-tau Yih ; Geoffrey Zweig ; John Platt
Abstract: Existing vector space models typically map synonyms and antonyms to similar word vectors, and thus fail to represent antonymy. We introduce a new vector space representation where antonyms lie on opposite sides of a sphere: in the word vector space, synonyms have cosine similarities close to one, while antonyms are close to minus one. We derive this representation with the aid of a thesaurus and latent semantic analysis (LSA). Each entry in the thesaurus a word sense along with its synonyms and antonyms is treated as a “document,” and the resulting document collection is subjected to LSA. The key contribution of this work is to show how to assign signs to the entries in the co-occurrence matrix on which LSA operates, so as to induce a subspace with the desired property. – – We evaluate this procedure with the Graduate Record Examination questions of (Mohammed et al., 2008) and find that the method improves on the results of that study. Further improvements result from refining the subspace representation with discriminative training, and augmenting the training data with general newspaper text. Altogether, we improve on the best previous results by 11points absolute in F measure.
6 0.35617384 10 emnlp-2012-A Statistical Relational Learning Approach to Identifying Evidence Based Medicine Categories
7 0.3088119 88 emnlp-2012-Minimal Dependency Length in Realization Ranking
8 0.25794882 61 emnlp-2012-Grounded Models of Semantic Representation
9 0.24687552 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP
10 0.22626705 29 emnlp-2012-Concurrent Acquisition of Word Meaning and Lexical Categories
11 0.22420768 59 emnlp-2012-Generating Non-Projective Word Order in Statistical Linearization
12 0.18608934 118 emnlp-2012-Source Language Adaptation for Resource-Poor Machine Translation
13 0.18279932 25 emnlp-2012-Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation
14 0.17334895 27 emnlp-2012-Characterizing Stylistic Elements in Syntactic Structure
15 0.16548008 139 emnlp-2012-Word Salad: Relating Food Prices and Descriptions
16 0.16151465 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
17 0.16051207 80 emnlp-2012-Learning Verb Inference Rules from Linguistically-Motivated Evidence
18 0.15949598 86 emnlp-2012-Locally Training the Log-Linear Model for SMT
19 0.15679361 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation
20 0.15435135 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing
topicId topicWeight
[(2, 0.015), (16, 0.027), (34, 0.036), (45, 0.052), (60, 0.061), (63, 0.04), (64, 0.026), (65, 0.444), (70, 0.013), (74, 0.03), (76, 0.084), (79, 0.011), (80, 0.012), (86, 0.024), (95, 0.026)]
simIndex simValue paperId paperTitle
1 0.97486866 40 emnlp-2012-Ensemble Semantics for Large-scale Unsupervised Relation Extraction
Author: Bonan Min ; Shuming Shi ; Ralph Grishman ; Chin-Yew Lin
Abstract: Discovering significant types of relations from the web is challenging because of its open nature. Unsupervised algorithms are developed to extract relations from a corpus without knowing the relations in advance, but most of them rely on tagging arguments of predefined types. Recently, a new algorithm was proposed to jointly extract relations and their argument semantic classes, taking a set of relation instances extracted by an open IE algorithm as input. However, it cannot handle polysemy of relation phrases and fails to group many similar (“synonymous”) relation instances because of the sparseness of features. In this paper, we present a novel unsupervised algorithm that provides a more general treatment of the polysemy and synonymy problems. The algorithm incorporates various knowledge sources which we will show to be very effective for unsupervised extraction. Moreover, it explicitly disambiguates polysemous relation phrases and groups synonymous ones. While maintaining approximately the same precision, the algorithm achieves significant improvement on recall compared to the previous method. It is also very efficient. Experiments on a realworld dataset show that it can handle 14.7 million relation instances and extract a very large set of relations from the web. Ralph Grishman1 Chin-Yew Lin2 2Microsoft Research Asia Beijing, China { shumings cyl } @mi cro s o ft . com , that has many applications in answering factoid questions, building knowledge bases and improving search engine relevance. The web has become a massive potential source of such relations. However, its open nature brings an open-ended set of relation types. To extract these relations, a system should not assume a fixed set of relation types, nor rely on a fixed set of relation argument types. The past decade has seen some promising solutions, unsupervised relation extraction (URE) algorithms that extract relations from a corpus without knowing the relations in advance. However, most algorithms (Hasegawa et al., 2004, Shinyama and Sekine, 2006, Chen et. al, 2005) rely on tagging predefined types of entities as relation arguments, and thus are not well-suited for the open domain. Recently, Kok and Domingos (2008) proposed Semantic Network Extractor (SNE), which generates argument semantic classes and sets of synonymous relation phrases at the same time, thus avoiding the requirement of tagging relation arguments of predefined types. However, SNE has 2 limitations: 1) Following previous URE algorithms, it only uses features from the set of input relation instances for clustering. Empirically we found that it fails to group many relevant relation instances. These features, such as the surface forms of arguments and lexical sequences in between, are very sparse in practice. In contrast, there exist several well-known corpus-level semantic resources that can be automatically derived from a source corpus and are shown to be useful for generating the key elements of a relation: its 2 argument semantic classes and a set of synonymous phrases. For example, semantic classes can be derived from a source corpus with contextual distributional simi1 Introduction Relation extraction aims at discovering semantic larity and web table co-occurrences. The “synonymy” 1 problem for clustering relation instances relations between entities. 
It is an important task * Work done during an internship at Microsoft Research Asia could potentially be better solved by adding these resources. 2) SNE assumes that each entity or relation phrase belongs to exactly one cluster, thus is not able to effectively handle polysemy of relation phrases2. An example of a polysemous phrase is be the currency of as in 2 triples
same-paper 2 0.91194391 53 emnlp-2012-First Order vs. Higher Order Modification in Distributional Semantics
Author: Gemma Boleda ; Eva Maria Vecchi ; Miquel Cornudella ; Louise McNally
Abstract: Adjectival modification, particularly by expressions that have been treated as higher-order modifiers in the formal semantics tradition, raises interesting challenges for semantic composition in distributional semantic models. We contrast three types of adjectival modifiers: intersectively used color terms (as in white towel, clearly first-order), subsectively used color terms (white wine, which have been modeled as both first- and higher-order), and intensional adjectives (former bassist, clearly higher-order), and test the ability of different composition strategies to model their behavior. In addition to opening up a new empirical domain for research on distributional semantics, our observations concerning the attested vectors for the different types of adjectives, the nouns they modify, and the resulting noun phrases yield insights into modification that have been little evident in the formal semantics literature to date.
3 0.81671697 76 emnlp-2012-Learning-based Multi-Sieve Co-reference Resolution with Knowledge
Author: Lev Ratinov ; Dan Roth
Abstract: We explore the interplay of knowledge and structure in co-reference resolution. To inject knowledge, we use a state-of-the-art system which cross-links (or “grounds”) expressions in free text to Wikipedia. We explore ways of using the resulting grounding to boost the performance of a state-of-the-art co-reference resolution system. To maximize the utility of the injected knowledge, we deploy a learningbased multi-sieve approach and develop novel entity-based features. Our end system outperforms the state-of-the-art baseline by 2 B3 F1 points on non-transcript portion of the ACE 2004 dataset.
4 0.52396154 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
Author: Jayant Krishnamurthy ; Tom Mitchell
Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms ofweak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependencyparsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-theart accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.
5 0.51403075 30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing
Author: Hui Yang
Abstract: Taxonomies can serve as browsing tools for document collections. However, given an arbitrary collection, pre-constructed taxonomies could not easily adapt to the specific topic/task present in the collection. This paper explores techniques to quickly derive task-specific taxonomies supporting browsing in arbitrary document collections. The supervised approach directly learns semantic distances from users to propose meaningful task-specific taxonomies. The approach aims to produce globally optimized taxonomy structures by incorporating path consistency control and usergenerated task specification into the general learning framework. A comparison to stateof-the-art systems and a user study jointly demonstrate that our techniques are highly effective. .
6 0.44299564 98 emnlp-2012-No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities
7 0.42844412 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules
8 0.42698672 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents
9 0.42536169 26 emnlp-2012-Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings
10 0.41798285 73 emnlp-2012-Joint Learning for Coreference Resolution with Markov Logic
11 0.41534898 103 emnlp-2012-PATTY: A Taxonomy of Relational Patterns with Semantic Types
12 0.40706137 97 emnlp-2012-Natural Language Questions for the Web of Data
13 0.40506461 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction
14 0.3993066 62 emnlp-2012-Identifying Constant and Unique Relations by using Time-Series Text
15 0.39638707 85 emnlp-2012-Local and Global Context for Supervised and Unsupervised Metonymy Resolution
16 0.39186135 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition
17 0.39150423 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
18 0.3862502 72 emnlp-2012-Joint Inference for Event Timeline Construction
19 0.38036516 100 emnlp-2012-Open Language Learning for Information Extraction
20 0.37859601 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation