emnlp emnlp2013 emnlp2013-129 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yanchuan Sim ; Brice D. L. Acree ; Justin H. Gross ; Noah A. Smith
Abstract: We seek to measure political candidates’ ideological positioning from their speeches. To accomplish this, we infer ideological cues from a corpus of political writings annotated with known ideologies. We then represent the speeches of U.S. Presidential candidates as sequences of cues and lags (filler distinguished only by its length in words). We apply a domain-informed Bayesian HMM to infer the proportions of ideologies each candidate uses in each campaign. The results are validated against a set of preregistered, domain expert-authored hypotheses.
Reference: text
sentIndex sentText sentNum sentScore
1 We seek to measure political candidates’ ideological positioning from their speeches. [sent-5, score-0.704]
2 To accomplish this, we infer ideological cues from a corpus of political writings annotated with known ideologies. [sent-6, score-0.858]
3 We then represent the speeches of U.S. Presidential candidates as sequences of cues and lags (filler distinguished only by its length in words). [sent-9, score-0.231]
4 We apply a domain-informed Bayesian HMM to infer the proportions of ideologies each candidate uses in each campaign. [sent-10, score-0.497]
5 1 Introduction The artful use of language is central to politics, and the language of politicians has attracted considerable interest among scholars of political communication and rhetoric (Charteris-Black, 2005; Hart, 2009; Diermeier et al. [sent-12, score-0.258]
6 In American politics, candidates for office give speeches and write books and manifestos expounding their ideas. [sent-17, score-0.315]
7 Every political season, however, there are accusations of candidates “flipflopping” on issues, with opinion shows, late-night comedies, and talk radio hosts replaying clips of candidates contradicting earlier statements. [sent-18, score-0.45]
8 The expectation follows directly from long-standing and widely influential theories of political competition that are collectively referred to in their simplest form as the “median voter theorem” (Hotelling, 1929; Black, 1948; Downs, 1957). [sent-30, score-0.258]
9 Do political candidates in fact stray ideologically at opportune moments? [sent-32, score-0.367]
10 More specifically, can we measure candidates’ ideological positions from their prose at different times? [sent-33, score-0.455]
11 Following much work on classifying the political ideology expressed by a piece of text (Laver et al. [sent-34, score-0.678]
12 , 2008), we start from the assumption that a candidate’s choice of words and phrases reflects a deliberate attempt to signal common cause with a target audience, and as a broader strategy, to respond to political competitors. [sent-36, score-0.258]
13 Our central hypothesis is that, despite candidates’ intentional vagueness, differences in position—among candidates or over time—can be automatically detected and described as proportions of ideologies expressed in a speech. [sent-37, score-0.515]
14 In this work, we operationalize ideologies in a novel empirical way, exploiting political writings published in explicitly ideological books and magazines (§2). [sent-38, score-1.113]
15 The corpus then serves as evidence for … (Footnote 1: We consider general positions in terms of broad ideological groups that are widely discussed in current political discourse, e.g., …) [sent-39, score-0.713]
16 Figure 1: Ideology tree showing the labels for the ideological corpus in §2. [sent-43, score-0.456]
17 These lexicons are used, in turn, to create a low-dimensional representation of political speeches: a speech is a sequence of cues interspersed with lags. [sent-48, score-0.393]
18 In other words, a speech is represented as a series alternating between cues signaling ideological positions and uninteresting filler. [sent-50, score-0.562]
19 Our main contribution is a probabilistic technique for inferring proportions of ideologies expressed by a candidate (§3). [sent-51, score-0.469]
20 The inputs to the model are the cue-lag representation of a speech and a domain-specific topology relating ideologies to each other. [sent-52, score-0.348]
21 The topology is a tree (shown in Figure 1) encoding the closeness of different ideologies and, by extension, the odds of transitioning between them within a speech. [sent-53, score-0.384]
22 2 First Stage: Cue Extraction We first present a data-driven technique for automatically constructing “cue lexicons” from texts labeled with ideologies by domain experts. [sent-58, score-0.307]
23 1 Ideological Corpus We start with a collection of contemporary political writings whose authors are perceived as representative of one particular ideology. [sent-64, score-0.303]
24 A political science domain expert who is a co-author of this work manually labeled each element in a collection of 112 books and 10 magazines with one of three coarse ideologies: LEFT, RIGHT, or CENTER. [sent-67, score-0.375]
25 Table 1 summarizes key details about the ideological corpus. [sent-69, score-0.42]
26 In addition to ideology labels, individual chapters within the books were manually tagged with topics that the chapter was about. [sent-70, score-0.525]
27 (Footnote 3: We cannot claim that these texts are “pure” examples of the ideologies they are labeled with, i.e., …) [sent-74, score-0.307]
28 2 Cue Discovery Model We use the ideological corpus to infer ideological cues: terms that are strongly associated with an ideology. [sent-84, score-0.868]
29 Because our ideologies are organized hierarchically, we required a technique that can account for multiple effects within a single text. [sent-85, score-0.352]
30 These attributes include both the ideology and its coarsened version (e. [sent-92, score-0.453]
31 • η_i^c, the coarse ideology effect, which takes different values for LEFT, RIGHT, and CENTER. [sent-103, score-0.42]
32 • η_i^f, the fine ideology effect, which takes different values for the fine-grained ideologies corresponding to the leaves in Fig. [sent-104, score-0.727]
33 We are only interested, however, in the ideological attributes I ⊂ A. [sent-120, score-0.453]
34 For an ideological attribute i ∈ I, we take the [sent-121, score-0.42]
35 terms with positive elements of this vector to be the cues for ideology i; call this set L(i) and let L = ∪_{i∈I} L(i). [sent-122, score-0.527]
36 Because political texts use a fair amount of multiword jargon, we initially represented each document as a bag of unigrams, bigrams, and trigrams, ignoring the fact that these “overlap” with each other. [sent-123, score-0.258]
37 Selected features were used to classify the ideologies of held-out documents from our corpus. [sent-131, score-0.307]
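The cue-selection rule above is simple enough to sketch. The snippet below is a minimal illustration, assuming SAGE effect vectors have already been fit (the `eta` dict is a hypothetical stand-in for that output; fitting SAGE itself is not shown): it builds the overlapping n-gram bags of §2.2 and reads off the positively weighted terms as L(i).

```python
from collections import Counter
from typing import Dict, List, Set

def ngram_bag(tokens: List[str], max_n: int = 3) -> Counter:
    """Bag of unigrams, bigrams, and trigrams, ignoring that they overlap (Sec. 2.2)."""
    bag = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            bag[" ".join(tokens[i:i + n])] += 1
    return bag

def cue_lexicons(eta: Dict[str, Dict[str, float]]) -> Dict[str, Set[str]]:
    """For each ideological attribute i, the cue set L(i) is the set of terms with
    a positive effect weight; eta is assumed to map ideology -> {term: weight}."""
    return {ideo: {term for term, weight in effects.items() if weight > 0.0}
            for ideo, effects in eta.items()}

# The full lexicon L is the union over ideological attributes:
# L = set().union(*cue_lexicons(eta).values())
```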
38 5 Cue Lexicon We ran SAGE on the full ideological book corpus, including topic effects, and, setting λ = 30, obtained a set of |L| = 8,483 cue terms. [sent-150, score-0.641]
39 The supplementary materials show the cues associated with various ideologies and a heatmap of similarities among SAGE vectors. [sent-152, score-0.307]
40 About 70% of ideologies were correctly matched by experts, with relatively few confusions between LEFT and RIGHT. [sent-154, score-0.307]
41 3 Second Stage: Cue-Lag Ideological Proportions The main contribution of this paper is a technique for measuring ideology proportions in the prose of political candidates. [sent-156, score-0.803]
42 We adopt a Bayesian approach that manages our uncertainty about the cue lexicon L, the tendencies of political speakers to “flipflop” among ideological types, and the relative “distances” among different ideologies. [sent-157, score-0.873]
43 Moreover, the ability to draw inferences about individual policy-makers’ ideologies from their votes on proposed legislation is severely limited by institutional constraints on the types of legislation that are actually subject to recorded votes. [sent-160, score-0.39]
44 1 Political Speeches Corpus We gathered transcribed speeches given by candidates of the two main parties (Democrats and Republicans) during the 2008 and 2012 Presidential election seasons. [sent-162, score-0.367]
45 Table 2 presents a breakdown of the candidates and speeches in our corpus. [sent-167, score-0.266]
46 2 Cue-Lag Representation Our measurement model only considers ideological cues; other terms are treated as filler. [sent-169, score-0.42]
47 The representation is a sequence of alternating cues (elements from the ideological lexicon L) and integer “lags” (counts of non-cue terms falling between two cues). [sent-171, score-0.527]
48 ‡For Romney, we have 13 speeches which he gave in the period 2008–2011 (between his withdrawal from the 2008 elections and before the commencement of the 2012 elections). [sent-176, score-0.31]
49 While these speeches are not technically part of the regular Presidential election campaign, they can be seen as his preparation towards the 2012 elections, which is particularly interesting as Romney has been accused of having inconsistent viewpoints. [sent-177, score-0.25]
50 When two cues partially overlap, we treat them as consecutive cue terms and set the lag to 0. [sent-181, score-0.385]
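A minimal sketch of the cue-lag encoding follows. The matching procedure is an assumption on our part: we greedily match the longest cue n-gram at each position, which makes adjacent cues come out with lag 0, consistent with the overlap rule above.

```python
from typing import List, Sequence, Set, Tuple

def cue_lag(tokens: Sequence[str], lexicon: Set[str], max_n: int = 3) -> List[Tuple[int, str]]:
    """Encode a token stream as alternating (lag, cue) pairs: each cue from the
    lexicon L is paired with the count of non-cue tokens that precede it."""
    pairs, lag, i = [], 0, 0
    while i < len(tokens):
        for n in range(min(max_n, len(tokens) - i), 0, -1):  # prefer the longest match
            candidate = " ".join(tokens[i:i + n])
            if candidate in lexicon:
                pairs.append((lag, candidate))
                lag, i = 0, i + n
                break
        else:  # no cue starts here; the token is filler
            lag += 1
            i += 1
    return pairs

# cue_lag("we will cut taxes for the middle class".split(),
#         {"cut taxes", "middle class"})
# -> [(2, "cut taxes"), (2, "middle class")]
```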
51 3 CLIP: An Ideology HMM The model we use to infer ideologies, cue-lag ideological proportions (CLIP), is a hidden Markov model. [sent-184, score-0.573]
52 The emission from a state consists of (i) a cue from L and (ii) a lag value. [sent-187, score-0.339]
53 (b) Emit cue term W_t from the lexicon L and lag L_t based on the emission distribution, discussed in §3. [sent-216, score-0.259]
54 1 Ideology Topology and Transition Parameterization CLIP assumes that each cue term uttered by a politician is generated from a hidden state corresponding to an ideology. [sent-224, score-0.259]
55 The ideologies are organized into a tree based on their hierarchical relationships; see Fig. [sent-225, score-0.343]
56 The ideology tree is used in defining the transition distribution in the HMM, but not to directly define the topology of the HMM. [sent-228, score-0.561]
57 Importantly, each state may transition to any other state, but the transition distribution is defined using the graph, so that ideologies closer to each other are more likely to transition to each other. [sent-229, score-0.531]
58 To transition between two states s_i and s_j, a walk must be taken in the tree from vertex s_i to vertex s_j. [sent-230, score-0.28]
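To make the tree-walk idea concrete, here is a hedged sketch. It does not reproduce the paper's walk-based parameterization; as a surrogate with the same qualitative behavior (nearby ideologies transition more readily), it sets P(s_j | s_i) ∝ κ^d(s_i, s_j), where d is the path length in the ideology tree and κ ∈ (0, 1) is an assumed decay parameter.

```python
from collections import deque
from typing import Dict, List

def tree_distances(adj: Dict[str, List[str]]) -> Dict[str, Dict[str, int]]:
    """All-pairs path lengths in the ideology tree, via BFS from each vertex."""
    dist = {}
    for src in adj:
        d, queue = {src: 0}, deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in d:
                    d[v] = d[u] + 1
                    queue.append(v)
        dist[src] = d
    return dist

def transition_rows(adj: Dict[str, List[str]], states: List[str], kappa: float = 0.5):
    """P(s_j | s_i) proportional to kappa ** tree_distance(s_i, s_j)."""
    dist = tree_distances(adj)
    rows = {}
    for si in states:
        weights = {sj: kappa ** dist[si][sj] for sj in states}
        z = sum(weights.values())
        rows[si] = {sj: w / z for sj, w in weights.items()}
    return rows

# Example on a fragment of Figure 1's tree (undirected adjacency):
# adj = {"CENTER": ["LEFT", "RIGHT"], "LEFT": ["CENTER", "PROGRESSIVE"],
#        "RIGHT": ["CENTER"], "PROGRESSIVE": ["LEFT"]}
# transition_rows(adj, ["LEFT", "RIGHT", "PROGRESSIVE"])
```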
59 In order to capture the intuition that a longer lag after a cue term should increase the entropy over the next ideology state, we introduce a restart probability, which is conditioned on the length of the most recent lag, ℓ. [sent-241, score-0.788]
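The functional form of the lag-conditioned restart probability is not given in this excerpt; a geometric form is one natural assumption, sketched below (ρ is a hypothetical per-word reset rate).

```python
def restart_prob(lag: int, rho: float = 0.05) -> float:
    """One plausible restart probability: each of the `lag` filler words
    independently 'resets' the chain with probability rho, so longer lags push
    the restart probability toward 1, i.e., higher entropy over the next
    ideology state. This form is an assumption, not the paper's."""
    return 1.0 - (1.0 - rho) ** lag

# On a restart, the next state is drawn from the tree-structured prior over
# ideologies (theta) rather than from the previous state's transition row.
```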
60 For each state s, we let the probability of emitting cue w be denoted by ψ_{s,w}; ψ_s is a multinomial distribution over the entire lexicon L. [sent-253, score-0.227]
61 This allows our approach to handle ambiguous cues that can associate with more than one ideology, and also to associate a cue with a different ideology than our cue discovery method proposed, if the signal from the data is sufficiently strong. [sent-254, score-0.917]
62 Specifically, we seek to measure differences in time spent in each ideology state. [sent-263, score-0.444]
63 On the other hand, we expect that a speaker draws from his ideological lexicon similarly across different epochs—there is a single ψ shared between different epochs. [sent-265, score-0.42]
64 In order to manage uncertainty about the parameters of CLIP, to incorporate prior beliefs based on our ideology-specific cue lexicons {L(i)}_{i∈I}, and to allow sharing of statistical strength across conditions, we adopt a Bayesian approach to inference. [sent-266, score-0.255]
65 For the cue emission distribution associated with ideology i, ψ_{s_i}, we use an informed Dirichlet prior with two different values: β_cue for cues in L(i), and a smaller β_def for those in L \ L(i). [sent-269, score-0.751]
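This informed prior is straightforward to construct; a minimal sketch follows, with illustrative β values (the actual hyperparameter settings are not reproduced here).

```python
import numpy as np

def emission_prior(lexicon, cues_i, beta_cue=1.0, beta_def=0.01):
    """Dirichlet concentration vector over the full lexicon for ideology i:
    beta_cue for terms in L(i), a smaller beta_def for terms in L \\ L(i)."""
    return np.array([beta_cue if term in cues_i else beta_def for term in lexicon])

# Dirichlet-multinomial conjugacy keeps the Gibbs update for psi simple:
# psi_i = np.random.dirichlet(emission_prior(lexicon, cues_i) + cue_counts_i)
```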
66 At each Gibbs step, we resample the ideology state and restart indicator variable for every cue term in every speech. [sent-272, score-0.737]
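For concreteness, here is a hedged sketch of the pointwise update for a single ideology state; it omits the restart indicators and any collapsed updates the full sampler would need, and the `trans`/`emit` tables are assumed inputs.

```python
import numpy as np

def resample_state(t, z, cues, trans, emit, states, rng):
    """Gibbs update for z[t] in an HMM: the conditional is proportional to
    transition-in * emission * transition-out, given the neighboring states."""
    scores = []
    for s in states:
        p = emit[s][cues[t]]
        if t > 0:
            p *= trans[z[t - 1]][s]
        if t + 1 < len(z):
            p *= trans[s][z[t + 1]]
        scores.append(p)
    scores = np.asarray(scores)
    z[t] = states[rng.choice(len(states), p=scores / scores.sum())]

# One sweep over a speech:
# rng = np.random.default_rng(0)
# for t in range(len(z)):
#     resample_state(t, z, cues, trans, emit, states, rng)
```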
67 For each candidate, we collected 5,000 posterior samples, which we use to infer his/her ideological proportions. [sent-275, score-0.448]
68 In order to determine the amount of time a candidate spends in each ideology, we denote the unit of time in terms of half the lag before and after each cue. (Footnote 8: This implies that a term can, in the posterior distribution, be associated with an ideology i even if it was not a member of L(i).) [sent-276, score-0.793]
69 That is, when a candidate draws a cue term from ideology i during timestep t, we say that he spends ½(L_{t−1} + L_t) amount of time in ideology i. [sent-280, score-1.169]
70 Averaging over all the samples returned by our sampler and normalizing by the length of the documents in each epoch, we obtain a candidate’s expected ideological proportions within the epoch. [sent-281, score-0.545]
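A minimal sketch of this accounting follows. The indexing is an assumption: `lags[t]` is the lag between cue t and cue t+1, boundary lags are taken to be 0, and whether the cue term's own length is also counted is not specified in the excerpt, so it is omitted.

```python
from collections import defaultdict
from typing import Dict, Sequence

def ideology_proportions(samples: Sequence[Sequence[str]],
                         lags: Sequence[int]) -> Dict[str, float]:
    """Each cue at timestep t contributes half the lag before and after it as
    time spent in its sampled ideology; totals are accumulated over posterior
    samples and normalized into proportions."""
    time_spent = defaultdict(float)
    for z in samples:                      # one state sequence per Gibbs sample
        for t, state in enumerate(z):
            before = lags[t - 1] if t > 0 else 0
            after = lags[t] if t < len(lags) else 0
            time_spent[state] += 0.5 * (before + after)
    total = sum(time_spent.values()) or 1.0
    return {state: v / total for state, v in time_spent.items()}
```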
71 In the case of recent political speeches, however, we are doubtful that such judgments can be made objectively at a fine-grained level. [sent-283, score-0.258]
72 While we are confident about gross categorization of books and magazines in our ideological corpus (§2. [sent-284, score-0.503]
73 This eliminates Markovian dependencies between ideologies at nearby timesteps, but still uses the ideology tree in defining the probabilities of each state through θ. [sent-299, score-0.795]
74 In MIX, there are no temporal effects between cue terms, although the structure of our ideology tree encourages the speaker to draw from coarse-grained ideologies over fine-grained ideologies. [sent-305, score-1.003]
75 On the other hand, the strong Markovian dependency between states in NORES would encourage the model to stay local within the ideology tree. [sent-306, score-0.42]
76 In our experiments, we will see that the ideology tree and the random treatment of restarting both contribute to our model’s inferences. [sent-307, score-0.482]
77 Sanity checks (S1–3) CLIP correctly identifies sixteen LEFT/RIGHT alignments of primary candidates (S1, S2), but is unable to determine one candidate’s orientation; it finds Jon Huntsman to spend roughly equal proportions of speech-time drawing on LEFT and RIGHT cue terms. [sent-310, score-0.475]
78 Name interference When we looked at the cue terms actually used in the speeches, we found one systematic issue: the inclusion of candidates’ names as cue terms. [sent-319, score-0.39]
79 Terms mentioning John McCain are associated with the RIGHT, so that Obama’s mentions of his opponent are taken as evidence for rightward positioning; in total, mentions of McCain contributed 4% absolute to Obama’s RIGHT ideological proportion. [sent-320, score-0.42]
80 Similarly, barack obama and presid obama are LEFT cues (though senat obama is a RIGHT cue). [sent-321, score-0.399]
81 Strong hypotheses P1 and P2 CLIP and the variants making use of the ideology tree were in agreement on most of the strong primary hypotheses. [sent-323, score-0.596]
82 This model did no worse than other variants here and much better than one: HMM had 10 inconsistencies out of 32 opportunities, suggesting the importance of the ideology tree. [sent-328, score-0.42]
83 “Etch-a-Sketch” hypotheses Hypotheses P3, P4, P5, P7, and P8 are all concerned with differences between the primary and general elections: successful primary candidates are expected to “move to the center.” [sent-330, score-0.295]
84 A visualization of CLIP’s proportions for McCain, Romney, and Obama is shown in Figure 4, with their speeches grouped together by different epochs. [sent-331, score-0.308]
85 Likewise, in P7, the difference between McCain drawing from FAR RIGHT, POPULIST, and LIBERTARIAN between the 2008 primary and general elections is only 2% and highly uncertain, with a 95% credible interval of 44–50% during the primary (vs. [sent-334, score-0.271]
86 Fine-grained ideologies Fine-grained ideologies are expected to account for smaller proportions, so that making predictions about them is quite difficult. [sent-336, score-0.614]
87 This is especially true for primary elections, where a broader palette of ideologies is expected to be drawn from, but we have fewer speeches from each candidate. [sent-337, score-0.562]
88 CLIP’s inferences on the corpus of political speeches can be browsed at http://www. [sent-340, score-0.472]
89 CLIP’s deviations from the hypotheses are suggestive of potential improvements to cue extraction (§2), but also of incorrect hypotheses. [sent-346, score-0.288]
90 We plan to use CLIP as a text analysis method to support substantive inquiry in political science, such as following trends in expressed ideology over time. [sent-351, score-0.678]
91 6 Related Work Research on modeling ideological beliefs using automated systems dates back to the 1960s (Abelson and Carroll, 1965; Carbonell, 1978; Sack, 1994). [sent-352, score-0.452]
92 These early works model ideology at a sophisticated level, involving the actors, actions and goals; they require manually constructed knowledge bases. [sent-353, score-0.42]
93 Poole and Rosenthal (1985) used congressional roll call data to demonstrate the ideological divide in Congress, and provided a methodology for measuring ideological positions. [sent-354, score-0.95]
94 Gerrish and Blei (2011; 2012) augmented the methodology with text from congressional bills using probabilistic models to uncover lawmakers’ positions on specific political issues, putting them on a left-right spectrum, while Thomas et al. [sent-355, score-0.338]
95 , 2009; Gentzkow and Shapiro, 2010); determine the political leanings of a document or author (Laver et al. [sent-359, score-0.258]
96 7 Conclusions We introduced CLIP, a domain-informed, Bayesian model of ideological proportions in political language. [sent-364, score-0.803]
97 We showed how ideological cues could be discovered from a lightly labeled corpus of ideological writings, then incorporated into CLIP. [sent-365, score-0.947]
98 Extracting policy positions from political texts using words as data. [sent-505, score-0.293]
99 A joint topic and perspective model for ideological discourse. [sent-509, score-0.42]
100 A preliminary investigation into sentiment analysis of informal political discourse. [sent-522, score-0.258]
wordName wordTfidf (topN-words)
[('ideological', 0.42), ('ideology', 0.42), ('clip', 0.313), ('ideologies', 0.307), ('political', 0.258), ('cue', 0.195), ('speeches', 0.183), ('elections', 0.127), ('proportions', 0.125), ('cues', 0.107), ('sage', 0.104), ('mccain', 0.091), ('obama', 0.089), ('lag', 0.083), ('candidates', 0.083), ('ri', 0.074), ('primary', 0.072), ('presidential', 0.069), ('hypotheses', 0.068), ('magazine', 0.068), ('romney', 0.068), ('election', 0.067), ('roll', 0.065), ('transition', 0.064), ('le', 0.063), ('sj', 0.063), ('restart', 0.058), ('poole', 0.057), ('gentzkow', 0.052), ('nores', 0.052), ('walk', 0.052), ('books', 0.049), ('politics', 0.046), ('bac', 0.045), ('congressional', 0.045), ('writings', 0.045), ('effects', 0.045), ('topology', 0.041), ('republican', 0.041), ('gerrish', 0.041), ('lags', 0.041), ('hmm', 0.04), ('gro', 0.039), ('huntsman', 0.039), ('laver', 0.039), ('legislative', 0.039), ('monroe', 0.039), ('republicans', 0.039), ('timestep', 0.039), ('candidate', 0.037), ('markovian', 0.036), ('campaign', 0.036), ('si', 0.036), ('tree', 0.036), ('positions', 0.035), ('magazines', 0.034), ('mitt', 0.034), ('parties', 0.034), ('lt', 0.033), ('attributes', 0.033), ('state', 0.032), ('term', 0.032), ('beliefs', 0.032), ('chapters', 0.031), ('rosenthal', 0.031), ('hart', 0.031), ('inferences', 0.031), ('ro', 0.03), ('emission', 0.029), ('supplementary', 0.029), ('infer', 0.028), ('lexicons', 0.028), ('vertex', 0.028), ('talk', 0.026), ('book', 0.026), ('abelson', 0.026), ('burt', 0.026), ('chapel', 0.026), ('deirmeier', 0.026), ('gingrich', 0.026), ('hillard', 0.026), ('ideologically', 0.026), ('legislation', 0.026), ('perry', 0.026), ('positioning', 0.026), ('preregistered', 0.026), ('ptree', 0.026), ('restarting', 0.026), ('rick', 0.026), ('roderick', 0.026), ('shapiro', 0.026), ('spends', 0.026), ('wapmi', 0.026), ('issues', 0.026), ('deviations', 0.025), ('barack', 0.025), ('economic', 0.025), ('topics', 0.025), ('moderate', 0.024), ('spent', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 129 emnlp-2013-Measuring Ideological Proportions in Political Speeches
Author: Yanchuan Sim ; Brice D. L. Acree ; Justin H. Gross ; Noah A. Smith
Abstract: We seek to measure political candidates’ ideological positioning from their speeches. To accomplish this, we infer ideological cues from a corpus of political writings annotated with known ideologies. We then represent the speeches of U.S. Presidential candidates as sequences of cues and lags (filler distinguished only by its length in words). We apply a domain-informed Bayesian HMM to infer the proportions of ideologies each candidate uses in each campaign. The results are validated against a set of preregistered, domain expert-authored hypotheses.
2 0.17094287 121 emnlp-2013-Learning Topics and Positions from Debatepedia
Author: Swapna Gottipati ; Minghui Qiu ; Yanchuan Sim ; Jing Jiang ; Noah A. Smith
Abstract: We explore Debatepedia, a communityauthored encyclopedia of sociopolitical debates, as evidence for inferring a lowdimensional, human-interpretable representation in the domain of issues and positions. We introduce a generative model positing latent topics and cross-cutting positions that gives special treatment to person mentions and opinion words. We evaluate the resulting representation’s usefulness in attaching opinionated documents to arguments and its consistency with human judgments about positions.
3 0.05454208 69 emnlp-2013-Efficient Collective Entity Linking with Stacking
Author: Zhengyan He ; Shujie Liu ; Yang Song ; Mu Li ; Ming Zhou ; Houfeng Wang
Abstract: Entity disambiguation works by linking ambiguous mentions in text to their corresponding real-world entities in knowledge base. Recent collective disambiguation methods enforce coherence among contextual decisions at the cost of non-trivial inference processes. We propose a fast collective disambiguation approach based on stacking. First, we train a local predictor g0 with learning to rank as base learner, to generate initial ranking list of candidates. Second, top k candidates of related instances are searched for constructing expressive global coherence features. A global predictor g1 is trained in the augmented feature space and stacking is employed to tackle the train/test mismatch problem. The proposed method is fast and easy to implement. Experiments show its effectiveness over various algorithms on several public datasets. By learning a rich semantic relatedness measure between entity categories and context document, performance is further improved.
4 0.053975709 188 emnlp-2013-Tree Kernel-based Negation and Speculation Scope Detection with Structured Syntactic Parse Features
Author: Bowei Zou ; Guodong Zhou ; Qiaoming Zhu
Abstract: Scope detection is a key task in information extraction. This paper proposes a new approach for tree kernel-based scope detection by using the structured syntactic parse information. In addition, we have explored the way of selecting compatible features for different part-of-speech cues. Experiments on the BioScope corpus show that both constituent and dependency structured syntactic parse features have the advantage in capturing the potential relationships between cues and their scopes. Compared with the state of the art scope detection systems, our system achieves substantial improvement.
5 0.046025123 6 emnlp-2013-A Generative Joint, Additive, Sequential Model of Topics and Speech Acts in Patient-Doctor Communication
Author: Byron C. Wallace ; Thomas A Trikalinos ; M. Barton Laws ; Ira B. Wilson ; Eugene Charniak
Abstract: We develop a novel generative model of conversation that jointly captures both the topical content and the speech act type associated with each utterance. Our model expresses both token emission and state transition probabilities as log-linear functions of separate components corresponding to topics and speech acts (and their interactions). We apply this model to a dataset comprising annotated patient-physician visits and show that the proposed joint approach outperforms a baseline univariate model.
6 0.042654317 178 emnlp-2013-Success with Style: Using Writing Style to Predict the Success of Novels
7 0.042013839 83 emnlp-2013-Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech
8 0.040582739 100 emnlp-2013-Improvements to the Bayesian Topic N-Gram Models
9 0.040417768 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs
10 0.039796088 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology
11 0.038717493 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction
12 0.03775863 160 emnlp-2013-Relational Inference for Wikification
13 0.037309684 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
14 0.036466084 89 emnlp-2013-Gender Inference of Twitter Users in Non-English Contexts
15 0.036071531 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification
16 0.035007626 9 emnlp-2013-A Log-Linear Model for Unsupervised Text Normalization
17 0.034947019 8 emnlp-2013-A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability
18 0.033978861 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering
19 0.033976678 147 emnlp-2013-Optimized Event Storyline Generation based on Mixture-Event-Aspect Model
20 0.032719359 66 emnlp-2013-Dynamic Feature Selection for Dependency Parsing
topicId topicWeight
[(0, -0.133), (1, 0.028), (2, -0.049), (3, -0.011), (4, -0.011), (5, -0.01), (6, 0.024), (7, 0.003), (8, 0.02), (9, -0.03), (10, -0.04), (11, -0.036), (12, -0.068), (13, 0.092), (14, 0.025), (15, 0.038), (16, 0.079), (17, 0.027), (18, 0.002), (19, -0.01), (20, 0.018), (21, 0.049), (22, -0.038), (23, 0.063), (24, 0.037), (25, 0.016), (26, 0.071), (27, -0.005), (28, -0.018), (29, 0.006), (30, 0.031), (31, -0.0), (32, -0.103), (33, 0.049), (34, 0.057), (35, -0.088), (36, 0.144), (37, -0.08), (38, -0.029), (39, -0.042), (40, -0.182), (41, 0.014), (42, 0.077), (43, -0.143), (44, 0.095), (45, 0.313), (46, 0.141), (47, -0.079), (48, 0.105), (49, 0.103)]
simIndex simValue paperId paperTitle
same-paper 1 0.94245905 129 emnlp-2013-Measuring Ideological Proportions in Political Speeches
Author: Yanchuan Sim ; Brice D. L. Acree ; Justin H. Gross ; Noah A. Smith
Abstract: We seek to measure political candidates’ ideological positioning from their speeches. To accomplish this, we infer ideological cues from a corpus of political writings annotated with known ideologies. We then represent the speeches of U.S. Presidential candidates as sequences of cues and lags (filler distinguished only by its length in words). We apply a domain-informed Bayesian HMM to infer the proportions of ideologies each candidate uses in each campaign. The results are validated against a set of preregistered, domain expert-authored hypotheses.
2 0.67382038 121 emnlp-2013-Learning Topics and Positions from Debatepedia
Author: Swapna Gottipati ; Minghui Qiu ; Yanchuan Sim ; Jing Jiang ; Noah A. Smith
Abstract: We explore Debatepedia, a communityauthored encyclopedia of sociopolitical debates, as evidence for inferring a lowdimensional, human-interpretable representation in the domain of issues and positions. We introduce a generative model positing latent topics and cross-cutting positions that gives special treatment to person mentions and opinion words. We evaluate the resulting representation’s usefulness in attaching opinionated documents to arguments and its consistency with human judgments about positions.
3 0.44032943 26 emnlp-2013-Assembling the Kazakh Language Corpus
Author: Olzhas Makhambetov ; Aibek Makazhanov ; Zhandos Yessenbayev ; Bakhyt Matkarimov ; Islam Sabyrgaliyev ; Anuar Sharafudinov
Abstract: This paper presents the Kazakh Language Corpus (KLC), which is one of the first attempts made within a local research community to assemble a Kazakh corpus. KLC is designed to be a large scale corpus containing over 135 million words and conveying five stylistic genres: literary, publicistic, official, scientific and informal. Along with its primary part KLC comprises such parts as: (i) annotated sub-corpus, containing segmented documents encoded in the eXtensible Markup Language (XML) that marks complete morphological, syntactic, and structural characteristics of texts; (ii) as well as a sub-corpus with the annotated speech data. KLC has a web-based corpus management system that helps to navigate the data and retrieve necessary information. KLC is also open for contributors, who are willing to make suggestions, donate texts and help with annotation of existing materials.
4 0.4326916 133 emnlp-2013-Modeling Scientific Impact with Topical Influence Regression
Author: James Foulds ; Padhraic Smyth
Abstract: When reviewing scientific literature, it would be useful to have automatic tools that identify the most influential scientific articles as well as how ideas propagate between articles. In this context, this paper introduces topical influence, a quantitative measure of the extent to which an article tends to spread its topics to the articles that cite it. Given the text of the articles and their citation graph, we show how to learn a probabilistic model to recover both the degree of topical influence of each article and the influence relationships between articles. Experimental results on corpora from two well-known computer science conferences are used to illustrate and validate the proposed approach.
5 0.40455586 35 emnlp-2013-Automatically Detecting and Attributing Indirect Quotations
Author: Silvia Pareti ; Tim O'Keefe ; Ioannis Konstas ; James R. Curran ; Irena Koprinska
Abstract: Direct quotations are used for opinion mining and information extraction as they have an easy to extract span and they can be attributed to a speaker with high accuracy. However, simply focusing on direct quotations ignores around half of all reported speech, which is in the form of indirect or mixed speech. This work presents the first large-scale experiments in indirect and mixed quotation extraction and attribution. We propose two methods of extracting all quote types from news articles and evaluate them on two large annotated corpora, one of which is a contribution of this work. We further show that direct quotation attribution methods can be successfully applied to indirect and mixed quotation attribution.
6 0.36657807 100 emnlp-2013-Improvements to the Bayesian Topic N-Gram Models
7 0.36169049 6 emnlp-2013-A Generative Joint, Additive, Sequential Model of Topics and Speech Acts in Patient-Doctor Communication
8 0.31670907 173 emnlp-2013-Simulating Early-Termination Search for Verbose Spoken Queries
9 0.30540362 178 emnlp-2013-Success with Style: Using Writing Style to Predict the Success of Novels
10 0.2961196 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology
11 0.2754721 161 emnlp-2013-Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!
12 0.27130026 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction
13 0.26881698 54 emnlp-2013-Decipherment with a Million Random Restarts
14 0.26071903 190 emnlp-2013-Ubertagging: Joint Segmentation and Supertagging for English
15 0.25179476 101 emnlp-2013-Improving Alignment of System Combination by Using Multi-objective Optimization
16 0.24040337 18 emnlp-2013-A temporal model of text periodicities using Gaussian Processes
17 0.2304154 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM
18 0.22783504 188 emnlp-2013-Tree Kernel-based Negation and Speculation Scope Detection with Structured Syntactic Parse Features
19 0.22563669 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types
20 0.22543646 46 emnlp-2013-Classifying Message Board Posts with an Extracted Lexicon of Patient Attributes
topicId topicWeight
[(3, 0.048), (9, 0.023), (18, 0.029), (22, 0.039), (30, 0.073), (31, 0.028), (36, 0.01), (50, 0.011), (51, 0.151), (52, 0.011), (53, 0.32), (66, 0.041), (71, 0.035), (75, 0.048), (77, 0.011), (90, 0.013), (96, 0.022)]
simIndex simValue paperId paperTitle
1 0.81198317 54 emnlp-2013-Decipherment with a Million Random Restarts
Author: Taylor Berg-Kirkpatrick ; Dan Klein
Abstract: This paper investigates the utility and effect of running numerous random restarts when using EM to attack decipherment problems. We find that simple decipherment models are able to crack homophonic substitution ciphers with high accuracy if a large number of random restarts are used but almost completely fail with only a few random restarts. For particularly difficult homophonic ciphers, we find that big gains in accuracy are to be had by running upwards of 100K random restarts, which we accomplish efficiently using a GPU-based parallel implementation. We run a series of experiments using millions of random restarts in order to investigate other empirical properties of decipherment problems, including the famously uncracked Zodiac 340.
same-paper 2 0.7989502 129 emnlp-2013-Measuring Ideological Proportions in Political Speeches
Author: Yanchuan Sim ; Brice D. L. Acree ; Justin H. Gross ; Noah A. Smith
Abstract: We seek to measure political candidates’ ideological positioning from their speeches. To accomplish this, we infer ideological cues from a corpus of political writings annotated with known ideologies. We then represent the speeches of U.S. Presidential candidates as sequences of cues and lags (filler distinguished only by its length in words). We apply a domain-informed Bayesian HMM to infer the proportions of ideologies each candidate uses in each campaign. The results are validated against a set of preregistered, domain expert-authored hypotheses.
3 0.77320278 185 emnlp-2013-Towards Situated Dialogue: Revisiting Referring Expression Generation
Author: Rui Fang ; Changsong Liu ; Lanbo She ; Joyce Y. Chai
Abstract: In situated dialogue, humans and agents have mismatched capabilities of perceiving the shared environment. Their representations of the shared world are misaligned. Thus referring expression generation (REG) will need to take this discrepancy into consideration. To address this issue, we developed a hypergraph-based approach to account for group-based spatial relations and uncertainties in perceiving the environment. Our empirical results have shown that this approach outperforms a previous graph-based approach with an absolute gain of 9%. However, while these graph-based approaches perform effectively when the agent has perfect knowledge or perception of the environment (e.g., 84%), they perform rather poorly when the agent has imperfect perception of the environment (e.g., 45%). This big performance gap calls for new solutions to REG that can mediate a shared perceptual basis in situated dialogue.
4 0.71362489 26 emnlp-2013-Assembling the Kazakh Language Corpus
Author: Olzhas Makhambetov ; Aibek Makazhanov ; Zhandos Yessenbayev ; Bakhyt Matkarimov ; Islam Sabyrgaliyev ; Anuar Sharafudinov
Abstract: This paper presents the Kazakh Language Corpus (KLC), which is one of the first attempts made within a local research community to assemble a Kazakh corpus. KLC is designed to be a large scale corpus containing over 135 million words and conveying five stylistic genres: literary, publicistic, official, scientific and informal. Along with its primary part KLC comprises such parts as: (i) annotated sub-corpus, containing segmented documents encoded in the eXtensible Markup Language (XML) that marks complete morphological, syntactic, and structural characteristics of texts; (ii) as well as a sub-corpus with the annotated speech data. KLC has a web-based corpus management system that helps to navigate the data and retrieve necessary information. KLC is also open for contributors, who are willing to make suggestions, donate texts and help with annotation of existing materials.
5 0.53742999 121 emnlp-2013-Learning Topics and Positions from Debatepedia
Author: Swapna Gottipati ; Minghui Qiu ; Yanchuan Sim ; Jing Jiang ; Noah A. Smith
Abstract: We explore Debatepedia, a communityauthored encyclopedia of sociopolitical debates, as evidence for inferring a lowdimensional, human-interpretable representation in the domain of issues and positions. We introduce a generative model positing latent topics and cross-cutting positions that gives special treatment to person mentions and opinion words. We evaluate the resulting representation’s usefulness in attaching opinionated documents to arguments and its consistency with human judgments about positions.
6 0.50514954 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
7 0.50405252 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
8 0.50090563 149 emnlp-2013-Overcoming the Lack of Parallel Data in Sentence Compression
9 0.49869221 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
10 0.49776605 143 emnlp-2013-Open Domain Targeted Sentiment
11 0.49764621 36 emnlp-2013-Automatically Determining a Proper Length for Multi-Document Summarization: A Bayesian Nonparametric Approach
12 0.4967941 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs
13 0.49638939 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge
14 0.49594399 69 emnlp-2013-Efficient Collective Entity Linking with Stacking
15 0.49521863 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction
16 0.49480423 83 emnlp-2013-Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech
17 0.49438289 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology
18 0.49408373 152 emnlp-2013-Predicting the Presence of Discourse Connectives
19 0.49385506 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
20 0.49304265 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction