emnlp emnlp2012 emnlp2012-117 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Amit Goyal ; Hal Daume III ; Graham Cormode
Abstract: Many NLP tasks rely on accurate statistics from large corpora. Tracking complete statistics is memory intensive, so recent work has proposed using compact approximate “sketches” of frequency distributions. We describe 10 sketch methods, including existing and novel variants. We compare and study the errors (over-estimation and underestimation) made by the sketches. We evaluate several sketches on three important NLP problems. Our experiments show that one sketch performs best for all the three tasks.
Reference: text
sentIndex sentText sentNum sentScore
1 We describe 10 sketch methods, including existing and novel variants. [sent-5, score-0.513]
2 We evaluate several sketches on three important NLP problems. [sent-7, score-0.511]
3 Our experiments show that one sketch performs best for all the three tasks. [sent-8, score-0.513]
4 For example, storing approximate counts (Talbot and Osborne, 2007; Van Durme and Lall, 2009a; Goyal and Daumé III, 2011a), computing approximate association scores. [sent-16, score-0.238]
5 All these problems ultimately depend on approximate counts of items (such as n-grams, word pairs and word-context pairs). [sent-22, score-0.27]
6 Recently in NLP, we (Goyal and Daumé III, 2011a) demonstrated that a version of the Count-Min sketch (Cormode and Muthukrishnan, 2004) accurately solves three large-scale NLP problems using a small bounded memory footprint. [sent-26, score-0.513]
7 However, there are several other sketch algorithms, and it is not clear why this instance should be preferred amongst these. [sent-27, score-0.513]
8 In this work, we conduct a systematic study and compare many sketch techniques that answer point queries, with a focus on large-scale NLP tasks. [sent-28, score-0.538]
9 While sketches have been evaluated within the database community for finding frequent items (Cormode and Hadjieleftheriou, 2008) and join-size estimation (Rusu and Dobra, 2007), this is the first comparative study for NLP problems. [sent-29, score-0.564]
10 Our work includes three contributions: (1) We propose novel variants of existing sketches by extending the idea of conservative update to them. [sent-30, score-0.871]
11 Count sketch (Charikar et al., 2004) with conservative update (COUNT-CU) and Count-mean-min sketch with conservative update (CMM-CU). [sent-32, score-1.185]
12 The motivation behind proposing new sketches is inspired by the success of the Count-Min sketch with conservative update in our earlier work (Goyal and Daumé III, 2011a). [sent-35, score-1.36]
13 (3) We use sketches to solve three important NLP problems. [sent-39, score-0.511]
14 Our experiments show that sketches can be very effective for these tasks, and that the best results are obtained using the “conservative update” technique. [sent-40, score-0.511]
15 Across all three tasks, one sketch (CM-CU) performs best. [sent-41, score-0.513]
16 2 Sketches In this section, we review existing sketch algorithms from the literature, and propose novel variants based on the idea of conservative update (Estan and Varghese, 2002). [sent-42, score-0.873]
17 Given an input stream of items (x1, x2, ..., xN), each item x (where x is drawn from some domain U) is mapped via hash functions into a small sketch vector that records frequency information. [sent-47, score-0.791]
18 Thus, the sketch does not store the items explicitly, but only information about the frequency distribution. [sent-48, score-0.687]
19 The sketch is a two-dimensional array sk of width w and depth d; the depth d denotes the number of hash functions employed by the sketch algorithms. [sent-59, score-0.724]
20 Every sketch has two operations: UPDATE and QUERY to update and estimate the count of an item. [sent-61, score-0.719]
21 Moreover, sketches can be combined: given two sketches s1 and s2 computed (using the same parameters w and d, and same set of d hash functions) over different inputs, a sketch of the combined input is obtained by adding the individual sketches, entry-wise. [sent-63, score-1.71]
22 The time to perform the COMBINE operation on sketches is O(d × w), independent of the data size. [sent-64, score-0.511]
23 This property enables sketches to be implemented in a distributed setting, where each machine computes the sketch over a small portion of the corpus, and makes it scalable to large datasets. [sent-65, score-1.024]
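As a concrete illustration of the COMBINE step, a minimal sketch in Python, assuming the two summaries are stored as plain d × w count arrays built with identical hash functions (the function name is ours, not the paper's):

```python
import numpy as np

def combine(sk1: np.ndarray, sk2: np.ndarray) -> np.ndarray:
    """Entry-wise sum of two d x w sketch arrays built with the same
    parameters (w, d) and the same d hash functions."""
    assert sk1.shape == sk2.shape, "sketches must share w, d and hash functions"
    # O(d * w) work, independent of how much data each sketch summarizes.
    return sk1 + sk2
```

Each machine in the distributed setting would build its own array over its shard of the corpus and ship only the d × w counters for merging.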
24 2.1 Existing sketch algorithms This section describes sketches from the literature: Count-Min sketch (CM): The CM sketch (Cormode and Muthukrishnan, 2004) has been used effectively for many large-scale problems across several areas, such as Security (Schechter et al. [sent-67, score-2.05]
25 The sketch stores an array of d × w counters, along with d hash functions (drawn from a pairwise-independent family), one for each row of the array. [sent-71, score-0.6]
26 UPDATE: For each new item “x” with count c, the sketch is updated as: sk[k, hk (x)] ← sk[k, hk (x)] + c, ∀1 ≤ k ≤ d. [sent-80, score-0.801]
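A minimal Count-Min sketch in Python, assuming salted built-in hashing as a stand-in for the pairwise-independent hash family (illustrative only, not the authors' implementation):

```python
import numpy as np

class CountMinSketch:
    def __init__(self, width: int, depth: int, seed: int = 0):
        self.w, self.d = width, depth
        self.sk = np.zeros((depth, width), dtype=np.int64)   # d x w counters
        self.seed = seed

    def _h(self, k: int, x) -> int:
        # Stand-in for the k-th pairwise-independent hash; maps x onto [0, w).
        return hash((self.seed, k, x)) % self.w

    def update(self, x, c: int = 1):
        # UPDATE: sk[k, h_k(x)] += c for every row k.
        for k in range(self.d):
            self.sk[k, self._h(k, x)] += c

    def query(self, x) -> int:
        # QUERY: the minimum over the d counters indexed by x never
        # under-estimates, and over-estimates only through hash collisions.
        return int(min(self.sk[k, self._h(k, x)] for k in range(self.d)))
```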
27 Count-mean-min (CMM): The motivation behind the CMM sketch (Deng and Rafiei, 2007) is to provide an unbiased estimator for the Count-Min (CM) sketch. [sent-95, score-0.513]
28 The construction of the CMM sketch is identical to the CM sketch, while the QUERY procedure differs. [sent-96, score-0.513]
29 Instead of returning the minimum value over the d counters (indexed by the d hash functions), CMM deducts the value of the estimated noise from each of the d counters, and returns the median of the d residues. [sent-98, score-0.282]
30 The median of the d residues (fˆ1) can overestimate more than the original CM sketch min estimate (fˆ2), so we return min(fˆ1, fˆ2) as the final estimate for the CMM sketch. [sent-101, score-0.565]
31 CMM gives the same theoretical guarantees as Count sketch (below). [sent-102, score-0.513]
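A hedged sketch of the CMM query on top of the CountMinSketch above. The noise deducted per counter, (N − sk[k, h_k(x)]) / (w − 1), is the usual Count-mean-min correction; the excerpt does not spell it out, so treat that formula as our assumption:

```python
import statistics

def cmm_query(cm: "CountMinSketch", x, total_count: int) -> float:
    """CMM estimate: median of noise-corrected counters, capped by the CM minimum."""
    counters = [cm.sk[k, cm._h(k, x)] for k in range(cm.d)]
    residues = [c - (total_count - c) / (cm.w - 1) for c in counters]
    f1 = statistics.median(residues)   # near-unbiased estimate
    f2 = min(counters)                 # plain Count-Min estimate
    return min(f1, f2)                 # never exceed the CM over-estimate bound
```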
32 Count sketch (COUNT) (Charikar et al., 2004): COUNT (aka Fast-AGMS) keeps two hash functions for each row: hk maps items onto [1, w], and gk maps items onto {−1, +1}. [sent-104, score-0.473]
33 UPDATE: For each new item “x” with count c: sk[k, hk(x)] ← sk[k, hk(x)] + c · gk(x), ∀1 ≤ k ≤ d. [sent-106, score-0.269]
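A minimal COUNT (Fast-AGMS) sketch in the same style; the sign hash gk is simulated with a second salted hash, again as an illustrative stand-in for the pairwise-independent families the analysis assumes:

```python
import statistics
import numpy as np

class CountSketch:
    def __init__(self, width: int, depth: int, seed: int = 0):
        self.w, self.d = width, depth
        self.sk = np.zeros((depth, width), dtype=np.int64)
        self.seed = seed

    def _h(self, k: int, x) -> int:       # bucket hash: item -> [0, w)
        return hash((self.seed, k, "h", x)) % self.w

    def _g(self, k: int, x) -> int:       # sign hash: item -> {-1, +1}
        return 1 if hash((self.seed, k, "g", x)) % 2 else -1

    def update(self, x, c: int = 1):
        for k in range(self.d):
            self.sk[k, self._h(k, x)] += c * self._g(k, x)

    def query(self, x) -> float:
        # Median of the sign-corrected counters.
        return statistics.median(self._g(k, x) * self.sk[k, self._h(k, x)]
                                 for k in range(self.d))
```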
34 2.2 Conservative Update sketch algorithms In this section, we propose novel variants of existing [sent-115, score-0.537]
35 sketches (see Section 2) by combining them with the conservative update process (Estan and Varghese, 2002). [sent-116, score-0.847]
36 The idea of conservative update (also known as Minimal Increase (Cohen and Matias, 2003)) is to only increase counts in the sketch by the minimum amount needed to ensure the estimate remains accurate. [sent-117, score-0.975]
37 It can easily be applied to the Count-Min (CM) sketch and Spectral Bloom Filters (SBF) to further improve the estimate of a point query. [sent-118, score-0.513]
38 Goyal and Daumé III (2011a) showed that CM sketch with conservative update reduces the amount of overestimation error by a factor of at least 1. [sent-119, score-0.961]
39 Note that while conservative update for CM and SBF never increases the error, there is no guaranteed improvement. [sent-121, score-0.336]
40 When a large corpus is being summarized in a distributed setting, we can apply conservative update on each sketch independently before combining the sketches together (see “Sketch Operations” in Section 2). [sent-123, score-1.36]
41 Count-Min sketch with conservative update (CM-CU): The QUERY procedure for CM-CU (Cormode, 2009; Goyal and Daumé III, 2011a) is identical to Count-Min. [sent-124, score-0.849]
42 However, to UPDATE an item “x” with frequency c, we first compute the frequency c(x) of this item from the existing data structure (∀1 ≤ k ≤ d, c(x) = mink sk[k, hk(x)]) and the counts are updated according to: sk[k, hk(x)] ← max{sk[k, hk(x)], c(x) + c}, ∀1 ≤ k ≤ d (∗). [sent-125, score-0.473]
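A minimal conservative-update rule (∗) applied to the CountMinSketch above; the helper name is ours:

```python
def cm_cu_update(cm: "CountMinSketch", x, c: int = 1):
    """Conservative update: raise each counter only as far as needed."""
    idx = [cm._h(k, x) for k in range(cm.d)]
    current = min(cm.sk[k, i] for k, i in enumerate(idx))   # c(x) = min_k sk[k, h_k(x)]
    target = current + c
    for k, i in enumerate(idx):
        cm.sk[k, i] = max(cm.sk[k, i], target)               # sk <- max{sk, c(x) + c}
```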
43 Spectral Bloom Filters with conservative update (SBF-CU): The QUERY procedure for SBF-CU (Cohen and Matias, 2003) is identical to SBF. [sent-128, score-0.336]
44 Count-mean-min sketch with conservative update (CMM-CU): We propose a new variant to reduce the over-estimation error for the CMM sketch. [sent-131, score-0.411]
45 However, due to conservative update, each row of the sketch is not updated for every update; hence the sum of counts over each row (Σi sk[k, i], ∀1 ≤ k ≤ d) is not equal to the input size N. [sent-133, score-0.911]
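One way to adapt the CMM noise correction to this situation is to use each row's own sum in place of N; this per-row variant is our reading of the sentence above, not a formula quoted from the paper:

```python
import statistics

def cmm_cu_query(cm: "CountMinSketch", x) -> float:
    """CMM-CU style query: per-row noise estimates, capped by the row minimum."""
    row_sums = cm.sk.sum(axis=1)          # row totals differ under conservative update
    counters = [cm.sk[k, cm._h(k, x)] for k in range(cm.d)]
    residues = [c - (row_sums[k] - c) / (cm.w - 1) for k, c in enumerate(counters)]
    return min(statistics.median(residues), min(counters))
```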
46 Count sketch with conservative update (COUNT-CU): We propose a new variant to reduce the over-estimation error for the COUNT sketch. [sent-136, score-0.961]
47 Lossy counting with conservative update (LCU-WS): LCU-WS (Goyal and Daumé III, 2011b) was proposed to reduce the amount of over-estimation error for the CM-CU sketch, without incurring too much under-estimation error. [sent-142, score-0.445]
48 The size of each window is equal to the size of the sketch. [sent-145, score-0.61]
49 Lossy counting with conservative update II (LCU-SWS): This is a variant of the previous scheme, where the counts of the sketch are decreased more conservatively. [sent-159, score-0.783]
50 Since all sketches have probabilistic bounds, we report average results over these 10 chunks. [sent-171, score-0.511]
51 To compare the error in various sketch counts, we first compute the exact counts of all the word pairs. [sent-175, score-0.764]
52 Third, we query sketches to generate approximate counts of all the word pairs. [sent-177, score-0.748]
53 Recall that we do not store the word pairs explicitly in the sketches, but only a compact summary of the associated counts. [sent-178, score-0.6]
54 We fix the size of each sketch to be w = (20 × 10^6)/3 and d = 3. [sent-179, score-0.543]
55 We keep the sizes of the sketches equal to allow a fair comparison among them. [sent-180, score-0.541]
56 Prior work (Deng and Rafiei, 2007; Goyal and Daumé III, 2011a) showed that, with a fixed sketch size, a small number of hash functions (d) with a large range (w) gives rise to small error over the counts. [sent-181, score-0.974]
57 Because different sketches have different accuracy behavior on low, mid, and high frequency counts, making this distinction based on frequency lets us determine the regions in which different sketches perform best. [sent-184, score-1.156]
58 Since various sketches result in different errors on low, mid, and high frequency counts, we plot the results with a linear error scale (Fig. [sent-188, score-0.703]
59 Figure 1: Comparing several sketches for an input size of 75 million word pairs; (a) focusing on low frequency counts, (b) focusing on mid and high frequency counts. [sent-193, score-1.091]
60 All items with the same exact count are put in one bucket, and we plot Mean Relative Error on the y-axis against exact counts on the x-axis. [sent-195, score-0.366]
61 Using conservative update with CM (CM-CU) and SBF (SBF-CU) reduces the MRE by a factor of 1. [sent-197, score-0.336]
62 (2) COUNT has better MRE than CM-CU and using conservative update with COUNT (COUNT-CU) further reduces the MRE. [sent-200, score-0.336]
63 (3) CMM has better MRE than COUNT and using conservative update with CMM (CMM-CU) further reduces the MRE. [sent-201, score-0.336]
64 (4) Lossy counting with conservative update variants (LCU-SWS, LCU-WS) have comparable MRE to COUNT-CU and CMM-CU respectively. [sent-202, score-0.394]
65 From Figure 1(b), we observe that CM, COUNT, COUNT-CU, CMM, and CMM-CU sketches have worse MRE than CM-CU, LCU-SWS, and LCU-WS for mid and high frequency counts. [sent-206, score-0.685]
66 For NLP tasks where we can allow error on mid and high frequency counts but not on low frequency counts. [sent-210, score-0.466]
67 Figure 2: Compare several sketches on over-estimation and under-estimation errors with respect to exact counts (x-axis: frequency counts for word pairs, log scale). [sent-212, score-0.586]
68 Hence we break down the error into over-estimation (OE) and under-estimation (UE) errors for the six best-performing sketches (COUNT, COUNT-CU, CMM, CMM-CU, LCU-SWS, and LCU-WS). [sent-216, score-0.611]
69 (2) LCU-WS has less over-estimation than LCU-SWS, but more under-estimation error on mid frequency counts. [sent-228, score-0.249]
70 In this experiment, we compare the word-pair association rankings obtained using PMI and LLR from several sketches against those from exact word pair counts. [sent-236, score-0.641]
71 In Figure 3(a), we compute the recall for the CM-CU, COUNT-CU, CMM-CU, LCU-SWS, and LCU-WS sketches at several top-K thresholds of word pairs for the approximate PMI ranking. [sent-238, score-0.602]
72 For top-K values less than 750, all sketches except COUNT-CU have comparable recall. [sent-241, score-0.511]
73 This is because PMI is sensitive to low frequency counts (Church and Hanks, 1989): over-estimation of the counts of low frequency word pairs can make their approximate PMI scores worse. [sent-243, score-0.525]
74 For top-K values less than 1000, all the sketches have comparable recall. [sent-245, score-0.511]
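For concreteness, a hypothetical helper that scores a word pair by PMI computed from sketched counts; the separate word-level and pair-level sketches and the normalization by the total number of pairs are assumptions of this illustration, not details quoted from the excerpt:

```python
import math

def sketched_pmi(pair_sketch, word_sketch, w1: str, w2: str, total_pairs: int) -> float:
    """PMI(w1, w2) = log( P(w1, w2) / (P(w1) P(w2)) ), using approximate counts."""
    c12 = pair_sketch.query((w1, w2))
    c1, c2 = word_sketch.query(w1), word_sketch.query(w2)
    if min(c12, c1, c2) <= 0:
        return float("-inf")              # unseen under the approximate counts
    # Over-estimated counts for rare pairs inflate c12 and hence the PMI score,
    # which is why the conservative-update variants matter for this ranking.
    return math.log(c12 * total_pairs / (c1 * c2))
```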
75 1 Experimental Setup We study three important NLP applications, and compare the three best-performing sketches: Count-Min sketch with conservative update (CM-CU), Count-mean-min with conservative update (CMM-CU), and Lossy counting with conservative update (LCU-WS). [sent-255, score-1.555]
76 The above-mentioned 3 sketches are selected from the 10 sketches (see Section 2) because they make errors on different ranges of the counts (low, mid, and high frequency), as seen in our intrinsic evaluations in Section 3. [sent-256, score-1.858]
77 The goal of this experiment is to show the effectiveness of sketches on large-scale language processing tasks. [sent-257, score-0.511]
78 This allows us to have a fair comparison among different sketches, and to more directly see the impact of different choices of sketch on the task outcome. [sent-260, score-0.513]
79 Of course, sketches are still broadly applicable to many NLP problems where we want to count (many) items or compute associations: e. [sent-261, score-0.626]
80 To evaluate against the exact counts, we compute exact counts for only those word pairs that are present in the test sets. [sent-293, score-0.261]
81 In Figure 4, we plot the cumulative proportion of true frequency counts of all word pairs (from the three test sets) in Gigaword (GW). [sent-296, score-0.253]
82 In Table 1, we vary the size of all sketches (50 million (M), 100M, 200M, 500M and 1 billion (1B) counters) with 3 hash functions to compare them against the exact counts. [sent-301, score-0.865]
83 8 GB uncompressed space to maintain the exact counts on the disk. [sent-303, score-0.243]
84 Table 1 shows that, with sketches of size > 200M on all the three test sets, CM-CU and LCU-WS are comparable to the exact counts. [sent-304, score-0.541]
85 Table 2: Evaluating Semantic Orientation on the accuracy metric, using several sketches of 2 billion counters, against exact counts. [sent-314, score-0.648]
86 We fix the size of all sketches to 2 billion (2B) counters with 5 hash functions. [sent-331, score-0.853]
87 We store exact counts of all words in a hash table for both GW and GWB50. [sent-332, score-0.405]
88 Next, we compare the sketches against the exact counts over two corpora of different sizes. [sent-358, score-0.717]
89 8B word pair types (It takes 16 GB uncompressed disk space to store exact counts of all the unique word pair types. [sent-373, score-0.267]
90 First, we store exact counts of all words and contexts in a hash table from GW. [sent-395, score-0.405]
91 6 GB uncompressed disk space to store exact counts of all these unique word-context pair types. [sent-400, score-0.267]
92 In Table 3, we vary the size of all sketches across 10 million (M), 50M, and 200M counters with 3 hash functions. [sent-404, score-0.856]
93 5 Conclusion In this work, we systematically studied the problem of estimating point queries using different sketch algorithms. [sent-414, score-0.538]
94 As far as we know, this represents the first comparative study to demonstrate the relative behavior of sketches in the context of NLP applications. [sent-415, score-0.511]
95 We proposed two novel sketch variants: Count sketch (Charikar et al. [sent-416, score-1.026]
96 , 2004) with conservative update (COUNT-CU) and Count-mean-min sketch with conservative update (CMM-CU). [sent-417, score-1.185]
97 An improved data stream summary: The count-min sketch and its applications. [sent-481, score-0.542]
98 Approximate scalable bounded space sketch for large data NLP. [sent-529, score-0.513]
99 Lossy conservative update (LCU) sketch: Succinct approximate count storage. [sent-533, score-0.454]
100 One sketch for all: Theory and application of conditional random sampling. [sent-555, score-0.513]
wordName wordTfidf (topN-words)
[('sketch', 0.513), ('sketches', 0.511), ('conservative', 0.192), ('hash', 0.175), ('cmm', 0.161), ('sk', 0.158), ('update', 0.144), ('goyal', 0.135), ('mre', 0.128), ('counts', 0.126), ('sbf', 0.124), ('hk', 0.113), ('llr', 0.112), ('counters', 0.107), ('mid', 0.107), ('cm', 0.104), ('pmi', 0.1), ('gw', 0.095), ('cormode', 0.077), ('error', 0.075), ('gb', 0.075), ('lossy', 0.074), ('daum', 0.073), ('frequency', 0.067), ('bloom', 0.067), ('count', 0.062), ('approximate', 0.056), ('query', 0.055), ('durme', 0.055), ('store', 0.054), ('items', 0.053), ('selectional', 0.053), ('exact', 0.05), ('underestimation', 0.05), ('nlp', 0.049), ('charikar', 0.048), ('distributional', 0.048), ('streaming', 0.045), ('graham', 0.045), ('gigaword', 0.045), ('rankings', 0.045), ('iii', 0.044), ('gk', 0.043), ('talbot', 0.041), ('orientation', 0.041), ('pointwise', 0.041), ('spectral', 0.039), ('cu', 0.039), ('lall', 0.039), ('filters', 0.038), ('cmmcu', 0.037), ('lcuws', 0.037), ('overestimation', 0.037), ('uncompressed', 0.037), ('window', 0.037), ('functions', 0.036), ('turney', 0.036), ('pairs', 0.035), ('counting', 0.034), ('million', 0.033), ('inquirer', 0.032), ('estan', 0.032), ('array', 0.032), ('size', 0.03), ('billion', 0.03), ('chambers', 0.03), ('maintain', 0.03), ('ashwin', 0.029), ('stream', 0.029), ('mutual', 0.028), ('amit', 0.028), ('overestimate', 0.027), ('randomized', 0.026), ('errors', 0.025), ('ue', 0.025), ('matias', 0.025), ('oe', 0.025), ('deng', 0.025), ('littman', 0.025), ('plot', 0.025), ('aggarwal', 0.025), ('bytes', 0.025), ('countcu', 0.025), ('countmin', 0.025), ('dwork', 0.025), ('lcusws', 0.025), ('mediank', 0.025), ('rafiei', 0.025), ('residues', 0.025), ('schechter', 0.025), ('sing', 0.025), ('swtucw', 0.025), ('usenix', 0.025), ('row', 0.025), ('queries', 0.025), ('hal', 0.025), ('low', 0.024), ('van', 0.024), ('log', 0.024), ('variants', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 117 emnlp-2012-Sketch Algorithms for Estimating Point Queries in NLP
Author: Amit Goyal ; Hal Daume III ; Graham Cormode
Abstract: Many NLP tasks rely on accurate statistics from large corpora. Tracking complete statistics is memory intensive, so recent work has proposed using compact approximate “sketches” of frequency distributions. We describe 10 sketch methods, including existing and novel variants. We compare and study the errors (over-estimation and underestimation) made by the sketches. We evaluate several sketches on three important NLP problems. Our experiments show that one sketch performs best for all the three tasks.
2 0.4077259 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP
Author: Amit Goyal ; Hal Daume III ; Raul Guerra
Abstract: Many natural language processing problems involve constructing large nearest-neighbor graphs. We propose a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data, our algorithm maintains approximate counts based on sketching algorithms. To find the approximate nearest neighbors, our algorithm pairs a new distributed online-PMI algorithm with novel fast approximate nearest neighbor search algorithms (variants of PLEB). These algorithms return the approximate nearest neighbors quickly. We show our system’s efficiency in both intrinsic and extrinsic experiments. We further evaluate our fast search algorithms both quantitatively and qualitatively on two NLP applications.
3 0.11363214 120 emnlp-2012-Streaming Analysis of Discourse Participants
Author: Benjamin Van Durme
Abstract: Inferring attributes of discourse participants has been treated as a batch-processing task: data such as all tweets from a given author are gathered in bulk, processed, analyzed for a particular feature, then reported as a result of academic interest. Given the sources and scale of material used in these efforts, along with potential use cases of such analytic tools, discourse analysis should be reconsidered as a streaming challenge. We show that under certain common formulations, the batchprocessing analytic framework can be decomposed into a sequential series of updates, using as an example the task of gender classification. Once in a streaming framework, and motivated by large data sets generated by social media services, we present novel results in approximate counting, showing its applicability to space efficient streaming classification.
4 0.064904816 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
Author: Paramveer Dhillon ; Jordan Rodu ; Michael Collins ; Dean Foster ; Lyle Ungar
Abstract: Recently there has been substantial interest in using spectral methods to learn generative sequence models like HMMs. Spectral methods are attractive as they provide globally consistent estimates of the model parameters and are very fast and scalable, unlike EM methods, which can get stuck in local minima. In this paper, we present a novel extension of this class of spectral methods to learn dependency tree structures. We propose a simple yet powerful latent variable generative model for dependency parsing, and a spectral learning method to efficiently estimate it. As a pilot experimental evaluation, we use the spectral tree probabilities estimated by our model to re-rank the outputs of a near state-of-theart parser. Our approach gives us a moderate reduction in error of up to 4.6% over the baseline re-ranker. .
5 0.049297668 12 emnlp-2012-A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing
Author: Bernd Bohnet ; Joakim Nivre
Abstract: Most current dependency parsers presuppose that input words have been morphologically disambiguated using a part-of-speech tagger before parsing begins. We present a transitionbased system for joint part-of-speech tagging and labeled dependency parsing with nonprojective trees. Experimental evaluation on Chinese, Czech, English and German shows consistent improvements in both tagging and parsing accuracy when compared to a pipeline system, which lead to improved state-of-theart results for all languages.
6 0.043318871 80 emnlp-2012-Learning Verb Inference Rules from Linguistically-Motivated Evidence
7 0.040736713 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage
8 0.038592793 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM
9 0.037252039 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
10 0.03691671 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering
11 0.036578789 25 emnlp-2012-Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation
12 0.035408311 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
13 0.035263136 105 emnlp-2012-Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output
14 0.035027821 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition
15 0.033931542 78 emnlp-2012-Learning Lexicon Models from Search Logs for Query Expansion
16 0.03302389 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
17 0.030943144 86 emnlp-2012-Locally Training the Log-Linear Model for SMT
18 0.030790139 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning
19 0.030596625 55 emnlp-2012-Forest Reranking through Subtree Ranking
20 0.030349787 97 emnlp-2012-Natural Language Questions for the Web of Data
topicId topicWeight
[(0, 0.136), (1, -0.002), (2, -0.006), (3, 0.083), (4, 0.02), (5, -0.012), (6, 0.123), (7, 0.054), (8, 0.269), (9, 0.168), (10, -0.083), (11, -0.477), (12, 0.102), (13, 0.085), (14, -0.123), (15, 0.354), (16, -0.062), (17, -0.059), (18, -0.032), (19, -0.01), (20, 0.244), (21, 0.1), (22, 0.021), (23, 0.057), (24, -0.05), (25, -0.085), (26, 0.049), (27, -0.017), (28, -0.033), (29, -0.107), (30, -0.014), (31, -0.086), (32, -0.107), (33, 0.022), (34, -0.009), (35, -0.014), (36, 0.035), (37, -0.047), (38, 0.017), (39, 0.024), (40, 0.001), (41, -0.035), (42, -0.058), (43, -0.018), (44, -0.007), (45, -0.036), (46, 0.03), (47, 0.005), (48, 0.02), (49, 0.041)]
simIndex simValue paperId paperTitle
same-paper 1 0.96850938 117 emnlp-2012-Sketch Algorithms for Estimating Point Queries in NLP
Author: Amit Goyal ; Hal Daume III ; Graham Cormode
Abstract: Many NLP tasks rely on accurate statistics from large corpora. Tracking complete statistics is memory intensive, so recent work has proposed using compact approximate “sketches” of frequency distributions. We describe 10 sketch methods, including existing and novel variants. We compare and study the errors (over-estimation and underestimation) made by the sketches. We evaluate several sketches on three important NLP problems. Our experiments show that one sketch performs best for all the three tasks.
2 0.91210485 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP
Author: Amit Goyal ; Hal Daume III ; Raul Guerra
Abstract: Many natural language processing problems involve constructing large nearest-neighbor graphs. We propose a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data, our algorithm maintains approximate counts based on sketching algorithms. To find the approximate nearest neighbors, our algorithm pairs a new distributed online-PMI algorithm with novel fast approximate nearest neighbor search algorithms (variants of PLEB). These algorithms return the approximate nearest neighbors quickly. We show our system’s efficiency in both intrinsic and extrinsic experiments. We further evaluate our fast search algorithms both quantitatively and qualitatively on two NLP applications.
3 0.38008898 120 emnlp-2012-Streaming Analysis of Discourse Participants
Author: Benjamin Van Durme
Abstract: Inferring attributes of discourse participants has been treated as a batch-processing task: data such as all tweets from a given author are gathered in bulk, processed, analyzed for a particular feature, then reported as a result of academic interest. Given the sources and scale of material used in these efforts, along with potential use cases of such analytic tools, discourse analysis should be reconsidered as a streaming challenge. We show that under certain common formulations, the batchprocessing analytic framework can be decomposed into a sequential series of updates, using as an example the task of gender classification. Once in a streaming framework, and motivated by large data sets generated by social media services, we present novel results in approximate counting, showing its applicability to space efficient streaming classification.
4 0.19620442 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
Author: Paramveer Dhillon ; Jordan Rodu ; Michael Collins ; Dean Foster ; Lyle Ungar
Abstract: Recently there has been substantial interest in using spectral methods to learn generative sequence models like HMMs. Spectral methods are attractive as they provide globally consistent estimates of the model parameters and are very fast and scalable, unlike EM methods, which can get stuck in local minima. In this paper, we present a novel extension of this class of spectral methods to learn dependency tree structures. We propose a simple yet powerful latent variable generative model for dependency parsing, and a spectral learning method to efficiently estimate it. As a pilot experimental evaluation, we use the spectral tree probabilities estimated by our model to re-rank the outputs of a near state-of-theart parser. Our approach gives us a moderate reduction in error of up to 4.6% over the baseline re-ranker. .
5 0.17613478 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
Author: Wen-tau Yih ; Geoffrey Zweig ; John Platt
Abstract: Existing vector space models typically map synonyms and antonyms to similar word vectors, and thus fail to represent antonymy. We introduce a new vector space representation where antonyms lie on opposite sides of a sphere: in the word vector space, synonyms have cosine similarities close to one, while antonyms are close to minus one. We derive this representation with the aid of a thesaurus and latent semantic analysis (LSA). Each entry in the thesaurus a word sense along with its synonyms and antonyms is treated as a “document,” and the resulting document collection is subjected to LSA. The key contribution of this work is to show how to assign signs to the entries in the co-occurrence matrix on which LSA operates, so as to induce a subspace with the desired property. – – We evaluate this procedure with the Graduate Record Examination questions of (Mohammed et al., 2008) and find that the method improves on the results of that study. Further improvements result from refining the subspace representation with discriminative training, and augmenting the training data with general newspaper text. Altogether, we improve on the best previous results by 11points absolute in F measure.
6 0.16975044 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context
7 0.16456179 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage
8 0.15392932 25 emnlp-2012-Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation
9 0.1327558 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering
10 0.12723531 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM
11 0.12487263 139 emnlp-2012-Word Salad: Relating Food Prices and Descriptions
12 0.11723246 78 emnlp-2012-Learning Lexicon Models from Search Logs for Query Expansion
13 0.11633012 80 emnlp-2012-Learning Verb Inference Rules from Linguistically-Motivated Evidence
14 0.11476361 122 emnlp-2012-Syntactic Surprisal Affects Spoken Word Duration in Conversational Contexts
15 0.11207459 26 emnlp-2012-Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings
16 0.11153286 86 emnlp-2012-Locally Training the Log-Linear Model for SMT
17 0.10741748 53 emnlp-2012-First Order vs. Higher Order Modification in Distributional Semantics
18 0.10691071 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules
19 0.10514393 114 emnlp-2012-Revisiting the Predictability of Language: Response Completion in Social Media
20 0.10350395 33 emnlp-2012-Discovering Diverse and Salient Threads in Document Collections
topicId topicWeight
[(2, 0.019), (6, 0.332), (8, 0.032), (9, 0.027), (14, 0.017), (16, 0.019), (25, 0.014), (34, 0.081), (45, 0.015), (60, 0.099), (63, 0.027), (64, 0.019), (65, 0.024), (70, 0.017), (74, 0.038), (76, 0.045), (80, 0.022), (86, 0.02), (95, 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.75816089 117 emnlp-2012-Sketch Algorithms for Estimating Point Queries in NLP
Author: Amit Goyal ; Hal Daume III ; Graham Cormode
Abstract: Many NLP tasks rely on accurate statistics from large corpora. Tracking complete statistics is memory intensive, so recent work has proposed using compact approximate “sketches” of frequency distributions. We describe 10 sketch methods, including existing and novel variants. We compare and study the errors (over-estimation and underestimation) made by the sketches. We evaluate several sketches on three important NLP problems. Our experiments show that one sketch performs best for all the three tasks.
2 0.6804353 114 emnlp-2012-Revisiting the Predictability of Language: Response Completion in Social Media
Author: Bo Pang ; Sujith Ravi
Abstract: The question “how predictable is English?” has long fascinated researchers. While prior work has focused on formal English typically used in news articles, we turn to texts generated by users in online settings that are more informal in nature. We are motivated by a novel application scenario: given the difficulty of typing on mobile devices, can we help reduce typing effort with message completion, especially in conversational settings? We propose a method for automatic response completion. Our approach models both the language used in responses and the specific context provided by the original message. Our experimental results on a large-scale dataset show that both components help reduce typing effort. We also perform an information-theoretic study in this setting and examine the entropy of user-generated content, especially in con- versational scenarios, to better understand predictability of user generated English.
3 0.53115153 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP
Author: Amit Goyal ; Hal Daume III ; Raul Guerra
Abstract: Many natural language processing problems involve constructing large nearest-neighbor graphs. We propose a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data, our algorithm maintains approximate counts based on sketching algorithms. To find the approximate nearest neighbors, our algorithm pairs a new distributed online-PMI algorithm with novel fast approximate nearest neighbor search algorithms (variants of PLEB). These algorithms return the approximate nearest neighbors quickly. We show our system’s efficiency in both intrinsic and extrinsic experiments. We further evaluate our fast search algorithms both quantitatively and qualitatively on two NLP applications.
4 0.41619819 120 emnlp-2012-Streaming Analysis of Discourse Participants
Author: Benjamin Van Durme
Abstract: Inferring attributes of discourse participants has been treated as a batch-processing task: data such as all tweets from a given author are gathered in bulk, processed, analyzed for a particular feature, then reported as a result of academic interest. Given the sources and scale of material used in these efforts, along with potential use cases of such analytic tools, discourse analysis should be reconsidered as a streaming challenge. We show that under certain common formulations, the batchprocessing analytic framework can be decomposed into a sequential series of updates, using as an example the task of gender classification. Once in a streaming framework, and motivated by large data sets generated by social media services, we present novel results in approximate counting, showing its applicability to space efficient streaming classification.
5 0.40052798 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP
Author: Taylor Berg-Kirkpatrick ; David Burkett ; Dan Klein
Abstract: We investigate two aspects of the empirical behavior of paired significance tests for NLP systems. First, when one system appears to outperform another, how does significance level relate in practice to the magnitude of the gain, to the size of the test set, to the similarity of the systems, and so on? Is it true that for each task there is a gain which roughly implies significance? We explore these issues across a range of NLP tasks using both large collections of past systems’ outputs and variants of single systems. Next, once significance levels are computed, how well does the standard i.i.d. notion of significance hold up in practical settings where future distributions are neither independent nor identically distributed, such as across domains? We explore this question using a range of test set variations for constituency parsing.
6 0.39961383 24 emnlp-2012-Biased Representation Learning for Domain Adaptation
7 0.39950913 92 emnlp-2012-Multi-Domain Learning: When Do Domains Matter?
8 0.39850384 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
9 0.39711925 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing
10 0.39708501 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
11 0.39430535 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis
12 0.39414248 108 emnlp-2012-Probabilistic Finite State Machines for Regression-based MT Evaluation
13 0.39360499 138 emnlp-2012-Wiki-ly Supervised Part-of-Speech Tagging
14 0.39287898 22 emnlp-2012-Automatically Constructing a Normalisation Dictionary for Microblogs
15 0.39252102 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts
16 0.39229548 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents
17 0.39188743 111 emnlp-2012-Regularized Interlingual Projections: Evaluation on Multilingual Transliteration
18 0.39111036 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT
19 0.39107272 129 emnlp-2012-Type-Supervised Hidden Markov Models for Part-of-Speech Tagging with Incomplete Tag Dictionaries
20 0.39063373 39 emnlp-2012-Enlarging Paraphrase Collections through Generalization and Instantiation