emnlp emnlp2012 emnlp2012-52 knowledge-graph by maker-knowledge-mining

52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP


Source: pdf

Author: Amit Goyal ; Hal Daume III ; Raul Guerra

Abstract: Many natural language processing problems involve constructing large nearest-neighbor graphs. We propose a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data, our algorithm maintains approximate counts based on sketching algorithms. To find the approximate nearest neighbors, our algorithm pairs a new distributed online-PMI algorithm with novel fast approximate nearest neighbor search algorithms (variants of PLEB). These algorithms return the approximate nearest neighbors quickly. We show our system’s efficiency in both intrinsic and extrinsic experiments. We further evaluate our fast search algorithms both quantitatively and qualitatively on two NLP applications.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 To find the approximate nearest neighbors, our algorithm pairs a new distributed online-PMI algorithm with novel fast approximate nearest neighbor search algorithms (variants of PLEB). [sent-7, score-0.886]

2 The latter two approaches construct nearest-neighbor graphs by computing nearest neighbors between word pairs from large corpora. [sent-14, score-0.401]

3 These nearest neighbors form the edges of the graph, with weights given by the distributional similarity (Turney and Pantel, 2010) between terms. [sent-15, score-0.439]

4 (2009) compute distributional similarity between 500 million terms over 200 billion words in 50 hours using 100 quad-core nodes, explicitly storing a similarity matrix between 500 million terms. [sent-23, score-0.213]

5 In this work, we propose Fast Large-Scale Approximate Graph (FLAG) construction, a system that constructs a fast large-scale approximate nearest-neighbor graph from a large text corpus. [sent-24, score-0.183]

6 (2005) approach stored an enormous matrix of all unique words and their contexts in main memory, which is infeasible for very large data sets. [sent-32, score-0.215]

7 A more efficient online framework based on locality sensitive hashing (Van Durme and Lall, 2010; Van Durme and Lall, 2011) computes distributional similarity in a streaming setting. [sent-33, score-0.308]

8 Unfortunately, their approach can handle only additive features like raw-counts, and not non-linear association scores like pointwise mutual information (PMI), which generates better context vectors for distributional similarity (Ravichandran et al. [sent-34, score-0.237]

9 The main motivation for using the CM sketch comes from its linearity property (see the last paragraph of Section 2), which allows the CM sketch to be implemented in a distributed setting for large data sets. [sent-43, score-0.851]

10 Our intrinsic and extrinsic experiments demonstrate the effectiveness of distributed online-PMI. [sent-45, score-0.192]

11 After generating context vectors with the distributed online-PMI algorithm, our goal is to use them to find fast approximate nearest neighbors for all words. [sent-46, score-0.718]

12 2 Count-Min sketch The Count-Min (CM) sketch (Cormode and Muthukrishnan, 2004) belongs to a class of ‘sketch’ algorithms that represent a large data set with a compact summary, typically much smaller than the full size of the input, by processing the data in one pass. [sent-54, score-0.72]

13 The following surveys comprehensively review the streaming literature (Rusu and Dobra, 1070 2007; Cormode and Hadjieleftheriou, 2008) and sketch techniques (Charikar et al. [sent-55, score-0.401]

14 , 2012), we conducted a systematic study and comparison of many sketch techniques that answer point queries, with a focus on large-scale NLP tasks. [sent-59, score-0.345]

15 In that paper, we empirically demonstrated that CM sketch performs the best among all the sketches on three large-scale NLP tasks. [sent-60, score-0.436]

16 CM sketch uses hashing to store the approximate frequencies of all items from the large data set onto a small sketch vector that can be updated and queried in constant time. [sent-61, score-0.927]

17 The depth d denotes the number of pairwise-independent hash functions employed by the CM sketch; and the width w denotes the range of the hash functions. [sent-71, score-0.232]

18 UPDATE: For each new item “x” with count c, the sketch is updated as: sketch[k, h_k(x)] ← sketch[k, h_k(x)] + c, ∀ 1 ≤ k ≤ d. [sent-84, score-0.457]

19 Therefore, to answer the point query (QUERY (x)), CM returns the minimum over all the d positions indexed by the hash functions. [sent-86, score-0.218]
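
To make the UPDATE and QUERY operations above concrete, here is a minimal Python sketch of a Count-Min structure. The depth d, width w, and the minimum over the d hashed cells follow the description in the paper; the salted use of Python's built-in hash as a stand-in for d pairwise-independent hash functions, and the class name CountMinSketch, are our own illustrative choices.

```python
import random

class CountMinSketch:
    """Minimal Count-Min sketch: d rows of width w (illustration only)."""

    def __init__(self, depth=3, width=1000, seed=0):
        self.depth = depth
        self.width = width
        rng = random.Random(seed)
        # One salt per row stands in for d pairwise-independent hash functions.
        self.salts = [rng.getrandbits(32) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, k, item):
        return hash((self.salts[k], item)) % self.width

    def update(self, item, count=1):
        # UPDATE: add the count to one cell in each of the d rows.
        for k in range(self.depth):
            self.table[k][self._index(k, item)] += count

    def query(self, item):
        # QUERY: return the minimum over the d hashed cells.
        return min(self.table[k][self._index(k, item)] for k in range(self.depth))
```

With a toy width like this the estimates over-count heavily; the paper's configuration of roughly 2 billion counters with 3 hash functions keeps collisions rare.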

20 This linearity property enables sketches to be implemented in a distributed setting, where each machine computes the sketch over a small portion of the corpus, which makes the approach scalable to large datasets. [sent-94, score-0.552]
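
The linearity property amounts to entry-wise addition of the d × w count arrays, so sketches built independently over corpus shards can be summed into one. A hedged sketch of that merge, reusing the illustrative CountMinSketch class above and assuming every shard used identical hash functions:

```python
def merge_sketches(sketches):
    """Entry-wise sum of sketches that share depth, width, and hash functions."""
    first = sketches[0]
    merged = CountMinSketch(depth=first.depth, width=first.width)
    merged.salts = first.salts  # hash functions must match across shards
    for s in sketches:
        for k in range(merged.depth):
            for j in range(merged.width):
                merged.table[k][j] += s.table[k][j]
    return merged
```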

21 The idea of conservative update (Estan and Varghese, 2002) is to only increase counts in the sketch by the minimum amount needed to ensure that the estimate remains accurate. [sent-95, score-0.411]

22 We (Goyal and Daumé III, 2011) used the CM sketch with conservative update (CM-CU sketch) to show that the update reduces the amount of over-estimation error by a factor of at least 1. [sent-96, score-0.415]

23 However, to UPDATE an item “x” with frequency c, we first compute the estimated frequency c(x) of this item from the existing data structure (c(x) = min_k sketch[k, h_k(x)], ∀ 1 ≤ k ≤ d), and then the counts are updated according to: sketch[k, h_k(x)] ← max{sketch[k, h_k(x)], c(x) + c}. [sent-99, score-0.199]
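
The conservative-update rule can be expressed directly in terms of the same structure; the sketch below (our illustration, not the authors' code) first takes the current estimate c(x) as the minimum over the d cells and then raises each cell only as far as max(current cell, c(x) + c):

```python
def conservative_update(sketch, item, count=1):
    """CM-CU update: increase each hashed cell only by the minimum amount needed."""
    current = sketch.query(item)   # c(x) = min over the d hashed cells
    target = current + count       # new lower bound on the true count of the item
    for k in range(sketch.depth):
        j = sketch._index(k, item)
        sketch.table[k][j] = max(sketch.table[k][j], target)
```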

24 3 FLAG: Fast Large-Scale Approximate Graph Construction We describe a system, FLAG, for generating a nearest neighbor graph from a large corpus. [sent-102, score-0.324]

25 For every node (word), our system returns top l approximate nearest neighbors, which implicitly defines the graph. [sent-103, score-0.323]

26 Second, we project all the words with context vector size d onto k random vectors and then binarize these random projection vectors (Section 3. [sent-112, score-0.337]

27 Fourth, using the output of variants of PLEB, we generate a small set of potential nearest neighbors for every word “z” (Section 3. [sent-116, score-0.361]

28 From this small set, we can compute the Hamming distance between every word “z” and its potential nearest neighbors to return the l nearest-neighbors for all unique words. [sent-118, score-0.388]

29 After one pass over the data set, it returns the context vectors for all query words. [sent-122, score-0.22]

30 The original online-PMI algorithm was used to find the top-d verbs for a query verb using the highest approximate online-PMI values, with a Talbot-Osborne-Morris-Bloom (TOMB) counter (Van Durme and Lall, 2009a). [sent-123, score-0.206]

31 Unfortunately, this algorithm is prohibitively slow when computing contexts for all words, rather than just a small query set. [sent-124, score-0.193]

32 First, we use the Count-Min sketch with conservative update (CM-CU) (Goyal and Daumé III, 2011) instead of TOMB. [sent-127, score-0.38]

33 Second, we store the counts of words (“z”), contexts (“y”), and word-context pairs all together in one CM-CU sketch. (Footnote: TOMB is a variant of the CM sketch that focuses on reducing the bit size of each counter, in addition to the number of counters, at the cost of incurring more error in the counts.) [sent-130, score-0.646]

34 This stores counts of all words (“z”), contexts (“y”), and word-context pairs in the CM-CU sketch, and stores the top-d contexts for each word in priority queues. [sent-139, score-0.299]

35 In the third step, we merge all the sketches using the linearity property to sum the counts of the words, contexts, and word-context pairs. [sent-140, score-0.258]

36 In the last step, we use the single merged sketch and merged top-d contexts list to generate the final distributed online-PMI top-d contexts list. [sent-142, score-0.643]
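
As a rough illustration of this last step, the sketch below re-scores each word's candidate contexts (the union of the per-chunk top-d lists) with counts read from the merged sketch and keeps the d highest-PMI contexts per word. The namespaced keys ("w", z), ("c", y), ("wc", z, y) for storing all counts in one sketch, the PMI formula, and the parameter total (the number of word-context tokens) are our assumptions for the example.

```python
import heapq
import math

def rescore_with_merged_sketch(candidate_contexts, merged, total, d=1000):
    """candidate_contexts: word -> iterable of candidate contexts (union of the
    per-chunk top-d lists). Returns the final top-d contexts per word, ranked
    by PMI computed from the merged CM-CU sketch counts."""
    final = {}
    for z, contexts in candidate_contexts.items():
        heap = []                                    # bounded min-heap of (pmi, y)
        cz = max(1, merged.query(("w", z)))          # approximate count of word z
        for y in contexts:
            joint = merged.query(("wc", z, y))       # approximate count of (z, y)
            if joint == 0:
                continue
            cy = max(1, merged.query(("c", y)))      # approximate count of context y
            pmi = math.log(joint * total / (cz * cy))
            heapq.heappush(heap, (pmi, y))
            if len(heap) > d:
                heapq.heappop(heap)                  # drop the lowest-PMI candidate
        final[z] = [y for _, y in sorted(heap, reverse=True)]
    return final
```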

37 It takes around one day to compute context vectors for all the words from a chunk of 10 million sentences using the first step of distributed online-PMI. [sent-143, score-0.233]

38 For the second step, the merging of sketches is fast, since sketches are two-dimensional array data structures (we used a sketch of 2 billion counters with 3 hash functions). [sent-146, score-0.702]

39 The last step, generating the final top-d contexts list, is again embarrassingly parallel and fast, taking a couple of hours to produce the top-d contexts for all the words from all the chunks. [sent-148, score-0.231]

40 The downside of the distributed online-PMI is that it splits the data into small chunks and loses information about the globally best contexts for a word over all the chunks. [sent-150, score-0.236]

41 The algorithm locally computes the best contexts for each chunk, which can hurt if it misses globally good contexts and can affect the accuracy of the downstream application. [sent-151, score-0.182]

42 2) by using distributed online-PMI, we do not lose any significant information about global contexts and perform comparably to offline-PMI in both an intrinsic and an extrinsic evaluation. [sent-153, score-0.283]

43 2 Dimensionality Reduction from R^D to R^k We are given context vectors for Z words; our goal is to use k random projections to project the context vectors from R^D to R^k. [sent-155, score-0.326]

44 Taking the dot product of each context vector with the k random vectors returns the k random projections for each word “z”. [sent-164, score-0.19]

45 We store the k random projections for all words (mapped to integers) as a matrix A of size k × Z. [sent-165, score-0.325]

46 The mechanism described above generates random projections by implicitly creating a random projection matrix from a set of {−1, +1}. [sent-166, score-0.334]

47 Figure 1: The first matrix pairs the words 1 · · · Z with their random projection values. [sent-188, score-0.184]

48 The second matrix sorts each row by the random projection values from smallest to largest. [sent-189, score-0.219]

49 Next, we create a binary matrix B from matrix A by taking the sign of each entry of A. [sent-199, score-0.264]
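
A hedged NumPy sketch of this projection-and-binarize step (our own illustration; the ±1 projection matrix is materialized explicitly here, whereas the paper generates it implicitly):

```python
import numpy as np

def random_projection_bits(context_matrix, k, seed=0):
    """context_matrix: Z x D array of context vectors. Returns A (k x Z real
    projections onto k random +/-1 vectors) and B (k x Z sign bits)."""
    rng = np.random.default_rng(seed)
    Z, D = context_matrix.shape
    R = rng.choice([-1.0, 1.0], size=(k, D))  # k random +/-1 projection vectors
    A = R @ context_matrix.T                   # k x Z matrix of projection values
    B = (A >= 0).astype(np.uint8)              # binarize by taking the sign
    return A, B
```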

50 2) in such a manner that finding nearest neighbors is fast. [sent-207, score-0.361]

51 1 Independent Random Projections (IRP) Here, we describe a naive baseline approach to arrange nearest neighbors next to each other by using Independent Random Projections (IRP). [sent-214, score-0.361]

52 First, for matrix A, we pair the words z1 · · · zZ and their random projection values as shown in Fig. [sent-216, score-0.184]

53 Second, we sort the elements of each row of matrix A by their random projection values from smallest to largest (shown in Fig. [sent-218, score-0.219]

54 The sorting operation puts all the nearest neighbor words (for each of the k independent projections) next to each other. [sent-221, score-0.408]

55 After sorting the matrix A, we throw away the projection values leaving only the words (see Fig. [sent-222, score-0.263]

56 To search for a word in matrix A in constant time, we create another matrix C of size (k × Z) (see Fig. [sent-224, score-0.206]
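
The IRP preprocessing can be sketched as follows (our illustration): each row of A is argsorted by projection value, and a position matrix C records where every word landed in each sorted row, so a word can be located in constant time.

```python
import numpy as np

def build_irp_index(A):
    """A: k x Z matrix of projection values. Returns the k x Z matrix of word
    ids sorted per row, plus C with C[r, word] = position of word in row r."""
    k, Z = A.shape
    sorted_words = np.argsort(A, axis=1)    # each row: word ids, smallest value first
    C = np.empty((k, Z), dtype=np.int64)
    C[np.arange(k)[:, None], sorted_words] = np.arange(Z)
    return sorted_words, C
```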

57 The improved PLEB algorithm uses all k random projections together. [sent-232, score-0.184]

58 The sorting operation puts all the nearest neighbor words (using k projections together) next to each other for all the permutations. [sent-235, score-0.523]

59 In our implementation of PLEB, we have a matrix A of size (p × Z), similar to the first matrix in Fig. [sent-238, score-0.206]

60 1(a) is that bit vectors of size k are used for sorting rather than using scalar projection values. [sent-241, score-0.293]

61 1(c) After sorting, the bit vectors are discarded and a matrix C of size (p × Z) is used to map the words 1 · · · Z to their sorted positions in the matrix A. [sent-243, score-0.24]

62 In the PLEB approach, generating random permutations and sorting the bit vectors of size k involves worse preprocessing time than using IRP. [sent-246, score-0.232]
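
A hedged sketch of the PLEB-style preprocessing, as we read the description: for each of p random permutations the k-bit signatures are reordered (or, for FAST-PLEB, only q sampled bits are kept) and the words are sorted lexicographically by the permuted bit vector, so words with similar signatures end up adjacent. The function name and the string-key sorting are implementation choices of ours.

```python
import numpy as np

def build_pleb_index(B, p=1000, q=None, seed=0):
    """B: k x Z matrix of sign bits. Returns the p x Z matrix of word ids sorted
    per permutation and the position map C; q=None gives PLEB, q << k gives
    FAST-PLEB (sample q of the k bits instead of permuting all of them)."""
    rng = np.random.default_rng(seed)
    k, Z = B.shape
    sorted_words = np.empty((p, Z), dtype=np.int64)
    for r in range(p):
        bits = rng.permutation(k) if q is None else rng.choice(k, size=q, replace=False)
        keys = ["".join(map(str, B[bits, z])) for z in range(Z)]  # permuted signatures
        sorted_words[r] = np.argsort(keys, kind="stable")
    C = np.empty((p, Z), dtype=np.int64)
    C[np.arange(p)[:, None], sorted_words] = np.arange(Z)
    return sorted_words, C
```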

63 However, spending more time in pre-processing leads to finding better approximate nearest neighbors. [sent-247, score-0.323]

64 To search a word “z”, first, we can look up matrix C to locate the k positions where “z” is stored in matrix A. [sent-262, score-0.212]

65 Once we find “z” in each row, we can select b (beam parameter) neighbors: b/2 neighbors to the left and b/2 to the right of the query word. [sent-265, score-0.528]

66 This search procedure produces a set of bk (IRP) or bp (PLEB and FAST-PLEB) potential nearest neighbors for a query word “z”. [sent-269, score-0.463]

67 Next, we compute the Hamming distance between the query word “z” and the set of potential nearest neighbors, using matrix B, to return the l closest nearest neighbors. [sent-270, score-0.797]
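
Putting the search procedure together, a hedged sketch (our illustration, using the sorted-row and position matrices from the preprocessing sketches above): look up the query word's position in each row via C, take b/2 neighbors on either side as candidates, then rank the candidates by Hamming distance between their bit signatures in B.

```python
import numpy as np

def approximate_neighbors(z, sorted_words, C, B, b=40, l=100):
    """Collect b beam candidates per sorted row around word z, then return the
    l candidates whose bit signatures are closest to z in Hamming distance."""
    rows, Z = sorted_words.shape
    candidates = set()
    for r in range(rows):
        pos = C[r, z]
        lo, hi = max(0, pos - b // 2), min(Z, pos + b // 2 + 1)
        candidates.update(sorted_words[r, lo:hi].tolist())
    candidates.discard(z)
    dists = [(int(np.sum(B[:, c] != B[:, z])), c) for c in candidates]
    return [c for _, c in sorted(dists)[:l]]
```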

68 4 Experiments We evaluate our system FLAG for fast large-scale approximate graph construction. [sent-273, score-0.183]

69 Second, we compare the approximate nearest neighbor lists generated by FLAG against the exact nearest neighbor lists. [sent-275, score-0.844]

70 Finally, we show the quality of our approximate similarity lists generated by FLAG from the web corpus. [sent-276, score-0.198]

71 2 Evaluating Distributed online-PMI Experimental Setup: First we do an intrinsic evaluation to quantitatively evaluate the distributed online-PMI vectors against the offline-PMI vectors computed from Gigaword (GW). [sent-289, score-0.336]

72 In the first pass, we store the counts of words, contexts and word-context pairs computed from GW in the Count-Min with conservative update (CM-CU) sketch. [sent-292, score-0.214]

73 We use the CM-CU sketch of size 2 billion counters (bounded by 8 GB of memory) with 3 hash functions. [sent-293, score-0.52]

74 As FLAG is going to use the context vectors to find nearest neighbors, we also throw away all those words which have ≤ 50 contexts associated with them. [sent-317, score-0.432]

75 3 Evaluating Approximate Nearest Neighbor Experimental Setup: To evaluate approximate nearest neighbor similarity lists generated by FLAG, we conduct three experiments. [sent-333, score-0.492]

76 For each word, both exact and approximate methods return l = 100 nearest neighbors. [sent-336, score-0.388]

77 The exact similarity lists for the 447 test words are computed by calculating the cosine similarity between the 447 test words and all other words. [sent-337, score-0.179]

78 ) approximate nearest neighbor similarity lists against the exact similarity lists. [sent-339, score-0.577]

79 Evaluation Metric: We use two measures, recall and Pearson's correlation, to measure the overlap between the approximate and exact similarity lists. [sent-344, score-0.189]

80 Intuitively, recall (R) captures the number of exact nearest neighbors over the GW data set. [sent-345, score-0.399]

81 Table 5: Evaluation results comparing LSH, IRP, PLEB, and FAST-PLEB (k = 3000, b = 40, p = 1000, q = 10) against exact nearest neighbors across three different data sets: GW, GWB50, and GWB100. [sent-373, score-0.399]

82 nearest neighbors that are found in both lists, and Pearson's (ρ) correlation captures whether the relative order of these neighbors is preserved in both similarity lists. [sent-376, score-0.502]
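
The two measures can be computed along these lines (our sketch, assuming scipy is available): recall is the fraction of exact neighbors that also appear in the approximate list, and Pearson's ρ is taken over the ranks of the shared neighbors in the two lists.

```python
from scipy.stats import pearsonr

def recall_and_rho(exact, approx):
    """exact, approx: ranked lists of neighbor ids for one query word."""
    approx_set = set(approx)
    common = [w for w in exact if w in approx_set]
    recall = len(common) / len(exact)
    if len(common) < 2:
        return recall, float("nan")
    exact_rank = {w: i for i, w in enumerate(exact)}
    approx_rank = {w: i for i, w in enumerate(approx)}
    rho, _ = pearsonr([exact_rank[w] for w in common],
                      [approx_rank[w] for w in common])
    return recall, rho
```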

83 Results: For the first experiment, we evaluate IRP, PLEB, and FAST-PLEB against the exact nearest neighbor similarity lists. [sent-378, score-0.379]

84 We vary the approximate nearest neighbor beam parameter b ∈ {20, 30, 40, 50, 100}, which controls the number of closest neighbors for a word with respect to each independent random projection. [sent-380, score-0.575]

85 For FAST-PLEB, we set q = 10 (q ≪ k), which is the number of random bits selected out of k to generate p permuted bit vectors of size q. [sent-382, score-0.225]

86 However, using a large b implies generating a long potential nearest neighbor list, close in size to the number of unique context vectors. [sent-387, score-0.324]

87 If we focus on the gray row with b = 40 (this setting has a comparatively small potential list and returns nearest neighbors in less time), IRP has worse recall but the best pre-processing time. [sent-388, score-0.423]

88 Even though FLAG's approximate nearest neighbor algorithm has lower recall than the exact method, the quality of these nearest neighbor lists is still excellent. [sent-403, score-0.777]

89 We report average timing results (averaged over 10 runs and 447 words) to find the top 100 nearest neighbors for a single query word. [sent-414, score-0.463]

90 Comparing the first and second rows shows that LSH is 87 times faster than computing the exact top-100 (cosine similarity) nearest neighbors. [sent-416, score-0.284]

91 The graph has 106,733 nodes (words), with each node having 100 edges that denote the top l = 100 approximate nearest neighbors associated with that node. [sent-423, score-0.495]

92 1 Google Sets Problem The Google Sets problem (Ghahramani and Heller, 2005) can be defined as: given a set of query words, return the top t similar words with respect to the query words. [sent-430, score-0.231]

93 To evaluate the quality of our approximate large-scale graph, we return the top 25 words which have the best aggregated similarity scores with respect to the query words. [sent-431, score-0.28]
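
With the graph stored as a map from each word to its scored neighbor list, the aggregation for the Google Sets task could be sketched as below; summing similarity scores over the query words is our assumption of what "aggregated similarity scores" means here.

```python
from collections import defaultdict

def google_sets(graph, query_words, t=25):
    """graph: word -> list of (neighbor, similarity). Returns the top t words
    with the best aggregated similarity to the set of query words."""
    scores = defaultdict(float)
    for q in query_words:
        for neighbor, sim in graph.get(q, []):
            if neighbor not in query_words:
                scores[neighbor] += sim  # sum similarity over all query words
    return sorted(scores, key=scores.get, reverse=True)[:t]
```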

94 We take 5 classes and their query terms (McIntosh and Curran, 2008) shown in Table 8, and our goal is to learn 25 new words which are similar to these 5 query words. [sent-432, score-0.204]

95 6 Conclusion We proposed a system, FLAG, which constructs fast large-scale approximate graphs from large data sets. [sent-457, score-0.193]

96 An improved data stream summary: The count-min sketch and its applications. [sent-505, score-0.376]

97 Approximate scalable bounded space sketch for large data NLP. [sent-538, score-0.345]

98 Approximate nearest neighbors: towards removing the curse of dimensionality. [sent-555, score-0.219]

99 One sketch for all: Theory and application of conditional random sampling. [sent-577, score-0.38]

100 Randomized algorithms and nlp: using locality sensitive hash function for high speed noun clustering. [sent-602, score-0.214]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('pleb', 0.516), ('sketch', 0.345), ('flag', 0.304), ('irp', 0.264), ('nearest', 0.219), ('lsh', 0.164), ('neighbors', 0.142), ('distributed', 0.116), ('hash', 0.116), ('projections', 0.115), ('approximate', 0.104), ('goyal', 0.103), ('query', 0.102), ('charikar', 0.092), ('cm', 0.092), ('contexts', 0.091), ('sketches', 0.091), ('matrix', 0.088), ('vectors', 0.088), ('durme', 0.084), ('cormode', 0.082), ('sorting', 0.08), ('hashing', 0.076), ('neighbor', 0.075), ('buffer', 0.072), ('concrete', 0.072), ('ravichandran', 0.068), ('gw', 0.067), ('hamming', 0.066), ('lall', 0.062), ('locality', 0.061), ('projection', 0.061), ('store', 0.057), ('hk', 0.056), ('streaming', 0.056), ('gb', 0.053), ('fastpleb', 0.053), ('fast', 0.049), ('daum', 0.048), ('lists', 0.047), ('similarity', 0.047), ('pmi', 0.046), ('linearity', 0.045), ('intrinsic', 0.044), ('ashwin', 0.041), ('indyk', 0.041), ('balls', 0.04), ('mcintosh', 0.04), ('onlinepmi', 0.04), ('tomb', 0.04), ('vd', 0.04), ('graphs', 0.04), ('pantel', 0.039), ('bits', 0.038), ('turney', 0.038), ('exact', 0.038), ('van', 0.037), ('sensitive', 0.037), ('pointwise', 0.037), ('stored', 0.036), ('update', 0.035), ('random', 0.035), ('row', 0.035), ('mutual', 0.034), ('throw', 0.034), ('puts', 0.034), ('rusu', 0.034), ('cd', 0.034), ('bit', 0.034), ('hal', 0.033), ('counter', 0.032), ('pearson', 0.032), ('graham', 0.032), ('extrinsic', 0.032), ('distributional', 0.031), ('nlp', 0.031), ('zz', 0.031), ('velikovich', 0.031), ('motwani', 0.031), ('muthukrishnan', 0.031), ('counts', 0.031), ('stream', 0.031), ('graph', 0.03), ('amit', 0.03), ('gigaword', 0.03), ('pass', 0.03), ('size', 0.03), ('chunks', 0.029), ('chunk', 0.029), ('polarity', 0.029), ('priority', 0.029), ('counters', 0.029), ('return', 0.027), ('faster', 0.027), ('achlioptas', 0.026), ('enidn', 0.026), ('heller', 0.026), ('stoc', 0.026), ('thelen', 0.026), ('variant', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP

Author: Amit Goyal ; Hal Daume III ; Raul Guerra

Abstract: Many natural language processing problems involve constructing large nearest-neighbor graphs. We propose a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data, our algorithm maintains approximate counts based on sketching algorithms. To find the approximate nearest neighbors, our algorithm pairs a new distributed online-PMI algorithm with novel fast approximate nearest neighbor search algorithms (variants of PLEB). These algorithms return the approximate nearest neighbors quickly. We show our system’s efficiency in both intrinsic and extrinsic experiments. We further evaluate our fast search algorithms both quantitatively and qualitatively on two NLP applications.

2 0.4077259 117 emnlp-2012-Sketch Algorithms for Estimating Point Queries in NLP

Author: Amit Goyal ; Hal Daume III ; Graham Cormode

Abstract: Many NLP tasks rely on accurate statistics from large corpora. Tracking complete statistics is memory intensive, so recent work has proposed using compact approximate “sketches” of frequency distributions. We describe 10 sketch methods, including existing and novel variants. We compare and study the errors (over-estimation and underestimation) made by the sketches. We evaluate several sketches on three important NLP problems. Our experiments show that one sketch performs best for all the three tasks.

3 0.13387819 120 emnlp-2012-Streaming Analysis of Discourse Participants

Author: Benjamin Van Durme

Abstract: Inferring attributes of discourse participants has been treated as a batch-processing task: data such as all tweets from a given author are gathered in bulk, processed, analyzed for a particular feature, then reported as a result of academic interest. Given the sources and scale of material used in these efforts, along with potential use cases of such analytic tools, discourse analysis should be reconsidered as a streaming challenge. We show that under certain common formulations, the batchprocessing analytic framework can be decomposed into a sequential series of updates, using as an example the task of gender classification. Once in a streaming framework, and motivated by large data sets generated by social media services, we present novel results in approximate counting, showing its applicability to space efficient streaming classification.

4 0.064742528 28 emnlp-2012-Collocation Polarity Disambiguation Using Web-based Pseudo Contexts

Author: Yanyan Zhao ; Bing Qin ; Ting Liu

Abstract: This paper focuses on the task of collocation polarity disambiguation. The collocation refers to a binary tuple of a polarity word and a target (such as ⟨long, battery life⟩ or ⟨long, startup⟩), in which the sentiment orientation of the polarity word (“long”) changes along with different targets (“battery life” or “startup”). To disambiguate a collocation’s polarity, previous work always turned to investigating the polarities of its surrounding contexts, and then assigned the majority polarity to the collocation. However, these contexts are limited, so the resulting polarity is not reliable enough. We therefore propose an unsupervised three-component framework to expand pseudo contexts from the web, to help disambiguate a collocation’s polarity. Without using any additional labeled data, experiments show that our method is effective.

5 0.061174124 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition

Author: William Blacoe ; Mirella Lapata

Abstract: In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. We experiment with several possible combinations of representation and composition, exhibiting varying degrees of sophistication. Some are shallow while others operate over syntactic structure, rely on parameter learning, or require access to very large corpora. We find that shallow approaches are as good as more computationally intensive alternatives with regards to two particular tests: (1) phrase similarity and (2) paraphrase detection. The sizes of the involved training corpora and the generated vectors are not as important as the fit between the meaning representation and compositional method.

6 0.060004141 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis

7 0.05965846 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces

8 0.059493247 12 emnlp-2012-A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing

9 0.058832083 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context

10 0.058134846 66 emnlp-2012-Improving Transition-Based Dependency Parsing with Buffer Transitions

11 0.058088537 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM

12 0.05296804 25 emnlp-2012-Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation

13 0.050510414 78 emnlp-2012-Learning Lexicon Models from Search Logs for Query Expansion

14 0.04770986 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon

15 0.047639858 53 emnlp-2012-First Order vs. Higher Order Modification in Distributional Semantics

16 0.043736409 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage

17 0.040087491 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables

18 0.03963742 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules

19 0.038731832 86 emnlp-2012-Locally Training the Log-Linear Model for SMT

20 0.038334254 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.159), (1, 0.004), (2, 0.01), (3, 0.101), (4, 0.035), (5, 0.003), (6, 0.146), (7, 0.063), (8, 0.296), (9, 0.177), (10, -0.1), (11, -0.482), (12, 0.14), (13, 0.091), (14, -0.11), (15, 0.342), (16, -0.036), (17, -0.043), (18, -0.01), (19, -0.005), (20, 0.185), (21, 0.101), (22, -0.013), (23, 0.069), (24, -0.065), (25, -0.087), (26, 0.016), (27, -0.017), (28, -0.038), (29, -0.084), (30, -0.009), (31, -0.053), (32, -0.085), (33, -0.028), (34, -0.028), (35, -0.016), (36, 0.044), (37, -0.022), (38, 0.005), (39, 0.031), (40, 0.023), (41, -0.028), (42, -0.029), (43, 0.029), (44, 0.001), (45, -0.036), (46, -0.012), (47, 0.011), (48, -0.015), (49, 0.042)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.96059984 117 emnlp-2012-Sketch Algorithms for Estimating Point Queries in NLP

Author: Amit Goyal ; Hal Daume III ; Graham Cormode

Abstract: Many NLP tasks rely on accurate statistics from large corpora. Tracking complete statistics is memory intensive, so recent work has proposed using compact approximate “sketches” of frequency distributions. We describe 10 sketch methods, including existing and novel variants. We compare and study the errors (over-estimation and underestimation) made by the sketches. We evaluate several sketches on three important NLP problems. Our experiments show that one sketch performs best for all the three tasks.

same-paper 2 0.9516142 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP

Author: Amit Goyal ; Hal Daume III ; Raul Guerra

Abstract: Many natural language processing problems involve constructing large nearest-neighbor graphs. We propose a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data, our algorithm maintains approximate counts based on sketching algorithms. To find the approximate nearest neighbors, our algorithm pairs a new distributed online-PMI algorithm with novel fast approximate nearest neighbor search algorithms (variants of PLEB). These algorithms return the approximate nearest neighbors quickly. We show our system’s efficiency in both intrinsic and extrinsic experiments. We further evaluate our fast search algorithms both quantitatively and qualitatively on two NLP applications.

3 0.39544958 120 emnlp-2012-Streaming Analysis of Discourse Participants

Author: Benjamin Van Durme

Abstract: Inferring attributes of discourse participants has been treated as a batch-processing task: data such as all tweets from a given author are gathered in bulk, processed, analyzed for a particular feature, then reported as a result of academic interest. Given the sources and scale of material used in these efforts, along with potential use cases of such analytic tools, discourse analysis should be reconsidered as a streaming challenge. We show that under certain common formulations, the batchprocessing analytic framework can be decomposed into a sequential series of updates, using as an example the task of gender classification. Once in a streaming framework, and motivated by large data sets generated by social media services, we present novel results in approximate counting, showing its applicability to space efficient streaming classification.

4 0.24745904 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis

Author: Wen-tau Yih ; Geoffrey Zweig ; John Platt

Abstract: Existing vector space models typically map synonyms and antonyms to similar word vectors, and thus fail to represent antonymy. We introduce a new vector space representation where antonyms lie on opposite sides of a sphere: in the word vector space, synonyms have cosine similarities close to one, while antonyms are close to minus one. We derive this representation with the aid of a thesaurus and latent semantic analysis (LSA). Each entry in the thesaurus (a word sense along with its synonyms and antonyms) is treated as a “document,” and the resulting document collection is subjected to LSA. The key contribution of this work is to show how to assign signs to the entries in the co-occurrence matrix on which LSA operates, so as to induce a subspace with the desired property. We evaluate this procedure with the Graduate Record Examination questions of (Mohammed et al., 2008) and find that the method improves on the results of that study. Further improvements result from refining the subspace representation with discriminative training, and augmenting the training data with general newspaper text. Altogether, we improve on the best previous results by 11 points absolute in F measure.

5 0.23296919 79 emnlp-2012-Learning Syntactic Categories Using Paradigmatic Representations of Word Context

Author: Mehmet Ali Yatbaz ; Enis Sert ; Deniz Yuret

Abstract: We investigate paradigmatic representations of word context in the domain of unsupervised syntactic category acquisition. Paradigmatic representations of word context are based on potential substitutes of a word in contrast to syntagmatic representations based on properties of neighboring words. We compare a bigram based baseline model with several paradigmatic models and demonstrate significant gains in accuracy. Our best model based on Euclidean co-occurrence embedding combines the paradigmatic context representation with morphological and orthographic features and achieves 80% many-to-one accuracy on a 45-tag 1M word corpus.

6 0.2032728 25 emnlp-2012-Bilingual Lexicon Extraction from Comparable Corpora Using Label Propagation

7 0.19641954 53 emnlp-2012-First Order vs. Higher Order Modification in Distributional Semantics

8 0.19217624 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables

9 0.18988453 78 emnlp-2012-Learning Lexicon Models from Search Logs for Query Expansion

10 0.17632411 28 emnlp-2012-Collocation Polarity Disambiguation Using Web-based Pseudo Contexts

11 0.17450561 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition

12 0.16276847 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage

13 0.15879323 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM

14 0.15048823 33 emnlp-2012-Discovering Diverse and Salient Threads in Document Collections

15 0.1472473 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering

16 0.14699276 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces

17 0.14523138 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules

18 0.14231282 132 emnlp-2012-Universal Grapheme-to-Phoneme Prediction Over Latin Alphabets

19 0.13088055 26 emnlp-2012-Building a Lightweight Semantic Model for Unsupervised Information Extraction on Short Listings

20 0.12875538 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.016), (6, 0.081), (8, 0.25), (9, 0.025), (10, 0.011), (14, 0.018), (16, 0.029), (25, 0.014), (34, 0.067), (45, 0.02), (60, 0.076), (63, 0.031), (64, 0.015), (65, 0.02), (70, 0.023), (74, 0.028), (76, 0.071), (80, 0.011), (86, 0.036), (95, 0.051)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.71419001 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP

Author: Amit Goyal ; Hal Daume III ; Raul Guerra

Abstract: Many natural language processing problems involve constructing large nearest-neighbor graphs. We propose a system called FLAG to construct such graphs approximately from large data sets. To handle the large amount of data, our algorithm maintains approximate counts based on sketching algorithms. To find the approximate nearest neighbors, our algorithm pairs a new distributed online-PMI algorithm with novel fast approximate nearest neighbor search algorithms (variants of PLEB). These algorithms return the approximate nearest neighbors quickly. We show our system’s efficiency in both intrinsic and extrinsic experiments. We further evaluate our fast search algorithms both quantitatively and qualitatively on two NLP applications.

2 0.55615872 117 emnlp-2012-Sketch Algorithms for Estimating Point Queries in NLP

Author: Amit Goyal ; Hal Daume III ; Graham Cormode

Abstract: Many NLP tasks rely on accurate statistics from large corpora. Tracking complete statistics is memory intensive, so recent work has proposed using compact approximate “sketches” of frequency distributions. We describe 10 sketch methods, including existing and novel variants. We compare and study the errors (over-estimation and underestimation) made by the sketches. We evaluate several sketches on three important NLP problems. Our experiments show that one sketch performs best for all the three tasks.

3 0.5069949 114 emnlp-2012-Revisiting the Predictability of Language: Response Completion in Social Media

Author: Bo Pang ; Sujith Ravi

Abstract: The question “how predictable is English?” has long fascinated researchers. While prior work has focused on formal English typically used in news articles, we turn to texts generated by users in online settings that are more informal in nature. We are motivated by a novel application scenario: given the difficulty of typing on mobile devices, can we help reduce typing effort with message completion, especially in conversational settings? We propose a method for automatic response completion. Our approach models both the language used in responses and the specific context provided by the original message. Our experimental results on a large-scale dataset show that both components help reduce typing effort. We also perform an information-theoretic study in this setting and examine the entropy of user-generated content, especially in conversational scenarios, to better understand the predictability of user-generated English.

4 0.47082487 120 emnlp-2012-Streaming Analysis of Discourse Participants

Author: Benjamin Van Durme

Abstract: Inferring attributes of discourse participants has been treated as a batch-processing task: data such as all tweets from a given author are gathered in bulk, processed, analyzed for a particular feature, then reported as a result of academic interest. Given the sources and scale of material used in these efforts, along with potential use cases of such analytic tools, discourse analysis should be reconsidered as a streaming challenge. We show that under certain common formulations, the batchprocessing analytic framework can be decomposed into a sequential series of updates, using as an example the task of gender classification. Once in a streaming framework, and motivated by large data sets generated by social media services, we present novel results in approximate counting, showing its applicability to space efficient streaming classification.

5 0.44612134 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts

Author: Lizhen Qu ; Rainer Gemulla ; Gerhard Weikum

Abstract: We propose the weakly supervised MultiExperts Model (MEM) for analyzing the semantic orientation of opinions expressed in natural language reviews. In contrast to most prior work, MEM predicts both opinion polarity and opinion strength at the level of individual sentences; such fine-grained analysis helps to understand better why users like or dislike the entity under review. A key challenge in this setting is that it is hard to obtain sentence-level training data for both polarity and strength. For this reason, MEM is weakly supervised: It starts with potentially noisy indicators obtained from coarse-grained training data (i.e., document-level ratings), a small set of diverse base predictors, and, if available, small amounts of fine-grained training data. We integrate these noisy indicators into a unified probabilistic framework using ideas from ensemble learning and graph-based semi-supervised learning. Our experiments indicate that MEM outperforms state-of-the-art methods by a significant margin.

6 0.44590011 30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing

7 0.44584775 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

8 0.44393557 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis

9 0.44119632 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM

10 0.43551269 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews

11 0.43541372 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns

12 0.43539968 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon

13 0.43481725 92 emnlp-2012-Multi-Domain Learning: When Do Domains Matter?

14 0.43281811 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games

15 0.43233585 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules

16 0.4323093 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents

17 0.43218148 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition

18 0.43208966 24 emnlp-2012-Biased Representation Learning for Domain Adaptation

19 0.43085846 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT

20 0.42826238 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation