acl acl2013 acl2013-381 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Sean Moran ; Victor Lavrenko ; Miles Osborne
Abstract: We introduce a scheme for optimally allocating a variable number of bits per LSH hyperplane. Previous approaches assign a constant number of bits per hyperplane. This neglects the fact that a subset of hyperplanes may be more informative than others. Our method, dubbed Variable Bit Quantisation (VBQ), provides a data-driven non-uniform bit allocation across hyperplanes. Despite only using a fraction of the available hyperplanes, VBQ outperforms uniform quantisation by up to 168% for retrieval across standard text and image datasets.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We introduce a scheme for optimally allocating a variable number of bits per LSH hyperplane. [sent-10, score-0.487]
2 Previous approaches assign a constant number of bits per hyperplane. [sent-11, score-0.278]
3 This neglects the fact that a subset of hyperplanes may be more informative than others. [sent-12, score-0.385]
4 Our method, dubbed Variable Bit Quantisation (VBQ), provides a data-driven non-uniform bit allocation across hyperplanes. [sent-13, score-0.389]
5 Despite only using a fraction of the available hyperplanes, VBQ outperforms uniform quantisation by up to 168% for retrieval across standard text and image datasets. [sent-14, score-0.61]
6 1 Introduction The task of retrieving the nearest neighbours to a given query document permeates the field of Natural Language Processing (NLP). [sent-15, score-0.444]
7 Nearest neighbour search has been used for applications as diverse as automatically detecting document translation pairs for the purposes of training a statistical machine translation (SMT) system (Krstovski and Smith, 2011), the large-scale generation of noun similarity lists (Ravichandran et al. [sent-16, score-0.218]
8 Approximate nearest neighbour (ANN) search using hashing techniques has recently gained prominence within NLP. [sent-19, score-0.644]
9 The hashing-based approach maps the data into a substantially more compact representation, referred to as a fingerprint, which is more efficient for performing similarity computations. [sent-20, score-0.078]
10 The resulting compact binary representation radically reduces memory requirements while also permitting fast sub-linear time retrieval of approximate nearest neighbours. [sent-21, score-0.588]
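As a hypothetical aside on why such binary fingerprints support fast retrieval: comparing two fingerprints reduces to an XOR followed by a population count. Below is a minimal NumPy sketch; the bit-packing and the linear scan are illustrative conveniences, not details from the paper, and true sub-linear search would additionally bucket fingerprints in a hash table:

```python
import numpy as np

def hamming_distances(query_bits, db_bits):
    """Hamming distance from one binary fingerprint to a database of them.

    query_bits: (n_bits,) array of 0/1.
    db_bits: (n_db, n_bits) array of 0/1.
    """
    # Pack bits into uint8 words so each comparison is a cheap XOR + popcount.
    q = np.packbits(query_bits)
    db = np.packbits(db_bits, axis=1)
    xor = np.bitwise_xor(db, q[None, :])
    return np.unpackbits(xor, axis=1).sum(axis=1)
```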
11 Hashing-based ANN techniques generally comprise two main steps: a projection stage followed by a quantisation stage. [sent-22, score-0.693]
12 The projection stage performs a neighbourhood preserving embedding, mapping the input data into a lower-dimensional representation. [sent-23, score-0.334]
13 The quantisation stage subsequently reduces the cardinality of this representation by converting the real-valued projections to binary. [sent-24, score-0.777]
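A minimal sketch of this two-stage pipeline under the uniform baseline that the paper argues against: random Gaussian hyperplanes for the projection stage, then a single sign bit per projected dimension for the quantisation stage. The hyperplane count and the Gaussian draw are illustrative assumptions rather than settings fixed by the paper:

```python
import numpy as np

def uniform_lsh_encode(X, n_planes=32, seed=0):
    """Projection plus 1-bit quantisation: the uniform-allocation baseline.

    X: (n_samples, n_features) real-valued data.
    Returns (n_samples, n_planes) binary codes, one bit per hyperplane.
    """
    rng = np.random.default_rng(seed)
    # Projection stage: normals of n_planes random hyperplanes.
    normals = rng.standard_normal((X.shape[1], n_planes))
    projections = X @ normals  # neighbourhood-preserving embedding
    # Quantisation stage: threshold every projected dimension at zero.
    return (projections >= 0).astype(np.uint8)
```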
14 Quantisation is a lossy transformation which can have a significant impact on the resulting quality of the binary encoding. [sent-25, score-0.085]
15 Previous work has quantised each projected dimension into a uniform number of bits (Indyk and Motwani, 1998) (Kong and Li, 2012) (Kong et al. [sent-26, score-0.37]
16 We demonstrate that uniform allocation of bits is sub-optimal and propose a data-driven scheme for variable bit allocation. [sent-29, score-0.728]
17 Our approach is distinct from previous work in that it provides a general objective function for bit allocation. [sent-30, score-0.235]
18 VBQ makes no assumptions on the data and, in addition to LSH, it applies to a broad range of other projection functions. [sent-31, score-0.178]
19 2 Related Work Locality sensitive hashing (LSH) (Indyk and Motwani, 1998) is an example of an approximate nearest neighbour search technique that has been widely used within the field of NLP to preserve the Cosine distances between documents (Charikar, 2002). [sent-32, score-0.737]
20 LSH for cosine distance draws a large number of random hyperplanes within the input feature space, effectively dividing the space into non-overlapping regions (or buckets). [sent-33, score-0.475]
21 Each hyperplane contributes one bit to the encoding, the value (0 or 1) of which is determined by comput- [sent-34, score-0.399]
22 Figure 1: Left: Data points with identical shapes are 1-NN. [sent-192, score-0.105]
23 Two hyperplanes h1, h2 are shown alongside their associated normal vectors (n1, n2). [sent-193, score-0.509]
24 Right top: Projection of points onto the normal vectors n1 and n2 of the hyperplanes (arrows denote projections). [sent-194, score-0.548]
25 Right middle: Positioning of the points along normal vector n2. [sent-195, score-0.162]
26 Three quantisation thresholds (t1, t2, t3), and consequently 2 bits, can maintain the neighbourhood structure. [sent-196, score-0.657]
27 Right bottom: the high degree of mixing between the 1-NN means that this hyperplane (h1) is likely to have 0 bits assigned (and therefore be discarded entirely). [sent-197, score-0.479]
28 ing the dot product of a data-point (x) with the normal vector to the hyperplane (ni): that is, if x · ni ≥ 0 the bit is set to 1, and to 0 otherwise. [sent-198, score-0.273]
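Pulling the pieces together, here is a hedged sketch of the non-uniform encoding Figure 1 illustrates: the baseline assigns each hyperplane the sign bit of x · ni, whereas VBQ gives an informative hyperplane several thresholds on its projected dimension (three thresholds t1, t2, t3 yield 2 bits) and assigns 0 bits to an uninformative one, discarding it. The quantile threshold placement and the natural binary region code below are stand-ins for exposition only; the paper's actual method chooses both the thresholds and the per-hyperplane bit budget with a data-driven objective not reproduced here:

```python
import numpy as np

def variable_bit_encode(projections, bits_per_plane):
    """Encode each projected dimension with its own bit budget (sketch).

    projections: (n_samples, n_planes) real-valued projections X @ normals.
    bits_per_plane: sequence of ints; 0 drops that hyperplane entirely.
    """
    code_columns = []
    for j, b in enumerate(bits_per_plane):
        if b == 0:
            continue  # uninformative hyperplane, e.g. h1 in Figure 1
        col = projections[:, j]
        # b bits require 2**b - 1 thresholds (t1, t2, t3 when b == 2);
        # quantile placement is illustrative, not the paper's criterion.
        qs = np.linspace(0.0, 1.0, 2**b + 1)[1:-1]
        thresholds = np.quantile(col, qs)
        regions = np.searchsorted(thresholds, col)  # values in 0 .. 2**b - 1
        # Unpack each region index into b bits (natural binary code).
        for k in reversed(range(b)):
            code_columns.append((regions >> k) & 1)
    return np.stack(code_columns, axis=1).astype(np.uint8)
```

For example, bits_per_plane=[0, 2] reproduces the figure: h1 contributes nothing and h2 contributes 2 bits. A real system would likely prefer a region code whose Hamming distances respect threshold adjacency (cf. the multi-bit quantisation of Kong and Li, 2012, cited above); natural binary is used here only to keep the sketch short.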
wordName wordTfidf (topN-words)
[('quantisation', 0.463), ('hyperplanes', 0.331), ('lsh', 0.293), ('bits', 0.278), ('bit', 0.235), ('nearest', 0.222), ('vbq', 0.199), ('neighbour', 0.185), ('neighbours', 0.152), ('hyperplane', 0.134), ('indyk', 0.132), ('neighbourhood', 0.132), ('edinburgh', 0.121), ('motwani', 0.117), ('hashing', 0.108), ('normal', 0.103), ('projection', 0.1), ('sean', 0.097), ('approximate', 0.09), ('informatics', 0.076), ('inf', 0.074), ('projections', 0.074), ('stage', 0.067), ('uniform', 0.063), ('school', 0.062), ('variable', 0.062), ('uk', 0.061), ('points', 0.059), ('differentiated', 0.059), ('lossy', 0.059), ('charikar', 0.059), ('kong', 0.058), ('gouws', 0.054), ('neglects', 0.054), ('moran', 0.053), ('broad', 0.052), ('allocation', 0.052), ('permitting', 0.051), ('ofnlp', 0.051), ('positioning', 0.051), ('datadriven', 0.051), ('dubbed', 0.051), ('buckets', 0.051), ('compact', 0.051), ('optimally', 0.048), ('radically', 0.048), ('shapes', 0.046), ('alongside', 0.045), ('ravichandran', 0.045), ('lavrenko', 0.045), ('reduces', 0.044), ('locality', 0.043), ('cardinality', 0.041), ('ann', 0.038), ('cosine', 0.038), ('scheme', 0.038), ('arrows', 0.038), ('prominence', 0.038), ('comprise', 0.037), ('victor', 0.036), ('mixing', 0.036), ('dot', 0.036), ('draws', 0.036), ('retrieving', 0.036), ('dividing', 0.035), ('preserving', 0.035), ('osborne', 0.035), ('regions', 0.035), ('embedding', 0.035), ('ac', 0.035), ('preserve', 0.034), ('field', 0.034), ('right', 0.034), ('exact', 0.034), ('search', 0.033), ('miles', 0.033), ('image', 0.033), ('returning', 0.032), ('gained', 0.032), ('maintain', 0.032), ('distances', 0.031), ('converting', 0.031), ('discarded', 0.031), ('contributes', 0.03), ('subsequently', 0.03), ('thresholds', 0.03), ('vectors', 0.03), ('requirements', 0.03), ('ni', 0.029), ('projected', 0.029), ('stephan', 0.028), ('entirely', 0.028), ('representation', 0.027), ('techniques', 0.026), ('transformation', 0.026), ('assumptions', 0.026), ('fraction', 0.026), ('possibility', 0.025), ('onto', 0.025), ('retrieval', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 381 acl-2013-Variable Bit Quantisation for LSH
Author: Sean Moran ; Victor Lavrenko ; Miles Osborne
Abstract: We introduce a scheme for optimally allocating a variable number of bits per LSH hyperplane. Previous approaches assign a constant number of bits per hyperplane. This neglects the fact that a subset of hyperplanes may be more informative than others. Our method, dubbed Variable Bit Quantisation (VBQ), provides a data-driven non-uniform bit allocation across hyperplanes. Despite only using a fraction of the available hyperplanes, VBQ outperforms uniform quantisation by up to 168% for retrieval across standard text and image datasets.
2 0.084064752 97 acl-2013-Cross-lingual Projections between Languages from Different Families
Author: Mo Yu ; Tiejun Zhao ; Yalong Bai ; Hao Tian ; Dianhai Yu
Abstract: Cross-lingual projection methods can benefit from resource-rich languages to improve the performance of NLP tasks in resource-scarce languages. However, these methods face the difficulty of syntactic differences between languages, especially when the pair of languages varies greatly. To make the projection method generalize well to diverse language pairs, we enhance the projection method based on word alignments by introducing target-language word representations as features and proposing a novel noise-removing method based on these word representations. Experiments showed that our methods greatly improve performance on projections between English and Chinese.
3 0.079243779 194 acl-2013-Improving Text Simplification Language Modeling Using Unsimplified Text Data
Author: David Kauchak
Abstract: In this paper we examine language modeling for text simplification. Unlike some text-to-text translation tasks, text simplification is a monolingual translation task allowing for text in both the input and output domain to be used for training the language model. We explore the relationship between normal English and simplified English and compare language models trained on varying amounts of text from each. We evaluate the models intrinsically with perplexity and extrinsically on the lexical simplification task from SemEval 2012. We find that a combined model using both simplified and normal English data achieves a 23% improvement in perplexity and a 24% improvement on the lexical simplification task over a model trained only on simple data. Post-hoc analysis shows that the additional unsimplified data provides better coverage for unseen and rare n-grams.
4 0.074587114 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
Author: Kai Liu ; Yajuan Lu ; Wenbin Jiang ; Qun Liu
Abstract: This paper describes a novel strategy for automatic induction of a monolingual dependency grammar under the guidance of bilingually-projected dependencies. By moderately leveraging the dependency information projected from the parsed counterpart language, and simultaneously mining the underlying syntactic structure of the language considered, it effectively integrates the advantages of bilingual projection and unsupervised induction, so as to induce a monolingual grammar much better than previous models that use only bilingual projection or unsupervised induction. We induced dependency grammars for five different languages under the guidance of dependency information projected from the parsed English translation; experiments show that the bilingually-guided method achieves a significant improvement of 28.5% over the unsupervised baseline and 3.0% over the best projection baseline on average.
5 0.06588982 307 acl-2013-Scalable Decipherment for Machine Translation via Hash Sampling
Author: Sujith Ravi
Abstract: In this paper, we propose a new Bayesian inference method to train statistical machine translation systems using only non-parallel corpora. Following a probabilistic decipherment approach, we first introduce a new framework for decipherment training that is flexible enough to incorporate any number/type of features (besides simple bag-of-words) as side-information used for estimating translation models. In order to perform fast, efficient Bayesian inference in this framework, we then derive a hash sampling strategy that is inspired by the work of Ahmed et al. (2012). The new translation hash sampler enables us to scale elegantly to complex models (for the first time) and large vocabulary/corpora sizes. We show empirical results on the OPUS data: our method yields the best BLEU scores compared to existing approaches, while achieving significant computational speedups (several orders of magnitude faster). We also report, for the first time, BLEU score results for a large-scale MT task using only non-parallel data (the EMEA corpus).
6 0.055755161 136 acl-2013-Enhanced and Portable Dependency Projection Algorithms Using Interlinear Glossed Text
7 0.039654322 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
8 0.037883945 78 acl-2013-Categorization of Turkish News Documents with Morphological Analysis
9 0.033063855 73 acl-2013-Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions
10 0.029760618 219 acl-2013-Learning Entity Representation for Entity Disambiguation
11 0.028035205 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
12 0.027782856 103 acl-2013-DISSECT - DIStributional SEmantics Composition Toolkit
13 0.027475847 334 acl-2013-Supervised Model Learning with Feature Grouping based on a Discrete Constraint
14 0.027112383 308 acl-2013-Scalable Modified Kneser-Ney Language Model Estimation
15 0.0268966 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
16 0.026862729 382 acl-2013-Variational Inference for Structured NLP Models
17 0.026739113 294 acl-2013-Re-embedding words
18 0.026143353 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
19 0.025893649 55 acl-2013-Are Semantically Coherent Topic Models Useful for Ad Hoc Information Retrieval?
20 0.025827983 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation
topicId topicWeight
[(0, 0.078), (1, -0.01), (2, 0.011), (3, -0.015), (4, -0.01), (5, -0.028), (6, 0.004), (7, -0.004), (8, -0.021), (9, -0.027), (10, -0.004), (11, -0.059), (12, 0.023), (13, 0.037), (14, -0.03), (15, 0.001), (16, -0.019), (17, -0.01), (18, 0.027), (19, 0.004), (20, 0.042), (21, 0.039), (22, -0.02), (23, -0.013), (24, -0.028), (25, -0.005), (26, 0.061), (27, 0.015), (28, -0.055), (29, 0.009), (30, -0.012), (31, -0.01), (32, 0.017), (33, 0.03), (34, 0.028), (35, -0.029), (36, 0.062), (37, 0.033), (38, 0.016), (39, 0.017), (40, -0.024), (41, -0.014), (42, -0.001), (43, 0.07), (44, -0.044), (45, 0.074), (46, -0.029), (47, -0.001), (48, -0.087), (49, 0.044)]
simIndex simValue paperId paperTitle
same-paper 1 0.92255384 381 acl-2013-Variable Bit Quantisation for LSH
Author: Sean Moran ; Victor Lavrenko ; Miles Osborne
Abstract: We introduce a scheme for optimally allocating a variable number of bits per LSH hyperplane. Previous approaches assign a constant number of bits per hyperplane. This neglects the fact that a subset of hyperplanes may be more informative than others. Our method, dubbed Variable Bit Quantisation (VBQ), provides a data-driven non-uniform bit allocation across hyperplanes. Despite only using a fraction of the available hyperplanes, VBQ outperforms uniform quantisation by up to 168% for retrieval across standard text and image datasets.
2 0.6313979 136 acl-2013-Enhanced and Portable Dependency Projection Algorithms Using Interlinear Glossed Text
Author: Ryan Georgi ; Fei Xia ; William D. Lewis
Abstract: As most of the world’s languages are under-resourced, projection algorithms offer an enticing way to bootstrap the resources available for one resource-poor language from a resource-rich language by means of parallel text and word alignment. These algorithms, however, make the strong assumption that the language pairs share common structures and that the parse trees will resemble one another. This assumption is useful but often leads to errors in projection. In this paper, we will address this weakness by using trees created from instances of Interlinear Glossed Text (IGT) to discover patterns of divergence between the languages. We will show that this method improves the performance of projection algorithms significantly in some languages by accounting for divergence between languages using only the partial supervision of a few corrected trees.
3 0.62129492 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
Author: Kai Liu ; Yajuan Lu ; Wenbin Jiang ; Qun Liu
Abstract: This paper describes a novel strategy for automatic induction of a monolingual dependency grammar under the guidance of bilingually-projected dependencies. By moderately leveraging the dependency information projected from the parsed counterpart language, and simultaneously mining the underlying syntactic structure of the language considered, it effectively integrates the advantages of bilingual projection and unsupervised induction, so as to induce a monolingual grammar much better than previous models that use only bilingual projection or unsupervised induction. We induced dependency grammars for five different languages under the guidance of dependency information projected from the parsed English translation; experiments show that the bilingually-guided method achieves a significant improvement of 28.5% over the unsupervised baseline and 3.0% over the best projection baseline on average.
4 0.58838701 97 acl-2013-Cross-lingual Projections between Languages from Different Families
Author: Mo Yu ; Tiejun Zhao ; Yalong Bai ; Hao Tian ; Dianhai Yu
Abstract: Cross-lingual projection methods can benefit from resource-rich languages to improve the performance of NLP tasks in resource-scarce languages. However, these methods face the difficulty of syntactic differences between languages, especially when the pair of languages varies greatly. To make the projection method generalize well to diverse language pairs, we enhance the projection method based on word alignments by introducing target-language word representations as features and proposing a novel noise-removing method based on these word representations. Experiments showed that our methods greatly improve performance on projections between English and Chinese.
5 0.54795957 194 acl-2013-Improving Text Simplification Language Modeling Using Unsimplified Text Data
Author: David Kauchak
Abstract: In this paper we examine language modeling for text simplification. Unlike some text-to-text translation tasks, text simplification is a monolingual translation task allowing for text in both the input and output domain to be used for training the language model. We explore the relationship between normal English and simplified English and compare language models trained on varying amounts of text from each. We evaluate the models intrinsically with perplexity and extrinsically on the lexical simplification task from SemEval 2012. We find that a combined model using both simplified and normal English data achieves a 23% improvement in perplexity and a 24% improvement on the lexical simplification task over a model trained only on simple data. Post-hoc analysis shows that the additional unsimplified data provides better coverage for unseen and rare n-grams.
6 0.50687033 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
7 0.49265262 322 acl-2013-Simple, readable sub-sentences
8 0.4625735 128 acl-2013-Does Korean defeat phonotactic word segmentation?
9 0.44657603 3 acl-2013-A Comparison of Techniques to Automatically Identify Complex Words.
10 0.44596916 217 acl-2013-Latent Semantic Matching: Application to Cross-language Text Categorization without Alignment Information
11 0.42620748 349 acl-2013-The mathematics of language learning
12 0.405132 371 acl-2013-Unsupervised joke generation from big data
13 0.39813873 73 acl-2013-Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions
14 0.39576453 72 acl-2013-Bridging Languages through Etymology: The case of cross language text categorization
15 0.39513978 172 acl-2013-Graph-based Local Coherence Modeling
16 0.39182079 31 acl-2013-A corpus-based evaluation method for Distributional Semantic Models
17 0.39033359 260 acl-2013-Nonconvex Global Optimization for Latent-Variable Models
18 0.37430909 308 acl-2013-Scalable Modified Kneser-Ney Language Model Estimation
19 0.37229028 249 acl-2013-Models of Semantic Representation with Visual Attributes
20 0.36929381 337 acl-2013-Tag2Blog: Narrative Generation from Satellite Tag Data
topicId topicWeight
[(0, 0.033), (6, 0.039), (11, 0.059), (24, 0.057), (26, 0.024), (35, 0.104), (36, 0.397), (42, 0.047), (48, 0.028), (70, 0.025), (88, 0.031), (90, 0.028), (95, 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.80299717 381 acl-2013-Variable Bit Quantisation for LSH
Author: Sean Moran ; Victor Lavrenko ; Miles Osborne
Abstract: We introduce a scheme for optimally allocating a variable number of bits per LSH hyperplane. Previous approaches assign a constant number of bits per hyperplane. This neglects the fact that a subset of hyperplanes may be more informative than others. Our method, dubbed Variable Bit Quantisation (VBQ), provides a data-driven non-uniform bit allocation across hyperplanes. Despite only using a fraction of the available hyperplanes, VBQ outperforms uniform quantisation by up to 168% for retrieval across standard text and image datasets.
2 0.57037342 224 acl-2013-Learning to Extract International Relations from Political Context
Author: Brendan O'Connor ; Brandon M. Stewart ; Noah A. Smith
Abstract: We describe a new probabilistic model for extracting events between major political actors from news corpora. Our unsupervised model brings together familiar components in natural language processing (like parsers and topic models) with contextual political information (temporal and dyad dependence) to infer latent event classes. We quantitatively evaluate the model’s performance on political science benchmarks: recovering expert-assigned event class valences, and detecting real-world conflict. We also conduct a small case study based on our model’s inferences. A supplementary appendix and replication software/data are available online at: http://brenocon.com/irevents
3 0.52701759 183 acl-2013-ICARUS - An Extensible Graphical Search Tool for Dependency Treebanks
Author: Markus Gartner ; Gregor Thiele ; Wolfgang Seeker ; Anders Bjorkelund ; Jonas Kuhn
Abstract: We present ICARUS, a versatile graphical search tool to query dependency treebanks. Search results can be inspected both quantitatively and qualitatively by means of frequency lists, tables, or dependency graphs. ICARUS also ships with plugins that enable it to interface with tool chains running either locally or remotely.
4 0.35713464 231 acl-2013-Linggle: a Web-scale Linguistic Search Engine for Words in Context
Author: Joanne Boisson ; Ting-Hui Kao ; Jian-Cheng Wu ; Tzu-Hsi Yen ; Jason S. Chang
Abstract: In this paper, we introduce a Web-scale linguistic search engine, Linggle, that retrieves lexical bundles in response to a given query. The query might contain keywords, wildcards, wild parts of speech (PoS), synonyms, and additional regular expression (RE) operators. In our approach, we incorporate inverted file indexing, PoS information from the BNC, and semantic indexing based on Latent Dirichlet Allocation with Google Web 1T. The method involves parsing the query and transforming it into several keyword retrieval commands. Word chunks are retrieved with counts, the chunks are then filtered with the query as an RE, and the results are finally displayed according to counts, similarities, and topics. Clusters of synonyms or conceptually related words are also provided. In addition, Linggle provides example sentences from The New York Times on demand. The current implementation of Linggle is the most functionally comprehensive, and is in principle language- and dataset-independent. We plan to extend Linggle to provide fast and convenient access to the wealth of linguistic information embodied in Web-scale datasets, including Google Web 1T and Google Books Ngram, for many major languages in the world.
5 0.35652244 185 acl-2013-Identifying Bad Semantic Neighbors for Improving Distributional Thesauri
Author: Olivier Ferret
Abstract: Distributional thesauri are now widely used in a large number of Natural Language Processing tasks. However, they are far from containing only interesting semantic relations. As a consequence, improving such thesauri is an important issue that is mainly tackled indirectly through the improvement of semantic similarity measures. In this article, we propose a more direct approach focusing on the identification of the neighbors of a thesaurus entry that are not semantically linked to this entry. This identification relies on a discriminative classifier trained on examples selected in an unsupervised manner to build a distributional model of the entry in texts. Its bad neighbors are found by applying this classifier to a representative set of occurrences of each of these neighbors. We evaluate the interest of this method for a large set of English nouns with various frequencies.
6 0.35487437 172 acl-2013-Graph-based Local Coherence Modeling
7 0.35480425 238 acl-2013-Measuring semantic content in distributional vectors
8 0.35459256 158 acl-2013-Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval
9 0.3532401 121 acl-2013-Discovering User Interactions in Ideological Discussions
10 0.35223046 341 acl-2013-Text Classification based on the Latent Topics of Important Sentences extracted by the PageRank Algorithm
11 0.35088554 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
12 0.35088515 85 acl-2013-Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
13 0.35087246 213 acl-2013-Language Acquisition and Probabilistic Models: keeping it simple
14 0.35052252 99 acl-2013-Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation
15 0.34991252 347 acl-2013-The Role of Syntax in Vector Space Models of Compositional Semantics
16 0.34976262 215 acl-2013-Large-scale Semantic Parsing via Schema Matching and Lexicon Extension
17 0.34896106 58 acl-2013-Automated Collocation Suggestion for Japanese Second Language Learners
18 0.34790814 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
19 0.34725764 4 acl-2013-A Context Free TAG Variant
20 0.34712636 169 acl-2013-Generating Synthetic Comparable Questions for News Articles