emnlp emnlp2013 emnlp2013-142 knowledge-graph by maker-knowledge-mining

142 emnlp-2013-Open-Domain Fine-Grained Class Extraction from Web Search Queries


Source: pdf

Author: Marius Pasca

Abstract: This paper introduces a method for extracting fine-grained class labels (“countries with double taxation agreements with india”) from Web search queries. The class labels are more numerous and more diverse than those produced by current extraction methods. Also extracted are representative sets of instances (singapore, united kingdom) for the class labels.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 1600 Amphitheatre Parkway, Mountain View, California 94043 mars@google.com [sent-2, score-0.037]

2 Abstract This paper introduces a method for extracting fine-grained class labels (“countries with double taxation agreements with india”) from Web search queries. [sent-3, score-1.203]

3 The class labels are more numerous and more diverse than those produced by current extraction methods. [sent-4, score-0.825]

4 Also extracted are representative sets of instances (singapore, united kingdom) for the class labels. [sent-5, score-0.66]

5 1 Introduction Motivation: As more semantic constraints are added, concepts like companies become more specific, e.g. [sent-6, score-0.494]

6 companies that are in the software business, and have been started in a garage. [sent-8, score-0.738]

7 The sets of instances associated with the classes become smaller; the class labels used to concisely describe the meaning of more specific concepts tend to become longer. [sent-9, score-0.997]

8 In fact, fine-grained class labels such as “software companies started in a garage” are often complex noun phrases, since they must somehow summarize multiple semantic constraints. [sent-10, score-1.472]

9 , “software companies started in a garage”) class labels, virtually all class labels acquired from text by previous extraction methods (Etzioni et al. [sent-15, score-2.004]

10 Indeed, instances and class labels that are relatively complex nouns are known to be difficult to detect and pick out precisely from surrounding text (Downey et al. [sent-18, score-0.87]

11 This and other challenges associated with large-scale extraction from Web text (Etzioni et al. [sent-20, score-0.042]

12 , 2011) cause the extracted class labels to usually follow a rigid modifiers-plus-nouns format. [sent-21, score-0.815]

13 The format covers nouns (“companies”) possibly preceded by one or more modifiers (“software companies”, “computer security software companies”). [sent-22, score-1.219]

14 , 2005), “strong acids” (Pantel and Pennacchiotti, 2006), “prestigious private schools” (Van Durme and Paşca, 2008), “aquatic birds” (Kozareva and Hovy, 2010). [sent-24, score-0.065]

15 As an alternative to extracting class labels from text, some methods simply import them from human-curated resources, for example from the set of categories encoded in Wikipedia (Remy, 2002). [sent-25, score-0.849]

16 As a result, class labels potentially exhibit higher syntactic diversity. [sent-26, score-0.855]

17 The modifiers-plus-nouns format (“computer security software companies”) is usually still the norm. [sent-27, score-0.757]

18 But other formats are possible: “software companies based in london”, “software companies of the united kingdom”. [sent-28, score-1.009]

19 Vocabulary coverage gaps remain a problem, with many relevant class labels (“software companies of texas”, “software companies started in a garage”, “software companies that give sap training”) still missing. [sent-29, score-2.382]

20 There is a need for methods that more aggressively identify fine-grained class labels, beyond those extracted by previous methods or encoded in existing, manually-created resources. [sent-30, score-0.516]

21 Such class labels increase coverage, for example in scenarios that enrich Web search results with instances available for the class labels specified in the queries. [sent-31, score-1.696]

22 method to assemble a large vocabulary of class labels from queries. [sent-35, score-0.909]

23 The class labels include fine-grained class labels (“countries with double taxation agreements with india”, “no front license plate states”) that are difficult to extract from text by previous methods for open-domain information extraction. [sent-36, score-2.153]

24 Second, the method acquires representative instances (singapore, united kingdom; arizona, new mexico) that belong to fine-grained class labels (“countries with double taxation agreements with india”, “no front license plate states”). [sent-37, score-1.534]

25 Both class labels and their instances are extracted from Web search queries. [sent-38, score-0.913]

26 1 Extraction of Class Labels Overview: Given a set of arbitrary Web search queries as input, our method produces a vocabulary of fine-grained class labels. [sent-40, score-0.852]

27 Initial Vocabulary of Class Labels: Out of a set of arbitrary search queries available as input, the queries in the format “list of ...” [sent-42, score-0.507]

28 are selected as the initial vocabulary of class labels. [sent-44, score-0.656]

29 Thus, the query “list of software companies that use linux” gives the class label “software companies that use linux”. [sent-46, score-1.638]
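This selection step can be sketched as follows; the helper name and sample queries are illustrative, not from the paper:

```python
import re

def initial_class_labels(queries):
    """Keep queries of the form "list of X"; the remainder X becomes a class label."""
    labels = set()
    for q in queries:
        m = re.match(r"list of (.+)", q.strip().lower())
        if m:
            labels.add(m.group(1))
    return labels

queries = [
    "list of software companies that use linux",
    "weather mountain view",
]
print(initial_class_labels(queries))  # {'software companies that use linux'}
```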

30 Generation via Phrase Similarities: As a prerequisite to generating class labels, distributionally similar phrases (Lin and Pantel, 2002; Lin and Wu, 2009; Pantel et al. [sent-47, score-0.684]

31 A phrase is represented as a vector of its contextual features. [sent-49, score-0.075]

32 A feature is a word, collected from windows of three words centered around the occurrences of the phrase in sentences across Web documents (Lin and Wu, 2009). [sent-50, score-0.038]

33 In the contextual vector of a phrase, the weight of a feature is the pointwise mutual information (Lin and Wu, 2009) between the phrase P and the feature F. [sent-51, score-0.075]

34 The distributional similarity score between two phrases is the cosine similarity between the contextual vectors of the two phrases. [sent-52, score-0.088]

35 The lists of most distributionally similar phrases of a phrase P are thus compiled offline, by ranking the similar phrases of P in decreasing order of their similarity score relative to P. [sent-53, score-0.252]
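The vector construction and similarity scoring can be sketched as below. The toy counts and function names are illustrative, and dropping non-positive PMI weights is an added simplification; the PMI weighting and cosine score otherwise follow the description above:

```python
import math
from collections import Counter

def pmi_vector(phrase_contexts, feature_counts, total):
    """PMI-weighted contextual vector of a phrase.

    phrase_contexts: Counter of words seen in 3-word windows around the phrase.
    feature_counts: corpus-wide counts of the same words.
    total: total number of (phrase, context word) events counted.
    Only positive-PMI features are kept (a simplification).
    """
    n = sum(phrase_contexts.values())
    vec = {}
    for w, c in phrase_contexts.items():
        pmi = math.log((c / total) / ((n / total) * (feature_counts[w] / total)))
        if pmi > 0:
            vec[w] = pmi
    return vec

def cosine(u, v):
    """Distributional similarity: cosine of two sparse contextual vectors."""
    dot = sum(wt * v.get(w, 0.0) for w, wt in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```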

36 Each class label from the initial vocabulary is expanded into a set of generated, candidate class labels. [sent-54, score-1.222]

37 To this effect, every ngram P within a given class label is replaced with each of the distributionally similar phrases, if any, available for the ngram. [sent-55, score-0.706]

38 As shown later in the experimental section, the expansion can increase the vocabulary by a factor of 100. [sent-56, score-0.089]
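A sketch of the expansion step, assuming a precomputed map from n-grams to their distributionally similar phrases (the map contents here are invented examples):

```python
def expand_label(label, similar):
    """Replace every n-gram of the label that has distributionally similar
    phrases with each of those phrases, yielding candidate class labels."""
    words = label.split()
    candidates = set()
    for i in range(len(words)):
        for j in range(i + 1, len(words) + 1):
            ngram = " ".join(words[i:j])
            for sub in similar.get(ngram, []):
                candidates.add(" ".join(words[:i] + [sub] + words[j:]))
    return candidates

similar = {"linux": ["ubuntu", "red hat"]}
print(sorted(expand_label("software companies that use linux", similar)))
# ['software companies that use red hat', 'software companies that use ubuntu']
```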

39 Approximate Syntactic Filtering: The set of generated class labels is noisy. [sent-57, score-0.815]

40 The set is filtered by retaining only class labels whose syntactic structure matches the syntactic structure of some class label(s) from the initial vocabulary. [sent-58, score-1.446]

41 The syntactic structure is loosely approximated at the surface rather than the syntactic level. [sent-59, score-0.068]

42 A generated class label is retained if its sequence of part-of-speech tags matches the sequence of part-of-speech tags of one of the class labels from the initial vocabulary. [sent-60, score-1.489]
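A minimal sketch of this filter; the hard-coded tag dictionary stands in for a real part-of-speech tagger, and all names and examples are illustrative:

```python
def pos_signature(label, tag_of):
    """Surface approximation: the label's sequence of part-of-speech tags."""
    return tuple(tag_of.get(w, "NN") for w in label.split())

def filter_by_pos(candidates, initial_vocab, tag_of):
    """Keep generated labels whose tag sequence matches some initial label's."""
    allowed = {pos_signature(lbl, tag_of) for lbl in initial_vocab}
    return [c for c in candidates if pos_signature(c, tag_of) in allowed]

tag_of = {"software": "NN", "companies": "NNS", "that": "WDT", "use": "VBP",
          "linux": "NN", "ubuntu": "NN", "fast": "JJ"}
initial = ["software companies that use linux"]
candidates = ["software companies that use ubuntu",  # same tag sequence: kept
              "fast companies that use linux"]       # JJ != NN: dropped
print(filter_by_pos(candidates, initial, tag_of))
# ['software companies that use ubuntu']
```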

43 Query Filtering: Generated class labels that pass previous filters are further restricted. [sent-65, score-0.783]

44 They are intersected with the set of arbitrary Web search queries available as input. [sent-66, score-0.313]

45 Generated class labels that are not full queries are discarded. [sent-67, score-0.955]
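The final filter is a straightforward intersection with the input query set (names and data illustrative):

```python
def filter_by_queries(candidates, query_set):
    """Discard generated class labels that do not occur verbatim as full queries."""
    return [c for c in candidates if c in query_set]

kept = filter_by_queries(
    ["software companies that use ubuntu", "software companies that use minix"],
    {"software companies that use ubuntu", "weather mountain view"})
print(kept)  # ['software companies that use ubuntu']
```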

46 2 Extraction of Instances Overview: Our method mines instances of fine-grained class labels from queries. [sent-69, score-0.945]

47 In a nutshell, it identifies queries containing two types of information simultaneously. [sent-70, score-0.172]

48 First, the queries contain an instance (marvin gaye) of the more general class labels (“musicians”) from which the fine-grained class labels (“musicians who have been shot”) can be obtained. [sent-71, score-1.738]

49 Second, the queries contain the constraints added by the fine-grained class labels ( “. [sent-72, score-0.955]

50 Instances of General Class Labels: Following (Ponzetto and Strube, 2007), the Wikipedia category network is refined into a hierarchy that discards non-IsA (thematic) edges, and retains only IsA (subsumption) edges from the network (Ponzetto and Strube, 2007). [sent-76, score-0.077]

51 , titles of Wikipedia articles, are propagated upwards to all their ancestor categories. [sent-79, score-0.086]

52 The class label “musicians” would be mapped into madonna, marvin gaye, jon bon jovi, etc. [sent-80, score-0.688]

53 The mappings from each ancestor category, to all its descendant instances in the Wikipedia hierarchy, represent our mappings from more general class labels to instances. [sent-81, score-1.052]
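The upward propagation over the refined IsA hierarchy can be sketched as follows, using a toy category graph (the categories and instances are illustrative):

```python
from collections import defaultdict

def propagate_instances(isa_parents, direct_instances):
    """Propagate each category's direct instances (Wikipedia article titles)
    upward to all its ancestor categories, along IsA edges only."""
    result = defaultdict(set)
    for cat, insts in direct_instances.items():
        stack, seen = [cat], set()
        while stack:
            c = stack.pop()
            if c in seen:
                continue
            seen.add(c)
            result[c] |= insts
            stack.extend(isa_parents.get(c, []))
    return result

isa_parents = {"rock musicians": ["musicians"], "musicians": ["people"]}
direct_instances = {"rock musicians": {"jon bon jovi"},
                    "musicians": {"marvin gaye", "madonna"}}
maps = propagate_instances(isa_parents, direct_instances)
print(sorted(maps["musicians"]))  # ['jon bon jovi', 'madonna', 'marvin gaye']
```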

54 Decomposition of Fine-Grained Class Labels: A fine-grained class label (e. [sent-82, score-0.566]

55 , “musicians who have been shot”) is effectively decomposed into pairs of two pieces of information. [sent-84, score-0.078]

56 The first piece is a more general class label ( “musicians”), if any occurs in it. [sent-85, score-0.631]

57 The second piece is a bag of words, collected from the remainder of the fine-grained class label after discarding stop words. [sent-86, score-0.7]

58 Note that the standard set of stop words is augmented with auxiliary verbs (e. [sent-87, score-0.03]

59 In the first piece of each pair, the general class label is then replaced with each of its instances. [sent-93, score-0.659]

60 This produces multiple pairs of a candidate instance and a bag of words, for each fine-grained class label. [sent-94, score-0.526]
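The decomposition and substitution steps can be sketched as below; the stop-word list (augmented with auxiliary verbs, as noted above) and all sample data are illustrative:

```python
STOP = {"who", "have", "been", "that", "the", "a", "in", "of", "is", "are"}

def decompose(fine_label, general_labels):
    """Split a fine-grained class label into a general class label occurring
    in it and the bag of remaining non-stop words."""
    for g in sorted(general_labels, key=len, reverse=True):
        if g in fine_label:
            rest = fine_label.replace(g, " ", 1).split()
            return g, frozenset(w for w in rest if w not in STOP)
    return None

def candidate_pairs(fine_label, general_labels, instances_of):
    """Replace the general label with each of its instances, yielding
    (candidate instance, bag of words) pairs for the fine-grained label."""
    parts = decompose(fine_label, general_labels)
    if parts is None:
        return []
    general, bag = parts
    return [(inst, bag) for inst in instances_of.get(general, [])]

pairs = candidate_pairs("musicians who have been shot", {"musicians"},
                        {"musicians": ["marvin gaye", "madonna"]})
print(pairs)  # [('marvin gaye', frozenset({'shot'})), ('madonna', frozenset({'shot'}))]
```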


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('class', 0.487), ('companies', 0.434), ('labels', 0.296), ('musicians', 0.256), ('software', 0.204), ('queries', 0.172), ('shot', 0.136), ('garage', 0.128), ('taxation', 0.128), ('distributionally', 0.112), ('started', 0.1), ('kingdom', 0.089), ('countries', 0.089), ('agreements', 0.089), ('vocabulary', 0.089), ('instances', 0.087), ('gaye', 0.085), ('marvin', 0.085), ('india', 0.085), ('initial', 0.08), ('label', 0.079), ('double', 0.075), ('pas', 0.075), ('license', 0.068), ('linux', 0.068), ('piece', 0.065), ('restricts', 0.063), ('etzioni', 0.061), ('web', 0.061), ('arbitrary', 0.061), ('security', 0.06), ('format', 0.059), ('kozareva', 0.057), ('pantel', 0.055), ('front', 0.054), ('ancestor', 0.054), ('ponzetto', 0.052), ('durme', 0.052), ('united', 0.052), ('phrases', 0.051), ('decomposed', 0.05), ('wikipedia', 0.049), ('strube', 0.048), ('mappings', 0.047), ('plate', 0.047), ('search', 0.043), ('extraction', 0.042), ('finegrained', 0.041), ('wu', 0.041), ('hierarchy', 0.04), ('bag', 0.039), ('ca', 0.039), ('exhibit', 0.038), ('singapore', 0.038), ('phrase', 0.038), ('assemble', 0.037), ('remote', 0.037), ('discards', 0.037), ('intersected', 0.037), ('concisely', 0.037), ('gaps', 0.037), ('import', 0.037), ('mars', 0.037), ('bon', 0.037), ('automobiles', 0.037), ('nutshell', 0.037), ('replacements', 0.037), ('schools', 0.037), ('contextual', 0.037), ('lin', 0.037), ('hovy', 0.037), ('syntactic', 0.034), ('prerequisite', 0.034), ('birds', 0.034), ('mines', 0.034), ('mexico', 0.034), ('mountain', 0.034), ('descendant', 0.034), ('lary', 0.034), ('tactic', 0.034), ('representative', 0.034), ('generated', 0.032), ('rigid', 0.032), ('upwards', 0.032), ('sap', 0.032), ('acquires', 0.032), ('become', 0.03), ('concepts', 0.03), ('stop', 0.03), ('virtually', 0.03), ('isa', 0.03), ('encoded', 0.029), ('private', 0.028), ('offline', 0.028), ('preceded', 0.028), ('pieces', 0.028), ('replaced', 0.028), ('matches', 0.028), ('marius', 0.027), ('somehow', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 142 emnlp-2013-Open-Domain Fine-Grained Class Extraction from Web Search Queries


2 0.091314055 97 emnlp-2013-Identifying Web Search Query Reformulation using Concept based Matching

Author: Ahmed Hassan

Abstract: Web search users frequently modify their queries in hope of receiving better results. This process is referred to as “Query Reformulation”. Previous research has mainly focused on proposing query reformulations in the form of suggested queries for users. Some research has studied the problem of predicting whether the current query is a reformulation of the previous query or not. However, this work has been limited to bag-of-words models where the main signals being used are word overlap, character level edit distance and word level edit distance. In this work, we show that relying solely on surface level text similarity results in many false positives where queries with different intents yet similar topics are mistakenly predicted as query reformulations. We propose a new representation for Web search queries based on identifying the concepts in queries and show that we can sig- nificantly improve query reformulation performance using features of query concepts.

3 0.080484465 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

Author: Joern Wuebker ; Stephan Peitz ; Felix Rietig ; Hermann Ney

Abstract: Automatically clustering words from a monolingual or bilingual training corpus into classes is a widely used technique in statistical natural language processing. We present a very simple and easy to implement method for using these word classes to improve translation quality. It can be applied across different machine translation paradigms and with arbitrary types of models. We show its efficacy on a small German→English and a larger F ornenc ah s→mGalelrm Gaenrm mtarann→slEatniognli tsahsk a nwdit ha lbaortghe rst Farnednacrhd→ phrase-based salandti nhie traaskrch wiciathl phrase-based translation systems for a common set of models. Our results show that with word class models, the baseline can be improved by up to 1.4% BLEU and 1.0% TER on the French→German task and 0.3% BLEU aonnd t h1e .1 F%re nTcEhR→ on tehrem German→English Btask.

4 0.07002534 105 emnlp-2013-Improving Web Search Ranking by Incorporating Structured Annotation of Queries

Author: Xiao Ding ; Zhicheng Dou ; Bing Qin ; Ting Liu ; Ji-rong Wen

Abstract: Web users are increasingly looking for structured data, such as lyrics, job, or recipes, using unstructured queries on the web. However, retrieving relevant results from such data is a challenging problem due to the unstructured language of the web queries. In this paper, we propose a method to improve web search ranking by detecting Structured Annotation of queries based on top search results. In a structured annotation, the original query is split into different units that are associated with semantic attributes in the corresponding domain. We evaluate our techniques using real world queries and achieve significant improvement. . 1

5 0.042328697 102 emnlp-2013-Improving Learning and Inference in a Large Knowledge-Base using Latent Syntactic Cues

Author: Matt Gardner ; Partha Pratim Talukdar ; Bryan Kisiel ; Tom Mitchell

Abstract: Automatically constructed Knowledge Bases (KBs) are often incomplete and there is a genuine need to improve their coverage. Path Ranking Algorithm (PRA) is a recently proposed method which aims to improve KB coverage by performing inference directly over the KB graph. For the first time, we demonstrate that addition of edges labeled with latent features mined from a large dependency parsed corpus of 500 million Web documents can significantly outperform previous PRAbased approaches on the KB inference task. We present extensive experimental results validating this finding. The resources presented in this paper are publicly available.

6 0.039886843 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

7 0.036385298 69 emnlp-2013-Efficient Collective Entity Linking with Stacking

8 0.035593294 160 emnlp-2013-Relational Inference for Wikification

9 0.034727715 58 emnlp-2013-Dependency Language Models for Sentence Completion

10 0.034556892 161 emnlp-2013-Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!

11 0.034207039 173 emnlp-2013-Simulating Early-Termination Search for Verbose Spoken Queries

12 0.0341678 117 emnlp-2013-Latent Anaphora Resolution for Cross-Lingual Pronoun Prediction

13 0.033635899 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification

14 0.03291285 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

15 0.032835156 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology

16 0.032205179 7 emnlp-2013-A Hierarchical Entity-Based Approach to Structuralize User Generated Content in Social Media: A Case of Yahoo! Answers

17 0.032054592 95 emnlp-2013-Identifying Multiple Userids of the Same Author

18 0.031506401 131 emnlp-2013-Mining New Business Opportunities: Identifying Trend related Products by Leveraging Commercial Intents from Microblogs

19 0.030940974 110 emnlp-2013-Joint Bootstrapping of Corpus Annotations and Entity Types

20 0.030896628 177 emnlp-2013-Studying the Recursive Behaviour of Adjectival Modification with Compositional Distributional Semantics


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.122), (1, 0.014), (2, -0.017), (3, -0.013), (4, 0.011), (5, 0.037), (6, 0.013), (7, 0.066), (8, 0.017), (9, -0.026), (10, -0.019), (11, 0.068), (12, -0.065), (13, -0.044), (14, 0.031), (15, -0.038), (16, -0.086), (17, -0.077), (18, 0.077), (19, 0.048), (20, 0.086), (21, -0.034), (22, -0.056), (23, 0.038), (24, -0.104), (25, 0.022), (26, 0.049), (27, 0.033), (28, -0.048), (29, 0.02), (30, -0.034), (31, 0.063), (32, 0.016), (33, 0.041), (34, 0.085), (35, 0.024), (36, 0.103), (37, 0.036), (38, 0.058), (39, -0.001), (40, -0.024), (41, -0.006), (42, 0.099), (43, -0.035), (44, -0.037), (45, 0.016), (46, -0.193), (47, -0.109), (48, -0.197), (49, 0.077)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99004877 142 emnlp-2013-Open-Domain Fine-Grained Class Extraction from Web Search Queries


2 0.58700532 97 emnlp-2013-Identifying Web Search Query Reformulation using Concept based Matching

Author: Ahmed Hassan

Abstract: Web search users frequently modify their queries in hope of receiving better results. This process is referred to as “Query Reformulation”. Previous research has mainly focused on proposing query reformulations in the form of suggested queries for users. Some research has studied the problem of predicting whether the current query is a reformulation of the previous query or not. However, this work has been limited to bag-of-words models where the main signals being used are word overlap, character level edit distance and word level edit distance. In this work, we show that relying solely on surface level text similarity results in many false positives where queries with different intents yet similar topics are mistakenly predicted as query reformulations. We propose a new representation for Web search queries based on identifying the concepts in queries and show that we can sig- nificantly improve query reformulation performance using features of query concepts.

3 0.57633507 105 emnlp-2013-Improving Web Search Ranking by Incorporating Structured Annotation of Queries

Author: Xiao Ding ; Zhicheng Dou ; Bing Qin ; Ting Liu ; Ji-rong Wen

Abstract: Web users are increasingly looking for structured data, such as lyrics, job, or recipes, using unstructured queries on the web. However, retrieving relevant results from such data is a challenging problem due to the unstructured language of the web queries. In this paper, we propose a method to improve web search ranking by detecting Structured Annotation of queries based on top search results. In a structured annotation, the original query is split into different units that are associated with semantic attributes in the corresponding domain. We evaluate our techniques using real world queries and achieve significant improvement. . 1

4 0.50490963 131 emnlp-2013-Mining New Business Opportunities: Identifying Trend related Products by Leveraging Commercial Intents from Microblogs

Author: Jinpeng Wang ; Wayne Xin Zhao ; Haitian Wei ; Hongfei Yan ; Xiaoming Li

Abstract: Hot trends are likely to bring new business opportunities. For example, “Air Pollution” might lead to a significant increase of the sales of related products, e.g., mouth mask. For ecommerce companies, it is very important to make rapid and correct response to these hot trends in order to improve product sales. In this paper, we take the initiative to study the task of how to identify trend related products. The major novelty of our work is that we automatically learn commercial intents revealed from microblogs. We carefully construct a data collection for this task and present quite a few insightful findings. In order to solve this problem, we further propose a graph based method, which jointly models relevance and associativity. We perform extensive experiments and the results showed that our methods are very effective.

5 0.46236679 182 emnlp-2013-The Topology of Semantic Knowledge

Author: Jimmy Dubuisson ; Jean-Pierre Eckmann ; Christian Scheible ; Hinrich Schutze

Abstract: Studies of the graph of dictionary definitions (DD) (Picard et al., 2009; Levary et al., 2012) have revealed strong semantic coherence of local topological structures. The techniques used in these papers are simple and the main results are found by understanding the structure of cycles in the directed graph (where words point to definitions). Based on our earlier work (Levary et al., 2012), we study a different class of word definitions, namely those of the Free Association (FA) dataset (Nelson et al., 2004). These are responses by subjects to a cue word, which are then summarized by a directed, free association graph. We find that the structure of this network is quite different from both the Wordnet and the dictionary networks. This difference can be explained by the very nature of free association as compared to the more “logical” construction of dictionaries. It thus sheds some (quantitative) light on the psychology of free association. In NLP, semantic groups or clusters are interesting for various applications such as word sense disambiguation. The FA graph is tighter than the DD graph, because of the large number of triangles. This also makes drift of meaning quite measurable so that FA graphs provide a quantitative measure of the semantic coherence of small groups of words.

6 0.39950371 102 emnlp-2013-Improving Learning and Inference in a Large Knowledge-Base using Latent Syntactic Cues

7 0.39475489 184 emnlp-2013-This Text Has the Scent of Starbucks: A Laplacian Structured Sparsity Model for Computational Branding Analytics

8 0.36919323 23 emnlp-2013-Animacy Detection with Voting Models

9 0.36567742 161 emnlp-2013-Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!

10 0.36231571 173 emnlp-2013-Simulating Early-Termination Search for Verbose Spoken Queries

11 0.31330726 198 emnlp-2013-Using Soft Constraints in Joint Inference for Clinical Concept Recognition

12 0.30683002 39 emnlp-2013-Boosting Cross-Language Retrieval by Learning Bilingual Phrase Associations from Relevance Rankings

13 0.29397789 117 emnlp-2013-Latent Anaphora Resolution for Cross-Lingual Pronoun Prediction

14 0.29380587 43 emnlp-2013-Cascading Collective Classification for Bridging Anaphora Recognition using a Rich Linguistic Feature Set

15 0.29213604 144 emnlp-2013-Opinion Mining in Newspaper Articles by Entropy-Based Word Connections

16 0.28844759 156 emnlp-2013-Recurrent Continuous Translation Models

17 0.2784251 108 emnlp-2013-Interpreting Anaphoric Shell Nouns using Antecedents of Cataphoric Shell Nouns as Training Data

18 0.27699536 95 emnlp-2013-Identifying Multiple Userids of the Same Author

19 0.26985025 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery

20 0.2664769 91 emnlp-2013-Grounding Strategic Conversation: Using Negotiation Dialogues to Predict Trades in a Win-Lose Game


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.026), (18, 0.06), (22, 0.038), (30, 0.093), (45, 0.012), (50, 0.016), (51, 0.144), (55, 0.39), (66, 0.022), (71, 0.02), (75, 0.041), (77, 0.015), (90, 0.01), (96, 0.032)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.74237752 142 emnlp-2013-Open-Domain Fine-Grained Class Extraction from Web Search Queries


2 0.72254217 156 emnlp-2013-Recurrent Continuous Translation Models

Author: Nal Kalchbrenner ; Phil Blunsom

Abstract: We introduce a class of probabilistic continuous translation models called Recurrent Continuous Translation Models that are purely based on continuous representations for words, phrases and sentences and do not rely on alignments or phrasal translation units. The models have a generation and a conditioning aspect. The generation of the translation is modelled with a target Recurrent Language Model, whereas the conditioning on the source sentence is modelled with a Convolutional Sentence Model. Through various experiments, we show first that our models obtain a perplexity with respect to gold translations that is > 43% lower than that of stateof-the-art alignment-based translation models. Secondly, we show that they are remarkably sensitive to the word order, syntax, and meaning of the source sentence despite lacking alignments. Finally we show that they match a state-of-the-art system when rescoring n-best lists of translations.

3 0.63482028 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation

Author: Hendra Setiawan ; Bowen Zhou ; Bing Xiang

Abstract: Reordering poses one of the greatest challenges in Statistical Machine Translation research as the key contextual information may well be beyond the confine oftranslation units. We present the “Anchor Graph” (AG) model where we use a graph structure to model global contextual information that is crucial for reordering. The key ingredient of our AG model is the edges that capture the relationship between the reordering around a set of selected translation units, which we refer to as anchors. As the edges link anchors that may span multiple translation units at decoding time, our AG model effectively encodes global contextual information that is previously absent. We integrate our proposed model into a state-of-the-art translation system and demonstrate the efficacy of our proposal in a largescale Chinese-to-English translation task.

4 0.43819121 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

Author: Xiaoqing Zheng ; Hanyang Chen ; Tianyu Xu

Abstract: This study explores the feasibility of performing Chinese word segmentation (CWS) and POS tagging by deep learning. We try to avoid task-specific feature engineering, and use deep layers of neural networks to discover relevant features to the tasks. We leverage large-scale unlabeled data to improve internal representation of Chinese characters, and use these improved representations to enhance supervised word segmentation and POS tagging models. Our networks achieved close to state-of-theart performance with minimal computational cost. We also describe a perceptron-style algorithm for training the neural networks, as an alternative to maximum-likelihood method, to speed up the training process and make the learning algorithm easier to be implemented.

5 0.43500596 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks

Author: Zhongqing Wang ; Shoushan LI ; Fang Kong ; Guodong Zhou

Abstract: Personal profile information on social media like LinkedIn.com and Facebook.com is at the core of many interesting applications, such as talent recommendation and contextual advertising. However, personal profiles usually lack organization confronted with the large amount of available information. Therefore, it is always a challenge for people to find desired information from them. In this paper, we address the task of personal profile summarization by leveraging both personal profile textual information and social networks. Here, using social networks is motivated by the intuition that, people with similar academic, business or social connections (e.g. co-major, co-university, and cocorporation) tend to have similar experience and summaries. To achieve the learning process, we propose a collective factor graph (CoFG) model to incorporate all these resources of knowledge to summarize personal profiles with local textual attribute functions and social connection factors. Extensive evaluation on a large-scale dataset from LinkedIn.com demonstrates the effectiveness of the proposed approach. 1

6 0.43274248 164 emnlp-2013-Scaling Semantic Parsers with On-the-Fly Ontology Matching

7 0.4313347 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

8 0.4309682 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity

9 0.43034548 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)

10 0.43007809 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

11 0.42956457 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

12 0.42933843 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

13 0.42928147 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

14 0.42797369 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

15 0.42700836 167 emnlp-2013-Semi-Markov Phrase-Based Monolingual Alignment

16 0.42672279 172 emnlp-2013-Simple Customization of Recursive Neural Networks for Semantic Relation Classification

17 0.42655021 143 emnlp-2013-Open Domain Targeted Sentiment

18 0.42642805 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

19 0.42638558 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

20 0.4261196 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors