acl acl2010 acl2010-245 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xiao Li
Abstract: Determining the semantic intent of web queries not only involves identifying their semantic class, which is a primary focus of previous works, but also understanding their semantic structure. In this work, we formally define the semantic structure of noun phrase queries as comprised of intent heads and intent modifiers. We present methods that automatically identify these constituents as well as their semantic roles based on Markov and semi-Markov conditional random fields. We show that the use of semantic features and syntactic features significantly contribute to improving the understanding performance.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Determining the semantic intent of web queries not only involves identifying their semantic class, which is a primary focus of previous works, but also understanding their semantic structure. [sent-2, score-1.071]
2 In this work, we formally define the semantic structure of noun phrase queries as comprised of intent heads and intent modifiers. [sent-3, score-1.277]
3 We show that the use of semantic features and syntactic features significantly contribute to improving the understanding performance. [sent-5, score-0.377]
4 1 Introduction Web queries can be considered as implicit questions or commands, in that they are performed either to find information on the web or to initiate interaction with web services. [sent-6, score-0.424]
5 Web users, however, rarely express their intent in full language. [sent-7, score-0.414]
6 For example, to find out “what are the movies of 2010 in which johnny depp stars”, a user may simply query “johnny depp movies 2010”. [sent-8, score-0.681]
7 As search engine technologies evolve, it is increasingly believed that search will be shifting away from “ten blue links” toward understanding intent and serving objects. [sent-10, score-0.472]
8 Searching over such data sources, in many cases, can offer more relevant and essential results compared with merely returning web pages that contain query keywords. [sent-12, score-0.463]
9 Table 1 shows a simplified view of a structured data source, where each row represents a movie object. [sent-13, score-0.254]
10 This would deliver direct answers to the query rather than having the user sort through a list of keyword results. [sent-16, score-0.389]
11 In no small part, the success of such an approach relies on robust understanding of query intent. [sent-17, score-0.417]
12 Most previous works in this area focus on query intent classification (Shen et al. [sent-18, score-0.803]
13 Indeed, the intent class information is crucial in determining if a query can be answered by any structured data sources and, if so, by which one. [sent-22, score-0.854]
14 , individual constituents of a query and their semantic roles. [sent-25, score-0.488]
15 A key contribution of this work is that we formally define query semantic structure as comprised of intent heads (IH) and intent modifiers (IM), e. [sent-27, score-1.228]
16 Identifying the semantic structure of queries can be beneficial to information retrieval. [sent-31, score-0.341]
17 Knowing the semantic role of each query constituent, we [sent-32, score-0.452]
18 can reformulate the query into a structured form or reweight different query constituents for structured data retrieval (Robertson et al. [sent-50, score-0.86]
19 The semantic features can be constructed from structured data sources or by mining query logs, while the syntactic features can be obtained by readily-available syntactic analysis tools. [sent-60, score-0.767]
20 Finally, we evaluate our proposed models and features on manuallyannotated query sets from three domains, while our techniques are general enough to be applied to many other domains. [sent-63, score-0.447]
21 1 Query intent understanding As mentioned in the introduction, previous works on query intent understanding have largely focused on classification, i. [sent-65, score-1.333]
22 , automatically mapping queries into semantic classes (Shen et al. [sent-67, score-0.309]
23 There are relatively few published works on understanding the semantic structure of web queries. [sent-71, score-0.285]
24 The most relevant ones are on the problem of query tagging, i. [sent-72, score-0.359]
25 , assigning semantic labels to query terms (Li et al. [sent-74, score-0.5]
26 Their methods aim at extracting attribute names, such as cost and side effect for the concept Drug, from documents and query logs in a weakly-supervised learning framework. [sent-83, score-0.549]
27 When used in the context of web queries, attribute names usually serve as IHs. [sent-84, score-0.285]
28 In fact, one immediate application of their research is to understand web queries that request factual information of some concepts, e. [sent-85, score-0.32]
29 2 Question answering Query intent understanding is analogous to question understanding for question answering (QA) systems. [sent-90, score-0.53]
30 Many web queries can be viewed as the keyword-based counterparts of natural language questions. [sent-91, score-0.32]
31 For example, the query “california national” and “national parks califorina” both imply the question “What are the national parks in California? [sent-92, score-0.455]
32 In this work, we propose to use POS tagging and parsing outputs as features, in addition to other features, in extracting the semantic structure of web queries. [sent-100, score-0.228]
33 While some IM categories belong to named entities such as IM:Director for the intent class Movie, there can be semantic labels that are not named entities such as IH and IM:Genre (again for Movie). [sent-105, score-0.597]
34 3 Query Semantic Structure Unlike database query languages such as SQL, web queries are usually formulated as sequences of words without explicit structures. [sent-106, score-0.709]
35 This makes web queries difficult to interpret by computers. [sent-107, score-0.32]
36 For example, should the query “aspirin side effect” be interpreted as “the side effect of aspirin” or “the aspirin of side effect”? [sent-108, score-0.435]
37 1 Definition We let C denote a set of query intent classes that represent semantic concepts such as Movie, Product and Drug. [sent-111, score-0.773]
38 The query constituents introduced below are all defined w. [sent-112, score-0.395]
39 the intent class of a query, c ∈ C, which is assumed to be known. [sent-115, score-0.456]
40 Intent head An intent head (IH) is a query segment that corresponds to an attribute name of an intent class. [sent-116, score-1.619]
41 For example, the IH of the query “alice in wonderland 2010 cast” is “cast”, which is an attribute name of Movie. [sent-117, score-0.646]
42 For example, “movie avatar” does not contain any segment that corresponds to an attribute name of Movie. [sent-125, score-0.32]
43 Such a query, however, does have an implicit intent which is to obtain general information about the movie. [sent-126, score-0.414]
44 Intent modifier In contrast, an intent modifier (IM) is a query segment that corresponds to an attribute value (of some attribute name). [sent-127, score-1.179]
45 The role of IMs is to impose constraints on the attributes of an intent class. [sent-128, score-0.414]
46 For example, there are two constraints implied in the query “alice in wonderland 2010 cast”: (1) the Title of the movie is “alice in wonderland”; and (2) the Year of the movie is “2010”. [sent-129, score-0.922]
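The two constraints in this example can be made concrete with a small sketch. The mapping below from a labeled segmentation to a structured (select/where) form is our own illustration, not the paper's representation; only the IH / IM:&lt;Attribute&gt; / Other label scheme comes from the definitions above.

```python
# Hypothetical sketch: turning a labeled segmentation of
# "alice in wonderland 2010 cast" into a structured interpretation.
# The IH names the requested attribute; each IM:<Attr> imposes an
# attribute-value constraint; "Other" segments carry no semantic role.

def to_structured_query(segments):
    """segments: list of (phrase, label) pairs."""
    constraints = {}   # attribute name -> constrained value (from IMs)
    requested = None   # attribute name requested by the IH
    for phrase, label in segments:
        if label == "IH":
            requested = phrase
        elif label.startswith("IM:"):
            constraints[label[3:]] = phrase
        # label == "Other": dropped, no semantic role
    return {"select": requested, "where": constraints}

segs = [("alice in wonderland", "IM:Title"),
        ("2010", "IM:Year"),
        ("cast", "IH")]
print(to_structured_query(segs))
# {'select': 'cast', 'where': {'Title': 'alice in wonderland', 'Year': '2010'}}
```

A query with no IH, such as "movie avatar", maps to a form with no requested attribute, matching the implicit-intent case discussed above.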
47 Other Additionally, there can be query segments that do not play any semantic roles, which we refer to as Other. [sent-138, score-0.548]
48 In many cases, the IHs of noun phrase queries are exactly the head nouns in the linguistic sense. [sent-141, score-0.371]
49 Due to the strong resemblance, it is interesting to see if IHs can be identified by extracting linguistic head nouns from queries based on syntactic analysis. [sent-145, score-0.353]
50 , 2008), the lexical category distribution of web queries is dramatically different from that of natural languages. [sent-152, score-0.32]
51 Moreover, unlike most natural languages that follow the linear-order principle, web queries can have relatively free word orders (although some orders may occur more often than others statistically). [sent-154, score-0.32]
52 Although a POS tagger and a chunker may not work well on queries, their output can be used as features for learning statistical models for semantic structure extraction, which we introduce next. [sent-158, score-0.233]
53 4 Models This section presents two statistical models for semantic understanding of noun phrase queries. [sent-159, score-0.25]
54 Assuming that the intent class c ∈ C of a query is known, we cast the problem of extracting the semantic structure of the query as a joint segmentation/classification problem. [sent-160, score-1.298]
55 At a high level, we would like to identify query segments that correspond to IHs, IMs and Others. [sent-161, score-0.455]
56 , sN) denotes a query segmentation as well as a classification of all segments. [sent-172, score-0.359]
57 Each segment sj is represented by a tuple (uj , vj , yj). [sent-173, score-0.276]
58 Here uj and vj are the indices of the starting and ending word tokens respectively; yj ∈ Y is a label indicating the semantic role of sj. [sent-174, score-0.307]
59 For notational simplicity, we assume that the intent class is given and use p(s|x) as a shorthand for p(s|c, x), but keep in mind that the label space, and hence the parameter space, is class-dependent. [sent-176, score-0.456]
60 1 CRFs One natural approach to extracting the semantic structure of queries is to use linear-chain CRFs (Lafferty et al. [sent-179, score-0.34]
61 By doing this, we implicitly assume that there are no two adjacent segments with the same label in the true segment sequence. [sent-192, score-0.305]
62 p(s|x) = (1/Z_λ(x)) exp( Σ_{j=1}^{N+1} λ · f(s_{j−1}, s_j, x) )   (3) In this case, the features f(s_{j−1}, s_j, x) are defined on segments instead of on word tokens. [sent-197, score-0.226]
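A minimal sketch of how the score inside this formula accumulates over a segment sequence. The single transition feature template, its weight values, and the example segmentation are invented for illustration; the normalizer Z_λ(x) and real decoding are omitted.

```python
# Unnormalized semi-Markov CRF log-score: sum over adjacent segment
# pairs of the weighted feature vector lambda . f(s_{j-1}, s_j, x).
# Segments are (u, v, y) tuples over token indices of the query x.

def log_score(x, segments, feature_fn, weights):
    total, prev = 0.0, None
    for seg in segments:
        feats = feature_fn(prev, seg, x)   # sparse dict: feature name -> value
        total += sum(weights.get(k, 0.0) * v for k, v in feats.items())
        prev = seg
    return total

def transition_feature(prev, cur, x):
    # fires on the (previous label, current label) pair; "BOS" at the start
    prev_y = prev[2] if prev is not None else "BOS"
    return {f"trans:{prev_y}->{cur[2]}": 1.0}

weights = {"trans:BOS->IM:Title": 0.5, "trans:IM:Title->IH": 1.2}
segs = [(0, 2, "IM:Title"), (3, 3, "IH")]
print(log_score("alice in wonderland cast".split(), segs, transition_feature, weights))
# 1.7
```

A full model would sum many feature templates (the lexical, semantic and syntactic features of Section 5) and normalize over all segmentations.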
63 5 Features In this work, we explore the use of transition, lexical, semantic and syntactic features in Markov and semi-Markov CRFs. [sent-203, score-0.231]
64 Since feature functions are defined on segments in semi-Markov CRFs, we create B2 that indicates whether the phrase in a hypothesized query segment co-occurs with a state label. [sent-215, score-0.724]
65 Here the set of phrase identities are extracted from the query segments in the training data. [sent-216, score-0.541]
66 Furthermore, we create another type of lexical feature, B3, which is activated when a specific word occurs in a hypothesized query segment. [sent-217, score-0.388]
67 Before describing how such lexicons are generated for our task, we first introduce the forms of the semantic features assuming the availability of the lexicons. [sent-224, score-0.274]
68 For both A3 and A4, the number of such semantic features is equal to the number of lexicons multiplied by the number of state labels. [sent-228, score-0.303]
69 One set of semantic features (B4) inspects whether the phrase of a hypothesized query segment matches any element in a given lexicon. [sent-230, score-0.78]
70 A second set of semantic features (B5) relaxes the exact match constraint made by B4, and takes as the feature value the maximum “similarity” between the query segment and all lexicon elements. [sent-231, score-0.764]
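A sketch of a B5-style feature value. The paper leaves the similarity measure unspecified; word-overlap Jaccard and the toy lexicon below are our assumptions.

```python
# B5-style semantic feature value: maximum similarity between the
# phrase of a hypothesized segment and any element of a lexicon
# (B4 would instead be a 0/1 exact-match indicator).

def max_lexicon_similarity(phrase, lexicon):
    def jaccard(a, b):
        a, b = set(a.split()), set(b.split())
        return len(a & b) / len(a | b)
    return max((jaccard(phrase, e) for e in lexicon), default=0.0)

titles = {"alice in wonderland", "avatar"}
print(max_lexicon_similarity("alice in wonderland 2010", titles))
# 0.75  (3 shared words out of 4 in the union with "alice in wonderland")
```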
71 Lexicon generation To create the semantic features described above, we generate two types of lexicons leveraging databases and query logs for each intent class. [sent-235, score-1.127]
72 The first type of lexicon is an IH lexicon comprised of a list of attribute names for the intent class, e. [sent-236, score-0.752]
73 , “box office” and “review” for the intent class Movie. [sent-238, score-0.456]
74 Inspired by Pasca and Van Durme (2007), we apply a bootstrapping algorithm that automatically learns attribute names for an intent class from query logs. [sent-242, score-0.996]
75 The key difference from their work is that we create templates that consist of semantic labels at the segment level from training data. [sent-243, score-0.307]
76 We select the most frequent templates (top 2 in this work) from training data and use them to discover new IH phrases from the query log. [sent-245, score-0.359]
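The template-driven harvesting step can be sketched loosely as follows, assuming a single "&lt;IM:Title&gt; &lt;IH&gt;" template, a seed title lexicon, and a toy query log, all invented for illustration; the paper selects the top templates from training data and runs over a real log.

```python
# Loose sketch of bootstrapping IH phrases from a query log: for each
# log entry whose prefix matches a known title (the IM:Title slot of
# the template), harvest the remainder as a candidate attribute name.

titles = {"alice in wonderland", "avatar"}
log = ["avatar cast", "alice in wonderland review", "avatar dvd release date"]

def harvest_ih(log, titles):
    candidates = set()
    for q in log:
        for t in titles:
            if q.startswith(t + " "):
                candidates.add(q[len(t) + 1:])
    return candidates

print(sorted(harvest_ih(log, titles)))
# ['cast', 'dvd release date', 'review']
```

A real system would additionally filter candidates by frequency before adding them to the IH lexicon.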
77 Secondly, we have a set of IM lexicons, each comprised of a list of attribute values of an attribute name in Ac. [sent-246, score-0.315]
78 For example, the lexicon for IM:Title (in Movie) is a list of movie titles generated by aggregating the values in the Title column of a movie database. [sent-248, score-0.518]
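Building an IM lexicon from a structured source amounts to aggregating the distinct values of one attribute column. The table rows below are made up for illustration.

```python
# Sketch of IM-lexicon generation: collect the distinct, normalized
# values of a single attribute column (here Title) of a movie table.

rows = [{"Title": "Avatar", "Year": "2009"},
        {"Title": "alice in wonderland", "Year": "2010"},
        {"Title": "avatar", "Year": "2009"}]   # case variants collapse

def build_im_lexicon(rows, attribute):
    return {r[attribute].lower() for r in rows if r.get(attribute)}

print(sorted(build_im_lexicon(rows, "Title")))
# ['alice in wonderland', 'avatar']
```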
79 2, web queries often lack syntactic cues and do not necessarily follow the linear order principle. [sent-255, score-0.37]
80 To this end, we generate a sequence of POS tags for a given query, and use the co-occurrence of POS tag identities and state labels as syntactic features (A5) for CRFs. [sent-259, score-0.289]
81 For semi-Markov CRFs, we instead examine the POS tag sequence of the corresponding phrase in a query segment. [sent-260, score-0.437]
82 Again their identities are combined with state labels to create syntactic features B6. [sent-261, score-0.256]
83 To be precise, if a query segment is determined by the chunker to be a chunk, we use the indicator of the phrase type of the chunk (e. [sent-263, score-0.653]
84 Such features are not activated if a query segment is determined not to be a chunk. [sent-266, score-0.613]
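A token-level sketch of the A5-style syntactic feature: an indicator of the co-occurrence of a POS tag identity and a state label. Tags are supplied directly here; in the paper they come from running a POS tagger on the query.

```python
# A5-style feature names for a linear-chain CRF: one indicator per
# (POS tag of the current token, state label) pair.

def pos_features(tags, labels):
    return [f"pos:{t}|y:{y}" for t, y in zip(tags, labels)]

print(pos_features(["NNP", "NNS"], ["IM:Director", "IH"]))
# ['pos:NNP|y:IM:Director', 'pos:NNS|y:IH']
```

The B6 analogue for semi-Markov CRFs would instead pair the POS tag *sequence* of a whole segment with the segment label.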
85 The queries were annotated with both a segmentation and a classification of the segments according to the label sets defined in Table 3. [sent-269, score-0.355]
86 Specifically, let us assume that the true segment sequence of a query is s = (s1, s2, . [sent-276, score-0.558]
87 , sN), and the decoded segment sequence is s0 = (s01 , s02 , . [sent-279, score-0.236]
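Segment-level scoring under this setup can be sketched as below, counting a decoded segment correct only when both its boundaries and its label match a gold segment exactly; the precision/recall/F1 formulation is a standard choice for illustration, not quoted from the paper.

```python
# Exact-match segment scoring: a predicted (u, v, y) tuple is a true
# positive iff the identical tuple appears in the gold segmentation.

def segment_prf(gold, pred):
    gold, pred = set(gold), set(pred)
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

gold = [(0, 2, "IM:Title"), (3, 3, "IM:Year"), (4, 4, "IH")]
pred = [(0, 2, "IM:Title"), (3, 4, "IH")]       # one boundary error
p, r, f = segment_prf(gold, pred)
print(round(p, 3), round(r, 3), round(f, 3))
# 0.5 0.333 0.4
```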
88 We then experiment with lexical features, semantic features and syntactic features for both models. [sent-290, score-0.319]
89 Lexical features Our first experiment evaluates the performance of lexical features (combined with transition features). [sent-292, score-0.256]
90 , indicators of whether a query segment contains a word identity, gave an absolute 7. [sent-296, score-0.525]
91 We do not have an IH lexicon for Job as the attribute names in that domain are much fewer and are well covered by training set examples. [sent-309, score-0.239]
92 Syntactic features As shown in the row A1-A5 in Table 4, combined with all other features, the syntactic features (A5) built upon POS tags boosted the CRF model performance. [sent-312, score-0.226]
93 Transition features In addition, it is interesting to see the importance of transition features in both models. [sent-343, score-0.256]
94 Since web queries do not generally follow the linear order principle, is it helpful to incorporate transition features in learning? [sent-344, score-0.488]
95 One intuitive explanation is that although web queries are relatively “order-free”, statistically speaking, some orders are much more likely to occur than others. [sent-347, score-0.32]
96 Experiments show the effectiveness of semantic features and syntactic fea- tures in both Markov and semi-Markov CRF models. [sent-361, score-0.231]
97 In the future, it would be useful to explore other approaches to automatic lexicon discovery to improve the quality or to increase the coverage of both IH and IM lexicons, and to systematically evaluate their impact on query understanding performance. [sent-362, score-0.475]
98 Extracting structured information from user queries with semi-supervised conditional random fields. [sent-404, score-0.317]
99 What you seek is what you get: Extraction of class attributes from query logs. [sent-428, score-0.401]
100 Weakly-supervised acquisition of open-domain classes and class attributes from web documents and query logs. [sent-432, score-0.505]
wordName wordTfidf (topN-words)
[('intent', 0.414), ('query', 0.359), ('im', 0.234), ('ih', 0.228), ('ihs', 0.216), ('queries', 0.216), ('movie', 0.215), ('crfs', 0.212), ('segment', 0.166), ('wonderland', 0.133), ('ims', 0.123), ('attribute', 0.12), ('alice', 0.116), ('cast', 0.105), ('web', 0.104), ('johnny', 0.1), ('segments', 0.096), ('lexicons', 0.093), ('semantic', 0.093), ('features', 0.088), ('transition', 0.08), ('aspirin', 0.076), ('title', 0.074), ('avatar', 0.071), ('sigir', 0.068), ('vj', 0.068), ('job', 0.065), ('names', 0.061), ('lexicon', 0.058), ('understanding', 0.058), ('pasca', 0.058), ('arguello', 0.057), ('depp', 0.057), ('moviejobnational', 0.057), ('uj', 0.057), ('head', 0.056), ('year', 0.056), ('li', 0.054), ('director', 0.054), ('noun', 0.054), ('chunker', 0.052), ('syntactic', 0.05), ('semimarkov', 0.05), ('labels', 0.048), ('xiao', 0.048), ('yj', 0.046), ('crf', 0.046), ('manshadi', 0.046), ('sarawagi', 0.046), ('phrase', 0.045), ('pos', 0.044), ('label', 0.043), ('sj', 0.042), ('class', 0.042), ('comprised', 0.041), ('databases', 0.041), ('identities', 0.041), ('structured', 0.039), ('durme', 0.039), ('logs', 0.039), ('movies', 0.039), ('park', 0.038), ('adventure', 0.038), ('paparizos', 0.038), ('parkaverage', 0.038), ('xuj', 0.038), ('decoded', 0.037), ('constituents', 0.036), ('name', 0.034), ('markov', 0.033), ('pennacchiotti', 0.033), ('sci', 0.033), ('canon', 0.033), ('prec', 0.033), ('employee', 0.033), ('parks', 0.033), ('unlikely', 0.033), ('sequence', 0.033), ('bruce', 0.032), ('conditional', 0.032), ('beneficial', 0.032), ('chunk', 0.031), ('extracting', 0.031), ('rb', 0.03), ('intends', 0.03), ('subordinating', 0.03), ('aggregating', 0.03), ('national', 0.03), ('database', 0.03), ('user', 0.03), ('works', 0.03), ('state', 0.029), ('hypothesized', 0.029), ('robertson', 0.029), ('lev', 0.029), ('barr', 0.029), ('retrieval', 0.028), ('lafferty', 0.028), ('gao', 0.028), ('yi', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999917 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries
Author: Xiao Li
Abstract: Determining the semantic intent of web queries not only involves identifying their semantic class, which is a primary focus of previous works, but also understanding their semantic structure. In this work, we formally define the semantic structure of noun phrase queries as comprised of intent heads and intent modifiers. We present methods that automatically identify these constituents as well as their semantic roles based on Markov and semi-Markov conditional random fields. We show that the use of semantic features and syntactic features significantly contribute to improving the understanding performance.
2 0.21358632 164 acl-2010-Learning Phrase-Based Spelling Error Models from Clickthrough Data
Author: Xu Sun ; Jianfeng Gao ; Daniel Micol ; Chris Quirk
Abstract: This paper explores the use of clickthrough data for query spelling correction. First, large amounts of query-correction pairs are derived by analyzing users' query reformulation behavior encoded in the clickthrough data. Then, a phrase-based error model that accounts for the transformation probability between multi-term phrases is trained and integrated into a query speller system. Experiments are carried out on a human-labeled data set. Results show that the system using the phrase-based error model outperforms cantly its baseline systems. 1 signifi-
3 0.15879379 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices
Author: Taniya Mishra ; Srinivas Bangalore
Abstract: The Deep Web is the collection of information repositories that are not indexed by search engines. These repositories are typically accessible through web forms and contain dynamically changing information. In this paper, we present a system that allows users to access such rich repositories of information on mobile devices using spoken language.
4 0.12974977 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
Author: Dmitry Davidov ; Ari Rappoport
Abstract: We present a novel framework for automated extraction and approximation of numerical object attributes such as height and weight from the Web. Given an object-attribute pair, we discover and analyze attribute information for a set of comparable objects in order to infer the desired value. This allows us to approximate the desired numerical values even when no exact values can be found in the text. Our framework makes use of relation defining patterns and WordNet similarity information. First, we obtain from the Web and WordNet a list of terms similar to the given object. Then we retrieve attribute values for each term in this list, and information that allows us to compare different objects in the list and to infer the attribute value range. Finally, we combine the retrieved data for all terms from the list to select or approximate the requested value. We evaluate our method using automated question answering, WordNet enrichment, and comparison with answers given in Wikipedia and by leading search engines. In all of these, our framework provides a significant improvement.
5 0.12703319 197 acl-2010-Practical Very Large Scale CRFs
Author: Thomas Lavergne ; Olivier Cappe ; Francois Yvon
Abstract: Conditional Random Fields (CRFs) are a widely-used approach for supervised sequence labelling, notably due to their ability to handle large description spaces and to integrate structural dependency between labels. Even for the simple linearchain model, taking structure into account implies a number of parameters and a computational effort that grows quadratically with the cardinality of the label set. In this paper, we address the issue of training very large CRFs, containing up to hun- dreds output labels and several billion features. Efficiency stems here from the sparsity induced by the use of a ‘1 penalty term. Based on our own implementation, we compare three recent proposals for implementing this regularization strategy. Our experiments demonstrate that very large CRFs can be trained efficiently and that very large models are able to improve the accuracy, while delivering compact parameter sets.
6 0.12563699 106 acl-2010-Event-Based Hyperspace Analogue to Language for Query Expansion
7 0.11638124 177 acl-2010-Multilingual Pseudo-Relevance Feedback: Performance Study of Assisting Languages
8 0.11247898 123 acl-2010-Generating Focused Topic-Specific Sentiment Lexicons
9 0.096796624 152 acl-2010-It Makes Sense: A Wide-Coverage Word Sense Disambiguation System for Free Text
10 0.086363308 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network
11 0.085211553 246 acl-2010-Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure
12 0.08037515 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach
13 0.079631366 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
14 0.077458732 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
15 0.07356292 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data
16 0.07339523 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews
17 0.073194817 85 acl-2010-Detecting Experiences from Weblogs
18 0.072664514 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
19 0.07252492 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
20 0.071574949 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering
topicId topicWeight
[(0, -0.218), (1, 0.088), (2, -0.069), (3, 0.048), (4, 0.016), (5, -0.034), (6, -0.028), (7, 0.043), (8, 0.03), (9, 0.036), (10, -0.052), (11, -0.007), (12, -0.064), (13, -0.165), (14, -0.006), (15, 0.151), (16, -0.098), (17, 0.059), (18, -0.008), (19, 0.017), (20, 0.057), (21, -0.184), (22, 0.029), (23, 0.087), (24, 0.079), (25, -0.004), (26, 0.19), (27, -0.09), (28, -0.035), (29, 0.049), (30, 0.061), (31, -0.1), (32, -0.12), (33, -0.197), (34, 0.148), (35, -0.017), (36, -0.125), (37, -0.124), (38, -0.029), (39, 0.149), (40, 0.1), (41, 0.085), (42, 0.004), (43, 0.009), (44, 0.101), (45, 0.09), (46, -0.036), (47, -0.004), (48, 0.065), (49, 0.009)]
simIndex simValue paperId paperTitle
same-paper 1 0.9426586 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries
2 0.85910451 164 acl-2010-Learning Phrase-Based Spelling Error Models from Clickthrough Data
3 0.71744877 177 acl-2010-Multilingual Pseudo-Relevance Feedback: Performance Study of Assisting Languages
Author: Manoj Kumar Chinnakotla ; Karthik Raman ; Pushpak Bhattacharyya
Abstract: In a previous work of ours Chinnakotla et al. (2010) we introduced a novel framework for Pseudo-Relevance Feedback (PRF) called MultiPRF. Given a query in one language called Source, we used English as the Assisting Language to improve the performance of PRF for the source language. MulitiPRF showed remarkable improvement over plain Model Based Feedback (MBF) uniformly for 4 languages, viz., French, German, Hungarian and Finnish with English as the assisting language. This fact inspired us to study the effect of any source-assistant pair on MultiPRF performance from out of a set of languages with widely different characteristics, viz., Dutch, English, Finnish, French, German and Spanish. Carrying this further, we looked into the effect of using two assisting languages together on PRF. The present paper is a report of these investigations, their results and conclusions drawn therefrom. While performance improvement on MultiPRF is observed whatever the assisting language and whatever the source, observations are mixed when two assisting languages are used simultaneously. Interestingly, the performance improvement is more pronounced when the source and assisting languages are closely related, e.g., French and Spanish.
4 0.60282677 106 acl-2010-Event-Based Hyperspace Analogue to Language for Query Expansion
Author: Tingxu Yan ; Tamsin Maxwell ; Dawei Song ; Yuexian Hou ; Peng Zhang
Abstract: p . zhang1 @ rgu .ac .uk Bag-of-words approaches to information retrieval (IR) are effective but assume independence between words. The Hyperspace Analogue to Language (HAL) is a cognitively motivated and validated semantic space model that captures statistical dependencies between words by considering their co-occurrences in a surrounding window of text. HAL has been successfully applied to query expansion in IR, but has several limitations, including high processing cost and use of distributional statistics that do not exploit syntax. In this paper, we pursue two methods for incorporating syntactic-semantic information from textual ‘events’ into HAL. We build the HAL space directly from events to investigate whether processing costs can be reduced through more careful definition of word co-occurrence, and improve the quality of the pseudo-relevance feedback by applying event information as a constraint during HAL construction. Both methods significantly improve performance results in comparison with original HAL, and interpolation of HAL and relevance model expansion outperforms either method alone.
5 0.56023115 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices
6 0.44917914 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
7 0.44505271 111 acl-2010-Extracting Sequences from the Web
8 0.44130316 197 acl-2010-Practical Very Large Scale CRFs
9 0.42451569 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
10 0.42056414 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection
11 0.39982003 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach
12 0.38777775 159 acl-2010-Learning 5000 Relational Extractors
13 0.38035181 19 acl-2010-A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation
14 0.38007331 7 acl-2010-A Generalized-Zero-Preserving Method for Compact Encoding of Concept Lattices
15 0.37961179 123 acl-2010-Generating Focused Topic-Specific Sentiment Lexicons
16 0.36113948 185 acl-2010-Open Information Extraction Using Wikipedia
17 0.33666575 98 acl-2010-Efficient Staggered Decoding for Sequence Labeling
18 0.33504477 246 acl-2010-Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure
19 0.3333194 259 acl-2010-WebLicht: Web-Based LRT Services for German
20 0.33121315 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
topicId topicWeight
[(14, 0.025), (25, 0.071), (39, 0.018), (42, 0.035), (47, 0.235), (54, 0.012), (59, 0.105), (72, 0.027), (73, 0.051), (78, 0.035), (83, 0.092), (84, 0.049), (98, 0.151)]
simIndex simValue paperId paperTitle
1 0.96351624 260 acl-2010-Wide-Coverage NLP with Linguistically Expressive Grammars
Author: Julia Hockenmaier ; Yusuke Miyao ; Josef van Genabith
Abstract: unkown-abstract
same-paper 2 0.82654023 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries
3 0.69055188 130 acl-2010-Hard Constraints for Grammatical Function Labelling
Author: Wolfgang Seeker ; Ines Rehbein ; Jonas Kuhn ; Josef Van Genabith
Abstract: For languages with (semi-) free word order (such as German), labelling grammatical functions on top of phrase-structural constituent analyses is crucial for making them interpretable. Unfortunately, most statistical classifiers consider only local information for function labelling and fail to capture important restrictions on the distribution of core argument functions such as subject, object etc., namely that there is at most one subject (etc.) per clause. We augment a statistical classifier with an integer linear program imposing hard linguistic constraints on the solution space output by the classifier, capturing global distributional restrictions. We show that this improves labelling quality, in particular for argument grammatical functions, in an intrinsic evaluation, and, importantly, grammar coverage for treebankbased (Lexical-Functional) grammar acquisition and parsing, in an extrinsic evaluation.
4 0.68642771 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
Authors: Dmitry Davidov; Ari Rappoport
Abstract: We present a novel framework for automated extraction and approximation of numerical object attributes such as height and weight from the Web. Given an object-attribute pair, we discover and analyze attribute information for a set of comparable objects in order to infer the desired value. This allows us to approximate the desired numerical values even when no exact values can be found in the text. Our framework makes use of relation defining patterns and WordNet similarity information. First, we obtain from the Web and WordNet a list of terms similar to the given object. Then we retrieve attribute values for each term in this list, and information that allows us to compare different objects in the list and to infer the attribute value range. Finally, we combine the retrieved data for all terms from the list to select or approximate the requested value. We evaluate our method using automated question answering, WordNet enrichment, and comparison with answers given in Wikipedia and by leading search engines. In all of these, our framework provides a significant improvement.
5 0.68333352 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
Authors: Xianpei Han; Jun Zhao
Abstract: The name ambiguity problem has raised urgent demands for efficient, high-quality named entity disambiguation methods. In recent years, the increasing availability of large-scale, rich semantic knowledge sources (such as Wikipedia and WordNet) has created new opportunities to enhance named entity disambiguation by developing algorithms that can best exploit these knowledge sources. The problem is that these knowledge sources are heterogeneous, and most of the semantic knowledge within them is embedded in complex structures such as graphs and networks. This paper proposes a knowledge-based method, called Structural Semantic Relatedness (SSR), which can enhance named entity disambiguation by capturing and leveraging the structural semantic knowledge in multiple knowledge sources. Empirical results show that, in comparison with classical BOW-based methods and social network based methods, our method can significantly improve disambiguation performance by 8.7% and 14.7%, respectively.
6 0.68150544 65 acl-2010-Complexity Metrics in an Incremental Right-Corner Parser
7 0.68057537 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
8 0.68056178 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons
9 0.68038762 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
10 0.68010318 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
11 0.67966259 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
12 0.67815936 136 acl-2010-How Many Words Is a Picture Worth? Automatic Caption Generation for News Images
13 0.67777723 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
14 0.67763358 39 acl-2010-Automatic Generation of Story Highlights
15 0.67762977 169 acl-2010-Learning to Translate with Source and Target Syntax
16 0.67727154 162 acl-2010-Learning Common Grammar from Multilingual Corpus
17 0.6767413 198 acl-2010-Predicate Argument Structure Analysis Using Transformation Based Learning
18 0.67481691 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
19 0.67465913 71 acl-2010-Convolution Kernel over Packed Parse Forest
20 0.67452586 214 acl-2010-Sparsity in Dependency Grammar Induction