acl acl2010 acl2010-215 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Taniya Mishra ; Srinivas Bangalore
Abstract: The Deep Web is the collection of information repositories that are not indexed by search engines. These repositories are typically accessible through web forms and contain dynamically changing information. In this paper, we present a system that allows users to access such rich repositories of information on mobile devices using spoken language.
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper, we present a system that allows users to access such rich repositories of information on mobile devices using spoken language. [sent-7, score-0.527]
2 Search engines are unable to index this information and hence, unable to retrieve it for the user who may be searching for such information. [sent-15, score-0.289]
3 So, the only way for users to access this information is to find the appropriate web-form, fill in the necessary search parameters, and use it to query the database that contains the information that is being searched for. [sent-16, score-0.422]
4 The ubiquity of mobile devices has made information access an any time, any place activity. [sent-19, score-0.348]
5 However, information access using text input on mobile devices is tedious and unnatural because of the limited screen space and the small (or soft) keyboards. [sent-20, score-0.348]
6 In addition, given the mobile nature of these devices, users often like to use them in hands-busy environments, ruling out the possibility of typing text. [sent-21, score-0.223]
7 Filling web-forms using the small screens and tiny keyboards of mobile devices is neither easy nor quick. [sent-22, score-0.285]
8 provides a unified interface on an iPhone (shown in Figure 1) that users can use to search for static and dynamic questions. [sent-26, score-0.808]
9 Static questions are questions whose answers remain the same irrespective of when and where they are asked. [sent-27, score-1.416]
10 Examples of such questions are What is the speed of light? [sent-28, score-0.298]
11 For static questions, the system retrieves the answers from an archive of human-generated answers to questions. [sent-31, score-0.888]
12 This ensures higher accuracy for the answers retrieved (if found in the archive) and also allows us to retrieve related questions on the user’s topic of interest. [sent-32, score-0.747]
13 Figure 1: Retrieval results for static and dynamic questions using Qme! [sent-33, score-0.896]
14 Dynamic questions are questions whose answers depend on when and where they are asked. [sent-34, score-0.82]
15 Examples of such questions are What is the stock price of General Motors? [sent-35, score-0.298]
16 Proceedings of the ACL 2010 System Demonstrations, pages 60–65. © 2010 Association for Computational Linguistics. The answers to dynamic questions are often part of the Deep Web. [sent-41, score-0.835]
17 Our system retrieves the answers to such dynamic questions by parsing the questions to retrieve pertinent search keywords, which are in turn used to query information databases accessible over the Internet using web forms. [sent-42, score-1.842]
18 However, the internal distinction between dynamic and static questions, and the subsequent differential treatment within the system is seamless to the user. [sent-43, score-0.628]
19 The user simply uses a single unified interface to ask a question and receive a collection of answers that potentially address her question directly. [sent-44, score-0.632]
20 In Section 3, we present bootstrap techniques to distinguish dynamic questions from static questions, and evaluate the efficacy of these techniques on a test corpus. [sent-47, score-0.896]
21 In Section 4, we show how our system retrieves answers to dynamic questions. [sent-48, score-0.659]
22 In Section 5, we show how our system retrieves answers to static questions. [sent-49, score-0.631]
23 The user of this application provides a spoken language query to a mobile device intending to find an answer to the question. [sent-55, score-0.605]
24 This textual output of recognition is then used to classify the user query as either a dynamic or a static query. [sent-58, score-1.132]
25 If the user query is static, the result of the speech recognizer is used to search a large corpus of question-answer pairs to retrieve the relevant answers. [sent-59, score-0.542]
26 If the user query is dynamic, the answers are retrieved by querying a web form from the appropriate web site (e. [sent-62, score-0.8]
27 In Figure 1, we illustrate the answers that Qme! [sent-66, score-0.224]
28 returns for static and dynamic questions. (Footnote 1: For this paper, the ASR used to recognize these utterances incorporates an acoustic model adapted to speech collected from mobile devices and a four-gram language model built from the corpus of questions.) [sent-67, score-0.57]
29 2.1 Demonstration: In the demonstration, we plan to show users static and dynamic query handling on an iPhone using spoken language queries. [sent-70, score-0.9]
30 Users can use the iPhone and speak their queries using an interface provided by Qme! [sent-71, score-0.255]
31 3 Dynamic and Static Questions: As mentioned in the introduction, dynamic questions require accessing the hidden web through a web form with the appropriate parameters. [sent-74, score-0.797]
32 Answers to dynamic questions cannot be preindexed as can be done for static questions. [sent-75, score-0.896]
33 In dynamic questions, there may be no explicit reference to time, unlike the questions in the TERQAS corpus (Radev and Sundheim. [sent-77, score-0.611]
34 The time-dependency of a dynamic question lies in the temporal nature of its answer. [sent-79, score-0.543]
35 For example, consider the question, What is the address of the theater White Christmas is playing at in New York? [sent-80, score-0.231]
36 If the question is asked in the summer, the answer will be “This play is not currently playing anywhere in NYC. [sent-84, score-0.331]
37 ” If the question is asked during December 2009, the answer might be different from the answer given in December 2010, because the theater at which White Christmas is playing differs from 2009 to 2010. [sent-85, score-0.448]
38 , 2004), a method of improving QA accuracy by asking auxiliary questions related to the original question in order to temporally verify and restrict the original answer. [sent-90, score-0.451]
39 , 2009) use the temporal relations in a question to decompose it into simpler questions, the answers of which are recomposed to produce the answers to the original question. [sent-94, score-0.678]
40 3.1 Question Classification: Dynamic and Static Questions. We automatically classify questions as dynamic or static. [sent-96, score-0.896]
41 The answers to static questions can be retrieved from the QA archive. [sent-97, score-0.866]
42 To answer dynamic questions, we query the database(s) associated with the topic of the question through web forms on the Internet. [sent-98, score-0.89]
43 We first use a topic classifier to detect the topic of a question followed by a dynamic/static classifier trained on questions related to a topic, as shown in Figure 3. [sent-99, score-0.785]
44 For the question what movies are playing around me? [sent-100, score-0.363]
45 , we detect that it is a movie-related dynamic question and query a movie information web site (e. [sent-101, score-0.882]
46 Our initial approach was to use such signal words and phrases to automatically identify dynamic questions. [sent-109, score-0.313]
47 We also included spatial indexicals such as here, and other clauses observed in dynamic questions, such as cost of and how much is, in the list of signal phrases. [sent-111, score-0.667]
48 This approach identified 5% of our dataset, which consists of several million questions, as dynamic. [sent-114, score-0.335]
49 The types of questions identified include What is playing in the movie theaters tonight? [sent-115, score-0.627]
50 This shows that the temporal and spatial indexicals, encoded as a regular-expression-based recognizer, fail to identify a large percentage of the dynamic questions. [sent-120, score-0.664]
51 This approach leaves out dynamic questions that do not contain temporal or spatial indexicals. [sent-121, score-0.784]
52 The information that is most likely being sought by this question is what is the name of the person who got voted off the TV show Survivor most recently, and not what is the name of the person (or persons) who have gotten voted off the Survivor at some point in the past. [sent-130, score-0.205]
53 Knowing the broad topic (such as movies, current affairs, and music) of the question may be very useful. [sent-131, score-0.336]
54 It is likely that there may be many dynamic questions about movies, sports, and finance, while history and geography may have few or none. [sent-132, score-0.611]
55 The questions in our dataset are annotated with a broad topic tag. [sent-134, score-0.558]
56 The 5% of our dataset identified as dynamic questions, when grouped by broad topic, produced a long-tailed distribution. [sent-136, score-0.905]
57 Of the 104 broad topics, the top-5 topics contained over 50% of the dynamic questions. [sent-137, score-0.459]
58 Considering the issues laid out in the previous section, our classification approach is to chain two machine-learning-based classifiers: a topic classifier chained to a dynamic/static classifier, as shown in Figure 3. [sent-139, score-0.187]
59 In this architecture, we build one topic classifier, but several dynamic/static classifiers, each trained on data pertaining to one broad topic. [sent-140, score-0.223]
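The chained architecture of Figure 3 can be sketched in a few lines. Here the topic classifier and the per-topic dynamic/static classifiers are keyword-rule stubs standing in for the trained MaxEnt models described in the paper; they are illustrative assumptions, not the actual models.

```python
# A minimal sketch of the chained-classifier architecture: a topic
# classifier routes each question to a per-topic dynamic/static
# classifier. The keyword rules below are hypothetical stand-ins for
# the trained models.

def topic_classifier(question: str) -> str:
    """Stub topic classifier over a few illustrative broad topics."""
    q = question.lower()
    if "movie" in q or "playing" in q:
        return "Movies"
    if "stock" in q or "price" in q:
        return "Finance"
    return "General"

def movies_dynamic_static(question: str) -> str:
    return "dynamic" if "playing" in question.lower() else "static"

def finance_dynamic_static(question: str) -> str:
    return "dynamic" if "price" in question.lower() else "static"

# One dynamic/static classifier per broad topic; topics without a
# dedicated model fall back to "static" here.
PER_TOPIC = {
    "Movies": movies_dynamic_static,
    "Finance": finance_dynamic_static,
}

def classify(question: str) -> tuple:
    """Chain the two stages: topic first, then dynamic/static."""
    topic = topic_classifier(question)
    label = PER_TOPIC.get(topic, lambda q: "static")(question)
    return topic, label
```

The chaining lets each dynamic/static model specialize on one topic's vocabulary, which is the motivation the paper gives for building several per-topic classifiers rather than one global one.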
60 Figure 3: Chaining two classifiers. We used supervised learning to train the topic classifier, since our entire dataset is annotated by human experts with topic labels. [sent-141, score-0.312]
61 Baseline: We treat questions as dynamic if they contain temporal indexicals, e. [sent-143, score-0.728]
62 A question is considered static if it does not contain any such words/phrases. [sent-147, score-0.398]
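The baseline can be sketched as a small regular-expression recognizer over signal phrases. The indexical lists below are illustrative stand-ins for the authors' full signal-phrase list.

```python
import re

# Illustrative temporal and spatial indexicals; the paper's actual
# signal-phrase list is larger (it also includes clauses such as
# "cost of" and "how much is").
TEMPORAL = ["today", "tonight", "tomorrow", "now", "currently", "this week"]
SPATIAL = ["here", "near me", "around me", "nearby"]

PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in TEMPORAL + SPATIAL) + r")\b",
    re.IGNORECASE,
)

def is_dynamic_baseline(question: str) -> bool:
    """Baseline: a question is labeled dynamic iff it contains a
    temporal or spatial signal phrase; otherwise it is static."""
    return PATTERN.search(question) is not None
```

As the paper notes, such a recognizer misses dynamic questions that carry no explicit indexical, which is what motivates the learned classifiers.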
63 3.1.1 Experiments and Results: Topic Classification. The topic classifier was trained using a training set consisting of over one million questions downloaded from the web, which were manually labeled by human experts as part of answering the questions. [sent-160, score-0.578]
64 Word trigrams of the question are used as features for a MaxEnt classifier, which outputs a score distribution over all 104 possible topic labels. [sent-162, score-0.3]
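The word-trigram feature extraction can be sketched as follows; the boundary-marker padding is an assumption (the paper does not describe its tokenization), and the MaxEnt training itself is not shown.

```python
def trigram_features(question: str) -> list:
    """Extract word-trigram features for a MaxEnt classifier,
    padding with boundary markers so short questions still yield
    at least one feature. Padding scheme is an assumption."""
    tokens = ["<s>", "<s>"] + question.lower().split() + ["</s>"]
    return ["_".join(tokens[i:i + 3]) for i in range(len(tokens) - 2)]
```

Each feature string would be mapped to an indicator feature in the MaxEnt model, which then scores all 104 topic labels.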
65 We evaluated these methods on a 250-question test set drawn from the broad topic of Movies. [sent-170, score-0.336]
66 Table 2: Best results of dynamic/static classification. 4 Retrieving Answers to Dynamic Questions: Following the classification step outlined in Section 3.1, [sent-177, score-0.835]
67 we know whether a user query is static or dynamic, and the broad category of the question. [sent-178, score-0.723]
68 If the question is dynamic, then our system performs a vertical search based on the broad topic of the question. [sent-179, score-0.502]
69 For each broad topic, we have identified a few trusted content aggregator websites. [sent-181, score-0.392]
70 For example, Movies-related dynamic user queries are sent to the Fandango site (www.fandango.com). [sent-182, score-1.105]
71 Other such trusted content aggregator websites have been identified for Mass Transit-related and Yellowpages-related dynamic user queries. [sent-185, score-0.726]
72 We have also identified the web-forms that can be used to search these aggregator sites and the search parameters that these web-forms need for searching. [sent-186, score-0.458]
73 So, given a user query, whose broad category has been determined and which has been classified as a dynamic query by the system, the next step is to parse the query to obtain pertinent search parameters. [sent-187, score-1.152]
74 The search parameters are dependent on the broad category of the question, the trusted content aggregator website(s), the web-forms associated with this category, and, of course, the content of the user query. [sent-188, score-0.632]
75 From the search parameters, a query is issued to the associated web-form to search the related aggregator site. [sent-189, score-0.693]
76 For a query about the movie Twilight playing in Madison, New Jersey, the pertinent search parameters parsed out are movie-name: Twilight, city: Madison, and state: New Jersey; these are used to build a search string that Fandango’s web-form can use to search the Fandango site. [sent-191, score-0.45]
77 For a query about the business Saigon Kitchen in Austin, Texas, the pertinent search parameters parsed out are business-name: Saigon Kitchen, city: Austin, and state: Texas; these are used to construct a search string to search the Yellowpages website. [sent-193, score-0.45]
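The parameter-to-web-form step can be sketched with a query-string builder. The endpoint URL and field names below are hypothetical: each real aggregator's web form defines its own fields, and the paper does not publish them.

```python
from urllib.parse import urlencode

def build_form_query(base_url: str, params: dict) -> str:
    """Turn the parsed search parameters into a GET query string for
    an aggregator's web form. Endpoint and field names here are
    hypothetical illustrations, not the actual form fields."""
    return base_url + "?" + urlencode(params)

# Parameters parsed out of a Twilight-in-Madison query, as in the text.
query = build_form_query(
    "https://www.example-aggregator.com/search",  # hypothetical endpoint
    {"movie-name": "Twilight", "city": "Madison", "state": "New Jersey"},
)
```

`urlencode` handles the quoting (spaces become `+`), so the parser only needs to emit clean parameter values.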
78 These are just two examples of the kinds of dynamic user queries that we encounter. [sent-194, score-0.534]
79 Within each broad category, there is a wide variety of the sub-types of user queries, and for each sub-type, we have to parse out different search parameters and use different web-forms. [sent-195, score-0.381]
80 It is quite likely that many of the dynamic queries may not have all the pertinent search parameters explicitly outlined. [sent-197, score-0.644]
81 For example, a mass transit query may be When is the next train to Princeton? [sent-198, score-0.312]
82 The bare minimum search parameters needed to answer this query are a from-location, and a to-location. [sent-200, score-0.401]
83 Other examples of dynamic queries in which search parameters are not explicit in the query, and hence, have to be deduced by the system, include queries such as Where is XMen playing? [sent-205, score-0.645]
84 Based on our understanding of natural language, in such a scenario our system is built to assume that the user wants to find a movie theatre (or is referring to a hardware store) near where they are currently located. [sent-209, score-0.365]
85 So, the system obtains the user’s location from the GPS sensor and uses it to search for a theatre (or locate the hardware store) within a five-mile radius of that location. [sent-210, score-0.306]
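The five-mile-radius check can be sketched with a great-circle distance. The paper does not specify how the distance is computed, so the haversine formula here is an assumption.

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = (math.sin(dp / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def within_radius(user, place, radius_miles=5.0):
    """True if the place lies within the search radius of the
    user's GPS-reported (lat, lon) position."""
    return haversine_miles(*user, *place) <= radius_miles
```

Candidate theatres returned by the aggregator would be filtered through `within_radius` against the GPS fix before being shown.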
86 In the last few paragraphs, we have discussed how we search for answers to dynamic user queries from the hidden web by using web-forms. [sent-211, score-0.957]
87 So, we parse the HTML-encoded result pages to extract just the answers to the user query and reformat them to fit the Qme! interface. [sent-216, score-0.555]
88 5 Retrieving Answers to Static Questions: Answers to static user queries (questions whose answers do not change over time) are retrieved in a different way than answers to dynamic questions. [sent-218, score-2.431]
89 This section describes how our system retrieves the answers to static questions. [sent-219, score-0.929]
90 5.1 Representing the Search Index as an FST: To obtain results for static user queries, we have implemented our own search engine using finite-state transducers (FSTs) rather than Lucene (Hatcher and Gospodnetic, 2004), [sent-222, score-0.519]
91 as the FST is a more efficient representation of the search index and allows us to take word lattices output by the ASR as input queries. [sent-223, score-0.216]
92 We index each question-answer (QA) pair from our repository ((qi, ai), qai for short) using the words (wqi) in question qi. [sent-225, score-0.191]
93 A word (e.g., old) is the input symbol for a set of arcs whose output symbol is the index of the QA pairs in which old appears. (Footnote 2: We are aware that we could use SOAP (Simple Object Access Protocol) encoding to do the search; however, not all aggregator sites use SOAP yet.) [sent-228, score-0.281]
94 In contrast to most text search engines, where stop words are removed from the query, we weight the query terms with their idf values, which results in a weighted NgramFSA. [sent-236, score-0.309]
95 The NgramFSA is composed with the SearchFST, and we obtain all the arcs (wq, qawq, c(wq, qawq)), where wq is a query term, qawq is a QA index containing the query term, and c(wq, qawq) is the weight associated with that pair. [sent-237, score-0.573]
96 Using this information, we aggregate the weight for a QA pair (qaq) across all query words and rank the retrieved QAs in the descending order of this aggregated weight. [sent-238, score-0.262]
97 The query composition, QA weight aggregation, and selection of the top N QA pairs are computed with finite-state transducer operations, as shown in Equations 1 and 2. [sent-240, score-0.203]
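The retrieval just described can be mimicked with an ordinary inverted index: this dict-based sketch stands in for the SearchFST/NgramFSA composition, with idf weighting of query terms and per-QA weight aggregation. It is an analogue of the FST pipeline, not the actual transducer implementation.

```python
import math
from collections import defaultdict

def build_index(qa_pairs):
    """Map each word of a question q_i to the indices of the QA pairs
    containing it -- a dict analogue of the SearchFST arcs."""
    index = defaultdict(set)
    for i, (question, _answer) in enumerate(qa_pairs):
        for word in question.lower().split():
            index[word].add(i)
    return index

def search(query, qa_pairs, index, top_n=3):
    """Weight query terms by idf (instead of removing stop words),
    aggregate the weights per QA pair, and rank in descending order
    of the aggregated weight."""
    n = len(qa_pairs)
    scores = defaultdict(float)
    for word in query.lower().split():
        postings = index.get(word, set())
        if not postings:
            continue
        idf = math.log(n / len(postings))  # rare terms weigh more
        for qa_id in postings:
            scores[qa_id] += idf
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

Stop words like of get a near-zero idf and so contribute little, which is the effect the paper gets from idf-weighting the NgramFSA arcs. Extending this sketch to ASR word lattices is what the FST representation buys and is not captured here.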
98 We have presented Qme!, a speech-driven question answering system for use on mobile devices. [sent-243, score-0.316]
99 The novelty of this system is that it provides users with a single unified interface for searching both the visible and the hidden web using the most natural input modality for use on mobile phones: spoken language. [sent-244, score-0.449]
100 Enhancing QA systems with complex temporal question processing capabilities. [sent-325, score-0.363]
wordName wordTfidf (topN-words)
[('dynamic', 0.313), ('questions', 0.298), ('static', 0.285), ('answers', 0.224), ('qme', 0.215), ('query', 0.203), ('mobile', 0.173), ('aggregator', 0.172), ('playing', 0.166), ('qa', 0.133), ('user', 0.128), ('indexicals', 0.123), ('mishra', 0.123), ('temporal', 0.117), ('topic', 0.116), ('question', 0.113), ('devices', 0.112), ('iphone', 0.108), ('broad', 0.107), ('search', 0.106), ('yellowpages', 0.098), ('web', 0.093), ('queries', 0.093), ('pertinent', 0.092), ('bangalore', 0.092), ('retrieves', 0.092), ('movies', 0.084), ('movie', 0.08), ('trusted', 0.079), ('gps', 0.079), ('index', 0.078), ('december', 0.074), ('fandango', 0.074), ('ngramfsa', 0.074), ('searchfst', 0.074), ('sensor', 0.074), ('transit', 0.074), ('wqi', 0.074), ('classifier', 0.071), ('fst', 0.07), ('deep', 0.066), ('theater', 0.065), ('christmas', 0.065), ('survivor', 0.065), ('access', 0.063), ('retrieved', 0.059), ('spatial', 0.056), ('timeml', 0.055), ('recognizer', 0.055), ('interface', 0.054), ('www', 0.053), ('hardware', 0.053), ('answer', 0.052), ('users', 0.05), ('repositories', 0.05), ('retrieve', 0.05), ('hatcher', 0.049), ('qawq', 0.049), ('saigon', 0.049), ('saquete', 0.049), ('summers', 0.049), ('theaters', 0.049), ('twilight', 0.049), ('spoken', 0.049), ('voted', 0.046), ('demonstration', 0.044), ('accessible', 0.043), ('prager', 0.043), ('theatre', 0.043), ('lucene', 0.043), ('madison', 0.043), ('soap', 0.043), ('classifiers', 0.043), ('feng', 0.041), ('parameters', 0.04), ('temporally', 0.04), ('wq', 0.04), ('pustejovsky', 0.04), ('topics', 0.039), ('jersey', 0.037), ('dataset', 0.037), ('fsa', 0.035), ('bagging', 0.035), ('mass', 0.035), ('white', 0.035), ('identified', 0.034), ('archive', 0.033), ('kitchen', 0.033), ('engines', 0.033), ('lattices', 0.032), ('moldovan', 0.032), ('old', 0.031), ('asr', 0.031), ('retrieving', 0.031), ('tv', 0.031), ('ago', 0.031), ('near', 0.031), ('system', 0.03), ('vertical', 0.03), ('week', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices
Author: Taniya Mishra ; Srinivas Bangalore
Abstract: The Deep Web is the collection of information repositories that are not indexed by search engines. These repositories are typically accessible through web forms and contain dynamically changing information. In this paper, we present a system that allows users to access such rich repositories of information on mobile devices using spoken language.
2 0.20384181 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering
Author: Mattia Tomasoni ; Minlie Huang
Abstract: This paper presents a framework for automatically processing information coming from community Question Answering (cQA) portals with the purpose of generating a trustful, complete, relevant and succinct summary in response to a question. We exploit the metadata intrinsically present in User Generated Content (UGC) to bias automatic multi-document summarization techniques toward high quality information. We adopt a representation of concepts alternative to n-grams and propose two concept-scoring functions based on semantic overlap. Experimental re- sults on data drawn from Yahoo! Answers demonstrate the effectiveness of our method in terms of ROUGE scores. We show that the information contained in the best answers voted by users of cQA portals can be successfully complemented by our method.
3 0.18235992 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities
Author: Baoxun Wang ; Xiaolong Wang ; Chengjie Sun ; Bingquan Liu ; Lin Sun
Abstract: Quantifying the semantic relevance between questions and their candidate answers is essential to answer detection in social media corpora. In this paper, a deep belief network is proposed to model the semantic relevance for question-answer pairs. Observing the textual similarity between the community-driven questionanswering (cQA) dataset and the forum dataset, we present a novel learning strategy to promote the performance of our method on the social community datasets without hand-annotating work. The experimental results show that our method outperforms the traditional approaches on both the cQA and the forum corpora.
4 0.15879379 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries
Author: Xiao Li
Abstract: Determining the semantic intent of web queries not only involves identifying their semantic class, which is a primary focus of previous works, but also understanding their semantic structure. In this work, we formally define the semantic structure of noun phrase queries as comprised of intent heads and intent modifiers. We present methods that automatically identify these constituents as well as their semantic roles based on Markov and semi-Markov conditional random fields. We show that the use of semantic features and syntactic features significantly contribute to improving the understanding performance.
5 0.13824441 189 acl-2010-Optimizing Question Answering Accuracy by Maximizing Log-Likelihood
Author: Matthias H. Heie ; Edward W. D. Whittaker ; Sadaoki Furui
Abstract: In this paper we demonstrate that there is a strong correlation between the Question Answering (QA) accuracy and the log-likelihood of the answer typing component of our statistical QA model. We exploit this observation in a clustering algorithm which optimizes QA accuracy by maximizing the log-likelihood of a set of question-and-answer pairs. Experimental results show that we achieve better QA accuracy using the resulting clusters than by using manually derived clusters.
6 0.13223071 164 acl-2010-Learning Phrase-Based Spelling Error Models from Clickthrough Data
7 0.12685408 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
8 0.096505895 2 acl-2010-"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives
9 0.090943009 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach
10 0.087381624 225 acl-2010-Temporal Information Processing of a New Language: Fast Porting with Minimal Resources
11 0.083039403 63 acl-2010-Comparable Entity Mining from Comparative Questions
12 0.080907941 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
13 0.078506567 167 acl-2010-Learning to Adapt to Unknown Users: Referring Expression Generation in Spoken Dialogue Systems
14 0.075926416 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
15 0.073738858 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System
16 0.073566929 79 acl-2010-Cross-Lingual Latent Topic Extraction
17 0.071447022 227 acl-2010-The Impact of Interpretation Problems on Tutorial Dialogue
18 0.06789238 204 acl-2010-Recommendation in Internet Forums and Blogs
19 0.066244215 187 acl-2010-Optimising Information Presentation for Spoken Dialogue Systems
20 0.06569583 177 acl-2010-Multilingual Pseudo-Relevance Feedback: Performance Study of Assisting Languages
topicId topicWeight
[(0, -0.172), (1, 0.104), (2, -0.11), (3, -0.054), (4, 0.002), (5, -0.13), (6, -0.051), (7, 0.021), (8, -0.023), (9, -0.026), (10, -0.053), (11, -0.011), (12, -0.12), (13, -0.149), (14, 0.072), (15, 0.221), (16, -0.192), (17, -0.1), (18, 0.034), (19, 0.004), (20, 0.066), (21, -0.219), (22, 0.159), (23, -0.053), (24, 0.098), (25, 0.062), (26, 0.077), (27, -0.005), (28, -0.114), (29, 0.036), (30, -0.01), (31, -0.026), (32, -0.098), (33, 0.061), (34, 0.04), (35, -0.034), (36, -0.043), (37, 0.101), (38, 0.106), (39, 0.047), (40, 0.006), (41, -0.053), (42, -0.026), (43, 0.025), (44, -0.039), (45, 0.032), (46, 0.011), (47, 0.022), (48, -0.081), (49, -0.001)]
simIndex simValue paperId paperTitle
same-paper 1 0.9791764 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices
Author: Taniya Mishra ; Srinivas Bangalore
Abstract: The Deep Web is the collection of information repositories that are not indexed by search engines. These repositories are typically accessible through web forms and contain dynamically changing information. In this paper, we present a system that allows users to access such rich repositories of information on mobile devices using spoken language.
2 0.75344151 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities
Author: Baoxun Wang ; Xiaolong Wang ; Chengjie Sun ; Bingquan Liu ; Lin Sun
Abstract: Quantifying the semantic relevance between questions and their candidate answers is essential to answer detection in social media corpora. In this paper, a deep belief network is proposed to model the semantic relevance for question-answer pairs. Observing the textual similarity between the community-driven questionanswering (cQA) dataset and the forum dataset, we present a novel learning strategy to promote the performance of our method on the social community datasets without hand-annotating work. The experimental results show that our method outperforms the traditional approaches on both the cQA and the forum corpora.
3 0.73259336 189 acl-2010-Optimizing Question Answering Accuracy by Maximizing Log-Likelihood
Author: Matthias H. Heie ; Edward W. D. Whittaker ; Sadaoki Furui
Abstract: In this paper we demonstrate that there is a strong correlation between the Question Answering (QA) accuracy and the log-likelihood of the answer typing component of our statistical QA model. We exploit this observation in a clustering algorithm which optimizes QA accuracy by maximizing the log-likelihood of a set of question-and-answer pairs. Experimental results show that we achieve better QA accuracy using the resulting clusters than by using manually derived clusters.
4 0.69788897 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering
Author: Mattia Tomasoni ; Minlie Huang
Abstract: This paper presents a framework for automatically processing information coming from community Question Answering (cQA) portals with the purpose of generating a trustful, complete, relevant and succinct summary in response to a question. We exploit the metadata intrinsically present in User Generated Content (UGC) to bias automatic multi-document summarization techniques toward high quality information. We adopt a representation of concepts alternative to n-grams and propose two concept-scoring functions based on semantic overlap. Experimental re- sults on data drawn from Yahoo! Answers demonstrate the effectiveness of our method in terms of ROUGE scores. We show that the information contained in the best answers voted by users of cQA portals can be successfully complemented by our method.
5 0.5800069 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
Author: Dmitry Davidov ; Ari Rappoport
Abstract: We present a novel framework for automated extraction and approximation of numerical object attributes such as height and weight from the Web. Given an object-attribute pair, we discover and analyze attribute information for a set of comparable objects in order to infer the desired value. This allows us to approximate the desired numerical values even when no exact values can be found in the text. Our framework makes use of relation defining patterns and WordNet similarity information. First, we obtain from the Web and WordNet a list of terms similar to the given object. Then we retrieve attribute values for each term in this list, and information that allows us to compare different objects in the list and to infer the attribute value range. Finally, we combine the retrieved data for all terms from the list to select or approximate the requested value. We evaluate our method using automated question answering, WordNet enrichment, and comparison with answers given in Wikipedia and by leading search engines. In all of these, our framework provides a significant improvement.
6 0.54791743 2 acl-2010-"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives
7 0.52030694 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries
8 0.51234078 63 acl-2010-Comparable Entity Mining from Comparative Questions
9 0.49820146 164 acl-2010-Learning Phrase-Based Spelling Error Models from Clickthrough Data
10 0.46345153 204 acl-2010-Recommendation in Internet Forums and Blogs
11 0.4547475 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study
12 0.41896716 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach
13 0.40715951 177 acl-2010-Multilingual Pseudo-Relevance Feedback: Performance Study of Assisting Languages
14 0.40274191 224 acl-2010-Talking NPCs in a Virtual Game World
15 0.38759762 106 acl-2010-Event-Based Hyperspace Analogue to Language for Query Expansion
16 0.35668349 248 acl-2010-Unsupervised Ontology Induction from Text
17 0.35304332 259 acl-2010-WebLicht: Web-Based LRT Services for German
18 0.3421047 82 acl-2010-Demonstration of a Prototype for a Conversational Companion for Reminiscing about Images
19 0.34151396 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.
20 0.33740279 225 acl-2010-Temporal Information Processing of a New Language: Fast Porting with Minimal Resources
topicId topicWeight
[(14, 0.024), (25, 0.051), (42, 0.015), (44, 0.015), (58, 0.015), (59, 0.071), (71, 0.013), (72, 0.043), (73, 0.057), (76, 0.013), (78, 0.024), (83, 0.105), (84, 0.029), (88, 0.285), (97, 0.019), (98, 0.129)]
simIndex simValue paperId paperTitle
same-paper 1 0.78072882 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices
Author: Taniya Mishra ; Srinivas Bangalore
Abstract: The Deep Web is the collection of information repositories that are not indexed by search engines. These repositories are typically accessible through web forms and contain dynamically changing information. In this paper, we present a system that allows users to access such rich repositories of information on mobile devices using spoken language.
2 0.77876133 112 acl-2010-Extracting Social Networks from Literary Fiction
Author: David Elson ; Nicholas Dames ; Kathleen McKeown
Abstract: We present a method for extracting social networks from literature, namely, nineteenth-century British novels and serials. We derive the networks from dialogue interactions, and thus our method depends on the ability to determine when two characters are in conversation. Our approach involves character name chunking, quoted speech attribution and conversation detection given the set of quotes. We extract features from the social networks and examine their correlation with one another, as well as with metadata such as the novel’s setting. Our results provide evidence that the majority of novels in this time period do not fit two characterizations provided by literacy scholars. Instead, our results suggest an alternative explanation for differences in social networks.
3 0.70504218 233 acl-2010-The Same-Head Heuristic for Coreference
Author: Micha Elsner ; Eugene Charniak
Abstract: We investigate coreference relationships between NPs with the same head noun. It is relatively common in unsupervised work to assume that such pairs are coreferent– but this is not always true, especially if realistic mention detection is used. We describe the distribution of noncoreferent same-head pairs in news text, and present an unsupervised generative model which learns not to link some samehead NPs using syntactic features, improving precision.
4 0.56640512 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years
Author: Vincent Ng
Abstract: The research focus of computational coreference resolution has exhibited a shift from heuristic approaches to machine learning approaches in the past decade. This paper surveys the major milestones in supervised coreference research since its inception fifteen years ago.
5 0.56533325 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree
Author: Wei Wei ; Jon Atle Gulla
Abstract: Existing works on sentiment analysis on product reviews suffer from the following limitations: (1) The knowledge of hierarchical relationships of products attributes is not fully utilized. (2) Reviews or sentences mentioning several attributes associated with complicated sentiments are not dealt with very well. In this paper, we propose a novel HL-SOT approach to labeling a product’s attributes and their associated sentiments in product reviews by a Hierarchical Learning (HL) process with a defined Sentiment Ontology Tree (SOT). The empirical analysis against a humanlabeled data set demonstrates promising and reasonable performance of the proposed HL-SOT approach. While this paper is mainly on sentiment analysis on reviews of one product, our proposed HLSOT approach is easily generalized to labeling a mix of reviews of more than one products.
6 0.56344265 73 acl-2010-Coreference Resolution with Reconcile
7 0.56038988 174 acl-2010-Modeling Semantic Relevance for Question-Answer Pairs in Web Social Communities
8 0.56031722 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
9 0.55904371 127 acl-2010-Global Learning of Focused Entailment Graphs
10 0.55798501 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web
11 0.55672455 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
12 0.5549897 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries
13 0.55451739 39 acl-2010-Automatic Generation of Story Highlights
14 0.5539065 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
15 0.5530746 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
16 0.5527122 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
17 0.55270767 71 acl-2010-Convolution Kernel over Packed Parse Forest
18 0.55257165 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
19 0.55245614 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.
20 0.55219138 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering