emnlp emnlp2012 emnlp2012-139 knowledge-graph by maker-knowledge-mining

139 emnlp-2012-Word Salad: Relating Food Prices and Descriptions

Source: pdf

Author: Victor Chahuneau ; Kevin Gimpel ; Bryan R. Routledge ; Lily Scherlis ; Noah A. Smith

Abstract: We investigate the use of language in food writing, specifically on restaurant menus and in customer reviews. Our approach is to build predictive models of concrete external variables, such as restaurant menu prices. We make use of a dataset of menus and customer reviews for thousands of restaurants in several U.S. cities. By focusing on prediction tasks and doing our analysis at scale, our methodology allows quantitative, objective measurements of the words and phrases used to describe food in restaurants. We also explore interactions in language use between menu prices and sentiment as expressed in user reviews.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We investigate the use of language in food writing, specifically on restaurant menus and in customer reviews. [sent-5, score-0.576]

2 Our approach is to build predictive models of concrete external variables, such as restaurant menu prices. [sent-6, score-0.768]

3 We make use of a dataset of menus and customer reviews for thousands of restaurants in several U. [sent-7, score-0.52]

4 We also explore interactions in language use between menu prices and sentiment as expressed in user reviews. [sent-11, score-0.88]

5 1 Introduction What words might a menu writer use to justify the high price of a steak? [sent-12, score-1.061]

6 In this paper, we explore questions like these that relate restaurant menus, prices, and customer sentiment. [sent-16, score-0.258]

7 Our goal is to understand how language is used in the food domain, and we direct our investigation using external variables such as restaurant menu prices. [sent-17, score-0.931]

8 We echo this pattern here as we turn our attention to language use on restaurant menus and in user restaurant reviews. [sent-28, score-0.597]

9 We use data from a large corpus of restaurant menus and reviews crawled from the web and formulate several prediction tasks. [sent-29, score-0.641]

10 In addition to predicting menu prices, we also consider predicting sentiment along with price. [sent-30, score-0.794]

11 Much of this research has focused on customer-written reviews of goods and services, and perspectives have been gained on how sentiment is expressed in this type of informal text. [sent-32, score-0.413]

12 In addition to sentiment, however, other variables are reflected in a reviewer’s choice of words, such as the price of the item under consideration. [sent-33, score-0.681]

13 In this paper, we take a step toward joint modeling of multiple variables in review text, exploring connections between price and sentiment in restaurant reviews. [sent-34, score-0.894]

14 Freedman and Jurafsky (2011) studied the use of language in food advertising, specifically the words on potato chip bags. [sent-41, score-0.233]

15 They argued that, due to the ubiquity of food writing across cultures, ethnic groups, and social classes, studying the use of language for describing food can provide perspective on how different socioeconomic groups self-identify using language and how they are linguistically targeted. [sent-42, score-0.326]

16 In particular, they showed that price affects how “authenticity” is realized in marketing language, a point we return to in §5. [sent-43, score-0.496]

17 This is an example of how price can affect how an underlying variable is expressed in language. [sent-44, score-0.496]

18 Among other explorations in this paper, we consider how price interacts with expression of sentiment in user reviews of restaurants. [sent-45, score-0.896]

19 (2011) had a similar goal but decomposed user reviews into parts describing particular aspects of the product being reviewed (Hu and Liu, 2004). [sent-58, score-0.306]

20 Our paper differs from price modeling based on product reviews in several ways. [sent-59, score-0.766]

21 We consider a large set of weakly related products instead of a homogeneous selection of a few products, and the reviews in our dataset are not product-centered but rather describe the overall experience of visiting a restaurant. [sent-60, score-0.268]

22 Consequently, menu items are not always mentioned in reviews and rarely appear with their exact names. [sent-61, score-0.86]

23 This makes it difficult to directly use review features in a pricing model for individual menu items. [sent-62, score-0.723]

24 , 1994), often including recommendations for writing menu item descriptions (Miller and Pavesic, 1996; McVety et al. [sent-64, score-0.775]

25 Their guidelines frequently include example menus from successful restaurants, but typically do not use large corpora of menus or automated analysis, as we do here. [sent-66, score-0.31]

26 (2001; 2005) showed that the way that menu items are described affects customers’ perceptions and purchasing behavior. [sent-69, score-0.645]

27 When menu items are described evocatively, customers choose them more often and report higher satisfaction with quality and value, as compared to when they are given the same items described with conventional names. [sent-70, score-0.696]

28 While our goals are related, our experimental approach is different, as we use automated analysis of thousands of restaurant menus and rely on a set of one million reviews as a surrogate for observing customer behavior. [sent-73, score-0.656]

29 For example, there are over 50,000 menu items in New York that include Table 1: Dataset statistics. [sent-76, score-0.617]

30 Each menu includes a list of item names with optional text descriptions and prices. [sent-115, score-0.799]

31 com) with metadata and user reviews for the restaurant, which we also collected. [sent-118, score-0.387]

32 Also, the category of food and a price range ($ to $$$$, indicating the price of a typical meal at the restaurant) are indicated. [sent-122, score-1.225]

33 The distribution of prices of individual menu items is highly skewed, with a mean of $9. [sent-124, score-0.803]

34 On average, a restaurant has 73 items on its menu with a median price of $8. [sent-127, score-1.394]

35 69 and 119 Yelp reviews with a median rating of 3. [sent-128, score-0.374]

36 55 Figure 1: Frequency distributions of restaurant price ranges (left) and review ratings (right). [sent-129, score-0.773]

37 The star rating and price range distributions are shown in Figure 1. [sent-131, score-0.591]

38 These include predicting individual menu item prices (§5), predicting the price range for each restaurant (§6), and finally jointly predicting median price and sentiment for each restaurant (§7). [sent-135, score-2.709]

39 5 Menu Item Price Prediction We first consider the problem of predicting the price of each item on a menu. [sent-151, score-0.735]

40 In this case, every instance xi corresponds to a single item in the menu parametrized by the features detailed below and yi is the item’s price. [sent-152, score-0.773]

41 The first two predict the mean or the median of the prices in the training set for a given item name, and use the overall price mean or median when a name is missing in the training set. [sent-155, score-1.051]
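The two name-based baselines described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `fit_name_baseline` and the toy training pairs are hypothetical, and item names are assumed to be already normalized strings.

```python
from collections import defaultdict
from statistics import mean, median


def fit_name_baseline(train, aggregate=median):
    """Build a predictor from (item_name, price) pairs.

    The predictor returns the aggregate (mean or median) of the training
    prices seen for a name, and falls back to the aggregate over all
    training prices when the name was never seen.
    """
    prices_by_name = defaultdict(list)
    for name, price in train:
        prices_by_name[name].append(price)

    all_prices = [p for prices in prices_by_name.values() for p in prices]
    overall = aggregate(all_prices)
    table = {name: aggregate(prices) for name, prices in prices_by_name.items()}

    def predict(name):
        return table.get(name, overall)

    return predict


# Toy data (made up for illustration only).
train = [("black coffee", 2.0), ("black coffee", 3.0), ("steak", 25.0)]
predict_mean = fit_name_baseline(train, aggregate=mean)
predict_median = fit_name_baseline(train, aggregate=median)
```

The fallback matters because most distinct item names are rare; with a skewed price distribution the median fallback is less sensitive to a few very expensive items than the mean fallback.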

42 The third baseline is an ℓ1-regularized linear regression model trained with a single binary feature for each item name in the training data. [sent-156, score-0.243]

43 We note that there is a wide variation of menu item names in the dataset, with more than 400,000 distinct names. [sent-158, score-0.774]

44 Although we address this issue later by introducing local text features, we also performed simple normalization of the item names for all of the baselines described above. [sent-159, score-0.209]

45 To do this normalization, we first compiled a stop word list based on the most frequent words in the item names. [sent-160, score-0.207]

46 [Footnote 1] The price distribution is more symmetric in the log domain. [sent-161, score-0.496]

47 We removed stop words and then ordered the words in each item name lexicographically, in order to collapse together items such as coffee black and black coffee. [sent-163, score-0.259]
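The item-name normalization described here (drop stop words, then sort the remaining words) can be sketched as below. The stop list shown is a hypothetical placeholder; the summary only says the real one was compiled from the most frequent words in the item names. Lowercasing and whitespace tokenization are also assumptions.

```python
def normalize_item_name(name, stop_words):
    """Lowercase, drop stop words, and sort the remaining words."""
    tokens = [t for t in name.lower().split() if t not in stop_words]
    return " ".join(sorted(tokens))


# Hypothetical stop list, for illustration only.
STOP_WORDS = {"with", "and", "the", "of"}
```

For example, `normalize_item_name("Coffee Black", STOP_WORDS)` and `normalize_item_name("black coffee", STOP_WORDS)` both map to the same key, so the two spellings share one feature.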

48 We now introduce several sets of features that we add to the normalized item names: I. [sent-167, score-0.208]

49 METADATA: Binary features for each restaurant metadata field mentioned above, excluding price range. [sent-168, score-0.83]

50 MENUDESC: n-grams in menu item descriptions, as in MENUNAMES. [sent-177, score-0.75]

51 Review Features In addition to these features, we consider leveraging the large amount of text present in user reviews to improve predictions. [sent-178, score-0.279]

52 We attempted to find mentions of menu items in the reviews and to include features extracted from the surrounding words in the model. [sent-179, score-0.913]

53 Perfect item mentions being relatively rare, we consider inexact matches weighted by a coefficient measuring the degree of resemblance: we used the Dice similarity between the set of words in the sentence and in the item name. [sent-180, score-0.4]
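As a concrete reading of the matching coefficient, the Dice similarity between two word sets A and B is 2|A ∩ B| / (|A| + |B|). A minimal sketch, assuming lowercased whitespace tokenization (the tokenization actually used is not given in this summary):

```python
def dice_similarity(sentence, item_name):
    """Dice coefficient between the word sets of a sentence and an item name."""
    a = set(sentence.lower().split())
    b = set(item_name.lower().split())
    if not a and not b:
        return 0.0  # convention chosen here for two empty inputs
    return 2 * len(a & b) / (len(a) + len(b))
```

A sentence that contains every word of a short item name still scores below 1 when it also contains many other words, so longer review sentences are down-weighted as matches.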

54 We then extracted n-gram features from this sentence, and tried several ways to use them for price prediction. [sent-181, score-0.519]

55 Given a review sentence, one option is to add the corresponding features to every item matching this sentence, with a value equal to the similarity coefficient. [sent-182, score-0.282]

56 Another option is to select the best matching item and use the same real-valued features but only for this single item. [sent-183, score-0.208]

57 All of [Footnote 3] The normalized item names are present as binary features in all of our regression models. Table 2: Results for menu item price prediction. [sent-186, score-1.536]

58 We also tried to incorporate the reviews by using them in aggregate via predictions from a separate model; we found this approach to work better than the methods described above which all use features from the reviews directly in the regression model. [sent-189, score-0.567]

59 In particular, we use the review features in a separate model that we will describe below (§6) to predict the price range of each restaurant. [sent-190, score-0.635]

60 We use the estimated price range (which we denote PcR) as a single additional real-valued feature for individual item price prediction. [sent-192, score-1.219]

61 Using menu name features (MENUNAMES) brings the bulk of the improvement, though menu description features (MENUDESC) and the remaining features also lead to small gains. [sent-196, score-1.199]

62 While METADATA features improve over the baselines when used alone, they do not lead to improved performance over the MENU* + PcR features, suggesting that the text features may be able to substitute for the information in the metadata, at least for prediction of individual item prices. [sent-198, score-0.271]

63 We suspect that these features could be more effective with a better method of linking menu items to mentions in review text. [sent-202, score-0.744]

64 By comparing the weights of related features, we can see the relative differences in terms of contribution to menu item prices. [sent-205, score-0.779]

65 This is related to observations made by Freedman and Jurafsky (2011) that cheaper food is marketed by appealing to tradition and historicity, with more expensive food described in terms of naturalness, quality of ingredients, and the preparation process (e. [sent-209, score-0.381]

66 Pane (d) shows feature weights for trigrams containing units of chicken; we can see an ordering in terms of size (bits < cubes < strips < cuts) as well as the price increase associated with the use of the word morsels in place of less refined units. [sent-214, score-0.553]

67 We also see a difference between pieces and pcs, with the latter being frequently used to refer to entire cuts of price prediction. [sent-215, score-0.518]

68 Panes (f), (g), and (h) reveal price differences due to slight variations in word form. [sent-221, score-0.496]

69 We also find that food items prefixed with roast lead to more expensive prices than the similar roasted, except in the case of pork, though here the different forms may be evoking two different preparation styles. [sent-223, score-0.402]

70 6 Restaurant Price Range Prediction In addition to predicting the prices of individual menu items, we also considered the task of predicting the price range listed for each restaurant on its Yelp page. [sent-226, score-1.572]

71 The values for this field are integers from 1 to 4 and indicate the price of a typical meal from the restaurant. [sent-227, score-0.524]

72 For this task, we again train an ℓ1-regularized linear regression model with integral price ranges as the true output values yi. [sent-228, score-0.554]

73 In addition to the feature sets used for individual menu item price prediction, we used features on reviews (REVIEWS). [sent-236, score-1.512]

74 Specifically, we used binary features for unigrams, bigrams, and trigrams in the full set of reviews for each restaurant. [sent-237, score-0.294]
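The REVIEWS feature set can be pictured as the set of all unigrams, bigrams, and trigrams that occur anywhere in a restaurant's reviews, each acting as a binary (present/absent) feature. The sketch below assumes lowercased whitespace tokenization and hypothetical function names; n-grams are extracted within each review and do not span review boundaries.

```python
def ngrams(tokens, max_n=3):
    """Return the set of all 1..max_n-grams in a token list."""
    return {
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    }


def restaurant_review_features(reviews, max_n=3):
    """Union of n-grams over all reviews of one restaurant (binary features)."""
    features = set()
    for review in reviews:
        features |= ngrams(review.lower().split(), max_n)
    return features
```

Representing the active features as a set makes the binary nature explicit: an n-gram that appears in fifty reviews contributes exactly as much as one that appears once.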

75 1 Results Our results for price range prediction are shown in Table 4. [sent-252, score-0.578]

76 Predicting the most frequent price range gave us an accuracy of 48. [sent-253, score-0.538]

77 Performance improvements were obtained by separately adding menu (MENU*), metadata (METADATA), and review features (REVIEWS). [sent-255, score-0.77]

78 Unlike individual item price prediction, the reviews were more helpful than the menu features for predicting overall price range. [sent-256, score-2.062]

79 This is not surprising, since reviewers will often generally discuss price in their reviews. [sent-257, score-0.496]

80 To do this, we trained a logistic regression model predicting polarity for each review; we used the REVIEWS feature set, but this time considering each review as a single training instance. [sent-260, score-0.223]

81 The polarity of a review was determined by whether or not its star rating was greater than the average rating across all reviews in the dataset (3. [sent-261, score-0.531]
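The labeling rule for the polarity classifier can be sketched as below: a review is positive when its star rating exceeds the mean rating over all reviews. The function name and the toy ratings are hypothetical, and the threshold is computed from the data rather than hard-coded, since the exact average is cut off in this summary.

```python
from statistics import mean


def polarity_labels(star_ratings):
    """Label each review 1 (positive) if its rating is above the dataset mean, else 0."""
    threshold = mean(star_ratings)
    return [1 if rating > threshold else 0 for rating in star_ratings]
```

With ratings `[1, 3, 5, 5]` the mean is 3.5, so only the two five-star reviews are labeled positive.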

82 We omit full details of these models because the polarity prediction task for user reviews is well-known in the sentiment analysis community and our model is not an innovation over prior work (Pang and Lee, 2008). [sent-264, score-0.477]

83 2 Interpreting Reviews Given learned models for predicting a restaurant’s price range from its set of reviews as well as polarity for each review, we can turn the process around and use the feature weights to analyze the review text. [sent-267, score-0.975]

84 Restricting our attention to reviews of 50–60 words, Table 5 shows sample reviews from our test set that lead to various predictions of price range and sentiment. [sent-268, score-1.024]

85 This technique can also be useful when trying to determine the “true” star rating for a review (if provided star ratings are noisy), or to show the most positive and most negative reviews for a product within a particular star rating. [sent-269, score-0.61]

86 We can also do a more fine-grained analysis of review text by noting the contribution to the price range prediction of each position in the text stream. [sent-271, score-0.652]
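One way to picture a position-level analysis with a linear n-gram model is to spread each matching n-gram's learned weight evenly over the token positions it covers. This is an assumption made for illustration; the summary does not say how the paper attributes weights to positions, and the weights below are invented.

```python
def position_contributions(tokens, weights, max_n=3):
    """Distribute each matching n-gram weight evenly over the positions it spans."""
    scores = [0.0] * len(tokens)
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            w = weights.get(" ".join(tokens[i:i + n]), 0.0)
            for j in range(i, i + n):
                scores[j] += w / n
    return scores


# Invented weights, for illustration only.
example_weights = {"great": 0.5, "dark atmosphere": 1.0}
example_scores = position_contributions(
    ["great", "dark", "atmosphere"], example_weights
)
```

Under this scheme the per-position scores sum to the total weight of all matched n-gram occurrences, so a plot of the scores shows where in the sentence the prediction comes from.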

87 In Figure 2, we show the influence of each word in a review sentence on the predicted polarity (brown) and price range (yellow). [sent-273, score-0.649]

88 The second sentence is a difficult example for sentiment analysis, since there are several positive words and phrases early but the sentiment is chiefly expressed in the final clause. [sent-276, score-0.242]

89 In both examples, the yellow bars show that price estimates are reflected mainly through isolated mentions of offerings and amenities (drinks, atmosphere, security, good service). [sent-278, score-0.552]

90 great dark , sexy atmosphere . [sent-288, score-0.297]

91 sushi tasted good , food was fresh but nothing left me yearning to return. Figure 2: Local (position-level) sentiment (brown) and price (yellow) estimates for two sentences in the test corpus. [sent-291, score-0.844]

92 ... sentiment and price are independent of each other. We try to capture this interaction by modeling review polarity and item price at the same time: we consider the task of jointly predicting aggregate sentiment and price for a restaurant. [sent-293, score-1.463]

93 For every restaurant in our dataset, we compute its median item price $\bar{p}$ and its median star rating $\bar{r}$. [sent-294, score-1.164]

94 We notice that the spread of the sentiment values is larger, which suggests that reviews give stronger clues about consumer experience than about the cost of a typical meal. [sent-314, score-0.408]

95 expensive) appear in this limited selection, as well as certain phrases indicating both sentiment and price (overpriced vs. [sent-316, score-0.617]

96 Other examples of note: gem is used in strongly-positive reviews of cheap restaurants; for expensive restaurants, reviewers use highly recommended or amazing. [sent-318, score-0.272]

97 Also, phrases like no flavor and manager appear in negative reviews of more expensive restaurants, while dirty appears more often in negative reviews of cheaper restaurants. [sent-319, score-0.541]

98 8 Conclusion We have explored linguistic relationships between food prices and customer sentiment through quantitative analysis of a large corpus of menus and reviews. [sent-320, score-0.676]

99 Movie reviews and revenues: An experiment in text regression. [sent-406, score-0.243]

100 How descriptive food names bias sensory perceptions in restaurants. [sent-501, score-0.215]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('menu', 0.565), ('price', 0.496), ('reviews', 0.243), ('restaurant', 0.203), ('item', 0.185), ('food', 0.163), ('prices', 0.158), ('menus', 0.155), ('sentiment', 0.121), ('metadata', 0.108), ('median', 0.078), ('review', 0.074), ('mae', 0.073), ('star', 0.071), ('ghose', 0.071), ('hospitality', 0.071), ('restaurants', 0.067), ('pricing', 0.061), ('mre', 0.061), ('regression', 0.058), ('menudesc', 0.056), ('pane', 0.056), ('pcr', 0.056), ('wansink', 0.056), ('yelp', 0.056), ('customer', 0.055), ('predicting', 0.054), ('rating', 0.053), ('items', 0.052), ('goods', 0.049), ('consumer', 0.044), ('reviewer', 0.044), ('atmosphere', 0.042), ('freedman', 0.042), ('kogan', 0.042), ('nnz', 0.042), ('potato', 0.042), ('sales', 0.042), ('zwicky', 0.042), ('range', 0.042), ('prediction', 0.04), ('polarity', 0.037), ('quarterly', 0.036), ('economics', 0.036), ('fresh', 0.036), ('routledge', 0.036), ('user', 0.036), ('gimpel', 0.033), ('eisenstein', 0.03), ('mentions', 0.03), ('expensive', 0.029), ('weights', 0.029), ('stars', 0.029), ('administrative', 0.028), ('allmenus', 0.028), ('ambience', 0.028), ('archak', 0.028), ('authenticity', 0.028), ('baye', 0.028), ('chicken', 0.028), ('chip', 0.028), ('crispy', 0.028), ('dice', 0.028), ('drinks', 0.028), ('ipeirotis', 0.028), ('kasavana', 0.028), ('mashed', 0.028), ('mcvety', 0.028), ('meal', 0.028), ('menunames', 0.028), ('nonstandard', 0.028), ('perceptions', 0.028), ('relatedly', 0.028), ('revenues', 0.028), ('sexy', 0.028), ('sushi', 0.028), ('mean', 0.028), ('trigrams', 0.028), ('customers', 0.027), ('nf', 0.027), ('product', 0.027), ('cheaper', 0.026), ('yellow', 0.026), ('descriptions', 0.025), ('products', 0.025), ('tasty', 0.024), ('lerman', 0.024), ('crisp', 0.024), ('market', 0.024), ('spellings', 0.024), ('names', 0.024), ('quantitative', 0.024), ('joshi', 0.024), ('opinion', 0.023), ('features', 0.023), ('carnegie', 0.023), ('cmu', 0.023), ('pittsburgh', 0.023), ('stop', 0.022), ('cuts', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 139 emnlp-2012-Word Salad: Relating Food Prices and Descriptions

Author: Victor Chahuneau ; Kevin Gimpel ; Bryan R. Routledge ; Lily Scherlis ; Noah A. Smith

2 0.13665624 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews

Author: Jianxing Yu ; Zheng-Jun Zha ; Tat-Seng Chua

Abstract: This paper proposes to generate appropriate answers for opinion questions about products by exploiting the hierarchical organization of consumer reviews. The hierarchy organizes product aspects as nodes following their parent-child relations. For each aspect, the reviews and corresponding opinions on this aspect are stored. We develop a new framework for opinion Questions Answering, which enables accurate question analysis and effective answer generation by making use of the hierarchy. In particular, we first identify the (explicit/implicit) product aspects asked in the questions and their sub-aspects by referring to the hierarchy. We then retrieve the corresponding review fragments relevant to the aspects from the hierarchy. In order to generate appropriate answers from the review fragments, we develop a multi-criteria optimization approach for answer generation by simultaneously taking into account review salience, coherence, diversity, and parent-child relations among the aspects. We conduct evaluations on 11 popular products in four domains. The evaluated corpus contains 70,359 consumer reviews and 220 questions on these products. Experimental results demonstrate the effectiveness of our approach.

3 0.1008168 102 emnlp-2012-Optimising Incremental Dialogue Decisions Using Information Density for Interactive Systems

Author: Nina Dethlefs ; Helen Hastie ; Verena Rieser ; Oliver Lemon

Abstract: Incremental processing allows system designers to address several discourse phenomena that have previously been somewhat neglected in interactive systems, such as backchannels or barge-ins, but that can enhance the responsiveness and naturalness of systems. Unfortunately, prior work has focused largely on deterministic incremental decision making, rendering system behaviour less flexible and adaptive than is desirable. We present a novel approach to incremental decision making that is based on Hierarchical Reinforcement Learning to achieve an interactive optimisation of Information Presentation (IP) strategies, allowing the system to generate and comprehend backchannels and barge-ins, by employing the recent psycholinguistic hypothesis of information density (ID) (Jaeger, 2010). Results in terms of average rewards and a human rating study show that our learnt strategy outperforms several baselines that are not sensitive to ID by more than 23%.

4 0.093409546 34 emnlp-2012-Do Neighbours Help? An Exploration of Graph-based Algorithms for Cross-domain Sentiment Classification

Author: Natalia Ponomareva ; Mike Thelwall

Abstract: This paper presents a comparative study of graph-based approaches for cross-domain sentiment classification. In particular, the paper analyses two existing methods: an optimisation problem and a ranking algorithm. We compare these graph-based methods with each other and with the other state-of-the-art approaches and conclude that graph domain representations offer a competitive solution to the domain adaptation problem. Analysis of the best parameters for graph-based algorithms reveals that there are no optimal values valid for all domain pairs and that these values are dependent on the characteristics of corresponding domains.

5 0.076409116 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts

Author: Lizhen Qu ; Rainer Gemulla ; Gerhard Weikum

Abstract: We propose the weakly supervised MultiExperts Model (MEM) for analyzing the semantic orientation of opinions expressed in natural language reviews. In contrast to most prior work, MEM predicts both opinion polarity and opinion strength at the level of individual sentences; such fine-grained analysis helps to understand better why users like or dislike the entity under review. A key challenge in this setting is that it is hard to obtain sentence-level training data for both polarity and strength. For this reason, MEM is weakly supervised: It starts with potentially noisy indicators obtained from coarse-grained training data (i.e., document-level ratings), a small set of diverse base predictors, and, if available, small amounts of fine-grained training data. We integrate these noisy indicators into a unified probabilistic framework using ideas from ensemble learning and graph-based semi-supervised learning. Our experiments indicate that MEM outperforms state-of-the-art methods by a significant margin.

6 0.073601231 104 emnlp-2012-Parse, Price and Cut-Delayed Column and Row Generation for Graph Based Parsers

7 0.06650617 101 emnlp-2012-Opinion Target Extraction Using Word-Based Translation Model

8 0.063793972 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces

9 0.062383484 28 emnlp-2012-Collocation Polarity Disambiguation Using Web-based Pseudo Contexts

10 0.058122389 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes

11 0.049474113 15 emnlp-2012-Active Learning for Imbalanced Sentiment Classification

12 0.039230324 32 emnlp-2012-Detecting Subgroups in Online Discussions by Modeling Positive and Negative Relations among Participants

13 0.034464471 92 emnlp-2012-Multi-Domain Learning: When Do Domains Matter?

14 0.033452027 8 emnlp-2012-A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes

15 0.030078365 17 emnlp-2012-An "AI readability" Formula for French as a Foreign Language

16 0.030001739 120 emnlp-2012-Streaming Analysis of Discourse Participants

17 0.029375682 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning

18 0.028961405 121 emnlp-2012-Supervised Text-based Geolocation Using Language Models on an Adaptive Grid

19 0.027249575 51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields

20 0.027116459 117 emnlp-2012-Sketch Algorithms for Estimating Point Queries in NLP

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.118), (1, 0.069), (2, 0.03), (3, 0.177), (4, 0.074), (5, -0.068), (6, -0.071), (7, -0.025), (8, 0.065), (9, 0.05), (10, 0.07), (11, 0.019), (12, -0.06), (13, -0.086), (14, 0.021), (15, -0.0), (16, 0.076), (17, 0.034), (18, 0.008), (19, -0.008), (20, -0.048), (21, -0.055), (22, -0.029), (23, -0.048), (24, -0.069), (25, -0.018), (26, 0.061), (27, 0.011), (28, 0.02), (29, -0.055), (30, -0.161), (31, -0.229), (32, -0.186), (33, 0.013), (34, 0.018), (35, 0.204), (36, -0.091), (37, -0.133), (38, -0.081), (39, -0.158), (40, 0.22), (41, -0.043), (42, -0.079), (43, -0.025), (44, -0.009), (45, 0.065), (46, 0.011), (47, 0.29), (48, 0.056), (49, 0.192)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96384287 139 emnlp-2012-Word Salad: Relating Food Prices and Descriptions

Author: Victor Chahuneau ; Kevin Gimpel ; Bryan R. Routledge ; Lily Scherlis ; Noah A. Smith

2 0.39149058 34 emnlp-2012-Do Neighbours Help? An Exploration of Graph-based Algorithms for Cross-domain Sentiment Classification

Author: Natalia Ponomareva ; Mike Thelwall

3 0.35142887 121 emnlp-2012-Supervised Text-based Geolocation Using Language Models on an Adaptive Grid

Author: Stephen Roller ; Michael Speriosu ; Sarat Rallapalli ; Benjamin Wing ; Jason Baldridge

Abstract: The geographical properties of words have recently begun to be exploited for geolocating documents based solely on their text, often in the context of social media and online content. One common approach for geolocating texts is rooted in information retrieval. Given training documents labeled with latitude/longitude coordinates, a grid is overlaid on the Earth and pseudo-documents constructed by concatenating the documents within a given grid cell; then a location for a test document is chosen based on the most similar pseudo-document. Uniform grids are normally used, but they are sensitive to the dispersion of documents over the earth. We define an alternative grid construction using k-d trees that more robustly adapts to data, especially with larger training sets. We also provide a better way of choosing the locations for pseudo-documents. We evaluate these strategies on existing Wikipedia and Twitter corpora, as well as a new, larger Twitter corpus. The adaptive grid achieves competitive results with a uniform grid on small training sets and outperforms it on the large Twitter corpus. The two grid constructions can also be combined to produce consistently strong results across all training sets.

4 0.3452501 102 emnlp-2012-Optimising Incremental Dialogue Decisions Using Information Density for Interactive Systems

Author: Nina Dethlefs ; Helen Hastie ; Verena Rieser ; Oliver Lemon

5 0.3172684 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews

Author: Jianxing Yu ; Zheng-Jun Zha ; Tat-Seng Chua

6 0.30666003 104 emnlp-2012-Parse, Price and Cut-Delayed Column and Row Generation for Graph Based Parsers

7 0.23654424 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts

8 0.2295994 60 emnlp-2012-Generative Goal-Driven User Simulation for Dialog Management

9 0.20562786 15 emnlp-2012-Active Learning for Imbalanced Sentiment Classification

10 0.18968895 101 emnlp-2012-Opinion Target Extraction Using Word-Based Translation Model

11 0.18014599 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces

12 0.17622827 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes

13 0.17093284 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns

14 0.16236393 33 emnlp-2012-Discovering Diverse and Salient Threads in Document Collections

15 0.1551892 17 emnlp-2012-An "AI readability" Formula for French as a Foreign Language

16 0.14898257 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage

17 0.14846289 48 emnlp-2012-Exploring Adaptor Grammars for Native Language Identification

18 0.14224248 28 emnlp-2012-Collocation Polarity Disambiguation Using Web-based Pseudo Contexts

19 0.14092791 84 emnlp-2012-Linking Named Entities to Any Database

20 0.13954483 32 emnlp-2012-Detecting Subgroups in Online Discussions by Modeling Positive and Negative Relations among Participants

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.017), (14, 0.011), (16, 0.029), (25, 0.014), (34, 0.049), (60, 0.065), (63, 0.058), (64, 0.018), (65, 0.031), (73, 0.013), (74, 0.042), (76, 0.064), (80, 0.02), (86, 0.043), (89, 0.39), (95, 0.032)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75337541 139 emnlp-2012-Word Salad: Relating Food Prices and Descriptions

Author: Victor Chahuneau ; Kevin Gimpel ; Bryan R. Routledge ; Lily Scherlis ; Noah A. Smith

2 0.61997712 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction

Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky

Abstract: We present a new family of models for unsupervised parsing, Dependency and Boundary models, that use cues at constituent boundaries to inform head-outward dependency tree generation. We build on three intuitions that are explicit in phrase-structure grammars but only implicit in standard dependency formulations: (i) Distributions of words that occur at sentence boundaries such as English determiners resemble constituent edges. (ii) Punctuation at sentence boundaries further helps distinguish full sentences from fragments like headlines and titles, allowing us to model grammatical differences between complete and incomplete sentences. (iii) Sentence-internal punctuation boundaries help with longer-distance dependencies, since punctuation correlates with constituent edges. Our models induce state-of-the-art dependency grammars for many languages without special knowledge of optimal input sentence lengths or biased, manually-tuned initializers.

3 0.32166156 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

Author: Jayant Krishnamurthy ; Tom Mitchell

Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms of weak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependency-parsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-the-art accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.

4 0.31654507 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews

Author: Jianxing Yu ; Zheng-Jun Zha ; Tat-Seng Chua

5 0.31486103 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents

Author: Heeyoung Lee ; Marta Recasens ; Angel Chang ; Mihai Surdeanu ; Dan Jurafsky

Abstract: We introduce a novel coreference resolution system that models entities and events jointly. Our iterative method cautiously constructs clusters of entity and event mentions using linear regression to model cluster merge operations. As clusters are built, information flows between entity and event clusters through features that model semantic role dependencies. Our system handles nominal and verbal events as well as entities, and our joint formulation allows information from event coreference to help entity coreference, and vice versa. In a cross-document domain with comparable documents, joint coreference resolution performs significantly better (over 3 CoNLL F1 points) than two strong baselines that resolve entities and events separately.

6 0.31334144 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts

7 0.30885163 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction

8 0.30729538 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games

9 0.30712709 51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields

10 0.30515379 30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing

11 0.30457476 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules

12 0.30260354 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns

13 0.30241996 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT

14 0.30217594 120 emnlp-2012-Streaming Analysis of Discourse Participants

15 0.30076605 97 emnlp-2012-Natural Language Questions for the Web of Data

16 0.30030781 122 emnlp-2012-Syntactic Surprisal Affects Spoken Word Duration in Conversational Contexts

17 0.29970041 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling

18 0.29902425 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction

19 0.29802036 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis

20 0.29791445 114 emnlp-2012-Revisiting the Predictability of Language: Response Completion in Social Media