acl acl2013 acl2013-254 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Haifeng Hu ; Bingquan Liu ; Baoxun Wang ; Ming Liu ; Xiaolong Wang
Abstract: In this paper, we address the problem of predicting cQA answer quality as a classification task. We propose a multimodal deep belief nets based approach that operates in two stages: First, the joint representation is learned by taking both textual and non-textual features into a deep learning network. Then, the joint representation learned by the network is used as input features for a linear classifier. Extensive experimental results conducted on two cQA datasets demonstrate the effectiveness of our proposed approach.
Reference: text
sentIndex sentText sentNum sentScore
1 Multimodal DBN for Predicting High-Quality Answers in cQA portals Haifeng Hu, Bingquan Liu, Baoxun Wang, Ming Liu, Xiaolong Wang School of Computer Science and Technology Harbin Institute of Technology, China {hfhu, liubq, bxwang, mliu, wangxl}@insun. [sent-1, score-0.112]
2 cn Abstract In this paper, we address the problem of predicting cQA answer quality as a classification task. [sent-4, score-0.317]
3 We propose a multimodal deep belief nets based approach that operates in two stages: First, the joint representation is learned by taking both textual and non-textual features into a deep learning network. [sent-5, score-1.063]
4 Then, the joint representation learned by the network is used as input features for a linear classifier. [sent-6, score-0.183]
5 Extensive experimental results conducted on two cQA datasets demonstrate the effectiveness of our proposed approach. [sent-7, score-0.033]
6 1 Introduction Predicting the quality of answers in community-based Question Answering (cQA) portals is a challenging task. [sent-8, score-0.313]
7 One straightforward approach is to treat it as a text classification task using textual features (Agichtein et al. [sent-9, score-0.301]
8 However, due to the word over-sparsity and inherent noise of user-generated content, the classical bag-of-words representation is not appropriate for estimating the quality of short texts (Huang et al. [sent-11, score-0.164]
9 Another typical approach is to leverage non-textual features to automatically identify high-quality answers (Jeon et al. [sent-13, score-0.286]
10 However, in this way, the mining of meaningful textual features tends to be ignored. [sent-16, score-0.265]
11 Intuitively, combining both textual and non-textual information extracted from answers helps improve the performance of predicting answer quality. [sent-17, score-0.705]
12 However, textual and non-textual features usually have different kinds of representations, and the correlations between them are highly non-linear. [sent-18, score-0.512]
13 Previous work (Ngiam et al., 2011) has shown that it is hard for a shallow model to discover the correlations over multiple sources. [sent-20, score-0.113]
14 To this end, a deep learning approach, called multimodal deep belief nets (mDBN), is introduced to address the above problems and predict the answer quality. [sent-21, score-0.918]
15 The approach includes two stages: feature learning and supervised training. [sent-22, score-0.06]
16 In the former stage, a specially designed deep network is built to learn a unified representation using both textual and non-textual information. [sent-23, score-0.622]
17 In the latter stage, the outputs of the network are then used as inputs to a linear classifier to make predictions. [sent-24, score-0.132]
18 2 Related Work The typical way to predict answer quality is to explore various features and employ machine learning methods. [sent-28, score-0.35]
19 Jeon et al. (2006) have proposed a framework to predict the quality of answers by incorporating non-textual features into a maximum entropy model. [sent-30, score-0.321]
20 (2009) both leverage a larger range of features to find high-quality answers. [sent-33, score-0.086]
21 An in-depth study on evaluating answer quality has been conducted by Shah and Pomerantz (2010) using a logistic regression model. [sent-34, score-0.556]
22 In the deep learning field, extensive studies have been done by Hinton and his co-workers (Hinton et al. [sent-36, score-0.247]
23 , 2006; Hinton and Salakhutdinov, 2006; Salakhutdinov and Hinton, 2009), who initially proposed the deep belief nets (DBN). [sent-37, score-0.369]
24 Wang et al. (2010; 2011) first applied DBNs to model semantic relevance for QA pairs in social communities. [sent-39, score-0.093]
25 Meanwhile, feature learning for disparate sources has also been a hot research topic. [sent-40, score-0.102]
26 Lee et al. (2009) demonstrate that the hidden representations computed by a convolutional DBN make excellent features for visual recognition. [sent-42, score-0.172]
27 3 Approach We consider the problem of high-quality answer prediction as a classification task. [sent-45, score-0.181]
28 First, textual features and non-textual features extracted from cQA portals are used to train two DBN models to learn the high-level representations for answers independently. [sent-47, score-0.322]
29 [Figure 1: framework of the approach (Feature Learning, Fusion Representation, Supervised Training), from textual and non-textual features in cQA archives to high-quality answer prediction] [sent-48, score-0.11]
30 The two high-level representations learned by the deep architectures are then joined together to train an RBM model. [sent-49, score-0.22]
31 Finally, a linear classifier is trained with the final shared representation as input to make predictions. [sent-50, score-0.06]
32 In this section, a deep network for cQA answer quality prediction is presented. [sent-51, score-0.517]
33 Textual and non-textual features are typically characterized by distinct statistical properties and the correlations between them are highly non-linear. [sent-52, score-0.145]
34 It is very difficult for a shallow model to discover these correlations and form an informative unified representation. [sent-53, score-0.181]
35 Our motivation for proposing the mDBN is to tackle these problems using a unified representation to enhance the classification performance. [sent-54, score-0.164]
36 1 The Restricted Boltzmann Machines The basic building block of our feature learning component is the Restricted Boltzmann Machine (RBM). [sent-56, score-0.059]
37 The classical RBM is a two-layer undirected graphical model with stochastic visible units v and stochastic hidden units h. [sent-57, score-0.363]
38 The units in the visible layer and the hidden layer are fully connected to the units in the other layer through a symmetric weight matrix w. [sent-58, score-0.632]
39 The classical RBM has been used effectively for modeling distributions over binary-valued data. [sent-59, score-0.081]
40 For real-valued inputs, the Gaussian RBM (Bengio et al. [sent-60, score-0.059]
41 Unlike the former, the Gaussian RBM assumes that the visible units follow a normal distribution. [sent-62, score-0.156]
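For reference, the standard energy functions behind these two RBM variants can be written as follows. These formulas are not quoted from the excerpt above; they are the usual textbook forms, with the Gaussian case assuming unit-variance visible units.

```latex
% Binary RBM energy over visible units v and hidden units h,
% with visible biases b_i, hidden biases c_j, and weights w_ij:
E(\mathbf{v},\mathbf{h}) = -\sum_i b_i v_i - \sum_j c_j h_j - \sum_{i,j} v_i w_{ij} h_j

% Gaussian RBM energy (unit-variance visible units assumed), which yields
% p(v_i \mid \mathbf{h}) = \mathcal{N}\big(b_i + \textstyle\sum_j w_{ij} h_j,\, 1\big):
E(\mathbf{v},\mathbf{h}) = \sum_i \frac{(v_i - b_i)^2}{2} - \sum_j c_j h_j - \sum_{i,j} v_i w_{ij} h_j
```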
42 2 Feature Learning The illustration of the feature learning model is given in Figure 2. [sent-64, score-0.06]
43 , V-H1, H1-H2), each data modality is modeled by a separate two-layer DBN. [sent-68, score-0.076]
44 For clarity, we take the textual modality as an example to illustrate the construction of the mDBN in this part. [sent-69, score-0.284]
45 Given a textual input vector v, the visible layer generates the hidden vector h by p(h_j = 1 | v) = σ(c_j + Σ_i w_ij v_i). [sent-70, score-0.503]
46 where σ(x) = (1 + e^(−x))^(−1) denotes the logistic function. [sent-72, score-0.064]
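As a small, hedged illustration of this activation rule, the snippet below computes p(h_j = 1 | v) with NumPy; the dimensions, random weights, and sparsity level are invented for the example and are not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_visible, n_hidden = 1500, 500                    # illustrative sizes; 1,500 matches the term-vector length
W = rng.normal(0.0, 0.01, (n_visible, n_hidden))   # weights w_ij (small random init)
c = np.zeros(n_hidden)                             # hidden biases c_j

v = (rng.random(n_visible) < 0.02).astype(float)   # a sparse binary textual input vector
p_h = sigmoid(c + v @ W)                           # p(h_j = 1 | v) = sigma(c_j + sum_i w_ij v_i)
h = (rng.random(n_hidden) < p_h).astype(float)     # stochastic binary hidden sample
```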
47 After learning the RBMs in the bottom layer, we treat the activation probabilities of their hidden units, driven by the inputs, as the training data for a new layer. [sent-74, score-0.218]
48 The construction procedure for the non-textual modality is similar to the textual one, except that we use the Gaussian RBM to model the real-valued inputs in the bottom layer. [sent-75, score-0.461]
49 The training method is also similar to that of the bottom layers, but the input vector is the concatenation of the mapped textual vector and the mapped non-textual vector. [sent-77, score-0.208]
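To make this layer-wise construction concrete, here is a minimal NumPy sketch of greedy CD-1 pretraining: two stacked binary RBMs for the textual modality, a Gaussian-visible RBM at the bottom of the non-textual modality, and a joint RBM over the concatenated codes. Only the overall wiring, the Gaussian bottom layer, the 0.0002 learning rate, and the 1,000-unit joint layer follow the text; the layer sizes, epoch count, toy inputs, and helper names (train_rbm, transform) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.0002, epochs=10, gaussian_visible=False):
    """One-step contrastive divergence (CD-1); hyperparameters are illustrative."""
    n_vis = data.shape[1]
    W = rng.normal(0.0, 0.01, (n_vis, n_hidden))
    b, c = np.zeros(n_vis), np.zeros(n_hidden)           # visible and hidden biases
    for _ in range(epochs):
        ph0 = sigmoid(c + data @ W)                      # positive phase
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        if gaussian_visible:
            v1 = b + h0 @ W.T                            # Gaussian units: reconstruct with the mean
        else:
            v1 = sigmoid(b + h0 @ W.T)
        ph1 = sigmoid(c + v1 @ W)                        # negative phase
        W += lr * (data.T @ ph0 - v1.T @ ph1) / len(data)
        b += lr * (data - v1).mean(axis=0)
        c += lr * (ph0 - ph1).mean(axis=0)
    return W, c

def transform(data, W, c):
    # Hidden activation probabilities become the next layer's training data.
    return sigmoid(c + data @ W)

# Toy inputs: sparse binary term vectors and standardized non-textual features.
X_text = (rng.random((200, 1500)) < 0.02).astype(float)
X_nontext = rng.normal(size=(200, 20))

# Textual DBN: two stacked binary RBMs (V-H1, H1-H2).
Wt1, ct1 = train_rbm(X_text, 500)
Wt2, ct2 = train_rbm(transform(X_text, Wt1, ct1), 250)
Ht2 = transform(transform(X_text, Wt1, ct1), Wt2, ct2)

# Non-textual DBN: Gaussian RBM at the bottom, binary RBM above.
Wn1, cn1 = train_rbm(X_nontext, 50, gaussian_visible=True)
Wn2, cn2 = train_rbm(transform(X_nontext, Wn1, cn1), 50)
Hn2 = transform(transform(X_nontext, Wn1, cn1), Wn2, cn2)

# Joint RBM over the concatenated top-level codes (1,000 units, as in the paper).
joint_in = np.hstack([Ht2, Hn2])
Wj, cj = train_rbm(joint_in, 1000)
joint_rep = transform(joint_in, Wj, cj)                  # the unified representation
```

Note that the hidden probabilities, rather than binary samples, are passed upward, matching the text's use of activation probabilities as training data for the next layer.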
50 [Figure 2: mDBN for Feature Learning] It should be noted that, in the network, the bottom part is essential to form the joint representation because the correlations between the textual and non-textual features are highly non-linear. [sent-78, score-0.465]
51 It is hard for an RBM that directly combines the two disparate sources to learn their correlations. [sent-79, score-0.042]
52 3 Supervised Training and Classification After the above steps, a deep network for feature learning between textual and non-textual data is established. [sent-81, score-0.554]
53 Classifiers, either support vector machine (SVM) or logistic regression (LR), can then be trained with the unified representation (Ngiam et al. [sent-82, score-0.233]
54 4 Basic Features Textual Features: The textual features are extracted from the 1,500 most frequent words in the training dataset after standard preprocessing steps, namely word segmentation, stopword removal, and stemming1. [sent-86, score-0.265]
55 As a result, each answer is represented as a vector of 1,500 distinct terms weighted by a binary scheme. [sent-87, score-0.145]
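A minimal sketch of this binary weighting scheme follows. The helper names and toy token lists are invented for illustration, and the preprocessing (segmentation, stopword removal, stemming) is assumed to have already produced the tokens.

```python
from collections import Counter

def build_vocab(token_lists, size=1500):
    """Index the `size` most frequent terms seen in the preprocessed training data."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    return {w: i for i, (w, _) in enumerate(counts.most_common(size))}

def binary_vector(tokens, vocab):
    """Represent an answer as a fixed-length binary term vector."""
    vec = [0.0] * len(vocab)
    for tok in tokens:
        if tok in vocab:
            vec[vocab[tok]] = 1.0   # binary weighting: term presence, not frequency
    return vec

# Toy preprocessed answers (hypothetical tokens from a "travel" category).
answers = [["travel", "visa", "hotel"], ["visa", "apply", "embassy"]]
vocab = build_vocab(answers)        # the toy vocabulary is far smaller than 1,500
X = [binary_vector(toks, vocab) for toks in answers]
```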
56 Following (Jeon et al., 2006; Shah and Pomerantz, 2010), we adopt some of the features used in their work and also explore three additional features marked by the ‡ sign. [sent-89, score-0.114]
57 One dataset comes from Baidu Zhidao2, which contains 33,740 resolved questions crawled by us from the “travel” category. [sent-93, score-0.057]
58 We refer to these two datasets as ZHIDAO and YAHOO respectively and randomly sample 10,000 questions from each to form our experimental datasets. [sent-96, score-0.083]
59 Based on the user names, we have crawled all the user profile web pages for non-textual feature collection. [sent-97, score-0.129]
60 To alleviate unnecessary noise, we only select questions with no fewer than 3 answers (one best answer among them), and answers with at least 10 tokens. [sent-98, score-0.168]
61 1The stemming step is only used for the English corpus. [sent-103, score-0.288]
62 (1) Logistic Regression (LR): We implement the approach used by Shah and Pomerantz (2010) with textual features (LR-T), non-textual features (LR-N), and their simple combination (LR-C). [sent-109, score-0.481]
63 (2) DBN: Similar to the mDBN, the outputs of the last hidden layer of the DBN are used as inputs to the LR model. [sent-110, score-0.264]
64 Based on the feature sets, we have DBN-T for textual features and DBN-N for non-textual features. [sent-111, score-0.298]
65 Since we mainly focus on high-quality answers, the precision, recall, and F1 for the positive class, and the overall accuracy for both classes, are employed as our evaluation metrics. [sent-112, score-0.086]
66 The joint layer of the network contains 1,000 real-valued units. [sent-114, score-0.264]
67 A learning rate of 0.0002 is used, and the learning rate for the textual data modality is 0. [sent-117, score-0.235]
68 More details for training the deep architecture can be found in Hinton (2012). [sent-124, score-0.26]
69 To make a fair comparison, we use the liblinear toolkit4 for the logistic regression model with L2 regularization and randomly select 70% of the QA pairs as training data. 4Available at http://www. [sent-127, score-0.105]
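A sketch of this comparison protocol under stated assumptions: scikit-learn's liblinear-backed LogisticRegression stands in for the liblinear toolkit, and random arrays stand in for the mDBN joint representation and the gold labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
joint_rep = rng.random((1000, 64))   # stand-in for the mDBN joint representation
y = rng.integers(0, 2, 1000)         # 1 = high-quality answer, 0 = otherwise

# Randomly select 70% of the QA pairs as training data, as in the paper's setup.
X_tr, X_te, y_tr, y_te = train_test_split(joint_rep, y, train_size=0.7, random_state=0)

clf = LogisticRegression(penalty="l2", solver="liblinear")   # L2-regularized LR
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)

# Precision/recall/F1 for the positive (high-quality) class, plus overall accuracy.
p, r, f1, _ = precision_recall_fscore_support(y_te, pred, average="binary", pos_label=1)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f} Acc={accuracy_score(y_te, pred):.3f}")
```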
70 The main reason for the improvements is that the joint representation learned by mDBN is able to complement each modality perfectly. [sent-139, score-0.136]
71 In addition, the mDBN can extract a stronger representation by modeling the semantic relationship between textual and non-textual information, which effectively helps distinguish high-quality answers from low-quality ones among more complicated cases. [sent-140, score-0.524]
72 The classification performance of the textual features is worse on average compared with the non-textual features, even when the feature learning strategy is employed. [sent-146, score-0.361]
73 More interestingly, we find that the simple combination of textual and non-textual features does not improve the classification results compared with using non-textual features alone. [sent-147, score-0.517]
74 We conjecture that there are mainly three reasons for this phenomenon: First, user-generated content is inherently noisy and has low word frequencies, resulting in sparsity when employing textual features. [sent-148, score-0.208]
75 Second, non-textual features (e.g., answer length) usually have strong statistical properties, and the feature sparsity problem can be relieved to some extent. [sent-151, score-0.204]
76 Finally, since the correlations between the textual features and non-textual features are highly non-linear, simply concatenating these features can sometimes degrade classification performance. [sent-152, score-0.567]
77 In contrast, mDBN enjoys the advantage of the shared representation between textual features and non-textual features using the deep learning architecture. [sent-153, score-0.655]
78 We also note that neither the mDBN nor other approaches perform very well in predicting answer quality across the two datasets. [sent-154, score-0.281]
79 4%, which means that nearly half of the high-quality answers are not effectively identified. [sent-157, score-0.256]
80 One possible reason is that the quality of the corpora significantly influences the results. [sent-158, score-0.119]
81 As shown in Table 2, each question on average receives more than 4 answers on ZHIDAO and more than 10 on YAHOO. [sent-159, score-0.143]
82 Therefore, it is possible that there are several answers with high quality to the same question. [sent-160, score-0.229]
83 Selecting only one as the high-quality answer is relatively difficult even for humans, not to mention for the models. [sent-161, score-0.231]
84 [Figure 3: Influences of iterations for mDBN] In the second experiment, we examine the performance of mDBN with different numbers of iterations. [sent-162, score-0.038]
85 From the result, the first observation is that as the number of iterations increases, the performance of mDBN improves significantly, obtaining the best results at 1,000 iterations. [sent-164, score-0.038]
86 This again clearly shows the representational power of the mDBN. [sent-165, score-0.06]
87 However, after a large number of iterations (more than 1,000), the performance of the mDBN degrades. [sent-166, score-0.038]
88 This may be explained by the fact that, with a large number of iterations, the deep learning architecture becomes more prone to overfitting. [sent-167, score-0.287]
89 5 Conclusions and Future Work In this paper, we have provided a new perspective on predicting cQA answer quality: learning an informative unified representation between textual and non-textual features instead of simply concatenating them. [sent-169, score-0.636]
90 Specifically, we have proposed a multimodal deep learning framework to form the unified representation. [sent-170, score-0.437]
91 We compare this with the basic features both in isolation and in combination. [sent-171, score-0.057]
92 Experimental results have demonstrated that our proposed approach can capture the complementarity between textual and non-textual features, which helps improve the performance of cQA answer quality prediction. [sent-172, score-0.465]
93 For future work, we plan to explore more semantic analysis to address the issue of short-text quality evaluation. [sent-173, score-0.086]
94 Additionally, further research will be conducted to develop other approaches for learning multimodal representations. [sent-174, score-0.149]
95 Expertise analysis in a question answer portal for author ranking. [sent-201, score-0.173]
96 A framework to predict the quality of answers with non-textual features. [sent-244, score-0.264]
97 Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations. [sent-254, score-0.323]
98 Modeling semantic relevance for question-answer pairs in web social communities. [sent-293, score-0.095]
99 Deep learning approaches to semantic relevance modeling for Chinese question-answer pairs. [sent-303, score-0.059]
100 Exploiting user profile information for answer ranking in cQA. [sent-313, score-0.178]
wordName wordTfidf (topN-words)
[('mdbn', 0.542), ('rbm', 0.31), ('dbn', 0.223), ('deep', 0.22), ('textual', 0.208), ('cqa', 0.178), ('nontextual', 0.159), ('hinton', 0.152), ('answer', 0.145), ('answers', 0.143), ('layer', 0.132), ('zhidao', 0.122), ('boltzmann', 0.122), ('multimodal', 0.122), ('salakhutdinov', 0.098), ('visible', 0.097), ('jeon', 0.093), ('agichtein', 0.089), ('correlations', 0.088), ('quality', 0.086), ('pomerantz', 0.084), ('portals', 0.084), ('shah', 0.081), ('modality', 0.076), ('belief', 0.076), ('yahoo', 0.074), ('units', 0.073), ('ngiam', 0.073), ('nets', 0.073), ('unified', 0.068), ('network', 0.066), ('hidden', 0.066), ('inputs', 0.066), ('lr', 0.066), ('logistic', 0.064), ('representation', 0.06), ('gaussian', 0.059), ('features', 0.057), ('bian', 0.056), ('classical', 0.054), ('bottom', 0.052), ('predicting', 0.05), ('srivastava', 0.049), ('convolutional', 0.049), ('neural', 0.047), ('wang', 0.044), ('disparate', 0.042), ('regression', 0.041), ('architecture', 0.04), ('iterations', 0.038), ('bengio', 0.037), ('classification', 0.036), ('concatenating', 0.036), ('predict', 0.035), ('influences', 0.033), ('zhou', 0.033), ('profile', 0.033), ('feature', 0.033), ('datasets', 0.033), ('stage', 0.032), ('cd', 0.032), ('crawled', 0.032), ('social', 0.032), ('liu', 0.032), ('contrastive', 0.032), ('relevance', 0.032), ('sun', 0.031), ('web', 0.031), ('layers', 0.031), ('stages', 0.031), ('lee', 0.03), ('qa', 0.029), ('yandong', 0.028), ('khosla', 0.028), ('nva', 0.028), ('cond', 0.028), ('ince', 0.028), ('insun', 0.028), ('portal', 0.028), ('ranganath', 0.028), ('wangxl', 0.028), ('restricted', 0.028), ('effectively', 0.027), ('learning', 0.027), ('complementarity', 0.026), ('relieved', 0.026), ('iti', 0.026), ('tracted', 0.026), ('enjoys', 0.026), ('leaning', 0.026), ('donato', 0.026), ('discover', 0.025), ('questions', 0.025), ('respectively', 0.025), ('usergenerated', 0.024), ('hongyuan', 0.024), ('xiaolong', 0.024), ('surveyed', 0.024), ('borrow', 0.024), ('gionis', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals
Author: Haifeng Hu ; Bingquan Liu ; Baoxun Wang ; Ming Liu ; Xiaolong Wang
Abstract: In this paper, we address the problem of predicting cQA answer quality as a classification task. We propose a multimodal deep belief nets based approach that operates in two stages: First, the joint representation is learned by taking both textual and non-textual features into a deep learning network. Then, the joint representation learned by the network is used as input features for a linear classifier. Extensive experimental results conducted on two cQA datasets demonstrate the effectiveness of our proposed approach.
2 0.16586815 218 acl-2013-Latent Semantic Tensor Indexing for Community-based Question Answering
Author: Xipeng Qiu ; Le Tian ; Xuanjing Huang
Abstract: Retrieving similar questions is very important in community-based question answering(CQA) . In this paper, we propose a unified question retrieval model based on latent semantic indexing with tensor analysis, which can capture word associations among different parts of CQA triples simultaneously. Thus, our method can reduce lexical chasm of question retrieval with the help of the information of question content and answer parts. The experimental result shows that our method outperforms the traditional methods.
3 0.14641367 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network
Author: Nan Yang ; Shujie Liu ; Mu Li ; Ming Zhou ; Nenghai Yu
Abstract: In this paper, we explore a novel bilingual word alignment approach based on DNN (Deep Neural Network), which has been proven to be very effective in various machine learning tasks (Collobert et al., 2011). We describe in detail how we adapt and extend the CD-DNNHMM (Dahl et al., 2012) method introduced in speech recognition to the HMMbased word alignment model, in which bilingual word embedding is discriminatively learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences. While being capable to model the rich bilingual correspondence, our method generates a very compact model with much fewer parameters. Experiments on a large scale EnglishChinese word alignment task show that the proposed method outperforms the HMM and IBM model 4 baselines by 2 points in F-score.
4 0.14627504 107 acl-2013-Deceptive Answer Prediction with User Preference Graph
Author: Fangtao Li ; Yang Gao ; Shuchang Zhou ; Xiance Si ; Decheng Dai
Abstract: In Community question answering (QA) sites, malicious users may provide deceptive answers to promote their products or services. It is important to identify and filter out these deceptive answers. In this paper, we first solve this problem with the traditional supervised learning methods. Two kinds of features, including textual and contextual features, are investigated for this task. We further propose to exploit the user relationships to identify the deceptive answers, based on the hypothesis that similar users will have similar behaviors to post deceptive or authentic answers. To measure the user similarity, we propose a new user preference graph based on the answer preference expressed by users, such as “helpful” voting and “best answer” selection. The user preference graph is incorporated into traditional supervised learning framework with the graph regularization technique. The experiment results demonstrate that the user preference graph can indeed help improve the performance of deceptive answer prediction.
5 0.12011782 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
Author: Wen-tau Yih ; Ming-Wei Chang ; Christopher Meek ; Andrzej Pastusiak
Abstract: In this paper, we study the answer sentence selection problem for question answering. Unlike previous work, which primarily leverages syntactic analysis through dependency tree matching, we focus on improving the performance using models of lexical semantic resources. Experiments show that our systems can be consistently and significantly improved with rich lexical semantic information, regardless of the choice of learning algorithms. When evaluated on a benchmark dataset, the MAP and MRR scores are increased by 8 to 10 points, compared to one of our baseline systems using only surface-form matching. Moreover, our best system also outperforms pervious work that makes use of the dependency tree structure by a wide margin.
6 0.11867366 219 acl-2013-Learning Entity Representation for Entity Disambiguation
8 0.099781796 60 acl-2013-Automatic Coupling of Answer Extraction and Information Retrieval
9 0.087292373 241 acl-2013-Minimum Bayes Risk based Answer Re-ranking for Question Answering
10 0.08551047 216 acl-2013-Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language
11 0.082755193 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
12 0.071197554 379 acl-2013-Utterance-Level Multimodal Sentiment Analysis
13 0.069102526 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
14 0.068538152 297 acl-2013-Recognizing Partial Textual Entailment
15 0.065913543 266 acl-2013-PAL: A Chatterbot System for Answering Domain-specific Questions
16 0.060208924 292 acl-2013-Question Classification Transfer
17 0.059301991 249 acl-2013-Models of Semantic Representation with Visual Attributes
19 0.052985344 222 acl-2013-Learning Semantic Textual Similarity with Structural Representations
20 0.052429166 294 acl-2013-Re-embedding words
topicId topicWeight
[(0, 0.15), (1, 0.049), (2, 0.003), (3, -0.054), (4, 0.041), (5, 0.006), (6, 0.021), (7, -0.194), (8, 0.067), (9, 0.092), (10, -0.01), (11, -0.082), (12, 0.039), (13, -0.078), (14, 0.007), (15, 0.037), (16, -0.041), (17, 0.002), (18, 0.038), (19, -0.107), (20, 0.066), (21, -0.059), (22, -0.052), (23, -0.091), (24, -0.033), (25, -0.027), (26, 0.062), (27, -0.092), (28, 0.084), (29, -0.054), (30, -0.086), (31, -0.0), (32, 0.024), (33, 0.015), (34, 0.033), (35, 0.031), (36, -0.076), (37, -0.028), (38, -0.036), (39, -0.051), (40, 0.018), (41, 0.054), (42, -0.037), (43, 0.06), (44, -0.033), (45, -0.023), (46, 0.008), (47, 0.056), (48, 0.011), (49, -0.055)]
simIndex simValue paperId paperTitle
same-paper 1 0.90908587 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals
Author: Haifeng Hu ; Bingquan Liu ; Baoxun Wang ; Ming Liu ; Xiaolong Wang
Abstract: In this paper, we address the problem of predicting cQA answer quality as a classification task. We propose a multimodal deep belief nets based approach that operates in two stages: First, the joint representation is learned by taking both textual and non-textual features into a deep learning network. Then, the joint representation learned by the network is used as input features for a linear classifier. Extensive experimental results conducted on two cQA datasets demonstrate the effectiveness of our proposed approach.
2 0.66777033 107 acl-2013-Deceptive Answer Prediction with User Preference Graph
Author: Fangtao Li ; Yang Gao ; Shuchang Zhou ; Xiance Si ; Decheng Dai
Abstract: In Community question answering (QA) sites, malicious users may provide deceptive answers to promote their products or services. It is important to identify and filter out these deceptive answers. In this paper, we first solve this problem with the traditional supervised learning methods. Two kinds of features, including textual and contextual features, are investigated for this task. We further propose to exploit the user relationships to identify the deceptive answers, based on the hypothesis that similar users will have similar behaviors to post deceptive or authentic answers. To measure the user similarity, we propose a new user preference graph based on the answer preference expressed by users, such as “helpful” voting and “best answer” selection. The user preference graph is incorporated into traditional supervised learning framework with the graph regularization technique. The experiment results demonstrate that the user preference graph can indeed help improve the performance of deceptive answer prediction.
3 0.64558423 266 acl-2013-PAL: A Chatterbot System for Answering Domain-specific Questions
Author: Yuanchao Liu ; Ming Liu ; Xiaolong Wang ; Limin Wang ; Jingjing Li
Abstract: In this paper, we propose PAL, a prototype chatterbot for answering non-obstructive psychological domain-specific questions. This system focuses on providing primary suggestions or helping people relieve pressure by extracting knowledge from online forums, based on which the chatterbot system is constructed. The strategies used by PAL, including semantic-extension-based question matching, solution management with personal information consideration, and XML-based knowledge pattern construction, are described and discussed. We also conduct a primary test for the feasibility of our system.
4 0.63254905 216 acl-2013-Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language
Author: Tiberiu Boros ; Radu Ion ; Dan Tufis
Abstract: Radu Ion Research Institute for ?????????? ???????????? ?????? Dr?????????? Romanian Academy radu@ racai . ro Dan Tufi? Research Institute for ?????????? ???????????? ?????? Dr?????????? Romanian Academy tufi s @ racai . ro Networks (Marques and Lopes, 1996) and Conditional Random Fields (CRF) (Lafferty et Standard methods for part-of-speech tagging suffer from data sparseness when used on highly inflectional languages (which require large lexical tagset inventories). For this reason, a number of alternative methods have been proposed over the years. One of the most successful methods used for this task, ?????? ?????? ??????? ??????, 1999), exploits a reduced set of tags derived by removing several recoverable features from the lexicon morpho-syntactic descriptions. A second phase is aimed at recovering the full set of morpho-syntactic features. In this paper we present an alternative method to Tiered Tagging, based on local optimizations with Neural Networks and we show how, by properly encoding the input sequence in a general Neural Network architecture, we achieve results similar to the Tiered Tagging methodology, significantly faster and without requiring extensive linguistic knowledge as implied by the previously mentioned method. 1
5 0.62147504 241 acl-2013-Minimum Bayes Risk based Answer Re-ranking for Question Answering
Author: Nan Duan
Abstract: This paper presents two minimum Bayes risk (MBR) based Answer Re-ranking (MBRAR) approaches for the question answering (QA) task. The first approach re-ranks single QA system’s outputs by using a traditional MBR model, by measuring correlations between answer candidates; while the second approach reranks the combined outputs of multiple QA systems with heterogenous answer extraction components by using a mixture model-based MBR model. Evaluations are performed on factoid questions selected from two different domains: Jeopardy! and Web, and significant improvements are achieved on all data sets.
6 0.60164177 218 acl-2013-Latent Semantic Tensor Indexing for Community-based Question Answering
7 0.58418667 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
9 0.57554233 239 acl-2013-Meet EDGAR, a tutoring agent at MONSERRATE
10 0.57099247 60 acl-2013-Automatic Coupling of Answer Extraction and Information Retrieval
11 0.56266147 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network
12 0.559358 292 acl-2013-Question Classification Transfer
14 0.54135138 35 acl-2013-Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation
15 0.50828058 294 acl-2013-Re-embedding words
16 0.48878631 141 acl-2013-Evaluating a City Exploration Dialogue System with Integrated Question-Answering and Pedestrian Navigation
17 0.48780003 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
18 0.45611602 219 acl-2013-Learning Entity Representation for Entity Disambiguation
19 0.42808244 14 acl-2013-A Novel Classifier Based on Quantum Computation
20 0.42807156 290 acl-2013-Question Analysis for Polish Question Answering
topicId topicWeight
[(0, 0.086), (6, 0.033), (11, 0.049), (24, 0.053), (26, 0.042), (35, 0.081), (42, 0.065), (48, 0.051), (63, 0.011), (67, 0.025), (70, 0.081), (88, 0.045), (90, 0.023), (95, 0.068), (97, 0.2)]
simIndex simValue paperId paperTitle
same-paper 1 0.82196289 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals
Author: Haifeng Hu ; Bingquan Liu ; Baoxun Wang ; Ming Liu ; Xiaolong Wang
Abstract: In this paper, we address the problem of predicting cQA answer quality as a classification task. We propose a multimodal deep belief nets based approach that operates in two stages: First, the joint representation is learned by taking both textual and non-textual features into a deep learning network. Then, the joint representation learned by the network is used as input features for a linear classifier. Extensive experimental results conducted on two cQA datasets demonstrate the effectiveness of our proposed approach.
2 0.80132222 242 acl-2013-Mining Equivalent Relations from Linked Data
Author: Ziqi Zhang ; Anna Lisa Gentile ; Isabelle Augenstein ; Eva Blomqvist ; Fabio Ciravegna
Abstract: Linking heterogeneous resources is a major research challenge in the Semantic Web. This paper studies the task of mining equivalent relations from Linked Data, which was insufficiently addressed before. We introduce an unsupervised method to measure equivalency of relation pairs and cluster equivalent relations. Early experiments have shown encouraging results with an average of 0.75~0.87 precision in predicting relation pair equivalency and 0.78~0.98 precision in relation clustering. 1
3 0.77425355 338 acl-2013-Task Alternation in Parallel Sentence Retrieval for Twitter Translation
Author: Felix Hieber ; Laura Jehl ; Stefan Riezler
Abstract: We present an approach to mine comparable data for parallel sentences using translation-based cross-lingual information retrieval (CLIR). By iteratively alternating between the tasks of retrieval and translation, an initial general-domain model is allowed to adapt to in-domain data. Adaptation is done by training the translation system on a few thousand sentences retrieved in the step before. Our setup is time- and memory-efficient and of similar quality as CLIR-based adaptation on millions of parallel sentences.
4 0.72206193 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT
Author: Wenduan Xu ; Yue Zhang ; Philip Williams ; Philipp Koehn
Abstract: We present a context-sensitive chart pruning method for CKY-style MT decoding. Source phrases that are unlikely to have aligned target constituents are identified using sequence labellers learned from the parallel corpus, and speed-up is obtained by pruning corresponding chart cells. The proposed method is easy to implement, orthogonal to cube pruning and additive to its pruning power. On a full-scale Englishto-German experiment with a string-totree model, we obtain a speed-up of more than 60% over a strong baseline, with no loss in BLEU.
5 0.68509769 249 acl-2013-Models of Semantic Representation with Visual Attributes
Author: Carina Silberer ; Vittorio Ferrari ; Mirella Lapata
Abstract: We consider the problem of grounding the meaning of words in the physical world and focus on the visual modality which we represent by visual attributes. We create a new large-scale taxonomy of visual attributes covering more than 500 concepts and their corresponding 688K images. We use this dataset to train attribute classifiers and integrate their predictions with text-based distributional models of word meaning. We show that these bimodal models give a better fit to human word association data compared to amodal models and word representations based on handcrafted norming data.
6 0.68428379 252 acl-2013-Multigraph Clustering for Unsupervised Coreference Resolution
7 0.67847848 85 acl-2013-Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
8 0.67821109 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
9 0.67705595 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing
10 0.67614448 275 acl-2013-Parsing with Compositional Vector Grammars
11 0.67333269 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering
12 0.6732657 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
13 0.67253011 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation
14 0.67181528 224 acl-2013-Learning to Extract International Relations from Political Context
15 0.67158216 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
16 0.66992545 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction
17 0.66891241 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
18 0.66855675 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
19 0.6680963 222 acl-2013-Learning Semantic Textual Similarity with Structural Representations
20 0.66710615 173 acl-2013-Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging