acl acl2010 acl2010-177 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Manoj Kumar Chinnakotla ; Karthik Raman ; Pushpak Bhattacharyya
Abstract: In a previous work of ours, Chinnakotla et al. (2010), we introduced a novel framework for Pseudo-Relevance Feedback (PRF) called MultiPRF. Given a query in one language called Source, we used English as the Assisting Language to improve the performance of PRF for the source language. MultiPRF showed remarkable improvement over plain Model Based Feedback (MBF) uniformly for 4 languages, viz., French, German, Hungarian and Finnish, with English as the assisting language. This fact inspired us to study the effect of any source-assistant pair on MultiPRF performance from a set of languages with widely different characteristics, viz., Dutch, English, Finnish, French, German and Spanish. Carrying this further, we looked into the effect of using two assisting languages together on PRF. The present paper is a report of these investigations, their results and the conclusions drawn therefrom. While performance improvement on MultiPRF is observed whatever the assisting language and whatever the source, observations are mixed when two assisting languages are used simultaneously. Interestingly, the performance improvement is more pronounced when the source and assisting languages are closely related, e.g., French and Spanish.
Reference: text
sentIndex sentText sentNum sentScore
1 Given a query in one language called Source, we used English as the Assisting Language to improve the performance of PRF for the source language. [sent-7, score-0.339]
2 , French, German, Hungarian and Finnish with English as the assisting language. [sent-9, score-0.579]
3 Carrying this further, we looked into the effect of using two assisting languages together on PRF. [sent-12, score-0.648]
4 While performance improvement on MultiPRF is observed whatever the assisting language and whatever the source, observations are mixed when two assisting languages are used simultaneously. [sent-14, score-1.31]
5 Interestingly, the performance improvement is more pronounced when the source and assisting languages are closely related, e. [sent-15, score-0.756]
6 The problem of matching the user’s query to the documents is rendered difficult by natural language phenomena like morphological variations, polysemy and synonymy. [sent-19, score-0.299]
7 Relevance Feedback (RF) tries to overcome these problems by eliciting user feedback on the relevance of documents obtained from the initial ranking and then using it to automatically refine the query. [sent-20, score-0.393]
8 Based on the above assumption, the terms in the feedback document set are analyzed to choose the most distinguishing set of terms that characterize the feedback documents and, as a result, the relevance of a document. [sent-24, score-0.762]
9 It does so by taking the help of a different language called the assisting language. [sent-31, score-0.579]
10 In MultiPRF, given a query in source language L1, the query is automatically translated into the assisting language L2 and PRF is performed in the assisting language. [sent-32, score-1.77]
11 The translated feedback [sent-34, score-0.334]
12 model is then combined with the original feedback model of L1 to obtain the final model which is used to re-rank the corpus. [sent-36, score-0.329]
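Sentences 10-12 describe the full pipeline; as a reading aid, here is a hypothetical high-level sketch. None of this is the authors' code: translate, prf, back_translate, combine and rerank are placeholder callables for the corresponding system components (concrete versions of the last three are sketched later in this page), and the default β and γ values are illustrative only.

```python
# Hypothetical outline of the MultiPRF pipeline described above.
def multi_prf_rank(query_l1, translate, prf, back_translate, combine, rerank,
                   beta=0.3, gamma=0.3):
    theta_f_l1 = prf(query_l1, lang="L1")      # MBF in the source language L1
    query_l2 = translate(query_l1)             # automatic query translation
    theta_f_l2 = prf(query_l2, lang="L2")      # MBF in the assisting language L2
    theta_trans = back_translate(theta_f_l2)   # probabilistic-dictionary step
    theta_final = combine(query_l1, theta_f_l1, theta_trans, beta, gamma)
    return rerank(theta_final, lang="L1")      # re-rank the L1 corpus
```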
13 , French, German, Hungarian and Finnish with English as the assisting language. [sent-38, score-0.579]
14 Carrying this further, we looked into the effect of using two assisting languages together on PRF. [sent-41, score-0.648]
15 While performance improvement on PRF is observed whatever the assisting language and whatever the source, observations are mixed when two assisting languages are used simultaneously. [sent-43, score-1.31]
16 Interestingly, the performance improvement is more pronounced when the source and assisting languages are closely related, e. [sent-44, score-0.756]
17 Section 6 presents the results, and studies the effect of varying the assisting language and incorporates multiple assisting languages. [sent-51, score-1.158]
18 Some of the representative techniques are (i) Refining the feedback document set (Mitra et al. [sent-57, score-0.31]
19 , 2004) and (iv) Varying the importance of documents in the feedback set (Tao and Zhai, 2006). [sent-62, score-0.335]
20 The intuition behind the above approach is that if the query does not have many relevant documents in the collection, then any improvement in the modeling of PRF is bound to perform poorly due to query drift. [sent-65, score-0.581]
21 Several approaches have been proposed for including different types of lexically and semantically related terms during query expansion. [sent-66, score-0.315]
22 Voorhees (1994) uses WordNet for query expansion and reports negative results. [sent-67, score-0.295]
23 Our proposed approach is especially attractive in the case of resource-constrained languages where the original retrieval is bad due to poor coverage of the collection and/or inherent complexity of query processing (for example term conflation) in those languages. [sent-72, score-0.367]
24 They use blind relevance feedback on a larger, more reliable parallel corpus to improve retrieval performance on imperfect transcriptions of speech. [sent-76, score-0.396]
25 Since our method uses a corpus in the assisting language from a similar time period, it can be likened to the work by Talvensaari et al. [sent-81, score-0.579]
26 In the LM approach, documents and queries are modeled using multinomial distributions over words, called the document language model P(w|D) and the query language model P(w|ΘQ) respectively. [sent-90, score-0.36]
27 KL(ΘQ || ΘD) = Σ_w P(w|ΘQ) · log ( P(w|ΘQ) / P(w|ΘD) ). Since the query length is short, it is difficult to estimate ΘQ accurately using the query alone. [sent-92, score-0.506]
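As a concrete illustration of the KL ranking function just given, here is a minimal Python sketch. The toy models and the epsilon floor are our own illustrative choices; in practice ΘD would be a smoothed document language model, not a raw dictionary.

```python
import math

def kl_score(theta_q, theta_d, epsilon=1e-12):
    # KL(theta_q || theta_d) = sum_w P(w|theta_q) * log(P(w|theta_q) / P(w|theta_d)).
    # Models are dicts mapping words to probabilities; lower divergence = better match.
    score = 0.0
    for w, p_q in theta_q.items():
        if p_q > 0.0:
            p_d = theta_d.get(w, epsilon)  # floor for words the document model misses
            score += p_q * math.log(p_q / p_d)
    return score

# Toy example: a two-term query model against a smoothed document model.
theta_q = {"oil": 0.5, "spill": 0.5}
theta_d = {"oil": 0.2, "spill": 0.1, "tanker": 0.1, "the": 0.6}
print(kl_score(theta_q, theta_d))  # documents are ranked by ascending KL
```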
28 In PRF, the top k documents obtained through the initial ranking algorithm are assumed to be relevant and used as feedback for improving the estimation of ΘQ. [sent-93, score-0.356]
29 The feedback documents contain both relevant and noisy terms from which the feedback language model is inferred based on a Generative Mixture Model (Zhai and Lafferty, 2001). [sent-94, score-0.684]
30 Zhai and Lafferty (2001) model the feedback document set DF as a mixture of two distributions: (a) the feedback language model and (b) the collection model P(w|C). [sent-99, score-0.659]
31 The feedback language model is inferred using the EM Algorithm (Dempster et al., 1977). [sent-100, score-0.289]
32 terms which are more frequent in the feedback document set than in the entire collection. [sent-103, score-0.35]
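A minimal sketch of this EM estimation, under our reading of Zhai and Lafferty (2001): each word occurrence in the feedback set is attributed either to the feedback model ΘF or to the collection model P(w|C) with a fixed mixing weight, and EM re-estimates ΘF from the expected counts. The mixing weight lam and the iteration count are illustrative, not the paper's settings.

```python
from collections import Counter

def estimate_feedback_model(feedback_docs, collection_model, lam=0.5, iters=30):
    # feedback_docs: list of token lists; collection_model: dict word -> P(w|C).
    counts = Counter(w for doc in feedback_docs for w in doc)
    theta_f = {w: 1.0 / len(counts) for w in counts}  # uniform initialization
    for _ in range(iters):
        # E-step: posterior that each occurrence of w came from the feedback model.
        post = {w: lam * theta_f[w] /
                   (lam * theta_f[w] + (1.0 - lam) * collection_model.get(w, 1e-9))
                for w in counts}
        # M-step: re-estimate theta_F from expected counts; probability mass shifts
        # toward words frequent in the feedback set but rare in the collection.
        expected = {w: counts[w] * post[w] for w in counts}
        total = sum(expected.values())
        theta_f = {w: c / total for w, c in expected.items()}
    return theta_f
```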
33 To maintain query focus, the final converged feedback model ΘF is interpolated with the initial query model ΘQ to obtain the final query model ΘFinal. [sent-104, score-1.088]
34 Given a query Q in the source language L1, we automatically translate the query into the assisting language L2. [sent-108, score-1.146]
35 We then rank the documents in the L2 collection using the query likelihood ranking function (Lafferty and Zhai, 2003). [sent-109, score-0.32]
36 Using the top k documents, we estimate the feedback model using MBF as described in the previous section. [sent-110, score-0.309]
37 Similarly, we also estimate a feedback model using the original query and the top k documents retrieved from the initial ranking in L1. [sent-111, score-0.629]
38 Hence, the probabilistic bi-lingual dictionary acts as a rich source of morphologically and semantically related feedback terms. [sent-122, score-0.371]
39 Thus, during this step of translating the feedback model as given in Equation 1, the translation model adds related terms in L1 which have their source as a term from the L2 feedback model ΘF^L2. [sent-123, score-0.783]
40 The parameters β and γ control the relative importance of the original query model, the feedback model of L1 and the translated feedback model obtained from L2, and are tuned based on the choice of L1 and L2. [sent-125, score-0.916]
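Equations 1 and 2 are not reproduced in this extract, so the sketch below reflects our reading of the two steps: the L2 feedback model is mapped into L1 through a probabilistic bilingual dictionary t(w|f), and the result is linearly interpolated with the original query model and the L1 feedback model using β and γ. The dict-of-dicts translation table is an illustrative data structure, not the authors' format.

```python
from collections import defaultdict

def back_translate(theta_f_l2, trans_table):
    # Our reading of Equation 1: P(w|theta_trans) = sum_f t(w|f) * P(f|theta_F^L2),
    # where trans_table[f] maps an L2 term f to L1 translations w with weight t(w|f).
    theta_trans = defaultdict(float)
    for f, p_f in theta_f_l2.items():
        for w, t_wf in trans_table.get(f, {}).items():
            theta_trans[w] += t_wf * p_f
    return dict(theta_trans)

def combine_models(theta_q, theta_f_l1, theta_trans, beta, gamma):
    # Our reading of Equation 2: a linear interpolation
    # (1 - beta - gamma) * theta_Q + beta * theta_F^L1 + gamma * theta_trans,
    # with beta and gamma tuned per (L1, L2) pair.
    vocab = set(theta_q) | set(theta_f_l1) | set(theta_trans)
    return {w: (1.0 - beta - gamma) * theta_q.get(w, 0.0)
              + beta * theta_f_l1.get(w, 0.0)
              + gamma * theta_trans.get(w, 0.0)
            for w in vocab}
```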
41 Note that, in each experiment, we choose assisting collections such that the topics in the source language are covered in the assisting collection so as to get meaningful feedback terms. [sent-128, score-1.58]
42 We demonstrate the performance of the MultiPRF approach with French, German and Finnish as source languages and Dutch, English and Spanish as the assisting languages. [sent-131, score-0.734]
43 We later vary the assisting language for each source language and study the effects. [sent-132, score-0.64]
44 Table 4: Results comparing the performance of MultiPRF over baseline MBF on CLEF collections with English (EN), Spanish (ES) and Dutch (NL) as assisting languages. [sent-366, score-0.64]
45 We use Google Translate as the query translation system as it has been shown to perform well for the task (Wu et al. [sent-375, score-0.297]
46 6 Results and Discussion In Table 4, we see the performance of the MultiPRF approach for three assisting languages, and how it compares with the baseline MBF methods. [sent-382, score-0.604]
47 Furthermore, we notice that these trends hold across different assisting languages, with Spanish and Dutch outperforming English as the assisting language on some of the French and German collections. [sent-391, score-1.158]
48 On performing a more detailed study of the results, we identify that the main reason for improvements in our approach is the ability to obtain good feedback terms in the assisting language, coupled with the introduction of lexically and semantically related terms during the backtranslation step. [sent-392, score-1.051]
49 In Table 5, we see some examples which illustrate the feedback terms brought in by the MultiPRF method. [sent-393, score-0.329]
50 In this case the original feedback model also performs poorly. [sent-396, score-0.309]
51 Although there is no significant topic drift in this case, there are not many relevant terms apart from the query terms. [sent-418, score-0.316]
52 However, the same query performs very well in English, with all the documents in the feedback set of the English corpus being relevant, thus resulting in informative feedback terms such as {bovin, scientif, recherch}. [sent-419, score-0.917]
53 (b) Finding Related Terms: Another situation in which MultiPRF leads to large improvements is when it finds semantically/lexically related terms to the query terms which the original feedback model was unable to. [sent-421, score-0.671]
54 While the feedback model was unable to find any of the synonyms of the query terms, due to their lack of co-occurrence with the query terms, the MultiPRF model was able to get these terms, which are introduced primarily during the backtranslation process. [sent-423, score-0.867]
55 (c) Combination of Above Factors: Sometimes a combination of the above two factors causes improvements in the performance, as in the German query “Ölkatastrophe in Sibirien”. [sent-426, score-0.307]
56 For this query, MultiPRF finds good feedback terms such as {russisch, russland} while also obtaining semantically related terms such as {olverschmutz, erdol, olunfall}. [sent-427, score-0.349]
57 Although all of the previously described examples had good quality translations of the query in the assisting language, as mentioned in (Chinnakotla et al., 2010). [sent-428, score-0.852]
58 To see how MultiPRF leads to improvements even with errors in query translation, consider the German query “Siegerinnen von Wimbledon”. [sent-430, score-0.347]
59 However, while the MultiPRF model has some terms pertaining to Men’s Winners of Wimbledon as well, the original feedback model suffers from severe topic drift, with irrelevant terms such as {telefonbuch, telekom} also amongst the top terms. [sent-433, score-0.409]
60 Thus we notice that despite the error in query translation, MultiPRF still manages to correct the drift of the original feedback model, while also introducing relevant terms such as {verfecht, steffi, martina, novotna, navratilova} as well. [sent-434, score-0.649]
61 As mentioned in (Chinnakotla et al., 2010), having a better query translation system can only lead to better performance. [sent-436, score-0.297]
62 We also performed a detailed error analysis and found three main reasons for MultiPRF failing: (i) Inaccuracies in query translation (including the presence of out-of-vocabulary terms). [sent-437, score-0.297]
63 Consider the French query Les droits de l’enfant, for which, due to topic drift in English, MultiPRF performance degrades. [sent-440, score-0.322]
64 1 Parameter Sensitivity Analysis The MultiPRF parameters β and γ in Equation 2 control the relative importance assigned to the original feedback model in source language L1, the translated feedback model obtained from assisting language L2 and the original query terms. [sent-444, score-1.556]
65 We varied the β and γ parameters for the French, German and Finnish collections with English, Dutch and Spanish as assisting languages and studied their effect on the MAP of MultiPRF. [sent-445, score-0.684]
66 2 Effect of Assisting Language Choice In this section, we discuss the effect of varying the assisting language. [sent-452, score-0.579]
67 For each source language, we use the other languages as assisting collections and study the performance of MultiPRF. [sent-456, score-0.77]
68 Since query translation quality varies across language pairs, we analyze the behaviour of MultiPRF in the following two scenarios: (a) Using ideal query translation (b) Using Google Translate for query translation. [sent-457, score-0.847]
69 In the ideal query translation setup, in order to eliminate its effect, we skip the query translation step and use the corresponding original topics for each target language instead. [sent-458, score-0.63]
70 From the results, we firstly observe that besides English, other languages such as French, Spanish, German and Dutch act as good assisting languages and help in improving performance over monolingual MBF. [sent-460, score-0.787]
71 We also observe that the best assisting language varies with the source language. [sent-461, score-0.64]
72 However, the crucial factors of the assisting language which influence the performance of MultiPRF are: (a) Monolingual PRF Performance: The main motivation for using a different language was to get good feedback terms, especially in case of queries which fail in the source language. [sent-462, score-0.994]
73 Hence, an assisting language in which the monolingual feedback performance itself is poor is unlikely to give any performance gains. [sent-463, score-0.943]
74 Table 6: Results showing the performance of MultiPRF with different source and assisting languages, using Google Translate for the query translation step. [sent-654, score-1.031]
75 (b) Familial Similarity Between Languages: We observe that the performance of MultiPRF is good if the assisting language is from the same language family. [sent-657, score-0.624]
76 Hence, the query translation and back translation quality improves if the source and assisting languages belong to the same family. [sent-660, score-1.081]
77 In some cases, we observe that MultiPRF scores decent improvements even when the assisting language does not belong to the same language family, as witnessed in French-English and English-French. [sent-663, score-0.642]
78 3 Effect of Language Family on Back Translation Performance As already mentioned, the performance of MultiPRF is good if the source and assisting languages belong to the same family. [sent-666, score-0.754]
79 The experiment is designed as follows: given a query in source language L1, the ideal translation in assisting language L2 is used to compute the query model in L2 using only the query terms. [sent-668, score-1.463]
80 This query model is directly back translated from L2 into L1 and finally documents are re-ranked using this translated feedback model. [sent-680, score-0.456]
81 Since the automatic query translation and PRF steps have been eliminated, the only factor which influences the MultiPRF performance is the back-translation step. [sent-681, score-0.322]
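Under our reading, this diagnostic can be sketched as follows, reusing the hypothetical back_translate and kl_score helpers from the earlier sketches; the maximum-likelihood query model and the KL-based re-ranking are illustrative choices, not the paper's exact setup.

```python
from collections import Counter

def ml_query_model(terms):
    # Maximum-likelihood model over the ideally translated L2 query terms.
    counts = Counter(terms)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def back_translation_probe(query_terms_l2, trans_table, doc_models_l1):
    # Skip automatic query translation and PRF entirely: back-translate the
    # plain L2 query model into L1 and re-rank the L1 documents with it alone,
    # so only the back-translation step can influence the result.
    theta_trans = back_translate(ml_query_model(query_terms_l2), trans_table)
    return sorted(doc_models_l1,
                  key=lambda d: kl_score(theta_trans, doc_models_l1[d]))
```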
82 For each source language, the best performing assisting languages have been highlighted. [sent-684, score-0.709]
83 Hence, familial closeness of the assisting language helps in boosting the MultiPRF performance. [sent-687, score-0.632]
84 An exception to this trend is English as assisting language. [sent-688, score-0.579]
85 Table 7: Results showing the performance of MultiPRF without using automatic query translation, i.e., with ideal query translation. [sent-883, score-0.322]
86 4 Multiple Assisting Languages So far, we have only considered a single assisting language. [sent-889, score-0.579]
87 However, a natural extension to the method which comes to mind is using multiple assisting languages. [sent-890, score-0.579]
88 In other words, we combine the evidence from the feedback models of more than one assisting language to get a feedback model which is better than that obtained using a single assisting language. [sent-891, score-1.756]
89 To check how this simple extension works, we performed experiments using a pair of assisting languages. [sent-892, score-0.579]
90 In these experiments, for a given source language (from amongst the 6 previously mentioned languages), we tried all pairs of assisting languages (for each source language, 10 pairs are possible). [sent-893, score-0.77]
91 To obtain the final model, we simply interpolate all the feedback models with the initial query model, in a similar manner as done in MultiPRF. [sent-894, score-0.542]
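A minimal sketch of that interpolation, assuming one back-translated feedback model per assisting language: each model gets its own weight and the initial query model receives the remaining probability mass. The weights are illustrative; the paper's tuned values and exact weighting scheme are not reproduced in this extract.

```python
def combine_multi_assist(theta_q, feedback_models, weights):
    # feedback_models: the L1 feedback model plus one back-translated model
    # per assisting language; weights[i] scales feedback_models[i].
    assert len(weights) == len(feedback_models) and 0.0 < sum(weights) < 1.0
    vocab = set(theta_q).union(*map(set, feedback_models))
    q_weight = 1.0 - sum(weights)  # remaining mass for the initial query model
    return {w: q_weight * theta_q.get(w, 0.0)
              + sum(wt * m.get(w, 0.0)
                    for wt, m in zip(weights, feedback_models))
            for w in vocab}
```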
92 As we see, out of the 60 possible combinations of source language and assisting language pairs, we obtain improvements of greater than 3% in 16 cases. [sent-896, score-0.669]
93 Here the improvements are with respect to the best model amongst the two MultiPRF models corresponding to each of the two assisting languages, with the same source language. [sent-897, score-0.689]
94 Thus we observe that a simple linear interpolation of models is not the best way of combining evidence from multiple assisting languages. [sent-898, score-0.579]
95 We also observe that when German or Spanish is used as one of the two assisting languages, it is most likely to yield improvements. [sent-899, score-0.579]
96 The improvements described above are with respect to the maximum MultiPRF MAP obtained using either L1 or L2 alone as the assisting language. [sent-916, score-0.608]
97 7 Conclusion and Future Work We studied the effect of different source-assistant pairs and multiple assisting languages on the performance of MultiPRF. [sent-919, score-0.673]
98 Experiments across a wide range of language pairs with varied degrees of familial relationship show that MultiPRF improves performance in most cases, with the performance improvement being more pronounced when the source and assisting languages are closely related. [sent-920, score-0.834]
99 We also notice that the results are mixed when two assisting languages are used simultaneously. [sent-921, score-0.648]
100 As part of future work, we plan to vary the model interpolation parameters dynamically to improve the performance in case of multiple assisting languages. [sent-922, score-0.624]
wordName wordTfidf (topN-words)
[('assisting', 0.579), ('multiprf', 0.517), ('feedback', 0.289), ('query', 0.253), ('prf', 0.231), ('french', 0.139), ('mbf', 0.137), ('finnish', 0.081), ('dutch', 0.073), ('languages', 0.069), ('german', 0.067), ('spanish', 0.065), ('chinnakotla', 0.063), ('nl', 0.063), ('zhai', 0.062), ('source', 0.061), ('familial', 0.053), ('sigir', 0.05), ('map', 0.047), ('lafferty', 0.047), ('documents', 0.046), ('monde', 0.046), ('translated', 0.045), ('retrieval', 0.045), ('translation', 0.044), ('sda', 0.042), ('spiegel', 0.042), ('tgoeprimca', 0.042), ('expansion', 0.042), ('terms', 0.04), ('es', 0.039), ('chengxiang', 0.038), ('relevance', 0.037), ('topics', 0.036), ('collections', 0.036), ('clef', 0.036), ('family', 0.034), ('trec', 0.033), ('asit', 0.032), ('backtranslation', 0.032), ('braschler', 0.032), ('gmap', 0.032), ('stedra', 0.032), ('wimbledon', 0.032), ('back', 0.031), ('bruce', 0.03), ('improvements', 0.029), ('xu', 0.029), ('whatever', 0.029), ('america', 0.029), ('english', 0.028), ('tao', 0.028), ('lavrenko', 0.028), ('mitra', 0.025), ('amati', 0.025), ('voorhees', 0.025), ('monolingual', 0.025), ('performance', 0.025), ('selective', 0.024), ('drift', 0.023), ('cikm', 0.022), ('lexically', 0.022), ('pronounced', 0.022), ('jones', 0.021), ('dictionary', 0.021), ('droits', 0.021), ('ericain', 0.021), ('ftorepincc', 0.021), ('inews', 0.021), ('jourlin', 0.021), ('karthik', 0.021), ('lateinamerika', 0.021), ('luc', 0.021), ('manoj', 0.021), ('meij', 0.021), ('mprf', 0.021), ('mulitiprf', 0.021), ('navratilova', 0.021), ('nierie', 0.021), ('raman', 0.021), ('sakai', 0.021), ('sourceassisting', 0.021), ('talvensaari', 0.021), ('telefonbuch', 0.021), ('terrier', 0.021), ('tiedemann', 0.021), ('tique', 0.021), ('verfecht', 0.021), ('document', 0.021), ('ranking', 0.021), ('von', 0.021), ('queries', 0.02), ('lm', 0.02), ('buckley', 0.02), ('good', 0.02), ('model', 0.02), ('multilingual', 0.019), ('en', 0.019), ('glasgow', 0.018), ('ofterm', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 177 acl-2010-Multilingual Pseudo-Relevance Feedback: Performance Study of Assisting Languages
Author: Manoj Kumar Chinnakotla ; Karthik Raman ; Pushpak Bhattacharyya
Abstract: In a previous work of ours, Chinnakotla et al. (2010), we introduced a novel framework for Pseudo-Relevance Feedback (PRF) called MultiPRF. Given a query in one language called Source, we used English as the Assisting Language to improve the performance of PRF for the source language. MultiPRF showed remarkable improvement over plain Model Based Feedback (MBF) uniformly for 4 languages, viz., French, German, Hungarian and Finnish, with English as the assisting language. This fact inspired us to study the effect of any source-assistant pair on MultiPRF performance from a set of languages with widely different characteristics, viz., Dutch, English, Finnish, French, German and Spanish. Carrying this further, we looked into the effect of using two assisting languages together on PRF. The present paper is a report of these investigations, their results and the conclusions drawn therefrom. While performance improvement on MultiPRF is observed whatever the assisting language and whatever the source, observations are mixed when two assisting languages are used simultaneously. Interestingly, the performance improvement is more pronounced when the source and assisting languages are closely related, e.g., French and Spanish.
2 0.11638124 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries
Author: Xiao Li
Abstract: Determining the semantic intent of web queries not only involves identifying their semantic class, which is a primary focus of previous works, but also understanding their semantic structure. In this work, we formally define the semantic structure of noun phrase queries as comprised of intent heads and intent modifiers. We present methods that automatically identify these constituents as well as their semantic roles based on Markov and semi-Markov conditional random fields. We show that the use of semantic features and syntactic features significantly contribute to improving the understanding performance.
3 0.11435448 164 acl-2010-Learning Phrase-Based Spelling Error Models from Clickthrough Data
Author: Xu Sun ; Jianfeng Gao ; Daniel Micol ; Chris Quirk
Abstract: This paper explores the use of clickthrough data for query spelling correction. First, large amounts of query-correction pairs are derived by analyzing users' query reformulation behavior encoded in the clickthrough data. Then, a phrase-based error model that accounts for the transformation probability between multi-term phrases is trained and integrated into a query speller system. Experiments are carried out on a human-labeled data set. Results show that the system using the phrase-based error model significantly outperforms its baseline systems.
4 0.1103209 106 acl-2010-Event-Based Hyperspace Analogue to Language for Query Expansion
Author: Tingxu Yan ; Tamsin Maxwell ; Dawei Song ; Yuexian Hou ; Peng Zhang
Abstract: Bag-of-words approaches to information retrieval (IR) are effective but assume independence between words. The Hyperspace Analogue to Language (HAL) is a cognitively motivated and validated semantic space model that captures statistical dependencies between words by considering their co-occurrences in a surrounding window of text. HAL has been successfully applied to query expansion in IR, but has several limitations, including high processing cost and use of distributional statistics that do not exploit syntax. In this paper, we pursue two methods for incorporating syntactic-semantic information from textual ‘events’ into HAL. We build the HAL space directly from events to investigate whether processing costs can be reduced through more careful definition of word co-occurrence, and improve the quality of the pseudo-relevance feedback by applying event information as a constraint during HAL construction. Both methods significantly improve performance results in comparison with original HAL, and interpolation of HAL and relevance model expansion outperforms either method alone.
5 0.07166671 58 acl-2010-Classification of Feedback Expressions in Multimodal Data
Author: Costanza Navarretta ; Patrizia Paggio
Abstract: This paper addresses the issue of how linguistic feedback expressions, prosody and head gestures, i.e. head movements and face expressions, relate to one another in a collection of eight video-recorded Danish map-task dialogues. The study shows that in these data, prosodic features and head gestures significantly improve automatic classification of dialogue act labels for linguistic expressions of feedback.
6 0.068156324 123 acl-2010-Generating Focused Topic-Specific Sentiment Lexicons
7 0.06569583 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices
8 0.064822704 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment
9 0.064393513 22 acl-2010-A Unified Graph Model for Sentence-Based Opinion Retrieval
10 0.055563841 195 acl-2010-Phylogenetic Grammar Induction
11 0.054790895 79 acl-2010-Cross-Lingual Latent Topic Extraction
12 0.053327274 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction
13 0.052980561 78 acl-2010-Cross-Language Text Classification Using Structural Correspondence Learning
14 0.052159213 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
15 0.046677399 105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems
16 0.044656713 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach
17 0.041415695 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
18 0.040506817 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
19 0.039415997 18 acl-2010-A Study of Information Retrieval Weighting Schemes for Sentiment Analysis
20 0.038948309 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages
topicId topicWeight
[(0, -0.119), (1, -0.0), (2, -0.078), (3, 0.026), (4, 0.032), (5, -0.023), (6, -0.015), (7, -0.001), (8, 0.018), (9, -0.01), (10, 0.008), (11, 0.027), (12, 0.007), (13, -0.073), (14, -0.017), (15, 0.056), (16, -0.043), (17, -0.005), (18, -0.014), (19, -0.004), (20, -0.016), (21, -0.186), (22, 0.033), (23, -0.004), (24, 0.077), (25, -0.013), (26, 0.068), (27, -0.014), (28, -0.015), (29, 0.02), (30, 0.054), (31, -0.033), (32, -0.002), (33, -0.164), (34, 0.126), (35, -0.013), (36, -0.037), (37, -0.09), (38, 0.011), (39, 0.016), (40, 0.166), (41, -0.116), (42, 0.05), (43, -0.018), (44, 0.129), (45, 0.206), (46, 0.022), (47, -0.111), (48, 0.03), (49, 0.069)]
simIndex simValue paperId paperTitle
same-paper 1 0.94463354 177 acl-2010-Multilingual Pseudo-Relevance Feedback: Performance Study of Assisting Languages
Author: Manoj Kumar Chinnakotla ; Karthik Raman ; Pushpak Bhattacharyya
Abstract: In a previous work of ours, Chinnakotla et al. (2010), we introduced a novel framework for Pseudo-Relevance Feedback (PRF) called MultiPRF. Given a query in one language called Source, we used English as the Assisting Language to improve the performance of PRF for the source language. MultiPRF showed remarkable improvement over plain Model Based Feedback (MBF) uniformly for 4 languages, viz., French, German, Hungarian and Finnish, with English as the assisting language. This fact inspired us to study the effect of any source-assistant pair on MultiPRF performance from a set of languages with widely different characteristics, viz., Dutch, English, Finnish, French, German and Spanish. Carrying this further, we looked into the effect of using two assisting languages together on PRF. The present paper is a report of these investigations, their results and the conclusions drawn therefrom. While performance improvement on MultiPRF is observed whatever the assisting language and whatever the source, observations are mixed when two assisting languages are used simultaneously. Interestingly, the performance improvement is more pronounced when the source and assisting languages are closely related, e.g., French and Spanish.
2 0.72473556 164 acl-2010-Learning Phrase-Based Spelling Error Models from Clickthrough Data
Author: Xu Sun ; Jianfeng Gao ; Daniel Micol ; Chris Quirk
Abstract: This paper explores the use of clickthrough data for query spelling correction. First, large amounts of query-correction pairs are derived by analyzing users' query reformulation behavior encoded in the clickthrough data. Then, a phrase-based error model that accounts for the transformation probability between multi-term phrases is trained and integrated into a query speller system. Experiments are carried out on a human-labeled data set. Results show that the system using the phrase-based error model significantly outperforms its baseline systems.
3 0.64419997 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries
Author: Xiao Li
Abstract: Determining the semantic intent of web queries not only involves identifying their semantic class, which is a primary focus of previous works, but also understanding their semantic structure. In this work, we formally define the semantic structure of noun phrase queries as comprised of intent heads and intent modifiers. We present methods that automatically identify these constituents as well as their semantic roles based on Markov and semi-Markov conditional random fields. We show that the use of semantic features and syntactic features significantly contribute to improving the understanding performance.
4 0.598764 106 acl-2010-Event-Based Hyperspace Analogue to Language for Query Expansion
Author: Tingxu Yan ; Tamsin Maxwell ; Dawei Song ; Yuexian Hou ; Peng Zhang
Abstract: Bag-of-words approaches to information retrieval (IR) are effective but assume independence between words. The Hyperspace Analogue to Language (HAL) is a cognitively motivated and validated semantic space model that captures statistical dependencies between words by considering their co-occurrences in a surrounding window of text. HAL has been successfully applied to query expansion in IR, but has several limitations, including high processing cost and use of distributional statistics that do not exploit syntax. In this paper, we pursue two methods for incorporating syntactic-semantic information from textual ‘events’ into HAL. We build the HAL space directly from events to investigate whether processing costs can be reduced through more careful definition of word co-occurrence, and improve the quality of the pseudo-relevance feedback by applying event information as a constraint during HAL construction. Both methods significantly improve performance results in comparison with original HAL, and interpolation of HAL and relevance model expansion outperforms either method alone.
5 0.38616127 215 acl-2010-Speech-Driven Access to the Deep Web on Mobile Devices
Author: Taniya Mishra ; Srinivas Bangalore
Abstract: The Deep Web is the collection of information repositories that are not indexed by search engines. These repositories are typically accessible through web forms and contain dynamically changing information. In this paper, we present a system that allows users to access such rich repositories of information on mobile devices using spoken language.
6 0.37369135 79 acl-2010-Cross-Lingual Latent Topic Extraction
7 0.33343238 123 acl-2010-Generating Focused Topic-Specific Sentiment Lexicons
8 0.33200076 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages
9 0.33129102 222 acl-2010-SystemT: An Algebraic Approach to Declarative Information Extraction
10 0.32969123 151 acl-2010-Intelligent Selection of Language Model Training Data
11 0.31295371 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web
12 0.3054668 105 acl-2010-Evaluating Multilanguage-Comparability of Subjectivity Analysis Systems
13 0.30474362 61 acl-2010-Combining Data and Mathematical Models of Language Change
14 0.29281795 195 acl-2010-Phylogenetic Grammar Induction
15 0.2770662 50 acl-2010-Bilingual Lexicon Generation Using Non-Aligned Signatures
16 0.27672133 80 acl-2010-Cross Lingual Adaptation: An Experiment on Sentiment Classifications
17 0.27088964 43 acl-2010-Automatically Generating Term Frequency Induced Taxonomies
18 0.268426 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features
19 0.26821649 180 acl-2010-On Jointly Recognizing and Aligning Bilingual Named Entities
20 0.26539919 259 acl-2010-WebLicht: Web-Based LRT Services for German
topicId topicWeight
[(14, 0.016), (16, 0.014), (25, 0.034), (33, 0.017), (39, 0.015), (42, 0.029), (54, 0.307), (59, 0.077), (71, 0.013), (72, 0.016), (73, 0.042), (76, 0.017), (78, 0.026), (83, 0.057), (84, 0.038), (98, 0.161)]
simIndex simValue paperId paperTitle
same-paper 1 0.74347031 177 acl-2010-Multilingual Pseudo-Relevance Feedback: Performance Study of Assisting Languages
Author: Manoj Kumar Chinnakotla ; Karthik Raman ; Pushpak Bhattacharyya
Abstract: In a previous work of ours, Chinnakotla et al. (2010), we introduced a novel framework for Pseudo-Relevance Feedback (PRF) called MultiPRF. Given a query in one language called Source, we used English as the Assisting Language to improve the performance of PRF for the source language. MultiPRF showed remarkable improvement over plain Model Based Feedback (MBF) uniformly for 4 languages, viz., French, German, Hungarian and Finnish, with English as the assisting language. This fact inspired us to study the effect of any source-assistant pair on MultiPRF performance from a set of languages with widely different characteristics, viz., Dutch, English, Finnish, French, German and Spanish. Carrying this further, we looked into the effect of using two assisting languages together on PRF. The present paper is a report of these investigations, their results and the conclusions drawn therefrom. While performance improvement on MultiPRF is observed whatever the assisting language and whatever the source, observations are mixed when two assisting languages are used simultaneously. Interestingly, the performance improvement is more pronounced when the source and assisting languages are closely related, e.g., French and Spanish.
2 0.72275126 164 acl-2010-Learning Phrase-Based Spelling Error Models from Clickthrough Data
Author: Xu Sun ; Jianfeng Gao ; Daniel Micol ; Chris Quirk
Abstract: This paper explores the use of clickthrough data for query spelling correction. First, large amounts of query-correction pairs are derived by analyzing users' query reformulation behavior encoded in the clickthrough data. Then, a phrase-based error model that accounts for the transformation probability between multi-term phrases is trained and integrated into a query speller system. Experiments are carried out on a human-labeled data set. Results show that the system using the phrase-based error model significantly outperforms its baseline systems.
3 0.71572751 198 acl-2010-Predicate Argument Structure Analysis Using Transformation Based Learning
Author: Hirotoshi Taira ; Sanae Fujita ; Masaaki Nagata
Abstract: Maintaining high annotation consistency in large corpora is crucial for statistical learning; however, such work is hard, especially for tasks containing semantic elements. This paper describes predicate argument structure analysis using transformation-based learning. An advantage of transformation-based learning is the readability of learned rules. A disadvantage is that the rule extraction procedure is time-consuming. We present incremental-based, transformation-based learning for semantic processing tasks. As an example, we deal with Japanese predicate argument analysis and show some tendencies of annotators for constructing a corpus with our method.
4 0.68285513 32 acl-2010-Arabic Named Entity Recognition: Using Features Extracted from Noisy Data
Author: Yassine Benajiba ; Imed Zitouni ; Mona Diab ; Paolo Rosso
Abstract: Building an accurate Named Entity Recognition (NER) system for languages with complex morphology is a challenging task. In this paper, we present research that explores the feature space using both gold and bootstrapped noisy features to build an improved highly accurate Arabic NER system. We bootstrap noisy features by projection from an Arabic-English parallel corpus that is automatically tagged with a baseline NER system. The feature space covers lexical, morphological, and syntactic features. The proposed approach yields an improvement of up to 1.64 F-measure (absolute).
5 0.53407043 79 acl-2010-Cross-Lingual Latent Topic Extraction
Author: Duo Zhang ; Qiaozhu Mei ; ChengXiang Zhai
Abstract: Probabilistic latent topic models have recently enjoyed much success in extracting and analyzing latent topics in text in an unsupervised way. One common deficiency of existing topic models, though, is that they would not work well for extracting cross-lingual latent topics simply because words in different languages generally do not co-occur with each other. In this paper, we propose a way to incorporate a bilingual dictionary into a probabilistic topic model so that we can apply topic models to extract shared latent topics in text data of different languages. Specifically, we propose a new topic model called Probabilistic Cross-Lingual Latent Semantic Analysis (PCLSA) which extends the Probabilistic Latent Semantic Analysis (PLSA) model by regularizing its likelihood function with soft constraints defined based on a bilingual dictionary. Both qualitative and quantitative experimental results show that the PCLSA model can effectively extract cross-lingual latent topics from multilingual text data.
6 0.53367859 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification
7 0.53310317 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models
8 0.53201127 133 acl-2010-Hierarchical Search for Word Alignment
9 0.53011662 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints
10 0.53006536 136 acl-2010-How Many Words Is a Picture Worth? Automatic Caption Generation for News Images
11 0.52920818 170 acl-2010-Letter-Phoneme Alignment: An Exploration
12 0.5292002 22 acl-2010-A Unified Graph Model for Sentence-Based Opinion Retrieval
13 0.52877957 188 acl-2010-Optimizing Informativeness and Readability for Sentiment Summarization
14 0.5273701 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
15 0.52685547 14 acl-2010-A Risk Minimization Framework for Extractive Speech Summarization
16 0.52669835 80 acl-2010-Cross Lingual Adaptation: An Experiment on Sentiment Classifications
17 0.52542406 146 acl-2010-Improving Chinese Semantic Role Labeling with Rich Syntactic Features
18 0.52525449 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels
19 0.52386117 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction
20 0.52311045 253 acl-2010-Using Smaller Constituents Rather Than Sentences in Active Learning for Japanese Dependency Parsing