acl acl2010 acl2010-161 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Paramveer S. Dhillon ; Partha Pratim Talukdar ; Koby Crammer
Abstract: We initiate a study comparing the effectiveness of the transformed spaces learned by recently proposed supervised and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA). Through a variety of experiments on different real-world datasets, we find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective.
Reference: text
sentIndex sentText sentNum sentScore
1 A Mountain View, CA, USA The Technion, Haifa, Israel dhillon@cis. [sent-9, score-0.066]
2 Abstract We initiate a study comparing the effectiveness of the transformed spaces learned by recently proposed supervised and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e. [sent-15, score-0.898]
3 Through a variety of experiments on different real-world datasets, we find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective. [sent-18, score-0.387]
4 1 Introduction Because of the high-dimensional nature of NLP datasets, estimating a large number of parameters (a parameter for each dimension), often from a limited amount of labeled data, is a challenging task for statistical learners. [sent-19, score-0.061]
5 Faced with this challenge, various unsupervised dimensionality reduction methods have been developed over the years, e. [sent-20, score-0.175]
6 Recently, several supervised metric learning algorithms have been proposed (Davis et al. [sent-23, score-0.389]
7 , 2010) is another such method which exploits labeled as well as unlabeled data during metric learning. [sent-26, score-0.416]
8 These methods learn a Mahalanobis distance metric to compute distance between a pair of data instances, which can also be interpreted as learning a transformation of the input data, as we shall see in Section 2. [sent-27, score-0.597]
9 In this paper, we address that gap: we compare the effectiveness of classifiers trained on the transformed spaces learned by metric learning methods to those generated by previously proposed unsupervised dimensionality reduction methods. [sent-31, score-0.786]
10 We find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective. [sent-32, score-0.314]
11 1 Relationship between Metric Learning and Linear Projection We first establish the well-known equivalence between learning a Mahalanobis distance measure and computing the Euclidean distance in a linearly transformed space of the data (Weinberger and Saul, 2009). [sent-34, score-0.449]
12 Let A be a d × d positive definite matrix which parameterizes the Mahalanobis distance, dA(xi, xj), between instances xi and xj, as shown in Equation 1. [sent-35, score-0.514]
13 Since A is positive definite, we can decompose it as A = PᵀP, where P is another matrix of size d × d. [sent-36, score-0.238]
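For reference, a standard statement of the Mahalanobis distance that the text refers to as Equation 1, together with the equivalence obtained through the decomposition A = PᵀP, is sketched below; the exact typesetting of the original equation is not recoverable from the extraction, and whether the square root is included is a notational choice.

    d_A(x_i, x_j) = \sqrt{(x_i - x_j)^\top A \,(x_i - x_j)}                                % Equation 1, standard form
    d_A(x_i, x_j)^2 = (x_i - x_j)^\top P^\top P \,(x_i - x_j) = \lVert P x_i - P x_j \rVert_2^2   % with A = P^T P

So learning A is equivalent to learning a linear map P and measuring Euclidean distance after projection.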
14 In this paper, we are interested in learning a better representation of the data (i. [sent-38, score-0.044]
15 , projection matrix P), and we shall achieve that goal by learning the corresponding Mahalanobis distance parameter A. [sent-40, score-0.487]
16 We shall now review two recently proposed metric learning algorithms. [sent-41, score-0.41]
17 , 2007) assumes the availability of prior knowledge about inter-instance distances. [sent-46, score-0.04]
18 In this scheme, two instances are considered similar if the Mahalanobis distance between them is upper bounded, i. [sent-47, score-0.249]
19 Similarly, two instances are considered dissimilar if the distance between them is larger than a certain threshold l, i.e. [sent-50, score-0.093]
20 Similar instances are represented by the set S. [sent-53, score-0.156]
21 In addition to prior knowledge about inter-instance distances, sometimes prior information about the matrix A itself, denoted by A0, may also be available. [sent-55, score-0.27]
22 In such cases, we would like the learned matrix A to be as close as possible to the prior matrix A0. [sent-59, score-0.492]
23 ITML combines these two types of prior information, i. [sent-60, score-0.04]
24 , knowledge about inter-instance distances, and prior matrix A0, in order to learn the matrix A by solving the optimization problem shown in (2). [sent-62, score-0.42]
25 When exactly solving the problem in (2) is not possible, slack variables may be introduced into the ITML objective. [sent-68, score-0.028]
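As a hedged sketch, the optimization referred to as (2) is, in the commonly cited formulation of Davis et al. (2007), a LogDet (Bregman) divergence minimization toward the prior A0, subject to the similarity and dissimilarity constraints above; the slack-variable variant penalizes constraint violations. The paper's exact (2) may differ in notation (here D denotes the set of dissimilar pairs, d the dimensionality):

    \min_{A \succeq 0}\; D_{\ell d}(A, A_0) = \mathrm{tr}(A A_0^{-1}) - \log\det(A A_0^{-1}) - d
    \text{s.t.}\;\; d_A(x_i, x_j) \le u \;\;\forall (x_i, x_j) \in S, \qquad d_A(x_i, x_j) \ge l \;\;\forall (x_i, x_j) \in D
    % slack version: add \gamma \sum_{ij} \xi_{ij} to the objective and relax each bound by \xi_{ij}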
26 Let X be the d × n matrix of n instances in a d-dimensional space. [sent-73, score-0.346]
27 Out of the n instances, nl instances are labeled, while the remaining nu instances are unlabeled, with n = nl + nu. [sent-74, score-0.752]
28 Let S be an n × n diagonal matrix with Sii = 1 iff instance xi is labeled. [sent-75, score-0.393]
29 Y is the n × m matrix storing training label information, if any. [sent-77, score-0.257]
30 , output of any classifier, with Yˆil denoting the score of label l at node i. [sent-80, score-0.067]
31 The ITML metric learning algorithm, which we reviewed in Section 2. [sent-82, score-0.314]
32 2, is supervised in nature, and hence it does not exploit widely available unlabeled data. [sent-83, score-0.16]
33 , 2010), a recently proposed metric learning framework which combines an existing supervised metric learning algorithm (such as ITML) along with transductive graph-based label inference to learn a new distance metric from labeled as well as unlabeled data combined. [sent-85, score-1.37]
34 In self-training style iterations, IDML alternates between metric learning and label inference, with the output of label inference used during the next round of metric learning, and so on. [sent-86, score-0.769]
35 IDML starts out with the assumption that existing supervised metric learning algorithms, such as ITML, can learn a better metric if the number of available labeled instances is increased. [sent-87, score-0.876]
36 Since we are focusing on the semi-supervised learning (SSL) setting with nl labeled and nu unlabeled instances, the idea is to automatically label the unlabeled instances using a graph-based SSL algorithm, and then include instances with low assigned label entropy (i.e. [sent-88, score-1.073]
37 , high confidence label assignments) in the next round of metric learning. [sent-90, score-0.388]
38 The number of instances added in each iteration depends on the threshold β1. [sent-91, score-0.156]
39 This process is continued until no new instances can be added to the set of labeled instances, which can happen when either all the instances are already exhausted, or when none of the remaining unlabeled instances can be assigned labels with high confidence. [sent-92, score-0.692]
40 In Line 3, any supervised metric learner, such as ITML, may be used as the METRICLEARNER. [sent-94, score-0.345]
41 Using the distance metric learned in Line 3, a new k-NN graph is constructed in Line 4, whose edge weight matrix is stored in W. [sent-95, score-0.661]
42 , Uii = 0, ∀i) 10: return A ... the graph Laplacian, and D is a diagonal matrix with Dii = Σj Wij. [sent-103, score-0.227]
43 The constraint in (3) makes sure that labels on training instances are not changed during inference. [sent-104, score-0.19]
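For context, the graph-regularized label inference (GRF) of Zhu et al. (2003), which Equation 3 builds on, is usually written as a harmonic-function problem over the graph Laplacian L = D − W; the paper's (3) additionally uses the matrices S, U, and Yˆ defined above, so the sketch below only indicates the general shape rather than the paper's exact objective:

    \hat{Y} = \arg\min_{\hat{Y}} \; \mathrm{tr}\!\left(\hat{Y}^\top L \hat{Y}\right) \quad \text{s.t. } \hat{Y}_{i\cdot} = Y_{i\cdot} \text{ for every labeled instance } i
    % closed form on the unlabeled block:  \hat{Y}_U = - L_{UU}^{-1} L_{UL} Y_L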
44 , Sii = 0) is considered a new labeled training instance, i.e. [sent-107, score-0.061]
45 , Uii = 1, for the next round of metric learning if the instance has been assigned labels with high confidence in the current iteration, i.e. [sent-109, score-0.432]
46 Finally, in Line 7, training instance label information is updated. [sent-114, score-0.033]
47 This iterative process is continued until no new labeled instance can be added, i.e. [sent-116, score-0.138]
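A minimal Python sketch of the self-training loop described above, assuming hypothetical helpers metric_learner (e.g., an ITML implementation) and graph_label_inference (e.g., the GRF inference); the function names, signatures, the entropy threshold beta, and the stopping logic are illustrative rather than the authors' code.

    import numpy as np

    def label_entropy(Y_hat, eps=1e-12):
        # Entropy of each row of the (n, m) label-score matrix, after normalization.
        P = Y_hat / np.clip(Y_hat.sum(axis=1, keepdims=True), eps, None)
        return -(P * np.log(np.clip(P, eps, None))).sum(axis=1)

    def idml(X, y, labeled_mask, metric_learner, graph_label_inference,
             beta=0.05, max_iter=20):
        # X: (n, d) data; y: (n,) integer class indices, trusted only where labeled_mask is True.
        train_mask = labeled_mask.copy()       # instances treated as labeled in the current round
        y_train = y.copy()
        A = np.eye(X.shape[1])                 # start from the identity metric
        for _ in range(max_iter):
            # Line 3: supervised metric learning on the current labeled set.
            A = metric_learner(X[train_mask], y_train[train_mask], A0=A)
            # Line 4: k-NN graph under the new metric + transductive label inference.
            Y_hat = graph_label_inference(X, y_train, train_mask, A)   # (n, m) label scores
            # Add unlabeled instances whose inferred label distribution is confident (low entropy).
            confident = (~train_mask) & (label_entropy(Y_hat) <= beta)
            if not confident.any():
                break                           # nothing left to add: stop iterating
            y_train[confident] = Y_hat[confident].argmax(axis=1)
            train_mask |= confident
        return A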
48 1 Setup Table 1: Description of the datasets (and their dimensionalities) used in Section 3. [sent-122, score-0.028]
49 All datasets are binary with 1500 total instances in each. [sent-123, score-0.239]
50 Descriptions of the datasets used during the experiments in Section 3 are presented in Table 1. [sent-124, score-0.083]
51 The first four datasets (Electronics, Books, Kitchen, and DVDs) are from the sentiment domain and were previously used in (Blitzer et al. [sent-125, score-0.083]
52 For details regarding features and data pre-processing, we refer the reader to the origin of these datasets cited above. [sent-128, score-0.083]
53 We experiment with the following ways of estimating the transformation matrix P: Original2: We set P = I, where I is the d × d identity matrix. [sent-131, score-0.23]
54 RP: The data is first projected into a lower dimensional space using the Random Projection (RP) method (Bingham and Mannila, 2001). [sent-133, score-0.16]
55 Dimensionality of the target space was set at d0, as prescribed in (Bingham and Mannila, 2001). [sent-134, score-0.056]
56 We use the projection matrix constructed by RP as P. [sent-137, score-0.293]
57 25 for the experiments in Section 3, which has the effect of projecting the data into a much lower-dimensional space (84 dimensions for the experiments in this section). [sent-140, score-0.141]
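A sketch of the RP baseline under a Gaussian-projection assumption; Bingham and Mannila (2001) also study sparse projections, and the target dimensionality here simply reuses the 84 dimensions mentioned above.

    import numpy as np

    def random_projection_matrix(d, d0=84, seed=0):
        # Rows map R^d down to R^d0; entries drawn i.i.d. from N(0, 1/d0).
        rng = np.random.default_rng(seed)
        return rng.normal(scale=1.0 / np.sqrt(d0), size=(d0, d))

    # Usage: given an (n, d) data matrix X, the projected data is X @ P_rp.T
    # P_rp = random_projection_matrix(X.shape[1])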
58 This presents an interesting evaluation setting as we already run evaluations in a much higher-dimensional space (e. [sent-141, score-0.056]
59 PCA: Data instances are first projected into a lower-dimensional space using Principal Components Analysis (PCA) (Jolliffe, 2002). [sent-144, score-0.316]
60 Following (Weinberger and Saul, 2009), dimensionality of the projected space was set at 250 for all experiments. [sent-145, score-0.232]
61 In this case, we used the projection matrix generated by PCA as P. [sent-146, score-0.293]
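A corresponding sketch of the PCA baseline, assuming scikit-learn is available; the 250 components follow the description above, and pca.components_ plays the role of P (up to mean-centering of the data).

    from sklearn.decomposition import PCA

    def pca_projection_matrix(X, n_components=250):
        # Fit PCA on the (n, d) data matrix and return a (n_components, d) projection matrix.
        pca = PCA(n_components=n_components)
        pca.fit(X)
        return pca.components_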
62 ITML: A is learned by applying ITML (see Section 2. [sent-147, score-0.072]
63 2) on the Original space (above), and then we decompose A as A = PᵀP to obtain P. [sent-148, score-0.104]
64 2Note that "Original" in the results tables refers to the original space with features occurring more than 20 times. [sent-149, score-0.056]
65 All hyperparameters are tuned on a separate random split. [sent-156, score-0.205]
66 All hyperparameters are tuned on a separate random split. [sent-162, score-0.205]
67 IDML-IT: A is learned by applying IDML (Algorithm 1) (see Section 2. [sent-163, score-0.072]
68 3) on the Original space (above), with ITML used as the METRICLEARNER in IDML (Line 3 in Algorithm 1). [sent-164, score-0.056]
69 In this case, we treat the set of test instances (without their gold labels) as the unlabeled data. [sent-165, score-0.241]
70 In other words, we essentially work in the transductive setting (Vapnik, 2000). [sent-166, score-0.052]
71 Once again, we decompose A as A = PᵀP to obtain P. [sent-167, score-0.048]
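One way to recover P from a learned metric A, as a hedged sketch: a Cholesky factor works when A is strictly positive definite, and a symmetric square root from the eigendecomposition handles the positive semi-definite case.

    import numpy as np

    def projection_from_metric(A):
        # Return P such that A ≈ P.T @ P.
        try:
            return np.linalg.cholesky(A).T          # A = L L^T  =>  P = L^T
        except np.linalg.LinAlgError:
            w, V = np.linalg.eigh(A)                # symmetric square root fallback
            w = np.clip(w, 0.0, None)
            return (V * np.sqrt(w)) @ V.T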
72 We also experimented with the supervised large-margin metric learning algorithm (LMNN) presented in (Weinberger and Saul, 2009). [sent-168, score-0.389]
73 Each input instance, x, is now projected into the transformed space as Px. [sent-170, score-0.188]
74 We now train different classifiers on this transformed space. [sent-171, score-0.132]
75 2 Supervised Classification We train an SVM classifier, with an RBF kernel, on the transformed space generated by the projection matrix P. [sent-174, score-0.481]
76 The SVM hyperparameter C and the RBF kernel bandwidth were tuned on a separate development split. [sent-175, score-0.147]
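A sketch of this supervised evaluation step, assuming scikit-learn; the grid over C and the RBF bandwidth (gamma) is illustrative, and in the paper these are tuned on a separate development split rather than by cross-validation.

    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV

    def evaluate_projection(P, X_train, y_train, X_test, y_test):
        # Project instances with the learned P, then train an RBF-kernel SVM.
        Z_train, Z_test = X_train @ P.T, X_test @ P.T
        grid = GridSearchCV(SVC(kernel="rbf"),
                            {"C": [0.1, 1, 10, 100],
                             "gamma": ["scale", 0.01, 0.1, 1.0]},
                            cv=3)
        grid.fit(Z_train, y_train)
        return grid.score(Z_test, y_test)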
77 Experimental results with 50 and 100 labeled instances are shown in Table 2, and Table 3, respectively. [sent-176, score-0.217]
78 3 Semi-Supervised Classification In this section, we trained the GRF classifier (see Equation 3), a graph-based semi-supervised learning (SSL) algorithm (Zhu et al. [sent-180, score-0.073]
79 , 2003), using a Gaussian kernel parameterized by A = PᵀP to set edge weights. [sent-181, score-0.07]
80 During graph construction, each node was connected to its k nearest neighbors, with k treated as a hyperparameter and tuned on a separate development set. [sent-182, score-0.21]
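A sketch of the graph construction and harmonic-function inference described above, assuming a Gaussian kernel over distances in the transformed space (equivalent to the Mahalanobis metric A = PᵀP); k and the bandwidth sigma are the hyperparameters mentioned in the text, and the dense distance computation is only suitable for small datasets such as the 1500-instance ones used here.

    import numpy as np

    def grf_predict(X, y, labeled_mask, P, k=10, sigma=1.0):
        Z = X @ P.T                                      # transformed instances
        D2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        W = np.zeros_like(D2)
        for i in range(len(Z)):                          # connect each node to its k nearest neighbors
            nbrs = np.argsort(D2[i])[1:k + 1]
            W[i, nbrs] = np.exp(-D2[i, nbrs] / (2 * sigma ** 2))
        W = np.maximum(W, W.T)                           # symmetrize edge weights
        L = np.diag(W.sum(axis=1)) - W                   # graph Laplacian, with D_ii = sum_j W_ij
        U, Lbl = ~labeled_mask, labeled_mask
        classes = np.unique(y[Lbl])
        Y_l = (y[Lbl][:, None] == classes[None, :]).astype(float)
        Y_u = np.linalg.solve(L[np.ix_(U, U)], -L[np.ix_(U, Lbl)] @ Y_l)   # harmonic solution
        pred = y.copy()
        pred[U] = classes[Y_u.argmax(axis=1)]
        return pred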
81 Experimental results with 50 and 100 labeled instances are shown in Table 4, and Table 5, respectively. [sent-183, score-0.217]
82 As before, we experimented with nl = 50 and nl = 100. [sent-184, score-0.314]
83 Once again, we observe that IDML-IT is the most effective method, with the GRF classifier trained on the data representation learned by IDML-IT achieving best performance in all settings. [sent-185, score-0.101]
84 All hyperparameters are tuned on a separate random split. [sent-189, score-0.205]
85 All results are averaged over using different methods (see Section 3. [sent-191, score-0.059]
86 All hyperparameters are tuned on a separate random split. [sent-194, score-0.205]
87 4 Conclusion In this paper, we compared the effectiveness of the transformed spaces learned by recently proposed supervised and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e. [sent-195, score-0.825]
88 Through a variety of experiments on different real-world NLP datasets, we demonstrated that supervised as well as semi-supervised classifiers trained on the space learned by IDML-IT consistently result in the lowest classification errors. [sent-199, score-0.276]
89 Random projection in dimensionality reduction: applications to image and text data. [sent-210, score-0.222]
90 Distance metric learning for large margin nearest neighbor classification. [sent-261, score-0.344]
wordName wordTfidf (topN-words)
[('itml', 0.452), ('idml', 0.29), ('metric', 0.27), ('mahalanobis', 0.194), ('matrix', 0.19), ('xj', 0.179), ('pca', 0.17), ('weinberger', 0.161), ('nl', 0.157), ('instances', 0.156), ('xi', 0.133), ('transformed', 0.132), ('nu', 0.126), ('dimensionality', 0.119), ('dhillon', 0.113), ('projection', 0.103), ('saul', 0.097), ('bingham', 0.097), ('grf', 0.097), ('pxj', 0.097), ('distance', 0.093), ('ssl', 0.085), ('uii', 0.085), ('pxi', 0.085), ('unlabeled', 0.085), ('datasets', 0.083), ('davis', 0.081), ('supervised', 0.075), ('semisupervised', 0.073), ('learned', 0.072), ('tuned', 0.067), ('label', 0.067), ('cis', 0.066), ('hyperparameters', 0.065), ('dld', 0.065), ('graphlabelinf', 0.065), ('lmnn', 0.065), ('mannila', 0.065), ('metriclearner', 0.065), ('subramanya', 0.065), ('technion', 0.065), ('labeled', 0.061), ('principal', 0.061), ('euclidean', 0.059), ('spaces', 0.059), ('tr', 0.059), ('averaged', 0.059), ('rp', 0.057), ('projected', 0.057), ('shall', 0.057), ('space', 0.056), ('reduction', 0.056), ('da', 0.054), ('ten', 0.053), ('zhu', 0.052), ('talukdar', 0.052), ('transductive', 0.052), ('line', 0.051), ('round', 0.051), ('uy', 0.049), ('partha', 0.049), ('decompose', 0.048), ('dimensional', 0.047), ('rbf', 0.046), ('kitchen', 0.044), ('continued', 0.044), ('learning', 0.044), ('electronics', 0.042), ('separate', 0.042), ('prior', 0.04), ('transformation', 0.04), ('recently', 0.039), ('projecting', 0.038), ('distances', 0.038), ('kernel', 0.038), ('diagonal', 0.037), ('graph', 0.036), ('definite', 0.035), ('hyperparameter', 0.035), ('koby', 0.035), ('blitzer', 0.035), ('effectiveness', 0.034), ('labels', 0.034), ('entropy', 0.033), ('instance', 0.033), ('parameterized', 0.032), ('linearly', 0.031), ('pennsylvania', 0.031), ('springer', 0.031), ('random', 0.031), ('gaussian', 0.03), ('nearest', 0.03), ('classifier', 0.029), ('driven', 0.028), ('ecd', 0.028), ('slack', 0.028), ('bregman', 0.028), ('alukdar', 0.028), ('anew', 0.028), ('bilmes', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 161 acl-2010-Learning Better Data Representation Using Inference-Driven Metric Learning
Author: Paramveer S. Dhillon ; Partha Pratim Talukdar ; Koby Crammer
Abstract: We initiate a study comparing the effectiveness of the transformed spaces learned by recently proposed supervised and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA). Through a variety of experiments on different real-world datasets, we find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective.
2 0.11773825 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
Author: Partha Pratim Talukdar ; Fernando Pereira
Abstract: Graph-based semi-supervised learning (SSL) algorithms have been successfully used to extract class-instance pairs from large unstructured and structured text collections. However, a careful comparison of different graph-based SSL algorithms on that task has been lacking. We compare three graph-based SSL algorithms for class-instance acquisition on a variety of graphs constructed from different domains. We find that the recently proposed MAD algorithm is the most effective. We also show that class-instance extraction can be significantly improved by adding semantic information in the form of instance-attribute edges derived from an independently developed knowledge base. All of our code and data will be made publicly available to encourage reproducible research in this area.
3 0.10725436 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification
Author: Wenbin Jiang ; Qun Liu
Abstract: In this paper we describe an intuitionistic method for dependency parsing, where a classifier is used to determine whether a pair of words forms a dependency edge. We also propose an effective strategy for dependency projection, where the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language, leading to a set of classification instances rather than a complete tree. Experiments show that the classifier trained on the projected classification instances significantly outperforms previous projected dependency parsers. More importantly, when this classifier is integrated into a maximum spanning tree (MST) dependency parser, obvious improvement is obtained over the MST baseline.
4 0.086773895 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes
Author: Yejin Choi ; Claire Cardie
Abstract: Automatic opinion recognition involves a number of related tasks, such as identifying the boundaries of opinion expression, determining their polarity, and determining their intensity. Although much progress has been made in this area, existing research typically treats each of the above tasks in isolation. In this paper, we apply a hierarchical parameter sharing technique using Conditional Random Fields for fine-grained opinion analysis, jointly detecting the boundaries of opinion expressions as well as determining two of their key attributes, polarity and intensity. Our experimental results show that our proposed approach improves the performance over a baseline that does not exploit hierarchical structure among the classes. In addition, we find that the joint approach outperforms a baseline that is based on cascading two separate components.
5 0.086056903 66 acl-2010-Compositional Matrix-Space Models of Language
Author: Sebastian Rudolph ; Eugenie Giesbrecht
Abstract: We propose CMSMs, a novel type of generic compositional models for syntactic and semantic aspects of natural language, based on matrix multiplication. We argue for the structural and cognitive plausibility of this model and show that it is able to cover and combine various common compositional NLP approaches ranging from statistical word space models to symbolic grammar formalisms.
6 0.081400394 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
7 0.070862792 78 acl-2010-Cross-Language Text Classification Using Structural Correspondence Learning
8 0.064558335 25 acl-2010-Adapting Self-Training for Semantic Role Labeling
9 0.057457287 212 acl-2010-Simple Semi-Supervised Training of Part-Of-Speech Taggers
10 0.053707313 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree
11 0.050238818 141 acl-2010-Identifying Text Polarity Using Random Walks
12 0.047686704 80 acl-2010-Cross Lingual Adaptation: An Experiment on Sentiment Classifications
13 0.046974588 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data
14 0.046598379 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models
15 0.046456862 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion
16 0.045277763 263 acl-2010-Word Representations: A Simple and General Method for Semi-Supervised Learning
17 0.045080625 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
18 0.044243492 241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification
19 0.043018512 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
20 0.042668737 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
topicId topicWeight
[(0, -0.136), (1, 0.029), (2, -0.023), (3, 0.04), (4, 0.001), (5, -0.019), (6, 0.022), (7, 0.017), (8, 0.019), (9, 0.054), (10, -0.103), (11, 0.045), (12, 0.023), (13, -0.073), (14, -0.019), (15, -0.005), (16, 0.041), (17, -0.028), (18, 0.004), (19, -0.005), (20, 0.078), (21, 0.096), (22, -0.04), (23, -0.002), (24, -0.046), (25, 0.008), (26, 0.035), (27, 0.032), (28, 0.045), (29, 0.078), (30, -0.032), (31, 0.032), (32, -0.055), (33, 0.056), (34, -0.042), (35, 0.111), (36, -0.096), (37, 0.005), (38, -0.042), (39, -0.099), (40, -0.07), (41, -0.171), (42, 0.138), (43, -0.124), (44, 0.03), (45, 0.002), (46, 0.045), (47, -0.059), (48, -0.128), (49, -0.072)]
simIndex simValue paperId paperTitle
same-paper 1 0.95782739 161 acl-2010-Learning Better Data Representation Using Inference-Driven Metric Learning
Author: Paramveer S. Dhillon ; Partha Pratim Talukdar ; Koby Crammer
Abstract: We initiate a study comparing the effectiveness of the transformed spaces learned by recently proposed supervised and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA). Through a variety of experiments on different real-world datasets, we find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective.
2 0.60592234 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
Author: Partha Pratim Talukdar ; Fernando Pereira
Abstract: Graph-based semi-supervised learning (SSL) algorithms have been successfully used to extract class-instance pairs from large unstructured and structured text collections. However, a careful comparison of different graph-based SSL algorithms on that task has been lacking. We compare three graph-based SSL algorithms for class-instance acquisition on a variety of graphs constructed from different domains. We find that the recently proposed MAD algorithm is the most effective. We also show that class-instance extraction can be significantly improved by adding semantic information in the form of instance-attribute edges derived from an independently developed knowledge base. All of our code and data will be made publicly available to encourage reproducible research in this area.
3 0.57283527 212 acl-2010-Simple Semi-Supervised Training of Part-Of-Speech Taggers
Author: Anders Sogaard
Abstract: Most attempts to train part-of-speech taggers on a mixture of labeled and unlabeled data have failed. In this work stacked learning is used to reduce tagging to a classification task. This simplifies semi-supervised training considerably. Our preferred semi-supervised method combines tri-training (Li and Zhou, 2005) and disagreement-based co-training. On the Wall Street Journal, we obtain an error reduction of 4.2% with SVMTool (Gimenez and Marquez, 2004).
4 0.54684424 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
Author: Ruihong Huang ; Ellen Riloff
Abstract: This research explores the idea of inducing domain-specific semantic class taggers using only a domain-specific text collection and seed words. The learning process begins by inducing a classifier that only has access to contextual features, forcing it to generalize beyond the seeds. The contextual classifier then labels new instances, to expand and diversify the training set. Next, a cross-category bootstrapping process simultaneously trains a suite of classifiers for multiple semantic classes. The positive instances for one class are used as negative instances for the others in an iterative bootstrapping cycle. We also explore a one-semantic-class-per-discourse heuristic, and use the classifiers to dynamically create semantic features. We evaluate our approach by inducing six semantic taggers from a collection of veterinary medicine message board posts.
5 0.46009898 263 acl-2010-Word Representations: A Simple and General Method for Semi-Supervised Learning
Author: Joseph Turian ; Lev-Arie Ratinov ; Yoshua Bengio
Abstract: If we take an existing supervised NLP system, a simple and general way to improve accuracy is to use unsupervised word representations as extra word features. We evaluate Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking. We use near state-of-the-art supervised baselines, and find that each of the three word representations improves the accuracy of these baselines. We find further improvements by combining different word representations. You can download our word features, for off-the-shelf use in existing NLP systems, as well as our code, here: http://metaoptimize.com/projects/wordreprs/.
6 0.44797522 78 acl-2010-Cross-Language Text Classification Using Structural Correspondence Learning
7 0.44674259 117 acl-2010-Fine-Grained Genre Classification Using Structural Learning Algorithms
8 0.44358435 141 acl-2010-Identifying Text Polarity Using Random Walks
9 0.43845049 241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification
10 0.437022 25 acl-2010-Adapting Self-Training for Semantic Role Labeling
11 0.42424291 66 acl-2010-Compositional Matrix-Space Models of Language
12 0.40845391 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
13 0.40844077 91 acl-2010-Domain Adaptation of Maximum Entropy Language Models
14 0.40361193 253 acl-2010-Using Smaller Constituents Rather Than Sentences in Active Learning for Japanese Dependency Parsing
15 0.39356858 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification
16 0.39027172 7 acl-2010-A Generalized-Zero-Preserving Method for Compact Encoding of Concept Lattices
17 0.38870028 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models
18 0.38195807 151 acl-2010-Intelligent Selection of Language Model Training Data
19 0.37830123 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.
20 0.37574005 186 acl-2010-Optimal Rank Reduction for Linear Context-Free Rewriting Systems with Fan-Out Two
topicId topicWeight
[(25, 0.046), (42, 0.02), (44, 0.011), (59, 0.098), (71, 0.459), (73, 0.035), (78, 0.03), (83, 0.069), (84, 0.018), (98, 0.127)]
simIndex simValue paperId paperTitle
same-paper 1 0.76729584 161 acl-2010-Learning Better Data Representation Using Inference-Driven Metric Learning
Author: Paramveer S. Dhillon ; Partha Pratim Talukdar ; Koby Crammer
Abstract: We initiate a study comparing the effectiveness of the transformed spaces learned by recently proposed supervised and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA). Through a variety of experiments on different real-world datasets, we find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective.
2 0.75090855 157 acl-2010-Last but Definitely Not Least: On the Role of the Last Sentence in Automatic Polarity-Classification
Author: Israela Becker ; Vered Aharonson
Abstract: Two psycholinguistic and psychophysical experiments show that in order to efficiently extract polarity of written texts such as customer reviews on the Internet, one should concentrate computational efforts on messages in the final position of the text.
3 0.67898542 91 acl-2010-Domain Adaptation of Maximum Entropy Language Models
Author: Tanel Alumae ; Mikko Kurimo
Abstract: We investigate a recently proposed Bayesian adaptation method for building style-adapted maximum entropy language models for speech recognition, given a large corpus of written language data and a small corpus of speech transcripts. Experiments show that the method consistently outperforms linear interpolation which is typically used in such cases.
4 0.4279415 78 acl-2010-Cross-Language Text Classification Using Structural Correspondence Learning
Author: Peter Prettenhofer ; Benno Stein
Abstract: We present a new approach to crosslanguage text classification that builds on structural correspondence learning, a recently proposed theory for domain adaptation. The approach uses unlabeled documents, along with a simple word translation oracle, in order to induce taskspecific, cross-lingual word correspondences. We report on analyses that reveal quantitative insights about the use of unlabeled data and the complexity of interlanguage correspondence modeling. We conduct experiments in the field of cross-language sentiment classification, employing English as source language, and German, French, and Japanese as target languages. The results are convincing; they demonstrate both the robustness and the competitiveness of the presented ideas.
5 0.41995037 80 acl-2010-Cross Lingual Adaptation: An Experiment on Sentiment Classifications
Author: Bin Wei ; Christopher Pal
Abstract: In this paper, we study the problem of using an annotated corpus in English for the same natural language processing task in another language. While various machine translation systems are available, automated translation is still far from perfect. To minimize the noise introduced by translations, we propose to use only key "reliable" parts from the translations and apply structural correspondence learning (SCL) to find a low dimensional representation shared by the two languages. We perform experiments on an English-Chinese sentiment classification task and compare our results with a previous co-training approach. To alleviate the problem of data sparseness, we create extra pseudo-examples for SCL by making queries to a search engine. Experiments on real-world on-line review data demonstrate that the two techniques can effectively improve the performance compared to previous work.
6 0.4128328 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
7 0.39296517 193 acl-2010-Personalising Speech-To-Speech Translation in the EMIME Project
8 0.39261419 96 acl-2010-Efficient Optimization of an MDL-Inspired Objective Function for Unsupervised Part-Of-Speech Tagging
9 0.39163119 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
10 0.39079657 26 acl-2010-All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision
11 0.38904566 255 acl-2010-Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
12 0.38849786 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes
13 0.38838279 42 acl-2010-Automatically Generating Annotator Rationales to Improve Sentiment Classification
14 0.3877579 214 acl-2010-Sparsity in Dependency Grammar Induction
15 0.38696635 14 acl-2010-A Risk Minimization Framework for Extractive Speech Summarization
16 0.3860603 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification
17 0.37433869 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
18 0.37366664 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
19 0.37272519 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
20 0.37215263 169 acl-2010-Learning to Translate with Source and Target Syntax