acl acl2010 acl2010-161 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Paramveer S. Dhillon ; Partha Pratim Talukdar ; Koby Crammer
Abstract: We initiate a study comparing the effectiveness of the transformed spaces learned by recently proposed supervised and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA). Through a variety of experiments on different real-world datasets, we find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective.
Reference: text
sentIndex sentText sentNum sentScore
1 A Mountain View, CA, USA The Technion, Haifa, Israel dhillon@cis. [sent-9, score-0.066]
2 Abstract We initiate a study comparing the effectiveness of the transformed spaces learned by recently proposed supervised and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e. [sent-15, score-0.898]
3 Through a variety of experiments on different real-world datasets, we find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective. [sent-18, score-0.387]
4 1 Introduction Because of the high-dimensional nature of NLP datasets, estimating a large number of parameters (a parameter for each dimension), often from a limited amount of labeled data, is a challenging task for statistical learners. [sent-19, score-0.061]
5 Faced with this challenge, various unsupervised dimensionality reduction methods have been developed over the years, e. [sent-20, score-0.175]
6 Recently, several supervised metric learning algorithms have been proposed (Davis et al. [sent-23, score-0.389]
7 , 2010) is another such method which exploits labeled as well as unlabeled data during metric learning. [sent-26, score-0.416]
8 These methods learn a Mahalanobis distance metric to compute distance between a pair of data instances, which can also be interpreted as learning a transformation of the input data, as we shall see in Section 2. [sent-27, score-0.597]
9 In this paper, we address that gap: we compare the effectiveness of classifiers trained on the transformed spaces learned by metric learning methods to those generated by previously proposed unsupervised dimensionality reduction methods. [sent-31, score-0.786]
10 We find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective. [sent-32, score-0.314]
11 1 Relationship between Metric Learning and Linear Projection We first establish the well-known equivalence between learning a Mahalanobis distance measure and computing the Euclidean distance in a linearly transformed space of the data (Weinberger and Saul, 2009). [sent-34, score-0.449]
12 Let A be a d × d positive definite matrix which parameterizes the Mahalanobis distance, dA(xi, xj), between instances xi and xj, as shown in Equation 1. [sent-35, score-0.514]
13 Since A is positive definite, we can decompose it as A = PᵀP, where P is another matrix of size d × d. [sent-36, score-0.238]
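For reference, a standard statement of the Mahalanobis distance that the text refers to as Equation 1, together with the equivalence obtained through the decomposition A = PᵀP, is sketched below; the exact typesetting of the original equation is not recoverable from the extraction, and whether the square root is included is a notational choice.

    d_A(x_i, x_j) = \sqrt{(x_i - x_j)^\top A \,(x_i - x_j)}                                % Equation 1, standard form
    d_A(x_i, x_j)^2 = (x_i - x_j)^\top P^\top P \,(x_i - x_j) = \lVert P x_i - P x_j \rVert_2^2   % with A = P^T P

So learning A is equivalent to learning a linear map P and measuring Euclidean distance after projection.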
14 In this paper, we are interested in learning a better representation of the data (i. [sent-38, score-0.044]
15 , projection matrix P), and we shall achieve that goal by learning the corresponding Mahalanobis distance parameter A. [sent-40, score-0.487]
16 We shall now review two recently proposed metric learning algorithms. [sent-41, score-0.41]
17 , 2007) assumes the availability of prior knowledge about inter-instance distances. [sent-46, score-0.04]
18 In this scheme, two instances are considered similar if the Mahalanobis distance between them is upper bounded, i. [sent-47, score-0.249]
19 Similarly, two instances are considered dissimilar if the distance between them is larger than a certain threshold l, i.e. [sent-50, score-0.093]
20 Similar instances are represented by the set S. [sent-53, score-0.156]
21 In addition to prior knowledge about inter-instance distances, sometimes prior information about the matrix A itself, denoted by A0, may also be available. [sent-55, score-0.27]
22 In such cases, we would like the learned matrix A to be as close as possible to the prior matrix A0. [sent-59, score-0.492]
23 ITML combines these two types of prior information, i. [sent-60, score-0.04]
24 , knowledge about inter-instance distances, and prior matrix A0, in order to learn the matrix A by solving the optimization problem shown in (2). [sent-62, score-0.42]
25 When exactly solving the problem in (2) is not possible, slack variables may be introduced into the ITML objective. [sent-68, score-0.028]
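As a hedged sketch, the optimization referred to as (2) is, in the commonly cited formulation of Davis et al. (2007), a LogDet (Bregman) divergence minimization toward the prior A0, subject to the similarity and dissimilarity constraints above; the slack-variable variant penalizes constraint violations. The paper's exact (2) may differ in notation (here D denotes the set of dissimilar pairs, d the dimensionality):

    \min_{A \succeq 0}\; D_{\ell d}(A, A_0) = \mathrm{tr}(A A_0^{-1}) - \log\det(A A_0^{-1}) - d
    \text{s.t.}\;\; d_A(x_i, x_j) \le u \;\;\forall (x_i, x_j) \in S, \qquad d_A(x_i, x_j) \ge l \;\;\forall (x_i, x_j) \in D
    % slack version: add \gamma \sum_{ij} \xi_{ij} to the objective and relax each bound by \xi_{ij}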
26 Let X be the d × n matrix of n instances in a d-dimensional space. [sent-73, score-0.346]
27 Out of the n instances, nl instances are labeled, while the remaining nu instances are unlabeled, with n = nl + nu. [sent-74, score-0.752]
28 Let S be an n × n diagonal matrix with Sii = 1 iff instance xi is labeled. [sent-75, score-0.393]
29 Y is the n × m matrix storing training label information, if any. [sent-77, score-0.257]
30 , output of any classifier, with Yˆil denoting the score of label l at node i. [sent-80, score-0.067]
31 The ITML metric learning algorithm, which we reviewed in Section 2. [sent-82, score-0.314]
32 2, is supervised in nature, and hence it does not exploit widely available unlabeled data. [sent-83, score-0.16]
33 , 2010), a recently proposed metric learning framework which combines an existing supervised metric learning algorithm (such as ITML) along with transductive graph-based label inference to learn a new distance metric from labeled as well as unlabeled data combined. [sent-85, score-1.37]
34 In self-training style iterations, IDML alternates between metric learning and label inference, with the output of label inference used during the next round of metric learning, and so on. [sent-86, score-0.769]
35 IDML starts out with the assumption that existing supervised metric learning algorithms, such as ITML, can learn a better metric if the number of available labeled instances is increased. [sent-87, score-0.876]
36 Since we are focusing on the semi-supervised learning (SSL) setting with nl labeled and nu unlabeled instances, the idea is to automatically label the unlabeled instances using a graph-based SSL algorithm, and then include instances with low assigned label entropy (i.e. [sent-88, score-1.073]
37 , high confidence label assignments) in the next round of metric learning. [sent-90, score-0.388]
38 The number of instances added in each iteration depends on the threshold β1. [sent-91, score-0.156]
39 This process is continued until no new instances can be added to the set of labeled instances, which can happen when either all the instances are already exhausted, or when none of the remaining unlabeled instances can be assigned labels with high confidence. [sent-92, score-0.692]
40 In Line 3, any supervised metric learner, such as ITML, may be used as the METRICLEARNER. [sent-94, score-0.345]
41 Using the distance metric learned in Line 3, a new k-NN graph is constructed in Line 4, whose edge weight matrix is stored in W. [sent-95, score-0.661]
42 , Uii = 0, ∀i) 10: return A ... the graph Laplacian, and D is a diagonal matrix with Dii = Σj Wij. [sent-103, score-0.227]
43 The constraint in (3) makes sure that labels on training instances are not changed during inference. [sent-104, score-0.19]
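For context, the graph-regularized label inference (GRF) of Zhu et al. (2003), which Equation 3 builds on, is usually written as a harmonic-function problem over the graph Laplacian L = D − W; the paper's (3) additionally uses the matrices S, U, and Yˆ defined above, so the sketch below only indicates the general shape rather than the paper's exact objective:

    \hat{Y} = \arg\min_{\hat{Y}} \; \mathrm{tr}\!\left(\hat{Y}^\top L \hat{Y}\right) \quad \text{s.t. } \hat{Y}_{i\cdot} = Y_{i\cdot} \text{ for every labeled instance } i
    % closed form on the unlabeled block:  \hat{Y}_U = - L_{UU}^{-1} L_{UL} Y_L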
44 , Sii = 0) is considered a new labeled training instance, i.e. [sent-107, score-0.061]
45 , Uii = 1, for the next round of metric learning if the instance has been assigned labels with high confidence in the current iteration, i.e. [sent-109, score-0.432]
46 Finally, in Line 7, training instance label information is updated. [sent-114, score-0.033]
47 This iterative process is continued until no new labeled instance can be added, i.e. [sent-116, score-0.138]
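A minimal Python sketch of the self-training loop described above, assuming hypothetical helpers metric_learner (e.g., an ITML implementation) and graph_label_inference (e.g., the GRF inference); the function names, signatures, the entropy threshold beta, and the stopping logic are illustrative rather than the authors' code.

    import numpy as np

    def label_entropy(Y_hat, eps=1e-12):
        # Entropy of each row of the (n, m) label-score matrix, after normalization.
        P = Y_hat / np.clip(Y_hat.sum(axis=1, keepdims=True), eps, None)
        return -(P * np.log(np.clip(P, eps, None))).sum(axis=1)

    def idml(X, y, labeled_mask, metric_learner, graph_label_inference,
             beta=0.05, max_iter=20):
        # X: (n, d) data; y: (n,) integer class indices, trusted only where labeled_mask is True.
        train_mask = labeled_mask.copy()       # instances treated as labeled in the current round
        y_train = y.copy()
        A = np.eye(X.shape[1])                 # start from the identity metric
        for _ in range(max_iter):
            # Line 3: supervised metric learning on the current labeled set.
            A = metric_learner(X[train_mask], y_train[train_mask], A0=A)
            # Line 4: k-NN graph under the new metric + transductive label inference.
            Y_hat = graph_label_inference(X, y_train, train_mask, A)   # (n, m) label scores
            # Add unlabeled instances whose inferred label distribution is confident (low entropy).
            confident = (~train_mask) & (label_entropy(Y_hat) <= beta)
            if not confident.any():
                break                           # nothing left to add: stop iterating
            y_train[confident] = Y_hat[confident].argmax(axis=1)
            train_mask |= confident
        return A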
48 1 Setup Table 1: Description of the datasets (and their dimensionalities) used in Section 3. [sent-122, score-0.028]
49 All datasets are binary with 1500 total instances in each. [sent-123, score-0.239]
50 Descriptions of the datasets used during the experiments in Section 3 are presented in Table 1. [sent-124, score-0.083]
51 The first four datasets (Electronics, Books, Kitchen, and DVDs) are from the sentiment domain and were previously used in (Blitzer et al. [sent-125, score-0.083]
52 For details regarding features and data pre-processing, we refer the reader to the origin of these datasets cited above. [sent-128, score-0.083]
53 We experiment with the following ways of estimating the transformation matrix P: Original2: We set P = I, where I is the d × d identity matrix. [sent-131, score-0.23]
54 RP: The data is first projected into a lower dimensional space using the Random Projection (RP) method (Bingham and Mannila, 2001). [sent-133, score-0.16]
55 Dimensionality of the target space was set at d0, as prescribed in (Bingham and Mannila, 2001). [sent-134, score-0.056]
56 We use the projection matrix constructed by RP as P. [sent-137, score-0.293]
57 25 for the experiments in Section 3, which has the effect of projecting the data into a much lower-dimensional space (84 dimensions for the experiments in this section). [sent-140, score-0.141]
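A sketch of the RP baseline under a Gaussian-projection assumption; Bingham and Mannila (2001) also study sparse projections, and the target dimensionality here simply reuses the 84 dimensions mentioned above.

    import numpy as np

    def random_projection_matrix(d, d0=84, seed=0):
        # Rows map R^d down to R^d0; entries drawn i.i.d. from N(0, 1/d0).
        rng = np.random.default_rng(seed)
        return rng.normal(scale=1.0 / np.sqrt(d0), size=(d0, d))

    # Usage: given an (n, d) data matrix X, the projected data is X @ P_rp.T
    # P_rp = random_projection_matrix(X.shape[1])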
58 This presents an interesting evaluation setting as we already run evaluations in a much higher-dimensional space (e. [sent-141, score-0.056]
59 PCA: Data instances are first projected into a lower-dimensional space using Principal Components Analysis (PCA) (Jolliffe, 2002). [sent-144, score-0.316]
60 Following (Weinberger and Saul, 2009), dimensionality of the projected space was set at 250 for all experiments. [sent-145, score-0.232]
61 In this case, we used the projection matrix generated by PCA as P. [sent-146, score-0.293]
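A corresponding sketch of the PCA baseline, assuming scikit-learn is available; the 250 components follow the description above, and pca.components_ plays the role of P (up to mean-centering of the data).

    from sklearn.decomposition import PCA

    def pca_projection_matrix(X, n_components=250):
        # Fit PCA on the (n, d) data matrix and return a (n_components, d) projection matrix.
        pca = PCA(n_components=n_components)
        pca.fit(X)
        return pca.components_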
62 ITML: A is learned by applying ITML (see Section 2. [sent-147, score-0.072]
63 2) on the Original space (above), and then we decompose A as A = PᵀP to obtain P. [sent-148, score-0.104]
64 2Note that "Original" in the results tables refers to the original space with features occurring more than 20 times. [sent-149, score-0.056]
65 All hyperparameters are tuned on a separate random split. [sent-156, score-0.205]
66 All hyperparameters are tuned on a separate random split. [sent-162, score-0.205]
67 IDML-IT: A is learned by applying IDML (Algorithm 1) (see Section 2. [sent-163, score-0.072]
68 3) on the Original space (above), with ITML used as the METRICLEARNER in IDML (Line 3 in Algorithm 1). [sent-164, score-0.056]
69 In this case, we treat the set of test instances (without their gold labels) as the unlabeled data. [sent-165, score-0.241]
70 In other words, we essentially work in the transductive setting (Vapnik, 2000). [sent-166, score-0.052]
71 Once again, we decompose A as A = PᵀP to obtain P. [sent-167, score-0.048]
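One way to recover P from a learned metric A, as a hedged sketch: a Cholesky factor works when A is strictly positive definite, and a symmetric square root from the eigendecomposition handles the positive semi-definite case.

    import numpy as np

    def projection_from_metric(A):
        # Return P such that A ≈ P.T @ P.
        try:
            return np.linalg.cholesky(A).T          # A = L L^T  =>  P = L^T
        except np.linalg.LinAlgError:
            w, V = np.linalg.eigh(A)                # symmetric square root fallback
            w = np.clip(w, 0.0, None)
            return (V * np.sqrt(w)) @ V.T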
72 We also experimented with the supervised large-margin metric learning algorithm (LMNN) presented in (Weinberger and Saul, 2009). [sent-168, score-0.389]
73 Each input instance, x, is now projected into the transformed space as Px. [sent-170, score-0.188]
74 We now train different classifiers on this transformed space. [sent-171, score-0.132]
75 2 Supervised Classification We train an SVM classifier, with an RBF kernel, on the transformed space generated by the projection matrix P. [sent-174, score-0.481]
76 The SVM hyperparameter C and the RBF kernel bandwidth were tuned on a separate development split. [sent-175, score-0.147]
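A sketch of this supervised evaluation step, assuming scikit-learn; the grid over C and the RBF bandwidth (gamma) is illustrative, and in the paper these are tuned on a separate development split rather than by cross-validation.

    from sklearn.svm import SVC
    from sklearn.model_selection import GridSearchCV

    def evaluate_projection(P, X_train, y_train, X_test, y_test):
        # Project instances with the learned P, then train an RBF-kernel SVM.
        Z_train, Z_test = X_train @ P.T, X_test @ P.T
        grid = GridSearchCV(SVC(kernel="rbf"),
                            {"C": [0.1, 1, 10, 100],
                             "gamma": ["scale", 0.01, 0.1, 1.0]},
                            cv=3)
        grid.fit(Z_train, y_train)
        return grid.score(Z_test, y_test)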
77 Experimental results with 50 and 100 labeled instances are shown in Table 2, and Table 3, respectively. [sent-176, score-0.217]
78 3 Semi-Supervised Classification In this section, we trained the GRF classifier (see Equation 3), a graph-based semi-supervised learning (SSL) algorithm (Zhu et al. [sent-180, score-0.073]
79 , 2003), using a Gaussian kernel parameterized by A = PᵀP to set edge weights. [sent-181, score-0.07]
80 During graph construction, each node was connected to its k nearest neighbors, with k treated as a hyperparameter and tuned on a separate development set. [sent-182, score-0.21]
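A sketch of the graph construction and harmonic-function inference described above, assuming a Gaussian kernel over distances in the transformed space (equivalent to the Mahalanobis metric A = PᵀP); k and the bandwidth sigma are the hyperparameters mentioned in the text, and the dense distance computation is only suitable for small datasets such as the 1500-instance ones used here.

    import numpy as np

    def grf_predict(X, y, labeled_mask, P, k=10, sigma=1.0):
        Z = X @ P.T                                      # transformed instances
        D2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
        W = np.zeros_like(D2)
        for i in range(len(Z)):                          # connect each node to its k nearest neighbors
            nbrs = np.argsort(D2[i])[1:k + 1]
            W[i, nbrs] = np.exp(-D2[i, nbrs] / (2 * sigma ** 2))
        W = np.maximum(W, W.T)                           # symmetrize edge weights
        L = np.diag(W.sum(axis=1)) - W                   # graph Laplacian, with D_ii = sum_j W_ij
        U, Lbl = ~labeled_mask, labeled_mask
        classes = np.unique(y[Lbl])
        Y_l = (y[Lbl][:, None] == classes[None, :]).astype(float)
        Y_u = np.linalg.solve(L[np.ix_(U, U)], -L[np.ix_(U, Lbl)] @ Y_l)   # harmonic solution
        pred = y.copy()
        pred[U] = classes[Y_u.argmax(axis=1)]
        return pred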
81 Experimental results with 50 and 100 labeled instances are shown in Table 4, and Table 5, respectively. [sent-183, score-0.217]
82 As before, we experimented with nl = 50 and nl = 100. [sent-184, score-0.314]
83 Once again, we observe that IDML-IT is the most effective method, with the GRF classifier trained on the data representation learned by IDML-IT achieving best performance in all settings. [sent-185, score-0.101]
84 All hyperparameters are tuned on a separate random split. [sent-189, score-0.205]
85 All results are averaged over using different methods (see Section 3. [sent-191, score-0.059]
86 All hyperparameters are tuned on a separate random split. [sent-194, score-0.205]
87 4 Conclusion In this paper, we compared the effectiveness of the transformed spaces learned by recently proposed supervised and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e. [sent-195, score-0.825]
88 Through a variety of experiments on different real-world NLP datasets, we demonstrated that supervised as well as semi-supervised classifiers trained on the space learned by IDML-IT consistently result in the lowest classification errors. [sent-199, score-0.276]
89 Random projection in dimensionality reduction: applications to image and text data. [sent-210, score-0.222]
90 Distance metric learning for large margin nearest neighbor classification. [sent-261, score-0.344]
wordName wordTfidf (topN-words)
[('itml', 0.452), ('idml', 0.29), ('metric', 0.27), ('mahalanobis', 0.194), ('matrix', 0.19), ('xj', 0.179), ('pca', 0.17), ('weinberger', 0.161), ('nl', 0.157), ('instances', 0.156), ('xi', 0.133), ('transformed', 0.132), ('nu', 0.126), ('dimensionality', 0.119), ('dhillon', 0.113), ('projection', 0.103), ('saul', 0.097), ('bingham', 0.097), ('grf', 0.097), ('pxj', 0.097), ('distance', 0.093), ('ssl', 0.085), ('uii', 0.085), ('pxi', 0.085), ('unlabeled', 0.085), ('datasets', 0.083), ('davis', 0.081), ('supervised', 0.075), ('semisupervised', 0.073), ('learned', 0.072), ('tuned', 0.067), ('label', 0.067), ('cis', 0.066), ('hyperparameters', 0.065), ('dld', 0.065), ('graphlabelinf', 0.065), ('lmnn', 0.065), ('mannila', 0.065), ('metriclearner', 0.065), ('subramanya', 0.065), ('technion', 0.065), ('labeled', 0.061), ('principal', 0.061), ('euclidean', 0.059), ('spaces', 0.059), ('tr', 0.059), ('averaged', 0.059), ('rp', 0.057), ('projected', 0.057), ('shall', 0.057), ('space', 0.056), ('reduction', 0.056), ('da', 0.054), ('ten', 0.053), ('zhu', 0.052), ('talukdar', 0.052), ('transductive', 0.052), ('line', 0.051), ('round', 0.051), ('uy', 0.049), ('partha', 0.049), ('decompose', 0.048), ('dimensional', 0.047), ('rbf', 0.046), ('kitchen', 0.044), ('continued', 0.044), ('learning', 0.044), ('electronics', 0.042), ('separate', 0.042), ('prior', 0.04), ('transformation', 0.04), ('recently', 0.039), ('projecting', 0.038), ('distances', 0.038), ('kernel', 0.038), ('diagonal', 0.037), ('graph', 0.036), ('definite', 0.035), ('hyperparameter', 0.035), ('koby', 0.035), ('blitzer', 0.035), ('effectiveness', 0.034), ('labels', 0.034), ('entropy', 0.033), ('instance', 0.033), ('parameterized', 0.032), ('linearly', 0.031), ('pennsylvania', 0.031), ('springer', 0.031), ('random', 0.031), ('gaussian', 0.03), ('nearest', 0.03), ('classifier', 0.029), ('driven', 0.028), ('ecd', 0.028), ('slack', 0.028), ('bregman', 0.028), ('alukdar', 0.028), ('anew', 0.028), ('bilmes', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 161 acl-2010-Learning Better Data Representation Using Inference-Driven Metric Learning
Author: Paramveer S. Dhillon ; Partha Pratim Talukdar ; Koby Crammer
Abstract: We initiate a study comparing the effectiveness of the transformed spaces learned by recently proposed supervised and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA). Through a variety of experiments on different real-world datasets, we find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective.
2 0.11773825 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
Author: Partha Pratim Talukdar ; Fernando Pereira
Abstract: Graph-based semi-supervised learning (SSL) algorithms have been successfully used to extract class-instance pairs from large unstructured and structured text collections. However, a careful comparison of different graph-based SSL algorithms on that task has been lacking. We compare three graph-based SSL algorithms for class-instance acquisition on a variety of graphs constructed from different domains. We find that the recently proposed MAD algorithm is the most effective. We also show that class-instance extraction can be significantly improved by adding semantic information in the form of instance-attribute edges derived from an independently developed knowledge base. All of our code and data will be made publicly available to encourage reproducible research in this area.
3 0.10725436 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification
Author: Wenbin Jiang ; Qun Liu
Abstract: In this paper we describe an intuitionistic method for dependency parsing, where a classifier is used to determine whether a pair of words forms a dependency edge. We also propose an effective strategy for dependency projection, where the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language, leading to a set of classification instances rather than a complete tree. Experiments show that the classifier trained on the projected classification instances significantly outperforms previous projected dependency parsers. More importantly, when this classifier is integrated into a maximum spanning tree (MST) dependency parser, obvious improvement is obtained over the MST baseline.
4 0.086773895 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes
Author: Yejin Choi ; Claire Cardie
Abstract: Automatic opinion recognition involves a number of related tasks, such as identifying the boundaries of opinion expression, determining their polarity, and determining their intensity. Although much progress has been made in this area, existing research typically treats each of the above tasks in isolation. In this paper, we apply a hierarchical parameter sharing technique using Conditional Random Fields for fine-grained opinion analysis, jointly detecting the boundaries of opinion expressions as well as determining two of their key attributes, polarity and intensity. Our experimental results show that our proposed approach improves the performance over a baseline that does not exploit hierarchical structure among the classes. In addition, we find that the joint approach outperforms a baseline that is based on cascading two separate components.
5 0.086056903 66 acl-2010-Compositional Matrix-Space Models of Language
Author: Sebastian Rudolph ; Eugenie Giesbrecht
Abstract: We propose CMSMs, a novel type of generic compositional models for syntactic and semantic aspects of natural language, based on matrix multiplication. We argue for the structural and cognitive plausibility of this model and show that it is able to cover and combine various common compositional NLP approaches ranging from statistical word space models to symbolic grammar formalisms.
6 0.081400394 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
7 0.070862792 78 acl-2010-Cross-Language Text Classification Using Structural Correspondence Learning
8 0.064558335 25 acl-2010-Adapting Self-Training for Semantic Role Labeling
9 0.057457287 212 acl-2010-Simple Semi-Supervised Training of Part-Of-Speech Taggers
10 0.053707313 209 acl-2010-Sentiment Learning on Product Reviews via Sentiment Ontology Tree
11 0.050238818 141 acl-2010-Identifying Text Polarity Using Random Walks
12 0.047686704 80 acl-2010-Cross Lingual Adaptation: An Experiment on Sentiment Classifications
13 0.046974588 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data
14 0.046598379 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models
15 0.046456862 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion
16 0.045277763 263 acl-2010-Word Representations: A Simple and General Method for Semi-Supervised Learning
17 0.045080625 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
18 0.044243492 241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification
19 0.043018512 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
20 0.042668737 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
topicId topicWeight
[(0, -0.136), (1, 0.029), (2, -0.023), (3, 0.04), (4, 0.001), (5, -0.019), (6, 0.022), (7, 0.017), (8, 0.019), (9, 0.054), (10, -0.103), (11, 0.045), (12, 0.023), (13, -0.073), (14, -0.019), (15, -0.005), (16, 0.041), (17, -0.028), (18, 0.004), (19, -0.005), (20, 0.078), (21, 0.096), (22, -0.04), (23, -0.002), (24, -0.046), (25, 0.008), (26, 0.035), (27, 0.032), (28, 0.045), (29, 0.078), (30, -0.032), (31, 0.032), (32, -0.055), (33, 0.056), (34, -0.042), (35, 0.111), (36, -0.096), (37, 0.005), (38, -0.042), (39, -0.099), (40, -0.07), (41, -0.171), (42, 0.138), (43, -0.124), (44, 0.03), (45, 0.002), (46, 0.045), (47, -0.059), (48, -0.128), (49, -0.072)]
simIndex simValue paperId paperTitle
same-paper 1 0.95782739 161 acl-2010-Learning Better Data Representation Using Inference-Driven Metric Learning
Author: Paramveer S. Dhillon ; Partha Pratim Talukdar ; Koby Crammer
Abstract: We initiate a study comparing the effectiveness of the transformed spaces learned by recently proposed supervised and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA). Through a variety of experiments on different real-world datasets, we find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective.
2 0.60592234 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
Author: Partha Pratim Talukdar ; Fernando Pereira
Abstract: Graph-based semi-supervised learning (SSL) algorithms have been successfully used to extract class-instance pairs from large unstructured and structured text collections. However, a careful comparison of different graph-based SSL algorithms on that task has been lacking. We compare three graph-based SSL algorithms for class-instance acquisition on a variety of graphs constructed from different domains. We find that the recently proposed MAD algorithm is the most effective. We also show that class-instance extraction can be significantly improved by adding semantic information in the form of instance-attribute edges derived from an independently developed knowledge base. All of our code and data will be made publicly available to encourage reproducible research in this area.
3 0.57283527 212 acl-2010-Simple Semi-Supervised Training of Part-Of-Speech Taggers
Author: Anders Sogaard
Abstract: Most attempts to train part-of-speech taggers on a mixture of labeled and unlabeled data have failed. In this work stacked learning is used to reduce tagging to a classification task. This simplifies semi-supervised training considerably. Our preferred semi-supervised method combines tri-training (Li and Zhou, 2005) and disagreement-based co-training. On the Wall Street Journal, we obtain an error reduction of 4.2% with SVMTool (Gimenez and Marquez, 2004).
4 0.54684424 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
Author: Ruihong Huang ; Ellen Riloff
Abstract: This research explores the idea of inducing domain-specific semantic class taggers using only a domain-specific text collection and seed words. The learning process begins by inducing a classifier that only has access to contextual features, forcing it to generalize beyond the seeds. The contextual classifier then labels new instances, to expand and diversify the training set. Next, a cross-category bootstrapping process simultaneously trains a suite of classifiers for multiple semantic classes. The positive instances for one class are used as negative instances for the others in an iterative bootstrapping cycle. We also explore a one-semantic-class-per-discourse heuristic, and use the classifiers to dynamically create semantic features. We evaluate our approach by inducing six semantic taggers from a collection of veterinary medicine message board posts.
5 0.46009898 263 acl-2010-Word Representations: A Simple and General Method for Semi-Supervised Learning
Author: Joseph Turian ; Lev-Arie Ratinov ; Yoshua Bengio
Abstract: If we take an existing supervised NLP system, a simple and general way to improve accuracy is to use unsupervised word representations as extra word features. We evaluate Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeddings of words on both NER and chunking. We use near state-of-the-art supervised baselines, and find that each of the three word representations improves the accuracy of these baselines. We find further improvements by combining different word representations. You can download our word features, for off-the-shelf use in existing NLP systems, as well as our code, here: http://metaoptimize.com/projects/wordreprs/.
6 0.44797522 78 acl-2010-Cross-Language Text Classification Using Structural Correspondence Learning
7 0.44674259 117 acl-2010-Fine-Grained Genre Classification Using Structural Learning Algorithms
8 0.44358435 141 acl-2010-Identifying Text Polarity Using Random Walks
9 0.43845049 241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification
10 0.437022 25 acl-2010-Adapting Self-Training for Semantic Role Labeling
11 0.42424291 66 acl-2010-Compositional Matrix-Space Models of Language
12 0.40845391 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
13 0.40844077 91 acl-2010-Domain Adaptation of Maximum Entropy Language Models
14 0.40361193 253 acl-2010-Using Smaller Constituents Rather Than Sentences in Active Learning for Japanese Dependency Parsing
15 0.39356858 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification
16 0.39027172 7 acl-2010-A Generalized-Zero-Preserving Method for Compact Encoding of Concept Lattices
17 0.38870028 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models
18 0.38195807 151 acl-2010-Intelligent Selection of Language Model Training Data
19 0.37830123 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.
20 0.37574005 186 acl-2010-Optimal Rank Reduction for Linear Context-Free Rewriting Systems with Fan-Out Two
topicId topicWeight
[(25, 0.046), (42, 0.02), (44, 0.011), (59, 0.098), (71, 0.459), (73, 0.035), (78, 0.03), (83, 0.069), (84, 0.018), (98, 0.127)]
simIndex simValue paperId paperTitle
same-paper 1 0.76729584 161 acl-2010-Learning Better Data Representation Using Inference-Driven Metric Learning
Author: Paramveer S. Dhillon ; Partha Pratim Talukdar ; Koby Crammer
Abstract: We initiate a study comparing the effectiveness of the transformed spaces learned by recently proposed supervised and semi-supervised metric learning algorithms to those generated by previously proposed unsupervised dimensionality reduction methods (e.g., PCA). Through a variety of experiments on different real-world datasets, we find IDML-IT, a semi-supervised metric learning algorithm, to be the most effective.
2 0.75090855 157 acl-2010-Last but Definitely Not Least: On the Role of the Last Sentence in Automatic Polarity-Classification
Author: Israela Becker ; Vered Aharonson
Abstract: Two psycholinguistic and psychophysical experiments show that in order to efficiently extract polarity of written texts such as customer reviews on the Internet, one should concentrate computational efforts on messages in the final position of the text.
3 0.67898542 91 acl-2010-Domain Adaptation of Maximum Entropy Language Models
Author: Tanel Alumae ; Mikko Kurimo
Abstract: We investigate a recently proposed Bayesian adaptation method for building style-adapted maximum entropy language models for speech recognition, given a large corpus of written language data and a small corpus of speech transcripts. Experiments show that the method consistently outperforms linear interpolation which is typically used in such cases.
4 0.4279415 78 acl-2010-Cross-Language Text Classification Using Structural Correspondence Learning
Author: Peter Prettenhofer ; Benno Stein
Abstract: We present a new approach to crosslanguage text classification that builds on structural correspondence learning, a recently proposed theory for domain adaptation. The approach uses unlabeled documents, along with a simple word translation oracle, in order to induce taskspecific, cross-lingual word correspondences. We report on analyses that reveal quantitative insights about the use of unlabeled data and the complexity of interlanguage correspondence modeling. We conduct experiments in the field of cross-language sentiment classification, employing English as source language, and German, French, and Japanese as target languages. The results are convincing; they demonstrate both the robustness and the competitiveness of the presented ideas.
5 0.41995037 80 acl-2010-Cross Lingual Adaptation: An Experiment on Sentiment Classifications
Author: Bin Wei ; Christopher Pal
Abstract: In this paper, we study the problem of using an annotated corpus in English for the same natural language processing task in another language. While various machine translation systems are available, automated translation is still far from perfect. To minimize the noise introduced by translations, we propose to use only key "reliable" parts from the translations and apply structural correspondence learning (SCL) to find a low dimensional representation shared by the two languages. We perform experiments on an English-Chinese sentiment classification task and compare our results with a previous co-training approach. To alleviate the problem of data sparseness, we create extra pseudo-examples for SCL by making queries to a search engine. Experiments on real-world on-line review data demonstrate that the two techniques can effectively improve the performance compared to previous work.
6 0.4128328 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
7 0.39296517 193 acl-2010-Personalising Speech-To-Speech Translation in the EMIME Project
8 0.39261419 96 acl-2010-Efficient Optimization of an MDL-Inspired Objective Function for Unsupervised Part-Of-Speech Tagging
9 0.39163119 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
10 0.39079657 26 acl-2010-All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision
11 0.38904566 255 acl-2010-Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
12 0.38849786 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes
13 0.38838279 42 acl-2010-Automatically Generating Annotator Rationales to Improve Sentiment Classification
14 0.3877579 214 acl-2010-Sparsity in Dependency Grammar Induction
15 0.38696635 14 acl-2010-A Risk Minimization Framework for Extractive Speech Summarization
16 0.3860603 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification
17 0.37433869 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
18 0.37366664 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
19 0.37272519 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
20 0.37215263 169 acl-2010-Learning to Translate with Source and Target Syntax