acl acl2013 acl2013-14 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ding Liu ; Xiaofang Yang ; Minghu Jiang
Abstract: In this article, we propose a novel classifier based on quantum computation theory. Different from existing methods, we consider the classification as an evolutionary process of a physical system and build the classifier by using the basic quantum mechanics equation. The performance of the experiments on two datasets indicates feasibility and potentiality of the quantum classifier.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this article, we propose a novel classifier based on quantum computation theory. [sent-4, score-0.943]
2 Different from existing methods, we consider the classification as an evolutionary process of a physical system and build the classifier by using the basic quantum mechanics equation. [sent-5, score-1.233]
3 The performance of the experiments on two datasets indicates feasibility and potentiality of the quantum classifier. [sent-6, score-0.876]
4 1 Introduction Within modern natural science, quantum mechanics (QM) is one of the most famous and profound theories, one that brought a world-shaking revolution to physics. [sent-7, score-1.038]
5 Since its birth, QM has been considered a significant part of theoretical physics and has shown its power in explaining experimental results. [sent-8, score-0.022]
6 Furthermore, some scientists believe that QM is the final principle of physics, and even of the whole of natural science. [sent-9, score-0.069]
7 Thus, more and more researchers have extended the study of QM to other fields of science, and it has deeply affected almost every aspect of natural science and technology, as in quantum computation. [sent-10, score-0.847]
8 The principle of quantum computation has also influenced much scientific research in computer science, specifically in computational modeling, cryptography, and information theory. [sent-11, score-0.99]
9 Some researchers have employed the principles and techniques of quantum computation to improve studies in Machine Learning (ML) (Aïmeur et al. [sent-12, score-0.935]
10 , 2008; Gambs, 2008; Horn and Gottlieb, 2001 ; Nasios and Bors, 2007), a field which studies theories and constructions of systems that can learn from data, among which classification is a typical task. [sent-15, score-0.065]
11 In this work, we build a computational model based on quantum computation theory to handle classification tasks, in order to prove the feasibility of applying the QM model to machine learning. [sent-21, score-0.984]
12 In this article, we present a method that considers the classifier as a physical system amenable to QM and treats the entire process of classification as the evolutionary process of a closed quantum system. [sent-22, score-1.168]
13 According to QM, the evolution of a quantum system can be described by a unitary operator. [sent-23, score-1.086]
14 Therefore, the primary problem of building a quantum classifier (QC) is to find the correct or optimal unitary operator. [sent-24, score-1.159]
15 We applied classical optimization algorithms to deal with the problem, and the experimental results have confirmed our theory. [sent-25, score-0.073]
16 First, the basic principle and structure of QC are introduced in Section 2. [sent-27, score-0.062]
17 2 Basic principle of quantum classifier As mentioned in the introduction, the major principle of the quantum classifier (QC) is to consider the classifier as a physical system and the whole process of classification as the evolutionary process of a closed quantum system. [sent-30, score-3.03]
18 Thus, the evolution of the quantum system can be described by a unitary operator (unitary matrix), and the remaining job is to find the correct or optimal unitary operator. [sent-31, score-1.448]
19 1 Architecture of quantum classifier The architecture and the whole procedure of data processing of QC are illustrated in Figure 1. [sent-33, score-0.921]
20 As is shown, the key aspect of QC is the optimization part, where we employ an optimization algorithm to find an optimal unitary operator U. [sent-34, score-0.436]
21 Figure 1: Architecture of the quantum classifier. The detailed information about each phase of the process will be explained thoroughly in the following sections. [sent-40, score-0.941]
22 2 Encode input state and target state In quantum mechanics, the state of a physical system can be described as a superposition of so-called eigenstates, which are orthogonal. [sent-42, score-1.244]
23 Any state, including an eigenstate, can be represented by a complex vector. [sent-43, score-0.025]
24 We use Dirac’s bra-ket notation to formalize the data as equation 1: |ψ⟩ = Σ_i c_i|φ_i⟩, where the c_i are complex coefficients and the |φ_i⟩ are the orthogonal eigenstates. [sent-44, score-0.045]
25 Based on the hypothesis that the QC can be considered as a quantum system, the input data should be transformed into a format available in quantum theory — a complex vector. [sent-79, score-1.721]
26 According to Euler’s formula, a complex number z can be denoted as z = r·e^(iθ), [sent-80, score-0.025]
27 where r and θ denote the modulus and the phase of the complex coefficient, respectively. [sent-103, score-0.065]
28 Specifically, in our experiment, we assigned the term frequency, a feature frequently used in text classification, to the modulus r, [sent-109, score-0.05]
29 and set the phase θ to a constant, since we found the phase makes little contribution to the classification. [sent-113, score-0.04]
30 Given these values, we calculate the corresponding input complex vector by equation 3, which is illustrated in detail in Figure 2. [sent-121, score-0.088]
31 Figure 2: Process of calculating the input state. Each eigenstate |φ_i⟩ corresponds to one feature dimension of the input. [sent-139, score-0.142]
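To make this encoding step concrete, here is a minimal Python/NumPy sketch that maps a term-frequency vector to a normalised complex state vector, taking the term frequency as the modulus and a constant phase; the function name, the normalisation step, and the default phase value are our own illustrative choices, not details taken from the paper.

    import numpy as np

    def encode_state(term_freqs, phase=0.0):
        # Map a term-frequency vector to a complex state vector: each component
        # is the coefficient of one eigenstate, with modulus = term frequency
        # and a constant phase, then normalised to unit length.
        tf = np.asarray(term_freqs, dtype=float)
        amplitudes = tf * np.exp(1j * phase)      # z = r * e^(i*theta)
        norm = np.linalg.norm(amplitudes)
        return amplitudes / norm if norm > 0 else amplitudes

    # Example: a document represented by raw term counts
    psi_in = encode_state([3, 0, 1, 2])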
32 As is mentioned above, the evolutionary process of a closed physical system can be described by a unitary operator, depicted by a matrix as in equation 4: |ψ_f⟩ = U|ψ_i⟩, [sent-151, score-0.547]
33 where |ψ_f⟩ and |ψ_i⟩ denote the final state and the initial state, respectively. [sent-158, score-0.086]
34 The approach to determine the unitary operator will be discussed in Section 2.3. [sent-159, score-0.324]
35 Like the Vector Space Model (VSM), we use a label matrix to represent each class, as in Figure 3. [sent-162, score-0.05]
36 For each class, we generate the target complex vector according to the corresponding equation. [sent-171, score-0.025]
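As one possible way to use these target states at prediction time, the sketch below assigns each class an orthogonal basis vector as its target and labels a sample by whichever target is closest to its evolved state; this decision rule and the basis-vector targets are our assumptions for illustration, since the extracted text does not specify the exact prediction step.

    import numpy as np

    def target_state(class_index, dim):
        # Illustrative target: the basis vector of the class, in the state space.
        t = np.zeros(dim, dtype=complex)
        t[class_index] = 1.0
        return t

    def predict(U, x, targets):
        # Evolve the input state with the unitary operator U and pick the class
        # whose target state is closest to the output state.
        out = U @ x
        distances = [np.linalg.norm(out - t) for t in targets]
        return int(np.argmin(distances))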
37 3 Finding the Hamiltonian matrix and the Unitary operator As is mentioned in the first section, finding a unitary operator to describe the evolutionary process is the vital step in building a QC. [sent-201, score-0.544]
38 As a basic result of quantum mechanics, a unitary operator can be represented by a unitary matrix U with the property U†U = UU† = I, [sent-202, score-1.554]
39 and a unitary operator can also be written as in equation 6: U = e^(−iHt/ℏ), [sent-207, score-0.369]
40 where H is the Hamiltonian matrix and ℏ is the reduced Planck constant. [sent-213, score-0.05]
41 Moreover, the Hamiltonian H is a Hermitian matrix with the property H† = H. [sent-214, score-0.05]
42 The remaining job, therefore, is to find an optimal Hamiltonian matrix. [sent-220, score-0.018]
43 Such a Hermitian matrix has (m+w)² free real parameters, provided that the dimension of H is (m+w). [sent-224, score-0.021]
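The following sketch shows one way to parameterise such a Hermitian matrix by its free real parameters and to obtain the corresponding unitary operator through the matrix exponential; the parameter layout and the choice of absorbing t/ℏ into H (i.e. setting it to 1) are illustrative assumptions rather than details from the paper.

    import numpy as np
    from scipy.linalg import expm

    def hermitian_from_params(params, n):
        # Build an n x n Hermitian matrix from n*n free real parameters:
        # n diagonal entries plus the real and imaginary parts of the upper triangle.
        params = np.asarray(params, dtype=float)
        H = np.zeros((n, n), dtype=complex)
        H[np.diag_indices(n)] = params[:n]
        iu = np.triu_indices(n, k=1)
        m = len(iu[0])
        H[iu] = params[n:n + m] + 1j * params[n + m:n + 2 * m]
        return H + np.conj(H.T) - np.diag(H.diagonal())   # mirror the upper triangle

    def unitary_from_hamiltonian(H):
        # U = exp(-i * H * t / hbar), with t / hbar absorbed into H here.
        return expm(-1j * H)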
44 Thus, the problem of determining H can be regarded as a classical optimization problem, which can be resolved by various optimization algorithms (Chen and Kudlek, 2001). [sent-225, score-0.12]
45 with the three states denoting the target, input, and output state respectively, and the error function defined over them as in equation 8. [sent-256, score-0.043]
46 In the optimization phase, we employed several optimization algorithms, including BFGS, the Genetic Algorithm, and SQP (sequential quadratic programming), to optimize the error function. [sent-269, score-0.141]
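Continuing the sketches above (encode_state, hermitian_from_params, unitary_from_hamiltonian), a minimal version of this optimisation phase could look as follows; the squared-distance error and the toy data are our own stand-ins for the paper's error function (equation 8) and training samples, and scipy's BFGS is used only as one of the optimisers mentioned.

    import numpy as np
    from scipy.optimize import minimize

    def normalise(v):
        v = np.asarray(v, dtype=complex)
        return v / np.linalg.norm(v)

    # Toy training pairs: two input states and their (illustrative) target states.
    inputs  = [normalise([3, 0, 1, 2]), normalise([0, 4, 2, 0])]
    targets = [normalise([1, 0, 0, 0]), normalise([0, 1, 0, 0])]
    n = 4

    def error(params):
        # Sum of squared distances between the evolved inputs U|x> and targets |t>.
        U = unitary_from_hamiltonian(hermitian_from_params(params, n))
        return float(sum(np.sum(np.abs(U @ x - t) ** 2) for x, t in zip(inputs, targets)))

    result = minimize(error, x0=np.zeros(n * n), method='BFGS')
    H_opt = hermitian_from_params(result.x, n)   # optimised Hamiltonian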
47 We compared the performance of QC with several classical classification methods, including Support Vector Machine (SVM) and K-nearest neighbor (KNN). [sent-274, score-0.05]
48 In Section 3.2, we evaluated the performance on multi-class classification using an oral conversation dataset and analyzed the results. [sent-276, score-0.165]
49 1 Reuters-21578 The Reuters dataset we tested contains 3,964 texts belonging to the “earnings” category and 8,938 texts belonging to the “others” categories. [sent-278, score-0.059]
50 In this classification task, we selected the features by calculating the χ² [sent-279, score-0.067]
51 score of each term from the “earnings” category (Manning and Schütze, 2002). [sent-281, score-0.021]
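Assuming the score in question is the χ² statistic commonly used for term selection (the symbol is garbled in this extraction), a sketch of this step with scikit-learn might look as follows; the toy document-term matrix and the number of terms kept are illustrative.

    import numpy as np
    from sklearn.feature_selection import chi2

    # Toy document-term count matrix X and binary labels y ("earnings" vs. "others").
    X = np.array([[3, 0, 1], [0, 2, 4], [5, 1, 0], [0, 3, 3]])
    y = np.array([1, 0, 1, 0])

    scores, _ = chi2(X, y)                      # chi-square score of each term
    top_terms = np.argsort(scores)[::-1][:2]    # keep the highest-scoring terms
    X_selected = X[:, top_terms]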
52 For the convenience of counting, we adopted 3,900 “earnings” documents and 8,900 “others” documents and divided them into two groups: the training pool and the testing sets. [sent-282, score-0.089]
53 Since we focused on the performance of QC trained by small-scale training sets in our experiment, we selected 1,000 samples each from the “earnings” and the “others” category as our training pool and took the rest of the samples (2,900 “earnings” and 7,900 “others” documents) as our testing sets. [sent-283, score-0.249]
54 We randomly selected training samples from the training pool ten times to train the QC, SVM, and KNN classifiers respectively and then verified the three trained classifiers on the testing sets, the results of which are illustrated in Figure 4. [sent-284, score-0.247]
55 We noted that the QC performed better than both KNN and SVM on small-scale training sets, i.e., when the number of training samples was less than 50. [sent-285, score-0.09]
56 Figure 4: Classification accuracy for the Reuters-21578 dataset. Generally speaking, a QC trained by a large training set may not always have ideal performance. [sent-287, score-0.041]
57 In contrast, some single training sample pairs led to a favorable result when we used only one sample from each category to train the QC. [sent-288, score-0.071]
58 Actually, some single samples could lead to an accuracy of more than 90%, while some others may produce an accuracy lower than 30%. [sent-289, score-0.072]
59 Therefore, the most significant factor for QC is the quality of the training samples rather than the quantity. [sent-290, score-0.07]
60 2 Oral conversation datasets Besides the binary QC, we also built a multi-class version and tested its performance on an oral conversation dataset which was collected by the Laboratory of Computational Linguistics at Tsinghua University. [sent-292, score-0.154]
61 We still took the term frequency as the feature, the dimension of which exceeded 1,000. [sent-294, score-0.056]
62 We therefore utilized principal component analysis (PCA) to reduce the high dimension of the features in order to decrease the computational complexity. [sent-295, score-0.058]
63 In this experiment, we chose the top 10 principal components of the outcome of PCA, which contained nearly 60% of the information in the original data. [sent-296, score-0.022]
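A sketch of this dimensionality-reduction step with scikit-learn is shown below; keeping the top 10 components follows the description above, while the data shape is a toy stand-in for the actual term-frequency matrix.

    import numpy as np
    from sklearn.decomposition import PCA

    # Toy term-frequency matrix with a feature dimension above 1,000.
    X = np.random.rand(200, 1200)

    pca = PCA(n_components=10)                  # keep the top 10 principal components
    X_reduced = pca.fit_transform(X)
    print(pca.explained_variance_ratio_.sum())  # fraction of variance retained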
64 Again, we focused on the performance of QC trained by small-scale training sets. [sent-297, score-0.02]
65 We selected 100 samples from each class to construct the training pool and took the rest of the data as the testing sets. [sent-298, score-0.158]
66 As in Section 3.1, we randomly selected the training samples from the training pool ten times to train the QC, SVM, and KNN classifiers respectively and verified the models on the testing sets, the results of which are shown in Figure 5. [sent-300, score-0.247]
67 Figure 5: Classification accuracy for the oral conversation dataset. 4 Discussion We present here our model of text classification and compare it with SVM and KNN on two datasets. [sent-302, score-0.165]
68 We find that it is feasible to build a supervised learning model based on quantum mechanics theory. [sent-303, score-0.93]
69 Previous studies focus on combining quantum methods with existing classification models such as neural networks (Chen et al. [sent-304, score-0.895]
70 Our work, however, focuses on developing a novel method that explores the relationship between machine learning models and the physical world, in order to investigate these models through the physical rules that describe our universe. [sent-306, score-0.206]
71 Moreover, the QC performs well in text classification compared with SVM and KNN and outperforms them on small-scale training sets. [sent-307, score-0.07]
72 Additionally, the time complexity of QC depends on the optimization algorithm and the number of features we adopt. [sent-308, score-0.047]
73 Generally speaking, simulating quantum computing on a classical computer always requires more computational resources, and we believe that quantum computers will tackle this difficulty in the near future. [sent-309, score-1.729]
74 Actually, Google and NASA have launched a quantum computing AI lab this year, and we regard the project as an exciting beginning. [sent-310, score-0.879]
75 Future studies include finding a more suitable optimization algorithm for QC and a more reasonable physical explanation of the “quantum nature” of the QC. [sent-311, score-0.165]
76 We hope our attempt will shed some light upon the application of quantum theory to the field of machine learning. [sent-312, score-0.881]
77 Acknowledgments This work was supported by the Tsinghua University Self-determination Research Project (20111081023 & 20111081010) and the Human & Liberal Arts Development Foundation (2010WKHQ009). References Esma Aïmeur, Gilles Brassard, and Sébastien Gambs. [sent-314, score-0.034]
78 An efficient quantum neuro-fuzzy classifier based on fuzzy entropy and compensatory operation. [sent-341, score-0.884]
wordName wordTfidf (topN-words)
[('quantum', 0.83), ('qc', 0.283), ('unitary', 0.235), ('physical', 0.103), ('qm', 0.1), ('mechanics', 0.1), ('operator', 0.089), ('knn', 0.083), ('eigenstate', 0.082), ('eigenstates', 0.082), ('hamiltonian', 0.082), ('meur', 0.082), ('earnings', 0.075), ('evolutionary', 0.064), ('bastien', 0.061), ('nasios', 0.061), ('oral', 0.055), ('classifier', 0.054), ('samples', 0.05), ('pool', 0.05), ('classification', 0.05), ('matrix', 0.05), ('optimization', 0.047), ('principle', 0.047), ('equation', 0.045), ('state', 0.043), ('tsinghua', 0.043), ('computation', 0.043), ('brassard', 0.041), ('esma', 0.041), ('hermitian', 0.041), ('sqp', 0.041), ('phase', 0.04), ('conversation', 0.039), ('bors', 0.036), ('theory', 0.036), ('svm', 0.034), ('closed', 0.033), ('gilles', 0.033), ('horn', 0.033), ('pca', 0.031), ('classical', 0.026), ('complex', 0.025), ('feasibility', 0.025), ('chen', 0.024), ('arxiv', 0.023), ('physics', 0.022), ('others', 0.022), ('primary', 0.022), ('evolution', 0.021), ('category', 0.021), ('joseph', 0.021), ('dimension', 0.021), ('datasets', 0.021), ('training', 0.02), ('job', 0.02), ('architecture', 0.019), ('experiment', 0.019), ('testing', 0.019), ('speaking', 0.019), ('took', 0.019), ('belonging', 0.019), ('optimal', 0.018), ('illustrated', 0.018), ('hamburg', 0.018), ('arts', 0.018), ('assaf', 0.018), ('dirac', 0.018), ('fukumoto', 0.018), ('fumiyo', 0.018), ('kernelbased', 0.018), ('nasa', 0.018), ('profound', 0.018), ('revolution', 0.018), ('veri', 0.018), ('lab', 0.017), ('process', 0.017), ('affected', 0.017), ('calculating', 0.017), ('canadian', 0.017), ('cryptography', 0.017), ('yoshimi', 0.017), ('foundation', 0.016), ('ten', 0.016), ('launched', 0.016), ('deeply', 0.016), ('exceeded', 0.016), ('exciting', 0.016), ('iang', 0.016), ('inghua', 0.016), ('nikolaos', 0.016), ('sample', 0.015), ('studies', 0.015), ('crease', 0.015), ('nielsen', 0.015), ('planck', 0.015), ('shed', 0.015), ('basic', 0.015), ('article', 0.014), ('sch', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 14 acl-2013-A Novel Classifier Based on Quantum Computation
Author: Ding Liu ; Xiaofang Yang ; Minghu Jiang
Abstract: In this article, we propose a novel classifier based on quantum computation theory. Different from existing methods, we consider the classification as an evolutionary process of a physical system and build the classifier by using the basic quantum mechanics equation. The performance of the experiments on two datasets indicates feasibility and potentiality of the quantum classifier.
2 0.035124052 78 acl-2013-Categorization of Turkish News Documents with Morphological Analysis
Author: Burak Kerim Akkuş ; Ruket Cakici
Abstract: Morphologically rich languages such as Turkish may benefit from morphological analysis in natural language tasks. In this study, we examine the effects of morphological analysis on text categorization task in Turkish. We use stems and word categories that are extracted with morphological analysis as main features and compare them with fixed length stemmers in a bag of words approach with several learning algorithms. We aim to show the effects of using varying degrees of morphological information.
3 0.033338752 263 acl-2013-On the Predictability of Human Assessment: when Matrix Completion Meets NLP Evaluation
Author: Guillaume Wisniewski
Abstract: This paper tackles the problem of collecting reliable human assessments. We show that knowing multiple scores for each example instead of a single score results in a more reliable estimation of a system quality. To reduce the cost of collecting these multiple ratings, we propose to use matrix completion techniques to predict some scores knowing only scores of other judges and some common ratings. Even if prediction performance is pretty low, decisions made using the predicted score proved to be more reliable than decision based on a single rating of each example.
4 0.032313094 342 acl-2013-Text Classification from Positive and Unlabeled Data using Misclassified Data Correction
Author: Fumiyo Fukumoto ; Yoshimi Suzuki ; Suguru Matsuyoshi
Abstract: This paper addresses the problem of dealing with a collection of labeled training documents, especially annotating negative training documents and presents a method of text classification from positive and unlabeled data. We applied an error detection and correction technique to the results of positive and negative documents classified by the Support Vector Machines (SVM). The results using Reuters documents showed that the method was comparable to the current state-of-the-art biasedSVM method as the F-score obtained by our method was 0.627 and biased-SVM was 0.614.
5 0.031688984 351 acl-2013-Topic Modeling Based Classification of Clinical Reports
Author: Efsun Sarioglu ; Kabir Yadav ; Hyeong-Ah Choi
Abstract: Electronic health records (EHRs) contain important clinical information about patients. Some of these data are in the form of free text and require preprocessing to be usable in automated systems. Efficient and effective use of this data could be vital to the speed and quality of health care, such as recommending the need for a certain medical test while avoiding intrusive tests. As a case study, we analyzed classification of CT imaging reports into binary categories. In addition to regular text classification, we utilized topic modeling of the entire dataset in various ways. Topic modeling of the corpora provides interpretable themes that exist in these reports. Representing reports according to their topic distributions is more compact than bag-of-words representation and can be processed faster than raw text in subsequent automated processes. A binary topic model was also built as an unsupervised classification approach with the assumption that each topic corresponds to a class. And, finally an aggregate topic classifier was built where reports are classified based on a single discriminative topic that is determined from the training dataset. Our proposed topic based classifier system is shown to be competitive with existing text classification techniques and provides a more efficient and interpretable representation.
6 0.027454112 283 acl-2013-Probabilistic Domain Modelling With Contextualized Distributional Semantic Vectors
7 0.025930984 225 acl-2013-Learning to Order Natural Language Texts
8 0.025927052 197 acl-2013-Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation
9 0.024985181 355 acl-2013-TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain
10 0.024852009 131 acl-2013-Dual Training and Dual Prediction for Polarity Classification
11 0.02451219 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation
12 0.024342682 142 acl-2013-Evolutionary Hierarchical Dirichlet Process for Timeline Summarization
13 0.023937861 329 acl-2013-Statistical Machine Translation Improves Question Retrieval in Community Question Answering via Matrix Factorization
14 0.023802971 81 acl-2013-Co-Regression for Cross-Language Review Rating Prediction
15 0.023506496 154 acl-2013-Extracting bilingual terminologies from comparable corpora
16 0.023106474 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification
17 0.023084344 309 acl-2013-Scaling Semi-supervised Naive Bayes with Feature Marginals
18 0.022683568 292 acl-2013-Question Classification Transfer
19 0.022206048 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
20 0.021862457 259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning
topicId topicWeight
[(0, 0.066), (1, 0.014), (2, -0.005), (3, -0.006), (4, 0.013), (5, -0.023), (6, 0.029), (7, -0.0), (8, -0.008), (9, 0.001), (10, -0.004), (11, -0.005), (12, -0.012), (13, -0.001), (14, -0.021), (15, -0.002), (16, -0.045), (17, 0.02), (18, -0.021), (19, -0.027), (20, 0.032), (21, 0.017), (22, -0.001), (23, 0.028), (24, 0.013), (25, 0.018), (26, 0.0), (27, -0.018), (28, 0.033), (29, 0.018), (30, 0.008), (31, 0.019), (32, 0.045), (33, -0.003), (34, -0.026), (35, 0.018), (36, -0.054), (37, 0.01), (38, -0.037), (39, 0.036), (40, -0.017), (41, 0.046), (42, -0.011), (43, -0.011), (44, -0.044), (45, 0.023), (46, 0.044), (47, 0.003), (48, 0.004), (49, 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.84382558 14 acl-2013-A Novel Classifier Based on Quantum Computation
Author: Ding Liu ; Xiaofang Yang ; Minghu Jiang
Abstract: In this article, we propose a novel classifier based on quantum computation theory. Different from existing methods, we consider the classification as an evolutionary process of a physical system and build the classifier by using the basic quantum mechanics equation. The performance of the experiments on two datasets indicates feasibility and potentiality of the quantum classifier.
2 0.57678843 346 acl-2013-The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
Author: Oliver Ferschke ; Iryna Gurevych ; Marc Rittberger
Abstract: With the increasing amount of user generated reference texts in the web, automatic quality assessment has become a key challenge. However, only a small amount of annotated data is available for training quality assessment systems. Wikipedia contains a large amount of texts annotated with cleanup templates which identify quality flaws. We show that the distribution of these labels is topically biased, since they cannot be applied freely to any arbitrary article. We argue that it is necessary to consider the topical restrictions of each label in order to avoid a sampling bias that results in a skewed classifier and overly optimistic evaluation results. . We factor out the topic bias by extracting reliable training instances from the revision history which have a topic distribution similar to the labeled articles. This approach better reflects the situation a classifier would face in a real-life application.
3 0.54936147 163 acl-2013-From Natural Language Specifications to Program Input Parsers
Author: Tao Lei ; Fan Long ; Regina Barzilay ; Martin Rinard
Abstract: We present a method for automatically generating input parsers from English specifications of input file formats. We use a Bayesian generative model to capture relevant natural language phenomena and translate the English specification into a specification tree, which is then translated into a C++ input parser. We model the problem as a joint dependency parsing and semantic role labeling task. Our method is based on two sources of information: (1) the correlation between the text and the specification tree and (2) noisy supervision as determined by the success of the generated C++ parser in reading input examples. Our results show that our approach achieves 80.0% F-Score accuracy, compared to an F-Score of 66.7% produced by a state-of-the-art semantic parser on a dataset of input format specifications from the ACM International Collegiate Programming Contest (which were written in English for humans with no intention of providing support for automated processing).1
4 0.53836107 220 acl-2013-Learning Latent Personas of Film Characters
Author: David Bamman ; Brendan O'Connor ; Noah A. Smith
Abstract: We present two latent variable models for learning character types, or personas, in film, in which a persona is defined as a set of mixtures over latent lexical classes. These lexical classes capture the stereotypical actions of which a character is the agent and patient, as well as attributes by which they are described. As the first attempt to solve this problem explicitly, we also present a new dataset for the text-driven analysis of film, along with a benchmark testbed to help drive future work in this area.
5 0.52986175 228 acl-2013-Leveraging Domain-Independent Information in Semantic Parsing
Author: Dan Goldwasser ; Dan Roth
Abstract: Semantic parsing is a domain-dependent process by nature, as its output is defined over a set of domain symbols. Motivated by the observation that interpretation can be decomposed into domain-dependent and independent components, we suggest a novel interpretation model, which augments a domain dependent model with abstract information that can be shared by multiple domains. Our experiments show that this type of information is useful and can reduce the annotation effort significantly when moving between domains.
6 0.51678658 356 acl-2013-Transfer Learning Based Cross-lingual Knowledge Extraction for Wikipedia
7 0.51373458 321 acl-2013-Sign Language Lexical Recognition With Propositional Dynamic Logic
8 0.50903302 370 acl-2013-Unsupervised Transcription of Historical Documents
9 0.50307405 21 acl-2013-A Statistical NLG Framework for Aggregated Planning and Realization
10 0.49657539 72 acl-2013-Bridging Languages through Etymology: The case of cross language text categorization
11 0.49537301 48 acl-2013-An Open Source Toolkit for Quantitative Historical Linguistics
12 0.48596641 89 acl-2013-Computerized Analysis of a Verbal Fluency Test
13 0.48274094 78 acl-2013-Categorization of Turkish News Documents with Morphological Analysis
14 0.48114538 191 acl-2013-Improved Bayesian Logistic Supervised Topic Models with Data Augmentation
15 0.46840793 340 acl-2013-Text-Driven Toponym Resolution using Indirect Supervision
16 0.46661291 337 acl-2013-Tag2Blog: Narrative Generation from Satellite Tag Data
17 0.46211258 277 acl-2013-Part-of-speech tagging with antagonistic adversaries
18 0.45533669 310 acl-2013-Semantic Frames to Predict Stock Price Movement
20 0.45162457 192 acl-2013-Improved Lexical Acquisition through DPP-based Verb Clustering
topicId topicWeight
[(0, 0.035), (6, 0.031), (11, 0.034), (15, 0.024), (24, 0.042), (26, 0.036), (35, 0.046), (42, 0.052), (48, 0.056), (70, 0.056), (80, 0.37), (88, 0.032), (90, 0.017), (95, 0.043)]
simIndex simValue paperId paperTitle
same-paper 1 0.75558716 14 acl-2013-A Novel Classifier Based on Quantum Computation
Author: Ding Liu ; Xiaofang Yang ; Minghu Jiang
Abstract: In this article, we propose a novel classifier based on quantum computation theory. Different from existing methods, we consider the classification as an evolutionary process of a physical system and build the classifier by using the basic quantum mechanics equation. The performance of the experiments on two datasets indicates feasibility and potentiality of the quantum classifier.
2 0.6863547 227 acl-2013-Learning to lemmatise Polish noun phrases
Author: Adam Radziszewski
Abstract: We present a novel approach to noun phrase lemmatisation where the main phase is cast as a tagging problem. The idea draws on the observation that the lemmatisation of almost all Polish noun phrases may be decomposed into transformation of singular words (tokens) that make up each phrase. We perform evaluation, which shows results similar to those obtained earlier by a rule-based system, while our approach allows to separate chunking from lemmatisation.
3 0.64065123 91 acl-2013-Connotation Lexicon: A Dash of Sentiment Beneath the Surface Meaning
Author: Song Feng ; Jun Seok Kang ; Polina Kuznetsova ; Yejin Choi
Abstract: Understanding the connotation of words plays an important role in interpreting subtle shades of sentiment beyond denotative or surface meaning of text, as seemingly objective statements often allude nuanced sentiment of the writer, and even purposefully conjure emotion from the readers’ minds. The focus of this paper is drawing nuanced, connotative sentiments from even those words that are objective on the surface, such as “intelligence”, “human”, and “cheesecake”. We propose induction algorithms encoding a diverse set of linguistic insights (semantic prosody, distributional similarity, semantic parallelism of coordination) and prior knowledge drawn from lexical resources, resulting in the first broad-coverage connotation lexicon.
4 0.52820468 281 acl-2013-Post-Retrieval Clustering Using Third-Order Similarity Measures
Author: Jose G. Moreno ; Gael Dias ; Guillaume Cleuziou
Abstract: Post-retrieval clustering is the task of clustering Web search results. Within this context, we propose a new methodology that adapts the classical K-means algorithm to a third-order similarity measure initially developed for NLP tasks. Results obtained with the definition of a new stopping criterion over the ODP-239 and the MORESQUE gold standard datasets evidence that our proposal outperforms all reported text-based approaches.
5 0.52722067 135 acl-2013-English-to-Russian MT evaluation campaign
Author: Pavel Braslavski ; Alexander Beloborodov ; Maxim Khalilov ; Serge Sharoff
Abstract: This paper presents the settings and the results of the ROMIP 2013 MT shared task for the English→Russian language direction. The quality of generated translations was assessed using automatic metrics and human evaluation. We also discuss ways to reduce human evaluation efforts using pairwise sentence comparisons by human judges to simulate sort operations.
6 0.5232504 196 acl-2013-Improving pairwise coreference models through feature space hierarchy learning
7 0.34597397 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
8 0.34208268 275 acl-2013-Parsing with Compositional Vector Grammars
9 0.34017694 80 acl-2013-Chinese Parsing Exploiting Characters
10 0.33877432 318 acl-2013-Sentiment Relevance
11 0.33747503 225 acl-2013-Learning to Order Natural Language Texts
12 0.33681649 123 acl-2013-Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study
14 0.33578122 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing
15 0.33575958 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
16 0.33537841 224 acl-2013-Learning to Extract International Relations from Political Context
17 0.33505884 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
18 0.33501303 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
19 0.33460358 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
20 0.33425638 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction