paper-mining acl knowledge-graph by maker-knowledge-mining
Association for Computational Linguistics
The Association for Computational Linguistics (ACL) is the international scientific and professional society for people working on problems involving natural language and computation. An annual meeting is held each summer in locations where significant computational linguistics research is carried out. It was founded in 1962, originally named the Association for Machine Translation and Computational Linguistics (AMTCL). It became the ACL in 1968.
The ACL has a European (EACL) and a North American (NAACL) chapter.
The ACL journal, Computational Linguistics, is the primary forum for research on computational linguistics and natural language processing. Since 1988, the journal has been published for the ACL by MIT Press.
The ACL book series, Studies in Natural Language Processing, is published by Cambridge University Press.
Each year ACL and its chapters organize international conferences in different countries. ACL 2013 was held in Sofia, Bulgaria.
Special Interest Groups
ACL has a large number of Special Interest Groups (SIGs), focusing on specific areas of natural language processing. Some current SIGs within ACL are:
Linguistic Annotation: SIGANN
Biomedical Language Processing: SIGBIOMED
Linguistic data and corpus-based approaches: SIGDAT
Dialogue Processing: SIGDIAL
Finite-State Methods: SIGFSM
Natural Language Generation: SIGGEN
Chinese Language Processing: SIGHAN
Language Technologies for the Socio-Economic Sciences and the Humanities: SIGHUM
Lexicon: SIGLEX (SIGLEX is the umbrella organization for the SemEval semantic evaluations and SENSEVAL word-sense evaluation exercises.)
Mathematics of Language: SIGMOL
Machine Translation: SIGMT
Natural Language Learning: SIGNLL
Natural Language Parsing: SIGPARSE
Computational Morphology and Phonology: SIGMORPHON
Computational Semantics: SIGSEM
Speech & Language Processing for Assistive Technologies: SIGSLPAT
Computational Approaches to Semitic Languages: SEMITIC
Web as Corpus: SIGWAC
from wiki http://en.wikipedia.org/wiki/Association_for_Computational_Linguistics
What is Computational Linguistics?
Computational linguistics is the scientific study of language from a computational perspective. Computational linguists are interested in providing computational models of various kinds of linguistic phenomena. These models may be "knowledge-based" ("hand-crafted") or "data-driven" ("statistical" or "empirical"). Work in computational linguistics is in some cases motivated from a scientific perspective, in that one is trying to provide a computational explanation for a particular linguistic or psycholinguistic phenomenon; in other cases the motivation may be more purely technological, in that one wants to provide a working component of a speech or natural language system. Indeed, the work of computational linguists is incorporated into many working systems today, including speech recognition systems, text-to-speech synthesizers, automated voice response systems, web search engines, text editors, and language instruction materials, to name just a few.
Popular computational linguistics textbooks include:
Christopher Manning and Hinrich Schütze (1999) Foundations of Statistical Natural Language Processing, Cambridge, Massachusetts, USA. MIT Press.
Also see the book's supplemental materials website at Stanford.
Daniel Jurafsky and James Martin (2008) Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Second Edition. Prentice Hall.
from aclweb http://aclweb.org/what-is-cl
Forty Years of ACL Meetings, 1963-2002
After a foundational meeting on June 13, 1962, the Association for Machine Translation and Computational Linguistics (AMTCL) held its first Annual Meeting at the Denver Hilton on Aug. 25-26, 1963, on the two days preceding the national conference of the Association for Computing Machinery. The proceedings of this first AMTCL meeting were published later in the journal Mechanical Translation 7(2). The association changed its name in 1968 to the Association for Computational Linguistics (ACL), but without affecting the continuity of the organization. So ACL meetings have been held annually since 1963, for what now amounts to forty years, and we now have the Proceedings of the Fortieth Annual Meeting of the ACL.
Victor Yngve of the University of Chicago was the first program chair and first Association president. Other officers of the founding board were David Hays, Vice-President, and Harry Josselson, Secretary-Treasurer, who were joined by Ida Rhodes, Paul Garvin, and Winfred Lehmann as council members. Richard See, Anthony Oettinger and Sydney Lamb constituted the nominating committee. The published purpose of the organization was to "encourage high standards by sponsoring meetings, publication and other exchange of information."
It is remarkable that this group of scientists had the foresight and faith in computational linguistics (CL) to found the ACL and conduct annual meetings as early as 1963, but, in fact, the founding meeting was preceded by ten years of discussion on studying language computationally, dating back to a first meeting on CL convened by Yehoshua Bar-Hillel at the M.I.T. faculty club on June 17-20, 1952. Victor Yngve relates how that first meeting led to the founding of the ACL in his contribution to a twentieth anniversary celebration in the 1982 ACL Proceedings (pp. 92ff). One of the fantastic results of Steven Bird's ACL Anthology effort is that all of this sort of material will now be easily available. The ACL's history will no longer languish on older members' shelves.
In many ways the annual meetings are the heart and soul of the ACL, and the Proceedings are therefore the best record of the remarkable scientific activities and accomplishments of its members. There are many reasons for this. First, the association's journal (begun in 1974) publishes a smaller number of longer papers, and therefore illuminates a narrower band of the research results. Second, and more importantly, ACL has a (controversial) tradition of insisting on an unusual level of quality in its contributed papers. The typical ACL call for papers solicits contributions "on substantial, original, and unpublished research on all aspects of computational linguistics" and stresses that "[papers] should emphasize completed work rather than intended work."
This stringent formulation is reflected in an acceptance rate many journals envy, typically around 25% during the years for which statistics are available (most years since 1983). As a consequence, researchers in CL typically view ACL Proceedings as archival, i.e., the sort of publication which need not be duplicated in a journal. Unfortunately, this was not reflected in general availability until the current Anthology, another reason to celebrate its advent. The selectivity of the conference has been a matter of pride for many members, but it is also controversial: first, because for many members the acceptance of a contributed paper is a condition for having their conference travel costs reimbursed, so that selectivity tends to depress conference attendance; and second, less importantly, because external evaluations of research in academia -- which are unaware of the ACL's selectivity -- may discount contributions in conference proceedings entirely as evidence of scientific contribution. A solution to the selectivity dilemma has emerged over the past five years: adding specialized satellite workshops, whose publications are separate, to the main program of the conference. There has likewise been experimentation with demonstration and poster sessions, and again these have normally been distinguished in publication. In this way publication in the Proceedings continues to carry professional prestige, even while larger numbers are attracted to the annual conference. This electronic publication is welcome among other reasons for the uniform accessibility it gives to the large mass of ACL "grey literature" alongside the Proceedings.
The concerns of the research have narrowed over the years, even while some very general issues recur frequently. While early researchers had a sense of probing not only language, but also the nature of symbolic computation, artificial intelligence, and even human intelligence considered philosophically and psychologically, later volumes restrict themselves much more soberly to language and a linguistic perspective. On the other hand, a very consistent thread throughout the forty years of ACL proceedings is the conviction of its researchers that CL is of enormous practical use -- even very theoretical papers often express their wish to help realize the dream of the machine that communicates in natural language, with all the practical benefits that will bring with it. It is gratifying to see papers from more recent years celebrate the ways in which language technology is being applied practically. The general issue of the proper balance of theoretical and applied work may not often be addressed explicitly, but one certainly gets the sense that it has been present all along, with occasional significant swings in one direction or the other.
Since 1997 the ACL has taken great pains to involve all of the world's computational linguists in its conferences, and this has increased interest and attendance again. Mitch Marcus, Eva Hajicova, Phil Cohen and Wolfgang Wahlster played especially important roles in this. The increased interest motivates, in yet another way, efforts to improve the easy availability of the scientific literature, certainly for all those who are now discovering the ACL's forty years of research on language and computation. The ACL should not hide its light under a bushel!
John Nerbonne, ACL President, 2002
Last Revision: June 4, 2002
from aclweb http://aclweb.org/anthology//docs/acl.html
Computational linguistics
Computational linguistics is an interdisciplinary field concerned with the statistical or rule-based modeling of natural language from a computational perspective.
Traditionally, computational linguistics was performed by computer scientists who had specialized in the application of computers to the processing of natural language. Computational linguists often work as members of interdisciplinary teams, including linguists (specifically trained in linguistics), language experts (persons with some level of ability in the languages relevant to a given project), and computer scientists. In general, computational linguistics draws upon the involvement of linguists, computer scientists, experts in artificial intelligence, mathematicians, logicians, philosophers, cognitive scientists, cognitive psychologists, psycholinguists, anthropologists and neuroscientists, among others.
Computational linguistics has theoretical and applied components, where theoretical computational linguistics takes up issues in theoretical linguistics and cognitive science, and applied computational linguistics focuses on the practical outcome of modeling human language use.
Origins
Computational linguistics as a field predates artificial intelligence, a field under which it is often grouped. Computational linguistics originated with efforts in the United States in the 1950s to use computers to automatically translate texts from foreign languages, particularly Russian scientific journals, into English. Since computers can perform arithmetic calculations much faster and more accurately than humans, it was thought to be only a matter of time before the technical details could be worked out that would give them the same remarkable capacity to process language.
When machine translation (also known as mechanical translation) failed to yield accurate translations right away, automated processing of human languages was recognized as far more complex than had originally been assumed. Computational linguistics was born as the name of the new field of study devoted to developing algorithms and software for intelligently processing language data. When artificial intelligence came into existence in the 1960s, the field of computational linguistics became that sub-division of artificial intelligence dealing with human-level comprehension and production of natural languages.
In order to translate one language into another, it was observed that one had to understand the grammar of both languages, including both morphology (the grammar of word forms) and syntax (the grammar of sentence structure). In order to understand syntax, one had to also understand the semantics and the lexicon (or 'vocabulary'), and even to understand something of the pragmatics of language use. Thus, what started as an effort to translate between languages evolved into an entire discipline devoted to understanding how to represent and process natural languages using computers.
Nowadays research within the scope of computational linguistics is done in computational linguistics departments, computational linguistics laboratories, computer science departments, and linguistics departments. Some research in the field aims to create working speech or text processing systems, while other research aims to create systems allowing human-machine interaction. Programs meant for human-machine communication are called conversational agents.
Approaches
Just as computational linguistics can be performed by experts in a variety of fields and through a wide assortment of departments, so too can the research fields broach a diverse range of topics. The following sections discuss some of the literature available across the entire field, broken into four main areas of discourse: developmental linguistics, structural linguistics, linguistic production, and linguistic comprehension.
Developmental Approaches
Language is a skill which develops throughout the life of an individual. This developmental process has been examined using a number of techniques, and a computational approach is one of them. Human language development does provide some constraints which make it feasible to apply a computational method to understanding it. For instance, during language acquisition, human children are largely only exposed to positive evidence. This means that during the linguistic development of an individual, the child is given evidence only for what is a correct form, and not for what is incorrect. This is insufficient input for a simple hypothesis-testing procedure applied to something as complex as language, and so it sets certain boundaries for a computational approach to modeling language development and acquisition in an individual.
Attempts have been made to model the developmental process of language acquisition in children from a computational angle. Work in this realm has also been proposed as a method to explain the evolution of language through history. Using models, it has been shown that languages can be learned most efficiently when simple input is presented first and incrementally, as the child develops better memory and a longer attention span. This was simultaneously posed as a reason for the long developmental period of human children. Both conclusions were drawn from the strength of the neural network which the project created.
The ability of infants to develop language has also been modeled using robots in order to test linguistic theories. One model, enabled to learn as children might, was based on an affordance model in which mappings between actions, perceptions, and effects were created and linked to spoken words. Crucially, these robots were able to acquire functioning word-to-meaning mappings without needing grammatical structure, which vastly simplified the learning process and shed light on information which furthers the current understanding of linguistic development. It is important to note that this information could only have been empirically tested using a computational approach.
As our understanding of the linguistic development of an individual within a lifetime is continually improved using neural networks and learning robotic systems, it is also important to keep in mind that languages themselves change and develop through time. Computational approaches to understanding this phenomenon have unearthed very interesting information. Using the Price Equation and Pólya urn dynamics, researchers have created a system which not only predicts future linguistic evolution, but also gives insight into the evolutionary history of modern day languages. This modeling effort achieved, through computational linguistics, what would otherwise have been impossible.
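The rich-get-richer dynamics behind such urn models are easy to demonstrate. Below is a minimal, self-contained sketch (not the cited researchers' system) of a Pólya urn applied to two competing linguistic variants: each draw reinforces the variant drawn, so one form typically drifts toward dominance. The variant names and counts are invented for illustration.

    import random

    def polya_urn(steps, counts):
        # Draw a variant with probability proportional to its current count,
        # then reinforce it by one ball (the rich-get-richer dynamic).
        for _ in range(steps):
            variants = list(counts)
            weights = [counts[v] for v in variants]
            chosen = random.choices(variants, weights=weights, k=1)[0]
            counts[chosen] += 1
        return counts

    # Two competing word forms start out equally frequent; after many draws
    # the urn usually drives one of them toward dominance.
    print(polya_urn(1000, {"variant_a": 1, "variant_b": 1}))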
It is clear that the understanding of linguistic development in humans as well as throughout evolutionary time has been fantastically improved because of advances in computational linguistics. The ability to model and modify systems at will affords science an ethical method of testing hypotheses that would otherwise be intractable.
Structural Approaches
In order to create better computational models of language, an understanding of language's structure is crucial. To this end, the English language has been meticulously studied using computational approaches to better understand how the language works on a structural level. One of the most important prerequisites for studying linguistic structure is the availability of large linguistic corpora, which grant computational linguists the raw data necessary to run their models and gain a better understanding of the underlying structures present in the vast amount of data contained in any single language. One of the most cited English linguistic corpora is the Penn Treebank. Containing over 4.5 million words of American English, this corpus has been annotated for part-of-speech information. This type of annotated corpus allows other researchers to test hypotheses and compute measures that would otherwise be impossible.
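As a concrete illustration of what part-of-speech annotation looks like, the sketch below reads the 10% Penn Treebank sample that ships with the NLTK library (assuming NLTK and its 'treebank' resource are installed) and counts tag frequencies, one of the simple measures such a corpus makes trivial to compute.

    # Requires: pip install nltk; 'treebank' is a 10% sample of the
    # Penn Treebank distributed with NLTK.
    import nltk
    from collections import Counter
    from nltk.corpus import treebank

    nltk.download("treebank", quiet=True)

    # Each token in the annotated corpus is a (word, part-of-speech) pair.
    print(treebank.tagged_sents()[0][:5])

    # Tag frequencies across the sample.
    print(Counter(tag for _, tag in treebank.tagged_words()).most_common(5))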
Theoretical approaches to the structure of languages have also been developed. These works give computational linguists a framework within which to work out hypotheses that will further the understanding of the language in a myriad of ways. One of the original theoretical theses on the internalization of grammar and structure of language proposed two types of models. In these models, rules or patterns learned increase in strength with the frequency of their encounter. The work also posed a question for computational linguists to answer: how does an infant learn a specific and non-normal grammar (Chomsky normal form) without learning an overgeneralized version and getting stuck? Theoretical efforts like these set the direction for research early in the lifetime of a field of study, and are crucial to the growth of the field.
Structural information about languages allows for the discovery and implementation of similarity recognition between pairs of text utterances. For instance, it has recently been shown that, based on the structural information present in patterns of human discourse, conceptual recurrence plots can be used to model and visualize trends in data and create reliable measures of similarity between natural textual utterances. This technique is a strong tool for further probing the structure of human discourse. Without the computational approach to this question, the vastly complex information present in discourse data would have remained inaccessible to scientists.
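Conceptual recurrence analysis itself is more involved, but its basic ingredient, a similarity score between two utterances, can be sketched in a few lines. The following is a hedged illustration using plain bag-of-words cosine similarity, not the cited authors' method.

    import math
    from collections import Counter

    def cosine_similarity(utt_a, utt_b):
        # Bag-of-words vectors; return the cosine of the angle between them.
        a, b = Counter(utt_a.lower().split()), Counter(utt_b.lower().split())
        dot = sum(a[w] * b[w] for w in set(a) & set(b))
        norm = math.sqrt(sum(v * v for v in a.values())) * \
               math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    print(cosine_similarity("the cat sat on the mat",
                            "the cat lay on the rug"))  # 0.75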
Structural data of this sort is not available only for English; it can be gathered for other languages as well, such as Japanese. Using computational methods, Japanese sentence corpora were analyzed and a pattern of log-normality was found in relation to sentence length. Though the exact cause of this lognormality remains unknown, it is precisely this sort of intriguing information which computational linguistics is designed to uncover. This information could lead to further important discoveries regarding the underlying structure of Japanese, and could have any number of effects on the understanding of Japanese as a language. Computational linguistics allows for very exciting additions to the scientific knowledge base to happen quickly and with very little room for doubt.
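To make the log-normality claim concrete, a fit like the one below could be run over sentence lengths measured from a real corpus; here the lengths are invented for illustration, and scipy's lognorm fitting routine is assumed available.

    # Requires: pip install numpy scipy
    import numpy as np
    from scipy import stats

    # Hypothetical sentence lengths (in words); real studies use large corpora.
    lengths = np.array([5, 8, 12, 7, 22, 9, 15, 6, 11, 30, 10, 13, 8, 18, 9])

    # Fit a log-normal distribution: if lengths are log-normal, their logs
    # should look approximately normal.
    shape, loc, scale = stats.lognorm.fit(lengths, floc=0)
    print(f"fitted sigma={shape:.2f}, median={scale:.1f} words")

    # Goodness of fit via the Kolmogorov-Smirnov statistic (lower is better).
    print(stats.kstest(lengths, "lognorm", args=(shape, loc, scale)).statistic)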
Without a computational approach to the structure of linguistic data, much of the information that is available now would still be hidden under the vastness of data within any single language. Computational linguistics allows scientists to parse huge amounts of data reliably and efficiently, creating the possibility for discoveries unlike any seen in most other approaches.
Production Approaches
The production of language is equally complex, both in the information it provides and in the skills which a fluent producer must have. That is to say, comprehension is only half the battle of communication. The other half is how a system produces language, and computational linguistics has made some very interesting discoveries in this area.
In a now famous paper published in 1950, Alan Turing proposed the possibility that machines might one day have the ability to "think". As a thought experiment for what might define the concept of thought in machines, he proposed an "imitation game" in which a human subject has two text-only conversations, one with a fellow human and another with a machine attempting to respond like a human. Turing proposed that if the subject cannot tell the difference between the human and the machine, it may be concluded that the machine is capable of thought. Today this test is known as the Turing test, and it remains an influential idea in the area of artificial intelligence.
One of the earliest and best known examples of a computer program designed to converse naturally with humans is the ELIZA program developed by Joseph Weizenbaum at MIT in 1966. The program emulated a Rogerian psychotherapist when responding to written statements and questions posed by a user. It appeared capable of understanding what was said to it and responding intelligently, but in truth it simply followed a pattern-matching routine that relied on understanding only a few keywords in each sentence. Its responses were generated by recombining the unknown parts of the sentence around properly translated versions of the known words. For example, in the phrase "It seems that you hate me", ELIZA understands "you" and "me", which matches the general pattern "you [some words] me", allowing ELIZA to swap the words "you" and "me" for "I" and "you" and reply "What makes you think I hate you?". In this example ELIZA has no understanding of the word "hate", but that is not required for a logical response in the context of this type of psychotherapy.
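A toy reconstruction of this pattern-matching idea might look like the following; it is a minimal sketch, not Weizenbaum's original code, which used a richer script language with many more patterns.

    import re

    # Swap first- and second-person words when echoing the user's input back.
    PRONOUN_SWAP = {"you": "I", "me": "you", "your": "my",
                    "i": "you", "my": "your"}

    def reflect(text):
        return " ".join(PRONOUN_SWAP.get(w, w) for w in text.lower().split())

    def respond(statement):
        s = statement.lower()
        # Match the "you [some words] me" pattern and embed the unknown
        # middle words in a canned reply, as described above.
        m = re.match(r"it seems that you (.+) me", s)
        if m:
            return f"What makes you think I {m.group(1)} you?"
        m = re.match(r"i (?:feel|am) (.+)", s)
        if m:
            return f"Why do you say you are {reflect(m.group(1))}?"
        return "Please tell me more."

    print(respond("It seems that you hate me"))
    # -> What makes you think I hate you?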
Some projects are still trying to solve the problem which started computational linguistics off as its own field in the first place, but the methods have become more refined, and consequently the results generated by computational linguists have become more enlightening. In an effort to improve machine translation, several models have been compared, including hidden Markov models, smoothing techniques, and the specific refinements needed to apply them to verb translation. The model found to produce the most natural translations of German and French words was a refined alignment model with a first-order dependence and a fertility model. The authors also provide efficient training algorithms for the models presented, which give other scientists the ability to improve further on their results. This type of work is specific to computational linguistics, and has applications which could vastly improve understanding of how language is produced and comprehended by computers.
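The alignment models compared in that work are elaborate, but the simplest member of the same family, IBM Model 1, already shows how word-translation probabilities can be learned from a parallel corpus with expectation-maximization. The sketch below is illustrative only, with an invented toy German-English corpus.

    from collections import defaultdict

    # Toy parallel corpus (German -> English), purely illustrative.
    corpus = [("das haus".split(), "the house".split()),
              ("das buch".split(), "the book".split()),
              ("ein buch".split(), "a book".split())]

    # IBM Model 1: learn word-translation probabilities t(e|f) with EM.
    t = defaultdict(lambda: 0.25)  # uniform initialization
    for _ in range(10):
        counts = defaultdict(float)
        totals = defaultdict(float)
        for f_sent, e_sent in corpus:
            for e in e_sent:
                z = sum(t[(e, f)] for f in f_sent)   # normalize over f
                for f in f_sent:
                    c = t[(e, f)] / z                # expected alignment count
                    counts[(e, f)] += c
                    totals[f] += c
        for (e, f), c in counts.items():
            t[(e, f)] = c / totals[f]                # M-step

    print(round(t[("book", "buch")], 2))  # high after training
    print(round(t[("the", "buch")], 2))   # low after training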
Work has also been done in making computers produce language in a more naturalistic manner. Using linguistic input from humans, algorithms have been constructed which are able to modify a system's style of production based on factors such as the human's own linguistic style, or more abstract factors like politeness or any of the five main dimensions of personality. This work takes a computational approach, via parameter estimation models, to categorize the vast array of linguistic styles seen across individuals and to simplify them for a computer to reproduce, making human-computer interaction much more natural.
Comprehension Approaches
Much of the focus of modern computational linguistics is on comprehension. With the proliferation of the internet and the abundance of easily accessible written human language, the ability to create a program capable of understanding human language would have many broad and exciting possibilities, including improved search engines, automated customer service, and online education.
Early work in comprehension included applying Bayesian statistics to the task of optical character recognition, as illustrated by Bledsoe and Browning in 1959, in which a large dictionary of possible letters was generated by "learning" from example letters, and then the probability that any one of those learned examples matched the new input was combined to make a final decision. Other attempts at applying Bayesian statistics to language analysis included the work of Mosteller and Wallace (1963), in which an analysis of the words used in The Federalist Papers was used to attempt to determine their authorship (concluding that Madison most likely authored the disputed papers).
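The spirit of the Mosteller-Wallace study can be sketched with a tiny Naive Bayes classifier over word counts. The training snippets below are invented stand-ins for the function-word profiles ("upon", "whilst", and so on) that the actual study measured across the known papers.

    import math
    from collections import Counter

    # Invented miniature "texts" standing in for each author's known writings.
    known = {
        "Hamilton": "upon the upon a while upon the nation upon",
        "Madison":  "whilst the whilst a whilst on the nation",
    }

    def train(texts):
        # Per-author word log-probabilities with add-one smoothing.
        vocab = {w for t in texts.values() for w in t.split()}
        models = {}
        for author, text in texts.items():
            counts = Counter(text.split())
            total = sum(counts.values()) + len(vocab)
            models[author] = {w: math.log((counts[w] + 1) / total)
                              for w in vocab}
        return models

    def classify(models, text):
        # Score each author by summed log-probability; ignore unseen words.
        scores = {a: sum(m.get(w, 0.0) for w in text.split())
                  for a, m in models.items()}
        return max(scores, key=scores.get)

    models = train(known)
    print(classify(models, "upon the nation upon a while"))  # -> Hamilton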
In 1971 Terry Winograd developed an early natural language processing engine capable of interpreting naturally written commands within a simple rule-governed environment. The primary language-parsing program in this project was called SHRDLU, which was capable of carrying out a somewhat natural conversation with the user giving it commands, but only within the scope of the toy environment designed for the task. This environment consisted of differently shaped and colored blocks, and SHRDLU was capable of interpreting commands such as "Find a block which is taller than the one you are holding and put it into the box." and of asking questions such as "I don't understand which pyramid you mean." in response to the user's input. While impressive, this kind of natural language processing has proven much more difficult outside the limited scope of the toy environment. Similarly, a project developed for NASA called LUNAR was designed to provide answers to naturally written questions about the geological analysis of lunar rocks returned by the Apollo missions. These kinds of problems are referred to as question answering.
Initial attempts at understanding spoken language were based on work done in the 1960s and 70s in signal modeling, where an unknown signal is analyzed to look for patterns and to make predictions based on its history. An initial and somewhat successful approach to applying this kind of signal modeling to language was achieved with the use of hidden Markov models, as detailed by Rabiner in 1989. This approach attempts to determine probabilities for the arbitrary number of models that could be generating the speech, as well as the probabilities of the various words generated from each of these possible models. Similar approaches were employed in early speech recognition attempts starting in the late 70s at IBM using word/part-of-speech pair probabilities.
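The core HMM computation Rabiner describes, scoring an observation sequence against a model with the forward algorithm, is short enough to sketch directly. The states, observations, and probabilities below are invented for illustration.

    # A minimal HMM forward pass in the spirit of Rabiner (1989): compute
    # the probability of an observation sequence given the model.
    states = ["voiced", "unvoiced"]
    start = {"voiced": 0.6, "unvoiced": 0.4}
    trans = {"voiced": {"voiced": 0.7, "unvoiced": 0.3},
             "unvoiced": {"voiced": 0.4, "unvoiced": 0.6}}
    emit = {"voiced": {"low": 0.7, "high": 0.3},
            "unvoiced": {"low": 0.2, "high": 0.8}}

    def forward(observations):
        # alpha[s] = P(observations so far, current hidden state = s)
        alpha = {s: start[s] * emit[s][observations[0]] for s in states}
        for obs in observations[1:]:
            alpha = {s: sum(alpha[p] * trans[p][s] for p in states)
                        * emit[s][obs]
                     for s in states}
        return sum(alpha.values())

    print(forward(["low", "low", "high"]))  # P(sequence | model)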
More recently these kinds of statistical approaches have been applied to more difficult tasks such as topic identification using Bayesian parameter estimation to infer topic probabilities in text documents.
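A widely used instance of such Bayesian topic inference is latent Dirichlet allocation. The sketch below uses scikit-learn's implementation (assumed installed) on an invented four-document corpus; the topic count and documents are illustrative only.

    # Requires: pip install scikit-learn
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = ["the parser builds syntax trees from sentences",
            "speech signals are modeled with hidden states",
            "grammar rules constrain the syntax of sentences",
            "acoustic models score speech frames and states"]

    X = CountVectorizer(stop_words="english").fit_transform(docs)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

    # Each document gets a probability distribution over inferred topics.
    print(lda.transform(X).round(2))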
Subfields
Computational linguistics can be divided into major areas depending upon the medium of the language being processed, whether spoken or textual; and upon the task being performed, whether analyzing language (recognition) or synthesizing language (generation).
Speech recognition and speech synthesis deal with how spoken language can be understood or created using computers. Parsing and generation are sub-divisions of computational linguistics dealing respectively with taking language apart and putting it together. Machine translation remains the sub-division of computational linguistics dealing with having computers translate between languages. Fully automatic high-quality translation, however, has yet to be realized, and machine translation remains a notoriously difficult branch of computational linguistics.
Some of the areas of research that are studied by computational linguistics include:
Computational complexity of natural language, largely modeled on automata theory, with the application of context-sensitive grammars and linear bounded automata.
Computational semantics comprises defining suitable logics for linguistic meaning representation, automatically constructing them and reasoning with them
Computer-aided corpus linguistics, which has been used since the 1970s as a way to make detailed advances in the field of discourse analysis
Design of parsers or chunkers for natural languages
Design of taggers like POS-taggers (part-of-speech taggers); a minimal tagging sketch follows this list
Machine translation as one of the earliest and most difficult applications of computational linguistics draws on many subfields.
Simulation and study of language evolution in historical linguistics/glottochronology.
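As a small concrete example of the tagger item above, the sketch below applies NLTK's off-the-shelf part-of-speech tagger (assuming NLTK is installed; resource names may vary across NLTK versions).

    # Requires: pip install nltk, plus the tokenizer and tagger models below.
    import nltk

    nltk.download("punkt", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    tokens = nltk.word_tokenize("Computational linguistics draws on many subfields.")
    print(nltk.pos_tag(tokens))
    # e.g. [('Computational', 'JJ'), ('linguistics', 'NN'), ('draws', 'VBZ'), ...]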
The Association for Computational Linguistics defines computational linguistics as:
...the scientific study of language from a computational perspective. Computational linguists are interested in providing computational models of various kinds of linguistic phenomena.
from wiki http://en.wikipedia.org/wiki/Computational_linguistics