acl acl2010 acl2010-64 knowledge-graph by maker-knowledge-mining

64 acl-2010-Complexity Assumptions in Ontology Verbalisation


Source: pdf

Author: Richard Power

Abstract: We describe the strategy currently pursued for verbalising OWL ontologies by sentences in Controlled Natural Language (i.e., combining generic rules for realising logical patterns with ontology-specific lexicons for realising atomic terms for individuals, classes, and properties) and argue that its success depends on assumptions about the complexity of terms and axioms in the ontology. We then show, through analysis of a corpus of ontologies, that although these assumptions could in principle be violated, they are overwhelmingly respected in practice by ontology developers.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Complexity assumptions in ontology verbalisation. Richard Power, Department of Computing, Open University, UK [sent-1, score-0.348]

2 Abstract: We describe the strategy currently pursued for verbalising OWL ontologies by sentences in Controlled Natural Language (i.e. [sent-4, score-0.105]

3 , combining generic rules for realising logical patterns with ontology-specific lexicons for realising atomic terms for individuals, classes, and properties) and argue that its success depends on assumptions about the complexity of terms and axioms in the ontology. [sent-6, score-1.047]

4 We then show, through analysis of a corpus of ontologies, that although these assumptions could in principle be violated, they are overwhelmingly respected in practice by ontology developers. [sent-7, score-0.351]

5 In this paper we uncover and test some assumptions on which this latter approach is based. [sent-13, score-0.068]

6 Historically, ontology verbalisation evolved from a more general tradition (predating OWL and the Semantic Web) that aimed to support knowledge formation by automatic interpretation of texts authored in Controlled Natural Languages (Fuchs and Schwitter, 1995). [sent-14, score-0.312]

7 With the advent of OWL, some of these CNLs were rapidly adapted to the new opportunity: part of Attempto Controlled English (ACE) was mapped to OWL (Kaljurand and Fuchs, 2007), and Processable English (PENG) evolved to Sydney OWL Syntax (SOS) (Cregan et al. [sent-16, score-0.032]

8 In addition, new CNLs were developed specifically for editing OWL ontologies, such as Rabbit (Hart et al. [sent-18, score-0.074]

9 In detail, these CNLs display some variations: thus an inclusion relationship between the classes Admiral and Sailor would be expressed by the pattern ‘Admirals are a type of sailor’ in CLOnE, ‘Every admiral is a kind of sailor’ in Rabbit, and ‘Every admiral is a sailor’ in ACE and SOS. [sent-21, score-0.82]

10 However, at the level of general strategy, all the CNLs rely on the same set of assumptions concerning the mapping from natural to formal language; for convenience we will refer to these assumptions as the consensus model. [sent-22, score-0.278]

11 In brief, the consensus model assumes that when an ontology is verbalised in natural language, axioms are expressed by sentences, and atomic terms are expressed by entries from the lexicon. [sent-23, score-1.098]

12 In the remainder of this paper we first describe the consensus model in more detail, then show that although [page footer: Proceedings of the ACL 2010, Uppsala, Sweden, 11-16 July 2010, page 132] [sent-25, score-0.117]

13 [Table 1 residue: common OWL class constructors and axiom patterns] … in principle it is vulnerable to both the problems just mentioned, in practice these problems almost never arise. [sent-28, score-0.06]

14 [Section 2: Consensus model] Atomic terms in OWL (or any other language implementing description logic) are principally of three kinds, denoting either individuals, classes or properties (footnote 1). [sent-29, score-0.168]

15 Individuals denote entities in the domain, such as Horatio Nelson or the Battle of Trafalgar; classes denote sets of entities, such as people or battles; and properties denote relations between individuals, such as the relation victor of between a person and a battle. [sent-30, score-0.255]

16 From these basic terms, a wide range of complex expressions may be constructed for classes, properties and axioms, of which some common examples are shown in table 1. [sent-31, score-0.058]

17 The upper part of the table presents two class constructors (C and D denote any classes; P denotes any property); by combining them we could build the following expression denoting the class of persons that command fleets (footnote 2): Person ⊓ ∃CommanderOf.Fleet [sent-32, score-0.177]

18 The lower half of the table presents three axiom patterns for making statements about classes and individuals (a, b denote individuals); examples of their usage are as follows: 1. [sent-33, score-0.516]

19 … [Nelson, Trafalgar] ∈ VictorOf. Note that since class expressions contain classes as constituents, they can become indefinitely complex. [sent-37, score-0.217]

20 For instance … Footnote 1: If data properties are used, there will also be terms for data types and literals (e. [sent-39, score-0.111]

21 Footnote 2: In description logic notation, the constructor C ⊓ D forms the intersection of two classes and corresponds to Boolean conjunction, while the existential restriction ∃P.C [sent-42, score-0.101]

22 forms the class of individuals having the relation P to one or more members of class C. [sent-43, score-0.046]

23 Thus Person ⊓ ∃CommanderOf.Fleet denotes the set of individuals x such that x is a person and x commands one or more fleets. [sent-45, score-0.13]
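The set-theoretic reading of Person ⊓ ∃CommanderOf.Fleet can be checked on a small finite interpretation. The Python sketch below treats classes as sets of individual names and properties as sets of (subject, object) pairs. The individuals other than Nelson, and the fleet names, are hypothetical and serve only to illustrate the two constructors.

```python
# Hypothetical toy interpretation (not taken from the paper's corpus).
person = {"nelson", "collingwood", "hardy"}
fleet = {"channel_fleet", "mediterranean_fleet"}
commander_of = {("nelson", "mediterranean_fleet"), ("collingwood", "channel_fleet")}


def intersection(c, d):
    """C ⊓ D: the individuals that belong to both C and D."""
    return c & d


def some_values_from(p, c):
    """∃P.C: the individuals related by P to at least one member of C."""
    return {x for (x, y) in p if y in c}


persons_commanding_fleets = intersection(person, some_values_from(commander_of, fleet))
print(sorted(persons_commanding_fleets))  # ['collingwood', 'nelson']
```

Here "hardy" is a person who commands nothing, so he drops out of the existential restriction and therefore out of the intersection.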

24 we could replace atomic class A by a constructed class, thus obtaining perhaps (A1 ⊓ A2) ⊓ B, and so on ad infinitum. [sent-46, score-0.269]

25 Moreover, since most axiom patterns contain classes as constituents, they too can become indefinitely complex. [sent-47, score-0.218]
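Modelling class expressions as a recursive data type shows why they have no upper bound on complexity: any atomic leaf can be replaced by another constructor. The sketch below is a minimal illustration that covers only atomic classes, intersection and existential restriction. The type names (Atom, And, Some) and the depth measure are assumptions of the example and are not part of OWL.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    name: str


@dataclass(frozen=True)
class And:  # C ⊓ D
    left: "ClassExpr"
    right: "ClassExpr"


@dataclass(frozen=True)
class Some:  # ∃P.C
    prop: str
    filler: "ClassExpr"


ClassExpr = Union[Atom, And, Some]


def depth(e: ClassExpr) -> int:
    """Nesting depth: 0 for an atomic class, plus 1 for each constructor."""
    if isinstance(e, Atom):
        return 0
    if isinstance(e, And):
        return 1 + max(depth(e.left), depth(e.right))
    return 1 + depth(e.filler)


def render(e: ClassExpr) -> str:
    """Print an expression in DL notation."""
    if isinstance(e, Atom):
        return e.name
    if isinstance(e, And):
        return f"({render(e.left)} ⊓ {render(e.right)})"
    return f"∃{e.prop}.{render(e.filler)}"


# Replacing A in A ⊓ B by A1 ⊓ A2 yields (A1 ⊓ A2) ⊓ B ...
e1 = And(And(Atom("A1"), Atom("A2")), Atom("B"))
print(render(e1), depth(e1))  # ((A1 ⊓ A2) ⊓ B) 2

# ... and nothing in the syntax stops the process from being repeated.
e = Atom("A")
for i in range(5):
    e = And(e, Atom(f"B{i}"))
print(depth(e))  # 5
```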

26 This sketch of knowledge representation in OWL illustrates the central distinction between logical functors (e. [sent-48, score-0.132]

27 , 2010), and atomic terms for individuals, classes and properties (e. [sent-51, score-0.375]

28 Perhaps the fundamental design decision of the Semantic Web is that all domain terms remain unstandardised, leaving ontology developers free to conceptualise the domain in any way they see fit. [sent-54, score-0.347]

29 In the consensus verbalisation model, this distinction is reflected by dividing linguistic resources into a generic grammar for realising logical patterns, and an ontology-specific lexicon for realising atomic terms. [sent-55, score-0.663]

30 Consider for instance C ⊑ D, the axiom pattern for class inclusion. [sent-56, score-0.219]

31 This purely logical pattern can often be mapped (following ACE and SOS) to the sentence pattern ‘Every [C] is a [D]’, where C and D will be realised by count nouns from the lexicon if they are atomic, or further grammatical rules if they are complex. [sent-57, score-0.208]

32 … D can be expressed better by a sentence pattern based on a verb frame (‘Every [C] [P]s a [D]’). [sent-59, score-0.06]

33 All these mappings depend entirely on the OWL logical functors, and will work with any lexicalisation of atomic terms that respects the syntactic constraints of the grammar, to yield verbalisations such as the following (for axioms 1-3 above): 1. [sent-60, score-0.751]
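The division of labour in the consensus model can be sketched as a few generic rules keyed on the logical functor, plus a lexicon that is looked up only for atomic terms. The Python below is a hypothetical illustration. The nested-tuple encoding of axioms (borrowing OWL functional-syntax functor names), the lexicon entries and the article heuristic are all assumptions. Real CNL verbalisers cover many more patterns and handle morphology properly.

```python
# Hypothetical ontology-specific lexicon: the only part that changes between ontologies.
LEXICON = {
    "Admiral": "admiral",
    "Sailor": "sailor",
    "Fleet": "fleet",
    "CommanderOf": "command",  # verb stem realising the property
    "Nelson": "Horatio Nelson",
}


def with_article(noun: str) -> str:
    """Crude indefinite article choice (assumption: vowel-initial nouns take 'an')."""
    return ("an " if noun[0].lower() in "aeiou" else "a ") + noun


def verbalise(axiom) -> str:
    """Generic grammar: the rule is chosen by the logical functor, never by the terms."""
    functor = axiom[0]
    if functor == "SubClassOf":
        _, c, d = axiom
        if isinstance(d, str):
            # C ⊑ D  ->  'Every [C] is a [D]'
            return f"Every {LEXICON[c]} is {with_article(LEXICON[d])}."
        if d[0] == "ObjectSomeValuesFrom":
            # C ⊑ ∃P.D  ->  'Every [C] [P]s a [D]'  (atomic filler only in this sketch)
            _, p, filler = d
            return f"Every {LEXICON[c]} {LEXICON[p]}s {with_article(LEXICON[filler])}."
    if functor == "ClassAssertion":
        # a ∈ C  ->  '[a] is a [C]'  (class first, then individual, as in OWL 2 syntax)
        _, c, individual = axiom
        return f"{LEXICON[individual]} is {with_article(LEXICON[c])}."
    raise ValueError(f"no verbalisation rule for {axiom!r}")


print(verbalise(("SubClassOf", "Admiral", "Sailor")))
print(verbalise(("SubClassOf", "Admiral", ("ObjectSomeValuesFrom", "CommanderOf", "Fleet"))))
print(verbalise(("ClassAssertion", "Admiral", "Nelson")))
```

Swapping in a different LEXICON leaves every rule untouched. This is the sense in which the mappings depend only on the OWL functors.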

34 The CNLs we have cited are more sophisticated than this, allowing a wider range of linguistic patterns (e. [sent-66, score-0.072]

35 , adjectives for classes), but the basic assumptions are the same. [sent-68, score-0.068]

36 The model provides satisfactory verbalisations for the simple examples considered so far, but what happens when the axioms and atomic terms become more complex? [sent-69, score-0.707]

37 [Section 3: Complex terms and axioms] The distribution of content among axioms depends to some extent on stylistic decisions by ontology developers, in particular with regard to axiom size. [sent-70, score-0.924]

38 This freedom is possible because description logics (including OWL) allow equivalent formulations using a large number of short axioms at one extreme, and a small number of long ones at the other. [sent-71, score-0.44]

39 For many logical patterns, rules can be stated for amalgamating or splitting axioms while leaving overall content unchanged (thus ensuring that exactly the same inferences are drawn by a reasoning engine); such rules are often used in reasoning algorithms. [sent-72, score-0.465]

40 For instance, any set of SubClassOf axioms can be amalgamated into a single ‘metaconstraint’ (Horrocks, 1997) of the form ⊤ ⊑ M, where ⊤ is the class containing all individuals in the domain, and M is a class to which any individual respecting the axiom set must belong (footnote 3). [sent-73, score-0.612]
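The equivalence behind the metaconstraint can be written out as follows. It rests on the standard set-theoretic fact that C ⊑ D holds exactly when every individual is either outside C or inside D. The two axioms in the last line (Admiral ⊑ Sailor and Admiral ⊑ ∃CommanderOf.Fleet) are chosen for illustration and are not quoted from the paper.

```latex
% One inclusion, restated as a constraint on every individual:
C_i \sqsubseteq D_i \;\iff\; \top \sqsubseteq \neg C_i \sqcup D_i

% A whole set of inclusions therefore collapses into a single metaconstraint:
\{C_1 \sqsubseteq D_1, \ldots, C_n \sqsubseteq D_n\}
  \;\equiv\; \top \sqsubseteq M,
  \qquad M = (\neg C_1 \sqcup D_1) \sqcap \cdots \sqcap (\neg C_n \sqcup D_n)

% Illustrative two-axiom case:
M_{\mathrm{ex}} = (\neg \mathit{Admiral} \sqcup \mathit{Sailor})
  \sqcap (\neg \mathit{Admiral} \sqcup \exists \mathit{CommanderOf}.\mathit{Fleet})
```

Read aloud, M_ex says that everything is either a non-admiral or a sailor, and is either a non-admiral or something that commands a fleet. This illustrates how amalgamation strains comprehension.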

41 Applying this transformation even to only two axioms (verbalised by 1 and 2 below) will yield an outcome (verbalised by 3) that strains human comprehension: 1. [sent-74, score-0.347]

42 …either a non-admiral or something that commands a […] An example of axiom-splitting rules is found in a computational complexity proof for the description logic EL+ (Baader et al. [sent-80, score-0.181]

43 However, this simplification of axiom structure can be achieved only by introducing new atomic terms. [sent-86, score-0.442]

44 For example, to simplify an axiom of the form A1 ⊑ ∃P.(A2 ⊓ A3), [sent-87, score-0.219]

45 the rewriting rules must introduce a new term A23 ≡ A2 ⊓ A3, through which the axiom may be rewritten as A1 ⊑ ∃P.A23 [sent-88, score-0.242]

46 (along with some further axioms expressing the definition of A23); depending on the expressions that they replace, the content of such terms may become indefinitely complex. [sent-89, score-0.481]
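The rewriting rules themselves are not spelled out here. The function below is therefore a hypothetical sketch of one rule of this kind, not the EL+ procedure of Baader et al. It uses the same assumed nested-tuple encoding in OWL functional-syntax style, and it expresses the definition of the new term as an EquivalentClasses axiom. The fresh name is supplied by the caller.

```python
def split_existential(axiom, fresh_names):
    """Rewrite C ⊑ ∃P.F with a complex filler F into C ⊑ ∃P.N plus N ≡ F.

    fresh_names is an iterator of unused atomic names (hypothetical, e.g. "A23").
    Axioms that do not match the pattern are returned unchanged.
    """
    if axiom[0] != "SubClassOf":
        return [axiom]
    _, c, d = axiom
    if isinstance(d, str) or d[0] != "ObjectSomeValuesFrom":
        return [axiom]
    _, p, filler = d
    if isinstance(filler, str):
        return [axiom]  # filler already atomic: nothing to simplify
    new = next(fresh_names)  # new atomic term that will need a lexical entry
    return [
        ("SubClassOf", c, ("ObjectSomeValuesFrom", p, new)),
        ("EquivalentClasses", new, filler),  # the new term inherits the filler's content
    ]


original = ("SubClassOf", "A1",
            ("ObjectSomeValuesFrom", "P", ("ObjectIntersectionOf", "A2", "A3")))
for ax in split_existential(original, iter(["A23"])):
    print(ax)
```

Each output axiom is simpler than the input, but A23 is an atomic term whose meaning is exactly A2 ⊓ A3. If the filler had been deeply nested, A23 would carry all of that complexity into the lexicon.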

47 We can often find rules for refactoring an overcomplex axiom by a number of simpler ones, but only at the cost of introducing atomic terms for which no satisfactory lexical realisation may exist. [sent-91, score-0.534]

48 In principle, therefore, there is no guarantee that OWL ontologies […] Footnote 3: For an axiom set C1 ⊑ D1, C2 ⊑ D2, … [sent-92, score-0.324]

49 …where the class constructors ¬C (complement of C) and C ⊔ D (union of C and D) correspond to Boolean negation and disjunction. [sent-98, score-0.046]

50 Figure 1: Identifier content can be verbalised transparently within the assumptions of the consensus model. [sent-99, score-0.368]

51 [Section 4: Empirical studies of usage] We have shown that OWL syntax will permit atomic terms that cannot be lexicalised, and axioms that cannot be expressed clearly in a sentence. [sent-100, score-0.661]

52 However, it remains possible that in practice, ontology developers use OWL in a constrained manner that favours verbalisation by the consensus model. [sent-101, score-0.514]

53 This could happen either because the relevant constraints are psychologically intuitive to developers, or because they are somehow built into the editing tools that they use (e. [sent-102, score-0.074]

54 To investigate this possibility, we have carried out an exploratory study using a corpus of 48 ontologies mostly downloaded from the University of Manchester TONES repository (TONES, 2010). [sent-105, score-0.105]

55 Overall, our sample contains around 45,000 axioms and 25,000 atomic terms. [sent-107, score-0.57]

56 Our first analysis concerns identifier length, which we measure simply by counting the number of words in the identifying phrase. [sent-108, score-0.125]

57 The program recovers the phrase by the following steps: (1) read an identifier (or label if one is provided4); (2) strip off the namespace prefix; (3) segment the resulting string into words. [sent-109, score-0.104]

58 For the third step we […] Footnote 4: Some ontology developers use ‘non-semantic’ identifiers such as #000123, in which case the meaning of the identifier is indicated in an annotation assertion linking the identifier to a label. [sent-110, score-0.546]

59 …, battle_of_trafalgar, BattleOfTrafalgar), a rule that holds (in our corpus) almost without exception. [sent-113, score-0.046]
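The three identifier-processing steps can be sketched in Python. The study's exact segmentation rules are not given here, so this version assumes the following:

- the namespace prefix ends at the last '#', '/' or ':';
- word boundaries are marked by underscores, hyphens, whitespace or lower-to-upper case changes;
- acronym runs such as "OWLClass" are left unsplit.

The example IRIs are hypothetical.

```python
import re
from typing import List, Optional


def identifier_words(iri: str, label: Optional[str] = None) -> List[str]:
    """Return the words of the phrase behind an identifier (hypothetical rules)."""
    if label:
        # (1) when a label is provided, use it instead of the identifier
        text = label
    else:
        # (2) strip the namespace prefix: keep what follows the last '#', '/' or ':'
        text = re.split(r"[#/:]", iri)[-1]
    # (3) segment: insert a space at lower/digit-to-upper transitions (camel case) ...
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    # ... then split at underscores, hyphens and whitespace
    return [w.lower() for w in re.split(r"[\s_\-]+", text) if w]


print(identifier_words("http://example.org/navy#BattleOfTrafalgar"))
print(identifier_words("http://example.org/navy#battle_of_trafalgar"))
print(len(identifier_words("http://example.org/navy#000123", label="Battle of Trafalgar")))
```

The identifier length used in the analysis is then simply the length of the returned list.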

60 Thus for example the axioms Admiral ⊑ Sailor and Dog ⊑ Animal are both reduced to the form CA ⊑ CA, where the symbol CA means ‘any atomic class term’. [sent-115, score-0.57]

61 In this way we can count the frequencies of all the logical patterns in the corpus, abstracting from the domain-specific identifier names. [sent-116, score-0.248]
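Abstracting axioms to patterns amounts to replacing every atomic identifier with a symbol for its kind, then counting the resulting skeletons. A minimal sketch follows. It assumes axioms are encoded as nested tuples in the style of OWL functional syntax, and that a separate table records the kind (CA, PA, I, …) of each identifier. The sample axioms are illustrative and do not come from the corpus.

```python
from collections import Counter


def abstract(expr, kinds):
    """Map an axiom to its logical pattern by replacing atomic terms with their kind."""
    if isinstance(expr, str):
        return kinds[expr]
    functor, *args = expr
    return (functor, *(abstract(a, kinds) for a in args))


# Hypothetical kind table and sample axioms.
kinds = {"Admiral": "CA", "Sailor": "CA", "Dog": "CA", "Animal": "CA",
         "Fleet": "CA", "CommanderOf": "PA"}
axioms = [
    ("SubClassOf", "Admiral", "Sailor"),
    ("SubClassOf", "Dog", "Animal"),
    ("SubClassOf", "Admiral", ("ObjectSomeValuesFrom", "CommanderOf", "Fleet")),
]

pattern_counts = Counter(abstract(ax, kinds) for ax in axioms)
for pattern, n in pattern_counts.most_common():
    print(n, pattern)
```

Because the domain-specific names disappear, axioms from different ontologies fall into the same buckets, which is what makes corpus-wide frequencies comparable.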

62 The results (table 2) show an overwhelming focus on a small number of simple logical patterns (footnote 5). [sent-117, score-0.072]

63 Concerning class constructors, the most common by far were intersection (C ⊓ C) and existential restriction (∃P. [sent-118, score-0.086]

64 C) […] was relatively rare, so that for example the pattern CA ⊑ ∀PA. [sent-120, score-0.038]

65 […] Footnote 5: Most of these patterns have been explained already; the others are disjoint classes (CA ⊓ CA ⊑ ⊥), equivalent classes (CA ≡ CA ⊓ ∃PA. [sent-123, score-0.228]

66 CA), and data property assertion ([I, L] ∈ DA). In the latter pattern, DA denotes a data property, which differs from an object property (PA) in that it ranges over literals (L) rather than individuals (I). [sent-125, score-0.16]

67 …D means ‘Every admiral commands a fleet’, C ⊑ ∀P.D [sent-127, score-0.438]

68 means ‘Every admiral commands only fleets’ (this will [sent-129, score-0.028]

69 remain true if some admirals do not command anything at all). [sent-130, score-0.028]
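The contrast between ∃ and ∀ drawn in this footnote can be checked on a tiny hypothetical model in which one admiral commands a fleet and another commands nothing. The sketch assumes classes are sets of names and the property is a set of (subject, object) pairs; all individual names are invented for illustration.

```python
# Hypothetical model: one admiral commands a fleet, the other commands nothing.
admiral = {"nelson", "idle_admiral"}
fleet = {"channel_fleet"}
commands = {("nelson", "channel_fleet")}
domain = admiral | fleet


def some(p, c, domain):
    """∃P.C: individuals with at least one P-successor in C."""
    return {x for x in domain if any((x, y) in p for y in c)}


def only(p, c, domain):
    """∀P.C: individuals all of whose P-successors are in C (vacuously true if none)."""
    return {x for x in domain if all(y in c for (s, y) in p if s == x)}


# Admiral ⊑ ∃commands.Fleet fails because idle_admiral commands nothing ...
print(admiral <= some(commands, fleet, domain))  # False
# ... but Admiral ⊑ ∀commands.Fleet holds, vacuously for idle_admiral.
print(admiral <= only(commands, fleet, domain))  # True
```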

70 The preference for simple patterns was confirmed by an analysis of argument structure for the OWL functors (e. [sent-131, score-0.156]

71 Overall, 85% of arguments were atomic terms rather than complex class expressions. [sent-134, score-0.333]

72 Interestingly, there was also a clear effect of argument position, with the first argument of a functor being atomic rather than complex in as many as 99. [sent-135, score-0.294]
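The argument-structure analysis can be sketched in the same assumed nested-tuple encoding. The procedure walks every functor application, records for each argument position whether the argument is atomic, and reports the proportion per position. This is a hypothetical illustration of the kind of count described, not the study's program, and the two sample axioms are invented.

```python
from collections import defaultdict


def atomic_share_by_position(axioms):
    """Fraction of atomic arguments at each argument position, over all functors."""
    totals, atomic = defaultdict(int), defaultdict(int)

    def walk(expr):
        if isinstance(expr, str):
            return  # atomic term: no arguments of its own
        for pos, arg in enumerate(expr[1:], start=1):
            totals[pos] += 1
            atomic[pos] += isinstance(arg, str)
            walk(arg)  # nested class expressions contribute their own arguments

    for ax in axioms:
        walk(ax)
    return {pos: atomic[pos] / totals[pos] for pos in sorted(totals)}


sample = [
    ("SubClassOf", "Admiral", "Sailor"),
    ("SubClassOf", "Admiral", ("ObjectSomeValuesFrom", "CommanderOf", "Fleet")),
]
print(atomic_share_by_position(sample))
```

On this toy input every first argument is atomic, while one second argument in three is complex, mirroring in miniature the position effect reported above.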

73 [Section 5: Discussion] Our results indicate that although in principle the consensus model cannot guarantee transparent realisations, in practice these are almost always attainable, since ontology developers overwhelmingly favour terms and axioms with relatively simple content. [sent-137, score-0.937]

74 An admiral is defined as a person that commands a fleet. However, since identifiers containing 3-4 words are fairly common (figure 1), we need to consider whether these formulations will remain transparent when combined with more complex lexical entries. [sent-145, score-0.654]

75 For instance, a travel ontology in our corpus contains an axiom (fitting pattern 4) which our prototype verbalises as follows: 4’ . [sent-146, score-0.469]

76 These conclusions are based on analysis of identifier and axiom patterns in a corpus of ontologies; they need to be complemented by studies showing that the resulting verbalisations are understood by ontology developers and other users. [sent-151, score-0.769]

77 Description logics as ontology languages for the semantic web. [sent-160, score-0.226]

78 Rabbit: Developing a control natural language for authoring ontologies. [sent-177, score-0.032]

79 OWL 2 web ontology language: Structural specification and functional-style syntax. [sent-205, score-0.224]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('owl', 0.428), ('axioms', 0.347), ('admiral', 0.341), ('atomic', 0.223), ('axiom', 0.219), ('ontology', 0.189), ('sailor', 0.182), ('nelson', 0.137), ('verbalised', 0.137), ('individuals', 0.123), ('consensus', 0.117), ('developers', 0.117), ('controlled', 0.116), ('cnls', 0.114), ('ontologies', 0.105), ('identifier', 0.104), ('commands', 0.097), ('schwitter', 0.091), ('trafalgar', 0.091), ('verbalisation', 0.091), ('fuchs', 0.08), ('realising', 0.08), ('classes', 0.078), ('editing', 0.074), ('logical', 0.072), ('patterns', 0.072), ('fleet', 0.068), ('indefinitely', 0.068), ('subclassof', 0.068), ('tones', 0.068), ('verbalisations', 0.068), ('yorkshire', 0.068), ('assumptions', 0.068), ('manchester', 0.064), ('functors', 0.06), ('rabbit', 0.06), ('clone', 0.06), ('kaljurand', 0.06), ('prot', 0.06), ('hart', 0.051), ('ca', 0.048), ('west', 0.046), ('class', 0.046), ('batt', 0.046), ('cregan', 0.046), ('falgar', 0.046), ('funk', 0.046), ('horridge', 0.046), ('horrocks', 0.046), ('knublauch', 0.046), ('motik', 0.046), ('pil', 0.046), ('rolf', 0.046), ('sos', 0.046), ('swat', 0.046), ('transparently', 0.046), ('victorof', 0.046), ('terms', 0.041), ('attempto', 0.04), ('baader', 0.04), ('cnd', 0.04), ('constructors', 0.04), ('existential', 0.04), ('victor', 0.039), ('pattern', 0.038), ('principle', 0.037), ('logics', 0.037), ('literals', 0.037), ('realised', 0.037), ('boundary', 0.036), ('web', 0.035), ('overwhelmingly', 0.034), ('properties', 0.033), ('person', 0.033), ('logic', 0.033), ('authoring', 0.032), ('evolved', 0.032), ('experiences', 0.032), ('identifiers', 0.032), ('transparent', 0.032), ('ace', 0.032), ('april', 0.031), ('syntax', 0.028), ('every', 0.028), ('description', 0.028), ('formulations', 0.028), ('satisfactory', 0.028), ('fl', 0.028), ('expressions', 0.025), ('formal', 0.025), ('argument', 0.024), ('denote', 0.024), ('travel', 0.023), ('rules', 0.023), ('complex', 0.023), ('practice', 0.023), ('expressed', 0.022), ('boolean', 0.022), ('denoting', 0.021), ('concerns', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 64 acl-2010-Complexity Assumptions in Ontology Verbalisation

Author: Richard Power

Abstract: We describe the strategy currently pursued for verbalising OWL ontologies by sentences in Controlled Natural Language (i.e., combining generic rules for realising logical patterns with ontology-specific lexicons for realising atomic terms for individuals, classes, and properties) and argue that its success depends on assumptions about the complexity of terms and axioms in the ontology. We then show, through analysis of a corpus of ontologies, that although these assumptions could in principle be violated, they are overwhelmingly respected in practice by ontology developers.

2 0.070757784 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web

Author: Aarne Ranta ; Krasimir Angelov ; Thomas Hallgren

Abstract: This is a system demo for a set of tools for translating texts between multiple languages in real time with high quality. The translation works on restricted languages, and is based on semantic interlinguas. The underlying model is GF (Grammatical Framework), which is an open-source toolkit for multilingual grammar implementations. The demo will cover up to 20 parallel languages. Two related sets of tools are presented: grammarian’s tools helping to build translators for new domains and languages, and translator’s tools helping to translate documents. The grammarian’s tools are designed to make it easy to port the technique to new applications. The translator’s tools are essential in the restricted language context, enabling the author to remain in the fragments recognized by the system. The tools that are demonstrated will be applied and developed further in the European project MOLTO (Multilingual On-Line Translation) which has started in March 2010 and runs for three years. [Section 1: Translation Needs for the Web] The best-known translation tools on the web are Google translate (footnote 1: www.google.com/translate) and Systran (footnote 2: www.systransoft.com). They are targeted to consumers of web documents: users who want to find out what a given document is about. For this purpose, browsing quality is sufficient, since the user has intelligence and good will, and understands that she uses the translation at her own risk. Since Google and Systran translations can be grammatically and semantically flawed, they don’t reach publication quality, and cannot hence be used by the producers of web documents. For instance, the provider of an e-commerce site cannot take the risk that the product descriptions or selling conditions have errors that change the original intentions. There are very few automatic translation systems actually in use for producers of information.
As already noted by Bar-Hillel (1964), machine translation is one of those AI-complete tasks that involves a trade-off between coverage and precision, and the current mainstream systems opt for coverage. This is also what web users expect: they want to be able to throw just anything at the translation system and get something useful back. Precision-oriented approaches, the prime example of which is METEO (Chandioux 1977), have not been popular in recent years. However, from the producer’s point of view, large coverage is not essential: unlike the consumer’s tools, their input is predictable, and can be restricted to very specific domains, and to content that the producers themselves are creating in the first place. But even in such tasks, two severe problems remain:
• The development cost problem: a large amount of work is needed for building translators for new domains and new languages.
• The authoring problem: since the method does not work for all input, the author of the source text of translation may need special training to write in a way that can be translated at all.
These two problems have probably been the main obstacles to making high-quality restricted language translation more wide-spread in tasks where it would otherwise be applicable. We address these problems by providing tools that help developers of translation systems on the one hand, and authors and translators—i.e. the users of the systems—on the other. In the MOLTO project (Multilingual On-Line Translation) (footnote 3: www.molto-project.eu), we have the goal to improve both the development and use of restricted language translation by an order of magnitude, as compared with the state of the art. As for development costs, this means that a system for many languages and with adequate quality can be built in a matter of days rather than months.
As for authoring, this means that content production does not require the use of manuals or involve trial and error, both of which can easily make the work ten times slower than normal writing. In the proposed system demo, we will show how some of the building blocks for MOLTO can already now be used in web-based translators, although on a smaller scale as regards languages and application domains. A running demo system is available at http://grammaticalframework.org:41296. [Figure 1: A multilingual GF grammar with reversible mappings from a common abstract syntax to the 15 languages currently available in the GF Resource Grammar Library.] [Page footer: Proceedings of the ACL 2010 System Demonstrations, pages 66–71, Uppsala, Sweden, 13 July 2010. © 2010 Association for Computational Linguistics] [Section 2: Multilingual Grammars] The translation tools are based on GF, Grammatical Framework (footnote 4: www.grammaticalframework.org) (Ranta 2004). GF is a grammar formalism—that is, a mathematical model of natural language, equipped with a formal notation for writing grammars and a computer program implementing parsing and generation which are declaratively defined by grammars. Thus GF is comparable with formalisms such as HPSG (Pollard and Sag 1994), LFG (Bresnan 1982) or TAG (Joshi 1985). The novel feature of GF is the notion of multilingual grammars, which describe several languages simultaneously by using a common representation called abstract syntax; see Figure 1. In a multilingual GF grammar, meaning-preserving translation is provided as a composition of parsing and generation via the abstract syntax, which works as an interlingua. This model of translation is different from approaches based on other comparable grammar formalisms, such as synchronous TAGs (Shieber and Schabes 1990), Pargram (Butt & al. 2002, based on LFG), LINGO Matrix (Bender and Flickinger 2005, based on HPSG), and CLE (Core Language Engine, Alshawi 1992).
These approaches use transfer rules between individual languages, separate for each pair of languages. Being interlingua-based, GF translation scales up linearly to new languages without the quadratic blowup of transfer-based systems. In transfer-based systems, as many as n(n − 1) components (transfer functions) are needed to cover all language pairs in both directions. In an interlingua-based system, 2n + 1 components are enough: the interlingua itself, plus translations in both directions between each language and the interlingua. However, in GF, n + 1 components are sufficient, because the mappings from the abstract syntax to each language (the concrete syntaxes) are reversible, i.e. usable for both generation and parsing. Multilingual GF grammars can be seen as an implementation of Curry’s distinction between tectogrammatical and phenogrammatical structure (Curry 1961). In GF, the tectogrammatical structure is called abstract syntax, following standard computer science terminology. It is defined by using a logical framework (Harper & al. 1993), whose mathematical basis is in the type theory of Martin-Löf (1984). Two things can be noted about this architecture, both showing improvements over state-of-the-art grammar-based translation methods. First, the translation interlingua (the abstract syntax) is a powerful logical formalism, able to express semantical structures such as context-dependencies and anaphora (Ranta 1994). In particular, dependent types make it more expressive than the type theory used in Montague grammar (Montague 1974) and employed in the Rosetta translation project (Rosetta 1998). Second, GF uses a framework for interlinguas, rather than one universal interlingua. This makes the interlingual approach more light-weight and feasible than in systems assuming one universal interlingua, such as Rosetta and UNL, Universal Networking Language (footnote 5: www.undl.org).
It also gives more precision to special-purpose translation: the interlingua of a GF translation system (i.e. the abstract syntax of a multilingual grammar) can encode precisely those structures and distinctions that are relevant for the task at hand. Thus an interlingua for mathematical proofs (Hallgren and Ranta 2000) is different from one for commands for operating an MP3 player (Perera and Ranta 2007). The expressive power of the logical framework is sufficient for both kinds of tasks. One important source of inspiration for GF was the WYSIWYM system (Power and Scott 1998), which used domain-specific interlinguas and produced excellent quality in multilingual generation. But the generation components were hard-coded in the program, instead of being defined declaratively as in GF, and they were not usable in the direction of parsing. [Section 3: Grammars and Ontologies] Parallel to the first development efforts of GF in the late 1990’s, another framework idea was emerging in web technology: XML, Extensible Mark-up Language, which unlike HTML is not a single mark-up language but a framework for creating custom mark-up languages. The analogy between GF and XML was seen from the beginning, and GF was designed as a formalism for multilingual rendering of semantic content (Dymetman and al. 2000). XML originated as a format for structuring documents and structured data serialization, but a couple of its descendants, RDF(S) and OWL, developed its potential to formally express the semantics of data and content, serving as the fundaments of the emerging Semantic Web. Practically any meaning representation format can be converted into GF’s abstract syntax, which can then be mapped to different target languages. In particular the OWL language can be seen as a syntactic sugar for a subset of Martin-Löf’s type theory so it is trivial to embed it in GF’s abstract syntax.
The translation problem defined in terms of an ontology is radically different from the problem of translating plain text from one language to another. Many of the projects in which GF has been used involve precisely this: a meaning representation formalized as GF abstract syntax. Some projects build on previously existing meaning representation and address mathematical proofs (Hallgren and Ranta 2000), software specifications (Beckert & al. 2007), and mathematical exercises (the European project WebALT; footnote 6: EDC-22253, webalt.math.helsinki.fi). Other projects start with semantic modelling work to build meaning representations from scratch, most notably ones for dialogue systems (Perera and Ranta 2007) in the European project TALK (footnote 7: IST-507802, 2004–2006, www.talk-project.org). Yet another project, and one closest to web translation, is the multilingual Wiki system presented in (Meza Moreno and Bringert 2008). In this system, users can add and modify reviews of restaurants in three languages (English, Spanish, and Swedish). Any change made in any of the languages gets automatically translated to the other languages. To take an example, the OWL-to-GF mapping translates OWL’s classes to GF’s categories and OWL’s properties to GF’s functions that return propositions. As a running example in this and the next section, we will use the class of integers and the two-place property of being divisible (“x is divisible by y”). The correspondences are as follows:
Class(pp:integer ...) ↦ cat integer
ObjectProperty(pp:div domain(pp:integer) range(pp:integer)) ↦ fun div : integer -> integer -> prop
[Section 4: Grammar Engineer’s Tools] In the GF setting, building a multilingual translation system is equivalent to building a multilingual GF grammar, which in turn consists of two kinds of components:
• a language-independent abstract syntax, giving the semantic model via which translation is performed;
• for each language, a concrete syntax mapping abstract syntax trees to strings in that language.
While abstract syntax construction is an extra task compared to many other kinds of translation methods, it is technically relatively simple, and its cost is moreover amortized as the system is extended to new languages. Concrete syntax construction can be much more demanding in terms of programming skills and linguistic knowledge, due to the complexity of natural languages. This task is where GF claims perhaps the highest advantage over other approaches to special-purpose grammars. The two main assets are:
• Programming language support: GF is a modern functional programming language, with a powerful type system and module system supporting modular and collaborative programming and reuse of code.
• RGL, the GF Resource Grammar Library, implementing the basic linguistic details of languages: inflectional morphology and syntactic combination functions.
The RGL covers fifteen languages at the moment, shown in Figure 1; see also Khegai 2006, El Dada and Ranta 2007, Angelov 2008, Ranta 2009a,b, and Enache et al. 2010. To give an example of what the library provides, let us first consider the inflectional morphology. It is presented as a set of lexicon-building functions such as, in English, mkV : Str -> V, i.e. function mkV, which takes a string (Str) as its argument and returns a verb (V) as its value. The verb is, internally, an inflection table containing all forms of a verb. The function mkV derives all these forms from its argument string, which is the infinitive form. It predicts all regular variations: (mkV

3 0.068440065 224 acl-2010-Talking NPCs in a Virtual Game World

Author: Tina Kluwer ; Peter Adolphs ; Feiyu Xu ; Hans Uszkoreit ; Xiwen Cheng

Abstract: This paper describes the KomParse system, a natural-language dialog system in the three-dimensional virtual world Twinity. In order to fulfill the various communication demands between non-player characters (NPCs) and users in such an online virtual world, the system realizes a flexible and hybrid approach combining knowledge-intensive domain-specific question answering, task-specific and domain-specific dialog with robust chatbot-like chitchat.

4 0.065750793 139 acl-2010-Identifying Generic Noun Phrases

Author: Nils Reiter ; Anette Frank

Abstract: This paper presents a supervised approach for identifying generic noun phrases in context. Generic statements express rule-like knowledge about kinds or events. Therefore, their identification is important for the automatic construction of knowledge bases. In particular, the distinction between generic and non-generic statements is crucial for the correct encoding of generic and instance-level information. Generic expressions have been studied extensively in formal semantics. Building on this work, we explore a corpus-based learning approach for identifying generic NPs, using selections of linguistically motivated features. Our results are well above the baseline and prior work.

5 0.061981492 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling

Author: Hector-Hugo Franco-Penya

Abstract: “Tree SRL system” is a supervised Semantic Role Labelling system based on a tree-distance algorithm and a simple k-NN implementation. The novelty of the system lies in comparing the sentences as tree structures with multiple relations instead of extracting vectors of features for each relation and classifying them. The system was tested with the English CoNLL-2009 shared task data set, where 79% accuracy was obtained.

6 0.045587324 248 acl-2010-Unsupervised Ontology Induction from Text

7 0.044632874 202 acl-2010-Reading between the Lines: Learning to Map High-Level Instructions to Commands

8 0.043127201 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns

9 0.042233892 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining

10 0.040137433 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages

11 0.03853729 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction

12 0.038262233 228 acl-2010-The Importance of Rule Restrictions in CCG

13 0.038004555 181 acl-2010-On Learning Subtypes of the Part-Whole Relation: Do Not Mix Your Seeds

14 0.037104592 85 acl-2010-Detecting Experiences from Weblogs

15 0.036969606 158 acl-2010-Latent Variable Models of Selectional Preference

16 0.036416203 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future

17 0.034660816 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing

18 0.031575669 217 acl-2010-String Extension Learning

19 0.030133097 203 acl-2010-Rebanking CCGbank for Improved NP Interpretation

20 0.029606353 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.098), (1, 0.039), (2, 0.006), (3, -0.03), (4, 0.006), (5, -0.026), (6, 0.009), (7, 0.009), (8, 0.01), (9, -0.04), (10, 0.001), (11, 0.02), (12, -0.026), (13, -0.017), (14, 0.004), (15, 0.04), (16, 0.021), (17, 0.074), (18, 0.002), (19, 0.008), (20, -0.031), (21, -0.013), (22, -0.02), (23, -0.077), (24, 0.004), (25, -0.015), (26, -0.04), (27, 0.015), (28, -0.018), (29, -0.038), (30, -0.025), (31, -0.072), (32, 0.006), (33, 0.072), (34, -0.015), (35, -0.017), (36, 0.032), (37, -0.004), (38, 0.002), (39, 0.064), (40, -0.106), (41, 0.057), (42, 0.133), (43, 0.05), (44, -0.017), (45, 0.045), (46, -0.062), (47, 0.016), (48, 0.055), (49, 0.081)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90363884 64 acl-2010-Complexity Assumptions in Ontology Verbalisation

Author: Richard Power

Abstract: We describe the strategy currently pursued for verbalising OWL ontologies by sentences in Controlled Natural Language (i.e., combining generic rules for realising logical patterns with ontology-specific lexicons for realising atomic terms for individuals, classes, and properties) and argue that its success depends on assumptions about the complexity of terms and axioms in the ontology. We then show, through analysis of a corpus of ontologies, that although these assumptions could in principle be violated, they are overwhelmingly respected in practice by ontology developers.

2 0.62537229 224 acl-2010-Talking NPCs in a Virtual Game World

Author: Tina Kluwer ; Peter Adolphs ; Feiyu Xu ; Hans Uszkoreit ; Xiwen Cheng

Abstract: This paper describes the KomParse system, a natural-language dialog system in the three-dimensional virtual world Twinity. In order to fulfill the various communication demands between non-player characters (NPCs) and users in such an online virtual world, the system realizes a flexible and hybrid approach combining knowledge-intensive domain-specific question answering, task-specific and domain-specific dialog with robust chatbot-like chitchat.

3 0.58068728 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web

Author: Aarne Ranta ; Krasimir Angelov ; Thomas Hallgren

Abstract: This is a system demo for a set of tools for translating texts between multiple languages in real time with high quality. The translation works on restricted languages, and is based on semantic interlinguas. The underlying model is GF (Grammatical Framework), which is an open-source toolkit for multilingual grammar implementations. The demo will cover up to 20 parallel languages. Two related sets of tools are presented: grammarian's tools helping to build translators for new domains and languages, and translator's tools helping to translate documents. The grammarian's tools are designed to make it easy to port the technique to new applications. The translator's tools are essential in the restricted language context, enabling the author to remain in the fragments recognized by the system. The tools that are demonstrated will be applied and developed further in the European project MOLTO (Multilingual On-Line Translation), which started in March 2010 and runs for three years. 1 Translation Needs for the Web The best-known translation tools on the web are Google Translate (www.google.com/translate) and Systran (www.systransoft.com). They are targeted to consumers of web documents: users who want to find out what a given document is about. For this purpose, browsing quality is sufficient, since the user has intelligence and good will, and understands that she uses the translation at her own risk. Since Google and Systran translations can be grammatically and semantically flawed, they don't reach publication quality, and hence cannot be used by the producers of web documents. For instance, the provider of an e-commerce site cannot take the risk that the product descriptions or selling conditions have errors that change the original intentions. There are very few automatic translation systems actually in use for producers of information. As already noted by Bar-Hillel (1964), machine translation is one of those AI-complete tasks that involve a trade-off between coverage and precision, and the current mainstream systems opt for coverage. This is also what web users expect: they want to be able to throw just anything at the translation system and get something useful back. Precision-oriented approaches, the prime example of which is METEO (Chandioux 1977), have not been popular in recent years. However, from the producer's point of view, large coverage is not essential: unlike the input to consumer tools, their input is predictable, and can be restricted to very specific domains and to content that the producers themselves are creating in the first place. But even in such tasks, two severe problems remain: • The development cost problem: a large amount of work is needed for building translators for new domains and new languages. • The authoring problem: since the method does not work for all input, the author of the source text of translation may need special training to write in a way that can be translated at all. These two problems have probably been the main obstacles to making high-quality restricted language translation more widespread in tasks where it would otherwise be applicable. We address these problems by providing tools that help developers of translation systems on the one hand, and authors and translators (i.e., the users of the systems) on the other. In the MOLTO project (Multilingual On-Line Translation, www.molto-project.eu), we have the goal to improve both the development and use of restricted language translation by an order of magnitude, as compared with the state of the art. As for development costs, this means that a system for many languages and with adequate quality can be built in a matter of days rather than months.
As for authoring, this means that content production does not require the use of manuals or involve trial and error, both of which can easily make the work ten times slower than normal writing. In the proposed system demo, we will show how some of the building blocks for MOLTO can already now be used in web-based translators, although on a smaller scale as regards languages and application domains. A running demo system is available at http://grammaticalframework.org:41296. Proceedings of the ACL 2010 System Demonstrations, pages 66–71, Uppsala, Sweden, 13 July 2010. ©2010 Association for Computational Linguistics. Figure 1: A multilingual GF grammar with reversible mappings from a common abstract syntax to the 15 languages currently available in the GF Resource Grammar Library. 2 Multilingual Grammars The translation tools are based on GF, Grammatical Framework (www.grammaticalframework.org; Ranta 2004). GF is a grammar formalism, that is, a mathematical model of natural language, equipped with a formal notation for writing grammars and a computer program implementing parsing and generation, which are declaratively defined by grammars. Thus GF is comparable with formalisms such as HPSG (Pollard and Sag 1994), LFG (Bresnan 1982) or TAG (Joshi 1985). The novel feature of GF is the notion of multilingual grammars, which describe several languages simultaneously by using a common representation called abstract syntax; see Figure 1. In a multilingual GF grammar, meaning-preserving translation is provided as a composition of parsing and generation via the abstract syntax, which works as an interlingua. This model of translation is different from approaches based on other comparable grammar formalisms, such as synchronous TAGs (Shieber and Schabes 1990), ParGram (Butt et al. 2002, based on LFG), LinGO Matrix (Bender and Flickinger 2005, based on HPSG), and CLE (Core Language Engine, Alshawi 1992).
These approaches use transfer rules between individual languages, separate for each pair of languages. Being interlingua-based, GF translation scales up linearly to new languages without the quadratic blowup of transfer-based systems. In transfer-based systems, as many as n(n − 1) components (transfer functions) are needed to cover all language pairs in both directions. In an interlingua-based system, 2n + 1 components are enough: the interlingua itself, plus translations in both directions between each language and the interlingua. However, in GF, n + 1 components are sufficient, because the mappings from the abstract syntax to each language (the concrete syntaxes) are reversible, i.e. usable for both generation and parsing. Multilingual GF grammars can be seen as an implementation of Curry's distinction between tectogrammatical and phenogrammatical structure (Curry 1961). In GF, the tectogrammatical structure is called abstract syntax, following standard computer science terminology. It is defined by using a logical framework (Harper et al. 1993), whose mathematical basis is in the type theory of Martin-Löf (1984). Two things can be noted about this architecture, both showing improvements over state-of-the-art grammar-based translation methods. First, the translation interlingua (the abstract syntax) is a powerful logical formalism, able to express semantic structures such as context dependencies and anaphora (Ranta 1994). In particular, dependent types make it more expressive than the type theory used in Montague grammar (Montague 1974) and employed in the Rosetta translation project (Rosetta 1998). Second, GF uses a framework for interlinguas, rather than one universal interlingua. This makes the interlingual approach more lightweight and feasible than in systems assuming one universal interlingua, such as Rosetta and UNL, Universal Networking Language (www.undl.org).
It also gives more precision to special-purpose translation: the interlingua of a GF translation system (i.e. the abstract syntax of a multilingual grammar) can encode precisely those structures and distinctions that are relevant for the task at hand. Thus an interlingua for mathematical proofs (Hallgren and Ranta 2000) is different from one for commands for operating an MP3 player (Perera and Ranta 2007). The expressive power of the logical framework is sufficient for both kinds of tasks. One important source of inspiration for GF was the WYSIWYM system (Power and Scott 1998), which used domain-specific interlinguas and produced excellent quality in multilingual generation. But the generation components were hard-coded in the program, instead of being defined declaratively as in GF, and they were not usable in the direction of parsing. 3 Grammars and Ontologies Parallel to the first development efforts of GF in the late 1990s, another framework idea was emerging in web technology: XML, Extensible Markup Language, which unlike HTML is not a single markup language but a framework for creating custom markup languages. The analogy between GF and XML was seen from the beginning, and GF was designed as a formalism for multilingual rendering of semantic content (Dymetman et al. 2000). XML originated as a format for structuring documents and structured data serialization, but a couple of its descendants, RDF(S) and OWL, developed its potential to formally express the semantics of data and content, serving as the foundations of the emerging Semantic Web. Practically any meaning representation format can be converted into GF's abstract syntax, which can then be mapped to different target languages. In particular, the OWL language can be seen as syntactic sugar for a subset of Martin-Löf's type theory, so it is trivial to embed it in GF's abstract syntax.
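The component counts quoted for the three architectures, n(n − 1), 2n + 1 and n + 1, can be checked with a few lines of arithmetic. The helper below is a hypothetical illustration (the name `components_needed` is ours) that simply evaluates those three formulas; with the 15 languages of Figure 1 it gives 210, 31 and 16 components respectively.

```python
def components_needed(n: int) -> dict:
    """Translation components required for n languages under each architecture."""
    if n < 1:
        raise ValueError("need at least one language")
    return {
        # One transfer function per ordered pair of distinct languages.
        "transfer": n * (n - 1),
        # The interlingua plus an analyser and a generator for every language.
        "interlingua": 2 * n + 1,
        # The abstract syntax plus one reversible concrete syntax per language.
        "gf": n + 1,
    }


for n in (2, 5, 15):
    print(n, components_needed(n))
# 2 {'transfer': 2, 'interlingua': 5, 'gf': 3}
# 5 {'transfer': 20, 'interlingua': 11, 'gf': 6}
# 15 {'transfer': 210, 'interlingua': 31, 'gf': 16}
```

For two or three languages a transfer system actually needs fewer components than a classic interlingua; the quadratic term only dominates as n grows, which is where the reversible n + 1 design pays off most.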
The translation problem defined in terms of an ontology is radically different from the problem of translating plain text from one language to another. Many of the projects in which GF has been used involve precisely this: a meaning representation formalized as GF abstract syntax. Some projects build on previously existing meaning representations and address mathematical proofs (Hallgren and Ranta 2000), software specifications (Beckert et al. 2007), and mathematical exercises (the European project WebALT: EDC-22253, webalt.math.helsinki.fi). Other projects start with semantic modelling work to build meaning representations from scratch, most notably ones for dialogue systems (Perera and Ranta 2007) in the European project TALK (IST-507802, 2004–2006, www.talk-project.org). Yet another project, and the one closest to web translation, is the multilingual Wiki system presented by Meza Moreno and Bringert (2008). In this system, users can add and modify reviews of restaurants in three languages (English, Spanish, and Swedish). Any change made in any of the languages is automatically translated to the other languages. To take an example, the OWL-to-GF mapping translates OWL's classes to GF's categories and OWL's properties to GF's functions that return propositions. As a running example in this and the next section, we will use the class of integers and the two-place property of being divisible (“x is divisible by y”). The correspondences are as follows: Class(pp:integer ...) ↦ cat integer; ObjectProperty(pp:div domain(pp:integer) range(pp:integer)) ↦ fun div : integer -> integer -> prop.
4 Grammar Engineer's Tools In the GF setting, building a multilingual translation system is equivalent to building a multilingual GF grammar, which in turn consists of two kinds of components: • a language-independent abstract syntax, giving the semantic model via which translation is performed; • for each language, a concrete syntax mapping abstract syntax trees to strings in that language. While abstract syntax construction is an extra task compared to many other kinds of translation methods, it is technically relatively simple, and its cost is moreover amortized as the system is extended to new languages. Concrete syntax construction can be much more demanding in terms of programming skills and linguistic knowledge, due to the complexity of natural languages. This task is where GF claims perhaps the highest advantage over other approaches to special-purpose grammars. The two main assets are: • Programming language support: GF is a modern functional programming language, with a powerful type system and module system supporting modular and collaborative programming and reuse of code. • RGL, the GF Resource Grammar Library, which implements the basic linguistic details of languages: inflectional morphology and syntactic combination functions. The RGL covers fifteen languages at the moment, shown in Figure 1; see also Khegai 2006, El Dada and Ranta 2007, Angelov 2008, Ranta 2009a,b, and Enache et al. 2010. To give an example of what the library provides, let us first consider the inflectional morphology. It is presented as a set of lexicon-building functions such as, in English, mkV : Str -> V, i.e. a function mkV that takes a string (Str) as its argument and returns a verb (V) as its value. The verb is, internally, an inflection table containing all forms of a verb. The function mkV derives all these forms from its argument string, which is the infinitive form. It predicts all regular variations: (mkV
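To make the divisibility running example concrete, here is a toy Python sketch of the pipeline the abstract describes: an OWL class and property are mapped to GF cat/fun declarations, and each language's concrete syntax is a single template used both to linearize and to parse, so translation is parsing followed by linearization. `OwlClass`, `OwlObjectProperty`, `CONCRETE` and the French template are hypothetical illustrations, not the actual OWL-to-GF mapping or the GF runtime; real concrete syntaxes handle inflection and word order, which plain string templates cannot.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class OwlClass:  # hypothetical stand-in for Class(pp:integer ...)
    name: str


@dataclass(frozen=True)
class OwlObjectProperty:  # hypothetical stand-in for ObjectProperty(...)
    name: str
    domain: str
    range_: str


def to_gf_abstract(decl) -> str:
    """Classes become GF categories; properties become functions returning prop."""
    if isinstance(decl, OwlClass):
        return f"cat {decl.name}"
    if isinstance(decl, OwlObjectProperty):
        return f"fun {decl.name} : {decl.domain} -> {decl.range_} -> prop"
    raise TypeError(f"unsupported declaration: {decl!r}")


# Toy concrete syntaxes: one template per abstract function and language.
# Assumes placeholders appear in argument order ({0} before {1}).
CONCRETE = {
    "Eng": {"div": "{0} is divisible by {1}"},
    "Fre": {"div": "{0} est divisible par {1}"},
}


def linearize(tree: tuple, lang: str) -> str:
    fun, *args = tree
    return CONCRETE[lang][fun].format(*args)


def parse(text: str, lang: str) -> tuple:
    # Reuse the same templates in reverse: each {i} becomes a capture group.
    for fun, template in CONCRETE[lang].items():
        pieces = re.split(r"\{\d\}", template)
        pattern = r"(\S+)".join(re.escape(p) for p in pieces)
        match = re.fullmatch(pattern, text)
        if match:
            return (fun, *match.groups())
    raise ValueError(f"no {lang} parse for {text!r}")


def translate(text: str, source: str, target: str) -> str:
    # Interlingua-style translation: source string -> abstract tree -> target string.
    return linearize(parse(text, source), target)


print(to_gf_abstract(OwlClass("integer")))                          # cat integer
print(to_gf_abstract(OwlObjectProperty("div", "integer", "integer")))
print(translate("12 is divisible by 4", "Eng", "Fre"))             # 12 est divisible par 4
```

Because one template table per language serves both directions, adding a language in this toy means adding one entry to `CONCRETE`, which mirrors the reversibility argument behind the n + 1 count.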

4 0.56164187 138 acl-2010-Hunting for the Black Swan: Risk Mining from Text

Author: Jochen Leidner ; Frank Schilder

Abstract: In the business world, analyzing and dealing with risk permeates all decisions and actions. However, to date, risk identification, the first step in the risk management cycle, has always been a manual activity with little to no intelligent software tool support. In addition, although companies are required to list risks to their business in their annual SEC filings in the USA, these descriptions are often very high-level and vague. In this paper, we introduce Risk Mining, which is the task of identifying a set of risks pertaining to a business area or entity. We argue that by combining Web mining and Information Extraction (IE) techniques, risks can be detected automatically before they materialize, thus providing valuable business intelligence. We describe a system that induces a risk taxonomy with concrete risks (e.g., interest rate changes) at its leaves and more abstract risks (e.g., financial risks) closer to its root node. The taxonomy is induced via a bootstrapping algorithm starting with a few seeds. The risk taxonomy is used by the system as input to a risk monitor that matches risk mentions in financial documents to the abstract risk types, thus bridging a lexical gap. Our system is able to automatically generate company-specific “risk maps”, which we demonstrate for a corpus of earnings report conference calls.

5 0.52393901 43 acl-2010-Automatically Generating Term Frequency Induced Taxonomies

Author: Karin Murthy ; Tanveer A Faruquie ; L Venkata Subramaniam ; Hima Prasad K ; Mukesh Mohania

Abstract: We propose a novel method to automatically acquire a term-frequency-based taxonomy from a corpus using an unsupervised method. A term-frequency-based taxonomy is useful for application domains where the frequency with which terms occur on their own and in combination with other terms imposes a natural term hierarchy. We highlight an application for our approach and demonstrate its effectiveness and robustness in extracting knowledge from real-world data.

6 0.50088161 248 acl-2010-Unsupervised Ontology Induction from Text

7 0.47143823 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction

8 0.46393126 259 acl-2010-WebLicht: Web-Based LRT Services for German

9 0.45718834 61 acl-2010-Combining Data and Mathematical Models of Language Change

10 0.45531398 7 acl-2010-A Generalized-Zero-Preserving Method for Compact Encoding of Concept Lattices

11 0.4446328 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns

12 0.44423217 181 acl-2010-On Learning Subtypes of the Part-Whole Relation: Do Not Mix Your Seeds

13 0.43237868 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages

14 0.43146673 222 acl-2010-SystemT: An Algebraic Approach to Declarative Information Extraction

15 0.42465258 176 acl-2010-Mood Patterns and Affective Lexicon Access in Weblogs

16 0.42047113 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web

17 0.40936804 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System

18 0.38023677 139 acl-2010-Identifying Generic Noun Phrases

19 0.34854382 196 acl-2010-Plot Induction and Evolutionary Search for Story Generation

20 0.32410693 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(25, 0.076), (39, 0.011), (42, 0.023), (56, 0.418), (59, 0.075), (73, 0.035), (78, 0.051), (83, 0.063), (84, 0.03), (98, 0.08)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.78507888 64 acl-2010-Complexity Assumptions in Ontology Verbalisation

Author: Richard Power

Abstract: We describe the strategy currently pursued for verbalising OWL ontologies by sentences in Controlled Natural Language (i.e., combining generic rules for realising logical patterns with ontology-specific lexicons for realising atomic terms for individuals, classes, and properties) and argue that its success depends on assumptions about the complexity of terms and axioms in the ontology. We then show, through analysis of a corpus of ontologies, that although these assumptions could in principle be violated, they are overwhelmingly respected in practice by ontology developers.

2 0.50196487 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking

Author: Hiroshi Echizen-ya ; Kenji Araki

Abstract: In this paper, we propose a new automatic evaluation method for machine translation using noun-phrase chunking. Our method correctly determines the matching words between two sentences using corresponding noun phrases. Moreover, our method determines the similarity between two sentences in terms of the noun-phrase order of appearance. Evaluation experiments were conducted to calculate the correlation between human judgments and the scores produced using automatic evaluation methods for MT outputs obtained from the 12 machine translation systems in NTCIR-7. Experimental results show that our method obtained the highest correlations among the methods in both sentence-level adequacy and fluency.

3 0.42149726 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment

Author: Shujie Liu ; Chi-Ho Li ; Ming Zhou

Abstract: While Inversion Transduction Grammar (ITG) has regained more and more attention in recent years, it still suffers from the major obstacle of speed. We propose a discriminative ITG pruning framework using Minimum Error Rate Training and various features from previous work on ITG alignment. Experimental results show that it is superior to all existing heuristics in ITG pruning. On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and BLEU score over the baseline alignment system of GIZA++.

4 0.35066506 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar

Author: Timothy A. D. Fowler ; Gerald Penn

Abstract: The definition of combinatory categorial grammar (CCG) in the literature varies quite a bit from author to author. However, the differences between the definitions are important in terms of the language classes of each CCG. We prove that a wide range of CCGs are strongly context-free, including the CCG of CCGbank and of the parser of Clark and Curran (2007). In light of these new results, we train the PCFG parser of Petrov and Klein (2007) on CCGbank and achieve state-of-the-art results in supertagging accuracy, PARSEVAL measures and dependency accuracy.

5 0.34964368 169 acl-2010-Learning to Translate with Source and Target Syntax

Author: David Chiang

Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a simple approach that uses both source and target syntax for significant improvements in translation accuracy.

6 0.34802234 71 acl-2010-Convolution Kernel over Packed Parse Forest

7 0.34740934 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification

8 0.34660047 158 acl-2010-Latent Variable Models of Selectional Preference

9 0.3459594 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models

10 0.34477553 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection

11 0.34465444 130 acl-2010-Hard Constraints for Grammatical Function Labelling

12 0.34377778 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars

13 0.34364396 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses

14 0.34330219 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese

15 0.34240475 248 acl-2010-Unsupervised Ontology Induction from Text

16 0.34238645 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons

17 0.34235173 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar

18 0.34161091 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition

19 0.34110239 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results

20 0.34000033 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields