acl acl2010 acl2010-216 knowledge-graph by maker-knowledge-mining

216 acl-2010-Starting from Scratch in Semantic Role Labeling


Source: pdf

Author: Michael Connor ; Yael Gertner ; Cynthia Fisher ; Dan Roth

Abstract: A fundamental step in sentence comprehension involves assigning semantic roles to sentence constituents. To accomplish this, the listener must parse the sentence, find constituents that are candidate arguments, and assign semantic roles to those constituents. Each step depends on prior lexical and syntactic knowledge. Where do children learning their first languages begin in solving this problem? In this paper we focus on the parsing and argument-identification steps that precede Semantic Role Labeling (SRL) training. We combine a simplified SRL with an unsupervised HMM part-of-speech tagger, and experiment with psycholinguistically-motivated ways to label clusters resulting from the HMM so that they can be used to parse input for the SRL system. The results show that the proposed shallow representations of sentence structure are robust to reductions in parsing accuracy, and that the contribution of alternative representations of sentence structure to successful semantic role labeling varies with the integrity of the parsing and argument-identification stages.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract A fundamental step in sentence comprehension involves assigning semantic roles to sentence constituents. [sent-5, score-0.211]

2 Syntactic bootstrapping theory proposes that young children use their very partial knowledge of syntax to guide sentence interpretation. [sent-16, score-0.22]

3 Nouns are arguably less dependent on prior linguistic knowledge for their acquisition than are verbs; thus children are assumed to be able to identify the referents of some nouns via cross-situational observation (Gillette et al. [sent-22, score-0.376]

4 Children treat each noun as a candidate argument, and thus interpret the number of nouns in the sentence as a cue to its semantic predicate-argument structure (Fisher, 1996). [sent-25, score-0.363]

5 Third, children represent sentences in an abstract format that permits generalization to new verbs (Gertner et al. [sent-26, score-0.276]

6 In the sentence “Ellen and John laughed”, an intransitive verb appears with two nouns. [sent-29, score-0.37]

7 If young children rely on representations of sentences as simple as an ordered set of nouns, then they should have trouble distinguishing such sentences from transitive sentences. [sent-30, score-0.533]

8 , 2008)) showed that it is possible to learn to assign basic semantic roles based on the shallow sentence representations proposed by the structure-mapping view. [sent-33, score-0.262]

9 The second problem facing the learner is more contentious: Having identified clusters of distributionally-similar words, how do children figure out what role these clusters of words should play in a sentence interpretation system? [sent-47, score-0.508]

10 By using the HMM part-of-speech tagger in this way, we can ask how the simple structural features that we propose children start with stand up to reductions in parsing accuracy. [sent-52, score-0.281]

11 This allows us to ask whether a learner, equipped with particular theoretically-motivated representations of the input, can learn to understand sentences at the level of who did what to whom. [sent-55, score-0.213]

12 The stages are: (1) Parsing the sentence, (2) Identifying potential predicates and arguments based on the parse, (3) Classifying role labels for each potential argument relative to a predicate, (4) Applying constraints to find the best labeling of arguments for a sentence. [sent-62, score-0.607]

13 In this work we attempt to limit the knowledge available at each stage to the automatic output of the previous stage, constrained by knowledge that we argue is available to children in the early stages of language learning. [sent-63, score-0.277]

14 In the parsing stage we use an unsupervised parser based on Hidden Markov Models (HMM), modeling a simple ‘predict the next word’ parser. [sent-64, score-0.24]

15 Next the argument identification stage identifies HMM states that correspond to possible arguments and predicates. [sent-65, score-0.678]

16 The candidate arguments and predicates identified in each input sentence are passed to an SRL classifier that uses simple abstract features based on the number and order of arguments to learn to assign semantic roles. [sent-66, score-0.621]

17 As input to our learner we use samples of natural child directed speech (CDS) from the CHILDES corpora (MacWhinney, 2000). [sent-67, score-0.225]

18 The argument identifier uses a small set of frequent nouns to seed argument states, relying on the assumptions that some concrete nouns can be learned as a prerequisite to sentence interpretation, and are interpreted as candidate arguments. [sent-69, score-1.033]

19 The SRL classifier starts with noisy, largely unsupervised argument identification, and receives feedback based on annotation in the PropBank style; in training, each word identified as an argument receives the true role label of the phrase that word is part of. [sent-70, score-0.726]

20 The provision of perfect ‘gold-standard’ feedback over-estimates the real child’s access to this supervision, but allows us to investigate the consequences of noisy argument identification for SRL performance. [sent-72, score-0.415]

21 Our ultimate goal is to ‘close the loop’ of this system, by using learning in the SRL system to improve the initial unsupervised parse and argument identification. [sent-74, score-0.292]

22 The training data were samples of parental speech to three children (Adam, Eve, and Sarah; (Brown, 1973)), available via CHILDES. [sent-75, score-0.229]

23 We allocated a fixed number of states for these function words, and left the rest of the states for the rest of the words. [sent-110, score-0.228]

24 Figure 1 shows the performance of the four systems using Variation of Information to measure match between gold states and unsupervised parsers as we vary the amount of text they receive. [sent-127, score-0.24]

25 Another common measure for unsupervised POS (when there are more states than tags) is a many-to-one greedy mapping of states to tags. [sent-129, score-0.308]
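Both evaluation measures can be sketched in a few lines. This is an illustrative implementation under the standard definitions (Variation of Information as the sum of the two conditional entropies; many-to-one as a greedy mapping of each induced state to its most frequent gold tag), not the authors' code; the function name and data layout are assumptions.

```python
from collections import Counter
from math import log

def vi_and_many_to_one(gold, pred):
    """Compare a gold tag sequence with an induced state sequence.

    Returns (variation_of_information, many_to_one_accuracy).
    VI(G, P) = H(G|P) + H(P|G) = 2*H(G,P) - H(G) - H(P); lower is better.
    """
    assert len(gold) == len(pred)
    n = len(gold)
    joint = Counter(zip(gold, pred))
    g_marg = Counter(gold)
    p_marg = Counter(pred)
    h_joint = -sum(c / n * log(c / n, 2) for c in joint.values())
    h_g = -sum(c / n * log(c / n, 2) for c in g_marg.values())
    h_p = -sum(c / n * log(c / n, 2) for c in p_marg.values())
    vi = 2 * h_joint - h_g - h_p
    # Many-to-one: map every induced state to its most frequent gold tag,
    # then score the fraction of tokens the mapping gets right.
    best = {}
    for (g, p), c in joint.items():
        if c > best.get(p, (0, None))[0]:
            best[p] = (c, g)
    m2o = sum(c for c, _ in best.values()) / n
    return vi, m2o
```

Note that many-to-one rewards using more states than gold tags (a perfect score is possible with one state per word type), which is why VI is reported alongside it.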

26 4 Argument Identification The unsupervised parser provides a state label for each word in each sentence; the goal of the argument identification stage is to use these states to label words as potential arguments, predicates or neither. [sent-138, score-0.701]

27 As described in the introduction, core premises of the structure-mapping account offer routes whereby we could label some HMM states as argument or predicate states. [sent-139, score-0.445]

28 Children are assumed to identify the referents of some concrete nouns via cross-situational learning (Gillette et al. [sent-141, score-0.26]

29 We use a small set of known nouns to transform unlabeled word clusters into candidate arguments for the SRL: HMM states that are dominated by known names for animate or inanimate objects are assumed to be argument states. [sent-145, score-0.964]

30 Given text parsed by the HMM parser and a list of known nouns, the argument identifier proceeds in multiple steps as illustrated in figure 2. [sent-146, score-0.333]

31 The first stage identifies as argument states those states that appear at least half the time in the training data with known nouns. [sent-147, score-0.549]
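The argument-state rule described here can be sketched as follows. Function and variable names are hypothetical; the default threshold of 0.5 is the "at least half the time" criterion from the text.

```python
from collections import Counter

def find_argument_states(tagged_corpus, known_nouns, threshold=0.5):
    """Label an HMM state as an argument state if at least `threshold`
    of its occurrences in the corpus fall on a known (seed) noun.

    `tagged_corpus`: iterable of sentences, each a list of (word, state) pairs.
    `known_nouns`: set of seed nouns assumed learnable cross-situationally.
    """
    state_total = Counter()
    state_with_noun = Counter()
    for sentence in tagged_corpus:
        for word, state in sentence:
            state_total[state] += 1
            if word in known_nouns:
                state_with_noun[state] += 1
    return {s for s, total in state_total.items()
            if state_with_noun[s] / total >= threshold}
```

Because the decision is per state rather than per word, unknown nouns that cluster with the seed nouns are also treated as candidate arguments.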

32 This is a two-stage process: argument-state identification based on statistics collected over the entire text, followed by per-sentence predicate identification. [sent-150, score-0.596]

33 As the list of known nouns we collected all nouns that appear three times or more in the child-directed speech training data and are judged to be either animate or inanimate. [sent-151, score-0.756]

34 The full set of 365 nouns covers over 93% of noun occurrences in our data. [sent-152, score-0.28]

35 In upcoming sections we experiment with varying the number of seed nouns used from this set, selecting the most frequent set of nouns. [sent-153, score-0.266]
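The seed-noun selection can be sketched as a simple frequency filter. The animacy judgment mentioned in the text was made by hand, so this sketch (with hypothetical names) omits it.

```python
from collections import Counter

def select_seed_nouns(noun_tokens, min_count=3, top_k=None):
    """Build the known-noun seed list: nouns occurring at least `min_count`
    times, optionally truncated to the `top_k` most frequent. The paper
    additionally filters by a human animate/inanimate judgment, which
    this sketch does not model.
    """
    counts = Counter(noun_tokens)
    frequent = [w for w, c in counts.most_common() if c >= min_count]
    return frequent[:top_k] if top_k else frequent
```

Varying `top_k` reproduces the experiments below that range from 10 seed nouns (mostly pronouns) up to the full set of 365.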

36 Reflecting the spoken nature of the child directed speech, the most frequent nouns are pronouns, but beyond the top 10 we see nouns naming people (‘daddy’, ‘ursula’) and object nouns (‘chair’, ‘lunch’). [sent-154, score-0.696]

37 A typical SRL model identifies candidate arguments and tries to assign roles to them relative to each verb in the sentence. [sent-156, score-0.275]

38 In principle one might suppose that children learn the meanings of verbs via cross-situational observation just as they learn the meanings of concrete nouns. [sent-157, score-0.5]

39 As a result, early vocabularies are dominated by nouns (Gentner, 2006). [sent-161, score-0.255]

40 We implement this behavior by identifying as predicate states the HMM states that appear commonly with a particular number of previously identified arguments. [sent-165, score-0.422]

41 First, we collect statistics over the entire HMM training corpus regarding how many arguments are identified per sentence, and which states that are not identified as argument states appear with each number of arguments. [sent-166, score-0.724]

42 Next, for each parsed sentence that serves as SRL input, the algorithm chooses as the most likely predicate the word whose state is most likely to appear with the number of arguments found in the current input sentence. [sent-167, score-0.3]
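The two-step predicate-identification procedure described above can be sketched as follows; this is a hypothetical rendering of the described statistics, not the authors' implementation.

```python
from collections import Counter, defaultdict

def train_predicate_stats(tagged_corpus, argument_states):
    """For each non-argument state, count how often it occurs in sentences
    with a given number of identified arguments."""
    counts = defaultdict(Counter)  # state -> {n_args: count}
    for sentence in tagged_corpus:
        n_args = sum(1 for _, s in sentence if s in argument_states)
        for _, state in sentence:
            if state not in argument_states:
                counts[state][n_args] += 1
    return counts

def choose_predicate(sentence, argument_states, counts):
    """Pick as predicate the word whose state most strongly prefers the
    argument count observed in this sentence, i.e. max P(n_args | state)."""
    n_args = sum(1 for _, s in sentence if s in argument_states)
    best, best_score = None, -1.0
    for i, (word, state) in enumerate(sentence):
        if state in argument_states or state not in counts:
            continue
        total = sum(counts[state].values())
        score = counts[state][n_args] / total
        if score > best_score:
            best, best_score = i, score
    return best  # index of the predicted predicate word, or None
```

Because each state's distribution over argument counts is learned separately, states seen mostly in two-argument sentences end up acting like transitive predicates and states seen with one argument like intransitive ones, which is the implicit division noted in the next sentence.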

43 Implicitly, the argument count likelihood divides predicate states up into transitive and intransitive predicates based on appearances in the simple sentences of CDS. [sent-169, score-1.004]

44 1 Argument Identification Evaluation Figure 3 shows argument and predicate identification accuracy for each of the four parsers when provided with different numbers of known nouns. [sent-171, score-0.57]

45 Argument identification accuracy is computed against true argument boundaries from hand labeled data. [sent-177, score-0.363]

46 The upper set of results show primary argument (A0-4) identification F1, and bottom lines show predicate identification F1. [sent-178, score-0.633]

47 known nouns represents the assumption that toddlers have already identified pronouns as referential terms. [sent-179, score-0.347]

48 Even 19-month-olds assign appropriately different interpretations to novel verbs presented in simple transitive versus intransitive sentences with pronoun arguments (“He’s kradding him! [sent-180, score-0.793]

49 Two groups of curves appear in figure 3: the upper group shows the primary argument identification accuracy and the bottom group shows the predicate identification accuracy. [sent-186, score-0.633]

50 We evaluate compared to gold tagged data with true argument and predicate boundaries. [sent-187, score-0.331]

51 The primary argument (A0-4) identification accuracy is the F1 value, with precision calculated as the proportion of identified arguments that appear as part of a true argument, and recall as the proportion of true arguments that have some state identified as an argument. [sent-188, score-0.781]

52 F1 is calculated similarly for predicate identification, as one state per sentence is identified as the predicate. [sent-189, score-0.241]
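The precision and recall definitions above can be made concrete with a small word-level scorer; the function name and data layout are assumptions for illustration.

```python
def argument_identification_f1(identified, true_spans):
    """F1 for word-level argument identification against phrase boundaries.

    `identified`: set of word indices labeled as arguments.
    `true_spans`: list of (start, end) inclusive boundaries of true arguments.

    Precision: fraction of identified words falling inside some true span.
    Recall: fraction of true spans containing at least one identified word.
    """
    if not identified or not true_spans:
        return 0.0
    in_some_span = lambda i: any(s <= i <= e for s, e in true_spans)
    precision = sum(1 for i in identified if in_some_span(i)) / len(identified)
    recall = sum(1 for s, e in true_spans
                 if any(i in identified for i in range(s, e + 1))) / len(true_spans)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Note the asymmetry: a single identified word inside a multi-word true argument counts fully toward recall, which is appropriate here because the SRL classifier operates on words, not phrases.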

53 As shown in figure 3, argument identification F1 is higher than predicate identification (which is to be expected, given that predicate identification depends on accurate arguments), and as we add more seed nouns the argument identification improves. [sent-190, score-1.532]

54 Surprisingly, despite the clear differences in un- supervised POS performance seen in figure 1, the different parsers do not yield very different argument and predicate identification. [sent-191, score-0.377]

55 As we will see in the next section, however, when the arguments identified in this step are used to train the SRL classifier, distinctions between parsers reappear, suggesting that argument identification F1 masks systematic patterns in the errors. [sent-192, score-0.618]

56 For argument classification we used a linear classifier trained with a regularized perceptron update rule (Grove and Roth, 2001). [sent-194, score-0.253]
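A minimal sketch of such a linear role classifier follows, using a plain margin perceptron over sparse features as a simplified stand-in; the exact regularized update of Grove and Roth (2001) differs in detail, and all names here are hypothetical.

```python
import random
from collections import defaultdict

def train_role_classifier(examples, labels, epochs=30, margin=0.0, seed=0):
    """Multiclass perceptron over sparse binary features.

    `examples`: list of feature-name sets; `labels`: parallel role labels.
    Updates whenever the gold label fails to beat its best rival by `margin`.
    """
    weights = defaultdict(lambda: defaultdict(float))  # label -> feature -> w
    classes = sorted(set(labels))
    score = lambda y, feats: sum(weights[y][f] for f in feats)
    data = list(zip(examples, labels))
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(data)
        for feats, gold in data:
            rival = max((y for y in classes if y != gold),
                        key=lambda y: score(y, feats))
            if score(gold, feats) - score(rival, feats) <= margin:
                # Promote the gold role, demote the closest rival.
                for f in feats:
                    weights[gold][f] += 1.0
                    weights[rival][f] -= 1.0
    return weights, classes

def classify(weights, classes, feats):
    return max(classes, key=lambda y: sum(weights[y][f] for f in feats))
```

Since each argument is classified independently (no sentence-level inference), nothing in this sketch prevents two nouns in one sentence from receiving the same role, matching the behavior described above.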

57 In the results reported below the BabySRL did not use sentence-level inference for the final classification; every identified argument is classified independently, so multiple nouns can have the same role. [sent-195, score-0.48]

58 We evaluated SRL performance by testing the BabySRL with constructed sentences like those used for the experiments with children described in the Introduction. [sent-197, score-0.209]

59 All four versions include lexical features consisting of the target argument and predicate (as identified in the previous steps). [sent-200, score-0.452]

60 Noun pattern features indicate how many nouns there are in the sentence and which noun the target is. [sent-204, score-0.373]
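A hypothetical rendering of a noun pattern feature (the paper's exact feature strings may differ):

```python
def noun_pattern_feature(noun_positions, target):
    """Encode the target argument's place in the sentence's noun sequence,
    e.g. '1_of_2' for the first of two nouns.

    `noun_positions`: sorted word indices identified as nouns/arguments.
    `target`: word index of the argument being classified.
    """
    rank = noun_positions.index(target) + 1
    return "{}_of_{}".format(rank, len(noun_positions))
```

For "You and Mommy krad", both nouns receive exactly the same patterns as the two nouns of a transitive sentence, which is why a classifier relying on these features makes the documented two-noun intransitive error.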

61 (1) NounPat features will improve the SRL’s ability to interpret simple transitive test sentences containing two nouns and a novel verb, relative to a lexical baseline. [sent-211, score-0.503]

62 (2) Because NounPat features represent word order solely in terms of a sequence of nouns, an SRL equipped with these features will make the errors predicted by the structure-mapping account and documented in children (Gertner and Fisher, 2006). [sent-214, score-0.327]

63 (3) NounPat features permit the SRL to assign different roles to the subjects of transitive and intransitive sentences that differ in their number of nouns. [sent-215, score-0.65]

64 In addition, VerbPos features eliminated the errors with two-noun intransitive sentences. [sent-218, score-0.307]

65 Given test sentences such as ‘You and Mommy krad’, VerbPos features represented both nouns as pre-verbal, and therefore identified both as likely agents. [sent-219, score-0.375]

66 However, VerbPos features did not help the SRL assign different roles to the subjects of simple transitive and intransitive sentences: ‘Mommy’ in ‘Mommy krads you’ and ’Mommy krads’ are both represented simply as pre-verbal. [sent-220, score-0.681]
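A corresponding sketch of a verb position feature (again with hypothetical names), which by construction cannot distinguish the subject of 'Mommy krads you' from that of 'Mommy krads':

```python
def verb_position_feature(target, predicate_position):
    """Represent an argument only by its position relative to the
    identified predicate."""
    if predicate_position is None:
        return "no_verb"
    return "pre_verbal" if target < predicate_position else "post_verbal"
```

In "You and Mommy krad" (verb at index 3), both nouns (indices 0 and 2) come out `pre_verbal`, so with accurate predicate identification both would correctly be treated as likely agents, matching the error-elimination described above.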

67 To test the system’s predictions on transitive and intransitive two noun sentences, we constructed two test sentence templates: ‘A krads B’ and ‘A and B krad’, where A and B were replaced with familiar animate nouns. [sent-221, score-0.802]

68 The animate nouns were selected from all three children’s data in the training set and paired together in the templates such that all pairs are represented. [sent-222, score-0.305]

69 Figure 4 shows SRL performance on test sentences containing a novel verb and two animate nouns. [sent-223, score-0.235]

70 Each plot shows the proportion of test sentences that were assigned an agent-patient (A0A1) role sequence; this sequence is correct for transitive sentences but is an error for two-noun intransitive sentences. [sent-224, score-0.64]

71 The top and bottom panels in Figure 4 differ in the number of nouns provided to seed the argument identification stage. [sent-226, score-0.629]

72 The top row shows performance with 10 seed nouns (the 10 most frequent nouns, mostly animate pronouns), and the bottom row shows performance with 365 concrete (animate or inanimate) nouns treated as known. [sent-227, score-0.638]

73 Relative to the lexical baseline, NounPat features fared well: they promoted the assignment of A0A1 interpretations to transitive sentences, across all parser versions and both sets of known nouns. [sent-228, score-0.432]

74 Both VB estimation and the content-function word split increased the ability of NounPat features to learn that the first of two nouns was an agent, and the second a patient. [sent-229, score-0.279]

75 The NounPat features also promote the predicted error with two-noun intransitive sentences (Figures 4(b), 4(d)). [sent-230, score-0.403]

76 Despite the relatively low accuracy of predicate identification noted in section 4. [sent-231, score-0.27]

77 1, the VerbPos features did succeed in promoting an A0A1 interpretation for transitive sentences containing novel verbs relative to the lexical baseline. [sent-232, score-0.377]

78 In every case the performance of the Combined model that includes both NounPat and VerbPos features exceeds the performance of either NounPat or VerbPos alone, suggesting both contribute to correct predictions for transitive sentences. [sent-233, score-0.249]

79 Most strikingly, the VerbPos features did not eliminate the predicted error with two-noun intransitive sentences, as shown in panels 4(b) and 4(d). [sent-235, score-0.342]

80 Table 1 shows SRL performance on the same transitive test sentences (‘A krads B’), compared to simple one-noun intransitive sentences (‘A krads’). [sent-237, score-0.678]

81 To permit a direct comparison, the table reports the proportion of transitive test sentences for which the first noun was assigned an agent (A0) interpretation, and the proportion of intransitive test sentences with the agent (A0) role assigned to the single noun in the sentence. [sent-238, score-0.93]

82 Here we report only the results from the best-performing parser (trained with VB EM, and content/function word pre-clustering), compared to the same classifiers trained with gold standard argument identification. [sent-239, score-0.255]

83 When trained on arguments identified via the unsupervised POS tagger, noun pattern features promoted agent interpretations of transitive sentences. [sent-240, score-0.578]

84 Table 1: SRL result comparison when trained with the best unsupervised argument identifier versus trained with gold arguments. [sent-249, score-0.328]

85 Comparison is between agent first prediction of two noun transitive sentences vs. [sent-250, score-0.409]

86 The unsupervised arguments lead the classifier to rely more on noun pattern features; when the true arguments and predicate are known the verb position feature leads the classifier to strongly indicate agent first in both settings. [sent-252, score-0.798]

87 Performance with gold-standard argument identification is included for comparison. [sent-254, score-0.363]

88 Across parses, noun pattern features promote agent-patient (A0A1) interpretations of both transitive (“You krad Mommy”) and two-noun intransitive sentences (“You and Mommy krad”); the latter is an error found in young children. [sent-255, score-0.788]

89 Unsupervised parsing is less accurate in identifying the verb, so verb position features fail to eliminate errors with two-noun intransitive sentences. [sent-256, score-0.419]

90 This differentiation between transitive and intransitive sentences was clearer when more known nouns were provided. [sent-258, score-0.76]

91 Verb position features, in contrast, promote agent interpretations of subjects weakly with unsupervised argument identification, but equally for transitive and intransitive sentences. [sent-259, score-0.61]

92 The behavior of verb position features suggests that variations in the identifiability of different parts of speech can affect the usefulness of alternative representations of sentence structure. [sent-261, score-0.259]

93 Representations that reflect the position of the verb may be powerful guides for understanding simple English sentences, but representations reflecting only the number and order of nouns can dominate early in acquisition, depending on the integrity of parsing decisions. [sent-262, score-0.482]

94 6 Conclusion and Future Work The key innovation in the present work is the combination of unsupervised part-of-speech tagging and argument identification to permit learning in a simplified SRL system. [sent-263, score-0.443]

95 Instead, they must learn to understand sentences starting from scratch, learning the meanings of some words, and using those words and their patterns of arrangement into sentences to bootstrap their way into more mature knowledge. [sent-265, score-0.231]

96 We combined unsupervised parsing with minimal supervision to begin to identify arguments and predicates. [sent-267, score-0.264]

97 An SRL classifier used simple representations built from these identified arguments to extract useful abstract patterns for classifying semantic roles. [sent-268, score-0.346]

98 The next step is to ‘close the loop’, using higher level semantic feedback to improve the earlier argument identification and parsing stages. [sent-270, score-0.501]

99 Perhaps with the help of semantic feedback the system can automatically improve predicate identification, which in turn allows it to correct the observed intransitive sentence error. [sent-271, score-0.515]

100 Participants are more than physical bodies: 21-month-olds assign relational meaning to novel transitive verbs. [sent-539, score-0.237]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('srl', 0.313), ('intransitive', 0.261), ('nounpat', 0.238), ('verbpos', 0.238), ('argument', 0.212), ('transitive', 0.203), ('nouns', 0.193), ('hmm', 0.185), ('babysrl', 0.183), ('gertner', 0.183), ('identification', 0.151), ('children', 0.148), ('fisher', 0.138), ('arguments', 0.134), ('predicate', 0.119), ('vb', 0.115), ('states', 0.114), ('animate', 0.112), ('mommy', 0.11), ('connor', 0.096), ('gillette', 0.092), ('krads', 0.092), ('noun', 0.087), ('child', 0.084), ('unsupervised', 0.08), ('identified', 0.075), ('krad', 0.073), ('seed', 0.073), ('em', 0.073), ('meanings', 0.069), ('stage', 0.067), ('verbs', 0.067), ('concrete', 0.067), ('learner', 0.064), ('cognition', 0.063), ('shi', 0.062), ('verb', 0.062), ('early', 0.062), ('sentences', 0.061), ('clusters', 0.06), ('representations', 0.06), ('agent', 0.058), ('phonological', 0.058), ('interpretations', 0.057), ('cds', 0.055), ('childes', 0.055), ('inanimate', 0.055), ('integrity', 0.055), ('macwhinney', 0.055), ('role', 0.054), ('equipped', 0.052), ('feedback', 0.052), ('parsing', 0.05), ('bloom', 0.048), ('gleitman', 0.048), ('infants', 0.048), ('distributional', 0.047), ('sentence', 0.047), ('parsers', 0.046), ('yuan', 0.046), ('features', 0.046), ('roles', 0.045), ('brent', 0.044), ('speech', 0.044), ('illinois', 0.043), ('parser', 0.043), ('known', 0.042), ('promoted', 0.041), ('classifier', 0.041), ('learn', 0.04), ('pos', 0.039), ('labeling', 0.039), ('pronouns', 0.037), ('grammatical', 0.037), ('reductions', 0.037), ('argumentidentification', 0.037), ('cyru', 0.037), ('demetras', 0.037), ('demuth', 0.037), ('funct', 0.037), ('gomez', 0.037), ('kradding', 0.037), ('lidz', 0.037), ('monaghan', 0.037), ('parental', 0.037), ('preclustering', 0.037), ('rushen', 0.037), ('saffran', 0.037), ('semantic', 0.036), ('comprehension', 0.036), ('identifier', 0.036), ('predicted', 0.035), ('acquisition', 0.035), ('johnson', 0.035), ('bootstrapping', 0.035), ('assign', 0.034), ('predicates', 0.034), ('punyakanok', 
0.033), ('priors', 0.033), ('directed', 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999899 216 acl-2010-Starting from Scratch in Semantic Role Labeling

Author: Michael Connor ; Yael Gertner ; Cynthia Fisher ; Dan Roth

Abstract: A fundamental step in sentence comprehension involves assigning semantic roles to sentence constituents. To accomplish this, the listener must parse the sentence, find constituents that are candidate arguments, and assign semantic roles to those constituents. Each step depends on prior lexical and syntactic knowledge. Where do children learning their first languages begin in solving this problem? In this paper we focus on the parsing and argument-identification steps that precede Semantic Role Labeling (SRL) training. We combine a simplified SRL with an unsupervised HMM part-of-speech tagger, and experiment with psycholinguistically-motivated ways to label clusters resulting from the HMM so that they can be used to parse input for the SRL system. The results show that the proposed shallow representations of sentence structure are robust to reductions in parsing accuracy, and that the contribution of alternative representations of sentence structure to successful semantic role labeling varies with the integrity of the parsing and argument-identification stages.

2 0.37564319 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans

Author: Fei Huang ; Alexander Yates

Abstract: Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling. In experiments, our novel system reduces error by 16% relative to the previous state of the art on out-of-domain text.

3 0.29530802 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese

Author: Junhui Li ; Guodong Zhou ; Hwee Tou Ng

Abstract: This paper explores joint syntactic and semantic parsing of Chinese to further improve the performance of both syntactic and semantic parsing, in particular the performance of semantic parsing (in this paper, semantic role labeling). This is done from two levels. Firstly, an integrated parsing approach is proposed to integrate semantic parsing into the syntactic parsing process. Secondly, semantic information generated by semantic parsing is incorporated into the syntactic parsing model to better capture semantic information in syntactic parsing. Evaluation on Chinese TreeBank, Chinese PropBank, and Chinese NomBank shows that our integrated parsing approach outperforms the pipeline parsing approach on n-best parse trees, a natural extension of the widely used pipeline parsing approach on the top-best parse tree. Moreover, it shows that incorporating semantic role-related information into the syntactic parsing model significantly improves the performance of both syntactic parsing and semantic parsing. To our best knowledge, this is the first research on exploring syntactic parsing and semantic role labeling for both verbal and nominal predicates in an integrated way. 1

4 0.22726676 207 acl-2010-Semantics-Driven Shallow Parsing for Chinese Semantic Role Labeling

Author: Weiwei Sun

Abstract: One deficiency of current shallow parsing based Semantic Role Labeling (SRL) methods is that syntactic chunks are too small to effectively group words. To partially resolve this problem, we propose semantics-driven shallow parsing, which takes into account both syntactic structures and predicate-argument structures. We also introduce several new “path” features to improve shallow parsing based SRL method. Experiments indicate that our new method obtains a significant improvement over the best reported Chinese SRL result.

5 0.21925201 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates

Author: Matthew Gerber ; Joyce Chai

Abstract: Despite its substantial coverage, NomBank does not account for all withinsentence arguments and ignores extrasentential arguments altogether. These arguments, which we call implicit, are important to semantic processing, and their recovery could potentially benefit many NLP applications. We present a study of implicit arguments for a select group of frequent nominal predicates. We show that implicit arguments are pervasive for these predicates, adding 65% to the coverage of NomBank. We demonstrate the feasibility of recovering implicit arguments with a supervised classification model. Our results and analyses provide a baseline for future work on this emerging task.

6 0.21535669 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification

7 0.20540568 25 acl-2010-Adapting Self-Training for Semantic Role Labeling

8 0.19834603 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling

9 0.19604816 238 acl-2010-Towards Open-Domain Semantic Role Labeling

10 0.18819292 146 acl-2010-Improving Chinese Semantic Role Labeling with Rich Syntactic Features

11 0.16752699 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses

12 0.15063609 198 acl-2010-Predicate Argument Structure Analysis Using Transformation Based Learning

13 0.13505627 158 acl-2010-Latent Variable Models of Selectional Preference

14 0.10928433 144 acl-2010-Improved Unsupervised POS Induction through Prototype Discovery

15 0.10582872 203 acl-2010-Rebanking CCGbank for Improved NP Interpretation

16 0.10570718 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns

17 0.092925794 130 acl-2010-Hard Constraints for Grammatical Function Labelling

18 0.087517291 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences

19 0.081527919 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing

20 0.080499373 96 acl-2010-Efficient Optimization of an MDL-Inspired Objective Function for Unsupervised Part-Of-Speech Tagging


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.28), (1, 0.164), (2, 0.341), (3, 0.145), (4, 0.044), (5, -0.004), (6, -0.191), (7, -0.041), (8, 0.055), (9, -0.007), (10, 0.036), (11, -0.01), (12, -0.005), (13, -0.057), (14, -0.065), (15, -0.036), (16, -0.007), (17, -0.002), (18, 0.008), (19, 0.065), (20, -0.015), (21, 0.041), (22, -0.006), (23, 0.005), (24, 0.005), (25, 0.013), (26, -0.058), (27, 0.024), (28, 0.006), (29, 0.011), (30, -0.016), (31, -0.025), (32, 0.015), (33, 0.015), (34, 0.108), (35, -0.089), (36, 0.041), (37, 0.013), (38, 0.012), (39, 0.023), (40, -0.04), (41, -0.027), (42, -0.024), (43, 0.047), (44, -0.032), (45, -0.001), (46, 0.031), (47, -0.022), (48, 0.055), (49, 0.006)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94478124 216 acl-2010-Starting from Scratch in Semantic Role Labeling

Author: Michael Connor ; Yael Gertner ; Cynthia Fisher ; Dan Roth

Abstract: A fundamental step in sentence comprehension involves assigning semantic roles to sentence constituents. To accomplish this, the listener must parse the sentence, find constituents that are candidate arguments, and assign semantic roles to those constituents. Each step depends on prior lexical and syntactic knowledge. Where do children learning their first languages begin in solving this problem? In this paper we focus on the parsing and argumentidentification steps that precede Semantic Role Labeling (SRL) training. We combine a simplified SRL with an unsupervised HMM part of speech tagger, and experiment with psycholinguisticallymotivated ways to label clusters resulting from the HMM so that they can be used to parse input for the SRL system. The results show that proposed shallow representations of sentence structure are robust to reductions in parsing accuracy, and that the contribution of alternative representations of sentence structure to successful semantic role labeling varies with the integrity of the parsing and argumentidentification stages.

2 0.85276765 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans

Author: Fei Huang ; Alexander Yates

Abstract: Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling. In experiments, our novel system reduces error by 16% relative to the previous state of the art on out-of-domain text.

3 0.78471732 238 acl-2010-Towards Open-Domain Semantic Role Labeling

Author: Danilo Croce ; Cristina Giannone ; Paolo Annesi ; Roberto Basili

Abstract: Current Semantic Role Labeling technologies are based on inductive algorithms trained over large scale repositories of annotated examples. Frame-based systems currently make use of the FrameNet database but fail to show suitable generalization capabilities in out-of-domain scenarios. In this paper, a state-of-the-art system for frame-based SRL is extended through the encapsulation of a distributional model of semantic similarity. The resulting argument classification model promotes a simpler feature space that limits the potential overfitting effects. The large scale empirical study discussed here confirms that state-of-the-art accuracy can be obtained for out-of-domain evaluations.

4 0.77347779 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification

Author: Omri Abend ; Ari Rappoport

Abstract: The core-adjunct argument distinction is a basic one in the theory of argument structure. The task of distinguishing between the two has strong relations to various basic NLP tasks such as syntactic parsing, semantic role labeling and subcategorization acquisition. This paper presents a novel unsupervised algorithm for the task that uses no supervised models, utilizing instead state-of-the-art syntactic induction algorithms. This is the first work to tackle this task in a fully unsupervised scenario.

5 0.7577228 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese

Author: Junhui Li ; Guodong Zhou ; Hwee Tou Ng

Abstract: This paper explores joint syntactic and semantic parsing of Chinese to further improve the performance of both syntactic and semantic parsing, in particular the performance of semantic parsing (in this paper, semantic role labeling). This is done at two levels. Firstly, an integrated parsing approach is proposed to integrate semantic parsing into the syntactic parsing process. Secondly, semantic information generated by semantic parsing is incorporated into the syntactic parsing model to better capture semantic information in syntactic parsing. Evaluation on Chinese TreeBank, Chinese PropBank, and Chinese NomBank shows that our integrated parsing approach outperforms the pipeline parsing approach on n-best parse trees, a natural extension of the widely used pipeline parsing approach on the top-best parse tree. Moreover, it shows that incorporating semantic role-related information into the syntactic parsing model significantly improves the performance of both syntactic parsing and semantic parsing. To the best of our knowledge, this is the first research on exploring syntactic parsing and semantic role labeling for both verbal and nominal predicates in an integrated way.

6 0.74797946 207 acl-2010-Semantics-Driven Shallow Parsing for Chinese Semantic Role Labeling

7 0.73697108 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates

8 0.71771538 146 acl-2010-Improving Chinese Semantic Role Labeling with Rich Syntactic Features

9 0.70249623 25 acl-2010-Adapting Self-Training for Semantic Role Labeling

10 0.66301924 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses

11 0.66249859 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling

12 0.56477934 198 acl-2010-Predicate Argument Structure Analysis Using Transformation Based Learning

13 0.50809437 41 acl-2010-Automatic Selectional Preference Acquisition for Latin Verbs

14 0.48263517 248 acl-2010-Unsupervised Ontology Induction from Text

15 0.46114329 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns

16 0.45450816 130 acl-2010-Hard Constraints for Grammatical Function Labelling

17 0.45187786 108 acl-2010-Expanding Verb Coverage in Cyc with VerbNet

18 0.44703001 158 acl-2010-Latent Variable Models of Selectional Preference

19 0.44380537 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts

20 0.43608484 144 acl-2010-Improved Unsupervised POS Induction through Prototype Discovery


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.016), (25, 0.063), (39, 0.011), (42, 0.026), (59, 0.103), (73, 0.033), (78, 0.08), (80, 0.013), (83, 0.068), (84, 0.398), (98, 0.095)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.96800226 126 acl-2010-GernEdiT - The GermaNet Editing Tool

Author: Verena Henrich ; Erhard Hinrichs

Abstract: GernEdiT (short for: GermaNet Editing Tool) offers a graphical interface for the lexicographers and developers of GermaNet to access and modify the underlying GermaNet resource. GermaNet is a lexical-semantic wordnet that is modeled after the Princeton WordNet for English. The traditional lexicographic development of GermaNet was error-prone and time-consuming, mainly due to a complex underlying data format and no opportunity for automatic consistency checks. GernEdiT replaces the earlier development process with a more user-friendly tool, which facilitates automatic checking of internal consistency and correctness of the linguistic resource. This paper presents all these core functionalities of GernEdiT along with details about its usage and usability.

2 0.94837397 103 acl-2010-Estimating Strictly Piecewise Distributions

Author: Jeffrey Heinz ; James Rogers

Abstract: Strictly Piecewise (SP) languages are a subclass of regular languages which encode certain kinds of long-distance dependencies that are found in natural languages. Like the classes in the Chomsky and Subregular hierarchies, there are many independently converging characterizations of the SP class (Rogers et al., to appear). Here we define SP distributions and show that they can be efficiently estimated from positive data.
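The subsequence statistics underlying SP models can be illustrated with a toy relative-frequency estimator (a deliberate simplification, not the estimator defined in the paper; the two-symbol strings below are invented for the example): for each ordered symbol pair (a, b), count how often b occurs anywhere after a in the positive data, then normalize per first symbol.

```python
from collections import Counter

def sp2_counts(strings):
    """Count ordered symbol pairs (a, b) where b occurs anywhere after a
    in a string -- the 2-subsequence 'factors' used by SP-2 models."""
    counts = Counter()
    for s in strings:
        for i, a in enumerate(s):
            for b in s[i + 1:]:
                counts[(a, b)] += 1
    return counts

def sp2_probs(strings):
    """Relative frequency of (a, b) among all pairs beginning with a."""
    counts = sp2_counts(strings)
    totals = Counter()
    for (a, _), n in counts.items():
        totals[a] += n
    return {(a, b): n / totals[a] for (a, b), n in counts.items()}

probs = sp2_probs(["ab", "aab"])
```

Because the counts range over subsequences rather than adjacent symbols, these statistics capture the kind of long-distance dependencies (for example, sibilant harmony in phonology) that SP languages are designed to encode.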

3 0.85036027 220 acl-2010-Syntactic and Semantic Factors in Processing Difficulty: An Integrated Measure

Author: Jeff Mitchell ; Mirella Lapata ; Vera Demberg ; Frank Keller

Abstract: The analysis of reading times can provide insights into the processes that underlie language comprehension, with longer reading times indicating greater cognitive load. There is evidence that the language processor is highly predictive, such that prior context allows upcoming linguistic material to be anticipated. Previous work has investigated the contributions of semantic and syntactic contexts in isolation, essentially treating them as independent factors. In this paper we analyze reading times in terms of a single predictive measure which integrates a model of semantic composition with an incremental parser and a language model.

same-paper 4 0.84347475 216 acl-2010-Starting from Scratch in Semantic Role Labeling

Author: Michael Connor ; Yael Gertner ; Cynthia Fisher ; Dan Roth

Abstract: A fundamental step in sentence comprehension involves assigning semantic roles to sentence constituents. To accomplish this, the listener must parse the sentence, find constituents that are candidate arguments, and assign semantic roles to those constituents. Each step depends on prior lexical and syntactic knowledge. Where do children learning their first languages begin in solving this problem? In this paper we focus on the parsing and argument-identification steps that precede Semantic Role Labeling (SRL) training. We combine a simplified SRL with an unsupervised HMM part of speech tagger, and experiment with psycholinguistically motivated ways to label clusters resulting from the HMM so that they can be used to parse input for the SRL system. The results show that proposed shallow representations of sentence structure are robust to reductions in parsing accuracy, and that the contribution of alternative representations of sentence structure to successful semantic role labeling varies with the integrity of the parsing and argument-identification stages.

5 0.74357313 18 acl-2010-A Study of Information Retrieval Weighting Schemes for Sentiment Analysis

Author: Georgios Paltoglou ; Mike Thelwall

Abstract: Most sentiment analysis approaches use as baseline a support vector machines (SVM) classifier with binary unigram weights. In this paper, we explore whether more sophisticated feature weighting schemes from Information Retrieval can enhance classification accuracy. We show that variants of the classic tf.idf scheme adapted to sentiment analysis provide significant increases in accuracy, especially when using a sublinear function for term frequency weights and document frequency smoothing. The techniques are tested on a wide selection of data sets and produce the best accuracy to our knowledge.
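The weighting ingredients named in the abstract, a sublinear function for term frequency and document frequency smoothing, can be sketched as follows (a generic tf.idf formulation with 1 + log tf and add-one idf smoothing, one common variant rather than necessarily the exact scheme evaluated in the paper; the example documents are invented):

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Weight terms with sublinear tf (1 + log tf) and add-one-smoothed idf."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            t: (1 + math.log(c)) * math.log((n + 1) / (df[t] + 1))
            for t, c in tf.items()
        })
    return vectors

# Illustrative token lists; in the paper's setting these would be reviews.
docs = [["good", "good", "movie"], ["bad", "movie"]]
vectors = tfidf_vectors(docs)
```

Note how the smoothed idf drives the weight of "movie", which occurs in every document, to zero, while sentiment-bearing terms such as "good" retain positive weight; these weighted vectors would then be fed to an SVM classifier in place of binary unigram features.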

6 0.69473881 136 acl-2010-How Many Words Is a Picture Worth? Automatic Caption Generation for News Images

7 0.65591168 65 acl-2010-Complexity Metrics in an Incremental Right-Corner Parser

8 0.64939177 59 acl-2010-Cognitively Plausible Models of Human Language Processing

9 0.57094538 13 acl-2010-A Rational Model of Eye Movement Control in Reading

10 0.5650968 217 acl-2010-String Extension Learning

11 0.56005245 66 acl-2010-Compositional Matrix-Space Models of Language

12 0.53795213 158 acl-2010-Latent Variable Models of Selectional Preference

13 0.53631932 108 acl-2010-Expanding Verb Coverage in Cyc with VerbNet

14 0.53476238 67 acl-2010-Computing Weakest Readings

15 0.52942318 175 acl-2010-Models of Metaphor in NLP

16 0.52679497 229 acl-2010-The Influence of Discourse on Syntax: A Psycholinguistic Model of Sentence Processing

17 0.52360272 162 acl-2010-Learning Common Grammar from Multilingual Corpus

18 0.51922613 41 acl-2010-Automatic Selectional Preference Acquisition for Latin Verbs

19 0.51850772 191 acl-2010-PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names

20 0.51683652 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses