acl acl2010 acl2010-138 knowledge-graph by maker-knowledge-mining

138 acl-2010-Hunting for the Black Swan: Risk Mining from Text


Source: pdf

Author: Jochen Leidner ; Frank Schilder

Abstract: In the business world, analyzing and dealing with risk permeates all decisions and actions. However, to date, risk identification, the first step in the risk management cycle, has always been a manual activity with little to no intelligent software tool support. In addition, although companies are required to list risks to their business in their annual SEC filings in the USA, these descriptions are often very high-level and vague. In this paper, we introduce Risk Mining, which is the task of identifying a set of risks pertaining to a business area or entity. We argue that by combining Web mining and Information Extraction (IE) techniques, risks can be detected automatically before they materialize, thus providing valuable business intelligence. We describe a system that induces a risk taxonomy with concrete risks (e.g., interest rate changes) at its leaves and more abstract risks (e.g., financial risks) closer to its root node. The taxonomy is induced via a bootstrapping algorithm starting with a few seeds. The risk taxonomy is used by the system as input to a risk monitor that matches risk mentions in financial documents to the abstract risk types, thus bridging a lexical gap. Our system is able to automatically generate company-specific “risk maps”, which we demonstrate for a corpus of earnings report conference calls.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 In the business world, analyzing and dealing with risk permeates all decisions and actions. [sent-5, score-0.694]

2 However, to date, risk identification, the first step in the risk management cycle, has always been a manual activity with little to no intelligent software tool support. [sent-6, score-1.294]

3 In addition, although companies are required to list risks to their business in their annual SEC filings in the USA, these descriptions are often very high-level and vague. [sent-7, score-0.728]

4 In this paper, we introduce Risk Mining, which is the task of identifying a set of risks pertaining to a business area or entity. [sent-8, score-0.617]

5 We argue that by combining Web mining and Information Extraction (IE) techniques, risks can be detected automatically before they materialize, thus providing valuable business intelligence. [sent-9, score-0.658]

6 We describe a system that induces a risk taxonomy with concrete risks (e.g., [sent-10, score-1.347]

7 interest rate changes) at its leaves and more abstract risks (e.g., financial risks) closer to its root node. [sent-12, score-0.57]

8 The taxonomy is induced via a bootstrapping algorithm starting with a few seeds. [sent-15, score-0.105]

9 The risk taxonomy is used by the system as input to a risk monitor that matches risk mentions in financial documents to the abstract risk types, thus bridging a lexical gap. [sent-16, score-2.887]

10 Our system is able to automatically generate company-specific “risk maps”, which we demonstrate for a corpus of earnings report conference calls. [sent-17, score-0.161]

11 In business, companies are exposed to market risks such as new competitors, disruptive technologies, changes in customer attitudes, or changes in government legislation that can dramatically affect their profitability or threaten their business model or mode of operation. [sent-19, score-0.708]

12 Therefore, any tool to assist in the elicitation of otherwise unforeseen risk factors carries tremendous potential value. [sent-20, score-0.663]

13 Nassim Nicholas Taleb calls these “black swans” (Taleb, 2007). [sent-22, score-0.032]

14 Companies in the US are required to disclose a list of potential risks in their annual Form 10-K SEC filings in order to warn (potential) investors, and risks are frequently the topic of conference phone calls about a company’s earnings. [sent-23, score-1.172]

15 These risks are often reported in general terms, in particular because it is quite difficult to pinpoint the unknown unknown, i.e., [sent-24, score-0.603]

16 what kind of risk is concretely going to materialize. [sent-26, score-0.647]

17 On the other hand, there is a stream of valuable evidence available on the Web, such as news messages, blog entries, and analysts’ reports talking about companies’ performance and products. [sent-27, score-0.084]

18 Financial analysts and risk officers in large companies have not enjoyed any text analytics support so far, and risk lists devised using questionnaires or interviews are unlikely to be exhaustive due to small sample size, a gap which we aim to address in this paper. [sent-28, score-1.38]

19 To this end, we propose to use a combination of Web Mining (WM) and Information Extraction (IE) to assist humans interested in risk (with respect to an organization) and to bridge the gap between the general language and concrete risks. [sent-29, score-0.688]

20 We describe our system, which is divided into two main parts: (a) an offline Risk Miner that facilitates the risk identification step of the risk management process, and (b) an online Risk Monitor that supports the risk monitoring step (cf. [sent-30, score-1.996]

21 In addition, a Risk Mapper can aggregate and visualize the evidence in the form of a risk map. [sent-32, score-0.647]

22 Our risk mining algorithm combines Riloff hyponym patterns with recursive Web pattern bootstrapping and a graph representation. [sent-33, score-0.78]

23 We do not know of any other implemented end-to-end system for computer-assisted risk identification/visualization using text mining technology. [sent-34, score-0.688]

24 IE systems have been applied to the financial domain on Message Understanding Conference (MUC)-like tasks, ranging from named entity tagging to slot filling in templates (Costantino, 1992). [sent-38, score-0.133]

25 , 2008), which was designed to extract hyponymy, but they did so at the expense of recall, using longer dual anchored patterns and a pattern linkage graph. [sent-45, score-0.126]

26 Also, they create a set of pairs, whereas our approach creates a taxonomy tree as output. [sent-52, score-0.105]

27 Most importantly though, our approach is not driven by frequency, and was instead designed to work especially with rare occurrences in mind to permit “black swan”-type risk discovery. [sent-53, score-0.647]

28 , 2009) study the correlation between share price volatility, a proxy for risk, and a set of trigger words occurring in 60,000 SEC 10-K filings from 1995-2006. [sent-56, score-0.072]

29 Since the disclosure of a company’s risks is mandatory by law, SEC reports provide a rich source. [sent-57, score-0.605]

30 Their trigger words are selected a priori by humans; in contrast, risk mining as exercised in this paper aims to find risk-indicative words and phrases automatically. [sent-58, score-0.721]

31 Kogan and colleagues attempt to find a regression model using very simple unigram features based on whole documents that predicts volatility, whereas our goal is to automatically extract patterns to be used as alerts. [sent-59, score-0.056]

32 , 2004) found that sub-string matching of 14 pre-defined string literals outperforms an SVM classifier using bag-of-words features in the task of speculative language detection in medical abstracts. [sent-63, score-0.042]

33 They use a bi-partite graph-based approach, where one kind of node (content node) represents things people wish for (“world peace”) and the other kind of node (template node) represents templates that extract them (e. [sent-66, score-0.069]

34 3 Data We apply the mined risk extraction patterns to a corpus of financial documents. [sent-70, score-0.803]

35 In particular, we are dealing with 170k earnings call transcripts, a text type that contains monologue (company executives reporting about their company’s performance and general situation) as well as dialogue (in the form of questions and answers at the end of each conference call). [sent-72, score-0.073]

36 Participants typically include select business analysts from investment banks, and the calls are published afterwards for the shareholders’ benefit. [sent-73, score-0.112]

37 We randomly took a sample of N=6,185 transcripts to use them in our risk alerting experiments. [sent-75, score-0.704]

38 For demonstration purposes, we add a (c) Risk Mapper, a visualization component. [sent-79, score-0.026]

39 We describe how a variety of risks can be identified given a normally very high-level description of risks, as one can find in earnings reports, other financial news, or the risk section of 10-K SEC filings. [sent-80, score-0.73]

40 Also, the three Lewisburg area warehouses will be consolidated as we assess the logistical needs of the casegood group’s existing warehouse operations at an appropriate time in the future to minimize any disruption of service to our customers. [sent-83, score-0.041]

41 This will result in the loss of 425 jobs or approximately 15% of the casegood group’s current employee base. [sent-84, score-0.041]

42 I don’t know the net equipment sales number last quarter and this quarter. [sent-86, score-0.072]

43 But it sounded like from your comments that if you exclude these fees, that equipment sales were probably flattish. [sent-87, score-0.054]

44 CEO: We’re not breaking out the origination fee from the equipment fee, but I think in total, I would say flattish to slightly up. [sent-89, score-0.063]

45 Figure 1: Example sentences from the earnings conference call dataset. [sent-90, score-0.083]

46 ; and eventually more concrete, candidates, and relate them to risk types via a transitive chain of binary IS-A relations. [sent-93, score-0.647]

47 Contrary to the related work, we use a base NP chunker and download the full pages returned by the search engine rather than search snippets in order to be able to extract risk phrases rather than just terms, which reduces contextual ambiguity and thus increases overall precision. [sent-94, score-0.691]

48 The taxonomy learning method described in the following subsection determines a risk taxonomy and new risk patterns. [sent-95, score-1.427]

49 The second part of the system, the Risk Monitor, takes the risks from the risk taxonomy and uses them for monitoring financial text streams such as news, SEC filings, or (in our use case) earnings reports. [sent-96, score-1.55]

50 Using this, an analyst is then able to identify concrete risks in news messages and link them to the high-level risk descriptions. [sent-97, score-1.288]

51 He or she may want to identify operational risks such as fraud for a particular company, for instance. [sent-98, score-0.634]

52 The risk taxonomy can also derive further risks in this category (e. [sent-99, score-1.322]

53 Iceland) can be directly linked to the risk as stated in earnings reports or security filings. [sent-106, score-0.765]

54 2 Taxonomy induction method Using frequency to compute confidence in a pattern does not work for risk mining, however, because mention of particular risks might be rare. [sent-110, score-1.269]

55 Instead of frequency based indicators (n-grams, frequency weights), we rely on two types of structural confidence validation, namely (a) previously identified risks and (b) previously acquired structural patterns. [sent-111, score-0.57]

56 Note, however, that we can still use PageRank, a popularity-based graph algorithm, because multiple patterns might be connected to a risk term or phrase, even in the absence of frequency counts for each (i. [sent-112, score-0.734]

57 The first step is used to extract a list of risks based on high precision patterns. [sent-117, score-0.586]

58 However, it has been shown that the use of such patterns (e. [sent-118, score-0.04]

59 Ideally, we want to retrieve specific risks by re-applying the extracted risk descriptions: [sent-121, score-1.233]

60 (a) Take a seed, instantiate the "<SEED> such as *" pattern with the seed, and extract candidates. Input: risks. Method: apply pattern "<SEED> such as <INSTANCE>", where <SEED> = risks. Output: list of instances (e.g., [sent-124, score-1.26]

61 faulty components). (b) For each candidate from the list of instances, we find a set of additional candidate hyponyms. [sent-126, score-0.108]

62 Input: faulty components. Method: apply pattern "<SEED> such as <INSTANCE>", where <SEED> = faulty components. Output: list of instances (e. [sent-127, score-0.188]

63 Since the Risk Candidate extraction step will also find many false positives, we need to factor in information that validates that the extracted risk is indeed a risk. [sent-131, score-0.647]

64 We do this by constructing a possible pattern containing this new risk. [sent-132, score-0.052]

65 (a) Append "* risks" to the output of 1(b) in order to make sure that the candidate occurs in a risk context. [sent-133, score-1.237]

66 Input: brake(s). Pattern: "brake(s) * risk(s)". Output: a list of patterns (e. [sent-134, score-0.687]
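The candidate extraction and validation steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it applies the "<SEED> such as <INSTANCE>" pattern with a regular expression to a small in-memory list of sentences, whereas the described system queries a Web search engine and uses a base NP chunker. The function names (`extract_instances`, `validation_query`, `bootstrap`) and the example sentences are hypothetical.

```python
import re

def extract_instances(seed, texts):
    """Return phrases that fill <INSTANCE> in '<SEED> such as <INSTANCE>'."""
    # Capture everything after "such as" up to sentence-ending punctuation.
    pattern = re.compile(re.escape(seed) + r"\s+such as\s+([^.;:!?]+)", re.IGNORECASE)
    found = []
    for text in texts:
        for match in pattern.finditer(text):
            # Split an enumeration like "a, b and c" into separate candidates.
            for part in re.split(r",|\band\b|\bor\b", match.group(1)):
                phrase = part.strip().lower()
                if phrase and phrase not in found:
                    found.append(phrase)
    return found

def validation_query(candidate):
    """Build the '<candidate> * risks' wildcard query that checks for a risk context."""
    return f'"{candidate} * risks"'

def bootstrap(seed, texts, max_depth=2):
    """Re-apply the pattern to each new candidate, collecting (child, parent) pairs."""
    pairs, frontier, seen = [], [seed], {seed}
    for _ in range(max_depth):
        next_frontier = []
        for parent in frontier:
            for child in extract_instances(parent, texts):
                pairs.append((child, parent))
                if child not in seen:
                    seen.add(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return pairs

# Hypothetical stand-in for downloaded Web pages.
texts = [
    "Car makers face risks such as faulty components and recalls.",
    "Recalls often involve faulty components such as brakes and airbags.",
]
pairs = bootstrap("risks", texts)
query = validation_query("brakes")
```

Here `pairs` holds the candidate IS-A relations found over two rounds, and `query` is the wildcard string one would submit to check that "brakes" occurs in a risk context.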

67 We have now reached the point where we constructed a graph with risks and patterns. [sent-139, score-0.57]

68 Risks are connected via IS-A links; risks and patterns are connected via PATTERN links. [sent-140, score-0.67]

69 Note that there are links from risks to patterns and from patterns to risks; some risks back-pointed by a pattern may actually not be a risk (e. [sent-141, score-1.919]

70 However, this node is also not connected to a more abstract risk node and will therefore have a low PageRank score. [sent-144, score-0.713]

71 Risks that are connected to patterns that have a high authority (i. [sent-145, score-0.07]

72 The risk black swan, for example, has only one pattern it occurs in, but this pattern can be filled by many other risks (e. [sent-148, score-1.36]

73 Hence, the PageRank score of the black swan is high, similar to that of well-known risks such as fraud. [sent-151, score-0.107]
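To make the role of PageRank on the risk/pattern graph more concrete, the following sketch runs a plain power-iteration PageRank over a tiny hand-built graph. It is an illustration only: the edge directions (IS-A links pointing from the abstract risk to the concrete one, PATTERN links in both directions), the damping factor, and all node names are assumptions made for this example and are not taken from the paper.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a list of directed (src, dst) edges."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            # A node without outgoing links spreads its rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank

PATTERN = "pattern: * and other risks"  # hypothetical pattern node
edges = [
    ("risks", "fraud"), ("risks", "black swan"),  # IS-A links (assumed direction)
    ("fraud", PATTERN), (PATTERN, "fraud"),        # PATTERN links, both ways
    ("black swan", PATTERN), (PATTERN, "black swan"),
    (PATTERN, "people"),                           # spurious filler of the pattern
]
rank = pagerank(edges)
```

In this toy graph the spurious node "people" is reachable only through the pattern and has no link from a more abstract risk, so it ends up with a lower score than "fraud" or "black swan", which share the same pattern and are also linked to the root.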

74 3 Risk alerting method We compile the risk taxonomy into a trie automaton, and create a second trie for company names from the meta-data of our corpus. [sent-153, score-0.953]

75 The Risk Monitor reads the two tries and uses the first to detect mentions of risks in the earnings reports and the second one to tag company names, both using case-insensitive matching for better recall. [sent-154, score-0.741]

76 Optionally, we can use Porter stemming during trie construction and matching to trade precision for even higher recall, but in the experiments reported here this is not used. [sent-155, score-0.041]

77 count for this ⟨company; risk type⟩ tuple, which we use for graphic rendering purposes. [sent-157, score-0.647]

78 The second option also permits the user to explore the detected risk mentions per company and by risk type. [sent-160, score-1.389]
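The trie-based alerting described above might look roughly like the sketch below. It assumes a token-level trie built from nested dictionaries, whitespace tokenization with punctuation stripping, longest-match lookup, and counting a ⟨company; risk type⟩ tuple whenever both are found in the same document. None of these details are confirmed by the text, and all names and example documents are hypothetical.

```python
from collections import Counter

_END = object()  # sentinel key marking the end of a phrase; never equals a token

def build_trie(phrases):
    """Build a token-level trie from a {phrase: label} mapping (lower-cased)."""
    root = {}
    for phrase, label in phrases.items():
        node = root
        for token in phrase.lower().split():
            node = node.setdefault(token, {})
        node[_END] = label
    return root

def find_matches(trie, text):
    """Return the labels of the longest trie match starting at each token position."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
    labels, i = [], 0
    while i < len(tokens):
        node, j, best = trie, i, None
        while j < len(tokens) and tokens[j] in node:
            node = node[tokens[j]]
            j += 1
            if _END in node:
                best = (node[_END], j)
        if best:
            labels.append(best[0])
            i = best[1]  # continue after the matched phrase
        else:
            i += 1
    return labels

risk_trie = build_trie({
    "interest rate changes": "financial risk",
    "fraud": "operational risk",
})
company_trie = build_trie({"Acme Corp": "Acme Corp"})

docs = [
    "Acme Corp warned that Interest Rate Changes and fraud could hurt margins.",
    "ACME CORP saw no FRAUD this quarter.",
]
counts = Counter()
for doc in docs:
    for company in set(find_matches(company_trie, doc)):
        for risk_type in find_matches(risk_trie, doc):
            counts[(company, risk_type)] += 1
```

The resulting `counts` mapping is the kind of aggregate a risk map could be rendered from. Stemming is omitted here, in line with the experiments reported above.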

79 5 Results From the Web mining process, we obtain a set of pairs (Figure 4), from which the taxonomy is constructed. [sent-161, score-0.146]

80 In one run with only 12 seeds (just the risk type names with variants), we obtained a taxonomy with 280 validated leaf nodes that are connected transitively to the risks root node. [sent-162, score-1.392]
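As an illustration of how a taxonomy can be assembled from mined pairs and how leaf nodes transitively connected to the root can be collected, consider the sketch below. It assumes that each pair has the form (child, parent); the actual pair format shown in the paper's Figure 4 is not reproduced here, and the example pairs are invented.

```python
from collections import defaultdict

def build_taxonomy(pairs):
    """Turn (child, parent) IS-A pairs into a parent -> children mapping."""
    children = defaultdict(set)
    for child, parent in pairs:
        children[parent].add(child)
    return children

def leaves_under(children, root):
    """Collect nodes reachable from root that have no children of their own."""
    leaves, stack, seen = set(), [root], {root}
    while stack:
        node = stack.pop()
        kids = children.get(node, set())
        if not kids:
            leaves.add(node)
        for kid in kids:
            if kid not in seen:
                seen.add(kid)
                stack.append(kid)
    return leaves

pairs = [
    ("financial risk", "risks"),
    ("interest rate changes", "financial risk"),
    ("operational risk", "risks"),
    ("fraud", "operational risk"),
    ("weather", "climate"),  # not connected to the root, so never reached
]
taxonomy = build_taxonomy(pairs)
leaves = leaves_under(taxonomy, "risks")
```

Pairs that are not transitively linked to the root, such as the last one, simply never appear among the collected leaves.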

81 Our resulting system produces visualizations we call “risk maps”, because they graphically present the extracted risk types in aggregated form. [sent-163, score-0.647]

82 A set of risk types can be selected for presentation as well as a set of companies of interest. [sent-164, score-0.7]

83 A risk map display is then generated using either R (Figure 5) or an interactive Web page, depending on the user’s preference. [sent-165, score-0.647]

84 Figure 5: An Example Risk Map. We inspected the output of the risk miner and observed the following [sent-167, score-0.715]

85 classes of issues: (a) chunker errors: if phrasal boundaries are placed at the wrong position, the taxonomy will include wrong relations. [sent-168, score-0.133]

86 that IS-A indirect risks) before we introduced a stop word filter that discards candidate tuples that contain no content words. [sent-171, score-0.261]

87 Another prominent example is “short term” instead of the correct “short term risk”; (b) semantic drift3 : due to polysemy, words and phrases can denote risk and non-risk meanings, depending on context. [sent-172, score-0.664]

88 is cash flow primarily an operational risk or a financial risk? [sent-179, score-0.8]

89 We also found that some classes contain more noise than others, for example operational risk was less precise than financial risk, probably due to the lesser specificity of the former risk type. [sent-187, score-1.447]
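The stop word filter mentioned under the chunker errors above can be sketched as follows. The stop word list and the rule that both members of a tuple must contain at least one content word are assumptions made for illustration; the paper does not spell out either.

```python
# Hypothetical stop word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "in", "that", "this", "these",
              "such", "as", "and", "or", "other"}

def has_content_word(phrase):
    """True if at least one token of the phrase is not a stop word."""
    return any(token not in STOP_WORDS for token in phrase.lower().split())

def filter_pairs(pairs):
    """Discard candidate tuples in which either side has no content word."""
    return [(child, parent) for child, parent in pairs
            if has_content_word(child) and has_content_word(parent)]

candidates = [("that", "indirect risks"), ("fraud", "operational risk")]
kept = filter_pairs(candidates)
```

With this rule, a tuple like ("that", "indirect risks") is dropped, while ("fraud", "operational risk") is kept.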

90 In this paper, we introduced the task of risk mining, which produces patterns that are useful in another task, risk alerting. [sent-189, score-0.687]

91 Both tasks provide computational assistance to risk-related decision making in the financial sector. [sent-190, score-0.116]

92 We described a special-purpose algorithm for inducing a risk taxonomy offline, which can then be used online to analyze earnings reports in order to signal risks. [sent-191, score-0.828]

93 how to match up terms and phrases in financial news prose with the more abstract language typically used in talking about risk in general. [sent-194, score-0.812]

94 We have described an implemented demonstrator system comprising an offline risk taxonomy miner, an online risk alerter and a visualization component that creates visual risk maps by company and risk type, which we have applied to a corpus of earnings call transcripts. [sent-195, score-2.924]

95 Extracted negative and also positive risks can be used in many applications, ranging from e-mail alerts to determining credit ratings. [sent-197, score-0.62]

96 Our preliminary work on risk maps can be put on a more theoretical footing (Hunter, 2000). [sent-198, score-0.665]

97 After studying further how the output of risk alerting correlates with non-textual signals like share price (see footnote 4), risk detection signals could inform human or trading decisions. [sent-199, score-1.365]

98 Footnote 4: Our hypothesis is that risk patterns can outperform bag of words (Kogan et al. [sent-203, score-0.687]

99 May all your wishes come true: A study of wishes and how to recognize them. [sent-227, score-0.088]

100 Semantic class learning from the web with hyponym pattern linkage graphs. [sent-247, score-0.114]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('risk', 0.647), ('risks', 0.57), ('ris', 0.241), ('ks', 0.144), ('financial', 0.116), ('taxonomy', 0.105), ('earnings', 0.083), ('company', 0.078), ('faulty', 0.068), ('miner', 0.068), ('swan', 0.068), ('monitor', 0.061), ('kogan', 0.054), ('companies', 0.053), ('pattern', 0.052), ('sec', 0.048), ('business', 0.047), ('web', 0.044), ('wishes', 0.044), ('pagerank', 0.042), ('cal', 0.042), ('mining', 0.041), ('alerting', 0.041), ('casegood', 0.041), ('earning', 0.041), ('filings', 0.041), ('financi', 0.041), ('trie', 0.041), ('volatility', 0.041), ('patterns', 0.04), ('black', 0.039), ('market', 0.038), ('operational', 0.037), ('legal', 0.037), ('ional', 0.036), ('mapper', 0.036), ('brake', 0.036), ('equipment', 0.036), ('operat', 0.036), ('reports', 0.035), ('seed', 0.034), ('unknown', 0.033), ('analysts', 0.033), ('calls', 0.032), ('connected', 0.03), ('monitoring', 0.029), ('al', 0.028), ('chunker', 0.028), ('thomson', 0.028), ('alerts', 0.027), ('bryan', 0.027), ('envi', 0.027), ('fee', 0.027), ('fraud', 0.027), ('lewisburg', 0.027), ('nassim', 0.027), ('ronment', 0.027), ('starmine', 0.027), ('streetevents', 0.027), ('taleb', 0.027), ('reuters', 0.026), ('visualization', 0.026), ('offline', 0.026), ('talking', 0.025), ('concrete', 0.025), ('news', 0.024), ('saeger', 0.024), ('transitively', 0.024), ('speculative', 0.024), ('credit', 0.023), ('analyst', 0.022), ('candidate', 0.02), ('pennsylvania', 0.02), ('sales', 0.018), ('kozareva', 0.018), ('craig', 0.018), ('rat', 0.018), ('quarter', 0.018), ('light', 0.018), ('node', 0.018), ('maps', 0.018), ('linkage', 0.018), ('priori', 0.018), ('medical', 0.018), ('mentions', 0.017), ('templates', 0.017), ('dis', 0.017), ('descriptions', 0.017), ('term', 0.017), ('price', 0.016), ('house', 0.016), ('nicholas', 0.016), ('transcripts', 0.016), ('ceo', 0.016), ('extract', 0.016), ('assist', 0.016), ('validated', 0.016), ('ie', 0.016), ('signals', 0.015), ('trigger', 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999869 138 acl-2010-Hunting for the Black Swan: Risk Mining from Text

2 0.098190956 14 acl-2010-A Risk Minimization Framework for Extractive Speech Summarization

Author: Shih-Hsiang Lin ; Berlin Chen

Abstract: In this paper, we formulate extractive summarization as a risk minimization problem and propose a unified probabilistic framework that naturally combines supervised and unsupervised summarization models to inherit their individual merits as well as to overcome their inherent limitations. In addition, the introduction of various loss functions also provides the summarization framework with a flexible but systematic way to render the redundancy and coherence relationships among sentences and between sentences and the whole document, respectively. Experiments on speech summarization show that the methods deduced from our framework are very competitive with existing summarization approaches.

3 0.079812184 43 acl-2010-Automatically Generating Term Frequency Induced Taxonomies

Author: Karin Murthy ; Tanveer A Faruquie ; L Venkata Subramaniam ; Hima Prasad K ; Mukesh Mohania

Abstract: We propose a novel method to automatically acquire a term-frequency-based taxonomy from a corpus using an unsupervised method. A term-frequency-based taxonomy is useful for application domains where the frequency with which terms occur on their own and in combination with other terms imposes a natural term hierarchy. We highlight an application for our approach and demonstrate its effectiveness and robustness in extracting knowledge from real-world data.

4 0.046845179 181 acl-2010-On Learning Subtypes of the Part-Whole Relation: Do Not Mix Your Seeds

Author: Ashwin Ittoo ; Gosse Bouma

Abstract: An important relation in information extraction is the part-whole relation. Ontological studies mention several types of this relation. In this paper, we show that the traditional practice of initializing minimally-supervised algorithms with a single set that mixes seeds of different types fails to capture the wide variety of part-whole patterns and tuples. The results obtained with mixed seeds ultimately converge to one of the part-whole relation types. We also demonstrate that all the different types of part-whole relations can still be discovered, regardless of the type characterized by the initializing seeds. We performed our experiments with a state-of-the-art information extraction algorithm.

5 0.04653284 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns

Author: Zornitsa Kozareva ; Eduard Hovy

Abstract: A challenging problem in open information extraction and text mining is the learning of the selectional restrictions of semantic relations. We propose a minimally supervised bootstrapping algorithm that uses a single seed and a recursive lexico-syntactic pattern to learn the arguments and the supertypes of a diverse set of semantic relations from the Web. We evaluate the performance of our algorithm on multiple semantic relations expressed using “verb”, “noun”, and “verb prep” lexico-syntactic patterns. Human-based evaluation shows that the accuracy of the harvested information is about 90%. We also compare our results with existing knowledge base to outline the similarities and differences of the granularity and diversity of the harvested knowledge.

6 0.045405842 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining

7 0.036514491 19 acl-2010-A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation

8 0.035141215 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing

9 0.033040635 27 acl-2010-An Active Learning Approach to Finding Related Terms

10 0.031900849 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach

11 0.030877281 238 acl-2010-Towards Open-Domain Semantic Role Labeling

12 0.030821433 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs

13 0.030396629 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web

14 0.030147037 124 acl-2010-Generating Image Descriptions Using Dependency Relational Patterns

15 0.028894598 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction

16 0.028040607 130 acl-2010-Hard Constraints for Grammatical Function Labelling

17 0.027844653 127 acl-2010-Global Learning of Focused Entailment Graphs

18 0.027028278 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences

19 0.027000004 85 acl-2010-Detecting Experiences from Weblogs

20 0.026545055 204 acl-2010-Recommendation in Internet Forums and Blogs


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.086), (1, 0.039), (2, -0.026), (3, -0.019), (4, 0.011), (5, -0.016), (6, 0.004), (7, -0.025), (8, -0.015), (9, -0.027), (10, -0.014), (11, 0.015), (12, -0.033), (13, -0.054), (14, 0.043), (15, 0.047), (16, 0.026), (17, 0.032), (18, -0.02), (19, 0.011), (20, -0.003), (21, 0.019), (22, 0.003), (23, -0.007), (24, -0.042), (25, -0.027), (26, -0.078), (27, 0.069), (28, 0.023), (29, 0.036), (30, -0.026), (31, 0.06), (32, 0.042), (33, 0.0), (34, 0.03), (35, -0.062), (36, -0.048), (37, 0.068), (38, 0.022), (39, 0.082), (40, -0.041), (41, -0.035), (42, 0.088), (43, 0.07), (44, 0.058), (45, 0.047), (46, -0.051), (47, -0.039), (48, 0.064), (49, 0.06)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94794494 138 acl-2010-Hunting for the Black Swan: Risk Mining from Text

2 0.74374658 43 acl-2010-Automatically Generating Term Frequency Induced Taxonomies

Author: Karin Murthy ; Tanveer A Faruquie ; L Venkata Subramaniam ; Hima Prasad K ; Mukesh Mohania

Abstract: We propose a novel method to automatically acquire a term-frequency-based taxonomy from a corpus using an unsupervised method. A term-frequency-based taxonomy is useful for application domains where the frequency with which terms occur on their own and in combination with other terms imposes a natural term hierarchy. We highlight an application for our approach and demonstrate its effectiveness and robustness in extracting knowledge from real-world data.

3 0.69328409 181 acl-2010-On Learning Subtypes of the Part-Whole Relation: Do Not Mix Your Seeds

Author: Ashwin Ittoo ; Gosse Bouma

Abstract: An important relation in information extraction is the part-whole relation. Ontological studies mention several types of this relation. In this paper, we show that the traditional practice of initializing minimally-supervised algorithms with a single set that mixes seeds of different types fails to capture the wide variety of part-whole patterns and tuples. The results obtained with mixed seeds ultimately converge to one of the part-whole relation types. We also demonstrate that all the different types of part-whole relations can still be discovered, regardless of the type characterized by the initializing seeds. We performed our experiments with a state-of-the-art information extraction algorithm.

4 0.60056466 64 acl-2010-Complexity Assumptions in Ontology Verbalisation

Author: Richard Power

Abstract: We describe the strategy currently pursued for verbalising OWL ontologies by sentences in Controlled Natural Language (i.e., combining generic rules for realising logical patterns with ontology-specific lexicons for realising atomic terms for individuals, classes, and properties) and argue that its success depends on assumptions about the complexity of terms and axioms in the ontology. We then show, through analysis of a corpus of ontologies, that although these assumptions could in principle be violated, they are overwhelmingly respected in practice by ontology developers.

5 0.59085 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns

Author: Zornitsa Kozareva ; Eduard Hovy

Abstract: A challenging problem in open information extraction and text mining is the learning of the selectional restrictions of semantic relations. We propose a minimally supervised bootstrapping algorithm that uses a single seed and a recursive lexico-syntactic pattern to learn the arguments and the supertypes of a diverse set of semantic relations from the Web. We evaluate the performance of our algorithm on multiple semantic relations expressed using “verb”, “noun”, and “verb prep” lexico-syntactic patterns. Human-based evaluation shows that the accuracy of the harvested information is about 90%. We also compare our results with existing knowledge base to outline the similarities and differences of the granularity and diversity of the harvested knowledge.

6 0.55440623 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction

7 0.49444959 61 acl-2010-Combining Data and Mathematical Models of Language Change

8 0.47123507 92 acl-2010-Don't 'Have a Clue'? Unsupervised Co-Learning of Downward-Entailing Operators.

9 0.47076914 137 acl-2010-How Spoken Language Corpora Can Refine Current Speech Motor Training Methodologies

10 0.44465005 222 acl-2010-SystemT: An Algebraic Approach to Declarative Information Extraction

11 0.43720651 224 acl-2010-Talking NPCs in a Virtual Game World

12 0.43409604 19 acl-2010-A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation

13 0.39431906 63 acl-2010-Comparable Entity Mining from Comparative Questions

14 0.38747525 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining

15 0.37301937 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing

16 0.36701146 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs

17 0.36521816 176 acl-2010-Mood Patterns and Affective Lexicon Access in Weblogs

18 0.36450502 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition

19 0.36439157 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web

20 0.36288923 82 acl-2010-Demonstration of a Prototype for a Conversational Companion for Reminiscing about Images


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.012), (25, 0.054), (28, 0.014), (33, 0.01), (39, 0.012), (42, 0.041), (44, 0.016), (45, 0.331), (59, 0.063), (72, 0.021), (73, 0.055), (76, 0.02), (78, 0.034), (80, 0.023), (83, 0.071), (84, 0.036), (97, 0.01), (98, 0.074)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.79369277 138 acl-2010-Hunting for the Black Swan: Risk Mining from Text

Author: Jochen Leidner ; Frank Schilder

Abstract: In the business world, analyzing and dealing with risk permeates all decisions and actions. However, to date, risk identification, the first step in the risk management cycle, has always been a manual activity with little to no intelligent software tool support. In addition, although companies are required to list risks to their business in their annual SEC filings in the USA, these descriptions are often very high-level and vague. In this paper, we introduce Risk Mining, which is the task of identifying a set of risks pertaining to a business area or entity. We argue that by combining Web mining and Information Extraction (IE) techniques, risks can be detected automatically before they materialize, thus providing valuable business intelligence. We describe a system that induces a risk taxonomy with concrete risks (e.g., interest rate changes) at its leaves and more abstract risks (e.g., financial risks) closer to its root node. The taxonomy is induced via a bootstrapping algorithm starting with a few seeds. The risk taxonomy is used by the system as input to a risk monitor that matches risk mentions in financial documents to the abstract risk types, thus bridging a lexical gap. Our system is able to automatically generate company-specific “risk maps”, which we demonstrate for a corpus of earnings report conference calls.

2 0.70991492 173 acl-2010-Modeling Norms of Turn-Taking in Multi-Party Conversation

Author: Kornel Laskowski

Abstract: Substantial research effort has been invested in recent decades into the computational study and automatic processing of multi-party conversation. While most aspects of conversational speech have benefited from a wide availability of analytic, computationally tractable techniques, only qualitative assessments are available for characterizing multi-party turn-taking. The current paper attempts to address this deficiency by first proposing a framework for computing turn-taking model perplexity, and then by evaluating several multi-participant modeling approaches. Experiments show that direct multi-participant models do not generalize to held out data, and likely never will, for practical reasons. In contrast, the Extended-Degree-of-Overlap model represents a suitable candidate for future work in this area, and is shown to successfully predict the distribution of speech in time and across participants in previously unseen conversations.

3 0.61529553 100 acl-2010-Enhanced Word Decomposition by Calibrating the Decision Threshold of Probabilistic Models and Using a Model Ensemble

Author: Sebastian Spiegler ; Peter A. Flach

Abstract: This paper demonstrates that the use of ensemble methods and carefully calibrating the decision threshold can significantly improve the performance of machine learning methods for morphological word decomposition. We employ two algorithms which come from a family of generative probabilistic models. The models consider segment boundaries as hidden variables and include probabilities for letter transitions within segments. The advantage of this model family is that it can learn from small datasets and easily generalises to larger datasets. The first algorithm, PROMODES, which participated in the Morpho Challenge 2009 (an international competition for unsupervised morphological analysis), employs a lower order model, whereas the second algorithm, PROMODES-H, is a novel development of the first using a higher order model. We present the mathematical description for both algorithms, conduct experiments on the morphologically rich language Zulu and compare characteristics of both algorithms based on the experimental results.

4 0.411246 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews

Author: Niklas Jakob ; Iryna Gurevych

Abstract: unknown-abstract

5 0.40407413 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models

Author: David Jurgens ; Keith Stevens

Abstract: We present the S-Space Package, an open source framework for developing and evaluating word space algorithms. The package implements well-known word space algorithms, such as LSA, and provides a comprehensive set of matrix utilities and data structures for extending new or existing models. The package also includes word space benchmarks for evaluation. Both algorithms and libraries are designed for high concurrency and scalability. We demonstrate the efficiency of the reference implementations and also provide their results on six benchmarks.

6 0.40384141 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar

7 0.40128657 158 acl-2010-Latent Variable Models of Selectional Preference

8 0.40108982 121 acl-2010-Generating Entailment Rules from FrameNet

9 0.40035805 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition

10 0.40019453 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification

11 0.4001779 208 acl-2010-Sentence and Expression Level Annotation of Opinions in User-Generated Discourse

12 0.39955211 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes

13 0.39920264 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields

14 0.39720243 248 acl-2010-Unsupervised Ontology Induction from Text

15 0.39703417 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System

16 0.39672369 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference

17 0.39618286 71 acl-2010-Convolution Kernel over Packed Parse Forest

18 0.39589286 113 acl-2010-Extraction and Approximation of Numerical Attributes from the Web

19 0.39554626 214 acl-2010-Sparsity in Dependency Grammar Induction

20 0.39453027 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns