brendan_oconnor_ai brendan_oconnor_ai-2014 brendan_oconnor_ai-2014-203 knowledge-graph by maker-knowledge-mining

203 brendan oconnor ai-2014-02-19-What the ACL-2014 review scores mean


meta info for this blog

Source: html

Introduction: I’ve had several people ask me what the numbers in ACL reviews mean — and I can’t find anywhere online where they’re described. (Can anyone point this out if it is somewhere?) So here’s the review form, below. They all go from 1 to 5, with 5 the best. I think the review emails to authors only include a subset of the below — for example, “Overall Recommendation” is not included? The CFP said that they have different types of review forms for different types of papers. I think this one is for a standard full paper. I guess what people really want to know is what scores tend to correspond to acceptances. I really have no idea and I get the impression this can change year to year. I have no involvement with the ACL conference besides being one of many, many reviewers. APPROPRIATENESS (1-5) Does the paper fit in ACL 2014? (Please answer this question in light of the desire to broaden the scope of the research areas represented at ACL.) 5: Certainly. 4: Probably …


Summary: the most important sentences, as generated by the tfidf model

sentIndex sentText sentNum sentScore

1 IMPLEMENTATION AND SOUNDNESS (1-5) Has the application or tool been fully implemented or do certain parts of the system remain to be implemented? [sent-33, score-0.804]

2 Is enough detail provided that one might be able to replicate the application or tool with some effort? [sent-35, score-0.679]

3 5 = The application or tool is fully implemented, and the claims are convincingly supported. [sent-37, score-0.72]

4 4 = Generally solid work, although there are some aspects of the application or tool that still need work, and/or some claims that should be better illustrated and supported. [sent-39, score-0.931]

5 The main claims are illustrated to some extent with examples, but I am not entirely ready to accept that the application or tool can do everything that it should (based on the material in the paper). [sent-41, score-0.653]

6 There are some aspects that might be good, but the application or tool has several deficiencies and/or limitations that make it premature. [sent-43, score-0.835]

7 EVALUATION (1-5) To what extent has the application or tool been tested and evaluated? [sent-54, score-0.652]

8 5 = The application or tool has been thoroughly tested. [sent-56, score-0.541]

9 4 = The application or tool has been tested and evaluated on a reasonable corpus or with a small set of users. [sent-59, score-0.736]

10 3 = The application or tool has been tested and evaluated to a limited extent. [sent-62, score-0.826]

11 2 = A few test cases have been run on the application or tool but no significant evaluation or user study has been performed. [sent-64, score-0.747]

12 1 = The application or tool has not been tested or evaluated. [sent-65, score-0.652]

13 3 = Bibliography and comparison are somewhat helpful, but it could be hard for a reader to determine exactly how this work relates to previous work or what its benefits and limitations are. [sent-73, score-0.793]

14 Does the system represent a significant and important advance in implemented and tested human language technology? [sent-78, score-0.519]

15 4 = Some important advances over previous systems, and likely to impact development work of other research groups. [sent-80, score-0.42]

16 IMPACT OF ACCOMPANYING SOFTWARE (1-5) If software was submitted or released along with the paper, what is the expected impact of the software package? [sent-86, score-0.658]

17 5 = Enabling: The newly released software should affect other people's choice of research or development projects to undertake. [sent-90, score-0.464]

18 2 = Documentary: The new software is useful to study or replicate the reported research, although for other purposes it may have limited interest or limited usability. [sent-93, score-0.765]

19 2 = Documentary: The new datasets are useful to study or replicate the reported research, although for other purposes they may have limited interest or limited usability. [sent-101, score-0.676]

20 5 = This paper changed my thinking on this topic and I'd fight to get it accepted; 4 = I learned a lot from this paper and would like to see it accepted. [sent-111, score-0.512]
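The bracketed scores above are per-sentence tf-idf weights: a sentence ranks highly when its terms are frequent within it but rare across the rest of the post. A minimal stdlib sketch of that idea (the function name and the mean-weight scoring rule are illustrative assumptions, not the actual pipeline behind this page):

```python
import math
from collections import Counter

def tfidf_sentence_scores(sentences):
    """Rank sentences by the mean tf-idf weight of their terms.

    `sentences` is a list of token lists; each sentence is treated
    as its own "document" for the idf computation. Returns
    (sentence_index, score) pairs, best first.
    """
    n = len(sentences)
    # document frequency: how many sentences contain each term
    df = Counter()
    for sent in sentences:
        df.update(set(sent))
    scored = []
    for i, sent in enumerate(sentences):
        tf = Counter(sent)
        total = sum(
            (count / len(sent)) * math.log(n / df[term])
            for term, count in tf.items()
        )
        scored.append((i, total / max(len(tf), 1)))
    # best-scoring sentences first, as in the list above
    return sorted(scored, key=lambda pair: -pair[1])
```

A sentence made entirely of terms shared by every other sentence gets idf 0 for each term and therefore ranks last.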


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('application', 0.289), ('tool', 0.252), ('limitations', 0.223), ('paper', 0.208), ('benefits', 0.194), ('impact', 0.188), ('software', 0.182), ('replicate', 0.138), ('work', 0.127), ('presented', 0.127), ('comparison', 0.122), ('claims', 0.112), ('tested', 0.111), ('acl', 0.111), ('although', 0.111), ('recommendation', 0.111), ('implemented', 0.111), ('advance', 0.111), ('released', 0.106), ('evaluation', 0.105), ('research', 0.105), ('significant', 0.101), ('accepted', 0.096), ('fight', 0.096), ('missed', 0.096), ('poster', 0.096), ('solid', 0.096), ('datasets', 0.093), ('limited', 0.09), ('ideas', 0.087), ('system', 0.085), ('evaluated', 0.084), ('results', 0.083), ('useful', 0.082), ('new', 0.072), ('affect', 0.071), ('aspects', 0.071), ('dataset', 0.07), ('fully', 0.067), ('tools', 0.067), ('researchers', 0.064), ('related', 0.064), ('accompanying', 0.064), ('attack', 0.064), ('awareness', 0.064), ('bibliography', 0.064), ('developers', 0.064), ('documentary', 0.064), ('enabling', 0.064), ('novelty', 0.064)]
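Given sparse vectors like the (word, weight) list above, the simValue column below is plausibly cosine similarity between two such vectors. A sketch under that assumption (the dict representation and function name are illustrative):

```python
import math

def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse tf-idf vectors given
    as {word: weight} dicts. A blog compared with itself scores
    1.0 up to floating-point rounding, which is presumably why
    the same-blog row reads 1.0000004 rather than exactly 1.
    """
    dot = sum(w * vec_b.get(t, 0.0) for t, w in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```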

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0000004 203 brendan oconnor ai-2014-02-19-What the ACL-2014 review scores mean

2 0.15394729 200 brendan oconnor ai-2013-09-13-Response on our movie personas paper

Introduction: Update (2013-09-17): See David Bamman ‘s great guest post on Language Log on our latent personas paper, and the big picture of interdisciplinary collaboration. I’ve been informed that an interesting critique of my, David Bamman’s and Noah Smith’s ACL paper on movie personas has appeared on the Language Log, a guest post by Hannah Alpert-Abrams and Dan Garrette . I posted the following as a comment on LL. Thanks everyone for the interesting comments. Scholarship is an ongoing conversation, and we hope our work might contribute to it. Responding to the concerns about our paper , We did not try to make a contribution to contemporary literary theory. Rather, we focus on developing a computational linguistic research method of analyzing characters in stories. We hope there is a place for both the development of new research methods, as well as actual new substantive findings. If you think about the tremendous possibilities for computer science and humanities collabor

3 0.10546765 15 brendan oconnor ai-2005-07-04-freakonomics blog

Introduction: Here it is! Still need to read the book. I’m a little bothered by people proclaiming it to be the first application of economic principles to social questions — hasn’t social economics been around for decades? — but the spirit and approach is right.

4 0.1038409 198 brendan oconnor ai-2013-08-20-Some analysis of tweet shares and “predicting” election outcomes

Introduction: Everyone recently seems to be talking about this newish paper by Digrazia, McKelvey, Bollen, and Rojas  ( pdf here ) that examines the correlation of Congressional candidate name mentions on Twitter against whether the candidate won the race.  One of the coauthors also wrote a Washington Post Op-Ed  about it.  I read the paper and I think it’s reasonable, but their op-ed overstates their results.  It claims: “In the 2010 data, our Twitter data predicted the winner in 404 out of 435 competitive races” But this analysis is nowhere in their paper.  Fabio Rojas has now posted errata/rebuttals  about the op-ed and described this analysis they did here.  There are several major issues off the bat: They didn’t ever predict 404/435 races; they only analyzed 406 races they call “competitive,” getting 92.5% (in-sample) accuracy, then extrapolated to all races to get the 435 number. They’re reporting about  in-sample predictions, which is really misleading to a non-scientific audi

5 0.094766289 150 brendan oconnor ai-2009-08-08-Haghighi and Klein (2009): Simple Coreference Resolution with Rich Syntactic and Semantic Features

Introduction: I haven’t done a paper review on this blog for a while, so here we go. Coreference resolution is an interesting NLP problem.  ( Examples. )  It involves honest-to-goodness syntactic, semantic, and discourse phenomena, but still seems like a real cognitive task that humans have to solve when reading text [1].  I haven’t read the whole literature, but I’ve always been puzzled by the crop of papers on it I’ve seen in the last year or two.  There’s a big focus on fancy graph/probabilistic/constrained optimization algorithms, but often these papers gloss over the linguistic features — the core information they actually make their decisions with [2].  I never understood why the latter isn’t the most important issue.  Therefore, it was a joy to read Aria Haghighi and Dan Klein, EMNLP-2009.   “Simple Coreference Resolution with Rich Syntactic and Semantic Features.” They describe a simple, essentially non-statistical system that outperforms previous unsupervised systems, and compa

6 0.090222627 174 brendan oconnor ai-2011-09-19-End-to-end NLP packages

7 0.087856695 138 brendan oconnor ai-2009-04-17-1 billion web page dataset from CMU

8 0.087371089 131 brendan oconnor ai-2008-12-27-Facebook sentiment mining predicts presidential polls

9 0.087024428 159 brendan oconnor ai-2010-04-14-quick note: cer et al 2010

10 0.081629701 116 brendan oconnor ai-2008-10-08-MyDebates.org, online polling, and potentially the coolest question corpus ever

11 0.080769032 73 brendan oconnor ai-2007-08-05-Are ideas interesting, or are they true?

12 0.080582231 1 brendan oconnor ai-2004-11-20-gintis: theoretical unity in the social sciences

13 0.080570117 129 brendan oconnor ai-2008-12-03-Statistics vs. Machine Learning, fight!

14 0.077482857 5 brendan oconnor ai-2005-06-25-1st International Conference on Computational Models of Argument (COMMA06)

15 0.076841421 132 brendan oconnor ai-2009-01-07-Love it and hate it, R has come of age

16 0.076569967 179 brendan oconnor ai-2012-02-02-Histograms — matplotlib vs. R

17 0.074754469 196 brendan oconnor ai-2013-05-08-Movie summary corpus and learning character personas

18 0.074200265 135 brendan oconnor ai-2009-02-23-Comparison of data analysis packages: R, Matlab, SciPy, Excel, SAS, SPSS, Stata

19 0.074075662 153 brendan oconnor ai-2009-09-08-Patches to Rainbow, the old text classifier that won’t go away

20 0.071290575 63 brendan oconnor ai-2007-06-10-Freak-Freakonomics (Ariel Rubinstein is the shit!)


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, -0.294), (1, -0.04), (2, -0.016), (3, -0.017), (4, 0.06), (5, -0.122), (6, -0.015), (7, -0.003), (8, -0.061), (9, -0.063), (10, 0.127), (11, 0.007), (12, 0.074), (13, 0.068), (14, 0.071), (15, -0.037), (16, 0.113), (17, 0.026), (18, 0.002), (19, 0.008), (20, 0.005), (21, -0.081), (22, 0.105), (23, -0.006), (24, -0.001), (25, -0.006), (26, -0.073), (27, -0.068), (28, 0.023), (29, -0.091), (30, -0.069), (31, -0.075), (32, -0.012), (33, 0.05), (34, 0.042), (35, -0.057), (36, 0.075), (37, 0.105), (38, -0.063), (39, 0.055), (40, -0.082), (41, -0.114), (42, -0.042), (43, 0.073), (44, -0.034), (45, -0.082), (46, 0.053), (47, -0.11), (48, -0.034), (49, -0.029)]
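The 50 signed numbers above are this post's coordinates in a 50-topic LSI space: LSI factors the tf-idf matrix with a truncated SVD, so topic weights can be negative, unlike LDA's probabilities. A numpy-based sketch of the projection step (the rank and implementation details are assumptions; the pipeline's actual code is unknown):

```python
import numpy as np

def lsi_project(tfidf_matrix, k=50):
    """Project a (documents x terms) tf-idf matrix into a k-topic
    LSI space with a truncated SVD. Returns per-document topic
    coordinates; these may be negative, since SVD components are
    not sign-constrained.
    """
    U, s, Vt = np.linalg.svd(tfidf_matrix, full_matrices=False)
    k = min(k, len(s))
    # scale left singular vectors by singular values -> doc coords
    return U[:, :k] * s[:k]
```

Similarity is then cosine again, computed over these dense topic vectors instead of raw word weights.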

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.98778462 203 brendan oconnor ai-2014-02-19-What the ACL-2014 review scores mean

2 0.71336627 200 brendan oconnor ai-2013-09-13-Response on our movie personas paper

Introduction: Update (2013-09-17): See David Bamman ‘s great guest post on Language Log on our latent personas paper, and the big picture of interdisciplinary collaboration. I’ve been informed that an interesting critique of my, David Bamman’s and Noah Smith’s ACL paper on movie personas has appeared on the Language Log, a guest post by Hannah Alpert-Abrams and Dan Garrette . I posted the following as a comment on LL. Thanks everyone for the interesting comments. Scholarship is an ongoing conversation, and we hope our work might contribute to it. Responding to the concerns about our paper , We did not try to make a contribution to contemporary literary theory. Rather, we focus on developing a computational linguistic research method of analyzing characters in stories. We hope there is a place for both the development of new research methods, as well as actual new substantive findings. If you think about the tremendous possibilities for computer science and humanities collabor

3 0.53862751 139 brendan oconnor ai-2009-04-22-Performance comparison: key-value stores for language model counts

Introduction: I’m doing word and bigram counts on a corpus of tweets. I want to store and rapidly retrieve them later for language model purposes. So there’s a big table of counts that get incremented many times. The easiest way to get something running is to use an open-source key/value store; but which? There’s recently been some development in this area so I thought it would be good to revisit and evaluate some options. Here are timings for a single counting process: iterate over 45,000 short text messages, tokenize them, then increment counters for their unigrams and bigrams. (The speed of the data store is only one component of performance.) There are about 17 increments per tweet: 400k unique terms and 750k total count. This is substantially smaller than what I need, but it’s small enough to easily test. I used several very different architectures and packages, explained below. architecture name speed in-memory, within-process python dictionary 2700 tweets/sec

4 0.52249002 184 brendan oconnor ai-2012-07-04-The $60,000 cat: deep belief networks make less sense for language than vision

Introduction: There was an interesting ICML paper this year about very large-scale training of deep belief networks (a.k.a. neural networks) for unsupervised concept extraction from images. They ( Quoc V. Le and colleagues at Google/Stanford) have a cute example of learning very high-level features that are evoked by images of cats (from YouTube still-image training data); one is shown below. For those of us who work on machine learning and text, the question always comes up, why not DBN’s for language? Many shallow latent-space text models have been quite successful (LSI, LDA, HMM, LPCFG…); there is hope that some sort of “deeper” concepts could be learned. I think this is one of the most interesting areas for unsupervised language modeling right now. But note it’s a bad idea to directly analogize results from image analysis to language analysis. The problems have radically different levels of conceptual abstraction baked-in. Consider the problem of detecting the concept of a cat; i.e.

5 0.50324708 84 brendan oconnor ai-2007-11-26-How did Freud become a respected humanist?!

Introduction: Freud Is Widely Taught at Universities, Except in the Psychology Department : PSYCHOANALYSIS and its ideas about the unconscious mind have spread to every nook and cranny of the culture from Salinger to “South Park,” from Fellini to foreign policy. Yet if you want to learn about psychoanalysis at the nation’s top universities, one of the last places to look may be the psychology department. A new report by the American Psychoanalytic Association has found that while psychoanalysis — or what purports to be psychoanalysis — is alive and well in literature, film, history and just about every other subject in the humanities, psychology departments and textbooks treat it as “desiccated and dead,” a historical artifact instead of “an ongoing movement and a living, evolving process.” I’ve been wondering about this for a while, ever since I heard someone describe Freud as “one of the greatest humanists who ever lived.” I’m pretty sure he didn’t think of himself that way. If you’re a

6 0.48638478 106 brendan oconnor ai-2008-06-17-Pairwise comparisons for relevance evaluation

7 0.48163947 196 brendan oconnor ai-2013-05-08-Movie summary corpus and learning character personas

8 0.48058867 150 brendan oconnor ai-2009-08-08-Haghighi and Klein (2009): Simple Coreference Resolution with Rich Syntactic and Semantic Features

9 0.47100452 176 brendan oconnor ai-2011-10-05-Be careful with dictionary-based text analysis

10 0.46968114 90 brendan oconnor ai-2008-01-20-Moral psychology on Amazon Mechanical Turk

11 0.46382999 159 brendan oconnor ai-2010-04-14-quick note: cer et al 2010

12 0.45857856 138 brendan oconnor ai-2009-04-17-1 billion web page dataset from CMU

13 0.44263405 198 brendan oconnor ai-2013-08-20-Some analysis of tweet shares and “predicting” election outcomes

14 0.4399167 174 brendan oconnor ai-2011-09-19-End-to-end NLP packages

15 0.43008119 88 brendan oconnor ai-2008-01-05-Indicators of a crackpot paper

16 0.41700229 154 brendan oconnor ai-2009-09-10-Don’t MAWK AWK – the fastest and most elegant big data munging language!

17 0.40823907 135 brendan oconnor ai-2009-02-23-Comparison of data analysis packages: R, Matlab, SciPy, Excel, SAS, SPSS, Stata

18 0.40418458 15 brendan oconnor ai-2005-07-04-freakonomics blog

19 0.40068349 186 brendan oconnor ai-2012-08-21-Berkeley SDA and the General Social Survey

20 0.38170943 1 brendan oconnor ai-2004-11-20-gintis: theoretical unity in the social sciences


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(12, 0.014), (22, 0.015), (44, 0.079), (48, 0.016), (55, 0.025), (57, 0.025), (59, 0.021), (70, 0.061), (74, 0.616), (80, 0.015), (97, 0.01)]
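Unlike the signed LSI coordinates, LDA topic weights are probabilities: sparse and non-negative (this post is dominated by topic 74 at 0.616). One natural way to compare two such distributions is Hellinger distance, which is designed for probability vectors; whether this page's pipeline used it is an assumption, so the sketch below is illustrative only:

```python
import math

def hellinger_similarity(theta_a, theta_b):
    """Similarity (1 - Hellinger distance) between two sparse LDA
    topic distributions given as {topicId: weight} dicts. Weights
    are non-negative probabilities, so the result lies in [0, 1]:
    1.0 for identical distributions, 0.0 for disjoint ones.
    """
    topics = set(theta_a) | set(theta_b)
    # squared Hellinger distance over the union of topics
    h2 = 0.5 * sum(
        (math.sqrt(theta_a.get(t, 0.0)) - math.sqrt(theta_b.get(t, 0.0))) ** 2
        for t in topics
    )
    return 1.0 - math.sqrt(h2)
```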

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.9964062 203 brendan oconnor ai-2014-02-19-What the ACL-2014 review scores mean

2 0.99307477 26 brendan oconnor ai-2005-09-02-cognitive modelling is rational choice++

Introduction: Rational choice has been a huge imperialistic success, growing in popularity and being applied to more and more fields. Why is this? It’s not because the rational choice model of decision-making is particularly realistic. Rather, it’s because rational choice is a completely specified theory of human behavior , and therefore is great at generating hypotheses. Given any situation involving people, rational choice can be used to generate a hypothesis about what to expect. That is, you just ask, “What would a person do to maximize their own benefit?” Similar things have been said about evolutionary psychology: you can always predict behavior by asking “what would hunter-gatherers do?” Now, certainly both rational choice and evolutionary psychology don’t always generate correct hypotheses, but they’re incredibly useful because they at least give you a starting point. Witness the theory of bounded rationality: just like rational choice, except amended to consider computational l

3 0.99187183 19 brendan oconnor ai-2005-07-09-the psychology of design as explanation

Introduction: Since I posted the link to his blog, Baron just wrote about Cardinal Schönborn’s anti-evolution Op-Ed piece . I agree absolutely that people should learn about the psychology of judgment and probability for these sorts of questions, where it’s really hard to understand that random processes can generate things that seem not so random. I’m still thinking about how the psychology of judgment plays in to the analysis below . I have a feeling that people’s intuitions are usually too hospitable for explanations based on intention. E.g.: People are poor, therefore someone is trying to make them poor. Organizations (corportations, governments) do things, therefore someone (say, at the top) ordered them to do these things. Natural disasters happen, therefore someone is wishing them upon us. Etc., etc. I’m still not sure how a bayesian dissection of whether “looks intentful” implies “is intentful” shows us whether such an “intent-seeking” bias (hey, I have to call it something) is

4 0.9918049 105 brendan oconnor ai-2008-06-05-Clinton-Obama support visualization

Introduction: This interactive histogram is brilliant. The NYT data visualization folks never fail to impress. margins.swf (application/x-shockwave-flash Object)

5 0.99140865 77 brendan oconnor ai-2007-09-15-Dollar auction

Introduction: I got nervous and panicky just reading about this game. I wonder if I could con some people into playing it. Economics professors have a standard game they use to demonstrate how apparently rational decisions can create a disastrous result. They call it a “dollar auction.” The rules are simple. The professor offers a dollar for sale to the highest bidder, with only one wrinkle: the second-highest bidder has to pay up on their losing bid as well. Several students almost always get sucked in. The first bids a penny, looking to make 99 cents. The second bids 2 cents, the third 3 cents, and so on, each feeling they have a chance at something good on the cheap. The early stages are fun, and the bidders wonder what possessed the professor to be willing to lose some money. The problem surfaces when the bidders get up close to a dollar. After 99 cents the last vestige of profitability disappears, but the bidding continues between the two highest players. They now realize that they stand

6 0.99079448 63 brendan oconnor ai-2007-06-10-Freak-Freakonomics (Ariel Rubinstein is the shit!)

7 0.98371589 152 brendan oconnor ai-2009-09-08-Another R flashmob today

8 0.9828999 123 brendan oconnor ai-2008-11-12-Disease tracking with web queries and social messaging (Google, Twitter, Facebook…)

9 0.80803519 86 brendan oconnor ai-2007-12-20-Data-driven charity

10 0.80633926 138 brendan oconnor ai-2009-04-17-1 billion web page dataset from CMU

11 0.76788652 53 brendan oconnor ai-2007-03-15-Feminists, anarchists, computational complexity, bounded rationality, nethack, and other things to do

12 0.74337679 6 brendan oconnor ai-2005-06-25-idea: Morals are heuristics for socially optimal behavior

13 0.74271941 130 brendan oconnor ai-2008-12-18-Information cost and genocide

14 0.73398489 80 brendan oconnor ai-2007-10-31-neo institutional economic fun!

15 0.7292521 188 brendan oconnor ai-2012-10-02-Powerset’s natural language search system

16 0.69305682 125 brendan oconnor ai-2008-11-21-Netflix Prize

17 0.69193166 84 brendan oconnor ai-2007-11-26-How did Freud become a respected humanist?!

18 0.69032311 200 brendan oconnor ai-2013-09-13-Response on our movie personas paper

19 0.68949246 129 brendan oconnor ai-2008-12-03-Statistics vs. Machine Learning, fight!

20 0.68353045 179 brendan oconnor ai-2012-02-02-Histograms — matplotlib vs. R