andrew_gelman_stats-2012-1098 knowledge-graph by maker-knowledge-mining

1098 andrew gelman stats-2012-01-04-Bayesian Page Rank?


meta info for this blog

Source: html

Introduction: Loren Maxwell writes: I am trying to do some studies on the PageRank algorithm by applying a Bayesian technique. If you are not familiar with PageRank, it is the basis for how Google ranks their pages. It basically treats the internet as a large social network with each link conferring some value onto the page it links to. For example, if I had a webpage that had only one link to it, say from my friend’s webpage, then its PageRank would be dependent on my friend’s PageRank, presumably quite low. However, if the one link to my page was from the Google search page, then my PageRank would be quite high since there are undoubtedly millions of pages linking to Google and few pages that Google links to. The end result of the algorithm, however, is that all the PageRank values of the nodes in the network sum to one and the PageRank of a specific node is the probability that a “random surfer” will end up on that node. For example, in the attached spreadsheet, Column D shows e
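The mechanics described here are the standard random-surfer formulation. A minimal sketch of that power iteration in Python (the node names, the 0.15 jump probability, and the 100-iteration cap are illustrative assumptions, not values taken from the spreadsheet) might look like this:

# A minimal sketch of the power iteration described above, not the spreadsheet itself.
# Assumptions: every node links to at least one other node (as the email requires),
# the jump probability plays the role of the "dampening factor," and 100 iterations
# are enough for the ranks to converge.
def pagerank(links, jump_prob=0.15, iters=100):
    """links maps each node to the list of nodes it links to; returns ranks summing to 1."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}              # initial values set to 1/N
    for _ in range(iters):                          # analogous to copying Column H to E 100 times
        new = {v: jump_prob / n for v in nodes}     # share contributed by random jumps
        for v in nodes:
            share = (1.0 - jump_prob) * rank[v] / len(links[v])
            for w in links[v]:                      # each link confers value on its target
                new[w] += share
        rank = new
    return rank

# Toy example: node A receives the most links, so it ends up with the highest rank.
print(pagerank({"A": ["B"], "B": ["A", "C"], "C": ["A"]}))

Because each node’s rank is redistributed in full at every step, the values stay normalized, matching the property that all the PageRank values in the network sum to one.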


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Loren Maxwell writes: I am trying to do some studies on the PageRank algorithm by applying a Bayesian technique. [sent-1, score-0.112]

2 If you are not familiar with PageRank, it is the basis for how Google ranks their pages. [sent-2, score-0.042]

3 It basically treats the internet as a large social network with each link conferring some value onto the page it links to. [sent-3, score-0.502]

4 For example, if I had a webpage that had only one link to it, say from my friend’s webpage, then its PageRank would be dependent on my friend’s PageRank, presumably quite low. [sent-4, score-0.135]

5 However, if the one link to my page was from the Google search page, then my PageRank would be quite high since there are undoubtedly millions of pages linking to Google and few pages that Google links to. [sent-5, score-0.462]

6 The end result of the algorithm, however, is that all the PageRank values of the nodes in the network sum to one and the PageRank of a specific node is the probability that a “random surfer” will end up on that node. [sent-6, score-0.926]

7 For example, in the attached spreadsheet, Column D shows each node while Column E shows the probability that a random surfer will land on the node. [sent-7, score-0.941]

8 Columns F through H are used to calculate the PageRank, Columns K and L are the links (with Column K linking to Column L), while Column M is also used to calculate the PageRank. [sent-8, score-0.325]

9 There is a macro in the workbook that is activated by pressing “Control-Shift-P” that will copy Column H to Column E 100 times, which is usually more than enough iterations for the PageRank to converge. [sent-9, score-0.217]

10 You can change the links around or add nodes if you like to play around with the spreadsheet. [sent-10, score-0.36]

11 The only requirements are that each node links to at least one other node (not a true PageRank requirement, but it keeps the math easy) and that the initial values in Column E sum to 1 (usually simply set to 1/N). [sent-11, score-0.931]

12 The dampening factor (Cell B3) represents the chance that a random surfer will jump to another page at random rather than follow a link from the current node. [sent-12, score-0.873]

13 In this specific example, a random surfer has a 38.2% chance to land on node A at any given time (Node A’s PageRank from Column E) and only a 3. [sent-14, score-0.423] [sent-15, score-0.524]

15 It would be helpful to also see what your thoughts would be on dropping and adding nodes in between the periods as well. [sent-19, score-0.277]

16 My reply: I’d start by estimating the probabilities using a simple (non-network) model, getting posterior simulations from this Bayesian inference, then running your Page Rank algorithm separately for each simulation. [sent-20, score-0.253]

17 You can then use the output from these simulations to summarize your uncertainty. [sent-21, score-0.061]

18 It’s not perfect—you’re not using the network model to get your uncertainties—but it seems like a reasonable start. [sent-22, score-0.121]
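A minimal sketch of that suggestion, under assumptions not in the post (observed click counts on each outgoing link and a conjugate Dirichlet posterior for each node’s link-follow probabilities): run a weighted PageRank separately for each posterior draw and summarize the spread of the resulting ranks.

# Sketch of the reply: estimate transition probabilities with a simple non-network
# model, take posterior simulations, run PageRank once per simulation, and use the
# draws to summarize uncertainty.  Counts, priors, and sizes here are hypothetical.
import numpy as np

def weighted_pagerank(P, jump_prob=0.15, iters=100):
    """P is an n x n matrix of link-follow probabilities whose rows sum to 1."""
    n = P.shape[0]
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = jump_prob / n + (1.0 - jump_prob) * (P.T @ r)
    return r

counts = np.array([[0, 8, 2],      # hypothetical clicks from node A to nodes A, B, C
                   [5, 0, 5],      # from node B
                   [9, 1, 0]])     # from node C

rng = np.random.default_rng(1)
draws = np.array([
    weighted_pagerank(np.vstack([rng.dirichlet(row + 1) for row in counts]))
    for _ in range(1000)           # one PageRank run per posterior simulation
])

print(draws.mean(axis=0))                          # posterior mean rank per node
print(np.percentile(draws, [2.5, 97.5], axis=0))   # 95% intervals summarizing uncertainty

As the reply notes, this propagates uncertainty from the simple model through the ranking without putting a network model on the uncertainties themselves.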


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('pagerank', 0.637), ('node', 0.383), ('column', 0.24), ('nodes', 0.239), ('surfer', 0.233), ('season', 0.15), ('links', 0.121), ('network', 0.121), ('dampening', 0.117), ('algorithm', 0.112), ('random', 0.101), ('page', 0.099), ('google', 0.096), ('specific', 0.089), ('land', 0.082), ('probabilities', 0.08), ('columns', 0.075), ('calculate', 0.068), ('link', 0.068), ('linking', 0.068), ('webpage', 0.067), ('simulations', 0.061), ('chance', 0.059), ('represents', 0.056), ('bayesian', 0.055), ('friend', 0.055), ('however', 0.054), ('pages', 0.053), ('loren', 0.053), ('workbook', 0.053), ('probability', 0.05), ('scheduling', 0.05), ('basically', 0.049), ('landing', 0.048), ('unsure', 0.048), ('associated', 0.047), ('pressing', 0.046), ('shows', 0.046), ('values', 0.044), ('envision', 0.044), ('treats', 0.044), ('certain', 0.043), ('list', 0.042), ('ranks', 0.042), ('macro', 0.04), ('uncertainties', 0.04), ('usually', 0.04), ('current', 0.039), ('iterations', 0.038), ('dropping', 0.038)]
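The simValue scores in the lists below are presumably cosine similarities between tfidf vectors like the one above. A small illustration of that computation (made-up documents, scikit-learn for the vectorization; the actual corpus and pipeline behind these numbers are not shown) could be:

# Illustration only: how tfidf vectors typically yield similarity scores of this kind.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

posts = [
    "PageRank treats the web as a network and ranks pages by random surfer probability",
    "Text link ads and page rank spam email",
    "Multilevel modeling research position at a statistics centre",
]
X = TfidfVectorizer().fit_transform(posts)   # one tfidf vector per post
print(cosine_similarity(X)[0])               # similarity of post 0 to each post (1.0 with itself)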

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.99999988 1098 andrew gelman stats-2012-01-04-Bayesian Page Rank?


2 0.10165575 1237 andrew gelman stats-2012-03-30-Statisticians: When We Teach, We Don’t Practice What We Preach

Introduction: My new Chance ethics column (cowritten with Eric Loken). Click through and take a look. It’s a short article and I really like it. And here’s more Chance.

3 0.098768473 1117 andrew gelman stats-2012-01-13-What are the important issues in ethics and statistics? I’m looking for your input!

Introduction: I’ve recently started a regular column on ethics, appearing every three months in Chance magazine . My first column, “Open Data and Open Methods,” is here , and my second column, “Statisticians: When we teach, we don’t practice what we preach” (coauthored with Eric Loken) will be appearing in the next issue. Statistical ethics is a wide-open topic, and I’d be very interested in everyone’s thoughts, questions, and stories. I’d like to get beyond generic questions such as, Is it right to do a randomized trial when you think the treatment is probably better than the control?, and I’d also like to avoid the really easy questions such as, Is it ethical to copy Wikipedia entries and then sell the resulting publication for $2800 a year? [Note to people who are sick of hearing about this particular story: I'll consider stopping my blogging on it, the moment that the people involved consider apologizing for their behavior.] Please insert your thoughts, questions, stories, links, et

4 0.098415293 1807 andrew gelman stats-2013-04-17-Data problems, coding errors…what can be done?

Introduction: This post is by Phil A recent post on this blog discusses a prominent case of an Excel error leading to substantially wrong results from a statistical analysis. Excel is notorious for this because it is easy to add a row or column of data (or intermediate results) but forget to update equations so that they correctly use the new data. That particular error is less common in a language like R because R programmers usually refer to data by variable name (or by applying functions to a named variable), so the same code works even if you add or remove data. Still, there is plenty of opportunity for errors no matter what language one uses. Andrew ran into problems fairly recently, and also blogged about another instance. I’ve never had to retract a paper, but that’s partly because I haven’t published a whole lot of papers. Certainly I have found plenty of substantial errors pretty late in some of my data analyses, and I obviously don’t have sufficient mechanisms in place to be sure

5 0.096969381 857 andrew gelman stats-2011-08-17-Bayes pays

Introduction: George Leckie writes: The Centre for Multilevel Modelling at the University of Bristol is seeking to appoint an applied statistician to work on a new ESRC-funded project, Longitudinal Effects, Multilevel Modelling and Applications (LEMMA 3). LEMMA 3 is one of six Nodes of the National Centre for Research Methods (NCRM). The LEMMA 3 Node will focus on methods for the analysis of longitudinal data. The appointment, at Research Assistant or Research Associate level, will be for 2.5 years with likelihood of extension to the end of September 2014. For further details, including information on how to apply online, please go to http://www.bris.ac.uk/boris/jobs/feeds/ads?ID=100571 By “modelling,” I think he means “modeling.” And by “centre,” I think he means “center.” But I think you get the basic idea. It looks like a great place to do research.

6 0.091226861 434 andrew gelman stats-2010-11-28-When Small Numbers Lead to Big Errors

7 0.088771448 1871 andrew gelman stats-2013-05-27-Annals of spam

8 0.085584119 1760 andrew gelman stats-2013-03-12-Misunderstanding the p-value

9 0.078622624 625 andrew gelman stats-2011-03-23-My last post on albedo, I promise

10 0.077207975 861 andrew gelman stats-2011-08-19-Will Stan work well with 40×40 matrices?

11 0.072331838 2107 andrew gelman stats-2013-11-20-NYT (non)-retraction watch

12 0.071660258 911 andrew gelman stats-2011-09-15-More data tools worth using from Google

13 0.071047485 859 andrew gelman stats-2011-08-18-Misunderstanding analysis of covariance

14 0.070219368 1830 andrew gelman stats-2013-04-29-Giving credit where due

15 0.067885175 1228 andrew gelman stats-2012-03-25-Continuous variables in Bayesian networks

16 0.065871939 752 andrew gelman stats-2011-06-08-Traffic Prediction

17 0.064833269 966 andrew gelman stats-2011-10-20-A qualified but incomplete thanks to Gregg Easterbrook’s editor at Reuters

18 0.063054867 1080 andrew gelman stats-2011-12-24-Latest in blog advertising

19 0.059873823 1240 andrew gelman stats-2012-04-02-Blogads update

20 0.057852536 1374 andrew gelman stats-2012-06-11-Convergence Monitoring for Non-Identifiable and Non-Parametric Models


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.101), (1, 0.033), (2, -0.003), (3, 0.015), (4, 0.014), (5, 0.013), (6, 0.017), (7, -0.022), (8, 0.019), (9, -0.049), (10, -0.003), (11, -0.028), (12, 0.003), (13, -0.008), (14, -0.008), (15, 0.059), (16, 0.02), (17, 0.013), (18, -0.003), (19, 0.008), (20, 0.008), (21, 0.047), (22, 0.055), (23, -0.013), (24, -0.007), (25, 0.006), (26, 0.008), (27, 0.036), (28, 0.014), (29, -0.023), (30, -0.011), (31, 0.002), (32, 0.009), (33, 0.004), (34, 0.005), (35, -0.033), (36, 0.014), (37, -0.031), (38, 0.006), (39, 0.004), (40, 0.057), (41, 0.001), (42, 0.045), (43, 0.045), (44, -0.033), (45, -0.006), (46, -0.007), (47, 0.031), (48, -0.012), (49, -0.007)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.95931822 1098 andrew gelman stats-2012-01-04-Bayesian Page Rank?


2 0.73150605 1240 andrew gelman stats-2012-04-02-Blogads update

Introduction: A few months ago I reported on someone who wanted to insert text links into the blog. I asked her how much they would pay and got no answer. Yesterday, though, I received this reply: Hello Andrew, I am sorry for the delay in getting back to you. I’d like to make a proposal for your site. Please refer below. We would like to place a simple text link ad on page http://andrewgelman.com/2011/07/super_sam_fuld/ to link to *** with the key phrase ***. We will incorporate the key phrase into a sentence so it would read well. Rest assured it won’t sound obnoxious or advertorial. We will then process the final text link code as soon as you agree to our proposal. We can offer you $200 for this with the assumption that you will keep the link “live” on that page for 12 months or longer if you prefer. Please get back to us with a quick reply on your thoughts on this and include your Paypal ID for payment process. Hoping for a positive response from you. I wrote back: Hi,

3 0.66275525 1871 andrew gelman stats-2013-05-27-Annals of spam

Introduction: I received the following email, subject line “Want to Buy Text Link from andrewgelman.com”: Dear, I am Mary Taylor. I have started a link building campaign for my growing websites. For this, I need your cooperation. The campaign is quite diverse and large scale and if you take some time to understand it – it will benefit us. First I want to clarify that I do not want “blogroll” ”footer” or any other type of “site wide links”. Secondly I want links from inner pages of site – with good page rank of course. Third links should be within text so that Google may not mark them as spam – not for you and not for me. Hence this link building will cause almost no harm to your site or me. Because content links are fine with Google. Now I should come to the requirements. I will accept links from Page Rank 3 to as high as you have got. Also kindly note that I can buy 1 to 50 links from one site – so you should understand the scale of the project. If you have multiple sites with co

4 0.65230852 859 andrew gelman stats-2011-08-18-Misunderstanding analysis of covariance

Introduction: Jeremy Miles writes: Are you familiar with Miller and Chapman’s (2001) article : Misunderstanding Analysis of Covariance saying that ANCOVA (and therefore, I suppose regression) should not be used when groups differ on a covariate. It has caused a moderate splash in psychology circles. I wondered if you had any thoughts on it. I had not heard of the article so I followed the link . . . ugh! Already on the very first column of the very first page they confuse nonadditivity with nonlinearity. I could probably continue with, “and it gets worse,” but since nobody’s paying me to read this one, I’ll stop reading right there on the first page! I prefer when people point me to good papers to read. . . .

5 0.64771712 1080 andrew gelman stats-2011-12-24-Latest in blog advertising

Introduction: I received the following message from “Patricia Lopez” of “Premium Link Ads”: Hello, I am interested in placing a text link on your page: http://andrewgelman.com/2011/07/super_sam_fuld/. The link would point to a page on a website that is relevant to your page and may be useful to your site visitors. We would be happy to compensate you for your time if it is something we are able to work out. The best way to reach me is through a direct response to this email. This will help me get back to you about the right link request. Please let me know if you are interested, and if not thanks for your time. Thanks. Usually I just ignore these, but after our recent discussion I decided to reply. I wrote: How much do you pay? But no answer. I wonder what’s going on? I mean, why bother sending the email in the first place if you’re not going to follow up?

6 0.64052558 1237 andrew gelman stats-2012-03-30-Statisticians: When We Teach, We Don’t Practice What We Preach

7 0.60203201 939 andrew gelman stats-2011-10-03-DBQQ rounding for labeling charts and communicating tolerances

8 0.59253138 566 andrew gelman stats-2011-02-09-The boxer, the wrestler, and the coin flip, again

9 0.58614939 631 andrew gelman stats-2011-03-28-Explaining that plot.

10 0.58195859 1026 andrew gelman stats-2011-11-25-Bayes wikipedia update

11 0.58015937 2054 andrew gelman stats-2013-10-07-Bing is preferred to Google by people who aren’t like me

12 0.57825339 347 andrew gelman stats-2010-10-17-Getting arm and lme4 running on the Mac

13 0.57574987 69 andrew gelman stats-2010-06-04-A Wikipedia whitewash

14 0.57112122 246 andrew gelman stats-2010-08-31-Somewhat Bayesian multilevel modeling

15 0.56719321 473 andrew gelman stats-2010-12-17-Why a bonobo won’t play poker with you

16 0.56705517 1907 andrew gelman stats-2013-06-20-Amazing retro gnu graphics!

17 0.56281656 138 andrew gelman stats-2010-07-10-Creating a good wager based on probability estimates

18 0.55729491 562 andrew gelman stats-2011-02-06-Statistician cracks Toronto lottery

19 0.55471778 135 andrew gelman stats-2010-07-09-Rasmussen sez: “108% of Respondents Say . . .”

20 0.55056852 1830 andrew gelman stats-2013-04-29-Giving credit where due


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(9, 0.028), (15, 0.022), (16, 0.115), (21, 0.03), (24, 0.092), (27, 0.026), (45, 0.029), (69, 0.01), (77, 0.021), (86, 0.019), (88, 0.237), (95, 0.035), (99, 0.194)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.89331639 1098 andrew gelman stats-2012-01-04-Bayesian Page Rank?


2 0.88668638 1174 andrew gelman stats-2012-02-18-Not as ugly as you look

Introduction: Kaiser asks the interesting question: How do you measure what restaurants are “overrated”? You can’t just ask people, right? There’s some sort of social element here, that “overrated” implies that someone’s out there doing the rating.

3 0.84892923 290 andrew gelman stats-2010-09-22-Data Thief

Introduction: John Transue sends along a link to this software for extracting data from graphs. I haven’t tried it out but it could be useful to somebody out there?

4 0.8290273 1992 andrew gelman stats-2013-08-21-Workshop for Women in Machine Learning

Introduction: This might interest some of you: CALL FOR ABSTRACTS Workshop for Women in Machine Learning Co-located with NIPS 2013, Lake Tahoe, Nevada, USA December 5, 2013 http://www.wimlworkshop.org Deadline for abstract submissions: September 16, 2013 WORKSHOP DESCRIPTION The Workshop for Women in Machine Learning is a day-long event taking place on the first day of NIPS. The workshop aims to showcase the research of women in machine learning and to strengthen their community. The event brings together female faculty, graduate students, and research scientists for an opportunity to connect, exchange ideas, and learn from each other. Underrepresented minorities and undergraduates interested in pursuing machine learning research are encouraged to participate. While all presenters will be female, all genders are invited to attend. Scholarships will be provided to female students and postdoctoral attendees with accepted abstracts to partially offset travel costs. Workshop

5 0.78049719 1507 andrew gelman stats-2012-09-22-Grade inflation: why weren’t the instructors all giving all A’s already??

Introduction: My upstairs colleague Blattman writes : The trend is unsurprising. Schools have every incentive to move to the highest four or five piles [grades] possible. . . . Then grade inflation will stop because . . there will be nowhere to go. . . . So why resist the new equilibrium? I don’t have any argument for resisting, but I don’t think everything’s quite so simple as Chris is saying. First, you can easily get compression so that almost everyone gets the same grade (“A”). So, no four or five piles. Second, the incentives for grading high have been there for awhile. To me, the more interesting question is: how is it that grade inflation hasn’t gone faster? Why would it take so many decades to reach a natural and obvious equilibrium? Here’s what I wrote last year : As a teacher who, like many others, assigns grades in an unregulated environment (that is, we have no standardized tests and no rules on how we should grade), all the incentives go toward giving only A’s: Wh

6 0.7742312 136 andrew gelman stats-2010-07-09-Using ranks as numbers

7 0.77275109 1866 andrew gelman stats-2013-05-21-Recently in the sister blog

8 0.76967669 569 andrew gelman stats-2011-02-12-Get the Data

9 0.76657987 629 andrew gelman stats-2011-03-26-Is it plausible that 1% of people pick a career based on their first name?

10 0.76520717 2095 andrew gelman stats-2013-11-09-Typo in Ghitza and Gelman MRP paper

11 0.76139665 825 andrew gelman stats-2011-07-27-Grade inflation: why weren’t the instructors all giving all A’s already??

12 0.7538377 400 andrew gelman stats-2010-11-08-Poli sci plagiarism update, and a note about the benefits of not caring

13 0.75279033 1403 andrew gelman stats-2012-07-02-Moving beyond hopeless graphics

14 0.75240612 1930 andrew gelman stats-2013-07-09-Symposium Magazine

15 0.75203437 1633 andrew gelman stats-2012-12-21-Kahan on Pinker on politics

16 0.73887521 784 andrew gelman stats-2011-07-01-Weighting and prediction in sample surveys

17 0.7346825 603 andrew gelman stats-2011-03-07-Assumptions vs. conditions, part 2

18 0.73375946 1631 andrew gelman stats-2012-12-19-Steven Pinker is a psychologist who writes on politics. His theories are interesting but are framed too universally to be valid

19 0.71981156 1980 andrew gelman stats-2013-08-13-Test scores and grades predict job performance (but maybe not at Google)

20 0.71128583 1871 andrew gelman stats-2013-05-27-Annals of spam