527 andrew gelman stats-2011-01-20-Cars vs. trucks


meta infos for this blog

Source: html

Introduction: Anupam Agrawal writes: I am an Assistant Professor of Operations Management at the University of Illinois. . . . My main work is in supply chain area, and empirical in nature. . . . I am working with a firm that has two separate divisions – one making cars, and the other makes trucks. Four years back, the firm made an interesting organizational change. They created a separate group of ~25 engineers, in their car division (from within their quality and production engineers). This group was focused on improving supplier quality and reported to car plant head. The truck division did not (and still does not) have such an independent “supplier improvement group”. Other than this unit in car, the organizational arrangements in the two divisions mimic each other. There are many common suppliers to the car and truck division. Data on quality of components coming from suppliers has been collected (for the last four years). The organizational change happened in January 2007. My focus is


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 I am working with a firm that has two separate divisions – one making cars, and the other makes trucks. [sent-9, score-0.459]

2 Four years back, the firm made an interesting organizational change. [sent-10, score-0.522]

3 They created a separate group of ~25 engineers, in their car division (from within their quality and production engineers). [sent-11, score-0.956]

4 This group was focused on improving supplier quality and reported to car plant head. [sent-12, score-1.062]

5 The truck division did not (and still does not) have such an independent “supplier improvement group”. [sent-13, score-0.467]

6 Other than this unit in car, the organizational arrangements in the two divisions mimic each other. [sent-14, score-0.646]

7 There are many common suppliers to the car and truck division. [sent-15, score-1.095]

8 Data on quality of components coming from suppliers has been collected (for the last four years). [sent-16, score-0.669]

9 My focus is to see whether organizational change (and a different organizational structure) drives improvements. [sent-18, score-0.868]

10 My hypothesis is that this changed structure in car strengthened supplier trust (my interviews with suppliers point to this) which helped in improving quality for car (but not for truck, even for the same supplier). [sent-19, score-1.93]

11 For analyzing this, I was thinking of a difference-in-differences analysis between the quality data of these two divisions. [sent-20, score-0.328]

12 The organizational change in the car division is similar to a quasi-experiment. [sent-21, score-0.995]

13 So truck division is not a good “matched” control. [sent-23, score-0.467]

14 An omitted variable drives both adoption and performance on quality, resulting in omitted variable bias. [sent-24, score-0.578]

15 The common suppliers also create contamination in the dependent variable, quality. [sent-25, score-0.563]

16 So, given the data, and the quasi experimental setting, which kind of analysis is best suitable? [sent-26, score-0.146]

17 Also, is this suggestion of combining two kinds of analysis a good one? [sent-27, score-0.133]

18 I have read papers that suggest this (Blundell and Dias, 2000), but I am apprehensive since my setting does not quite match with that of the papers that suggest this analysis. [sent-28, score-0.437]

19 I do have a separate division (of the same firm), and there are common suppliers, so how can propensity score provide more information? [sent-29, score-0.489]

20 I agree with your colleagues that difference-in-difference analysis is not enough and will not necessarily do a good job of adjusting for imbalance or lack of complete overlap. [sent-33, score-0.248]
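
The difference-in-differences comparison mentioned in sentence 11 can be sketched as follows. This is only a minimal illustration, not the analysis used in the post: the function name did_estimate, the "pre"/"post" period labels, and all of the numbers are hypothetical, and the outcome is assumed to be a supplier defect rate where lower is better.

```python
from statistics import mean


def did_estimate(records):
    """Return the difference-in-differences estimate.

    records: iterable of (division, period, value) tuples, where division is
    "car" or "truck" and period is "pre" or "post" (hypothetical labels).
    """
    cells = {}
    for division, period, value in records:
        cells.setdefault((division, period), []).append(value)
    # Mean outcome in each of the four division-by-period cells.
    means = {key: mean(values) for key, values in cells.items()}
    car_change = means[("car", "post")] - means[("car", "pre")]
    truck_change = means[("truck", "post")] - means[("truck", "pre")]
    # The truck division's change stands in for what would have happened
    # to the car division without the supplier improvement group.
    return car_change - truck_change


# Made-up defect rates, for illustration only.
records = [
    ("car", "pre", 10.0), ("car", "pre", 12.0),
    ("car", "post", 6.0), ("car", "post", 8.0),
    ("truck", "pre", 11.0), ("truck", "pre", 13.0),
    ("truck", "post", 10.0), ("truck", "post", 12.0),
]
effect = did_estimate(records)
```

With these made-up numbers the car division's defect rate falls by 4 and the truck division's by 1, so the estimate is -3. As sentences 13 to 15 and 20 point out, this simple contrast does not adjust for imbalance between the divisions or for contamination through shared suppliers.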


similar blogs computed by tfidf model

tfidf for this blog:

wordName wordTfidf (topN-words)

[('suppliers', 0.41), ('car', 0.355), ('organizational', 0.342), ('supplier', 0.29), ('truck', 0.245), ('division', 0.222), ('quality', 0.195), ('firm', 0.18), ('divisions', 0.119), ('omitted', 0.119), ('drives', 0.108), ('propensity', 0.107), ('separate', 0.105), ('engineers', 0.1), ('matching', 0.091), ('variable', 0.088), ('improving', 0.086), ('common', 0.085), ('match', 0.082), ('group', 0.079), ('analysis', 0.078), ('score', 0.077), ('change', 0.076), ('strengthened', 0.072), ('apprehensive', 0.072), ('structure', 0.072), ('quasi', 0.068), ('contamination', 0.068), ('arrangements', 0.065), ('mimic', 0.065), ('four', 0.064), ('setting', 0.063), ('suggest', 0.061), ('colleagues', 0.06), ('launch', 0.06), ('imbalance', 0.06), ('plant', 0.057), ('adoption', 0.056), ('two', 0.055), ('operations', 0.052), ('matched', 0.051), ('vs', 0.051), ('suitable', 0.051), ('adjusting', 0.05), ('papers', 0.049), ('interviews', 0.048), ('assistant', 0.047), ('cars', 0.047), ('helped', 0.047), ('january', 0.047)]
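
One common way to turn weight vectors like the one above into a similarity value is cosine similarity. The page does not say which measure it uses, so the sketch below is an assumption. The vector this_post reuses a few weights from the list above, while other_post and its weights are invented for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# A few of the top tfidf weights listed above for this post.
this_post = {"suppliers": 0.41, "car": 0.355, "truck": 0.245, "quality": 0.195}
# Hypothetical weights for another post that also mentions cars and trucks.
other_post = {"car": 0.5, "truck": 0.2, "mileage": 0.4}

self_sim = cosine_similarity(this_post, this_post)
cross_sim = cosine_similarity(this_post, other_post)
```

Under this assumption a post compared with itself scores 1.0 (up to rounding), which matches the same-blog row in the tfidf list below, and posts that share only a few weighted terms score closer to 0.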

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 1.0 527 andrew gelman stats-2011-01-20-Cars vs. trucks

2 0.22811723 708 andrew gelman stats-2011-05-12-Improvement of 5 MPG: how many more auto deaths?

Introduction: This entry was posted by Phil Price. A colleague is looking at data on car (and SUV and light truck) collisions and casualties. He’s interested in causal relationships. For instance, suppose car manufacturers try to improve gas mileage without decreasing acceleration. The most likely way they will do that is to make cars lighter. But perhaps lighter cars are more dangerous; how many more people will die for each mpg increase in gas mileage? There are a few different data sources, all of them seriously deficient from the standpoint of answering this question. Deaths are very well reported, so if someone dies in an auto accident you can find out what kind of car they were in, what other kinds of cars (if any) were involved in the accident, whether the person was a driver or passenger, and so on. But it’s hard to normalize: OK, I know that N people who were passengers in a particular model of car died in car accidents last year, but I don’t know how many passenger-miles that

3 0.1479236 720 andrew gelman stats-2011-05-20-Baby name wizards

Introduction: The other day I noticed a car with the improbable name of Nissan Rogue, from Darien, Connecticut (at least that’s what the license plate frame said). And, after all, what could be more “rogue”-like than a suburban SUV? I can’t blame the driver of the car for this one; I’m just amused that the marketers and Nissan thought this was an appropriate name for the car.

4 0.13409074 1417 andrew gelman stats-2012-07-15-Some decision analysis problems are pretty easy, no?

Introduction: Cassie Murdoch reports : A 47-year-old woman in Uxbridge, Massachusetts, got behind the wheel of her car after having a bit too much to drink, but instead of wreaking havoc on the road, she ended up lodged in a sand trap at a local golf course. Why? Because her GPS made her do it—obviously! She said the GPS told her to turn left, and she did, right into a cornfield. That didn’t faze her, and she just kept on going until she ended up on the golf course and got stuck in the sand. There were people on the course at the time, but thankfully nobody was injured. Police found a cup full of alcohol in her car and arrested her for driving drunk. Here’s the punchline: This is the fourth time she’s been arrested for a DUI. Assuming this story is accurate, I guess they don’t have one of those “three strikes” laws in Massachusetts? Personally, I’m a lot more afraid of a dangerous driver than of some drug dealer. I’d think a simple cost-benefit calculation would recommend taking away

5 0.11117993 375 andrew gelman stats-2010-10-28-Matching for preprocessing data for causal inference

Introduction: Chris Blattman writes: Matching is not an identification strategy a solution to your endogeneity problem; it is a weighting scheme. Saying matching will reduce endogeneity bias is like saying that the best way to get thin is to weigh yourself in kilos. The statement makes no sense. It confuses technique with substance. . . . When you run a regression, you control for the X you can observe. When you match, you are simply matching based on those same X. . . . I see what Chris is getting at–matching, like regression, won’t help for the variables you’re not controlling for–but I disagree with his characterization of matching as a weighting scheme. I see matching as a way to restrict your analysis to comparable cases. The statistical motivation: robustness. If you had a good enough model, you wouldn’t need to match, you’d just fit the model to the data. But in common practice we often use simple regression models and so it can be helpful to do some matching first before regress

6 0.0918516 307 andrew gelman stats-2010-09-29-“Texting bans don’t reduce crashes; effects are slight crash increases”

7 0.09178327 2123 andrew gelman stats-2013-12-04-Tesla fires!

8 0.090416111 417 andrew gelman stats-2010-11-17-Clutering and variance components

9 0.083380118 940 andrew gelman stats-2011-10-03-It depends upon what the meaning of the word “firm” is.

10 0.082069919 251 andrew gelman stats-2010-09-02-Interactions of predictors in a causal model

11 0.077181876 1248 andrew gelman stats-2012-04-06-17 groups, 6 group-level predictors: What to do?

12 0.076693751 86 andrew gelman stats-2010-06-14-“Too much data”?

13 0.076624304 1912 andrew gelman stats-2013-06-24-Bayesian quality control?

14 0.076371796 213 andrew gelman stats-2010-08-17-Matching at two levels

15 0.075461812 1491 andrew gelman stats-2012-09-10-Update on Levitt paper on child car seats

16 0.07539171 796 andrew gelman stats-2011-07-10-Matching and regression: two great tastes etc etc

17 0.072631709 1184 andrew gelman stats-2012-02-25-Facebook Profiles as Predictors of Job Performance? Maybe…but not yet.

18 0.067495927 456 andrew gelman stats-2010-12-07-The red-state, blue-state war is happening in the upper half of the income distribution

19 0.067473285 736 andrew gelman stats-2011-05-29-Response to “Why Tables Are Really Much Better Than Graphs”

20 0.066381149 970 andrew gelman stats-2011-10-24-Bell Labs


similar blogs computed by lsi model

lsi for this blog:

topicId topicWeight

[(0, 0.115), (1, -0.002), (2, 0.012), (3, -0.032), (4, 0.027), (5, 0.02), (6, -0.009), (7, -0.013), (8, 0.013), (9, 0.045), (10, -0.006), (11, -0.0), (12, 0.014), (13, -0.028), (14, 0.014), (15, 0.013), (16, 0.025), (17, -0.013), (18, 0.025), (19, 0.009), (20, -0.008), (21, 0.043), (22, 0.01), (23, 0.0), (24, 0.003), (25, 0.002), (26, 0.002), (27, -0.01), (28, 0.022), (29, -0.002), (30, -0.006), (31, -0.004), (32, 0.018), (33, 0.019), (34, -0.025), (35, -0.006), (36, 0.018), (37, 0.045), (38, 0.011), (39, 0.054), (40, -0.021), (41, -0.046), (42, -0.019), (43, -0.016), (44, -0.003), (45, 0.013), (46, 0.025), (47, 0.002), (48, 0.038), (49, 0.04)]

similar blogs list:

simIndex simValue blogId blogTitle

same-blog 1 0.94798577 527 andrew gelman stats-2011-01-20-Cars vs. trucks

2 0.75031132 708 andrew gelman stats-2011-05-12-Improvement of 5 MPG: how many more auto deaths?

3 0.73976398 1017 andrew gelman stats-2011-11-18-Lack of complete overlap

Introduction: Evens Salies writes: I have a question regarding a randomizing constraint in my current funded electricity experiment. After elimination of missing data we have 110 voluntary households from a larger population (resource constraints do not allow us to have more households!). I randomly assign them to treated and non treated where the treatment variable is some ICT that allows the treated to track their electricity consumption in real time. The ICT is made of two devices, one that is plugged on the household’s modem and the other on the electric meter. A necessary condition for being treated is that the distance between the box and the meter be below some threshold (d), the value of which is 20 meters approximately. 50 ICTs can be installed. 60 households will be in the control group. But, I can only assign 6 households in the control group for whom d is less than 20. Therefore, I have only 6 households in the control group who have a counterfactual in the group of treated.

4 0.72628105 1330 andrew gelman stats-2012-05-19-Cross-validation to check missing-data imputation

Introduction: Aureliano Crameri writes: I have questions regarding one technique you and your colleagues described in your papers: the cross validation (Multiple Imputation with Diagnostics (mi) in R: Opening Windows into the Black Box, with reference to Gelman, King, and Liu, 1998). I think this is the technique I need for my purpose, but I am not sure I understand it right. I want to use the multiple imputation to estimate the outcome of psychotherapies based on longitudinal data. First I have to demonstrate that I am able to get unbiased estimates with the multiple imputation. The expected bias is the overestimation of the outcome of dropouts. I will test my imputation strategies by means of a series of simulations (delete values, impute, compare with the original). Due to the complexity of the statistical analyses I think I need at least 200 cases. Now I don’t have so many cases without any missings. My data have missing values in different variables. The proportion of missing values is

5 0.71871692 1703 andrew gelman stats-2013-02-02-Interaction-based feature selection and classification for high-dimensional biological data

Introduction: Ilya Esteban writes: In your blog your advice for performing regression in the presence of large numbers of correlated features, has been to use composite scores and hierarchical modeling. Unfortunately, many problems don’t provide an obvious and unambiguous way of grouping features together (e.g. gene expression data). Are there any techniques that you would recommend that automatically pool correlated features together based on the data, without requiring the researcher to manually define composite scores or feature hierarchies? I don’t know the answer to this but I imagine something is possible . . . any ideas? In the meantime I’m reminded of this recent article by Shaw-Hwa Lo, Haitian Wang, Tian Zheng, and Inchi Hu: Recent high-throughput biological studies successfully identified thousands of risk factors associated with common human diseases. Most of these studies used single-variable method and each variable is analyzed individually. The risk factors so identi

6 0.69029737 791 andrew gelman stats-2011-07-08-Censoring on one end, “outliers” on the other, what can we do with the middle?

7 0.6849708 86 andrew gelman stats-2010-06-14-“Too much data”?

8 0.67039883 1910 andrew gelman stats-2013-06-22-Struggles over the criticism of the “cannabis users and IQ change” paper

9 0.67031014 1121 andrew gelman stats-2012-01-15-R-squared for multilevel models

10 0.66807723 14 andrew gelman stats-2010-05-01-Imputing count data

11 0.66801858 287 andrew gelman stats-2010-09-20-Paul Rosenbaum on those annoying pre-treatment variables that are sort-of instruments and sort-of covariates

12 0.66703582 2249 andrew gelman stats-2014-03-15-Recently in the sister blog

13 0.66306472 770 andrew gelman stats-2011-06-15-Still more Mr. P in public health

14 0.66134548 32 andrew gelman stats-2010-05-14-Causal inference in economics

15 0.66035849 53 andrew gelman stats-2010-05-26-Tumors, on the left, or on the right?

16 0.6594367 465 andrew gelman stats-2010-12-13-$3M health care prediction challenge

17 0.65873206 553 andrew gelman stats-2011-02-03-is it possible to “overstratify” when assigning a treatment in a randomized control trial?

18 0.65300977 569 andrew gelman stats-2011-02-12-Get the Data

19 0.65270931 2190 andrew gelman stats-2014-01-29-Stupid R Tricks: Random Scope

20 0.65210342 245 andrew gelman stats-2010-08-31-Predicting marathon times


similar blogs computed by lda model

lda for this blog:

topicId topicWeight

[(2, 0.03), (9, 0.013), (15, 0.016), (16, 0.07), (21, 0.033), (24, 0.103), (31, 0.03), (36, 0.024), (38, 0.195), (47, 0.021), (56, 0.014), (79, 0.014), (85, 0.011), (86, 0.044), (99, 0.231)]

similar blogs list:

simIndex simValue blogId blogTitle

1 0.9525097 1874 andrew gelman stats-2013-05-28-Nostalgia

Introduction: Saw Argo the other day, was impressed by the way it was filmed in such a 70s style, sorta like that movie The Limey or an episode of the Rockford Files. I also felt nostalgia for that relatively nonviolent era. All those hostages and nobody was killed. It’s a good thing the Ayatollah didn’t have some fundamentalist Shiite equivalent of John Yoo telling him to waterboard everybody. At the time we were all so angry and upset about the hostage-taking, but from the perspective of our suicide-bomber era, that whole hostage episode seems so comfortingly mild.

2 0.93570155 393 andrew gelman stats-2010-11-04-Estimating the effect of A on B, and also the effect of B on A

Introduction: Lei Liu writes: I am working with clinicians in infectious disease and international health to study the (possible causal) relation between malnutrition and virus infection episodes (e.g., diarrhea) in babies in developing countries. Basically the clinicians are interested in two questions: does malnutrition cause more diarrhea episodes? does diarrhea lead to malnutrition? The malnutrition status is indicated by height and weight (adjusted, HAZ and WAZ measures) observed every 3 months from birth to 1 year. They also recorded the time of each diarrhea episode during the 1 year follow-up period. They have very solid datasets for analysis. As you can see, this is almost like a chicken and egg problem. I am a layman to causal inference. The method I use is just to do some simple regression. For example, to study the causal relation from malnutrition to diarrhea episodes, I use binary variable (diarrhea yes/no during months 0-3) as response, and use the HAZ at month 0 as covariate

same-blog 3 0.90902042 527 andrew gelman stats-2011-01-20-Cars vs. trucks

4 0.88487935 1073 andrew gelman stats-2011-12-20-Not quite getting the point

Introduction: I gave this talk the other day and afterwards, a white guy came up to me and said he thought it was no coincidence that the researcher who made the mistake was “Oriental.” He then went on for about 5 minutes explaining his theory. I couldn’t keep myself from laughing—I had to start coughing into a napkin to hide it.

5 0.88356137 1498 andrew gelman stats-2012-09-16-Choices in graphing parallel time series

Introduction: I saw this graph posted by Tyler Cowen: and my first thought was that the bar plot should be replaced by a line plot: Six lines, one for each income category, with each line being a time series of these changes. With a line plot, you can more easily see each time series (these are hard to see in the barplot because you have to follow each color and jump from decade to decade) and also compare the patterns for each category. The line plot pretty much dominates the bar plot. At least that was the theory. Now here’s what actually happened. I downloaded the data as Excel files, saved them as csv, then read them into R. In all, it took close to an hour to get the data set up in the format that was needed to make the graphs. At this point it was pretty easy to make the line plot. But the result was disappointing: The six lines are hard to untangle (sure, a better color scheme might help, but it wouldn’t really solve the problem) and the graph as a whole is much l

6 0.8706277 251 andrew gelman stats-2010-09-02-Interactions of predictors in a causal model

7 0.85607433 600 andrew gelman stats-2011-03-04-“Social Psychologists Detect Liberal Bias Within”

8 0.84904164 658 andrew gelman stats-2011-04-11-Statistics in high schools: Towards more accessible conceptions of statistical inference

9 0.83342004 717 andrew gelman stats-2011-05-17-Statistics plagiarism scandal

10 0.8125453 1032 andrew gelman stats-2011-11-28-Does Avastin work on breast cancer? Should Medicare be paying for it?

11 0.80204654 509 andrew gelman stats-2011-01-09-Chartjunk, but in a good cause!

12 0.80019796 1722 andrew gelman stats-2013-02-14-Statistics for firefighters: update

13 0.79931259 1339 andrew gelman stats-2012-05-23-Learning Differential Geometry for Hamiltonian Monte Carlo

14 0.79851675 48 andrew gelman stats-2010-05-23-The bane of many causes

15 0.79514849 2281 andrew gelman stats-2014-04-04-The Notorious N.H.S.T. presents: Mo P-values Mo Problems

16 0.79420233 703 andrew gelman stats-2011-05-10-Bringing Causal Models Into the Mainstream

17 0.79108936 18 andrew gelman stats-2010-05-06-$63,000 worth of abusive research . . . or just a really stupid waste of time?

18 0.78986913 1218 andrew gelman stats-2012-03-18-Check your missing-data imputations using cross-validation

19 0.78935325 2137 andrew gelman stats-2013-12-17-Replication backlash

20 0.78932202 1340 andrew gelman stats-2012-05-23-Question 13 of my final exam for Design and Analysis of Sample Surveys