Statistical Modeling, Causal Inference, and Social Science

A better way to visualize the spread of coronavirus in different countries?

Posted by Andrew on 10 April 2020, 9:01 am

Joel Elvery writes:

Long-time listener, first-time caller. I’m an economist at the Federal Reserve Bank of Cleveland.

I think I have stumbled on to a very effective way to visualize and compare the trajectories of COVID-19 epidemics. This short post describes the approach and what we learn from it, but the graph above is enough to give you the gist.

I’m trying to get the word out to other people who are graphing COVID-19 data in case they also find this approach useful.

Filed under Public Health, Statistical graphics.

Upholding the patriarchy, one blog post at a time

Posted by Andrew on 9 April 2020, 9:29 am

A white male writes:

Your recent post reminded me: partly because of your previous posts, I spent a fair amount of the last two years reading Updike, whom I’d never read before. It was time well spent. Thank you for mentioning him in your blog from time to time.

I find early Updike to be uniformly good, but later Updike novels are more uneven. I liked Rabbit is Rich and many of the later stories and criticism, but I tried to read Roger’s Version once and found it unreadable, even incompetently written.

Filed under Literature.

Big trouble coming with the 2020 Census

Posted by Andrew on 8 April 2020, 9:10 am

OK, first things first. For readers of this blog who live in the United States: Don’t forget to fill out your census. They’re doing it online, and you should’ve received a letter in the mail last month telling you how to do it.

And now the news. Dr. Z points us to this post by Diana Elliott and Robert Santos, “Unpredictable Residency during the COVID-19 Pandemic Spells Trouble for the 2020 Census Count.” Elliott and Santos write:

Just before lockdowns were implemented across the country, there was tremendous movement and migration of people relocating to different residences to shelter in place. This makes sense for the people involved but could be disastrous for the communities they fled and the final 2020 Census counts.

The 2020 Census, like most data collected by the US Census Bureau, is residence based. . . . Most residences across America have already received their 2020 Census invitation. Whether completed online, by paper, by phone, or in person, the first official question on the 2020 Census questionnaire is “How many people were living or staying in this house, apartment, or mobile home on April 1, 2020?” Households are expected to answer this based on the concept of “usual residence,” or the place where a person lives and sleeps most of the time.

Despite written guidance provided on the 2020 Census on how to answer this question, doing so may be wrought with complexities and nuance from the pandemic.

First, research reveals that respondents do not often read questionnaire instructions; they dive in and start answering. With many people scrambling to other counties, cities, and states to hunker down for the long haul with loved ones, this will lead to incorrect counts when people are counted at temporary addresses.

Second, for many, the concept of “usual residence” has little relevance in the uncertainty unfolding during the COVID-19 pandemic. What if your temporary address becomes your permanent address? What does “usual residence” mean during a global epidemic that could stretch for 18 months or more? And perhaps more importantly, what should it mean? . . .

The US Census Bureau must act. It will need more processing time to identify and remove duplicates in the returns—a phenomenon that occurs regardless of a pandemic—and will need to flag potential population spikes in certain communities. . . .

Unfortunately, communities face a zero-sum game for decennial population counts. Communities that gain population in 2020 because of the pandemic will reap the benefits of better funding and representation for the next decade. Communities with population loss will receive less than they deserve.

Every census count brings new challenges, some of which lead to miscounts that get brought to and battled over in court. Without a proactive approach to the 2020 Census that addresses these residency questions, the COVID-19 pandemic may be unintentionally inviting communities to wage contentious court battles over the accuracy of the count for years to come.

A few weeks earlier, Santos and Elliott had written a post, Is It Time to Postpone the 2020 Census?:

We know that COVID-19 testing in the US has proven inadequate, and community spread has now taken hold. The virus has spread to 45 of 50 states as of March 12, 2020, and it’s reported to be 10 times more lethal than influenza and much more contagious. . . . Although the decennial census is mandated by the Constitution, the extreme challenges raised by the pandemic may warrant an unprecedented delay to protect the census’s accuracy. These challenges include:

Difficulty finding and retaining enumerators . . .

Making hard-to-count populations even harder to count . . .

Lacking planning or protocols for conducting the census during a pandemic . . .

Should the census be postponed? Extended? Canceled? . . . Regardless of the option, it is hard to imagine that the 2020 Census could simply go on as scheduled. Some hard decisions face the US Census Bureau. The health of our democracy may be at stake.

We should’ve listened to them back on March 13th.

P.S. A commenter suggests they change the census form to clarify responses for people who are temporarily housed elsewhere because of coronavirus.

The problem is that the census form was already written, and I guess they don’t want to change the form in the middle of the Census. At this point they’d have to redo the whole thing, which I guess is what they should do, but that would be expensive. Also there are winners and losers from every change, and the winners from the current system might want to keep things as is. Finally, there are people outside and inside the government (but presumably not in the Census Bureau itself) who want to “drown the government in the bathtub” etc., and for them I guess it’s a plus if the census is a failure, as it will reduce legitimacy of future governmental actions. This is related to the War on Data that Palko and I wrote about a few years ago and which Palko reblogged recently.

Filed under Political Science.

Webinar on approximate Bayesian computation

Posted by Andrew on 7 April 2020, 11:26 pm

X points us to this online seminar series which is starting this Thursday! Some speakers and titles of talks are listed. I just wish I could click on the titles and see the abstracts and papers!

The seminar is at the University of Warwick in England, which is not so convenient—I seem to recall that to get there you have to take a train and then a bus, or something like that!—but it seems that they will be conducting the seminar remotely, which is both convenient and (nearly) carbon neutral.

Filed under Bayesian Statistics, Statistical computing.

“The Generalizability Crisis” in the human sciences

Posted by Andrew on 7 April 2020, 9:09 am

In an article called The Generalizability Crisis, Tal Yarkoni writes:

Most theories and hypotheses in psychology are verbal in nature, yet their evaluation overwhelmingly relies on inferential statistical procedures. The validity of the move from qualitative to quantitative analysis depends on the verbal and statistical expressions of a hypothesis being closely aligned—that is, that the two must refer to roughly the same set of hypothetical observations. Here I argue that most inferential statistical tests in psychology fail to meet this basic condition. I demonstrate how foundational assumptions of the “random effects” model used pervasively in psychology impose far stronger constraints on the generalizability of results than most researchers appreciate. Ignoring these constraints dramatically inflates false positive rates and routinely leads researchers to draw sweeping verbal generalizations that lack any meaningful connection to the statistical quantities they are putatively based on. I argue that failure to consider generalizability from a statistical perspective lies at the root of many of psychology’s ongoing problems (e.g., the replication crisis), and conclude with a discussion of several potential avenues for improvement.

I pretty much agree 100% with everything he writes in this article. These are issues we’ve been talking about for a while, and Yarkoni offers a clear and coherent perspective. I only have two comments, and these are more a matter of emphasis than anything else.

1. Near the beginning of the article, Yarkoni writes of two ways of drawing scientific conclusions from statistical evidence:

The “fast” approach is liberal and incautious; it makes the default assumption that every observation can be safely generalized to other similar-seeming situations until such time as those generalizations are contradicted by new evidence. . . .

The “slow” approach is conservative, and adheres to the opposite default: an observed relationship is assumed to hold only in situations identical, or very similar to, the one in which it has already been observed. . . .

Yarkoni goes on to say that in modern psychology, it is standard to use the fast approach, that the fast approach gets attention and rewards, but that in general the fast approach is wrong, that instead we should be using the fast approach to generate conjectures but use the slow approach when trying to understand what we know.

I agree, and I also agree with Yarkoni’s technical argument that the slow approach corresponds to a multilevel model in which there are varying intercepts and slopes corresponding to experimental conditions, populations, etc. That is, if we are fitting the model y = a + b*x + error to data (x_i, y_i), i=1,…,n, we should think of this entire experiment as study j, with the model y = a_j + b_j*x + error, and different a_j, b_j for each potential study. To put it another way, a_j and b_j can be considered as functions of the experimental conditions and the mix of people in the experiment.

Or, to put it another way, we have an implicit multilevel model with predictors x at the individual level and other predictors at the group level that are implicit in the model for a, b. And we should be thinking about this multilevel model even when we only have data from a single experiment.
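To make the "study j" idea concrete, here is a minimal simulation sketch. Everything in it is an assumption made up for illustration (the number of studies, the sample size, and the hypothetical values `MU_B` and `TAU_B` for the mean and spread of slopes across studies); it is not taken from Yarkoni's paper. Each simulated study gets its own a_j and b_j, and we fit y = a + b*x separately within each one.

```python
import random
import statistics

random.seed(1)

def simulate_study(a_j, b_j, n=200, sigma=1.0):
    """Simulate one study from y = a_j + b_j*x + error and fit it by least squares.

    Returns the estimated slope and its within-study standard error.
    """
    x = [random.gauss(0, 1) for _ in range(n)]
    y = [a_j + b_j * xi + random.gauss(0, sigma) for xi in x]
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b_hat = sxy / sxx
    a_hat = y_bar - b_hat * x_bar
    resid_var = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    se_b = (resid_var / sxx) ** 0.5
    return b_hat, se_b

# Hypothetical population of studies: slopes vary with conditions and participants.
MU_B, TAU_B = 0.5, 0.5   # assumed mean and sd of b_j across studies
J = 20                   # assumed number of potential studies

results = [
    simulate_study(a_j=random.gauss(0, 1), b_j=random.gauss(MU_B, TAU_B))
    for _ in range(J)
]
slopes = [b_hat for b_hat, _ in results]
typical_se = statistics.mean([se for _, se in results])
between_sd = statistics.stdev(slopes)

print(f"typical within-study standard error of b: {typical_se:.2f}")
print(f"sd of estimated slopes across studies:    {between_sd:.2f}")
```

Under these assumptions the within-study standard error is much smaller than the variation in slopes across studies, so the uncertainty reported from any single experiment says little about what b would be under other conditions.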

This is all related to the argument I’ve been making for a while about “transportability” in inference, which in turn is related to an argument that Rubin and others have been making for decades about thinking of meta-analysis in terms of response surfaces.

To put it another way, all replications are conceptual replications.

So, yeah, these ideas have been around for a while. On the other hand, as Yarkoni notes, standard practice is to not think about these issues at all and to just make absurdly general claims from absurdly specific experiments. Sometimes it seems that the only thing that makes researchers aware of the “slow” approach is when someone fails to replicate one of their studies, at which point the authors suddenly remember all the conditions on generality that they somehow forgot to mention in their originally published work. (See here for an extreme case that really irritated me.) So Yarkoni’s paper could be serving a useful role even if all it did was remind us of the challenges of generalization. But the paper does more than that, in that it links this statistical idea with many different aspects of practice in psychology research.

That all said, there’s one way in which I disagree with Yarkoni’s characterization of scientific inferences as “fast” or “slow.” I agree with him that the “fast” approach is mistaken. But I think that even his “slow” approach can be too strong!

Here’s my concern. Yarkoni writes, “The ‘slow’ approach is conservative, and adheres to the opposite default: an observed relationship is assumed to hold only in situations identical, or very similar to, the one in which it has already been observed.”

But my problem is that, in many cases, I don’t even think the observed relationship holds in the situations in which it has been observed.

To put it more statistically: Claims in the sample do not necessarily generalize to the population. Or, to put it another way, correlation does not even imply correlation.

Here’s a simple example: I go to the store, buy a die, I roll it 10 times and get 3 sixes, and I conclude that the probability of getting a six from this die is 0.3. That’s a bad inference! The result from 10 die rolls gives me just about no useful information about the probability of rolling a six.
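The arithmetic behind the die example can be checked with a short sketch. It assumes the rolls are independent and uses the usual binomial formulas; the variable names are just for illustration.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_rolls, n_sixes = 10, 3
p_fair = 1 / 6

# Naive estimate from the data and its standard error.
p_hat = n_sixes / n_rolls
se_hat = sqrt(p_hat * (1 - p_hat) / n_rolls)

# How often would a perfectly fair die produce at least this many sixes?
p_at_least_3 = sum(binom_pmf(k, n_rolls, p_fair) for k in range(n_sixes, n_rolls + 1))

print(f"estimate = {p_hat:.2f}, standard error = {se_hat:.2f}")
print(f"P(at least {n_sixes} sixes in {n_rolls} rolls of a fair die) = {p_at_least_3:.3f}")
```

The standard error comes out to roughly 0.14, and a fair die gives 3 or more sixes in 10 rolls a bit more than one time in five, so the observed 0.3 is entirely consistent with an ordinary die.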

Here’s another example, just as bad but not so obviously bad: I find a survey of 3000 parents, and among those people, the rate of girl births was 8% higher among the most attractive parents than among the other parents. That’s a bad inference! The result from 3000 births gives me just about no useful information about the probability of a girl birth.

So, in those examples, even a “slow” inference (e.g., “This particular die is biased,” or “More attractive parents from the United States in this particular year are more likely to have girls”) is incorrect.

This point doesn’t invalidate any of Yarkoni’s article; I’m just bringing it up because I’ve sometimes seen a tendency in open-science discourse for people to give too much of the benefit of the doubt to bad science. I remember this with that ESP paper from 2011: people would say that this paper wasn’t so bad, it just demonstrated general problems in science. Or they’d accept that the experiments in the paper offered strong evidence for ESP, it was just that the evidence overwhelmed their prior. But no, the ESP paper was bad science, and it didn’t offer strong evidence. (Yes, that’s just my opinion. You can have your own opinion, and I think it’s fine if people want to argue (mistakenly, in my view) that the ESP studies are high-quality science. My point is that if you want to argue that, argue it, but don’t take that position by default.)

That was my point when I argued against over-politeness in scientific discourse. The point is not to be rude to people. We can be as polite as we want to individual people. The point is that there are costs, serious costs, to being overly polite to scientific claims. Every time you “bend over backward” to give the benefit of the doubt to scientific claim A, you’re rigging things against the claim not-A. And, in doing so, you could be doing your part to lead science astray (if the claims A and not-A are of scientific importance) or to hurt people (if the claims A and not-A have applied impact). And by “hurt people,” I’m not talking about authors of published papers, or even of hardworking researchers who didn’t get papers published because they couldn’t compete with the fluff that gets published by PNAS etc., I’m talking about the potential consumers of this research.

Here I’m echoing the points made by Alexey Guzey in his recent post on sleep research. I do not believe in giving a claim the benefit of the doubt, just cos it’s published in a big-name journal or by a big-name professor.

In retrospect, instead of saying “Against politeness,” I should’ve said “Against deference.”

Anyway, I don’t think Yarkoni’s article is too deferential to dodgy published claims. I just wanted to emphasize that even his proposed “slow” approach to inference can let a bunch of iffy claims sneak in.

Later on, Yarkoni writes:

Researchers must be willing to look critically at previous studies and flatly reject—on logical and statistical, rather than empirical, grounds—assertions that were never supported by the data in the first place, even under the most charitable methodological assumptions.

I agree. Or, to put it slightly more carefully, we don’t have to reject the scientific claim; rather, we have to reject the claim that the experimental data at hand provide strong evidence for the attached scientific claim (rather than merely evidence consistent with the claim). Recall the distinction between truth and evidence.

Yarkoni also writes:

The mere fact that a previous study has had a large influence on the literature is not a sufficient reason to expend additional resources on replication. On the contrary, the recent movement to replicate influential studies using more robust methods risks making the situation worse, because in cases where such efforts superficially “succeed” (in the sense that they obtain a statistical result congruent with the original), researchers then often draw the incorrect conclusion that the new data corroborate the original claim . . . when in fact the original claim was never supported by the data in the first place.

I agree. This is the sort of impoliteness, or lack of deference, that I think is valuable going forward.

Or, conversely, if we want to be polite and deferential to embodied cognition and himmicanes and air rage and ESP and ages ending in 9 and the critical positivity ratio and all the rest . . . then let’s be just as polite and deferential to all the zillions of unpublished preprints, all the papers that didn’t get into JPSP and Psychological Science and PNAS, etc. Vaccine denial, N rays, spoon bending, whatever. The whole deal. But that way lies madness.

Let me again yield the floor to Yarkoni:

There is an unfortunate cultural norm within psychology (and, to be fair, many other fields) to demand that every research contribution end on a wholly positive or “constructive” note. This is an indefensible expectation that I won’t bother to indulge.

Thank you. I thank Yarkoni for his directness, as earlier I’ve thanked Alexey Guzey, Carol Nickerson, and others for expressing negative attitudes that are sometimes socially shunned.

2. I recommend that Yarkoni avoid the use of the terms fixed and random effects as this could confuse people. He uses “fixed” to imply non-varying, which makes a lot of sense, but in economics they use “fixed” to imply unmodeled. In the notation of this 2005 post, he’s using definition 1, and economists are using definition 5. The funny thing is that everyone who uses these terms thinks they’re being clear. But the terms have different meanings for different people. Later on page 7 Yarkoni alludes to definitions 2 and 3. The whole fixed and random thing is a mess.

Conclusion

Let me conclude with the list of recommendations with which Yarkoni concludes:

Draw more conservative inferences

Take descriptive research more seriously

Fit more expansive statistical models

Design with variation in mind

Emphasize variance estimates

Make riskier predictions

Focus on practical predictive utility

I agree. These issues come up not just in psychology but also in political science, pharmacology, and I’m sure lots of other fields as well.

Filed under Decision Theory, Miscellaneous Statistics.

BDA FREE (Bayesian Data Analysis now available online as pdf)

Posted by Andrew on 6 April 2020, 10:34 am

Our book, Bayesian Data Analysis, is now available for download for non-commercial purposes!

You can find the link here, along with lots more stuff, including:

• Aki Vehtari’s course material, including video lectures, slides, and his notes for most of the chapters

• 77 best lines from my course

• Data and code

• Solutions to some of the exercises

We started writing this book in 1991, the first edition came out in 1995, now we’re on the third edition . . . it’s been a long time.

If you want the hard copy (which I still prefer, as I can flip through it without disturbing whatever is on my screen), you can still buy it at a reasonable price.

Filed under Bayesian Statistics, Teaching.

Pandemic cats following social distancing

Posted by Andrew on 6 April 2020, 9:16 am

Who ever said that every post had to do with statistical modeling, causal inference or social science?

(Above photo sent in by Zad.)

Filed under Public Health.

Career advice for a future statistician

Posted by Andrew on 5 April 2020, 9:40 am

Gary Ruiz writes:

I am a first-year math major at the Los Angeles City College in California, and my long-term educational plans involve acquiring at least one graduate degree in applied math or statistics.

I’m writing to ask whether you would offer any career advice to someone interested in future professional work in statistics.

I would mainly like to know:

– What sort of skills does this subject demand and reward, more specifically than the requisite/general mathematical abilities?

– What are some challenges someone is likely to face that are unique to studying statistics? Any quirks to the profession at a higher (mainly at the research) level?

– How does statistics contrast with related majors like Applied Mathematics in terms of the requisite training or later subjects of study?

– Are there any big (or at least common) misconceptions regarding what statistical research work involves?

– What are some of the other non-academic considerations I might want to keep in mind? For example, what are other statisticians usually like (if there’s a “general type”)? How does being a statistician affect your day-to-day life (in terms of the time investment, etc.), if at all?

– If you could give your younger self any career-related advice, what would it be? (I hope this question isn’t too cliche, but I figured it was worth asking).

– Finally, what are the most important factors that any potential statistician should consider before committing to the field?

My replies:

– Programming is at least as important as math. Beyond that, you could get a sense of what skills could be useful by looking at our forthcoming book, Regression and Other Stories, or by working through the Stan case studies.

– I don’t know that there are any challenges that are unique to studying statistics. Compared to other academic professions, I think statistics is less competitive, maybe because there are so many alternatives to academia involving work in government and industry.

– I don’t know enough about undergraduate programs to compare statistics to applied math. My general impression is that the two fields are similar.

– I don’t know of any major misconceptions regarding statistical research work. The only thing I can think of offhand is that in our PhD students we sometimes get pure math students who want to go into finance, I think in part because they think this will be a way for them to keep doing math. But then when they get jobs in finance, they find themselves running logistic regressions all day. So it might’ve been more useful for them to have studied applied statistics rather than learning proofs of the Strong Law of Large Numbers. But this won’t arise at the undergraduate level. I’m pretty sure that any math you learn as an undergrad will come in handy later.

– Regarding non-academic considerations: how your day-to-day life goes depends on the job. I’ve found lawyers and journalists to be on irregular schedules: either they’re in an immense hurry and are bugging me at all hours, or they’re on another assignment and they don’t bother responding to inquiries. Statistics is a form of engineering, and I think the job is more time-averaged. Even when there’s urgency (for example, when responding to a lawyer or journalist), everything takes a few hours. It’s typically impossible to do a rush job—and, even if you could, you’re better off checking your answer a few times to make sure you know what you’re doing. You’ll be making lots of mistakes in your career anyway, so it’s best to avoid putting yourself in a situation where you’re almost sure to mess up.

– Career advice to my younger self? I don’t know that this is so relevant, given how times have changed so much in the past 40 years. My advice is when choosing what to do, look at older people who are similar to you in some way and have made different choices. One reason I decided to go into research, many years ago, was that the older people I observed who were doing research seemed happy in their jobs—even the ones who were doing boring research seemed to like it—while the ones doing other sorts of jobs, even those that might sound fun or glamorous, seemed more likely to have burned out. Looking back over the years, I’ve had some pretty good ideas that might’ve made me a ton of money, but I’ve been fortunate enough to be paid enough to have no qualms about giving these ideas away for free.

– What factors should be considered by a potential statistician? I dunno, maybe think hard about what applications you’d like to work on. Typically you’ll have one or maybe two applications you’re an expert on. So choose something that seems interesting or important to you.

Filed under Teaching.

Interesting y-axis

Posted by Andrew on 4 April 2020, 10:38 pm

Merlin sent along this one:

P.S. To be fair, when it comes to innumeracy, whoever designed the above graph has nothing on these people.

As Clarissa Jan-Lim put it:

Math is hard and everyone needs to relax! (Also, Mr. Bloomberg, sir, I think we will all still take $1.53 if you’re offering).

Filed under Statistical graphics.

Model building is Lego, not Playmobil. (toward understanding statistical workflow)

Posted by Andrew on 4 April 2020, 9:46 am

John Seabrook writes:

Socrates . . . called writing “visible speech” . . . A more contemporary definition, developed by the linguist Linda Flower and the psychologist John Hayes, is “cognitive rhetoric”—thinking in words.

In 1981, Flower and Hayes devised a theoretical model for the brain as it is engaged in writing, which they called the cognitive-process theory. It has endured as the paradigm of literary composition for almost forty years. The previous, “stage model” theory had posited that there were three distinct stages involved in writing—planning, composing, and revising—and that a writer moved through each in order. To test that theory, the researchers asked people to speak aloud any stray thoughts that popped into their heads while they were in the composing phase, and recorded the hilariously chaotic results. They concluded that, far from being a stately progression through distinct stages, writing is a much messier situation, in which all three stages interact with one another simultaneously, loosely overseen by a mental entity that Flower and Hayes called “the monitor.” Insights derived from the work of composing continually undermine assumptions made in the planning part, requiring more research; the monitor is a kind of triage doctor in an emergency room.

This all makes sense to me. It reminds me of something I tell my students, which is that “writing is non-algorithmic,” which isn’t literally true—everything is algorithmic, if you define “algorithm” broadly enough—but which is intended to capture the idea that when writing, we go back and forth between structure and detail.

Writing is not simply three sequential steps of planning, composing, and revising, but I still think that it’s useful when writing to consider these steps, and to think of Planning/Composing/Revising as a template. You don’t have to literally start with a plan—your starting point could be composing (writing a few words, or a few sentences, or a few paragraphs) or revising (working off something written by someone else, or something written earlier by you)—but at some point near the beginning of the project, an outline can be helpful. Plan with composition in mind, and then, when it’s time to compose, compose being mindful of your plan and also of your future revision process. (To understand the past, we must first know the future.)

But what I really wanted to talk about today is statistical analysis, not writing. My colleagues and I have been thinking a lot about workflow. On the first page of BDA, we discuss these three steps:
1. Model building.
2. Model fitting.
3. Model checking.
And then you go back to step 1.

That’s all fine, it’s a starting point for workflow, but it’s not the whole story.

As we’ve discussed here and elsewhere, we don’t just fit a single model: workflow is about fitting multiple models. So there’s a lot more to workflow; it includes model building, model fitting, and model checking as dynamic processes where each model is aware of others.

Here are some ways this happens:

– We don’t just build one model, we build a sequence of models. This fits into the way that statistical modeling is a language with a generative grammar. To use toy terminology, model building is Lego, not Playmobil.

– When fitting a model, it can be helpful to use fits from other models as scaffolding. The simplest idea here is “warm start”: take the solution from a simple model as a starting point for new computation. More generally, we can use ideas such as importance sampling, probabilistic approximation, variational inference, expectation propagation, etc., to leverage solutions from simple models to help compute for more complicated models.

– Model checking is, again, relative to other models that interest us. Sometimes we talk about comparing model fit to raw data, but in many settings any “raw data” we see have already been mediated by some computation or model. So, more generally, we check models by comparing them to inferences from other, typically simpler, models.
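The “warm start” idea from the list above can be sketched with a toy example. This is only an illustration under made-up assumptions: the data are hypothetical and noise-free, the “simple model” is an intercept-only fit, the “more complicated model” adds a slope, and plain gradient descent stands in for whatever computation you would really be doing. The function `fit_line_gd` is a name invented for this sketch.

```python
def fit_line_gd(x, y, a0, b0, lr=0.1, tol=1e-8, max_iter=10_000):
    """Least-squares fit of y = a + b*x by gradient descent from (a0, b0).

    Returns (a, b, number of iterations used).
    """
    n = len(x)
    a, b = a0, b0
    for it in range(max_iter):
        resid = [a + b * xi - yi for xi, yi in zip(x, y)]
        grad_a = sum(resid) / n
        grad_b = sum(r * xi for r, xi in zip(resid, x)) / n
        if (grad_a**2 + grad_b**2) ** 0.5 < tol:
            return a, b, it
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b, max_iter

# Hypothetical toy data: centered x, exact line y = 3 + 2x (no noise, for simplicity).
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [3.0 + 2.0 * xi for xi in x]

# Simple model: intercept only, whose least-squares solution is the mean of y.
a_simple = sum(y) / len(y)

# Larger model, fit twice: once from scratch, once starting at the simple model's solution.
cold = fit_line_gd(x, y, a0=0.0, b0=0.0)
warm = fit_line_gd(x, y, a0=a_simple, b0=0.0)

print(f"cold start: a={cold[0]:.3f}, b={cold[1]:.3f}, iterations={cold[2]}")
print(f"warm start: a={warm[0]:.3f}, b={warm[1]:.3f}, iterations={warm[2]}")
```

Both runs reach the same fit, but the warm start needs fewer iterations because the intercept is already in place. In real workflows the gain depends on how close the simple model's solution is to the complicated model's, which this toy setup makes artificially favorable by centering x.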

Another key part of statistical workflow is model understanding, also called interpretable AI. Again, we can often best understand a fitted model by seeing its similarities and differences as compared to other models.

Putting this together, we can think of a sequence of models going from simple to complex—or maybe a network of models—and then the steps of model building, inference, and evaluation can be performed on this network.

This has come up before—here’s a post with some links, including one that goes back to 2011—so the challenge here is to actually do something already!

Our current plan is to work through workflow in some specific examples and some narrow classes of models and then use that as a springboard toward more general workflow ideas.

P.S. Thanks to Zad Chow for the adorable picture of workflow shown above.

Filed under Bayesian Statistics, Literature, Miscellaneous Statistics.

Update: OHDSI COVID-19 study-a-thon.

Posted by Keith O’Rourke on 3 April 2020, 5:30 pm

Thought a summary in the read below section might be helpful as the main page might be a lot to digest.

The OHDSI Covid 19 group re-convenes at 6:00 (EST I think) Monday for updates.

For those who want to do modelling, you cannot get the data but must write analysis scripts that data holders will run on their computers and return results. My guess is that might be most doable through here where custom R scripts can be implemented that data holders might be able to run. Maybe some RStan experts can try to work this through.

Filed under Public Health.

Noise-mining as standard practice in social science

Posted by Andrew on 3 April 2020, 9:04 am

The following example is interesting, not because it is particularly noteworthy but rather because it represents business as usual in much of social science: researchers trying their best, but hopelessly foiled by their use of crude psychological theories and cruder statistics, along with patterns of publication and publicity that motivate the selection and interpretation of patterns in noise.

Elio Campitelli writes:

The silliest study this week?

I realise that it’s a hard competition, but this has to be the silliest study I’ve read this week. Each group of participants read the same exact text with only one word changed and the researchers are “startled” to see that such a minuscule change did not alter the readers’ understanding of the story. From the Guardian article (the paper is yet to be published as I’m sending you this email):

Two years ago, Washington and Lee University professors Chris Gavaler and Dan Johnson published a paper in which they revealed that when readers were given a sci-fi story peopled by aliens and androids and set on a space ship, as opposed to a similar one set in reality, “the science fiction setting triggered poorer overall reading” and appeared to “predispose readers to a less effortful and comprehending mode of reading – or what we might term non-literary reading”.

But after critics suggested that merely changing elements of a mainstream story into sci-fi tropes did not make for a quality story, Gavaler and Johnson decided to revisit the research. This time, 204 participants were given one of two stories to read: both were called “Ada” and were identical apart from one word, to provide the strictest possible control. The “literary” version begins: “My daughter is standing behind the bar, polishing a wine glass against a white cloth.” The science-fiction variant begins: “My robot is standing behind the bar, polishing a wine glass against a white cloth.”

In what Gavaler and Johnson call “a significant departure” from their previous study, readers of both texts scored the same in comprehension, “both accumulatively and when divided into the comprehension subcategories of mind, world, and plot”.

The presence of the word “robot” did not reduce merit evaluation, effort reporting, or objective comprehension scores, they write; in their previous study, these had been reduced by the sci-fi setting. “This difference between studies is presumably a result of differences between our two science-fiction texts,” they say.

Gavaler said he was “pretty startled” by the result.

I mean, I wouldn’t dismiss out of hand the possibility of a one-word change having dramatic consequences (change “republican” to “democrat” in a paragraph describing a proposed policy, for example). But in this case it seems to me that the authors surfed the noise generated by the previous study into expecting a big change by just changing “daughter” to “robot” and nothing else.

I agree. Two things seem to be going on:

1. The researchers seem to have completely internalized the biases arising from the statistical significance filter that lead to estimates being too high (as discussed in section 2.1 of this article), thus they came into this new experiment expecting to see a huge and statistically significant effect (recall the 80% power lie).

2. Then they do the experiment and are gobsmacked to find nothing (like the 50 shades of gray story, but without the self-awareness).

The funny thing is that items 1 and 2 kinda cancel, and the researchers still end up with positive press!
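To make item 1 concrete, here is a minimal simulation sketch of the statistical significance filter. It is not based on the study's data: the true effect (0.1) and standard error (0.5) are made-up numbers chosen only to represent a small effect measured noisily, and the function name is hypothetical.

```python
import random
import statistics

def exaggeration_ratio(true_effect=0.1, se=0.5, n_sims=100_000, seed=1):
    """Simulate many noisy studies; return (power, exaggeration ratio).

    The exaggeration ratio is the average absolute estimate among the
    studies that reach p < 0.05, divided by the (assumed) true effect.
    """
    rng = random.Random(seed)
    significant = []
    for _ in range(n_sims):
        estimate = rng.gauss(true_effect, se)  # one noisy study
        if abs(estimate) > 1.96 * se:          # passes the significance filter
            significant.append(estimate)
    power = len(significant) / n_sims
    ratio = statistics.mean([abs(e) for e in significant]) / true_effect
    return power, ratio

power, ratio = exaggeration_ratio()
print(f"power = {power:.3f}, exaggeration ratio = {ratio:.1f}")
```

Under these assumed numbers, any estimate that clears the threshold has to be at least 1.96 × 0.5 = 0.98 in absolute value, nearly ten times the true effect. A researcher who learns only from the significant results will come to expect huge effects.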

P.S. I looked up Chris Gavaler and he has a lot of interesting thoughts. Check out his blog! I feel bad that he got trapped in the vortex of bad statistics, and I don’t want this discussion of statistical fallacies to reflect negatively on his qualitative work.

Filed under Sociology, Zombies.

14 Comments

Conference on Mister P online tomorrow and Saturday, 3-4 Apr 2020

Posted by Andrew on 2 April 2020, 1:15 pm

We have a conference on multilevel regression and poststratification (MRP) this Friday and Saturday, organized by Lauren Kennedy, Yajuan Si, and me. The conference was originally scheduled to be at Columbia but now it is online. Here is the information.

If you want to join the conference, you must register for it ahead of time; just click on the link.

Here are the scheduled talks for tomorrow (Fri):

Elizabeth Tipton RCT Designs for Causal Generalization

Benjamin Skinner Why did you go? Using multilevel regression with poststratification to understand why community colleges students exit early

Jon Zelner From person-to-person transmission events to population-level risks: MRP as a tool for maximizing the public health benefit of infectious disease data

Katherine Li Multilevel Regression and Poststratification with Unknown Population Distributions of Poststratifiers

Qixuan Chen Use of administrative records to improve survey inference: a response propensity prediction approach

Lauren Kennedy and Andrew Gelman 10 things to love and hate about MRP

And here’s the schedule for Saturday:

Shiro Kuriwaki and Soichiro Yamauchi

Roberto Cerina Election projections using available data, machine learning, and poststratification

Douglas Rivers Modeling elections with multiple candidates

Yajuan Si Statistical Data Integration and Inference with Multilevel Regression and Poststratification

Yutao Liu Model-based prediction using auxiliary information

Samantha Sekar

Chris Hanretty Hierarchical related regression for individual and aggregate electoral data

Lucas Leemann Improved Multilevel Regression with Post-Stratification Through Machine Learning (autoMrP)

Leontine Alkema Got data? Quantifying the contribution of population-period-specific information to model-based estimates in demography and global health

Jonathan Gellar Are SMS (text message) surveys a viable form of data collection in Africa and Asia?

Charles Margossian Laplace approximation for speeding computation of multilevel models

Filed under Bayesian Statistics, Multilevel Modeling, Political Science, Public Health, Stan, Statistical computing.

16 Comments

More coronavirus research: Using Stan to fit differential equation models in epidemiology

Posted by Andrew on 2 April 2020, 11:23 am

Seth Flaxman and others at Imperial College London are using Stan to model coronavirus progression; see here (and I’ve heard they plan to fix the horrible graphs!) and this Github page.

They also pointed us to this article from December 2019, Contemporary statistical inference for infectious disease models using Stan, by Anastasia Chatzilena et al. I guess this particular paper will be useful for people getting started in this area, or for epidemiologists who’ve been hearing about Stan and would like to know how to use it for differential equation models in epidemiology. I have not read the article in detail.

We’re also doing some research on how to do inference for differential equations more efficiently in Stan. Nothing ready to report here, but new things will come soon, I hope. One idea is to run the differential equation solver on a coarser time scale in the NUTS updating, use importance sampling to correct the errors, and then run the solver on the finer time scale in the generated quantities block.
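Here is a rough sketch of the reweighting step in that idea, written in plain Python rather than Stan. Everything in it is an assumption made for illustration: the toy decay equation dy/dt = -theta * y, the Euler solver, the simulated data, the step counts, and the grid sampler that stands in for NUTS. It is not the method under development, just the generic importance-sampling correction.

```python
import math
import random

def euler_decay(theta, t_end, n_steps):
    """Euler approximation to y(t_end) for dy/dt = -theta * y, y(0) = 1."""
    h = t_end / n_steps
    y = 1.0
    for _ in range(n_steps):
        y += h * (-theta * y)
    return y

def log_post(theta, times, obs, sigma, steps_per_unit):
    """Unnormalized log posterior: flat prior on theta, normal noise."""
    total = 0.0
    for t, y_obs in zip(times, obs):
        n_steps = max(1, round(t * steps_per_unit))
        y_hat = euler_decay(theta, t, n_steps)
        total += -0.5 * ((y_obs - y_hat) / sigma) ** 2
    return total

rng = random.Random(2020)
times, sigma, true_theta = [1.0, 2.0, 3.0, 4.0], 0.05, 0.5
obs = [math.exp(-true_theta * t) + rng.gauss(0.0, sigma) for t in times]

COARSE, FINE = 2, 100  # solver steps per unit of time (hypothetical settings)

# Stand-in for NUTS: draw theta from the coarse-solver posterior on a grid.
grid = [0.2 + 0.001 * i for i in range(601)]
coarse_lp = [log_post(th, times, obs, sigma, COARSE) for th in grid]
top = max(coarse_lp)
draws = rng.choices(grid, weights=[math.exp(lp - top) for lp in coarse_lp], k=1000)

# Importance weights: fine-solver density divided by coarse-solver density.
log_w = [
    log_post(th, times, obs, sigma, FINE) - log_post(th, times, obs, sigma, COARSE)
    for th in draws
]
shift = max(log_w)
w = [math.exp(lw - shift) for lw in log_w]

raw_mean = sum(draws) / len(draws)
corrected_mean = sum(wi * th for wi, th in zip(w, draws)) / sum(w)
ess = sum(w) ** 2 / sum(wi * wi for wi in w)  # effective sample size
print(f"coarse-only mean {raw_mean:.3f}, reweighted mean {corrected_mean:.3f}, ESS {ess:.0f}")
```

The effective sample size is the thing to watch. If the coarse solver is too far off, a few draws carry most of the weight and the correction becomes unreliable, so the coarse time scale can only be so coarse.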

Filed under Bayesian Statistics, Public Health, Stan, Statistical computing.

6 Comments

What can we learn from super-wide uncertainty intervals?

Posted by Andrew on 2 April 2020, 9:14 am

This question comes up a lot, in one form or another. Here’s a topical version, from Luigi Leone:

I am writing after three weeks of lockdown.

I would like to put to your attention this Imperial College report (issued on Monday, I believe).

The report estimates 9.8% of the Italian population (thus, 6 mil) and 15% of the Spanish population (thus about 7 mil people) as already infected. Their estimation is based on Bayesian models of which I do not know a thing, while you know a lot. Hence, I cannot judge. But on a practical note, I was impressed by the credibility intervals: for Italy between 1.9 mil and 15.2 mil, and for Spain between 1.7 mil and 19 mil! What could a normal person do of these estimates that imply opposite conclusions (for instance for the mortality rate, which could oscillate between the Spanish flu at one end and the regular flu at the other end of the interval)? It is also strange for me, that the wider credibility intervals are found for the countries with more data (tests, positives, deaths), not for those with less data.

My reply: When you get this sort of wide interval, the appropriate response is to call for more data. The wide intervals are helpful in telling you that more information will be needed if you want to make an informed decision.

As noted above, this comes up all the time. When we say to accept uncertainty and embrace variation, the point is not that uncertainty (or certainty) is a good in itself but rather that it can guide our actions. Certainty, or the approximation of certainty, can help in our understanding. Uncertainty can inform our decision making.
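A quick bit of arithmetic shows how wide these intervals are in the terms Leone cares about. The sketch below uses only the interval endpoints quoted in his email. It assumes nothing about death counts (none are given here) beyond the fact that, for any fixed number of deaths, the implied fatality rate is deaths divided by infections.

```python
# Interval endpoints (millions infected) as quoted in the email above.
intervals = {"Italy": (1.9, 15.2), "Spain": (1.7, 19.0)}

# For a fixed death count, fatality rate = deaths / infected, so the rate
# implied by the low end is (high / low) times the rate implied by the high end.
factors = {country: high / low for country, (low, high) in intervals.items()}

for country, factor in factors.items():
    print(f"{country}: implied fatality rate varies by a factor of {factor:.1f}")
```

So the same data are consistent with fatality rates that differ by roughly a factor of 8 for Italy and 11 for Spain, which is why the appropriate response is to ask for more information rather than to pick an endpoint.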

Filed under Bayesian Statistics, Decision Theory, Public Health.

118 Comments

“Partially Identified Stan Model of COVID-19 Spread”

Posted by Andrew on 1 April 2020, 8:00 pm

Robert Kubinec writes:

I am working with a team collecting government responses to the coronavirus epidemic. As part of that, I’ve designed a Stan time-varying latent variable model of COVID-19 spread that only uses observed tests and cases. I show that while it is impossible to know the true number of infected cases, we can rank/sign identify the effects of government policies on the virus spread. I do some preliminary analysis with the dates of emergency declarations of US states to show that states which declared earlier seem to have lower total infection rates (though they have not yet flattened the infection curve).

Furthermore, by incorporating informative priors from SEIR/SIR models, it is possible to identify the scale of the latent variable and provide more informative estimates of total infected. These estimates (conditional on a lower bound based on SIR/SEIR models) report that approximately 700,000 Americans have been infected as of yesterday, or roughly 6-7 times the observed case count, as many SEIR/SIR models have predicted.

I’m emailing you as I would love feedback on the model as well as to share it with others who may be engaged in similar modeling tasks.

Paper link

Github with Data & Stan code

Filed under Public Health, Stan.

2 Comments

Moving blog to twitter

Posted by Andrew on 1 April 2020, 9:00 am

My co-bloggers and I have decided that the best discussions are on twitter so we’re shutting down this blog, as of today. Old posts will remain, and you can continue to comment, but we won’t be adding any new material.

We’re doing this for two reasons:

1. Our various attempts to raise funds by advertising on the blog or by running sponsored posts have not been effective. (Did you know that approximately one in ten posts on this blog have been sponsored? Probably not, as we’ve been pretty careful to keep a consistent “house style” in our writing. Now you can go back and try to figure out which posts were which.) Not enough of you have been clicking the links, so all this advertising and sponsoring has barely made enough money to pay the web hosting fees.

2. The blog is too damn wordy. We recognize that just about nobody ever reads to the end of these posts (even this one). Remember what Robert Frost said about playing tennis without a net? Twitter has that 140-character limit, which will keep us focused. And on the rare occasions when we have more to say than can be fit in 140 characters, we’ll just post a series of tweets. That should be easy enough to read—and the broken-into-140-character-bits will be a great way to instill a readable structure.

Every once in a while we’ll have more to say than can be conveniently expressed in a tweet, or series of tweets. In these cases we’ll just publish our opinion pieces in Perspectives on Psychological Science or PNAS. I don’t know if you’ve heard, but we’ve got great connections at those places! I have a friend who’s a psychology professor at Princeton who will publish anything I send in.

And if we have any ideas that are too conceptually advanced to fit on twitter or in a PNAS paper, we’ll deliver them as Ted talks. We have some great Ted talk ideas but we’ll need some help with the stunts and the special effects.

This blog has been going for over 15 years. We’ve had a good run, and thanks for reading and commenting. Over and out.

Filed under Decision Theory.

79 Comments

Stasi’s back in town. (My last post on Cass Sunstein and Richard Epstein.)

Posted by Andrew on 31 March 2020, 10:12 pm

OK, I promise, this will be the last Stasi post ever.

tl;dr: This post is too long. Don’t read it.

Continue reading ‘Stasi’s back in town. (My last post on Cass Sunstein and Richard Epstein.)’ »

Filed under Sociology, Zombies.

26 Comments

And the band played on: Low quality studies being published on Covid19 prediction.

Posted by Keith O’Rourke on 31 March 2020, 3:16 pm

According to Laure Wynants et al., “Systematic review and critical appraisal of prediction models for diagnosis and prognosis of COVID-19 infection,” most of the recently published studies on prediction of Covid19 are of rather low quality.

Information is desperately needed but not misleading information :-(

Conclusion: COVID-19 related prediction models for diagnosis and prognosis are quickly entering the academic literature through publications and preprint reports, aiming to support medical decision making in a time where this is needed urgently. Many models were poorly reported and all appraised as high risk of bias. We call for immediate sharing of the individual participant data from COVID-19 studies worldwide to support collaborative efforts in building more rigorously developed and validated COVID-19 related prediction models. The predictors identified in current studies should be considered for potential inclusion in new models. We also stress the need to adhere to methodological standards when developing and evaluating COVID-19 related predictions models, as unreliable predictions may cause more harm than benefit when used to guide clinical decisions about COVID-19 in the current pandemic.

Filed under Public Health, Sociology.

9 Comments

“How to be Curious Instead of Contrarian About COVID-19: Eight Data Science Lessons From Coronavirus Perspective”

Posted by Andrew on 31 March 2020, 9:05 am

Rex Douglass writes:

I direct the Machine Learning for Social Science Lab at the Center for Peace and Security Studies, UCSD. I’ve been struggling with how non-epidemiologists should contribute to COVID-19 questions right now, and I wrote a short piece that summarizes my thoughts.

8 data science suggestions

For people who want to use theories or models to make inferences or predictions in social science, Douglass offers the following eight suggestions:

1: Actually Care About the Answer to a Question

2: Pose a Question and Propose a Research Design that Can Answer It

3: Use Failures of Your Predictions to Revise your Model

4: Form Meaningful Prior Beliefs with a Thorough Literature Review

5: Don’t Form Strong Prior Beliefs Based on Cherry Picked Data

6: Be Specific and Concrete About Your Theory

7: Choose Enough Cases to Actually Test Your Theory

8: Convey Uncertainty with Specificity not Doublespeak

2 more suggestions from me

I’d like to augment Douglass’s list with two more items:

9: Recognize that social science models depend on context. Be clear on the assumptions of your models, and consider where and when they will fail.

10: Acknowledge internal anomalies (aspects of your theories that are internally incoherent) and external anomalies (examples where your theories make incorrect real-world predictions).

Both these new points are about recognizing and working with the limitations of your model. Some of this is captured in Douglass’s point 3 above (“Use Failures of Your Predictions to Revise your Model”). I’m going further, in point 9 urging people to consider the limitations of their models right away, without waiting for the failures; and in point 10 urging people to publicly report problems when they are found. Don’t just revise your model; also explore publicly what went wrong.

Background

Douglass frames his general advice as a series of critiques of a couple of op-eds by a loud and ignorant contrarian, a law professor named Richard Epstein.

Law professors get lots of attention in this country, which I attribute to some combination of their good media connections, their ability to write clearly and persuasively and on deadline, and their habit and training of advocacy, of presenting one side of a case very strongly and with minimal qualifications.

Epstein’s op-eds are pretty silly and they hardly seem worth taking seriously, except as indicating flaws in our elite discourse. He publishes at the Hoover Institution, and I’m guessing the people in charge of the Hoover Institution feel that enough crappy left-wing stuff is being published by the news media every day, that they can’t see much harm in countering that with crappy right-wing stuff of their own. Or maybe it’s just no big deal. Stanford University publishing a poorly-sourced opinion piece is, from a scholarly perspective, a much more mild offense than what their Berkeley neighbor is doing with a professor who engages in omitting data or results such that the research is not accurately represented in the research record. If you’re well connected, elite institutions will let you get away with a lot.

When responding to criticism, Epstein seems like a more rude version of the cargo-cult scientists who we deal with all the time on this blog, people who lash out at you when you point out their mistakes. In this case, Epstein’s venue is not email or twitter or even Perspectives on Psychological Science; it’s an interview in the New Yorker, where he issues the immortal words:

But, you want to come at me hard, I am going to come back harder at you. And then if I can’t jam my fingers down your throat, then I am not worth it. . . . But a little bit of respect.

Dude’s a street fighter. Those profs and journalists who prattle on about methodological terrorists, second-string replication police, Stasi, Carmelo, etc., they got nothing on this Richard Epstein guy.

In this case, though, we can thank Epstein for motivating Douglass’s thoughtful article.

P.S. I’d been saving the above image for the next time I wrote about Cass “Stasi” Sunstein. But a friend told me that people take umbrage at “sustained, constant criticism,” so maybe best not to post more about Sunstein for awhile. My friend was telling me to stop posting about Nate Silver, actually. It’s ok, there are 8 billion other people we can write about for awhile.