The GiveWell Blog

The winners of the Change Our Mind Contest—and some reflections

In September, we announced the Change Our Mind Contest for critiques of our cost-effectiveness analyses. Today, we’re excited to announce the winners!

We’re very grateful that so many people engaged deeply with our work. This contest was GiveWell’s most successful effort so far to solicit external criticism from the public, and it wouldn’t have been possible without the participation of people who share our goal of allocating funding to cost-effective programs.

Overall, we received 49 entries engaging with our prompts. We were very happy with the quality of entries we received—their authors brought a great deal of thought and expertise to engaging with our cost-effectiveness analyses.

Because we were impressed by the quality of entries, we’ve decided to award two first-place prizes and eight honorable mentions. (We stated in September that we would give a minimum of one first-place, one runner-up, and one honorable mention prize.) We also awarded $20,000 to the piece of criticism that inspired this contest.

Winners are listed below, followed by our reflections on this contest and responses to the winners.

The prize-winners

Given the overall quality of the entries we received, selecting a set of winners required a lot of deliberation.

We’re still in the process of determining which critiques to incorporate into our cost-effectiveness analyses and to what extent they’ll change the bottom line; we don’t agree with all the critiques in the first-place and honorable mention entries, but each prize-winner raised issues that we believe were worth considering. In several cases, we plan to further investigate the questions raised by these entries.

Within categories, the winners are listed alphabetically by the last name of the author who submitted the entry.

First-place prizes – $20,000 each [1]

To give a general sense of the magnitude of the changes we currently anticipate, our best guess is that Matthew Romer and Paul Romer Present’s entry will change our estimate of the cost-effectiveness of Dispensers for Safe Water by very roughly 5 to 10%, and that Noah Haber’s entry may lead to an overall shift in how we account for uncertainty (but it’s too early to say how it would impact any given intervention). Overall, we currently expect that entries to the contest may shift the allocation of resources between programs but are unlikely to lead to us adding or removing any programs from our list of recommended charities.

Honorable mentions – $5,000 each

Participation prizes – $500 each

39 entries, not individually listed here.

All entries that met our criteria will receive participation prizes if they didn’t win a larger prize. To meet our requirements, authors had to share a critique that addressed our cost-effectiveness analysis and proposed a change that could make a material difference to our bottom line—this is no small feat, and we really appreciate everyone who took the time to do so!

Prize for inspiring the Change Our Mind Contest – $20,000

Joel McGuire, Samuel Dupret, and Michael Plant for “Deworming and decay: replicating GiveWell’s cost-effectiveness analysis.”

In July 2022, these three researchers at the Happier Lives Institute shared a critique of how GiveWell models the long-term benefits of deworming; they argue we should treat those benefits as decaying over time rather than remaining constant. We responded to their critique here. We’re in the process of incorporating this critique, and our best guess is that it will lead to a 10% to 30% decrease in our estimate of the cost-effectiveness of deworming, which we roughly estimate would have influenced $2 to $8 million in funding.

Because this work influenced our thinking and played a role in prompting the Change Our Mind Contest, we decided to make a grant of $20,000 to the Happier Lives Institute.

Logistics for prize-winners

We will be emailing the author who submitted each prize-winning entry, including those that won participation prizes. If you and your co-authors have not received an email by early January, please feel free to reach out to change-our-mind@givewell.org.

Reflections on this contest

There’s a robust community of people who are excited to engage with our work.

We received 49 entries that met the contest criteria, all of which represented meaningful engagement with our work. These entries came from a wide range of people—from health economists and from people in entirely unrelated fields, from the global health community and from the effective altruism community, from students and from professionals with years of work experience.

People submitted entries on many different topics. We received at least two entries on each of the six cost-effectiveness analyses we pointed people toward, plus some entries on other programs, and many cross-cutting entries on issues like uncertainty, the discount rate for future benefits, our moral weights, and more.

In order to manage all the suggestions we received, one of our researchers reviewed all 49 entries and created a dashboard for tracking the 100 discrete suggestions we identified. For each of those, we’re tracking whether we plan to do additional work to address the suggestion and how high-priority that work is.

We’re so glad that people were excited to contribute to our decision-making, and we’ll be continuing to look for ways to collaborate with the public to improve our work.

We have room for improvement, particularly on transparency.

We’re proud of being an unusually transparent research organization; transparency is one of our core values. Transparency has two facets: making information publicly available, and making it easy to understand. We generally succeed at publishing the information that drives our decisions, but we could do more to help people understand why we believe what we believe.

Some entries proposed changes that are actually very similar to what we’re already doing, but where the authors didn’t realize that because of the way our work is presented (e.g., a calculation takes place in a separate spreadsheet, or the name of a parameter doesn’t clearly represent its purpose). In other cases, entries flagged areas where the assumptions underlying our judgments aren’t apparent (e.g., in the case of development effects from averted cases of malaria). We appreciate these authors bringing those issues to our attention, and we hope to improve the clarity of our work!

People brought us new ideas—and old ones we hadn’t implemented.

Some entries covered ideas we hadn’t considered but found worth pursuing. For example, an entry arguing that iron fortification might be less cost-effective than we think inspired us to dig into the questions it raised about the prevalence of iron deficiency anemia in India. In this case, our current best guess is that our view on iron fortification won’t change much, but we believe it’s worthwhile for us to consider this issue.

Other entries covered issues we were aware of but hadn’t resolved. For example, we’ve known for a while that a calculation in our cost-effectiveness analysis for the Against Malaria Foundation is both poorly structured and presented in a confusing way. A few entries flagged perceived issues with how we calculate mortality in that analysis, and some authors thought the calculations were mistaken in a way that would have a significant impact on our bottom line (e.g., some entries understandably believed we’d failed to account for indirect deaths from malaria). We don’t think any of these entries captured the precise problem with the current calculation, but they homed in on a weak point in our analysis. Several months ago we created a revised version of our internal analysis that fixes the issue, but we haven’t yet finalized and published it. We’re likely to publish this revision in the next few months. In general, people flagging a known issue can help us prioritize changes.

This contest was worth doing.

We haven’t done anything like this before, and we weren’t really sure what to expect. We saw this as an opportunity to lean into our values, particularly transparency and truth-seeking, in service of helping people as much as we can with our funding decisions. The contest succeeded in that goal; we identified improvements we can make to our cost-effectiveness analyses in terms of both accuracy and clarity. And beyond that, this contest established that there are people who care deeply about our work and want to help us improve it. To everyone who participated—thank you!

Appendix: Discussion of winning entries

In this section, we share our initial thoughts on the two first-place entries. This appendix is probably more technical than will be of interest to most readers.

Noah Haber on uncertainty

Several entries focused on how GiveWell could improve its approach to uncertainty. This entry stood out for its clear demonstration of how failing to account for uncertainty can lead to suboptimal allocations, even in a risk-neutral framework.

In brief, this entry argues that when prioritizing by estimated expected value, one will sometimes select more uncertain programs whose true values are lower, over less uncertain programs whose true values are higher. This “optimizer’s curse” or “winner’s curse” can create a portfolio that is systematically less valuable than it could be if uncertainty were properly accounted for. This issue has been raised before, but we haven’t ever fully addressed it. [2]
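A toy simulation may help make the optimizer's curse concrete. The sketch below is not taken from the entry or from GiveWell's models: the two programs, the shared prior, and the noise levels are all hypothetical, and the shrinkage rule assumes a simple normal-normal model. It compares choosing the program with the higher raw estimate against choosing the program with the higher estimate after shrinking toward the prior.

```python
import random

def simulate_selection(n_trials=20000, seed=0):
    """Average true value obtained when picking by raw vs. shrunken estimates.

    All numbers are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    prior_mean, prior_sd = 10.0, 2.0  # assumed prior shared by both programs
    noise_sd = {"well_studied": 1.0, "uncertain": 6.0}  # assumed estimate noise
    total_raw = 0.0
    total_shrunk = 0.0
    for _ in range(n_trials):
        # Draw each program's true value, then a noisy estimate of it.
        true = {k: rng.gauss(prior_mean, prior_sd) for k in noise_sd}
        est = {k: true[k] + rng.gauss(0.0, noise_sd[k]) for k in noise_sd}
        # Shrink each estimate toward the prior mean; noisier estimates
        # receive less weight (normal-normal posterior mean).
        shrunk = {}
        for k, sd in noise_sd.items():
            weight = prior_sd**2 / (prior_sd**2 + sd**2)
            shrunk[k] = prior_mean + weight * (est[k] - prior_mean)
        total_raw += true[max(est, key=est.get)]
        total_shrunk += true[max(shrunk, key=shrunk.get)]
    return total_raw / n_trials, total_shrunk / n_trials

avg_raw, avg_shrunk = simulate_selection()
print(f"picking by raw estimate:      {avg_raw:.2f}")
print(f"picking by shrunken estimate: {avg_shrunk:.2f}")
```

Under these made-up inputs, the raw rule often selects the noisier program because its estimate happens to be inflated, so the average true value it obtains is lower than under the shrunken rule.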

We’d like to consider the issues presented in this post and other recent criticisms of our approach to uncertainty in more depth. In the meantime, we’ll share some initial thoughts:

  • We really appreciated that this piece drew a clear link between incorporating uncertainty and the ranking of programs. It shows that if we don’t account for uncertainty explicitly, we may be allocating too much funding to more uncertain programs, which lowers the value of our overall funding allocation.
  • Currently, we make ad hoc adjustments for uncertainty, such as our strict internal validity adjustment for deworming. However, we haven’t adopted any rules for penalizing more uncertain programs, either quantitatively or qualitatively. This entry updates us toward believing we should consider a more systematic approach.
  • We’re not sure if conducting the full uncertainty modeling recommended by the entry is the right approach for GiveWell, and we’d like to explore alternative approaches to addressing this issue.
    • The entry argues the best approach would be to model uncertainty using a probabilistic sensitivity analysis (PSA). This would involve selecting and parameterizing probability distributions for key parameters; running repeat simulations to obtain a distribution of potential outcomes; and using this distribution as the basis for decision-making.
    • We’re not sure if this is the right approach because we think there could be some important drawbacks. It could limit accessibility of our models externally, make it difficult to compare across models if we’re unsure if uncertainty is being equally accounted for across programs, and make it harder to understand intuitively what’s driving our bottom line on which programs are more cost-effective. We would want to weigh those downsides against the benefits of PSA.
  • On the other hand, if we find that this problem leads to a sufficiently large impact on the value of our allocations, it might be worth the costs of a more complicated modeling approach. We’d like to do more work to explore how big of an impact this problem has on our portfolio.
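The PSA steps described above (pick distributions for key parameters, run repeated simulations, and examine the distribution of outcomes) can be sketched roughly as follows. This is a minimal illustration, not GiveWell's model or the entry's code: the three parameters, their distributions, and the formula combining them are all hypothetical.

```python
import random
import statistics

def run_psa(n_sims=10000, seed=1):
    """Monte Carlo sketch of a probabilistic sensitivity analysis.

    The model (benefit * adherence / cost) and every distribution below
    are assumptions made up for illustration.
    """
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_sims):
        benefit = rng.lognormvariate(-2.3, 0.5)  # hypothetical benefit per person
        adherence = rng.betavariate(8, 2)        # hypothetical share who adhere
        cost = rng.uniform(0.5, 1.5)             # hypothetical cost per person
        outcomes.append(benefit * adherence / cost)
    outcomes.sort()
    return {
        "mean": statistics.fmean(outcomes),
        "p10": outcomes[int(0.10 * n_sims)],
        "median": outcomes[n_sims // 2],
        "p90": outcomes[int(0.90 * n_sims)],
    }

summary = run_psa()
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Rather than a single point estimate, the output is a set of summary statistics, so two programs with similar means but different spreads can be told apart.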

Overall, we think handling uncertainty is an important issue, and we appreciate the nudge to consider it more deeply!

Matthew Romer and Paul Romer Present on water

This entry clearly presented a series of plausible changes to how we estimate the cost-effectiveness of water quality interventions, specifically Dispensers for Safe Water (DSW) and in-line chlorination (ILC). We believe the authors understand our analysis well and were able to identify some weak points in our cost-effectiveness analysis; we expect to make some but not all of the changes they propose.

To briefly summarize in our own words, this entry suggests that GiveWell should:

  1. Include Haushofer et al. (2021) in its meta-analysis on the effect of water chlorination on all-cause mortality.
  2. Use a formal Bayesian approach rather than a “plausibility cap” to estimate the effect of water chlorination on all-cause mortality.
  3. Revise our estimate for the age distribution of deaths averted by water chlorination, as well as for the medical costs averted.
  4. Discount both future costs and future benefits to account for changes over time.
  5. Revise the cost estimates for ILC.
  6. Review a calculation in our leverage and funging adjustment that they believe may contain an error.

We share some initial thoughts here, noting that we’re still in the process of deciding whether and how to incorporate these suggestions:

  1. Including Haushofer et al.: The choice to exclude Haushofer et al. was difficult, but we’re currently comfortable with our decision to exclude it. See more on this page, including in footnote 39. For our practical decision-making (versus a context like Cochrane meta-analyses where stricter decision rules might be needed), we think it makes sense to exclude it given (a) the fact that we find the implied intervention effect implausible across the 95% confidence interval; (b) the divergence of these results from the other strongest pieces of evidence we have; and (c) the large effect that including it would have on the pooled result.
  2. Using a formal Bayesian approach: We’re planning to consider this in more depth. Estimating effect sizes is difficult in cases where the point estimates from available evidence seem implausibly high to us. As the authors note, we’ve used a Bayesian approach in some of our other analyses (e.g., deworming), and it might be reasonable to use here. The plausibility cap we’re currently using seems like one reasonable approach in this context, but we haven’t fully explored other approaches. If we don’t move to a Bayesian approach, we may still make other changes inspired by this point.
  3. Revising the ages of deaths averted and medical costs averted:
    1. On the ages of deaths averted: We think it’s reasonable to use the age structure of direct deaths from enteric infections for indirect deaths as well because those deaths are still linked to enteric infections, based on the idea that enteric infections increase the risk of other diseases.
    2. On medical costs averted: Our published cost-effectiveness analysis uses an outdated method to estimate medical costs averted by water quality interventions, and we’re now internally using a method that we believe is better aligned with our analyses for interventions like those conducted by our top charities.
  4. Discounting future benefits and costs: We hadn’t realized that we’re treating future deaths differently in the New Incentives analysis—thank you for flagging that. For grants where the benefits occur more than a few years in the future (like this Dispensers for Safe Water grant), we generally want to account for both changing disease burdens (in this case, a decline in deaths from diarrhea) and general uncertainty over time, and we didn’t do that in this case. We’re less sure that we’d want to discount costs (versus benefits) in the future, given (a) consistency with our other analyses and (b) the fact that from our perspective, the costs are “spent” when we decide to make a grant.
  5. Revising costs of ILC: It’s true that we’re using very rough cost figures for ILC in our published analysis. We expect to learn more over time and incorporate that in future analyses we publish.
  6. Reviewing funging calculation: This seems like a likely error (that makes a small difference to the bottom line). We’ll correct it if upon further review we confirm that it’s an error!
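To illustrate the difference between a plausibility cap and a formal Bayesian approach (point 2 above), here is a minimal sketch. It assumes a simple normal prior and a normally distributed study estimate, which is only one of many possible Bayesian setups and is not necessarily what the entry proposes or what GiveWell would adopt. Every number is hypothetical and not drawn from the water quality analysis.

```python
def capped_estimate(point_estimate, cap):
    """Plausibility cap: use the study estimate unless it exceeds the cap."""
    return min(point_estimate, cap)

def normal_posterior(prior_mean, prior_sd, estimate, estimate_se):
    """Normal-normal update: precision-weighted average of prior and estimate."""
    prior_precision = 1.0 / prior_sd**2
    data_precision = 1.0 / estimate_se**2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return post_mean, post_var**0.5

# Hypothetical inputs: a study reports a 30% mortality reduction (SE 10%),
# while the prior is centered on 5% (SD 5%) and the cap is set at 12%.
study_estimate, study_se = 0.30, 0.10
print("capped:", capped_estimate(study_estimate, cap=0.12))
post_mean, post_sd = normal_posterior(0.05, 0.05, study_estimate, study_se)
print(f"posterior mean: {post_mean:.3f}, posterior sd: {post_sd:.3f}")
```

One difference the sketch shows: the cap ignores how precise the study is, whereas the posterior moves further toward the study estimate as its standard error shrinks.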

Our best guess overall is that after more thoroughly reviewing these suggestions, we’ll revise our water quality cost-effectiveness analysis, and our estimate of the cost-effectiveness of Dispensers for Safe Water will change by very roughly 5 to 10%.

Notes

1 Both of these entries were outstanding, and they represent very different approaches. Because they are similarly excellent, we are naming two winners rather than one winner and one runner-up.
2 For example, a former GiveWell researcher wrote this post, which makes a different argument from Noah Haber’s piece but addresses a similar problem.

Comments

  • Elizabeth Santorella on December 16, 2022 at 10:36 am said:

    I appreciated Noah Haber’s piece on “GiveWell’s Uncertainty Problem” and GiveWell’s response, and I’m very interested to see what changes this leads to.

    I’m curious about a couple things:
    – Does the process of recommending charities account for uncertainty in any way that modeling them doesn’t? For example, would the cost-effectiveness estimate from a program with less or worse evidence behind it be taken more skeptically?
    – Where are ad-hoc adjustments like the large adjustment for deworming already used?

    For what it’s worth, I’m sure the statistical issues Haber’s piece highlights are real. These issues are well-understood in areas I have worked in. For example, teacher value-added models based on student test scores adjust for regression to the mean; if they didn’t, average teachers with little data would often appear extremely good or extremely bad. And in online experiments (A/B tests), small sample sizes make it much more likely that an effect will be overstated, and that a follow-up experiment would fail to replicate the initial one.

    If a probabilistic parameter sensitivity analysis is too complicated, I think the New York Times buy or rent calculator is a great illustration of a simpler approach: https://www.nytimes.com/interactive/2014/upshot/buy-rent-calculator.html
    You can play with the parameter sliders to see how they affect the final recommendation. Parameters can affect the cost of buying, renting, or both. This exercise makes transparent which parameters are really important. You can do the same thing with the existing spreadsheets by changing parameters and seeing what happens, but a UI like the NYT’s makes it a lot easier to build intuition.

  • Elizabeth Santorella on December 16, 2022 at 12:42 pm said:

    One more thing on the topic of uncertainty: Uncertainty in denominators is a particular bugbear and can lead to huge biases. A probabilistic model can fix this in theory, but in practice can make issues much harder to spot.

    Imagine the following situation, inspired by the real case of SMS reminders for vaccination. Say we know with certainty that the value of a reminder is $2, but we aren’t sure about the cost. It could be $0.10 or even less, since texting is cheap, but it could be as much as $1 if it’s hard to obtain phone numbers. The cost-effectiveness is $2 / cost. There are a few ways to estimate that:

    A) Make a few guesses of what the cost might be, average them into a best-guess cost, then divide. So if the cost is between $0.10 and $1, a best guess is AVG($0.10, $1) = $0.55, and the cost-effectiveness is $2 / AVG($0.10, $1) = 3.6.

    B) Compute cost-effectiveness in two different scenarios, one where the cost is $0.10 and one where it is $1. So in one scenario it’s $2 / $0.10 = 20, and in another it’s $2 / $1 = 2. Our final estimate is AVG($2 / $0.10, $2 / $1) = AVG(20, 2) = 11.

    C) Use a statistical distribution over all possible costs and integrate over those scenarios. This is similar to (B), but represents all possibilities, not just two. Say our prior is that cost is uniformly distributed between $0.001, since texts might be really cheap, and $1. That gives a result of 13.8. (At least if you do it right! Monte Carlo analyses with fewer than 10,000 simulations will be noisy and unreliable.)

    So different methods give wildly different results here. Which is right? I would say none should inspire confidence. But Method B, working all the way through a couple scenarios, makes the takeaway clearest: the possibility of very low costs drives the potential cost-effectiveness here, and getting better information on cost should be a high priority. The other approaches could obscure this.

  • Ethan Kennerly on December 17, 2022 at 9:14 pm said:

    In “GiveWell’s Uncertainty Problem,” the tip to estimate the 80% lower-bound confidence interval (also known as the 20th percentile) instantly enlightened my interpretation of a GiveWell estimate. The tip would also inform my own estimates in my software engineering work.
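The arithmetic for the three estimation methods (A, B, and C) in Elizabeth Santorella's second comment above can be reproduced with a short sketch. The function names are made up for illustration. Method C uses the closed-form expectation of value / cost under a uniform cost distribution, value × ln(high / low) / (high − low), instead of a Monte Carlo run.

```python
import math

VALUE = 2.0  # value of one reminder in dollars, as stated in the comment

def method_a(low, high):
    """Average the cost guesses first, then divide."""
    return VALUE / ((low + high) / 2)

def method_b(low, high):
    """Compute cost-effectiveness in each scenario, then average."""
    return (VALUE / low + VALUE / high) / 2

def method_c(low, high):
    """Expected VALUE / cost when cost is uniform on [low, high]."""
    return VALUE * math.log(high / low) / (high - low)

print(round(method_a(0.10, 1.0), 1))   # 3.6
print(round(method_b(0.10, 1.0), 1))   # 11.0
print(round(method_c(0.001, 1.0), 1))  # 13.8
```

Because cost sits in the denominator, averaging before dividing (A) and dividing before averaging (B and C) give different answers, and the low end of the cost range dominates the latter two.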
