Note: this is an unusually long and abstract post whose primary purpose is to help a particular subset of our audience understand our style of reasoning. It does not contain substantive updates on our research and recommendations.
GiveWell – both our traditional work and GiveWell Labs – is fundamentally about maximization: doing as much good as possible with each dollar you donate. This introduces some major conceptual challenges when making certain kinds of comparisons – for example, how does one compare the impact of distributing bednets in sub-Saharan Africa with the impact of funding research on potential high-risk responses to climate change, attempting to promote better collaboration in the scientific community, or working against abuse of animals on factory farms?
Our approach to making such comparisons strikes some as highly counterintuitive, and noticeably different from that of other “prioritization” projects such as Copenhagen Consensus. Rather than focusing on a single metric that all “good accomplished” can be converted into (an approach that has obvious advantages when one’s goal is to maximize), we tend to rate options based on a variety of criteria using something somewhat closer to (while distinct from) a “1=poor, 5=excellent” scale, and prioritize options that score well on multiple criteria. (For example, see our most recent top charities comparison.)
We often take approaches that effectively limit the weight carried by any one criterion, even though, in theory, strong enough performance on an important enough dimension ought to be able to offset any amount of weakness on other dimensions. Relatedly, we look into a broad variety of causes, broader than can seemingly be justified by a consistent and stable set of values. Many others in the effective altruist community seem to have a strong and definite opinion on questions such as “how much animals suffer compared to humans,” such that they either prioritize animal welfare above all else or dismiss it entirely. (Similar patterns apply to views on the moral significance of the far future.) By contrast, we give simultaneous serious consideration to reducing animal suffering, reducing risks of global catastrophic events, reforming U.S. intellectual property regulation, improving global health and nutrition, and more, and think it’s quite likely that we’ll recommend giving opportunities in several of these areas, while never resolving the fundamental questions that could (theoretically) establish one such cause as clearly superior to the others.
I believe our approach is justified, and in order to explain why – consistent with the project of laying out the basic worldview and epistemology behind our research – I find myself continually returning to the distinction between what I call “sequence thinking” and “cluster thinking.” Very briefly (more elaboration below),
- Sequence thinking involves making a decision based on a single model of the world: breaking down the decision into a set of key questions, taking one’s best guess on each question, and accepting the conclusion that is implied by the set of best guesses (an excellent example of this sort of thinking is Robin Hanson’s discussion of cryonics). It has the form: “A, and B, and C … and N; therefore X.” Sequence thinking has the advantage of making one’s assumptions and beliefs highly transparent, and as such it is often associated with finding ways to make counterintuitive comparisons.
- Cluster thinking – generally the more common kind of thinking – involves approaching a decision from multiple perspectives (which might also be called “mental models”), observing which decision would be implied by each perspective, and weighing the perspectives in order to arrive at a final decision. Cluster thinking has the form: “Perspective 1 implies X; perspective 2 implies not-X; perspective 3 implies X; … therefore, weighing these different perspectives and taking into account how much uncertainty I have about each, X.” Each perspective might represent a relatively crude or limited pattern-match (e.g., “This plan seems similar to other plans that have had bad results”), or a highly complex model; the different perspectives are combined by weighing their conclusions against each other, rather than by constructing a single unified model that tries to account for all available information.
A key difference from sequence thinking is the handling of certainty/robustness (by which I mean the opposite of Knightian uncertainty) associated with each perspective. Perspectives associated with high uncertainty are in some sense “sandboxed” in cluster thinking: they are stopped from carrying strong weight in the final decision, even when such perspectives involve extreme claims (e.g., a low-certainty argument that “animal welfare is 100,000x as promising a cause as global poverty” receives no more weight than if it were an argument that “animal welfare is 10x as promising a cause as global poverty”).
Finally, cluster thinking is often (though not necessarily) associated with what I call “regression to normality”: the stranger and more unusual the action-relevant implications of a perspective, the higher the bar for taking it seriously (“extraordinary claims require extraordinary evidence”).
I’ve tried to summarize the difference with the following diagram. Variation in shape size represents variation in the “certainty/robustness” associated with different perspectives, which matters a great deal when weighing different perspectives against each other for cluster thinking, but isn’t an inherent part of sequence thinking (it needs to be explicitly modeled by inserting beliefs such as “The expected value of this action needs to be discounted by 90%”).
I don’t believe that either style of thinking fully matches my best model of the “theoretically ideal” way to combine beliefs (more below); each can be seen as a more intellectually tractable approximation to this ideal.
I believe that each style of thinking has advantages relative to the other. I see sequence thinking as being highly useful for idea generation, brainstorming, reflection, and discussion, due to the way in which it makes assumptions explicit, allows extreme factors to carry extreme weight and generate surprising conclusions, and resists “regression to normality.” However, I see cluster thinking as superior in its tendency to reach good conclusions about which action (from a given set of options) should be taken. I have argued the latter point before, using a semi-formal framework that some have found convincing, that some believe has flaws, and that many have simply not engaged with due to its high level of abstraction. In this post, I attempt a less formalized, more multidimensional, and hopefully more convincing (more “cluster-style”) defense. Following that, I lay out why I think sequence thinking is important and is probably more undersupplied on a global scale than cluster thinking, and discuss how I try to combine the two in my own decision-making. Separately from this post, I have also published a further attempt to formalize the underlying picture of an idealized reasoning process.
By its nature, cluster thinking is hard to describe and model explicitly. With this post, I hope to reduce that problem by a small amount – to help people understand what is happening when I say things like “I see no problem with your reasoning, but I’m not placing much weight on it anyway” or “I think that factor could be a million times as important as the others, but I don’t want to give it 100x as much attention,” and what they can do to change my mind in such circumstances. (The general answer is to reduce the uncertainty associated with an argument, rather than simply demonstrating that no explicit flaws with the argument are apparent.)
In the remainder of this post, I:
- Elaborate on my definitions of sequence and cluster thinking. More
- Give a variety of arguments for why one should expect cluster thinking to result in superior decisions. More
- Briefly note and link to a new page (published alongside this post) that attempts to formalize, to some degree, the “idealized thought process” I’m envisioning and how it reproduces key properties of cluster thinking. More
- Lay out some reasons that I find sequence thinking valuable, even if one accepts that cluster thinking results in superior decisions, and defend the idea of switching between “sequence” and “cluster” styles for different purposes. I believe sequence thinking is superior not only for purposes of discussion and reflection (due to its transparency), but also for reaching the sort of deep understanding necessary for intellectual progress, and for generating novel insights that can become overwhelmingly important. More
- Briefly discuss why cluster thinking can be confusing and challenging to deal with in a discussion, and outline how one can model and respond to cluster-thinking-based arguments that are often perceived as “conversation stoppers.” More
- Close with a brief discussion of how I try to combine the two in my own thinking and actions. More
Before I continue, I wish to note that I make no claim to originality in the ideas advanced here. There is substantial overlap with the concepts of foxes and hedgehogs (discussed by Philip Tetlock); with the “model combination and adjustment” idea described by Luke Muehlhauser; with former GiveWell employee Jonah Sinick’s concept of many weak arguments vs. one relatively strong argument (and his post on Knightian uncertainty from a Bayesian perspective); with former GiveWell employee Nick Beckstead’s concept of common sense as a prior; with Brian Tomasik’s thoughts on cost-effectiveness in an uncertain world; with Paul Christiano’s Beware Brittle Arguments post; and probably much more.
Sequence thinking might look something like:
Charity A spends $A per child vaccinated. Each vaccination reduces the odds of death by B%. (Both A and B can be grounded somewhat in further analysis.) That leaves an estimate of (B/A) lives saved per dollar. I will adjust this estimate down 50% to account for the fact that costs may be understated and evidence may be overstated. I will adjust it down another 50% to account for uncertainties about organizational competence.
Charity B spends $C per year. My best guess is that it improves the odds that space colonization eventually occurs by D%. I value this outcome as the equivalent of E lives saved, based on my views about when space colonization is likely to occur, how many human lives would be possible in that case, and how I value these lives. (C, D, and E can be grounded somewhat in further analysis.) That leaves an estimate of (D*E)/C lives saved per dollar. I will adjust this estimate down 95% to account for my high uncertainty in these speculative calculations. I will adjust it down another 75% to account for uncertainties about organizational competence, which I think are greater for Charity B than Charity A; down another 80% to account for the fact that expert opinion seems to look more favorably on Charity A; and down another 95% to account for the fact that charities such as Charity A generally have a better track record as a class.
After all of these adjustments, Charity B comes out better, so I select that one.
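To make the arithmetic of the above concrete, here is a minimal sketch in Python. All of the inputs (cost per vaccination, probability shifts, discounts) are hypothetical numbers chosen purely for illustration, not figures from our analysis; the point is only to show how the calculation chains multiplications and adjustments together.

```python
# Hypothetical illustration of the sequence-thinking calculation above.
# All numbers are made up for this example; they are not GiveWell estimates.

# Charity A: direct health intervention
cost_per_vaccination = 25.0      # $A per child vaccinated
death_risk_reduction = 0.005     # B: absolute reduction in odds of death
lives_per_dollar_a = death_risk_reduction / cost_per_vaccination
lives_per_dollar_a *= 0.5        # costs may be understated / evidence overstated
lives_per_dollar_a *= 0.5        # uncertainty about organizational competence

# Charity B: speculative far-future intervention
annual_budget = 1_000_000.0      # $C per year
p_colonization_boost = 1e-7      # D: change in probability of space colonization
value_in_life_equivalents = 1e15 # E: value of that outcome, in lives saved
lives_per_dollar_b = p_colonization_boost * value_in_life_equivalents / annual_budget
lives_per_dollar_b *= 0.05       # 95% discount: speculative calculation
lives_per_dollar_b *= 0.25       # 75% discount: organizational uncertainty
lives_per_dollar_b *= 0.20       # 80% discount: expert opinion favors Charity A
lives_per_dollar_b *= 0.05       # 95% discount: weaker track record of the class

print(f"Charity A: {lives_per_dollar_a:.2e} lives per dollar")   # 5.00e-05
print(f"Charity B: {lives_per_dollar_b:.2e} lives per dollar")   # 1.25e-02
# With these made-up inputs, the single huge parameter E still dominates,
# and Charity B comes out ahead despite all the sequential discounts.
```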
Cluster thinking might look something like:
Explicit expected-value calculations [such as the above] imply a strikingly good cost-per-life-saved for Charity A, and I think the estimate isn’t terribly likely to be badly mistaken. That’s a major point in favor of Charity A. Similar calculations imply a good cost-per-life-saved for Charity B, but this is a much more uncertain estimate and I don’t put much weight on it. The fact that Charity B comes out ahead even after trying to adjust for other factors is a point in favor of Charity B. In addition, Charity A seems like a better organization than Charity B, and expert opinion seems to favor Charity A, and organizations such as Charity A generally have a better track record as a class, and all of these are signals I have a fair amount of confidence in. Therefore, Charity A has more certainty-weighted factors in its favor than Charity B.
Note that this distinction is not the same as the distinction between explicit expected value and holistic-intuition-based decision-making. Both of the thought processes above involve expected-value calculations; the two thought processes consider all the same factors; but they take different approaches to weighing them against each other. Specifically:
- Sequence thinking considers each parameter independently and doesn’t do any form of “sandboxing.” So it is much easier for one very large number to dominate the entire calculation even after one makes adjustments for e.g. expert opinion and other “outside views” (such as the track record of the general class of organization). More generally, it seems easier to reach a conclusion that contradicts expert opinion and other outside views using this style. This style also seems more prone to zeroing in on a particular category of charity as most promising: for example, often one’s estimate of the value of space colonization will either be high enough to dominate other considerations or low enough to make all space-colonization-related considerations minor, even after many other adjustments are made.
- The two have very different approaches to what some call Knightian uncertainty (also sometimes called “model uncertainty” or “unknown unknowns”): the possibility that one’s model of the world is making fundamental mistakes and missing key parameters entirely. Cluster thinking uses several models of the world in parallel (e.g., “Expert opinion is correct”, “The track record of the general class of an organization predicts its success”, etc.) and limits the weight each can carry based on robustness (by which I mean the opposite of Knightian uncertainty: the feeling that a model is robust and unlikely to be missing key parameters); any chain of reasoning involving high uncertainty is essentially disallowed from making too much difference to the final decision, regardless of the magnitude of effect it points to. Sequence thinking involves the use of a single unified framework for decision analysis and by default it treats “50% probability that a coin comes up heads” and “50% probability that Charity B will fail for a reason I’m not anticipating” in fundamentally the same way. When it does account for uncertainty, it’s generally by adjusting particular parameters (for example, increasing “0.00001% chance of a problematic error” to “1% chance of a problematic error” based on the chance that one’s calculations are wrong); after such an adjustment, it uses the “highly uncertain probabilities adjusted for uncertainty” just as it would use “well-defined probabilities,” and does not disallow the final calculation from carrying a lot of weight.
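As a minimal sketch of this contrast (again with hypothetical numbers of my own, not a formula we actually use): in cluster-style weighing, each perspective contributes a verdict whose weight is capped by its robustness, so a highly uncertain perspective claiming an enormous effect carries no more weight than a highly uncertain perspective claiming a modest one.

```python
# Hypothetical illustration of cluster-style weighing. Each perspective gives a
# verdict ("A" or "B") and a robustness score in [0, 1]; the verdict's weight is
# capped by robustness, no matter how large an effect the perspective claims.
perspectives = [
    # (description,                               verdict, robustness)
    ("explicit EV calculation favors B by ~250x", "B",     0.1),
    ("EV calculation for A is fairly solid",      "A",     0.6),
    ("A seems like the better organization",      "A",     0.5),
    ("expert opinion favors A",                   "A",     0.6),
    ("A's class of charity has a better record",  "A",     0.7),
]

scores = {"A": 0.0, "B": 0.0}
for description, verdict, robustness in perspectives:
    scores[verdict] += robustness   # the claimed magnitude does not enter here

print(scores)                        # roughly {'A': 2.4, 'B': 0.1}
print(max(scores, key=scores.get))   # 'A' -- the uncertain extreme claim is sandboxed
```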
Robustness and uncertainty
For the remainder of this piece, I will use the term robustness to refer to the “certainty/robustness” concept discussed immediately above (and “uncertainty” to refer to its opposite). I’m aware that I haven’t defined the term with much precision, and I think there is substantial room for sharpening its definition. One clarification I would like to make is that robustness is not the same as precision/quantifiability; instead, it is intended to capture something like “odds that my view would remain stable on this point if I were to gain more information, more perspectives, more intelligence, etc.” or “odds that the conclusion of this particular mental model would remain qualitatively similar if the model were improved.”
Regression to normality
A final important concept, which I believe is loosely though not necessarily related, is that of regression to normality: the stranger and more unusual the implications of an argument, the more “robustness” the supporting arguments need to have in order for it to be taken seriously. One way to model this concept is to consider “Conventional wisdom is correct and what seems normal is good” to be one of the “perspectives” or “mental models” weighed in parallel with others. This concept can potentially be modeled in sequence thinking as well, but in practice does not seem to be a common part of sequence thinking.
A couple more clarifications
Note that sequence thinking and cluster thinking converge in the case where one can do an expected-value calculation with sufficiently high robustness. “Outside view” arguments inherently involve a substantial degree of uncertainty (there are plenty of examples of expert opinion being wrong, of longstanding historical trends suddenly ending, etc.) so a robust enough expected-value calculation will carry the decision in both frameworks.
Note also that cluster thinking does not convert “uncertain, speculative probabilities” automatically into “very low probabilities.” Rather, it de-weights the conclusions of perspectives that overall contain a great deal of cumulative uncertainty, so that no matter what conclusion such perspectives reach, the conclusion is not allowed to have much influence on one’s actions.
Summary of properties of sequence thinking and cluster thinking
| | Sequence thinking | Cluster thinking |
| --- | --- | --- |
| Basic structure | Tries to combine all relevant beliefs into a prediction using one model (“If A, B, C, … N, then X”) | Weighs different mental models, each implying its own prediction (“A implies X; B implies ~X; C implies X; … therefore X”) |
| How much can a high-uncertainty parameter affect the conclusion? | One big enough consideration can outweigh all others, even if it’s an uncertain “best guess” | Any conclusion reached using uncertain methods has limited impact on the final decision |
| “Inside views” (laying out a causal chain) vs. “outside views” (expert opinion, “regression to normality,” historical track record of superficially similar decisions, etc.) | No obvious way of integrating inside and outside views; integration is often done via ad hoc adjustments, and inside views often end up dominating the decision | High-uncertainty inside views are usually dominated by outside views, no matter what conclusions they reach |
Below, I give several arguments for expecting cluster thinking to produce better decisions. It is important to note that I emphasize “better decisions” and not “correct beliefs”: it is often the case that one reaches a decision using cluster thinking without determining one’s beliefs about anything (other than what decision ought to be made). In the example given in the previous section, cluster thinking has not reached a defined conclusion on how likely space colonization is, how valuable space colonization would be, etc. and there are many possible combinations of these beliefs that could be consistent with its conclusion that supporting Charity A is superior. Cluster thinking often ends up placing high weight on “outside view” pattern-matching, and often leads to conclusions of the form “I think we should do X, but I can’t say exactly why, and some of the most likely positive outcomes of this action may be outcomes I haven’t explicitly thought of.”
The arguments I give below are, to some degree, made using different vocabularies and different styles. There is some conceptual overlap between the different arguments, and some of the arguments may be partly equivalent to each other. I have previously tried to use sequence-thinking-style arguments to defend something similar to cluster thinking (though there were shortcomings in the way I did so); here I use cluster-thinking-style arguments.
Sequence thinking is prone to reaching badly wrong conclusions based on a single missing, or poorly estimated, parameter
Sequence-style reasoning often involves a long chain of propositions that all need to be reasonable for the conclusion to hold. As an example, Robin Hanson lays out 10 propositions that cumulatively imply a decision to sign up for cryonics, and believes each to have probability 50-80%. However, if even a single one ought to have been assigned a much lower probability (e.g., 10^-5) – or if he’s simply failed to think of a missing condition that has low probability – the calculation is completely off.
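A toy calculation (my own numbers, not Hanson’s) shows how fragile such a conjunction is: ten propositions each assigned 65% probability imply a joint probability of roughly 1.3%, but if even one of them actually deserves a probability of 10^-5, the joint probability falls by a factor of tens of thousands.

```python
# Toy illustration (not Hanson's actual numbers): a conjunction of ten
# propositions, each estimated at 65% probability.
p_each = 0.65
joint = p_each ** 10
print(f"{joint:.4f}")                 # ~0.0135, i.e. about 1.3%

# If one of the ten propositions actually deserves a probability of 1e-5
# (or a low-probability condition was omitted entirely), the conclusion
# changes by orders of magnitude.
joint_corrected = (p_each ** 9) * 1e-5
print(f"{joint_corrected:.2e}")       # ~2.1e-07
print(f"overestimate factor: {joint / joint_corrected:.0f}x")   # ~65000x
```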
In general, missing parameters and overestimated probabilities will lead to overestimating the likelihood that actions play out as hoped, and thus overestimating the desirability of deviating from “tried and true” behavior and behavior backed by outside views. Correcting for missed parameters and overestimated probabilities will be more likely to cause “regression to normality” (and to the predictions of other “outside views”) than the reverse.
Cluster thinking is more similar to empirically effective prediction methods
Sequence thinking presumes a particular framework for thinking about the consequences of one’s actions. It may incorporate many considerations, but all are translated into a single language, a single mental model, and in some sense a single “formula.” I believe this is at odds with how successful prediction systems operate, whether in finance, software, or domains such as political forecasting; such systems generally combine the predictions of multiple models in ways that purposefully avoid letting any one model (especially a low-certainty one) carry too much weight when it contradicts the others. On this point, I find Nate Silver’s discussion of his own system and the relationship to the work of Philip Tetlock (and the related concept of foxes vs. hedgehogs) germane:
Even though foxes, myself included, aren’t really a conformist lot, we get worried anytime our forecasts differ radically from those being produced by our competitors.
Quite a lot of evidence suggests that aggregate or group forecasts are more accurate than individual ones … “Foxes often manage to do inside their heads what you’d do with a whole group of hedgehogs,” Tetlock told me. What he means is that foxes have developed an ability to emulate this consensus process. Instead of asking questions of a whole group of experts, they are constantly asking questions of themselves. Often this implies that they will aggregate different types of information together – as a group of people with different ideas about the world naturally would – instead of treating any one piece of evidence as though it is the Holy Grail. The Signal and the Noise, pg 66
In sequence thinking, a single large enough number can dominate the entire calculation. In consensus decision-making, a person claiming radically larger significance for a particular piece of the picture would likely be dismissed rather than given special weight; in a quantitative prediction system, a component whose conclusion differed from the others’ by a factor of 10^10 would more likely be the result of a coding error than a consideration that was actually 10^10 times as important as the others. This comes back to the points made by the above two sections: cluster thinking can be superior for its tendency to sandbox or down-weight, rather than linearly up-weight, the models with the most extreme and deviant conclusions.
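As a toy illustration of that difference (mine, not Silver’s): a simple average of model outputs lets one wildly deviant component dominate, while a robust aggregator such as the median effectively sandboxes it.

```python
import statistics

# Hypothetical predictions of "relative value" from several models/forecasters.
# One component is wildly out of line with the rest (perhaps a deep insight,
# perhaps a coding error or a badly mis-estimated parameter).
predictions = [1.2, 0.8, 1.5, 0.9, 1e10]

print(statistics.mean(predictions))    # ~2e9: the outlier dominates entirely
print(statistics.median(predictions))  # 1.2: the outlier is effectively sandboxed
```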
A cluster-thinking-style “regression to normality” seems to prevent some obviously problematic behavior relating to knowably impaired judgment
One thought experiment that I think illustrates some of the advantages of cluster thinking, and especially cluster thinking that incorporates regression to normality, is imagining that one is clearly and knowably impaired at the moment (for example, drunk), and contemplating a chain of reasoning that suggests high expected value for some unusual and extreme action (such as jumping from a height). A similar case is that of a young child contemplating such a chain of reasoning. In both cases, it seems that the person in question should recognize their own elevated fallibility and take special precautions to avoid deviating from “normal” behavior, in a way that cluster thinking seems much more easily able to accommodate (by setting an absolute limit to the weight carried by an uncertain argument, such that regression to normality can override it no matter what its content) than sequence thinking (in which any “adjustments” are guessed at using the same fallible thought process).
The higher one’s opinion of one’s own rationality relative to other people, the less appropriate the above analogy becomes. But it can be easy to overestimate one’s own rationality relative to other people (particularly when one’s evidence comes from analyzing people’s statements rather than e.g. their success at achieving their goals), and some component of “If I’m contemplating a strange and potentially highly consequential action, I should be wary and seek robustness (not just magnitude) in my justification” seems appropriate for nearly everyone.
Sequence thinking seems to tend toward excessive comfort with “ends justify the means” type thinking
Various historical cases of violent fanaticism seem fairly well modeled as sequence thinking gone awry: letting one’s decisions become dominated by a single overriding concern, which then justifies actions that strongly violate many other principles. (For example, justifying extremely damaging activities based on Marxist reasoning.) Cluster thinking is far from a complete defense against such things: the robustness of a perspective (e.g., a Marxist perspective) can itself be overestimated, and furthermore a “regression to normality” can encourage conformism with highly problematic beliefs. However, the basic structure of cluster thinking does set up more hurdles for arguments about “the ends” (large-magnitude but speculative down-the-line outcomes) to justify “the means” (actions whose consequences are nearer and clearer).
I believe that invoking “the ends justify the means” (justifying near and clear harms by pointing to their further-out effects) is sometimes the right thing to do, and is sometimes not. Specifically, I think that the worse the “means,” the more robust (and not just large in claimed magnitude) one’s case for “the ends” ought to be. Cluster thinking seems to accommodate this view more naturally than sequence thinking.
(Related piece by Phil Goetz: Reason as memetic immune disorder)
When uncertainty is high, “unknown unknowns” can dominate the impacts of our actions, and cluster thinking may be better suited to optimizing “unknown unknown” impacts
Sequence thinking seems, by its nature, to rely on listing the possible outcomes of an action and evaluating the action according to its probability of achieving these outcomes. I find sequence thinking especially problematic when I specifically expect the unexpected, i.e., when I expect the outcome of an action to depend primarily on factors that haven’t occurred to me. And I believe that the sort of outside views that tend to get more weight in cluster thinking are often good predictors of “unknown unknowns.” For example, obeying common-sense morality (“ends don’t justify the means”) heuristics seems often to lead to unexpected good outcomes, and contradicting such morality seems often to lead to unexpected bad outcomes. As another example, expert opinion often seems a strong predictor of “which way the arguments I haven’t thought of yet will point.”
It’s hard to formalize “expecting unknown unknowns to be the main impact of one’s action” in a helpful way within sequence thinking, but it’s a fairly common situation. In particular, when it comes to donations and other altruistic actions, I expect the bulk of the impact to come from unknown unknown factors including flow-through effects.
Broad market efficiency
Another way of thinking about the case for cluster thinking is to consider the dynamics of broad market efficiency. As I stated in that post:
the more efficient a particular market is, the higher the level of intensity and intelligence around finding good opportunities, and therefore the more intelligent and dedicated one will need to be in order to consistently “beat the market.” The most efficient markets can be consistently beaten only by the most talented/dedicated players, while the least efficient ones can be beaten with fairly little in the way of talent and dedication.
When one is considering a topic or action that one knows little about, one should consider the broad market to be highly efficient; therefore, any deviations from the status quo that one’s reasoning calls for are unlikely to be good ideas, regardless of the magnitude of benefit that one’s reasoning ascribes to them. (An amateur stock trader should generally assume his or her opinions about stocks to be ill-founded and to have zero expected value, regardless of how strong the “inside view” argument seems.) By contrast, when one is considering a topic or action that one is relatively well-informed and intelligent about, contradicting “market pricing” is not as much of a concern.
This is a special case of “as robustness falls, the potential weight carried by an argument diminishes – no matter what magnitudes it claims – and regression to normality becomes the stronger consideration.”
Sequence thinking seems to over-encourage “exploiting” as opposed to “exploring” one’s best guesses
I expect this argument to be least compelling to most people, largely because it is difficult for me to draw convincing causality lines and give convincing examples, but to me it is a real argument in favor of cluster thinking. It seems to me that people who rely heavily on sequence thinking have a tendency to arrive at a “best guess” as to what cause/charity/etc. ought to be prioritized, and to focus on taking the actions that are implied by their best guess (“exploiting”) rather than on actions likely to lead to rethinking their best guess (“exploring”). I would guess that this is because:
- To the extent that sequence thinking highlights opportunities for learning, it tends to focus on a small number of parameters that dominate the model, and these parameters are often the least tractable in terms of learning more (for example, the value of space colonization). It thus seems often to encourage continued debate on largely intractable topics. Cluster thinking highlights many consequential areas of uncertainty and promises returns to clearing up any of them, leading to more traction on learning and more reduction in “unknown unknowns” over time.
- Sequence thinking has a tendency to make different options seem to differ more in value, while cluster thinking tends to make it appear as though any high-uncertainty decision is a “close one” that can be modified with more learning. I believe the latter tends to be a more helpful picture.
- Cluster thinking tends to have heavier penalties for uncertainty, due to its feature of not allowing the magnitude of a model parameter to overwhelm adjustments for uncertainty. When people are promoting speculative arguments, having to contend with and persuade “cluster thinkers” seems to cause them to do more investigation, do more improving of their arguments, and generally do more to increase the robustness of their claims.
In the domains GiveWell focuses on, it seems that learning more over time is paramount. We feel that much of the effective altruist community tends to be quicker than we are to dismiss large areas as unworthy of exploration and to focus in on a few areas.
Formal framework reproducing key qualities of cluster thinking
Cluster thinking, despite its seeming inelegance, is in some ways a closer match to what I see as the “idealized” thought process than sequence thinking is. On a separate page, I have attempted to provide a formal framework describing this “idealized” thought process as I see it, and how this framework deals with extreme uncertainty of the kind we often encounter in making decisions about where to give.
According to this framework, formally combining different mental models of the world has a tendency to cap the decision-relevance of highly uncertain lines of reasoning – the same tendency that distinguishes cluster thinking from sequence thinking. For more, see my full writeup on this framework, which I have confined to another page because it is long and highly abstract.
Writeup on modeling extreme model uncertainty
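The linked writeup is the fuller statement of the framework. Purely as an illustration of the “capping” property, and not as a summary of that writeup, here is one simple combination rule of my own choosing (precision-weighting of normally distributed estimates): as a model’s uncertainty grows, its weight shrinks toward zero, no matter how extreme its point estimate.

```python
# Illustration only: combine several models' estimates of an action's value,
# each expressed as a normal distribution (mean, standard deviation), by
# precision-weighting (weight = 1 / variance). This is one simple rule with
# the "capping" property discussed above, not a summary of the linked writeup.

def combine(estimates):
    """estimates: list of (mean, std). Returns the precision-weighted mean."""
    weights = [1.0 / (std ** 2) for _, std in estimates]
    total = sum(weights)
    return sum(w * mean for (mean, _), w in zip(estimates, weights)) / total

# A robust, modest estimate plus a wildly high but extremely uncertain one:
print(combine([(10.0, 2.0), (1_000_000.0, 1_000_000.0)]))
# ~10: the speculative model barely moves the combined estimate, because its
# enormous variance gives it negligible weight.
```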
To be clear, in this section when I say “engaging in sequence thinking” I mean “working on generating and improving chains of reasoning along the lines of explicit expected-value calculations,” or more generally, “Trying to capture as many relevant considerations as possible in a single unified model of the world.” Cluster thinking includes giving some consideration and weight to the outcomes of such exercises, but does not include generating them. Many of the advantages I name have to do with the tendency of sequence thinking to underweight, or ignore, “outside views” and crude pattern-matches such as historical patterns and expert opinion, as well as “regression to normality”; sequence thinking can make adjustments for such things, but I generally find its method for doing so unsatisfactory, and feel that its greatest strengths come when it does not involve such adjustments.
Sequence thinking can generate robust conclusions that then inform cluster thinking
There are times when a long chain of reasoning can be constructed that has relatively little uncertainty involved (it may involve many probabilistic calculations, but these probabilities are well-understood and the overall model is robust).
The extreme case of this is in some science and engineering applications, when sequence thinking is all that is needed to reach the right conclusion (I might say cluster thinking “reduces to” sequence thinking in these cases, since the sequence-thinking perspective is so much more robust than all other available perspectives).
A less extreme case is when someone simply puts a great deal of work into doing as much reflection and investigation as they can of the parameters in their model, to the point where they can reasonably be assumed to have relatively little left to learn in the short to medium term. People who have reached such status have, in my opinion, good reason to assign much less uncertainty to their sequence-thinking-generated views and to place much more weight on their conclusions. (Still, even these people should often assign a substantial amount of uncertainty to their views.)
There are many times when I have underestimated the weight I ought to place on a sequence-thinking argument because I underestimated how much work had gone into investigating and reflecting on its parameters. I have been initially resistant to many ideas that I now regard as extremely important, such as the greater cost-effectiveness of developing-world as opposed to developed-world aid, the potential gains to labor mobility, and views of “long-term future” effective altruists on the most worrying global catastrophic risks, all of which appeared to me at first to be based on naïve chains of logic but which I now believe to have been more thoroughly researched – and to have less uncertainty around key parameters – than I had thought.
Sequence thinking is more favorable to generating creative, unconventional, and nonconformist ideas
I often feel that people in the effective altruist community do too little regression to normality, but I believe that most people in the world do far too much. Any thinking style that provides a “regression to normality”-independent way of reaching hypotheses has major advantages.
Sequence thinking provides a way of seeing where a chain of reasoning goes when historical observations, conventional wisdom, expert opinion and other “outside views” are suspended. As such, it can generate the kind of ideas that challenge long-held assumptions and move knowledge forward (the cases I list in the immediately previous section are some smaller-scale examples; many scientific breakthroughs seem to fit in this category as well). Sequence thinking is also generally an important component in the formation of expert opinion (more below), which is usually a major input into cluster thinking.
Sequence thinking is better-suited to transparency, discussion and reflection
I generally find it very hard to formalize and explain what “outside views” I am bringing to a decision, how I am weighing them against each other, and why I have the level of certainty I do in each view. Many of my outside views consist of heuristics (i.e., “actions fitting pattern X don’t turn out well”) that come partly from personal experiences and observations that are difficult to introspect on, and even more difficult to share in ways that others would be able to comprehend and informedly critique.
Sequence thinking tends to consist of breaking a decision down along lines that are well-suited to communication, often in terms of a chain of causality (e.g., “This action will lead to A, which will lead to B, which will lead to outcome-of-interest C if D and E are also true”). This approach can be clumsy at accommodating certain outside views that don’t necessarily apply to a particular sub-prediction (for example, many heuristics are of the form “actions fitting pattern X don’t turn out well for reasons that are hard to visualize in advance”). However, sequence thinking usually results in a chain of reasoning that can be explicitly laid out, reflected on, and discussed.
Consistent with this, I think the cost-effectiveness analysis we’ve done of top charities has probably added more value in terms of “causing us to reflect on our views, clarify our views and debate our views, thereby highlighting new key questions” than in terms of “marking some top charities as more cost-effective than others.” I have often been pushed, by people who heavily favor sequence thinking, to put more work into clarifying my own views, and I’ve rarely regretted doing so.
Sequence thinking can lead to deeper understanding
Partly because it is better-suited to explicit discussion and reflection, and partly because it tends to focus on chains of causality without deep integration of poorly-understood but empirically observed “outside view” patterns, sequence thinking often seems necessary in order to understand a particular issue very deeply. Understanding an issue deeply, to me, includes (a) being able to make good predictions in radically unfamiliar contexts (thus, not relying on “outside views” that are based on patterns from familiar contexts); (b) matching and surpassing the knowledge of other people, to the point where “broad market efficiency” can be more readily dismissed.
In my view, people who rely heavily on sequence thinking often seem to have inferior understanding of subjects they aren’t familiar with, and to ask naive questions, but as their familiarity increases they eventually reach greater depth of understanding; by contrast, cluster-thinking-reliant people often have reasonable beliefs even when knowing little about a topic, but don’t improve nearly as much with more study. At GiveWell, we often use a great deal of sequence thinking when exploring a topic (less so when coming to a final recommendation), and often feel the need to apologize in advance to the people we interview for asking naïve-seeming questions.
In order to reap this benefit of sequence thinking, one must do a good job stress-testing and challenging one’s understanding, rather than being content with it as it is. This is where the “incentives to investigate” provided by cluster thinking can be crucial, and this is why (as discussed below) my ideal is to switch between the two modes.
Other considerations
Sequence thinking can be a good antidote to scope insensitivity, since it translates different factors into a single framework in which they can be weighed against each other. I do not believe scope insensitivity is the only, or most important, danger in making giving decisions, but I do find sequence thinking extremely valuable in correcting for it.
Many seem to believe that sequence thinking is less prone to various other cognitive biases, and in general that it represents an antidote to the risks of using “intuition” or “system 1.” I am unsure of how legitimate this view is. When making decisions with high levels of uncertainty involved, sequence thinking is (like cluster thinking) dominated by intuition. Many of the most important parameters in one’s model or expected-value calculation must be guessed at, and it often seems possible to reach whatever conclusion one wishes. Sequence thinking often encourages one to implicitly trust one’s intuitions about difficult-to-intuit parameters (e.g., “value of space colonization”) rather than trusting one’s more holistic intuitions about the choice being made – not necessarily an improvement, in my view.
I’ve sometimes observed an intelligent cluster thinker, when asked why s/he believes something, give a single rather unconvincing “outside view” related reason. I’ve suspected, in some such cases, that the person is actually processing a large number of different “outside views” in a way that is difficult to introspect on, and being unable to cite the full set of perspectives with weights, returns a single perspective with relatively (but not absolutely) high weight. I believe this dynamic sometimes leads sequence thinkers to underestimate cluster thinkers.
One of my hopes for this piece is to help people better understand cluster thinking, and in particular, how one can continue to make progress in a discussion even after a seemingly argument-stopping comment like “I see no problem with your reasoning, but I’m not placing much weight on it anyway” is made.
In such a situation, it is important to ask not just whether there are explicit problems with one’s argument, but how much uncertainty there is in one’s argument (even if such uncertainty doesn’t clearly skew the calculation in one direction or another) and whether other arguments, using substantially different mental models, give the same conclusion. When engaging with cluster thinking, improving one’s justification of a probability or other parameter – even if it has already been agreed to by both parties as a “best guess” – has value; citing unrelated heuristics and patterns has value as well.
To give an example, many people are aware of the basic argument that donations can do more good when targeting the developing-world poor rather than the developed-world poor: the developing-world poor have substantially worse incomes and living conditions, and the interventions charities carry out are commonly claimed to be substantially cheaper on a per-person or per-life-saved basis. However, many (including myself) take these arguments more seriously on learning things like “people I respect mostly agree with this conclusion”; “developing-world charities’ activities are generally more robustly evidence-supported, in addition to cheaper”; “thorough, skeptical versions of ‘cost per life saved’ estimates are worse than the figures touted by charities, but still impressive”; “differences in wealth are so pronounced that ‘hunger’ is defined completely differently for the U.S. vs. developing countries”; “aid agencies were behind undisputed major achievements such as the eradication of smallpox”; etc. The function of such findings isn’t necessarily to address specific objections to the basic argument, but rather to put its claims on more solid footing – to improve the robustness of the argument.
However, there are also times in which I let sequence thinking dominate my decisions (not just my investigations), for the following reasons.
One of the great strengths of sequence thinking is its ability to generate ideas that contradict conventional wisdom and easily observable patterns, yet have some compelling logic of their own. For brevity, I will call these “novel ideas” (though a key aspect of such ideas is that they are not just “different” but also “promising”). I believe that novel ideas are usually flawed, but often contain some important insight. Because the value of new ideas is high, promoting novel ideas – in a way that is likely to lead to stress-testing them, refining them, and ultimately bringing about more widespread recognition of their positive aspects – has significant positive expected value. At the same time, a given novel idea is unlikely to be valid in its current form, and quietly acting on it (when not connected to “promoting” it in the marketplace of ideas, leading to its refinement and/or widespread adoption) may have negative expected value.
One example of this “novel ideas” dynamic is the charities recommended by GiveWell in 2006 or 2007: GiveWell at that time had a philosophy and methodology with important advantages over other resources, but it was also in a relatively primitive form and needed a great deal of work. Supporting GiveWell’s recommendations of that time – in a way that could be attributed to GiveWell – led to increasing attention and influence for GiveWell, which was evolving quickly and becoming a more sophisticated and influential resource. However, if not for GiveWell’s ongoing evolution, supporting its recommended charities would not have had the sort of expected value that it naively appeared to (according to our over-optimistic “cost per life saved” figures of the time). (Note that this paragraph is intended to give an example of the “novel ideas” dynamic I described, but does not fit the themes of the post otherwise. Our recommendations weren’t purely a product of sequence thinking but rather of a combination of sequence and cluster thinking.)
For me, a basic rule of thumb is that it’s worth making some degree of bet on novel ideas, even when the ideas are likely flawed, when it’s the kind of bet that (a) facilitates the stress-testing, refinement, and growing influence of these ideas; (b) does not interfere with other, more promising bets on other novel ideas. So it makes sense to start, run, or support an organization based on a promising but (because dependent on sequence thinking, and in tension with various outside views) likely flawed idea … if (a) the organization is well-suited to learning, refining, and stress-testing its ideas and growing its influence over time; (b) starting or supporting the organization does not interfere with one’s support of other, more promising novel ideas. It makes sense to do so even when cluster thinking suggests that the novel idea’s conclusions are incorrect, to the extent that quite literal endorsement of the novel idea would be “wrong.”
When we started GiveWell, I believed that we were likely wrong about many of the things that seemed to us from an inside, sequence-thinking view to be true, but that it was worth acting on these things anyway, because of the above dynamic. (I am referring more to our theories about how we could influence donors and have impact than to our theories about which charities were best, which we tried to make as robust as we could, while realizing that they were still quite uncertain.) We believed we were onto some underappreciated truth, but that we didn’t yet know what it was, and were “provisionally accepting” our own novel ideas because we could afford to do so without jeopardizing our overall careers and because they seemed to be the novel ideas most worth making this sort of bet on. We expected our ideas to evolve, and rather than taking them as true we tried to stress-test them by examining as many different angles as we could (for example, visiting a recommended charity’s work in the field even though we couldn’t say in advance which aspect of our views this would affect). There were other novel ideas that we found interesting as well, but incorporating them too deeply into our work (or personal lives) would have interfered with our ability to participate in this dynamic.
The above line of argument justifies behavior that can seem otherwise strange and self-contradictory. For example, it can justify advocating and acting to some degree on a novel idea, while not living one’s life fully consistently with this idea (e.g., working to promote Peter Singer’s ideas about the case for giving more generously, while not actually giving as much as his ideas would literally imply one should). When considering possible actions including “avoiding factory-farmed meat,” “giving to the most apparently cost-effective charity,” etc., I am always asking not only “Does this idea seem valid to me?” but “Am I acting on this idea in a way that promotes it and facilitates its evolution, and does not interfere with my promotion of other more promising ideas?” As such, I tend to change my own behavior enough to reap a good portion of the benefits of supporting/promoting an idea but not as much as literal acceptance of the idea would imply. I have a baseline level of stability and conservatism in the way I live my life, which my bets on novel ideas are layered on top of in a way that fits well within my risk tolerance.
Promoting a sequence-thinking-based idea in a cluster-thinking-based world leads to examining the idea from many angles, looking for many unrelated (or minimally related) arguments in its favor, and generally working toward positive evolution of the idea. The ideal, from my perspective, is to use cluster thinking to evaluate the ultimate likely validity of ideas, while retaining one’s ability to (without undue risk) promote and get excited about sequence-thinking-generated ideas that may eventually change the world.
For one with few resources for idea promotion and exploration, this may mean picking a very small number of bets. For one who expects to influence substantial resources – as GiveWell currently does – it is rational to simultaneously support/promote work in multiple different causes, each of which could be promising under certain assumptions and parameters (regarding how much value we should estimate in the far future, how much suffering we should ascribe to animals, etc.), even if the assumptions and parameters that would support one cause contradict those that would support another. When choosing between causes to support, cluster thinking – rather than choosing one’s best-guess for each parameter and going from there – is called for.
Comments
Thanks for the excellent post!
How do you incorporate specific and confident “adjustments” into your cluster thinking process, such as the Synsepalum dulcificum example I gave in Model combination and adjustment?
Great post, I expect it will be referenced often in the future.
I think getting results that extremely favor one cause or intervention by many orders of magnitude in expected value often isn’t so much a problem of sequence thinking as of only exploring the maximum potential gains for some causes but not others.
For example, if one thinks that political activity is generally more leveraged than paying for direct interventions, then evaluating cause A against cause B with the assumption that cause A interventions will be political but cause B interventions won’t can distort the comparison. Similarly, attending to impact on future generations when assessing cause X but not cause Y can produce a big skew.
But if one factors out such common influences and makes the maximalist case for plausible top picks in EV terms they won’t be ten orders of magnitude apart (in absolute value, at least, they may differ in expected sign). Then one can apply all sorts of particular arguments and lines of evidence to weight those cases for high value against one another.
There’s also the general problem of people not flagging whether they are talking about values within a model or all-things considered (perhaps because of confusion themselves, perhaps to avoid distraction in a piece, perhaps for rhetoric).
I think sequence thinking is perhaps a bit more efficient.
How do factor graphs fit into either the sequence or cluster thinking paradigms?
“I’ve suspected, in some such cases, that the person is actually processing a large number of different “outside views” in a way that is difficult to introspect on, and being unable to cite the full set of perspectives with weights, returns a single perspective with relatively (but not absolutely) high weight.”
I like this point. But it also looks very similar to rationalisation, where the response to having one argument refuted is to simply raise a new one, regardless of how important each was in forming the original conclusion.
Do you have any feeling for how to recognise the difference between rationalisation and sophisticated assimilation of perspectives in your own mind, or how to detect & interact with it in others?
Thanks for the comments, all.
Luke: the sort of “adjustment” you describe could be integrated into existing models (“I usually like Thai food, though these aren’t usual circumstances”) and/or represented as its own model (“I expect miraculin to change my tastes, and I’m quite confident in this, enough to outweigh the predictions of other models”). Or it could be integrated using a different process entirely; I meant this post to contrast two simplified models of thinking rather than to propose any particular “complete” formula for reaching decisions.
Carl: I agree that the method you suggest for comparing causes (e.g., making sure to include the maximalist case for all possibilities) will lead to a degree of convergence, but I think it’s hard to say just how much convergence. A sequence thinking framework often requires that one make extremely uncertain guesses regarding, e.g., the flow-through effects of bednets or the probability of averting existential risk, and variation in such guesses alone (even within the range of defensibility) can overwhelm other important factors in the final estimate (perhaps not by ten orders of magnitude, but by enough to cause high sensitivity to these guesses). I also think it’s worth noting that a “cluster thinking” style led me to be skeptical of various claims about massive differences even before arguments about flow-through effects were on my radar. I think cluster thinking often helps one reach reasonable conclusions without having to get as many things right (compared to sequence thinking); it allows more margin for error.
Michael Miller: my feeling is that cluster thinking is generally more efficient, in the sense of reaching reasonable conclusions with less effort/time/mental resources. Good sequence thinking requires building a careful model of the world, and taking great care not to get any particular piece badly wrong or omit a major consideration. It also tends to involve high-stakes guesses about uncertain parameters. Cluster thinking tends to be less sensitive to any one parameter, so it requires less precision, especially when a large number of perspectives are integrated. Note that the two are not distinguished by what information they use but by how they weigh different considerations against each other; I think “find which way each consideration points and limit the weight of the more uncertain ones” tends to be easier than “figure out how to integrate each consideration within a unified model.” Possibly for these reasons, cluster thinking seems to me to be more common, especially among those who have limited time to think things through.
I’m not familiar with factor graphs, so can’t answer your other question.
Helen Toner: the main thing I’d say is that detecting the difference between a justified “cluster-style” view and an unjustified, rationalized view is often not easy. I often mistake one for the other, in both directions. I’d like to see sequence thinkers recognize this difficulty more and be slower to jump to the conclusion that a particular person’s position is not worth considering, especially when there are other reasons to believe that person is credible. I usually require a lot of thought and information to decide one way or the other, and in the meantime I put an intermediate amount of weight on the person in question’s view.
Great post! This dyad of reasoning styles seems to show up in many domains, including artificial intelligence ( https://en.wikipedia.org/wiki/Neats_vs._scruffies ), economics (central planning vs. free markets), and politics (authoritarianism vs. democracy, http://www.amazon.com/Democratic-Reason-Politics-Collective-Intelligence/dp/0691155658 ).
Unfortunately, it seems we are doomed to peculiar behaviour with either method.
Let’s say we are making a cluster-thinking comparison of Charity A, which focusses on raising the likelihood of space colonisation, against Charity B, which helps people but has no expected impact on space colonisation. At the time, we believe the solar system is the only thing that exists. Nonetheless, thanks to the great expanse of the solar system, Charity A gets maximum marks (5 out of 5) for possible expected impact. But when all ‘clusters’ are combined into an overall judgement, it still comes out behind Charity B.
Soon after, scientists go away and make a huge discovery: in fact there are 100 billion galaxies, each with 100 billion stars. The universe is 10^22 times larger than we thought! We naturally decide to go back and reassess Charity A against Charity B.
However, because we are boxing in the importance of the observation ‘expected value is high because space is huge’, Charity B remains better by about the same margin. Despite what seems like an extraordinary increase in the importance of leaving Earth – however valuable we thought it was before – our decision remains the same.
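To make the capping concrete, here is a minimal Python sketch of the toy comparison above; the criteria, scores, and averaging rule are all invented for illustration and are not taken from anyone’s actual process.
# Toy illustration only: a bounded 1-5 criterion cannot register a 10^22-fold
# increase in scale, while an unbounded expected-value term can.
def cluster_score(scores):
    # Combine several 1-5 criterion scores by simple averaging (invented rule).
    return sum(scores) / len(scores)
# Invented criteria: [possible expected impact, evidence base, track record, cost-effectiveness]
charity_a_before = [5, 1, 1, 1]   # space colonisation: maxes out "possible expected impact"
charity_b        = [3, 4, 4, 4]   # helps people now: solid on the other criteria
# After the discovery, the impact criterion is still capped at 5:
charity_a_after = [min(5, 5 * 1e22), 1, 1, 1]
print(cluster_score(charity_a_before), cluster_score(charity_b))  # 2.0 vs 3.75
print(cluster_score(charity_a_after))                             # still 2.0
# An uncapped expected-value term, by contrast, scales with the discovery:
ev_a_before = 5.0
ev_a_after = 5.0 * 1e22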
On the other hand, someone who exclusively uses sequence thinking is vulnerable to Pascal’s mugging and can easily become ‘fanatical’ – impossible to convince that anything else is important no matter how many other considerations can be brought to bear.
I value this post for laying out the pros and cons of each style. Because there are so many, it’s easy to agree with the list of the considerations but still reach a different conclusion. Personally, I am most worried about scale insensitivity, both outside of the effective altruism movement and inside it. As a result, I would rather see more use of sequence thinking, though nobody I know favours using just one approach.
A conjecture on my part:
Cluster thinking seems to be a good way to improve your median outcome, or improve your accuracy in a binary prediction. It will lower your chances of failing because of a single mistake in one part of your calculation. But when it fails, it can fail hugely, by failing to give sufficient weight to an insight that was much more important than the others. In particular, its scope insensitivity means that while we should expect it to fail less often, its failures will be concentrated in the scenarios that matter most (those where your potential impact was highest). As a result, it may still lower your average/expected outcome.
Robert, I do agree that each style of thinking leads to peculiar behavior; I see each as only an approximation of the ideal thought process. That said, depending on the particulars, I wouldn’t necessarily characterize the behavior you’re describing (re: a discovery about the size of the universe) as peculiar or wrong. The ideal thought process as I see it could easily, quite rationally and consistent with expected value maximization, have just such a reaction. More at the technical supplement to the post.
For my part, I would disagree with the claim that cluster thinking tends to optimize for the median but not the expectation. Many of the examples in this post (e.g., the one about impaired judgment as well as the bit about ends vs. means) were meant to imply this by showing how sequence thinking leads to things that seem “clearly wrong/irrational” and not just “wrong most of the time.” The technical supplement also spells out why I think this.
I think sequence thinking becomes more valuable when it interacts in the appropriate way with the marketplace of ideas. I think the best combination is sequence thinking for exploration and cluster thinking for evaluation, as outlined at the end of the post, and I think this combination is likely to do well at addressing scope insensitivity issues, if a subset of people choose to explore seemingly high-priority issues and work on strengthening the case for them.
A problem with cluster thinking is that the system does not have multiple perspectives; it has the system’s beliefs about other perspectives. In sequence thinking the system can hold this set of beliefs:
{belief_1 belief_2 belief_3 belief_4 belief_5 belief_6}
In cluster thinking what we have is
{{[holds system belief_1 perspective_1]
[holds system belief_2 perspective_1]
…
[holds system belief_n perspective_1]}
…
{[holds system belief_1 perspective_k]
[holds system belief_2 perspective_k]
…
[holds system belief_n perspective_k]}}
It isn’t really other perspectives; it is what the system believes other perspectives to be, which may or may not be accurate.
Thanks for this post and your attempt at a more technical description of a model. I appreciate both, in different ways.
I like the way that this post provides an explicit discussion of the trade-offs between these different styles of approach. I had been planning to write a piece on the pros and cons of explicit models (similar to what you have called sequence thinking), as I think it is useful to have this more widely known and discussed. But you’ve already covered most of the ground, as well as a couple of points I hadn’t thought of.
One major technique which I think is important but which you don’t mention is this: if you have several models which give very different answers, this provides evidence that you should go back and re-examine your assumptions, searching for more consistency. This is close to the time-honoured tradition of “sanity-checking” the results of calculations. It can be abused, and you shouldn’t put too much confidence in the fact that several models agree if you had to massage them to do so, but it is one of the more important tools available to us in trying to integrate these different types of thinking.
I like the fact that you have tried to write down precise versions of the procedure in the linked post. As you remark, one of the general virtues of sequence thinking is that it allows more precise discussion and refinements, in a way that cluster thinking does not. Unfortunately, there are a number of unclarities and what appear to be mistakes in that write-up (for example, in the worked example you appear to conflate expected value and median, when these diverge in important ways in the example). I take the post as having value in gesturing towards the sort of way that you think we should be exploring this area, and the type of analysis you would like to see more of, but I worry that it is too far from working at the moment to be worth trying to use. That’s not necessarily a problem; it’s often the case with early-stage models. But does that match your own view of where it stands? I am happy to provide more detailed comments, and to explore how we could improve it, but I’m not sure this is the right forum for that.
Holden, thanks for writing this interesting post and taking the time to explain your approach to decision-making in detail. As informally outlined, I think the ideas here have a lot of merit. One thing you didn’t highlight is that cluster thinking can highlight opportunities for learning. When two seemingly important perspectives disagree, it often indicates that there is something important to be learned about one of the perspectives or how it should be applied. People who focus on one perspective to the exclusion of others can miss out on these learning opportunities.
Some concerns/additional thoughts about cluster thinking:
(A) I think it was interesting to see your response to Helen’s concern about rationalization in cluster thinking. It is hard to tell whether some other cluster thinker is rationalizing, but I’m at least as concerned about my own rationalization. If I have 10 arguments for my view, unless someone painstakingly refutes a few of my arguments at once (a level of attention that is rare), there’s a concern that I’ll say, “Well, sure you overcame one of the arguments for my view, but I have nine other arguments.” Going back to my comment above, it would be important for a good cluster thinker to revisit their other arguments whenever one of them gets a good pummeling.
(B) Another aspect of your informal framework (presented in the main post) that seems problematic is the focus on making decisions on the basis of certainty-weighted perspectives, to the exclusion of importance-weighted perspectives. Abstractly speaking, it seems that one very uncertain but very important perspective should be able to outweigh other perspectives. More concretely, if the arguments in favor of doing X are all of the form “not doing X would be weird,” “not doing X would be rude,” and “not doing X would violate company policy,” but the argument against doing X is of the form “I can imagine a way in which doing X would cause a nuclear meltdown,” I want a decision procedure that allows me to not do X.
(C) Perhaps related to (B), suppose you’re ranking giving opportunities in terms of intervention quality, organization quality, room for more funding, and monitoring and evaluation (toy example). Should we give to a highly transparent opera charity run by great people that has lots of room for more funding rather than a vaccination charity that is just average (or maybe pretty good) along the other dimensions? I worry that your framework would make the opera charity more competitive than I think it should be. I guess the devil would be in the details.
A couple of issues/questions regarding the technical framework you’ve developed:
I believe that your formal framework (presented here: https://www.givewell.org/modeling-extreme-model-uncertainty) puts more weight on a model with low variance and low expected value than on a model with a lot of variance but a clearly higher expected value. And sometimes that seems like a mistake.
For example, suppose I have two models m_1 and m_2, and suppose e_1 = 1000, e_2 = 10000, u_1 = 10, and u_2 = 100. Using your formula, I get that the expectation of taking the bet is:
[1000/(10^2) + 10000/(100^2)] / [1/(10^2) + 1/(100^2)] = 1089 (approximately)
So my overall expected value of this bet is dominated by m_1, but intuitively it should be dominated by m_2, which is saying the impact is almost definitely greater than 1000. Put another way, your framework sees a huge difference between this case and the case where the u_i on the two models are switched, but I don’t think that difference is justified.
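For anyone who wants to check the arithmetic, here is a minimal Python sketch of the inverse-variance combination in the formula above, using these hypothetical numbers (the function name is mine):
# Precision-weighted (inverse-variance) combination: sum(e_i/u_i^2) / sum(1/u_i^2).
def combine(expectations, uncertainties):
    weights = [1.0 / u**2 for u in uncertainties]
    return sum(w * e for w, e in zip(weights, expectations)) / sum(weights)
print(combine([1000, 10000], [10, 100]))   # ~1089: dominated by the low-variance m_1
print(combine([1000, 10000], [100, 10]))   # ~9911: switching the u_i flips the dominance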
A second issue that I only realized after reading this a few times was that the models you’re describing aren’t necessarily incompatible ways of looking at a question; instead they are sort of complementary. This makes sense in the informal framework—and reinforces the “sanity check” phenomenon that Owen and I describe above. However, it does seem fairly different from what I would think is the typical Bayesian way of thinking about uncertainty over multiple models. Usually, the models/hypotheses one would consider would be mutually exclusive rather than complementary. Operating in that kind of framework, my first instinct for calculating overall expected value would be to have a probability distribution Pr over the m_i and look at:
Pr(m_1)e_1 + … + Pr(m_n)e_n
Can you say more about why you didn’t approach the problem in this way? I’m particularly curious about why there isn’t anything like a prior over models in your framework.
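For contrast, a sketch of the probability-weighted alternative described above; the credences are purely illustrative:
# Mixture-style combination: expected values weighted by a credence in each model.
def model_average(expectations, credences):
    assert abs(sum(credences) - 1.0) < 1e-9
    return sum(p * e for p, e in zip(credences, expectations))
# With illustrative credences of 0.5 on each model, the same two models combine
# to 5500, versus ~1089 under the inverse-variance formula.
print(model_average([1000, 10000], [0.5, 0.5]))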
To clarify a bit: I generally agree with you about the strengths and weaknesses of cluster thinking, and your diagnosis that the EA crowd is more sequence-thinking oriented than lots of people. I’m more concerned about the formal model than the informal model. With the informal model, my main concern is that using it leads to rationalization.
I think Nick makes an important distinction in how we may think about model combination. Two natural and opposed approaches are:
(1) Assume that one and only one of the models is correct, and update in a Bayesian manner.
(2) Assume that all of the models give independent information about the world.
Of course in fact neither of these can be right. It is certain that none of our models is an entirely accurate description of the world. But if we have two models which tell us radically different things, each with a similarly high level of confidence, the appropriate conclusion is not that the true value lies somewhere in the middle, and each was unlucky with the amount of noise; rather we should think that one of the models is badly wrong.
Questions of how to combine models are studied in machine learning. They’re mostly looking at classification tasks, and they don’t appear to have any methods we could port over to estimation tasks directly, but it seems worth being aware of the area, as there may be some shared insights:
http://en.wikipedia.org/wiki/Ensemble_learning
The other problem with cluster thinking is (I believe) that the mind seeks coherence, and while it may hold beliefs about what others believe (other points of view), it generally accepts or rejects those points of view when it comes to reasoning. Paul Thagard has done some research on coherence in general, and Jean Piaget did some work on the assimilative and accommodative processes involved in accepting new information and changing one’s opinion. In fact, in his book The Development of Thought (1978, 1985), he states that “A system of assimilation tends to feed itself,” which I take to mean that a cognitive system will seek to coherently integrate new information, accepting information compatible with what it already believes and rejecting information that is incompatible with its beliefs.
I think coherence will be an issue if one tries to seriously implement cluster thinking.
By the way, I have no preference between sequence and cluster thinking; I think they are both useful new concepts that offer another lens on mental models. These are just my observations.
Great post! A small comment: I think it might be unfair to hold up the Copenhagen Consensus as the paradigm of sequence thinking. While they give cost-benefit analysis a significant role in their evaluations, it’s always followed by a qualitative discussion of the strengths and weaknesses of the model. Moreover, their overall ranking is based on the judgement of a panel of leading economists, using the CBA as an input, but often not ranking in order of CB ratio. This looks like cluster thinking to me!
Thanks for the further comments, all.
Michael Miller: I’m still not following you. I didn’t mean to define cluster thinking via “forming models of what other people think” but rather “using different mental models with different implications.” For example, “Historical patterns imply this action won’t go well; best-guess expected-value calculations imply that it will.”
Owen, thanks for all the thoughts. A few responses:
Re: complementary vs. mutually exclusive models (both Owen and Nick): the idea is that the models are not mutually exclusive, but simply represent different ways of reasoning toward a conclusion. I believe that in real-world decision-making (as well as in many algorithmic prediction systems), it is more common to have multiple ways of looking at a problem (which may be correlated and overlapping) than it is to be deciding between multiple mutually exclusive models of the world, one of which is strictly best. If doing the latter, I agree that Nick’s approach would be right, but for the former case, it is important to use all the information we have about the different levels of model uncertainty for different models, and the approach I’ve proposed seems to do that better than simple probability-weighting of conclusions.
Nick:
On using mental models: let’s take a real example. Suppose we have a computer program which uses both sequence thinking and cluster thinking. How would we represent their models? I’ll use the Premise Language for shorthand (see http://premiseai.tumblr.com):
; Sequence Thinking
(let believe :who :what)
(let exists :what)
(let not-exists :what)
[believe :who Buddhists :what [not-exists :what God]]
[believe :who Atheists :what [not-exists :what God]]
[believe :who Christians :what [exists :what God]]
[believe :who Muslims :what [exists :what God]]
[believe :who Jews :what [exists :what God]]
[believe :who Agnostics :what (or [exists :what God]
[not-exists :what God])]
[believe :who Hindus :what [exists :what Gods]]
; Cluster Thinking
(let viewpoint :which :what {})
[viewpoint :which Buddhism
:what {
[believe :who Buddhists :what [not-exists :what God]]
[believe :who Atheists :what [not-exists :what God]]
[believe :who Christians :what [exists :what God]]
[believe :who Muslims :what [exists :what God]]
[believe :who Jews :what [exists :what God]]
[believe :who Agnostics :what (or [exists :what God]
[not-exists :what God])]
[believe :who Hindus :what [exists :what Gods]]
}]
[viewpoint :which Atheism
:what {
[believe :who Buddhists :what [not-exists :what God]]
[believe :who Atheists :what [not-exists :what God]]
[believe :who Christians :what [exists :what God]]
[believe :who Muslims :what [exists :what God]]
[believe :who Jews :what [exists :what God]]
[believe :who Agnostics :what (or [exists :what God]
[not-exists :what God])]
[believe :who Hindus :what [exists :what Gods]]
}]
[viewpoint :which Christianity
:what {
[believe :who Buddhists :what [not-exists :what God]]
[believe :who Atheists :what [not-exists :what God]]
[believe :who Christians :what [exists :what God]]
[believe :who Muslims :what [exists :what God]]
[believe :who Jews :what [exists :what God]]
[believe :who Agnostics :what (or [exists :what God]
[not-exists :what God])]
[believe :who Hindus :what [exists :what Gods]]
}]
[viewpoint :which Islam
:what {
[believe :who Buddhists :what [not-exists :what God]]
[believe :who Atheists :what [not-exists :what God]]
[believe :who Christians :what [exists :what God]]
[believe :who Muslims :what [exists :what God]]
[believe :who Jews :what [exists :what God]]
[believe :who Agnostics :what (or [exists :what God]
[not-exists :what God])]
[believe :who Hindus :what [exists :what Gods]]
}]
[viewpoint :which Judaism
:what {
[believe :who Buddhists :what [not-exists :what God]]
[believe :who Atheists :what [not-exists :what God]]
[believe :who Christians :what [exists :what God]]
[believe :who Muslims :what [exists :what God]]
[believe :who Jews :what [exists :what God]]
[believe :who Agnostics :what (or [exists :what God]
[not-exists :what God])]
[believe :who Hindus :what [exists :what Gods]]
}]
[viewpoint :which Agnosticism
:what {
[believe :who Buddhists :what [not-exists :what God]]
[believe :who Atheists :what [not-exists :what God]]
[believe :who Christians :what [exists :what God]]
[believe :who Muslims :what [exists :what God]]
[believe :who Jews :what [exists :what God]]
[believe :who Agnostics :what (or [exists :what God]
[not-exists :what God])]
[believe :who Hindus :what [exists :what Gods]]
}]
[viewpoint :which Hinduism
:what {
[believe :who Buddhists :what [not-exists :what God]]
[believe :who Atheists :what [not-exists :what God]]
[believe :who Christians :what [exists :what God]]
[believe :who Muslims :what [exists :what God]]
[believe :who Jews :what [exists :what God]]
[believe :who Agnostics :what (or [exists :what God]
[not-exists :what God])]
[believe :who Hindus :what [exists :what Gods]]
}]
In both cases, sequence thinking and cluster thinking, it is the cognitive system holding the belief sets. Ergo, it is the cognitive system’s belief set about another party’s beliefs, and not the other party’s beliefs themselves.
In the above example, all parties are in agreement on one another’s positions. However, suppose one party holds a viewpoint which differs from the cognitive system’s representation of their viewpoint. For example (let’s use the Agnostics, since they don’t care):
[viewpoint :which Agnosticism
:what {
[believe :who Buddhists :what [exists :what God]]
[believe :who Atheists :what [exists :what God]]
[believe :who Christians :what [not-exists :what God]]
[believe :who Muslims :what [not-exists :what God]]
[believe :who Jews :what [not-exists :what God]]
[believe :who Agnostics :what [exists :what God]]
[believe :who Hindus :what [not-exists :what Gods]]
}]
Suppose the cognitive system held the above beliefs about agnosticism. It would be wrong from our perspective. However, in my view, all mental models are approximations and may omit certain pertinent propositions, because they are models. And we may get another’s viewpoint wrong. So the cognitive system winds up using its model of other viewpoints for reasoning, however accurate or inaccurate. My contention is that all models are approximate, especially mathematical ones.
That was my point.
I’m coming at this from an implementation perspective.
How can you use a model if you don’t form it first? Where do the models come from, then?
Moreover I’m coming at this from a constructivist implementation perspective. My approach is to form models within an artificial cognitive system. One has to form the model before it can be used for reasoning or decision making.
It can be argued that even people form models before they use them, and that their models can be more or less accurate.
@Owen Cotton-Barratt: I’m late to the party on this one, but there are some ensemble methods for regression in machine learning. Gradient boosted regression trees, regression forests, and AdaBoost are three such methods (AdaBoost is typically used for classification but has been generalized to regression contexts also). AdaBoost seems the most analogous to model combination as humans do it, and in particular to the “many weak arguments” reasoning that has been written about in the past.
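For anyone who wants to experiment, here is a minimal scikit-learn sketch using the regression ensembles mentioned above; the synthetic data is purely illustrative and has nothing to do with the model-combination problem discussed in the post.
# Toy demonstration of ensemble regression methods in scikit-learn.
import numpy as np
from sklearn.ensemble import (AdaBoostRegressor,
                              GradientBoostingRegressor,
                              RandomForestRegressor)
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))                     # synthetic inputs
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)   # noisy synthetic targets
for model in (AdaBoostRegressor(), GradientBoostingRegressor(), RandomForestRegressor()):
    model.fit(X, y)
    print(type(model).__name__, model.predict([[5.0]]))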
Ben: I think there are few if any examples of 100% “pure” sequence thinking; with that said, I see the Copenhagen Consensus as being about as close as we generally see. All of the analysis and reasoning they publish is focused around their cost-benefit ratio analysis, and generally speaking, a significant portion of the discussion is devoted to discount rates, to which the results are often highly sensitive. It’s true that at the end of the process, there is a panel that changes some of the conclusions, but the reasoning behind the changes does not seem to be disclosed, and the cost-effectiveness estimates seem to be at the center of the process.
Holden, thanks for the response. Here are my comments on the formal model and worked example. I’ll separate them into different posts for legibility.
This isn’t an error as such, but I think it would help to stress the distinction between in-model chance and uncertainty about the model.
It isn’t clear to me that it’s right to pull all of the uncertainty about the model into F_i. I quite like keeping a distinction between uncertainty about model parameters (which you deal with via F_i or similar), and uncertainty about the possibility that the model is fundamentally incorrect.
One reason to do this is that there is no clean line between what constitutes one or two models.
* We might have two approaches for modelling a variable, A and B.
* Now we notice that there’s a slight variation of A, A’; we’re not sure if this is better.
* Taking the geometric mean of A, A’, and B may give quite a different answer to taking the geometric mean of A and B, although really we’ve gained very little information.
Note that A and A’ are giving far from independent information. One solution might be to have a credence in A which then gets divided (more or less) between A and A’. Better yet might be keeping track of independence.
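A quick numerical illustration of this point, under the Gaussian setup from the supplement (the specific numbers are mine): for Gaussian models, the renormalized geometric mean has the same mean as the product, namely the precision-weighted average of the model means, so adding a near-duplicate A’ effectively double-counts A.
# Adding a near-duplicate model A' shifts the combined mean, despite adding
# almost no new information. Numbers are illustrative.
def combined_mean(means, sds):
    weights = [1.0 / s**2 for s in sds]
    return sum(w * m for w, m in zip(weights, means)) / sum(weights)
A, A_prime, B = (0.0, 1.0), (0.1, 1.0), (10.0, 1.0)   # (mean, sd) for each model
print(combined_mean([A[0], B[0]], [A[1], B[1]]))                          # 5.0
print(combined_mean([A[0], A_prime[0], B[0]], [A[1], A_prime[1], B[1]]))  # ~3.37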
The model combination method of taking the geometric mean has weak justification. I agree that invariance under future Bayesian updates is an attractive property, as is accounting for the fact that the models may not be independent.
However, it’s claimed “The geometric mean is the only way to do this while also treating all the models symmetrically.” Is there a proof of this, or a reference?
It seems like the method of assuming that exactly one of your models is correct will also work, so long as you update your credences over which is the correct model as well as updating your model. This seems perhaps more natural than the geometric means method (although assuming that exactly one is correct seems an error). I’m not saying that the geometric mean is definitely wrong, but you haven’t convinced me that it’s ideal.
Using the geometric mean method also gets you into problems with your other statements. You said:
“Using non-Gaussian distributions and/or other combination methods would complicate the actual formula for calculating overall expected value, but in general would not change the qualitative picture: when combining two probability distributions, it is robustly true that a “fatter” distribution will cause less of an update from the distribution it is combined with, and that a sufficiently “fat” distribution (approximating constant probability density) will cause negligible such updating regardless of where its midpoint lies.”
With the method of taking geometric means rather than multiplying together the probabilities, the final claim is false. If you take the geometric mean of a sensible distribution with one fat enough to be essentially flat over the area in question, you’ll end up close to taking the square root of the probability density of your original distribution (and renormalizing). With a distribution with the right kind of tail, this could make it much more thick-tailed and shift the mean up considerably (indeed arbitrarily far).
For example, if the probability density falls off as the inverse square of the value, then its square root is an improper distribution, so precisely how fat the fat distribution is carries a lot of weight.
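To spell out the inverse-square case (a worked sketch of the claim above, with the tail behaviour made explicit; none of this is from the original write-up): suppose a model’s density has p(x) \propto x^{-2} on [1, \infty), which is normalizable. Its square root is not:
\int_1^\infty \sqrt{p(x)}\,dx \propto \int_1^\infty x^{-1}\,dx = \lim_{T \to \infty} \ln T = \infty.
So taking the geometric mean with an essentially flat distribution, which roughly amounts to taking the square root of the density and renormalizing, yields an improper distribution, and precisely where the flat distribution stops being flat ends up carrying a great deal of weight.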
Regarding the way your example uses medians and calls them expectations, you said: “On means vs. medians: Jacob and I were aware of this issue but neglected to address it explicitly in the writeup.”
I’m afraid I can’t really see how this is defensible. It means that the example is misleading people instead of clarifying. You might counter that it helps to demonstrate the point you think is important, but it really gives people an illusion that they understand what’s going on. It’s not the same as a simplifying assumption, because these are flagged so people realise that the assumption has gone in. Instead it’s been slipped in silently.
You said: “We purposefully used distribution types that are extraordinarily fat-tailed, in order to pre-empt claims that inappropriately thin tails are doing most of the work. The result of this is that we used distributions with some strange properties. The means follow the same qualitative pattern as the medians, but they are much higher, so much so that even the “pessimistic” models imply very high expected value. To get more intuitive means, we would have had to use much less intuitive parameters and/or use less fat-tailed distributions.”
I think there were four reasonable approaches here:
1. Produce a toy example which uses thin-tailed distributions.
2. Produce an example with thick-tailed distributions, and explain why the mean values are higher than expected. (This is a key feature of heavy tailed distributions which deserves to be more widely understood!)
3. Do both of the above – have a quick proof-of-concept example with the thin-tailed distributions, and an example to show the effects you get with thick-tailed distributions.
4. Post the example you did, with a disclaimer that the technical details are not correct because of the means/medians issue, but that you’ve simplified and don’t think this changes things substantially.
Actually, do you have the numbers that come out of using expectations in your worked example? I would have loved to see these. I haven’t put this into any kind of stats package, but I think that the geometric mean approach will mean that expectations are rather more sensitive to uncertain models than the medians, for the reasons I outlined in the comment above.
@Ben Kuhn: Thanks for the search terms! I suspected there was something more relevant out there.
What we’re trying to do here isn’t quite a regression task (we generally don’t have training data that we believe; we’re using some other process for weighing the relative merits of different models), although it’s closer to that than to a classification problem.
From some quick reading, it seems that the analogous procedure to Adaboost in our context would be to weight the various models according to how much we believe them, take a simple weighted average of the corresponding distributions F_i, and then use the median value of this combination. Does that sound right to you?
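If it helps, here is a rough Python sketch of that procedure (the weights and model parameters are made up; the median is found by numerically inverting the mixture CDF):
# Weighted average of the model distributions F_i, then take the mixture's median.
from scipy.stats import norm
from scipy.optimize import brentq
models = [norm(loc=1000, scale=10), norm(loc=10000, scale=100)]  # illustrative F_i
weights = [0.7, 0.3]                                             # illustrative degrees of belief
def mixture_cdf(x):
    return sum(w * m.cdf(x) for w, m in zip(weights, models))
# The median is where the mixture CDF crosses 0.5.
median = brentq(lambda x: mixture_cdf(x) - 0.5, 0, 1e6)
print(median)   # a bit above 1000 here, since most weight sits on the first model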
@Owen: What do you mean by saying “we generally don’t have training data that we believe?” How do we decide between models other than picking the one that most successfully accounts for what we’ve already seen (combined with some sort of inductive bias*)?
* See http://en.wikipedia.org/wiki/Inductive_bias
@Ben: I suppose I mean that the inductive bias, or something like it, is doing a lot of work. We often build causal models, where we have data to support the different parts of the model, but not for the thing as a whole (because we’re trying to predict something that we’ve never been able to measure).
What are some instances in which expert opinion is NOT an input into cluster thinking?
@Owen:
Here’s a reference for the geometric mean derivation:
http://projecteuclid.org/euclid.aos/1176349934
The formalization of the property claimed is “external Bayesianity”; my original claim, which Holden cites in his post, is true except in some very degenerate cases.
The latter part of your comment is correct, though. We initially were taking the product rather than the geometric mean, and forgot to modify that paragraph to account for the change.
Thanks for the further comments, all.
Owen:
almondguy: cluster thinking could rely on a number of historical patterns or general heuristics without incorporating expert opinion. For example, I might believe team X is likely to win a baseball game because it wins more than half its games, its opponent loses more than half its games, its starting pitcher has a winning record, and it’s playing at home (and the home team wins more than half the time); that’s 4 “outside views” without detailed conceptual models or expert opinion incorporated.
@ Jacob: Thanks for the reply and the link.
Re. the geometric mean derivation, it seems to me that a key assumption in that paper is that the pooling operator itself isn’t updated in light of new information. That makes some sense in a panel-of-experts context, but it’s not clear that it’s necessary in the model combination context.
For instance, consider the naive “Bayesian updating” of different models where you also update your credence in which one of them is correct: this seems to have the property analogous to ‘external Bayesianity’, but it achieves it by altering the weighting that the different models get in the combination.
@ Holden: thanks for those replies. I agree that tracking covariance may well be the best way forward for dealing with dependence between models (although I don’t think I know exactly how to do it).
The means/medians in the worked example are particularly interesting. It’s not clear that the means support the same qualitative point as the medians. When you made the first model wildly more optimistic and also wildly more uncertain, that lowered the median but more than doubled the mean.
Owen: the key qualitative point, in my view, is what one obtains after model combination in different cases. When using either the mean or the median, a wildly more optimistic and uncertain model has little effect on the result after combination, while a more robust model greatly increases the expected value.
BTW, Jacob pointed out to me that using the geometric mean assumes a “high degree of dependence” between models; an independence assumption is implicit only when using the product method of combination.
I really like this post.
Some technical comments are below. (I wouldn’t be at all surprised if I’ve missed/misunderstood something/made a mistake somewhere, but for simplicity I’m dropping that caveat throughout. Also, let me know if you want me to write up the details.)
1. In the technical supplement, you discuss the estimate where you weight your various models by the reciprocals of their variances. If your models m_i are uncorrelated (or in practice sufficiently well approximated as uncorrelated) then that is the optimal linear weighting of your models in the sense of minimizing variance of the combined estimate. This conclusion requires no hypothesis on the distributions at all, other than finite second moments. In particular, it does not require a Gaussian assumption.
2. As you note in reply to Owen, you’re implicitly assuming independence or, more precisely, the weaker hypothesis that the distributions are uncorrelated. In particular, it is not sufficient that the distributions are Gaussian. It would be good to make this explicit. So e.g. the sentence “If all the F_i = N(e_i,u_i)… the resulting probability distribution has expectation…” is false as written (i.e. it depends on that implicit assumption).
3. To generalize your formula to support correlated models, let V be the n×n covariance matrix of the n models (so your formula handles the case where V is diagonal), and let e = (e_1, …, e_n) be the vector of expected values. Writing W_{ij} for the entries of V^{-1}, the expected value of the variance-minimizing linear weighting of the models is (\sum_{i,j} W_{ij} e_j) / (\sum_{i,j} W_{ij}), i.e. the sum of the entries of the vector V^{-1} e, normalized by the sum of the entries of V^{-1}. The only hypotheses required here are that V is non-singular and all second moments are finite.
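A small numpy sketch of the generalization in point 3, for concreteness (the covariance matrix here is invented): the combined estimate is (1^T V^{-1} e) / (1^T V^{-1} 1), which reduces to the inverse-variance formula when V is diagonal.
# Variance-minimizing linear combination of correlated model estimates.
import numpy as np
e = np.array([1000.0, 10000.0])        # model expected values
V = np.array([[100.0,   500.0],        # invented covariance matrix: u_1^2 = 100,
              [500.0, 10000.0]])       # u_2^2 = 10000, with assumed positive correlation
W = np.linalg.inv(V)
estimate = (W @ e).sum() / W.sum()     # (1^T V^{-1} e) / (1^T V^{-1} 1)
print(estimate)   # ~604: with this much positive correlation the noisier model gets negative weight
# Zeroing the off-diagonal terms recovers the diagonal (inverse-variance) answer, ~1089.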