Counterfactual Thinking: Think like a robot can’t

A counterfactual is a “what-if” scenario in which we consider what could have been or what could happen, rather than just what actually happens. What if I had never met my partner? What if the US had never invaded Iraq? How might our customers react to a price increase?

Thinking in counterfactuals is a quintessential exercise in human creativity. By imagining what is possible or what could have been under different circumstances, we can unlock new solutions, better evaluate past decisions, and uncover deeper explanations of the world than we could by analyzing only what happens.

No, AI isn’t about to take over the world

To appreciate the power of counterfactual thinking, consider the capabilities and limitations of artificial intelligence (“AI”) technologies, specifically generative chatbots such as ChatGPT.

These chatbots leverage intricate machine learning (“ML”) networks to simulate human conversation. For instance, if we feed a chatbot millions of lines of dialogue from the Internet—including questions, answers, jokes, and articles—it can then generate new conversations based on patterns it has learned and convincingly mimic human interactions, using mindless and mechanical statistical analysis.

However, despite exponential improvements, AI chatbots struggle mightily with genuine counterfactual thinking. Unlike humans, their ability to explain the world or dream up entirely new scenarios is constrained by their explicit programming.1 They cannot disobey, or ask themselves a different question, or decide they would rather play chess. Instead, they generate convincing responses to prompts by recombining patterns from their training data. Until we can fully explain how human creativity works—a milestone we are currently far from reaching—we won’t be able to program it, and AI will remain a remarkable but incomplete imitation of human-level creative thought.2

“Becoming better at pretending to think is not the same as coming closer to being able to think. … The path to [general] AI cannot be through ever better tricks for making chatbots more convincing.”

David Deutsch, The Beginning of Infinity (2011, pgs. 157-158)

Contrary to the popular AI “doomsday” paranoia, this idea paints a hopeful picture. While digital systems continue to automate routine tasks3—such as bookkeeping, analytics, manufacturing, or proofreading—our counterfactual abilities enable us to push the boundaries of innovation and creativity, with AI as our aid, not our replacement.

We should, therefore, dedicate ourselves to solving problems that require our unique creative and imaginative powers—to design solutions that even the most powerful AI cannot. We did not invent the airplane, nuclear bomb, or iPhone by regurgitating historical data. We imagined good explanations for how they might work, then we created them!

Error-correcting with counterfactuals

Over countless generations of genetic evolution, our brains have developed remarkable methods of learning. The first, more “direct” method of learning is through rote trial-and-error, which can help monkeys to figure out how to crack nuts or chess players to devise winning strategies.

The second, which is a human specialty, is through simulation—using hypothetical scenarios to evaluate potential solutions in our minds. Because blind trial-and-error can sometimes lead to tragedy, an effective simulation is often preferable, and sometimes necessary.4 This strategy is evident in flight training, surgical practice, war games, and simulations of nuclear reactions and natural disasters.

In fact, counterfactuals are crucial to the knowledge creation process. It always starts with creative guesswork (counterfactuals) to imagine tentative solutions to a problem, followed by criticism of those hypotheses to correct or eliminate bad ones. We evaluate a candidate theory by assuming that it is true (a counterfactual), then following it through to its logical conclusions. If those conclusions conflict with reality, then we can refute the theory. If we fail to refute it, we tentatively preserve it as our best theory, for now.

Let’s try it out with a problem of causality. We’ve all heard that “correlation does not imply causation,” but our instinct to quickly explain things in terms of linear cause-effect narratives still misleads us. To truly establish causality, we must turn to counterfactual reasoning. In general, we should conclude that an event A causes event B if, in the absence of A, B would tend to occur less often.5

Consider the claim that vaccinations cause autism in children, a tempting headline for the conspiracy-minded. Following our counterfactual logic above: if this theory were true, we should expect autism to be more common among vaccinated children. However, the evidence suggests that rates of autism are essentially equivalent between vaccinated and unvaccinated children.6 In reality, vaccine administration and the onset of autism simply happen (coincidentally) around the same age.

“The science of can and can’t”

A thrilling application of counterfactuals comes from physics, where theoretical physicists David Deutsch and Chiara Marletto are pioneering “constructor theory,” a radical new paradigm that aims to rewrite the laws of physics in terms of counterfactuals, statements about what is possible and what is impossible.

Traditional physical theories, such as Einstein’s general relativity and Newton’s laws, have focused on describing how observable objects behave over time, given a particular starting point. Consider a game of billiards. Newton’s laws of motion can predict the paths of the balls after a cue strike, based on initial positions and velocities. However, they remain silent on why a ball can’t spontaneously jump off the table or turn into a butterfly mid-motion.

Constructor theory goes a level deeper. Instead of merely predicting the balls’ trajectories, it might explain why certain trajectories (such as flying off as a butterfly) are impossible given the laws of physics. By focusing on the boundaries of what is possible, counterfactuals enable physicists to paint a much more complete picture of reality.

Using counterfactuals, constructor theory has offered re-formulated versions of the laws of thermodynamics, information theory, quantum theory, and more.7

***

In summary, we should embrace our beautifully human capacity to imagine worlds and scenarios that do not exist. Counterfactual thinking can spark creativity and innovation, help us reflect on the past, enable better critical evaluations, and even reimagine the laws of physics. Plus, it is the best defense we have against automating ourselves away!

What world will you dream up next?

The Normal Model: Ringing the bell… carefully

The probability distribution of a random variable defined by a standard bell-shaped curve is known as the normal distribution, which has a meaningful central average (or “mean”) and increasingly rare deviations from the mean. It is a symmetrical distribution in which we expect any random observation to be equally likely to fall below or above the average.
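For the mathematically inclined, the bell shape has a precise form: a normal distribution with mean $\mu$ and standard deviation $\sigma$ has the density

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)},$$

which peaks at the mean and falls off symmetrically (and quickly) on either side.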

Many variables in the real world are normally distributed, or approximately so. A few well-known examples include human height, birth weight, bone density, cognitive skill, job satisfaction, and SAT scores.1

The normal distribution is one of the most powerful statistical concepts to understand, as the properties of normally distributed processes can be modeled and analyzed with well-established statistical methods and readily available software. These tools allow us to make judgments, inferences, and predictions based on data and to quantify the risk around our hypotheses.

If we are not careful, however, the normal model can lead us into grave errors (including the 2008 financial crisis, which we will explore). But first, we should note that although normally distributed phenomena are common in nature, many processes, especially those taking place in complex systems (such as economies), follow distinctly non-normal distributions and often feature a long right-hand “tail” (such as the distribution of individual incomes). We cannot just blindly use the normal model without understanding the distribution of the underlying data and adjusting if necessary.2

However, that is not the full story.

Bells on bells on bells

The normal model is not, in fact, limited to use only with underlying data that is normally distributed.

Consider a set of source data of 1,000 flips of a coin, where heads and tails each occurred exactly 500 times (as we would expect from a fair coin). This data is clearly not normally distributed (see below). So, are the typical statistical tools useless?

The answer is no.

To see why, let’s take a random sample of, say, 100 flips from the source data, and calculate the fraction of flips that are heads (the sample mean). We expect this percentage to be 50% because that is the population mean of the source data, but, of course, there will be random variation in our sample such that not every sample mean will be exactly 50%.

If we continue to take random samples of 100 flips—say, 500 such samples—and plot the distribution of all the sample means, the distribution of the sample averages will be approximately normal, even though the underlying population data is not normal! This phenomenon is known as the central limit theorem.
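Here is a minimal sketch of that experiment in Python (my own illustration using only the standard library, with the sample sizes from the example above):

```python
import random
from collections import Counter

# Source data: 1,000 flips with exactly 500 heads (1) and 500 tails (0).
population = [1] * 500 + [0] * 500

# Draw 500 random samples of 100 flips each and record each sample mean
# (the fraction of heads in that sample).
sample_means = []
for _ in range(500):
    sample = random.sample(population, 100)
    sample_means.append(sum(sample) / 100)

# Bucket the sample means to reveal the (approximately normal) bell shape.
histogram = Counter(round(m, 2) for m in sample_means)
for value in sorted(histogram):
    print(f"{value:.2f} {'#' * histogram[value]}")
```

Printing the buckets as rows of # characters produces a rough bell centered on 0.50, exactly as the central limit theorem predicts.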

Incredibly powerful in statistics, this theorem explains why bell-shaped distributions are so useful, even though source data sets are rarely perfectly normally distributed: the sample means of just about any process subject to some degree of randomness become approximately normally distributed as the samples grow. This occurs because observed data often represent the accumulation of many small factors, such as how our physical traits (e.g., height, birth weight) emerge from the results of millions of random “copy” operations from our parents’ DNA.3

The central limit theorem enables a wide range of statistical analysis and inference, allowing us to ground our decision making in a solid mathematical foundation.

A truly significant bell

The twin tools of random sampling and statistical analysis using the normal model are widely used and remarkably handy.

Perhaps most significantly, the normal distribution lies at the heart of the scientific method. Because we expect that the observed effects in, say, a clinical trial for a new drug (or any scientific experiment) will tend towards a normal distribution, we can assess how “significant” our observed results are by estimating the probability of observing an outcome that extreme if our drug actually had no effect (the “p-value”). If the normal model tells us that the p-value is very low (say, below 5%), then our trial provides statistically significant support for the drug’s effectiveness.
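As a rough sketch (with invented numbers, not data from any real trial), the normal model turns an observed effect into a p-value roughly like this:

```python
from statistics import NormalDist

# Hypothetical trial summary (illustrative only): patients on the drug improved
# by 2.1 points on average; under the "no effect" hypothesis we'd expect a mean
# improvement of 0 with a standard error of 0.9 points.
observed_effect = 2.1
standard_error = 0.9

# How extreme is the observed result if the drug truly had no effect?
z = observed_effect / standard_error
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value

print(f"z = {z:.2f}, p = {p_value:.4f}")       # here, p is roughly 0.02
```

A p-value near 0.02 would clear the conventional 5% bar, so this (made-up) trial would be reported as statistically significant.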

Manufacturers and engineers use the normal model to set control limits for evaluating some measure of system performance. Observations falling outside the control limits alert the system owner that there could be a problem, since we should expect, under the normal model, that such extreme observations are exceedingly unlikely if the system were functioning normally. We should be thankful that these types of controls exist whenever we board a plane or buy a car.

The normal model should also ring a bell (pun intended…) every time we see the results of a political poll. Attached to the headline poll result should be a “margin of error,” which describes the amount of potential error expected around those results, given that random sampling involves variation that follows a normal distribution. For example, a poll might show that the Republican candidate has 48% support, with a margin of error of plus-or-minus 3%, implying the “true” value could be anywhere between 45% and 51%.4
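Under the hood, that margin of error comes straight from the normal model. For a poll of $n$ respondents reporting a proportion $p$, the 95% margin of error is approximately

$$\text{MOE} \approx 1.96\sqrt{\frac{p(1-p)}{n}},$$

so a poll of about 1,000 people with $p$ near 0.5 gives $1.96 \times \sqrt{0.25/1000} \approx 0.031$, or roughly plus-or-minus 3 percentage points.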

2008 was not normal

Financial institutions and regulators rely heavily on applications of the normal distribution through value at risk (“VAR”) models, which they use to quantify financial risk exposure, establish tolerable risk levels, and assess whether there is cause for alarm.

VAR models provide an estimate of the minimum financial losses that should be expected to occur some percentage of the time over a certain time period. For example, an investment firm might estimate that, given an assumed (normal) distribution of their portfolio’s potential returns over the next month, they should expect a 5% probability that they will suffer losses of at least $25m (the “value at risk”).5
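Here is a minimal sketch of that kind of calculation in Python, using invented portfolio assumptions rather than any real firm’s figures:

```python
from statistics import NormalDist

# Illustrative assumptions: monthly portfolio returns are modeled as normal
# with a +1% mean and a 6% standard deviation, on a $250m portfolio.
portfolio_value = 250_000_000
mean_return = 0.01
stdev_return = 0.06

# The 5th-percentile monthly return under the assumed normal distribution.
worst_5pct_return = NormalDist(mean_return, stdev_return).inv_cdf(0.05)

# 95% (monthly) value at risk: the loss exceeded only 5% of the time.
value_at_risk = -worst_5pct_return * portfolio_value
print(f"5th-percentile monthly return: {worst_5pct_return:.1%}")
print(f"Monthly 95% VaR: ~${value_at_risk / 1e6:.1f}m")
```

Everything hangs on the assumed distribution: feed the model an overly calm view of the world, and the VAR will look reassuringly small.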

The VAR is simply a point on the (assumed) probability distribution of potential returns for a portfolio. The more radical the deviation from today’s conditions, the lower the probability of that outcome—as the normal model would suggest.

VAR supplies a simple, single risk metric that is widely accepted by leading institutions. However, the VAR model also provides an excellent example of the limitations of the normal model—and indeed of models more generally.

Drawing from historical data, the VAR makes strong assumptions about potential future returns, assumptions which are susceptible to error, bias, and manipulation. Moreover, VAR’s use of a normal distribution makes it inadequate for capturing risks of extreme magnitude but low probability, such as unprecedented asset price bubbles or the collapse of the national housing market.

In the aftermath of the 2008 financial crisis, the U.S. House of Representatives concluded that financial institutions’ rampant misuse of VAR models helped justify excessive risk-taking, which led to hundreds of billions in losses and helped fuel a global recession.6

***

Fluency with basic statistical tools such as the normal model can provide us a valuable edge in our decision-making. It can help us interpret experimental results and political polls, implement quality and safety standards, and quantify financial risks.

But the normal distribution, like all models, is a flawed simplification. It cannot give us certain truth, only suggestions and approximations based on layers of assumptions and theory. Equally important to knowing how to use the normal model is knowing how to determine whether it is appropriate to use in the first place!

Dynamic Equilibrium: Balance, wobbling on the edge of chaos

Dynamic equilibrium describes the balancing cycle that occurs between two phases coexisting on the edge of a phase transition (in so-called “phase separation”), such that neither phase overwhelms the other. Dynamic equilibrium is the goal-state of balancing (negative) feedback loops, which counteract change in an effort to maintain stability.

Consider how a thermostat maintains the temperature of a room at a desired level, constantly rebalancing as the room conditions change.

Dynamic equilibrium exists not when the system is at rest, but rather when its inflows and outflows roughly offset one another, causing the level or size of some stock (such as the room temperature) to remain within a tolerable target value or range.1 Despite constant change, the system maintains a tentative balance.

This model has broad applications for systems across disciplines, including in biology, economics, physics, and business. Because most of what we do takes place in complex systems, we can improve our decision making by understanding the forces determining whether a system is in equilibrium, and how small changes in those forces could create enormous changes in system behavior.

Perfectly balanced, as all things should be

Imagine a bathtub in three different stages:

  1. “Static equilibrium” — There is some water in the tub, but the faucet is OFF and the drain is CLOSED. There is stability only because nothing happens.
  2. Not in equilibrium — If we leave the drain CLOSED but turn the faucet ON, the water level will begin to rise. We are now in a phase of growth, not equilibrium, since inflows exceed outflows. Without intervention, the tub will eventually overflow (another phase transition).
  3. “Dynamic equilibrium” — If we leave the faucet ON and OPEN the drain such that water is draining out at the same rate as it is flowing in, the water level will not change. The tub is now in a “dynamic equilibrium”; despite the inflows and outflows, the tub will neither empty nor overflow (see the sketch after this list).
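Here is a toy stock-and-flow simulation of the three stages (my own sketch; the rates are arbitrary liters per minute):

```python
# The water level is a "stock" that changes by (inflow - outflow) each minute.
def simulate(level, inflow, outflow, minutes=10):
    for _ in range(minutes):
        level = max(0.0, level + inflow - outflow)
    return level

start = 50.0
print("Static equilibrium: ", simulate(start, inflow=0, outflow=0))   # 50.0 — nothing happens
print("Not in equilibrium: ", simulate(start, inflow=8, outflow=0))   # 130.0 — the level keeps rising
print("Dynamic equilibrium:", simulate(start, inflow=8, outflow=8))   # 50.0 — flows offset, level holds
```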

We observe these dynamics in all types of complex systems, such as natural ecosystems, economies, businesses, or the human body—any system maintaining a general balance between its inflows and outflows.

In biology, balancing feedback manifests in all life forms through homeostatic behaviors, which counteract any change that moves us away from optimal functioning. For instance, our bodies induce specific, automatic responses to regulate our body temperature, blood sugar levels, and fluid balances.

Humans seeking to create or preserve balance have invented remarkable dynamic equilibrium-maintaining technologies, such as thermostats, autopilot, cruise control, and process control systems in manufacturing.

Delicate balance, at best

Because complex systems (companies, forests, economies, etc.) often involve several simultaneous and competing feedback loops, system behavior will be determined by the loops that dominate. In a dynamic equilibrium, these loops are equally matched. However, systems can experience disruptive and unpredictable shifting dominance if the relative strengths of these loops change (e.g., if we turn the faucet up even higher on a tub that was in equilibrium).2

Near equilibrium, systems generally respond in a more predictable, linear fashion to changes in their environment. But when systems deviate from equilibrium, small shifts in the environment can produce large, nonlinear changes in the system—changes which commonly follow a power-law distribution.3

Consider the field of economics, where periods of apparent stability can quickly transition to unstable “boom” or “bust” cycles if various reinforcing feedback loops start amplifying one another. Bubbles may emerge from a mash-up of high consumer and business confidence, greed and speculation, low interest rates, and increasing asset prices. Similarly, bubbles can “pop” if, for example, an external shock (such as a pandemic or a war) triggers fear, reducing confidence and leading to business contraction, sparking market sell-offs, leading to more fear, and so on.

It is up to a portfolio of negative feedback loops to help reestablish economic equilibrium over the long-term. The free movement of prices is a negative feedback loop that helps constantly rebalance supply and demand. The Federal Reserve possesses tools of negative feedback, such as manipulating interest rates or the money supply, in order to tame business cycles. The government could also change tax rates or implement relief packages.

The key lesson is that we cannot become complacent just because the economy, our relationships, or an organization is stable at the moment. Slight shifts in the forces at play can tip the scales toward drastic change!

A recipe for innovation

In his fantastic book on nurturing innovation, Loonshots, physicist Safi Bahcall argues that dynamic equilibrium is one of the critical factors needed to enable technological breakthroughs.

While our “artists” (who work on research and development) are obviously critical to innovation, so too are our “soldiers” (who work on franchises and help bring those breakthroughs to market). In order for organizations to nurture new bets, they must:

  1. Separate artists and soldiers to give raw ideas the breathing room they need to evolve and improve (phase separation); and,
  2. Enable seamless exchange between the two groups to bring innovations to life (dynamic equilibrium).4

Bahcall gives the incredible example of the engineer Vannevar Bush. During World War I, Bush had observed that poor cooperation between the scientific community and the culturally rigid military was putting US military prowess at risk of falling behind. As World War II approached, he proposed a new organization, the Office of Scientific Research and Development (OSRD), which kept the military’s research and development efforts separate while staying connected to the military through a seamless interchange.

The OSRD system was able to generate incredible breakthroughs with remarkable efficiency. Its achievements include radar (which helped win the war); work on penicillin, malaria, and tetanus (which helped reduce infectious deaths among soldiers by 20x); plasma transfusion (which saved thousands of lives on the battlefield); and—above all—nuclear fission (which laid the groundwork for the development of nuclear weapons).5

***

Dynamic equilibrium offers an explanation for why complex systems can exist in periods of relative stability, despite being in constant flux. It also explains why these periods of balance are always vulnerable. We cannot afford to assume that stable things will remain so. Subtle shifts in the balance between feedback loops can cascade into major system changes—whether in our company, our marriage, the economy, or our bathtub.

Compounding: Why we should actively trade ideas, but not stocks

“Spend each day trying to be a little wiser than you were when you woke up. Discharge your duties faithfully and well. Step by step you get ahead, but not necessarily in fast spurts… Slug it out one inch at a time, day by day. At the end of the day—if you live long enough—most people get what they deserve.”

Charlie Munger, Poor Charlie’s Almanack (2005, pg. 138)

Perhaps no idea better reminds us of the value of the twin virtues of patience and discipline than the phenomenon of compound interest.

Compounding describes the process by which a fixed quantity (such as a savings account) grows by accruing “interest” at a certain rate, then earning interest on the original quantity plus interest on the newly added interest, and so on.

Not so simple

Compounding is a powerful example of a reinforcing (positive) feedback loop, which produces an exponential growth effect in which the absolute growth in the quantity increases over time. In contrast, “simple interest” produces linear growth in which the balance increases by a constant absolute amount each period.

For example, a savings account with an initial balance of $10,000 that earns 5% simple interest will grow by exactly $500 each year (see simplified chart below). If that same $10,000 were to instead earn 5% compound interest, the balance would grow by $500 in Year 1, $525 in Year 2, $551 in Year 3, and (skipping ahead) $776 in Year 10, etc.—generating a nonlinear increase in value over time.

simple interest grows linearly, compound interest grows exponentially
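The arithmetic behind those figures takes only a few lines; this sketch reproduces the 5%-on-$10,000 example above:

```python
principal = 10_000
rate = 0.05

simple_balance = principal
compound_balance = principal
for year in range(1, 11):
    simple_balance += principal * rate           # interest on the original amount only
    compound_growth = compound_balance * rate    # interest on principal *plus* past interest
    compound_balance += compound_growth
    print(f"Year {year}: simple +${principal * rate:,.0f}, compound +${compound_growth:,.0f}")
```

By Year 10 the compound account is adding about $776 per year versus the flat $500, and the gap only widens from there.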

The mathematical phenomenon of compounding is one of the most powerful concepts to understand, with applications for our personal financial management, our habits and productivity, and indeed our pursuit of wisdom generally.

The best for last

Compounding teaches us to be patient, because most of the benefits of compounding come at the end! Whether we’re starting to build up our retirement savings, creating new relationships, or establishing better habits, we may not see huge benefits up front. But if we combine the discipline and patience to keep making incremental improvements, over time we can generate enormous results.

“Habits are the compound interest of self-improvement. They don’t seem like much on any given day, but over the months and years their effects can accumulate to an incredible degree.”

James Clear (2018, on Twitter), author of “Atomic Habits”

In financial decisions, it is critical to value cash flows not based on their absolute value today, but on their opportunity cost—the potential value of that cash flow if we had instead invested it and allowed it to compound over time. Any use of money must justify the opportunity cost of foregone compound interest on those funds—and that amount could be huge.

Disciplined investors are “cursed” with viewing investments and expenses through this lens. Warren Buffett famously quipped that his worst investment ever was actually his purchase of Berkshire Hathaway. He estimated that if he had simply taken the amount he paid in 1962 and invested it at the rate of return he would go on to earn over his career, he would have accrued $200bn more wealth.1

Stop trading so much

If I could summarize the one lesson about money that I’ve learned in my own career in finance and strategy, it would be that people (especially men) generally overestimate their own financial acumen.

Overconfidence in finance leads us to transact much more often than we should—and transactions are costly, because they counteract the power of compounding.2 Every time we sell a stock or asset, we pay some transaction fees, and we owe taxes on any gains we accrue. As compounding teaches us, the value of these costs rises exponentially over time, since we could have simply let those funds compound freely.

“Beware of little expenses: a small leak will sink a great ship.”

Benjamin Franklin

The more actively that individual investors trade, the more money they typically lose. A fascinating study observed that if we break out the returns of individual investors into tiers based on how frequently they trade, the net returns of every group except for the least frequent traders are lower than the net return from simply investing in an S&P 500 index fund. And the group with the heaviest traders generated the lowest returns, by far.3

My advice: unless we are market geniuses (most of us aren’t), we probably shouldn’t be trading frequently. Most likely, we would be better off in the long-term by investing the majority of our portfolios in low-cost, passive “index funds” (which simply mirror the returns of a market index)—only infrequently checking our balance or executing transactions.

Learning begets learning

Perhaps the most powerful case of compounding is knowledge growth itself.

Wisdom is not a matter of collecting facts and clever examples. Being a polymath is the “simple interest” version of learning. Rather, the way we compound our knowledge is by incrementally building up a self-supporting, interconnected foundation of ideas and explanatory frameworks—a “latticework,” to borrow Charlie Munger’s terminology.

The human brain learns by association, the cognitive process by which we attempt to draw meaningful connections between a new piece of information and our prior knowledge. We compare, combine, and create variations on different ideas.

Developing a strong foundation of models and ideas in our heads enables a positive feedback loop: more information supplies more possible connections, which helps us improve our knowledge, which makes it easier to add more information and make even more meaningful connections, and so on.4 In other words, learning compounds to facilitate more learning.

This happens through independent reading, research, and elaboration—by following our curiosity and thinking critically about different (often incompatible) ideas and how we might combine them with other knowledge. The more diverse these ideas are across disciplines (physics, systems, economics, etc.), the stronger the foundation will be.

By committing to a lifelong process of learning in this associative, multidisciplinary way, our knowledge can truly compound to empower us to solve an incredible range of problems.

***

The key lesson? Diligently pursue incremental improvements, then simply be patient. Let compounding work its magic! With time, the odds tip heavily in your favor.

Entropy: The improbability of order (oh, and the miracle of life itself)

“The ultimate purpose of life, mind, and human striving: to deploy energy and information to fight back the tide of entropy and carve out refuges of beneficial order.”

Steven Pinker, The Second Law of Thermodynamics (2017)

In the realm of thermodynamics, the term “entropy” represents the measure of disorder or randomness in a system. While the first law of thermodynamics tells us that the total energy of the universe remains constant (the principle of the conservation of energy), this energy becomes progressively less useful as it is used up and spread out. The second law of thermodynamics says that the total disorder (entropy) of the universe increases over time. Energy is not perfectly recyclable; something is always lost.

Entropy may not be intuitive, but its implications ripple across our understanding of the physical world, social systems (such as companies or families), and even the emergence of life on Earth!

Just as a sandcastle will inevitably erode without constant maintenance, so too do the systems in our lives require energy and attention to stave off disorder.

Irreversibility and chaos

A great way to understand why entropy always increases is by exploring a fundamental property of nature: irreversibility. Most processes cannot be perfectly undone. We would struggle to un-cook an egg, to un-mix two combined paint colors, to un-burn firewood, or to un-birth a child.

Imagine applying heat to an ice cube. As the temperature rises, the rigid, frozen water molecules will begin to loosen and scatter as the water goes through phase transitions of melting and eventually vaporizing. Total entropy (disorder) has increased through this process, as some heat (useful energy) was dissipated to cause the phase transitions. We can re-freeze the water vapor by cooling it, but this will require additional energy use, further increasing entropy.1

Now imagine we simply place our ice cube on the kitchen counter. The warmer counter will transfer some heat to the ice, until their temperatures equalize. Because the atoms in a hotter substance—by definition—are moving more quickly than those in a colder substance, heat tends overwhelmingly to flow in one direction: from a hotter place to a cooler place. The ice cube does not transfer “coldness” to the counter. Critically, this one-directional property means that we are unable to perfectly “reverse” most heat transfers.2

Imagine if physical processes—such as melting an ice cube or operating a car engine—were perfectly reversible: we would have infinitely recyclable energy, and entropy would never increase. But in reality, complex natural processes are generally irreversible, meaning that they involve at least some expenditure of useful energy. Consider how even an incredibly efficient car engine requires a non-zero amount of fuel or electricity and loses some energy to friction and heat.

This is thermodynamics’ key limitation: because some useful energy must be expended to do work, no process can be perfectly reversed, and entropy must increase.3

The improbability of order

Nature’s tendency towards disorder, whether with ice cubes or engines or businesses, can also be understood through probabilities. Processes tend to move from less probable to more probable states. Because the number of ways a system can be ordered is far smaller than the number of ways it can be disordered, disorder is inevitable.

Imagine a deck of cards thrown on the ground. How likely is it that the cards land in an ordered fashion? There are more than 8×10⁶⁷ ways to arrange a 52-card deck, so the probability of landing in any particular ordered arrangement is minuscule. Chaos is much more common. In fact, the change from an ordered arrangement to a disordered one is the source of irreversibility: work must be done in order to re-impose an ordered state on the system!4
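That figure is simply 52 factorial (52 × 51 × 50 × … × 1), which we can check directly:

```python
import math

arrangements = math.factorial(52)   # distinct orderings of a 52-card deck
print(f"{arrangements:.2e}")        # ≈ 8.07e+67
```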

Carving out refuges of order

Just as we must do more work to re-freeze our ice cube after it inevitably melts, we must constantly expend effort to create and maintain order in our lives, as time tips them towards chaos and disorder. This includes the condition of our homes, relationships, companies, and physical health.

Bottom line: because disorder is always increasing, complacency will eventually lead to failure.

Though invoking thermodynamical laws in everyday life may not earn us many “cool points,” we can drastically improve our decision making by understanding entropy in the context of social systems. For a social system such as an organization or relationship to be productive, it must be in some useful order, and organizing the people and activities into a useful state requires us to actively invest energy and resources.

In business, companies that don’t work constantly to cultivate and evolve their purpose, structure, and processes will inevitably stagnate. Over time, bureaucracy and decay will seep in, as the gap widens between the company’s goals and the activities of its employees. Lines of responsibility will blur, and the product portfolio will become bloated. To stave off entropy, management must periodically redesign processes and reallocate resources to ensure the company channels its competitive energy outward, not inward.5

Life’s dance with disorder

At a cosmic scale, entropy tells us that all things will fall apart eventually, from our own Sun to the entire Milky Way. This is not something to fear; in fact, the law of entropy can help us appreciate the “miracle” of life on our own planet.

If disorder is always increasing in a closed system, how can it be that highly “ordered” organic life and ecosystems have evolved on Earth? The answer stems from the fact that our planet is not a closed system, but a sub-system within a larger solar system, within a larger galaxy, and so on.

While disorder always increases in the universe at large, order can emerge in smaller systems (such as the Earth) which feed off an outside energy source. Our energy source is the Sun, whose dissipated heat helps create the unique conditions on Earth that enable complex organic systems to emerge.6

In this sense, the universe’s march towards disorder is the very source of the rare, beautiful order of our planet!

***

In conclusion, entropy explains why we are seemingly always battling against forces messing things up in our lives. It also serves as a celebration of the precious and transient order that makes life possible in the first place. There is perhaps no better motivation for us to create and sustain order in the families, communities, and organizations that we cherish.

The Law of Large Numbers: Even big samples can tell lies

The Law of Large Numbers is a theorem from probability theory which states that as we observe more instances of a random event, the average of the actual outcomes will converge on the expected outcome. In other words, when our sample sizes grow sufficiently large, the results will settle down to fixed averages.

The Law of Large Numbers, unsurprisingly, does not apply to small numbers. The smaller the sample size, the greater the variation in the results, and the less informative those results are. Flipping a coin and observing 80% “heads” in 10 flips is much less remarkable than observing the same imbalance in 10,000 flips (we would start to question the coin). Larger samples reduce variability, providing more reliable and accurate results.
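A quick simulation shows the “settling down” in action; here is a minimal sketch using Python’s standard library:

```python
import random

random.seed(1)
heads = 0
for flips in range(1, 10_001):
    heads += random.randint(0, 1)              # 1 = heads, 0 = tails, fair coin
    if flips in (10, 100, 1_000, 10_000):
        print(f"{flips:>6} flips: {heads / flips:.1%} heads")
```

With only 10 flips, the proportion of heads can easily sit far from 50%; by 10,000 flips it hugs 50% closely.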

Delusions of causality

Sadly, people tend to be inadequately sensitive to sample size. We focus on the coherence of the story we can tell with the information available, rather than on the reliability of the results. Following our intuition, we are likely to incorrectly assign causality to events that are, really, just random.

Psychologist Daniel Kahneman famously cites a study showing that the counties in the U.S. in which kidney cancer rates are lowest are mostly small, rural, and located in traditionally Republican states. Though these factors invite all sorts of creative stories to explain them, they have no causal impact on kidney cancer. The real explanation is simply the Law of Large Numbers: extreme outcomes (such as very high or very low cancer rates) are more likely to be found in small samples (such as sparsely populated counties).1

We often foolishly view chance as a self-correcting process. For instance, we might believe that several consecutive successes (in blackjack, basketball, business, etc.) make the next outcome more likely to be a success (the “hot-hand” fallacy). Or, we might believe that a random deviation in one direction “induces” an opposite deviation in order to “balance out” (the gambler’s fallacy).

In reality, the Law of Large Numbers works not by “balancing out” or “correcting” what has already happened, but simply by diluting what has already happened with new data, until the original data becomes negligible as a proportion.2 For instance, as we flip a fair coin more and more times, the proportion of flips that land as “heads” will settle down towards 50%.

This is the logic behind the powerful Central Limit Theorem: essentially no matter the shape of the population distribution, the averages of sufficiently large samples can be treated as draws from a normal “bell-shaped” curve. This theorem provides the foundation for all sorts of statistical tests that help us make better inferences and quantify uncertainty.3

Correlated errors, and why polls can mislead us

A key limitation of the Law of Large Numbers is that the theorem requires the assumption that the observations we make are independent of one another (for instance, one coin flip does not impact the next). But if the observations are correlated—as they often are in real life—then what appear to be random results could actually reflect a bias in the method.4 If our data is systematically biased, we are likely to make errors, and we cannot expect the Law of Large Numbers to work as it normally would.

The potential for correlation between sampling errors was one of statistician Nate Silver’s key insights when he assigned a much higher probability of a Trump victory in 2016 than other pundits estimated. Based on polling data, Hillary Clinton was favored in five key swing states, and analysts surmised that Trump was extremely unlikely to win all of them, and thus extremely unlikely to win the election. Sure, they assumed, Trump might pull off an upset win in one or two of those states, but the Law of Large Numbers should take over at some point.

Silver, however, recognizing that our entire polling methodology could be systematically biased toward one candidate, modeled a healthy amount of correlation between the state polls. His model implied that Trump sweeping the swing states was way more likely than we would expect from the individual probabilities, because a discrepancy between the polls and the results in one state could mean that we should expect similar errors in the other states. Trump won all five of those states, and won the election.5

***

Overall, the Law of Large Numbers demonstrates the incredible danger of relying on small samples to make inferences, the fallacies of assuming that random things are “streaky” or that extreme results will promptly be “balanced out,” and the importance of checking our data for systematic bias—which will produce misleading results regardless of sample size.

Scientific Method: Why the world doesn’t speak Chinese

For much of human history, we relied on authority figures to tell us what is true and just, based on the presumed wisdom of the leaders of our tribe, government, church, etc. The break from this authoritarian and anti-progressive tradition began with the boldness of ancient philosophers such as Aristotle, but truly accelerated in the 16th and 17th centuries with revolutionary thinkers such as Francis Bacon, Galileo Galilei, and Isaac Newton. These leaders helped shape the “Enlightenment,” an intellectual movement that advocated for individual liberty, religious tolerance, and a rebellion against authority with regard to knowledge.

What emerged was an enduring tradition of criticism and of seeking good explanations in an attempt to understand the world—a tradition we call science, which has led us to remarkable progress over the last ~400 years.

The best method of criticism

Thanks to the brilliant yet underappreciated 20th-century philosopher Karl Popper, we have a full-fledged theory of knowledge creation—the theory of critical rationalism. We do not obtain knowledge by the charity of some authoritative source passing the “truth” down to us. In fact, there is no such authoritative source. All knowledge is fallible.

Instead, we create knowledge through an iterative process of trial-and-error. First, we conjecture (guess) tentative solutions to our problems. Then, we criticize those theories, attempting to disprove them. We discard theories that are refuted and try to improve on them. If we’re able to replace a refuted theory with a better one, then we can tentatively deem our efforts to have made progress.1

“What we should do, I suggest, is to give up the idea of ultimate sources of knowledge, and admit that all knowledge is human; that it is mixed with our errors, our prejudices, our dreams, and our hopes; that all we can do is grope for the truth even though it may be beyond our reach.”

Karl Popper, Conjectures and Refutations (1963, pg. 39)

Criticism is the step in this process that helps us root out wrongness. The characteristic (though not the only) method of criticizing candidate theories is through experimental testing—through the scientific method.

After we postulate a theory, we perform a crucial experiment, one for which the old theory predicts one observable outcome and the new theory another. We eliminate the theory whose predictions turn out to be false.2

For instance, our “new” theory could be that a particular dieting method is effective for losing weight. Our “old” theory could be that the dieting method does nothing (the dreaded “null hypothesis”). We would run an experiment and compare the results of a randomly selected group who used the method to a randomly selected “control” group who didn’t. If the treatment group’s results aren’t sufficiently better than the control group’s, then we reject the theory that the dieting method is effective.
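One simple way to run that comparison is a permutation test: if the diet truly did nothing, the group labels are arbitrary, so we can ask how often a random relabeling of the same data produces a difference as large as the one we observed. A minimal sketch with invented weight-loss figures (in kg):

```python
import random

# Invented results for illustration only.
treatment = [3.1, 2.4, 4.0, 1.8, 2.9, 3.5, 2.2, 3.8]
control   = [1.2, 0.8, 2.1, 1.5, 0.4, 1.9, 1.1, 1.6]

observed_diff = sum(treatment) / len(treatment) - sum(control) / len(control)

# Under the null hypothesis the labels are interchangeable: shuffle the pooled
# data many times and count how often chance alone matches the observed gap.
pooled = treatment + control
extreme, trials = 0, 10_000
for _ in range(trials):
    random.shuffle(pooled)
    diff = (sum(pooled[:8]) - sum(pooled[8:])) / 8
    if diff >= observed_diff:
        extreme += 1

print(f"Observed difference: {observed_diff:.2f} kg, p ≈ {extreme / trials:.4f}")
```

A tiny p-value means random relabeling almost never reproduces the observed gap, so we reject the “does nothing” theory; a large one means we cannot.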

The scientific method embraces objectivity, curiosity, careful observation, fierce skepticism, analytical rigor, and continuous improvement.

Good vs. bad explanations

Critically, our scientific theories (guesses) must meet two key criteria. First of all, they must be falsifiable (or testable)—that is, they must be capable of conflicting with possible observations. If no conceivable event would contradict the theory, it cannot be scientific.3

As an example, consider the hypothesis “Scorpios are creative and loyal.” Would a single uncreative, disloyal person born between October 23 and November 21 refute the theory? Would 1,000? How uncreative or disloyal would they have to be, and how would we know? Unfortunately, the conditions under which these kinds of astrological predictions would be false are never stated; therefore, they cannot be scientific (sorry, astrologers…).

Second, our scientific theories must be what physicist David Deutsch calls “good explanations,” those that are hard to vary while still accounting for what they purport to account for. When we can easily tweak our theories without changing their predictions, then testing them is almost useless for correcting their errors. We can toss these out immediately without experiment.4 Examples of easily variable explanations include assertions such as “The gods did it,” or “It appeared out of thin air,” or “Because I said so” (sorry, parents…). These kinds of claims are easily varied to explain, well, anything.

Why the world speaks English

Historians have long debated why the “scientific revolution” originated in the West, given that many technological and political innovations originated in the Indian, Islamic, and (especially) Chinese empires. For centuries, the Chinese outperformed the Europeans in applying natural knowledge to solve human problems. But it was the emergence and proliferation of the scientific method across Western Europe in the 17th century that sparked the tradition of criticism and the wave of innovation that has revolutionized human society. Why?

Our best theory attributes the West’s scientific supremacy to the structure of its knowledge creation practice—that is, to the scientific method. Despite enormous creativity in China, political battles and the Song emperors’ personal interests smothered the work of the early innovators. By contrast, in 1660, the English established the Royal Society of London, which openly shunned authority and embraced science as a path toward prosperity. The Royal Society inspired a generation of new scientists (including Isaac Newton) who would ultimately propel the English to a commanding lead in the scientific race.

“Nullius in verba” (Latin for “take nobody’s word for it”)

Royal Society of London motto (1660)

If the Chinese emperors had embraced a tradition of criticism, the scientific revolution might have occurred 500 years sooner. And the world might be speaking Chinese, instead of English.5

***

All knowledge is fallible. We have no authoritative source of “absolute truths.” But that is not what science is about. The real key to science is that our explanatory theories can be improved, both through the creation of new theories, and through criticism and testing of our existing theories—that is, through the scientific method.

The quest for good explanations, guided by a tradition of criticism and a rejection of authority over knowledge, is the source of all progress. It embodies the spirit of science and of the Enlightenment.6 For me, there is perhaps no more worthy calling!

Phase Transitions: Uncovering the hidden tipping points that change everything

A phase transition is the process of change between different states of a system or substance, which occurs when one or more of the system’s control parameters crosses a “critical” threshold.

We tend to take stability for granted, leaving us caught off-guard when the ground shifts beneath our feet. By better understanding the dynamics of phase transitions, we can learn to anticipate change and manage it to our advantage.

From stability to phase change

As a simple example, consider an ice cube (a solid), whose key control parameter is temperature. If we apply heat to the ice, the temperature will rise, and the frozen water molecules being held in rigid formation by binding forces will begin to scatter as the water goes through phase transitions of melting (into liquid) and eventually vaporizing (into gas).

Between each phase, there is a range of temperatures in which the state of the system remains stable. It is only once its temperature crosses certain critical thresholds (specifically, 0° and 100° C) that it enters a phase transition.

In fact, this logic helps explain changes in the “phases of matter” (solid, liquid, gas) for all kinds of physical substances as their temperatures fluctuate.1

For us, the real value of the phase transition concept lies in its applicability not only to physical systems (changes in phases of matter), but also to social systems (changes in phases of behavior). In both types, the whole is not only more than the sum of its parts, but it is very different from its parts.

In complex systems, we can’t analyze one component and predict how the whole system will behave, whether it’s one water molecule in a boiling pot, one employee in a company, etc. In each case, we need to consider the system—its collective behaviors, including the control parameters that can tip it into unpredictable phase transitions.

Avalanches of peace

Imagine dropping grains of sand, one-by-one, onto a countertop. A pile will gradually form, and its slope (the key control parameter) will increase. For a time, each additional grain has minimal effect; the pile remains approximately in equilibrium.

Eventually, however, the pile’s slope will increase to an unstable “critical” threshold, beyond which the next grain may cause an avalanche (a type of phase transition).

Near the critical point, there is no way to tell whether the next grain will cause an avalanche, or how big that avalanche will be. All we know is that the probability of an avalanche is much higher beyond the threshold, and that avalanches of any size are possible, though smaller avalanches will happen much more frequently.

Through a series of avalanches, each of which widens the base of the pile, the sandpile (an inanimate complex system) “adapts” itself to maintain overall stability!
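Here is a bare-bones sketch in the spirit of the classic Bak–Tang–Wiesenfeld sandpile model (a simplification for illustration, not the full physics): grains are dropped onto a grid, and any site that grows too tall topples onto its neighbors, sometimes setting off a chain reaction.

```python
import random
from collections import Counter

random.seed(0)
N = 20                       # 20 x 20 grid; grains toppling off the edge are lost
THRESHOLD = 4                # a site holding 4 or more grains topples
grid = [[0] * N for _ in range(N)]

avalanche_sizes = Counter()
for _ in range(20_000):
    # Drop one grain at a random site, then let the pile relax until stable.
    r, c = random.randrange(N), random.randrange(N)
    grid[r][c] += 1
    topples = 0
    unstable = [(r, c)]
    while unstable:
        i, j = unstable.pop()
        if grid[i][j] < THRESHOLD:
            continue
        grid[i][j] -= 4                              # topple: one grain to each neighbor
        topples += 1
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if 0 <= ni < N and 0 <= nj < N:
                grid[ni][nj] += 1
                unstable.append((ni, nj))
        unstable.append((i, j))                      # this site may still need to topple
    avalanche_sizes[topples] += 1

print("drops causing no avalanche:", avalanche_sizes[0])
print("largest avalanche (topples):", max(avalanche_sizes))
```

Most drops cause no avalanche at all, many cause small ones, and a rare few cascade across much of the grid—the signature of a system hovering near its critical point.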

Research has observed the stabilizing role of frequent smaller “avalanches” in a variety of systems, including the extinction of species in nature, price bubbles and bursts in financial markets, traffic jams, forest fires, and earthquakes relieving pressure from grinding tectonic plates.2 These systems unconsciously “self-organize” towards a critical state, beyond which they undergo phase transitions that help preserve equilibrium, over time.

Growing to death

For those of us who work in organizations, the science of phase transitions provides insight into how to nurture innovation, and how to avoid destructive breakpoints.

Just as avalanches occur with a steep enough pile, high-performing teams in every creative industry unexpectedly shift as they grow over time, often to their own detriment. Consider the fate of former industry titans who, once on top, failed to adapt to a shifting technological landscape—PanAm, Nokia, Kodak, Blockbuster, and so on.

“Growth will break your company.”

Tony Fadell, Build (2022, pg. 242)

Team size is a key control parameter. With small teams (up to ~15 people), every member’s stake in the results of the project is very high. There’s no need for management. Communication happens naturally. Up to 40-50 people, some silos and sub-teams begin to form, but individual stakes remain high, and most interactions remain informal.

However, as teams and companies scale (particularly beyond 120-150 people), individual stakes in project outcomes decline, while the perks of rank (job titles, salary growth) increase until, when the two cross, the system “snaps” and incentives begin encouraging unwanted behavior: the rejection of risky but potentially groundbreaking ideas.3

Moreover, layers of management form, information becomes siloed, jobs become more specialized, culture drifts, and people-related issues explode. Further growth magnifies these problems.

Managing these transitions successfully requires careful design of incentives, org structure, and roles and responsibilities. To do so, we should consider a portfolio of preemptive actions:4,5

  1. Increase “project-skill fit” — Search for and correct mismatches between employee skills and project needs. Employees who are well-matched to their assignments will take more ownership of the outcomes.
  2. Non-political promotions — In promotion decisions, emphasize independent assessment from multiple sources over politics (reliance on the manager).
  3. Get the incentives right — Motivate others with “soft equity” rewards such as peer recognition, autonomy, and visibility. External rewards such as money should be based on milestones or outcomes that individuals can actually control. Too many incentive structures rely on perverse schemes such as earnings-based compensation for junior employees, who have no direct influence on those metrics.
  4. Decentralize — Once you have multiple products, you will need to split your org into individual product groups, sort of “mini-startups” within the business that are more nimble and autonomous.
  5. Optimize “management spans” — For teams focused on innovative new projects (R&D), consider increasing the average number of direct reports per manager to encourage looser controls and more trial-and-error. For “franchise” groups that focus on growing existing businesses, consider narrowing management spans to encourage tighter controls and measurable outcomes (since failure is more costly).

***

Whether in an organization, relationship, laboratory, or highway, we can improve our decision-making by investigating the control parameters governing the stability of our system—and the thresholds beyond which chaotic phase transitions may occur.

We cannot simply extrapolate linearly from the present. Even if our company, our marriage, or our environment is stable at the moment, small changes in critical factors can create unpredictable change. We can either be surprised by the pervasive phase transitions in our lives, or we can anticipate them and harness them to our advantage.

Bayesian Reasoning: A powerful (but flawed) rule of thumb for updating our beliefs

Bayesian reasoning is a structured approach to incorporating probabilistic thinking into our decision making. It requires two key steps:

  1. Obtain informed preexisting beliefs (“priors”) about the likelihood of some phenomena, such as a suspect’s DNA matching the crime scene evidence, a candidate winning an election, or an X-ray revealing a tumor.
  2. Update our probability estimates mathematically when we encounter new, relevant information, such as finding DNA from a new suspect, a new poll revealing shifting voter behavior, or a new X-ray showing unexpected results.

The Bayesian approach allows us to use probabilities to represent our personal ignorance and uncertainty. As we gather more good information, we can reduce uncertainty and make better predictions.

Unfortunately, when we receive new information, we tend to either (a) dismiss it because it conflicts with our prior beliefs (confirmation bias) or (b) overweight it because we can recall it more easily (the availability heuristic). Bayesian reasoning demands a balancing act to avoid these extremes: we must continuously revise our beliefs as we receive fresh data, but our pre-existing knowledge—and even our judgment—is central to changing our minds correctly. The data doesn’t speak for itself.1

The Bayesian approach rests on an elegant but computationally intensive theorem for combining probabilities. Fortunately, we don’t always need to be able to crunch probability calculations, because Bayesian reasoning is extremely useful as a rule of thumb: good predictions tend to come from appropriately combining prior knowledge with new information.
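That theorem is Bayes’ rule, which spells out exactly how the prior and the new evidence combine:

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)},$$

where $P(H)$ is our prior belief in hypothesis $H$, $P(E \mid H)$ is how likely the new evidence $E$ would be if $H$ were true, and $P(H \mid E)$ is the updated (“posterior”) belief after seeing $E$.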

However, as we will see, Bayes is simultaneously super nifty and incredibly limited—as all rules of thumb are.

Why your Instagram ads are so creepy

We experience Bayesian models constantly in today’s digital world. Ever wondered why your Instagram ads seem scarily accurate? The predictive models used by social media apps are powerful Bayesian machines.

As you scroll to a potential ad slot, Instagram’s ad engine makes a baseline prediction about which ad you’re most likely to engage with, based on its “priors” of your demographic data, browsing history, past engagement with similar ads, etc. Depending on how/whether you engage with the ad, the ad targeting algorithm updates its future predictions about which ads you’re likely to interact with.2 This iterative process is the reason why your ads seem creepy: they become remarkably accurate over time through constant Bayesian fine-tuning.
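Instagram’s actual models are proprietary, but the flavor of this updating can be sketched with a toy beta-binomial model: start with a prior belief about your click rate for a category of ads, then fold in each new impression.

```python
# Toy Bayesian update of a click-through rate (a generic sketch, not Instagram's
# real system). With a Beta(a, b) belief, a acts like "clicks seen so far",
# b like "non-clicks seen so far", and the estimated click rate is a / (a + b).

prior_clicks, prior_skips = 2, 48            # prior belief: roughly a 4% click rate

def update(clicks, skips, clicked):
    """Fold one new impression into the belief."""
    return (clicks + 1, skips) if clicked else (clicks, skips + 1)

belief = (prior_clicks, prior_skips)
for clicked in [False, False, True, False, True]:   # five observed impressions
    belief = update(*belief, clicked)
    a, b = belief
    print(f"estimated click rate: {a / (a + b):.1%}")
```

Each click nudges the estimate up and each skipped ad nudges it down, with each new impression mattering a little less as evidence accumulates—exactly the “fine-tuning” behavior described above.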

Prioritize priors

For Bayesians, few things are more important than having good priors, or “base rates,” to use as a starting point—such as the demographic or historical-usage data on your Instagram account.

In practice, we tend to underweight or neglect base rates altogether when we receive case-specific information about an issue. For example, would you assume that a random reader of The New York Times is more likely to have a PhD, or to have no college degree? Though Times readers are indeed likely to be more educated, the counterintuitive truth is that far fewer readers have a PhD, because the base rate is much lower!3 There are over 20x more Americans with no college degree than those with a doctorate.
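To see how much work the base rate does, run the Bayesian arithmetic with illustrative (made-up) numbers: suppose 2% of American adults hold a PhD, 55% have no college degree, and a PhD holder is eight times as likely as a non-graduate to read the Times. Then

$$\frac{P(\text{PhD} \mid \text{reader})}{P(\text{no degree} \mid \text{reader})} = \frac{P(\text{reader} \mid \text{PhD})\,P(\text{PhD})}{P(\text{reader} \mid \text{no degree})\,P(\text{no degree})} \approx \frac{8 \times 0.02}{1 \times 0.55} \approx 0.29,$$

so a random reader is still more than three times as likely to have no degree as to hold a doctorate. The lopsided base rate swamps the educational tilt.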

The prescription for this error (called “base-rate neglect”) is to anchor our judgments on credible base rates and to think critically about how much weight to assign to new information. Without any useful new information (or with worthless information), Bayes provides us clear guidance: hold to the base rates.

“Pizza-gate”

Many bad ideas come from neglecting base rates.

Consider conspiracy theories, which, despite their flimsy core claims, propagate by including a sort of Bayesian “protective coating” which discourages believers from updating their beliefs when new information inevitably contradicts the theory.

For example, proponents of the “Pizza-gate” conspiracy falsely claimed in 2016 that presidential candidate Hillary Clinton was running a child sex-trafficking ring out of a pizzeria in Washington, DC. A good Bayesian would assign a very low baseline probability to this theory, based on the prior belief that such operations are exceedingly rare—especially for a lifelong public servant in such an implausible location.

When the evidence inevitably contradicts or fails to support such a ludicrous theory, we should give even less credence to it. That is why Pizza-gate proponents, like many conspiracy theorists, introduced a second, equally baseless theory: that a vast political-media conspiracy exists to cover up the truth! The probability of this second theory is also really low, but it doesn’t matter.4 With this “protective layer” of a political-media cover-up, conspiracy theorists can quickly dismiss all the information that doesn’t support the theory that Hillary Clinton must be a pedophilic mastermind.
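A toy calculation shows how the trick works (with made-up numbers). An honest Bayesian treats "investigators found nothing" as evidence against the theory. A believer armed with the cover-up story treats the same finding as exactly what the theory predicts, so the prior never budges.

```python
# How the "protective coating" blocks Bayesian updating (illustrative numbers).
# Evidence E = "investigators searched and found nothing."

def posterior(prior, p_e_if_true, p_e_if_false):
    return p_e_if_true * prior / (p_e_if_true * prior + p_e_if_false * (1 - prior))

prior = 0.01  # even a credulous prior should start low

# Honest Bayesian: if the theory were true, finding nothing would be unlikely.
print(round(posterior(prior, p_e_if_true=0.10, p_e_if_false=0.95), 4))  # ~0.0011: belief drops nearly tenfold

# Believer with the cover-up story: "of course they'd find nothing!"
print(round(posterior(prior, p_e_if_true=0.95, p_e_if_false=0.95), 4))  # 0.01: no update at all
```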

Bayesian reasoning is only as good as the priors it starts with, and the willingness of its users to objectively integrate new, valid information.

The boundaries of Bayes

The mathematics of Bayes’ theorem itself is uncontroversial. But Bayesian reasoning becomes problematic when we treat it as anything more than a “rule of thumb” that’s useful when we have a comprehensive understanding of the problem and very good data.

Estimating prior probabilities involves substantial guesswork, which opens the door for subjectivity and error to creep in. We could ignore alternative explanations for the evidence, which is likely to lead us to simply confirm what we already believe. Or, we could assign probabilities to things that may not even exist, such as the Greek gods or the multiverse.5

The biggest problem: Bayes cannot possibly create new explanations. All it can do is assign probabilities to our existing ideas, given our current (incomplete) knowledge. It cannot generate novel guesses. But sometimes, the best explanation is one that has not yet been considered. Indeed, creating new theories is the purpose of science. Scientific progress occurs as new and better explanations supersede their predecessors. All theories are fallible. We may have overwhelming evidence for a false theory, and no evidence for a superior one.

A great example is Albert Einstein’s theory of general relativity (1915), which eclipsed Isaac Newton’s theory of gravity that had dominated our thinking for two centuries. Before Einstein, every experiment on gravity seemed to confirm Newtonian physics, giving Bayesians more and more confidence in his theory. That is, until Einstein showed that Newton’s theory, while extremely useful as a rule of thumb for many human applications, was completely insufficient as a universal theory of gravity. Ironically, the day before Newton’s theory was shown to be false was the day when we were most confident in it.6

The “probability” of general relativity is irrelevant. We understand that it is only conditionally true because it is superior to all other current rivals. We expect that relativity, as with all scientific theories, will eventually be replaced.7

***

Bayesian reasoning teaches us to (1) anchor our judgments on well-informed priors, and (2) incorporate new information intentionally, properly weighting the new evidence and our background knowledge. But we must temper our use of Bayesian reasoning in making probability estimates with the awareness that the best explanation could be one that we haven’t even considered, or one for which good evidence may not yet exist!

Emergence: More is different, very different

“We can’t control systems or figure them out. But we can dance with them!”

Donella Meadows, Thinking in Systems (2008, pg. 170)

In complex systems such as human beings, companies, or ecosystems, collective behaviors can create dynamics that cannot be defined or explained by studying the parts on their own. These “emergent” behaviors result from the interactions among a system’s lower-level components and from the feedback loops that connect them.

For example, in our brains, groups of neurons exchanging disordered electrical signals create incredible phenomena, such as senses and memory. Collections of employees swapping information and favors within a company can create factions and hidden bargains. In nature, examples of emergent collective behaviors are everywhere, such as birds flocking, hurricanes or sand dunes forming, social network development, the evolution of life, climate change, the formation of galaxies and stars and planets, and the development of consciousness in an infant.

In all of these examples, the system is not only more than the sum of its parts, it is also very different from those parts. We cannot analyze a single skin cell and infer someone’s personality. Nor can we analyze one employee and infer the behavior of the organization.

If we aspire to operate effectively in complex systems, we cannot afford to underestimate the possibility and power of emergent phenomena. The complex, ongoing interactions between system components will cause unique and unexpected behaviors. Developing a better understanding of emergence can help us not only prepare to be unprepared, but also create truly unique explanations and solutions.

Emergent explanations

Imagine how difficult it would be if we could only learn reductively—that is, by analyzing things into their constituent parts, such as atoms. Even the most basic, everyday events would be overwhelmingly complex. For example, if we put a pot of water over a hot stove, all the world’s supercomputers working for millions of years could not accurately compute what each individual water molecule will do.

With emergent phenomena, however, high-level simplicity “emerges” from low-level complexity. As a result, we may be able to understand systems extremely well and make useful predictions by analyzing phenomena abstractly—that is, at a higher (emergent) level. In fact, as theoretical physicist David Deutsch explains, all knowledge-creation actually depends on emergent phenomena—problems that become explicable when analyzed at a higher level of abstraction.1

Consider thermodynamics, the physical laws governing the behavior of heat and energy. These powerful and fundamental laws do not attempt to describe the world in terms of particles or atoms. They are abstract, describing physical phenomena in terms of their higher-level properties, such as temperature and pressure.
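A toy simulation (not real thermodynamics, just an illustration of the idea) shows how a stable higher-level quantity can emerge from lower-level noise. No individual "molecule" below is predictable, but the average over many of them is.

```python
# Illustration only: individual "molecule" energies are random, yet the average
# over many of them (a temperature-like quantity) is steady and predictable.
import random

def sample_energy():
    return random.expovariate(1.0)   # one molecule's energy: pure noise, mean 1.0

for n in (10, 1_000, 1_000_000):
    average = sum(sample_energy() for _ in range(n)) / n
    print(n, round(average, 3))
# With 10 molecules the "temperature" bounces around; with a million it sits
# almost exactly at 1.0. The high-level quantity is simple even though no
# single molecule is.
```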

Thermodynamics can help us understand why water turns into ice at low temperatures and gas at high temperatures, without requiring us to analyze the details of each individual water molecule. This is due to an emergent phenomenon, the phase transition: a sudden transformation that occurs when one or more control parameters of a collective system cross a critical threshold. At certain temperatures, the water will freeze, melt, or vaporize.2

Merely analyzing individual water molecules would not enable us to simply “deduce” how these phase transitions happen—or indeed whether they would occur at all. Phase transitions are emergent phenomena.

The emergent (crypto) economy

The economy is an intricate web of behaviors and relationships, emerging from the actions of diverse agents—individuals, companies, investors, regulators—each concurrently pursuing their own goals. These agents are not merely “participants” in the economy; they co-create its dynamics. A great example is the emergence of cryptocurrency markets such as bitcoin.

Imagine a small group of techies that starts investing in bitcoin, driven by a belief that its price will rise in the future. If bitcoin’s value indeed rises, news will spread, potentially igniting a viral wave of speculation. The influx is not just about making money, but also about joining a movement. Stories of newly minted millionaires fuel additional speculation and even attract criminals and fraudsters seeking to exploit the frenzy. Concerned regulators may institute game-changing rules to quell the madness.

Here, thousands of decisions by diverse individuals converge to create market behavior that is far different from the sum of its parts: a bubble—which is clearly not in the collective interest. Characterized by speculation and volatility, the bubble is a cycle of elation and fear, gains and losses, that transcends individual intentions.
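One standard way to sketch this dynamic is an "information cascade" model, borrowed here purely as an illustration (it is not a model of bitcoin specifically). Each simulated trader gets a noisy private hunch about whether the coin has real value, but can also see what earlier traders did. Once the crowd's actions lean far enough in one direction, it becomes individually reasonable to ignore your own hunch and follow the herd.

```python
# Information-cascade toy model (illustrative): individually sensible imitation
# can produce markets that pile into a worthless asset.
import random

def run_market(n_traders=200, signal_accuracy=0.7):
    coin_is_valuable = False                # ground truth: the coin is worthless
    buys, passes = 0, 0
    for _ in range(n_traders):
        # Private signal matches the truth 70% of the time.
        signal_says_buy = (random.random() < signal_accuracy) == coin_is_valuable
        if buys - passes >= 2:              # crowd looks bullish: herd and buy
            action_buy = True
        elif passes - buys >= 2:            # crowd looks bearish: herd and pass
            action_buy = False
        else:                               # no clear crowd yet: trust your signal
            action_buy = signal_says_buy
        buys += action_buy
        passes += not action_buy
    return buys / n_traders

bubbles = sum(run_market() > 0.9 for _ in range(1_000))
print(f"{bubbles / 10:.0f}% of simulated markets pile into the worthless coin")
# Typically around 15% of runs end in a full-blown bubble, even though every
# trader's private signal is right 70% of the time.
```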

We observe such emergent economic phenomena not only in cryptocurrencies, but also in stock market oscillations, housing bubbles, and even the rise and fall of entire economies. In these systems, predictability is an illusion and periods of stability are tentative, as economic agents constantly respond to and anticipate changes in the dynamic landscape they themselves help shape.3

Managing emergent behavior in economics requires an understanding that plausible-sounding approaches might fail—or backfire entirely.

Dancing with fire

In economics as well as nature, emergent behavior can cause rapid change, and managing it may require counterintuitive interventions.

Consider forest fires. The destructiveness of forest fires follows a power-law distribution, in which there are many small fires that are generally manageable or tolerable, and a few massive, catastrophic fires.

Until 1972, Yellowstone Park rangers were required to extinguish every small fire immediately—a policy that made sense when analyzed reductively (“We should save every tree.”). But this policy led Yellowstone to grow dense with old trees, making a mega-fire like the one in 1988 inevitable. The reductive policy ignored the emergent phenomenon of phase transitions: as a forest approaches critical thresholds of tree density and tree age, the potential for a massive fire grows exponentially. Today, most forestry services use a “controlled-burn” policy of intentionally sparking smaller fires under careful watch, reducing the potential for a massive fire.4
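A simple percolation-style simulation (an illustration, not a real forestry model) shows why density behaves like a switch rather than a dial. Trees fill a grid at a given density, a fire is lit along one edge, and flames spread only between neighboring trees.

```python
# Percolation cartoon of the density threshold (illustrative, not a forestry model).
import random
from collections import deque

def burned_fraction(density, size=100):
    forest = [[random.random() < density for _ in range(size)] for _ in range(size)]
    queue = deque((row, 0) for row in range(size) if forest[row][0])  # ignite the left edge
    burned = set(queue)
    while queue:                                  # fire spreads tree-to-tree
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < size and 0 <= nc < size and forest[nr][nc] and (nr, nc) not in burned:
                burned.add((nr, nc))
                queue.append((nr, nc))
    total_trees = sum(sum(row) for row in forest)
    return len(burned) / max(total_trees, 1)

for density in (0.45, 0.55, 0.65, 0.75):
    runs = [burned_fraction(density) for _ in range(5)]
    print(density, round(sum(runs) / len(runs), 2))
# The burned share stays modest below this grid's percolation threshold (~0.59),
# then jumps sharply above it: a phase transition, not a smooth slope.
```

The same density increase that barely matters at 0.45 is catastrophic at 0.65, which is exactly the kind of threshold the old fire-suppression policy was blind to.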

We must be aware when the potential for emergent behavior exists—that is, any time we are engaged with complex systems. Our focus should not be on trying to perfectly predict the future (which invites us into traps of overconfidence and illusions of control), but rather on being adaptable, to ensure we’re prepared to respond to the unexpected.

***

Systems such as the economy, democracy, science, and forests emerged through centuries of iteration and adaptation. The most complex systems emerge and function without anyone having knowledge of the whole.5

Consequently, we should be skeptical about the potential efficacy of simplistic policies and predictions targeted at complex systems. It is impossible to perfectly anticipate the system’s behavior. What we can do is diligently attempt to observe a system’s dynamics, map out its feedback loops, identify and intervene at key leverage points, and be prepared to learn and adapt as emergent behavior unfolds.