Signal vs. Noise: Finding the drop of truth in an ocean of distraction

Every time we attempt to transmit information (a “signal”), there is the potential for error (or “noise”), regardless of whether our communication medium is audio, text, photo, video, or raw data. Every layer of transmission or interpretation—for instance, a power line, radio tower, smartphone, document, or human—introduces some risk of misinterpretation.

The fundamental challenge we face in communication is sending and receiving as much signal as possible without noise obscuring the message. In other words, we want to maximize the signal-to-noise ratio.

While this concept has been instrumental to information theory and communications engineering for decades, it is becoming increasingly relevant for everyday life as the quantity and frequency of information to which we are exposed continues to expand… noisily.

A firehose of noise

Our brains are fine-tuned by evolution to detect patterns in all our experiences. This instinct helps us to construct mental “models” of how the world works and to make decisions even amidst high uncertainty and complexity. But this incredible ability can backfire: we sometimes find patterns in random noise. And noise, in fact, is growing.

By 2025, the amount of data in the world is projected to reach 175 “zettabytes,” growing by 28% annually.1 To put this in perspective, at the current median US mobile download speed, it would take one person 81 million years to download it all.2

Furthermore, the average frequency of our data interactions is expected to rise from one interaction every 4.8 minutes in 2010, to one every 61 seconds in 2020, and to one every 18 seconds by 2025.3

So, the corpus of data in the world is enormous and growing exponentially, far outpacing the capacity of the human brain to absorb it. And the frequency with which we interact with this data is so high that we hardly have a moment to process one new thing before the next distraction arrives. When incoming information grows faster than our ability to process it, the risk that we mistake noise for signal increases, since there is an endless stream of opportunities for us to “discover” relationships that don’t really exist.4

Sometimes, think less

In statistics, our challenge lies in inferring the relevant patterns or underlying relationships in data, without allowing noise to mislead us.

Let’s assume we collected some data on two variables and observed the graphical relationship, which appears to be an upward-facing curve (see charts below). If we try to fit a linear (single-variable) model to the data, the average error (or noise) between our model’s line and the actual data is too high (left chart). We are “underfitting,” or using too few variables to describe the data. If we then incorporate an additional explanatory variable, we might produce a curved model that does a much better job of representing the true relationship and minimizing noise (middle chart). Next, seeing how successful adding another variable was, we might choose to model even more variables to try to eliminate noise altogether (right chart).
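To make this progression concrete, here is a minimal sketch using NumPy’s polynomial fitting on synthetic data. The true relationship, noise level, and degree choices are illustrative assumptions, not the data behind the charts above.

```python
import numpy as np

# Synthetic data: a true quadratic relationship plus random noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 30)
y_true = 0.5 * x**2 - 2 * x + 3
y = y_true + rng.normal(scale=4.0, size=x.size)

for degree in (1, 2, 12):                          # roughly: left, middle, right charts
    coeffs = np.polyfit(x, y, deg=degree)          # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x)
    fit_to_data = np.mean((y - y_hat) ** 2)        # error against the data we collected
    fit_to_truth = np.mean((y_true - y_hat) ** 2)  # error against the underlying signal
    print(f"degree {degree:2d}: fit-to-data {fit_to_data:6.2f}, fit-to-truth {fit_to_truth:6.2f}")
```

The fit-to-data error can only shrink as the degree rises, while the error against the underlying signal typically bottoms out near the true (quadratic) shape and then worsens.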

Unfortunately, while adding more factors into a model will always—by definition—make it a closer “fit” with the data we have, this does not guarantee that future predictions will be any more accurate, and they might actually be worse! We call this error “overfitting,” when a model is so precisely adapted to the historical data that it fails to predict future observations reliably.

Overfitting is a critical topic in modeling and algorithm-building. The risk of overfitting exists whenever there is potential noise or error in the data—so, almost always. With imperfect data, we don’t want a perfect fit. We face a tradeoff: overly simplistic models may fail to capture the signal (the underlying pattern), while overly complex models will begin to fit the noise (error) in the data—and thus produce highly erratic predictions.

For scientists and statisticians, several techniques exist to mitigate the risk of overfitting, with fancy names like “cross-validation” and “LASSO.” Technical details aside, all of these techniques emphasize simplicity, essentially by penalizing models that are overly complex. One self-explanatory approach is “early stopping,” in which we simply end the modeling process before it has time to become too complex. Early stopping helps prevent “analysis paralysis,” in which excess complexity slows us down and creates an illusion of validity.5
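As a rough illustration of the early-stopping idea (not a formal cross-validation procedure), we can hold out part of our data to stand in for “future observations” and stop adding complexity as soon as performance on that held-out set stops improving. The split size and degree range below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 60)
y = 0.5 * x**2 - 2 * x + 3 + rng.normal(scale=4.0, size=x.size)

# Hold out a third of the data to play the role of "future observations"
idx = rng.permutation(x.size)
train, valid = idx[:40], idx[40:]

best_degree, best_error = None, float("inf")
for degree in range(1, 15):
    coeffs = np.polyfit(x[train], y[train], deg=degree)
    valid_error = np.mean((y[valid] - np.polyval(coeffs, x[valid])) ** 2)
    if valid_error < best_error:
        best_degree, best_error = degree, valid_error
    else:
        break  # early stopping: extra complexity stopped paying off on held-out data

print(f"stopped at degree {best_degree} (held-out error {best_error:.2f})")
```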

We can apply this valuable lesson to all kinds of decisions, whether making business or policy decisions, searching for job candidates, or even looking for a parking spot. We have to balance the benefits of performing additional analyses or searches against the costs of added complexity and time.

“Giving yourself more time to decide about something does not necessarily mean that you’ll make a better decision. But it does guarantee that you’ll end up considering more factors, more hypotheticals, more pros and cons, and thus risk overfitting.”

Brian Christian & Tom Griffiths, Algorithms to Live By (2016, pg. 166)

The more complex and uncertain the decisions we face, the more appropriate it is for us to rely on simpler (but not simplistic) analyses and rationales.

A model of you is better than actual you

In making professional judgments and predictions, we should seek to achieve twin goals of accuracy (being free of systematic error) and precision (not being too scattered).

A series of provocative psychological studies have suggested that simple, mechanical models frequently outperform human judgment. While we feel more confident in our professional judgments when we apply complex rules or models to individual cases, in practice, our human subtlety often just adds noise (random scatter) or bias (systematic error).

For example, research from the 1960s used the actual case decision records of judges to build “models” of those judges, based on a few simple criteria. When they replaced the judge with the model of the judge, the researchers found that predictions did not lose accuracy; in fact, in most cases, the model out-predicted the professional on whom it was built!
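To picture what such a “model of the judge” might look like, here is a hypothetical sketch: an ordinary least-squares fit of past rulings on a few simple criteria. The criteria, cases, and rulings are invented for illustration and are not from the original studies.

```python
import numpy as np

# Hypothetical past cases: a few simple criteria per case, plus the judge's ruling
# columns: prior_offenses, severity_score, defendant_age
cases = np.array([
    [0, 2, 34],
    [3, 7, 22],
    [1, 4, 45],
    [5, 9, 19],
    [0, 1, 60],
    [2, 6, 30],
], dtype=float)
rulings = np.array([0.10, 0.80, 0.30, 0.95, 0.05, 0.60])  # e.g., sentence severity, 0-1

# The "model of the judge": an ordinary least-squares fit on those simple criteria
design = np.column_stack([cases, np.ones(len(cases))])     # add an intercept column
weights, *_ = np.linalg.lstsq(design, rulings, rcond=None)

def model_of_judge(new_case):
    """Predict the ruling the judge would likely make for a new (hypothetical) case."""
    return float(np.append(new_case, 1.0) @ weights)

print(f"predicted ruling: {model_of_judge([4, 8, 25]):.2f}")
```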

Similarly, a study from 2000 reviewed 135 experiments on clinical evaluations and found that basic mechanical predictions were more accurate than human predictions in nearly half of the studies, whereas humans outperformed mechanical rules in only 6% of the experiments!6

The reason: human judgments are inconsistent and noisy, whereas simple models are not. Sometimes, by subtracting some of the nuance of our human intuitions (which can give us delusions of wisdom), simple models actually reduce noise.

***

In summary, we have a few key takeaways from this model:

  1. Above all, we should seek to maximize the signal-to-noise ratio in our communications to the greatest practical extent. Speak and write clearly and concisely. Ask yourself if you can synthesize your ideas more crisply, or if you can remove extraneous detail. Don’t let your message get lost in verbosity.
  2. Second, be aware of the statistical traps of noise:
  • Don’t assume that all new information is signal; the amount of data is growing exponentially, but the amount of fundamental truth is not.
  • When faced with substantial uncertainty, be comfortable relying on simpler, more intuitive analyses—and even consider imposing early stopping to avoid deceptive complexity.
  • Overfitting is a grave statistical sin. Whenever possible, try to emphasize only a few key variables or features so your model retains predictive ability going forward.
  3. Acknowledge that while human judgment is sometimes imperative, it is fallible in ways that simple models are not: humans are noisy.

Local vs. Global Peaks: Balancing exploration and exploitation to reach our pinnacle

A local optimum is a solution that is optimal within a neighboring set of candidate solutions—a point from which no small change can generate improvement. However, this local peak may still be far from the global optimum—the optimal solution among all possible solutions, not just among nearby alternatives.

This valuable model can teach us about the inherent tradeoff between capitalizing on our current opportunities and pursuing new ones—whether in biological ecosystems, businesses, or machine learning. We can use it to better understand the complex environments we operate in, and to design more effective strategies to achieve our goals.

Getting stuck

Picture a rugged terrain composed of many peaks and valleys of various elevations, with numerous individuals or groups competing to reach the highest peaks. Nearby points tend to have similar levels of “fitness.” The landscape itself may shift dynamically, altering the peaks and valleys and transforming the available paths to reach them. This model is known as a “fitness landscape,” an extremely useful metaphor for thinking about optimization amidst local and global peaks in a variety of applications—including complex systems, biology, computer science, and business.1

In complex systems (such as an industry or an ecosystem), it is easy to get stuck on local peaks as the ground shifts beneath our feet (undermining our position), especially if we fail to survey new territory. We won’t know precisely how the landscape will shift, so the only way to sustain progress in the long term is, simply, to explore.

Sometimes, we may even have to go down (temporarily worsen our situation) in order to ascend a higher peak. And this requires a lot of courage. For example, Netflix’s stock fell by almost 80% from its peak after CEO Reed Hastings announced they were getting out of the DVD business in 2011. Ten years later, Netflix had pioneered the video streaming industry, and its stock price had grown by nearly 1,300%!

Evolutionary searches can never relax. We must constantly experiment with new ideas and strategies to find better solutions and adapt as the landscape shifts.

Faster than the speed of evolution

In biological evolution, we can visualize the competition for genetic dominance as a rugged fitness landscape in which the peaks and valleys represent the highs and lows of evolutionary fitness across an ecosystem. Higher peaks represent species or organisms that are better adapted to their environment—that is, ones that are more successful than their nearby competitors at causing their own replication.

Evolution is capable of creating remarkably complex and useful features, such as the human body’s ability to repair itself or the peacock’s brilliant tail. However, because it optimizes only for the ability of genes to spread through the population, evolution will inevitably reach only local peaks of fitness within a given environment.2 It can favor genes that are useless (the human appendix), suboptimal (women’s narrow birth canals), or even destructive to the species. For instance, the peacock’s large, colorful tail that helps it find mates also makes it more vulnerable to predators.3

When the landscape shifts, even a highly adapted species cannot evolve toward a worse (less well-fitted) state than its current one in order to begin ascending a new, higher evolutionary peak. If the environment shifts faster than the species can adapt, mass extinctions can occur.4

Fortunately, we humans don’t need to be bound by evolutionary timescales. Often, we can find better hills to climb.

Let’s look to computer science and business to see why.

Getting un-stuck

Algorithms provide useful insights into optimization and into overcoming local peaks.

The simplest optimization algorithm is known as “gradient ascent,” in which the program just keeps going “up.” For instance, a video site such as YouTube might be programmed to continue recommending videos that resemble your past content consumption. But “dumb” algorithms like this one maximize only short-term advantage, leading us to local peaks but not to global ones. What if the user’s content preferences change? What if the viewer gets bored by stale recommendations? What if repetitive videos trap the user in a filter bubble?
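Here is a minimal hill-climbing sketch in the spirit of gradient ascent, on a made-up one-dimensional fitness landscape. The landscape function, step size, and starting point are arbitrary choices for illustration.

```python
import numpy as np

def fitness(x):
    # A made-up rugged landscape with several peaks of different heights
    return np.sin(x) + 0.6 * np.sin(3 * x) + 0.1 * x

def hill_climb(x, step=0.05, max_steps=10_000):
    """Gradient-ascent-style search: keep stepping 'up' until no neighbor is higher."""
    for _ in range(max_steps):
        best = max((x - step, x, x + step), key=fitness)
        if best == x:       # no small change improves fitness: we are on a local peak
            return x
        x = best
    return x

peak = hill_climb(x=0.0)
print(f"stuck at x = {peak:.2f}, fitness = {fitness(peak):.2f}")
```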

Randomness and experimentation can help us “pogo-jump” to higher peaks that simple gradient ascent would not reach. For example, a “jitter” involves making a few random small changes (even if they seem counterproductive) when it looks like you are stuck on a local peak, then resuming hill-climbing. A “random-restart” involves completely scrambling our solution when we reach a local peak—which is particularly useful when there are lots of local peaks.5
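A hedged sketch of the random-restart idea, on the same made-up landscape as above: scramble the starting point several times, climb from each one, and keep the highest peak found. The number of restarts and the search range are arbitrary.

```python
import numpy as np

def fitness(x):
    # The same made-up rugged landscape as in the previous sketch
    return np.sin(x) + 0.6 * np.sin(3 * x) + 0.1 * x

def hill_climb(x, step=0.05, max_steps=10_000):
    for _ in range(max_steps):
        best = max((x - step, x, x + step), key=fitness)
        if best == x:
            return x
        x = best
    return x

def random_restart(n_restarts=10, low=-10.0, high=10.0, seed=0):
    """Scramble the starting point several times and keep the highest peak found."""
    rng = np.random.default_rng(seed)
    peaks = [hill_climb(x) for x in rng.uniform(low, high, size=n_restarts)]
    return max(peaks, key=fitness)

best = random_restart()
print(f"best peak found: x = {best:.2f}, fitness = {fitness(best):.2f}")
```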

Perhaps our video site should recommend random pieces of viral content even if the viewer hasn’t watched similar clips previously. Or show clips that contrast sharply with past viewing habits (for nuance or contrarian content). Only experimentation can reveal whether we are climbing the best hill.

The explore/exploit tradeoff

In business, it is useful to picture the strategic environment as a rugged landscape, with each “local peak” representing a coherent bundle of mutually reinforcing choices.

Every organization needs to balance experiments that exploit its current businesses with experiments that explore future innovations. In the short term, simple “gradient ascent” strategies (keep going up) help ensure the company is exploiting its current strengths and opportunities. Over the long term, however, companies must make occasional medium- or long-distance “pogo jumps” to prevent getting stuck on local peaks and, sometimes, to make drastic improvements. The key problem with many organizations is that when the environment seems stable, they stop experimenting because it seems costly and inefficient, and because it sometimes creates internal competition.6
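This explore/exploit tension has a classic machine-learning formulation, the multi-armed bandit. The sketch below uses a simple “epsilon-greedy” rule: mostly exploit the best-known option, but reserve a fixed fraction of trials for exploration. The payoff numbers and the 10% exploration rate are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical average payoffs of three strategic bets (unknown to the decision-maker)
true_payoffs = np.array([1.0, 1.2, 2.0])
estimates = np.zeros(3)   # running estimate of each option's payoff
counts = np.zeros(3)
epsilon = 0.10            # fraction of trials reserved for exploration

for _ in range(1_000):
    if rng.random() < epsilon:
        choice = int(rng.integers(3))       # explore: try an option at random
    else:
        choice = int(np.argmax(estimates))  # exploit: back the current best estimate
    reward = rng.normal(true_payoffs[choice], 1.0)
    counts[choice] += 1
    estimates[choice] += (reward - estimates[choice]) / counts[choice]  # running mean

print("estimated payoffs:", np.round(estimates, 2))
print("trials per option:", counts.astype(int))
```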

This was Reed Hastings’s revelation about Netflix in 2011: its wildly successful DVD-by-mail business was merely a local peak. The landscape had shifted. The new global peak, he believed (correctly), was streaming.

***

The overall lesson is that because the environment is uncertain and always changing, good strategy requires individuals and organizations to carefully cultivate and protect a portfolio of strategic experiments, creating valuable options for the future.

Even when it seems we are at a “peak,” there may be even higher peaks that we cannot yet see, and the peaks themselves are constantly shifting! In such an environment, complacency is a death sentence.

Counterfactual Thinking: Think like a robot can’t

A counterfactual is a “what-if” scenario in which we consider what could have been or what could happen, rather than just what actually happens. What if I had never met my partner? What if the US had never invaded Iraq? How might our customers react to a price increase?

Thinking in counterfactuals is a quintessential exercise in human creativity. By imagining what is possible or what could have been under different circumstances, we can unlock new solutions, better evaluate past decisions, and uncover deeper explanations of the world than we could by analyzing only what happens.

No, AI isn’t about to take over the world

To appreciate the power of counterfactual thinking, consider the capabilities and limitations of artificial intelligence (“AI”) technologies, specifically generative chatbots such as ChatGPT.

These chatbots leverage intricate machine learning (“ML”) networks to simulate human conversation. For instance, if we feed a chatbot millions of lines of dialogue from the Internet—including questions, answers, jokes, and articles—it can then generate new conversations based on patterns it has learned and convincingly mimic human interactions, using mindless and mechanical statistical analysis.

However, despite exponential improvements, AI chatbots struggle mightily with genuine counterfactual thinking. Unlike humans, their ability to explain the world or dream up entirely new scenarios is constrained by their explicit programming.1 They cannot disobey, or ask themselves a different question, or decide they would rather play chess. Instead, they generate convincing responses to prompts by recombining patterns from their training data. Until we can fully explain how human creativity works—a milestone we are currently far from reaching—we won’t be able to program it, and AI will remain a remarkable but incomplete imitation of human-level creative thought.2

“Becoming better at pretending to think is not the same as coming closer to being able to think. … The path to [general] AI cannot be through ever better tricks for making chatbots more convincing.”

David Deutsch, The Beginning of Infinity (2011, pgs. 157-158)

Contrary to the popular AI “doomsday” paranoia, this idea paints a hopeful picture. While digital systems continue to automate routine tasks3—such as bookkeeping, analytics, manufacturing, or proofreading—our counterfactual abilities enable us to push the boundaries of innovation and creativity, with AI as our aid, not our replacement.

We should, therefore, dedicate ourselves to solving problems that require our unique creative and imaginative powers—to design solutions that even the most powerful AI cannot. We did not invent the airplane, nuclear bomb, or iPhone by regurgitating historical data. We imagined good explanations for how they might work, then we created them!

Error-correcting with counterfactuals

Over countless generations of genetic evolution, our brains have developed remarkable methods of learning. The first, more “direct” method of learning is through rote trial-and-error, which can help monkeys to figure out how to crack nuts or chess players to devise winning strategies.

The second, which is a human specialty, is through simulation—using hypothetical scenarios to evaluate potential solutions in our minds. Because blind trial-and-error can sometimes lead to tragedy, an effective simulation is often preferable, and sometimes necessary.4 This strategy is evident in flight training, surgical practice, war games, and simulations of nuclear reactions and natural disasters.

In fact, counterfactuals are crucial to the knowledge creation process. It always starts with creative guesswork (counterfactuals) to imagine tentative solutions to a problem, followed by criticism of those hypotheses to correct or eliminate bad ones. We evaluate a candidate theory by assuming that it is true (a counterfactual), then following it through to its logical conclusions. If those conclusions conflict with reality, then we can refute the theory. If we fail to refute it, we tentatively preserve it as our best theory, for now.

Let’s try it out with a problem of causality. We’ve all heard that “correlation does not imply causation,” but our instinct to quickly explain things in terms of linear cause-effect narratives still misleads us. To truly establish causality, we must turn to counterfactual reasoning. In general, we should conclude that an event A causes event B if, in the absence of A, B would tend to occur less often.5
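As a toy illustration of that rule, with entirely made-up observations, we can compare how often B occurs when A is present versus when A is absent:

```python
import numpy as np

# Made-up observations: each row records whether A occurred and whether B occurred
observations = np.array([
    [1, 1], [1, 1], [1, 1], [1, 0],   # cases where A happened
    [0, 0], [0, 1], [0, 0], [0, 0],   # cases where A did not happen
])
a = observations[:, 0].astype(bool)
b = observations[:, 1].astype(bool)

rate_with_a = b[a].mean()       # how often B occurs when A is present
rate_without_a = b[~a].mean()   # how often B occurs when A is absent (the counterfactual)

print(f"B given A:     {rate_with_a:.2f}")
print(f"B given not-A: {rate_without_a:.2f}")
# A large gap (as in this toy data) is evidence that A makes B more likely; roughly equal
# rates, as in the vaccination example below, give us no reason to claim that A causes B.
```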

Consider the claim that vaccinations cause autism in children, a tempting headline for the conspiracy-minded. Following our counterfactual logic above: if this theory were true, we should expect autism to be more common among vaccinated children. However, the evidence suggests that rates of autism are essentially equivalent between vaccinated and unvaccinated children.6 In reality, vaccines simply happen to be administered around the same age at which autism symptoms typically first appear.

“The science of can and can’t”

A thrilling application of counterfactuals comes from physics, where theoretical physicists David Deutsch and Chiara Marletto are pioneering “constructor theory,” a radical new paradigm that aims to rewrite the laws of physics in terms of counterfactuals: statements about what is possible and what is impossible.

Traditional physical theories, such as Einstein’s general relativity and Newton’s laws, have focused on describing how observable objects behave over time, given a particular starting point. Consider a game of billiards. Newton’s laws of motion can predict the paths of the balls after a cue strike, based on initial positions and velocities. However, they remain silent on why a ball can’t spontaneously jump off the table or turn into a butterfly mid-motion.

Constructor theory goes a level deeper. Instead of merely predicting the balls’ trajectories, it might explain why certain trajectories (such as flying off as a butterfly) are impossible given the laws of physics. By focusing on the boundaries of what is possible, counterfactuals enable physicists to paint a much more complete picture of reality.

Using counterfactuals, constructor theory has offered re-formulated versions of the laws of thermodynamics, information theory, quantum theory, and more.7

***

In summary, we should embrace our beautifully human capacity to imagine worlds and scenarios that do not exist. Counterfactual thinking can spark creativity and innovation, help us reflect on the past, enable better critical evaluations, and even reimagine the laws of physics. Plus, it is the best defense we have against automating ourselves away!

What world will you dream up next?