More is not always better

My college years were spent studying rocks. While earning a degree in Geology, I also took courses in physics, chemistry, and mathematics. These foundational sciences provided the context to begin understanding larger systems in climate science, glaciology, and hydrogeology. I learned to deconstruct and explain outcomes by reducing them to core principles grounded in foundational mathematics and science. Now, after centuries of scientific deconstruction, the biggest challenges we face often involve reconstructing complex systems from these simpler parts. As Steven Strogatz argues, many of our largest challenges now involve understanding complex nonlinear systems with emergent behaviors, rather than just the simpler parts that constitute them.

[Image: Geology in the field. Geological hammer, WikiProjekt Landstreicher Geotop Eistobel 01 (cropped)]

Software systems can be complex, even when the constituent parts are simple. Just as in natural systems, behaviors can emerge that give us surprising or unwanted outcomes. Common ways to address these behaviors are to focus on better testing and observability while designing for graceful failure. My personal favorite method, shaped by my work in cost modeling and network design, is to model and simulate. With our modern tools, we can generate more tests and simulations than ever before, giving us more opportunities to stress the systems we create. When testing and simulating, consider whether what is being tested is a simple system (a unit test for a specific function) or a complex system (a solution built from multiple interacting services).
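As a minimal sketch of that distinction (not from any particular project, and with illustrative names and parameters throughout): the first half is a unit test for a simple, linear function, while the second is a toy Monte Carlo simulation of interacting services where retries add load and load raises the failure rate, so small changes in the base failure rate produce disproportionately large changes in total traffic.

```python
import random
import unittest

# --- Simple system: one pure function, easy to cover with a unit test ---

def shipping_cost(weight_kg: float, rate_per_kg: float) -> float:
    """Linear cost model: easy to reason about and to test exhaustively."""
    return weight_kg * rate_per_kg


class TestShippingCost(unittest.TestCase):
    def test_known_value(self):
        self.assertAlmostEqual(shipping_cost(10, 2.5), 25.0)


# --- Complex system: callers retry on failure, and retries add load that
# --- raises the failure rate, creating a feedback loop (a "retry storm").

def simulate_retry_storm(requests: int, base_failure: float,
                         load_sensitivity: float, max_retries: int,
                         seed: int = 42) -> int:
    """Return total calls made (original attempts plus all retries)."""
    rng = random.Random(seed)
    total_calls = 0
    in_flight = requests
    for _ in range(max_retries + 1):
        if in_flight == 0:
            break
        total_calls += in_flight
        # Failure probability grows with current load: the nonlinear coupling.
        p_fail = min(1.0, base_failure + load_sensitivity * in_flight / requests)
        in_flight = sum(1 for _ in range(in_flight) if rng.random() < p_fail)
    return total_calls


if __name__ == "__main__":
    for base in (0.05, 0.20, 0.40):
        calls = simulate_retry_storm(requests=1000, base_failure=base,
                                     load_sensitivity=0.5, max_retries=5)
        print(f"base failure {base:.0%} -> {calls} total calls")
    unittest.main(exit=False, argv=["ignored"])
```

The unit test fully characterizes the simple function, but no single test characterizes the coupled system; only running the simulation across a range of inputs reveals how sharply its behavior changes.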

All of this becomes hyper-relevant in the age of AI. The models we now interact with are too large to be understood by humans. While we know how they are constructed, we cannot identify exactly why a particular word is produced or a particular feature is detected. These models are trained on unimaginable amounts of data, and improving them has predominantly focused on adding more and more of it, because, after all, more is better, right? Yuval Harari, in his book Nexus, describes what he terms the “naive view of information”. The naive view rests on a few core falsehoods: that information represents reality, and that more information reveals more truth. The table below contrasts the naive view with what Harari argues is a more accurate, historically informed view of information’s purpose.

| Perspective | The Naive View | Harari’s View (The Historical View) |
| --- | --- | --- |
| Purpose | To represent reality accurately. | To create connections and maintain order. |
| Quantity | More data leads to more wisdom. | More data often amplifies bias and misinformation. |
| Function | A tool for enlightenment. | Often a tool for creating “intersubjective realities” (e.g., money, nations). |
| Truth | Truth is the natural outcome of information. | Truth is a rare, costly byproduct that requires effort to preserve. |

*Table was generated by Google Gemini

To me, there is a common thread connecting Strogatz’s shift from simple linear to complex nonlinear systems, Harari’s naive view of information, and modern technology ecosystems. Our software systems and models now rely on massive amounts of data as inputs (especially our LLM tools) and are decidedly nonlinear in their behavior. We need to approach these systems differently than the simple functions of the past, acknowledging the flaws of the naive view of information and the realities of complex system interactions. We must remember that more information doesn’t inevitably lead to more truthful or desirable outcomes. Ten years ago, Kevin Kelly observed that in an era of cheap answers, questions would matter more than ever. Let’s make sure to ask good questions.