Engineer Your Software
Web Developers Have Forgotten How to Engineer
Go to a large website from a non-tech company and you'll immediately see what I mean. Car rental companies, airlines, financial software: nearly all of these pieces of software are deeply technically flawed. They are slow, they take a long time to load, and most of all, they're buggy. This uncomfortable slowness and bugginess has become the pervasive background of our online lives. Even software produced by large tech companies, whose revenue comes directly from their products, shows these failings. Large tech companies are constantly making insane technical decisions and producing buggy software. Open up any modern first-party Windows application and you will see it visually load as it launches, even for (what should be) incredibly basic desktop programs like the text editor.
The computers we have today, running software nearly identical in functionality to what we ran in the 90s (e.g. MS Word), don't feel orders of magnitude faster to use. In many cases, they feel slower than they did on old PCs. This should be alarming. Many people will counter this argument and say that we have higher resolution monitors, more data, and more complex software. This ignores the fact that monitor resolution has increased maybe 8x since the nineties, while compute power has increased on the order of 10,000x since 1995. Word certainly has more features than in 1995, but more features does not mean that each individual feature is more demanding. The features that Word offers today are not tens of thousands of times more intensive than they were in the 90s. Nor does a Word document need to store and process tens of thousands of times more bytes than it did in the 90s. Succinctly, most features across modern software do not warrant ten thousand times more compute power than thirty years ago, but software feels worse than ever.
One of the primary reasons is that over time, companies have prioritized augmenting existing systems to deal with scale and complexity, instead of re-engineering them to be simpler. In the modern day, developers think of their software as a stack - layers of software that they stitch together into a working system. These layers are themselves bloated and complicated. What should be a straightforward program to respond to socket requests has somehow become a Javascript application running on NodeJS, requiring thousands of dependencies. This explosion in the complexity of software is a large part of why software's growing inefficiency has swallowed the enormous efficiency gains of hardware. That complexity has been driven by developers and organizations taking advantage of hardware efficiency gains to make their own lives easier and more convenient.
Writing a web server in Javascript on NodeJS is much simpler than trying to do the same in C++ or C. But, critically, it's not ten thousand times simpler. It's not even a hundred times simpler. Modern software is making a poor tradeoff between performance and simplicity. And it's not just language and technology choice: popular libraries across most languages show this same trend of increasing complexity and inefficiency as time progresses. This is driven by developers continually trying to solve complexity by shoving it behind some simple interface. That is always easier than actually simplifying the system.
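To make the comparison concrete, here is a minimal sketch of what "a straightforward program to respond to socket requests" can look like in C. It's POSIX-only, the port is arbitrary, and real request parsing and error handling are trimmed for brevity, but it compiles to a few kilobytes and depends on nothing but the operating system:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* A complete TCP server that answers every connection with a fixed
   HTTP response. No runtime, no package manager, no dependencies
   beyond the OS: the socket API is the whole stack. */
int main(void) {
    int server = socket(AF_INET, SOCK_STREAM, 0);
    int yes = 1;
    setsockopt(server, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof yes);

    struct sockaddr_in addr = {0};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(8080);
    if (bind(server, (struct sockaddr *)&addr, sizeof addr) != 0) return 1;
    listen(server, 16);

    const char *reply =
        "HTTP/1.1 200 OK\r\nContent-Length: 13\r\n\r\nHello, world\n";
    for (;;) {
        int client = accept(server, NULL, NULL);
        if (client < 0) continue;
        char buf[4096];
        read(client, buf, sizeof buf);   /* consume the request */
        write(client, reply, strlen(reply));
        close(client);
    }
}
```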
These towers of complexity atop which developers precariously balance themselves are unreliable, slow, and hard to understand. This is part of why modern software is so buggy: modern software carries millions of lines of potential points of failure, mainly from the program's dependencies. But this doesn't fully explain why modern software is in such a bad state. Even with this mountain of complexity, it's still possible to create excellent, reliable products. The modern tech industry has forgotten that software is an engineering discipline, and should be approached in much the same way that a civil engineer might approach building a bridge. There are fundamental engineering principles that are almost completely absent from software, and their absence significantly contributes to the mess we're in.
Developers and organizations need to develop a proper engineering mindset when building software: a mindset built on an allergy to complexity and a fixation on pushing software to its breaking point.
Complexity is the Enemy
Back in college I had the opportunity to contribute to research investigating the viability of putting massive arrays of solar panels into space. The idea was that you could create a panel which captures light on one side and beams that captured energy back down to Earth in an optimized way. If you could create a giant array of these panels, you could potentially create a source of renewable power more efficient than land-based solar panels. One of the primary engineering challenges in building a giant array like this is repairing panels when they fail. So, I was tasked with designing a robot that could crawl across this giant array and swap out broken panels with working ones.
When you're designing a robot to operate in space, you want to make its design as simple as possible. Simplicity in this context could be formalized as a design which completes its objective with the fewest points of failure. In the case of a robot, this means as few moving parts as possible. Fewer moving parts means a lower chance that one of them fails over a given period of time, and a cheaper robot to manufacture.
Systems which are physically engineered feel the value of simplicity much more tangibly than those which are virtually engineered. The marginal cost of complexity is simply much lower in software than it is in other engineering disciplines. The issue in the modern day is that many companies view this marginal cost as effectively zero.
Simplicity in software can be formalized as performing the necessary function while minimizing both the conceptual overhead of the code and its abstraction from what the computer is actually doing. Most software developers are trigger-happy with abstractions, when abstractions are precisely what you want to minimize. Abstractions should only appear when the code becomes too confusing to reason about. They always have a cost, either in performance or in your ability to reason about what the computer is actually doing. Every third-party dependency you add to your project increases its complexity. Every abstraction which distorts your understanding of what the computer is really doing actively harms the performance of your application.
A pervasive modern philosophy is to abstract away the computer. In some cases this is a great idea, especially if you want to write cross-platform code. But when we build systems to let developers write cross-platform code, we don't put commensurate effort into helping those developers reason about what's actually happening on individual platforms. The most extreme example of this is interpreted languages like Javascript and Python. In these languages, the concepts of the computer having memory, a CPU, and a GPU are almost completely hidden from the programmer. How can a Python programmer learn to reason about how their code is actually running? How can they make effective abstraction decisions? Python has facilities to help with this, but it's clear that the original intention of Python was to provide a high-level, abstract programming environment. There are many great things about Python, but it's worth noting that Python's design revolves around some pretty extreme tradeoffs that only make sense in certain niches (certainly not in web servers).
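As an illustration of the kind of reasoning these abstractions hide, here is a small C sketch (the matrix size is arbitrary and the timings are machine-dependent) that does the exact same arithmetic twice. The only difference is the order in which it walks memory, something a high-level language largely conceals, and on most machines the cache-friendly version is several times faster:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N 8192

/* Sum the same matrix twice: once along rows (cache-friendly,
   sequential memory access) and once along columns (cache-hostile,
   strided access). The arithmetic is identical; only the memory
   access pattern differs. */
int main(void) {
    int *m = malloc((size_t)N * N * sizeof *m);
    if (!m) return 1;
    for (size_t i = 0; i < (size_t)N * N; i++) m[i] = 1;

    clock_t t0 = clock();
    long long row_sum = 0;
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            row_sum += m[i * N + j];     /* walks memory in order */
    double row_secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    t0 = clock();
    long long col_sum = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < N; i++)
            col_sum += m[i * N + j];     /* jumps N ints between reads */
    double col_secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

    printf("row-major: %lld in %.2fs\ncol-major: %lld in %.2fs\n",
           row_sum, row_secs, col_sum, col_secs);
    free(m);
    return 0;
}
```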
Abstractions reduce the control that a developer has over their system; they hide performance issues, and they have the potential to make systems more unreliable and harder to reason about. Abstractions should be used to regularize a system; they should rarely be used to wrap a system. Hiding complexity behind an abstraction is almost always a poor motivation; reducing complexity by formalizing a system is a better one. An electrical engineer building a robot shouldn't take a complex piece of circuitry and wiring, stick it in a box, and provide some simple buttons. If they value simplicity, they would modify the system to have fewer boards, the robot to have fewer moving parts, and all the boards, wires, and systems to be made out of a smaller variety of components.
(Aside) Ignore Clean Code Principles
The most important metric while coding is conceptual clarity, matched only by runtime performance. Programmers should constantly strive to write code that minimizes the amount of state another programmer has to keep track of when reading each line. Code can be complicated and difficult to understand, but that difficulty should come from the intricacies of the domain, not the control flow of the program. Clean code principles ask people to focus on patterns that don't actually relate to the runtime performance of their program, or to its conceptual overhead (as defined above). A 1000-line function for your rendering pipeline, with clear comments delineating each major section, is going to be easier to reason about than a bunch of small functions nested within each other. The same argument favors guard clauses over nested if statements, as in the sketch below. Programs should be as flat as possible, both within a module and in terms of how modules reference each other. Abstraction levels should only emerge when absolutely necessary.
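A minimal sketch of the difference (the request type and handlers are hypothetical, purely for illustration):

```c
#include <stdio.h>
#include <string.h>

struct request { const char *user; const char *body; int authenticated; };

/* Nested version: every check pushes the reader one indentation level
   deeper, and the success path is buried in the middle. */
int handle_nested(struct request *r) {
    if (r != NULL) {
        if (r->authenticated) {
            if (r->body != NULL && strlen(r->body) > 0) {
                printf("processing request from %s\n", r->user);
                return 0;
            }
        }
    }
    return -1;
}

/* Guard-clause version: each failure case exits immediately, so the
   reader carries no state forward and the real logic reads flat. */
int handle_flat(struct request *r) {
    if (r == NULL) return -1;
    if (!r->authenticated) return -1;
    if (r->body == NULL || strlen(r->body) == 0) return -1;

    printf("processing request from %s\n", r->user);
    return 0;
}

int main(void) {
    struct request r = { "alice", "hello", 1 };
    return handle_nested(&r) == handle_flat(&r) ? 0 : 1;
}
```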
Torture Your Systems and Measure When They Fail
Simplicity is paramount to creating great software, but it does not by itself yield reliability or performance. Reliability and performance arise in software the same way they arise in other engineering disciplines: rigorous simulation and testing. A developer might write a simple users table for an admin page on a website, verify they can create, view, update, and delete a user, and call the feature complete. That's like a civil engineer building a bridge, driving their car over it, and saying that it's done. It's not done. Next week you're going to have hundreds of cars crossing that bridge every ten minutes. If the engineer did not formally test and simulate their bridge, and provide measurements on its expected performance, they would be sued.
You might say that software doesn't need this kind of rigor, especially for business CRUD applications. Web applications can always be patched and fixed, and no one is going to die if my B2B SAAS application has pagination bugs. Applying this mindset to the bridge example (ignoring safety concerns), it's basically like intentionally building a shitty bridge and indenturing yourself to driving out there and repairing it every month so it doesn't collapse. If your team had made a small and smart investment in structural analysis and simulation of the bridge ahead of time, you could be building other bridges instead of maintaining the ones you've just built.
Software has to be simulated and tested before you ship it. Thankfully most organizations realize this and have a QA team. The problem is that a lot of software companies have completely lost the plot on how to actually test software. Companies focus on nice metrics that give a false sense of security, like code coverage, to establish the reliability of their systems.
Unit Testing is Overrated
Coming back to the bridge example, unit testing is equivalent to the civil engineer providing a guarantee that every part in the bridge is compatible and up to spec. It's great that the threading of the bolts matches the threading in the steel beams, but this doesn't tell me anything about whether the bridge is going to stay up when Walmart sends three semi trucks over it. Sure, a bridge whose parts fit together is more likely to support the weight of three semi trucks, but that's clearly not the question we should focus on. Unit testing is certainly helpful, especially in establishing additional compile-time guarantees. But code coverage only weakly correlates with reliability.
More useful is systematic and regular load testing and fuzz testing across slices of the system. If you're building a user table, part of calling that feature complete involves writing database seeding code to blast that table with 10k users and see if your system breaks down. While you're blasting your table with users, you're also randomly generating that user data to try and catch errors. This is not hard code to write, and it saves enormous amounts of time and worry in the future; a sketch of what it might look like follows below. Game developers are constantly writing creative tests to push their systems to their limits.
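For instance, here is a minimal version in C, assuming SQLite; the users table, file name, and row count are all illustrative. It seeds the table with 10k randomly generated users, deliberately including hostile inputs, and times the whole run:

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <sqlite3.h>

/* Deliberately hostile name generator: random lengths, including 0
   and near-capacity, and arbitrary high bytes that aren't valid UTF-8. */
static void random_name(char *buf, size_t cap) {
    size_t len = rand() % cap;
    for (size_t i = 0; i < len; i++)
        buf[i] = (char)(rand() % 224 + 32);
    buf[len] = '\0';
}

int main(void) {
    sqlite3 *db;
    if (sqlite3_open("test_users.db", &db) != SQLITE_OK) return 1;
    sqlite3_exec(db,
        "CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT);",
        NULL, NULL, NULL);

    sqlite3_stmt *ins;
    sqlite3_prepare_v2(db, "INSERT INTO users (name) VALUES (?);", -1, &ins, NULL);

    srand((unsigned)time(NULL));
    char name[256];
    clock_t t0 = clock();

    /* One transaction, 10k inserts: measure throughput, surface errors. */
    sqlite3_exec(db, "BEGIN;", NULL, NULL, NULL);
    for (int i = 0; i < 10000; i++) {
        random_name(name, sizeof name - 1);
        sqlite3_bind_text(ins, 1, name, -1, SQLITE_TRANSIENT);
        if (sqlite3_step(ins) != SQLITE_DONE)
            fprintf(stderr, "insert %d failed: %s\n", i, sqlite3_errmsg(db));
        sqlite3_reset(ins);
    }
    sqlite3_exec(db, "COMMIT;", NULL, NULL, NULL);

    printf("10000 inserts in %.2fs\n", (double)(clock() - t0) / CLOCKS_PER_SEC);
    sqlite3_finalize(ins);
    sqlite3_close(db);
    return 0;
}
```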
Think about NASA and the enormous investments they make in creating technology to stress test their systems: platforms to try and shake their satellites apart, vacuum chambers, facilities to monitor and test their engines, and so on. Part of building your feature is engineering the technology to test your feature. It's not production-ready until you have some confidence about where it's going to break under scale, and what kinds of inputs it can handle.
When designing your repos, don't design around unit testing; design around making it easy to write software that can push your internals to their limits. This might mean writing your system in terms of packages that you can reference in other binaries designed to test your internals, as in the sketch below. It could also mean using testing frameworks not for unit tests, but for larger scale load testing or fuzz testing. There's no prescriptive solution here; the main point is that you need to write software that can push your program to its breaking point.
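One sketch of that pattern: expose an internal module through its header, then build a separate binary whose only job is to torture it. Everything here (the parse_id function, its invariants, the file layout) is hypothetical, just to show the shape:

```c
/* parser.h - an internal module, exposed so other binaries can link
   against it and torture it. */
int parse_id(const char *input, long *out);

/* stress_parser.c - a separate binary built against the same module.
   The implementation is inlined below to keep the sketch self-contained. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>

/* Parse a non-negative decimal id; reject empty input, trailing junk,
   and overflow. */
int parse_id(const char *input, long *out) {
    if (input == NULL || *input == '\0') return -1;
    char *end;
    errno = 0;
    long v = strtol(input, &end, 10);
    if (errno == ERANGE || *end != '\0' || v < 0) return -1;
    *out = v;
    return 0;
}

int main(void) {
    char buf[64];
    srand(12345);                      /* fixed seed: failures reproduce */
    for (int i = 0; i < 1000000; i++) {
        /* Generate hostile input: random length, arbitrary bytes. */
        size_t len = rand() % (sizeof buf - 1);
        for (size_t j = 0; j < len; j++)
            buf[j] = (char)(rand() % 256);
        buf[len] = '\0';

        long id = -42;
        int rc = parse_id(buf, &id);
        /* Invariants: failure must not touch the output; success must
           produce a non-negative id. */
        if ((rc != 0 && id != -42) || (rc == 0 && id < 0)) {
            fprintf(stderr, "invariant broken on iteration %d\n", i);
            return 1;
        }
    }
    printf("1000000 random inputs survived\n");
    return 0;
}
```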
How We Solve the Problem
The original DOOM was created in about a year by around a half dozen engineers at Id Software. When DOOM shipped, it shipped on physical media, with no opportunity for the team to send out patches. DOOM was powered by revolutionary technology, was extremely performant, and was extremely reliable. This is a remarkable achievement. Id Software was able to do this because they actually engineered their software. Developers had ownership over each part of the system they were working on, they were highly motivated, and they continually tested and stressed what they were building. The team was able to effectively engineer a system that did not compromise control or transparency over the actual computing. Id produced, and still produces, extremely high quality software.
Fixing the problems we face in modern development involves fundamentally changing how many developers understand good software. Good software avoids dependencies, stays close to the computing, and proactively simplifies. Good software is constantly measured, tested, and pushed to its limits. No feature or story is done until the developer can offer guarantees on its behavior. These aren't radical ideas, and they aren't asking "too much" of developers. These strategies produce better software that saves everyone time and money in the long run. It's simply asking software developers to be software engineers.