Terraform, Kubernetes, and Helm are powerful solutions that changed the way we organize and deploy cloud systems. By moving cloud infrastructure configuration to code, we can manage configurations via git, then deploy using a continuous deployment service. Lo and behold, the idea of Infrastructure as Code (IaC) was born. This innovation made cloud system deployment dramatically easier, and we never looked back.
But we have a new problem: fragmented and static configuration code. When configuration code is removed from its architectural context, there’s no easy way to grasp the purpose it serves in the broader system, which limits our ability to make edits mindfully. And we’re not talking about a few scattered configuration files; we’re talking about hundreds of them, which engineers have to find, examine, and edit for every change, from one deployment to the next.
The Current State of Affairs in Cloud Deployment
Where Are These Configuration Files Coming From?
Configuration files grow as a result of a product becoming more complex in support of the business it serves. When your product complexity increases, your cloud system complexity increases, too.
Unlike horizontal scale, in which a system is expanded to accommodate more users and traffic, system complexity is the result of increased functionality. For example, a brand new application doesn’t need user account management features. In the beginning, an engineer can fix a user's account by manipulating user data directly in the production database. But this is risky and time-consuming, so before long, the system needs a customer account management application.
Of course, customer account management is not just a few features. Extended billing capabilities, multiple payment options, invoicing, compliance, GDPR data exports, user data deletion, data privacy, improved security like 2FA, tax, marketing insights… all of this requires adding new databases, queues, file stores, and many other services to a cloud system. It's been only a few paycheck cycles, and suddenly, a simple system with "just" 50-60 cloud configuration files balloons into the 100-200 range. And it just keeps growing.
Configuration Code is Static and Fragmented
Applying the term “code” to configurations is a bit of a misnomer. Configurations are most often just data that represents desired parameters. Configuration code has virtually no ability to link values between files.
Imagine a spreadsheet without formulas and links between sheets. Each cell is an independent, isolated value. Now imagine a financial model built with such a tool. Even though many cells depend on data in cells on other sheets, they have no functional relationship. Cascading changes have to be made by hand. A change to one value might mean updating tens or even hundreds of related cells.
How often would you use that static sheet to experiment or test your hypotheses if it takes two hours of brain-burning to calculate each value manually? You probably wouldn’t. The very same goes for cloud configuration files. Cascading changes between dependent services are all done manually. Good luck making changes in production. The current best practice is to "hope for the best," see what broke, then fix it.
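To make the spreadsheet analogy concrete, here is a minimal Python sketch. The file names, keys, and values are invented for illustration and do not come from any real project; the dictionaries are simplified stand-ins for configuration files. The first half shows the static situation, where a database port is repeated as a literal and someone has to hunt down every copy. The second half shows what a "formula" would look like: dependent values derived from a single definition.

```python
# Hypothetical, simplified stand-ins for three configuration files.
# File names, keys, and values are invented for illustration.
static_configs = {
    "database.tf": {"db_host": "orders-db.internal", "db_port": 5432},
    "api-values.yaml": {"DATABASE_URL": "postgres://orders-db.internal:5432/orders"},
    "worker-values.yaml": {"DATABASE_URL": "postgres://orders-db.internal:5432/orders"},
}


def files_mentioning(configs, needle):
    """Return the sorted names of files in which any value contains `needle`."""
    hits = []
    for filename, settings in configs.items():
        if any(needle in str(value) for value in settings.values()):
            hits.append(filename)
    return sorted(hits)


# Static case: changing the port means editing every file that repeats it.
files_to_edit = files_mentioning(static_configs, "5432")
print(files_to_edit)


# Linked case: dependent values are computed from one definition,
# the way a spreadsheet formula references another cell.
def render_database_url(database):
    """Build a connection URL from a single database definition."""
    return f"postgres://{database['host']}:{database['port']}/{database['name']}"


database = {"host": "orders-db.internal", "port": 5432, "name": "orders"}
database["port"] = 6543  # one edit...
api_url = render_database_url(database)     # ...and every dependent value follows
worker_url = render_database_url(database)
print(api_url)
```

In the static case all three files need a manual edit for one logical change. In the linked case the change is made once and the dependent values are recomputed.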
Deployments Multiply Configurations
We always need multiple deployments, including testing, staging, and production. Each deployment environment contains a completely new set of independent, fragmented files. Remember, they are not linked. Changes to one environment do not propagate to another without manual intervention.
We can reuse some code through modules, which slows the growth in configuration lines needed for each deployed environment. Modules also let us reuse the same configuration for services of the same type. But you still end up copy-pasting the rest of the configuration code to instantiate each service in the cloud. There’s no way around it.
Each deployed service in a given environment has its own files, either written from scratch or copy-pasted and edited. Understanding and accurately capturing the dependencies between configurations is hard enough in a single deployment; spread across deployments, the manual work (and the potential for errors) multiplies for even the simplest of changes.
"A Source of Truth" Argument
One of the key benefits of IaC, or so we hear in the industry, is that configuration files provide a source of truth for how a cloud system operates. And it is true, to an extent. But the source of truth for what? For the software system or the cloud infrastructure? What impact does IaC have on defining the CI/CD system? What impact does it have on the development environment?
More importantly, how do configurations as a source of truth help keep all deployment environments in sync? With IaC as our tool, there is no way to ensure changes to one deployment are properly reflected in all others. For example, when we add a database in the staging environment, nothing ensures that a production-grade database serving the same architectural purpose is also provisioned in the production environment, or that the services depending on it are configured properly. IaC, as it works today, requires human contextual understanding to know when and where configurations apply. And then a human needs to apply them.
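The staging-versus-production gap can be sketched in a few lines of Python. The environment inventory below is hypothetical, and real configurations would first have to be parsed into something like it. The sketch only compares service names, which is exactly the kind of check teams end up writing around their configuration files because the files themselves carry no link between environments.

```python
# Hypothetical inventory: the services each environment's configuration declares.
# Names are invented for illustration.
environments = {
    "staging": {"api", "worker", "orders-db", "invoices-db"},
    "production": {"api", "worker", "orders-db"},
}


def missing_services(environments, reference, target):
    """Return services declared in `reference` but absent from `target`."""
    return sorted(environments[reference] - environments[target])


# The database added in staging never made it to production.
drift = missing_services(environments, "staging", "production")
print(drift)
```

A check like this catches a missing name, but it cannot tell whether the production counterpart is production-grade or whether the services around it are configured correctly. That judgment still depends on someone who understands the architecture.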
The Wrong Explanatory Framework
It is worth restating that Kubernetes, Helm, and Terraform have changed the way we deploy cloud systems for the better. Infrastructure as Code is a very helpful next step in the evolution of cloud software systems, but it is not the last step. As we have seen through this discussion, the ability to treat configurations as code allows us to do new things, but it does not remove the limitations of working at the configuration level.
Cloud system “management-by-configuration” leaves us in the weeds of complex systems, and because of that, the complexity always catches up with us. As dependencies multiply, deployments become more challenging and, over time, create MORE manual work for developers, not less.
This is why we argue that IaC is, in effect, just a faster horse, and not the breakthrough cloud systems need. Instead, we need to elevate our frame of reference from configurations to components and systems. We already think in terms of components and systems, but our day-to-day work doesn’t reflect that, creating inefficiencies that ripple through every stage of the development cycle.