Editing Kubernetes YAML, Terraform HCL, Helm charts, or similar configuration files is the current “state of the art” for managing cloud systems. If you’re like most developers, you may feel that these tools represent the best we can do. And yet we are still plagued by cognitive overload and fragility, problems that emerge from copy-pasting and editing reams of environment-specific configuration files. Why?
OK, so there are tools that help with one particular issue at one stage of my process. Say, instead of copy-pasting, I can write a template that a tool converts into configuration files. Awesome! Now I can churn out a lot more configuration files, faster, and… reach cognitive overload more quickly.
This is exactly what most emerging dev tools in this space deliver: “faster horses,” to borrow the line often attributed to Henry Ford.
The main question that comes to mind is: how can the parts of a system stay in sync if there is no single source of truth? Is it even possible to keep all the parts in sync without hunting down every last dependency by hand?
What If…?
As an entrepreneur, I learned to ask “what if” questions repeatedly to break out of the current way of thinking. Having fixed several failing cloud systems, I started asking myself “what if” questions about how we manage cloud systems today.
What if configuration files are not the way? If not configuration files, then what? We still need a way to describe our services accurately and to manage a huge array of details that can vary widely from one service to another. Files matter because they (along with the gray matter in my head) hold the “source of truth” for my system. Both are kind of scattered. But does it have to be that way?
What if we move the source of truth from configuration files to a data structure like an Architecture Dependency Graph? Not a mere visualization, but a way to DESIGN and actually BUILD a system from a single source of truth. Could we even do that? It feels like a good start to me. I'm less sure about where it ends, though. It feels limiting. Would I still be able to implement complex stuff?
What if we bring in a development framework that can manipulate the Architecture Dependency Graph data structure? That could be interesting, but we need more.
What if the framework defines entities and interfaces that, once implemented, give real-world capabilities to the components, and to the links between them, defined in the Architecture Dependency Graph? Wait a minute. So we could manipulate the graph structure to actually make changes to the system? Still, wouldn’t it just churn out more configuration files?
What if the code does all the work from start to end, from build to deployment, so I no longer care about configuration files at all? I see: the files are generated and applied on the fly.
What if these entities and interfaces are implemented in code libraries, so we can install them instead of coding them every time we need a capability? It starts to feel real, right?
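To make the idea more concrete, here is a minimal Python sketch of such a graph and of the kind of interface a framework could define. It is not Torque Framework's API: `Graph`, `Component`, `Provider`, `generate`, and the two generic providers are hypothetical names chosen for illustration, and the "configuration" they produce is just a Python dict. The point it shows is that every generated artifact is derived from the graph alone, in dependency order, so nothing is edited by hand.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Protocol


@dataclass(frozen=True)
class Component:
    name: str
    kind: str  # hypothetical type label, e.g. "service" or "postgres"


@dataclass
class Graph:
    components: dict[str, Component] = field(default_factory=dict)
    # (source, target) means "source depends on target"
    links: set[tuple[str, str]] = field(default_factory=set)

    def add_component(self, name: str, kind: str) -> None:
        self.components[name] = Component(name, kind)

    def add_link(self, source: str, target: str) -> None:
        if source not in self.components or target not in self.components:
            raise KeyError("both ends of a link must be existing components")
        self.links.add((source, target))

    def deploy_order(self) -> list[str]:
        # Every component is a node; each link makes the target a predecessor,
        # so dependencies are rendered before the components that use them.
        sorter = TopologicalSorter({name: set() for name in self.components})
        for source, target in self.links:
            sorter.add(source, target)
        return list(sorter.static_order())


class Provider(Protocol):
    """Hypothetical interface that an installable library would implement."""

    def render(self, component: Component, deps: list[str]) -> dict: ...


class GenericServiceProvider:
    def render(self, component: Component, deps: list[str]) -> dict:
        return {"name": component.name, "image": f"{component.name}:latest",
                "depends_on": sorted(deps)}


class GenericPostgresProvider:
    def render(self, component: Component, deps: list[str]) -> dict:
        return {"name": component.name, "image": "postgres:16"}


def generate(graph: Graph, providers: dict[str, Provider]) -> list[dict]:
    """Render configuration on the fly, in dependency order, from the graph alone."""
    configs = []
    for name in graph.deploy_order():
        deps = [target for source, target in graph.links if source == name]
        component = graph.components[name]
        configs.append(providers[component.kind].render(component, deps))
    return configs


g = Graph()
g.add_component("api", "service")
g.add_component("db", "postgres")
g.add_link("api", "db")
configs = generate(g, {"service": GenericServiceProvider(),
                       "postgres": GenericPostgresProvider()})
# configs == [{"name": "db", "image": "postgres:16"},
#             {"name": "api", "image": "api:latest", "depends_on": ["db"]}]
```

Swapping a provider in the dictionary changes what gets generated without touching the graph, which is the property the "installable library" idea relies on.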
OK, What Now?
Now that we can install libraries that bring capabilities to the components and links in the Architecture Dependency Graph, the first library I would install would handle my local development environment. Then I would bring in libraries that deploy my test or staging environments to the cloud provider of my choice. Finally, I would bring in libraries that perform the production deployment on the same cloud provider. Oh wow, that sounds too easy. Too good to be true.
Can I have libraries that deploy the same components and links from the Architecture Dependency Graph to a different cloud provider? Does that mean I could go multi-cloud? I don’t see why not.
Can I have libraries, implemented by experts, that bring in deployment configurations for a complex service like Kafka or RabbitMQ? Surely it’s better for an expert to do that than someone on my team trying it for the first time.
Can I use different libraries to deploy to different environments? Say, a simple Kafka for staging and a high-scale, redundant Kafka for production? Libraries by different authors, even? At this point, what limits the deployments, or who can contribute libraries?
Can I have libraries that estimate my cloud costs before I deploy, instead of deploying and waiting to see what the bill looks like? If I can graph my deployed system, surely I can graph one I’m just thinking about.
Can I have libraries that connect my services to an observability/monitoring solution? There are lots of helpful solutions out there. It makes sense to connect them.
Can I use my setup to run integration tests in my CI/CD pipeline? Yes.
Can I install additional libraries that make my production environment disaster-recovery-ready? I see no reason why not.
Can I install additional libraries that will expand my setup with security best practices? In time, of course.
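Choosing a different library per environment, such as a simple Kafka for staging and a redundant one for production, can be sketched as a small binding table. This assumes a hypothetical setup in which each environment maps a component kind to whichever library implements it; `SimpleKafka`, `RedundantKafka`, `ENVIRONMENTS`, and `render_environment` are illustrative names, not part of any real library, and the broker counts are placeholder values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    kind: str


class SimpleKafka:
    """Hypothetical library: a single broker, good enough for staging."""

    def render(self, c: Component) -> dict:
        return {"name": c.name, "brokers": 1, "replication_factor": 1}


class RedundantKafka:
    """Hypothetical library: a multi-broker setup for production."""

    def render(self, c: Component) -> dict:
        return {"name": c.name, "brokers": 3, "replication_factor": 3}


# Each environment binds component kinds to the library that implements them.
# The components themselves are identical across environments.
ENVIRONMENTS = {
    "staging": {"kafka": SimpleKafka()},
    "production": {"kafka": RedundantKafka()},
}


def render_environment(env: str, components: list[Component]) -> list[dict]:
    bindings = ENVIRONMENTS[env]
    return [bindings[c.kind].render(c) for c in components]


components = [Component("events", "kafka")]
staging = render_environment("staging", components)
production = render_environment("production", components)
# staging    == [{"name": "events", "brokers": 1, "replication_factor": 1}]
# production == [{"name": "events", "brokers": 3, "replication_factor": 3}]
```

Because the bindings live outside the component list, a library from a different author can be swapped into one environment without affecting the others.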
This approach of using the Architecture Dependency Graph data structure as a single source of truth also means that changes to the graph propagate to all my deployment environments and all their configurations. For example, say I add a new database. Unlike with Terraform, Kubernetes, and Helm, I do not need to copy-paste code into every environment and then miss something.
It has happened to me (maybe to you, too?): I forgot to add an environment variable to a service that used a new database. I have even missed an entire environment because it was new, and I was surprised to find out it existed at all.
With Torque Framework, all I need to do is add a new database to the Architecture Dependency Graph data structure, along with links from the database to every service that needs to use it. Torque Framework then uses the installed libraries to deploy it to all defined environments. The code that manages a database deployment in each environment is already there, so there is nothing for me to do. The linked services get everything they need, too.
My cognitive load is minimal: a few Torque CLI commands. One "create a new database component" command, plus a few "create a new link" commands to connect the database to the services that need access to it. That's it.
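The database example can be modeled in a few lines of Python. This is a hypothetical sketch, not Torque Framework's implementation: the `Graph` class, `service_env`, `database_url`, and the `<name>.<env>.internal` host scheme are assumptions made for illustration. Each service's environment variables are derived from its links, so adding one component and two links updates every environment at once.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    components: dict[str, str] = field(default_factory=dict)  # name -> kind
    links: set[tuple[str, str]] = field(default_factory=set)  # (service, dependency)

    def add_component(self, name: str, kind: str) -> None:
        self.components[name] = kind

    def add_link(self, service: str, dependency: str) -> None:
        self.links.add((service, dependency))


ENVIRONMENTS = ["local", "staging", "production"]


def database_url(name: str, env: str) -> str:
    # Hypothetical host naming scheme; a real library would supply its own
    # hosts and credentials for each environment.
    return f"postgresql://{name}.{env}.internal:5432/{name}"


def service_env(graph: Graph, service: str, env: str) -> dict[str, str]:
    """Derive a service's environment variables from its links in the graph."""
    variables = {}
    for source, target in sorted(graph.links):
        if source == service and graph.components[target] == "postgres":
            key = target.upper().replace("-", "_") + "_URL"
            variables[key] = database_url(target, env)
    return variables


graph = Graph()
graph.add_component("api", "service")
graph.add_component("worker", "service")

# The only change needed: one new component and two links.
graph.add_component("orders-db", "postgres")
graph.add_link("api", "orders-db")
graph.add_link("worker", "orders-db")

rendered = {
    (env, service): service_env(graph, service, env)
    for env in ENVIRONMENTS
    for service in ("api", "worker")
}
# rendered[("staging", "api")] ==
#     {"ORDERS_DB_URL": "postgresql://orders-db.staging.internal:5432/orders-db"}
```

No service definition mentions the database directly. Remove a link and the variable disappears from that service in every environment the next time the configuration is rendered.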
Of course, there are always more details. But the real question is: how many details? And at what level are they: environment-specific or system-level?
Am I working on the top level and making my system aware of a new system-wide capability, or am I in the weeds of a specific environment fixing specific configuration parameters?
Do I have a single source of truth that describes my software system and a framework that keeps all parts of my system in sync, or do I just hope for the best?
The answer makes the difference between having control and having cognitive overload.
We covered a lot of ground here. Hopefully, it’s becoming clearer not only why configuration files are a problem (no matter how many templates or streamlining tools you apply to them), but also what we can do about it.
I also understand that changing standard industry practice is no small thing, and this post probably left you with a lot of unanswered questions. You can find more information in our other blog posts, videos, and documentation, or reach out to us via the links above. We’re eager to hear from you.