We deal with large data at Aeris Weather. Multiple weather metrics, every few square kilometers, updated every few hours, covering several weeks, across the whole globe… oh my. And that is just forecast data. A continuous stream of observations, storm cells, fires, and other data in addition to forecasts presents our engineering team with a variety of challenging scaling problems. Take a deep breath and think about how much data that is; that is only the ingestion side. As you exhale, remember that our applications also need to serve the data back out in meaningful ways to our customers. Kubernetes has helped us scale individual parts of our infrastructure so we can handle many types of data and varying demand for different output types.
The systems we create need to scale in both directions: vertically and, even better, horizontally. Vertical scaling is easy enough: ask “The Cloud” for a larger machine, re-deploy, and keep going. But it’s a manual process, and one that incurs a lot of waste: wasted system administrator time, recurring waste from over-provisioned resources, and possibly downtime if you have a single instance and flip the switch to a larger single instance. We would much rather scale automatically. Adding hardware only when needed, scaling horizontally, is much more advantageous, but it certainly comes with its challenges.
“Not all data was created equal”
As you can probably guess, we don’t have a single application powering our APIs. We use the API Facade pattern: the core API sits on top of microservices. Splitting our applications into smaller pieces has been essential to our ability to scale horizontally and individually. These microservices often correspond to data sources, and different data sources have different scaling requirements: some pieces of data are in more demand than others. For example, we have found that our customers request Observations and Forecasts much more than our other data types. By splitting up data sources, we can scale each individual piece as needed and as our customers’ needs change.
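As a sketch of what one of these per-data-source deployments can look like (the names, image, and replica count here are illustrative, not our actual manifests), each microservice gets its own replication controller and service so it can be scaled independently:

```yaml
# Hypothetical manifest for an "observations" microservice.
# Names, image, and replica count are invented for illustration.
apiVersion: v1
kind: ReplicationController
metadata:
  name: observations
spec:
  replicas: 4            # scaled independently of other data sources
  selector:
    app: observations
  template:
    metadata:
      labels:
        app: observations
    spec:
      containers:
      - name: observations
        image: example/observations-api:1.0
---
apiVersion: v1
kind: Service
metadata:
  name: observations
spec:
  selector:
    app: observations
  ports:
  - port: 80
    targetPort: 8080
```

Scaling Observations up during a storm outbreak is then a single change to `replicas` (or a `kubectl scale` command) that doesn’t touch Forecasts or any other service.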
If we only implemented this microservice architecture, deploying different data sources on different machines, we would still have overhead problems. All of those individual servers would be running at, or more likely under, our comfort level of utilization, leaving us with a highly scalable application but a lot of wasted server resources across all of the microservices. Kubernetes steps in to abstract the machines away and present a pool of compute and memory resources on which to run our microservices.
At this point we have accomplished our goal of scaling each piece individually, but we still have a lot of waste, which becomes especially apparent if we add a load balancer in front of each microservice. Kubernetes helps us run all of our microservices (packaged as containers) on a single cluster of machines. Because containers are more lightweight than full virtual machines, we no longer need a dedicated virtual machine for each instance of a microservice. It sounds strange to take all the work we did making our application distributed and bring it back onto a handful of servers, but the Kubernetes scheduler takes care of that for us by spreading the services across our nodes. The scheduler keeps getting better as the diverse team behind Kubernetes continues to make tweaks and improvements.
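The scheduler can pack containers onto a shared pool of nodes more densely when it knows what each one needs. A minimal sketch, with resource figures invented for illustration, of the container fragment of a pod template:

```yaml
# Illustrative container spec fragment. Declaring resource requests
# lets the scheduler bin-pack many small microservices onto shared
# nodes; limits cap what any one container can consume. Values are
# made up for this example.
containers:
- name: forecasts
  image: example/forecasts-api:1.0
  resources:
    requests:          # what the scheduler reserves when placing the pod
      cpu: 250m        # a quarter of a core
      memory: 256Mi
    limits:            # hard ceiling on the container's consumption
      cpu: 500m
      memory: 512Mi
```

A service that only ever needs a quarter of a core no longer occupies a whole machine; the scheduler can place several such containers on one node.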
We strive to create a very durable service for our customers, and durable applications need to be fault tolerant. We routinely test the fault tolerance of our applications; my personal favorite test method is simply killing a running server. It totally pulls the rug out from under the application, and it’s always interesting to see what it does to recover. We have had good luck with the Kubernetes scheduler migrating pods (Kubernetes’ term for a group of one or more containers) onto the remaining servers. We rely on our cloud provider to replace the server we killed; they do a great job auto-scaling our cluster, monitoring server load, and adding servers where needed.
Kubernetes allows us to organize our microservices using namespaces: different applications can be deployed into their own arbitrary groups. Our forward-facing APIs connect to the backend microservices through Kubernetes’ built-in, predictable DNS service. We can also limit the resources a particular container (pod) can consume, effectively isolating it from affecting others.
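To make that concrete, here is an illustrative sketch (the namespace and service names are hypothetical): a namespace groups a set of backend services, and the cluster DNS add-on gives each service in it a predictable name.

```yaml
# Hypothetical namespace grouping our backend microservices.
apiVersion: v1
kind: Namespace
metadata:
  name: data-services
```

With the cluster DNS add-on running, a service named `observations` deployed into that namespace typically resolves at a predictable name along the lines of `observations.data-services.svc.cluster.local`, so the front-end API can find its backends without hard-coded addresses.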
We need an abstraction layer on top of our data center that lets our development teams quickly stage and prototype new ideas, then push those ideas to a space that is robust and durable enough for our customers to use. Kubernetes is close. It isn’t the magic bullet we are always hoping for, but it provides a lot of complicated features in a rather easy to use package. Easy to use, not easy to install, though installation is getting better as the team hardens version 1.x. Kubernetes is still missing auto-scaling at the replication controller level, which would let the system scale our services without any intervention; that will be nice once it is finally ironed out. We are also looking forward to redistribution of running pods across new nodes, which would let us automatically heal from rather large failures with no human intervention. All of those things are in the works from the very active Kubernetes team. At the time of writing, the Kubernetes GitHub repository had a release 2 weeks ago and already has 3,270 commits since then. They are extremely active and pushing forward what container architecture looks like. As we push our applications, we try to provide feedback and help where we can, and it’s exciting to see this space grow. Special thanks to Google, Red Hat, the Linux Foundation, and all of the contributors to the Kubernetes project.