A manager’s guide to Kubernetes adoption

It is hard to escape hearing about Kubernetes in your organization. The red-hot container orchestration technology that originated at Google is a regular part of elevator talk and water cooler conversations. It is not often that you find a single technology that the rank and file, from the DevOps intern to the CTO, swear by, whether or not they fully understand it. But reading about Kubernetes piecemeal can leave you confused and unsure of what the hoopla is about.

There are real benefits to the orchestration, auto-scaling, server utilization and self-healing capabilities that Kubernetes provides, and these are not capabilities you would want to build on your own.

But while many present Kubernetes as a silver bullet, that's far from the truth. Whether you can adopt Kubernetes, and how deeply, depends on multiple factors. I intend to shed light on some of the key ones, and I'll also give you a step-by-step approach to gradually evaluating and introducing Kubernetes in your organization. Kubernetes is great technology, and there are significant benefits to be had from it if it is employed properly.

Pets to herds

There was a time when servers were lovingly named and cared for by system administrators. Cut to the present. We don't really know what kind of servers run our workloads, since those gray, rack-mounted boxes in cold server rooms and data centers have given way to public Clouds run by Amazon, Microsoft and Google. System administrators did continue naming the virtual machines running on public Clouds, much like they would their pets.

Earlier, if servers were required, it could take weeks of talking to salespeople over the phone or email. With high-performance virtualization and public Clouds came APIs: a simple API call could spin up a machine in seconds. And it all took off from there. Soon enough, clever tech folks realized that public Clouds provided not just quick access to compute but also the potential for automation via simple-to-use RESTful APIs. This was pure gold.

Infrastructure as code and DevOps

Then came technologies such as Chef, Puppet, Ansible and SaltStack, and with them the line between dev and ops began to blur. While system administrators had rarely ventured beyond shell scripts, systems like Chef, Puppet and Ansible were full-blown system orchestration frameworks. Puppet uses a domain-specific language, or DSL, to let system administrators define the end state of their infrastructure, while Chef uses a Ruby-based DSL that brings the full power of Ruby's expressiveness to defining that end state.

These technologies put us squarely in the era of declarative (and imperative) infrastructure, where the end state of server infrastructure could be defined as code in files and versioned like regular source code. This is very different from system administrators setting up servers manually and then logging into them to configure software services and resources such as storage or networking.
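
To make this concrete, here is a minimal sketch of what declarative infrastructure code looks like, using an Ansible playbook as an example; the host group and package are illustrative, not from any particular setup:

    # playbook.yml: declare the end state; Ansible makes the servers match it
    - hosts: webservers              # illustrative inventory group
      become: true
      tasks:
        - name: Ensure nginx is installed
          apt:
            name: nginx
            state: present
        - name: Ensure nginx is running and enabled at boot
          service:
            name: nginx
            state: started
            enabled: true

Run repeatedly, this converges the servers to the same end state, and the file lives in version control alongside application code.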

Where the best a system administrator could previously do was run shell scripts that provided some repeatability, Chef and Puppet provided centralized management of system configuration and powerful APIs that made managing large numbers of servers far less cumbersome.

The rise of Docker

Then another powerful technology emerged that made it easy to use Linux primitives to put together and manage containers: Docker. Docker itself is not a container technology. Linux containers are built from the operating system's own primitives, cgroups (control groups) and namespaces; what Docker did was make it easy to build and manage containers.

With Docker, it became easy to package an application along with all its dependencies and containerize it. Much like with a real shipping container, moving it around became easy. The container could then be run on any Linux machine without the pain of first installing all of its dependencies, which could be Linux distribution specific.

This made it easy to run Nginx from Debian and pair it with a Python Flask application built on Ubuntu while using MySQL from Alpine, all of these coming together to run the user's application on the same Linux server. Docker became a standard way to build containers declaratively and then run and manage their lifecycles on Linux servers, harnessing Linux's underlying container technologies and providing tremendous utility. Docker, understandably, was all the rage when it first arrived.
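
As a rough sketch of that idea, here is how such a mix might be described declaratively with Docker Compose; the image tags and the application service are illustrative placeholders:

    # docker-compose.yml: three containers, each packaged with its own
    # userland and dependencies, running side by side on one Linux host
    version: "3"
    services:
      web:
        image: nginx:stable
        ports:
          - "80:80"
      app:
        image: myorg/flask-app:latest    # hypothetical application image
        environment:
          DATABASE_HOST: db
      db:
        image: mysql:8.0
        environment:
          MYSQL_ROOT_PASSWORD: example   # illustrative only

Each image ships its own userland and dependencies, so the mix of distributions is invisible to the host.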

While technologies such as Chef and Puppet were good at managing the configuration of physical and virtual machines, containers were quickly becoming the standard way DevOps engineers deployed applications. Also, containers encapsulate much of the application deployment logic, making the rest of the orchestration a lot simpler. The time was right for a more modern container-centric orchestration system.

The DevOps era

Another methodology was rapidly evolving that would move ops further into the realm of the developer: DevOps. Engineers had figured out that while the ops folks wanted stability, developers wanted new features released all the time, which challenged that stability. These were conflicting goals that often put the teams at loggerheads, directly affecting product shipping velocity.

Also, when product developers aren't aware of all the problems their code causes in the production environment, they tend not to put in the effort to fix them, because production was traditionally the ops team's headache. To solve these problems, teams started trying a new methodology: DevOps.

The idea was simple: those who wrote the code ran it in production. They were the ones who were put on call, too. No more lobbing the code over a wall and forgetting about it. If they wanted to release new features more often, they figured out what they needed to do on the ops side to support that. Technologies such as the Cloud and containers, with their APIs and tooling, made it possible to bring a software engineering approach to Cloud operations.

Fast-moving DevOps teams were building large applications often comprising microservices that could be independently developed and released without incurring the overheads of changing, testing and releasing one large monolithic app. Teams practicing DevOps using the microservices architecture loved the easily moldable elasticity of the Cloud and containers that were easily controlled with scriptable tools and APIs.

Enter Kubernetes

With DevOps and microservices meeting containers, it made sense to have a new, more container-native orchestration system. Inspired by Google's internal Borg system, Kubernetes is an open source container orchestration system initially built by engineers at Google and now maintained by the Cloud Native Computing Foundation, or CNCF. While a system such as Docker manages containers within a single server, Kubernetes does much more: as a container orchestration platform, it automates application deployment, scaling and management, and one of its main features is managing a cluster of servers, or nodes, on which containers run. Comparable systems are Apache Mesos and Docker Swarm, and while there were debates over which would become the standard, Kubernetes has emerged as the winner. Betting on Kubernetes for your container orchestration strategy is safe at this point.

Not just Docker

It is important to note that Kubernetes does not only orchestrate containers managed by Docker. It can also orchestrate containers managed by similar runtimes such as containerd, CRI-O and rktlet. While you can create and manage containers with any of these, for the purposes of this article we will use 'Docker' to mean container management in general. Now, let's look at some of Kubernetes's most important features.

Why Kubernetes

To understand why a system like Kubernetes is important, it might be a good idea to think about where the capabilities of Docker end. While Docker makes it easy to manage the lifecycle of containers within a server, Kubernetes makes it easy to manage a cluster of servers running Docker containers. 

Also, modern, microservices-based applications are usually made up of several containers. Kubernetes provides the notion of an application deployment, essentially a set of containers that make up an app, running in a distributed fashion on a cluster. When you want to run an application made up of a bunch of containers, you just describe it to Kubernetes, which finds nodes in the cluster with enough compute resources and schedules the containers there.
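
To make this concrete, here is a minimal sketch of a Kubernetes Deployment for a hypothetical web service; the image name and resource figures are placeholders:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 3                      # run three copies across the cluster
      selector:
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
            - name: web
              image: myorg/web:1.0     # hypothetical container image
              resources:
                requests:
                  cpu: "250m"          # used by the scheduler to pick a node
                  memory: "256Mi"

Given this declaration, the scheduler finds nodes with enough free CPU and memory and runs the three containers there.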

Kubernetes can also take care of restarting containers when they fail or even scaling your app by running more containers to take on traffic surges. 

This is the essence of Kubernetes, and it is what people are referring to when they talk about 'container orchestration.'
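
For example, a HorizontalPodAutoscaler can be attached to the Deployment sketched above to add containers when traffic pushes CPU usage up; the thresholds here are illustrative:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web
      minReplicas: 3
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70   # scale out when average CPU crosses 70%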

When running multiple applications distributed across a cluster, there are other nice-to-have features that ease their management. Kubernetes has features, such as ConfigMaps and Secrets, that make it easier to manage application configuration and credentials. Kubernetes also manages other pieces of infrastructure such as storage and networking, which, along with compute, make up the three basic building blocks of any infrastructure.
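
A minimal sketch of the configuration side, with made-up keys and values:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: web-config
    data:
      LOG_LEVEL: "info"                # plain, non-sensitive settings
    ---
    apiVersion: v1
    kind: Secret
    metadata:
      name: web-credentials
    type: Opaque
    stringData:
      DB_PASSWORD: "change-me"         # sensitive values, kept separate from the app

Containers consume these as environment variables or mounted files, keeping configuration and credentials out of the container image itself.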

Managed or not

While it is possible to run Kubernetes on your private Cloud, it might be wiser to opt for Kubernetes distributions such as OpenShift that have the option of paid support—at least until you scale up your Kubernetes infrastructure. It is also possible to set up Kubernetes on a cluster of machines on AWS, Azure or GCP, but these public Clouds also feature managed Kubernetes offerings. 

Kubernetes is made up of multiple components that work together to run a cluster of machines known as compute nodes, and some Kubernetes components run on each compute node as well. When you opt for a managed Kubernetes offering from a public Cloud vendor, you generally choose the node types from their compute offerings that will become part of the cluster; these are the nodes where your containerized workloads will be deployed. The main Kubernetes master components, otherwise known as the 'control plane,' are managed by the Cloud provider.
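
With a managed offering such as AWS EKS, creating a cluster can be as simple as handing a short declarative file to a tool like eksctl; the cluster name, region and instance type below are illustrative assumptions:

    # cluster.yaml, consumed by: eksctl create cluster -f cluster.yaml
    apiVersion: eksctl.io/v1alpha5
    kind: ClusterConfig
    metadata:
      name: demo-cluster               # hypothetical cluster name
      region: us-east-1
    managedNodeGroups:
      - name: workers
        instanceType: m5.large         # the compute nodes you pick and pay for
        desiredCapacity: 3

Note that the control plane never appears in this file; the Cloud provider runs and maintains it for you.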

Is your organization ready for Kubernetes?

This really depends on not one but a bunch of factors. Let’s go through these in some detail.

Are you on private or public Cloud?

Kubernetes can be deployed on both public and private Clouds. On private Clouds, while you can, in theory, install and maintain Kubernetes yourself, you might be better off running a distribution such as OpenShift that can be supported by a vendor. And while Kubernetes was born in the public Cloud and is what is known as a 'Cloud native' technology, it doesn't really matter that you are running it to manage a private Cloud; in fact, it can be argued that Kubernetes is a great choice for managing your private Cloud's compute cluster.

For public Clouds such as AWS, Azure or GCP, you can run Kubernetes on a set of compute nodes by setting it up yourself. Kops is a popular solution that lets you deploy Kubernetes on a set of EC2 instances running on AWS. I strongly recommend you pass on the headache of maintaining the Kubernetes control plane to the Cloud provider by choosing to go with a managed Kubernetes offering so that you can concentrate on the mechanics of running your workloads on Kubernetes.

What’s your current level of Docker adoption?

If the apps you want to run on Kubernetes are not already containerized, it's a non-starter. Kubernetes being a container orchestration platform, you'll have to ensure that those apps are first well tested on Docker in production. Docker adoption is widespread and the tooling is very mature, so Docker itself should pose little risk, no matter how cautious your organization's IT strategy is. Another advantage is that Docker, in combination with Kubernetes and properly implemented, can drive up server utilization levels.

How mature is your DevOps culture?

The presence of a strong DevOps culture means that devs are responsible for running the services they develop in production and are always looking for ways to automate the operations side of their work. This is especially true if your app or service is based on a microservices architecture, where there are many moving parts and teams owning different microservices need to operate independently of each other. Kubernetes is a great fit where this culture exists and should find internal adoption very quickly.

Don't get me wrong here. Kubernetes can run monolithic workloads just fine. The point is that if dev and ops are two separate teams, with ops working on a completely new way of deploying and dev adapting its build system for containerization and app configuration, then expecting those efforts to fit together like a hand in a glove is wishful thinking.

Availability of Kubernetes talent

Kubernetes can take a while to learn and only makes sense when the person learning it has already put in the effort to learn containerisation. If you are convinced that Kubernetes is for you, you’ll need one or more champions who are capable and confident of running production workloads on Kubernetes. 

But if you only have folks who've just scratched the surface with Kubernetes, we'll see how you can build on that experience while reducing risk, and lay the path to a future where you'll be running production workloads on Kubernetes.

Kubernetes gotchas

Anyone who calls Kubernetes a panacea is not giving you a balanced view. Here are some things you need to be aware of.

Managed Kubernetes is not a panacea

Kubernetes is a system built out of many pieces of software working in tandem. Whether you manage it yourself or opt for managed Kubernetes, things can go wrong, and it is not uncommon for a managed cluster to have trouble with one of its constituent components.

Do not assume that just because the Kubernetes control plane is managed by your Cloud provider, nothing can go wrong. You can find plenty of Cloud provider-specific Kubernetes issues filed on GitHub. When something goes wrong, you might still need to contact support to get things back in order, and that can involve downtime. Because the control plane components only create and monitor containers, an outage there generally does not affect containers that are already running; it is rare for a control plane failure to take down all running containers. You can, however, be prevented from creating new containers, auto-scaling and so on.

Stateful applications on Kubernetes are still evolving

Kubernetes was made for applications that create and destroy containers. A lot of containers can be created in response to a surge in traffic and be terminated once things return to normal. The same is true for background job runners. It is meant to be a dynamic environment where servers are truly treated like herds—a far cry from the days of naming servers lovingly. In that sense, Kubernetes’s support for more stateful applications like databases may seem like an afterthought. It is an area of active development and is fairly stable now, but one should not be surprised if the way stateful applications work under Kubernetes continues to change relatively rapidly in comparison to other areas.

Kubernetes does natively support persistent volumes, which can be allocated in a Cloud provider-neutral way for use by your stateful applications.
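
A PersistentVolumeClaim is the Cloud-neutral building block here: the application asks for storage by size and access mode, and the cluster maps that onto the provider's disks. A minimal sketch, with an illustrative name and size:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: db-data
    spec:
      accessModes:
        - ReadWriteOnce                # mounted read-write by a single node
      resources:
        requests:
          storage: 20Gi                # size is all the app needs to specify

On AWS the claim might be backed by an EBS volume and on GCP by a persistent disk, but the claim itself stays the same.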

If you want to use stateful services managed by the underlying Cloud provider (e.g., RDS or DynamoDB), the native way to consume them from Kubernetes is the Service Catalog, which makes it easy to provision and bind to managed services from Cloud providers.

Kubernetes upgrades

A quick web search will turn up enough horror stories about Kubernetes cluster upgrades to make you want to hug your existing setup. The safest approach might be to recreate a cluster running the same version as your production cluster, install your critical apps there, and upgrade that cluster to check that everything turns out fine; only then upgrade your production cluster to that shiny new version of Kubernetes. Let's face it: if you're serious about Kubernetes and the benefits it brings, cluster upgrades are something you can't escape. So plan and execute.

Many moving parts

Take virtualization. It is an abstraction we take for granted and are comfortable using; in fact, when someone refers to a 'machine' or 'server,' they are most likely referring to a virtual machine. For applications, Kubernetes may well become the new standard substrate, a new level of abstraction that becomes the new normal. With virtual machines, though, most large systems have standardized on Linux's KVM technology, which is very much part of the operating system layer; although there are other components involved, they are fairly low level. That is very unlike Kubernetes, where a dozen services talk to each other across a cluster of machines, doing fairly sophisticated things with compute, storage, networking and auto-scaling.

When problems rear their ugly heads, you might have to roll up your sleeves and peer under Kubernetes’s hood to make sense of what’s going wrong. We just have to assume at some level that all the different versions of these components that Kubernetes uses are somehow in harmony with each other. Or, at least, that is what the Cloud provider would like us to believe.

Retaining Kubernetes talent

If you are itching to jump headlong into Kubernetes with just one person behind you, you're doing so at your own peril. As Kubernetes proves itself worthwhile for your use case, it is a good idea to get your whole DevOps team trained, or self-trained, on it. Who in their right mind would pass on an opportunity to get trained on and then work with a red-hot piece of technology? To be on the safer side, you need a team of DevOps folks who are comfortable dealing with Kubernetes so that you have continuity should your lone Kubernetes person choose to move on. That happens a lot, trust me: if Kubernetes is listed as a skill on their LinkedIn page, recruiters won't take long to call.

Introducing Kubernetes into your organization

If you’re convinced you can benefit from deploying Kubernetes in your organization, here are some steps to bring in change in a manner that lets you organically understand Kubernetes and what it takes to run it to manage production workloads.

Train or hire Kubernetes talent

You need people on your DevOps team with hands-on Kubernetes knowledge to execute your plan. Given the availability of high-quality training material online, they can train themselves or go through a formal training program. Check whether your Cloud provider can run training specifically for your organization; you might also have other paid options in your area.

Hiring Kubernetes talent is another option. Look for folks who’ve run production workloads under Kubernetes. While hiring, you might want to discuss the path they took to production and any challenges they faced while running production workloads on Kubernetes. Ask if they ran any stateful workloads. Based on this discussion, you should be able to figure out if they might be a good fit for the projects you have in mind.

Move workloads to Docker

If the workloads you want to move to Kubernetes are not already running on Docker in your staging or production environments, moving them directly to Kubernetes means attempting a double leap: if you run into trouble, you won't be able to pinpoint whether it is caused by build-time, run-time or configuration problems. If, on the other hand, your team has already containerized your workloads and run them in production with Docker, you'll have fewer variables to deal with should anything go wrong.

Run non-production workloads on Kubernetes

Now that your workloads are containerized, you're ready to move some non-production workloads, such as dev and staging, to Kubernetes. This way, your larger team can get used to the new environment, get Kubernetes cleanly integrated into your continuous deployment pipeline, and so on.
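
What that integration looks like depends on your pipeline tooling. As one hedged example, assuming a GitLab CI pipeline, a k8s/ directory of manifests, and cluster credentials already configured for the job, a staging rollout step might look roughly like this:

    deploy_staging:
      stage: deploy
      image:
        name: bitnami/kubectl:latest   # any image that ships kubectl will do
        entrypoint: [""]               # override the image's kubectl entrypoint
      script:
        - kubectl apply -f k8s/ --namespace staging
        - kubectl rollout status deployment/web --namespace staging
      only:
        - main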

Stateless workloads first

When it comes to moving production workloads to Kubernetes, it might be a good idea to start with stateless workloads first: containers that merely serve application requests and do not directly persist any data. Running stateful workloads on Kubernetes requires deeper Kubernetes expertise and you can build that first within your organization by running stateless workloads. With stateful workloads, you need to plan a bit more carefully about, say, how you’ll manage when nodes go down. This will be especially complicated with distributed stateful apps such as Elasticsearch, for instance.

Move non-critical workloads

It is also a good idea to initially move non-critical workloads onto Kubernetes and let your team gain experience running production workloads on it. There are other aspects of running workloads in production, such as monitoring, alerting and application upgrades, that they'll need to figure out.

Make the big leap

We saw how you can adopt Kubernetes in your organization in a fashion that builds expertise while reducing risk, and how to get ready from an engineering point of view to containerize and gradually move production workloads onto Kubernetes. You are the best judge of when exactly you're ready to make that move; each organization's balance of adoption benefit against risk is different, so use your best judgment. To automate the deployment of the Kubernetes clusters themselves, my recommendation is to use a tool like Terraform.

In conclusion

The best laid plans take into account worst-case scenarios. That’s the main idea behind this article: to tell you what could potentially go wrong and how you can mitigate those risks in your Kubernetes strategy. Kubernetes is great tech. There are many areas where it will most certainly help you. But as with any technology on which you intend to run production workloads, you’ll want to weigh your risks and see if the benefits outweigh them. Resources such as Kubernetes Failure Stories are useful in figuring out what the most common problems might be and even how you can work around them.

This blog was originally posted on unixism.net