Surfing the Kubernetes wave: A beginning

As Freshworks grew from a six-member team building a single product to a multibillion-dollar organization with a suite of 11 products, so did our infrastructure and engineering heft. Along the way, we embraced several exciting technologies. But few have been as challenging, or as rewarding, as Kubernetes, the red-hot container orchestration technology that originated at Google.

In the beginning, a simple Ruby on Rails-based app hosted on the EngineYard platform served us well. We have since evolved, and a large part of our infrastructure is now hosted on Amazon Web Services (AWS), spanning multiple geographies and thousands of EC2 instances, the virtual servers used to run applications on AWS infrastructure.

As we scaled our engineering functions to match the velocity of the organization’s growth, we needed to move our stack into a modern orchestration engine. For this, we looked to Kubernetes.

But, as became obvious when we began researching the technology, Kubernetes comes with many moving parts and inherent complexity. We were also unnerved by the horror stories emerging from large tech companies that were adopting Kubernetes at scale.

Even so, Kubernetes offered promising pathways to our lofty engineering goals, and it was hard to ignore. So we went ahead and began moving our workloads to Kubernetes. In the process, we learned to navigate the moving parts and the complexities, and we learned a ton along the way.

With this blog, we are kicking off a series on our #Kuberneteslearnings, a journey that is still ongoing. So ride along!

The path to containerization

As we said earlier, our flagship app, Freshdesk, was built on the Ruby on Rails framework. This model worked well in the initial days. But as we scaled, we began facing a few critical challenges:

  • Deployment velocity: We care a lot about the uptime of all our applications and services, so rolling out new versions of our applications or running feature experiments needs to happen without any downtime. The same applies to rolling out security patches to our infrastructure or upgrading our software stack. Doing this directly on EC2 infrastructure started slowing us down, and we quickly realized that we needed to move beyond EC2/virtual machine-based deployments.
  • Immutable deployments: Another challenge lay in ensuring our EC2-based deployments were immutable. We quickly moved from our early days of ‘git clone’ deployments (cloning the repository into a fresh directory on each host) to ‘tarball’-based deployments (shipping the code as a single compressed archive). But that still didn’t give us enough confidence that code or configuration wouldn’t drift.
  • Cellular architectures: To maintain the global availability of our apps and provide better-quality service to our customers, we started dividing our infrastructure into so-called shells (drawing from inspirations such as this). Shells are essentially identical copies of our entire infrastructure stack, each capable of serving a section of our global traffic. We can launch as many shells as we like, giving our customers greater levels of availability and giving us the ability to react quickly to infrastructure incidents. Moving to shells meant we needed greater predictability and control over our environments, with no drift in configuration or code.
  • Foundation services: We also started building a number of internal services consumed by our products. These are foundational services, such as messaging and events, developed by different teams. Each team independently chose its technology stack, and a few of them started building these services as microservices.

All of this led our teams to start adopting ‘containers’ rapidly. Newer teams deployed their apps and services as containers from the get-go, while existing teams migrated to containers by changing their deployment pipelines. We also had the benefit of ‘shells’, which allowed us to roll out these changes gradually across our fleet.

Managing our infrastructure

When we started adopting AWS in the early days, we searched for a configuration management tool or service to reliably configure and manage our infrastructure. We liked Chef and adopted AWS OpsWorks, a managed service that uses Chef to automate infrastructure on AWS. OpsWorks brought a lot of automation out of the box while still offering flexibility through custom Chef recipes.

However, when we moved into the world of containers, we realized that we needed a container orchestration engine that suited the workflows and lifecycles of containers. We needed these key capabilities in the new system:

  • Resource allocation and isolation for our containers;
  • Security/networking model similar to the one we were used to in VMs;
  • Ability to model abstractions such as components/layers that we were used to;
  • Ability to maintain availability and scale on demand.

Enter Kubernetes

Kubernetes has become the de facto container orchestration engine today, with tremendous support from both the community and the industry. Despite the warnings around Kubernetes and our initial fears, we built strong momentum toward embracing it. While there are underlying complexities in the system, the backing of a large community and the support of the industry gave us the confidence to move forward. It’s now time to pay it forward.
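To make the mapping from our requirements to Kubernetes concrete, here is a minimal sketch of a Deployment manifest for a containerized web service. The image name, labels, and resource numbers are illustrative assumptions rather than our actual configuration; the point is how replicas, labels, and resource requests/limits line up with the capabilities listed above.

```yaml
# Illustrative sketch only: image name, labels, and resource values are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  labels:
    app: web-app
    tier: web                  # labels model the component/layer abstractions we were used to
spec:
  replicas: 3                  # Kubernetes keeps this many pods running, maintaining availability
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
        tier: web
    spec:
      containers:
        - name: web-app
          image: registry.example.internal/web-app:1.0.0   # hypothetical image reference
          ports:
            - containerPort: 3000
          resources:
            requests:          # resource allocation: what the scheduler reserves per container
              cpu: "500m"
              memory: 512Mi
            limits:            # resource isolation: caps on what a container may consume
              cpu: "1"
              memory: 1Gi
```

On-demand scaling can then be layered on with a HorizontalPodAutoscaler targeting this Deployment, and the same label-based model extends to Services and NetworkPolicies for service discovery and network isolation.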

Future blogs in our #Kuberneteslearnings series will discuss our learnings in the areas of access control, networking, deployments, tuning, observability, and what the future holds for us.

So stay tuned and watch this space!