How smart teams now develop software

I first learned the foundations of software engineering back in engineering school nearly 20 years ago.

Looking back now, it is clear that the basic principles underlying these fundamentals were adopted from highly successful practices that heralded the industrial revolution as we know it. It was perhaps natural to adopt these practices. After all, software was just another product, which was often literally on the shelves for consumers to buy.

Indeed, in the first decade of my career, this model seemed to fit perfectly well, as software was often shipped on CDs and DVDs. I still remember that a team was considered revolutionary if it managed to ship new versions more than once a year. Imagine the excitement when updates started coming over the internet!

Shelved: software being sold on shelves with books

Peek behind the doors, and the assembly line stared at you from the rows of computer desks that symbolized the software company.

This diagram shows a typical software assembly line

The second decade of my career witnessed two groundbreaking waves of technology innovation that fed off each other:

  1. The ubiquity of the internet as a mass medium
  2. The rapid advances in virtualization that offered computing services on the internet, better known as the “Cloud”

Soon, application servers were upgrading themselves by pulling updates over the internet. New services were offered on the internet and consumed via the browser, requiring no shipping, installation, or updates on the part of the consumer. Updates were, however, now expected at a much faster clip: think about new products on an e-commerce catalog or a fix for a bug affecting thousands of users.

Engineers at this bleeding edge started questioning the assembly line when it no longer helped them meet the new business objectives attached to the software they were building. The culture of DevOps emerged from these very engineers as they attempted to shorten the assembly line to enable faster cycles, while some realized the days of the assembly line itself were numbered.

Fast forward to today, and SaaS is ubiquitous in our everyday lives.

We no longer just expect new updates to the software we access over the browser; we are used to waking up to them. Engineering teams are now tasked with deploying changes to eager and impatient users at a rapid clip, sometimes multiple times a day.

How do teams set themselves up to meet this demand? Software teams are in dire need of new metaphors to describe how they work together.

In this first post, we will look at the metaphors we have adopted at Freshworks and the evolution in the mindset that we realized is required across the team to prepare ourselves for these demands. In a follow-up post, we will look at how one could actually trigger this evolution based on the Freshworks experience.

The Runway

When you set out on the journey to revolutionize your software practices so that you can ship more often, a few basic principles, as outlined in an earlier post, can set you off on the path. These principles include:

  • A continuous integration pipeline with increasing levels of automation
  • An evolving deployment architecture that affords your engineers multiple degrees of freedom with their deployments
  • A culture of continuous improvement that empowers the team to create new ways of working

In an ideal world, software development is treated as a creative exercise rather than the act of assembling things together. Creative work requires the space to blossom and bear fruit and often requires isolated and deep work. But the act of actually shipping the software is more mundane, and indeed akin to an assembly line, except one that is ideally fully automated.

This diagram displays an evolved pipeline

This led us to the metaphor of a Runway, which also happens to be the name we have given here at Freshworks to our continuous deployment infrastructure. Many principles for this Runway can be derived from the runways used by aircraft at a busy airport.

Principle 1: There can only be one change on the runway at a time.

Principle 2: Once on the runway, a change must leave as soon as it can — either by being deployed (taking off in the case of an aircraft) or by being aborted.

A highly productive team (just like a busy airport) will likely queue up changes for access to the runway. Smart teams figure out that smaller changes are easier to load onto the runway and ship faster and more often. A strategically architected system will often allow for multiple, independent runways mapped to individual, decoupled microservices. 
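The two principles and the queue described above can be sketched as a small toy model. This is only an illustration and not the Freshworks Runway itself: the `Runway` class, its method names, and the service and change names are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional


@dataclass
class Runway:
    """Toy model of one deployment runway (hypothetical, for illustration)."""
    service: str
    queue: Deque[str] = field(default_factory=deque)  # changes waiting their turn
    current: Optional[str] = None                     # the single change on the runway
    shipped: List[str] = field(default_factory=list)
    aborted: List[str] = field(default_factory=list)

    def request(self, change: str) -> None:
        """Queue a change; it enters the runway only when the runway is free."""
        self.queue.append(change)
        self._admit_next()

    def take_off(self) -> None:
        """Principle 2: the change leaves by being deployed."""
        self._leave(self.shipped)

    def abort(self) -> None:
        """Principle 2: the change leaves by being aborted."""
        self._leave(self.aborted)

    def _leave(self, outcome: List[str]) -> None:
        if self.current is None:
            raise RuntimeError("no change is on the runway")
        outcome.append(self.current)
        self.current = None
        self._admit_next()

    def _admit_next(self) -> None:
        # Principle 1: admit a queued change only if the runway is empty.
        if self.current is None and self.queue:
            self.current = self.queue.popleft()


# One independent runway per decoupled service (service names are made up).
runways = {name: Runway(name) for name in ("billing", "search")}
for change in ("fix-typo", "new-endpoint"):
    runways["billing"].request(change)
runways["search"].request("reindex")  # not blocked by the billing queue

runways["billing"].take_off()         # fix-typo ships, new-endpoint moves up
```

Because each service has its own runway, a queue building up on one service does not delay changes to another, which is the benefit the independent runways are meant to provide.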

Now, Principle 2 above puts the onus on preparation: how ready your change is to be deployed. You don’t expect to discover a fault in the landing gear once you are on the runway and speeding. If you do find one, however, you abort. You also don’t expect to discover that you have forgotten to load paper cups for the drinks service. But if you do, you decide to apologize and take off.
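The landing-gear versus paper-cups distinction amounts to separating blocking checks from non-blocking ones. A minimal sketch of that decision follows, assuming each check is a simple function that returns pass or fail; the `Check` type, the `decide` function, and the check names are hypothetical and not part of any real pipeline.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Check:
    name: str
    run: Callable[[], bool]  # returns True when the check passes
    blocking: bool           # True: a failure aborts the release


def decide(checks: List[Check]) -> Tuple[str, List[str]]:
    """Return ("abort" or "take_off", names of the checks that failed)."""
    failed = [c for c in checks if not c.run()]
    names = [c.name for c in failed]
    if any(c.blocking for c in failed):
        return "abort", names   # a faulty landing gear grounds the change
    return "take_off", names    # missing paper cups: apologize and ship


checks = [
    Check("landing_gear", lambda: True, blocking=True),
    Check("paper_cups", lambda: False, blocking=False),
]
decision, failures = decide(checks)
```

Here `decision` is `"take_off"` and `failures` lists only the non-blocking `paper_cups` check, so the change ships with a known, minor shortcoming recorded.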

More importantly, you don’t typically switch pilots on the runway because one is better at taxiing while the other is an expert in taking off. But we often tend to do this with software releases, where too many stakeholders get involved to shepherd the release out.

Enabling the Runway

While the macro trends of the internet and the Cloud are generally well-known, micro trends in developer productivity don’t receive as much attention. Three critical trends stand out when one looks closer at how the Runway came to be at Freshworks.

  1. Infrastructure-as-code enabling fully automated deployments
  2. Observability that enables monitoring and alerting and is expanding to make it easier to narrow problems down in production
  3. An automation revolution that enables integrating any kind of quality check into our CI pipeline

A keen observer will note that these three aspects of software delivery are typically mapped to three different roles: a DevOps engineer, a Site Reliability Engineer (SRE), and a Software Development Engineer in Test (SDET), respectively. But the same kind of software innovation that iteratively abstracted away infrastructure to bring us PaaS has continued to abstract away the common challenges in each of these three areas: deployments, observability, and automation. We are now at the point where experts can focus on building out the runway and the custom instrumentation required to signal problems, while leaving our developers to taxi onto the runway and take off on their own.

An SDET can spend most of their energy on improving the performance, reliability, and maintainability of their automation by focusing on the automation frameworks themselves, while branching out to cover more aspects of software testing through automation, such as performance tests, chaos tests, and security tests.

Similarly, an SRE can focus on building deep expertise in critical services commonly used across the organization, say MySQL, Redis, or Envoy, or in debugging performance challenges in production, such as those in the JVM or the Ruby interpreter.

Thanks to the prevalence of PaaS, provisioning a cloud service no longer requires any particular specialization: you follow a set of instructions, whether through a cloud console, an API or SDK, or by adding the service to your infrastructure-as-code specification. This allows DevOps engineers to focus on more peripheral but emerging challenges like security, compliance, and cost engineering.

Things can still go wrong in ways that require tearing apart the abstractions developers use as part of their continuous deployment (CD) pipelines. SDETs, DevOps engineers, and SREs will play a critical role in addressing these problems. As long as a team tends toward this being the exception, and most deployments occur seamlessly with a developer in the pilot’s seat, you can find yourself closer to the holy grail than you ever dreamed.

Supporting software in production

Despite all the safeguards and checks, things will go wrong in production — either as an outcome of a recently deployed change or because of a latent issue that surfaced at peak load or under some other triggering conditions. Since customers are impacted, it becomes important to quickly get to the bottom of this and address it — either by reverting a change, applying a patch, or putting in place a workaround. These are often moments with high stakes. Imagine fixing your aircraft in mid-flight! 

Your best shot at getting to the root cause quickly is to have someone who understands what the symptoms mean. If you have built and deployed the runway as envisioned above, you will find yourself in luck: the person who understands the change that went into production, who understands the code that describes the infrastructure for the given service, and who introduced the alerts that triggered when the service degraded is likely one and the same! Sometimes several of your services might be affected, so you may have to pull in the service owners familiar with each of them.

They will often need guidance from someone wiser, one who’s been there before and has grown nerves of steel, to make quick decisions. If the root cause is identified as stemming from a particular platform-as-a-service, they may need to involve an expert familiar with this service. But, more often than not, you will find yourself debugging something directly related to the code that you own as a team. And who better to debug this and identify a solution quickly than the person who understands this code best?

The well-rounded engineer

By leveraging the technological evolutions that the realm of software engineering has seen over the last couple of decades, we are able to take the cognitive burden of a release off the shoulders of an engineer. This enables us to redefine the role of an engineer from someone who takes up a spot on an assembly line — typically doing the same kind of work without having the full context of what they are doing — to someone who develops deep expertise in their functional area while regularly releasing changes to their software all by themselves. 

Now an SRE can delve deep and tinker with the insides of an open-source service we are struggling to scale for our use case. An SDET can focus on introducing innovations that drastically improve the reliability of an automation suite while cutting the run time in half. A backend engineer can design, develop, build, test, and deploy an API endpoint all by themselves, and watch their observability dashboard for immediate feedback on how it is being used and how it is performing. Most importantly of all, because these engineers have complete influence over their own work, they achieve Flow more often, and engineers who experience Flow more often are likely to be happier.

From the perspective of the business, you are deploying changes faster, with better quality, and detecting and addressing problems sooner, all leading to a much-improved customer experience. To top it all off, you have found a way to make your engineers specialists in their domains through continuous learning in their areas of focus, which in turn helps them be happier with their work. Building the Runway is certainly something we recommend more teams take up as a goal!

Transforming a team from ingrained practices that encouraged assembly-line operations to the one outlined above is, however, an arduous journey, one that will often take years for larger organizations. How do you inculcate a culture where engineers are motivated to invest in building the runway and to take end-to-end ownership of a release?

The journey itself is intriguing enough that we will cover it in our next post.

Stay tuned!