How to make performance testing more effective for an organization

Performance testing is essential for SaaS applications like the ones we build here at Freshworks, which must protect the privacy of every customer while providing reliable, secure service at massive scale.

By design, SaaS deployments are truly multi-tenant and therefore exposed to highly variable, spiky workloads. At a company growing as fast as Freshworks, we receive thousands of requests every single day, and the volume handled by our applications can place unexpected stress on the systems serving them.

Engineering teams need to cope with this increased workload. To predict system behavior under stress, they must adopt performance testing as part of their engineering practices and release criteria. Here is a brief account of how our Freshworks Developer Platform engineering team embraced this reality and incorporated new tools and practices for performance testing (or ‘perf testing’, as it is smartly called!).

Why performance testing

Performance testing primarily helps ensure that the performance, longevity, and resilience of our services are never compromised under stress and that the services remain available at all times. To achieve this, we must first benchmark these services to measure how they perform against the performance metrics we care about. These include client-side metrics, such as peak traffic (requests per second, or RPS), response time, and failure rate, and server-side metrics, such as CPU utilization, memory utilization, and network latency.

Benchmarking involves load testing the service at progressively higher load and ensuring there is no major deviation in these performance metrics from the baseline.

Let’s take one of our Marketplace services as an example. In the Freshworks Marketplace, we own a unified gallery page that works across Freshworks products. Its critical functionality is to list all Marketplace apps to customers and allow them to install, update, and uninstall apps. To support this, we built a backend service called “Gallery backend” that handles authentication, authorization, and app listing, and performs actions such as installing, updating, or uninstalling an app by interacting with other dependent services.

We identified the critical performance metrics of the Gallery backend service under peak production traffic: peak load, P90 latency, error rate, CPU utilization, memory utilization, and so on. These measurements form our baseline.

In our case, based on the requirement, we tested the service at 3x the baseline load for a stipulated period of 45 minutes and verified that the performance metrics neither deviated significantly from the baseline nor crossed the threshold limits we had set (for example, CPU utilization should not exceed 80% and the error rate should not exceed 10%). The following table displays the data.

Performance testing table

If there is a major deviation (or a threshold is crossed), analyze the root cause and fix it so that your performance metrics hold steady under the increased traffic. This whole exercise is what we call ‘benchmarking’ your service.
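To make this concrete, here is a minimal sketch in TypeScript of the kind of check involved: compare a run’s metrics against the baseline and against the hard thresholds. The metric names, tolerance, and sample values are illustrative assumptions, not Seyalthiran’s actual code.

```typescript
// Gate a load-test run on absolute thresholds and drift from the baseline.
// All names and limits below are illustrative assumptions.
interface Metrics {
  p90LatencyMs: number;
  errorRatePct: number;
  cpuUtilizationPct: number;
}

// Hard limits, e.g. "CPU should not cross 80%, error rate should not cross 10%".
const thresholds: Metrics = { p90LatencyMs: 500, errorRatePct: 10, cpuUtilizationPct: 80 };

// Allowed drift from the baseline for latency (assumed: 20%).
const maxLatencyDrift = 0.2;

function evaluateRun(baseline: Metrics, current: Metrics): string[] {
  const violations: string[] = [];

  (Object.keys(thresholds) as (keyof Metrics)[]).forEach((metric) => {
    if (current[metric] > thresholds[metric]) {
      violations.push(`${metric}=${current[metric]} crossed threshold ${thresholds[metric]}`);
    }
  });

  const drift = (current.p90LatencyMs - baseline.p90LatencyMs) / baseline.p90LatencyMs;
  if (drift > maxLatencyDrift) {
    violations.push(`p90 latency drifted ${(drift * 100).toFixed(1)}% from baseline`);
  }
  return violations;
}

// Baseline captured at production peak vs. a 3x-load benchmark run (sample numbers).
const issues = evaluateRun(
  { p90LatencyMs: 150, errorRatePct: 0.5, cpuUtilizationPct: 20 },
  { p90LatencyMs: 160, errorRatePct: 0.8, cpuUtilizationPct: 55 },
);
if (issues.length) {
  console.error('Benchmark failed:', issues.join('; '));
  process.exit(1); // a non-zero exit can fail a CI stage
} else {
  console.log('Benchmark passed: metrics within thresholds and baseline drift');
}
```

Running a check like this as the last step of every test run means the same gate works both for ad-hoc benchmarking and, later, as part of a CI pipeline.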

The example above benchmarks the service against your own performance metrics as the baseline. You can also benchmark your system against an external baseline, such as a competitor’s. For example, if a competitor’s application loads in 250 ms, you might want to benchmark your application against that figure, or aim to beat it.

Benchmarking helps to:

  1. Understand how your application performs under the specified load today, as measured by the performance metrics you identify and track.
  2. Uncover opportunities to improve some of the performance metrics by identifying bottlenecks in your system, which can become very expensive if found later in production.

After the service is benchmarked, you can leverage automation to integrate the performance testing cycle as a stage in your CI pipeline and ensure performance metrics are not compromised by any deployment.
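As an illustration of what such a stage might look like, the sketch below triggers a hypothetical Jenkins load-test job from CI and fails the pipeline if the job does not finish successfully. It assumes Jenkins’ standard remote-trigger and JSON build-status endpoints, an API token, and Node 18+ for the global fetch; the job name and parameters are made up, and the polling is simplified (it assumes no concurrent builds of the job).

```typescript
// Trigger the (hypothetical) performance-test job from a CI stage and block
// until Jenkins reports a result; fail the stage if the result is not SUCCESS.
const JENKINS_URL = process.env.JENKINS_URL ?? 'https://jenkins.example.com';
const AUTH =
  'Basic ' + Buffer.from(`${process.env.JENKINS_USER}:${process.env.JENKINS_TOKEN}`).toString('base64');

async function runPerfGate(): Promise<void> {
  // Queue a parameterized build of the load-test job.
  const trigger = await fetch(
    `${JENKINS_URL}/job/seyalthiran-load-test/buildWithParameters?TARGET_RPM=25000&DURATION_MIN=45`,
    { method: 'POST', headers: { Authorization: AUTH } },
  );
  if (!trigger.ok) throw new Error(`Failed to queue perf test: HTTP ${trigger.status}`);

  // Poll the last build until it completes (simplified: assumes this is our build).
  for (;;) {
    await new Promise((resolve) => setTimeout(resolve, 30_000));
    const res = await fetch(`${JENKINS_URL}/job/seyalthiran-load-test/lastBuild/api/json`, {
      headers: { Authorization: AUTH },
    });
    const build = (await res.json()) as { building: boolean; result: string | null };
    if (!build.building && build.result) {
      if (build.result !== 'SUCCESS') throw new Error(`Perf test ended with ${build.result}`);
      return;
    }
  }
}

runPerfGate().catch((err) => {
  console.error(err);
  process.exit(1); // fail the CI stage when the performance gate fails
});
```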

This is a key step in retaining the quality of software from a usability perspective, but it is often ignored or taken up only after a performance-related incident occurs. In some cases, it is performed after functional testing is complete and a new capability is about to go live.

When used strategically, the process of performance testing should ideally be intertwined with the process of writing and deploying code. This ensures that applications are designed to be efficient from the outset and performance testing is not a last-minute formality before putting the software out for use. 

Performance testing can also be used tactically to help identify the nature or location of a performance problem by highlighting where an application might fail or lag. This form of testing can be used to ensure that an organization is prepared for a predictable adverse event.

Seyalthiran: The expectations

In early 2020, as a platform team, we were seeing steady growth in load on all our services because of the pace at which Freshworks was growing. Although we knew which performance metrics we cared about, we had never benchmarked the load our services could sustain. This was not a great position to be in: big events that our customers organized or participated in always brought more traffic to our services, and that caused a certain level of discomfort.

We decided that we had to invest in performance testing to ensure that we understood the scalability, availability, and reliability of all our platform services.

To achieve what we had in mind, we realized we needed a performance testing framework that would enable the team to test faster (and hence more often) across the software development lifecycle. We wanted the framework to support a “Shift Left” approach with test automation and CI integration by enabling faster test design and analysis.

A few salient requirements we identified for the potential framework were:

  1. Faster test design: The framework should facilitate low-code, single-click load testing so that the framework itself does not become a hassle for those who write, configure, and run the tests.
  2. Low maintenance: The test scripts should require minimal maintenance when the expected system behavior changes so that performance testing can keep pace with the current Continuous Integration or Continuous Deployment environments.
  3. Easy integration: The framework should easily integrate into our existing Continuous Integration or Continuous Delivery setup.
  4. Cost-effective: The framework should be cost effective with respect to the operating and maintenance costs. It should specifically not require investments in dedicated servers in a dedicated test environment, load generators, or report generators.
  5. Analytics: The framework should provide a live dashboard to monitor test results and application or server-side metrics in real time while a test is running, analyze errors, and compare the results of different test runs.
  6. Automatic cut-over: The framework should support ramping down the load or shutting down test execution entirely if the performance KPIs defined by the tests are breached. For example, when a threshold for 5xx errors is breached, we may not want to continue the test anymore. This is very helpful when we boldly venture to test in production!
  7. Distributed load testing: It should be possible to generate load in a distributed fashion across various AWS regions simultaneously to capture the response times from an end-user perspective. 

We evaluated licensed tools such as BlazeMeter and LoadNinja, as well as a few open-source tools such as The Grinder, Gatling, and Serputko. However, we could not settle on any of these tools, mainly because we found some of the licensed ones very expensive, while others required considerable further configuration, implementation, and maintenance. We also found it challenging to integrate them into our CI pipelines and monitoring tools.

Therefore, we chose the “Build” approach and decided to create our own framework to meet all of the expectations listed above. This is how Seyalthiran was born!

Seyalthiran: Coming together

Seyalthiran is a Tamil word meaning “performance”. The framework is built on top of Apache JMeter, a popular open-source tool for performance testing that works equally well with static and dynamic resources. Simulated load tests run against a server, network, or resource with JMeter help verify performance in response to a variety of payloads.

The basic flow of configuring and running a test through Seyalthiran, along with the components involved, is depicted in the following picture.

Perf testing flowchart

The tool includes a front-end React app hosted on an Express server. This app receives inputs (scripts and configuration) from users for the application under test.

Performance testing screengrab

The JMX and data files are sent to Jenkins, which orchestrates the load testing. Express also authenticates the requests sent to Jenkins.

Jenkins orchestrates the JMeter master and slaves in an AWS ECS cluster for load testing and maintains reports in its workspace. Load generators are created on demand during the test run and torn down after the tests. The number of load generators required is decided on the fly based on the load required for a particular test run.
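As a rough illustration of this on-the-fly sizing and the distributed run it feeds, here is a sketch with an assumed per-generator capacity and hypothetical file and host names. The JMeter flags shown (-n, -t, -R, -l, -e, -o) are its standard non-GUI, distributed-run options.

```typescript
import { spawnSync } from 'node:child_process';

// Assumption: one load generator (JMeter server) comfortably sustains ~5,000 RPM.
const RPM_PER_GENERATOR = 5000;

function generatorsNeeded(targetRpm: number): number {
  return Math.max(1, Math.ceil(targetRpm / RPM_PER_GENERATOR));
}

console.log(`Generators needed for 25k RPM: ${generatorsNeeded(25_000)}`); // -> 5

// Hypothetical: the orchestration layer has already started that many JMeter
// server containers in the ECS cluster and collected their addresses.
const generators = ['10.0.1.11', '10.0.1.12', '10.0.1.13', '10.0.1.14', '10.0.1.15'];

// Non-GUI, distributed JMeter run from the master: -n (no GUI), -t (test plan),
// -R (remote servers), -l (results file), -e -o (generate an HTML report).
spawnSync(
  'jmeter',
  ['-n', '-t', 'gallery-benchmark.jmx', '-R', generators.join(','), '-l', 'results.jtl', '-e', '-o', 'report'],
  { stdio: 'inherit' },
);
```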

Seyalthiran also provides live monitoring of load tests by leveraging JMeter’s InfluxDB backend listener. JMeter pushes real-time metrics to InfluxDB during the test, and with Grafana plugged into InfluxDB, graphs can be rendered in the same app UI.

The app also publishes reports hosted by Jenkins, which can be retrieved anytime after the test run is complete.

Some other key features offered by Seyalthiran are:

  • No extra latency introduced by the tester’s network (typically an office or home network in our WFH days), since the load generators run entirely in the same cloud environment as the services being tested.
  • Intelligence built into the framework to halt the load test run if the SLA/KPI deviation strays far from what is expected; a minimal sketch of this idea follows below.
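Since JMeter is already streaming metrics to InfluxDB, that halting logic can be sketched as a small watchdog that polls the recent error rate and aborts the Jenkins build when a KPI is breached. This is only an outline: the database, measurement, and field names assume the backend listener’s default schema, and authentication against Jenkins is omitted for brevity.

```typescript
// Watchdog sketch: poll error counts from InfluxDB during the run and stop the
// Jenkins build if the error rate breaches the KPI. Names are placeholders.
const INFLUX_URL = process.env.INFLUX_URL ?? 'http://influxdb:8086';
const BUILD_URL = process.env.BUILD_URL!; // Jenkins exposes this to the running job
const ERROR_RATE_KPI = 0.05;              // assumed: stop if >5% of samples fail

async function sampleCount(statut: 'ok' | 'ko'): Promise<number> {
  const q = `SELECT sum("count") FROM "jmeter" WHERE "statut"='${statut}' AND time > now() - 1m`;
  const res = await fetch(`${INFLUX_URL}/query?db=jmeter&q=${encodeURIComponent(q)}`);
  const body = (await res.json()) as any;
  return body?.results?.[0]?.series?.[0]?.values?.[0]?.[1] ?? 0;
}

async function watchdog(): Promise<void> {
  for (;;) {
    await new Promise((resolve) => setTimeout(resolve, 60_000));
    const [ok, ko] = await Promise.all([sampleCount('ok'), sampleCount('ko')]);
    const errorRate = ko / Math.max(ok + ko, 1);
    if (errorRate > ERROR_RATE_KPI) {
      console.error(`Error rate ${(errorRate * 100).toFixed(1)}% breached the KPI; stopping the test`);
      await fetch(`${BUILD_URL}stop`, { method: 'POST' }); // Jenkins' build stop endpoint
      return;
    }
  }
}

watchdog().catch(console.error);
```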

Performance testing during the biggest cricketing event

One of our major customers was anticipating seriously high workloads during the biggest cricketing tournament in 2020. They were in fact expecting up to a 5x increase in peak traffic to their account, which would in turn generate spikes on our platform. As one of the critical accounts that consume all our developer platform features, they needed us to ensure that we were ready to deal with this for a sustained period of around 2 months.

We decided to benchmark the primary services that were expected to be in the firing line and test them at and above the expected new peaks.

We studied the utilization of the routing service that would handle most of this workload in production, noted that during peak hours it was serving close to 7.5k RPM (requests per minute), and established this as our baseline.

The baseline metrics captured for the production environment during peak traffic hours, with an average response time of 155 ms and CPU utilization of 20%, are shown in the figure. Our goal was to benchmark our system so that increased load over the duration of the IPL season would neither degrade the existing metrics nor breach any threshold.

Using Seyalthiran, we replicated the production setup and the peak load in our staging environment. After we set the baseline numbers, we fed the router more requests with each repetition of the test run until, finally, we benchmarked the router at 25k RPM, roughly 3.3 times the current peak load, and sustained this for about 4 hours. The data depicted in the following graphs clearly indicate that even with the increased load, no deviation in latency was observed. The router continued to tick away at 14 ms, admittedly with increased CPU utilization, but well within the 80% threshold we had set. This result gave us confidence that we could comfortably accommodate and deliver on the needs of our enterprise-class customers during the IPL.

Uncovering systemic bottlenecks

Identifying architectural bottlenecks and taking corrective action is an important part of any performance testing exercise. As part of a containerization exercise undertaken for services running in production on our platform, we decided to use Seyalthiran to uncover potential reliability, availability, and performance challenges emerging from the migration.

Memory leaks are among the most annoying problems a developer can run into: they don’t manifest consistently, and finding the root cause is hard. But they are not the only type of issue that can crop up during a containerization exercise. CPU, I/O, and network utilization are other areas likely to be affected.

While containerizing the router service mentioned earlier, we tested the container deployments for scalability of the proposed architecture and found that the memory used by the service grew steadily during the test. The system eventually crashed once it ran out of memory, clearly indicating a memory leak.

The following graph depicts the gradual increase in memory.

Tracking down a memory leak can be a tedious process. Finding the root cause required us to thoroughly analyze system behavior under various loads and testing configurations. We could not imagine doing this without a robust performance testing framework.
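For long soak tests, even a lightweight sampler inside the service makes such growth visible early. The sketch below is not the instrumentation we relied on (that came from our server-side monitoring); it simply records Node.js heap usage at a fixed interval so a leak shows up as a steady climb.

```typescript
// Sample the Node.js process memory every 30 seconds during a soak test.
// In practice these numbers would be shipped to the monitoring stack rather
// than logged locally.
const samples: { t: number; heapUsedMb: number; rssMb: number }[] = [];

setInterval(() => {
  const { heapUsed, rss } = process.memoryUsage();
  const sample = {
    t: Date.now(),
    heapUsedMb: Math.round(heapUsed / 1024 / 1024),
    rssMb: Math.round(rss / 1024 / 1024),
  };
  samples.push(sample);
  console.log(JSON.stringify(sample));

  // Crude, illustrative leak heuristic: warn if the heap has grown across the
  // last 10 samples without ever shrinking.
  const recent = samples.slice(-10);
  const steadyGrowth =
    recent.length === 10 &&
    recent.every((s, i) => i === 0 || s.heapUsedMb >= recent[i - 1].heapUsedMb);
  if (steadyGrowth) console.warn('heapUsed has grown for 10 consecutive samples');
}, 30_000).unref(); // unref so the sampler never keeps the process alive on its own
```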

Our original objective was to benchmark the service after migrating it to Docker containers on Node 12, at 1.5x the peak production load. The baseline metrics were collected by running load tests at the same 1.5x load against a PM2-managed Node 8 deployment, which is how the service was deployed prior to containerization.

In our initial test run, the performance metrics deviated from the baseline: CPU usage was 10% higher, RAM usage was 5x higher, response time was 10% higher, and CPU time per tick was 5x higher, while time spent in garbage collection and GC calls per minute were lower than in the baseline. We therefore ran load tests in various other configurations, from 1x to 4x load, across Node.js versions (8 and 12), with multiple Node.js memory configurations, with and without restricted CPU limits, and with multiple Docker memory configurations. We eventually had to run the tests in about 20 different configurations to iron out the issue causing the deviations. The high-level test results are displayed in the following table.

With Seyalthiran, we were able to use the same scripts with the same data sets and quickly compare various test results to iterate toward the root cause. 
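The sweep itself is easy to express as a parameter matrix fed to those same scripts. The sketch below uses hypothetical parameter names and an imaginary runLoadTest() helper, and the values are illustrative rather than the exact set of configurations we ran.

```typescript
// Enumerate test configurations (load multiplier x Node version x memory
// settings x CPU limits) so the same JMX script and data set can be reused.
interface RunConfig {
  loadMultiplier: number; // 1x .. 4x of the production peak
  nodeVersion: 8 | 12;
  maxOldSpaceMb: number;  // Node's --max-old-space-size for the service
  cpuLimited: boolean;    // with/without container CPU limits
}

const matrix: RunConfig[] = [];
for (const loadMultiplier of [1, 1.5, 2, 4]) {
  for (const nodeVersion of [8, 12] as const) {
    for (const maxOldSpaceMb of [1024, 2048]) {
      for (const cpuLimited of [true, false]) {
        matrix.push({ loadMultiplier, nodeVersion, maxOldSpaceMb, cpuLimited });
      }
    }
  }
}

console.log(`${matrix.length} configurations to run`);
// for (const config of matrix) await runLoadTest('router-benchmark.jmx', config); // hypothetical helper
```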

Conclusion

The world today runs on the World Wide Web, and the recent pandemic has only gone to show us how critical a digital identity can be. Webpages, software, and applications not only comprise the identity of an enterprise, but are also the tools that make the organization visible in the public domain. No one wishes their public identity to be tarnished because it fails to work under duress, or simply because it became too popular. This makes performance testing a very vital part of the toolkit of an engineering team.

Using a framework like Seyalthiran, we were able to build a one-stop solution to evaluate the performance of our systems. We look forward to expanding the framework with the following additional capabilities: 

  • Given we also serve a number of front-end services, we are planning to extend this framework for front-end load testing by integrating with sitespeed.io.
  • To enable gathering a broader variety of metrics, we are also planning to integrate with Telegraf, an open source server agent, to collect metrics from databases, containers, and orchestrators.
  • To shut down tests when performance metric thresholds are breached, we are very close to rolling out an automatic cut-over feature.

We are excited to continue building on Seyalthiran. We plan to make this available to the broader developer community soon as an open source performance testing solution, so that we can come together as a community to take it to the next level.