A tinkerer’s tale of pushing a Rails serializer

Sometime last year, we found an opportunity to optimize one of our critical API endpoints served using Ruby on Rails. This led to a series of investigations before we identified the right long-term solution for our particular use case. It required us not only to adopt a new JSON serializer, but also to build efficient caching and deployment strategies to go with it. This is the tale of how we embarked on that compelling journey and what we learned as a result.

The Context

The Freshworks developer platform enables developers to build apps for customizing Freshworks products, integrating with third-party services, and automating customer workflows to increase productivity. Our Marketplace allows users to discover and manage useful apps, while our in-product app gallery additionally manages Custom (aka private) Apps for the customer’s Freshworks account. 

Freshdesk was a monolithic application built using Ruby on Rails (RoR). Since Freshdesk was the first product to integrate with our platform, we originally built the integration natively within its monolith. One of the integration touchpoints was with our Global API, an API endpoint owned by our platform and built using Ruby on Rails. The Global API helps with fetching the list of available apps and supports various filter and sort algorithms. It also supports fetching detailed information about an app, the list of available app categories, available locales, and additional details related to an app in the Marketplace.

Turning Red

As a company obsessed with delighting our customers, we regularly measure, review, and act on performance metrics related to the many controllers in our backend services. As described in a recent post, we call these metrics ‘Delight Metrics’. Under the Delight Metrics program, Freshdesk required that response times for their controllers be maintained under 3s for at least 90% of the requests coming in to each controller. As the number of apps in our Marketplace steadily grew over time, the Marketplace controllers in Freshdesk that served the App Listing in the product UI turned red and needed urgent attention.

We initially decided to focus on moving Freshdesk to our new, embedded Marketplace experience rather than addressing performance challenges in what was slated to become a legacy implementation. As the migration dragged on, we decided that we could not let the metric deteriorate forever. We chose to dive in and identify, at a minimum, some low-hanging fruit to pluck and start taming the metric.

Low-hanging fruit

We started by making a few performance optimizations, following guidelines listed in the ruby-performance-optimization guide, including:

  • Fetching the list of apps for the different sorting algorithms in parallel and rendering them (see the sketch after this list).
  • Optimizing partial templates to load faster.
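
As an illustration, here is a minimal sketch of the kind of parallel fetch we attempted; the AppFetcher helper and the sort keys are hypothetical stand-ins rather than the actual Freshdesk code.

    # Hypothetical sketch: fetch the app list for each sort order in parallel
    # threads instead of sequentially, then render the combined result.
    SORT_ORDERS = %i[popular recent trending].freeze

    def fetch_app_lists
      threads = SORT_ORDERS.map do |order|
        Thread.new { [order, AppFetcher.list(sort_by: order)] }
      end
      # Thread#value joins each thread and returns the block's result.
      threads.map(&:value).to_h
    end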

But these changes did not improve the metric by much. We were also struggling to coordinate deployment of these changes with the Freshdesk engineering team, given that the changes were in their code base. This pushed us toward fine-tuning our Global APIs to, in turn, speed up the corresponding controller in Freshdesk. Our initial goal was to help move the delight metric for the Marketplace controller to green.

As these APIs are also used by our new, embedded Gallery experience, improving API performance would reap longer-term dividends in any case.

Benchmarking

Before we started, we wanted to set up a process to benchmark our APIs and introduce performance gates in our Continuous Integration (CI) pipeline. This would ensure that performance tests could be run against the APIs prior to any deployment to our staging environment. The pipeline would block if the response-time thresholds were not met in the performance sanity stage.


We decided to kickstart the benchmarking for the important APIs with a threshold of 6s (average), based on a basic load test. The plan was to decrease this threshold with each iteration until we were pleased with the API performance.

[Figure: Rails serializer table]

We built a test suite in JMeter for the APIs described above and configured a test run with varying loads. We saved the JMeter project in XML format (a JMX file) in the API repo and integrated it with Jenkins.

Once the pipeline with the performance sanity stage was in place, we began analyzing and fine-tuning the endpoints.

Database optimization

The zeroth step in any performance optimization would be ensuring the database schema is efficiently designed, especially with respect to the indices available for the filters and projections applied.
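
For instance, adding an index for commonly filtered columns is usually a one-line Rails migration; the table and column names below are hypothetical, not our actual schema, and the migration version tag is likewise just an assumption.

    # Hypothetical migration: index the columns used by Marketplace filters.
    class AddMarketplaceIndices < ActiveRecord::Migration[5.2]
      def change
        add_index :apps, :state                    # e.g. filter by published state
        add_index :apps, [:product_id, :app_type]  # e.g. composite filter
      end
    end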

Unfortunately for us, introducing the obvious indices did not show much improvement in our report.

Descending on Ruby

We use NewRelic as our APM to track the performance and reliability of our web services. The web transaction response time tracker for the service clearly showed that time spent in Ruby was the prime suspect and needed closer attention.

Polishing the Ruby

Given that we had narrowed in on Ruby as the prime area to focus on, it was time to embark on a sequence of investigative, experimental, and corrective measures that would steadily improve our response times. As with everything we do, we started small, defined standards along the way, and quickly iterated within them.

Step 1: Basic optimization

We decided to start by following the guidelines described in the Ruby Performance Optimization book. We made optimizations to the controller and to Active Model Serializers (AMS), the gem we used to serialize ActiveRecord models to JSON objects. Once again, however, there was no significant improvement.
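
For context, an AMS serializer for an app and its associations looks roughly like this; the attribute and association names are illustrative rather than our actual model.

    # Illustrative Active Model Serializers (0.10.x-style) definition for an app.
    class AppSerializer < ActiveModel::Serializer
      attributes :id, :name, :display_name, :description

      has_many :categories
      has_many :locales
    end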

Step 2: Profilers

We knew we had to look closer at the runtime behavior to understand precisely where the major chunk of time was being spent. We added the RubyProf profiler to the middleware to break down how much time was consumed in the Ruby runtime. RubyProf supports report generation in multiple formats, and by introducing ruby-prof-flamegraph, we were able to generate a few interesting and revealing flame graphs. Upon analyzing the graphs, it was clear that AMS was the bottleneck. On closer inspection alongside the source code, it was clear that this was caused by the sideloading (adding associations data along with the main objects to avoid duplicates) of app associations, such as categories, locales, etc., within the serializer.
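
Here is a rough sketch of the profiling setup described above, assuming ruby-prof's Rack middleware and the flame graph printer added by ruby-prof-flamegraph; the output path and printer choice are placeholders.

    # config/application.rb (inside the Application class, enabled only while profiling)
    require 'ruby-prof'
    require 'ruby-prof-flamegraph'

    # Profile every request and write the reports under tmp/profile;
    # the flame graph text can then be rendered into an SVG.
    config.middleware.use Rack::RubyProf,
      path: 'tmp/profile',
      printers: { RubyProf::FlameGraphPrinter => 'flamegraph.txt' }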


It so happens that the AMS developers themselves suggest alternatives that are more performant than AMS.

Step 3: FJA vs AMS

We decided to go with the Fast JSON API (FJA) (5 & 6) by Netflix because of its vibrant community, ease of adoption, and the fact that it is well suited to models with many sideloaded associations.

Rather than replacing the existing implementation immediately, we started by adding a new route that used FJA serializers so we could compare the performance of controllers using AMS and FJA. This would also help us with blue-green deployments, which are essential for simple and quick rollouts and rollbacks.
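
A minimal sketch of what the parallel FJA path might look like; the serializer attributes, namespace, and route/controller names are illustrative.

    # app/serializers/fast/app_serializer.rb -- FJA flavor of the serializer
    module Fast
      class AppSerializer
        include FastJsonapi::ObjectSerializer

        set_type :app
        attributes :name, :display_name, :description

        has_many :categories
        has_many :locales
      end
    end

    # config/routes.rb -- a new route alongside the existing AMS-backed one,
    # so both paths can be benchmarked and rolled back independently.
    Rails.application.routes.draw do
      get '/v2/apps', to: 'fast_apps#index'   # served by the FJA-based controller
    end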

We ran our benchmark tests again in Jenkins, only to find that there was little improvement in performance using FJA, as the 90th-percentile response time was still greater than 6s.

Step 4: Enable Caching

We decided to go back to the profiler setup and inspect the performance of the FJA-based serializers closely. We found that FJA spent most of its time generating the JSON response (record_hash) for each app and sideloading the associations.

It just so happens that the primary advantage of using FJA over AMS is the ability to leverage caching. We started caching the record_hash generated for individual app records in memory and modified the API to fetch only the missing apps and their associations from the database. This reduced the trips to the database that FJA needed to make for each request.

Note that we did not simply cache the complete response containing a collection of apps and their details. Since the response from the API varied with the parameters involved, we instead cached each app’s details separately. This way, an update to an app would not end up invalidating the cached response completely.
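
In fast_jsonapi, per-record caching can be turned on in the serializer itself; this is only a sketch, and the TTL shown is a placeholder (how we actually pick it is covered under ‘Caching gotchas’ below).

    # Same illustrative serializer, now with per-record caching enabled.
    module Fast
      class AppSerializer
        include FastJsonapi::ObjectSerializer

        # Each app's record_hash is cached individually, so updating one app
        # does not invalidate every other cached entry.
        cache_options enabled: true, cache_length: 7.days  # placeholder TTL

        set_type :app
        attributes :name, :display_name, :description

        has_many :categories
        has_many :locales
      end
    end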

FJA uses ActiveSupport::Cache::MemoryStore by default, with a size of 32MB, and this cache cannot be shared across instances. We needed a shared cache that could scale beyond 32MB. We had to choose between Redis and Memcached, the two caching services available to our web applications in production.

Redis and Memcached are comparable for the most part, but the main distinctions that work in favor of Redis are its support for wildcards when accessing keys and its support for the hash data type. The former allows us to easily invalidate individual keys in the cache as necessary. We therefore decided to proceed with Redis as the cache store for our serializer. We additionally used Redis mget to avoid extra round trips between the application and Redis.
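
A sketch of the Redis wiring, assuming Rails’ built-in redis_cache_store (Rails 5.2+) and the redis gem’s mget; the environment variable and key scheme are illustrative.

    # config/environments/production.rb -- a shared Redis cache instead of the
    # default in-process 32MB MemoryStore.
    config.cache_store = :redis_cache_store, { url: ENV['REDIS_CACHE_URL'] }

    # Elsewhere: fetch the cached record_hash for many apps in one round trip.
    require 'redis'

    redis   = Redis.new(url: ENV['REDIS_CACHE_URL'])
    app_ids = [1, 2, 3]                                    # apps requested in this call (illustrative)
    keys    = app_ids.map { |id| "marketplace:app:#{id}" } # illustrative key scheme
    cached  = redis.mget(*keys)                            # single network call for all keys
    missing_ids = app_ids.zip(cached).select { |_, hit| hit.nil? }.map(&:first)

Since mget returns results in key order, the nil entries line up with the apps whose details still need to be fetched from the database.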

With caching of individual app records in Redis now in place, we started seeing a significant improvement in terms of the 90th percentile response times.

Step 5: Blue-green deployment

So far, we had only been carrying out blue deployments in production for the new endpoint. If we went ahead and switched traffic to this new endpoint in production (making it green), we would start with nothing in the serializer cache and risk really poor response times from the client perspective, including possible timeouts as the cache was populated.

To prepare to switch traffic to the FJA changes, we added a rake task to warm the caches so that they would be pre-loaded with relevant app details as part of the deployment.
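
In spirit, the warm-up task simply serializes every app ahead of time so the Redis-backed cache is populated before traffic arrives; this sketch is illustrative rather than the exact task we shipped.

    # lib/tasks/cache_warmer.rake (illustrative)
    namespace :marketplace do
      desc 'Pre-populate the serializer cache with every available app'
      task warm_app_cache: :environment do
        App.find_each do |app|
          # Serializing through FJA writes each record_hash into the shared
          # Rails cache, so the first real request hits warm keys.
          Fast::AppSerializer.new(app, include: [:categories, :locales]).serializable_hash
        end
      end
    end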

With the warm cache in place, we switched traffic and enabled the green deployment. After this last deployment, the results were spectacularly better.

We were able to achieve a P99 latency metric of ~1s.

Caching gotchas

There are a few other interesting details we have skimmed over with respect to how we efficiently manage the cache for optimal performance. Let’s look at a couple of the more interesting ones here.

Avoiding mass invalidation

When the time-to-live (TTL) for each entry in the cache is a fixed constant, multiple cache entries might get invalidated simultaneously, leading to a spike in database calls for the requests that follow. We therefore set a random TTL within a 7-day range for each app.
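
One way to achieve this, sketched below, is to jitter the expiry per record instead of relying on a single fixed cache length; the range shown is illustrative.

    # Illustrative: spread expiries across the week so that entries written
    # around the same time do not all expire together.
    def app_cache_ttl
      rand(1..7).days
    end

    # e.g. Rails.cache.write(cache_key, record_hash, expires_in: app_cache_ttl)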

Do note that we had the luxury of caching a resource (app in this case) that is not expected to change very often.

Keeping the cache fresh

The cache key for an individual app record includes the app’s ‘updated at’ timestamp, which helps ensure that any change to the app or its associations causes the cached record to become stale. The cache key also includes optional attributes supported by the API, such as version, to make sure any change in the version also invalidates the cache.
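
Conceptually, the key looks something like this; the prefix and field names are illustrative.

    # Changing updated_at (touched by association updates) or the requested
    # version yields a brand-new key, and the stale entry simply ages out
    # via its TTL.
    def app_cache_key(app, version)
      "marketplace:app:#{app.id}:#{app.updated_at.to_i}:v#{version}"
    end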

Future Improvements

Having conquered this challenge and arrived at a controller optimization that makes sideloading through a well-managed cache a breeze, we move on confidently to the next set of challenges within our Rails APIs.

We are now extending our benchmarks to the remaining APIs through the JMeter test suite. With benchmarks in place, we can iterate through performance improvement cycles for each endpoint. Who knows, we might end up migrating more endpoints from AMS to FJA!