How we upgraded Rails for Freshdesk Contact Center with zero regressions and downtime

Freshdesk Contact Center (formerly Freshcaller) is a voice platform that provides call-based support to thousands of customers across the globe. Upgrading our product from Rails 4.2 to 5.2 helped us achieve the following:

  1. Latest security patches to protect your data from malicious attacks.
    Protip: Security is never optional!
  2. Gem upgrades: Gems built using newer Rails APIs may not be compatible with older Rails versions.
  3. New features for every new version so that you don’t need to reinvent the wheel.
  4. Ease of upgrades: The older your version of Rails is, the harder it is to upgrade.

With a dedicated team and efficient coordination with stakeholders, we successfully upgraded to Rails 5.2 without any downtime or regressions and without halting any feature rollouts. This post describes our upgrade experience and the learnings we gained.

Preparation

Preparing Freshdesk Contact Center for the Rails upgrade involved test suite optimization and QA testing.

Test suite optimization: The unit tests were an essential part of our armor against app failures. However, the tests took about 1h 45m for each run. This high wait time was a severe bottleneck.

To solve this, we took a containerized approach, splitting the test files sequentially into K groups. Each group ran in a separate Docker container with a dedicated database, and another container ran static code analysis and security scans. This parallel setup cut the total time to under 20 minutes. We added pull request checks to ensure the test groups are updated whenever test files are added or removed. With this first step complete, we moved on to the upgrade itself.
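The grouping and the pull request check described above could be sketched roughly as follows. This is only an illustration: the helper names `split_into_groups` and `group_drift` are hypothetical, and the real setup may balance groups differently (for example, by runtime rather than file count).

```ruby
# Split a list of test files into k contiguous groups of near-equal size.
# Files are sorted first so the grouping is deterministic across runs.
def split_into_groups(files, k)
  raise ArgumentError, 'k must be positive' unless k.positive?

  sorted = files.sort
  base, extra = sorted.length.divmod(k)
  start = 0
  Array.new(k) do |i|
    # The first `extra` groups take one additional file each.
    size = base + (i < extra ? 1 : 0)
    group = sorted[start, size]
    start += size
    group
  end
end

# Compare the committed groups against the files currently on disk.
# A pull request check could fail when either list is non-empty.
def group_drift(groups, files_on_disk)
  grouped = groups.flatten
  {
    missing: (files_on_disk - grouped).sort, # on disk but in no group
    stale: (grouped - files_on_disk).sort    # in a group but deleted
  }
end
```

Each resulting group would then be handed to its own container, and a non-empty `missing` or `stale` list signals that the groups need to be regenerated.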

QA testing: Along with unit tests, we also had Selenium automation test suites maintained by QA engineers to prevent regression and identify new failures in the staging environment before the buggy code reaches production.

The upgrade process

The upgrade process involves upgrading dependent gems, adding or removing monkey patches, replacing deprecated methods, and resolving test case failures caused by upgraded code. The entire upgrade process comprises the following steps:

  • Dual boot – an easy mechanism to switch the Rails version
  • Deprecations – replace old methods for better functionality and performance
  • Dependencies – upgrade third-party gems to a higher version or patch them
  • Serialization – tweak your data for compatibility with both Rails versions
  • Unit test fixes – fix broken features to avoid business impact

Dual boot

We used the bootboot gem to make the application code compatible with both the current and the upgraded Rails versions. The environment variable DEPENDENCIES_NEXT in the Gemfile below acts as a switch that selects which Rails version is loaded.

plugin 'bootboot', '~> 0.2.0'
if ENV['DEPENDENCIES_NEXT']
  enable_dual_booting if Plugin.installed?('bootboot')
end

if ENV['DEPENDENCIES_NEXT']
  gem "rails", "5.2.0"
else
  gem "rails", "4.2.0"
end

With if-else conditions in the Gemfile, we created separate locked versions for each of the Rails versions – Gemfile.lock for 4.2 and Gemfile_next.lock for 5.2.

This dual boot setup enabled regular development by product teams to proceed in parallel with the upgrade so that feature rollouts remained unaffected.

Deprecations

Deprecation warnings combined with code coverage are your best friends during the upgrade. They point to the exact lines of code that need to be patched with the methods introduced in the new version. We used deprecation_toolkit to capture deprecations introduced by Rails 5.2. Since we were moving from 4.2, we also had to collect and fix the deprecations of the intermediate minor versions, 5.0 and 5.1.

We created a new behavior in deprecation_toolkit to include backtraces to make it easier for developers.

# Custom deprecation_toolkit behavior: writes the deprecations collected
# for a test to that test's recording file.
class RecordDeprecationWithTrace
  extend DeprecationToolkit::ReadWriteHelper

  def self.trigger(test, collector, _)
    deprecation_file = recorded_deprecations_path(test)
    write(deprecation_file, test_name(test) => collector.deprecations)
  end
end

Setting the ActiveSupport::Deprecation behavior to raise in the test environment config (config.active_support.deprecation = :raise) makes any new deprecation fail the unit tests, which can then be fixed by replacing the deprecated method. This is very helpful in ensuring that no new deprecations are introduced.
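The underlying idea, failing a test run when it triggers a deprecation that was not previously recorded, can be illustrated in plain Ruby. This is a simplified sketch of the concept, not the internals of Rails or deprecation_toolkit; `NewDeprecationError` and `assert_no_new_deprecations!` are hypothetical names.

```ruby
# Raised when a test run triggers deprecations that were not recorded before.
class NewDeprecationError < StandardError; end

# recorded: deprecation messages already known (and scheduled to be fixed).
# current:  deprecation messages triggered by this test run.
def assert_no_new_deprecations!(recorded, current)
  new_ones = current - recorded
  return if new_ones.empty?

  raise NewDeprecationError,
        "New deprecations introduced:\n#{new_ones.join("\n")}"
end
```

Known deprecations pass silently until they are fixed, while anything new stops the build immediately.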

We used the deprecation.rails event of ActiveSupport::Notifications to identify deprecations in code that was not covered by unit tests. This was an effective way to record the warnings raised in the production environment without polluting the logs.

ActiveSupport::Notifications.subscribe('deprecation.rails') do |name, start, finish, id, payload|
  Rails.logger.info("#{name} :: #{start} :: #{finish} :: #{id} :: #{payload[:message]}")
  # Publish to telemetry tools
end

Major deprecations

Dependencies

We first removed unused gems to reduce the compatibility overhead. Certain gems like seed-fu, airborne, and RSpec had versions compatible with both Rails versions and hence they became easy targets for upgrading. In the other cases, both versions of the gem were included in the Gemfile with if-else conditions.

We also have several internal gem repositories that contain Rails-dependent business logic. In each of these repositories, we created a separate branch for the Rails 5 compatibility patches and included both branches in the Gemfile using the dual boot setup.

Serialization

Serialization is a two-fold problem. First, existing data in the serialized database columns, cookies, and cache must be usable by the newer version. Second, the data created by the newer version must be readable by the older version. This backward compatibility is necessary to ensure data is not lost in case of any rollback. We scanned the database to identify columns that stored Rails objects directly because they would not be compatible across versions. To mitigate this, we converted the data to an equivalent plain old Ruby object.
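As a rough illustration of the conversion, the sketch below reduces an object to plain Ruby types before it is written, so the stored payload does not depend on any framework class. The `CallSettings` class and its fields are hypothetical, and JSON is used here only as an example format; the original columns may well use a different serializer.

```ruby
require 'json'

# Hypothetical stand-in for an object that should not be stored directly.
class CallSettings
  attr_reader :queue, :timeout_seconds

  def initialize(queue:, timeout_seconds:)
    @queue = queue
    @timeout_seconds = timeout_seconds
  end

  # Reduce the object to plain Ruby types (Hash, String, Integer).
  def to_plain_hash
    { 'queue' => queue, 'timeout_seconds' => timeout_seconds }
  end

  # Rebuild the object from the plain representation.
  def self.from_plain_hash(hash)
    new(queue: hash.fetch('queue'),
        timeout_seconds: hash.fetch('timeout_seconds'))
  end
end

# Serialize only the plain representation.
def dump_settings(settings)
  JSON.generate(settings.to_plain_hash)
end

# Either application version can read the payload back.
def load_settings(payload)
  CallSettings.from_plain_hash(JSON.parse(payload))
end
```

Because the payload contains only strings and numbers, it can be read by either Rails version, which is what makes a rollback safe.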

We also introduced versioning in cache keys to ensure maximum compatibility during the transition phase. This was a temporary setup with separate Memcache keys for each Rails version.

# Build a cache key namespaced by the Rails major version.
def cache_store_key(id, account_id, class_name, version = Rails::VERSION::MAJOR)
  [class_name.to_s, version.to_s, account_id, id].join('/')
end

# Delete the entry under the previous, current, and next major-version keys.
def clear_from_memcached(id, account_id, class_name)
  [Rails::VERSION::MAJOR - 1, Rails::VERSION::MAJOR, Rails::VERSION::MAJOR + 1].each do |version|
    Rails.cache.delete(cache_store_key(id, account_id, class_name, version))
  end
end

Unit test fixes

Our unit test code coverage was close to 90%, so failures in the test suite quickly pointed us to root causes such as changed method behavior, deprecations, and data incompatibility. All code changes that were compatible with both Rails versions were shipped in regular releases. Where behavior differed, we added if/else conditions in the application code to call the version-appropriate method so that business logic remained unaffected. For each pull request, unit tests ran against both Rails versions to ensure the fallback Rails version always had stable code.
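A version check of this kind might look like the sketch below. The helper names are hypothetical, and the example uses a well-known difference in controller tests (Rails 5 expects request parameters under a `params:` keyword, while 4.2 takes them positionally) purely as an illustration. In a real application the version string would presumably come from `Rails.version`.

```ruby
# True when the running version is at least `minimum`.
# Gem::Version handles multi-digit segments correctly (e.g. 4.2.11 > 4.2.9).
def version_at_least?(current, minimum)
  Gem::Version.new(current) >= Gem::Version.new(minimum)
end

# Build the arguments for a controller-test request in the style
# expected by the given Rails version.
def request_args(params, rails_version)
  if version_at_least?(rails_version, '5.0')
    { params: params } # keyword style used by Rails 5
  else
    params             # positional hash used by Rails 4.2
  end
end
```

Keeping the branch inside one small helper makes the conditional easy to find and delete once the older version is retired.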

Front-end impact

New Rails versions introduce changes to security headers and cookies to improve security. However, these changes may not be expected by the API consumers – front-end apps and downstream applications. To solve this, we compared the HTTP headers and JSON responses of all endpoints in both Rails versions. The Selenium automation suites were handy in this task for verifying responses and preventing regressions.
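A comparison like this could be sketched as follows, assuming each captured response is a Hash with `:headers` and a JSON `:body`. The response shape, the `response_diff` name, and the list of ignored headers are all assumptions made for illustration.

```ruby
require 'json'

# Headers that legitimately differ between any two requests (assumed list).
VOLATILE_HEADERS = %w[date etag x-request-id x-runtime].freeze

# Lowercase header names and drop the volatile ones.
def normalize_headers(headers)
  headers.each_with_object({}) do |(name, value), normalized|
    key = name.downcase
    normalized[key] = value unless VOLATILE_HEADERS.include?(key)
  end
end

# Returns the headers whose values differ as { name => [old, new] } and
# whether the parsed JSON bodies differ (key order is ignored).
def response_diff(old_response, new_response)
  old_headers = normalize_headers(old_response[:headers])
  new_headers = normalize_headers(new_response[:headers])

  header_diff = (old_headers.keys | new_headers.keys).each_with_object({}) do |key, diff|
    next if old_headers[key] == new_headers[key]

    diff[key] = [old_headers[key], new_headers[key]]
  end

  {
    headers: header_diff,
    body_changed: JSON.parse(old_response[:body]) != JSON.parse(new_response[:body])
  }
end
```

Running this over every endpoint gives a short list of differences to review with the API consumers before the switch.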

One roadblock we faced was related to CSRF tokens generated for HTML forms. Rails 5.2 introduced a security change in generating HMAC of CSRF tokens so that attackers cannot recreate each form’s token. However, it was not backward compatible with Rails 4.2 and hence we had to add a monkey patch for comparing the CSRF token with the global token in both versions.

Deployment Strategy

We added version tags to the logs to distinguish the Rails versions. We follow a blue-green deployment model, so groups of instances were booted in each version. Traffic was gradually switched from the Rails 4 instances to the Rails 5 instances under continuous application monitoring with in-house logging and telemetry tools. In the event of any failure or feature breakage, we were ready to roll back to the previous color, which was running stable code. Failures would then be debugged, fixed, and redeployed to production in the same manner.

There were minor hiccups during the Rails 5 rollout. Our analytics features broke because of DateTime precision discrepancies; we quickly modified the database column and redeployed to all regions. Another unexpected error was caused by the removal of the capture method in Kernel, which had been deprecated in 4.2.

With this well-orchestrated effort, we successfully upgraded to Rails 5.2.6 without any downtime.

Post upgrade activity

Once we were stable on the new version, we removed the dead code left over from the older Rails version. This post-upgrade cleanup involved removing all if-else conditional code from the dual boot setup, the backward compatibility monkey patches, the temporary version-based cache keys, and the Gemfile_next.lock file.

New configurations are introduced with each major version of Rails and these are disabled by default in the auto-generated new_framework_defaults initializer file. We analyzed the impact of each new configuration based on the business requirements. If the new default is not suitable, it can be disabled in config/application.rb or with an initializer. After making the necessary config changes, we updated the config.load_defaults value to swap the defaults.

Learnings

We learned a great deal from this upgrade. The key takeaways:

  • Upgrade dependent gems as early as possible: This prevents surprise failures during the Rails upgrade process and improves compatibility.
  • Dual boot is essential: Maintaining both versions of the code ensures that the upgrade always tracks the latest application code and provides a fallback in case of failures.
  • Avoid writing Rails objects to storage: Rails objects stored in serialized database columns or cache may not be forward-compatible across versions.
  • Be prepared for rollbacks: You never know if something might break even if you have 100% code coverage. So have a battle-tested fallback to mitigate failures.
  • Upgrade regularly: Staying closer to the latest Rails version provides security and performance improvements. Newer features can be readily adopted without reinventing them with monkey patches.