Horizontal sharding in a multi-tenant app with Rails 6.1

Rails, the framework built on top of Ruby, just got its latest version(6.1) released. A lot of features and enhancements have gone into the latest version of Rails. You can read the official announcement for more details.

We will be focusing particularly on the Multi-DB improvements section, what changed and how we can leverage Rails’ native multi-DB handling techniques for building scalable multi-tenant applications involving one or many horizontal shards and replicas.

Rails 6.0 was the first official rails version to support multiple databases. From the release notes:

“The new multiple database support makes it easy for a single application to connect to, well, multiple databases at the same time! You can either do this because you want to segment certain records into their own databases for scaling or isolation, or because you’re doing read/write splitting with replica databases for performance. Either way, there’s a new, simple API for making that happen without reaching inside the bowels of Active Record. The foundational work for multiple-database support was done by Eileen Uchitelle and Aaron Patterson.”

This allowed application developers to be able to define multiple database connections for a single application. Before this, developers had to use one of the many third party gems for any kind of multi-DB support in Rails. Even though the ruby/rails community is very vibrant, third party gems often come with maintenance overheads with respect to upgrades, breaking changes, bugs, performance issues, etc.

With Rails 6.0, you could define your database.yml in such a way:


# config/database.yml
default: &default
  adapter: sqlite3
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000
development:
  primary:
    <<: *default
    database: primary_db
  primary_replica:
    <<: *default
    database: primary_db_replica
    replica: true
  animals:
    <<: *default
    database: animals_db
  animals_replica:
    <<: *default
    database: animals_db_replica
    replica: true

Then define ActiveRecord Abstract classes that could connect to these databases.


# app/models/application_record.rb

# frozen_string_literal: true
class ApplicationRecord < ActiveRecord::Base
  connects_to database: { writing: :primary, reading: :primary_replica }
end

# app/models/animals_base.rb

# frozen_string_literal: true
class AnimalsBase < ApplicationRecord
  connects_to database: { writing: :animals, reading: :animals_replica }
end

# app/models/user.rb

# frozen_string_literal: true
class User < ApplicationRecord
end

The abstract classes and models inheriting from them would both now have access to the connected_to method which can be used to establish connection to the configured database connections.


# some_controller.rb

# frozen_string_literal: true
ApplicationRecord.connected_to(role: :reading) do
  User.do_something_thats_slow
end

This approach worked great for primary-replica setup or setups where models had clear separation, i.e. a model always queried from a single database. However, with modern multi-tenant SaaS applications, horizontal sharding is almost a basic necessity. Depending on the tenant that’s accessing the application, the application should be able to select which database it wants to query the data from. While how the application shards horizontally is DSL and can vary from a case to case basis, how it is able to connect to the underlying databases should be something that the framework should be able to handle. And so they did.

With Multi-DB improvements released in 6.1, you can now define shard connections for your abstract classes as well. The example from above changes as:


# config/database.yml

default: &default
  adapter: sqlite3
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000

development:
  primary:
    <<: *default
    database: primary_db
  primary_replica:
    <<: *default
    database: primary_db_replica
    replica: true
  animals:
    <<: *default
    database: animals_db
  animals_replica:
    <<: *default
    database: animals_db_replica
    replica: true
  animals_shard1:
    <<: *default
    database: animals_db1
  animals_shard1_replica:
    <<: *default
    database: animals_db1_replica
    replica: true


# app/models/application_record.rb

# frozen_string_literal: true
class ApplicationRecord < ActiveRecord::Base
  connects_to database: { writing: :primary, reading: :primary_replica}
end

# app/models/animals_base.rb

# frozen_string_literal: true
class AnimalsBase < ApplicationRecord
  connects_to shards: { 
    default: { writing: :animals, reading: :animals_replica },
    shard1: { writing: :animals_shard1, reading: :animals_shard1_replica }
  }
end

# app/models/cat.rb

# frozen_string_literal: true
class Cat < AnimalsBase 
end

Similar to 6.0, we can then leverage the connected_to method for switching(/establishing) connections to the configured databases.


# some_controller.rb

# frozen_string_literal: true
AnimalsBase.connected_to(shard: :shard1, role: :reading) do
  Cat.all # reads all cats from animals_shard1_replica
end

One of the common design patterns for multi-tenant architectures is to associate every tenant with a unique subdomain on your root domain. In fact, at Freshworks, we follow the same pattern as well. For example, Freshservice runs on freshservice.com. marvel as a Freshservice tenant would access their instance using marvel.freshservice.com and so on.

This pattern has its own advantages (easy/faster DNS resolution when running on a multi-pod setup) and disadvantages (DNS updates for every tenant record modification). Instead of debating that, we will delve into how to implement this architecture in a Rails application using the new multi-DB setup.

We can begin by setting up the required database configurations first:


# config/database.yml

default: &default
  adapter: sqlite3
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  timeout: 5000

development:
  default:
    <<: *default
    database: primary_db
  default_replica:
    <<: *default
    database: primary_db_replica
    replica: true
  shard1:
    <<: *default
    database: shard1_db
  shard1_replica:
    <<: *default
    database: shard1_db_replica
    replica: true

We will need a Tenant model. Since our tenants will be identified by subdomains, it makes sense to have a subdomain column in the tenant table along with other required attributes. Each tenant belongs to a shard and all data of that tenant would reside on that shard. So we will need a shard model as well.

We will define the required models accordingly:


# app/models/application_record.rb

# frozen_string_literal: true
class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true

  db_configs = Rails.application.config.database_configuration[Rails.env].keys

  db_configs = db_file.each_with_object({}) do |key, configs|
    # key = default, db_key = default
    # key = default_replica, db_key = default
    db_key = key.gsub('_replica', '')
    role = key.eql?(db_key) ? :writing : :reading

    db_key = db_key.to_sym
    configs[db_key] ||= {}

    configs[db_key][role] = key.to_sym
  end

  # connects_to shards: {
  #   default: { writing: :default, reading: :default_replica },
  #   shard1: { writing: :shard1, reading: :shard1_replica }
  # }
  connects_to shards: db_configs
end

With multi-tenant architectures, there will always be a global context and a tenant-specific context. We isolate such models through abstract classes ApplicationRecord and GlobalRecord. They also take care of abstracting database connections and setting up the required isolations.


# app/models/global_record.rb

# frozen_string_literal: true
class GlobalRecord < ActiveRecord::Base
  self.abstract_class = true

  connects_to database: { writing: :default, reading: :default_replica }
end

# app/models/tenant.rb

# frozen_string_literal: true

#
# Table: tenants - stores tenant related information
#
#  id                  :bigint(20)    not null, primary key
#  name            :varchar(75)    not null
#  subdomain    :varchar(25)    not null
#

class Tenant < ApplicationRecord
  include ActsAsCurrent  # https://dev.to/ritikesh/simple-multitenancy-inrails-42b6

  validates :subdomain, format: { with: DOMAIN_REGEX }
  # other DSL

  after_commit :set_shard, on: :create

  private

  def set_shard
    Shard.create!(tenant_id: self.id, domain: subdomain)
  end
end

# app/models/shard.rb

# frozen_string_literal: true

#
# Table: shards - stores shard information information for all tenants
#
#  id                  :bigint(20)    not null, primary key
#  tenant_id      :bigint(20)    not null
#  domain  	 :varchar(50)    not null
#  shard	 :varchar(10) not null

class Shard < GlobalRecord
  include ActsAsCurrent

  validates :domain, format: { with: DOMAIN_REGEX }
  validates :tenant_id

  before_create :set_current_shard

  private

  def set_current_shard
    self.shard = APP_CONFIGS[:current_shard] #shard1
  end
end

We can also leverage the BelongsToTenant pattern for all models that belong to a tenant and inherit from ApplicationRecord.

All ActiveRecord inherited models connect by default to a default shard and a writing role unless connected_to another connection. Hence, when connecting to GlobalRecord inherited models, we will not require any explicit connection handling.

We can also define a proxy class to abstract out all application-specific connection handling logic:


# app/proxies/database_proxy.rb

# frozen_string_literal: true
class DatabaseProxy
  class << self
    def on_shard(shard: , &block)
      _connect_to_(role: :writing, shard: shard, &block)
    end

    def on_replica(shard: , &block)
      _connect_to_(role: :reading, shard: shard, &block)
    end

    def on_global_replica(&block)
      _connect_to_(klass: GlobalRecord, role: :reading, &block)
    end

    # for regular executions, since Global only connects to default shard,
    # no explicit connection switching is required.
    # def on_global(&block)
    #   _connect_to_(klass: GlobalRecord, role: :writing, &block)
    # end

    private

    def _connect_to_(klass: ApplicationRecord, role: :writing, shard: :default, &block)
      klass.connected_to(role: role, shard: shard) do
        block.call
      end
    end
  end
end

With this setup in place, we can now write both application and background middleware that handle shard selection and tenant isolation on a per request or job basis.


# lib/middlewares/multitenancy.rb

# frozen_string_literal: true

require 'database_proxy'

module Middlewares
  # selecting account based on subdomain
  class Multitenancy
    def initialize(app)
      @app = app
    end

    def call(env)
      domain = env['HTTP_HOST']

      shard = Shard.find_by(domain: domain)
      return @app.call(env) unless shard

      shard.make_current
      DatabaseProxy.on_shard(shard: shard.shard) do
        account = Account.find_by(subdomain: domain)

        account&.make_current
        @app.call(env)
      end
    end
  end
end

# config/application.rb
require 'lib/middlewares/multitenancy'

config.middleware.insert_after Rails::Rack::Logger, Middlewares::Multitenancy

With more widespread adoption, the underlying framework should only get better from here. Native implementations also allow us better flexibility and control over code flows and application design.

Cover design: Vignesh Rajan