DBaaS origins: The making of a self-hosted DB solution at Freshworks

Teams at Freshworks primarily use MySQL for their database needs. Since Freshworks is an AWS shop, RDS MySQL has been our go-to solution. Some time back, a team was formed and tasked with building a self-hosted DB solution. We called ourselves the DBaaS (Database as a Service) team and started with a DBaaS solution for MySQL.

In this blog post (part I of the series) we discuss the motivation and high-level architecture of the DBaaS solution.

Motivation

  • Better control. Self-hosting gives us control over the operating system, the MySQL version and configuration, and failover behavior.
  • Improved reliability.
    • We wanted automated handling of replica failures. While RDS gave us automated primary failover, there was nothing for replicas.
    • We wanted to improve on the automated Multi-AZ primary failover that AWS already provided. There have been instances where the primary failover took much longer than desired.
  • Ability to use custom plugins. RDS does not support custom plugins.
  • Cost. Although this was not the main driving factor, we realized that with better control, cost reduction was a distinct possibility, at least in the long run.

Current setup

Most teams internally use multiple MySQL clusters called shards. Each shard is typically a two- or three-node RDS Multi-AZ MySQL cluster with one primary and one or more replicas. Applications typically have built-in shard-awareness logic. When we sat down to hash out the design, sharding was not a goal for us, as our applications already had sharding logic baked into their own code. We were aware, however, that things might change in the future with newly developed applications.

Design Components

Data plane

The data plane consists of the components that process the actual MySQL traffic.

MySQL

For MySQL nodes we went with:

  • EC2 nodes with EBS volumes as MySQL data storage. EBS volumes made it easy to take backups; we plan to discuss backups in a later article.
  • Ubuntu 18 as the operating system.
  • XFS file system for the data volume.

For the database software itself, we went with MySQL 5.7 with semi-synchronous replication.
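Because failover semantics depend on the replication mode, here is a minimal sketch, assuming a stock MySQL 5.7 setup, of how lossless semi-synchronous replication is typically enabled on the primary; the 10-second timeout is an illustrative value, not necessarily what we run in production:

    -- On the primary: install and enable the semi-sync master plugin.
    INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
    SET GLOBAL rpl_semi_sync_master_enabled = 1;
    -- AFTER_SYNC is the "lossless" wait point (the MySQL 5.7 default).
    SET GLOBAL rpl_semi_sync_master_wait_point = 'AFTER_SYNC';
    -- Fall back to async replication if no replica acknowledges within 10s (illustrative value).
    SET GLOBAL rpl_semi_sync_master_timeout = 10000;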

A MySQL shard was designed to comprise a primary, a candidate primary (a node eligible to be promoted to primary), and typically one or more replicas. We will discuss shards in more detail later in this post.

ProxySQL

ProxySQL is used for all communication with MySQL, whether connecting to the primary or to the replicas. We spread the ProxySQL nodes across multiple AZs for High Availability (HA). ProxySQL is central to the design, and we will talk more about it shortly.

Control plane

The control plane consists of components that configure and manage the data plane components of various tenants.

Orchestrator

We run an Orchestrator cluster in HA mode for handling automated primary failover. Orchestrator is a MySQL topology manager and failover solution, used in production on many large MySQL installations. It allows for detecting, querying and refactoring complex replication topologies, and provides reliable failure detection and intelligent recovery and promotion.

HashiCorp Consul cluster

A Consul cluster with nodes spread across multiple AZs. It is used as a key/value store that supports key watchers. During a primary failover, Orchestrator updates keys in Consul, which are watched by the watcher service described next.

Custom Consul watcher service

This service sits alongside the ProxySQL process on every ProxySQL node. It watches for changes to keys in Consul and updates ProxySQL routes dynamically during a primary failover. In part II of this blog series we will look into this in more detail.

High-level architecture diagram

As evident from the diagram above, the DBaaS VPC is centralized per AWS region. Every team using DBaaS connects from their respective VPCs using a dedicated AWS PrivateLink. The PrivateLink connects to the customer's ProxySQL, which then forwards traffic to the appropriate MySQL shard.

The DBaaS control plane manages resources in the customer data plane.

Before we proceed further let’s take a look at a typical MySQL shard in DBaaS.

MySQL shard


The above is a MySQL shard with three nodes. We place the primary and the candidate primary in two different AZs; these two are configured with lossless semi-synchronous replication. The replicas connect to the primary using regular asynchronous replication. In case of a primary failure, Orchestrator promotes the candidate primary to primary and reconfigures the replicas to point to the new primary. Such an event also involves ProxySQL route changes; we will get to that in a future post.
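To make the candidate primary versus replica distinction concrete, here is a rough sketch of how such a shard could be wired up; the replication user, the password and GTID-based positioning are assumptions for illustration, not our exact setup:

    -- On the candidate primary: replicate from the primary and acknowledge
    -- commits via the semi-sync slave plugin.
    INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
    SET GLOBAL rpl_semi_sync_slave_enabled = 1;
    CHANGE MASTER TO
        MASTER_HOST = '172.28.128.9',   -- the shard primary (example IP from this post)
        MASTER_USER = 'repl',           -- illustrative replication user
        MASTER_PASSWORD = '********',
        MASTER_AUTO_POSITION = 1;       -- assumes GTID-based replication
    START SLAVE;

    -- On a plain replica: the same CHANGE MASTER TO, but with the semi-sync
    -- slave plugin left disabled, so it replicates asynchronously.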

ProxySQL Setup and traffic flow

Applications connect to MySQL via ProxySQL. ProxySQL is used for:

  • Routing to all shards by usernames. As mentioned earlier, our apps have sharding logic built in and can be configured to connect to shards using distinct DB usernames.
  • Read/write split within a given shard, again using usernames. Typically, read/write split in ProxySQL is configured with the mysql_query_rules table, but we wanted to avoid the penalty associated with rule parsing and instead use ProxySQL usernames (see the sketch after this list). We realize that this might need to change in the future, where we might end up having to use the mysql_query_rules table after all.
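For context, a rules-based split, which is what we avoided, would look roughly like the sketch below; the rule patterns and hostgroup numbers are purely illustrative:

    -- Hypothetical mysql_query_rules based read/write split (not what we use):
    -- route plain SELECTs to a reader hostgroup, everything else to the writer.
    INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
    VALUES (1, 1, '^SELECT.*FOR UPDATE$', 100, 1);
    INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
    VALUES (2, 1, '^SELECT', 101, 1);
    LOAD MYSQL QUERY RULES TO RUNTIME;

With the username-based approach, every connection is routed purely by its user's default_hostgroup, so ProxySQL never has to match query text against rules.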

The two main tables that we use for ProxySQL configuration are:

  • mysql_servers: This table contains all the MySQL backends. The backends are grouped into what are called hostgroups (hostgroup_id is a field in this table). A hostgroup can have one or more hosts, each with a specific weight.
  • mysql_users: This table contains all the MySQL users. Note that all application users need to be present on the ProxySQL servers in addition to the MySQL backends. This table allows a given user to be associated with a hostgroup, known as the user's default_hostgroup. Requests coming in as a given user are sent to the hostgroup that user is associated with (a minimal configuration sketch follows this list).
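As a rough sketch of the admin workflow, and not our exact production configuration (the password and weight are placeholders; the hostname and hostgroup match the example below), populating and applying these tables looks something like this:

    -- Register a backend in a hostgroup.
    INSERT INTO mysql_servers (hostgroup_id, hostname, port, weight)
    VALUES (100, '172.28.128.9', 3306, 1000);

    -- Map an application user to its default hostgroup.
    INSERT INTO mysql_users (username, password, active, default_hostgroup)
    VALUES ('user_team1-shard1-writer', 'placeholder-password', 1, 100);

    -- Apply the changes to the running configuration and persist them.
    LOAD MYSQL SERVERS TO RUNTIME;
    SAVE MYSQL SERVERS TO DISK;
    LOAD MYSQL USERS TO RUNTIME;
    SAVE MYSQL USERS TO DISK;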

Let’s take the example of a team running two shards, shard1 and shard2, in DBaaS. The hosts in each shard are as follows.

Shard1

172.28.128.9 – primary

172.28.128.10 – candidate primary

172.28.128.11 – replica

Shard2

172.28.128.12 – primary

172.28.128.13 – candidate primary

172.28.128.14 – replica

We create:

  • Three hostgroups per shard: one for writes and two for reads. We create hostgroups 100, 101 and 102 for shard1, and 200, 201 and 202 for shard2. The mysql_servers table can be inspected with the command below.

The writer hostgroup (100 or 200) contains only one backend at any point in time, that of the primary, while the two reader hostgroups of a shard contain the two read-serving nodes (here, the candidate primary and the replica). The only difference between the two reader hostgroups is the weight assigned to the backends.

    proxysql> select hostgroup_id, hostname, status, weight, comment, max_connections from mysql_servers;
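Its output would look roughly like the following for shard1; the weights, comments and max_connections values are illustrative, chosen to match the routing behavior described below (shard2 follows the same pattern with hostgroups 200, 201 and 202):

    +--------------+---------------+--------+--------+----------------+-----------------+
    | hostgroup_id | hostname      | status | weight | comment        | max_connections |
    +--------------+---------------+--------+--------+----------------+-----------------+
    | 100          | 172.28.128.9  | ONLINE | 1000   | shard1-writer  | 500             |
    | 101          | 172.28.128.10 | ONLINE | 900    | shard1-reader1 | 500             |
    | 101          | 172.28.128.11 | ONLINE | 100    | shard1-reader1 | 500             |
    | 102          | 172.28.128.10 | ONLINE | 100    | shard1-reader2 | 500             |
    | 102          | 172.28.128.11 | ONLINE | 900    | shard1-reader2 | 500             |
    +--------------+---------------+--------+--------+----------------+-----------------+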

  • Three users per shard: one for writes and two for reads (the two reader accounts are not always needed, though). Each of these users is associated with the appropriate hostgroup.
    proxysql> select username, active, default_hostgroup from mysql_users;
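Again, the output below is illustrative; the shard2 reader usernames and their hostgroup mappings are assumed by extending the shard1 naming pattern:

    +---------------------------+--------+-------------------+
    | username                  | active | default_hostgroup |
    +---------------------------+--------+-------------------+
    | user_team1-shard1-writer  | 1      | 100               |
    | user_team1-shard1-reader1 | 1      | 101               |
    | user_team1-shard1-reader2 | 1      | 102               |
    | user_team1-shard2-writer  | 1      | 200               |
    | user_team1-shard2-reader1 | 1      | 201               |
    | user_team1-shard2-reader2 | 1      | 202               |
    +---------------------------+--------+-------------------+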

From the above two tables we can see that user_team1-shard1-writer, the writer user, is associated with hostgroup 100, and as a result any query from that user will be directed to the shard1 primary, 172.28.128.9. Similarly, applications connecting with the user_team1-shard2-writer username will hit the shard2 primary, 172.28.128.12.

Requests coming in as the user_team1-shard1-reader1 user will be sent to hostgroup 101, which has two backends (172.28.128.10 and 172.28.128.11), the replicas in shard1. Because 172.28.128.10 is configured with a higher weight, it will end up serving more requests than 172.28.128.11.

The scenario is reversed for the user_team1-shard1-reader2 user. In case of a replica failure, the other node in the reader hostgroups should keep serving the read traffic.

In short, ProxySQL is the heart of this solution: we have seen how it routes traffic across shards and performs the read/write split within each shard.

Conclusion

In this blog we gave an overview of the design and the various components involved in our DBaaS solution. We showed what a tenant's DBaaS shard looks like and how ProxySQL-based routing to those shards works. In part II we will discuss how automated primary failovers are handled.

(Cover image: Vignesh Rajan)