How we chose the right database solution for Freshsales email content

Storing and processing of large volumes of transactional data in a real-time application is always a challenge in software development. Choosing the right database is always crucial since decisions taken in that initial phase might look fine but affect the application’s performance as the feature adoption and customer base grows. This blog is about one such problem we faced and how we solved it.

Freshsales is a CRM software which makes sales a breeze by providing functionalities such as sales pipeline management, 360-degree customer view, workflows, reporting, integrated email and phone, etc. Freshsales is built using the Ruby on Rails framework and majorly uses the services provided by AWS in its production infrastructure.

Let me quickly delve into the specifics of Integrated email clients, a Freshsales functionality about which I will be talking more in this blog. As part of this functionality, Freshsales offers capabilities such as sending emails, bulk email campaigns, connecting mailboxes and syncing emails from other service providers like Autopilot right within the application to simplify customers’ interactions with their prospects.

Email use cases

When we did the technical designing for this functionality, we had MySQL as the first option for storing email data while keeping its attachments in AWS’s S3 (Simple Storage Service). The reason for inclining toward MySQL was, we had use cases in the application which needed establishing the relationship between emails, prospects and agents. And prospects and agents were already being stored in MySQL. So, keeping email data also in MySQL felt like a simple implementation in initial days.

We did benchmarking on MySQL and observed that latency of the READ requests got slightly increased along with the email data in the table. We follow shared multi-tenant architecture in the application where customers are distributed across multiple databases. And we could potentially shift customers across shards whenever we saw any performance degradation due to higher volume of email data from a single customer. So, we didn’t consider this as a major concern and consciously went ahead with RDS (fully-managed MySQL service provided by AWS) for storing email data similar to the rest of the transactional data in the application.

Email content ER old

Problems faced

After a few years, the conscious decision we took earlier on the volume of emails started causing second-order problems which affected other functionalities in the application.

1. Database replica lag: We started receiving alerts from the production servers on Database Replica lag. Upon observation we found that, while storing emails in MySQL, there was a lag in syncing them with the replica MySQL instance. This lag was very evident when there was high traffic while sending an email campaign to many users with large content bodies.

Due to the lag caused by this, syncing of other application data to the replica also got queued up due to single-threaded behavior of MySQL replication. As a result, all functionalities in the application that read data from replica instances got affected.

2. Database backup time: We have a periodic backup configured for the production database for the purpose of data recovery in case of hardware failure. This backup operation started taking a lot of time—and increased exponentially over time. We analyzed the storage trend in MySQL and found that more than 60% of the database was occupied by email content, which was the main reason for higher backup time.

3. Infra cost: We use AWS RDS to host MySQL databases for the application. Considering the fact that storage is directly proportional to the cost, we ended up paying more than 60% of the total RDS cost just for storing email content. Since we have 2 replicas attached to each of the primary database instances, the amount of RDS cost was thrice that of a single instance. The cost for persisting periodic database backups was additional.

4. Moving data across data centers: We have multiple data centers in order to achieve geographic efficiencies, and customers can either choose their data center when they sign up for the application or they can even opt for switching across data centers after trialing the application for a while. In the latter case, we move all their data from one data center to another. For high volumes of email content, this job became very time consuming.

Solutions considered

To solve the above-mentioned problems, we started evaluating alternative solutions for email content. The primary idea was to move the email body out of MySQL, keeping the rest of the meta, and email relationship with other entities, intact in MySQL. This time we had certain requirements for the solution.

  • Size of the email body can’t have a fixed boundary. So, the solution must considerably have a larger cap on the size of an object.
  • Sending and receiving of emails is a non-blocking operation, i.e., once a user sends out an email, we move that operation to the background queue for execution. So, WRITE latency was not a major concern.
  • Email content is immutable. Since updates weren’t required, capabilities like PUT and DELETE were not a concern.
  • Solution should have a full managed service to simplify infrastructure maintenance.
  • Operational availability of the solution.
  • Solution must support data replication.
  • Solution must simplify the problems of moving data across data centers.
  • Cost of the solution should be low.

We analyzed NoSQL databases like MongoDB and DynamoDB along with object storage options like S3 which could possibly meet the requirements.

  • MongoDB can be fully managed on AWS using third-party services like MongoDB Atlas. But it comes with cost and maintenance overheads. And we didn’t have any other use cases in the application which could possibly leverage MongoDB’s complete capabilities to match up with the additional cost we would be required to pay. So we dropped MongoDB from the list of contenders.
  • DynamoDB is fully managed by AWS and incorporating it into an existing AWS infrastructure is straightforward. AWS also provides capabilities like auto-scaling and cross-region replication for DynamoDB. It is also better optimized for INSERT and READ operations. But it has a document size limit of 400KB which didn’t match our requirement. So we dropped DynamoDB from the list of contenders.
  • S3 is an object storage service fully managed by AWS. S3, too, has all the advantages of a fully managed service similar to DynamoDB. It supports objects of size upto 5 terabytes. And the cost of storage for S3 is way less compared to any other storage options. So we started evaluating S3 in more detail.

As part of the S3 evaluation, we had a few concerns, especially over its network latency since that might affect the application performance.

  • Email-related operations like sending and receiving are executed in background and will not block the user’s further actions in the application. So we didn’t take WRITE latency as a major concern.
  • Most of the screens in the application wherein we had to list emails relied on the email meta rather than on the actual body. We would end up READING an email body from S3 only if the customer intended to look at the full data of the email. So we decided to analyze customer patterns in order to find the probable frequency of S3 READ operations. To our surprise, this frequency was quite low, which was a good thing for us.

Keeping this READ latency as a probable issue, we consciously went ahead with S3 as an alternative to MySQL for storing email body data.

The final solution

Email content ER New

Keeping the email meta information in MySQL, we moved the rest of the email body data to S3. This has not just solved the existing database problems like replica lag and higher backup time but also saved a lot of infra cost for us.

Considering the combined storage cost of RDS primary instance and 2 replica instances, there was more than 65% decrease in the RDS storage cost, which saved us around $6,000 every month.

RDS cost

 

S3 cost

There was a 37% increase in the S3 storage cost after we moved all of the existing email content to S3. This added an additional $30 to the monthly billing—which was far less compared to our $6,000 saving on RDS. So, overall, it was a big win for us.