How we prepped our data sync platform for the future—and saved $1 million

Published on March 25, 2020

In today’s dynamic world, data is dynamic too. This is especially true of the data about customers, prospects, and leads that most companies hold and try to keep as current as possible. And when you consider the task of updating the humongous amounts of data residing in systems across multiple products—be it Freshdesk, Freshsales, Freshservice, or any third-party plug-in—“being in sync” becomes all the more challenging.

So how do we sync data between different applications? What are the challenges in keeping things in sync as frequently as possible? And how do we keep sync failures to a minimum?

The answer, in one word, is Freshpipe. Freshpipe is our integration platform as a service (IPaaS) that has its roots in our January 2017 acquisition of Pipemonk, a data integration solution to automate data flows across multiple cloud products.

However, there was some heartburn before we could be proud of smoking Freshpipe, our new data darling. Roughly two years after the Pipemonk acquisition—as the number of syncs grew significantly on the back of more products and rising volumes of data—the re-christened platform was in the middle of an existential crisis. Things almost came to a head when the Freshsales team, facing a lot of customer churn, began looking for an alternative from an external vendor. They told us that the 12% sync error rate between Freshdesk and Freshsales would just not cut it.

It was a time of reckoning for all of us at Freshpipe.

Thankfully, not only did we turn Freshpipe into a thriving data integration platform, we also saved a lot of money by avoiding exorbitant expenses associated with an external solution.

Attacking it persistently through automation

We tackled the situation in two ways. One, we agreed to evaluate an external vendor that could be the best data integration bet for Freshsales (if they had to look outside). And two, we kept chipping away at the problem of bringing down sync error rates and improving our platform.

After evaluating a few third-party platforms available in the market, including PieSync, Zapier, MuleSoft, and Cloud Elements, we gave our recommendation to the Freshsales team—and went back to work on our own.

By the second quarter of 2019, we had reduced the error rate to 10%, but it was still quite high.

We painstakingly documented every single thing that the Freshpipe sync app did at that time for the integrations we had built: Freshsales-Hubspot, Freshsales-Mailchimp, and Freshsales-Freshdesk.

Now, there are ‘leads’ and ‘contacts’ in Freshsales, and their data exists under a number of fields. A sync with Freshdesk works correctly when any update in one product is accurately reflected in the other. So we wrote down what happened when each field in one tool was changed and the sync performed—and vice versa. Such a document was necessary to understand and attack the problem, so we created a comprehensive one, which allowed us to look at our data and see which syncs failed and why.

Next, we wrote test cases for each of the expected syncs. To speed up the process, we also wrote code to automate them. For instance, updating the name of a contact in Freshsales, performing the sync, and checking that the same update appeared in Freshdesk—this was carried out by the automated code.
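To give a flavor of what those automated test cases looked like, here is a minimal Python sketch. The client objects, field names, and one-way sync logic below are illustrative stand-ins for the real Freshsales and Freshdesk APIs, not our actual code:

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-in for a product's contact store; the
# real checks go through each product's REST API client.
@dataclass
class FakeCrm:
    contacts: dict = field(default_factory=dict)

    def update_contact(self, contact_id, **fields):
        self.contacts.setdefault(contact_id, {}).update(fields)

    def get_contact(self, contact_id):
        return self.contacts.get(contact_id, {})

def run_sync(source, target):
    # Naive one-way sync: copy every contact field from source to target.
    for contact_id, fields in source.contacts.items():
        target.update_contact(contact_id, **fields)

def check_field_synced(source, target, contact_id, field_name, new_value):
    """Automated test case: update a field, sync, verify it propagated."""
    source.update_contact(contact_id, **{field_name: new_value})
    run_sync(source, target)
    return target.get_contact(contact_id).get(field_name) == new_value

freshsales, freshdesk = FakeCrm(), FakeCrm()
assert check_field_synced(freshsales, freshdesk, "c-42", "name", "Asha Rao")
```

Each field-by-field check like this could be run unattended after every sync, which is what made exhaustive coverage practical.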

The automation allowed us to quickly pinpoint the areas where most sync failures were happening, and we decided to address those first. It was a data-driven approach to fixing errors.

Then, around July 2019, we also automated the process of reviewing the error logs. This meant we no longer had to manually comb through the previous day’s errors—we typically followed a 12-hour sync frequency—in order to create tickets representing the error categories. The automation pulled out the error reports, and we could now better prioritize the errors to be fixed.
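Conceptually, the log automation boiled down to grouping the previous run’s errors by integration and category so that one ticket could be filed per category rather than per error. A rough Python sketch—the log format, integration names, and category labels here are hypothetical:

```python
import re
from collections import Counter

# Hypothetical log lines; the real Freshpipe log format differs.
LOG_LINES = [
    "2019-07-02 01:14 ERROR sync=fs-fd category=FIELD_MAPPING contact=811",
    "2019-07-02 01:15 ERROR sync=fs-fd category=RATE_LIMIT contact=812",
    "2019-07-02 01:16 ERROR sync=fs-hubspot category=FIELD_MAPPING contact=57",
]

def categorize_errors(lines):
    """Group a run's errors by (integration, category) for ticketing."""
    pattern = re.compile(r"ERROR sync=(\S+) category=(\S+)")
    counts = Counter()
    for line in lines:
        match = pattern.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts

# One ticket per (integration, category), with the error count attached.
tickets = [f"{sync}/{category}: {n} errors"
           for (sync, category), n in categorize_errors(LOG_LINES).items()]
```

Grouping by category is what turned a raw stream of failures into a short, prioritizable ticket list.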

Through this constant automation, prioritization, and error-fixing, we brought down the number of sync failures dramatically in a relatively short period of time. 

When we started this automated process in May, we created around 150 error tickets a day. By the end of August, this had come down to 45 a day. At the end of the year in December, it was barely around 15.

All of this resulted in the sync accuracy improving and stabilizing at approximately 98% across the multiple integrations we were running.

How we reaped extra benefits

Around November 2019, Freshsales stopped their review of Cloud Elements and told us they were happy with how the Freshpipe iPaaS was shaping up. By our calculation, had the company gone ahead with the external vendor, it would have cost us an additional $1 million. By continuing with Freshpipe, we saved that money.

The number of L2 tickets—tickets raised by end-customers that come to the engineering team for support—saw a sharp drop, simply because we could now find errors before end-customers discovered them, and often fix them before customers even noticed that something had gone wrong.

This was achieved by educating the customer support team on the internals of the Freshpipe platform and giving some team members privileged access to resolve issues quickly. Rather than an issue going from the customer to support, from support to engineering, and all the way back, we eliminated one whole hop.

The challenges we faced

In our data sync journey, we faced a few key challenges. For one, a new integration deployment would typically take five to six hours, during which the Freshpipe service became unavailable for 15 to 30 minutes. We did not want that disruption. So we automated the deployment using a script. Alongside, we introduced a ‘dual-stack system’ under which we could deploy to one stack while the other continued to serve as usual. The combination of an automation script and dual stacks not only reduced our deployment time to less than an hour, it also reduced the downtime to almost zero.
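The dual-stack idea can be sketched as follows—the stack names, versions, and health check below are illustrative, not our actual deployment script:

```python
# Two stacks: at any moment one is live (serving traffic), one is idle.
STACKS = {"blue": {"live": True, "version": "1.4"},
          "green": {"live": False, "version": "1.4"}}

def deploy(stacks, new_version, health_check):
    """Deploy to the idle stack, verify it, then flip traffic over.

    The live stack keeps serving throughout, so users see no downtime.
    """
    idle = next(name for name, s in stacks.items() if not s["live"])
    live = next(name for name, s in stacks.items() if s["live"])
    stacks[idle]["version"] = new_version          # install on the idle stack
    if not health_check(stacks[idle]):             # verify before switching
        raise RuntimeError("health check failed; live stack untouched")
    # Swap traffic only after the new version passes its checks.
    stacks[idle]["live"], stacks[live]["live"] = True, False
    return idle

new_live = deploy(STACKS, "1.5", health_check=lambda stack: True)
```

Because the swap happens only after the health check passes, a failed deployment leaves the live stack untouched—which is what drove our downtime to almost zero.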

An added advantage of script-based deployment was that anyone on the team could be authorized to do it with a simple, single-click process—as opposed to earlier, when you needed to know our system thoroughly to do it.

Another challenge was related to ‘session management’: when you log in, you enter your username and password only once—after that, your session is maintained via a session key, which is used to authenticate whatever subsequent calls you make to the platform. To maintain the session key, we were using a technology called Hazelcast.

Now, this key needed to be propagated across all the machines in our AWS-hosted cloud, and that’s where we hit a snag. Somehow, the ‘handshake’ that was supposed to happen when machines booted up was not occurring.

To solve this problem, we moved the session keys from the individual machines in our cluster to a central store: Redis. Any machine that needed the key could now connect to Redis and fetch it from there.
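In outline, the Redis-backed approach looks like this. The sketch uses an in-memory stand-in for the Redis client so it runs anywhere; in production, `store` would be a real client such as redis-py’s `redis.Redis`, whose `SETEX`/`GET` commands give the same key-with-TTL semantics. The key naming scheme and TTL are assumptions for illustration:

```python
import time

class InMemoryStore:
    """Stand-in for a Redis client so this sketch needs no server.

    setex stores a value under a key with a time-to-live, mirroring
    Redis's SETEX command; get returns None once the TTL has expired.
    """
    def __init__(self):
        self._data = {}

    def setex(self, key, ttl_seconds, value):
        self._data[key] = (value, time.time() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or entry[1] < time.time():
            return None
        return entry[0]

def store_session(store, user_id, session_key, ttl_seconds=3600):
    # Any machine in the cluster writes the key to the central store...
    store.setex(f"session:{user_id}", ttl_seconds, session_key)

def fetch_session(store, user_id):
    # ...and any other machine fetches it from there, with no
    # peer-to-peer handshake between machines required.
    return store.get(f"session:{user_id}")

store = InMemoryStore()
store_session(store, "u-17", "abc123")
```

Centralizing the key removes the boot-time handshake from the picture entirely: machines only ever talk to Redis, never to each other.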

One more problem we faced when working on Freshpipe had to do with the way we used MongoDB, the document database behind the apps. We were continuing to run the system without the team that knew the old code base—it had been written by the previous Pipemonk team, whose members had either left or shifted to other projects. Also, over the course of about two years, the database had grown quite large—to around 3TB. Eventually, the database would not load, and we were unable to write anything to it.

To address this issue, we dropped data older than 30 days as well as certain “collections” in MongoDB that were no longer in use. In addition, we removed unused “indexes”.
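The retention rule itself was simple: anything older than the cutoff goes. Sketched in Python with a MongoDB-style query filter—the field name `updated_at` is an assumption, and with PyMongo a filter like this would be passed to `Collection.delete_many` (while `drop_collection` and `drop_index` remove unused collections and indexes):

```python
from datetime import datetime, timedelta, timezone

def stale_document_filter(now, max_age_days=30):
    """Build a MongoDB-style filter matching documents older than the cutoff."""
    cutoff = now - timedelta(days=max_age_days)
    # With PyMongo: db.some_collection.delete_many(filter_below)
    return {"updated_at": {"$lt": cutoff}}

now = datetime(2019, 8, 1, tzinfo=timezone.utc)
stale_filter = stale_document_filter(now)
```

Running a purge like this on a schedule keeps the working set bounded instead of letting the database creep back toward 3TB.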

Earlier, our tech stack comprised five components—Redis, Hazelcast, MySQL, Mongo, and RabbitMQ. In the new architecture, we reduced it to just two, MySQL and Redis. This has really simplified and sped up the system and is symbolic of the democratic design practices we follow at Freshworks.

The future is in the pipeline 

The biggest item on our roadmap is moving to real-time data syncs. Until now, we have been performing syncs every 12 hours. Initially, we thought of reducing this to three or four hours—but why not go all the way to real time? By the end of Q1, we are looking to move toward real-time syncing for the five integrations we have built thus far: Freshsales-Freshdesk, Freshdesk-Hubspot, Freshsales-Mailchimp, Freshsales-Hubspot, and Freshteam-Freshservice.

Another important step in the works is opening our platform to marketplace developers. At the moment, only the Freshpipe team writes and publishes sync apps. But very soon, we’ll be enabling Freshworks marketplace developers to use Freshpipe to create sync apps of their own. We are exposing REST APIs that developers can leverage to write those apps. This, we believe, will truly unlock the potential of Freshpipe, as people will be able to write sync apps for any use case specific to the products they use.

By the end of the year, we expect 20-30 sync apps to be built on top of our platform. We also want the time it takes a developer to write and publish a sync app—currently close to a month—to drop to no more than a week. And with the new building blocks we have put in place, that should be a breeze.

Co-author: Vijay Lakshminarayanan
Written by:
Sanjay Gupta

Sharon Kumar

Senior Software Engineer