Handling race conditions on a MySQL JSON column in Rails

Freshservice provides intelligent, unified, easy-to-use software that helps businesses modernize employee experience, maximize uptime, and extend services beyond IT. It is a SaaS product powered by Ruby on Rails and is backed by a MySQL database for persistence storage. Incidents is one of the core modules of Freshservice where thousands of write operations happen per minute. We handle a lot of structured and unstructured data, and for certain use cases we make use of the JSON column to store the unstructured data. The writes to a JSON column are prone to data inconsistencies in highly concurrent applications if they aren’t handled properly. We’ll see here how we can handle this issue during concurrent requests. 

Rails’ way of handling updates

Whenever any record is updated, Rails internally tracks the changes made for the particular model using ActiveModel::Dirty. It proceeds with the update only if there are any changed attributes. In the case of a JSON column, Rails does not consider the specific updates that were made to the JSON object as dirty changes, but instead considers the entire JSON object to be part of dirty changes. Also, Rails does not have native methods, which update only the modified keys in a JSON object. Let’s see this with an example.

ticket = Ticket.find_by(id: 1)
ticket.json_column
=> {"city1" => "Mumbai", "city2" => "Chennai"}
ticket.json_column = ticket.json_column.merge({"city1" => "Bangalore", "city3" => "Hyderabad"})

ticket.changes
=> {"json_column"=>[{"city1"=>"Mumbai", "city2"=>"Chennai"}, {"city1"=>"Bangalore", "city2"=>"Chennai", "city3"=>"Hyderabad"}]}

ticket.save
-> UPDATE `ticket` SET `json_column` = '{\\"city1\\":\\"Bangalore\\",\\"city2\\":\\"Chennai\\",\\"city3\\":\\"Hyderabad\\"}' WHERE `ticket`.`id` = 1

ticket.reload.json_column
=> {"city1" => "Bangalore", "city2" => "Chennai", "city3" => "Hyderabad"}

Here we can see that even though we didn’t modify the “city2” key, it is considered as part of the final update query, where the entire JSON object is set again. In short, a patch (or) selective update of the JSON column didn’t happen. This way of updating works fine most of the time, until we get to see the actual issue in concurrent requests.

Issue with concurrent requests

Let’s consider the below scenario, where the requests are concurrent. In both the requests, the ticket is loaded at the same instance, but the request-1 updates before request-2.

def request1
         ticket = Ticket.find_by(id: 1)
         ticket.json_column = ticket.json_column.merge({"city1" => "Mumbai",                "city3" => "Hyderabad"})
          sleep(2)
          ticket.save
          p ticket.reload.json_column

end

def request2

         ticket = Ticket.find_by(id: 1)
         ticket.json_column = ticket.json_column.merge({"city1" => "Delhi", "city4" => "Kolkata"})
          sleep(3)
          ticket.savE
          p ticket.reload.json_column

end

ticket = Ticket.find_by(id: 1)
p ticket.json_column
-> {"city1" => "Bangalore", "city2" => "Chennai"}

thread1 = Thread.new { request1 }
thread2 = Thread.new { request2 }
thread1.join
thread2.join

# Request1
-> UPDATE `ticket` SET `json_column` = '{\"city1\":\"Mumbai\",\"city2\":\"Chennai\",\"city3\":\"Hyderabad\"}' WHERE `ticket`.`id` = 1

# Request2
-> UPDATE `ticket` SET `json_column` = '{\"city1\":\"Delhi\",\"city2\":\"Chennai\",\"city4\":\"Kolkata\"}' WHERE `ticket`.`id` = 1

p ticket.reload.json_column
-> {"city1" => "Delhi", "city2" => "Chennai", "city4" => "Kolkata"}

We can see here that the “city3” added in the first request is missing due to the update done in the second request. The second request is unaware of the latest changes to the JSON column and updates based on the state of the column during the initial load.

Ways of handling the race condition

The simplest way to handle the race condition is by using the JSON_MERGE_PATCH function of MySQL. More about JSON_MERGE_PATCH is explained in this link. Let’s consider the same concurrent requests scenario by updating the JSON column using JSON_MERGE_PATCH.

def request1
        ticket = Ticket.where(id: 1)
        new_value_to_be_updated = {"city1" => "Mumbai", "city3" => "Hyderabad"}.to_json
         ticket.update_all(["json_column = JSON_MERGE_PATCH(json_column, ?)", new_value_to_be_updated])
         sleep(2)
         p ticket.reload.json_column
end
def request2
         ticket = Ticket.where(id: 1)
          new_value_to_be_updated = {"city1" => "Delhi", "city4" => "Kolkata"}.to_json
          ticket.update_all(["json_column = JSON_MERGE_PATCH(json_column, ?)", new_value_to_be_updated])
          sleep(3)
          p ticket.reload.json_column

end

ticket = Ticket.find_by(id: 1)
p ticket.json_column
-> {"city1" => "Bangalore", "city2" => "Chennai"}

thread1 = Thread.new { request1 }
thread2 = Thread.new { request2 }
thread1.join
thread2.join

# Request1
-> UPDATE `ticket` SET `json_column` = JSON_MERGE_PATCH(`json_column`, '{\"city1\":\"Mumbai\",\"city3\":\"Hyderabad\"}') WHERE `ticket`.`id` = 1

# Request2
-> UPDATE `ticket` SET `json_column` = JSON_MERGE_PATCH(`json_column`, '{\"city1\":\"Delhi\",\"city4\":\"Kolkata\"}') WHERE `ticket`.`id` = 1

p ticket.reload.json_column
-> {"city1" => "Delhi", "city2" => "Chennai", "city3" => "Hyderabad", "city4" => "Kolkata"}

Note: The update_all ActiveRecord method used here doesn’t instantiate the model as well as trigger callbacks/validations.

We can see here that the JSON merge is handled at the MySQL engine level instead of the merge at the Ruby level. When the merge is done using Ruby, we lose track of the latest data in db during concurrent requests.

The other way of handling the race condition is by using the Optimistic Locking functionality of Rails, where the updates are restricted by raising ActiveRecord::StaleObjectError if there is a change in the lock version, but this needs handling of the error to update the column again.