Learning from Python app to Golang app migration

We have a legacy Python application responsible for processing Kafka events. The Python app processes the event and updates the results in the database, Redis, and other services. The performance of this app is crucial, and the processing of each event has to happen in a few milliseconds. This service’s growth over the previous few years has been exponential. We tried to optimize the Python app, but we needed more. 

The Python app’s performance was slow. Under heavy load, it wouldn’t process fast enough and Kafka consumer lag would increase. Some of the reasons were:

  1. The initial Python app was written in non-async mode
  2. Using CPU profiling, we found that detailed logging consumed lots of CPU cycles
  3. These benchmarks show that Python is generally slower than Golang

Non-performance-related reasons were:

  1. The Python app relied on custom code to observe and log the metrics. The latest observability solutions, like Prometheus, were not integrated
  2. The Python app had low test coverage. This made the overall feature release very slow

Golang has a wide range of features that complement the requirements of modern cloud computing: 

  1. Golang has built-in support for concurrency
  2. Golang is a statically typed language. This allows the compiler to optimize the code more aggressively
  3. Golang has a wide range of library support required for our use case
  4. Golang has a built-in testing framework. Adding test cases from the initial development phase helps with faster development
  5. Golang offers cross-platform compilation, allowing developers to build and compile applications for multiple architectures and operating systems. See the complete list here

We chose to migrate our application to Golang due to its simplicity of development and potential future needs. 

Migration

The Python app was not converted to Golang all at once. Each function was converted individually. Golang has a different naming convention. Some Python functions and methods had names that were not clear or descriptive. These names were changed to make them more understandable and easier to use.

To ensure that we could track which Python function or method had been developed in Golang, we added the Python function or method name to the Golang code documentation. This allowed us to cross-check the code in both languages.

In Python, coroutines were written with asyncio library. 

import asyncio

async def hello():

print(“Hello, world!”

async def main():

task = asyncio.create_task(hello())

await task

if __name__ == “__main__”:

asyncio.run(main())

They were converted to goroutines in Golang.

package main

import (

“fmt”

“time”

)

func hello() {

fmt.Println(“Hello, world!”)

}

func main() {

go hello()

time.Sleep(1 * time.Second)

}

At regular intervals, we were looking at the processing speed. A Python script with coroutines was used to generate events for load testing. Python’s async library creates coroutines every second. Each coroutine will independently generate a specified number of events.

import asyncio

class EventProducer:

def __init__(self):

self.loop = asyncio.get_event_loop()

async def produce_events(self, count:int):

“””

We will call kafka producer to publish the event.

“””

pass

async def periodic_produce(self, msg_per_sec: int, num_secs: int, ) -> None:

“””

A coroutine function that periodically produces messages at a certain rate for a specified number of seconds.

Args:

msg_per_sec (int): The number of messages to produce per second.

num_secs (int): The total number of seconds to produce messages for.

Returns:

None.

“””

for _ in range(num_secs):

tasks = []

tasks.append(self.loop.create_task(self.produce_events(msg_per_sec), name=“produce”))

await asyncio.sleep(1.0)

await asyncio.gather(*tasks)

CPU profiling helped in choosing the functions and methods that mattered most. This code will store the pprof file in a pod after 10 minutes. We would conduct our test in those 10 minutes.

package main

import (

“log”

“os”

“runtime/pprof”

“time”

)

func run(){

// actual code here

time.Sleep(10 * time.Minute)

}

// This Golang code will do CPU profling for first 10 min

func main() {

f, perr := os.Create(“/tmp/cpu.pprof”)

if perr != nil {

log.Fatal(perr)

}

pprof.StartCPUProfile(f)

time.AfterFunc(10*time.Minute, func() {

pprof.StopCPUProfile()

log.Println(“Stopping CPU Profile”)

})

run()

}

Then we copied the pprof file to local to visualize the CPU-intensive functions and methods using the following bash commands:

kubectl cp <namespace>/<pod-name>:/tmp/cpu.pprof ~/Downloads/cpu.pprof

go tool pprof-http=”:8000″ ~/Downloads/cpu.pprof

python golang code
This is the flame graph of the initial Golang code. Function block width shows the time spent in that function. The darker the block’s color, the more CPU cycles it consumes. This helps focus on areas to improve performance.

We verified our code for race conditions.

go run -race -tags musl cmd/app/*

We started small, and as each feature was completed, we added documentation and test cases to it. We later automated the code testing with Jenkins and GitHub.

Difficulties

We were reading the Python code and converting it to Golang. Existing logic has to be converted line by line. But there were some difficulties, such as:

  1. In Python, we can create and return complex structures on the fly. This becomes the difficult part in Golang.
    • In Golang, it is essential to define a struct before utilizing it in any function, which contrasts with the flexibility of Python, where any type of data can be returned and the code will execute. While writing code, this difference can be frustrating as it disrupts the flow and continuity of development
    • In Golang, we have to stop writing the logic and decide whether the data type is reused or whether we have to create a new one. This becomes difficult with a large code base
  2. In Python, it’s easy to work with JSON. We can treat JSON data as a dictionary and write our code accordingly.
    • In Golang, we must first understand the JSON code’s structure and write an equivalent Golang struct
    • In Golang, creating a struct from a large, nested JSON document is difficult. Using an online or offline tool to convert a large, nested JSON document into Golang code is useful
    • In Golang, we have to be sure about the type of values. Writing multiple marshal and unmarshal methods for custom types is overwhelming for those coming from Python. This also helped to make sure we were not reading or writing to unwanted data types

Learnings

  1. Golang is a statically typed language. VS Code will continuously inform us if our code is not compiling. Most of the runtime errors were related to returning an empty array or null pointer from a function
  2. Using Docker-Compose at the start of this project was useful to maintain development speed. Writing test cases helped us quickly evaluate our changes
  3. We did CPU profiling to understand which part of the code is used most
  4. We implemented time metrics in Prometheus format and utilized Grafana to visualize them. This proved to be highly beneficial in analyzing the performance of our application

Scaling results

We were able to get an observable increase (5x) in performance by doing the following things:

  1. We developed a Golang codebase with synchronous processing to ensure careful handling of goroutines. The initial sync code achieved a processing speed comparable to the initial Python code
  2. Upon incorporating goroutines, our application was faster than the Python application
  3. We significantly improved by transitioning from auto-commit to manual async-commit in Kafka
  4. We implemented a caching mechanism using Redis to store frequently used DB query results. This caching mechanism significantly enhanced our processing capabilities
  5. We avoided logging nested structs and only logged them when it was required. Serialization and formatting of nested structs are expensive operations 

In conclusion, converting Python code to Golang was a challenging but rewarding experience. I think Golang is a great language for building fast and scalable web applications.