How a Simple Code Change Reduced CPU Usage by 97%

When a Routine Migration Becomes an Accidental Performance Win

Most engineering teams dread infrastructure migrations. They are tedious, time-consuming, and carry the constant risk of breaking something in production. Occasionally, though, a migration uncovers something more valuable than a runtime upgrade: a hidden inefficiency that has been quietly draining resources for years. That is what happened to the engineering team at Duolingo when they upgraded one of their production services from Python 3.9 on Alpine Linux to Python 3.12 on Debian Bookworm. What started as a routine task ended with a 97% reduction in CPU usage from a single code change.

The Service Behind the Story

The service at the center of this story handles SMS delivery for Duolingo users in China. This is no trivial workload. The service sends time-sensitive messages such as streak notifications, so delays or failures have a direct and measurable impact on the user experience. Phone numbers are stored in encrypted form for security, and the service runs on Alibaba Cloud Container Service for Kubernetes (ACK). Its primary workload is a nightly batch job that starts around 23:00 Beijing time.

On the surface, the service appeared to be working just fine before the migration. There were no obvious performance complaints, no visible bottlenecks, and no reason for the team to suspect anything was wrong under the hood. The decision to migrate was driven purely by the desire to stay current with a modern Python runtime and a more stable base image — standard engineering hygiene. Nobody expected to find anything alarming.

The Migration That Broke Everything (Temporarily)

After the upgrade was deployed and the next scheduled nightly batch ran, the monitoring dashboards lit up almost immediately. Within a six-minute window, the team was hit with a cascade of alerts: high latency, upstream 5xx errors, and frequent pod restarts. The cluster was struggling to keep up with what should have been a completely normal workload. The timing was impossible to ignore — the alerts started precisely when the upgraded service began processing its batch job, making the correlation between the migration and the failures clear.

The team began investigating. CPU usage was through the roof. The pods were being overwhelmed and crashing under a load they had handled without issue just 24 hours earlier, running on the old Python 3.9 and Alpine stack. Something about the new environment was causing the service to consume dramatically more compute resources than before. But what had changed? The application code itself had not been touched.

Digging Into the Root Cause

The team profiled the application's behavior in both environments. What they found was surprising: the culprit was neither the Python version itself nor anything introduced by the Debian base image. Instead, the excessive CPU consumption traced back to an inefficiency in how the application handled a core operation. That operation had been wasteful all along, but its cost stayed invisible until the new environment shifted the performance characteristics enough to push it over the edge.
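The article does not say which profiler the team used. As a minimal sketch of the kind of profiling described above, the standard library's cProfile can show which functions dominate a batch run. Everything here (expensive_setup, handle_record, run_batch, profile_batch) is a hypothetical stand-in, not Duolingo's code.

```python
import cProfile
import hashlib
import io
import pstats


def expensive_setup() -> bytes:
    # Hypothetical stand-in for costly work that is repeated on every call.
    return hashlib.pbkdf2_hmac("sha256", b"secret", b"salt", 20_000)


def handle_record(record: str) -> str:
    # Each record triggers the expensive setup again.
    key = expensive_setup()
    return f"{record}:{key[:2].hex()}"


def run_batch(records: list[str]) -> list[str]:
    return [handle_record(r) for r in records]


def profile_batch(records: list[str]) -> str:
    """Run the batch under cProfile and return a report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    run_batch(records)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)  # show the top 10 entries
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_batch([f"record-{i}" for i in range(50)]))
```

In a report like this, a helper that is called once per record and accounts for most of the cumulative time is a good candidate for closer inspection.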

The inefficiency was in how the service decrypted phone numbers at scale. During the nightly batch, the service decrypts a large number of stored phone numbers in order to send SMS messages, and the decryption code was written in a way that did far more computational work than necessary on each call. In the old Python and Alpine environment, this overhead existed but was masked by the performance profile of that stack. With Python 3.12 on Debian, subtle differences in how the runtime handled certain operations amplified the inefficiency to a breaking point.

Once the bottleneck was identified, the fix itself was remarkably simple. The team refactored the decryption logic to eliminate unnecessary repeated operations, ensuring that the cryptographic work was done efficiently and only as many times as genuinely needed. The change was small in terms of lines of code, but enormous in terms of impact.
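The article does not spell out which operation was being repeated, so the following is only an illustrative sketch of the general pattern: an expensive setup step that runs on every decryption call is moved so that it runs once and is reused. The key derivation, the toy XOR transform (which is not real encryption), and all names are assumptions made for the example.

```python
import hashlib
from functools import lru_cache

PASSPHRASE = b"example-passphrase"  # hypothetical secret
SALT = b"example-salt"              # hypothetical salt
ITERATIONS = 10_000

# Counts how often the expensive step actually runs.
stats = {"derive_calls": 0}


def derive_key() -> bytes:
    """Expensive setup step; a stand-in for whatever work was being repeated."""
    stats["derive_calls"] += 1
    return hashlib.pbkdf2_hmac("sha256", PASSPHRASE, SALT, ITERATIONS)


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """Toy reversible transform, NOT real encryption; a placeholder for a cipher."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def decrypt_slow(ciphertext: bytes) -> str:
    # Repeats the expensive derivation on every single call.
    return xor_with_key(ciphertext, derive_key()).decode()


@lru_cache(maxsize=1)
def cached_key() -> bytes:
    # Runs the derivation once per process and reuses the result afterwards.
    return derive_key()


def decrypt_fast(ciphertext: bytes) -> str:
    # Same output as decrypt_slow, but the setup cost is paid only once.
    return xor_with_key(ciphertext, cached_key()).decode()
```

Both functions return the same plaintext. The difference is that a batch of N records triggers N derivations through decrypt_slow and only one through decrypt_fast, which is the shape of saving the article describes.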

The Results: 97% Less CPU, Same Workload

After deploying the optimized code, the difference was dramatic and immediate. CPU usage dropped by 97% compared to what the service had consumed during the failing batch run on the new stack. More importantly, when the team compared against historical metrics from the old Python 3.9 environment, it became clear that the service had been carrying significant unnecessary overhead for a long time. The migration had not introduced the problem; it had made a pre-existing problem impossible to ignore.

The nightly batch now completes successfully, pod restarts are gone, latency is back to normal, and the cluster resources previously consumed by this service have been substantially freed up for other workloads.

Key Lessons Every Engineering Team Should Take Away

Migrations can be diagnostic tools. Upgrading a runtime or base image changes the performance envelope of your application. Hidden inefficiencies that were previously tolerable may become critical failures in a new environment — and that is not necessarily a bad thing. It forces you to confront technical debt you might never have found otherwise.
Profile before you assume. When a migration causes unexpected behavior, resist the temptation to treat a rollback as the fix. The new environment may be exposing a real problem in your code that rolling back will simply hide again.
Small code changes can have massive impact. A 97% CPU reduction from a single targeted fix is a powerful reminder that performance optimization does not always require architectural overhauls. Sometimes the biggest wins come from fixing one thing the right way.
Monitor at the operation level, not just the service level. If the team had been tracking CPU cost per individual operation rather than aggregate service metrics, the decryption inefficiency might have been spotted years earlier.
Cryptographic operations deserve scrutiny at scale. When encryption or decryption is applied to large datasets in batch jobs, even small per-operation inefficiencies multiply rapidly. These operations should be reviewed with the same performance lens applied to database queries or network calls.
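As one possible way to act on the operation-level monitoring lesson above, the sketch below accumulates CPU time per named operation using only the standard library. The names track_cpu, cpu_seconds, call_counts, and summarize are hypothetical, and a production service would more likely export these values to its existing metrics system than keep them in process memory.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated CPU seconds and call counts, keyed by operation name.
cpu_seconds = defaultdict(float)
call_counts = defaultdict(int)


@contextmanager
def track_cpu(operation: str):
    """Record the process CPU time spent inside the with-block."""
    start = time.process_time()
    try:
        yield
    finally:
        cpu_seconds[operation] += time.process_time() - start
        call_counts[operation] += 1


def summarize() -> list[tuple[str, int, float, float]]:
    """Return (operation, calls, total CPU s, average CPU s), costliest first."""
    rows = [
        (name, call_counts[name], total, total / call_counts[name])
        for name, total in cpu_seconds.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for _ in range(100):
        with track_cpu("decrypt_phone_number"):  # hypothetical operation name
            sum(i * i for i in range(10_000))    # placeholder workload
    for name, calls, total, average in summarize():
        print(f"{name}: calls={calls} total={total:.4f}s avg={average:.6f}s")
```

Note that time.process_time measures CPU time for the whole process, so in a multithreaded service the per-operation figures are approximate. Even so, a per-operation average that is far higher than expected is the kind of signal that aggregate service metrics tend to hide.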

The Broader Takeaway for Backend Performance

Duolingo's story is a compelling example of how performance optimization opportunities often hide in plain sight, masked by the quirks of a familiar environment. It took a migration, an event most engineers approach with caution rather than excitement, to reveal an inefficiency that had been silently costing the company compute resources for years. The fact that the fix was a simple code change makes the story even more instructive. You do not always need to scale horizontally, add caching layers, or redesign your architecture to achieve meaningful performance gains. Sometimes, you just need to look more carefully at what your code is actually doing and ask whether it needs to do all of it.

For engineering teams running batch workloads at scale, especially those involving encryption, external API calls, or heavy data processing, this case study is a strong argument for regular profiling, even when everything appears to be working. The absence of visible problems is not the same as the absence of inefficiency. As Duolingo's team discovered, the most impactful optimization you will ever make might already be hiding somewhere in your codebase, waiting for the right conditions to make itself known.