Low-hanging fruits to improve performance in your Python code

The Internet is full of advice on how to improve Python performance. The problem is that most of those solutions require hours of optimization work. They certainly work for the specific problems they target, but here I'll focus on low-hanging fruit: ways to improve performance without overthinking.

Why should I care about performance?

This might be an unpopular opinion, but if you're writing your first small application or have just started learning the language, the short answer is: you shouldn't. But if you're working on an application that handles, or eventually will handle, dozens of requests per second, or on a critical part of a system that has to be fast, then the only way to write that code is to make it as performant as possible.

It matters especially at scale. A 1% improvement won't be visible on a single request or function call, but it makes a huge difference if the system is used worldwide by millions of users, or if your code is executed thousands of times across different projects, for example as part of an open source library. At that scale, a 1% improvement can save hours of CPU time, which reduces the cost of running the code (cloud computing is expensive).

At this point, you understand the importance of writing performant code. Let's dive into examples of how to do it.

EAFP

EAFP stands for "Easier to ask for forgiveness than permission". Definition from the official Python docs:

This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many try and except statements. The technique contrasts with the LBYL style common to many other languages such as C.

[source]

The code below shows the principle in practice.

from datetime import datetime

def func1(arr):
    for el in arr:
        if isinstance(el, dict):
            el["datetime"] = datetime.now()

def func2(arr):
    for el in arr:
        try:
            el["datetime"] = datetime.now()
        except TypeError:
            continue

As you can see, func1 checks on every iteration whether el is a dict. If it is, the datetime value is calculated and assigned. func2 implements the EAFP principle instead: it tries to assign the value without checking the type first, and if that fails (a TypeError is raised when the assignment is impossible, in this case when el is not a dict) it just handles the error and continues. The key point is that both functions produce exactly the same result here and differ only in implementation.

Let's measure how long each function takes on a list of 10,000,000 (ten million) dicts and 100,000 (one hundred thousand) integers, which don't support item assignment.

import time

def time_measure(func, arguments):
    dt_start = time.perf_counter()
    func(arguments)
    return time.perf_counter() - dt_start  # in seconds

>>> time_measure(func2, arguments)
3.6360508750076406
>>> time_measure(func1, arguments)
4.4113311669789255

func1 takes ~4.41s to complete while func2 takes only ~3.64s. That's a difference of ~0.78s, so func2 is faster by ~21% (func1 takes about 1.21 times as long)! 🚀🚀🚀
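The input list itself isn't shown, so here is a minimal sketch of how it could be built, and a check that both functions really touch the same elements. The helper build_arguments, the shuffling, and the 100x smaller sizes are assumptions made for illustration; the original benchmark may have laid out its data differently.

```python
import random
from datetime import datetime


def func1(arr):  # LBYL version from the article
    for el in arr:
        if isinstance(el, dict):
            el["datetime"] = datetime.now()


def func2(arr):  # EAFP version from the article
    for el in arr:
        try:
            el["datetime"] = datetime.now()
        except TypeError:
            continue


def build_arguments(n_dicts, n_ints, seed=0):
    """Hypothetical helper: a shuffled list of empty dicts and integers."""
    items = [{} for _ in range(n_dicts)] + list(range(n_ints))
    random.Random(seed).shuffle(items)  # fixed seed, so both lists have the same order
    return items


# Scaled down 100x from the article's sizes so the sketch runs quickly.
args_lbyl = build_arguments(100_000, 1_000)
args_eafp = build_arguments(100_000, 1_000)
func1(args_lbyl)
func2(args_eafp)

# For each position: is it a dict that received a "datetime" key?
touched_lbyl = [isinstance(el, dict) and "datetime" in el for el in args_lbyl]
touched_eafp = [isinstance(el, dict) and "datetime" in el for el in args_eafp]
print(touched_lbyl == touched_eafp)
```

Each function gets its own freshly built list so that neither one runs on data the other has already modified.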

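It's because a try block costs almost nothing in Python when no exception is raised, while the isinstance check is an extra function call on every single element. Only about 1% of the elements here actually raise a TypeError. Now you are armed with one powerful tool to improve the performance of your code.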

Disclaimer: Raising and catching an exception is comparatively expensive, so EAFP might not come out ahead in a scenario where failures are common.
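To see how much the disclaimer matters, you can flip the ratio so that almost every element fails. The sketch below reuses the article's functions and timing helper; the list mostly_ints and its sizes are assumptions chosen for illustration. No timings are quoted here because they vary by machine and Python version, but with this many raised exceptions the EAFP version would be expected to lose its advantage.

```python
import time
from datetime import datetime


def func1(arr):  # LBYL version from the article
    for el in arr:
        if isinstance(el, dict):
            el["datetime"] = datetime.now()


def func2(arr):  # EAFP version from the article
    for el in arr:
        try:
            el["datetime"] = datetime.now()
        except TypeError:
            continue


def time_measure(func, arguments):
    dt_start = time.perf_counter()
    func(arguments)
    return time.perf_counter() - dt_start  # in seconds


# Assumed unfavourable case for EAFP: 100 integers for every dict,
# so nearly every iteration of func2 raises and catches a TypeError.
mostly_ints = list(range(200_000)) + [{} for _ in range(2_000)]

t_lbyl = time_measure(func1, mostly_ints)
t_eafp = time_measure(func2, mostly_ints)
print(f"LBYL: {t_lbyl:.4f}s, EAFP: {t_eafp:.4f}s")
```

Run it on your own data shape before picking a style: the ratio of failing to succeeding elements is what decides the outcome.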

Let's see another one.

List comprehensions

Definition from the official Python docs:

List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition. [source]

Cool, it looks like just syntactic sugar, but is it?

No, it also provides a more performant way of creating lists. But never trust words found on the internet. Let's measure it:

def func3(num):
    arr = []
    for _ in range(num):
        arr.append(datetime.now())
    return arr

def func4(num):
    return [datetime.now() for _ in range(num)]

Both functions do exactly the same thing: they create a list of the length given by the argument num, with each element holding the current datetime. So if list comprehensions are just a "concise way to create a list", then the time to perform the operation should be almost the same. But it's not:

>>> time_measure(func3, 10000000)
3.2937017909716815
>>> time_measure(func4, 10000000)
3.1355122089735232

As you can see, it takes ~3.29s to execute func3 and ~3.14s to execute func4. That means func4 is faster by ~0.16s, a performance increase of ~5%! 🚀🚀🚀
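A 5% gap is small enough that a single run can be distorted by whatever else the machine is doing. One way to reduce that noise, sketched below, is to repeat the measurement with the standard timeit module and keep the fastest run. The helper best_of and the smaller input size are assumptions for illustration, not part of the original benchmark.

```python
import timeit
from datetime import datetime


def func3(num):  # for loop with .append, as in the article
    arr = []
    for _ in range(num):
        arr.append(datetime.now())
    return arr


def func4(num):  # list comprehension, as in the article
    return [datetime.now() for _ in range(num)]


def best_of(func, num, repeat=5):
    """Hypothetical helper: fastest of several single runs, in seconds."""
    return min(timeit.repeat(lambda: func(num), number=1, repeat=repeat))


n = 100_000  # scaled down from the article's 10,000,000 to keep the sketch quick
t_loop = best_of(func3, n)
t_comp = best_of(func4, n)
print(f"for loop: {t_loop:.4f}s, comprehension: {t_comp:.4f}s")
```

Taking the minimum rather than the average is a common choice for micro-benchmarks, because slower runs are usually caused by interference rather than by the code itself.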

So list comprehensions are not only a "concise way to create a list", they also boost performance. In the for loop (func3), the list is built by calling .append on every iteration, and each of those method lookups and calls costs time. The list comprehension skips that call: CPython appends each element with a dedicated bytecode instruction instead.
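You can check this yourself with the standard dis module. The sketch below collects the opcode names used by each version and looks for LIST_APPEND, the CPython instruction that comprehensions use to add an element. The function names and the opnames helper are made up for this example, and bytecode is a CPython implementation detail, so other interpreters may differ.

```python
import dis
from datetime import datetime


def loop_version(num):
    arr = []
    for _ in range(num):
        arr.append(datetime.now())
    return arr


def comprehension_version(num):
    return [datetime.now() for _ in range(num)]


def opnames(code):
    """Collect opcode names from a code object and any code objects nested in it."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        # Before Python 3.12 the comprehension body lives in a nested code object.
        if hasattr(const, "co_code"):
            names |= opnames(const)
    return names


loop_ops = opnames(loop_version.__code__)
comp_ops = opnames(comprehension_version.__code__)

print("loop uses LIST_APPEND:", "LIST_APPEND" in loop_ops)
print("comprehension uses LIST_APPEND:", "LIST_APPEND" in comp_ops)
```

Calling dis.dis(loop_version) prints the full listing if you want to see the attribute lookup and call that the loop performs on every iteration.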

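A cool bonus is that this doesn't apply only to lists: the same syntax works for dicts and sets, and generator expressions look almost identical. There is no tuple comprehension, but you can pass a generator expression to tuple().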
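A quick illustration of the related forms, using a made-up list of words:

```python
words = ["alpha", "beta", "gamma", "beta"]  # sample data for illustration

# Dict comprehension: word -> length (duplicate keys collapse into one).
lengths = {w: len(w) for w in words}

# Set comprehension: unique upper-cased words.
unique = {w.upper() for w in words}

# Generator expression: values are produced lazily, no intermediate list.
total = sum(len(w) for w in words)

# No tuple comprehension exists; feed a generator expression to tuple().
as_tuple = tuple(len(w) for w in words)

print(lengths, unique, total, as_tuple)
```

Whether these give the same speedup as the list case isn't measured here, so benchmark them if it matters.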

Summary

As you can see, simple techniques can give you noticeable performance improvements in Python code.

EAFP (Easier to ask for forgiveness than permission) relies on the fact that a try block in Python is cheap when nothing goes wrong. In the provided example it boosted performance by ~21%.

List comprehensions are not only a concise way of creating lists in Python, they also reduce the number of function calls, which directly improves performance. In the provided example it was a ~5% increase.

Besides EAFP and list comprehensions, there are many other ways to reduce execution time, for example pre-compiled regexes, LRU caches (functools.lru_cache) and C extensions. They can be a little more complicated and aren't ideal for every case.