Python mutable default arguments - why the snake bites?


I really like to ask a simple-looking question during interviews: what does the following function return if you call it three times in a row?

def func(a=[]):
  a.append(1)
  return a

Take a moment, don't go straight to the answer.

answer:

>>> func()
[1]
>>> func()
[1, 1]
>>> func()
[1, 1, 1]
>>>

If you aren't a seasoned Python developer, or didn't prepare well for the most common interview questions, you might end up with:

Wut? 🤷‍♂️

In this article I'll explain why a mutable default value in Python isn't the best option unless you really, really know what you are doing.

Memory

Python's memory management happens seamlessly behind the scenes, freeing developers from the burden of manual memory allocation and deallocation. In CPython, memory is reclaimed primarily through reference counting: once an object's counter goes to zero, the object is erased from memory. Of course it's not that trivial: there are reference cycles, where objects point at each other so their counters never reach zero, and those are handled by a separate cyclic garbage collector.

  1. Reference counting: Python keeps a count of how many times an object is referenced. If the count drops to zero, the object is no longer needed, and Python can reclaim the memory.
  2. Garbage collection: Reference counting alone cannot free objects that refer to each other in a cycle, because their counts never drop to zero. CPython's cyclic garbage collector periodically looks for such unreachable groups of objects and removes them from the heap.
  3. Memory allocation: When you create a new object, Python asks the memory manager for a chunk of memory from the heap. The memory manager ensures there's enough space available and returns a pointer to the allocated memory.
  4. Memory deallocation: When an object is no longer needed, Python's memory manager marks the memory as free, making it available for future use.
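
The points above can be observed with a small experiment. This is only a sketch and it assumes CPython, where reference counting frees an object as soon as its last reference disappears; the Node class is a hypothetical helper introduced here because plain lists cannot be weakly referenced.

```python
import gc
import weakref


class Node:
    """Hypothetical helper class used only for this demonstration."""

    def __init__(self):
        self.other = None


# Reference counting: the object is freed when its last reference goes away.
n = Node()
alive = weakref.ref(n)            # a weak reference does not keep n alive
del n
freed_without_gc = alive() is None

# Reference cycle: two objects point at each other, so their counts
# never reach zero on their own.
gc.disable()                      # keep the automatic collector out of the demo
a, b = Node(), Node()
a.other, b.other = b, a
alive_a = weakref.ref(a)
del a, b
cycle_survives_del = alive_a() is not None

gc.collect()                      # an explicit collection finds the cycle
cycle_freed_by_gc = alive_a() is None
gc.enable()
```

On other Python implementations the timing of deallocation may differ, so treat the flags above as CPython behaviour rather than a language guarantee.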

What's important here is that all objects live on the heap, and variables are just names that refer to them. Because functions are first-class objects, they are stored on the heap as well. When the def statement runs, Python creates the function object, evaluates its default argument values once, and stores them on that function object.
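
A short sketch of what "names refer to objects" means in practice. The variable names first, second and alias are made up for this illustration.

```python
def func(a=[]):
    a.append(1)
    return a


# Two names, one list object: assignment copies the reference, not the list.
first = []
second = first
second.append(1)
shares_list = first is second      # both names point at the same object

# A function is an object too, so it can be bound to another name.
alias = func
same_function = alias is func
```

After this runs, first contains [1] even though only second was appended to, because there was only ever one list.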

Code

Let's break down the code line by line and analyze what's happening behind the scenes.

def func(a=[]):
  a.append(1)
  return a
  1. In the first line, we define the function func with a default argument a set to an empty list []. The list object is created on the heap once, when the def statement is executed, not on every call.
  2. The second line appends the integer 1 to the list a. Since lists are mutable, the list is modified in place.
  3. Finally, the modified list a is returned.
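
To see when the default expression is actually evaluated, here is a minimal sketch. make_default and evaluations are hypothetical names added for the experiment; they are not part of the original function.

```python
evaluations = []


def make_default():
    """Hypothetical helper: records every time the default expression runs."""
    evaluations.append("evaluated")
    return []


def func(a=make_default()):       # make_default() runs here, at definition time
    a.append(1)
    return a


count_after_def = len(evaluations)     # counted before any call to func

func()
func()
count_after_calls = len(evaluations)   # calling func does not re-run the default
```

Both counters end up equal to 1: the default expression ran when def was executed and never again.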

Now, let's dive deeper into the memory management aspects:

  • func is a function object, which is also allocated on the heap.
  • The default value of a is a list object, which is also stored on the heap. The function object keeps a reference to it for as long as the function exists.
  • When func is called without providing an argument, the name a is bound to that same default list.
  • Since the list a is mutable, each time func is called without an argument, the list is modified by appending a new element (in this case, the integer 1).
  • As a result, if func is called multiple times without an argument, the same list a on the heap will accumulate the appended elements.

In other words, calling func() multiple times modifies the same list on the heap, which leads to unexpected behaviour. Now you clearly understand that mutable default arguments are probably not what you'd like to have in your code 😸
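
You can check this yourself by looking at the function object. The sketch below relies on the standard __defaults__ attribute, which holds a tuple of the positional default values; the variable names are made up for the example.

```python
def func(a=[]):
    a.append(1)
    return a


stored_default = func.__defaults__[0]   # the list created when def ran
size_before = len(stored_default)

first = func()
second = func()

# Every no-argument call returns the very same list object.
same_object = first is second is stored_default
size_after = len(stored_default)

# Passing an explicit list leaves the stored default alone.
own = func([10])
```

Here size_before is 0 and size_after is 2, while own is [10, 1] and the stored default still holds only the two appended ones.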

Sentinel

Now that you understand why it happens and the underlying mechanisms, the next logical step is to ask: how can we overcome this?

The problem looks like this: we have a function func which changes the list a, and it's already used in production, so we can't change the contract. We want it to append 1 to the provided list and, if no list is provided, return a new list [1].

You've already seen the naive solution using an empty list as a default parameter. The correct solution requires adding a sentinel which handles the case in which no list is provided. Here's the correct implementation:

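def func(a=None):
  # None is the sentinel: it means "no list was provided"
  if a is None:
    a = []  # a fresh list is created on every such call
  a.append(1)
  return a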

Et voilà! 🚀 It meets the contract: it adds the element to the provided list, and if no list is provided, it creates a new one. Because the default is now None and a fresh list is created inside the function body on every call, no list can be shared between function calls.

>>> func()
[1]
>>> func()
[1]
>>> func()
[1]
>>> 
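
Both halves of the contract can be checked with a quick sketch. The names mine, returned and friends are illustrative only.

```python
def func(a=None):
    if a is None:
        a = []                     # a fresh list for every call without arguments
    a.append(1)
    return a


first = func()
second = func()
independent = first is not second  # each call got its own list

mine = [7]
returned = func(mine)
mutated_in_place = returned is mine   # the provided list itself was modified

stored_defaults = func.__defaults__   # only the immutable None is stored
```

The caller's list mine becomes [7, 1], and the function object stores nothing but None, so there is no shared state left to accumulate.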

In Conclusion


Using mutable default arguments in Python can lead to unexpected behavior and hard-to-spot bugs, because the default object is created once and lives as long as the function does. It's better to use an immutable default such as None and to create new objects inside the function.


Remember, when it comes to Python functions, it's better to be safe than sorry. ✅ Don't be the person who introduces non-obvious bugs to the project.



Maciej Marzęta

Founder of MarzTech. Python Technical Leader at unicorn startup. Crypto enthusiast and programmer for life. marzeta.pl
Cracow, Poland