Python mutable default arguments - why the snake bites?
I really like to ask a simple-looking question during interviews: what is the output of the given function?
def func(a=[]):
    a.append(1)
    return a
Take a moment; don't jump straight to the answer.
Answer:
>>> func()
[1]
>>> func()
[1, 1]
>>> func()
[1, 1, 1]
>>>
If you aren't a seasoned Python developer, or didn't prepare well for the most common interview questions, you might end up with:
Wut? 🤷♂️
In this article I'll explain why using a mutable default value in Python isn't the best option unless you really, really know what you are doing.
Memory
Python's memory management happens seamlessly behind the scenes, freeing developers from the burden of manual memory allocation and deallocation. In CPython, memory is reclaimed mainly through reference counting: once an object's reference count drops to zero, the object is freed. Of course it's not that trivial: objects can point at each other and form reference cycles (in graph terms, cycles in the graph of references), so their counts never reach zero on their own.
- Reference counting: Python keeps a count of how many times an object is referenced. If the count drops to zero, the object is no longer needed, and Python can reclaim the memory.
- Garbage collection: reference counting alone can't free objects that reference each other in a cycle. For those, CPython's cyclic garbage collector periodically scans container objects, finds groups that are no longer reachable from the program, and frees them.
- Memory allocation: When you create a new object, Python asks the memory manager for a chunk of memory from the heap. The memory manager ensures there's enough space available and returns a pointer to the allocated memory.
- Memory deallocation: When an object is no longer needed, Python's memory manager marks the memory as free, making it available for future use.
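A small CPython-specific sketch of the two mechanisms above: a weak reference lets us observe when an object is actually freed without keeping it alive. The Node class is a hypothetical helper. On other implementations such as PyPy, the first check may not be True right away, because they don't free objects through reference counting.

```python
import gc
import weakref


class Node:
    """Hypothetical helper: an object that can point at another Node."""

    def __init__(self):
        self.other = None


# Reference counting: the only strong reference is the name `a`.
a = Node()
probe_a = weakref.ref(a)  # a weak reference does not increase the count
del a                     # count drops to zero, CPython frees the object right away
print(probe_a() is None)  # True on CPython

# Reference cycle: x -> y -> x, so neither count can reach zero on its own.
x, y = Node(), Node()
x.other, y.other = y, x
probe_x = weakref.ref(x)
del x, y                  # the names are gone, but the cycle keeps both objects alive
gc.collect()              # the cyclic garbage collector finds and frees the cycle
print(probe_x() is None)  # True
```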
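What's important here is that all objects live on the heap; variables are just names that refer to them. Because functions are first-class objects, they live on the heap as well, and a function object keeps references to what it needs later, including its default argument values. Every call to func goes through the name func to that one function object.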
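Because functions are ordinary objects, we can inspect func directly. A minimal sketch, reusing the quiz function: in CPython the evaluated default values are exposed through the function's `__defaults__` attribute, and the function can be bound to other names like any object (`alias` and `registry` are just illustrative names).

```python
def func(a=[]):
    a.append(1)
    return a


# A function is an ordinary object: it has a type, an identity and attributes.
print(type(func).__name__)  # function
print(callable(func))       # True

# The function object keeps a reference to its already evaluated default values.
print(func.__defaults__)    # ([],)

# Like any object, a function can be bound to another name or stored in a container.
alias = func
registry = {"append_one": func}
print(alias is func and registry["append_one"] is func)  # True
```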
Code
Let's break down the code line by line and analyze what's happening behind the scenes.
def func(a=[]):
    a.append(1)
    return a
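- In the first line, we define the func function with a default argument a initialized to an empty list []. This creates a new list object on the heap, but only once: default values are evaluated when the def statement is executed, not on every call, and the function object keeps a reference to that list.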
- The second line appends the integer 1 to the list a. Since a is mutable, the list is modified in place.
- Finally, the modified list a is returned.
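To see that the default expression runs only once, we can replace the [] literal with a call that counts how often it is evaluated. `make_default` and `calls` are hypothetical names used only for this experiment.

```python
calls = 0


def make_default():
    """Hypothetical helper that counts how often the default is built."""
    global calls
    calls += 1
    return []


# make_default() is evaluated here, exactly once, when `def` executes.
def func(a=make_default()):
    a.append(1)
    return a


print(calls)   # 1: the list was built at definition time
func()
func()
func()
print(calls)   # still 1: the calls reused the same list
print(func())  # [1, 1, 1, 1]
```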
Now, let's dive deeper into the memory management aspects:
- func is a function object, which is also allocated on the heap.
- The default argument a is a list object, which is also stored on the heap.
- When func is called without providing an argument, it uses the same default list a that the function object references on the heap.
- Since the list a is mutable, each time func is called without an argument, the list is modified by appending a new element (in this case, the integer 1).
- As a result, if func is called multiple times without an argument, the same list a on the heap accumulates the appended elements.
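The points above can be checked with `is`, which compares object identity: calls without an argument keep handing back the same list, while an explicitly passed list is used instead and leaves the default alone. A minimal sketch reusing the original func (`first`, `second` and `mine` are illustrative names):

```python
def func(a=[]):
    a.append(1)
    return a


first = func()
second = func()

# Both calls returned the very same list object stored in the function's defaults.
print(first is second)          # True
print(id(first) == id(second))  # True
print(first)                    # [1, 1]

# An explicitly passed list is used instead of the default...
mine = func([0])
print(mine)                     # [0, 1]

# ...so that call leaves the shared default untouched.
print(func.__defaults__)        # ([1, 1],)
```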
In other words, calling func() multiple times modifies the same list a on the heap, leading to unexpected behaviour. Now you clearly understand that mutable default arguments are probably not what you'd like to have in your code 😸
Sentinel
Now that you understand why it happens and the underlying mechanisms, the next logical step is to ask: how can we overcome this?
The problem looks like this: we have a function func that modifies the list a, and it's already used in production, so we can't change its contract. We want it to append 1 to the provided list and, if no list is provided, to return a new list containing just 1, i.e. [1].
You've already seen the naive solution that uses an empty list as the default parameter. The correct solution requires a sentinel value that handles the case in which no list is provided. Here's the correct implementation:
def func(a=None):
    if a is None:  # sentinel: the caller didn't pass a list
        a = []     # create a fresh list for this call only
    a.append(1)
    return a
Et voilà! 🚀 It meets the contract: it adds the element to the provided list and, if no list is provided, creates a new one. The only default stored on the function object is None, which is immutable; the list is created inside the function body on each call that needs it, so it can't be shared between calls.
>>> func()
[1]
>>> func()
[1]
>>> func()
[1]
>>>
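To double-check both halves of the contract, here is a small sketch that exercises the fixed func with and without an argument. The names `existing`, `returned`, `first` and `second` are only for illustration.

```python
def func(a=None):
    if a is None:  # sentinel: no list was passed
        a = []     # a fresh list is created on every such call
    a.append(1)
    return a


# Contract, part 1: a provided list is modified in place and returned.
existing = [0]
returned = func(existing)
print(returned is existing, existing)  # True [0, 1]

# Contract, part 2: calls without an argument get independent new lists.
first, second = func(), func()
print(first is second, first, second)  # False [1] [1]

# The only stored default is the immutable sentinel.
print(func.__defaults__)  # (None,)
```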
In Conclusion
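Using mutable default arguments in Python can lead to unexpected behavior and subtle bugs: state silently leaks from one call into the next, and a shared default like the list above keeps growing for as long as the program runs. It's better to use immutable defaults (such as None) whenever possible, and to create new objects within functions instead of modifying existing ones where the contract allows it.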
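Here is a minimal sketch of that advice, assuming a hypothetical variant func_immutable that is allowed to return a new list instead of modifying its argument (so it does not keep the production contract from the Sentinel section). An immutable tuple default can be shared safely between calls, because nothing can change it.

```python
def func_immutable(a=()):
    """Hypothetical variant: returns a new list instead of mutating its input."""
    # The tuple default is immutable, so sharing it between calls is harmless.
    result = list(a)  # copy into a new list created on every call
    result.append(1)
    return result


original = [0]
print(func_immutable())             # [1]
print(func_immutable())             # [1]
print(func_immutable(original))     # [0, 1]
print(original)                     # [0]: the caller's list is not modified
print(func_immutable.__defaults__)  # ((),)
```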
Remember, when it comes to Python functions, it's better to be safe than sorry. ✅ Don't be the person who introduces non-obvious bugs into the project.
If you liked the post, subscribe to the newsletter to never miss new posts and to support this blog 😎