This post describes the CPython implementation of the list object. CPython is the most used Python implementation.
Lists in Python are powerful and it is interesting to see how they are implemented internally.
Following is a simple Python script appending some integers to a list and printing them.
>>> l = [] >>> l.append(1) >>> l.append(2) >>> l.append(3) >>> l [1, 2, 3] >>> for e in l: ... print e ... 1 2 3
As you can see, lists are iterable.
A list object in CPython is represented by the following C structure. ob_item is a list of pointers to the list elements. allocated is the number of slots allocated in memory.
typedef struct { PyObject_VAR_HEAD PyObject **ob_item; Py_ssize_t allocated; } PyListObject;
Let’s look at what happens when we initialize an empty list. e.g. l = [].
arguments: size of the list = 0 returns: list object = [] PyListNew: nbytes = size * size of global Python object = 0 allocate new list object allocate list of pointers (ob_item) of size nbytes = 0 clear ob_item set list's allocated var to 0 = 0 slots return list object
It is important to notice the difference between allocated slots and the size of the list. The size of a list is the same as len(l). The number of allocated slots is what has been allocated in memory. Often, you will see that allocated can be greater than size. This is to avoid needing calling realloc each time a new elements is appended to the list. We will see more about that later.
We append an integer to the list: l.append(1). What happens? The internal C function app1() is called:
arguments: list object, new element returns: 0 if OK, -1 if not app1: n = size of list call list_resize() to resize the list to size n+1 = 0 + 1 = 1 list[n] = list[0] = new element return 0
Let’s look at list_resize(). It over-allocates memory to avoid calling list_resize too many time. The growth pattern of the list is: 0, 4, 8, 16, 25, 35, 46, 58, 72, 88, …
arguments: list object, new size returns: 0 if OK, -1 if not list_resize: new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6) = 3 new_allocated += newsize = 3 + 1 = 4 resize ob_item (list of pointers) to size new_allocated return 0
4 slots are now allocated to contain elements and the first one is the integer 1. You can see on the following diagram that l[0] points to the integer object that we just appended. The dashed squares represent the slots allocated but not used yet.
Append operation amortized complexity is O(1).
We continue by adding one more element: l.append(2). list_resize is called with n+1 = 2 but because the allocated size is 4, there is no need to allocate more memory. Same thing happens when we add 2 more integers: l.append(3), l.append(4). The following diagram shows what we have so far.
Let’s insert a new integer (5) at position 1: l.insert(1,5) and look at what happens internally. ins1() is called:
arguments: list object, where, new element returns: 0 if OK, -1 if not ins1: resize list to size n+1 = 5 -> 4 more slots will be allocated starting at the last element up to the offset where, right shift each element set new element at offset where return 0
The dashed squares represent the slots allocated but not used yet. Here, 8 slots are allocated but the size or length of the list is only 5.
Insert operation complexity is O(n).
When you pop the last element: l.pop(), listpop() is called. list_resize is called inside listpop() and if the new size is less than half of the allocated size then the list is shrunk.
arguments: list object returns: element popped listpop: if list empty: return null resize list with size 5 - 1 = 4. 4 is not less than 8/2 so no shrinkage set list object size to 4 return last element
Pop operation complexity is O(1).
You can observe that slot 4 still points to the integer but the important thing is the size of the list which is now 4.
Let’s pop one more element. In list_resize(), size – 1 = 4 – 1 = 3 is less than half of the allocated slots so the list is shrunk to 6 slots and the new size of the list is now 3.
You can observe that slot 3 and 4 still point to some integers but the important thing is the size of the list which is now 3.
Python list object has a method to remove a specific element: l.remove(5). listremove() is called.
arguments: list object, element to remove returns none if OK, null if not listremove: loop through each list element: if correct element: slice list between element's slot and element's slot + 1 return none return null
To slice the list and remove the element, list_ass_slice() is called and it is interesting to see how it works. Here, low offset is 1 and high offset is 2 as we are removing the element 5 at position 1.
arguments: list object, low offset, high offset returns: 0 if OK list_ass_slice: copy integer 5 to recycle list to dereference it shift elements from slot 2 to slot 1 resize list to 5 slots return 0
Remove operation complexity is O(n).
That’s it for now. I hope you enjoyed the article. Please write a comment if you have any feedback. If you need help with a project written in Python or with building a new web service, I am available as a freelancer: LinkedIn profile. Follow me on Twitter @laurentluce.
tags: Python
posted in Uncategorized by Laurent Luce
Follow comments via the RSS Feed | Leave a comment | Trackback URL
TechnoBits.net wrote:
Very good article, uncovers the most used feature in python.
Link | March 10th, 2011 at 9:53 pm
klonuo wrote:
Very interesting article
I read it in my RSS reader as part of “Planet Python” feeds and was then “forced” to launch Firefox and see your blog
I hope for other interesting post in future
Cheers
Link | March 11th, 2011 at 3:44 am
Ivan wrote:
Hi,
Could you give us an idea of the running time of these operations?
O(1) for append ?
O(n) for insert ?
And what would be the total running time to append 1000 elements to a [] taking into account operations wasted on growing the list?
Link | March 12th, 2011 at 12:41 pm
Laurent Luce wrote:
@Ivan: I updated the article with the following information. Append is O(1), Insert is O(n), Pop is O(1), Remove is O(n). Sort is O(n log n). You can see more of those here.
Appending n elements is O(n). For n = 1000, the list is going to be resized 27 times. The growth pattern is the following: 4, 8, 16, 25, 35, 46, 58, 72, 88, 106, 126, 148, 173, 201, 233, 269, 309, 354, 405, 462, 526, 598, 679, 771, 874, 990, 1120. This means 27 calls to realloc. It doesn’t change the complexity. It is still O(n).
Link | March 13th, 2011 at 5:27 pm
tem wrote:
L.pop([index]) -> item — remove and return item at index.
I’m guessing pop is O(n) if you pop the first element.
Link | March 14th, 2011 at 5:49 am
Ganesh wrote:
Can you explain how is list implemented so as it can hold different datatypes.
Link | September 11th, 2011 at 10:59 am
Laurent Luce wrote:
A list object contains a list of pointers to PyObjects. A PyObject contains a pointer to the corresponding type object. See object.h in the Python source code for more details.
Link | October 10th, 2011 at 10:45 pm
Jan wrote:
Very interesting reading indeed, thanks a lot I found an article about list comlexity, but these memory pictures are really helpful from the point of view of implementation
Link | February 15th, 2012 at 7:37 am
dan wrote:
tem: agreed. I think the python wiki article can be misleading if you don’t read carefully. pop(0) becomes equivalent to remove, which is O(n) for a basic list.
Link | April 23rd, 2012 at 8:31 am
Leo Dirac wrote:
I don’t see how append can always be O(1). When resize is needed, sometimes it will need to copy the old list of pointers into the newly allocated space, right? The C function realloc can sometimes just expand the block of memory in place, but sometimes needs to copy it. I think append will be O(N) sometimes, with an amortized rate something less than that. I don’t exactly follow the resize formula, but it looks like it’s growing quadratically rather than exponentially which means the amortized cost would be O(sqrt N) rather than O(log N).
Link | August 2nd, 2013 at 10:47 am
Laurent Luce wrote:
@Leo: From the Python wiki on operations time complexity: “These operations rely on the Amortized part of Amortized Worst Case. Amortized worst case of append is O(1). Individual actions may take surprisingly long, depending on the history of the container. Internally, a list is represented as an array; the largest costs come from growing beyond the current allocation size (because everything must move), or from inserting or deleting somewhere near the beginning (because everything after that must move). If you need to add/remove at both ends, consider using a collections.deque instead.”
Link | August 31st, 2013 at 11:28 am
Laurent wrote:
Question just for the sake of understanding the internals: does it make any difference between calling l += [4] and l.append(4) ?
It looks like the former is a slightly heavier (albeit not significantly) as it’s creating a new list and merging the two lists whereas the latter directly calls INPLACE_ADD (from what dis.dis is saying)
Link | January 6th, 2014 at 1:56 pm
arete wrote:
hii Laurent Luce’
Most used feature in python
Really exciting reading through without a doubt, thank you a great deal I found a write-up in relation to record complexity, however these kind of memory pics are actually beneficial from the point of view of rendering.
Link | January 6th, 2014 at 10:24 pm
winnie wrote:
Hello, Laurent.
You wrote following:
>>You can observe that slot 4 still points to the integer but the important thing is the size of the list which is now 4.
Does that mean that Integer 4 will still be accessible to GC and won’t be collected till list will be really shrinked?
Link | April 12th, 2014 at 5:33 am
Laurent Luce wrote:
@winnie: The integer 4 object will be collected when there is no reference to it. That means when there is nothing pointing to it in the list and when there are no references to it outside the list.
Link | May 25th, 2014 at 9:13 pm
Malhar Vora wrote:
Very nice and informative article. Thanks Laurent.
Link | June 24th, 2014 at 9:03 am