
Mutable and immutable objects and object referencing

Introduction
When you read about Python, you will frequently come across the terms 'mutable' and 'immutable'. Some objects in Python, such as strings, integers, floats, bytes and tuples, are immutable. Other objects, such as lists, dictionaries and sets are mutable. It is important to understand what these terms mean, and to understand how Python works with different objects. If you don't, you will soon be puzzled by how some of your code behaves!
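To see the difference in practice, here is a short sketch that changes an item in a list, which works, and then tries the same thing on a tuple, which Python refuses with a TypeError. The variable names fruit_tuple and fruit_list are just illustrative.

```python
# A tuple (immutable) and a list (mutable) holding the same items
fruit_tuple = ('apples', 'bananas', 'pears')
fruit_list = ['apples', 'bananas', 'pears']

fruit_list[0] = 'pineapple'        # allowed: the list object is changed in place
print(fruit_list)                  # ['pineapple', 'bananas', 'pears']

tuple_error = False
try:
    fruit_tuple[0] = 'pineapple'   # not allowed: a tuple cannot be changed
except TypeError as err:
    tuple_error = True
    print('TypeError:', err)       # Python explains that tuples don't support item assignment

print(fruit_tuple)                 # ('apples', 'bananas', 'pears'), unchanged
```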

Strings are immutable
Strings in Python are immutable. That means that once you have created a string object, you can't modify it. To understand this, we need to look at what happens when a string is created in Python. Typically, you will create a variable and assign a string value to it. Here is an example:

my_data = 'car'

Python first creates an object in its memory, and this object holds the string 'car'. It also stores an extra piece of information in the object's header that tells Python that this is a string object. A variable called my_data is then created in a different part of the computer's memory. This is just a name. Finally, a reference (or pointer) is created and stored with the variable name. This points from the variable name to the actual object. We can represent this using the following diagram:

[Diagram: the variable my_data with a reference pointing to a string object holding 'car']

id and type
When an object is created in Python, other pieces of information are also stored with it. Each object has a unique id number, which stays the same for as long as the object exists (in CPython, the id is simply the object's memory address). The object's header also records its data type. In the above example, our object might actually look like this:

[Diagram: the string object 'car' shown with its id number and its type, str]

You can access this information at any time. It can be quite useful when developing a program if you are not sure whether you are dealing with the same object as another variable, or if you need to check what data type an object is. In the above example, we can display the id and type of my_data like this:

print(id(my_data))
print(type(my_data))

Note that the id number will usually be different each time the program is run!
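As a small sketch, the snippet below prints the id and type of my_data and then checks the type in code rather than by eye, using the built-in type() and isinstance() functions. The exact id printed will vary from run to run.

```python
my_data = 'car'

print(id(my_data))                 # an integer identifying this object; varies between runs
print(type(my_data))               # <class 'str'>

# Checking the type in code
print(type(my_data) is str)        # True
print(isinstance(my_data, str))    # True
```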

Consider this code:

temp = 'hello'
print(id(temp))

temp = temp.upper()
print(id(temp))

When you run this, you will see two different id numbers: an object was created to hold 'hello' and then a different object was created to hold 'HELLO', even though we used the same variable name! This is because strings are immutable. Once created, they can't be changed, so upper() returns a new object rather than altering the original one.
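The same idea can be shown without comparing id numbers at all. In this sketch (the names original and shouted are illustrative), we keep a reference to the original string, so you can see that upper() hands back a new object and leaves the original untouched:

```python
original = 'hello'
shouted = original.upper()   # upper() builds and returns a new string object

print(original)              # hello: the original object is unchanged
print(shouted)               # HELLO
print(original is shouted)   # False: these are two separate objects
```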

Variable assignment and use
Whenever you use a variable in a program, Python looks up the variable name, follows the reference stored with it and then uses the actual object that the variable is pointing to. This means you must make sure that any variable you use is assigned to an object before it is used. If you don't, you will get an error message (a NameError) and your program will stop. Remember, everything in Python is an object, including strings, integers, floats, lists, tuples, dictionaries and so on. A variable name must be assigned to one of these objects before you can actually use it.
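Here is a minimal sketch of what happens when a variable is used before it is assigned. total_score is a hypothetical name chosen for this example; Python raises a NameError, which we catch here so the program can carry on:

```python
error_seen = False

try:
    print(total_score)         # hypothetical name that has not been assigned yet
except NameError as err:
    error_seen = True
    print('NameError:', err)   # the message says the name is not defined

total_score = 0                # assign the name to an object first...
print(total_score)             # ...and now it can be used
```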

So what is immutability?
Now that we understand that Python uses variable names that reference objects, we can begin to understand what immutability means. In Python, you cannot change a string object in memory once it has been created. For example, you couldn't do this to change 'car' into 'cat':

my_data[2] = 't'

You will get an error (a TypeError). However, you can tell Python to point the variable at a different object. For example, consider the following program:

my_data = 'car'
my_data = 3.14
my_data = 17
my_data = 'Hello world'

In the first line of our four-line program, a string object 'car' is created in memory. The variable name my_data is also created in a different area of memory, and a reference from my_data to the string object is stored with the variable name.

In the second line, a new float object holding 3.14 is created. The reference stored with my_data, which currently points to 'car', is changed so that it points to the float object instead. This is called 'rebinding' the variable, because we are effectively attaching it (or rather pointing it) to a different object. Note, however, that the original string object is still in memory, but nothing points to it anymore!

In the third line, a new integer object is created, and again my_data is rebound so that it points to 17. Finally, we create yet another string object and rebind my_data to it. We now have my_data pointing to a string object holding 'Hello world', along with three other objects in memory that no variable is pointing to. We have this situation:

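[Diagram: my_data pointing to a string object holding 'Hello world', while the objects 'car', 3.14 and 17 are no longer referenced]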

So, our string object 'car' couldn't be changed. We couldn't change any of the letters in the string. It's immutable. However, we could rebind the variable as many times as we liked so that it pointed at a different object.
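The following sketch puts both halves of this idea together: the attempted change raises a TypeError, and the way to get 'cat' is to build a new string and rebind my_data to it. The extra name old_object is only there so we can still look at the original string afterwards.

```python
my_data = 'car'

error_seen = False
try:
    my_data[2] = 't'               # strings do not support item assignment
except TypeError as err:
    error_seen = True
    print('TypeError:', err)

old_object = my_data               # keep a second reference to the 'car' object
my_data = my_data[:2] + 't'        # build a new string and rebind my_data to it

print(my_data)                     # cat
print(old_object)                  # car: the original object was never changed
print(my_data is old_object)       # False
```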

Garbage collection
What happens to objects that are no longer referenced, as in the previous example? The answer is that Python automatically reclaims the memory those objects take up and makes it available to be reused. This process is known as 'garbage collection' and happens automatically in Python. Each object has a reference count. Each time a variable (or another object, such as a list) refers to it, the count goes up by one; for example, if two variables reference the same object, its count is two. Each time a reference is removed, the count goes down by one. As soon as the count reaches zero, nothing is referencing the object anymore, and the object is garbage collected.
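Strings cannot be tracked this way, so this sketch uses a small hypothetical class called Box together with weakref.finalize from the standard library, which runs a callback when an object is reclaimed. It assumes CPython, where an object is reclaimed as soon as its reference count reaches zero; other Python implementations may reclaim objects later.

```python
import weakref

class Box:
    """Hypothetical class: its instances support weak references, unlike str."""

reclaimed = []

box = Box()
# Call reclaimed.append('Box reclaimed') when the object is freed.
# The finalizer holds only a weak reference, so it does not keep the object alive.
weakref.finalize(box, reclaimed.append, 'Box reclaimed')

second_name = box    # two variables now reference the same object
del box              # one reference is left, so the object survives
print(reclaimed)     # []

del second_name      # no references left: CPython reclaims the object immediately
print(reclaimed)     # ['Box reclaimed']
```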

Shared references and counting references
Sometimes, more than one variable points to the same object. For example, consider the following code:

data1 = 'cats and dogs'
data2 = 'cats and dogs'

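Are there two string objects, each holding 'cats and dogs'? Not necessarily! Because strings are immutable, Python is allowed to save memory by creating just one string object and getting both variables to point to it. CPython typically does this for identical string literals in the same script, but it is an optimisation rather than a guarantee, so you may see different behaviour (for example, when typing the lines one at a time at the interactive prompt). Python can only do this safely because the object is immutable. We can represent what is happening with this diagram: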

[Diagram: data1 and data2 both pointing to a single string object holding 'cats and dogs']

If you ever want to see how many references there are to an object, you can import the sys module and use its getrefcount() function, like this:

import sys

print(sys.getrefcount(1))
print(sys.getrefcount(data1))

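The object holding the integer 1 is used many times by Python itself, not just in the programs you write, so its count will be large! Also note that the number returned is usually one higher than you might expect, because passing the object to getrefcount() temporarily creates another reference. In Python 3.12 and later, some heavily used objects such as small integers are 'immortal' and report a very large, fixed reference count.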
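Because of that extra temporary reference, it is easier to look at how the count changes rather than at its absolute value. This sketch assumes CPython; it creates a list and watches its count rise as more variables point to it (alias1 and alias2 are illustrative names):

```python
import sys

data = ['cats', 'dogs']
base = sys.getrefcount(data)                  # starting count, including the temporary argument

alias1 = data
alias2 = data
after_aliases = sys.getrefcount(data) - base
print(after_aliases)                          # 2: one extra reference per new variable

del alias1
after_del = sys.getrefcount(data) - base
print(after_del)                              # 1
```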

Using 'is'
At any time, we can check whether two variables are pointing to the same object by using the keyword is. For example:

data1 = 'cats and dogs'
data2 = 'cats and dogs'
print(data1 is data2)

When run as a script in CPython, this will typically display True, because both variables are pointing to the same object. Do note that this is different from testing the data using ==. When we use ==, we are testing whether the data is the same. When we use the is keyword, we are testing whether the variables point to the same object in memory.
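Sharing identical strings is an optimisation, not a rule. In this sketch, the second string is built while the program runs, so in CPython it is usually a separate object even though == reports the same text. sys.intern() from the standard library asks Python to use one shared copy, so the interned values are guaranteed to be the same object.

```python
import sys

data1 = 'cats and dogs'
data2 = ' '.join(['cats', 'and', 'dogs'])    # same text, built at run time

print(data1 == data2)                         # True: the characters are the same
print(data1 is data2)                         # usually False in CPython: a separate object
print(sys.intern(data1) is sys.intern(data2)) # True: interning returns one shared object
```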

As before, we can rebind any variable. What do you think would happen if we ran this code?

data1 = 'cats and dogs'
data2 = 'cats and dogs'
data2 = 'birds and bees'

print(data1 is data2)

We have rebound data2. It is pointing to a different string object so data1 is data2 is now False. 

How about this code - will it display True or False?

data1 = 'cats and dogs'
data2 = 'Cats and dogs'
print(data1 is data2)

The two strings are different because of the capital letter so False will be displayed.

How about this code - will it display True or False?

data1 = 'cats and dogs'
data2 = data1
data1 = 'birds and bees'
print(data1 is data2)

We made data2 point at whatever data1 was pointing at, and then data1 was rebound to a different string object. They are therefore pointing at different objects, so False will be displayed when the code is run.
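This sketch records the result of is before and after the rebinding, and also shows that rebinding data1 does not affect data2 at all:

```python
data1 = 'cats and dogs'
data2 = data1                 # data2 now refers to the same object as data1
same_before = data1 is data2

data1 = 'birds and bees'      # rebinding data1 leaves data2 alone
same_after = data1 is data2

print(same_before, same_after)   # True False
print(data2)                     # cats and dogs
```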

Mutable objects and aliasing
When we created two identical string objects with two variable names, Python was able to create just one string object and point both variables to it. It could do this only because the object being created was immutable. If we create two list objects holding the same data, however, Python will not create just one list object and point both variable names to it, because lists are mutable. The following diagram shows what happens when we create two list objects, each one holding the same strings: 'apples', 'bananas' and 'pears'.

[Diagram: data1 and data2 each pointing to its own list object holding 'apples', 'bananas' and 'pears']

Python creates two different objects because they are mutable - we might want to change the contents of either list at some point in the future! Consider this code:

data1 = ['apples', 'bananas', 'pears']
data2 = ['apples', 'bananas', 'pears']
print(data1 is data2)
print(data1 == data2)

When we used identical immutable strings, data1 is data2 could display True. This time, however, Python displays False. Python didn't create just one object and get both variables to point to it, because we are now using mutable objects. Note, however, that the contents of the lists are the same, so data1 == data2 is indeed True.
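Because the two lists are separate objects, changing one does not affect the other. A quick sketch (appending 'plums' is just an example change):

```python
data1 = ['apples', 'bananas', 'pears']
data2 = ['apples', 'bananas', 'pears']

data1.append('plums')    # change the first list object in place
print(data1)             # ['apples', 'bananas', 'pears', 'plums']
print(data2)             # ['apples', 'bananas', 'pears'], unaffected
print(data1 == data2)    # False: the contents now differ
```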

Just to complicate things very slightly with mutable objects, we could do this:

data1 = ['apples', 'bananas', 'pears']
data2 = data1
print(data1 is data2)
print(data1 == data2)

What we have now done is create an alias for data1 called data2. This time, data1 and data2 really do point to the same object, so Python displays True twice. This can be represented using the following diagram:

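[Diagram: data1 and data2 both pointing to a single list object holding 'apples', 'bananas' and 'pears']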
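This is where aliasing matters. Because there is only one list object, a change made through either name is visible through the other. A short sketch:

```python
data1 = ['apples', 'bananas', 'pears']
data2 = data1              # alias: both names refer to one list object

data2.append('plums')      # change the list through the second name...
print(data1)               # ...and data1 sees it: ['apples', 'bananas', 'pears', 'plums']
print(data1 is data2)      # True
```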

Cloning lists
If you ever need to change a list but keep the original intact, you need to copy the list object itself, not just the reference to it. One way to do this is with the slice operator, which creates a new list containing the same items. For example:

data1 = ['apples', 'bananas', 'pears']
data2 = data1[:]

These are different objects now. If we modify the first element in the first list, and then print out both lists, we can see that they are indeed separate lists.

data1[0] = 'pineapple'
print(data1)
print(data2)
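Here is a fuller sketch with the expected output. It also shows list.copy(), which makes the same kind of copy as a slice, and a caveat: both produce a shallow copy, so any lists nested inside the list are still shared. copy.deepcopy() from the standard library copies the nested lists as well.

```python
import copy

data1 = ['apples', 'bananas', 'pears']
data2 = data1[:]          # slice copy: a new list object
data3 = data1.copy()      # list.copy() makes the same kind of copy

data1[0] = 'pineapple'
print(data1)              # ['pineapple', 'bananas', 'pears']
print(data2)              # ['apples', 'bananas', 'pears']
print(data3)              # ['apples', 'bananas', 'pears']

# Slices and copy() are shallow: a list inside the list is still shared
nested = [['apples'], ['pears']]
shallow = nested[:]
deep = copy.deepcopy(nested)   # copies the inner lists too

nested[0].append('bananas')
print(shallow[0])         # ['apples', 'bananas']: the inner list is shared
print(deep[0])            # ['apples']: the deep copy is independent
```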

Summary
You always need to assign a variable before it can be used. It is then important to understand the distinction between a variable, a reference and the object it is pointing to. Variables can be rebound to point at different objects. Some objects in Python, such as strings, integers, floats, bytes and tuples, are immutable. Other objects, such as lists, dictionaries and sets are mutable. Python deals with mutable objects in a different way to immutable ones so you need to know what kind of object you are working with. Objects can be pointed to by more than one variable. If no variable is referencing an object then Python's automatic garbage collection system reclaims the memory space it was taking up, so it can be used for other purposes. 
