Removing duplicates from a list is a task that might happen more often than you think.
Maybe you are importing a bunch of rows from a CSV file and want to make sure you only have unique values.
Or you want to avoid repeated values to keep your data clean.
Fortunately, you can drop duplicates from a list in Python with a single line.
This is one of those simple but powerful features that Python gives us for free, and doing things the Pythonic way can save you a lot of trouble.
Removing duplicates with set
In the code snippet below we create a list named car_brands.
Notice how 'bmw' and 'toyota' are repeated: 'bmw' is included twice, while 'toyota' appears three times.
To drop these duplicates we just need to convert the list to a set and then convert the result back to a list.
car_brands = ['bmw', 'mercedes', 'toyota', 'mclaren', 'toyota', 'bmw', 'toyota']
print(car_brands)
car_brands = list(set(car_brands))
print(car_brands)
The output of the code above is:
['bmw', 'mercedes', 'toyota', 'mclaren', 'toyota', 'bmw', 'toyota']
['toyota', 'mercedes', 'bmw', 'mclaren']
This works because a set cannot contain duplicates, so converting the list to a set removes them automatically.
But there is a catch: sets don’t keep the order of your items, while lists do.
Notice how 'toyota' appears as the first item in the final result, even though it was the third in the original list.
The exact order you get may also differ from one run to another.
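To make the catch concrete, here is a minimal sketch that reuses the car_brands list from above. The name unique_brands is only illustrative. It checks that the set conversion keeps exactly one copy of each brand, and uses sorted as one possible way to get a predictable order, which is alphabetical rather than the original one.

```python
car_brands = ['bmw', 'mercedes', 'toyota', 'mclaren', 'toyota', 'bmw', 'toyota']

# Converting to a set drops the duplicates, but the resulting order is arbitrary.
unique_brands = list(set(car_brands))

# Every brand survives exactly once, whatever order the set produced.
print(len(unique_brands))  # 4
print(set(unique_brands) == set(car_brands))  # True

# sorted() gives a predictable order, but it is alphabetical,
# not the order of the original list.
print(sorted(unique_brands))  # ['bmw', 'mclaren', 'mercedes', 'toyota']
```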
So what can you do if you want to remove the duplicates but keep the order of the items?
Dropping duplicates and keeping the order with dict
The simple and "straightforward" (but not recommended) way would be to loop over the original list and add only new items to a new list.
Each membership check has to scan the new list, so this gets slow as the list grows.
The code below implements such logic.
car_brands = ['bmw', 'mercedes', 'toyota', 'mclaren', 'toyota', 'bmw', 'toyota']
new_brands = []
for item in car_brands:
    if item not in new_brands:
        new_brands.append(item)
print(car_brands)
print(new_brands)
The output is:
['bmw', 'mercedes', 'toyota', 'mclaren', 'toyota', 'bmw', 'toyota']
['bmw', 'mercedes', 'toyota', 'mclaren']
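If you do want to keep an explicit loop, one common variation (a sketch, not part of the snippet above) is to track the items already seen in a set, so the membership check does not have to scan the whole result list. The function name dedupe_keep_order and the variable seen are hypothetical names chosen for this illustration.

```python
def dedupe_keep_order(items):
    """Return a new list without duplicates, keeping first occurrences in order."""
    seen = set()    # items already added to the result
    result = []
    for item in items:
        if item not in seen:    # set lookup instead of scanning the result list
            seen.add(item)
            result.append(item)
    return result


car_brands = ['bmw', 'mercedes', 'toyota', 'mclaren', 'toyota', 'bmw', 'toyota']
print(dedupe_keep_order(car_brands))  # ['bmw', 'mercedes', 'toyota', 'mclaren']
```

Like the set approach, this sketch assumes the items are hashable, which is true for the strings used here.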
But, as always, there is a better way in Python!
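As of Python 3.7, dictionaries are guaranteed to keep insertion order (CPython 3.6 already did so as an implementation detail), so you can use the fromkeys method from dict.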
It is slower than using sets to remove duplicates, but it is the best solution to drop duplicates and keep order.
It also takes only one line.
car_brands = ['bmw', 'mercedes', 'toyota', 'mclaren', 'toyota', 'bmw', 'toyota']
car_brands = list(dict.fromkeys(car_brands))
print(car_brands)
The output of the above is:
['bmw', 'mercedes', 'toyota', 'mclaren']
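To see why this one-liner works, it can help to look at the intermediate dictionary. The sketch below assumes Python 3.7 or later, where dicts keep insertion order. The name as_dict is only illustrative.

```python
car_brands = ['bmw', 'mercedes', 'toyota', 'mclaren', 'toyota', 'bmw', 'toyota']

# dict.fromkeys builds a dict whose keys are the list items; every value is None.
# A key can only exist once, so repeated brands collapse into a single key.
as_dict = dict.fromkeys(car_brands)
print(as_dict)
# {'bmw': None, 'mercedes': None, 'toyota': None, 'mclaren': None}

# Iterating over a dict yields its keys in insertion order,
# so list() recovers the brands in the order they first appeared.
print(list(as_dict))  # ['bmw', 'mercedes', 'toyota', 'mclaren']
```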
Since the solution with dict is slower, only use it if order is something you really need.
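If you want to check the speed difference on your own machine, a rough sketch with the standard timeit module could look like this. The list size, the number of repetitions, and the names data, set_time, and dict_time are arbitrary choices for illustration, and the timings will vary between machines and Python versions.

```python
import timeit

# A larger list with many repeats, built from the brands used in this article.
data = ['bmw', 'mercedes', 'toyota', 'mclaren'] * 10_000

# Time each approach over the same number of runs.
set_time = timeit.timeit(lambda: list(set(data)), number=100)
dict_time = timeit.timeit(lambda: list(dict.fromkeys(data)), number=100)

print(f"set:           {set_time:.4f} s")
print(f"dict.fromkeys: {dict_time:.4f} s")

# Both approaches keep the same unique items; only dict.fromkeys keeps the order.
print(list(dict.fromkeys(data)))  # ['bmw', 'mercedes', 'toyota', 'mclaren']
```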
I recommend reading How to choose a Data Structure in Python for a broad view of each one and when to use them.