Introduction to Python Data Structures
When it comes to learning how to program, regardless of the particular programming language you use for this task, you find that there are a few major topics of your newly chosen discipline into which most of what you are being exposed to could be categorized. A few of these, in general order of grokking, are: syntax (the vocabulary of the language); commands (putting the vocabulary together in useful ways); flow control (how we guide the order of command execution); algorithms (the steps we take to solve specific problems… how did this become such a confounding word?); and, finally, data structures (the virtual storage depots that we use for data manipulation during the execution of algorithms, which are, again… a series of steps).
Essentially, if you want to implement the solution to a problem, by cobbling together a series of commands into the steps of an algorithm, at some point data will need to be processed, and data structures will become essential. Such data structures provide a way to organize and store data efficiently, and are critical for creating fast, modular code that can perform useful functions and scale well. Python, a particular programming language, has a series of built-in data structures of its own.
This tutorial will focus on these four foundational Python data structures:
- Lists – Ordered, mutable, allows duplicate elements. Useful for storing sequences of data.
- Tuples – Ordered, immutable, allows duplicate elements. Think of them as immutable lists.
- Dictionaries – Insertion-ordered (since Python 3.7), mutable, mapped by key-value pairs with unique keys. Useful for storing data in a key-value format.
- Sets – Unordered, mutable, contains unique elements. Useful for membership testing and eliminating duplicates.
Beyond the fundamental data structures, Python's standard library also provides more advanced structures, such as heaps (heapq) and queues (collections.deque and the queue module), which can further enhance your coding prowess. These advanced structures enable more complex data handling and are often used in specialized scenarios. But you aren't constrained here; you can use all of the existing structures as a base to implement your own structures, such as linked lists, as well. However, the understanding of lists, tuples, dictionaries, and sets remains paramount, as these are the building blocks for more advanced data structures.
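As a rough sketch of both ideas in the paragraph above, the snippet below uses the standard-library heapq module to treat a list as a heap, and then builds a tiny singly linked list by hand. The Node class and the sample values are hypothetical and only there for illustration.

```python
import heapq

# heapq turns a plain list into a min-heap: the smallest item sits at index 0
scores = [5, 1, 4, 2]
heapq.heapify(scores)
heapq.heappush(scores, 3)
smallest = heapq.heappop(scores)   # removes and returns the smallest item


# A hand-rolled singly linked list node (hypothetical, for illustration only)
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node


# Build the chain 1 -> 2 -> 3 and walk it from the head
head = Node(1, Node(2, Node(3)))
values = []
node = head
while node is not None:
    values.append(node.value)
    node = node.next_node

print(smallest, values)
```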
This guide aims to provide a clear and concise understanding of these core structures. As you start your Python journey, the following sections will guide you through the essential concepts and practical applications. From creating and manipulating lists to leveraging the unique capabilities of sets, this tutorial will equip you with the skills needed to excel in your coding.
Step 1: Using Lists in Python
What is a List in Python?
A list in Python is an ordered, mutable data type that can store various objects, allowing for duplicate elements. Lists are defined by the use of square brackets [ ], with elements being separated by commas.
For example:
fibs = [0, 1, 1, 2, 3, 5, 8, 13, 21]
Lists are incredibly useful for organizing and storing data sequences.
Creating a List
Lists can contain different data types, like strings, integers, booleans, etc. For example:
mixed_list = [42, "Hello World!", False, 3.14159]
Manipulating a List
Elements in a list can be accessed, added, changed, and removed. For example:
# Access 2nd element (indexing begins at 0)
print(mixed_list[1])

# Append element
mixed_list.append("This is new")

# Change element
mixed_list[0] = 5

# Remove last element (pop() with no argument removes from the end)
mixed_list.pop()
Useful List Methods
Some handy built-in methods for lists include:
- sort() – Sorts list in-place
- append() – Adds element to end of list
- insert() – Inserts element at index
- pop() – Removes element at index
- remove() – Removes first occurrence of value
- reverse() – Reverses list in-place
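To see these methods side by side, here is a minimal sketch on a throwaway list. The variable names and sample letters are made up for illustration.

```python
letters = ["d", "a", "c"]

letters.insert(1, "b")      # insert "b" at index 1 -> ['d', 'b', 'a', 'c']
letters.sort()              # sort in place        -> ['a', 'b', 'c', 'd']
letters.reverse()           # reverse in place     -> ['d', 'c', 'b', 'a']
letters.remove("c")         # remove first "c"     -> ['d', 'b', 'a']
last = letters.pop()        # remove and return the last item, "a"

# sort() and reverse() modify the list and return None,
# so sorted() is the one to reach for when a new list is wanted
ordered = sorted(letters)

print(letters, last, ordered)
```

Because the in-place methods return None, writing something like letters = letters.sort() would silently replace the list with None.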
Hands-on Example with Lists
# Create shopping cart as a list
cart = ["apples", "oranges", "grapes"]

# Sort the list
cart.sort()

# Add new item
cart.append("blueberries")

# Remove first item
cart.pop(0)

print(cart)
Output:
['grapes', 'oranges', 'blueberries']
Step 2: Understanding Tuples in Python
What Are Tuples?
Tuples are another type of sequence data type in Python, similar to lists. However, unlike lists, tuples are immutable, meaning their elements cannot be altered once created. They are defined by enclosing elements in parentheses ( ).
# Defining a tuple
my_tuple = (1, 2, 3, 4)
When to Use Tuples
Tuples are generally used for collections of items that should not be modified. Tuples are typically a little lighter than lists in memory use and creation cost, which makes them a good fit for read-only data. Some common use-cases include:
- Storing constants or configuration data
- Function return values with multiple components
- Dictionary keys, since tuples are hashable as long as all of their elements are hashable
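Two of these use cases can be sketched briefly. The min_max helper and the coordinate data below are hypothetical, chosen only to illustrate the pattern.

```python
# Hypothetical helper: returns two values packed into one tuple
def min_max(values):
    return min(values), max(values)


low, high = min_max([3, 9, 1, 7])

# Tuples of hashable items can serve as dictionary keys
# (the coordinates and city names are only sample data)
city_by_coords = {
    (41.88, -87.63): "Chicago",
    (40.71, -74.01): "New York",
}

print(low, high, city_by_coords[(41.88, -87.63)])
```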
Accessing Tuple Elements
Accessing elements in a tuple is done in a similar manner as accessing list elements. Indexing and slicing work the same way.
# Accessing elements
first_element = my_tuple[0]
sliced_tuple = my_tuple[1:3]
Operations on Tuples
Because tuples are immutable, many list operations like append() or remove() are not applicable. However, you can still perform some operations:
- Concatenation: Combine tuples using the + operator.
concatenated_tuple = my_tuple + (5, 6)
- Repetition: Repeat a tuple using the * operator.
repeated_tuple = my_tuple * 2
- Membership: Check if an element exists in a tuple with the in keyword.
exists = 1 in my_tuple
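A short sketch of what immutability means in practice: item assignment fails, list-style methods are absent, and "changing" a tuple really means building a new one. The point variable is just an example name.

```python
point = (1, 2, 3)

try:
    point[0] = 10            # item assignment is not supported on tuples
except TypeError as err:
    print("TypeError:", err)

# Tuples have no append() or remove() methods at all
print(hasattr(point, "append"))

# Concatenation creates a new tuple and rebinds the name to it
point = point + (4,)
print(point)
```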
Tuple Methods
Tuples have fewer built-in methods compared to lists, given their immutable nature. Some useful methods include:
- count(): Count the occurrences of a particular element.
count_of_ones = my_tuple.count(1)
- index(): Find the index of the first occurrence of a value.
index_of_first_one = my_tuple.index(1)
Tuple Packing and Unpacking
Tuple packing and unpacking are convenient features in Python:
- Packing: Assigning multiple values to a single tuple.
packed_tuple = 1, 2, 3
- Unpacking: Assigning tuple elements to multiple variables.
a, b, c = packed_tuple
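Unpacking goes a little further than one name per element. The sketch below, using made-up sample data, shows a starred name collecting leftovers and the familiar swap idiom, which is packing and unpacking in a single statement.

```python
record = ("Ada", "Lovelace", 1815, "London")

# A starred name collects whatever is left over into a list
first, last, *rest = record

# Swapping works by packing the right side and unpacking it on the left
x, y = 1, 2
x, y = y, x

print(first, last, rest, x, y)
```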
Immutable but Not Strictly
While tuples themselves are immutable, they can contain mutable elements like lists.
# Tuple with mutable list
complex_tuple = (1, 2, [3, 4])
Note that while you can't change the tuple itself, you can modify the mutable elements within it.
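The distinction can be seen directly in a minimal sketch: mutating the inner list works, while rebinding one of the tuple's slots does not.

```python
complex_tuple = (1, 2, [3, 4])

complex_tuple[2].append(5)       # allowed: the list itself is mutable
print(complex_tuple)             # (1, 2, [3, 4, 5])

try:
    complex_tuple[2] = [9, 9]    # not allowed: rebinding a tuple slot
except TypeError as err:
    print("TypeError:", err)
```

A side effect worth knowing: a tuple that holds a list is not hashable, so it cannot be used as a dictionary key or a set element.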
Step 3: Mastering Dictionaries in Python
What is a Dictionary in Python?
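A dictionary in Python is a mutable data type that stores mappings of unique keys to values. Since Python 3.7, dictionaries preserve insertion order, though items are looked up by key rather than by position. Dictionaries are written with curly braces { } and consist of key-value pairs separated by commas.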
For example:
student = {"name": "Michael", "age": 22, "city": "Chicago"}
Dictionaries are useful for storing data in a structured manner and accessing values by keys.
Creating a Dictionary
Dictionary keys must be hashable, which in practice means immutable objects like strings, numbers, or tuples of hashable items. Dictionary values can be any object.
student = {"name": "Susan", "age": 23}
prices = {"milk": 4.99, "bread": 2.89}
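The rule about keys is easy to check for yourself. In this small sketch (the grid dictionary is made-up sample data), a tuple is accepted as a key while a list is rejected.

```python
# A tuple of hashable items works as a key
grid = {(0, 0): "origin", (1, 0): "east"}
print(grid[(1, 0)])

# A list is mutable and unhashable, so it is rejected as a key
try:
    bad = {[0, 0]: "origin"}
except TypeError as err:
    print("TypeError:", err)
```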
Manipulating a Dictionary
Elements can be accessed, added, changed, and removed via keys.
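# Start from the earlier student record, which includes a "city" key
student = {"name": "Michael", "age": 22, "city": "Chicago"}

# Access value by key
print(student["name"])

# Add new key-value
student["major"] = "computer science"

# Change value
student["age"] = 25

# Remove key-value (raises KeyError if the key is missing)
del student["city"]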
Useful Dictionary Methods
Some useful built-in methods include:
- keys() – Returns a view of the keys
- values() – Returns a view of the values
- items() – Returns a view of (key, value) tuples
- get() – Returns value for key, avoids KeyError
- pop() – Removes key and returns value
- update() – Adds multiple key-values
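Here is a minimal sketch of several of these methods working together. The inventory dictionary and its contents are hypothetical sample data.

```python
inventory = {"apples": 3, "pears": 0}

# get() returns a default instead of raising KeyError for a missing key
kiwis = inventory.get("kiwis", 0)

# update() adds new pairs and overwrites existing ones
inventory.update({"pears": 6, "plums": 2})

# pop() removes a key and hands back its value
removed = inventory.pop("apples")

# items() yields (key, value) tuples, handy for loops
for fruit, count in inventory.items():
    print(fruit, count)
```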
Hands-on Example with Dictionaries
scores = {"Francis": 95, "John": 88, "Daniel": 82}

# Add new score
scores["Zoey"] = 97

# Remove John's score
scores.pop("John")

# Get Daniel's score
print(scores.get("Daniel"))

# Print all student names
print(scores.keys())
Step 4: Exploring Sets in Python
What is a Set in Python?
A set in Python is an unordered, mutable collection of unique, immutable objects. Sets are written with curly braces { } but, unlike dictionaries, do not have key-value pairs.
For example:
numbers = {1, 2, 3, 4}
Sets are useful for membership testing, eliminating duplicates, and mathematical operations.
Creating a Set
Sets can be created from a list by passing it to the set() constructor:
my_list = [1, 2, 3, 3, 4]
my_set = set(my_list)  # {1, 2, 3, 4}
Sets can contain mixed data types like strings, booleans, etc.
Manipulating a Set
Elements can be added and removed from sets.
numbers.add(5)
numbers.remove(1)
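One detail worth a quick sketch is what happens when the element is not there: remove() raises a KeyError, whereas discard() quietly does nothing. The values below are arbitrary.

```python
numbers = {1, 2, 3, 4}

numbers.add(3)          # adding an existing element changes nothing
numbers.discard(99)     # discard() ignores a missing element

try:
    numbers.remove(99)  # remove() raises KeyError for a missing element
except KeyError:
    print("99 is not in the set")

print(2 in numbers, len(numbers))
```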
Useful Set Operations
Some useful set operations include:
- union() – Returns union of two sets
- intersection() – Returns intersection of sets
- difference() – Returns difference between sets
- symmetric_difference() – Returns symmetric difference
Hands-on Example with Sets
A = {1, 2, 3, 4}
B = {2, 3, 5, 6}

# Union - combines sets
print(A | B)

# Intersection
print(A & B)

# Difference
print(A - B)

# Symmetric difference
print(A ^ B)
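The operators in the example above correspond one-to-one with the named set methods. The sketch below checks that equivalence with assertions, which also spell out the expected result of each operation.

```python
A = {1, 2, 3, 4}
B = {2, 3, 5, 6}

# Each operator has a method counterpart that gives the same result
assert A | B == A.union(B) == {1, 2, 3, 4, 5, 6}
assert A & B == A.intersection(B) == {2, 3}
assert A - B == A.difference(B) == {1, 4}
assert A ^ B == A.symmetric_difference(B) == {1, 4, 5, 6}

# The methods also accept other iterables, such as a list
combined = A.union([7, 8])
print(combined)
```

The operators require both operands to be sets, so the method form is the one to use when the other side is a list or tuple.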
Step 5: Comparing Lists, Tuples, Dictionaries, and Sets
Comparison of Characteristics
The following is a concise comparison of the four Python data structures we referred to in this tutorial.
Structure | Ordered | Mutable | Duplicate Elements | Use Cases |
---|---|---|---|---|
List | Yes | Yes | Yes | Storing sequences |
Tuple | Yes | No | Yes | Storing immutable sequences |
Dictionary | Yes (insertion order, Python 3.7+) | Yes | Keys: No, Values: Yes | Storing key-value pairs |
Set | No | Yes | No | Eliminating duplicates, membership testing |
When to Use Each Data Structure
Treat this as a soft guideline for which structure to turn to first in a particular situation.
- Use lists for ordered, sequence-based data. Useful for stacks; for queues, collections.deque from the standard library is usually a better fit.
- Use tuples for ordered, immutable sequences. Useful when you need a fixed collection of elements that should not be changed.
- Use dictionaries for key-value data. Useful for storing related properties.
- Use sets for storing unique elements and mathematical operations.
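As one illustration of the first guideline, a list makes a natural stack, while a queue is usually better served by collections.deque from the standard library. This is a sketch with placeholder values, not the only way to do it.

```python
from collections import deque

# A list works well as a stack: push and pop at the end
stack = []
stack.append("first")
stack.append("second")
top = stack.pop()            # last in, first out

# For a queue, deque pops from the left efficiently,
# whereas list.pop(0) has to shift every remaining element
queue = deque(["first", "second"])
front = queue.popleft()      # first in, first out

print(top, front)
```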
Hands-on Example Using All Four Data Structures
Let's have a look at how these structures can all work together in an example that is a little more complex than a one liner.
import random

# Make a list of person names
names = ["John", "Mary", "Bob", "Mary", "Sarah"]

# Make a tuple of additional information (e.g., email)
additional_info = ("john@example.com", "mary@example.com", "bob@example.com", "mary@example.com", "sarah@example.com")

# Make set to remove duplicates
unique_names = set(names)

# Make dictionary of name-age pairs
persons = {}
for name in unique_names:
    persons[name] = random.randint(20, 40)

print(persons)
Output (ages are random and set iteration order is not guaranteed, so your result will differ):
{'John': 34, 'Bob': 29, 'Sarah': 25, 'Mary': 21}
This example utilizes a list for an ordered sequence, a tuple for storing additional immutable information, a set to remove duplicates, and a dictionary to store key-value pairs.
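The example above creates additional_info but never reads it. One possible way to connect it to the names is sketched below, under the assumption (not stated in the original) that each email lines up positionally with the name at the same index.

```python
names = ["John", "Mary", "Bob", "Mary", "Sarah"]
additional_info = ("john@example.com", "mary@example.com", "bob@example.com",
                   "mary@example.com", "sarah@example.com")

# zip() pairs each name with the email at the same position;
# building a dict keeps one entry per name, so duplicates collapse
emails = dict(zip(names, additional_info))

# Dictionaries remember insertion order, so this follows
# the order in which each name first appears in the list
for name, email in emails.items():
    print(name, email)
```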
Moving Forward
In this comprehensive tutorial, we've taken a deep look at the foundational data structures in Python, including lists, tuples, dictionaries, and sets. These structures form the building blocks of Python programming, providing a framework for data storage, processing, and manipulation. Understanding these structures is essential for writing efficient and scalable code. From manipulating sequences with lists, to organizing data with key-value pairs in dictionaries, and ensuring uniqueness with sets, these essential tools offer immense flexibility in data handling.
As we've seen through code examples, these data structures can be combined in various ways to solve complex problems. By leveraging these data structures, you can open the doors to a wide range of possibilities in data analysis, machine learning, and beyond. Don't hesitate to explore the official Python data structures documentation for more insights.
Happy coding!
Matthew Mayo (@mattmayo13) holds a Master's degree in computer science and a graduate diploma in data mining. As Editor-in-Chief of KDnuggets, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.