How to Use MultiIndex for Hierarchical Data Organization in Pandas


Let’s learn how to use MultiIndex in Pandas for hierarchical data.

Preparation

We need the Pandas package, so make sure it is installed. You can install it with the following command:

pip install pandas

Then, let’s learn how to handle MultiIndex data in Pandas.

Using MultiIndex in Pandas

A MultiIndex in Pandas is an index with multiple levels on a DataFrame or Series. It is helpful when we work with higher-dimensional data in a 2D tabular structure. With a MultiIndex, we can label each row with multiple keys and organize the data hierarchically. Let’s use an example dataset to understand it better.

import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)],
    names=['Category', 'Number']
)

df = pd.DataFrame({
    'Value': [10, 20, 30, 40]
}, index=index)

print(df)

The output:

                 Value
Category Number
A        1          10
         2          20
B        1          30
         2          40

As you can see, the DataFrame above has a two-level index, with Category and Number as its levels.
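To confirm the structure of the index, we can inspect it directly. The following is a minimal sketch that rebuilds the same DataFrame and reads a few attributes of its index:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)],
    names=['Category', 'Number']
)
df = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)

# Number of levels in the index
print(df.index.nlevels)

# Names of the levels, outermost first
print(df.index.names)

# The label of every row for a single level
print(df.index.get_level_values('Category'))
```

The get_level_values method returns one label per row, which is handy when you want to filter or compare on a single level.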

It’s also possible to build the MultiIndex from existing columns in our DataFrame.

data = {
    'Category': ['A', 'A', 'B', 'B'],
    'Number': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40]
}
df = pd.DataFrame(data)
df.set_index(['Category', 'Number'], inplace=True)

print(df)

The output:

                 Value
Category Number
A        1          10
         2          20
B        1          30
         2          40

Even with different methods, we get the same result. That’s how we can have a MultiIndex in our DataFrame.
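If you want to check that the two approaches really agree, one option is to build both DataFrames side by side and compare them. This is only a sketch; the names df_from_tuples and df_from_columns are chosen here for illustration.

```python
import pandas as pd

# Approach 1: construct the MultiIndex first, then the DataFrame
index = pd.MultiIndex.from_tuples(
    [('A', 1), ('A', 2), ('B', 1), ('B', 2)],
    names=['Category', 'Number']
)
df_from_tuples = pd.DataFrame({'Value': [10, 20, 30, 40]}, index=index)

# Approach 2: start from plain columns and promote two of them to the index
df_from_columns = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Number': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40]
}).set_index(['Category', 'Number'])

# Compare the index labels and the values of both DataFrames
print(df_from_tuples.equals(df_from_columns))
print(list(df_from_tuples.index) == list(df_from_columns.index))
```

Note that set_index returns a new DataFrame by default, so chaining it as above avoids the need for inplace=True.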

If you already have a MultiIndex DataFrame, you can swap the order of its levels with the following code.

print(df.swaplevel())

The output:

                 Value
Number Category
1      A            10
2      A            20
1      B            30
2      B            40
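Notice that swapping the levels only changes their order; the rows stay where they were, so the new outer level is no longer sorted. A common follow-up, sketched below under the assumption that you want rows with the same Number grouped together, is to call sort_index after the swap.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Number': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40]
}).set_index(['Category', 'Number'])

# Swap the two levels, then sort so rows sharing a Number sit next to each other
swapped = df.swaplevel().sort_index()

print(swapped)
```

Both swaplevel and sort_index return new objects here, so the original df is left unchanged.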

We can also turn the MultiIndex levels back into regular columns with the following code:

print(df.reset_index())

The output:

  Category  Number  Value
0        A       1     10
1        A       2     20
2        B       1     30
3        B       2     40
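Resetting does not have to be all or nothing. As a small sketch, the level parameter of reset_index lets us move only one level back to a column while the other stays in the index:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Number': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40]
}).set_index(['Category', 'Number'])

# Move only the Number level back to a column; Category remains the index
partial = df.reset_index(level='Number')

print(partial)
```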

So, how do we access MultiIndex data in a Pandas DataFrame? We can use the .loc indexer for that. For example, we can select every row under the key 'A' in the first level.

print(df.loc['A']) 

The output:

        Value
Number
1          10
2          20

We can also access a specific row by passing a tuple with one key per level.

print(df.loc[('A', 1)])

The output:

Value    10
Name: (A, 1), dtype: int64
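The .loc examples above start from the first level. To select by the second level instead, one possible approach is the xs method, or a tuple with slice(None) standing in for "every key" on the first level. The sketch below assumes the same df as before and also shows how to pull out a single scalar value.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Number': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40]
}).set_index(['Category', 'Number'])

# All rows where the Number level equals 1; the Number level is dropped
by_number = df.xs(1, level='Number')
print(by_number)

# The same rows, but keeping both index levels
kept = df.loc[(slice(None), 1), :]
print(kept)

# A single cell: row ('A', 1), column 'Value'
single = df.loc[('A', 1), 'Value']
print(single)
```

Slicing with a tuple like this works most reliably when the index is sorted, which is the case for this example.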

Lastly, we can perform statistical aggregation on a MultiIndex by grouping on one of its levels with the .groupby method.

print(df.groupby(level=['Category']).sum())

The output:

          Value
Category
A            30
B            70
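The same pattern works for any level and any aggregation. As an illustrative sketch, here we group by the Number level and take the mean instead of the sum:

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B'],
    'Number': [1, 2, 1, 2],
    'Value': [10, 20, 30, 40]
}).set_index(['Category', 'Number'])

# Average Value for each Number, across all categories
mean_by_number = df.groupby(level='Number').mean()

print(mean_by_number)
```

Number 1 averages the values 10 and 30, and Number 2 averages 20 and 40.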

Mastering MultiIndex in Pandas allows you to gain insight into hierarchical data.

Additional Resources

  • Pandas: How to Remove MultiIndex in Pivot Table
  • 7 Steps to Mastering Data Cleaning with Python and Pandas
  • How to Index, Slice and Reshape NumPy Arrays for Machine Learning

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.
