How to Use the pivot_table Function for Advanced Data Summarization in Pandas


This guide walks through how to use the Pandas pivot_table function to summarize your data.

Preparation

Let's start with installing the necessary packages.

pip install pandas seaborn

Then, we load the packages and the example dataset, Titanic.

import pandas as pd
import seaborn as sns

titanic = sns.load_dataset('titanic')

With the packages installed and the dataset loaded, let's move on to the next section.

Pivot Table with Pandas

Pivot tables in Pandas allow for flexible data reorganization and analysis. Let's examine some practical applications, starting with a simple one: the mean age for each combination of passenger class and sex.

pivot = pd.pivot_table(titanic, values='age', index='class', columns='sex', aggfunc='mean')
print(pivot)
Output>>>
sex        female       male
class
First   34.611765  41.281386
Second  28.722973  30.740707
Third   21.750000  26.507589

The resulting pivot table displays average ages, with passenger classes on the vertical axis and gender categories across the top.
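To see what this kind of pivot table is doing, it can help to rebuild it by hand. The sketch below is only an illustration: it uses a small made-up DataFrame (not the Titanic data) and assumes that grouping by both keys, averaging, and unstacking one key into the columns is a fair mental model of the result.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic columns used above.
df = pd.DataFrame({
    "class": ["First", "First", "First", "Second", "Second", "Third", "Third", "Third"],
    "sex": ["female", "female", "male", "female", "male", "female", "male", "male"],
    "age": [30.0, 40.0, 50.0, 20.0, 30.0, 10.0, 20.0, 40.0],
})

# Pivot table: mean age for every (class, sex) cell.
pivot = pd.pivot_table(df, values="age", index="class", columns="sex", aggfunc="mean")

# The same summary built step by step: group by both keys, average,
# then move 'sex' from the row index into the columns.
by_hand = df.groupby(["class", "sex"])["age"].mean().unstack("sex")

print(pivot)
print(by_hand)
```

Both printed tables should show the same cell values, which is a handy way to sanity-check a pivot table when the result looks surprising.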

We can go even further by passing a list to aggfunc, which calculates both the mean and the sum of fares in one table.

pivot = pd.pivot_table(titanic, values='fare', index='class', columns='sex', aggfunc=['mean', 'sum'])
print(pivot)
Output>>>
              mean                   sum
sex         female       male     female       male
class
First   106.125798  67.226127  9975.8250  8201.5875
Second   21.970121  19.741782  1669.7292  2132.1125
Third    16.118810  12.661633  2321.1086  4393.5865
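Passing a list of aggregations produces two-level column labels, with the aggregation name on top and the sex below, as the output above shows. Here is a minimal sketch of how such a table could be sliced or flattened afterwards. The toy DataFrame and the flattened column names (such as mean_female) are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; the fares are invented for the example.
df = pd.DataFrame({
    "class": ["First", "First", "First", "Second", "Second", "Third", "Third"],
    "sex": ["female", "female", "male", "female", "male", "female", "male"],
    "fare": [100.0, 120.0, 60.0, 20.0, 18.0, 15.0, 12.0],
})

pivot = pd.pivot_table(df, values="fare", index="class", columns="sex", aggfunc=["mean", "sum"])

# Select every column under the top-level 'mean' label.
mean_only = pivot["mean"]

# Select a single column with a (aggregation, sex) tuple.
female_sum = pivot[("sum", "female")]

# Flatten the two-level labels into plain strings on a copy.
flat = pivot.copy()
flat.columns = [f"{agg}_{sex}" for agg, sex in pivot.columns]

print(mean_only)
print(female_sum)
print(flat.columns.tolist())
```

Flattening is optional, but single-level column names are often easier to work with when exporting the summary or merging it with other tables.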

We can also pass our own function to aggfunc. For example, we create a function that takes the difference between the maximum and minimum values and divides it by two.

def data_div_two(x):
    return (x.max() - x.min())/2

pivot = pd.pivot_table(titanic, values='age', index='class', columns='sex', aggfunc=data_div_two)
print(pivot)
Output>>>
sex     female    male
class
First   30.500  39.540
Second  27.500  34.665
Third   31.125  36.790
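A custom aggregation function receives the values of one cell at a time as a Series, so it can be checked by calling it directly on a filtered column. The sketch below does this on a small made-up DataFrame that includes one missing age. Series.max() and Series.min() skip missing values by default, so the missing age does not affect the result.

```python
import numpy as np
import pandas as pd

def data_div_two(x):
    # x is the Series of values belonging to one (class, sex) cell.
    return (x.max() - x.min()) / 2

# Hypothetical toy data with one missing age in the First/female cell.
df = pd.DataFrame({
    "class": ["First", "First", "First", "First", "First",
              "Second", "Second", "Second", "Second"],
    "sex": ["female", "female", "female", "male", "male",
            "female", "female", "male", "male"],
    "age": [2.0, 63.0, np.nan, 20.0, 60.0, 30.0, 40.0, 25.0, 35.0],
})

pivot = pd.pivot_table(df, values="age", index="class", columns="sex", aggfunc=data_div_two)

# Check one cell by applying the function to the matching rows directly.
mask = (df["class"] == "First") & (df["sex"] == "female")
direct = data_div_two(df.loc[mask, "age"])

print(pivot)
print(direct)
```

Testing the function on a single group like this is a quick way to confirm that it returns one number per cell, which is what pivot_table expects from an aggregation.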

Lastly, you can set margins=True to add an "All" row and column, which lets you compare each sub-group average with the overall average.

pivot = pd.pivot_table(titanic, values='age', index='class', columns='sex', aggfunc='mean', margins=True)
print(pivot)
Output>>>
sex        female       male        All
class
First   34.611765  41.281386  38.233441
Second  28.722973  30.740707  29.877630
Third   21.750000  26.507589  25.140620
All     27.915709  30.726645  29.699118
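One detail worth noticing in the output above is that the "All" values are not simple averages of the neighbouring cells. For example, First class shows 38.23, while the plain average of 34.61 and 41.28 would be about 37.95. The margins are calculated from the underlying rows, so larger groups carry more weight. The sketch below illustrates this on a small made-up DataFrame.

```python
import pandas as pd

# Hypothetical toy data: First class has two women and one man.
df = pd.DataFrame({
    "class": ["First", "First", "First", "Second", "Second", "Third", "Third", "Third"],
    "sex": ["female", "female", "male", "female", "male", "female", "male", "male"],
    "age": [30.0, 40.0, 50.0, 20.0, 30.0, 10.0, 20.0, 40.0],
})

pivot = pd.pivot_table(df, values="age", index="class", columns="sex",
                       aggfunc="mean", margins=True)

# Margin for First class, as computed by pivot_table.
first_all = pivot.loc["First", "All"]

# Mean of the three First class rows taken straight from the data.
first_rows_mean = df.loc[df["class"] == "First", "age"].mean()

# Plain average of the two First class cell means, for comparison.
first_mean_of_means = pivot.loc["First", ["female", "male"]].mean()

print(pivot)
print(first_all, first_rows_mean, first_mean_of_means)
```

Here the margin matches the row-level mean rather than the average of the two cell means, so the "All" column reflects how many passengers fall into each sub-group.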

Mastering the pivot_table function will help you extract insights from your dataset.

Additional Resources

  • 7 Steps to Mastering Data Wrangling with Pandas and Python
  • 10 Essential Pandas Functions Every Data Scientist Should Know
  • Massaging Data Using Pandas

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.
