
Let me guide you on how to use the Pandas pivot_table function for your data summarization.
Preparation
Let's start with installing the necessary packages.
pip install pandas seaborn
Next, we import the packages and load the example dataset, Titanic.
import pandas as pd
import seaborn as sns

titanic = sns.load_dataset('titanic')
With the packages installed and the dataset loaded, let's move on to the next section.
Pivot Table with Pandas
Pivot tables in Pandas allow for flexible data reorganization and analysis. Let's examine some practical applications, starting with a simple one: the average age for each combination of passenger class and sex.
pivot = pd.pivot_table(titanic, values='age', index='class', columns='sex', aggfunc='mean')
print(pivot)

Output>>>
sex        female       male
class
First   34.611765  41.281386
Second  28.722973  30.740707
Third   21.750000  26.507589
The resulting pivot table displays average ages, with passenger classes on the vertical axis and gender categories across the top.
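To make the mechanics concrete, here is a minimal sketch showing that this kind of pivot table gives the same numbers as a groupby followed by unstack. The `toy` DataFrame is hypothetical data made up for illustration, not the Titanic dataset, so the values are easy to check by hand.

```python
import pandas as pd

# Hypothetical toy data standing in for the Titanic columns used above
toy = pd.DataFrame({
    "class": ["First", "First", "First", "Second", "Second", "Second"],
    "sex": ["female", "male", "male", "female", "female", "male"],
    "age": [30.0, 40.0, 50.0, 20.0, 24.0, 36.0],
})

# Pivot table: one row per class, one column per sex, mean age in each cell
pivot = pd.pivot_table(toy, values="age", index="class", columns="sex", aggfunc="mean")

# Same result by hand: group on both keys, average, then move 'sex' into the columns
grouped = toy.groupby(["class", "sex"])["age"].mean().unstack("sex")

print(pivot)
print(grouped)
```

Both tables hold 45.0 for First/male (the mean of 40 and 50) and 22.0 for Second/female (the mean of 20 and 24).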
We can go even further and pass a list to aggfunc, for example to calculate both the mean and the sum of fares.
pivot = pd.pivot_table(titanic, values='fare', index='class', columns='sex', aggfunc=['mean', 'sum'])
print(pivot)

Output>>>
              mean                   sum
sex         female       male     female       male
class
First   106.125798  67.226127  9975.8250  8201.5875
Second   21.970121  19.741782  1669.7292  2132.1125
Third    16.118810  12.661633  2321.1086  4393.5865
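With more than one aggregation, the columns become a two-level MultiIndex (aggregation name first, then sex), as the output above shows. The sketch below illustrates a few ways you might pull values out of such a table. It uses a hypothetical `toy` DataFrame rather than the Titanic data, and the flattened column names are just one possible convention.

```python
import pandas as pd

# Hypothetical toy data; fares are made up so the results can be checked by hand
toy = pd.DataFrame({
    "class": ["First", "First", "First", "Second", "Second", "Second"],
    "sex": ["female", "female", "male", "female", "male", "male"],
    "fare": [100.0, 80.0, 60.0, 20.0, 10.0, 14.0],
})

pivot = pd.pivot_table(toy, values="fare", index="class", columns="sex",
                       aggfunc=["mean", "sum"])

# Select the whole 'mean' block: a plain table with female/male columns
mean_table = pivot["mean"]

# Select a single cell with a (aggregation, sex) tuple
first_female_sum = pivot.loc["First", ("sum", "female")]

# Flatten the two column levels into single names such as 'mean_female'
flat = pivot.copy()
flat.columns = ["_".join(col) for col in flat.columns]

print(mean_table)
print(first_female_sum)
print(flat.columns.tolist())
```

Flattening the columns can be handy before exporting the table to a CSV file or merging it with another DataFrame.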
We can also pass our own function to aggfunc. For example, we create a function that takes the difference between the maximum and minimum values in each group and divides it by two.
def data_div_two(x):
    return (x.max() - x.min())/2

pivot = pd.pivot_table(titanic, values='age', index='class', columns='sex', aggfunc=data_div_two)
print(pivot)

Output>>>
sex     female    male
class
First   30.500  39.540
Second  27.500  34.665
Third   31.125  36.790
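To see what the custom function receives, it helps to run it on data small enough to verify by hand. In this sketch, the `toy` DataFrame and the name `half_range` are hypothetical; the function itself mirrors data_div_two above and is called once per class/sex group with that group's ages as a Series.

```python
import pandas as pd

# Hypothetical toy data; ages chosen so each half-range is easy to compute
toy = pd.DataFrame({
    "class": ["First", "First", "First", "First", "First", "Second", "Second", "Second"],
    "sex": ["female", "female", "male", "male", "male", "female", "female", "male"],
    "age": [20.0, 60.0, 30.0, 40.0, 50.0, 10.0, 18.0, 25.0],
})

def half_range(x):
    # x is a pandas Series holding the ages of one (class, sex) group
    return (x.max() - x.min()) / 2

pivot = pd.pivot_table(toy, values="age", index="class", columns="sex",
                       aggfunc=half_range)
print(pivot)

# First/female: (60 - 20) / 2 = 20.0
# First/male:   (50 - 30) / 2 = 10.0
# Second/male has a single passenger, so max equals min and the result is 0.0
```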
Lastly, you can set margins=True to add an All row and an All column, which lets you compare each sub-group's average with the overall average.
pivot = pd.pivot_table(titanic, values='age', index='class', columns='sex', aggfunc='mean', margins=True)
print(pivot)

Output>>>
sex        female       male        All
class
First   34.611765  41.281386  38.233441
Second  28.722973  30.740707  29.877630
Third   21.750000  26.507589  25.140620
All     27.915709  30.726645  29.699118
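One detail worth checking is how the All values are computed. The sketch below, on a hypothetical `toy` DataFrame, suggests that each margin is the aggregate over the underlying rows rather than the average of the cell averages, so larger groups weigh more.

```python
import pandas as pd

# Hypothetical toy data with unequal group sizes
toy = pd.DataFrame({
    "class": ["First", "First", "First", "Second", "Second", "Second"],
    "sex": ["female", "male", "male", "female", "female", "male"],
    "age": [30.0, 40.0, 50.0, 20.0, 24.0, 36.0],
})

pivot = pd.pivot_table(toy, values="age", index="class", columns="sex",
                       aggfunc="mean", margins=True)
print(pivot)

# Margin for First class: mean of all three First-class ages (30, 40, 50) = 40.0
first_all = pivot.loc["First", "All"]

# Averaging the two cell means instead gives (30 + 45) / 2 = 37.5
mean_of_cell_means = pivot.loc["First", ["female", "male"]].mean()

# The bottom-right cell matches the mean of the whole age column
grand_mean = toy["age"].mean()

print(first_all, mean_of_cell_means, grand_mean)
```

If the label All clashes with a real category in your data, the margins_name parameter of pivot_table lets you choose a different one.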
Mastering the pivot_table function will help you draw insights from your dataset.
Additional Resources
- 7 Steps to Mastering Data Wrangling with Pandas and Python
- 10 Essential Pandas Functions Every Data Scientist Should Know
- Massaging Data Using Pandas
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.