How to Use Conditional Formatting in Pandas to Enhance Data Visualization


While pandas is mainly used for data manipulation and analysis, it can also provide basic data visualization capabilities. However, plain dataframes can make the information look cluttered and overwhelming. So, what can be done to make it better? If you've worked with Excel before, you know that you can highlight important values with different colors, font styles, etc. The idea of using these styles and colors is to communicate the information in an effective way. You can do similar work with pandas dataframes too, using conditional formatting and the Styler object.

In this article, we will see what conditional formatting is and how to use it to enhance your data readability.

Conditional Formatting

Conditional formatting is a feature in pandas that allows you to format the cells based on some criteria. You can easily highlight the outliers, visualize trends, or emphasize important data points using it. The Styler object in pandas provides a convenient way to apply conditional formatting. Before covering the examples, let's take a quick look at how the Styler object works.

What is the Styler Object & How Does It Work?

You can control the visual representation of a dataframe through its style property (df.style). This property returns a Styler object, which is responsible for styling the dataframe. The Styler object lets you set the CSS properties applied to the dataframe's cells to create a visually appealing and informative display. The generic syntax is as follows:

df.style.<method>(<arguments>)

Where <method> is the specific formatting function you want to apply, and <arguments> are the parameters required by that function. The Styler object returns the formatted dataframe without changing the original one. There are two approaches to using conditional formatting with the Styler object:

Built-in Styles: To apply quick formatting styles to your dataframe
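Custom Stylization: Create your own formatting rules and pass them to the Styler object through one of the following methods: Styler.applymap (element-wise) or Styler.apply (column-, row-, or table-wise). Note that Styler.applymap has been deprecated since pandas 2.1 in favor of the equivalent Styler.map.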

Now, we will cover some examples of both approaches to help you enhance the visualization of your data.

Examples: Built-in Styles

Let’s create a dummy dataset with columns for Date, Cost Price, Satisfaction Score, and Sales Amount to demonstrate the examples below:

import pandas as pd
import numpy as np

data = {'Date': ['2024-03-05', '2024-03-06', '2024-03-07', '2024-03-08', '2024-03-09', '2024-03-10'],
        'Cost Price': [100, 120, 110, 1500, 1600, 1550],
        'Satisfaction Score': [90, 80, 70, 95, 85, 75],
        'Sales Amount': [1000, 800, 1200, 900, 1100, None]}

df = pd.DataFrame(data)
df

Output:

Original Unformatted Dataframe

1. Highlighting Maximum and Minimum Values

We can use the highlight_max and highlight_min methods to highlight the maximum and minimum values in each column or row. To compare values within each column, set axis=0 (axis=1 compares within each row) like this:

# Highlighting Maximum and Minimum Values
(
    df.style
    .highlight_max(color='green', axis=0, subset=['Cost Price', 'Satisfaction Score', 'Sales Amount'])
    .highlight_min(color='red', axis=0, subset=['Cost Price', 'Satisfaction Score', 'Sales Amount'])
)

Output:

Max & Min Values

2. Applying Color Gradients

Color gradients are an effective way to visualize the values in your data. In this case, we will apply the gradient to satisfaction scores using the colormap set to 'viridis'. This is a type of color coding that ranges from purple (low values) to yellow (high values). Note that background_gradient requires matplotlib to be installed. Here is how you can do this:

# Applying Color Gradients
df.style.background_gradient(cmap='viridis', subset=['Satisfaction Score'])

Output:

Colormap - viridis

3. Highlighting Null or Missing Values

When we have large datasets, it becomes difficult to identify null or missing values. The built-in df.style.highlight_null method handles this. For example, in this case, the sales amount of the 6th entry is missing. You can highlight it like this:

# Highlighting Null or Missing Values
df.style.highlight_null('yellow', subset=['Sales Amount'])

Output:

Highlighting Missing Values

Examples: Custom Stylization Using `apply()` & `applymap()`

1. Conditional Formatting for Outliers

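Suppose that we have a housing dataset with a price for each neighborhood, and we want to highlight the outlier prices (i.e., prices that are significantly higher or lower than the rest). The code uses a common rule of thumb: a value is an outlier if it lies more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. This can be done as follows: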

import pandas as pd
import numpy as np

# House prices dataset
df = pd.DataFrame({
    'Neighborhood': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'H7'],
    'Price': [50, 300, 360, 390, 420, 450, 1000],
})

# Calculate Q1 (25th percentile), Q3 (75th percentile) and Interquartile Range (IQR)
q1 = df['Price'].quantile(0.25)
q3 = df['Price'].quantile(0.75)
iqr = q3 - q1

# Bounds for outliers
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Custom function to highlight outliers
def highlight_outliers(val):
    if val < lower_bound or val > upper_bound:
        return 'background-color: yellow; font-weight: bold; color: black'
    else:
        return ''

df.style.applymap(highlight_outliers, subset=['Price'])

Output:

Highlighting Outliers
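To see why only the cheapest and the most expensive neighborhoods are flagged, it can help to work through the numbers. The sketch below repeats the quartile arithmetic on the same prices without any styling, assuming pandas' default linear interpolation for quantiles.

```python
import pandas as pd

prices = pd.Series([50, 300, 360, 390, 420, 450, 1000], name="Price")

q1 = prices.quantile(0.25)    # halfway between 300 and 360 -> 330.0
q3 = prices.quantile(0.75)    # halfway between 420 and 450 -> 435.0
iqr = q3 - q1                 # 105.0

lower_bound = q1 - 1.5 * iqr  # 330 - 157.5 = 172.5
upper_bound = q3 + 1.5 * iqr  # 435 + 157.5 = 592.5

# Keep only the prices that fall outside the bounds
outliers = prices[(prices < lower_bound) | (prices > upper_bound)]
print(outliers.tolist())      # [50, 1000]
```

Every price between 172.5 and 592.5 is treated as normal, which is why the five middle neighborhoods stay unstyled.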

2. Highlighting Trends

Consider that you run a company and are recording your sales daily. To analyze the trends, you want to highlight the days when your sales increase by more than 5% compared with the previous day. You can achieve this using a custom function and the apply method of the Styler object. Here’s how:

import pandas as pd

# Dataset of Company's Sales
data = {'date': ['2024-02-10', '2024-02-11', '2024-02-12', '2024-02-13', '2024-02-14'],
        'sales': [100, 105, 110, 115, 125]}

df = pd.DataFrame(data)

# Daily percentage change
df['pct_change'] = df['sales'].pct_change() * 100

# Highlight the day if sales increased by more than 5%
def highlight_trend(row):
    return ['background-color: green; border: 2px solid black; font-weight: bold' if row['pct_change'] > 5 else '' for _ in row]

df.style.apply(highlight_trend, axis=1)

Output:

Highlight >5% Increase in Sales

3. Highlighting Correlated Columns

Correlated columns are important because they show relationships between different variables. For example, if we have a dataset containing age, income, and spending habits and our analysis shows a high correlation (close to 1) between age and income, then it suggests that older people generally have higher incomes. Highlighting correlated columns helps to visually identify these relationships. This approach becomes extremely helpful as the dimensionality of your data increases. Let's explore an example to better understand this concept:

import pandas as pd

# Dataset of people
data = {
    'age': [30, 35, 40, 45, 50],
    'income': [60000, 66000, 70000, 75000, 100000],
    'spending': [10000, 15000, 20000, 18000, 12000]
}

df = pd.DataFrame(data)

# Calculate the correlation matrix
corr_matrix = df.corr()

# Highlight highly correlated columns
def highlight_corr(val):
    if val != 1.0 and abs(val) > 0.5:   # Exclude self-correlation
        return 'background-color: blue; text-decoration: underline'
    else:
        return ''

corr_matrix.style.applymap(highlight_corr)

Output:

Correlated Columns

Wrapping Up

These examples are only a starting point for improving how your data looks. You can apply similar techniques to many other problems, such as highlighting duplicate rows, grouping values into categories and choosing a different format for each category, or highlighting peak values. Additionally, there are many other CSS options you can explore in the official documentation. You can even define different properties on hover, like magnifying text or changing color. Check out the "Fun Stuff" section there for more cool ideas. This article is part of my Pandas series, so if you enjoyed this, there's plenty more to explore. Head over to my author page for more tips, tricks, and tutorials.

Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook "Maximizing Productivity with ChatGPT". As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She's also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.
