How to Use Conditional Formatting in Pandas to Enhance Data Visualization

Image by Author | DALLE-3 & Canva

While pandas is mainly used for data manipulation and analysis, it can also provide basic data visualization capabilities. However, plain dataframes can make the information look cluttered and overwhelming. So, what can be done to make it better? If you've worked with Excel before, you know that you can highlight important values with different colors, font styles, etc. The idea of using these styles and colors is to communicate the information in an effective way. You can do similar work with pandas dataframes too, using conditional formatting and the Styler object.

In this article, we will see what conditional formatting is and how to use it to enhance your data readability.

Conditional Formatting

Conditional formatting is a feature in pandas that allows you to format the cells based on some criteria. You can easily highlight the outliers, visualize trends, or emphasize important data points using it. The Styler object in pandas provides a convenient way to apply conditional formatting. Before covering the examples, let's take a quick look at how the Styler object works.

What is the Styler Object & How Does It Work?

You can control the visual representation of the dataframe by using the df.style property. This property returns a Styler object, which is responsible for styling the dataframe. The Styler object allows you to manipulate the CSS properties of the dataframe to create a visually appealing and informative display. The generic syntax is as follows:

df.style.<method>(<arguments>)

Where <method> is the specific formatting function you want to apply, and <arguments> are the parameters required by that function. The Styler object returns the formatted dataframe without changing the original one. There are two approaches to using conditional formatting with the Styler object:

  • Built-in Styles: To apply quick formatting styles to your dataframe
  • Custom Stylization: Create your own formatting rules and pass them to the Styler object through one of the following methods: Styler.applymap (element-wise; renamed to Styler.map in pandas 2.1, where applymap is deprecated) or Styler.apply (column-, row-, or table-wise)

Now, we will cover some examples of both approaches to help you enhance the visualization of your data.

Examples: Built-in Styles

Let's create a dummy dataset with columns for Date, Cost Price, Satisfaction Score, and Sales Amount to demonstrate the examples below:

import pandas as pd
import numpy as np

data = {'Date': ['2024-03-05', '2024-03-06', '2024-03-07', '2024-03-08', '2024-03-09', '2024-03-10'],
        'Cost Price': [100, 120, 110, 1500, 1600, 1550],
        'Satisfaction Score': [90, 80, 70, 95, 85, 75],
        'Sales Amount': [1000, 800, 1200, 900, 1100, None]}

df = pd.DataFrame(data)
df

Output:

Original Unformatted Dataframe

1. Highlighting Maximum and Minimum Values

We can use the highlight_max and highlight_min methods to highlight the maximum and minimum values in each column or row. To compare values within each column, set axis=0 like this:

# Highlighting Maximum and Minimum Values
numeric_cols = ['Cost Price', 'Satisfaction Score', 'Sales Amount']
df.style.highlight_max(color='green', axis=0, subset=numeric_cols).highlight_min(color='red', axis=0, subset=numeric_cols)

Output:

Max & Min Values

2. Applying Color Gradients

Color gradients are an effective way to visualize the values in your data. In this case, we will apply the gradient to satisfaction scores using the colormap set to 'viridis'. This is a type of color coding that ranges from purple (low values) to yellow (high values). Note that background_gradient requires matplotlib to be installed. Here is how you can do this:

# Applying Color Gradients
df.style.background_gradient(cmap='viridis', subset=['Satisfaction Score'])

Output:

Colormap – viridis

3. Highlighting Null or Missing Values

In large datasets, null or missing values are difficult to spot. The built-in df.style.highlight_null method makes them stand out. For example, in this case, the sales amount of the 6th entry is missing. You can highlight it like this:

# Highlighting Null or Missing Values
df.style.highlight_null('yellow', subset=['Sales Amount'])

Output:

Highlighting Missing Values

Examples: Custom Stylization Using apply() & applymap()

1. Conditional Formatting for Outliers

Suppose that we have a dataset of house prices by neighborhood, and we want to highlight the outliers (i.e., prices that are significantly higher or lower than those in the other neighborhoods). The rule used here flags any price more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile:

import pandas as pd
import numpy as np

# House prices dataset
df = pd.DataFrame({
    'Neighborhood': ['H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'H7'],
    'Price': [50, 300, 360, 390, 420, 450, 1000],
})

# Calculate Q1 (25th percentile), Q3 (75th percentile) and Interquartile Range (IQR)
q1 = df['Price'].quantile(0.25)
q3 = df['Price'].quantile(0.75)
iqr = q3 - q1

# Bounds for outliers
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

# Custom function to highlight outliers
def highlight_outliers(val):
    if val < lower_bound or val > upper_bound:
        return 'background-color: yellow; font-weight: bold; color: black'
    else:
        return ''

# On pandas 2.1+, Styler.map is the preferred name for Styler.applymap
df.style.applymap(highlight_outliers, subset=['Price'])

Output:

Highlighting Outliers

2. Highlighting Trends

Consider that you run a company and are recording your sales daily. To analyze the trends, you want to highlight the days when your daily sales increase by more than 5%. You can achieve this using a custom function and the apply method of the Styler object. Here's how:

import pandas as pd

# Dataset of Company's Sales
data = {'date': ['2024-02-10', '2024-02-11', '2024-02-12', '2024-02-13', '2024-02-14'],
        'sales': [100, 105, 110, 115, 125]}

df = pd.DataFrame(data)

# Daily percentage change
df['pct_change'] = df['sales'].pct_change() * 100

# Highlight the day if sales increased by more than 5%
def highlight_trend(row):
    return ['background-color: green; border: 2px solid black; font-weight: bold' if row['pct_change'] > 5 else '' for _ in row]

df.style.apply(highlight_trend, axis=1)

Output:

Highlight >5% Increase in Sales

3. Highlighting Correlated Columns

Correlated columns are important because they show relationships between different variables. For example, if we have a dataset containing age, income, and spending habits and our analysis shows a high correlation (close to 1) between age and income, then it suggests that older people generally have higher incomes. Highlighting correlated columns helps to visually identify these relationships. This approach becomes extremely helpful as the dimensionality of your data increases. Let's explore an example to better understand this concept:

import pandas as pd

# Dataset of people
data = {
    'age': [30, 35, 40, 45, 50],
    'income': [60000, 66000, 70000, 75000, 100000],
    'spending': [10000, 15000, 20000, 18000, 12000]
}

df = pd.DataFrame(data)

# Calculate the correlation matrix
corr_matrix = df.corr()

# Highlight highly correlated columns
def highlight_corr(val):
    if val != 1.0 and abs(val) > 0.5:   # Exclude self-correlation
        return 'background-color: blue; text-decoration: underline'
    else:
        return ''

# On pandas 2.1+, Styler.map is the preferred name for Styler.applymap
corr_matrix.style.applymap(highlight_corr)

Output:

Correlated Columns

Wrapping Up

These examples are only a starting point for improving how you present data in pandas. You can apply similar techniques to other problems, such as highlighting duplicate rows, grouping data into categories and formatting each category differently, or highlighting peak values. There are also many other CSS options to explore in the official documentation; you can even define different properties on hover, like magnifying text or changing color. Check out the "Fun Stuff" section for more cool ideas. This article is part of my Pandas series, so if you enjoyed this, there's plenty more to explore. Head over to my author page for more tips, tricks, and tutorials.

Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook "Maximizing Productivity with ChatGPT". As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She's also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.
