How to Handle Time Zones and Timestamps Accurately with Pandas


Time-based data gets tricky when it spans different time zones, because the same clock reading can refer to different moments depending on where it was recorded. This guide shows how to manage time zones and timestamps with the Pandas library in Python.

Preparation

In this tutorial, we'll use the Pandas package. We can install it with the following command.

pip install pandas

Now, we'll explore how to work with time-based data in Pandas with practical examples.

Handling Time Zones and Timestamps with Pandas

Time data provides a time-specific reference for events. Its most precise form is the timestamp, which records a moment from the year down to fractions of a second (Pandas timestamps support up to nanosecond precision).

Let's start by creating a sample dataset.

import pandas as pd

data = {
    'transaction_id': [1, 2, 3],
    'timestamp': ['2023-06-15 12:00:05', '2024-04-15 15:20:02', '2024-06-15 21:17:43'],
    'amount': [100, 200, 150]
}

df = pd.DataFrame(data)
df['timestamp'] = pd.to_datetime(df['timestamp'])

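The 'timestamp' column in the example above contains time data with second-level precision. It starts out as plain strings, so we use the pd.to_datetime function to convert it to a datetime type. At this point the values are timezone-naive: they carry no information about which time zone they belong to.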
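Real data is rarely as clean as the sample above. As a minimal sketch, the snippet below parses a small hypothetical series (the 'not a date' entry is made up for illustration) with an explicit format, and uses errors='coerce' so that entries that cannot be parsed become NaT instead of raising an exception.

```python
import pandas as pd

# Hypothetical raw input: two valid timestamps and one bad entry
raw = pd.Series(['2023-06-15 12:00:05', '2024-04-15 15:20:02', 'not a date'])

# An explicit format avoids guessing; errors='coerce' maps failures to NaT
parsed = pd.to_datetime(raw, format='%Y-%m-%d %H:%M:%S', errors='coerce')

print(parsed)
print(parsed.dt.tz)          # None, because the values are timezone-naive
print(parsed.isna().sum())   # how many entries failed to parse
```

Checking for NaT right after parsing is a cheap way to catch malformed rows before any time zone logic is applied.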

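Afterward, we can make the datetime data timezone-aware. For example, if the timestamps were recorded in Coordinated Universal Time (UTC), we can label them as UTC with tz_localize. This attaches a time zone to the existing values without shifting the clock time.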

df['timestamp_utc'] = df['timestamp'].dt.tz_localize('UTC')
print(df)

Output>>
   transaction_id           timestamp  amount             timestamp_utc
0               1 2023-06-15 12:00:05     100 2023-06-15 12:00:05+00:00
1               2 2024-04-15 15:20:02     200 2024-04-15 15:20:02+00:00
2               3 2024-06-15 21:17:43     150 2024-06-15 21:17:43+00:00
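Which zone we pass to tz_localize matters, because it decides which moment the naive clock reading refers to. The sketch below is illustrative only: it takes one timestamp from the sample data and localizes it twice, once assuming it was recorded in UTC and once assuming it was recorded in Tokyo local time (the second assumption is hypothetical and not part of the original dataset).

```python
import pandas as pd

naive = pd.Series(pd.to_datetime(['2024-06-15 21:17:43']))

# Assumption A: the value was recorded in UTC
as_utc = naive.dt.tz_localize('UTC')

# Assumption B (hypothetical): the value was recorded in Tokyo local time
as_tokyo = naive.dt.tz_localize('Asia/Tokyo')

# Same clock reading, different moments. Put both on UTC to compare them.
gap = as_utc - as_tokyo.dt.tz_convert('UTC')

print(as_utc.iloc[0])    # 2024-06-15 21:17:43+00:00
print(as_tokyo.iloc[0])  # 2024-06-15 21:17:43+09:00
print(gap.iloc[0])       # 0 days 09:00:00
```

Localizing with the wrong zone silently shifts every event by the zone's offset, so it is worth confirming how the source system recorded its timestamps.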

The 'timestamp_utc' values now carry a UTC offset (+00:00). Once a column is timezone-aware, we can convert it to another time zone with tz_convert. For example, we can take the UTC column and convert it to Japan time (Asia/Tokyo).

df['timestamp_japan'] = df['timestamp_utc'].dt.tz_convert('Asia/Tokyo')
print(df)

Output>>
   transaction_id           timestamp  amount             timestamp_utc
0               1 2023-06-15 12:00:05     100 2023-06-15 12:00:05+00:00
1               2 2024-04-15 15:20:02     200 2024-04-15 15:20:02+00:00
2               3 2024-06-15 21:17:43     150 2024-06-15 21:17:43+00:00

            timestamp_japan
0 2023-06-15 21:00:05+09:00
1 2024-04-16 00:20:02+09:00
2 2024-06-16 06:17:43+09:00
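A conversion with tz_convert changes how a moment is displayed, not the moment itself. As a small sketch using one timestamp from the sample data, the snippet below shows that the UTC and Tokyo values compare as equal even though their local calendar dates differ.

```python
import pandas as pd

utc = pd.Series(pd.to_datetime(['2024-04-15 15:20:02'])).dt.tz_localize('UTC')
tokyo = utc.dt.tz_convert('Asia/Tokyo')

print(tokyo.iloc[0])                 # 2024-04-16 00:20:02+09:00
print(utc.iloc[0] == tokyo.iloc[0])  # True, both describe the same moment

# The local calendar date depends on the zone used for display
print(utc.dt.date.iloc[0])           # 2024-04-15
print(tokyo.dt.date.iloc[0])         # 2024-04-16
```

This is why grouping or filtering by date should happen after converting to the time zone that matters for the analysis.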

With the converted column, we can filter the data by local time. For example, we can select the transactions that fall within a window expressed in Japan time.

start_time_japan = pd.Timestamp('2024-06-15 06:00:00', tz='Asia/Tokyo')
end_time_japan = pd.Timestamp('2024-06-16 07:59:59', tz='Asia/Tokyo')

filtered_df = df[(df['timestamp_japan'] >= start_time_japan) & (df['timestamp_japan'] <= end_time_japan)]

print(filtered_df)

Output>>
   transaction_id           timestamp  amount             timestamp_utc
2               3 2024-06-15 21:17:43     150 2024-06-15 21:17:43+00:00

            timestamp_japan
2 2024-06-16 06:17:43+09:00
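The window bounds do not have to be written in Japan time. As a sketch, the snippet below rebuilds the sample data, defines the same window in UTC, converts the bounds to Asia/Tokyo, and filters with Series.between, which includes both endpoints by default. The variable names here are illustrative choices, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_id': [1, 2, 3],
    'timestamp': pd.to_datetime(
        ['2023-06-15 12:00:05', '2024-04-15 15:20:02', '2024-06-15 21:17:43']
    ),
})
df['timestamp_japan'] = (
    df['timestamp'].dt.tz_localize('UTC').dt.tz_convert('Asia/Tokyo')
)

# The same window as above, expressed in UTC and then converted
start = pd.Timestamp('2024-06-14 21:00:00', tz='UTC').tz_convert('Asia/Tokyo')
end = pd.Timestamp('2024-06-15 22:59:59', tz='UTC').tz_convert('Asia/Tokyo')

filtered = df[df['timestamp_japan'].between(start, end)]
print(filtered['transaction_id'].tolist())  # [3]
```

Because timezone-aware timestamps represent absolute moments, the filter selects the same rows whichever zone the bounds were first written in.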

With a datetime index, we can also resample the data. For example, we can set the Japan-time column as the index and count the non-null values in each column per hour.

# 'h' is the hourly alias; the uppercase 'H' is deprecated since pandas 2.2
resampled_df = df.set_index('timestamp_japan').resample('h').count()
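The time zone of the index determines where resampling bins start and end. The sketch below uses hypothetical data (three transactions moved onto the same UTC day, which is not the original dataset) to show that daily totals differ depending on whether days are counted in UTC or in Japan time.

```python
import pandas as pd

# Hypothetical data: all three transactions fall on 2024-06-15 in UTC
ts_utc = pd.to_datetime(
    ['2024-06-15 12:00:05', '2024-06-15 15:20:02', '2024-06-15 21:17:43']
).tz_localize('UTC')
amounts = pd.Series([100, 200, 150], index=ts_utc)

# Daily bins start at midnight in the time zone of the index
daily_utc = amounts.resample('D').sum()
daily_tokyo = amounts.tz_convert('Asia/Tokyo').resample('D').sum()

print(daily_utc)    # one day: 450
print(daily_tokyo)  # two days: 100 on 2024-06-15, 350 on 2024-06-16
```

In Japan time the last two transactions happen after midnight, so they land in the next day's bin. Converting to the relevant zone before resampling keeps daily reports aligned with the local business day.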

With these tools, localizing, converting, filtering, and resampling, Pandas lets us work with timestamps consistently across time zones.

Additional Resources

  • How to Identify Missing Data in Time-Series Datasets
  • Time Series Analysis: ARIMA Models in Python
  • Create a Time Series Ratio Analysis Dashboard

Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and data tips via social media and writing media. Cornellius writes on a variety of AI and machine learning topics.
