Handling Exceptions When Cleaning Date Data with Pandas Apply: A Deep Dive
In this article, we will explore the concept of exception handling when working with date formats using the Pandas library. We will delve into how to handle errors and exceptions that occur during data cleaning and processing.
Introduction
When working with dates, it is common to encounter invalid or malformed values that raise errors in our code. In this article, we discuss how to handle those exceptions when cleaning data, including when the conversion runs inside Pandas apply, a powerful tool for data manipulation and analysis.
Understanding the Problem
The problem at hand is to clean the date field in a list of dictionaries, using regular expressions to recognize each supported layout and convert it into a standard date object. However, some rows contain invalid or missing dates, which raise errors when we try to process them.
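To make the two failure modes concrete, here is a small sketch using only the standard library. The helper failure_kind is hypothetical, introduced only for illustration: it calls a function and reports the name of the exception it raises, if any. A malformed string makes datetime.strptime raise ValueError, while a missing value (None) makes re.fullmatch raise TypeError before any parsing happens.

```python
import re
from datetime import datetime
from typing import Any, Callable


def failure_kind(func: Callable[..., Any], *args: Any) -> str:
    """Hypothetical helper: call func(*args) and return the exception name, or 'ok'."""
    try:
        func(*args)
    except Exception as exc:  # deliberately broad: we only want to observe the type
        return type(exc).__name__
    return "ok"


# Month 13 does not exist, so strptime rejects the string
print(failure_kind(datetime.strptime, "2023-13-45", "%Y-%m-%d"))  # ValueError
# Regex functions only accept str or bytes, so None is rejected
print(failure_kind(re.fullmatch, r"\d{4}-\d{2}-\d{2}", None))  # TypeError
# A well-formed date parses without error
print(failure_kind(datetime.strptime, "2023-07-26", "%Y-%m-%d"))  # ok
```

Both exception types have to be handled if bad rows should be recorded rather than stop the whole run.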
Solution Overview
To solve this problem, we will create a function called clean_data that takes a list of dictionaries as input and returns two lists: the rows whose dates were converted successfully and the rows that failed. Exception handling lets us catch the error raised by a bad row, record that row, and carry on with the rest.
The conv_date Function
Before we dive into the solution, let’s take a look at the conv_date function, which matches a date string against a set of regular expressions and converts it into a datetime.date object using the corresponding strptime format:
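```python
import re
from datetime import date, datetime


def conv_date(dte: str) -> date:
    # Each regex describes one accepted layout; the value is the matching strptime format.
    # Raw strings keep "\d" and "\s" from being read as (invalid) string escapes.
    acceptable_mappings = {
        r"\d{4}-\d{2}-\d{2}": "%Y-%m-%d",
        r"\d{2}-\d{2}-\d{4}": "%d-%m-%Y",
        r"\d{4}/\d{2}/\d{2}": "%Y/%m/%d",
        r"\d{2}/\d{2}/\d{4}": "%d/%m/%Y",
        r"\d{8}": "%d%m%Y",
        r"\d{2}\s\d{2}\s\d{4}": "%d %m %Y",
        r"\d{4}-\d{2}-\d{2}\s\d{2}\:\d{2}\:\d{2}": "%Y-%m-%d %H:%M:%S",
        r"\d{4}-\d{2}-\d{2}\s\d{2}\D{1}\d{2}\D{1}\d{2}\s\w{3}": "%Y-%m-%d %H:%M:%S %Z",
    }
    for regex, fmt in acceptable_mappings.items():
        # fullmatch requires the whole string to fit the pattern, not just a prefix.
        # It raises TypeError for None, which the caller is expected to handle.
        if re.fullmatch(regex, dte):
            # strptime can still raise ValueError for impossible dates such as "31/02/2023"
            return datetime.strptime(dte, fmt).date()
    raise ValueError(
        f"Expected date is not in one of the supported formats, got ***{dte}***"
    )
```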
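For comparison, here is a sketch of a different technique that drops the regular expressions and simply tries each strptime format in turn, treating ValueError as "not this format". The names CANDIDATE_FORMATS and conv_date_by_trial are hypothetical. It also illustrates that a regex match alone does not guarantee a real date: "31/02/2023" fits \d{2}/\d{2}/\d{4}, but strptime still rejects it. Note that this version is slightly more permissive than conv_date, because strptime's %d and %m also accept single-digit values such as "1-7-2023".

```python
from datetime import date, datetime

# Hypothetical list of layouts, mirroring the formats accepted by conv_date
CANDIDATE_FORMATS = (
    "%Y-%m-%d",
    "%d-%m-%Y",
    "%Y/%m/%d",
    "%d/%m/%Y",
    "%d%m%Y",
    "%d %m %Y",
    "%Y-%m-%d %H:%M:%S",
    "%Y-%m-%d %H:%M:%S %Z",
)


def conv_date_by_trial(dte: str) -> date:
    """Return the date for the first format that parses dte, else raise ValueError."""
    if not isinstance(dte, str):
        # strptime would raise TypeError for None or NaN; report it as ValueError instead
        raise ValueError(f"Expected a date string, got {dte!r}")
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(dte, fmt).date()
        except ValueError:
            # Wrong layout, leftover characters, or an impossible date: try the next format
            continue
    raise ValueError(
        f"Expected date is not in one of the supported formats, got ***{dte}***"
    )


print(conv_date_by_trial("26072023"))  # 2023-07-26
```

The trade-off is speed and strictness versus simplicity: each failed format costs one caught exception, and the accepted inputs are defined entirely by strptime's rules rather than by hand-written patterns.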
The clean_data Function
Now that we have the conv_date function, let’s move on to the clean_data function, which takes a list of dictionaries as input and returns two outputs: the cleaned list and an error list:
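```python
def clean_data(input_list: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split input_list into (cleaned rows, rows whose date could not be converted)."""
    cleaned_list = []
    err_list = []
    for item in input_list:
        try:
            # Copy the row with its converted date instead of mutating the input
            cleaned_list.append({**item, "date": conv_date(item["date"])})
        except Exception as e:
            # Missing "date" key, None, unsupported layout, or impossible date
            print(e)
            err_list.append(item)
    # Building new lists avoids calling input_list.remove(item) inside the loop,
    # which would skip the row that follows each removed one
    return cleaned_list, err_list
```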
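Removing items from a list while iterating over that same list, as in input_list.remove(item) inside a for loop, is a classic pitfall: each removal shifts the later items one position to the left, so the loop silently skips the next one. The sketch below shows the effect with two hypothetical helpers, drop_bad_in_place and drop_bad_safely, using plain strings instead of dictionaries.

```python
def drop_bad_in_place(rows: list) -> list:
    """Anti-pattern: remove "bad" rows from the list being iterated over."""
    for row in rows:
        if row == "bad":
            rows.remove(row)  # shifts later items left, so the loop skips one
    return rows


def drop_bad_safely(rows: list) -> list:
    """Build a new list instead, leaving the input untouched."""
    return [row for row in rows if row != "bad"]


print(drop_bad_in_place(["bad", "bad", "ok"]))  # ['bad', 'ok']
print(drop_bad_safely(["bad", "bad", "ok"]))  # ['ok']
```

The second "bad" survives the in-place version because it moves into the slot the loop has already visited. The bug stays hidden when the only invalid row is the last one, which is why it is easy to miss in small test sets.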
Example Usage
Here’s an example of how to use the clean_data function:
```python
# conv_date and clean_data are assumed to be defined above in the same module.

# Test data
input_list = [
    {"name": "dz", "role": "legend", "date": "2023-07-26"},
    {"name": "mc", "role": "sounds like a dj", "date": "26-07-2023"},
    {"name": "xc", "role": "loves xcom", "date": "2023/07/26"},
    {"name": "lz", "role": "likes to fly", "date": "26/07/2023"},
    {"name": "wc", "role": "has a small bladder", "date": "26072023"},
    {"name": "aa", "role": "warrior of the crystal", "date": "26 07 2023"},
    {"name": "xx", "role": "loves only-fans", "date": "2023-07-26 12:46:21"},
    {
        "name": "jm",
        "role": "is stack overflow",
        "date": "2023-10-26 12:46:21 UTC",
    },
    {"name": "ee", "role": "enjoys nan bread", "date": None},
]

cleaned_list, err_list = clean_data(input_list)

print("Cleaned List:")
for item in cleaned_list:
    print(item)

print("\nError List:")
for item in err_list:
    print(item)
```

With this test data, every row converts except the last one: its date is None, so re.fullmatch raises a TypeError, clean_data catches and prints it, and the row ends up in the error list.
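To apply the same pattern with Pandas, here is a minimal sketch on a DataFrame, assuming pandas is installed. An exception raised inside the function passed to Series.apply aborts the entire call, so the hypothetical wrapper safe_conv_date catches the error and returns None instead; rows whose result is missing then form the error frame. The block carries a trimmed copy of conv_date with only three of its formats so that it runs on its own, and the extra row "zz" with the impossible date "31/02/2023" is made up to exercise the ValueError path.

```python
import re
from datetime import date, datetime
from typing import Optional

import pandas as pd

# Trimmed copy of conv_date: only three of the article's formats, for brevity
ACCEPTABLE_MAPPINGS = {
    r"\d{4}-\d{2}-\d{2}": "%Y-%m-%d",
    r"\d{2}-\d{2}-\d{4}": "%d-%m-%Y",
    r"\d{2}/\d{2}/\d{4}": "%d/%m/%Y",
}


def conv_date(dte: str) -> date:
    for regex, fmt in ACCEPTABLE_MAPPINGS.items():
        if re.fullmatch(regex, dte):
            return datetime.strptime(dte, fmt).date()
    raise ValueError(
        f"Expected date is not in one of the supported formats, got ***{dte}***"
    )


def safe_conv_date(dte: object) -> Optional[date]:
    """Hypothetical wrapper: return None instead of raising, so apply() never aborts."""
    try:
        return conv_date(dte)
    except (TypeError, ValueError):
        # TypeError: None or NaN; ValueError: unsupported layout or impossible date
        return None


df = pd.DataFrame(
    [
        {"name": "dz", "date": "2023-07-26"},
        {"name": "mc", "date": "26-07-2023"},
        {"name": "lz", "date": "26/07/2023"},
        {"name": "ee", "date": None},
        {"name": "zz", "date": "31/02/2023"},  # made-up invalid row
    ]
)

parsed = df["date"].apply(safe_conv_date)
ok = parsed.notna()  # True where the conversion succeeded

cleaned_df = df[ok].assign(date=parsed[ok])
err_df = df[~ok]

print(cleaned_df)
print(err_df)
```

Returning None keeps apply running over every row. The cost of this design is that the reason for each failure is discarded, so log the exception inside the except block if you need it later.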
Conclusion
In this article, we looked at exception handling for date conversion during data cleaning. We built a function called clean_data that takes a list of dictionaries as input and returns two outputs: the cleaned list and an error list. It wraps each conversion in a try/except block, so a single invalid or missing date is recorded instead of stopping the whole run. The same pattern carries over to Pandas apply: catch the exception inside the function you pass to apply.
By following the example code in this article, you should be able to write your own exception-handling functions for date conversion, whether you loop over plain Python objects or use Pandas apply.
Last modified on 2024-04-26