Customer Data Analysis: Uncovering Top Products, Average Revenue, and Last Transactions per Customer

Finding Top Product, Average Revenue, and Last Transaction per Customer

Overview

In this article, we will explore a problem where we need to analyze customer data to determine their top product(s), average revenue, and most recent transaction date. We will use Python with the Pandas library to handle and process the data.

Customer Data Definition

A customer is defined by their mobileNumber. The dataset contains information about each customer’s purchase history, including the item purchased, net revenue, and transaction date.

Data Analysis Requirements

The problem requires us to perform the following analysis:

  1. Find the top product(s) for each customer, where the preferred item is defined as the one bought in the greatest quantity. In case of a tie, we want to break ties based on highest revenue. If both are equal, we get all items.
  2. Calculate the average revenue per customer.
  3. Determine the most recent transaction date per customer.

Code Implementation

We will implement these requirements using Python with the Pandas library.

Step 1: Importing Libraries and Loading Data

import pandas as pd

First, we load the required libraries and define our dataset.

df = pd.DataFrame(
    {
        'mobileNumber': ['253717011202', '253717011202', '253717011202', '253717011202', '253717011202', 
                          '2530727013202', '2530727013202', '2530727013202', '12345678910', '12345678910'],
        'smartcard': ['25106064745190900', '25106064745190900', '25106064745190900', '25106064745190900', 
                      '25106064745190900', '2510606474519120', '2510606474519120', '2510606474519120', '2311607474519120', 
                      '2311607474519120'],
        'country': ['Latvia', 'Latvia', 'Latvia', 'Latvia', 'Latvia', 'Sudan', 'Sudan', 'Sudan', 'Zambia', 'Zambia'],
        'item': ['GOLD', 'SILVER', 'EXTRA', 'EXTRA', 'EXTRA', 'GOLD', 'GOLD', 'EXTRA', 'GOLD', 'EXTRA'],
        'Net Revenue': [6779.661017, 5084.745763, 2542.372881, 2542.372881, 2542.372881, 
                        2542.372881, 2542.372881, 2542.372881, 2542.372881, 2542.372881],
        'transaction date': ['26/06/2021', '17/06/2021', '09/04/2021', '10/03/2021', '18/02/2020', 
                              '09/04/2021', '10/03/2021', '18/02/2020', '09/04/2021', '10/03/2021'],
    }
)

Step 2: Function to Find Preferred Item

We create a function preferred_items that takes a group of customers as input and returns the preferred item(s) for each customer.

def preferred_items(group):
    # Count the number of occurrences of each item
    counts = group['item'].value_counts()

    # Get the most frequently purchased item(s)
    most_common_items = counts[counts == counts.max()]

    if len(most_common_items) == 1:
        # If there is only one most common item, return it
        return most_common_items.index[0]
    else:
        # Otherwise, break ties based on highest revenue
        max_revenue = group.loc[group['item'].isin(most_common_items.index), 'Net Revenue'].max()
        return ', '.join(sorted(group.loc[(group['item'].isin(most_common_items.index)) & 
                                           (group['Net Revenue'] == max_revenue), 'item'].tolist()))

Step 3: Function to Calculate Average Revenue

We create a function average_revenue that calculates the average revenue per customer.

def average_revenue(group):
    return group['Net Revenue'].mean()

Step 4: Function to Find Most Recent Transaction Date

We create a function most_recent_transaction that finds the most recent transaction date for each customer.

def most_recent_transaction(group):
    return pd.to_datetime(group['transaction date'], format='%d/%m/%Y').max()

Step 5: Grouping Customers and Calculating Analysis Results

We group customers by their mobileNumber, smartcard, and country. For each customer, we apply the functions to calculate the preferred item(s), average revenue, and most recent transaction date.

output = df.groupby(['mobileNumber', 'smartcard', 'country']).apply(
    lambda group: pd.Series(
        {
            'preferred_item': preferred_items(group),
            'average_revenue': average_revenue(group),
            'last_transaction': most_recent_transaction(group),
        }
    )
).reset_index()

Step 6: Displaying Analysis Results

Finally, we display the results in a tabular format.

print(output)

Conclusion

In this article, we have demonstrated how to analyze customer data using Python with the Pandas library. We have created functions to find the preferred item(s), average revenue, and most recent transaction date per customer, and applied these functions to group customers by their mobileNumber, smartcard, and country. The results are displayed in a tabular format for easy understanding.

Example Use Cases

This code can be used in various scenarios where customer data needs to be analyzed. For instance:

  • Customer Segmentation: By analyzing customer behavior, businesses can identify target segments for personalized marketing campaigns.
  • Recommendation Systems: The preferred item(s) function can be used to build recommendation systems that suggest products based on a customer’s purchase history and preferences.
  • Performance Analysis: The average revenue and most recent transaction date functions can help analyze sales performance over time, enabling businesses to make informed decisions about pricing strategies, inventory management, and resource allocation.

Last modified on 2024-02-25