Replacing Values with NaN in a Pandas DataFrame Based on Another DataFrame

Replacing Dataframe Cells with NaN Based on Indexes and Columns of Another DataFrame

In this article, we will explore how to replace cells in a Pandas dataframe with NaN values based on the indexes and columns of another dataframe. We will use the DataFrame.mask method to achieve this.

Introduction

When working with dataframes, it’s often necessary to manipulate or transform data in various ways. A common operation is filling missing values (NaN) with new values. Here we want the reverse: to blank out cells in one dataframe, setting them to NaN wherever another dataframe with the same indexes and columns has no value.

Problem Statement

Let’s consider an example using two dataframes, df1 and df2, as shown below:

import numpy as np
import pandas as pd

# Create df1: values on the diagonal, NaN everywhere else
df1 = pd.DataFrame([
                    [1,np.nan,np.nan,np.nan,np.nan,np.nan],
                    [np.nan,2,np.nan,np.nan,np.nan,np.nan],
                    [np.nan,np.nan,3,np.nan,np.nan,np.nan],
                    [np.nan,np.nan,np.nan,4,np.nan,np.nan],
                    [np.nan,np.nan,np.nan,np.nan,5,np.nan],
                    [np.nan,np.nan,np.nan,np.nan,np.nan,6],
                    ], columns=['AA','BB','CC','DD', 'EE', 'FF'])

# Create df2 with numeric values
df2 = pd.DataFrame([[100, 200, 300, 400, 500, 600],
                    [110, 210, 310, 410, 510, 610],
                    [120, 220, 320, 420, 520, 620],
                    [130, 230, 330, 430, 530, 630],
                    [140, 240, 340, 440, 540, 640],
                    [150, 250, 350, 450, 550, 650]
                    ], columns=['AA', 'BB', 'CC', 'DD', 'EE', 'FF'])

We want to create a new dataframe, df3, that keeps the values of df2 only at the positions (index and column) where df1 holds a value, and contains NaN at every position where df1 is NaN.

Solution

To achieve this, we can use the DataFrame.mask method. Here’s an example:

# Create df3 by masking df2 wherever df1 is NaN
df3 = df2.mask(df1.isna())

print(df3)

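This creates a new dataframe, df3, that holds the df2 value wherever df1 has a value and NaN everywhere else. With the data above, only the diagonal survives: 100, 210, 320, 430, 540 and 650. Because NaN is a floating-point value, the columns of df3 become float64.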

Explanation

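The DataFrame.mask method replaces the cells where a condition is True; by default the replacement is NaN. Here the condition is df1.isna(), which returns a boolean dataframe that is True wherever df1 contains NaN.

Calling df2.mask() with this condition therefore blanks out every df2 cell whose counterpart in df1 is NaN and leaves the remaining cells of df2 untouched. The condition is matched to df2 by index and column labels, so the two dataframes should share the same labels, as they do here.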
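To make the explanation concrete, here is a minimal sketch that rebuilds the same df1 and df2 in a compact way (the NumPy construction is a shortcut introduced for this example, not part of the original code) and compares mask with its counterpart DataFrame.where, which keeps cells where the condition is True instead of replacing them.

```python
import numpy as np
import pandas as pd

cols = ["AA", "BB", "CC", "DD", "EE", "FF"]

# df1: 1..6 on the diagonal, NaN everywhere else (same content as the article's df1)
values = np.full((6, 6), np.nan)
np.fill_diagonal(values, [1, 2, 3, 4, 5, 6])
df1 = pd.DataFrame(values, columns=cols)

# df2: same numbers as the article's df2 (100, 200, ... in row 0; 110, 210, ... in row 1; etc.)
df2 = pd.DataFrame(
    [[100 * (j + 1) + 10 * i for j in range(6)] for i in range(6)],
    columns=cols,
)

# mask: replace df2 cells with NaN where df1 is NaN
df3 = df2.mask(df1.isna())

# where: keep df2 cells where df1 is not NaN (the inverse condition)
df3_alt = df2.where(df1.notna())

print(df3.equals(df3_alt))       # both spellings give the same dataframe
print(np.diag(df3.to_numpy()))   # the values that survive on the diagonal
```

Which spelling to use is mostly a matter of readability: mask states what to blank out, where states what to keep.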

Discussion

DataFrame.mask provides a concise, vectorized way to achieve our goal. By default it does not modify the dataframe it is called on: it returns a new dataframe, so df1 and df2 are left unchanged.

The original is overwritten only if you pass inplace=True, so avoid that option when you want to keep the original dataframes, as in our example.
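A quick way to check how mask treats its input is to compare df2 before and after the call. The sketch below uses small two-row stand-ins for df1 and df2 (made up for brevity) and a helper copy named df2_before that exists only for the comparison.

```python
import numpy as np
import pandas as pd

# Small stand-ins for the article's df1 and df2 (illustrative data)
df1 = pd.DataFrame({"AA": [1.0, np.nan], "BB": [np.nan, 2.0]})
df2 = pd.DataFrame({"AA": [100, 110], "BB": [200, 210]})

df2_before = df2.copy()         # snapshot taken before masking
df3 = df2.mask(df1.isna())      # called without inplace=True

print(df2.equals(df2_before))   # True: df2 still holds its original values
print(df3 is df2)               # False: df3 is a separate dataframe
```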

Additionally, df1.isna() works the same way for numeric and string columns, so the mask itself needs no changes for mixed data. What changes is the result: integer columns that receive NaN are converted to float64.
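The following sketch shows the mixed-type case on a small, made-up pair of dataframes (the column contents are assumptions chosen for illustration). It masks one integer column and one string column in a single call and then inspects the result.

```python
import numpy as np
import pandas as pd

# Illustrative data: one numeric column and one string column
df1 = pd.DataFrame({"AA": [1.0, np.nan], "BB": ["x", None]})
df2 = pd.DataFrame({"AA": [100, 110], "BB": ["left", "right"]})

# isna() flags both np.nan and None, regardless of column type
df3 = df2.mask(df1.isna())

print(df2["AA"].dtype)  # integer dtype before masking
print(df3["AA"].dtype)  # float64 after a NaN was introduced
print(df3)
```

If integer values must be preserved exactly, one option is to convert the result back after filling or dropping the missing cells.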

Last modified on 2025-04-24