Exploding Key Value Pairs from Dictionaries into Multiple Rows of DataFrame in Python

Introduction

In this article, we will explore a common problem in data manipulation: a DataFrame column holds dictionaries whose values are lists, and each key-value pair has to be exploded into its own row. We will discuss two approaches: manual explosion with plain Python loops, and pandas’ built-in functions.

Manual Explosion Approach

The most straightforward approach is to loop over the dictionaries to explode manually, then merge the result back to the original DataFrame.

Why Manual Explosion?

While it’s possible to use pandas’ built-in functions for this task, a pure manual approach can be more educational and insightful. This method forces you to understand the underlying data structure and manipulate it at a low level.

Code

# df has an 'id' column and a 'dict_val' column of {key: [values]} dictionaries
out = df.merge(pd.DataFrame([[i, c3, c4] for i, d in zip(df['id'], df['dict_val'])
                             for c3, l in d.items() for c4 in l],
                           columns=['id', 'col3', 'col4']), on='id')
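
To try this end to end, a sample DataFrame is needed. The sketch below is one way to run it, assuming the same sample data that appears later in this article; the names rows and exploded are only illustrative.

```python
import pandas as pd

# Sample data (the same DataFrame used later in this article).
df = pd.DataFrame({
    'id': [1, 2],
    'dict_val': [{'X': ['a', 'b'], 'Y': ['c', 'd']},
                 {'Z': ['e', 'f']}],
})

# One [id, key, value] triple per element of every list in every dictionary.
rows = [[i, key, value]
        for i, d in zip(df['id'], df['dict_val'])
        for key, values in d.items()
        for value in values]

exploded = pd.DataFrame(rows, columns=['id', 'col3', 'col4'])

# Attach the exploded rows back to the original columns via the shared id.
out = df.merge(exploded, on='id')
print(out[['id', 'col3', 'col4']])
```

Building the intermediate frame separately makes it easy to inspect the exploded rows before they are merged back.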

Explanation

The list comprehension walks over every (id, dictionary) pair, then over each key c3 and its list of values l, and emits one [id, key, value] row per list element c4. The merge on id then attaches these rows to the original columns. Written as explicit loops, the same logic looks like this:

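rows = []
for i, d in zip(df['id'], df['dict_val']):
    for c3, l in d.items():
        for c4 in l:
            # collect one row per list element
            rows.append([i, c3, c4])
out = df.merge(pd.DataFrame(rows, columns=['id', 'col3', 'col4']), on='id')

Collecting the rows in a list and building the DataFrame once is much cheaper than growing a DataFrame with pd.concat inside the loop, which copies the data on every iteration. Note also that the default inner merge drops any id whose dictionary is empty. To keep such rows, with NaN in col3 and col4, use pd.merge with how='outer':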

out = df.merge(pd.DataFrame([[i, c3, c4] for i, d in zip(df['id'], df['dict_val'])
                             for c3, l in d.items() for c4 in l],
                           columns=['id', 'col3', 'col4']), on='id', how='outer')

Output:

   id                            dict_val col3 col4
0   1  {'X': ['a', 'b'], 'Y': ['c', 'd']}    X    a
1   1  {'X': ['a', 'b'], 'Y': ['c', 'd']}    X    b
2   1  {'X': ['a', 'b'], 'Y': ['c', 'd']}    Y    c
3   1  {'X': ['a', 'b'], 'Y': ['c', 'd']}    Y    d
4   2                   {'Z': ['e', 'f']}    Z    e
5   2                   {'Z': ['e', 'f']}    Z    f
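
For the sample data this output is the same with the inner and the outer merge, because every id has at least one key. The difference only shows up once a dictionary is empty. A minimal sketch, using a made-up two-row DataFrame in which id 2 has an empty dictionary:

```python
import pandas as pd

# Hypothetical data: the dictionary for id 2 is empty.
df = pd.DataFrame({'id': [1, 2],
                   'dict_val': [{'X': ['a', 'b']}, {}]})

exploded = pd.DataFrame(
    [[i, key, value]
     for i, d in zip(df['id'], df['dict_val'])
     for key, values in d.items()
     for value in values],
    columns=['id', 'col3', 'col4'])

inner = df.merge(exploded, on='id')               # default how='inner'
outer = df.merge(exploded, on='id', how='outer')

print(inner['id'].tolist())   # id 2 is dropped
print(outer['id'].tolist())   # id 2 is kept, with NaN in col3 and col4
```

Since every id in the exploded frame also exists in df, how='left' would keep the same rows here.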

Pure Pandas Explosion

While manual explosion can be a good educational exercise, it’s not always the most efficient or scalable solution. In this section, we will explore using pandas’ built-in functions to explode key-value pairs.

import pandas as pd

df = pd.DataFrame({'id': [1,2],
                   'dict_val': [{'X': ['a', 'b'], 'Y': ['c', 'd']},
                                {'Z': ['e', 'f']}]
                  })

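# Explode the keys first, then look up and explode each key's list of values
out = df.assign(col3=df['dict_val'].apply(list)).explode('col3')
out['col4'] = [d[k] for d, k in zip(out['dict_val'], out['col3'])]
out = out.explode('col4', ignore_index=True)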
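
The same two-step idea can be wrapped in a small reusable function. The sketch below is one possible shape, not a pandas API: explode_dict_column and its parameters are hypothetical names, and it assumes every dictionary is non-empty and every value is a flat list.

```python
import pandas as pd

def explode_dict_column(df, column, key_name='col3', value_name='col4'):
    """Return one row per (key, list element) in df[column] (hypothetical helper)."""
    # One (key, values) tuple per dictionary entry.
    items = df[column].apply(lambda d: list(d.items()))
    # First explode: one row per dictionary entry.
    out = df.assign(_item=items).explode('_item', ignore_index=True)
    out[key_name] = [key for key, _ in out['_item']]
    out[value_name] = [values for _, values in out['_item']]
    # Second explode: one row per element of each list of values.
    return out.drop(columns='_item').explode(value_name, ignore_index=True)

df = pd.DataFrame({'id': [1, 2],
                   'dict_val': [{'X': ['a', 'b'], 'Y': ['c', 'd']},
                                {'Z': ['e', 'f']}]})

result = explode_dict_column(df, 'dict_val')
print(result[['id', 'col3', 'col4']])
```

Exploding (key, values) tuples keeps each key next to its own list, so no separate lookup into the dictionary is needed.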

However, the above approach only works when every dictionary value is a flat list. If the values contain nested dictionaries or lists, they have to be flattened recursively. The version below keeps the top-level key in col3 and writes every leaf value found beneath it to col4; keys of nested dictionaries are discarded.

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'dict_val': [{'X': ['a', 'b'], 'Y': [{'x': ['c', 'd']}]},
                                {'Z': ['e', 'f']},
                                {'X': [{'x': ['g', 'h']}, 'i']}]})

def recursiveexplode(value):
    """Yield every leaf value found inside nested dicts and lists."""
    if isinstance(value, dict):
        for v in value.values():
            yield from recursiveexplode(v)
    elif isinstance(value, list):
        for v in value:
            yield from recursiveexplode(v)
    else:
        yield value

# One (id, top-level key, leaf value) row per leaf
rows = [(i, c3, c4)
        for i, d in zip(df['id'], df['dict_val'])
        for c3, v in d.items()
        for c4 in recursiveexplode(v)]

out = df.merge(pd.DataFrame(rows, columns=['id', 'col3', 'col4']), on='id')

Output:

   id                                     dict_val col3 col4
0   1  {'X': ['a', 'b'], 'Y': [{'x': ['c', 'd']}]}    X    a
1   1  {'X': ['a', 'b'], 'Y': [{'x': ['c', 'd']}]}    X    b
2   1  {'X': ['a', 'b'], 'Y': [{'x': ['c', 'd']}]}    Y    c
3   1  {'X': ['a', 'b'], 'Y': [{'x': ['c', 'd']}]}    Y    d
4   2                            {'Z': ['e', 'f']}    Z    e
5   2                            {'Z': ['e', 'f']}    Z    f
6   3              {'X': [{'x': ['g', 'h']}, 'i']}    X    g
7   3              {'X': [{'x': ['g', 'h']}, 'i']}    X    h
8   3              {'X': [{'x': ['g', 'h']}, 'i']}    X    i
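
When the keys of nested dictionaries need to be preserved, one option is to record the chain of keys that leads to each leaf value. The sketch below is an illustration under that assumption, not something the approaches above require: iter_leaves_with_path, the path and value column names, and the '.' separator are all hypothetical choices.

```python
import pandas as pd

def iter_leaves_with_path(value, path=()):
    """Yield (path, leaf) pairs; path is the tuple of dict keys leading to leaf."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from iter_leaves_with_path(child, path + (key,))
    elif isinstance(value, list):
        # Lists do not add a path component; only dictionary keys do.
        for child in value:
            yield from iter_leaves_with_path(child, path)
    else:
        yield path, value

df = pd.DataFrame({'id': [1, 2],
                   'dict_val': [{'X': ['a', 'b'], 'Y': [{'x': ['c', 'd']}]},
                                {'X': [{'x': ['g', 'h']}, 'i']}]})

rows = [(i, '.'.join(path), leaf)
        for i, d in zip(df['id'], df['dict_val'])
        for path, leaf in iter_leaves_with_path(d)]

paths = pd.DataFrame(rows, columns=['id', 'path', 'value'])
print(paths)
```

Joining the keys with '.' assumes the keys are strings that do not themselves contain a dot; keeping the path as a tuple avoids that restriction.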

Conclusion

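Exploding key-value pairs from dictionaries into multiple rows of a DataFrame can be achieved using manual explosion or pandas’ built-in functions. Plain loops and comprehensions handle irregular or nested dictionaries most easily, while explode keeps the code short when every value is a flat list. The choice between the two methods depends on your specific use case, data structure, and performance requirements.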

In this article, we have explored both approaches in detail and discussed their pros and cons. We hope that this guide has helped you to improve your skills in manipulating DataFrames with dictionary-like columns.


Last modified on 2024-04-16