Converting List of Lists into Tibble (DataFrame) with R and Tidyverse

The tidyverse is a collection of R packages that work together to make it easier to perform data manipulation, analysis, and visualization. One of the core packages in the tidyverse is dplyr, which provides verbs for manipulating data. In this post, we will explore how to convert a list of lists into a tibble (dataframe) using R and the tidyverse.

Problem Description

Suppose you have a list of lists in which every inner list holds two elements: pair and genes. pair is always a character vector of two strings, while genes is a character vector that can contain one or more gene names. You want to turn this into a dataframe with one row per inner list, where the two strings of pair go into the columns pair1 and pair2 and the genes go into a column called genes_vec.

Example List of Lists

Here is an example of what the input list of lists might look like:

lol <- list(
  list(pair = c("BoneMarrow", "Pulmonary"), genes = "PRR11"),
  list(pair = c("BoneMarrow", "Umbilical"), genes = "GNB2L1"),
  list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1")
)

And here is the expected output dataframe:

pair1        pair2        genes_vec
BoneMarrow   Pulmonary    PRR11
BoneMarrow   Umbilical    GNB2L1
Pulmonary    Umbilical    ATP1B1

Using purrr to Convert List of Lists into a Tibble

One way to convert the list of lists into a tibble is by using the purrr package, which provides a set of functions for working with collections of values. We can use the map function from purrr to apply a transformation to each element in the list.
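As a minimal illustration of the map() behavior this post relies on, the sketch below uses a hypothetical toy list called records (not part of the original example). Passing a string instead of a function makes map() extract the element with that name, and typed variants such as map_chr() return an atomic vector instead of a list:

```r
library(purrr)

# Hypothetical toy data, shaped like `lol` but shorter
records <- list(
  list(pair = c("A", "B"), genes = "G1"),
  list(pair = c("C", "D"), genes = "G2")
)

# A string extractor and an explicit lambda give the same list of pairs
pairs_by_name   <- map(records, "pair")
pairs_by_lambda <- map(records, ~ .x$pair)
identical(pairs_by_name, pairs_by_lambda)  # TRUE

# map_chr() returns a character vector rather than a list
genes <- map_chr(records, "genes")
genes  # "G1" "G2"
```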

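First, load the necessary packages (install them once with install.packages() if they are not already available). tidyr is only needed for the alternative approach further down:

library(dplyr)  # tibble(), mutate(), select() and the %>% pipe
library(purrr)  # map(), map_chr(), transpose(), set_names()
library(tidyr)  # unnest(), used in the alternative approach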

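Next, we build a tibble with one row per element of lol, extracting each field with map() and map_chr(), and then split the pair column into two columns:

tibble(
  pair = map(lol, "pair"),           # list-column: one length-2 vector per row
  genes_vec = map_chr(lol, "genes")  # character column: one gene per row
) %>% 
  mutate(
    pair1 = map_chr(pair, 1),  # first string of each pair
    pair2 = map_chr(pair, 2)   # second string of each pair
  ) %>% 
  select(pair1, pair2, genes_vec)  # keep only the final columns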

Let’s break down what this code does:

  • map(lol, "pair"): When map() is given a string instead of a function, it extracts the element with that name from each item of lol. The result is a list of three character vectors, which becomes the list-column pair.
  • map_chr(lol, "genes"): This extracts genes in the same way but returns a plain character vector. map_chr() requires every extracted value to be a single string, so it works here because each element of lol has exactly one gene; an element with several genes would make it stop with an error.
  • The mutate() call then extracts the first and second string of each pair with map_chr(pair, 1) and map_chr(pair, 2), respectively; a number passed to map_chr() acts as a positional index.
  • Finally, the select function is used to select only the desired columns from the resulting dataframe.
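Because map_chr() needs exactly one string per element, the code above fails as soon as an element carries several genes. One way to handle that case is to collapse each genes vector into a single comma-separated string before it goes into genes_vec. The sketch below assumes a hypothetical input, lol_multi, whose first element has two genes:

```r
library(dplyr)
library(purrr)

# Hypothetical input: the first element carries two genes
lol_multi <- list(
  list(pair = c("BoneMarrow", "Pulmonary"), genes = c("PRR11", "GNB2L1")),
  list(pair = c("BoneMarrow", "Umbilical"), genes = "GNB2L1"),
  list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1")
)

result_multi <- tibble(
  pair = map(lol_multi, "pair"),
  # Collapse each genes vector to one string so map_chr() always gets length 1
  genes_vec = map_chr(lol_multi, ~ paste(.x$genes, collapse = ","))
) %>%
  mutate(
    pair1 = map_chr(pair, 1),
    pair2 = map_chr(pair, 2)
  ) %>%
  select(pair1, pair2, genes_vec)

result_multi
```

If you would rather keep the genes as a vector per row, map(lol_multi, "genes") stores them in a list-column instead; which form is better depends on whether later code expects text or vectors.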

Output

When we run this code, we get the following output:

  pair1      pair2     genes_vec
  <chr>      <chr>     <chr>
1 BoneMarrow Pulmonary PRR11
2 BoneMarrow Umbilical GNB2L1
3 Pulmonary  Umbilical ATP1B1
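To check the result programmatically rather than by eye, one option is to write the expected table row by row with tibble's tribble() and compare the two. The sketch below repeats the example data and the pipeline so it runs on its own; the names result and expected are chosen here for illustration:

```r
library(dplyr)
library(purrr)
library(tibble)  # tribble()

lol <- list(
  list(pair = c("BoneMarrow", "Pulmonary"), genes = "PRR11"),
  list(pair = c("BoneMarrow", "Umbilical"), genes = "GNB2L1"),
  list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1")
)

result <- tibble(
  pair = map(lol, "pair"),
  genes_vec = map_chr(lol, "genes")
) %>%
  mutate(pair1 = map_chr(pair, 1), pair2 = map_chr(pair, 2)) %>%
  select(pair1, pair2, genes_vec)

# Expected output written row by row
expected <- tribble(
  ~pair1,       ~pair2,      ~genes_vec,
  "BoneMarrow", "Pulmonary", "PRR11",
  "BoneMarrow", "Umbilical", "GNB2L1",
  "Pulmonary",  "Umbilical", "ATP1B1"
)

isTRUE(all.equal(result, expected))  # TRUE when the tables match
```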

Alternative Approach: Working with Nested Tibbles

Another approach to converting the list of lists into a tibble is by working directly with nested tibbles and using the unnest function.

First, we need to define the list of lists:

lol <- list(
  list(pair = c("BoneMarrow", "Pulmonary"), genes = "PRR11"),
  list(pair = c("BoneMarrow", "Umbilical"), genes = "GNB2L1"),
  list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1")
)

Next, we transpose the list so that pair and genes become list-columns of a tibble, and turn each pair into a one-row tibble with the columns pair1 and pair2. The result is stored as tab1:

tab1 <- lol %>% 
  transpose() %>%   # list with two elements, pair and genes, each of length 3
  as_tibble() %>%   # tibble with two list-columns: pair and genes
  # name the two strings of each pair...
  mutate(pair = map(pair, ~ set_names(.x, c("pair1", "pair2")))) %>% 
  # ...then reshape each named vector into a one-row tibble
  mutate(pair = map(pair, ~ as_tibble(t(.x))))

Let’s break down what this code does:

  • transpose(): This turns the list of three records into a list of two fields, pair and genes, each holding three values.
  • as_tibble(): This converts that list into a tibble with two list-columns, pair and genes.
  • mutate(pair = map(pair, ~ set_names(.x, c("pair1", "pair2")))): This names the two strings in each pair, so the names can later become column names.
  • mutate(pair = map(pair, ~ as_tibble(t(.x)))): t() turns each named vector into a one-row matrix whose column names are pair1 and pair2, and as_tibble() turns that matrix into a one-row tibble. The pair column is now a list-column of tibbles. Naming the vector before calling t() avoids the warning that recent tibble versions give when converting a matrix without column names.
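  • Finally, tidyr's unnest() expands the list-columns into ordinary columns. Recent tidyr versions expect the columns to unnest to be listed in cols, and select() puts the result in the same column order as the first approach:
tab1 %>% 
  unnest(cols = c(pair, genes)) %>% 
  select(pair1, pair2, genes)

When we run this code, we get the following output:

  pair1      pair2     genes
  <chr>      <chr>     <chr>
1 BoneMarrow Pulmonary PRR11
2 BoneMarrow Umbilical GNB2L1
3 Pulmonary  Umbilical ATP1B1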
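The transpose-and-unnest route behaves differently once an element has more than one gene: unnesting genes produces one row per gene rather than one row per pair. The minimal sketch below assumes a hypothetical input, lol_multi, whose first element has two genes, and builds each pair tibble with tibble() directly as a shorter stand-in for the t() and set_names() steps. The two list-columns are unnested in separate calls to keep each step simple:

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical input: the first element carries two genes
lol_multi <- list(
  list(pair = c("BoneMarrow", "Pulmonary"), genes = c("PRR11", "GNB2L1")),
  list(pair = c("BoneMarrow", "Umbilical"), genes = "GNB2L1"),
  list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1")
)

long <- lol_multi %>%
  transpose() %>%
  as_tibble() %>%
  # One-row tibble per pair (shortcut for the t()/set_names() steps)
  mutate(pair = map(pair, ~ tibble(pair1 = .x[[1]], pair2 = .x[[2]]))) %>%
  unnest(cols = pair) %>%   # pair becomes the columns pair1 and pair2
  unnest(cols = genes) %>%  # one row per gene
  select(pair1, pair2, genes)

long
```

The first pair therefore appears twice, once for each of its genes. This long form suits per-gene joins and counts, whereas collapsing the genes into one string keeps a single row per pair.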

Conclusion

In this post, we explored how to convert a list of lists into a tibble (dataframe) using R and the tidyverse. We used two approaches: one that extracts each field with purrr's map() and map_chr() and builds the tibble directly, and another that transposes the list into a tibble with list-columns and flattens it with tidyr's unnest(). For the example data, both approaches yield one row per pair with the same values.


Last modified on 2023-06-05