Converting a List of Lists into a Tibble (DataFrame) with R and the tidyverse
The tidyverse is a collection of R packages that work together to make it easier to perform data manipulation, analysis, and visualization. One of the core packages in the tidyverse is dplyr, which provides verbs for manipulating data. In this post, we will explore how to convert a list of lists into a tibble (dataframe) using R and the tidyverse.
Problem Description
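Suppose you have a list of lists in which each inner list holds two elements: pair and genes. The pair element is always a character vector of two strings. The genes element is a character vector that can contain more than one value. You want to convert this list of lists into a dataframe with one row per inner list. The two pair values go in columns pair1 and pair2, and the genes go in a column genes_vec, joined by commas when there is more than one.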
Example List of Lists
Here is an example of what the input list of lists might look like:
lol <- list(
  structure(list(pair = c("BoneMarrow", "Pulmonary"), genes = "PRR11"),
            .Names = c("pair", "genes")),
  structure(list(pair = c("BoneMarrow", "Umbilical"), genes = "GNB2L1"),
            .Names = c("pair", "genes")),
  structure(list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1"),
            .Names = c("pair", "genes"))
)
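The structure(..., .Names = ...) wrappers are the form that dput() prints, and they only set the names attribute. The quick base R check below confirms this: the same data written as ordinary named lists compares identical. The name lol_plain is just for illustration.

```r
# lol as given above, built with structure() and .Names
lol <- list(
  structure(list(pair = c("BoneMarrow", "Pulmonary"), genes = "PRR11"),
            .Names = c("pair", "genes")),
  structure(list(pair = c("BoneMarrow", "Umbilical"), genes = "GNB2L1"),
            .Names = c("pair", "genes")),
  structure(list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1"),
            .Names = c("pair", "genes"))
)

# The same data written as ordinary named lists
lol_plain <- list(
  list(pair = c("BoneMarrow", "Pulmonary"), genes = "PRR11"),
  list(pair = c("BoneMarrow", "Umbilical"), genes = "GNB2L1"),
  list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1")
)

identical(lol, lol_plain)
#> [1] TRUE
```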
And here is the expected output dataframe:
pair1       pair2      genes_vec
BoneMarrow  Pulmonary  PRR11
BoneMarrow  Umbilical  GNB2L1
Pulmonary   Umbilical  ATP1B1
Using purrr to Convert a List of Lists into a Tibble
One way to convert the list of lists into a tibble is by using the purrr package, which provides a set of functions for working with collections of values. We can use the map function from purrr to apply a transformation to each element in the list.
First, we load the necessary packages. If they are not already installed, install them once with install.packages(c("dplyr", "purrr")).
library(dplyr)
library(purrr)
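When the second argument to map() or map_chr() is a string, purrr extracts the element with that name from every item. So map(lol, "pair") is shorthand for map(lol, function(x) x[["pair"]]). The small illustration below writes lol as plain named lists, which is equivalent to the structure() form above.

```r
library(purrr)

lol <- list(
  list(pair = c("BoneMarrow", "Pulmonary"), genes = "PRR11"),
  list(pair = c("BoneMarrow", "Umbilical"), genes = "GNB2L1"),
  list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1")
)

# A string extracts the element with that name from every item
pairs <- map(lol, "pair")       # list of 3 character vectors, each of length 2
genes <- map_chr(lol, "genes")  # plain character vector of length 3

# Same result as an explicit extractor function
identical(pairs, map(lol, function(x) x[["pair"]]))
#> [1] TRUE
```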
Next, we build the tibble directly from lol:
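tibble(
  pair = map(lol, "pair"),            # list-column: one length-2 vector per row
  genes_vec = map_chr(lol, "genes")   # character column: one gene per row
) %>%
  mutate(
    pair1 = map_chr(pair, 1),         # first element of each pair
    pair2 = map_chr(pair, 2)          # second element of each pair
  ) %>%
  select(pair1, pair2, genes_vec)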
Let’s break down what this code does:
- map(lol, "pair"): This applies map to each element of lol. Because the second argument is a string, map extracts the element with that name. The result is a list of three character vectors, each holding the two strings of a pair.
- map_chr(lol, "genes"): This does the same for genes, but map_chr returns a plain character vector instead of a list. It requires every extracted value to be a single string. That holds here because each genes entry has exactly one value, but the call would fail if any entry contained several genes.
- The mutate function then extracts the first and second elements of each pair with map_chr(pair, 1) and map_chr(pair, 2), respectively.
- Finally, the select function keeps only the desired columns, in the desired order.
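The problem statement allows genes to hold more than one value, and map_chr(lol, "genes") breaks in that case. The sketch below shows one way to handle it, assuming you want the genes joined with commas as in the expected output: collapse each vector with paste(collapse = ","). The input lol_multi is hypothetical, with a second gene added to the first element.

```r
library(dplyr)
library(purrr)

# Hypothetical input: the first element carries two genes
lol_multi <- list(
  list(pair = c("BoneMarrow", "Pulmonary"), genes = c("PRR11", "GNB2L1")),
  list(pair = c("BoneMarrow", "Umbilical"), genes = "GNB2L1"),
  list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1")
)

# map_chr() stops with an error: element 1 has length 2, not 1
failed <- inherits(try(map_chr(lol_multi, "genes"), silent = TRUE), "try-error")

# Collapse each genes vector into one comma-separated string instead
res <- tibble(
  pair = map(lol_multi, "pair"),
  genes_vec = map_chr(lol_multi, ~ paste(.x$genes, collapse = ","))
) %>%
  mutate(
    pair1 = map_chr(pair, 1),
    pair2 = map_chr(pair, 2)
  ) %>%
  select(pair1, pair2, genes_vec)

# res$genes_vec is c("PRR11,GNB2L1", "GNB2L1", "ATP1B1")
res
```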
Output
When we run this code, we get the following output:
  pair1      pair2     genes_vec
  <chr>      <chr>     <chr>
1 BoneMarrow Pulmonary PRR11
2 BoneMarrow Umbilical GNB2L1
3 Pulmonary  Umbilical ATP1B1
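The same table can also be built row by row. A helper turns one inner list into a one-row tibble, and dplyr's bind_rows() stacks the results. This is a minimal sketch: element_to_row() is a hypothetical helper name, and lol is written as plain named lists, which is equivalent to the structure() form above.

```r
library(dplyr)
library(purrr)

lol <- list(
  list(pair = c("BoneMarrow", "Pulmonary"), genes = "PRR11"),
  list(pair = c("BoneMarrow", "Umbilical"), genes = "GNB2L1"),
  list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1")
)

# Hypothetical helper: one inner list -> one-row tibble
element_to_row <- function(el) {
  tibble(
    pair1 = el$pair[[1]],
    pair2 = el$pair[[2]],
    genes_vec = paste(el$genes, collapse = ",")  # always a single string
  )
}

# Apply the helper to every element, then stack the one-row tibbles
res <- bind_rows(map(lol, element_to_row))
res
```

Because paste(collapse = ",") always returns one string, this version also copes with genes vectors of any length.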
Alternative Approach: Working with Nested Tibbles
Another approach is to work with nested tibbles. First reshape the list into a tibble whose columns hold lists or tibbles, then flatten it with the unnest() function from the tidyr package.
We start from the same lol list defined earlier.
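Before running the full pipeline, it helps to see what its first two steps produce on their own. The small sketch below runs purrr's transpose() and tibble's as_tibble() on lol, written here as plain named lists, which is equivalent to the structure() form.

```r
library(purrr)
library(tibble)

lol <- list(
  list(pair = c("BoneMarrow", "Pulmonary"), genes = "PRR11"),
  list(pair = c("BoneMarrow", "Umbilical"), genes = "GNB2L1"),
  list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1")
)

# transpose() turns a list of records into a record of lists:
# tr$pair holds the three pair vectors, tr$genes the three gene strings
tr <- transpose(lol)
names(tr)

# as_tibble() makes each of those lists a list-column, one row per record
tab <- as_tibble(tr)
tab
```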
Next, we transpose the list and reshape the pair column, storing the result as tab1:
library(tidyr)  # provides unnest(); dplyr and purrr are already loaded

tab1 <- lol %>%
  transpose() %>%
  as_tibble() %>%
  mutate(pair = map(pair, ~ as_tibble(t(.x)))) %>%
  mutate(pair = map(pair, ~ set_names(.x, c("pair1", "pair2"))))
Let’s break down what this code does:
- transpose(): This turns the list of records inside out. It produces a list of two lists, pair and genes, each with one entry per original element.
- as_tibble(): This converts that list into a tibble with two list-columns, pair and genes.
- mutate(pair = map(pair, ~ as_tibble(t(.x)))): Each entry of pair is a character vector of length 2. t() turns it into a 1-by-2 matrix, and as_tibble() turns that into a one-row tibble, so pair becomes a list-column of one-row tibbles. Because the matrix has no column names, recent versions of tibble may warn here and fall back to the names V1 and V2.
- mutate(pair = map(pair, ~ set_names(.x, c("pair1", "pair2")))): This renames the two columns of each nested tibble to pair1 and pair2.
- Finally, tidyr's unnest() expands the nested columns into ordinary ones. Since tidyr 1.0.0, the columns to unnest should be named explicitly with cols. The select() call then fixes the column order and names to match the first approach:
tab1 %>%
  unnest(cols = c(pair, genes)) %>%
  select(pair1, pair2, genes_vec = genes)
When we run this code, we get the following output:
  pair1      pair2     genes_vec
  <chr>      <chr>     <chr>
1 BoneMarrow Pulmonary PRR11
2 BoneMarrow Umbilical GNB2L1
3 Pulmonary  Umbilical ATP1B1
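Recent versions of tidyr also provide unnest_wider(), which spreads the elements of a list-column into separate columns. The sketch below performs the same conversion with it, assuming a recent tidyr release. With names_sep = "", the column name and each element's position are pasted together, which turns pair into pair1 and pair2.

```r
library(tibble)
library(tidyr)

lol <- list(
  list(pair = c("BoneMarrow", "Pulmonary"), genes = "PRR11"),
  list(pair = c("BoneMarrow", "Umbilical"), genes = "GNB2L1"),
  list(pair = c("Pulmonary", "Umbilical"), genes = "ATP1B1")
)

res <- tibble(x = lol) %>%
  unnest_wider(x) %>%                 # one column per inner element: pair, genes
  unnest_wider(pair, names_sep = "")  # split each pair vector into pair1, pair2
res
```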
Conclusion
In this post, we explored how to convert a list of lists into a tibble (dataframe) using R and the tidyverse. We used two approaches. The first built each column directly with purrr's map functions. The second transposed the list into a tibble of nested columns and flattened it with tidyr's unnest(). Both approaches produce the same three rows.
Last modified on 2023-06-05