Reading and manipulating nested data
Ways of handling nested data
Recently, I downloaded JSON data from BigQuery and had to make sense of the data. This starts with getting the data into tabular form.
Here are the libraries I used:
library(jsonlite)
library(tidyverse)
First, read in JSON data. Once read in, we check its class type to see that its a list. We’ll want to get it into a data frame.
# read data out into Large list (321 elements, 2.4 Mb)
# each row is *another* list
funnel <- lapply(readLines("bq-mixpanel-funnel.json"), fromJSON)
# "list" class
class(funnel)
After searching online, three approaches continually resurfaced.
First, using unlist()
and converting into matrix()
before wrapping that in a data.frame()
:
# Approach 1: convert to matrix, array
unlist_funnel <- matrix(unlist(funnel), byrow = TRUE, ncol = length(funnel[[1]]))
rownames(unlist_funnel) <- names(funnel)
as.data.frame(unlist_funnel) %>% view()
These next approaches get us closer (note: I know from interacting with the data in BigQuery that there should be 321 rows):
# Approach 2: Convert list to data frame
df <- data.frame(matrix(unlist(funnel), nrow = length(funnel), byrow = TRUE))
df2 <- data.frame(matrix(unlist(funnel), nrow = length(funnel), byrow = FALSE))
df3 <- data.frame(matrix(unlist(funnel), nrow = 321, byrow = TRUE), stringsAsFactors = FALSE)
The next approach is to use lapply()
#works but everything is on one column
unlist(lapply(funnel, c)) %>% view()
# this makes everything a list, but we want everything into a vector
t(lapply(funnel, c)) %>% view()
Finally, the approach that worked best, in this particular case was sapply()
. This functions turns things into vector, which can then be converted into a dataframe:
#still the ideal, this works because 'c' is used ot combine lists
t(sapply(funnel, c)) %>% view()