Create DataFrames from Nested JSON data

Turning Nested JSON data into dataframes

Situation: You’ve connected to an API endpoint, that is structured as a nested json, here’s how to loop through and select certain values into a dataframe for further processing.

In one example, we’re looping through a particular endpoint from Coordinape. We’re going to loop through one level (list of dictionaries) and append data to empty lists.

Second example, we’ll conditionally loop through the second level (a dictionary) to grab specific values. As a bonus, we’ll also use try-except(error handling) to handle an AttributeError.

Finally, we’ll create the dataframe from lists.


# empty lists
name_list_24 = []
address_list_24 = []
circle_id_list_24 = []
discord_username_list_24 = []
profile_address_24 = []


# loop through level 1
for dct in df_manifest_24['circle.users'][0]:
    name_list_24.append(dct['name'])
    address_list_24.append(dct['address'])
    circle_id_list_24.append(dct['circle_id'])


# conditionally loop through level 2 
# use try-except
for dct in df_manifest_24['circle.users'][0]:
    try:
        for k, v in dct['profile'].items():
            if k == 'discord_username':
                discord_username_list_24.append(v)
            elif k == 'address':
                profile_address_24.append(v)
            else:
                print("Done.")
    except:
        AttributeError
    pass

# create dataframe from lists
df_24 = pd.DataFrame(list(zip(name_list_24, address_list_24, circle_id_list_24)),
                     columns=['Name', 'Address', 'Circle_Id'])

Loading Nested Data json_normalize

Situation: Before creating a dataframe as we saw above, sometimes we just need to convert nested JSON data.

Data breakdown:

  • data is variable we save json dump coming in from http request.
  • flatten a portion of data, data[‘businesses’] and load into dataframe, ‘bookstores’
  • the ‘bookstores’ dataframe has many columns, some of which is a column of arrays
  • one column, categories is a column of arrays (deeply nested data)
  • now, flatten that data (categories) setting parameters to json_normalize, including
  • data, sep, record_path, meta and meta-prefix
# import json_normalize
import pandas as pd
import requests
from pandas.io.json import json_normalize

# use dotenv to hide token info
import os
from dotenv import load_dotenv   
load_dotenv()    

# headers
api_url = os.environ.get("API_URL")
api_key = os.environ.get("API_KEY")
headers = {"Authorization": "Bearer {}".format(api_key)}
params = {"term": "bookstore",
          "location": "San Francisco"}
          
# make API call and extract JSON data
response = requests.get(api_url,
                        headers=headers,
                        params=params)
                        
# data is entire json dump                       
data = response.json()

# data is also nested json
# flatten data and load to dataframe with _ separator
bookstores = json_normalize(data['businesses'], sep="_")
print(list(bookstores))

# print deeply nested data
# within bookstores dataframe, choose "categories" column
print(bookstores.categories.head())

# flatten categories data, bring in business details
df = json_normalize(data['businesses'],
                    sep="_",
                    record_path = "categories",
                    meta = ["name",
                            "alias",
                            "rating",
                            ["coordinates", "latitude"],
                            ["coordinates", "longitude"]],
                    meta_prefix="biz_")





For more content on data and web3 find me on Twitter.

Previous
Next