0
votes

I am extracting data from a REST API that I need to write to a SQL table. My approach is adding the JSON data to a dictionary, pass the dictionary to a dataframe and write the dataframe to SQL.

I get the following error when passing the balances() function to the dataframe:

ValueError: If using all scalar values, you must pass an index

What am I doing wrong? Also feel free to provide feedback on the structure of my code, I feel like there are easier ways of extracting the data with less code.

def balances():   
    for b in get_balances["balances"]:
        result = {}
        result["employeeID"] = int(b.get("employeeID"))
        result["resourceID"] = int(b.get("resourceID"))
        result["resourceType"] = int(b.get("resourceType"))
        if b.get("startDate") is None:
            pass
        else:
            result["startDate"] = b.get("startDate").split("#")[0]
        if b.get("endDate") is None:
            pass
        else:
            result["endDate"] = b.get("endDate").split("#")[0]
        result["minutesLeft"] = b.get("minutesLeft")
        result["minutestoTake"] = b.get("minutestoTake")
        result["minutesTaken"] = b.get("minutesTaken")
        result["minutesTakenPast"] = b.get("minutesTakenPast")
        result["minutestakenFuture"] = b.get("minutesTakenFuture")
        result["periodMinutesToTake"] = b.get("periodMinutesToTake")
        result["periodMinutesTaken"] = b.get("periodMinutesTaken")
        for h in b.get("history"):
            if h.get("planningDate") is None:
                pass
            else:
                result["planningDate"] = h.get("planningDate").split("#")[0]
            result["resourceTypeHistory"] = h.get("resourceType")
            result["resourceIDHistory"] = h.get("resourceID")
            result["minutes"] = h.get("minutes")
            result["balanceMinutes"] = h.get("balanceMinutes")
            result["remark"] = h.get("remark")
            yield result

print(pd.DataFrame(balances()))

#ValueError: If using all scalar values, you must pass an index

Sample output data of 2 rows:

{'employeeID': 569, 'resourceID': 230, 'resourceType': 144, 'startDate': '2020-01-01', 'endDate': '2020-12-31', 'minutesLeft': 11281, 'minutestoTake': None, 'minutesTaken': 960, 'minutesTakenPast': 0, 'minutestakenFuture': -960, 'periodMinutesToTake': 0, 'periodMinutesTaken': 0, 'planningDate': '2020-01-01', 'resourceTypeHistory': 15, 'resourceIDHistory': 3, 'minutes': 12000, 'balanceMinutes': 12000, 'remark': ''}
{'employeeID': 877, 'resourceID': 33, 'resourceType': 125, 'startDate': '2020-01-01', 'endDate': '2020-12-31', 'minutesLeft': 11281, 'minutestoTake': None, 'minutesTaken': 960, 'minutesTakenPast': 0, 'minutestakenFuture': -960, 'periodMinutesToTake': 0, 'periodMinutesTaken': 0, 'planningDate': '2020-06-05', 'resourceTypeHistory': 2, 'resourceIDHistory': 3, 'minutes': -480, 'balanceMinutes': 11281, 'remark': ''}
1

1 Answers

0
votes

Works using json_normalize for your sample JSON:

import pandas as pd
import json

with open('1.json', 'r+') as f:
    data = json.load(f)

df = pd.json_normalize(data)
print(df)

   employeeID  resourceID  resourceType   startDate     endDate  ...  resourceTypeHistory resourceIDHistory  minutes  balanceMinutes  remark
0         569         230           144  2020-01-01  2020-12-31  ...                   15                 3    12000           12000
1         877          33           125  2020-01-01  2020-12-31  ...                    2                 3     -480           11281