10
votes

Given a JSON file, the JSON package happily parses it. But if I would like it as a DataFrame (or any other columnar data structure), what would be a good way to get it?

Currently, for example, I have:

using JSON
using DataFrames

json_str = """
[{ "color": "red", "value": "#f00" }, { "color": "green", "value": "#0f0" }, 
 { "color": "blue", "value": "#00f" }, { "color": "cyan", "value": "#0ff" }, 
 { "color": "magenta", "value": "#f0f" }, { "color": "yellow", "value": "#ff0" }, 
 { "color": "black", "value": "#000" } ]
  """

function jsontodf(a)
    ka = union([keys(r) for r in a]...)
    df = DataFrame(;Dict(Symbol(k)=>get.(a,k,NA) for k in ka)...)
    return df
end

a = JSON.Parser.parse(json_str)
jsontodf(a)

which results in:

7×2 DataFrames.DataFrame
│ Row │ color     │ value  │
├─────┼───────────┼────────┤
│ 1   │ "red"     │ "#f00" │
│ 2   │ "green"   │ "#0f0" │
│ 3   │ "blue"    │ "#00f" │
│ 4   │ "cyan"    │ "#0ff" │
│ 5   │ "magenta" │ "#f0f" │
│ 6   │ "yellow"  │ "#ff0" │
│ 7   │ "black"   │ "#000" │

and also handles some missing fields with NAs. Anything cleaner / faster (Julia v0.6+) ?

1
Check once github.com/johnmyleswhite/DataFramesIO.jl/commit/… , however I would also like to know the solution to this question.Rahul
@RahulLakhanpal DataFramesIO seems to have the same purpose, but I'm not sure this package is healthy for v0.6Dan Getz
Also it has no support for handling NA values by itself.Rahul
The answer below works fine. Could you accept it to make it more visible to bots and searchers?cmc

1 Answers

6
votes

I have dug out this old question, and now we have a better solution for it as of DataFrames.jl 0.18.0.

If all entries in JSON have the same fields you can write:

reduce(vcat, DataFrame.(a))

If you have to handle the possibility of different fields in each dict then write:

vcat(DataFrame.(a)..., cols=:union)

This can be slightly problematic if a has a lot of entries as it does splatting. I have just submitted a PR so that you will be also able to write:

reduce(vcat, DataFrame.(a), cols=:union)

in the near future.