
I have a csv file like this:

'3', '8948', 'f678'
'3', '5654', 'f644'
'6', '5567', 'g3335'
'9', '4467', 'g3356'
'9', '7666', 'h4433'

The CSV holds various records. The first column represents an ID field.

I have looped through the CSV file and added the rows to a list.

I have then used that list to make a JSON file, which looks like this:

[
    [
        "3",
        "8948",
        "f678"
    ],
    [
        "3",
        "5654",
        "f644"
    ],
    [
        "6",
        "5567",
        "g3335"
    ]
     ...

But as I understand it, I won't be able to read from this JSON and perform tasks on it? From what I can see, I need it to be a dictionary, but how can I make a dictionary from my CSV, especially since the ID field is repeated and won't be unique? The only other option I can see is to use a row number. If that is correct, how do I create a dictionary from my CSV keyed by row number?

"But as I understand it, I won't be able to read from this JSON and perform tasks on it?" Why? - Yevhen Kuzmovych
I thought you needed some kind of unique key to read from a JSON file - sr546
Does the CSV file really contain a [ and ] at the start and end of each line? If not, you can read it with pandas and provide the column names, like df = pd.read_csv(filename, sep=',', names=['field1', 'field2', 'field3'], header=None), and then use df.to_json() - BdR
Is there a way to do it without pandas? Sorry, no, there are no brackets in the CSV; that was just me copying and pasting from a list - sr546
You don't need any unique key. It just needs to be valid JSON (which it is). Your example works for me - Yevhen Kuzmovych

1 Answer


I'm guessing your code so far looks something like this:

import csv
import json

data: list[list[str]] = []
with open("input.csv", newline="") as f_in:
    reader = csv.reader(f_in)
    for row in reader:
        data.append(row)

with open("data.json", "w") as f_out:
    json.dump(data, f_out, indent=2)

First, to address your concern about whether this JSON is valid...

The bigger concept to take away is that Python's json module produces valid JSON. If the module didn't complain about something while you were dumping the data then the JSON is good.
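As a small illustration of what "complaining" looks like, here is a minimal sketch (the sample values are made up for the demo): a Python set has no JSON equivalent, so json.dumps raises a TypeError, while the lists of strings that csv.reader produces serialize without any fuss.

```python
import json

# A set has no JSON equivalent, so json.dumps refuses it with a TypeError.
try:
    json.dumps({"ids": {"3", "6", "9"}})
    complained = False
except TypeError:
    complained = True

# Lists of strings (what csv.reader yields for each row) serialize fine.
text = json.dumps([["3", "8948", "f678"]])
print(complained, text)
```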

But to more directly address your concern, JSON can look like a lot of different things:

print(json.dumps(1))
print(json.dumps("A"))
print(json.dumps({}))
print(json.dumps([]))

Each of those dumps() calls produces valid JSON. I can't formally prove that, but I trust Python's json module, which has been vetted through many years of real-world use. I also entered those simple examples directly into https://jsonlint.com/ and got "Valid JSON" for all of them.
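If you'd rather check locally than paste into a website, one option is to parse the text back with json.loads, which raises json.JSONDecodeError on invalid input. The sketch below is only an illustration: is_valid_json is a hypothetical helper name, and the rows are copied from your sample. It also shows that a list of lists reads back fine and is addressed by position, so no unique key is required.

```python
import json

# Round-trip the nested-list structure from the question through JSON text.
rows = [["3", "8948", "f678"], ["3", "5654", "f644"], ["6", "5567", "g3335"]]
text = json.dumps(rows)
loaded = json.loads(text)

# Rows are addressed by position, so no unique key is needed.
first_row = loaded[0]
ids = [row[0] for row in loaded]


def is_valid_json(candidate: str) -> bool:
    """Return True if candidate parses as JSON (hypothetical helper)."""
    try:
        json.loads(candidate)
    except json.JSONDecodeError:
        return False
    return True


print(first_row, ids)
print(is_valid_json(text))
# Single-quoted strings (Python's list repr) are not JSON.
print(is_valid_json("['3', '8948', 'f678']"))
```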

Now, what to do about the JSON you have?

You can process it the way it is, or you can create the structure you want by providing column names/keys yourself (assuming you know what the data represents):

data_keyed: list[dict[str, str]] = []
with open("input.csv", newline="") as f_in:
    reader = csv.reader(f_in)
    for row in reader:
        data_row = {"Col1": row[0], "Col2": row[1], "Col3": row[2]}
        data_keyed.append(data_row)

with open("data_keyed.json", "w") as f_out:
    json.dump(data_keyed, f_out, indent=2)

and now we get:

[
  {
    "Col1": "3",
    "Col2": "8948",
    "Col3": "f678"
  },
  {
    "Col1": "3",
    "Col2": "5654",
    "Col3": "f644"
  },
  ...
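On the repeated ID and the row-number idea from the question, here is a rough sketch of two possible shapes, assuming the rows are already in a list (hard-coded below from your sample; in practice they would come from csv.reader or from json.load on data.json). The names by_row_number and by_id are just illustrative. The first keys each row by its row number; the second groups the remaining columns under each ID, so a repeated ID simply collects several rows.

```python
import json
from collections import defaultdict

# Sample rows from the question; normally read with csv.reader or json.load.
rows = [
    ["3", "8948", "f678"],
    ["3", "5654", "f644"],
    ["6", "5567", "g3335"],
    ["9", "4467", "g3356"],
    ["9", "7666", "h4433"],
]

# Option 1: key each row by its row number, starting at 1.
# JSON object keys are always strings, so convert explicitly.
by_row_number = {str(n): row for n, row in enumerate(rows, start=1)}

# Option 2: group rows by the (repeated) ID in the first column.
by_id: defaultdict[str, list[list[str]]] = defaultdict(list)
for record_id, *rest in rows:
    by_id[record_id].append(rest)

print(json.dumps(by_row_number, indent=2))
print(json.dumps(by_id, indent=2))
```

Which shape is better depends on what you want to look up later: the row-number version preserves every row as-is, while the grouped version lets you fetch all records for a given ID in one step.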