JSON Parser Python Script Issues

Question

I'm a first-year CS student, attempting to debug a simple Python script.

The script parses a directory of JSON files (the contents of an AWS bucket). However, I can't figure out where these errors come from:

import json
import os
from pprint import pprint

jsonDirectory = "/path/to/dir/"
targetRegion = "-insert-region-here"

print("Searching for records with AWS Region: " + targetRegion)
print("")

for filename in os.listdir(jsonDirectory):
    print("Reading: " + filename)
    data = json.dumps(open(jsonDirectory + filename))

    for i in range(len(data["Records"])):
        if data["Records"][i]["awsRegion"] == targetRegion:
            print("---------------------------")
            print("Record #" + str(i))
            print("Username: " + data["Records"][i]["userIdentity"]["userName"])
            print("Event name: " + data["Records"][i]["eventName"])
            print("Event time: " + data["Records"][i]["eventTime"])
            print("---------------------------")

    print("")

print("Completed reading files.")

The errors:

Traceback (most recent call last):
  File "/path/to/file.py", line 13, in <module>
    data = json.dumps(open(jsonDirectory + filename))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'TextIOWrapper' is not JSON serializable
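The TypeError comes from using json.dumps(), which converts a Python object into a JSON string, on a file object; json.load() is the function that reads JSON from a file object. A minimal sketch of the difference, using an io.StringIO stand-in for one of the files; its contents (including the region "us-east-1") are hypothetical.

```python
import io
import json

# Hypothetical stand-in for one bucket file; "us-east-1" is made up.
fake_file = io.StringIO('{"Records": [{"awsRegion": "us-east-1"}]}')

# json.dumps() converts a Python object INTO a JSON string. A file object
# is not a type it knows how to serialize, so it raises TypeError.
try:
    json.dumps(fake_file)
except TypeError as exc:
    print("json.dumps failed:", exc)

# json.load() goes the other way: it reads JSON text FROM a file object
# and returns ordinary Python data (here, a dict).
fake_file.seek(0)
data = json.load(fake_file)
print(data["Records"][0]["awsRegion"])  # us-east-1
```

The exact wording of the TypeError message varies between Python versions, but the cause is the same.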

Thanks! I'm still getting errors though:

Traceback (most recent call last):
  File "/path/of/script.py", line 13, in <module>
    data = json.load(open(jsonDirectory + filename))
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 37: invalid start byte

— Steve8302
Check your file: as the error indicates, there is a byte it cannot decode at position 37. — YGouddi
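To see which byte that comment is pointing at, one can read the file in binary mode, which never decodes, and then decode the bytes by hand. This is a sketch assuming a hypothetical sample file written in Latin-1 (where 0xb0 is the degree sign); with the real data you would open the failing file instead, and the offset reported by exc.start should normally correspond to the "position 37" in the traceback.

```python
import os
import tempfile

# Hypothetical sample: JSON text saved as Latin-1, where the degree sign
# is the single byte 0xb0 (not a valid UTF-8 start byte).
sample = '{"Records": [{"note": "temp 21\u00b0C"}]}'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json")
    with open(path, "wb") as f:
        f.write(sample.encode("latin-1"))

    # Binary mode ("rb") returns raw bytes and never decodes,
    # so this read cannot raise UnicodeDecodeError.
    with open(path, "rb") as f:
        raw = f.read()

bad = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # exc.start is the offset reported as "position N" in the message.
    bad = exc.start
    print("bad byte", hex(raw[bad]), "at position", bad)
    # Show a few bytes on either side of the problem.
    print("context:", raw[max(0, bad - 10):bad + 10])
```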

mementum · Accepted Answer · 2018-01-26T10:29:57

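Let me assume that you are not in Western Europe or the USA, so the files were not written in UTF-8 or in a commonly compatible encoding like ISO-8859-1. The traceback shows that Python decoded them as UTF-8, which fails on byte 0xb0. From the comments above, the failing line is: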

data = json.load(open(jsonDirectory + filename))

If you split that statement into three steps:

f = open(jsonDirectory + filename)
fdata = f.read()
data = json.loads(fdata)

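you will see that the error is raised by fdata = f.read(), before json.loads() ever runs: the file's bytes cannot be decoded as UTF-8. The suggestion is to pass the files' real encoding to open():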

f = open(jsonDirectory + filename, encoding='my-encoding')  # 'my-encoding' is a placeholder: use the files' real encoding, e.g. 'latin-1'
fdata = f.read()
data = json.loads(fdata)
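A self-contained sketch of that fix, assuming the files are Latin-1 (ISO-8859-1). That encoding is only a guess, suggested by 0xb0 being the degree sign in Latin-1, and the record.json file and its content are hypothetical. It shows the UTF-8 read failing inside f.read() and the Latin-1 read succeeding.

```python
import json
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file: valid JSON written in Latin-1 (ISO-8859-1),
    # so the degree sign is stored as the single byte 0xb0.
    path = os.path.join(tmp, "record.json")
    with open(path, "w", encoding="latin-1") as f:
        f.write('{"Records": [{"note": "21\u00b0C"}]}')

    # Decoding as UTF-8 fails inside f.read(), before json.loads() runs.
    utf8_failed = False
    try:
        with open(path, encoding="utf-8") as f:
            fdata = f.read()
    except UnicodeDecodeError as exc:
        utf8_failed = True
        print("UTF-8 read failed:", exc.reason)

    # Naming the encoding the file was really written in fixes the read,
    # and json.loads() then parses the text normally.
    with open(path, encoding="latin-1") as f:
        fdata = f.read()
    data = json.loads(fdata)

note = data["Records"][0]["note"]  # the string "21°C"
print(ascii(note))
```

Be aware that 'latin-1' maps every possible byte to some character, so it never raises; if the guess is wrong, the text comes out garbled instead of failing.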

If you are unsure which encoding the files use, you can tell open() to ignore or work around decoding errors through its errors parameter. From the Python docs at https://docs.python.org/3/library/functions.html#open:

errors is an optional string that specifies how encoding and decoding errors are to be handled—this cannot be used in binary mode. A variety of standard error handlers are available (listed under Error Handlers), though any error handling name that has been registered with codecs.register_error() is also valid. The standard names include:

'strict' to raise a ValueError exception if there is an encoding error. The default value of None has the same effect.

'ignore' ignores errors. Note that ignoring encoding errors can lead to data loss.

'replace' causes a replacement marker (such as '?') to be inserted where there is malformed data.

'surrogateescape' will represent any incorrect bytes as code points in the Unicode Private Use Area ranging from U+DC80 to U+DCFF. These private code points will then be turned back into the same bytes when the surrogateescape error handler is used when writing data. This is useful for processing files in an unknown encoding.

'xmlcharrefreplace' is only supported when writing to a file. Characters not supported by the encoding are replaced with the appropriate XML character reference &#nnn;.

'backslashreplace' replaces malformed data by Python’s backslashed escape sequences.

'namereplace' (also only supported when writing) replaces unsupported characters with \N{...} escape sequences.
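The same handler names are accepted by bytes.decode(), so one can compare them on a single bad byte without touching any files. A sketch using the hypothetical bytes b"21\xb0C"; 'xmlcharrefreplace' and 'namereplace' are left out because they only apply when writing. When decoding, 'replace' inserts U+FFFD (the Unicode replacement character) rather than '?'.

```python
# Hypothetical raw bytes: "21", then 0xb0 (Latin-1 degree sign), then "C".
raw = b"21\xb0C"

results = {}
for handler in ["strict", "ignore", "replace", "surrogateescape", "backslashreplace"]:
    try:
        results[handler] = raw.decode("utf-8", errors=handler)
    except UnicodeDecodeError as exc:
        # Only 'strict' ends up here.
        results[handler] = "UnicodeDecodeError: " + exc.reason
    # ascii() keeps the printout readable even for unusual characters.
    print(handler.ljust(16), ascii(results[handler]))

# surrogateescape is reversible: encoding with the same handler restores
# the original bytes exactly.
restored = results["surrogateescape"].encode("utf-8", errors="surrogateescape")
print(restored == raw)  # True
```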

Start with 'ignore', as in:

f = open(jsonDirectory + filename, errors='ignore')
fdata = f.read()
data = json.loads(fdata)

Then check whether the output satisfies you, or find where things have gone wrong.
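Putting the pieces together, here is a sketch of the question's loop rewritten to use json.load(), a with block, and open()'s errors parameter. The helper name find_records, the demo directory, the file name a.json, the region "us-east-1", and all record values are hypothetical; errors="ignore" follows the answer's suggestion and silently drops undecodable bytes. It also uses os.path.join instead of string concatenation, so a missing trailing "/" on the directory does not matter.

```python
import json
import os
import tempfile


def find_records(json_directory, target_region, encoding="utf-8", errors="ignore"):
    """Yield (filename, index, record) for each record in target_region.

    encoding and errors are passed straight to open(); errors="ignore"
    silently drops bytes that cannot be decoded.
    """
    for filename in sorted(os.listdir(json_directory)):
        # os.path.join works whether or not the directory ends in "/".
        path = os.path.join(json_directory, filename)
        # json.load (not json.dumps) parses the file; "with" closes it.
        with open(path, encoding=encoding, errors=errors) as f:
            data = json.load(f)
        for i, record in enumerate(data.get("Records", [])):
            if record.get("awsRegion") == target_region:
                yield filename, i, record


# Hypothetical sample data mimicking the question's record layout.
sample = {"Records": [
    {"awsRegion": "us-east-1", "userIdentity": {"userName": "alice"},
     "eventName": "GetObject", "eventTime": "2018-01-26T10:00:00Z"},
    {"awsRegion": "eu-west-1", "userIdentity": {"userName": "bob"},
     "eventName": "PutObject", "eventTime": "2018-01-26T11:00:00Z"},
]}

with tempfile.TemporaryDirectory() as demo_dir:
    # Write the file as Latin-1 with a stray 0xb0 byte inside "alice",
    # so reading it as UTF-8 would fail without errors="ignore".
    text = json.dumps(sample).replace("alice", "al\u00b0ice")
    with open(os.path.join(demo_dir, "a.json"), "wb") as f:
        f.write(text.encode("latin-1"))
    results = list(find_records(demo_dir, "us-east-1"))

for filename, i, record in results:
    print("---------------------------")
    print("Reading: " + filename)
    print("Record #" + str(i))
    print("Username: " + record["userIdentity"]["userName"])
    print("Event name: " + record["eventName"])
    print("Event time: " + record["eventTime"])
    print("---------------------------")
```

Here the stray byte simply disappears and the username reads "alice"; with real data, dropped bytes may hide meaningful characters, so passing the correct encoding is preferable once it is known.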
