0
votes

I have a URL that returns JSON, and I am trying to extract data from the response. Below is my code:

url = urllib2.urlopen("https://i1.adis.ws/s/foo/M0011126_001_SET.js?func=app.mjiProduct.handleJSON&protocol=https")
content = url.read()
soup = BeautifulSoup(content, "html.parser")
print(soup.prettify())
print(soup.items)
newDictionary=json.loads(str(soup))

Below is the response content:

app.mjiProduct.handleJSON({"name":"M0011126_001_SET","items":[{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_MAIN","width":3200,"height":4800,"format":"TIFF","opaque":"true"},{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_ALT1","width":3200,"height":4800,"format":"TIFF","opaque":"true"},{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_ALT2","width":3200,"height":4800,"format":"TIFF","opaque":"true"}]});

I am new to JSON and unable to understand the response. I need to parse it as JSON (or in some other form) to extract the image sources, but the above code gives me the error below.

No JSON object could be decoded

Can anyone please guide me? Thanks

3 Answers

0
votes

First of all, your URL isn't working; it returns app.mjiProduct.handleJSON({"status":"error","errorMsg":"Failed to get set"});

Second, you don't have to pass the content to BeautifulSoup. You can pass it directly to json, as I did in my code below, without any BeautifulSoup object.

I used httpbin to test, but the same approach should work with your URL as long as the body handed to json.loads is plain JSON. I used Python 3, though.

from urllib.request import urlopen
import json
url = urlopen("http://httpbin.org/get")
content = url.read()
newDictionary=json.loads(content)
print(newDictionary)

output: {'args': {}, 'headers': {'Accept-Encoding': 'identity', 'Connection': 'close', 'Host': 'httpbin.org', 'User-Agent': 'Python-urllib/3.6'}, 'origin': '', 'url': 'http://httpbin.org/get'}
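To see why the question's response fails while the httpbin one parses, here is a minimal sketch that runs without any network access. The byte string stands in for what urlopen(...).read() returns in Python 3, and both sample strings are assumptions modelled on the responses quoted in this thread.

```python
import json

# Stand-in for a plain JSON body, as bytes (what read() returns in Python 3).
plain = b'{"url": "http://httpbin.org/get"}'
# Stand-in for the wrapped response quoted above: a function call, not JSON.
wrapped = 'app.mjiProduct.handleJSON({"status":"error","errorMsg":"Failed to get set"});'

# Decoding explicitly keeps this working on Python versions older than 3.6.
parsed = json.loads(plain.decode("utf-8"))
print(parsed["url"])

try:
    json.loads(wrapped)
    wrapped_ok = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    wrapped_ok = False
print(wrapped_ok)
```

The second call fails because the text starts with a function name rather than a JSON value, which is the same failure the question reports.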

0
votes

Below is the code that worked for me.

import json

json_data = url.read()
# Keep only the text between "handleJSON(" and the closing ");"
purify_data = json_data.split('handleJSON(')[1].split(');')[0]
loaded_json = json.loads(purify_data)
print(loaded_json['items'][0]['src'])

Actually, I figured out that json_data was a string, and I couldn't decode it because of the format of that string, which was:

app.mjiProduct.handleJSON(REQUIRED JSON)

So I first stripped the wrapper from the string and then loaded the remainder with json, and that solved the problem.
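The same idea can be sketched so that it collects every image source instead of only the first. This is only an illustration: strip_wrapper is a hypothetical helper name, the sample string is shortened from the response in the question, and it uses find/rfind on the parentheses rather than split.

```python
import json

# Sample shortened from the response shown in the question (assumption: the
# real response has the same callback(...) shape).
SAMPLE = (
    'app.mjiProduct.handleJSON({"name":"M0011126_001_SET","items":['
    '{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_MAIN"},'
    '{"type":"img","src":"https://i1.adis.ws/i/foo/M0011126_001_ALT1"}]});'
)


def strip_wrapper(text):
    """Hypothetical helper: return the text between the first '(' and the last ')'."""
    start = text.find("(")
    end = text.rfind(")")
    if start == -1 or end == -1 or end < start:
        raise ValueError("response does not look like callback(...)")
    return text[start + 1:end]


data = json.loads(strip_wrapper(SAMPLE))
# Collect the "src" of every entry under "items".
image_sources = [item["src"] for item in data["items"]]
print(image_sources)
```

In Python 3, url.read() returns bytes, so the body would need a .decode("utf-8") before being passed to a helper like this.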

0
votes

The response isn't valid JSON. It looks like executable code (probably JavaScript): a function call with the data as its argument. The argument itself, {"name":"M0011126_001_SET","items":[...]}, is valid JSON. So if you know for sure that the response always has this format, you can strip the function call like this:

content = url.read()[26:-2] # Cut first 26 characters and last two
newDictionary=json.loads(str(content))
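If the hard-coded 26 feels fragile, one possible variation is to name the wrapper explicitly and check for it before slicing. This is a sketch under the assumption that the callback name stays app.mjiProduct.handleJSON; unwrap, PREFIX, and SUFFIX are hypothetical names, and the sample is the error response quoted earlier in this thread.

```python
import json

PREFIX = "app.mjiProduct.handleJSON("  # callback name taken from the response in the question
SUFFIX = ");"


def unwrap(text):
    """Hypothetical helper: strip the known wrapper, or fail loudly if the format changed."""
    text = text.strip()
    if not (text.startswith(PREFIX) and text.endswith(SUFFIX)):
        raise ValueError("unexpected response format")
    return text[len(PREFIX):-len(SUFFIX)]


sample = 'app.mjiProduct.handleJSON({"status":"error","errorMsg":"Failed to get set"});'
payload = json.loads(unwrap(sample))
print(len(PREFIX))        # the same offset as the slice above
print(payload["status"])
```

The slice does the same thing as [26:-2], but a changed callback name or a trailing newline raises a clear error instead of silently producing a string that json cannot decode.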

I don't know much about Beautiful Soup, but from what I can find it is a library for processing HTML files. Your response is not HTML, so I don't think you should use it here.