29
votes

I have some code which I run in very similar circumstances. This is the first circumstance, where I have an imdb_id of a film I want details of:

url = "http://mymovieapi.com/?id=#{self.imdb_id}&type=json&plot=none&episode=0&lang=en-US&aka=simple&release=simple&business=0&tech=0"
doc = Hpricot(open(url)).to_s
json = JSON.parse(doc)

puts json
puts json["imdb_id"]

And this gives the following result:

{"rating_count"=>493949,
"genres"=>["Drama", "Romance"],
"rated"=>"PG-13",
"language"=>["English", "French", "German", "Swedish", "Italian", "Russian"],
"rating"=>7.6,
"country"=>["USA"],
"release_date"=>19980403,
"title"=>"Titanic",
"year"=>1997,
"filming_locations"=>"Santa Clarita, California, USA",
"imdb_id"=>"tt0120338",
"directors"=>["James Cameron"],
"writers"=>["James Cameron"],
"actors"=>["Leonardo DiCaprio", "Kate Winslet", "Billy Zane", "Kathy Bates", "Frances Fisher", "Gloria Stuart", "Bill Paxton", "Bernard Hill", "David Warner", "Victor Garber", "Jonathan Hyde", "Suzy Amis", "Lewis Abernathy", "Nicholas Cascone", "Anatoly M. Sagalevitch"],
"also_known_as"=>["Tai tan ni ke hao"],
"poster"=>{"imdb"=>"http://ia.media-imdb.com/images/M/MV5BMjExNzM0NDM0N15BMl5BanBnXkFtZTcwMzkxOTUwNw@@._V1_SY317_CR0,0,214,317_.jpg", "cover"=>"http://imdb-poster.b0.upaiyun.com/000/120/338.jpg!cover?_upt=66ac07591382594194"},
"runtime"=>["194 min"],
"type"=>"M",
"imdb_url"=>"http://www.imdb.com/title/tt0120338/"}

tt0120338

This is as expected. In the second circumstance, I have the title and year of the same film:

url = "http://mymovieapi.com/?title=#{self.title}&type=json&plot=simple&episode=0&limit=1&year=#{self.year}&yg=1&mt=none&lang=en-US&offset=&aka=simple&release=simple&business=0&tech=0"
doc = Hpricot(open(url)).to_s
json = JSON.parse(doc)

puts json
puts json["imdb_id"]

From this, I get the exact same JSON output:

{"rating_count"=>493949,
"genres"=>["Drama", "Romance"],
"rated"=>"PG-13", "language"=>["English", "French", "German", "Swedish", "Italian", "Russian"],
"rating"=>7.6,
"country"=>["USA"],
"release_date"=>19980403,
"title"=>"Titanic",
"year"=>1997,
"filming_locations"=>"Santa Clarita, California, USA",
"imdb_id"=>"tt0120338",
"directors"=>["James Cameron"],
"writers"=>["James Cameron"],
"actors"=>["Leonardo DiCaprio", "Kate Winslet", "Billy Zane", "Kathy Bates", "Frances Fisher", "Gloria Stuart", "Bill Paxton", "Bernard Hill", "David Warner", "Victor Garber", "Jonathan Hyde", "Suzy Amis", "Lewis Abernathy", "Nicholas Cascone", "Anatoly M. Sagalevitch"],
"also_known_as"=>["Tai tan ni ke hao"],
"poster"=>{"imdb"=>"http://ia.media-imdb.com/images/M/MV5BMjExNzM0NDM0N15BMl5BanBnXkFtZTcwMzkxOTUwNw@@._V1_SY317_CR0,0,214,317_.jpg",
"cover"=>"http://imdb-poster.b0.upaiyun.com/000/120/338.jpg!cover?_upt=ec8bdec31382594417"},
"runtime"=>["194 min"],
"type"=>"M",
"imdb_url"=>"http://www.imdb.com/title/tt0120338/"}

But when I try and call puts json["imdb_id"], I get this error:

no implicit conversion of String into Integer (TypeError)

This always happens when fetching using the title and year, yet it seems unexplained as the JSON output is exactly the same.

3
Is it "put json["imdb_id"]" or "puts"? And can you post the json returned for the second type? - Amith Koujalgi
@AmithKoujalgi It's puts, I typed that small part out here and made a mistake. I've copied in the 2nd json output, but it is exactly the same. - Andrew
Why are you using Hpricot? And why are you using Hpricot to "parse" the JSON output received? First, it's not needed, it's also been replaced by Nokogiri, so, IF you needed to parse HTML, which you don't, you'd do better with a newer HTML parser. - the Tin Man
@theTinMan I'm fairly new to JSON (this is my first real app with it), so thanks for the heads up on that. I wasn't sure how to get the JSON from the webpage, and I was using Hpricot elsewhere so used it to do that. How should I be doing it? Using Nokogiri, or some other way? - Andrew
Don't use Nokogiri or Hpricot. You're requesting a JSON response and that's what you're getting. - the Tin Man

3 Answers

45
votes

From the exception, it seems that json in the second response is an array with only one element, so the output of puts json is the same (brackets don't get output with puts), but json["string"] fails because [] expects an Integer to use as index.

Check p json or that json.is_a?(Array) and if it is indeed an array, try with json.first['imdb_id'].

3
votes

Here's what's wrong and how to fix it:

  • You're NOT getting HTML back, so you do NOT need to parse HTML. Look at the contents of doc in the examples below. Notice, there is no HTML parsing going on, nor is there any need for it because you're specifically requesting a JSON response: type=json.
  • The first request returns a single response because you're asking for a specific imdb_id. You can only get one response back, so you get just a single object/hash.
  • The second request returns an array of responses, because there could be multiple items with the same title and year, though it seems unlikely. As a result, you have to grab a particular item from the returned array after parsing the JSON. I used first, but your mileage might vary since you could get multiple items; You'll have to figure out which is the appropriate one if you do get more than one.

Here's some code:

require 'json'
require 'open-uri'

imdb_id = 'tt0120338'
url = "http://mymovieapi.com/?id=#{ imdb_id }&type=json&plot=none&episode=0&lang=en-US&aka=simple&release=simple&business=0&tech=0"
doc = open(url).read
doc[0..5] # => "{\"rati"
json = JSON.parse(doc) # => {"rating_count"=>493949, "genres"=>["Drama", "Romance"], "rated"=>"PG-13", "language"=>["English", "French", "German", "Swedish", "Italian", "Russian"], "rating"=>7.6, "country"=>["USA"], "release_date"=>19980403, "title"=>"Titanic", "year"=>1997, "filming_locations"=>"Santa Clarita, California, USA", "imdb_id"=>"tt0120338", "directors"=>["James Cameron"], "writers"=>["James Cameron"], "actors"=>["Leonardo DiCaprio", "Kate Winslet", "Billy Zane", "Kathy Bates", "Frances Fisher", "Gloria Stuart", "Bill Paxton", "Bernard Hill", "David Warner", "Victor Garber", "Jonathan Hyde", "Suzy Amis", "Lewis Abernathy", "Nicholas Cascone", "Anatoly M. Sagalevitch"], "also_known_as"=>["Tai tan ni ke hao"], "poster"=>{"imdb"=>"http://ia.media-imdb.com/images/M/MV5BMjExNzM0NDM0N15BMl5BanBnXkFtZTcwMzkxOTUwNw@@._V1_SY317_CR0,0,214,317_.jpg", "cover"=>"http://imdb-poster.b0.upaiyun.com/000/120/338.jpg!cover?_upt=7dedce781382606097"}, "runtime"=>["194 min"], "type"=>"M", "imdb_url"=>"http://www.imdb.com/title/tt0120338/"}
json["imdb_id"] # => "tt0120338"

This returned a single item as a JSON response. You can see that by looking at the doc variable and see that it is JSON, NOT HTML. Parsing it using the JSON parser returned a hash.

url = "http://mymovieapi.com/?title=#{ json['title'] }&type=json&plot=simple&episode=0&limit=1&year=#{ json['year'] }&yg=1&mt=none&lang=en-US&offset=&aka=simple&release=simple&business=0&tech=0"
doc = open(url).read
doc[0..5] # => "[{\"rat"
json = JSON.parse(doc).first # => {"rating_count"=>493949, "genres"=>["Drama", "Romance"], "rated"=>"PG-13", "language"=>["English", "French", "German", "Swedish", "Italian", "Russian"], "rating"=>7.6, "country"=>["USA"], "release_date"=>19980403, "title"=>"Titanic", "year"=>1997, "filming_locations"=>"Santa Clarita, California, USA", "imdb_id"=>"tt0120338", "directors"=>["James Cameron"], "writers"=>["James Cameron"], "actors"=>["Leonardo DiCaprio", "Kate Winslet", "Billy Zane", "Kathy Bates", "Frances Fisher", "Gloria Stuart", "Bill Paxton", "Bernard Hill", "David Warner", "Victor Garber", "Jonathan Hyde", "Suzy Amis", "Lewis Abernathy", "Nicholas Cascone", "Anatoly M. Sagalevitch"], "plot_simple"=>"A seventeen-year-old aristocrat, expecting to be married to a rich claimant by her mother, falls in love with a kind but poor artist aboard the luxurious, ill-fated R.M.S. Titanic.", "poster"=>{"imdb"=>"http://ia.media-imdb.com/images/M/MV5BMjExNzM0NDM0N15BMl5BanBnXkFtZTcwMzkxOTUwNw@@._V1_SY317_CR0,0,214,317_.jpg", "cover"=>"http://imdb-poster.b0.upaiyun.com/000/120/338.jpg!cover?_upt=7dedce781382606097"}, "runtime"=>["194 min"], "type"=>"M", "imdb_url"=>"http://www.imdb.com/title/tt0120338/", "also_known_as"=>["Tai tan ni ke hao"]}
json["imdb_id"] # => "tt0120338"

This request returned an array of hashes, which is reasonable. The JSON string is an array, which is visible in the doc variable. Parsing it, then grabbing just the first element makes it possible to read the value of imdb_id.

Again, notice that there is no HTML parser involved, nor is one needed. You have to look at the data you're getting back, don't just assume.

0
votes

The error must be somewhere else in your code from what I can tell.

I set the JSON you gave as your output to the variable a in IRB: (JSON from your second example)

1.9.3p194 :043 > a
 => {"rating_count"=>493949, "genres"=>["Drama", "Romance"], "rated"=>"PG-13", "language"=>["English", "French", "German", "Swedish", "Italian", "Russian"], "rating"=>7.6, "country"=>["USA"], "release_date"=>19980403, "title"=>"Titanic", "year"=>1997, "filming_locations"=>"Santa Clarita, California, USA", "imdb_id"=>"tt0120338", "directors"=>["James Cameron"], "writers"=>["James Cameron"], "actors"=>["Leonardo DiCaprio", "Kate Winslet", "Billy Zane", "Kathy Bates", "Frances Fisher", "Gloria Stuart", "Bill Paxton", "Bernard Hill", "David Warner", "Victor Garber", "Jonathan Hyde", "Suzy Amis", "Lewis Abernathy", "Nicholas Cascone", "Anatoly M. Sagalevitch"], "also_known_as"=>["Tai tan ni ke hao"], "poster"=>{"imdb"=>"http://ia.media-imdb.com/images/M/MV5BMjExNzM0NDM0N15BMl5BanBnXkFtZTcwMzkxOTUwNw@@._V1_SY317_CR0,0,214,317_.jpg", "cover"=>"http://imdb-poster.b0.upaiyun.com/000/120/338.jpg!cover?_upt=ec8bdec31382594417"}, "runtime"=>["194 min"], "type"=>"M", "imdb_url"=>"http://www.imdb.com/title/tt0120338/"} 

Then I called:

1.9.3p194 :046 > puts a["imdb_id"]
tt0120338
 => nil 

This gave me the right output.