1 vote

I'm using Julia, and I'm trying to use the PyCall package so that I can use the BeautifulSoup module for web parsing. My Julia code looks something like this:

using PyCall
pyinitialize("python3")
@pyimport bs4 #need BeautifulSoup
@pyimport urllib.request as urllib #need urlopen

url_base = "blah"
html = urllib.urlopen(url_base).read()
soup = bs4.BeautifulSoup(html, "lxml")

However, when I try to run it, I get complaints about the read() function. I first thought that read() would be a built-in Python function, but pybuiltin("read") didn't work.

I'm not sure what Python module I can import to get the read function. I tried importing the io module and using io.read(), but that didn't work. Additionally, using Julia's built-in read functions didn't work, since urllib.urlopen(url_base) is a PyObject.

2
What Python version are you using? There have been a few changes between Python 2 and 3 with urllib: docs.python.org/2/library/urllib.html - FlipperPA
I'm using Python 3 right now. I have urllib.request as the module I'm importing, though I renamed it to urllib since it's easier to type (and I think the "." can't be overloaded in Julia?). - Uthsav Chitra

2 Answers

1 vote

The problem is the method-call syntax, not a missing function. The line

html = urllib.urlopen(url_base).read()

should be

html = urllib.urlopen(url_base)[:read]()

See the PyCall documentation:

Important: The biggest difference from Python is that object attributes/members are accessed with o[:attribute] rather than o.attribute, so that o.method(...) in Python is replaced by o[:method](...) in Julia. Also, you use get(o, key) rather than o[key]. (However, you can access integer indices via o[i] as in Python, albeit with 1-based Julian indices rather than 0-based Python indices.)
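
Applied to the code in the question, the rule quoted above might look like the following sketch. It assumes a PyCall version that uses the o[:attribute] syntax described in that quote, and that bs4 and lxml are installed for the Python that PyCall is linked against. The helper names read_url and make_soup are hypothetical, chosen only for illustration.

```julia
using PyCall

# Hypothetical helper: fetch the raw body of `url` via Python's urllib.request.
function read_url(url)
    urllib = pyimport("urllib.request")   # PyObject wrapping the Python module
    response = urllib[:urlopen](url)      # look up the attribute with [:name], then call it
    return response[:read]()              # same pattern for a method on the response object
end

# Hypothetical helper: parse `html` with BeautifulSoup.
# Assumes the bs4 and lxml Python packages are available.
function make_soup(html)
    bs4 = pyimport("bs4")
    return bs4[:BeautifulSoup](html, "lxml")
end
```

With these in place, the last two lines of the question would become soup = make_soup(read_url(url_base)).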

0 votes

You need to read the response in a separate step. Instead of:

html = urllib.urlopen(url_base).read()

In Python 3, try:

with urllib.urlopen(url_base) as response:
    html = response.read()

Python 3 goes a long way toward improving clarity and readability.
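
The with block in this answer is Python syntax and will not parse in Julia. A rough Julia counterpart, assuming the same o[:attribute] PyCall syntax used elsewhere in this thread, is to read inside try and close the response in finally. The function name read_and_close is hypothetical.

```julia
using PyCall

# Hypothetical helper mirroring Python's
#     with urlopen(url) as response:
#         html = response.read()
function read_and_close(url)
    urllib = pyimport("urllib.request")
    response = urllib[:urlopen](url)
    try
        return response[:read]()
    finally
        response[:close]()   # release the connection even if read() throws
    end
end
```

Calling read_and_close(url_base) then plays the role of the two-line with block.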