I am trying to parse XML in Python using Beautiful Soup and minidom, but I am getting an error.

Below are my code and the resulting error.

Code:

import xml.dom.minidom
import bs4 as bs
import urllib.request
source = urllib.request.urlopen('somelink.xml').read()
soup = bs.BeautifulSoup(source,'lxml')
doc = xml.dom.minidom.parse(soup)

Error:

Traceback (most recent call last):
  File "", line 1, in
    runfile('D:/NLTK/Rwire Interface/untitled0.py', wdir='D:/NLTK/Rwire Interface')
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 678, in runfile
    execfile(filename, namespace)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 106, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "D:/NLTK/Rwire Interface/untitled0.py", line 13, in
    doc = xml.dom.minidom.parse(soup)
  File "C:\ProgramData\Anaconda3\lib\xml\dom\minidom.py", line 1958, in parse
    return expatbuilder.parse(file)
  File "C:\ProgramData\Anaconda3\lib\xml\dom\expatbuilder.py", line 913, in parse
    result = builder.parseFile(file)
  File "C:\ProgramData\Anaconda3\lib\xml\dom\expatbuilder.py", line 204, in parseFile
    buffer = file.read(16*1024)
TypeError: 'NoneType' object is not callable

So you first parse the source with Beautiful Soup into soup, and then want to parse that soup again with minidom? The two parsers work independently. What are you trying to achieve? - NoorJafri
I basically want to parse the XML and then extract the data inside it. - Pankaj Garg

1 Answer

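Your data is already parsed once Beautiful Soup returns, so there is no need to pass soup on to minidom. Each parser works on the raw source by itself; see below: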

import xml.dom.minidom
import bs4 as bs
import urllib.request
source = urllib.request.urlopen('somelink.xml').read()
soup = bs.BeautifulSoup(source, 'lxml')  # soup holds the data parsed by Beautiful Soup
doc = xml.dom.minidom.parseString(source)  # doc holds the data parsed by minidom; parseString() accepts the raw bytes

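xml.dom.minidom.parse() expects a filename or an open file object, while xml.dom.minidom.parseString() takes XML held in memory as a string or bytes. You are passing parse() a soup object, which is neither. minidom then tries to call soup.read(), but Beautiful Soup treats an unknown attribute such as soup.read as a search for a <read> tag and returns None when no such tag exists. Calling None is what produces the error 'NoneType' object is not callable.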
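
Since the goal is to parse the XML and pull data out of it, here is a minimal sketch of doing that with minidom alone. The sample document, its tag names (catalog, book, title) and the id attribute are made up for illustration, as the real feed is not shown in the question; the inline bytes stand in for what urllib.request.urlopen(...).read() would return.

```python
import xml.dom.minidom

# Hypothetical sample standing in for the bytes returned by urlopen(...).read()
source = b"""<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <book id="b1"><title>First</title></book>
  <book id="b2"><title>Second</title></book>
</catalog>"""

# parseString() accepts str or bytes held in memory;
# parse() is meant for a filename or an open file object.
doc = xml.dom.minidom.parseString(source)

titles = []
for book in doc.getElementsByTagName("book"):
    title_node = book.getElementsByTagName("title")[0]
    # The text of <title> lives in its child text node
    titles.append((book.getAttribute("id"), title_node.firstChild.data))

print(titles)
```

If you would rather keep Beautiful Soup, the same idea applies: work with the soup object directly and do not hand it to minidom.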