I'm using lxml and Python 3 to parse many files and merge the ones that belong together. The files are actually stored in pairs (which are also merged first) inside zip files, but I don't think that matters here.
We're talking about 100k files that take up about 900 MB in zipped form.
My problem is that the script works fine, but at some point I get the error below. The point differs between runs, so it shouldn't be a problem with one particular file:
  File "C:\Users\xxx\workspace\xxx\src\zip2xml.py", line 110, in _writetonorm
    normroot.getroottree().write(norm_file_path)
  File "lxml.etree.pyx", line 1866, in lxml.etree._ElementTree.write (src/lxml\lxml.etree.c:46006)
  File "serializer.pxi", line 481, in lxml.etree._tofilelike (src/lxml\lxml.etree.c:93719)
  File "serializer.pxi", line 187, in lxml.etree._raiseSerialisationError (src/lxml\lxml.etree.c:90965)
lxml.etree.SerialisationError: IO_WRITE
I have no idea what causes this error. The entire code is a little cumbersome, so I hope the relevant parts suffice:
def _writetonorm(self, outputpath):
    '''Writes the current XML to a file.
    It'll update the file if it already exists and create the file otherwise'''
    #Find Name
    name = None
    try:
        name = self._xml.xpath("xxx")[0].text.rstrip().lstrip()
    except Exception as e:
        try:
            name = self._xml.xpath("xxx")[0].text.rstrip().lstrip()
        except Exception as e:
            name = "damn it!"
    if name != None:
        #clean name a bit
        name = name[:35]
        table = str.maketrans(' /#*"$!&<>-:.,;()','_________________')
        name = name.translate(table)
        name = name.lstrip("_-").rstrip("_-")
        #generate filename
        norm_file_name = name + ".xml"
        norm_file_path = os.path.join(outputpath, norm_file_name)
        #Check if we have that completefile already. If we do, update it.
        if os.path.isfile(norm_file_path):
            norm_file = etree.parse(norm_file_path, self._parser)
            try:
                normroot = norm_file.getroot()
            except:
                print(norm_file_path + "is broken !!!!")
                time.sleep(10)
        else:
            normroot = etree.Element("norm")
        jurblock = etree.Element("jurblock")
        self._add_jurblok_attributes(jurblock)
        jurblock.insert(0, self._xml)
        normroot.insert(0, jurblock)
        try:
            normroot.getroottree().write(norm_file_path) #here the Exception occurs
        except Exception as e:
            print(norm_file_path)
            raise e
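The name-cleaning step in `_writetonorm` can be exercised on its own, which makes it easier to see which file name a given title ends up with. This is only a minimal sketch: `clean_name` and `max_len` are hypothetical names that do not appear in the script, and the sample titles are made up.

```python
# Characters replaced with '_' (the same set as in _writetonorm).
_UNSAFE = ' /#*"$!&<>-:.,;()'
_TABLE = str.maketrans(_UNSAFE, '_' * len(_UNSAFE))


def clean_name(raw, max_len=35):
    """Hypothetical helper: strip whitespace, truncate, replace unsafe
    characters with '_', then trim leading/trailing '_' and '-'."""
    name = raw.strip()[:max_len]
    name = name.translate(_TABLE)
    return name.strip("_-")


print(clean_name("  Act (No. 12): Data/Privacy  "))  # Act__No__12___Data_Privacy
print(clean_name("A/B") == clean_name("A:B"))        # True
```

The second call shows that different titles can map to the same file name, in which case the "update it" branch is taken for an existing file.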
I know that my exception handling isn't great, but this is just a proof of concept for now. Can anyone tell me why the error happens?
The file that triggers the error is not well-formed, but I suspect that is a consequence of the failed write and that it was fine before the latest iteration.
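One way to keep a failed write from leaving a half-written file behind is to write to a temporary file first and only then swap it into place. The sketch below is an assumption about how that could look and not part of the script: `write_atomically` is a hypothetical helper, and it takes already-serialised bytes (for example from `etree.tostring(normroot)`). It does not explain the `IO_WRITE` itself, but since Python does the file I/O here, a failure would be raised as an ordinary Python exception and the previous version of the file would stay intact.

```python
import os
import tempfile


def write_atomically(path, data):
    """Hypothetical helper: write bytes to a temp file next to `path`,
    then replace `path` with it in a single step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        # Same directory, so the swap stays on one filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Keep the old file untouched and remove the partial temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

In `_writetonorm` this would stand in for the `write()` call, e.g. `write_atomically(norm_file_path, etree.tostring(normroot))`.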
[...] lxml and maybe even a different Python version. It would be good if you could isolate the error (provide a small, self-contained script and data that reproduce it) so I could check it myself. – Tupteq