I was attempting to generate a choropleth map by modifying an SVG map depicting all counties in the US. The basic approach is captured by Flowing Data. Since SVG is basically just XML, the approach leverages the BeautifulSoup parser.
The thing is, the parser does not capture all path elements in the SVG file. The following captured only 149 paths (out of over 3000):
#Open SVG file
svg=open(shp_dir+'USA_Counties_with_FIPS_and_names.svg','r').read()
#Parse SVG
soup = BeautifulSoup(svg, selfClosingTags=['defs','sodipodi:namedview'])
#Identify counties
paths = soup.findAll('path')
len(paths)
I know, however, that many more exist, both from visual inspection of the file and from the fact that ElementTree methods capture 3,143 paths with the following routine:
#Assuming ET refers to the standard-library ElementTree module
import xml.etree.ElementTree as ET
#Parse SVG
tree = ET.parse(shp_dir+'USA_Counties_with_FIPS_and_names.svg')
#Capture element
root = tree.getroot()
#Compile list of IDs from file
ids=[]
for child in root:
    if 'path' in child.tag:
        ids.append(child.attrib['id'])
len(ids)
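For reference, here is a minimal sketch of the same count using namespace-qualified tag names. It assumes the paths live in the standard SVG namespace. The inline `sample_svg` string and its ids are hypothetical stand-ins for the real county file. ElementTree stores tags as `{namespace}localname`, so an exact match on the qualified name is stricter than the substring test above, and `iter()` also reaches paths nested inside groups, which a loop over the root's direct children would miss.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Tiny hypothetical stand-in for the county file
sample_svg = """<svg xmlns="http://www.w3.org/2000/svg">
  <path id="01001" d="M0 0L1 1"/>
  <g>
    <path id="01003" d="M1 1L2 2"/>
  </g>
  <rect id="frame" width="10" height="10"/>
</svg>"""

root = ET.fromstring(sample_svg)

# ElementTree stores tags as "{namespace}localname"
path_tag = "{%s}path" % SVG_NS

# Direct children only, as in the loop above
direct_ids = [el.get("id") for el in root if el.tag == path_tag]

# iter() walks the whole tree, so the path nested in <g> is included
all_ids = [el.get("id") for el in root.iter(path_tag)]

print(len(direct_ids), len(all_ids))
```

If the two counts differ on the real file, some paths are nested inside groups rather than sitting directly under the root.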
I have not yet figured out how to write the modified ElementTree object back to disk without the output getting all messed up. This is my attempt:
#Define style template string
style='font-size:12px;fill-rule:nonzero;stroke:#FFFFFF;stroke-opacity:1;'+\
      'stroke-width:0.1;stroke-miterlimit:4;stroke-dasharray:none;'+\
      'stroke-linecap:butt;marker-start:none;stroke-linejoin:bevel;fill:'
#For each path...
for child in root:
    #...if it is a path....
    if 'path' in child.tag:
        try:
            #...update the style to the new string with a county-specific color...
            child.attrib['style']=style+col_map[child.attrib['id']]
        except:
            #...if it's not a county we have in the ACS, leave it alone
            child.attrib['style']=style+'#d0d0d0'+'\n'
#Write modified SVG to disk
tree.write(shp_dir+'mhv_by_cty.svg')
The modification/write routine above yields this monstrosity:
My primary question is this: why did BeautifulSoup fail to capture all of the path tags? Second, why would the image modified with the ElementTree objects have all of that extracurricular activity going on? Any advice would be greatly appreciated.
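Two things in the write routine may be worth checking, though neither is confirmed as the cause. First, ElementTree serializes namespaces it has not been told about with generated prefixes such as `ns0:`, and `ET.register_namespace` avoids that. Second, the fallback branch is commented as "leave it alone" but actually assigns a grey fill to every path missing from `col_map`. If the file contains non-county paths, for example a border overlay drawn with `fill:none` (an assumption about this file), those would become filled shapes. The sketch below uses a hypothetical inline SVG, a hypothetical `State_Lines` id, and a stand-in `col_map`. It restyles only the ids it recognizes, which is one reading of the comment's intent.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Register SVG as the default namespace so output keeps <path>, not <ns0:path>
ET.register_namespace("", SVG_NS)

# Hypothetical stand-in for the county file; "State_Lines" is an assumed overlay
sample_svg = """<svg xmlns="http://www.w3.org/2000/svg">
  <path id="01001" d="M0 0L1 1"/>
  <path id="State_Lines" style="fill:none;stroke:#000000" d="M1 1L2 2"/>
</svg>"""

# Hypothetical stand-in for col_map: county id -> fill color
col_map = {"01001": "#08519c"}
base_style = "stroke:#FFFFFF;stroke-width:0.1;fill:"

root = ET.fromstring(sample_svg)
for el in root.iter("{%s}path" % SVG_NS):
    color = col_map.get(el.get("id"))
    if color is not None:
        # Known county: apply the county-specific fill
        el.set("style", base_style + color)
    # Unknown id: the element keeps whatever style it already had

out = ET.tostring(root, encoding="unicode")
print(out)
```

Counties that are genuinely missing from the data could still be greyed out by testing for a county-like id before applying the fallback colour, instead of applying it to every unmatched path.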
svg_soup = BeautifulSoup(svg); paths = svg_soup.find_all('path'); len(paths)
which outputted 3143. Perhaps you need to upgrade bs4? - MattDMo