
I'm using Python to write a function that takes any pom.xml file and returns the groupId, artifactId, and version of each dependency.
I found the following pom.xml at https://www.javatpoint.com/maven-pom-xml; it shows the structure I'm trying to parse.

<project xmlns="http://maven.apache.org/POM/4.0.0"   
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"  
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0   
http://maven.apache.org/xsd/maven-4.0.0.xsd">  

  <modelVersion>4.0.0</modelVersion>  

  <groupId>com.javatpoint.application1</groupId>  
  <artifactId>my-application1</artifactId>  
  <version>1.0</version>  
  <packaging>jar</packaging>  

  <name>Maven Quick Start Archetype</name>  
  <url>http://maven.apache.org</url>  

  <dependencies>  
    <dependency>  
      <groupId>junit</groupId>  
      <artifactId>junit</artifactId>  
      <version>4.8.2</version>  
    </dependency>  
  </dependencies>  
.
.
.
  <dependencies>  
    <dependency>  
      <groupId>abc</groupId>  
      <artifactId>def</artifactId>  
      <version>4.8.3</version>  
    </dependency>  
  </dependencies> 

</project>  

I have tried using minidom and etree.ElementTree, but I'm brand new to all of this and haven't been able to make progress. The function also needs to handle pom.xml files with any number of dependencies, so I think it will need a loop. Based on other Stack Overflow answers, this is what I came up with:

from xml.dom import minidom

dependencyInfo = {}

dom = minidom.parse('pom.xml')
depend = dom.getElementsByTagName("dependency")

for dep in depend:
    info = {}
    info['groupId'] = dep.attributes['groupId'].value
    info['artifactId'] = dep.attributes['artifactId'].value
    info['version'] = dep.attributes['version'].value
    dependencyInfo[] = info

print(dependencyInfo)

Is there a way to make it return a nested dictionary, keyed by groupId, that holds each dependency's info, similar to this?

dependencyInfo = { 'junit': {'artifactId': 'junit', 'version': '4.8.2'},
                'abc': {'artifactId': 'def', 'version': '4.8.3'}}
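The attempt above fails for two reasons: dependencyInfo[] = info is a syntax error, and groupId, artifactId and version are child elements of <dependency>, not attributes, so dep.attributes[...] cannot find them. Below is a minimal sketch that stays with minidom and reads the text of the child elements instead. The helper names child_text and dependency_info are hypothetical, and it assumes each <dependency> has those three elements as direct children.

```python
from xml.dom import minidom


def child_text(parent, tag):
    """Return the stripped text of the first direct <tag> child of parent, or None."""
    for node in parent.childNodes:
        # skip whitespace text nodes; only look at element children
        if node.nodeType == node.ELEMENT_NODE and node.tagName == tag:
            return node.firstChild.data.strip() if node.firstChild else None
    return None


def dependency_info(dom):
    """Build {groupId: {'artifactId': ..., 'version': ...}} from a parsed POM."""
    dependencyInfo = {}
    for dep in dom.getElementsByTagName("dependency"):
        # groupId, artifactId and version are child elements, not attributes
        groupId = child_text(dep, "groupId")
        dependencyInfo[groupId] = {
            "artifactId": child_text(dep, "artifactId"),
            "version": child_text(dep, "version"),
        }
    return dependencyInfo


# usage: print(dependency_info(minidom.parse("pom.xml")))
```

child_text only looks at direct children, because getElementsByTagName searches all descendants and could pick up a groupId nested inside an <exclusions> block.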
Beware that XML parsing only works for very simple POMs. Versions can be inherited from <dependencyManagement>, which can also be part of a parent POM, and they can also be set through properties. - J Fabian Meier
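To illustrate the properties case from the comment, here is a rough sketch that fills ${...} placeholders in dependency versions from the same file's <properties> block (plus project.version) using the standard library's xml.etree.ElementTree. It is an assumption-laden simplification, not how Maven itself resolves versions: it does not read parent POMs, <dependencyManagement>, or nested property references, and resolve_properties is a hypothetical name. It also assumes the usual http://maven.apache.org/POM/4.0.0 namespace.

```python
import re
import xml.etree.ElementTree as ET

# POMs normally declare this default namespace; ElementTree needs it in every path.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}
PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve_properties(root):
    """Map (groupId, artifactId) -> version, filling ${...} from this file's <properties>.

    Simplified: parent POMs and <dependencyManagement> are NOT consulted.
    """
    props = {}
    props_el = root.find("m:properties", NS)
    if props_el is not None:
        for p in props_el:
            # tags look like '{namespace}junit.version'; keep only the local name
            props[p.tag.split("}", 1)[-1]] = (p.text or "").strip()
    own_version = root.findtext("m:version", namespaces=NS)
    if own_version:
        props["project.version"] = own_version.strip()

    def substitute(text):
        if text is None:
            return None  # no <version> element at all (e.g. managed elsewhere)
        # unknown placeholders are left untouched rather than guessed
        return PLACEHOLDER.sub(lambda m: props.get(m.group(1), m.group(0)), text.strip())

    versions = {}
    for dep in root.findall("m:dependencies/m:dependency", NS):
        key = (dep.findtext("m:groupId", namespaces=NS),
               dep.findtext("m:artifactId", namespaces=NS))
        versions[key] = substitute(dep.findtext("m:version", namespaces=NS))
    return versions
```

A version that comes back as None or still contains ${...} is a sign that the real value lives in a parent POM or in <dependencyManagement>, which is where a tool such as Maven itself is the safer source.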

Answer


This can be done with lxml's XPath support and collections.defaultdict:

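pom = """[your xml above]"""

from collections import defaultdict

from lxml import etree

root = etree.fromstring(pom)  # or etree.parse('pom.xml').getroot() if you read it from a file
tree = etree.ElementTree(root)

# local-name() matches <dependency> whatever namespace the POM declares
depend = tree.xpath("//*[local-name()='dependency']")
dependencyInfo = defaultdict(dict)

for dep in depend:
    info = {}
    for child in dep:  # iterate the children directly; getchildren() is deprecated
        if isinstance(child.tag, str):  # skip comments and processing instructions
            # QName(...).localname drops the '{namespace}' prefix from the tag
            info[etree.QName(child).localname] = child.text

    # look fields up by name, not position, so an extra <scope> or a missing <version> can't break it
    groupId = info.pop('groupId', None)
    dependencyInfo[groupId].update(info)

dependencyInfo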

Output:

defaultdict(dict,
        {'junit': {'artifactId': 'junit', 'version': '4.8.2'},
         'abc': {'artifactId': 'def', 'version': '4.8.3'}})
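If lxml isn't available, a similar result can be had with the standard library's xml.etree.ElementTree, which the question also mentions. This is a minimal sketch wrapped in the function the question asks for, assuming the POM declares the usual http://maven.apache.org/POM/4.0.0 default namespace (ElementTree needs it spelled out in every path). dependencies_from_root and parse_pom_file are hypothetical helper names, not part of any library.

```python
import xml.etree.ElementTree as ET

# Maven POMs declare a default namespace; ElementTree requires it in every path.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def dependencies_from_root(root):
    """Map groupId -> {'artifactId': ..., 'version': ...} for <project>/<dependencies>."""
    dependencyInfo = {}
    # 'm:dependencies/m:dependency' skips <dependencyManagement> and plugin dependencies;
    # use './/m:dependency' to match every <dependency>, like the XPath in the answer.
    for dep in root.findall("m:dependencies/m:dependency", NS):
        groupId = dep.findtext("m:groupId", namespaces=NS)
        # a later dependency with the same groupId overwrites an earlier one
        dependencyInfo[groupId] = {
            "artifactId": dep.findtext("m:artifactId", namespaces=NS),
            "version": dep.findtext("m:version", namespaces=NS),
        }
    return dependencyInfo


def parse_pom_file(path):
    """Read a pom.xml from disk and return its dependencies."""
    return dependencies_from_root(ET.parse(path).getroot())
```

Keying by groupId follows the shape asked for in the question, but two dependencies that share a groupId will overwrite each other; keying by the tuple (groupId, artifactId) avoids that.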