1
votes

I have to remove a specific tag from Apache Tomcat web.xml files.

web.xml

    <?xml version="1.0" encoding="ISO-8859-1"?>



<web-app xmlns="http://java.sun.com/xml/ns/javaee"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://java.sun.com/xml/ns/javaee
                      http://java.sun.com/xml/ns/javaee/web-app_3_0.xsd"
  version="3.0">

  <!-- ======================== Introduction ============================== -->
  <!-- This document defines default values for *all* web applications      -->
  <!-- loaded into this instance of Tomcat.  As each application is         -->
  <!-- deployed, this file is processed, followed by the                    -->
  <!-- "/WEB-INF/web.xml" deployment descriptor from your own               -->
  <!-- applications.                                                        -->
  <!--                                                                      -->
  <!-- WARNING:  Do not configure application-specific resources here!      -->
  <!-- They should go in the "/WEB-INF/web.xml" file in your application.   -->

     <servlet>
        <servlet-name>default</servlet-name>
        <servlet-class>org.apache.catalina.servlets.DefaultServlet</servlet-class>
        <init-param>
            <param-name>debug</param-name>
            <param-value>0</param-value>
        </init-param>
        <init-param>
            <param-name>listings</param-name>
            <param-value>false</param-value>
        </init-param>
        <load-on-startup>1</load-on-startup>
    </servlet>
   <servlet>
        <servlet-name>jsp</servlet-name>
        <servlet-class>org.apache.jasper.servlet.JspServlet</servlet-class>
        <init-param>
            <param-name>fork</param-name>
            <param-value>false</param-value>
        </init-param>
        <init-param>
            <param-name>xpoweredBy</param-name>
            <param-value>false</param-value>
        </init-param>
        <load-on-startup>3</load-on-startup>
    </servlet>

    <servlet>
        <servlet-name>cgi</servlet-name>
        <servlet-class>org.apache.catalina.servlets.CGIServlet</servlet-class>
        <init-param>
          <param-name>debug</param-name>
          <param-value>0</param-value>
        </init-param>
        <init-param>
          <param-name>cgiPathPrefix</param-name>
          <param-value>WEB-INF/cgi</param-value>
        </init-param>
         <load-on-startup>5</load-on-startup>
    </servlet>
</web-app>

If servlet-name is cgi, I need to remove the entire servlet element. My code is as follows:

    from xml.etree.ElementTree import ElementTree
    tree = ElementTree()
    tree.parse('web.xml')
    servlets = tree.findall('servlet')
    print "servlets : ",servlets
    for servlet in servlets:
      servlet_names = foo.findall('servlet-name')
      for servlet_name  in servlet_names:
            if servlet_name == "cgi" :
                    print "servlet_name :", servlet_name
                    servlet.remove(servlet-name)

The output is servlets : [] instead of the list of all servlets, so the for loop is never entered. Can anyone help me?

I am not getting any exception.

#!/usr/bin/python
from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()
ns = '{http://java.sun.com/xml/ns/javaee}'
servlets = root.findall(ns + 'servlet')
print "servlets : ",servlets
for servlet in servlets:
  servlet_names = servlet.findall(ns + 'servlet-name')
  for servlet_name  in servlet_names:
        if servlet_name.text == "cgi" :
                print "servlet_name :", servlet_name.text
                print "removed the cgi serverlet", root.remove(servlet)

=====output===============
servlets : [<Element {http://java.sun.com/xml/ns/javaee}servlet at 7f84e09b35a8>, <Element {http://java.sun.com/xml/ns/javaee}servlet at 7f84e09b3878>, <Element {http://java.sun.com/xml/ns/javaee}servlet at 7f84e09b3bd8>]
servlet_name : cgi
removed the cgi serverlet None

==== I used the pdb tracer to inspect the value of the element (servlet); it shows as \n..

> /apps/manu/python/manunamespace.py(10)<module>()
-> servlet_name=servlet.find('{http://java.sun.com/xml/ns/javaee}servlet-name')
(Pdb) servlet_name
<Element {http://java.sun.com/xml/ns/javaee}servlet-name at 882878>
(Pdb) servlet_name.text
'jsp'
(Pdb) n
> /apps/manu/python/manunamespace.py(11)<module>()
-> print "servlet_name:", servlet_name.text
(Pdb) servlet_name.text
'cgi'
(Pdb) servlet.text
'\n        '
(Pdb) n
servlet_name: cgi
> /apps/manu/python/manunamespace.py(12)<module>()
-> if servlet_name.text == "cgi":
(Pdb) n
> /apps/manu/python/manunamespace.py(13)<module>()
-> print "remove the element"
(Pdb) n
remove the element
> /apps/manu/python/manunamespace.py(14)<module>()
-> print "remove : ",root.remove(servlet)
(Pdb) servlet
<Element {http://java.sun.com/xml/ns/javaee}servlet at 882d88>
(Pdb) servlet.text
'\n 

   '
2
You're just not printing the changed tree, only the result of root.remove, which is None. The code works; you just need to do something with the tree after modifying it. To print it you can use tree.write(sys.stdout), or tree.write(open('filename', 'w')) to write it to a file. - mata
Got it, and it's working now, but it is unable to write the namespace and it excludes all comments when I write to the file. - Manohar Reddy

2 Answers

1
votes

This is failing:

servlets = tree.findall('servlet')

Because there are no un-namespaced servlet elements in your document. The root element specifies:

xmlns="http://java.sun.com/xml/ns/javaee"

Which means that all elements, unless otherwise specified, are in this XML namespace. So you want:

>>> tree.findall('{http://java.sun.com/xml/ns/javaee}servlet')
[<Element {http://java.sun.com/xml/ns/javaee}servlet at 0x7f280ec681b8>,
<Element {http://java.sun.com/xml/ns/javaee}servlet at 0x7f280ec68200>, 
<Element {http://java.sun.com/xml/ns/javaee}servlet at 0x7f280ec682d8>]
>>> 
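
As a minimal, self-contained sketch of the same point: the inline document below is a cut-down stand-in for web.xml, and the code uses Python 3 print syntax (both are assumptions for illustration). The bare name finds nothing, while the namespace-qualified name finds the elements.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for Tomcat's web.xml (hypothetical sample data).
XML = """<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="3.0">
  <servlet><servlet-name>default</servlet-name></servlet>
  <servlet><servlet-name>cgi</servlet-name></servlet>
</web-app>"""

root = ET.fromstring(XML)
NS = "{http://java.sun.com/xml/ns/javaee}"

# A bare tag name is in no namespace, so nothing matches.
plain = root.findall("servlet")

# ElementTree stores each tag as "{namespace-uri}localname".
qualified = root.findall(NS + "servlet")

print(len(plain), len(qualified))
print(root.tag)
```

Printing root.tag is a quick way to see which namespace, if any, a document's elements live in.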
1
votes

You are not finding the tags you are searching for because they are in the default namespace (http://java.sun.com/xml/ns/javaee).

Also, if you want to test an element's content, you need to use its text attribute rather than compare against the element itself. If it matches, you need to remove the servlet element from the root, not the servlet-name element from the servlet.

Try this:

from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()
nsmap = {'jee': 'http://java.sun.com/xml/ns/javaee'}
servlets = root.findall('jee:servlet', nsmap)
print "servlets : ",servlets
for servlet in servlets:
  servlet_names = servlet.findall('jee:servlet-name', nsmap)
  for servlet_name  in servlet_names:
        if servlet_name.text == "cgi" :
                print "servlet_name :", servlet_name.text
                root.remove(servlet)

Or, more concisely, using the XPath subset that ElementTree supports:

from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()
nsmap = {'jee': 'http://java.sun.com/xml/ns/javaee'}
for servlet in root.findall("./jee:servlet[jee:servlet-name='cgi']", nsmap):
    root.remove(servlet)
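
To check that the predicate form removes only the cgi servlet, it can be run against a small inline document. This is a sketch in Python 3 syntax; the sample XML is a hypothetical, trimmed version of the file in the question, and the name remaining is introduced here only for the check.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical version of the web.xml from the question.
XML = """<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="3.0">
  <servlet><servlet-name>default</servlet-name></servlet>
  <servlet><servlet-name>jsp</servlet-name></servlet>
  <servlet><servlet-name>cgi</servlet-name></servlet>
</web-app>"""

root = ET.fromstring(XML)
nsmap = {"jee": "http://java.sun.com/xml/ns/javaee"}

# findall returns a list, so removing from root inside the loop is safe.
for servlet in root.findall("./jee:servlet[jee:servlet-name='cgi']", nsmap):
    root.remove(servlet)

# Collect the names of the servlets that are still in the tree.
remaining = [e.text for e in root.findall("jee:servlet/jee:servlet-name", nsmap)]
print(remaining)
```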

Edit: For older Python versions (tested with Python 2.5):

from xml.etree.ElementTree import ElementTree
tree = ElementTree()
tree.parse('web.xml')
root = tree.getroot()
ns = '{http://java.sun.com/xml/ns/javaee}'
servlets = root.findall(ns + 'servlet')
print "servlets : ",servlets
for servlet in servlets:
  servlet_names = servlet.findall(ns + 'servlet-name')
  for servlet_name  in servlet_names:
        if servlet_name.text == "cgi" :
                print "servlet_name :", servlet_name.text
                root.remove(servlet)
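
Regarding the follow-up comment on the question about the namespace and the comments being lost on write: the sketch below shows one possible approach, not something from the original answers. It assumes Python 3.8 or newer, because it relies on TreeBuilder(insert_comments=True), and it uses register_namespace with an empty prefix so the URI is written as the default namespace. The inline XML is hypothetical sample data.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample with one comment inside the root element.
XML = """<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="3.0">
  <!-- default servlet -->
  <servlet><servlet-name>default</servlet-name></servlet>
  <servlet><servlet-name>cgi</servlet-name></servlet>
</web-app>"""

URI = "http://java.sun.com/xml/ns/javaee"

# Serialize this URI as the default namespace rather than as an "ns0:" prefix.
ET.register_namespace("", URI)

# insert_comments=True (Python 3.8+) keeps comments in the parsed tree.
parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
parser.feed(XML)
root = parser.close()

# Remove the servlet whose servlet-name is "cgi".
for servlet in root.findall("{%s}servlet" % URI):
    if servlet.findtext("{%s}servlet-name" % URI) == "cgi":
        root.remove(servlet)

out = ET.tostring(root, encoding="unicode")
print(out)
```

Comments that sit outside the root element are still not attached to the tree with this approach, so a file that has comments before the opening web-app tag would lose those.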