I'm finding the usage of xml-> extremely confusing. I've read the docs and the examples, but I can't figure out how to get the nested nodes of an XML document.

Assume the following xml is in a zipper (as from xml-zip):

<html>
 <body>
  <div class='one'>
    <div class='two'></div>
  </div>
 </body>
</html>

I am trying to return the div with class='two'.

I was expecting this to work:

(xml-> z :html :body :div :div)

Or this:

(xml-> z :html :body :div (attr= :class "two"))

Kind of like css selectors.

But it returns only the first level, and it doesn't search down through the tree.

The only way I can make it work is:

(xml-> z :html :body :div children leftmost?)

Is that what I'm supposed to do?

The whole reason I started using xml-> was for convenience, to avoid navigating the zipper up and down and left and right. If xml-> cannot get nested nodes, then I don't see its value over plain clojure.zip.

Thanks.

2 Answers

1
votes

Two consecutive :div predicates match the same node, so you need to step down a level (zip/down) before the second test. I also believe you've forgotten to extract the node with zip/node.

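(ns reagenttest.sample
  (:require
    [clojure.xml :as xml]                 ; provides xml/parse, used below
    [clojure.zip :as zip]
    [clojure.data.zip.xml :as data-zip]))

(let [s   "..."
      doc (xml/parse (java.io.ByteArrayInputStream. (.getBytes s)))]
  ;; zip/down steps from the outer div to its first child;
  ;; attr= keeps it only if its class is "two"; zip/node returns the node itself
  (prn (data-zip/xml-> (zip/xml-zip doc)
                       :html :body :div zip/down (data-zip/attr= :class "two") zip/node)))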

Or, if you are not happy with xml->, you could write your own helper on top of it:

(defn xml->find [loc & path]
  ;; put zip/down between the path steps and finish with zip/node
  (let [new-path (conj (vec (butlast (interleave path (repeat zip/down)))) zip/node)]
    (apply data-zip/xml-> loc new-path)))

Now you can do this:

(xml->find z :html :body :div :div)
(xml->find z :html :body :div (data-zip/attr= :class "two"))
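
If what you really want is a search through the whole tree rather than a fixed path, a small walk with plain clojure.zip may be enough. The following is only a sketch, not part of xml-> or data.zip: find-nodes and div-with-class? are hypothetical helper names, and the sample string is the document from the question with the whitespace removed.

```clojure
(require '[clojure.xml :as xml]
         '[clojure.zip :as zip])

;; The document from the question, without whitespace between tags.
(def sample-xml
  "<html><body><div class='one'><div class='two'></div></div></body></html>")

(defn parse-str
  "Parses an XML string with clojure.xml."
  [s]
  (xml/parse (java.io.ByteArrayInputStream. (.getBytes s "UTF-8"))))

(defn find-nodes
  "Hypothetical helper: walks the zipper depth-first from loc and
  returns every node for which pred returns true."
  [loc pred]
  (->> (iterate zip/next loc)
       (take-while (complement zip/end?))
       (map zip/node)
       (filter pred)))

(defn div-with-class?
  "Hypothetical helper: builds a predicate matching <div> elements
  whose class attribute equals cls. Text nodes (strings) never match."
  [cls]
  (fn [node]
    (and (map? node)
         (= :div (:tag node))
         (= cls (get-in node [:attrs :class])))))

(def z (zip/xml-zip (parse-str sample-xml)))

;; All divs with class "two", wherever they sit in the tree.
(def matches (find-nodes z (div-with-class? "two")))
```

This trades the path syntax of xml-> for a single predicate, so it does not care how deep the div is nested. It returns nodes rather than locations; drop the (map zip/node) step and apply pred to (zip/node loc) instead if you need to keep navigating from the match.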
0
votes

You can solve this problem using tupelo.forest from the Tupelo library. The forest contains functions for searching and manipulating trees of data. It is like Enlive on steroids. Here is a solution for your data:

(dotest
  (with-forest (new-forest)
    (let [xml-str         "<html>
                             <body>
                               <div class='one'>
                                 <div class='two'></div>
                               </div>
                             </body>
                           </html>"

          enlive-tree     (->> xml-str
                            java.io.StringReader.
                            en-html/xml-resource
                            only)
          root-hid        (add-tree-enlive enlive-tree)

          ; Removing whitespace nodes is optional; just done to keep things neat
          blank-leaf-hid? (fn [hid] (ts/whitespace? (hid->value hid))) ; whitespace pred fn
          blank-leaf-hids (keep-if blank-leaf-hid? (all-leaf-hids)) ; find whitespace nodes
          >>              (apply remove-hid blank-leaf-hids) ; delete whitespace nodes found

          ; Can search for inner `div` 2 ways
          result-1        (find-paths root-hid [:html :body :div :div]) ; explicit path from root
          result-2        (find-paths root-hid [:** {:class "two"}]) ; wildcard path that ends in :class "two"
    ]
      (is= result-1 result-2) ; both searches return the same path
      (is= (hid->bush root-hid)
        [{:tag :html}
         [{:tag :body}
          [{:class "one", :tag :div}
           [{:class "two", :tag :div}]]]])
      (is=
        (format-paths result-1)
        (format-paths result-2)
        [[{:tag :html}
          [{:tag :body}
           [{:class "one", :tag :div}
            [{:class "two", :tag :div}]]]]])

      (is (val= (hid->elem (last (only result-1)))
            {:attrs {:class "two", :tag :div}, :kids []})))))

There are many examples in the unit tests and in the forest-examples demo file.