
I am trying to parse different links from an XML file. I have read the docs and every post I could find about parsing XML files, but I haven't found a way to access the nodes the way I want. For example:

<link rel="self" type="text/html" title="title0" length="8359" href="http://example0.com"/>
<link rel="alternate" type="text/html" title="title1" length="8359" href="http://example3.com"/>
<link rel="related" type="text/html" title="title2" length="8359" href="http://example4.com"/>
<link rel="related" type="text/html" title="title3" length="8359" href="http://example4.com"/>
<link rel="related" type="text/html" title="title4" length="8359" href="http://example5.com"/>
<link rel="related" type="text/html" title="title5" length="8359" href="http://example5.com"/>

How can I access:

  1. The href of the link that has a rel="self" (return String).
  2. The href of the link that has a rel="alternate" (return String).
  3. The hrefs of the links that have a rel="related" (return Array).

Using SimpleXML:

$xml=simplexml_load_file('url_to_xml') or die('Error: Cannot create object');

...


6 Answers

4
votes

You generally want to use XPath (or something like it) to query XML documents, and SimpleXML supports it through its xpath() method. Example:

<?php
$string = <<<XML
<div>
  <link rel="self" type="text/html" title="title0" length="8359" href="http://example0.com"/>
  <link rel="alternate" type="text/html" title="title1" length="8359" href="http://example3.com"/>
  <link rel="related" type="text/html" title="title2" length="8359" href="http://example4.com"/>
  <link rel="related" type="text/html" title="title3" length="8359" href="http://example4.com"/>
  <link rel="related" type="text/html" title="title4" length="8359" href="http://example5.com"/>
  <link rel="related" type="text/html" title="title5" length="8359" href="http://example5.com"/>
</div>
XML;
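$xml = new SimpleXMLElement($string);
foreach(['self', 'alternate', 'related', 'dne'] as $rel) {
  // Select the href attribute node of every <link> with this rel value
  $val = $xml->xpath("//link[@rel='$rel']/@href");
  // xpath() returns false/null on error; cast each attribute node to a string
  $val = $val ? array_map(function($n) { return (string)$n; }, $val) : [];
  // A single match becomes a string; zero or several matches stay an array
  $val = count($val) == 1 ? $val[0] : $val;
  var_dump($val);
}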
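If you would rather always get an array back (so that a document with only one related link doesn't silently give you a string instead), the same XPath query can be wrapped in a small helper. This is a minimal sketch: the function name hrefsByRel is hypothetical, and it assumes the rel values contain no double quotes, since they are inserted straight into the XPath expression.

```php
<?php
// Hypothetical helper: returns the href of every <link> whose rel matches,
// always as an array of plain strings (possibly empty).
// Assumes $rel contains no double quotes, because it is inserted into the XPath.
function hrefsByRel(SimpleXMLElement $xml, string $rel): array
{
    $nodes = $xml->xpath('//link[@rel="' . $rel . '"]/@href');
    if (!is_array($nodes)) {
        return []; // xpath() returns null or false on error
    }
    // Each result is a SimpleXMLElement for the href attribute; cast it to string.
    return array_map(function ($node) {
        return (string) $node;
    }, $nodes);
}

$string = <<<XML
<div>
  <link rel="self" href="http://example0.com"/>
  <link rel="alternate" href="http://example3.com"/>
  <link rel="related" href="http://example4.com"/>
  <link rel="related" href="http://example5.com"/>
</div>
XML;
$xml = new SimpleXMLElement($string);

// Single-valued rels: take the first entry (or null); multi-valued: keep the array.
$self = hrefsByRel($xml, 'self')[0] ?? null;
$related = hrefsByRel($xml, 'related');
```

The caller then decides whether it wants one value or a list, instead of the helper guessing from the number of matches.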
1
votes

If you're not comfortable with XPath, you can access the link elements much as you would an object's properties:

    <?php
    $string = <<<XML
    <div>
      <link rel="self" type="text/html" title="title0" length="8359" href="http://example0.com"/>
      <link rel="alternate" type="text/html" title="title1" length="8359" href="http://example3.com"/>
      <link rel="related" type="text/html" title="title2" length="8359" href="http://example4.com"/>
      <link rel="related" type="text/html" title="title3" length="8359" href="http://example4.com"/>
      <link rel="related" type="text/html" title="title4" length="8359" href="http://example5.com"/>
      <link rel="related" type="text/html" title="title5" length="8359" href="http://example5.com"/>
    </div>
    XML;

    $xml = new SimpleXMLElement($string);

    $self = null;
    $alternate = null;
    $related = [];

    foreach($xml->link as $link) {

        // Cast to string so we compare and store plain values,
        // not SimpleXMLElement objects
        switch((string)$link['rel']){
            case 'self':
                $self = (string)$link['href'];
                break;
            case 'alternate':
                $alternate = (string)$link['href'];
                break;
            case 'related':
                $related[] = (string)$link['href'];
                break;
        }

    }

    print $self . "\n";
    // outputs : http://example0.com

    print $alternate . "\n";
    // outputs : http://example3.com

    print_r($related);
    /* outputs : Array
    (
        [0] => http://example4.com
        [1] => http://example4.com
        [2] => http://example5.com
        [3] => http://example5.com
    )
    */

If you don't like the switch statement, you can use if/elseif conditions instead:

foreach($xml->link as $link) {
    $rel = (string)$link['rel'];
    if($rel == 'self'){
        $self = (string)$link['href'];
    } elseif($rel == 'alternate'){
        $alternate = (string)$link['href'];
    } elseif($rel == 'related'){
        $related[] = (string)$link['href'];
    }
}
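Either loop can also be collapsed into a single pass that groups every href by its rel value, so no branch per rel type is needed. A minimal sketch; the $byRel name is my own, and it assumes the <link> elements are direct children of the element $xml points at, as in the examples above.

```php
<?php
$string = <<<XML
<div>
  <link rel="self" href="http://example0.com"/>
  <link rel="alternate" href="http://example3.com"/>
  <link rel="related" href="http://example4.com"/>
  <link rel="related" href="http://example5.com"/>
</div>
XML;
$xml = new SimpleXMLElement($string);

// Map each rel value to the list of hrefs that carry it, as plain strings.
$byRel = [];
foreach ($xml->link as $link) {
    $byRel[(string) $link['rel']][] = (string) $link['href'];
}

// Single-valued rels: take the first entry, or null if that rel is absent.
$self      = $byRel['self'][0] ?? null;
$alternate = $byRel['alternate'][0] ?? null;
// Multi-valued rel: the whole list, or an empty array if absent.
$related   = $byRel['related'] ?? [];
```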
1
votes

The problem can be stated in general as "how to access an XML element's attributes based on the value of one of its other attributes". There are two basic approaches: iterate over all candidate elements, and check the attribute value; or use XPath to search the document.

Once you've found the matching elements, you need to access the attribute, which in SimpleXML means knowing two pieces of syntax:

  • $something['bar'] goes from an object representing an element (e.g. <foo>) to an object representing one of its attributes (e.g. bar="...")
  • (string)$something casts a variable to a string, which for SimpleXML gives you the full string content of an element or attribute
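A quick way to see the difference between those two pieces of syntax: the attribute lookup on its own still gives you a SimpleXMLElement, and only the cast turns it into a PHP string. A minimal sketch using one <link> element from the question:

```php
<?php
$link = new SimpleXMLElement('<link rel="self" href="http://example0.com"/>');

$attr = $link['href'];          // SimpleXMLElement representing the href="..." attribute
$href = (string) $link['href']; // plain PHP string with the attribute's value

var_dump($attr instanceof SimpleXMLElement); // bool(true)
var_dump(is_string($href));                  // bool(true)
var_dump($href);                             // string(19) "http://example0.com"
```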

Using iteration is simple with SimpleXML, because you can just use foreach and if in what should be a fairly intuitive way. Assuming $xml is already pointing at the parent element of the <link> elements:

foreach ( $xml->link as $link ) {
    if ( $link['rel'] == 'self' ) {
        // Found <link rel="self">
        // assign to variable, return from function, etc
        // To access the attribute, we use $link['href']
        // To get the text content of the selected node,
        //   we cast to string with (string)$link['href']
        $self_link = (string)$link['href'];
    }
}
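Since the comments mention returning from a function, here is that pattern wrapped up. A minimal sketch of a helper (the name firstHrefWithRel is hypothetical) that walks the <link> children and returns the first matching href as a string, or null when there is none; it assumes, as above, that $parent is the element containing the <link> elements.

```php
<?php
// Hypothetical helper: first href among $parent's <link> children whose rel equals $rel.
function firstHrefWithRel(SimpleXMLElement $parent, string $rel): ?string
{
    foreach ($parent->link as $link) {
        // Cast the attribute to string so the comparison is strict and explicit.
        if ((string) $link['rel'] === $rel) {
            return (string) $link['href'];
        }
    }
    return null; // no <link> with that rel
}

$xml = new SimpleXMLElement(
    '<div><link rel="self" href="http://example0.com"/>'
    . '<link rel="alternate" href="http://example3.com"/></div>'
);
```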

Using XPath allows you to search the whole document for elements with a particular name and attribute value using a compact expression:

  • //foo searches for all elements named <foo>, anywhere in the document
  • [bar] means "which has a child element named bar"
  • [@bar] means instead "which has an attribute named bar", which is what we want
  • [@bar="baz"] means the value of the "bar" attribute must be "baz"

So in our case, the expression is //link[@rel="self"].
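To see how each predicate narrows the match, the sketch below runs those expressions against a shortened copy of the question's markup and counts the results. Note that //link[rel] matches nothing here, because rel is an attribute, not a child element.

```php
<?php
$string = <<<XML
<div>
  <link rel="self" href="http://example0.com"/>
  <link rel="alternate" href="http://example3.com"/>
  <link rel="related" href="http://example4.com"/>
  <link rel="related" href="http://example5.com"/>
</div>
XML;
$xml = new SimpleXMLElement($string);

$expressions = ['//link', '//link[rel]', '//link[@rel]', '//link[@rel="self"]', '//link[@rel="related"]'];

// Count how many nodes each expression matches (treat an error result as no matches).
$counts = [];
foreach ($expressions as $expr) {
    $counts[$expr] = count($xml->xpath($expr) ?: []);
}
print_r($counts);
```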

In SimpleXML, you can call ->xpath() on any node and get back an array of zero or more SimpleXMLElement objects. You will then want to loop through these, extracting the appropriate value:

$xpath_results = $xml->xpath('//link[@rel="self"]');
foreach ( $xpath_results as $node ) {
    // Again, we have a SimpleXMLElement object, and want
    //    the string content of the 'href' attribute:
    $self_link = (string)$node['href'];
}
0
votes

You can use an if or switch statement with PHP's DOM extension, for example:

$dom = new DOMDocument();
$dom->load('url_to_xml');

foreach($dom->getElementsByTagName('link') as $tag) {
   switch($tag->getAttribute('rel')) {
      case 'self':
         $href_of_self = $tag->getAttribute('href');
         break;
      case 'related':
         // ...
         break;
   }
}

Getting elements by tag name and reading an element's attribute are done with these DOM methods:
http://php.net/manual/en/domdocument.getelementsbytagname.php
http://php.net/manual/en/domelement.getattribute.php
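For reference, a complete, self-contained version of that switch loop that also handles 'alternate' and collects every 'related' href into an array. getElementsByTagName() and getAttribute() belong to PHP's DOM extension, so this needs a DOMDocument rather than the SimpleXMLElement from the question. A minimal sketch that loads the markup from a string with loadXML() (load() takes a file or URL instead); the variable names are illustrative.

```php
<?php
$string = <<<XML
<div>
  <link rel="self" href="http://example0.com"/>
  <link rel="alternate" href="http://example3.com"/>
  <link rel="related" href="http://example4.com"/>
  <link rel="related" href="http://example5.com"/>
</div>
XML;

$dom = new DOMDocument();
$dom->loadXML($string); // or $dom->load('url_to_xml') for a file/URL

$self = null;
$alternate = null;
$related = [];

// getAttribute() already returns a plain string, so no cast is needed.
foreach ($dom->getElementsByTagName('link') as $tag) {
    switch ($tag->getAttribute('rel')) {
        case 'self':
            $self = $tag->getAttribute('href');
            break;
        case 'alternate':
            $alternate = $tag->getAttribute('href');
            break;
        case 'related':
            $related[] = $tag->getAttribute('href');
            break;
    }
}
```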

0
votes

You could use sabre/xml (http://sabre.io/xml), which describes itself as "An XML library for PHP you may not hate". Pay particular attention to the parseCurrentElement() function in https://github.com/fruux/sabre-xml/blob/master/lib/Reader.php

You can create your own custom reader and service by extending its classes:

class CustomXmlReader extends \Sabre\Xml\Reader {}
class CustomXmlService extends \Sabre\Xml\Service {}
-3
votes

If you are working with large files, it may be worth splitting the file into lines and processing each line with preg_match. This works best when your XML files have a consistent structure, for example one <link> element per line.
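A minimal sketch of that approach, reading a stream one line at a time with fgets() so the whole file never has to be held in memory. It assumes each <link> element sits on a single line and uses double-quoted attributes; the function name hrefsByRelFromStream is hypothetical, and anything that breaks those assumptions (multi-line tags, single quotes, entities such as &amp;) will be missed or returned raw.

```php
<?php
// Hypothetical helper: collect hrefs grouped by rel from an open stream,
// assuming one <link .../> per line with double-quoted attribute values.
function hrefsByRelFromStream($handle): array
{
    $byRel = [];
    while (($line = fgets($handle)) !== false) {
        // Skip lines that contain no <link> tag.
        if (!preg_match('/<link\b[^>]*>/', $line, $tag)) {
            continue;
        }
        // Extract rel and href separately so attribute order does not matter.
        if (preg_match('/\brel="([^"]*)"/', $tag[0], $rel)
            && preg_match('/\bhref="([^"]*)"/', $tag[0], $href)) {
            $byRel[$rel[1]][] = $href[1];
        }
    }
    return $byRel;
}
```

Usage would be along the lines of opening the file with fopen('url_to_xml', 'r'), passing the handle in, and closing it afterwards with fclose().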