2
votes

I'm working with PHP 5.3+, specifically simplexml_load_string(). I've searched for a solution for a few hours with no luck, so any help would be greatly appreciated.

I need to have a systematic way of identifying all the tag names present in an XML file on a certain level.

Example XML:

<?xml version="1.0"?>
<properties>
  <property>
    <ID>243</ID>
    <area>5,000</area>
    <bathrooms>5</bathrooms>
    <bedrooms>4</bedrooms>
    <images>
      <image>http://urltoimage.com/image1.jpg</image>
      <image>http://urltoimage.com/image2.jpg</image>
    </images>
  </property>
  <property>
    <ID>332</ID>
    <garage>2</garage>
    <bathrooms>2</bathrooms>    
    <images>
      <image>http://urltoimage.com/image5.jpg</image>
      <image>http://urltoimage.com/image1.jpg</image>
    </images>
  </property>
</properties>

I need to be able to retrieve an array of:

  • ID
  • area
  • bathrooms
  • bedrooms
  • garage

As you can see, the first 'property' element does not have a 'garage', so the child elements need to be aggregated across the whole XML. I need to identify all the tag names present below the 'property' elements, ideally excluding any elements that have children. I could work around excluding elements that have children ('images' in this example), but it would be nice to have XPath take care of that part as well.

The reason behind this - we're aggregating multiple XML feeds of property data that have different tag variables, and prior to importing we need to have an idea of all the different tag names used in the XML before we pass that data over to the rest of the program.

So, is there an XPath query that can be constructed for this? Performance is a factor, and I'm not sure which PHP function or configuration is optimal, so I'm looking for suggestions.

2
Both solutions worked - thanks guys. I did some benchmarking to see which one is faster, and they are very close. With a small XML file Phil's method was faster (0.008 vs 0.010 seconds). With larger XML files they were virtually identical. - Andy

2 Answers

2
votes

Try something like this

$doc = simplexml_load_string($xml);
$nodes = $doc->xpath('//property/*[not(*)]');
$properties = array();
foreach ($nodes as $node) {
    $properties[$node->getName()] = true;
}
$properties = array_keys($properties);

Within the foreach loop, you could instead check whether the name has already been added, but I figured using array keys as above would be faster.
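For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same XPath approach. The function name leafTagNames() and the trimmed-down sample XML are my own assumptions for illustration, not part of the snippet above; the XPath expression is unchanged.

```php
<?php
// Hypothetical helper: the name leafTagNames() is an assumption for this sketch.
function leafTagNames($xml)
{
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return array(); // malformed XML: nothing to report
    }

    $names = array();
    // Direct children of any <property> that have no element children themselves.
    foreach ($doc->xpath('//property/*[not(*)]') as $node) {
        $names[$node->getName()] = true; // array keys de-duplicate the names
    }
    return array_keys($names);
}

// Shortened version of the sample XML from the question.
$xml = <<<XML
<?xml version="1.0"?>
<properties>
  <property>
    <ID>243</ID>
    <area>5,000</area>
    <images><image>http://urltoimage.com/image1.jpg</image></images>
  </property>
  <property>
    <ID>332</ID>
    <garage>2</garage>
  </property>
</properties>
XML;

print_r(leafTagNames($xml)); // ID, area, garage
```

The [not(*)] predicate is what drops 'images': it keeps only elements that have no child elements.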

1
votes

You'll want to use the SimpleXMLElement::children() function to find the children of a property.

Example:

<?php

$string = <<<END
<?xml version="1.0"?>
<properties>
  <property>
    <ID>243</ID>
    <area>5,000</area>
    <bathrooms>5</bathrooms>
    <bedrooms>4</bedrooms>
    <images>
      <image>http://urltoimage.com/image1.jpg</image>
      <image>http://urltoimage.com/image2.jpg</image>
    </images>
  </property>
  <property>
    <ID>332</ID>
    <garage>2</garage>
    <bathrooms>2</bathrooms>    
    <images>
      <image>http://urltoimage.com/image5.jpg</image>
      <image>http://urltoimage.com/image1.jpg</image>
    </images>    
  </property>
</properties>
END;

// Load the XML using the SimpleXML class.
$xml = simplexml_load_string($string);

// Loop through all of the properties.
foreach ( $xml->property as $property )
{
  // Reset the property tags array for this property.
  $property_tags = array();

  foreach ( $property->children() as $children )
  {
    // If a tag was found, add it to the array.
    if ( ! empty($children[0]) )
      $property_tags[] = $children[0]->getName();
  }

  // Output the list to the screen (this could be removed).
  print_r($property_tags);
}

Output:

Array
(
    [0] => ID
    [1] => area
    [2] => bathrooms
    [3] => bedrooms
    [4] => images
)
Array
(
    [0] => ID
    [1] => garage
    [2] => bathrooms
    [3] => images
)

If you'd rather get a single list of all available tags (across every property contained in the XML document), simply do this:

// Start with an empty list so tags from earlier code don't carry over.
$property_tags = array();

// Loop through all of the properties.
foreach ( $xml->property as $property )
{
  foreach ( $property->children() as $children )
  {
    // If a tag was found, add it to the array if it's not already in it.
    if ( ! empty($children[0]) && ! in_array($children[0]->getName(), $property_tags) )
      $property_tags[] = $children[0]->getName();
  }
}

// Output the list to the screen (this could be removed).
print_r($property_tags);

Output:

Array
(
    [0] => ID
    [1] => area
    [2] => bathrooms
    [3] => bedrooms
    [4] => images
    [5] => garage
)
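The lists above still contain 'images', which the question wanted to leave out. As a rough sketch of one way to handle that without XPath (the $leaf_tags variable and the shortened XML are assumptions for illustration), you could skip any child that itself has child elements:

```php
<?php
// Shortened sample data; the structure mirrors the XML above.
$xml = simplexml_load_string(
    '<properties>'
    . '<property><ID>243</ID><area>5,000</area>'
    . '<images><image>a.jpg</image></images></property>'
    . '<property><ID>332</ID><garage>2</garage></property>'
    . '</properties>'
);

$leaf_tags = array();
foreach ($xml->property as $property) {
    foreach ($property->children() as $child) {
        // Skip container elements such as <images> that have element children.
        if (count($child->children()) > 0) {
            continue;
        }
        // Using the tag name as the key avoids an in_array() scan per element.
        $leaf_tags[$child->getName()] = true;
    }
}
$leaf_tags = array_keys($leaf_tags);

print_r($leaf_tags); // ID, area, garage
```

Note that this keeps leaf elements even when they are empty, whereas the empty() check in the loops above would skip them.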