0
votes

I am currently trying to feed some XML documents into a script and initialize them as [PSCustomObject]. The XML document needs to be broken into several objects and then added together.

Here is my script:

[xml]$CourseStructureIn = Get-Content .\Sample.xml

$data = foreach ($Instances in $CourseStructureIn.Node.instances.courseInstance) {
    $instancearray = foreach ($instance in $instances) {
        $hash = [ordered]@{ CourseInstanceID = $instance.courseInstanceID}

        [PSCustomObject]@{
            CourseCode = $instance.CourseCode
            InstanceCode = $instance.instanceCode
            session = $instance.session
            quota = $instance.quota
        }
    }
    $hash.Add("Instances", $instancearray)

    $modules = $instance.L1Modules.L1Module
    $modulearray = foreach ($module in $modules) {
        [PSCustomObject]@{
            moduleCode = $module.moduleCode 
            moduleTypeCode = $module.moduleTypeCode
            moduleInstanceID = $module.moduleInstanceID
            semester = $module.semester
            credits = $module.credits
            overallGradeWeighting = $module.overallGradeWeighting
            fees = $module.fees
            documents = $module.documents
        }
    }
    $hash.Add("Modules", $modulearray)

    $roles = $instance.L1Modules.L1Module.roles.role
    $rolearray = foreach ($role in $roles) {
        [PSCustomObject]@{
            rolesGUID = $role.GUID
            rolesIDNumber = $role.idnumber
            roleFirstName = $role.firstname
            roleSurname = $role.surname
        }
    }
    $hash.Add("Roles", $rolearray)

This correctly imports the XML structure into 2 distinct instances of an array of objects - I should mention that the XML is originally from a normailzed database and so each XML document represents more or less a table - which ends up being a multidimensional array in PowerShell.

$data.GetType()

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     Object[]                                 System.Array

I can refer to the individual array objects

$data[0].roles | ft

rolesGUID rolesIDNumber roleFirstName  roleSurname
--------- ------------- -------------  -----------
55001420  55001420      R              M
55001414  55001414      S              C
55001234  55001234      C              H
55001342  55001342      O              C
55001414  55001414      S              C
55001342  55001342      O              C
55001445  55001445      M              M
55001422  55001422      A              H
55001001  55001001      P              M
55001079  55001079      V              S
55000770  55000770      A              M
55000906  55000906      M              B

I want to be able to ConverTo-Html to make a report - but I have no idea how to enumerate this type of structure - what I should be left with is a table with one to many relationships (Is that a multidimensional array or jagged array?) Can someone give me some pointers about how to output these types of structures? Iterating through an array or object is fine when the output is some type of matrix - but what do we use when the structure is multiple rows for some columns and single rows for others.

For example my output with Format-Table:

$data | ft

CourseInstanceID Instances                                                                                                                                                      Modules
---------------- ---------                                                                                                                                                      -------
PGDDA_353650876  @{CourseCode=PGDDA; InstanceCode=PGDSP; session=2014; quota=999; Instances=; CourseInstanceID=PGDDA_353650876; Modules=System.Object[]; Roles=System.Object[]} {@{moduleCode=H...
PGDDA_418403503  @{CourseCode=PGDDA; InstanceCode=PGDSP; session=2015; quota=999; Instances=; CourseInstanceID=PGDDA_418403503; Modules=System.Object[]; Roles=System.Object[]} {@{moduleCode=H...

I have tried expanding the properties and have been reading all over the web, so any pointers would be greatly appreciated.

Here are the members:

    $data | gm

       TypeName: System.Management.Automation.PSCustomObject

    Name             MemberType   Definition
    ----             ----------   ----------
    Equals           Method       bool Equals(System.Object obj)
    GetHashCode      Method       int GetHashCode()
    GetType          Method       type GetType()
    ToString         Method       string ToString()
    CourseInstanceID NoteProperty string CourseInstanceID=PGDDA_353650876
    Instances        NoteProperty Selected.System.Management.Automation.PSCustomObject Instances=@{CourseCode=PGDDA; InstanceCode=PGDSP; session=2014; quota=999; Instances=; CourseInstanceID=PGDD...
    Modules          NoteProperty Object[] Modules=System.Object[]
    Roles            NoteProperty Object[] Roles=System.Object[]
    

Thank you - I cannot share the XML document here but I will digress from my specific example to a general one.

Suppose we have many of the following XML documents

<catalog>
   <book id="bk101">
      <author>Gambardella, Matthew</author>
      <author>Tiny Tim</author>
      <title>XML Developer's Guide</title>
      <genre>Computer</genre>
      <price>$44.95</price>
      <price>€40.50</price>
      <price>£35.99</price>
      <publish_date>2000-10-01</publish_date>
      <description>An in-depth look at creating applications with XML.</description>
      <TableOfContents>
      <chapter Title = "Introduction" Number = "1" Page = "1"></chapter>
      <chapter Title = "XSD" Number = "2" Page = "14"></chapter>
      <chapter Title = "XPATH" Number = "3" Page = "27"></chapter>
      <chapter Title = "XQUERY" Number = "4" Page = "42"></chapter>
      <chapter Title = "XHTML" Number = "5" Page = "58"></chapter>
      <chapter Title = "XSLT" Number = "6" Page = "75"></chapter>
      </TableOfContents>
   </book> 
</catalog>

I would like to convert this into a table such as this format (image sorry)

sample table

I should also mention that there could be many book nodes in an document so I would like a table for each book.

1
Please update your answer with sample data, and an example of what your desired output would be.TheMadTechnician
Thanks, I have added a more general example to my questionSum1sAdmin
Can you use my answer here to accomplish what you need? It will need some changes to work with your data, but it's the same basic concept.TheMadTechnician
@TheMadTechnician - thanks I will try this tomorrowSum1sAdmin
@TheMadTechnician - I have tried you prior solution, which works very well. Only - my object has over 30 properties. Wish there was a better way of doing this!. thanks.Sum1sAdmin

1 Answers

1
votes

Ok, so it's been over a week, but I was out of the office. I think I have a solution to your issue. I wrote up a function that recursively calls itself to flatten your XML arrays. Since arrays could potentially contain columns with the same name I flatten them to the naming context of Array1.Column1, Array1.Column2. For a better reference, in your example the chapter array would be flattened to columns chapter.Title, chapter.Number, and chapter.Page.

So, first I load up your example XML:

[xml]$Data = @'
<catalog>
    <book id="bk101">
        <author>Gambardella, Matthew</author>
        <author>Tiny Tim</author>
        <title>XML Developer's Guide</title>
        <genre>Computer</genre>
        <price>$44.95</price>
        <price>€40.50</price>
        <price>£35.99</price>
        <publish_date>2000-10-01</publish_date>
        <description>An in-depth look at creating applications with XML.</description>
        <TableOfContents>
        <chapter Title = "Introduction" Number = "1" Page = "1"></chapter>
        <chapter Title = "XSD" Number = "2" Page = "14"></chapter>
        <chapter Title = "XPATH" Number = "3" Page = "27"></chapter>
        <chapter Title = "XQUERY" Number = "4" Page = "42"></chapter>
        <chapter Title = "XHTML" Number = "5" Page = "58"></chapter>
        <chapter Title = "XSLT" Number = "6" Page = "75"></chapter>
        </TableOfContents>
    </book> 
</catalog>
'@

Now that I have that to work with, I made a function that would recursively flatten out any arrays, and looks through any objects that it finds to see if they contain arrays:

Function Flatten-SubNode([object]$NodeIn){
    $Node = $NodeIn.Clone()
    $XMLExcludes = $Node.psobject.Methods.name|where({$_ -like 'get_*'})|%{$_.Substring(4)}
    [array]$NodeProps = $Node.psobject.properties.name|?{$_ -notin $XMLExcludes}
    $FlattenedNode = New-Object PSObject
    $FlattenedProps = @()
    ForEach($Property in $NodeProps) 
    {
        Switch($Node.$Property){
            {$_ -is [string]} {If($Property -notin $FlattenedProps){Add-Member -InputObject $FlattenedNode -NotePropertyName $Property -NotePropertyValue @();$FlattenedProps += $Property};$FlattenedNode.$Property += $_; continue}
            {$_ -is [array]}  {
                Switch($_){
                    {$_ -is [string]}{[array]$Strings += $_;continue}
                    {$_ -isnot [array]}{
                        $SubItem=$_
                        ForEach($SubProp in ($_.PSObject.Properties.Name|Where{$_ -notin $XMLExcludes})){
                            If("$Property.$SubProp" -notin $FlattenedProps){
                                Add-Member -InputObject $FlattenedNode -NotePropertyName "$Property.$SubProp" -NotePropertyValue @()
                                $FlattenedProps += "$Property.$SubProp"
                            }
                            $FlattenedNode."$Property.$SubProp" += $SubItem.$SubProp
                        }
                        Continue}
                    default {
                        Flatten-SubNode $_|%{
                            $SubItem=$_
                            ForEach($SubProp in ($_.PSObject.Properties.Name|Where{$_ -notin $XMLExcludes})){
                                If("$Property.$SubProp" -notin $FlattenedProps){
                                    Add-Member -InputObject $FlattenedNode -NotePropertyName "$Property.$SubProp" -NotePropertyValue @()
                                    $FlattenedProps += "$Property.$SubProp"
                                }
                                $FlattenedNode."$Property.$SubProp" += $SubItem.$SubProp
                            }
                        }
                    }
                }
                If($Strings){$Node.$Property = $Strings}Else{$Node = $Node | Select * -ExcludeProperty $Property}

            }
            default {
                $SubItem=Flatten-SubNode $_
                ForEach($SubProp in ($SubItem.PSObject.Properties.Name|Where{$_ -notin $XMLExcludes})){
                    If("$Property.$SubProp" -notin $FlattenedProps){
                        Add-Member -InputObject $FlattenedNode -NotePropertyName "$Property.$SubProp" -NotePropertyValue @()
                        $FlattenedProps += "$Property.$SubProp"
                    }
                    $FlattenedNode."$Property.$SubProp" += $SubItem.$SubProp
                }
            }
        }
    }
    $FlattenedNode
}

What that puts out is a single object with properties that are arrays of records for any given column. Then it counts how many records any given column has, and takes the largest number. It then creates that many objects, starting at the top, with all potential columns as properties. The first one will always have a value for each property. As it moves through more properties will end up not having a value as arrays are exhausted. For example, only the first object will have a ID value for the book, since the book only has 1 ID, while the first 2 objects will have a value in the Author column, and all objects will have values in the chapter.Title, chapter.Number, and chapter.Page properties, since that array has the most records.

ForEach($Book in $Data.catalog.book){
    $FlattenedBook = Flatten-SubNode $Data.catalog.book
    $Rows=$Flattenedbook.psobject.properties.name|%{$FlattenedBook.$_.count}|sort -Descending|Select -first 1
    $Results=For($i=0;$i -le $Rows;$i++){
        $RowObj = New-Object PSObject
        $Flattenedbook.psobject.properties.name|%{
            add-member -InputObject $RowObj -NotePropertyName $_ -NotePropertyValue $Flattenedbook.$_[$i]}
        $RowObj
    }
    $Results | ConvertTo-Html -Property * -as Table -Fragment |Set-Content C:\Temp\$($Book.id).htm
}

I set this to output a file based on the book's ID number in my C:\temp folder. I think it worked out fairly well, as it output a file named C:\temp\bk101.htm with just a table (run the snippet to see the resulting table):

<table>
<colgroup><col/><col/><col/><col/><col/><col/><col/><col/><col/><col/></colgroup>
<tr><th>id</th><th>author</th><th>title</th><th>genre</th><th>price</th><th>publish_date</th><th>description</th><th>TableOfContents.chapter.Title</th><th>TableOfContents.chapter.Number</th><th>TableOfContents.chapter.Page</th></tr>
<tr><td>bk101</td><td>Gambardella, Matthew</td><td>XML Developer&#39;s Guide</td><td>Computer</td><td>$44.95</td><td>2000-10-01</td><td>An in-depth look at creating applications with XML.</td><td>Introduction</td><td>1</td><td>1</td></tr>
<tr><td></td><td>Tiny Tim</td><td></td><td></td><td>Ç40.50</td><td></td><td></td><td>XSD</td><td>2</td><td>14</td></tr>
<tr><td></td><td></td><td></td><td></td><td>&#163;35.99</td><td></td><td></td><td>XPATH</td><td>3</td><td>27</td></tr>
<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td>XQUERY</td><td>4</td><td>42</td></tr>
<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td>XHTML</td><td>5</td><td>58</td></tr>
<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td>XSLT</td><td>6</td><td>75</td></tr>
<tr><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td><td></td></tr>
</table>