1 vote

I'm trying to write a PowerShell script which will take several very long space-separated files and export some columns to similarly-named CSV files.

I do have a successful version:

Foreach ($file in $files) {
    $WriteString=""
    $outfile = $path + "\" + ($file -replace ".{4}$") + ".csv"  

    Get-Content -Path $path"\"$file | Select-Object -Skip $lines | ForEach-Object{
        $ValueArray = ($_ -split "\s+")
        $WriteString += $ValueArray[1] + "," + $ValueArray[2] + "," + $ValueArray[3] + "`n"
    } 

    Add-Content -Path $outfile -Value $WriteString
 }

This works, but is extremely slow - it takes over 16 hours for the script to fully run. The main cause (I think) is adding to the string. I've tried improving this using a hashtable:

Foreach ($file in $files) {
    $outfile = $path + "\" + ($file -replace ".{4}$") + ".csv"

    $ParseLines = Get-Content -Path $path"\"$file | Select-Object -Skip $lines 

    $OutputData = ForEach ($Line in $ParseLines) {
        $ValueArray = ($Line -split "\s+")
        $Line | Select-Object $ValueArray[1], $ValueArray[2], $ValueArray[3]
    } 

   $OutputData | Export-CSV -Path $outfile #-NoTypeInformation
 }    

However, this is only exporting one line of the hashtable:

#TYPE Selected.System.String
"636050.000","7429825.000","77.438"
,,
,,
,,
,,
,,
,,

If I change the last line to:

Set-Content -Path $outfile -Value $OutputData

then the output becomes:

@{636050.000=; 7429825.000=; 77.438=}
@{636075.000=; 7429825.000=; 75.476=}
@{636100.000=; 7429825.000=; 74.374=}
@{636125.000=; 7429825.000=; 73.087=}
@{636150.000=; 7429825.000=; 71.783=}
@{636175.000=; 7429825.000=; 70.472=}

I'm clearly doing something wrong with either the hashtable or Export-CSV, but I can't figure it out. Any help will be greatly appreciated.

As requested below, here's part of one source file. I cut out all non-data rows. I don't include headers in my output CSV, as the program the CSV files go into doesn't require them, and the columns are self-evident (not much chance of getting the X, Y and Z values wrong just by looking at the data).

*
* DEFINITION
*   HEADER_VARIABLES 3
*     QUALITIES        C  16   0 key
*     DATE             C  12   0
*     TIME             C  12   0
*   VARIABLES 4
*     X                F  12   3
*     Y                F  12   3
*     Z                F  12   3
*     gcmaq0.drg       F  12   3
*
*        1         2         3         4
*23456789012345678901234567890123456789012345678
*         X|          Y|          Z| gcmaq0.drg|
*
* HEADER:QUALITIES       29Aug2018   13:53:16    
  636575.000 7429800.000      75.551      75.551
  636600.000 7429800.000      77.358      77.358
  636625.000 7429800.000      78.823      78.823
  636650.000 7429800.000      80.333      80.333
  636675.000 7429800.000      82.264      82.264
  636700.000 7429800.000      84.573      84.573
  636725.000 7429800.000      87.447      87.447
Can you share (a part) of your input file (including the header) and add it to the question? - iRon

3 Answers

3 votes

Avoid slow operations like appending to strings (or arrays) in a loop. Change this:

Get-Content -Path $path"\"$file |
    Select-Object -Skip $lines |
    ForEach-Object {
        $ValueArray = ($_ -split "\s+")
        $WriteString += $ValueArray[1] + "," + $ValueArray[2] + "," + $ValueArray[3] + "`n"
    }

Add-Content -Path $outfile -Value $WriteString

into this:

Get-Content -Path "${path}\${file}" |
    Select-Object -Skip $lines |
    ForEach-Object {
        ($_ -split "\s+")[1..3] -join ','
    } |
    Set-Content -Path $outfile

Replace Set-Content with Add-Content if you actually want to append to an existing file.
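
For context, here is a sketch (not part of the original answer) of what the whole per-file loop from the question could look like with this pipeline, assuming $files, $path and $lines are defined as in the question. Given the leading spaces on each data line, index 0 of the split is an empty string, which is why the columns are taken from indices 1 to 3:

Foreach ($file in $files) {
    # Same name handling as the question: drop the 4-character extension, append .csv
    $outfile = Join-Path $path (($file -replace '.{4}$') + '.csv')

    Get-Content -Path (Join-Path $path $file) |
        Select-Object -Skip $lines |
        ForEach-Object { ($_ -split '\s+')[1..3] -join ',' } |
        Set-Content -Path $outfile
}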

1 vote

Export-Csv works with objects. It expects properties and values - what you're producing (judging from the Set-Content results) are objects where your data values ended up as the property names, with nothing assigned to them.
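
A minimal illustration (a made-up one-liner, using one value row from the question's output): piping a string into Select-Object with other strings as "property" arguments creates an object whose property names are those strings, and since the input string has no such properties, every value comes out empty.

# The numbers become property *names*, not values, so Export-Csv sees empty cells
'  636050.000 7429825.000      77.438' | Select-Object '636050.000', '7429825.000', '77.438'
# Stringified result: @{636050.000=; 7429825.000=; 77.438=}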

One way around this is to create an object for each line and fill in its values.

Foreach ($file in $files) {

    $OutputData = @()   # reset the collection for each file so results don't carry over
    $outfile    = $path + "\" + ($file -replace ".{4}$") + ".csv"
    $ParseLines = Get-Content -Path $path"\"$file | Select-Object -Skip $lines

    ForEach ($Line in $ParseLines) {

        $ValueArray = ($Line -split "\s+")

        [array]$OutputData += [pscustomobject]@{
            header1 = $ValueArray[1]
            header2 = $ValueArray[2]
            header3 = $ValueArray[3]
        }

    } 

   $OutputData | Export-CSV -Path $outfile #-NoTypeInformation

}

I'm not sure this is the optimal way if you have very large files - I'm sure a regex guru can come up with something more efficient.
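
As an aside (not part of this answer): the [array]$OutputData += ... pattern rebuilds the whole array on every iteration, so it also slows down on very large files. A sketch of an alternative using the same variable names - simply let PowerShell collect the loop output:

$OutputData = ForEach ($Line in $ParseLines) {
    $ValueArray = ($Line -split "\s+")
    [pscustomobject]@{
        header1 = $ValueArray[1]
        header2 = $ValueArray[2]
        header3 = $ValueArray[3]
    }
}
$OutputData | Export-CSV -Path $outfile -NoTypeInformation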

0 votes

The solution above by Ansgar Wiechers worked best, but I also found a second way of doing it at this SO question. It uses an ArrayList to collect an object built from a hashtable for each line, then writes out the ArrayList. This method is almost, but not quite, as fast as Ansgar's solution (about 10x faster than the string method, vs 12x for the regex method).

Foreach ($file in $files) {
    [System.Collections.ArrayList]$collection = New-Object System.Collections.ArrayList($null)
    $outfile = $path + "\" + ($file -replace ".{4}$") + ".csv" 

    $ParseLines = Get-Content -Path $path"\"$file | Select-Object -Skip $lines 

    $OutputData = @{}
    ForEach ($Line in $ParseLines) {
        $ValueArray = ($Line -split "\s+")
        $OutputData.Easting = $ValueArray[1]
        $OutputData.Northing = $ValueArray[2]
        $OutputData.ZValue = $ValueArray[3]

        $collection.Add((New-Object PSObject -Property $OutputData)) | Out-Null
    } 

    $collection | Export-CSV -Path $outfile -NoTypeInformation
 }
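
For completeness, a generic list works the same way and avoids the Out-Null call, since its Add() returns nothing. This is only a sketch under the same assumptions about $files, $path and $lines; the ::new() syntax needs PowerShell 5 or later (use New-Object on older versions):

Foreach ($file in $files) {
    $collection = [System.Collections.Generic.List[object]]::new()
    $outfile = Join-Path $path (($file -replace '.{4}$') + '.csv')

    $ParseLines = Get-Content -Path (Join-Path $path $file) | Select-Object -Skip $lines

    ForEach ($Line in $ParseLines) {
        $ValueArray = ($Line -split "\s+")
        # Add() on a generic list returns void, so no Out-Null is needed
        $collection.Add([pscustomobject]@{
            Easting  = $ValueArray[1]
            Northing = $ValueArray[2]
            ZValue   = $ValueArray[3]
        })
    }

    $collection | Export-CSV -Path $outfile -NoTypeInformation
}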