I have a CSV file (one column/field only) with thousands of records in it.

I need a way in PowerShell to search for a value using a few characters followed by a wildcard and, where found, replace that value with a ".

I have searched around for how to do this, but everything I have found so far either doesn't cover CSV files or doesn't explain how to do the search using a wildcard.

Example of values in CSV file:

<#

RanDom.Texto 1.yellow [ Table - wood ] "gibberishcode1.moreRandomText11.xyz123+456"
[email protected] 2.blue [Chair - steel ] "gibberishcode2.moreRandomText222.xyz19283+4567+89
randomi.Textpel 3.green [ counter - granite] "gibberishcode3.moreRandomText3333.xyz17243+3210+987+654"

#>

You will note above that the only values in common across the records are the .xyz in each record.

I want to replace the .xyz (and everything that follows) with a " value.

E.g. Desired result as follows:

<#

RanDom.Texto 1.yellow [ Table - wood ] "gibberishcode1.moreRandomText11"
[email protected] 2.blue [Chair - steel ] "gibberishcode2.moreRandomText222"
randomi.Textpel 3.green [ counter - granite] "gibberishcode3.moreRandomText3333"

#>

Here is some code I tried. It doesn't work: it didn't replace the values (although it does successfully export to a new CSV file).

# Create function that gets the current file path (of where this script is located)
function Get-ScriptDirectory {Split-Path -parent $PSCommandPath}

# Create function that gets the current date and time in format of 1990-07-01_19h15m59
function Get-TimeStamp {return "{0:yyyy-MM-dd}_{0:HH}h{0:mm}m{0:ss}" -f (Get-Date)}

# Set current file path. Also used in both FOR loops below as primary source directory.
${sourceDirPath} = Get-ScriptDirectory

# Import CSV look-up file 
${csvFile} = (Import-Csv -Path ${sourceDirPath}\SourceCSVFile.csv)
    
# for each row, replace the values of .xyz and all that follows with "
foreach(${row} in ${csvFile}) 
{
    ${row} = ${row} -replace '.xyz*','"'
}

# Set modified CSV's name and path
${newCSVFile} = ${sourceDirPath} + '\' + $(Get-TimeStamp) + '_SourceCSVFile_Modified.csv'

# export the modified CSV
${csvFile} | Export-Csv ${newCSVFile} -NoTypeInformation

I also tried this as an alternative, but no luck either (I think the code below may only work for .txt files?):

((Get-Content -path C:\TEMP\TEST\SourceCSVFile.csv -Raw) -replace '.xyz'*,'"') | Export-Csv -Path C:\TEMP\TEST\ReplacementFile.csv

I'm new to PowerShell and don't have a proper understanding of regex yet, so please be gentle.

UPDATE and SOLUTION:

For those who are interested in my final solution: I used the code provided by Thomas (thank you!!), but my .csv file was left with some records that had a triple quote (""") at the end of the string.

So I modified the code to use variables and run a second cleaning pass that replaces every triple quote (""") with a single " character and then pipes the result to a file.

# Create function that gets the current file path (of where this script is located and running from)
function Get-ScriptDirectory {Split-Path -parent $PSCommandPath}

# Set current file path
${sourceDirPath} = Get-ScriptDirectory

# Assign source .csv file name to variable
$origNameSource = 'AllNames.csv'

# Assign desired .csv file name post cleaning
$origNameCLEAN = 'AllNames_CLEAN.csv'
    
# First pass clean to replace .xyz* with " and assign result to tempCsvText variable
${tempCsvText} = ((Get-Content -Path ${sourceDirPath}\$origNameSource) | % {$_ -replace '\.xyz.*$', '"'})

# Second pass clean to replace """ with " and write result to a new .csv file
${tempCsvText} -replace '"""', '"' | Set-Content -Path ${sourceDirPath}\$origNameCLEAN

# Import records from new .csv file and remove duplicates by using Sort-Object * -Unique
${csvFile} = (Import-Csv -Path ${sourceDirPath}\$origNameCLEAN) | Sort-Object * -Unique
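
For anyone who prefers a single pass: -replace operators can be chained, so both clean-ups can probably be done in one pipeline. The following is only a minimal sketch that works on in-memory strings rather than my real file. The second sample line is hypothetical; I made it up to show how a """ could be left behind after the first replacement.

```powershell
# Sample input. The second line is a hypothetical record with a doubled
# quote in front of .xyz, which leaves """ behind after the first replace.
$lines = @(
    '"gibberishcode1.moreRandomText11.xyz123+456"',
    '"gibberishcode2.moreRandomText222"".xyz19283+4567+89"'
)

# First -replace: cut .xyz and everything after it, put back a closing quote.
# Second -replace: collapse any resulting """ into a single ".
$cleaned = $lines | ForEach-Object {
    $_ -replace '\.xyz.*$', '"' -replace '"""', '"'
}

$cleaned
```

With a real file, the same script block could sit between Get-Content and Set-Content instead of using the $lines array.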
Please show us some of the lines in your input file exactly as they are. Now it looks as if the lines can have newline breaks in them. Don't use a bulleted list as format, but embed the lines with <pre> and </pre> tags, so they show up unaltered. - Theo
I can confirm that the records do not have newline breaks in them. P.S. I tried to embed those lines with <pre> and </pre> but I get an error message when trying to submit. - JoeJoe99

1 Answer


First, a .csv file is nothing more than a regular text file that follows some rules about how its content is laid out (one line per row, columns delimited by a defined ASCII character, optional header). Your last line is close. The -replace operator takes a regular expression, not a wildcard pattern, so you need a regular expression that matches from .xyz to the end of the line. This will do it:

Get-Content -Path C:\TEMP\TEST\SourceCSVFile.csv | % {$_ -replace '\.xyz.*$', '"'} | Set-Content -Path C:\TEMP\TEST\ReplacementFile.csv

Differences:

  • I removed the -Raw parameter so that each line is read as its own string.
  • I used the pipeline to process each string (line) separately.
  • I adjusted your regex to match from .xyz until the end of each line.
  • I piped the result to Set-Content because this is plain text replacement: no objects were read, so there is nothing that Export-Csv would have to convert back to CSV text.
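
To see why the wildcard-style pattern falls short, here is a small sketch that runs both patterns against one of the sample values from the question, held in memory instead of read from a file. The variable names are only for illustration. In a regex, an unescaped . matches any character and * repeats only the preceding z, so '.xyz*' replaces just the .xyz part and leaves the rest of the line in place.

```powershell
# One of the sample values from the question, as a plain string.
$line = '"gibberishcode1.moreRandomText11.xyz123+456"'

# Wildcard-style attempt: "." is any character, "z*" is zero or more z.
# Only ".xyz" is replaced, the trailing text survives.
$wildcardStyle = $line -replace '.xyz*', '"'

# Regex version: "\." is a literal dot, ".*" is any run of characters,
# "$" anchors the match at the end of the line.
$toEndOfLine = $line -replace '\.xyz.*$', '"'

$wildcardStyle   # "gibberishcode1.moreRandomText11"123+456"
$toEndOfLine     # "gibberishcode1.moreRandomText11"
```

The second pattern also consumes the original closing quote, which is why the replacement string puts a " back.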