Powershell - How to search (using wildcard) and replace values in a CSV file?

Question

I have a CSV file (one column/field only) with thousands of records in it.

I need a way in Powershell to search for a value using a few characters followed by a wildcard and, where found, then replace that value with a ".

I have searched around on how to do this but everyting I have found so far either doesn't cover CSV files or doesn't explain how I might be able to do the search using a wildcard.

Example of values in CSV file:

<#

RanDom.Texto 1.yellow [ Table - wood ] "gibberishcode1.moreRandomText11.xyz123+456"
[email protected] 2.blue [Chair - steel ] "gibberishcode2.moreRandomText222.xyz19283+4567+89
randomi.Textpel 3.green [ counter - granite] "gibberishcode3.moreRandomText3333.xyz17243+3210+987+654"

#>

You will note above that the only values in common across the records are the .xyz in each record.

I want to replace the .xyz (and everything that follows) with a " value.

E.g. Desired result as follows:

<#

RanDom.Texto 1.yellow [ Table - wood ] "gibberishcode1.moreRandomText11"
[email protected] 2.blue [Chair - steel ] "gibberishcode2.moreRandomText222"
Randomi.Textpel 3.green [ counter - granite] "gibberishcode3.moreRandomText3333"

#>

Here is some code I tried but it doesn't work in that it didn't replace the values (but it does successfuly export to a new csv file).

# Create function that gets the current file path (of where this script is located)
function Get-ScriptDirectory {Split-Path -parent $PSCommandPath}

# Create function that gets the current date and time in format of 1990-07-01_19h15m59
function Get-TimeStamp {return "{0:yyyy-MM-dd}_{0:HH}h{0:mm}m{0:ss}" -f (Get-Date)}

# Set current file path. Also used in both FOR loops below as primary source directory.
${sourceDirPath} = Get-ScriptDirectory

# Import CSV look-up file 
${csvFile} = (Import-Csv -Path ${sourceDirPath}\SourceCSVFile.csv)
    
# for each row, replace the values of .xyz and all that follows with "
foreach(${row} in ${csvFile}) 
{
    ${row} = ${row} -replace '.xyz*','"'
}

# Set modified CSV's name and path
${newCSVFile} = ${sourceDirPath} + '\' + $(Get-TimeStamp) + '_SourceCSVFile_Modified.csv'

# export the modified CSV
${csvFile} | Export-Csv ${newCSVFile} -NoTypeInformation

I also tried this as an alternative but no luck either (i think this code below may only work for .txt files??) ...

((Get-Content -path C:\TEMP\TEST\SourceCSVFile.csv -Raw) -replace '.xyz'*,'"') | Export-Csv -Path C:\TEMP\TEST\ReplacementFile.csv

I'm new to Powershell and don't have a proper understanding of regex yet so please be gentle.

UPDATE and SOLUTION:

For those that are interested in my final solution ... I used the code provided by Thomas (Thank you!!) however my .csv file was left with some records that had a triple quote """ value at the end of the string.

As such I modified the code to use variables and execute a second pass of cleaning by replacing all triple quotation (e.g. """) values with a single quote value (e.g. ") and then piping the result to file.

# Create function that gets the current file path (of where this script is located and running from)
function Get-ScriptDirectory {Split-Path -parent $PSCommandPath}

# Set current file path
${sourceDirPath} = Get-ScriptDirectory

# Assign source .csv file name to variable
$origNameSource = 'AllNames.csv'

# Assign desired .csv file name post cleaning
$origNameCLEAN = 'AllNames_CLEAN.csv'
    
# First pass clean to replace .xyz* with " and assign result to tempCsvText variable
${tempCsvText} = ((Get-Content -Path ${sourceDirPath}\$origNameSource) | % {$_ -replace '\.xyz.*$', '"'})

# Second pass clean to replace """ with " and write result to a new .csv file
${tempCsvText} -replace '"""', '"' | Set-Content -Path ${sourceDirPath}\$origNameCLEAN

# Import records from new .csv file and remove duplicates by using Sort-Object * -Unique
${csvFile} = (Import-Csv -Path ${sourceDirPath}\$origNameCLEAN) | Sort-Object * -Unique

Please show us some of the lines in your input file exactly as they are. Now it looks as if the lines can have newline breaks in them. Don't use a bulleted list as format, but embed the lines with <pre> and </pre> tags, so they show up unaltered. — Theo
I can confirm that the records do not have newline breaks in them. P.S. I tried to embed those lines with <pre> and </pre> but i get an error message when trying to submit. — JoeJoe99

stackprotector stackprotector · Accepted Answer · 2020-07-12T09:59:37

First, a .csv file is nothing else than a regular text file, just following some rules on how content is embedded (one line for each row, columns delimited by a defined ASCII character, optional header). Your last line is close. You have to use a regular expression, that reaches until the end of a line. This will do it:

Get-Content -Path C:\TEMP\TEST\SourceCSVFile.csv | % {$_ -replace '\.xyz.*$', '"'} | Set-Content -Path C:\TEMP\TEST\ReplacementFile.csv

Differences:

I removed the -Raw parameter to get each line as one string.
I used the pipe to process each string (line)
I adjusted your regex to match from .xyz until the end of each line
I piped the result to Set-Content as I only did text replacement and did not read any objects that would then have to be retranslated back to csv text by Export-Csv

Powershell - How to search (using wildcard) and replace values in a CSV file?

1 Answers