Update: In PowerShell [Core] v7+, a much simpler solution is possible: see this answer.
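For reference, one v7+ possibility (not necessarily what the linked answer uses) is the `-UseQuotes AsNeeded` parameter that `ConvertTo-Csv` / `Export-Csv` gained in v7.0; a minimal sketch, assuming a hypothetical input file `./some.csv` that has a header row:

# PowerShell 7+ only: -UseQuotes AsNeeded emits enclosing double quotes only
# around fields that actually need them (embedded separators or quotes).
Import-Csv ./some.csv |
  Export-Csv ./out.csv -UseQuotes AsNeeded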
The existing answers work well with the sample input:
- Wiktor Stribiżew's helpful answer, which uses a regex to identify double-quoted fields that do not contain `,`, loads the entire input file into memory first, which enables replacing the input file with the results in a single pipeline (see the sketch after this list). While this is convenient - and faster than row-by-row processing - the caveat is that it may not be an option for large input files.
- markg's helpful answer, which splits rows into fields by `"` chars., is an alternative for large input files, because it uses the pipeline to process the input rows one by one. (As a consequence, the input file cannot be directly updated with the result.)
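For illustration, here is a minimal sketch of the in-memory variant; the regex is a simplified stand-in (not necessarily Wiktor's exact pattern) that only unquotes fields containing neither `,` nor `"` chars., so it is only suitable for input like the OP's original sample (no embedded `"` chars.); `./simple.csv` is a hypothetical path:

# Read the whole file into memory up front ((...) around Get-Content -Raw),
# which makes it safe to write back to the same file in the same pipeline.
# Quotes are stripped only from fields that contain neither "," nor '"'.
(Get-Content -Raw ./simple.csv) -replace '"([^",]*)"', '$1' |
  Set-Content ./simple.csv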
If we generalize the OP's requirement to also handle fields with embedded `"` chars., we need a different approach:
The following fields must then retain their enclosing double quotes:
- (of necessity) double-quoted fields with embedded `,` chars.; e.g., `"1234 Main St, New York, NY"`
- (of necessity) double-quoted fields with embedded `"` chars., which, per RFC 4180, must be escaped as `""`, i.e., doubled; e.g., `"Nat ""King"" Cole"`
Note:
- We are not dealing with fields that may contain embedded line breaks, as that would require a fundamentally different approach, because self-contained line-by-line processing is then no longer possible.
- Tip of the hat to Wiktor Stribiżew, who came up with the regex to robustly match a double-quoted field with an arbitrary number of embedded double quotes, escaped as `""`: `"([^"]*(?:""[^"]*)*)"`
@'
bob,"1234 Main St, New York, NY","cool guy"
nat,"Nat ""King"" Cole Lane","cool singer"
nat2,"Nat ""King"" Cole Lane, NY","cool singer"
'@ | Set-Content ./test.csv
Get-Content ./test.csv | ForEach-Object {
  # For every double-quoted field on the current line, keep the enclosing
  # quotes only if the field content contains a "," or an (escaped) '"';
  # otherwise, output just the field content, i.e., drop the quotes.
  ([regex] '"([^"]*(?:""[^"]*)*)"').Replace($_, { param($match)
    $fieldContent = $match.Groups[1]
    if ($fieldContent -match '[,"]') { $match } else { $fieldContent }
  })
}
This yields:
bob,"1234 Main St, New York, NY",cool guy
nat,"Nat ""King"" Cole Lane",cool singer
nat2,"Nat ""King"" Cole Lane, NY",cool singer
Updating the input file:
As in markg's answer, due to line-by-line processing, you cannot directly update the input file with the output in the same pipeline.
To update the input file later, use a temporary output file and then replace the input file with it (`...` represents the `Get-Content` pipeline from above, only with `$csvFile` instead of `./test.csv`):
$csvFile = 'c:\path\to\some.csv'
$tmpFile = "$env:TEMP\tmp.$PID.csv"  # note: the path must be double-quoted to expand correctly
... | Set-Content $tmpFile
if ($?) { Move-Item -Force $tmpFile $csvFile }
Note that `Set-Content` uses your system's single-byte, extended-ASCII character encoding by default (even though the help topic falsely states ASCII).
Using the `-Encoding` parameter allows you to specify a different encoding, but note that UTF-16LE, which is the default for `Out-File` / `>`, causes the CSV file not to be recognized properly by Excel, for instance.
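For example, to write the temporary file as UTF-8 instead (a sketch; note that in Windows PowerShell, `utf8` produces a file with a BOM):

# Same pipeline as above, but with an explicit output encoding.
... | Set-Content -Encoding utf8 $tmpFile
if ($?) { Move-Item -Force $tmpFile $csvFile }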