JREPL.BAT is a powerful regular expression find/replace utility that can easily and efficiently solve this problem. It is pure script (hybrid JScript/batch) that runs natively on any Windows machine from XP onward.
I could write a solution that always quotes the 7th column, but that has limited use. More powerful would be a solution that selectively quotes any column that contains a comma, regardless of position. Any column without a comma will remain unquoted.
jrepl "\| [^|,]*,[^|]*" ", \q$&\q" /t " " /x /f myFile.txt /o myFile.csv
The only other thing that could trip you up is if any of the columns already contain a quote. The CSV "standard" requires that any quote literal be escaped as ""
, and the column also be enclosed in quotes. The following will properly escape quote literals, and also enclose any column that contains comma or quote within quotes.
jrepl "\| [^|,]*[,\x22][^|]*" "',' '\x22'+$0.replace(/\x22/g,'\x22\x22')+'\x22'" /t " " /j /f myFile.txt /o myFile.out
One last thing that could be added would be to put the command within a batch script, and parameterize the delimiter, source file, and destination file. I've also added a help facility to the script.
delim2csv.bat
::
::delim2csv Delimiter InFile [OutFile]
::delim2csv /?
::
:: Convert a delimited text file into a CSV file, where
:: - columns containing comma or quote are quoted
:: - quote literals are doubled
:: - Delimiter characters are converted to commas
::
:: The OutFile is optional. The result will be written to stdout
:: if the OutFile is not specified. Use - for the OutFile to
:: overwrite the InFile with the result.
::
:: Remember that the delimiter is used in a regular expression,
:: so the character must be escaped if it is a regex meta character,
:: or encoded if it is difficult to represent on the command line.
:: Any extended ASCII character may be specified by using \xNN,
:: where NN is the hexidecimal representation of the character code.
:: Enclosing argument quotes will be removed before use in the regex.
::
:: Example Delimiters: pipe = "\|" or \x7C
:: tab = \t or \x09
::
:: If the first argument is /?, then this help documentation will
:: be written to stdout.
::
:: This script requires JREPL.BAT to function, available at:
:: http://www.dostips.com/forum/viewtopic.php?t=6044
::
@echo off
if "%~1" equ "/?" (
for /f "delims=: tokens=1*" %%A in ('findstr /n "^::" "%~f0"') do echo(%%B
exit /b
)
@call jrepl "%~1 [^%~1,]*[,\x22][^%~1]*"^
"',' '\x22'+$0.replace(/\x22/g,'\x22\x22')+'\x22'"^
/t " " /j /f %2 /o %3
So, using the above script, the solution would become:
delim2csv "\|" MyFile.txt MyFile.csv
EDIT 2017-02-19
Over at https://stackoverflow.com/a/42324094/1012053 I developed a small hybrid script called parseCSV.bat that is dedicated to transforming CSV data, and does not use regular expressions. It is over 11 times faster than the above solution that relies on JREPL.BAT. Regular expressions are powerful, convenient, and terse, but hand-constructed code is typically faster.
With parseCSV.bat, the solution becomes
parseCSV "/I:|" /L /Q:E <MyFile.txt >MyFile.csv
The only difference in the output is parseCSV quotes every column value, but delim2csv only quotes column values that contain a comma or a quote.
sep=|
as the first line, wich tells Excel, which character to use as field separator (this line will not be part of the Excel sheet) – Stephan