0
votes

I am trying to have this script automatically convert a text file to CSV, but it is delimited by the pipe character "|". What I have below is in my batch file, it replaces the pipes with commas, the only issue I have is, in the 7th column, the strings contain commas, so when I open the CSV file in Excel, it treats those commas as a column and messes up the format of the columns. Is there a way to add a text qualifier to the 7th column only? The text qualifier being quotes around the string.

Example

Value1|Value2,AndSome|Value3

Have Script convert text file to:

Value1,"Value2,AndSome",Value3

@echo off
setLocal enableDELAYedexpansion
for /f "tokens=* delims=^|" %%a in (myFile.txt) do (
set str=%%a
echo !str:^|=,! >> myFile.csv
)
2
if your only problem is to correctly open the file in Excel, you could just add sep=| as the first line, wich tells Excel, which character to use as field separator (this line will not be part of the Excel sheet)Stephan

2 Answers

2
votes
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET "sourcedir=U:\sourcedir"
SET "filename1=%sourcedir%\q35002863.txt"
FOR /f "usebackqtokens=1-7*delims=^|" %%a IN ("%filename1%") DO (
 SET "C8=%%h"
 ECHO(%%a,%%b,%%c,%%d,%%e,%%f,"%%g",!C8:^|=,!
)

GOTO :EOF

This should solve your problem.

2
votes

JREPL.BAT is a powerful regular expression find/replace utility that can easily and efficiently solve this problem. It is pure script (hybrid JScript/batch) that runs natively on any Windows machine from XP onward.

I could write a solution that always quotes the 7th column, but that has limited use. More powerful would be a solution that selectively quotes any column that contains a comma, regardless of position. Any column without a comma will remain unquoted.

jrepl "\| [^|,]*,[^|]*" ", \q$&\q" /t " " /x /f myFile.txt /o myFile.csv

The only other thing that could trip you up is if any of the columns already contain a quote. The CSV "standard" requires that any quote literal be escaped as "", and the column also be enclosed in quotes. The following will properly escape quote literals, and also enclose any column that contains comma or quote within quotes.

jrepl "\| [^|,]*[,\x22][^|]*" "',' '\x22'+$0.replace(/\x22/g,'\x22\x22')+'\x22'" /t " " /j /f myFile.txt /o myFile.out

One last thing that could be added would be to put the command within a batch script, and parameterize the delimiter, source file, and destination file. I've also added a help facility to the script.

delim2csv.bat

::
::delim2csv  Delimiter  InFile  [OutFile]
::delim2csv  /?
::
::  Convert a delimited text file into a CSV file, where
::    - columns containing comma or quote are quoted
::    - quote literals are doubled
::    - Delimiter characters are converted to commas
::
::  The OutFile is optional. The result will be written to stdout
::  if the OutFile is not specified. Use - for the OutFile to
::  overwrite the InFile with the result.
::
::  Remember that the delimiter is used in a regular expression,
::  so the character must be escaped if it is a regex meta character,
::  or encoded if it is difficult to represent on the command line.
::  Any extended ASCII character may be specified by using \xNN,
::  where NN is the hexidecimal representation of the character code.
::  Enclosing argument quotes will be removed before use in the regex.
::
::  Example Delimiters: pipe =  "\|"  or  \x7C
::                      tab  =   \t   or  \x09
::
::  If the first argument is /?, then this help documentation will
::  be written to stdout.
::
::  This script requires JREPL.BAT to function, available at:
::  http://www.dostips.com/forum/viewtopic.php?t=6044
::
@echo off
if "%~1" equ "/?" (
  for /f "delims=: tokens=1*" %%A in ('findstr /n "^::" "%~f0"') do echo(%%B
  exit /b
)
@call jrepl "%~1 [^%~1,]*[,\x22][^%~1]*"^
            "',' '\x22'+$0.replace(/\x22/g,'\x22\x22')+'\x22'"^
            /t " " /j /f %2 /o %3

So, using the above script, the solution would become:

delim2csv "\|" MyFile.txt MyFile.csv

EDIT 2017-02-19

Over at https://stackoverflow.com/a/42324094/1012053 I developed a small hybrid script called parseCSV.bat that is dedicated to transforming CSV data, and does not use regular expressions. It is over 11 times faster than the above solution that relies on JREPL.BAT. Regular expressions are powerful, convenient, and terse, but hand-constructed code is typically faster.

With parseCSV.bat, the solution becomes

parseCSV "/I:|" /L /Q:E <MyFile.txt >MyFile.csv

The only difference in the output is parseCSV quotes every column value, but delim2csv only quotes column values that contain a comma or a quote.