I have been happily using gawk with FPAT. Here's the script I use for my examples:
#!/usr/bin/gawk -f
BEGIN {
    FPAT="([^,]*)|(\"[^\"]+\")"
}
{
    for (i=1; i<=NF; i++) {
        printf "Record #%s, field #%s: %s\n", NR, i, $i
    }
}
Simple, no quotes
Works well.
$ echo 'a,b,c,d' | ./test.awk
Record #1, field #1: a
Record #1, field #2: b
Record #1, field #3: c
Record #1, field #4: d
With quotes
Works well.
$ echo '"a","b",c,d' | ./test.awk
Record #1, field #1: "a"
Record #1, field #2: "b"
Record #1, field #3: c
Record #1, field #4: d
With empty columns and quotes
Works well.
$ echo '"a","b",,d' | ./test.awk
Record #1, field #1: "a"
Record #1, field #2: "b"
Record #1, field #3:
Record #1, field #4: d
With escaped quotes, empty columns and quotes
Works well.
$ echo '"""a"": aaa","b",,d' | ./test.awk
Record #1, field #1: """a"": aaa"
Record #1, field #2: "b"
Record #1, field #3:
Record #1, field #4: d
With a column containing escaped quotes and ending with a comma
Fails.
$ echo '"""a"": aaa,","b",,d' | ./test.awk
Record #1, field #1: """a"": aaa
Record #1, field #2: ","
Record #1, field #3: b"
Record #1, field #4:
Record #1, field #5: d
Expected output:
$ echo '"""a"": aaa,","b",,d' | ./test_that_would_be_working.awk
Record #1, field #1: """a"": aaa,"
Record #1, field #2: "b"
Record #1, field #3:
Record #1, field #4: d
Is there a regex for FPAT that would make this work, or is this just not supported by awk?
The pattern would be a " followed by anything but a single ". A bracket expression matches one character at a time, so it cannot reject a lone " while still accepting a doubled "".
I think there may be an option with lookaround, but I'm not good enough with it to make it work.
| as a field separator: a||"b" b,|c – jaspython
tag without mentioning awk. And let me know – RomanPerekhrest
"""b"" b" needs further parsing (why?) and third one says it would be ok. – thanasisp