1
votes

I have a CSV file where every field is enclosed in double quotes. Occasionally there are badly formatted lines of the form

Field1,Field2,Field3,Field4"

with a variable number of fields. I need to delete these specific lines while leaving alone all lines of the form

"Field1","Field2","Field3","Field4"

4
This should match the single hanging quote at the end: /^[^"]*"$/ - karakfa
Thanks. I literally just found it as well. If you post the answer I'll accept yours. - mas
No need, you can accept your own answer. However, consider whether a line consisting of a single " is acceptable as a record. You may want to change + to *... - karakfa

4 Answers

2
votes

You can use the following regex to match the malformed lines:

^[^"]*"$

This matches any run of characters that are not ", followed by a single " at the very end of the line.

If the first " is not at the end of the line, the line will not be matched.
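
As a rough illustration of how this regex could be used to filter lines, here is a minimal Python sketch. The function name drop_hanging_quote_lines and the sample data are made up for the example, and the lines are assumed to have their trailing newline already stripped.

```python
import re

# Matches a line that contains no double quotes except one at the very end.
HANGING_QUOTE = re.compile(r'^[^"]*"$')

def drop_hanging_quote_lines(lines):
    """Return only the lines that do not end in a single stray quote."""
    return [line for line in lines if not HANGING_QUOTE.match(line)]

# Hypothetical sample data, modelled on the lines in the question.
sample = [
    '"Field1","Field2","Field3","Field4"',
    'Field1,Field2,Field3,Field4"',
    '"a","b"',
]
cleaned = drop_hanging_quote_lines(sample)
print(cleaned)
```

Well-formed lines start with a quote, so the regex cannot reach the end of the line without hitting a second quote, and they are kept.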

1
votes

If the field contents do not contain escaped quotes, you can test
whether the line has an even number of quotes.

If the following regex matches, the line has an odd number of quotes and should be deleted:

^(?![^"]*(?:"[^"]*"[^"]*)*$).+$

This can be adapted to account for escaped quotes as well,
but that requires a somewhat more complex regex.
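
To see what the quote-evenness test flags, here is a small Python sketch using the same pattern. The helper name has_unbalanced_quotes and the test lines are assumptions for illustration. Note that this check is broader than the question strictly asks for: it flags any line with an odd number of quotes, not only those with a single quote at the end.

```python
import re

# Pattern from the answer: the negative lookahead rejects lines whose quotes
# pair up, so only non-empty lines with an odd number of quotes match.
ODD_QUOTES = re.compile(r'^(?![^"]*(?:"[^"]*"[^"]*)*$).+$')

def has_unbalanced_quotes(line):
    """True if the line contains an odd number of double quotes."""
    return ODD_QUOTES.match(line) is not None

# Hypothetical lines; newlines are assumed to be stripped already.
examples = [
    '"Field1","Field2"',   # 4 quotes
    'Field1,Field2"',      # 1 quote
    '"Field1","Field2',    # 3 quotes
    'Field1,Field2',       # 0 quotes
]
flags = [has_unbalanced_quotes(line) for line in examples]
print(flags)
```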

1
votes

This might work for you (GNU sed):

 sed '/^\([^"]*\("[^"]*"\)*\)*$/!d' file

This deletes every line unless its double quotes come in complete pairs (zero or more of them).

0
votes

Found the answer.

Using extended regex:

'/^[^"]+"$/'
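
The comment on the question suggests considering * instead of +. A quick Python sketch of where the two differ (the variable names and sample strings are made up for the example): the only difference is a line that consists of nothing but a single quote.

```python
import re

PLUS = re.compile(r'^[^"]+"$')  # needs at least one character before the quote
STAR = re.compile(r'^[^"]*"$')  # also matches a line that is only a quote

lone_quote = '"'
bad_line = 'Field1,Field2"'

plus_results = (bool(PLUS.match(lone_quote)), bool(PLUS.match(bad_line)))
star_results = (bool(STAR.match(lone_quote)), bool(STAR.match(bad_line)))
print(plus_results)
print(star_results)
```

With +, a line holding only " would survive the cleanup; with *, it is deleted along with the other malformed lines.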