3
votes

I am very new to awk and thought I would try a simple exercise: splitting a file based on a pattern. Please note:

  1. My file is a Notepad .txt file (with CRLF line endings).
  2. The file has exactly the content below (there is no blank line at the beginning of the input file):

string file1
line1
line2
line3
string file2
line1
line2
line3
string file3
line1
line2
line3

  3. What am I trying to achieve (using only awk for now)?
    Split the file whenever I find the expression "string", excluding that line itself. So my output would be:

"file1" containing only
line1
line2
line3
"file2" containing only
line1
line2
line3

and so on. Below is what I tried, but Case A leaves a newline at the end of each file and Case B leaves one at the beginning of each file.

CASE A:

BEGIN {RS="\r\n";FS=" ";ORS="\r\n"}  
/string/ { fname = $2; next } { print > fname".txt"}

CASE B:

BEGIN {RS="\r\n"; FS=" "; ORS=""}
/string/ { if (NR>2) print prev_line>fname".txt"; fname=$2; next} {print (prev_line="") ? $0 : "\r\n" $0 > fname".txt"; prev_line=$0}

Can someone suggest a better method, or give a hint on how to modify the above awk script?

Thanks.

Can't reproduce with GNU Awk 4.0.1. "Case A" works as expected. The output even retains the CRLF format, with no extra lines. - user000001
Are you running this on cygwin? If not, what platform? - Ed Morton
@Ed Morton I am running this on a CentOS VM, but accessing files on my mounted Windows 8. - Jai
@user000001 Case A would add a newline at the end of every file because every print statement adds a newline, if I am not wrong. That's what is happening. I don't think the version of awk should matter here. - Jai
@Jai I can confirm what @user000001 says: if I copy your input to a text file using CRLF and copy-paste your CASE A code, it produces three files with 3 lines each, CRLF, and no empty line at the beginning or end (as per vi). However, running od -c on the files I can confirm that they do contain a final \r\n, and if you want to avoid this, the solution you gave in your answer seems to be the way to go. - mschilli
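
For anyone who wants to repeat the od -c check from the comments, here is a rough sketch. It assumes an awk that accepts a multi-character RS (gawk and mawk do; POSIX only guarantees the first character is used). The file name input.txt and the temporary directory are placeholders, not something from the question.

```shell
# Work in a scratch directory so no real files are overwritten.
cd "$(mktemp -d)"

# Recreate the sample input with CRLF line endings (input.txt is a made-up name).
printf 'string file1\r\nline1\r\nline2\r\nline3\r\nstring file2\r\nline1\r\nline2\r\nline3\r\nstring file3\r\nline1\r\nline2\r\nline3\r\n' > input.txt

# CASE A from the question; the output file name is parenthesised
# so the redirection target is unambiguous.
awk 'BEGIN { RS = "\r\n"; FS = " "; ORS = "\r\n" }
/string/ { fname = $2; next }
{ print > (fname ".txt") }' input.txt

# Show the last two bytes of the first output file.
# print appends ORS after every record, so they are \r and \n.
tail -c 2 file1.txt | od -c
```

An editor such as vi treats that final \r\n as the terminator of the last line, so it shows three lines and no empty one, while od -c shows the raw bytes.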

2 Answers

0
votes

Thanks everyone for all the inputs. I was able to solve the problem using the code below.

BEGIN {RS="\r\n"; FS=" "; ORS=""}  
/string/ { fname=$2; ctr=1; next } { if (ctr==1) {print $0>fname".txt";ctr=0} else {print "\r\n" $0>fname".txt";next} }

However, if someone finds an even better way of doing it, please do post it!

0
votes

The best I can come up with (similar to your answer) is the following:

awk -v RS='\r\n' '{if(/string/){of=$2".txt";getline}else printf RS>of}{printf "%s",$0>of}'
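
The same idea (write the separator before every line except the first in each file) can also be spelled out without getline. This is only a sketch under a few assumptions: an awk with multi-character RS support (gawk, mawk), header lines that start with "string ", and the made-up names input.txt, out and sep.

```shell
# Work in a scratch directory so no real files are overwritten.
cd "$(mktemp -d)"

# Recreate the sample input with CRLF line endings (input.txt is a made-up name).
printf 'string file1\r\nline1\r\nline2\r\nline3\r\nstring file2\r\nline1\r\nline2\r\nline3\r\nstring file3\r\nline1\r\nline2\r\nline3\r\n' > input.txt

awk -v RS='\r\n' '
/^string / {                    # header line: switch to a new output file
    if (out != "") close(out)   # release the previous file handle
    out = $2 ".txt"
    sep = ""                    # nothing is written before the first line
    next
}
out != "" {                     # ignore any lines before the first header
    printf "%s%s", sep, $0 > out
    sep = "\r\n"                # later lines are preceded by CRLF
}' input.txt

# Each output file should now end with "line3" and no trailing CRLF.
od -c file1.txt
```

Using printf "%s" rather than printf $0 keeps a literal % in the data from being read as a format specifier, and close() avoids running out of open files when there are many sections.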