1
votes

I would like to search through a range of lines in a file, between a line that begins with Start and a line that ends with End, and replace the newlines with colons. I need this to be done in sed or awk.

Example file:

start
a
b
c
End
Start
a
b
c
End
Start
x
y
z
End

Expected Output:

a:b:c
a:b:c
x:y:z
6
sed is an excellent tool for simple substitutions on a single line. Period. Do not even consider sed for anything else, that's what awk is for. - Ed Morton

6 Answers

4
votes

This short awk one-liner should work (a regex in RS needs an awk that supports it, such as gawk or mawk):

 awk -v RS='Start|End' -v OFS=":" '$1=$1' file

with your data:

kent$  cat f
Start
a
b
c
End
Start
a
b
c
End
Start
x
y
z
End

kent$  awk -v RS='Start|End' -v OFS=":" '$1=$1' f
a:b:c
a:b:c
x:y:z
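
A note on why this works, offered as a sketch rather than as part of the answer above: assigning to $1 makes awk rebuild the record with OFS between the fields, and the bare $1=$1 pattern prints the record only when the assigned value is truthy. A block whose first line is 0 would therefore be skipped. The variant below tests NF instead. The wrapper name join_records is hypothetical, and the regex RS still assumes gawk or mawk.

```shell
# join_records is a hypothetical wrapper name, used here only for illustration.
# Assumes an awk that treats a multi-character RS as a regex (gawk, mawk).
join_records() {
  awk -v RS='Start|End' -v OFS=':' '
    NF { $1 = $1; print }   # NF skips empty records; $1 = $1 rebuilds $0 with OFS
  ' "$@"
}
```

Usage: join_records f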
1
votes

Here is one version:

awk '/End/{print a;f=a=0} f {a=a?a":"$0:$0} /(S|s)tart/{f=1}' file
a:b:c
a:b:c
x:y:z

I guess there is a typo in the first start, if so use:

awk '/End/{print a;f=a=0} f {a=a?a":"$0:$0} /Start/{f=1}' file

/End/{print a;f=a=0} If the line contains End, print a and reset f and a to 0.
f {a=a?a":"$0:$0} If f is true, set a to $0 on the first line of the block, then append ":" and $0 on each later line.
/Start/{f=1} If the line contains Start, set f to 1 (true).
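
The same flag-based logic can be spelled out in a longer form. This is a sketch, not the answer's own code: it assumes the markers sit alone on their lines (hence the ^ and $ anchors), accepts start or Start, and counts lines with n so that a data line of 0 is not mistaken for an empty accumulator. The name join_blocks is hypothetical.

```shell
# join_blocks is a hypothetical helper name; the awk program is plain POSIX awk.
join_blocks() {
  awk '
    /^[Ss]tart$/ { inside = 1; n = 0; joined = ""; next }        # block opens: reset state
    /^End$/      { if (inside) print joined; inside = 0; next }  # block closes: emit result
    inside       { joined = (n++ == 0) ? $0 : joined ":" $0 }    # first line as is, later ones after ":"
  ' "$@"
}
```

Usage: join_blocks file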

1
votes

Let's give it a try with awk.

$ awk '/start/ || /Start/ {next} /End/ {print line; line=""; next} {if (line) {line=line":"} line=line$0}' file
a:b:c
a:b:c
x:y:z

Explanation

  • /start/ || /Start/ {next} on lines containing "start" or "Start", skip.
  • /End/ {print line; line=""; next} on lines containing End, print the line variable that contains the loaded information. Delete the value of the var and go to the next line.
  • {if (line) {line=line":"} line=line$0} on the rest of the lines, keep loading data in the line variable. The if condition is to avoid having a trailing :.

The /start/ || /Start/ {next} can be reduced to either of these (thanks Jotne):

/start|Start/ {next}

/(s|S)tart/ {next}
1
votes

If there are always 3 lines between start and end:

grep -iv 'start\|end' file | paste -d: - - -
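
How the pipeline works: each - tells paste to read one line from standard input, so three dashes join every three lines into one row separated by the -d character. The sketch below is a slightly stricter variant under two assumptions: the markers occupy whole lines (-x), so a data line such as "friend" is kept, and the patterns are given with -e instead of \| because \| in a basic regex is a GNU extension. The name join_triples is hypothetical.

```shell
# join_triples is a hypothetical helper name used only for this sketch.
# It still assumes exactly three data lines per block.
join_triples() {
  # -i ignore case, -v invert the match, -x match whole lines only
  grep -ivx -e 'start' -e 'end' "$@" | paste -d: - - -
}
```

Usage: join_triples file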
0
votes
sed -n '/Start/,/End/ {
   /Start/ !{
      /End/ !H
      }
   /End/ {
      s/.*//
      x
      s/\n/:/g
      s/://
      p
      }
   }
/Start/,/End/ !p' YourFile

If both start and Start should match, replace Start with [sS]tart (and End with [eE]nd) in the code.

Explanation

Start sed without printing any output unless explicitly requested (-n)

/Start/,/End/ {

For every block of lines starting with Start and ending with End (on separate lines)

/Start/ !{
          /End/ !H
          }

If the line contains neither Start nor End (the ! negates the match), append it to the hold buffer (a kind of storage)

/End/ {
   s/.*//
   x
   s/\n/:/g
   s/://
   p
   }

When the line that contains End is reached:

  1. Delete the current line (the one with End)
  2. Exchange ( x ) the hold buffer (which holds all the stored lines of the block) with the working buffer (the one that can be manipulated and normally holds the current line)
  3. Replace every newline with : (after the exchange, the buffer holds all the lines separated by newlines)
  4. Remove the first : (the first append inserts a leading newline)
  5. Print the content

    /Start/,/End/ !p

Print every line that is not ( ! ) inside a block between Start and End
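
The hold-buffer steps above can also be arranged so that the hold buffer is reset at each opening marker instead of at each End. The following is only a sketch under stated assumptions, not the script above: markers are whole lines, start or Start both open a block, Start/End pairs are well formed, and lines outside a block are dropped rather than printed. The name sed_join is hypothetical.

```shell
# sed_join is a hypothetical helper name used only for this sketch.
sed_join() {
  sed -n '
    # Opening marker: empty the pattern space, swap it into the hold buffer, start the next cycle.
    /^[Ss]tart$/{
      s/.*//
      x
      d
    }
    # Closing marker: fetch the collected lines, drop the leading newline, join with ":".
    /^End$/{
      x
      s/^\n//
      s/\n/:/g
      p
      d
    }
    # Any other line: append it to the hold buffer.
    H
  ' "$@"
}
```

Usage: sed_join YourFile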

0
votes

Just an alternative approach with GNU awk:

$ gawk -v RS='\0' '{ gsub(/\n/,":"); gsub(/:End:Start:/,"\n"); gsub(/^start:|:End:$/,"") }1' file     
a:b:c
a:b:c
x:y:z

Other awk solutions posted here are fine too.