2
votes

In my input file, columns are tab-separated, and the values inside each column are comma-separated.

I want to print the first column once for each comma-separated value in the second column.

Mary,Tom,David   cat,dog
Kevin   bird,rabbit
John    cat,bird
...

For each record, I want to split the second column (e.g. cat,dog) into an array [cat, dog] and print each element next to the first column. For that line alone, the output would be:

Mary,Tom,David   cat
Mary,Tom,David   dog

The output for the whole file should be:

Mary,Tom,David   cat
Mary,Tom,David   dog
Kevin   bird
Kevin   rabbit
John    cat
John    bird
...

Any suggestions on how to do this with awk or sed? Thanks.

4
Nothing in this question makes any sense. - 123
@123 I need the file in that format for later processing. I just wonder whether there is an easier way using bash... or should I give more examples? - once
He means that the question is unclear, not that your reason for using such a format is in doubt. The output does not show what your explanation specifies. What do tab-separated and comma-separated mean to you? Commas and tabs are both generally column separators (CSV family). - NeronLeVelu
So you need to convert \t to \n and , to \t or several spaces? - Yaron
@NeronLeVelu, I have changed the example to make it friendlier; I hope you can now relate the input to the output. - once

4 Answers

4
votes

With awk

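awk '{n=split($2,a,",");for(i=1;i<=n;i++)print $1"\t"a[i]}' file

Splits the second column on commas, then prints the first column together with each split value. The loop runs over the indices 1 to n because awk does not guarantee the iteration order of for(i in a).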
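
As a sketch of the same idea in a slightly more defensive form: the variant below sets the field separator to a tab explicitly (the question says the columns are tab-separated) and walks the split array by index so the values come out in input order. The function name expand_second_column is made up for illustration and is not part of the original answer.

```shell
# Hypothetical helper name, used only for this example.
# Reads tab-separated lines from the given files, or from stdin if none are given.
expand_second_column() {
  awk -F'\t' -v OFS='\t' '{
    n = split($2, a, ",")        # n is the number of comma-separated pieces
    for (i = 1; i <= n; i++)     # index loop keeps the pieces in input order
      print $1, a[i]             # first column, a tab, then one piece
  }' "$@"
}

# Usage with two tab-separated sample lines from the question:
printf 'Mary,Tom,David\tcat,dog\nKevin\tbird,rabbit\n' | expand_second_column
```

With -F'\t' a first column that contains spaces would stay in one piece, which the default whitespace splitting would not do.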

Also in sed (GNU sed, since it relies on the \t and \n escapes)

sed ':1;s/\(\([^\n]*\t\)[^\n]*\),\{1,\}/\1\n\2/;t1' file
2
votes

This might work for you (GNU sed):

sed -r 's/^((\S+\s+)[^,]+),/\1\n\2/;P;D' file

The process can be broken down into three commands: substitute, print and delete. The substitution replaces the first remaining , in the second field with a newline followed by the first field and its trailing spaces. P then prints up to and including the first newline, and D deletes up to and including that newline and restarts the script without reading new input. The key command is D, which keeps re-running the preceding commands until the pattern space is entirely empty.
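
To make the substitute, print, delete cycle easier to follow, here is the same script spread over several lines with a comment per command. This is only an annotated sketch: the wrapper name split_with_sed is invented for the example, and it assumes GNU sed (for -r, \S and \s).

```shell
# Hypothetical wrapper name, used only for this example. Requires GNU sed.
split_with_sed() {
  sed -r '
    # Replace the first comma in field 2 with a newline plus a copy of
    # field 1 and its trailing whitespace (captured as group 2).
    s/^((\S+\s+)[^,]+),/\1\n\2/
    # Print the pattern space up to the first newline.
    P
    # Delete up to the first newline; if text remains, rerun the script
    # on it without reading a new input line.
    D
  ' "$@"
}

# Usage: one line with three values in the second column.
printf 'John\tcat,bird,fish\n' | split_with_sed
```

Each pass peels one value off the front of the second field, so a line with three values goes through the script three times before the next input line is read.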

1
votes

process.sh

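#!/bin/bash

# Read each line: the first field goes into col_one, the rest into col_two.
# -r stops read from treating backslashes as escape characters.
while read -r col_one col_two; do
  # Split the second column on commas into the array "explode".
  IFS=, read -r -a explode <<< "$col_two"
  for val in "${explode[@]}"; do
    printf '%s\t%s\n' "$col_one" "$val"
  done
done < "$1"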

with input.txt as

Mary,Tom,David   cat,dog
Kevin   bird,rabbit
John    cat,bird

output

$ ./process.sh input.txt 
Mary,Tom,David  cat
Mary,Tom,David  dog
Kevin   bird
Kevin   rabbit
John    cat
John    bird
1
votes

with awk

awk '{n = split($2, aEl, ","); for (Eli = 1; Eli <= n; Eli++) print $1 "\t" aEl[Eli]}' YourFile

with sed

sed 'H;s/.*//;x
:cycle
   s/\(\n\)\([^[:cntrl:]]*[[:blank:]]\{1,\}\)\([^[:cntrl:]]*\),\([^,]*\)/\1\2\3\1\2\4/;t cycle
s/.//' YourFile