Assuming your input is fixed-width as shown in your example, then using GNU awk for FIELDWIDTHS (the trailing * needs gawk 4.2 or later):
$ cat tst.awk
BEGIN { FIELDWIDTHS="7 15 6 *"; OFS="\t" }
{
    delete vals
    numCols = NF
    numRows = 0
    for (colNr=1; colNr<=numCols; colNr++) {
        # split each column on commas; the longest list sets the row count
        n = split($colNr,f,/,/)
        if (n > numRows) numRows = n
        for (rowNr=1; rowNr<=n; rowNr++) {
            val = f[rowNr]
            gsub(/^[[:space:]]+|[[:space:]]+$/,"",val)    # trim fixed-width padding
            vals[rowNr,colNr] = val
        }
    }
    for (rowNr=1; rowNr<=numRows; rowNr++) {
        printf "%s", vals[1,1]                            # repeat the key on every output row
        for (colNr=2; colNr<=numCols; colNr++) {
            printf "%s%s", OFS, vals[rowNr,colNr]
        }
        print ""
    }
}
$ awk -f tst.awk file
Gene1	human1	dog1	cat1
Gene1	human2		cat2
Gene2		dog2	cat3
Gene3	human3		cat4
Gene3			cat5
The above will work for any number of fields in your input. If the input is tab-separated or anything else then replace FIELDWIDTHS=... with FS=whatever-your-separator-is. If you want the output to just look tabular regardless of what the fields contain, rather than be tab-separated, then pipe it to column -s$'\t' -t or give the printf formats a width (e.g. %-10s instead of %s).
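As a rough sketch of the printf-with-a-width option mentioned above, assuming semicolon-separated input and an arbitrary default column width of 8 (expand_padded is a hypothetical helper name for illustration, not part of the script above):

```shell
# Hypothetical helper: same row-expansion logic as tst.awk, but each column
# except the last is left-justified and padded to a fixed width instead of
# being joined with OFS. Assumes ';' separates the input fields.
expand_padded() {
    awk -F';' -v width="${1:-8}" '
    BEGIN { fmt = "%-" width "s" }      # e.g. "%-8s": left-justify, pad to 8
    {
        delete vals
        numRows = 0
        for (colNr=1; colNr<=NF; colNr++) {
            n = split($colNr, f, /,/)
            if (n > numRows) numRows = n
            for (rowNr=1; rowNr<=n; rowNr++) vals[rowNr,colNr] = f[rowNr]
        }
        for (rowNr=1; rowNr<=numRows; rowNr++) {
            for (colNr=1; colNr<=NF; colNr++) {
                val = (colNr == 1 ? vals[1,1] : vals[rowNr,colNr])
                cellFmt = (colNr < NF ? fmt : "%s\n")   # no padding after last column
                printf cellFmt, val
            }
        }
    }'
}
```

It would be called as, say, expand_padded 12 < file. A fixed width only lines up if no value is longer than that width; piping to column -t avoids having to pick one.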
Using semi-colons as the separator so you can see them (again just set FS and OFS to be whatever you actually use):
$ cat file
Gene1;human1,human2;dog1;cat1,cat2
Gene2;;dog2;cat3
Gene3;human3;;cat4,cat5
$ cat tst.awk
BEGIN { FS=OFS=";" }
{
    delete vals
    numCols = NF
    numRows = 0
    for (colNr=1; colNr<=numCols; colNr++) {
        # split each column on commas; the longest list sets the row count
        n = split($colNr,f,/,/)
        if (n > numRows) numRows = n
        for (rowNr=1; rowNr<=n; rowNr++) {
            val = f[rowNr]
            vals[rowNr,colNr] = val
        }
    }
    for (rowNr=1; rowNr<=numRows; rowNr++) {
        printf "%s", vals[1,1]                            # repeat the key on every output row
        for (colNr=2; colNr<=numCols; colNr++) {
            printf "%s%s", OFS, vals[rowNr,colNr]
        }
        print ""
    }
}
$ awk -f tst.awk file
Gene1;human1;dog1;cat1
Gene1;human2;;cat2
Gene2;;dog2;cat3
Gene3;human3;;cat4
Gene3;;;cat5
If you used ; instead of tab as the separator in your examples then you could get a solution for ; and simply change the ; in the code to tab before running it on your real data. That would remove all questions about what the separator is and where it occurs for us reading your example and trying to help you. - Ed Morton
If ; is used as separator, the table (especially for the input, where the lengths vary) becomes messy and hard to interpret. So I think the current way (put them in a nice table but indicate the separator) is the best way. - XiaokangZH