
I am trying to write a bash script that extracts the multiple "directors" from an XML file and concatenates them separated by a pipe, e.g. Tom Tykwer|Andy Wachowski.

The relevant xml section is:

<directors>
<item>Tom Tykwer</item>
<item>Andy Wachowski</item>
</directors>

In a bash script, the following xmlstarlet commands:

DIRECTORS=$(xmlstarlet sel -t -v "imdbdocument/directors/item" mymovieapi.xml)
echo $DIRECTORS

give me

Tom Tykwer Andy Wachowski

while running this command directly in the terminal:

xmlstarlet sel -t -v "imdbdocument/directors/item" mymovieapi.xml

gives me:

(empty line)
Tom Tykwer
Andy Wachowski

I don't understand why the newlines are being added when I am not specifying the -n option.
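A likely reason the script prints everything on one line is shell quoting rather than xmlstarlet: in `echo $DIRECTORS` the unquoted expansion is split on whitespace (newlines included), and echo rejoins the resulting words with single spaces. A minimal sketch, using printf as a stand-in for xmlstarlet's newline-separated output (an assumption, so it runs without the XML file):

```shell
#!/usr/bin/env bash
# Stand-in for the newline-separated values xmlstarlet prints (assumption).
DIRECTORS=$(printf 'Tom Tykwer\nAndy Wachowski\n')

# Unquoted: bash word-splits on spaces/tabs/newlines (default IFS),
# and echo joins the resulting words with single spaces.
unquoted=$(echo $DIRECTORS)

# Quoted: the value is passed as a single argument, newlines intact.
quoted=$(echo "$DIRECTORS")

echo "unquoted: $unquoted"
printf 'quoted:\n%s\n' "$quoted"
```

This accounts for the spaces in the script's output; the newlines themselves are still produced by xmlstarlet.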

A few of my searches have suggested something like this:

xmlstarlet sel -t -m "imdbdocument/directors" -v "item" -o "|" mymovieapi.xml 

but this just gives me:

Tom Tykwer
Andy Wachowski|
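That attempt most likely fails because -m "imdbdocument/directors" matches the single <directors> element, so the loop body runs only once: -v "item" prints both items (newline-separated, as above) and -o "|" is written once, after them. A sketch of the same idea with the loop moved onto each <item> instead. It uses a hypothetical sample file (the <imdbdocument> root is inferred from the XPath, since the full file is not shown) and strips the leftover trailing pipe with bash's ${var%|} expansion:

```shell
#!/usr/bin/env bash
# Hypothetical sample file; root element inferred from the question's XPath.
xml_file=$(mktemp)
trap 'rm -f "$xml_file"' EXIT
cat > "$xml_file" <<'EOF'
<imdbdocument>
  <directors>
    <item>Tom Tykwer</item>
    <item>Andy Wachowski</item>
  </directors>
</imdbdocument>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
  # Loop over each <item> (not over <directors>), appending "|" after each value.
  raw=$(xmlstarlet sel -T -t -m "imdbdocument/directors/item" -v . -o "|" "$xml_file")
  # Remove the single trailing pipe left after the last item.
  DIRECTORS=${raw%|}
  echo "$DIRECTORS"
else
  echo "xmlstarlet not installed; skipping" >&2
fi
```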

I'd appreciate any help I can get. I'm seeing this behaviour with xmlstarlet 1.3.1 on Debian Wheezy and xmlstarlet 1.5.0 on Xubuntu 13.10.

npostavs: You should use --text (or -T) since you don't want XML output.
hillbillydetective: I have tried the --text option but it hasn't made any difference to the output.

2 Answers

4
votes

A solution using only xmlstarlet:

xmlstarlet sel -T -t -v '/imdbdocument/directors/item[1]' -m '/imdbdocument/directors/item[position()>1]' -o '|' -v . mymovieapi.xml

I tested with version 1.5, but I believe it should work with earlier versions too.


Alternatively, using --if instead of two XPath expressions:

xmlstarlet sel -T -t -m '/imdbdocument/directors/item' --if 'position() > 1' -o '|' -b -v . mymovieapi.xml

-b is short for --break; it ends the current statement (a conditional or loop), like } in C.
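To compare both commands side by side, here is a small self-contained sketch. It writes a hypothetical sample file (the <imdbdocument> root element is inferred from the XPath, since the full mymovieapi.xml is not shown) and only runs xmlstarlet if it is installed:

```shell
#!/usr/bin/env bash
# Hypothetical sample file; root element inferred from the XPath used above.
xml_file=$(mktemp)
trap 'rm -f "$xml_file"' EXIT
cat > "$xml_file" <<'EOF'
<imdbdocument>
  <directors>
    <item>Tom Tykwer</item>
    <item>Andy Wachowski</item>
  </directors>
</imdbdocument>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
  # Variant 1: print the first item, then "|" + value for every later item.
  DIRECTORS=$(xmlstarlet sel -T -t \
    -v '/imdbdocument/directors/item[1]' \
    -m '/imdbdocument/directors/item[position()>1]' -o '|' -v . \
    "$xml_file")

  # Variant 2: a single loop; --if writes the separator before every item
  # except the first, and -b closes the --if so -v . runs for every item.
  DIRECTORS_IF=$(xmlstarlet sel -T -t \
    -m '/imdbdocument/directors/item' \
    --if 'position() > 1' -o '|' -b \
    -v . \
    "$xml_file")

  echo "$DIRECTORS"
  echo "$DIRECTORS_IF"
else
  echo "xmlstarlet not installed; skipping" >&2
fi
```

With xmlstarlet installed, both variables should end up as Tom Tykwer|Andy Wachowski.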

1
votes

You can try piping the output through awk:

xmlstarlet sel -t -v "imdbdocument/directors/item" mymovieapi.xml |  awk '1' ORS='|'

which outputs:

|Tom Tykwer|Andy Wachowski|

Or, if you do not want the leading and trailing pipes (|):

xmlstarlet sel -t -v "imdbdocument/directors/item" mymovieapi.xml | awk 'NF>0 {if (i++) printf "|"; printf "%s", $0 } END { printf "\n" }'

which gives:

Tom Tykwer|Andy Wachowski
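The awk filter can be tried without xmlstarlet by feeding it a stand-in for the output shown in the question (an assumption: an empty first line, then one director per line). The sketch also shows a swapped-in alternative, not from the answer, that uses grep to drop empty lines and `paste -s -d '|'` to join the rest:

```shell
#!/usr/bin/env bash
# Stand-in for the xmlstarlet output from the question (assumption):
# an empty first line, then one director per line.
xml_output=$(printf '\nTom Tykwer\nAndy Wachowski\n')

# awk join from the answer: skip records with no fields, print "|" before
# every printed record except the first, and end with one newline.
joined_awk=$(printf '%s\n' "$xml_output" |
  awk 'NF>0 {if (i++) printf "|"; printf "%s", $0 } END { printf "\n" }')

# Alternative: remove empty lines, then paste -s joins all lines
# from stdin (-) into one, separated by the -d delimiter.
joined_paste=$(printf '%s\n' "$xml_output" | grep -v '^$' | paste -s -d '|' -)

echo "$joined_awk"
echo "$joined_paste"
```

One difference: awk's NF>0 also skips whitespace-only lines, while grep -v '^$' only removes completely empty ones.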