BASH Script to iterate through a list of IDs in an XML file and print/output the name to shell or output file?

Question

I'm looking to iterate through a list of ID numbers that match ID numbers in an XML file and, using BASH (and AWK), print the line below each matching ID, either to the shell or redirected to a third output file (output.txt).

Here is the breakdown:

ID_list.txt (shortened for this example - it has 100 IDs)

XML_example.txt (thousands of entries)

<book>
  <ID>4414</ID>
  <name>Name of first book</name>
</book>
<book>
  <ID>4561</ID>
  <name>Name of second book</name>
</book>

I'd like the output of the script to be the names of the 100 IDs from the first file:

Name of first book
Name of second book
etc

I believe it's possible to do this using BASH and AWK with a for loop (for each ID in file 1, find the corresponding name in file 2). I think you can grep for each ID number in turn and then print the line below it using AWK. Even if the output looked like this, I can remove the XML tags afterwards:

<name>Name of first book</name>
<name>Name of second book</name>
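The grep-then-print-the-next-line idea from the question can be sketched roughly as follows. This is a minimal sketch, not code from the thread: the function names names_by_grep and strip_name_tags are hypothetical, and it assumes each <ID> sits on its own line with the matching <name> on the very next line, exactly as in XML_example.txt. Note that grep -A is a GNU/BSD extension rather than POSIX.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each ID, grep for its <ID> line and print the line
# after it. Assumes <name> always sits on the line directly below <ID>.
names_by_grep() {
    local ids_file=$1 xml_file=$2 id
    while IFS= read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue               # skip blank lines
        # -F: match a fixed string; -A1: also print the line after each match.
        # awk 'NR == 2' keeps only the line following the first match.
        grep -F -A1 -- "<ID>$id</ID>" "$xml_file" | awk 'NR == 2'
    done < "$ids_file"
}

# Hypothetical helper: strip indentation and the <name>...</name> tags.
strip_name_tags() {
    sed -e 's/^[[:space:]]*<name>//' -e 's#</name>[[:space:]]*$##'
}

# Usage:
# names_by_grep ID_list.txt XML_example.txt | strip_name_tags > output.txt
```

Running one grep per ID rescans the whole XML file each time, which is fine for 100 IDs but gets slower as the ID list grows.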

It's on a Linux server but I can port it over to PowerShell on Windows. I think BASH/GREP and AWK are the way to go.
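Since AWK is on the table anyway, a single pass over both files avoids running grep once per ID. This sketch swaps in the common NR == FNR idiom (it is not something proposed in the thread): the first file fills an array of wanted IDs, and the second file is scanned for <ID> and <name> elements. It still treats the XML as lines of text, so it assumes each element's start and end tags are on the same line and that <ID> comes before <name>; the function name names_by_awk is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one awk pass over the ID list, then the XML file.
names_by_awk() {
    awk '
        # NR == FNR is only true while reading the first file (the ID list).
        NR == FNR { if ($1 != "") wanted[$1] = 1; next }

        # Remember the most recent ID; "<ID>" is 4 chars, "</ID>" is 5.
        match($0, /<ID>[^<]*<\/ID>/) {
            id = substr($0, RSTART + 4, RLENGTH - 9)
        }

        # No "next" above, so an ID and name on the same line both match.
        # "<name>" is 6 chars, "</name>" is 7.
        match($0, /<name>[^<]*<\/name>/) {
            if (id in wanted) print substr($0, RSTART + 6, RLENGTH - 13)
            id = ""
        }
    ' "$1" "$2"
}

# Usage:
# names_by_awk ID_list.txt XML_example.txt > output.txt
```

Names come out in XML file order rather than ID-list order, and IDs missing from the XML are silently skipped.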

Can someone help me script this?

Show us what you tried and what specifically you're having problems with - otherwise it looks like you want us to write it for you. — user2062950
@user2062950, you are right, apologies for not posting my version prior to asking. I was using while read; do and a for i in ID_list.txt solution, but Dogbane's solution(s) below were cleaner. — Mike J
It really isn't that terrible using BASH_REMATCH, though still obviously simpler in a language that includes a package to do it for you. — Reinstate Monica Please
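For reference, a pure-bash version along the lines of the BASH_REMATCH comment might look like this. It is an illustrative sketch, not the commenter's code: names_by_rematch is a hypothetical name, it needs bash 4+ for the associative array, and it assumes an <ID> precedes its <name> and neither element is split across lines.

```shell
#!/usr/bin/env bash
# Hypothetical helper using [[ =~ ]] and BASH_REMATCH (bash 4+ for local -A).
names_by_rematch() {
    local ids_file=$1 xml_file=$2 id line current_id=""
    local -A wanted=()
    local id_re='<ID>([^<]*)</ID>' name_re='<name>([^<]*)</name>'

    # Load the wanted IDs into an associative array for constant-time lookups.
    while IFS= read -r id || [ -n "$id" ]; do
        [ -n "$id" ] && wanted[$id]=1
    done < "$ids_file"

    while IFS= read -r line || [ -n "$line" ]; do
        if [[ $line =~ $id_re ]]; then
            current_id=${BASH_REMATCH[1]}
        fi
        if [[ $line =~ $name_re ]]; then
            # Print the name only if the most recent ID is in the list.
            # The -n check runs first, so an empty key is never looked up.
            if [[ -n $current_id ]] && [[ -n ${wanted[$current_id]+set} ]]; then
                printf '%s\n' "${BASH_REMATCH[1]}"
            fi
            current_id=""
        fi
    done < "$xml_file"
}

# Usage:
# names_by_rematch ID_list.txt XML_example.txt > output.txt
```

It prints names in the order they appear in the XML file, not the order of the ID list. A bash read loop is also considerably slower than awk on a file with thousands of entries.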

larsks · Accepted Answer · 2014-01-21T18:00:27

Given an ID, you can get the name using XPath expressions and the xmllint command, like this:

id=4414
name=$(xmllint --xpath "string(//book[ID[text()='$id']]/name)" books.xml)

So with this, you could write something like:

while IFS= read -r id; do
    name=$(xmllint --xpath "string(//book[ID[text()='$id']]/name)" books.xml)
    echo "$name"
done < id_list.txt
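To send the results to output.txt as the question asks, the loop can be wrapped in a function and redirected. This is a sketch; names_by_xpath is a hypothetical name. It assumes the real XML file has a single root element wrapping the <book> entries (xmllint rejects a file with several top-level <book> elements, like the shortened XML_example.txt), and that IDs contain no single quotes, since each ID is pasted straight into the XPath expression.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the xmllint loop above.
names_by_xpath() {
    local ids_file=$1 xml_file=$2 id name
    # "|| [ -n "$id" ]" also handles a last line with no trailing newline.
    while IFS= read -r id || [ -n "$id" ]; do
        [ -n "$id" ] || continue   # skip blank lines
        name=$(xmllint --xpath "string(//book[ID[text()='$id']]/name)" "$xml_file")
        printf '%s\n' "$name"
    done < "$ids_file"
}

# Usage (file names from the question):
# names_by_xpath ID_list.txt XML_example.txt > output.txt
```

An ID with no matching <book> yields an empty line, which keeps output.txt aligned line-for-line with the ID list. Each call re-parses the XML file, so 100 IDs means 100 parses.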

Unlike solutions built on awk, grep, and friends, this uses an actual XML parser. That means that while most line-oriented solutions would break if they encountered:

<book><ID>4561</ID><name>Name of second book</name></book>

...this would work just fine.

xmllint is part of libxml2 and is packaged for most distributions (on Debian and Ubuntu it ships in the libxml2-utils package).

Note also that GNU awk (gawk) can parse XML natively through the xml extension from the gawkextlib project (formerly XMLgawk).


4 Answers