I have a few thousand PDFs that I need merged based on filename.
Named like:
Lastname, Firstname_12345.pdf
Instead of overwriting or appending, our software appends a number/datetime to the pdf if there are additional pages like:
Lastname, Firstname_12345_201305160953344627.pdf
For all the ones that don't have a second (or third) pdf the script doesn't need to touch. But, for all the ones that have multiples, they need to be merged into a new file *_merged.pdf? and the originals deleted.
I gave this my best effort and this is what I have so far.
#! /bin/bash
# list all pdfs to show shortest name first
LIST=$(ls -r *.pdf)
for x in "$LIST"
# Remove .pdf extension. merge pdfs. delete originals.
do
y=${x%%.*}
pdftk "$y"*.pdf cat output "$y"_merged.pdf
find "$y"*.pdf -type f ! -iname "*_merged.pdf" -delete
done
This script works to a certain extent. It will merge and delete the originals, but it doesn't have anything in it to skip ones that don't need anything appended to them, and when I run it in a folder with several test files it stops after one file. Can anyone point me in the right direction?