I wish to write several rules that extract the contents of tar archives to produce a number of files that are then used as input dependencies for other rules. I wish this to work even with parallel builds. I'm not using recursive make.

First up, sorry for the marathon question, but I don't think I can explain it well in a shorter form.

Think of untarring a collection of source files and then compiling them with rules stored outside of the archive to produce various build artefacts that are, in turn, used further. I am not looking for alternative arrangements that sidestep the problem. Just take it for granted that I have good reason to do this. :)

I'll demonstrate my issue with a contrived example. Of course, I started with something basic:

TAR := test.tar.bz2

CONTENTS := $(addprefix out/,$(filter-out %/,$(shell tar -tf $(TAR))))

out: $(TAR)
        rm -rf out
        mkdir out
        tar -xvf $< -C out --touch || (rm -rf out; exit 1)

$(CONTENTS): out

sums: $(CONTENTS)
        md5sum $^ > $@

.DELETE_ON_ERROR:
.DEFAULT_GOAL := all
.PHONY: all clean
all: sums
clean:
        rm -rf out sums

The thinking here is that since $(CONTENTS) are all of the files in the archive, and they all depend on out, then to run the sums target we need to end up extracting the archive.

Unfortunately, this doesn't (always) work with a parallel invocation after a previous build when only test.tar.bz2 has been updated. make may check the timestamps of $(CONTENTS) before running the out rule, so it thinks each of the sources is older than sums and concludes there is nothing to do:

$ make clean
rm -rf out sums

$ make -j6
rm -rf out
mkdir out
tar -xvf test.tar.bz2 -C out --touch || (rm -rf out; exit 1)
data.txt
file
weird.file.name
dir/
dir/another.c
dir/more
md5sum out/data.txt out/file out/weird.file.name out/dir/another.c out/dir/more > sums

$ touch test.tar.bz2 
$ make -j6
rm -rf out
mkdir out
tar -xvf test.tar.bz2 -C out --touch || (rm -rf out; exit 1)
data.txt
file
weird.file.name
dir/
dir/another.c
dir/more

Oops! The sums rule didn't run!

So, the next attempt was to tell make that the one untar rule actually does make all the $(CONTENTS) directly. This seems better since we're telling make what's really going on, so it knows when to forget any cached timestamps for targets when they are remade through their rule.

First, let's look at what seems to work, and then I'll get to my problem:

TAR := test.tar.bz2

CONTENTS := $(addprefix out/,$(filter-out %/,$(shell tar -tf $(TAR))))

# Here's the change.
$(addprefix %/,$(patsubst out/%,%,$(CONTENTS))): $(TAR)
        rm -rf out
        mkdir out
        tar -xvf $< -C out --touch || (rm -rf out; exit 1)

sums: $(CONTENTS)
        md5sum $^ > $@

.DELETE_ON_ERROR:
.DEFAULT_GOAL := all
.PHONY: all clean
all: sums
clean:
        rm -rf out sums

In this case, we've effectively got a rule that says:

%/data.txt %/file %/weird.file.name %/dir/another.c %/dir/more: test.tar.bz2
        rm -rf out
        mkdir out
        tar -xvf $< -C out --touch || (rm -rf out; exit 1)

Now you can see one of the reasons I forced the output into an out directory: to give me a place to use the % so I could use a pattern rule. I am forced to use a pattern rule even though there isn't a strong pattern here because it is the only way make can be told that one rule creates multiple output files from a single invocation. (Isn't it?)

This works if any of the files are touched (not important for my use case) or if the test.tar.bz2 file is touched, even in parallel builds, because make has the information it needs: running this recipe makes all these files and will change all their timestamps.

For example, after a previous successful build:

$ touch test.tar.bz2 
$ make -j6
rm -rf out
mkdir out
tar -xvf test.tar.bz2 -C out --touch || (rm -rf out; exit 1)
data.txt
file
weird.file.name
dir/
dir/another.c
dir/more
md5sum out/data.txt out/file out/weird.file.name out/dir/another.c out/dir/more > sums
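
As a point of comparison with the multiple-target pattern rule above: GNU make 4.3 and later also support grouped targets, written with `&:`, which declare that one recipe run produces every listed target without needing a `%` in the names. The following is only a sketch under that assumption. It will not behave this way on GNU Make 4.0, and it reuses the same file names as the example above. Recipe lines must start with a tab.

```make
# Sketch only: needs GNU make 4.3+ (grouped targets, "&:").
TAR := test.tar.bz2

CONTENTS := $(addprefix out/,$(filter-out %/,$(shell tar -tf $(TAR))))

# "&:" tells make that a single run of this recipe creates all of $(CONTENTS).
$(CONTENTS) &: $(TAR)
	rm -rf out
	mkdir out
	tar -xf $< -C out --touch || (rm -rf out; exit 1)

sums: $(CONTENTS)
	md5sum $^ > $@

.DELETE_ON_ERROR:
.DEFAULT_GOAL := all
.PHONY: all clean
all: sums
clean:
	rm -rf out sums
```

Because the targets are listed explicitly rather than matched by a pattern, several archives with similar contents would only need distinct extraction directories, not distinct patterns.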

So, if I have a working solution, what's my problem?

Well, I have many of these archives to extract, each with their own set of $(CONTENTS). I can manage that, but the trouble comes in writing a nice pattern rule. Since each archive needs its own rule defined, the patterns for each rule must not overlap even if the archives have similar (or identical) content. That means the output paths for the extracted files must be made unique for each archive, as in:

TAR := test.tar.bz2

CONTENTS := $(addprefix out.$(TAR)/,$(filter-out %/,$(shell tar -tf $(TAR))))

$(patsubst out.$(TAR)/%,out.\%/%,$(CONTENTS)): $(TAR)
        rm -rf out.$(TAR)
        mkdir out.$(TAR)
        tar -xvf $< -C out.$(TAR) --touch || (rm -rf out.$(TAR); exit 1)

sums: $(CONTENTS)
        md5sum $^ > $@

.DELETE_ON_ERROR:
.DEFAULT_GOAL := all
.PHONY: all clean
all: sums
clean:
        rm -rf out.$(TAR) sums

So, this can be made to work with the right target-specific variables, but it now means that the extraction points are all "ugly" in a way that is very specifically tied to how the makefile is constructed:

$ make -j6
rm -rf out.test.tar.bz2
mkdir out.test.tar.bz2
tar -xvf test.tar.bz2 -C out.test.tar.bz2 --touch || (rm -rf out.test.tar.bz2; exit 1)
data.txt
file
weird.file.name
dir/
dir/another.c
dir/more
md5sum out.test.tar.bz2/data.txt out.test.tar.bz2/file out.test.tar.bz2/weird.file.name out.test.tar.bz2/dir/another.c out.test.tar.bz2/dir/more > sums

The next natural step I took was to try to combine static pattern rules with the multiple-targets-via-pattern-rule approach. This would let me keep the patterns very general, but limit their application to a specific set of targets:

TAR := test.tar.bz2

CONTENTS := $(addprefix out/,$(filter-out %/,$(shell tar -tf $(TAR))))

# Same as second attempt, except "$(CONTENTS):" static pattern prefix
$(CONTENTS): $(addprefix %/,$(patsubst out/%,%,$(CONTENTS))): $(TAR)
        rm -rf out
        mkdir out
        tar -xvf $< -C out --touch || (rm -rf out; exit 1)

sums: $(CONTENTS)
        md5sum $^ > $@

.DELETE_ON_ERROR:
.DEFAULT_GOAL := all
.PHONY: all clean
all: sums
clean:
        rm -rf out sums

Great! Except it doesn't work:

$ make
Makefile:5: *** multiple target patterns.  Stop.
$ make --version
GNU Make 4.0

So, is there a way to use multiple target patterns with a static pattern rule? If not, is there another way to achieve what I have in the last working example above, but without the constraint on the output paths to make unique patterns? I basically need to tell make "when you unpack this archive, all of the files in this directory (which I am willing to enumerate if necessary) have new timestamps". A solution where I can force make to restart if and only if it unpacks an archive would also be acceptable, but less ideal.
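
For the "restart" variant mentioned above, one possible sketch relies on the documented GNU make behaviour that make re-executes itself after remaking an included makefile. The stamp file name `.extracted.mk` is hypothetical, and the whole thing is an untested outline rather than a recommended solution. Recipe lines must start with a tab.

```make
TAR := test.tar.bz2

# Hypothetical stamp file that is also included as an (empty) makefile.
# When make has to remake it, it extracts the archive and then restarts,
# so no stale timestamps from before the extraction are kept.
# Skipped for "make clean" so that cleaning does not trigger an extraction.
ifneq ($(MAKECMDGOALS),clean)
-include .extracted.mk
endif

.extracted.mk: $(TAR)
	rm -rf out
	mkdir out
	tar -xf $< -C out --touch || (rm -rf out; exit 1)
	touch $@

CONTENTS := $(addprefix out/,$(filter-out %/,$(shell tar -tf $(TAR))))

sums: $(CONTENTS)
	md5sum $^ > $@

.DELETE_ON_ERROR:
.DEFAULT_GOAL := all
.PHONY: all clean
all: sums
clean:
	rm -rf out sums .extracted.mk
```

One caveat of this outline: the extracted files have no rule of their own, so if out is deleted by hand while the stamp remains, make reports a missing target until the stamp is removed as well.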

Does that original makefile always fail for you in that test? Over a run of a thousand it succeeded for me seven times. - Etan Reisner

1 Answer

The problem with your original makefile is that you have a collision in names. You have a target (non-phony) named out and a directory named out. make thinks those are the same thing and gets very confused.

(Note: I added .SUFFIXES: to your first makefile to cut down on some noise but it doesn't change anything. The -r and -R flags disable make built-in rules and variables also for noise reduction.)

$ make clean
....
$ make -j6
....
$ touch test.tar.bz2
$ make -rRd -j6
....
Considering target file 'all'.
 File 'all' does not exist.
  Considering target file 'sums'.
    Considering target file 'out/data.txt'.
     Looking for an implicit rule for 'out/data.txt'.
     No implicit rule found for 'out/data.txt'.
      Considering target file 'out'.
        Considering target file 'test.tar.bz2'.
         Looking for an implicit rule for 'test.tar.bz2'.
         No implicit rule found for 'test.tar.bz2'.
         Finished prerequisites of target file 'test.tar.bz2'.
        No need to remake target 'test.tar.bz2'.
       Finished prerequisites of target file 'out'.
       Prerequisite 'test.tar.bz2' is older than target 'out'.
      No need to remake target 'out'.
     Finished prerequisites of target file 'out/data.txt'.
     Prerequisite 'out' is older than target 'out/data.txt'.
    No recipe for 'out/data.txt' and no prerequisites actually changed.
    No need to remake target 'out/data.txt'.
.... # This following set of lines repeats for all the other files in the tarball.
    Considering target file 'out/file'.
     Looking for an implicit rule for 'out/file'.
     No implicit rule found for 'out/file'.
      Pruning file 'out'.
     Finished prerequisites of target file 'out/file'.
     Prerequisite 'out' is older than target 'out/file'.
    No recipe for 'out/file' and no prerequisites actually changed.
    No need to remake target 'out/file'.
....
   Finished prerequisites of target file 'sums'.
   Prerequisite 'out/data.txt' is older than target 'sums'.
   Prerequisite 'out/file' is older than target 'sums'.
   Prerequisite 'out/weird.file.name' is older than target 'sums'.
   Prerequisite 'out/dir/more' is older than target 'sums'.
   Prerequisite 'out/dir/another.c' is older than target 'sums'.
  No need to remake target 'sums'.
 Finished prerequisites of target file 'all'.
Must remake target 'all'.
Successfully remade target file 'all'.
make: Nothing to be done for 'all'.

The main details here are these two lines:

  1. Considering target file 'out'.
  2. Prerequisite 'out' is older than target 'out/data.txt'

The out directory doesn't matter here. We don't care about it (and make doesn't handle directory prerequisites well anyway, because a directory's modification time doesn't mean the same thing as a file's). More to the point, you don't want make to skip creating out/data.txt just because the out directory target already existed (and seemed older).

You can "fix" this by marking the out target as .PHONY, but that makes make extract the tarball every time it runs. (You already run tar -tf every time you run make, so if you were going to do this it would probably be better to combine those two steps.)

That said, I wouldn't do that. I think the simplest solution to this problem is the "atomic rules" idea from John Graham-Cumming, built up and explained here.

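# sp holds a single space. This form does not rely on "sp +=" adding a
# space to an empty variable, which GNU make 4.3 and later no longer do.
empty :=
sp := $(empty) $(empty)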

sentinel = .sentinel.$(subst $(sp),_,$(subst /,_,$1))

atomic = $(eval $1: $(call sentinel,$1) ; @:)$(call sentinel,$1): $2 ; touch $$@ $(foreach t,$1,$(if $(wildcard $t),,$(shell rm -f $(call sentinel,$1))))

.PHONY: all

all: a b

$(call atomic,a b,c d)
        touch a b
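
Applied to the tarball from the question, the atomic helper might be used roughly as follows. This is an untested sketch: the helper definitions are repeated so the fragment stands alone, the file names come from the question, and the sentinel file name is built from every extracted path, so it can get very long for large archives. Recipe lines must start with a tab.

```make
TAR := test.tar.bz2

CONTENTS := $(addprefix out/,$(filter-out %/,$(shell tar -tf $(TAR))))

# A single space, used to turn the target list into one sentinel file name.
empty :=
sp := $(empty) $(empty)

sentinel = .sentinel.$(subst $(sp),_,$(subst /,_,$1))

atomic = $(eval $1: $(call sentinel,$1) ; @:)$(call sentinel,$1): $2 ; touch $$@ $(foreach t,$1,$(if $(wildcard $t),,$(shell rm -f $(call sentinel,$1))))

.DELETE_ON_ERROR:
.DEFAULT_GOAL := all
.PHONY: all clean
all: sums

# The sentinel is the only real target of the extraction recipe, so tar runs
# once even under -j. Every extracted file depends on the sentinel through a
# no-op recipe ("@:"), so make looks at its timestamp again afterwards.
$(call atomic,$(CONTENTS),$(TAR))
	rm -rf out
	mkdir out
	tar -xf $(TAR) -C out --touch || (rm -rf out; exit 1)

sums: $(CONTENTS)
	md5sum $^ > $@

clean:
	rm -rf out sums .sentinel.*
```

Since the extracted files are named explicitly, the output directory does not need a unique pattern per archive, only a unique location.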

You could probably also do this with an extraction stamp file (with the tarball as its prerequisite): extract the tarball to a "shadow" directory and copy or link the files to their "final" location (a build/$file: shadow/$file rule). I think that would be a bit more complicated, though.
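
A rough sketch of that stamp-and-shadow idea, assuming the file names from the question: the names `shadow`, `shadow.stamp`, and `FILES` are made up for illustration, and each output file here depends on the stamp rather than on its individual shadow file, which is a simplification. Recipe lines must start with a tab.

```make
TAR := test.tar.bz2

# Regular files listed in the archive (directory entries end in "/").
FILES    := $(filter-out %/,$(shell tar -tf $(TAR)))
CONTENTS := $(addprefix out/,$(FILES))

# Hypothetical stamp file: one ordinary target for the whole extraction,
# so make runs tar once even under -j.
shadow.stamp: $(TAR)
	rm -rf shadow
	mkdir shadow
	tar -xf $< -C shadow --touch
	touch $@

# Static pattern rule limited to this archive's files. Each file has its own
# recipe, so make checks its timestamp again after the copy.
$(CONTENTS): out/%: shadow.stamp
	mkdir -p $(@D)
	cp shadow/$* $@

sums: $(CONTENTS)
	md5sum $^ > $@

.DELETE_ON_ERROR:
.DEFAULT_GOAL := all
.PHONY: all clean
all: sums
clean:
	rm -rf out shadow shadow.stamp sums
```

The cost is a second copy of every extracted file. Replacing cp with a link would avoid that, at the price of the outputs carrying the shadow files' timestamps.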