11
votes

Is it possible, using some smart piping and coding, to merge YAML files recursively? In PHP, I build an array from them (each module can add or update config nodes in the system).

The goal is an export shell script that merges the config files from all the separate module folders into big merged files. That is faster and more efficient, and the customer does not need the modularity when we deploy new versions via FTP, for example.

It should behave like the PHP function: array_merge_recursive

The filesystem structure is like this:

mod/a/config/sys.yml
mod/a/config/another.yml
mod/b/config/sys.yml
mod/b/config/another.yml
mod/c/config/totally-new.yml
sys/config/sys.yml

Config looks like:

date:
   format:
      date_regular: %d-%m-%Y

And a module may, say, do this:

date:
   format:
      date_regular: regular dates are boring
      date_special: !!!%d-%m-%Y!!!

So far, I have:

#!/bin/bash
#........
# Copy the whole project, then overlay each module's files onto sys/
cp -R "$dir_project/" "$dir_to/"
for i in "$dir_project"/mod/*/
do
    cp -R "${i}/." "$dir_to/sys/"
done

This, of course, overwrites any existing config files on each pass of the loop. (The rest of the system files are uniquely named.)

Basically, I need a YAML parser for the command line and an array_merge_recursive-like alternative, then a YAML writer to output the merged result. I fear I have to start learning Python because bash won't cut it on this one.

6
YAML is too complicated to parse and process in Bash. Use some tool that has that as a feature. - Ondra Žižka

6 Answers

15
votes

You can use Perl, for example. The following one-liner:

perl -MYAML::Merge::Simple=merge_files -MYAML -E 'say Dump merge_files(@ARGV)' f1.yaml f2.yaml

for the following input files:

f1.yaml

date:
  epoch: 2342342343
  format:
    date_regular: "%d-%m-%Y"

f2.yaml

date:
  format:
    date_regular: regular dates are boring
    date_special: "!!!%d-%m-%Y!!!"

prints the merged result...

---
date:
  epoch: 2342342343
  format:
    date_regular: regular dates are boring
    date_special: '!!!%d-%m-%Y!!!'

Because @Caleb pointed out that the module is now developer-only, here is a replacement. It is a bit longer and uses two (but commonly available) modules:

perl -MYAML=LoadFile,Dump -MHash::Merge::Simple=merge -E 'say Dump(merge(map{LoadFile($_)}@ARGV))' f1.yaml f2.yaml

produces the same as above.

3
votes

No.

Bash has no support for nested data structures (its maps are integer->string or string->string only), and thus cannot represent arbitrary YAML documents in-memory.

Use a more powerful language for this task.
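As an illustration, here is a minimal Python sketch of the kind of recursive merge the question asks for. It assumes the YAML files have already been parsed into plain nested dicts by a YAML library of your choice (that step is not shown). The function name `deep_merge` and the sample variables are made up for this example, and the "later value wins" rule is one possible policy, not the only one.

```python
from copy import deepcopy


def deep_merge(base, override):
    """Return a new dict with override merged into base, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the later value wins.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical in-memory equivalents of sys/config/sys.yml and one module's sys.yml.
system_config = {"date": {"format": {"date_regular": "%d-%m-%Y"}}}
module_config = {
    "date": {
        "format": {
            "date_regular": "regular dates are boring",
            "date_special": "!!!%d-%m-%Y!!!",
        }
    }
}

merged_config = deep_merge(system_config, module_config)
print(merged_config)
```

The inputs are left untouched, so the same function can be applied repeatedly, folding one module after another into the system config.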

3
votes

I recommend yq -m. yq is a Swiss Army knife for YAML, very similar to jq (for JSON).

2
votes

Late to the party, but I also wrote a tool for this:

https://github.com/benprofessionaledition/yamlmerge

It's almost identical to Ondra's JVM tool (they're even both called "yaml merge"), the key difference being that it's written in Go, so it compiles to a ~3MB binary with no external dependencies. We use it in GitLab CI containers.

1
votes

Bash is a bit of a stretch for this (it could be done but it would be error prone). If all you want to do is call a few things from a bash shell (as opposed to actually scripting the merge using bash functions) then you have a few options.

I noticed there is a Java-based yaml-merge tool, but that didn't suit my fancy very much, so I kept looking. In the end I cobbled together something using two tools: yaml2json and jq.

Warning: Since JSON's capabilities are only a subset of YAML's, this is not a lossless process for complex YAML structures. It will work for a lot of simple key/value/sequence scenarios but will muck things up if your input YAML is too fancy. Test it on your data types to see if it does what you expect.

  1. Use yaml2json to convert your inputs to JSON:

    yaml2json input1.yml > input1.json
    yaml2json input2.yml > input2.json
    
  2. Use jq to iterate over the objects and merge them recursively (see this question and answers for details). List files in reverse order of importance as values in later ones will clobber earlier ones:

    jq -s 'reduce .[] as $item ({}; . * $item)' input1.json input2.json > merged.json
    
  3. Take it back to YAML:

    json2yaml merged.json > merged.yml
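One detail worth checking in step 2: in jq, `+` on two objects is a shallow merge (a key from the right-hand object replaces the whole value on the left), while `*` merges nested objects recursively. The following Python sketch mirrors that difference. The JSON contents are assumed stand-ins for the converted input files, and `merge_recursive` is a name made up for this illustration.

```python
import json


def merge_recursive(left, right):
    """Merge right into left; where both hold an object under a key, merge those too."""
    result = dict(left)
    for key, value in right.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge_recursive(result[key], value)
        else:
            result[key] = value
    return result


# Assumed stand-ins for input1.json and input2.json.
input1 = json.loads('{"date": {"epoch": 2342342343, "format": {"date_regular": "%d-%m-%Y"}}}')
input2 = json.loads('{"date": {"format": {"date_special": "!!!%d-%m-%Y!!!"}}}')

# Shallow merge, like jq's `+`: the second "date" object replaces the first one.
shallow = {**input1, **input2}
# Recursive merge, like jq's `*`: nested keys from both inputs are kept.
recursive = merge_recursive(input1, input2)

print(json.dumps(shallow, indent=2))
print(json.dumps(recursive, indent=2))
```

With the shallow variant the `epoch` and `date_regular` keys disappear, which is rarely what you want when modules only override part of a config tree.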
    

If you want to script this, of course the usual bash mechanisms are your friend. And if you happen to be in GNU-Make like I was, something like this will do the trick:

.SECONDEXPANSION:
merged.yml: input1.yml input2.yml
    json2yaml <(jq -s 'reduce .[] as $$item ({}; . * $$item)' $(foreach YAML,$^,<(yaml2json $(YAML)))) > $@

1
votes

There is a tool that merges YAML files: merge-yaml. It supports full YAML syntax and can expand references to environment variables.

I forked it and released it as an executable .jar.
https://github.com/OndraZizka/yaml-merge

Usage:

./bin/yaml-merge.sh ./*.yml > result.yml

It is written in Java so you need Java (I think 8 and newer) installed.
(Btw, if someone wants to contribute, that would be great.)


In general, merging YAML is not a trivial thing, in the sense that the tool doesn't always know what you really want to do. You can merge structures in multiple ways. Consider this example:

foo:
   bar: bar2
   baz: 
      - baz1
---
foo:
   bar: bar1
   baz: 
      - baz2
   goo: gaz1

A few questions and unknowns arise:

  • Should the 2nd foo tree replace the first one?
  • Should the 2nd bar replace the first one, or merge to an array?
  • Should the 2nd baz array replace the 1st, or be merged?
    • If merged, then how? Should duplicates be kept, or should the tool keep the values unique? Should the order be managed in some way?

And so on. One may object that there could be some default, but real-world requirements often call for different operations.

Other tools and libraries that handle data structures deal with this by defining a schema with metadata; for instance, JAXB and Jackson use Java annotations.
For this general tool, that is not an option, so the user would have to control this through a) the input data, or b) parameters. a) is impractical and sometimes impossible, b) is tedious and needs a fancy syntax like jq has.
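To make option b) concrete, here is a small Python sketch of a merge whose list handling is controlled by a parameter. It works on already-parsed documents (plain dicts and lists). The function name `merge`, the `list_strategy` parameter, and its three values are invented for this illustration and do not describe how yaml-merge or any other tool behaves.

```python
STRATEGIES = ("replace", "append", "unique")


def merge(first, second, list_strategy="replace"):
    """Recursively merge two parsed documents.

    list_strategy (hypothetical) decides what happens when both sides
    hold a list under the same key:
      "replace": the second list wins
      "append":  concatenate, keeping duplicates
      "unique":  concatenate, dropping repeats, keeping first-seen order
    """
    if list_strategy not in STRATEGIES:
        raise ValueError(f"unknown list strategy: {list_strategy!r}")
    if isinstance(first, dict) and isinstance(second, dict):
        merged = dict(first)
        for key, value in second.items():
            if key in first:
                merged[key] = merge(first[key], value, list_strategy)
            else:
                merged[key] = value
        return merged
    if isinstance(first, list) and isinstance(second, list):
        if list_strategy == "append":
            return first + second
        if list_strategy == "unique":
            combined = []
            for item in first + second:
                if item not in combined:
                    combined.append(item)
            return combined
    # Scalars, type mismatches, and "replace" for lists: the second value wins.
    return second


# The two documents from the example above, as parsed structures.
doc1 = {"foo": {"bar": "bar2", "baz": ["baz1"]}}
doc2 = {"foo": {"bar": "bar1", "baz": ["baz2"], "goo": "gaz1"}}

for strategy in STRATEGIES:
    print(strategy, merge(doc1, doc2, list_strategy=strategy))
```

Even this small sketch only answers the list question. Whether `bar` should be overridden or collected into an array would need yet another switch, which is where the parameter approach starts to get tedious.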

That said, Caleb's answer might be what you need. However, that solution reduces your data to what JSON is capable of, so you will lose comments, the various ways to represent long strings, usage of JSON within YAML, etc., which is not too user-friendly.