244
votes

I need to merge two Git repositories into a brand new, third repository. I've found many descriptions of how to do this using a subtree merge (for example Jakub Narębski's answer on How do you merge two Git repositories?) and following those instructions mostly works, except that when I commit the subtree merge all of the files from the old repositories are recorded as new added files. I can see the commit history from the old repositories when I do git log, but if I do git log <file> it shows only one commit for that file - the subtree merge. Judging from the comments on the above answer, I'm not alone in seeing this problem but I've found no published solutions for it.

Is there any way to merge repositories and leave individual file history intact?

I'm not using Git, but in Mercurial I'd first do a convert if necessary to fix the file paths of the repos to be merged, then force-pull one repo into the target to get the changesets, and then merge the different branches. This is tested and works ;) Maybe this helps to find a solution for Git as well. Compared to the subtree-merge approach, I guess the difference is the convert step, where the history is rewritten instead of just mapping a path (if I understand correctly). This ensures a smooth merge without any special handling of file paths. - Lucero
I also found this question helpful: stackoverflow.com/questions/1683531/… - nacross
I created a follow-up question. Might be interesting: Merge two Git repositories and keep the master history: stackoverflow.com/questions/42161910/… - Dimitri Dewaele
The automated solution that worked for me was stackoverflow.com/a/30781527/239408 - xverges

8 Answers

287
votes

It turns out that the answer is much simpler if you're simply trying to glue two repositories together and make it look like it was that way all along rather than manage an external dependency. You simply need to add remotes to your old repos, merge them to your new master, move the files and folders to a subdirectory, commit the move, and repeat for all additional repos. Submodules, subtree merges, and fancy rebases are intended to solve a slightly different problem and aren't suitable for what I was trying to do.

Here's an example PowerShell script to glue two repositories together:

# Assume the current directory is where we want the new repository to be created
# Create the new repository
git init

# Before we do a merge, we have to have an initial commit, so we'll make a dummy commit
git commit --allow-empty -m "Initial dummy commit"

# Add a remote for and fetch the old repo
# (the '--fetch' (or '-f') option will make git immediately fetch commits to the local repo after adding the remote)
git remote add --fetch old_a <OldA repo URL>

# Merge the files from old_a/master into new/master
git merge old_a/master --allow-unrelated-histories

# Move the old_a repo files and folders into a subdirectory so they don't collide with the other repo coming later
mkdir old_a
dir -exclude old_a | %{git mv $_.Name old_a}

# Commit the move
git commit -m "Move old_a files into subdir"

# Do the same thing for old_b
git remote add -f old_b <OldB repo URL>
git merge old_b/master --allow-unrelated-histories
mkdir old_b
dir -exclude old_a,old_b | %{git mv $_.Name old_b}
git commit -m "Move old_b files into subdir"

Obviously you could instead merge old_b into old_a (which becomes the new combined repo) if you’d rather do that – modify the script to suit.

If you want to bring over in-progress feature branches as well, use this:

# Bring over a feature branch from one of the old repos
git checkout -b feature-in-progress
git merge -s recursive -Xsubtree=old_a old_a/feature-in-progress

That's the only non-obvious part of the process - that's not a subtree merge, but rather an argument to the normal recursive merge that tells Git that we renamed the target and that helps Git line everything up correctly.

I wrote up a slightly more detailed explanation here.
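The script above is PowerShell. For bash or another POSIX shell on Linux or macOS, a rough equivalent could look like the sketch below. The function name merge_repo_into_subdir and the default branch name master are assumptions, not part of the original answer. It also assumes the current branch already has a commit (such as the dummy commit above) and that none of the old repo's top-level entries clash with existing paths or share the subdirectory's name. Rather than excluding the folders created so far, it moves exactly the top-level entries of the merged branch, as listed by git ls-tree.

```shell
# Hypothetical POSIX-shell port of the PowerShell script above.
# Usage: merge_repo_into_subdir <name> <repo-url> [branch]
# <name> is used both as the remote name and as the target subdirectory.
merge_repo_into_subdir() {
    name="$1"
    url="$2"
    branch="${3:-master}"   # assumption: the old repo's main branch is "master"

    # Add a remote for the old repo and fetch its commits right away.
    git remote add --fetch "$name" "$url" || return 1

    # Merge the old repo's history into the current branch.
    git merge --allow-unrelated-histories --no-edit "$name/$branch" || return 1

    # Move exactly the top-level entries that came from the old repo into <name>/.
    # -z / -0 keep names containing spaces or newlines intact.
    mkdir -p "$name"
    git ls-tree -z --name-only "$name/$branch" |
        xargs -0 -I {} git mv {} "$name/" || return 1

    # Commit the move.
    git commit -q -m "Move $name files into subdir"
}
```

For the two repos in the script, you would run merge_repo_into_subdir old_a <OldA repo URL> and then merge_repo_into_subdir old_b <OldB repo URL> in the new repository, after the dummy commit.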

161
votes

Here's a way that doesn't rewrite any history, so all commit IDs will remain valid. The end-result is that the second repo's files will end up in a subdirectory.

  1. Add the second repo as a remote:

    cd firstgitrepo/
    git remote add secondrepo username@servername:andsoon
    
  2. Make sure that you've downloaded all of the secondrepo's commits:

    git fetch secondrepo
    
  3. Create a local branch from the second repo's branch:

    git branch branchfromsecondrepo secondrepo/master
    
  4. Move all its files into a subdirectory:

    git checkout branchfromsecondrepo
    mkdir subdir/
    git ls-tree -z --name-only HEAD | xargs -0 -I {} git mv {} subdir/
    git commit -m "Moved files to subdir/"
    
  5. Merge the second branch into the first repo's master branch:

    git checkout master
    git merge --allow-unrelated-histories branchfromsecondrepo
    

Your repository will have more than one root commit, but that shouldn't pose a problem.
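The question's symptom is that git log <file> shows only one commit. With the approach above, a plain git log on the file's new path still stops at the commit that moved it into subdir/; adding --follow lets Git detect the rename and keep walking into the original history. The helper below is a hypothetical sketch (its name and output format are made up for illustration) that prints the number of root commits and how many commits git log lists for one file with and without --follow.

```shell
# Hypothetical helper: check_imported_history <file-under-subdir>
# Prints three counts for the current repository:
#   roots  - root commits reachable from HEAD
#   plain  - commits listed by "git log -- <file>"
#   follow - commits listed by "git log --follow -- <file>"
check_imported_history() {
    file="$1"
    # When no history is rewritten, each merged-in repository keeps its own root commit.
    roots=$(git rev-list --max-parents=0 HEAD | wc -l)
    # Without --follow, the history stops at the commit that moved the file.
    plain=$(git log --format=%H -- "$file" | wc -l)
    # With --follow, Git detects the rename and walks the old history too.
    follow=$(git log --follow --format=%H -- "$file" | wc -l)
    # $((...)) strips the padding some wc implementations add.
    echo "roots=$((roots))"
    echo "plain=$((plain))"
    echo "follow=$((follow))"
}
```

Run it from the repository root with a path under subdir/, for example check_imported_history subdir/some-file (a placeholder path).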

15
votes

Say you want to merge repository a into b (I'm assuming they're located alongside one another):

cd b
git remote add a ../a
git fetch a
git merge --allow-unrelated-histories a/master
git remote remove a

In case you want to put a into a subdirectory, do the following before the commands above:

cd a
git filter-repo --to-subdirectory-filter a
cd ..

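For this you need git-filter-repo installed (filter-branch is discouraged). Note that git filter-repo rewrites the history of a in place and, by default, refuses to run unless the repository looks like a fresh clone, so run it in a fresh clone of a and use that clone's path in the git remote add command.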

An example of merging 2 big repositories, putting one of them into a subdirectory: https://gist.github.com/x-yuri/9890ab1079cf4357d6f269d073fd9731

More on it here.

14
votes

A few years have passed and there are well-founded, highly voted solutions, but I want to share mine because it is a bit different: I wanted to merge two remote repositories into a new one without deleting the history of the previous repositories.

  1. Create a new repository on GitHub.

  2. Download the newly created repo and add the old remote repository.

    git clone https://github.com/alexbr9007/Test.git
    cd Test
    git remote add OldRepo https://github.com/alexbr9007/Django-React.git
    git remote -v
    
  3. Fetch all the commits from the old repo so that a remote-tracking branch (OldRepo/master) is created.

    git fetch OldRepo
    git branch -a
    

  4. In the master branch, do a merge to combine the old repo with the newly created one.

    git merge remotes/OldRepo/master --allow-unrelated-histories
    

  5. Create a new folder for all the content that came from OldRepo, move those files into it with git mv, and commit the move.

  6. Finally, push the combined repository to GitHub; after that you can safely delete OldRepo from GitHub.
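Step 5 has no commands. A possible sketch, assuming the merge from step 4 is done, that OldRepo's branch is master, and that the two repos have no top-level paths in common: the hypothetical helper below moves exactly the top-level entries of OldRepo/master into a folder, so anything the new GitHub repo already contained (a README created with it, for instance) stays where it is.

```shell
# Hypothetical helper for step 5: move_merged_files <ref> <folder>
# Moves every top-level entry of <ref> (e.g. OldRepo/master) into <folder>/
# and commits the move. Assumes the merge from step 4 has already happened
# and that <ref> has no top-level entry named <folder>.
move_merged_files() {
    ref="$1"
    folder="$2"
    mkdir -p "$folder"
    # -z / -0 keep file names with spaces or newlines intact.
    git ls-tree -z --name-only "$ref" |
        xargs -0 -I {} git mv {} "$folder/" || return 1
    git commit -q -m "Move files from $ref into $folder/"
}
```

For example, move_merged_files OldRepo/master Django-React (the folder name is only an illustration) followed by git push origin master covers steps 5 and 6.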

Hope this can be useful for anyone dealing with merging remote repositories.

7
votes

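Please have a look at using

git rebase --root --rebase-merges --onto <newbase>

to link two histories early on in their lives. (--rebase-merges replaces the older --preserve-merges option, which was removed in Git 2.34.)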

If you have paths that overlap, fix them up with

git filter-branch --index-filter

When you use log, make sure it "finds copies harder":

git log -C -C

That way you will find any movement of files between paths.
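On current Git versions, the rebase suggested above could be wrapped like this. The helper name graft_history is hypothetical; it replays every commit of <branch>, starting from its root commit, on top of <newbase>, with --rebase-merges keeping any merge commits. It assumes the two histories touch disjoint paths (otherwise fix the paths first, as described above), and it rewrites <branch>, so the replayed commits get new hashes.

```shell
# Hypothetical helper: graft_history <newbase> <branch>
# Replays all of <branch>'s commits, starting from its root commit, on top of
# <newbase>, so <branch> ends up sharing <newbase>'s root instead of its own.
# --rebase-merges keeps merge commits (it replaces the removed --preserve-merges).
graft_history() {
    newbase="$1"
    branch="$2"
    git rebase --root --rebase-merges --onto "$newbase" "$branch"
}
```

For instance, after git fetch <other repo URL> master:imported, running graft_history master imported leaves imported with the same root commit as master.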

7
votes

I turned @Flimm's solution into a Git alias (added to my ~/.gitconfig):

[alias]
 mergeRepo = "!mergeRepo() { \
  [ $# -ne 3 ] && echo \"Three parameters required, <remote URI> <new branch> <new dir>\" && exit 1; \
  git remote add newRepo \"$1\"; \
  git fetch newRepo; \
  git branch \"$2\" newRepo/master; \
  git checkout \"$2\"; \
  mkdir -vp \"${GIT_PREFIX}$3\"; \
  git ls-tree -z --name-only HEAD | xargs -0 -I {} git mv {} \"${GIT_PREFIX}$3\"/; \
  git commit -m \"Moved files to '${GIT_PREFIX}$3'\"; \
  git checkout master; git merge --allow-unrelated-histories --no-edit -s recursive -X no-renames \"$2\"; \
  git branch -D \"$2\"; git remote remove newRepo; \
}; \
mergeRepo"
3
votes

This bash function clones a remote repository and merges it, with its history, into a subdirectory of the current repository:

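function git-add-repo
{
    repo="$1"
    # Target subdirectory, with any trailing slash removed
    dir="$(echo "$2" | sed 's/\/$//')"
    path="$(pwd)"

    # Clone into a temporary directory and derive a remote name from its path
    tmp="$(mktemp -d)"
    remote="$(echo "$tmp" | sed 's/\///g'| sed 's/\./_/g')"

    git clone "$repo" "$tmp"
    cd "$tmp"

    # Rewrite every commit of the clone so that its files live under "$dir"
    git filter-branch --index-filter '
        git ls-files -s |
        sed "s,\t,&'"$dir"'/," |
        GIT_INDEX_FILE="$GIT_INDEX_FILE.new" git update-index --index-info &&
        mv "$GIT_INDEX_FILE.new" "$GIT_INDEX_FILE"
    ' HEAD

    # Fetch the rewritten history into the original repo and merge it
    # ("git remote add -f" already fetches, so no separate pull is needed;
    # assumes the cloned repo's default branch is master)
    cd "$path"
    git remote add -f "$remote" "file://$tmp/.git"
    git merge --allow-unrelated-histories -m "Merge repo $repo into master" --edit "$remote/master"

    # Clean up the temporary remote and clone
    git remote remove "$remote"
    rm -rf "$tmp"
}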

How to use:

cd current/package
git-add-repo https://github.com/example/example dir/to/save

Note: this script rewrites the imported commits. Authors and dates are preserved, but the commits get new hashes, so pushing this rewritten history over an existing branch on a remote server is only possible with a force push, which also rewrites the commits on the server. So please make backups before running it.

Profit!

2
votes

Follow these steps to embed one repo in another, ending up with a single Git history that merges both histories.

  1. Clone both the repos you want to merge.

git clone git@github.com:user/parent-repo.git

git clone git@github.com:user/child-repo.git

  2. Go to the child repo.

cd child-repo/

  3. Run the command below, replacing the path my/new/subdir (3 occurrences) with the directory where you want the child repo to live.

git filter-branch --prune-empty --tree-filter '
if [ ! -e my/new/subdir ]; then
    mkdir -p my/new/subdir
    git ls-tree --name-only $GIT_COMMIT | xargs -I files mv files my/new/subdir
fi'

  4. Go to the parent repo.

cd ../parent-repo/

  5. Add a remote to the parent repo, pointing to the child repo's path.

git remote add child-remote ../child-repo/

  6. Fetch the child repo.

git fetch child-remote

  7. Merge the histories.

git merge --allow-unrelated-histories child-remote/master

If you check git log in the parent repo now, it should include the child repo's commits. git log --decorate also labels the fetched branch tip as child-remote/master, which shows where those commits came from.

The article below helped me with embedding one repo in another and merging both Git histories into one:

http://ericlathrop.com/2014/01/combining-git-repositories/

Hope this helps. Happy Coding!