This GitPro page does summarize the consequence of a git submodule update nicely:
When you run git submodule update, it checks out the specific version of the project, but not within a branch. This is called having a detached head: it means the HEAD file points directly to a commit, not to a symbolic reference.
The issue is that you generally don’t want to work in a detached head environment, because it’s easy to lose changes.
If you do an initial submodule update, commit in that submodule directory without creating a branch to work in, and then run git submodule update again from the superproject without committing in the meantime, Git will overwrite your changes without telling you. Technically you won’t lose the work, but you won’t have a branch pointing to it, so it will be somewhat difficult to retrieve.
Note March 2013:
As mentioned in "git submodule tracking latest", a submodule can now (since Git 1.8.2) track a branch:
# add submodule to track master branch
git submodule add -b master [URL to Git repo];
# update your submodule
git submodule update --remote
# or (with rebase)
git submodule update --rebase --remote
See "git submodule update --remote vs git pull".
MindTooth's answer illustrates a manual update (without local configuration):
git submodule -q foreach git pull -q origin master
In both cases, that will change the submodule references (the gitlink, a special entry in the parent repo index), and you will need to add, commit and push those references from the main repo.
The next time you clone that parent repo, it will populate the submodules to reflect those new SHA1 references.
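As a rough illustration of that add/commit step, here is a minimal sketch using throwaway repositories. The names lib and parent, the temporary directory and the demo identity are assumptions made for the example. The protocol.file.allow=always setting is only there because recent Git versions restrict local-path submodule clones by default. A real project would finish with a git push from the parent repo.

```shell
#!/bin/sh
# Sketch only: "lib" and "parent" are hypothetical throwaway repos.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"

# A stand-in upstream repository with one commit.
git init -q lib
echo "v1" > lib/README
git -C lib add README
git -C lib commit -q -m "lib: first commit"
branch=$(git -C lib symbolic-ref --short HEAD)

# The parent repo adds it as a submodule tracking that branch.
git init -q parent
cd parent
git -c protocol.file.allow=always submodule --quiet add -b "$branch" "$demo/lib" lib
git commit -q -m "add lib submodule"

# Upstream moves forward.
echo "v2" > "$demo/lib/README"
git -C "$demo/lib" commit -q -a -m "lib: second commit"

# Bring the new upstream commit into the submodule...
git -c protocol.file.allow=always submodule --quiet update --remote lib
# ...then record the new gitlink in the parent repo.
git add lib
git commit -q -m "bump lib to latest upstream commit"

recorded=$(git rev-parse HEAD:lib)
upstream=$(git -C "$demo/lib" rev-parse HEAD)
echo "recorded in parent: $recorded"
echo "upstream HEAD:      $upstream"
```

Until the git add and git commit in the parent repo, the new submodule commit only exists in the working tree; a fresh clone of the parent would still get the old reference.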
The rest of this answer details the classic submodule feature (reference to a fixed commit, which is the whole point behind the notion of a submodule).
To avoid this issue, create a branch when you work in a submodule directory with git checkout -b work or something equivalent. When you do the submodule update a second time, it will still revert your work, but at least you have a pointer to get back to.
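A minimal sketch of that advice, assuming a throwaway repository that stands in for the submodule directory (the temporary directory and the demo identity are made up for the example). It checks whether HEAD is detached with git symbolic-ref, then creates the work branch before committing:

```shell
#!/bin/sh
# Sketch only: a throwaway repo stands in for a submodule directory.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

sub=$(mktemp -d)
cd "$sub"
git init -q .
git commit -q --allow-empty -m "initial commit"

# Put HEAD directly on a commit, the state `git submodule update` leaves behind.
git checkout -q --detach

# `git symbolic-ref -q HEAD` exits non-zero when HEAD is not a symbolic reference.
if git symbolic-ref -q HEAD >/dev/null; then
  before="on a branch"
else
  before="detached"
fi
echo "before: $before"

# Create a branch first, then commit, so the work has a name pointing to it.
git checkout -q -b work
git commit -q --allow-empty -m "change made on the work branch"
after=$(git symbolic-ref --short HEAD)
echo "after: on branch $after"
```

If a commit was already made while detached, git reflog in the submodule directory should still list it, and git branch with a new name and that SHA1 gives it a pointer again.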
Switching branches with submodules in them can also be tricky. If you create a new branch, add a submodule there, and then switch back to a branch without that submodule, you still have the submodule directory as an untracked directory.
So, to answer your questions:
can I create branches/modifications and use push/pull just like I would in regular repos, or are there things to be cautious about?
You can create a branch and push modifications.
WARNING (from Git Submodule Tutorial): Always publish (push) the submodule change before publishing (push) the change to the superproject that references it. If you forget to publish the submodule change, others won't be able to clone the repository.
how would I advance the submodule referenced commit from say (tagged) 1.0 to 1.1 (even though the head of the original repo is already at 2.0)?
The page "Understanding Submodules" can help:
Git submodules are implemented using two moving parts:
- the .gitmodules file and
- a special kind of tree object.
These together triangulate a specific revision of a specific repository which is checked out into a specific location in your project.
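Both parts can be inspected directly. The sketch below assumes two throwaway repositories (lib and parent are hypothetical names, as are the temporary directory and the demo identity) and prints the .gitmodules entry alongside the tree entry for the submodule path:

```shell
#!/bin/sh
# Sketch only: "lib" and "parent" are hypothetical throwaway repos.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"

git init -q lib
echo "hello" > lib/README
git -C lib add README
git -C lib commit -q -m "lib: first commit"

git init -q parent
cd parent
# protocol.file.allow=always is only needed because the submodule URL is a local path.
git -c protocol.file.allow=always submodule --quiet add "$demo/lib" lib
git commit -q -m "add lib submodule"

# Part 1: .gitmodules maps the submodule path to a repository URL.
cat .gitmodules
url=$(git config -f .gitmodules submodule.lib.url)

# Part 2: the tree entry for that path has mode 160000 and type "commit".
entry=$(git ls-tree HEAD lib)
echo "$entry"
```

The 160000 entry stores only a commit SHA1; the URL needed to fetch that commit lives in .gitmodules.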
From the git submodule page
you cannot modify the contents of the submodule from within the main project
100% correct: you cannot modify a submodule, only refer to one of its commits.
This is why, when you do modify a submodule from within the main project, you:
- need to commit and push within the submodule (to the upstream module), and
- then go up in your main project, and re-commit (in order for that main project to refer to the new submodule commit you just created and pushed)
A submodule enables you to have a component-based development approach, where the main project only refers to specific commits of other components (here "other Git repositories declared as sub-modules").
A submodule is a marker (commit) to another Git repository which is not bound by the main project development cycle: it (the "other" Git repo) can evolve independently.
It is up to the main project to pick from that other repo whatever commit it needs.
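For the 1.0 to 1.1 question, one way to pick the commit is to check out the tag inside the submodule and then record the result in the main project. This is a sketch under assumptions: lib and parent are throwaway repositories, the tags 1.0, 1.1 and 2.0 are created locally for the demo, and the protocol.file.allow=always setting is only needed because the submodule URL is a local path.

```shell
#!/bin/sh
# Sketch only: "lib" and "parent" are hypothetical throwaway repos.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"

# Upstream component with three tagged releases; its head ends up at 2.0.
git init -q lib
for v in 1.0 1.1 2.0; do
  echo "$v" > lib/VERSION
  git -C lib add VERSION
  git -C lib commit -q -m "release $v"
  git -C lib tag "$v"
done

# The main project starts out pinned to 1.0.
git init -q parent
cd parent
git -c protocol.file.allow=always submodule --quiet add "$demo/lib" lib
git -C lib checkout -q 1.0
git add lib
git commit -q -m "add lib at 1.0"

# Advance the reference to 1.1, ignoring the newer 2.0.
git -C lib fetch -q --tags
git -C lib checkout -q 1.1
git add lib
git commit -q -m "move lib to 1.1"

pinned=$(git rev-parse HEAD:lib)
echo "parent now references: $pinned"
```

The main project ends up referring to the commit tagged 1.1, whatever the head of the component repo is.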
However, should you want to, out of convenience, modify one of those submodules directly from your main project, Git allows you to do that, provided you first publish those submodule modifications to its original Git repo, and then commit your main project referring to a new version of said submodule.
But the main idea remains: referencing specific components which:
- have their own lifecycle
- have their own set of tags
- have their own development
The list of specific commits you are referring to in your main project defines your configuration (this is what Configuration Management is all about, encompassing mere Version Control System).
If a component could really be developed at the same time as your main project (because any modification on the main project would involve modifying the sub-directory, and vice-versa), then it would be a "submodule" no more, but a subtree merge (also presented in the question Transferring legacy code base from cvs to distributed repository), linking the history of the two Git repos together.
Does that help you understand the true nature of Git Submodules?