2
votes

I have a question about how git pulls changes from the remote, and how much history it brings along.

I'm considering following a gitFlow workflow for my project. We are 80 developers, and we will be integrating our changes from feature branches into the develop branch by means of pull requests, so that code review happens first.

We will need to (locally) rebase our feature branches on (top of) develop, so that we have all the latest develop changes integrated. Hence, we will be pulling develop often. Here, I don't want to fetch other teammates' feature branches, nor their commit history.

Now, if I pull develop, will this operation bring in commit history that happened on other feature branches, if those commits are reachable (through a merge commit) from develop?

Thanks in advance :-)

EDIT: I might not have been clear enough:

  1. We use rebase locally, so that pull requests against the develop branch are mergeable. We don't use merge, as it might "pollute" feature branches when performing code review. If the pull request is accepted, we then merge it with a non-fast-forward commit.

  2. I know I can "git fetch origin develop". Here is my question: will git pull origin develop just "fetch" the blue commits or also the green ones? See figure git-pull-

2
Yes, it will. Rebasing your feature on develop or merging develop into feature will have the same end result with regard to any new changes from develop being brought into your feature branch. By the way, you casually mentioned rebasing without giving a compelling reason for using it over merging. Rebasing is useful when you want to keep the history of develop linear, but you may not have a need for this. - Tim Biegeleisen
I edited the question to be more clear. - forti
we will merge with a non fast forward commit ... the whole point of rebasing feature on develop is so that you can fast forward develop with all the new commits from feature. Otherwise, you might as well just merge. - Tim Biegeleisen
I agree to a point. To me the whole point of rebasing 'feature' on develop is that code review is "easier": (1) we don't pollute the feature branch with other commits that do not belong to that feature, and (2) we can perform a 2-way diff between develop and the feature branch and hence take advantage of GitHub's pull-request diff view, as the common ancestor is the latest develop commit, so we do not need a 3-way diff view (which GitHub does not provide). What about edit #2? - forti

2 Answers

1
votes

I started on a complete answer, but it got way too long.

To answer just a few specifics, your concerns are real but slightly misguided (not your fault, as much Git documentation is terrible). The crucial issue is not so much what git fetch fetches;[1] it is what is in the commit graph of the commits you merge with git merge, and which commits get copied when you choose to run git rebase, which depends, again, on the commit graph and on the arguments you supply to git rebase.

The key concept is reachability. Names like origin/master (which git fetch updates) make commits reachable, but commits (which git fetch brings in) also make other commits reachable. A reachable commit makes the entire chain of commits "before" that commit reachable. Merge commits, which list more than one parent commit ID, make two (or more) chains of commits reachable.
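To make reachability concrete, here is a minimal sketch in a throwaway repository. The branch names, commit messages and the demo identity are made up for illustration. It only shows that a merge commit records two parents, and that a commit stays reachable through that merge even after the branch name that used to point at it is deleted.

```shell
set -e
# Throwaway identity so commits work in a clean environment (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/develop
git commit -q --allow-empty -m "base"

# One commit on a feature branch, merged into develop with a real merge commit.
git checkout -q -b feature
git commit -q --allow-empty -m "feature work"
feature_tip=$(git rev-parse feature)
git checkout -q develop
git merge -q --no-ff -m "merge feature" feature

# Delete the name; the commit itself is untouched.
git branch -q -D feature

# The merge commit lists two parent IDs.
parent_count=$(git rev-list --parents -n 1 develop | awk '{ print NF - 1 }')

# merge-base --is-ancestor exits 0 when the first commit is reachable from the second.
if git merge-base --is-ancestor "$feature_tip" develop; then
  reachable=yes
else
  reachable=no
fi

echo "parents of develop's tip: $parent_count"
echo "feature commit reachable from develop: $reachable"
```

So the name "feature" was only ever a convenience; what keeps the commit alive is the parent link stored in the merge commit.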


[1] Of course, what git fetch doesn't fetch can't possibly be reached (in your copy of the repo), since it does not exist (in your copy of the repo). I suspect that's what you are aiming for here, but it's difficult to achieve in general, and unnecessary anyway.


Remember that (1) each commit is identified by its SHA-1 hash ID, (2) each commit contains the hash ID(s) of its parent commit(s), and (3) branch names are just names for one commit ID. The branch name gets a new ID stuffed into it frequently, to grow the branch (to add a regular or merge commit), or to point to commits copied by rebase.

Then, remember that git rebase works by copying commits. The copies have new, different IDs:

          A--B--C       [original mybranch, before rebase]
         /
...--o--o
         \
          o--o           <-- origin/theirbranch
              \
               A'-B'-C'   <-- mybranch [after rebase]

This is guaranteed to be fine as long as no one else has names (branch or tag names) or commits that point to any of the original commits A, B, or C. If they do have such names, those existing names may, or may not, continue to point back to the originals, not to the new copies. Even that is fine as long as you don't use them now. If and when the names are updated to point to new commits, the old ones become irrelevant as long as no still-reachable commits point to the old commits. If existing commits point to "outdated" commits, though, those commits will continue to point to them forever, since commits are permanent.[2]
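A minimal sketch of that copying, in a throwaway repository (file names, branch names and the demo identity are assumptions for illustration). After the rebase, mybranch names a new commit with the same message but a different ID and a different parent, while the original commit still exists in the object database.

```shell
set -e
# Throwaway identity so commits work in a clean environment (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/develop
echo base > base.txt
git add base.txt
git commit -q -m "base"

# mybranch forks from develop and adds commit A.
git checkout -q -b mybranch
echo A > a.txt
git add a.txt
git commit -q -m "A"
old_tip=$(git rev-parse mybranch)

# Meanwhile develop moves on.
git checkout -q develop
echo more > more.txt
git add more.txt
git commit -q -m "develop moves on"

# Rebase mybranch onto the new develop: A is copied to A'.
git checkout -q mybranch
git rebase -q develop
new_tip=$(git rev-parse mybranch)
new_parent=$(git rev-parse "mybranch^")
develop_tip=$(git rev-parse develop)

old_subject=$(git log -1 --format=%s "$old_tip")
new_subject=$(git log -1 --format=%s "$new_tip")

# cat-file -e exits 0 if the object is still present in the repository.
if git cat-file -e "$old_tip"; then old_still_exists=yes; else old_still_exists=no; fi

echo "A  (original): $old_tip  \"$old_subject\""
echo "A' (copy):     $new_tip  \"$new_subject\""
echo "original still in object database: $old_still_exists"
```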


[2] No Git object can ever change. This is a fundamental guarantee that Git makes. However, all Git objects, including commits, that are completely unreachable are eventually removed. Git has a "garbage collector", git gc, that does this. It's a bit complicated as there are numerous grace period tricks to keep objects around: everything gets 14 days by default, and references (including branch, tag, and remote-tracking branch names) may have reflog entries, which make otherwise-unreachable commits reachable again. The reflog entries themselves persist for either 30 days or 90 days by default, depending on yet another reachability computation, comparing the current hash value in the reference to the hash in the reflog entry. The garbage collector is normally invoked automatically whenever Git thinks this might be a good idea.


On fetch

For instance, suppose that your git fetch brings in, to your repository, origin/BobsBranch and it points to some commits:

          B1-B2-B3    <-- origin/BobsBranch
         /
...--o--o             <-- origin/develop
         \
          C1-C2-C3    <-- my_independent_work

You can rebase your work whenever you like. Meanwhile Bob can rebase BobsBranch (though he may need to force-push the result to the server). Let's say he throws out those three commits entirely in favor of one new B4 commit. You run git fetch and pick up a new, different origin/BobsBranch; your repository now has:

          B4          <-- origin/BobsBranch
         /
        | B1-B2-B3    [a reflog entry for origin/BobsBranch]
        |/
...--o--o             <-- origin/develop
         \
          C1-C2-C3    <-- my_independent_work

The reflog-only commits won't show up in git log --all or gitk --all views, and as long as you never use any of these B* commits, they do not harm you in any way (well, they do take up a bit of space in your repository).
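Here is a small sketch of that situation with two throwaway repositories. The directory names "server" and "client", the branch contents and the demo identity are hypothetical. It rewrites BobsBranch on the server, fetches again, and then checks whether the discarded commit is visible through --all versus through the reflogs.

```shell
set -e
# Throwaway identity so commits work in a clean environment (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)

# "server" plays the shared repository.
git init -q "$top/server"
cd "$top/server"
git symbolic-ref HEAD refs/heads/develop
git commit -q --allow-empty -m "base"
git checkout -q -b BobsBranch
git commit -q --allow-empty -m "B1"
old_tip=$(git rev-parse BobsBranch)

# "client" is your repository; the first fetch picks up B1.
git init -q "$top/client"
cd "$top/client"
git remote add origin "$top/server"
git fetch -q origin

# Bob rewrites his branch: B1 is thrown away in favour of B4.
cd "$top/server"
git reset -q --hard develop
git commit -q --allow-empty -m "B4"

# The second fetch force-updates origin/BobsBranch.
cd "$top/client"
git fetch -q origin

# grep -c prints the match count; "|| true" keeps set -e happy when it is 0.
visible=$(git rev-list --all | grep -c "$old_tip" || true)
in_reflog=$(git rev-list --all --reflog | grep -c "$old_tip" || true)

echo "B1 visible via --all:          $visible"
echo "B1 visible via --all --reflog: $in_reflog"
```

B1 is still stored in the client, but only the reflog entry for origin/BobsBranch keeps it around.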

If you would rather not bring them over at all, harmless as they are, you can tell git fetch which branches to fetch instead of letting it fetch everything. When you run the git pull convenience command with a remote and a branch name (as in git pull origin develop), git pull runs git fetch with instructions to bring over only that one branch's reachable commits, so that usually avoids bringing them over, unless, of course, they're reachable from something your Git does need, based on the one branch tip.
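Applied to the question as asked, a sketch with two throwaway repositories looks like this (the names "server", "client", feature_merged and feature_unmerged are made up, as is the demo identity). Fetching only develop brings over the feature commit that was merged into develop, because it is reachable from develop's tip, and leaves the unmerged feature commit behind.

```shell
set -e
# Throwaway identity so commits work in a clean environment (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)
git init -q "$top/server"
cd "$top/server"
git symbolic-ref HEAD refs/heads/develop
git commit -q --allow-empty -m "base"

# A feature that has been merged into develop with a merge commit.
git checkout -q -b feature_merged
git commit -q --allow-empty -m "merged feature work"
merged=$(git rev-parse HEAD)
git checkout -q develop
git merge -q --no-ff -m "merge feature_merged" feature_merged

# A feature that is still in progress and not merged anywhere.
git checkout -q -b feature_unmerged
git commit -q --allow-empty -m "unmerged feature work"
unmerged=$(git rev-parse HEAD)

git init -q "$top/client"
cd "$top/client"
git remote add origin "$top/server"
# This is the fetch half of "git pull origin develop".
git fetch -q origin develop

# cat-file -e exits 0 only if the object exists in this repository.
if git cat-file -e "$merged" 2>/dev/null; then has_merged=yes; else has_merged=no; fi
if git cat-file -e "$unmerged" 2>/dev/null; then has_unmerged=yes; else has_unmerged=no; fi

echo "client has merged feature commit:   $has_merged"
echo "client has unmerged feature commit: $has_unmerged"
```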

On merge

A "bad" case occurs when you merge in a commit that "reaches" a commit that is later copied by rebase. For instance, suppose you have this:

...--o--o--A--B   <-- origin/feature_X
         \
          C--D    <-- feature_Y

Now you decide it is time to merge origin/feature_X's commits (A and B) into your feature_Y, so you make a merge commit:

...--o--o--A--B   <-- origin/feature_X
         \     \
          C--D--o   <-- feature_Y

If someone else (upstream) decides to rebase and force-push their feature_X, so that your origin/feature_X points to new copies, you end up with this:

          o--A'-B'  <-- origin/feature_X
         /
...--o--o--A--B
         \     \
          C--D--o   <-- feature_Y

That can happen even if there was no name attached to the rebase-copied commits, if you picked up something else by its name. For instance, if someone else pushed feature_F and promised it was done:

       A----B
      /      \
...--o--o--E--F   <-- origin/feature_F
         \
          C--D    <-- feature_Y

and you then merge it, you get this:

       A----B
      /      \
...--o--o--E--F   <-- origin/feature_F
         \     \
          C--D--o   <-- feature_Y

Now suppose they, or a third person, then rebase a branch they have that points to B, without realizing / remembering that commit F itself also points to B. That is, they start with this (note that they do not have your feature_Y):

       A----B     <-- myhacks
      /      \
...--o--o--E--F   <-- feature_F, origin/feature_F

They then decide that it would be better to rebase myhacks onto commit E, so they run:

$ git checkout myhacks
$ git rebase 123e4567    # <id-of-E>

which produces:

       A----B
      /      \
...--o--o--E--F      <-- feature_F, origin/feature_F
            \
             A'-B'   <-- myhacks

Eventually, when you fetch (perhaps via git pull) and get their final version of myhacks—whether or not it has a name at that time, as long as it has commits A' and B'—you will have (and retain) the original A--B commits, through commit F, and add the A'-B' chain, even though you may never have seen their branch-name myhacks.
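The whole scenario can be reproduced in one throwaway repository. This is only a sketch: the file names, the helper function and the demo identity are assumptions, and everything happens locally rather than across a server. It ends with both the original A--B chain (reachable through F) and the copied A'-B' chain (reachable through myhacks), and uses git cherry to count the copies whose patch already exists on the other side.

```shell
set -e
# Throwaway identity so commits work in a clean environment (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/feature_F

# Hypothetical helper: one new file per commit, so nothing ever conflicts.
commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

commit o1
git checkout -q -b myhacks
commit A
commit B
b_old=$(git rev-parse myhacks)

git checkout -q feature_F
commit o2
commit E
e_id=$(git rev-parse HEAD)
git merge -q --no-ff -m "F" myhacks     # F has parents E and B

# Someone rebases myhacks onto E, forgetting that F already points to B.
git checkout -q myhacks
git rebase -q "$e_id"                   # copies A and B to A' and B'
b_new=$(git rev-parse myhacks)

# F still reaches the original B.
if git merge-base --is-ancestor "$b_old" feature_F; then
  f_reaches_old=yes
else
  f_reaches_old=no
fi

# git cherry prefixes with "-" each commit on myhacks whose patch
# already exists on feature_F under a different commit ID.
duplicates=$(git cherry feature_F myhacks | grep -c '^-' || true)

echo "original B: $b_old (reachable from feature_F: $f_reaches_old)"
echo "copy B':    $b_new"
echo "duplicated patches: $duplicates"
```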

Conclusion

The "bad" case we saw above happened when git fetch brought in commit F, via the name (in the repository you're fetching from, presumably one stored on a central server) feature_F. (You and your Git renamed this origin/feature_F.) The problem was not feature_F (or origin/feature_F) itself, though, but rather myhacks: a name neither you, nor the central server, ever saw! The person who did have that name—or maybe even made it up after the fact—used it to copy commits A and B, without thinking about who had the originals. He then pushed the copies, maybe under yet another name.

The names matter at fetch and push time because git fetch and git push transfer commits by refspecs (mostly just pairs of reference names, plus some ancillary stuff). Before and after that point, though, the names are mainly distractions: it's the set of commits, as named by their IDs, and their reachability status, that matters.

0
votes

will git pull origin develop just "fetch" the blue commits or also the green ones?

https://i.stack.imgur.com/zoyEE.png

Git 2.19 (Q3 2018) adds two improvements, one on the client side, one on the server side, when fetching commits (reminder, fetch is called by a git pull).
That influences how "reachability" is done, but won't fix the issue mentioned by torek.

First:

"git fetch" learned a new option "--negotiation-tip" to limit the set of commits it tells the other end as "have", to reduce wasted bandwidth and cycles, which would be helpful when the receiving repository has a lot of refs that have little to do with the history at the remote it is fetching from.

See commit 3390e42 (02 Jul 2018) by Jonathan Tan (jhowtan). (Merged by Junio C Hamano -- gitster -- in commit 30bf8d9, 02 Aug 2018)

fetch-pack: support negotiation tip whitelist

During negotiation, fetch-pack eventually reports as "have" lines all commits reachable from all refs. Allow the user to restrict the commits sent in this way by providing a whitelist of tips; only the tips themselves and their ancestors will be sent.

Both globs and single objects are supported.

This feature is only supported for protocols that support connect or stateless-connect (such as HTTP with protocol v2).

This will speed up negotiation when the repository has multiple relatively independent branches (for example, when a repository interacts with multiple repositories, such as with linux-next and torvalds/linux), and the user knows which local branch is likely to have commits in common with the upstream branch they are fetching.
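As a rough illustration of the option, here is a sketch that needs Git 2.19 or later. The repositories, the unrelated "scratch" branch and the demo identity are hypothetical. The fetch result is the same as without the option; what changes is that only develop and its ancestors are offered to the other side as "have" lines during negotiation.

```shell
set -e
# Throwaway identity so commits work in a clean environment (hypothetical values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

top=$(mktemp -d)
git init -q "$top/server"
cd "$top/server"
git symbolic-ref HEAD refs/heads/develop
git commit -q --allow-empty -m "base"

git init -q "$top/client"
cd "$top/client"
git remote add origin "$top/server"
git fetch -q origin develop
git checkout -q -b develop origin/develop

# An unrelated local branch whose history the server has no use for.
git checkout -q --orphan scratch
git commit -q --allow-empty -m "unrelated local work"

# New work appears upstream.
cd "$top/server"
git commit -q --allow-empty -m "new work on develop"
server_tip=$(git rev-parse develop)

# Restrict the "have" lines to the local develop branch and its ancestors.
cd "$top/client"
git fetch -q --negotiation-tip=develop origin develop
client_tip=$(git rev-parse origin/develop)

echo "server develop:        $server_tip"
echo "client origin/develop: $client_tip"
```

In a two-commit repository the saving is nil, of course; the option is aimed at repositories with many refs unrelated to the remote being fetched.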


Second, Git will fetch more commits at a time:

Git adds a server-side knob to skip commits in exponential/Fibonacci stride in an attempt to cover a wider swath of history with a smaller number of iterations, potentially accepting a larger packfile transfer, instead of going back one commit at a time during common ancestor discovery during the "git fetch" transaction.

See commit 42cc748 (16 Jul 2018) by Jonathan Tan (jhowtan). (Merged by Junio C Hamano -- gitster -- in commit 7c85ee6, 02 Aug 2018)

negotiator/skipping: skip commits during fetch

Introduce a new negotiation algorithm used during fetch that skips commits in an effort to find common ancestors faster.
The skips grow similarly to the Fibonacci sequence as the commit walk proceeds further away from the tips. The skips may cause unnecessary commits to be included in the packfile, but the negotiation step typically ends more quickly.

Usage of this algorithm is guarded behind the configuration flag fetch.negotiationAlgorithm.
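Turning the flag on for a single repository would look roughly like this. It is a sketch in a throwaway repository; whether it helps depends on your history, and the value only has an effect on Git versions that know the skipping negotiator (2.19 and later, per the above).

```shell
set -e
cd "$(mktemp -d)"
git init -q .

# Opt this one repository into the skipping negotiator.
git config fetch.negotiationAlgorithm skipping

# Read the setting back to confirm what later fetches in this repository will see.
algo=$(git config --get fetch.negotiationAlgorithm)
echo "fetch.negotiationAlgorithm = $algo"
```

Removing the setting again with git config --unset fetch.negotiationAlgorithm returns the repository to the default negotiator.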


Note: as commented in Git 2.24, a setting like fetch.negotiationAlgorithm is still experimental.

And with Git 2.24 (Q4 2019), a mechanism to affect the default setting for a (related) group of configuration variables is introduced.

See commit aaf633c, commit c6cc4c5, commit ad0fb65, commit 31b1de6, commit b068d9a, commit 7211b9e (13 Aug 2019) by Derrick Stolee (derrickstolee).
(Merged by Junio C Hamano -- gitster -- in commit f4f8dfe, 09 Sep 2019)

repo-settings: create feature.experimental setting

Signed-off-by: Derrick Stolee

The 'feature.experimental' setting includes config options that are not committed to become defaults, but could use additional testing.

Update the following config settings to take new defaults, and to use the repo_settings struct if not already using them:

  • 'pack.useSparse=true'

  • 'fetch.negotiationAlgorithm=skipping'

In the case of fetch.negotiationAlgorithm, the existing logic would load the config option only when about to use the setting, so had a die() statement on an unknown string value.
This is removed as now the config is parsed under prepare_repo_settings().
In general, this die() is probably misplaced and not valuable. A test was removed that checked this die() statement executed.


Git 2.29 (Q4 2020) updates the on-demand fetching code in lazily cloned repositories.

See commit db3c293 (02 Sep 2020), and commit 9dfa8db, commit 7ca3c0a, commit 5c3b801, commit abcb7ee, commit e5b9421, commit 2b713c2, commit cbe566a (17 Aug 2020) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit b4100f3, 03 Sep 2020)

negotiator/noop: add noop fetch negotiator

Signed-off-by: Jonathan Tan

Add a noop fetch negotiator.

This is introduced to allow partial clones to skip the unneeded negotiation step when fetching missing objects using a "git fetch" subprocess.
(The implementation of spawning a "git fetch" subprocess will be done in a subsequent patch.)
But this can also be useful for end users, e.g. as a blunt fix for object corruption.

git config now includes in its man page:

fetch.negotiationAlgorithm:

Control how information about the commits in the local repository is sent when negotiating the contents of the packfile to be sent by the server.

Set to "skipping" to use an algorithm that skips commits in an effort to converge faster, but may result in a larger-than-necessary packfile;
or set to "noop" to not send any information at all, which will almost certainly result in a larger-than-necessary packfile, but will skip the negotiation step.


With Git 2.32 (Q2 2021), "git push" learns to discover common ancestors with the receiving end over protocol v2.

See commit 6db01a7 (08 Apr 2021) by Junio C Hamano (gitster).
See commit 477673d, commit 9c1e657 (04 May 2021), and commit 6871d0c, commit 57c3451, commit 8102570 (08 Apr 2021) by Jonathan Tan (jhowtan).
(Merged by Junio C Hamano -- gitster -- in commit 644f4a2, 16 May 2021)

fetch: teach independent negotiation (no packfile)

Signed-off-by: Jonathan Tan

Currently, the packfile negotiation step within a Git fetch cannot be done independent of sending the packfile, even though there is at least one application wherein this is useful.
Therefore, make it possible for this negotiation step to be done independently.
A subsequent commit will use this for one such application - push negotiation.

This feature is for protocol v2 only.
(An implementation for protocol v0 would require a separate implementation in the fetch, transport, and transport helper code.)

In the protocol, the main hindrance towards independent negotiation is that the server can unilaterally decide to send the packfile.
This is solved by a "wait-for-done" argument: the server will then wait for the client to say "done".
In practice, the client will never say it; instead it will cease requests once it is satisfied.

In the client, the main change lies in the transport and transport helper code.
fetch_refs_via_pack() performs everything needed - protocol version and capability checks, and the negotiation itself.

There are 2 code paths that do not go through fetch_refs_via_pack() that needed to be individually excluded: the bundle transport (excluded through requiring smart_options, which the bundle transport doesn't support) and transport helpers that do not support takeover.
If or when we support independent negotiation for protocol v0, we will need to modify these 2 code paths to support it.
But for now, report failure if independent negotiation is requested in these cases.

technical/protocol-v2 now includes in its man page:

If the 'wait-for-done' feature is advertised, the following argument can be included in the client's request.

wait-for-done

Indicates to the server that it should never send "ready", but should wait for the client to say "done" before sending the packfile.


Before Git 2.33 (Q3 2021), code recently added to support common ancestry negotiation during "git push" did not sanity check its arguments carefully enough.

See commit eff4045 (08 Jul 2021), and commit 60fadf8, commit 1e5b5ea (30 Jun 2021) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit b2fc822, 16 Jul 2021)

fetch: fix segfault in --negotiate-only without --negotiation-tip=*

Signed-off-by: Ævar Arnfjörð Bjarmason

The recent --negotiate-only option would segfault in the call to oid_array_for_each() in negotiate_using_fetch() unless one or more --negotiation-tip=* options were provided.

All of the other tests for the feature combine both, but nothing was checking this assumption, let's do that and add a test for it.
Fixes a bug in 9c1e657 ("fetch: teach independent negotiation (no packfile)", 2021-05-04, Git v2.32.0-rc0 -- merge).

And:

fetch: document the --negotiate-only option

Signed-off-by: Ævar Arnfjörð Bjarmason

There was no documentation for the --negotiate-only option added in 9c1e657 ("fetch: teach independent negotiation (no packfile)", 2021-05-04, Git v2.32.0-rc0 -- merge), only documentation for the related push.negotiation option added in the following commit in 477673d ("send-pack: support push negotiation", 2021-05-04, Git v2.32.0-rc0 -- merge).

Let's document it, and update the cross-linking I'd added between --negotiation-tip=* and 'fetch.negotiationAlgorithm' in 5266082 ("fetch doc: cross-link two new negotiation options", 2018-08-01, Git v2.19.0-rc0 -- merge listed in batch #7).

I think it would be better to say "in common with the remote" here than "...the server", but the documentation for --negotiation-tip=* above this talks about "the server", so let's continue doing that in this related option.
See 3390e42 ("fetch-pack: support negotiation tip whitelist", 2018-07-02, Git v2.19.0-rc0 -- merge) for that documentation.

git config now includes in its man page:

See also the --negotiate-only and --negotiation-tip options to git fetch.

fetch-options now includes in its man page:

See also the fetch.negotiationAlgorithm and push.negotiate configuration variables documented in git config, and the --negotiate-only option below.

--negotiate-only

Do not fetch anything from the server, and instead print the ancestors of the provided --negotiation-tip=* arguments, which we have in common with the server.

Internally this is used to implement the push.negotiate option, see git config.