
Is it possible to get a list of the files changed between two commits? Something like the comparison between two commits in the web version, but using the GitHub API.


4 Answers


The official commit comparison API is Compare two commits:

GET /repos/:owner/:repo/compare/:base...:head

Both :base and :head can be either branch names in :repo or branch names in other repositories in the same network as :repo. For the latter case, use the format user:branch:

GET /repos/:owner/:repo/compare/user1:branchname...user2:branchname

Note that you can use tags or commit SHAs as well. For instance:

https://api.github.com/repos/git/git/compare/v2.2.0-rc1...v2.2.0-rc2

Note the three dots ('...'), not two ('..'), between the two tags.
And you need to put the older tag first, then the newer tag.

The response includes a status, for example:

  "status": "behind",
  "ahead_by": 1,
  "behind_by": 2,
  "total_commits": 1,

And, for the comparison as a whole (not per commit), information about the changed files:

"files": [
    {
      "sha": "bbcd538c8e72b8c175046e27cc8f907076331401",
      "filename": "file1.txt",
      "status": "added",
      "additions": 103,
      "deletions": 21,
      "changes": 124,
      "blob_url": "https://github.com/octocat/Hello-World/blob/6dcb09b5b57875f334f61aebed695e2e4193db5e/file1.txt",
      "raw_url": "https://github.com/octocat/Hello-World/raw/6dcb09b5b57875f334f61aebed695e2e4193db5e/file1.txt",
      "contents_url": "https://api.github.com/repos/octocat/Hello-World/contents/file1.txt?ref=6dcb09b5b57875f334f61aebed695e2e4193db5e",
      "patch": "@@ -132,7 +132,7 @@ module Test @@ -1000,7 +1000,7 @@ module Test"
    }
  ]
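To get something like `git diff --name-status` out of that response, one can read `filename` and `status` from each entry of the top-level `files` array. A minimal sketch, assuming the JSON has already been fetched and decoded into a dict; the `sample` below is trimmed from the example above, and `name_status` is a hypothetical helper:

```python
import json
from typing import List, Tuple

# Trimmed compare response: only the fields this sketch reads.
sample = json.loads("""
{
  "status": "behind",
  "files": [
    {"filename": "file1.txt", "status": "added", "additions": 103, "deletions": 21}
  ]
}
""")


def name_status(compare_json: dict) -> List[Tuple[str, str]]:
    """Return (status, filename) pairs, similar to `git diff --name-status`."""
    return [(f["status"], f["filename"]) for f in compare_json.get("files", [])]


for status, name in name_status(sample):
    print(f"{status}\t{name}")
```

Dropping the status column gives the plain list of changed file names.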

BUT:

  • The response will include a comparison of up to 250 commits. If you are working with a larger commit range, you can use the Commit List API to enumerate all commits in the range.

  • For comparisons with extremely large diffs, you may receive an error response indicating that the diff took too long to generate. You can typically resolve this error by using a smaller commit range.
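One way to notice that the 250-commit cap applies is to compare `total_commits` with the number of entries in the `commits` array. A sketch, assuming `total_commits` reports the full count of the range while `commits` is capped; `is_truncated` is a hypothetical helper and the two dicts are made-up minimal responses:

```python
def is_truncated(compare_json: dict) -> bool:
    """True if the response lists fewer commits than total_commits reports."""
    # Assumption: total_commits counts every commit in the range, while the
    # commits array holds at most 250 entries, per the limit quoted above.
    return len(compare_json.get("commits", [])) < compare_json.get("total_commits", 0)


small = {"total_commits": 2, "commits": [{"sha": "a"}, {"sha": "b"}]}
large = {"total_commits": 540, "commits": [{"sha": str(i)} for i in range(250)]}
print(is_truncated(small), is_truncated(large))  # False True
```

When it returns True, the Commit List API mentioned above is the way to enumerate the rest of the range.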


Notes:

"same network" means: repositories in the same fork network, that is, a repository and its forks, hosted on the same service (github.com for example, or the same on-premise GitHub Enterprise (GHE) instance)

You can therefore compare two branches between a repo and its fork.
Example:

https://api.github.com/repos/030/learn-go-with-tests/compare/master...quii:master

(this example compares a fork to its original repo, not the original repo to the fork: that is because the fork, in this case, is behind the original repo)
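Putting the URL forms above together, here is a minimal Python sketch that assembles a compare URL, assuming the public api.github.com host. The helper `compare_url` and its `head_owner` parameter are hypothetical; `head_owner` produces the `user:branch` form for a branch living in another repository of the same network:

```python
from typing import Optional
from urllib.parse import quote

API_ROOT = "https://api.github.com"  # assumption: github.com, not a GHE host


def compare_url(owner: str, repo: str, base: str, head: str,
                head_owner: Optional[str] = None) -> str:
    """Build GET /repos/:owner/:repo/compare/:base...:head (three dots, older ref first)."""
    # head_owner (hypothetical parameter) yields the user:branch form used for
    # a branch that lives in another repository of the same fork network.
    head_ref = f"{head_owner}:{head}" if head_owner else head
    # Keep '/' and ':' unescaped so refs like "feature/x" or "user:branch" stay intact.
    return (f"{API_ROOT}/repos/{owner}/{repo}/compare/"
            f"{quote(base, safe='/:')}...{quote(head_ref, safe='/:')}")


print(compare_url("git", "git", "v2.2.0-rc1", "v2.2.0-rc2"))
print(compare_url("030", "learn-go-with-tests", "master", "master", head_owner="quii"))
```

The first call reproduces the git tag example and the second the fork example; fetching either URL with any HTTP client returns the JSON described above.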


Investigating the responses of the official API, one can find a barely documented way to get diffs from GitHub. Try this:

  wget --header='Accept: application/vnd.github.v3.diff' \
    https://github.com/github/linguist/compare/96d29b76...a20631af.diff
  wget --header='Accept: application/vnd.github.v3.diff' \
    https://github.com/github/linguist/compare/a20631af...96d29b76.diff

These are web compare links with .diff appended: the first one gives the diff, the second one the reverse diff of the same pair of commits.
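The two URLs can be generated from the owner, repository and commit pair. A sketch in Python; `web_diff_urls` and `fetch_text` are hypothetical helpers, and `fetch_text` is only defined, not called, because it needs network access and works for public repositories only:

```python
import urllib.request

DIFF_MEDIA_TYPE = "application/vnd.github.v3.diff"


def web_diff_urls(owner, repo, a, b, suffix=".diff"):
    """Return the (A...B, B...A) github.com compare URLs with .diff (or .patch) appended."""
    root = f"https://github.com/{owner}/{repo}/compare"
    return f"{root}/{a}...{b}{suffix}", f"{root}/{b}...{a}{suffix}"


def fetch_text(url):
    """Download one diff as text (needs network access; public repos only)."""
    req = urllib.request.Request(url, headers={"Accept": DIFF_MEDIA_TYPE})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")


forward_url, reverse_url = web_diff_urls("github", "linguist", "96d29b76", "a20631af")
print(forward_url)
print(reverse_url)
# forward, reverse = fetch_text(forward_url), fetch_text(reverse_url)
```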

The header makes sure the request is handled by GitHub's v3 API. That is currently the default, but it might change in the future. See Media Types.

Why two downloads?

GitHub only serves linear diffs from an older to a newer version. If the requested diff is indeed linear and goes from an older to a newer version, the second download will be empty.

If the requested diff is linear, but goes from a newer to an older version, the first download is empty. Instead, the whole diff is in the second download. Depending on what one wants to achieve, one can apply it normally to the older version, or reverse-apply it (patch -R) to the newer version.

If there is no linear relationship between the two requested commits, both downloads return non-empty content: each one is the diff from the common ancestor (the merge base) to the commit after the '...' in its URL. Reverse-applying the second diff to 96d29b76 and then applying the first one gives the same result as applying the output of git diff 96d29b76..a20631af.
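The four cases can be told apart purely from which of the two downloads is empty. A sketch, assuming `forward` holds the body of the A...B download and `reverse` the body of the B...A download; `classify` and its return strings are hypothetical:

```python
def classify(forward: str, reverse: str) -> str:
    """Interpret the A...B (forward) and B...A (reverse) diff downloads."""
    has_fwd = bool(forward.strip())
    has_rev = bool(reverse.strip())
    if has_fwd and has_rev:
        # Each diff runs from the merge base to the commit after '...'.
        return "diverged"
    if has_fwd:
        # A is an ancestor of B: forward is the whole A -> B diff.
        return "A is older"
    if has_rev:
        # B is an ancestor of A: reverse is the whole B -> A diff.
        return "A is newer"
    return "no difference"
```

In the "A is newer" case, reverse-applying the second download to A walks it back to B, as described above.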

As far as I can tell, these raw diffs aren't subject to GitHub's API limitations: a request spanning 540 commits with 1002 changed files went through without problems.

Note: one can also append .patch instead of .diff. One still gets one big file per download, but it contains an individual patch for each commit.
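Such a .patch file follows the `git format-patch` layout, in which (as far as I know) each commit's patch begins with a fixed line of the form `From <40-hex SHA> Mon Sep 17 00:00:00 2001`. Assuming that format, here is a sketch that splits one download into per-commit chunks; `split_patches` is a hypothetical helper and the sample text is made up:

```python
import re

# Assumption: every commit starts with git format-patch's fixed "From <sha> ..." line.
FROM_LINE = re.compile(r"^From ([0-9a-f]{40}) Mon Sep 17 00:00:00 2001$", re.MULTILINE)


def split_patches(text: str) -> list:
    """Return (sha, patch_text) pairs, one per commit in the .patch file."""
    starts = list(FROM_LINE.finditer(text))
    chunks = []
    for i, m in enumerate(starts):
        # Each chunk runs up to the next "From <sha>" line, or to the end.
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        chunks.append((m.group(1), text[m.start():end]))
    return chunks


def header_line(sha: str) -> str:
    return f"From {sha} Mon Sep 17 00:00:00 2001\n"


sample = (header_line("a" * 40) + "Subject: [PATCH 1/2] first\n\n"
          + header_line("b" * 40) + "Subject: [PATCH 2/2] second\n")
for sha, patch in split_patches(sample):
    print(sha[:8], patch.splitlines()[1])
```

A commit message containing a line that looks exactly like the header would confuse this split; diff bodies cannot, since their lines start with ' ', '+' or '-'.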


Traumflug's answer isn't correct if you are using the API to access private repos. Actually, I think that answer doesn't need the header, since it works without it on a public repo anyway.

You should not put .diff at the end of the URL, and you should use the api subdomain. If you want the diff specifically, you only need to put the appropriate media type header in the request (plus the token for authentication).

So for example:

  wget --header='Accept: application/vnd.github.v3.diff' \
    --header='Authorization: token 123' \
    https://api.github.com/repos/github/linguist/compare/96d29b76...a20631af

GitHub's documentation is super confusing, since it says this only works with branch names, but it also accepts commit SHAs. Also, the returned JSON includes a diff_url that is a direct link to the diff, but it does not work if the repo is private, which isn't very helpful.
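The same authenticated request can be built with Python's standard library. A sketch, assuming the token travels in an `Authorization: token ...` header; it falls back to the placeholder 123 used in the wget command when a hypothetical `GITHUB_TOKEN` environment variable is not set. The request is only constructed here, since sending it needs network access and a real token:

```python
import os
import urllib.request


def diff_request(owner: str, repo: str, base: str, head: str,
                 token: str) -> urllib.request.Request:
    """Build an authenticated compare request that asks for a raw diff."""
    url = f"https://api.github.com/repos/{owner}/{repo}/compare/{base}...{head}"
    return urllib.request.Request(url, headers={
        # Media type that makes the compare endpoint return a diff instead of JSON.
        "Accept": "application/vnd.github.v3.diff",
        "Authorization": f"token {token}",
    })


req = diff_request("github", "linguist", "96d29b76", "a20631af",
                   os.environ.get("GITHUB_TOKEN", "123"))
# To actually send it (requires network access and a valid token):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```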


Here's another executable example, using the HEAD~1 and HEAD references on my public repo DataApp--ParamCompare, which should help illustrate the :owner and :repo notation once substituted with concrete values.

curl -X GET  https://api.github.com/repos/jxramos/DataApp--ParamCompare/compare/HEAD~1...HEAD

As a sanity check the equivalent browser representation can be seen at https://github.com/jxramos/DataApp--ParamCompare/compare/HEAD~1...HEAD

In general, the form is as follows, which gives an alternate notation for the parameters of the API route:

https://api.github.com/repos/<owner_name>/<repo_name>/compare/HEAD~1...HEAD

One can also use full commit SHAs, with a URL such as

https://api.github.com/repos/jxramos/DataApp--ParamCompare/compare/80f0bb42606888ce7fc66b4402fcc90a1709c9e8...255fe089543f5569f90af54168af904e88fc150f

There should be an equivalent GraphQL way to pare down the results and select only the filename values under the files list, giving something like git diff --name-only output straight from the remote. I'll update this answer if I figure it out.

My take on this is that the GraphQL API doesn't perform operations, which is what a diff is, but rather lets you query primitive types and properties of the repo itself. You can see the sort of entities you're dealing with by looking at the schema itself: https://developer.github.com/v4/public_schema/