48
votes

According to the V2 documentation, you can list all commits for a branch with:

commits/list/:user_id/:repository/:branch

I am not seeing the same functionality in the V3 documentation.

I would like to collect all branches using something like:

https://api.github.com/repos/:user/:repo/branches

And then iterate through them, pulling all commits for each. Alternatively, if there's a way to pull all commits for all branches for a repo directly, that would work just as well if not better. Any ideas?

UPDATE: I tried passing the branch :sha as a param as follows:

params = {:page => 1, :per_page => 100, :sha => b}

The problem is that when I do this, it doesn't page the results properly. I feel like we're approaching this incorrectly. Any thoughts?

4
Could you describe what you mean by "it doesn't page the results properly"? - Kevin Sawicki
By the way, if you only need the hashes of the commits, you could run git log --pretty="%h" - George Pligoropoulos

4 Answers

43
votes

I have encountered the exact same problem. I did manage to acquire all the commits for all branches within a repository (probably not that efficient due to the API).

Approach to retrieve all commits for all branches in a repository

As you mentioned, first you gather all the branches:

# https://api.github.com/repos/:user/:repo/branches
https://api.github.com/repos/twitter/bootstrap/branches

The key that you are missing is that the v3 API lists commits starting from a reference commit, which you pass as the sha parameter of the call that lists commits on a repository. So when you collect the branches, make sure you also pick up each branch's latest sha:

Trimmed result of branch API call for twitter/bootstrap

[
  {
    "commit": {
      "url": "https://api.github.com/repos/twitter/bootstrap/commits/8b19016c3bec59acb74d95a50efce70af2117382",
      "sha": "8b19016c3bec59acb74d95a50efce70af2117382"
    },
    "name": "gh-pages"
  },
  {
    "commit": {
      "url": "https://api.github.com/repos/twitter/bootstrap/commits/d335adf644b213a5ebc9cee3f37f781ad55194ef",
      "sha": "d335adf644b213a5ebc9cee3f37f781ad55194ef"
    },
    "name": "master"
  }
]

Working with last commit's sha

As we can see, the two branches have different shas; these are the latest commit shas on those branches. You can now iterate through each branch, starting from its latest sha:

# With the sha parameter set to the branch's latest sha
# https://api.github.com/repos/:user/:repo/commits
https://api.github.com/repos/twitter/bootstrap/commits?per_page=100&sha=d335adf644b213a5ebc9cee3f37f781ad55194ef

The API call above lists the latest 100 commits on the master branch of twitter/bootstrap. To get the next 100 commits, you have to specify the sha to continue from. We can use the sha of the last commit in the response (7a8d6b19767a92b1c4ea45d88d4eedc2b29bf1fa in the current example) as input for the next API call:

# Next API call for commits (use the last commit's sha)
# https://api.github.com/repos/:user/:repo/commits
https://api.github.com/repos/twitter/bootstrap/commits?per_page=100&sha=7a8d6b19767a92b1c4ea45d88d4eedc2b29bf1fa

This process is repeated until the last commit's sha in the response is the same as the sha parameter of the API call.
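The stopping rule above can be sketched as a small loop. This is only a sketch under stated assumptions: fetchPage is a hypothetical callback (not part of any library) that resolves to the array of commits returned by the commits endpoint for a given sha, and collectBranchCommits is a name made up for illustration. Since each follow-up page starts with the commit that was requested, the loop drops that first entry to avoid storing it twice.

```javascript
// Walk one branch by restarting the listing from the oldest commit seen so far.
// fetchPage(sha) is a hypothetical callback that resolves to the commits from
// GET /repos/:user/:repo/commits?per_page=100&sha=<sha> (newest first).
async function collectBranchCommits(startSha, fetchPage) {
  const commits = [];
  let sha = startSha;
  while (true) {
    const page = await fetchPage(sha);
    if (page.length === 0) break;
    // Every page after the first begins with the commit we asked for,
    // which is already stored, so skip it.
    const fresh = commits.length === 0 ? page : page.slice(1);
    commits.push(...fresh);
    const lastSha = page[page.length - 1].sha;
    // Stop when the last commit's sha equals the sha parameter we sent.
    if (lastSha === sha) break;
    sha = lastSha;
  }
  return commits;
}
```

On histories with merges, restarting from the last commit of a page may skip commits that are not ancestors of that commit, so treat this as an illustration of the stopping rule rather than a complete solution.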

Next branch

That is it for one branch. Now you apply the same approach for the other branch (work from the latest sha).


There is a large issue with this approach: since branches share some identical commits, you will see the same commits over and over again as you move from one branch to the next.

I can imagine that there is a much more efficient way to accomplish this, yet this worked for me.
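One way to cope with the repeated commits is to key everything by sha once the per-branch lists have been collected. A minimal sketch, assuming each commit object carries a sha field as in the API responses above; mergeUniqueCommits and the shape of its input are made up for illustration:

```javascript
// commitsByBranch: { branchName: [ { sha: "..." }, ... ], ... }
// Returns one entry per distinct sha, with the branches it was seen on.
function mergeUniqueCommits(commitsByBranch) {
  const bySha = new Map();
  for (const [branch, commits] of Object.entries(commitsByBranch)) {
    for (const commit of commits) {
      if (!bySha.has(commit.sha)) {
        bySha.set(commit.sha, { commit: commit, branches: [] });
      }
      bySha.get(commit.sha).branches.push(branch);
    }
  }
  return Array.from(bySha.values());
}
```

This does not reduce the number of API calls; it only keeps the stored result free of duplicates.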

30
votes

I asked GitHub support this same question, and they answered:

GETing /repos/:owner/:repo/commits should do the trick. You can pass the branch name in the sha parameter. For example, to get the first page of commits from the '3.0.0-wip' branch of the twitter/bootstrap repository, you would use the following curl request:

curl https://api.github.com/repos/twitter/bootstrap/commits?sha=3.0.0-wip

The docs also describe how to use pagination to get the remaining commits for this branch.
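For reference, the pagination mentioned here is exposed through the Link response header, which lists the URLs of the next and last pages. The helper below is a small sketch for reading that header; parseLinkHeader is a made-up name, and the URLs in the usage example are invented for illustration.

```javascript
// Turn a Link header such as
//   <https://api.github.com/...&page=2>; rel="next", <https://api.github.com/...&page=34>; rel="last"
// into an object like { next: "...", last: "..." }.
function parseLinkHeader(header) {
  const links = {};
  if (!header) return links;
  const pattern = /<([^>]+)>\s*;\s*rel="([^"]+)"/g;
  for (const match of header.matchAll(pattern)) {
    links[match[2]] = match[1];
  }
  return links;
}

// Usage: keep requesting links.next until the header no longer contains it.
const example = parseLinkHeader(
  '<https://api.github.com/repositories/1/commits?sha=3.0.0-wip&page=2>; rel="next", ' +
  '<https://api.github.com/repositories/1/commits?sha=3.0.0-wip&page=34>; rel="last"'
);
console.log(example.next, example.last);
```

Looking the page up by its rel name avoids relying on the order of the entries in the header.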

As long as you are making authenticated requests, you can make up to 5,000 requests per hour.

I used the github gem (https://github.com/peter-murach/github) in my Rails app as follows:

# owner, repo_name, start_date and end_date are assumed to be defined elsewhere
github_connection = Github.new :client_id => 'your_id', :client_secret => 'your_secret', :oauth_token => 'your_oauth_token'

# Map each branch name to the URL of its latest commit
branches_info = {}
all_branches = github_connection.repos.list_branches owner, repo_name
all_branches.body.each do |branch|
    branches_info[branch.name] = branch.commit.url
end

# Collect the commits of every branch, passing the branch name as :sha
commits_list = []
branches_info.keys.each do |branch_name|
    commits_list.push(github_connection.repos.commits.list(owner, repo_name, start_date, end_date, :sha => branch_name))
end
19
votes

Using GraphQL API v4

You can use the GraphQL API v4 to optimize downloading commits per branch. With the following method, I managed to download 1900 commits in a single request (100 commits per branch across 19 different branches), which drastically reduces the number of requests compared to the REST API.

1 - Get all branches

You will have to get all branches, and go through pagination if you have more than 100 branches:

Query:

query($owner:String!, $name:String!, $branchCursor: String!) {
  repository(owner: $owner, name: $name) {
    refs(first: 100, refPrefix: "refs/heads/",after: $branchCursor) {
      totalCount
      edges {
        node {
          name
          target {
            ...on Commit {
              history(first:0){
                totalCount
              }
            }
          }
        }
      }
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}

Variables:

{
  "owner": "google",
  "name": "gson",
  "branchCursor": ""
}

Note that the branchCursor variable only matters when you have more than 100 branches; in that case it holds the value of pageInfo.endCursor from the previous request.

2 - Split the branches array into arrays of at most 19 branches

There is a limit on how many nodes a single request may ask for, which prevents us from querying too many commits at once. Some testing I performed showed that we can't go over 19*100 commits in a single query.

Note that if the repo has 19 branches or fewer, you don't need to bother with this.

3 - Query commits in chunks of 100 for each branch

You can then build your query dynamically to get the next 100 commits on all branches. An example with 2 branches:

query ($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    branch0: ref(qualifiedName: "JsonArrayImplementsList") {
      target {
        ... on Commit {
          history(first: 100) {
            ...CommitFragment
          }
        }
      }
    }
    branch1: ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(first: 100) {
            ...CommitFragment
          }
        }
      }
    }
  }
}

fragment CommitFragment on CommitHistoryConnection {
  totalCount
  nodes {
    oid
    message
    committedDate
    author {
      name
      email
    }
  }
  pageInfo {
    hasNextPage
    endCursor
  }
}

  • The variables used are owner for the repo's owner and name for the name of the repo.
  • A fragment is used to avoid duplicating the commit history field definitions.

You can see that pageInfo.hasNextPage and pageInfo.endCursor are used to go through pagination for each branch. Pagination takes place in history(first: 100) by passing the last cursor encountered. For instance, the next request will have history(first: 100, after: "6e2fcdcaf252c54a151ce6a4441280e4c54153ae 99"). For each branch, we have to update the request with the last endCursor value to query the next 100 commits.

When pageInfo.hasNextPage is false, there are no more pages for this branch, so we won't include it in the next request.

When pageInfo.hasNextPage is false for the last remaining branch, we have retrieved all commits.

Sample implementation

Here is a sample implementation in Node.js using github-graphql-client. The same method could be implemented in any other language. It also stores the commits in files named commitsX.json:

var client = require('github-graphql-client');
var fs = require("fs");

const owner = "google";
const repo = "gson";
const accessToken = "YOUR_ACCESS_TOKEN";

const branchQuery = `
query($owner:String!, $name:String!, $branchCursor: String!) {
  repository(owner: $owner, name: $name) {
    refs(first: 100, refPrefix: "refs/heads/",after: $branchCursor) {
      totalCount
      edges {
        node {
          name
          target {
            ...on Commit {
              history(first:0){
                totalCount
              }
            }
          }
        }
      }
      pageInfo {
        endCursor
        hasNextPage
      }
    }
  }
}`;

function buildCommitQuery(branches){
    var query = `
        query ($owner: String!, $name: String!) {
          repository(owner: $owner, name: $name) {`;
    for (var key in branches) {
        if (branches.hasOwnProperty(key) && branches[key].hasNextPage) {
          query+=`
            ${key}: ref(qualifiedName: "${branches[key].name}") {
              target {
                ... on Commit {
                  history(first: 100, after: ${branches[key].cursor ? '"' + branches[key].cursor + '"': null}) {
                    ...CommitFragment
                  }
                }
              }
            }`;
        }
    }
    query+=`
          }
        }`;
    query+= commitFragment;
    return query;
}

const commitFragment = `
fragment CommitFragment on CommitHistoryConnection {
  totalCount
  nodes {
    oid
    message
    committedDate
    author {
      name
      email
    }
  }
  pageInfo {
    hasNextPage
    endCursor
  }
}`;

function doRequest(query, variables) {
  return new Promise(function (resolve, reject) {
    client({
        token: accessToken,
        query: query,
        variables: variables
    }, function (err, res) {
      if (!err) {
        resolve(res);
      } else {
        console.log(JSON.stringify(err, null, 2));
        reject(err);
      }
    });
  });
}

function buildBranchObject(branch){
    var refs = {};

    for (var i = 0; i < branch.length; i++) {
        console.log("branch " + branch[i].node.name);
        refs["branch" + i] = {
            name: branch[i].node.name,
            totalCount: branch[i].node.target.history.totalCount,
            cursor: null,
            hasNextPage : true,
            commits: []
        };
    }
    return refs;
}

async function requestGraphql() {
    var iterateBranch = true;
    var branches = [];
    var cursor = "";

    // get all branches
    while (iterateBranch) {
        let res = await doRequest(branchQuery,{
          "owner": owner,
          "name": repo,
          "branchCursor": cursor
        });
        iterateBranch = res.data.repository.refs.pageInfo.hasNextPage;
        cursor = res.data.repository.refs.pageInfo.endCursor;
        branches = branches.concat(res.data.repository.refs.edges);
    }

    //split the branch array into smaller array of 19 items
    var refChunk = [], size = 19;

    while (branches.length > 0){
        refChunk.push(branches.splice(0, size));
    }

    for (var j = 0; j < refChunk.length; j++) {

        //1) store branches in a format that makes it easy to concat commit when receiving the query result
        var refs = buildBranchObject(refChunk[j]);

        //2) query commits while there are some pages existing. Note that branches that don't have pages are not 
        //added in subsequent request. When there are no more page, the loop exit
        var hasNextPage = true;
        var count = 0;

        while (hasNextPage) {
            var commitQuery = buildCommitQuery(refs);
            console.log("request : " + count);
            let commitResult = await doRequest(commitQuery, {
              "owner": owner,
              "name": repo
            });
            hasNextPage = false;
            for (var key in refs) {
                if (refs.hasOwnProperty(key) && commitResult.data.repository[key]) {
                    let history = commitResult.data.repository[key].target.history;
                    refs[key].commits = refs[key].commits.concat(history.nodes);
                    refs[key].cursor = (history.pageInfo.hasNextPage) ? history.pageInfo.endCursor : '';
                    refs[key].hasNextPage = history.pageInfo.hasNextPage;
                    console.log(key + " : " + refs[key].commits.length + "/" + refs[key].totalCount + " : " + refs[key].hasNextPage + " : " + refs[key].cursor + " : " + refs[key].name);
                    if (refs[key].hasNextPage){
                        hasNextPage = true;
                    }
                }
            }
            count++;
            console.log("------------------------------------");
        }
        for (var key in refs) {
            if (refs.hasOwnProperty(key)) {
                console.log(refs[key].totalCount + " : " + refs[key].commits.length + " : " + refs[key].name);
            }
        }

        //3) write commits chunk (up to 19 branches) in a single json file
        fs.writeFile("commits" + j + ".json", JSON.stringify(refs, null, 4), "utf8", function(err){
            if (err){
                console.log(err);
                return;
            }
            console.log("done");
        });
    }
}

requestGraphql();

This also works for repos with a lot of branches, for instance this one, which has more than 700 branches.

Rate Limit

Note that while GraphQL lets you perform fewer requests, it won't necessarily improve your rate limit, because the rate limit is based on points and not on a number of requests: check the GraphQL API rate limit documentation.
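To see what a query actually costs in points, the GraphQL schema offers a top-level rateLimit field that can be requested next to repository in the same query. The sketch below shows the selection and a small helper for reading the result; requestsLeft is a hypothetical helper name, and the estimate assumes later requests cost the same as the one just made.

```javascript
// Selection to place at the top level of a query, as a sibling of repository { ... }.
const rateLimitSelection = `
  rateLimit {
    limit
    cost
    remaining
    resetAt
  }`;

// Hypothetical helper: given the rateLimit object from a response, estimate
// how many more requests of the same cost fit in the remaining budget.
function requestsLeft(rateLimit) {
  if (rateLimit.cost <= 0) return Infinity;
  return Math.floor(rateLimit.remaining / rateLimit.cost);
}
```

With the sample implementation above, the value would be read from res.data.rateLimit after adding the selection to the query.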

0
votes

Pure JS Implementation without Access Token (Unauthenticated Usage)

const base_url = 'https://api.github.com';

    function httpGet(theUrl, return_headers) {
        var xmlHttp = new XMLHttpRequest();
        xmlHttp.open("GET", theUrl, false); // false for synchronous request
        xmlHttp.send(null);
        if (return_headers) {
            return xmlHttp
        }
        return xmlHttp.responseText;
    }

    function get_all_commits_count(owner, repo, sha) {
        let first_commit = get_first_commit(owner, repo);
        let compare_url = base_url + '/repos/' + owner + '/' + repo + '/compare/' + first_commit + '...' + sha;
        let commit_req = httpGet(compare_url);
        let commit_count = JSON.parse(commit_req)['total_commits'] + 1;
        console.log('Commit Count: ', commit_count);
        return commit_count
    }

    function get_first_commit(owner, repo) {
        let url = base_url + '/repos/' + owner + '/' + repo + '/commits';
        let req = httpGet(url, true);
        let first_commit_hash = '';
        if (req.getResponseHeader('Link')) {
            let page_url = req.getResponseHeader('Link').split(',')[1].split(';')[0].split('<')[1].split('>')[0];
            let req_last_commit = httpGet(page_url);
            let first_commit = JSON.parse(req_last_commit);
            first_commit_hash = first_commit[first_commit.length - 1]['sha']
        } else {
            let first_commit = JSON.parse(req.responseText);
            first_commit_hash = first_commit[first_commit.length - 1]['sha'];
        }
        return first_commit_hash;
    }

    let owner = 'getredash';
    let repo = 'redash';
    let sha = 'master';
    get_all_commits_count(owner, repo, sha);

Credits - https://gist.github.com/yershalom/a7c08f9441d1aadb13777bce4c7cdc3b