0
votes

I'm running into an issue when using Go's HTTP client to download a zip or tar.gz file from GitHub. I get a 403 with the message "Your access to this site has been restricted".

Curl works fine though.

I am running this in an EC2 instance on AWS in the us-west-2 region. In particular,

Ubuntu Server 16.04 LTS (HVM), SSD Volume Type - ami-0807918df10edc141 (64-bit x86) / ami-0c75fb2e6a6be38f6 (64-bit Arm)

Info

Sample code to reproduce:

package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    endpoint := "https://github.com/kubeflow/manifests/archive/v1.0.2.tar.gz"

    // or https://api.github.com/repos/kubeflow/manifests/zipball/v0.12.0

    // Get the data
    resp, err := http.Get(endpoint)
    if err != nil {
        fmt.Printf("[error] %v\n", err)
        return
    }
    defer resp.Body.Close()

    // io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+)
    respData, err := io.ReadAll(resp.Body)
    if err != nil {
        fmt.Printf("[error] %v\n", err)
        return
    }

    // On the EC2 instance this prints a 403 status and an HTML error page
    fmt.Printf("Status: %s\nResp:\n%v\n", resp.Status, string(respData))
}

Note: the above works fine on my local machine; it only fails on the AWS instance.

Thanks!

1
A 403 error means access is forbidden; you may need to send an OAuth header or something similar to verify your access. See en.wikipedia.org/wiki/HTTP_403 - ttrasn
Can't reproduce; the request succeeds for me. - Зелёный
Anonymous access is subject to rate limiting. Use the API for programmatic access, not the web frontend. A Go library is available. - Peter
@Peter I tried changing it to api.github.com/repos/kubeflow/manifests/zipball/v0.12.0 and I got the same error. - Vafilor
Yeah, but did you authenticate with an Authorization: token <value> header? developer.github.com/v3/#authentication - evilSnobu

1 Answer

2
votes

That particular error message means that GitHub is restricting you because you're making requests that match a pattern of abuse that's ongoing. GitHub is blocking this pattern because it causes availability concerns for other users.

You should always make your program send a custom User-Agent header, because that distinguishes your requests from other people's. (After all, lots of people use Go, and they all send the same default User-Agent.) You should acquire the URLs you're using via the API, not via github.com directly. You should also authenticate when possible (e.g., with a token), because GitHub gives authenticated requests higher limits, and if you cause a problem, GitHub can reach out to you. Finally, you should implement appropriate rate limiting and throttling so that you don't make too many requests, and you should back off or stop completely if you get a 403, 429, or 5xx error.
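As a rough illustration of the advice above, here is a minimal sketch that sets a custom User-Agent, optionally sends a token in the `Authorization: token <value>` form mentioned in the comments, and backs off on 403, 429, and 5xx responses. The helper names (`newRequest`, `shouldBackOff`, `fetch`), the User-Agent string, and the retry parameters are all made up for the example, and a local test server stands in for GitHub so the sketch runs offline; it is not GitHub's recommended implementation.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// newRequest (hypothetical helper) builds a GET request that identifies the
// caller with a custom User-Agent and, if token is non-empty, authenticates.
func newRequest(url, userAgent, token string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", userAgent)
	if token != "" {
		req.Header.Set("Authorization", "token "+token)
	}
	return req, nil
}

// shouldBackOff reports whether status is one of the codes the answer says
// to back off on: 403, 429, or any 5xx.
func shouldBackOff(status int) bool {
	return status == http.StatusForbidden ||
		status == http.StatusTooManyRequests ||
		status >= 500
}

// fetch (hypothetical helper) retries with a doubling delay and gives up
// after maxAttempts.
func fetch(client *http.Client, url, userAgent, token string, maxAttempts int, baseDelay time.Duration) ([]byte, error) {
	delay := baseDelay
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		req, err := newRequest(url, userAgent, token)
		if err != nil {
			return nil, err
		}
		resp, err := client.Do(req)
		if err != nil {
			return nil, err
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			return nil, err
		}
		if resp.StatusCode == http.StatusOK {
			return body, nil
		}
		if !shouldBackOff(resp.StatusCode) {
			return nil, fmt.Errorf("unexpected status %s", resp.Status)
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return nil, fmt.Errorf("giving up after %d attempts", maxAttempts)
}

func main() {
	// Local stand-in for GitHub: answers 429 twice, then 200.
	var calls int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&calls, 1) < 3 {
			w.WriteHeader(http.StatusTooManyRequests)
			return
		}
		fmt.Fprintf(w, "ok for %s", r.Header.Get("User-Agent"))
	}))
	defer srv.Close()

	body, err := fetch(srv.Client(), srv.URL, "my-downloader/0.1", "", 5, 10*time.Millisecond)
	if err != nil {
		fmt.Println("[error]", err)
		return
	}
	fmt.Printf("%s after %d calls\n", body, atomic.LoadInt32(&calls))
}
```

Against the real service you would pass the archive URL and a token (for example, one read from an environment variable) instead of the test server's URL, and use much longer delays than the 10 ms chosen here to keep the demo fast.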

If you need to download many archives for the same repository, clone it and use git archive, which is far more efficient. Caching data instead of requesting it multiple times is also recommended.
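The `git archive` suggestion could be driven from Go roughly as follows. This is only a sketch: it assumes `git` is installed and that the repository has already been cloned locally, and the `archiveArgs` helper and the command-line layout are invented for the example.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// archiveArgs (hypothetical helper) returns the git arguments that write the
// tree at ref, from the clone in repoDir, to outFile as a gzipped tarball.
// Because of -C, a relative outFile is resolved relative to repoDir, so an
// absolute path is the safer choice.
func archiveArgs(repoDir, ref, outFile string) []string {
	return []string{"-C", repoDir, "archive", "--format=tar.gz", "--output=" + outFile, ref}
}

func main() {
	if len(os.Args) != 4 {
		fmt.Println("usage: archive <clone-dir> <ref> <out.tar.gz>")
		return
	}
	cmd := exec.Command("git", archiveArgs(os.Args[1], os.Args[2], os.Args[3])...)
	cmd.Stderr = os.Stderr // surface git's own error messages
	if err := cmd.Run(); err != nil {
		fmt.Println("[error]", err)
		os.Exit(1)
	}
}
```

With a clone of kubeflow/manifests in place, running this with the clone directory, a tag such as v1.0.2, and an output path produces each archive locally, with no further HTTP requests to GitHub beyond the initial clone and later fetches.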

If you do all of these things, you'll probably find that your requests work.