1 vote

I'm making a program that takes a person's tweet and, if it contains an image, downloads it.

Why can I download an image from this URL (Example #1):
http://www.google.co.id/intl/en_com/images/logo_plain.png
and not from this URL (Example #2)?
https://www.google.com/imgres?imgurl=https://pbs.twimg.com/media/DR-kkH4XcAAQ-vc.jpg&imgrefurl=https://twitter.com/officialmcafee/status/945655402276024320&h=1200&w=992&tbnid=0q3B6ZB_UxjRIM&tbnh=247&tbnw=204&usg=__xvjbjSSMvuImESBLVvBBrUagUe8=&docid=vdqkoUmaefYoFM

Example #1

#include <iostream> 
#include <curl/curl.h> 

using namespace std;
int main()
{
    CURL *image;
    CURLcode imgresult;
    FILE *fp = nullptr;
    const char *url = "http://www.google.co.id/intl/en_com/images/logo_plain.png";
    image = curl_easy_init();
    if (image)
    {
        // Open file 
        fp = fopen("img.png", "wb");
        if (fp == NULL) cout << "File cannot be opened";

        curl_easy_setopt(image, CURLOPT_WRITEFUNCTION, NULL);
        curl_easy_setopt(image, CURLOPT_WRITEDATA, fp);
        curl_easy_setopt(image, CURLOPT_URL, url);
        // Grab image 
        imgresult = curl_easy_perform(image);
        if (imgresult)
            cout << "Cannot grab the image!\n";
    }
    // Clean up the resources 
    curl_easy_cleanup(image);
    // Close the file 
    fclose(fp);
    system("pause");
    return 0;
}

Example #2

#include <iostream> 
#include <curl/curl.h> 

using namespace std;
int main()
{
    CURL *image;
    CURLcode imgresult;
    FILE *fp = nullptr;
    const char *url = "https://www.google.com/imgres?imgurl=https://pbs.twimg.com/media/DR-kkH4XcAAQ-vc.jpg&imgrefurl=https://twitter.com/officialmcafee/status/945655402276024320&h=1200&w=992&tbnid=0q3B6ZB_UxjRIM&tbnh=247&tbnw=204&usg=__xvjbjSSMvuImESBLVvBBrUagUe8=&docid=vdqkoUmaefYoFM";
    image = curl_easy_init();
    if (image)
    {
        // Open file 
        fp = fopen("img.png", "wb");
        if (fp == NULL) cout << "File cannot be opened";

        curl_easy_setopt(image, CURLOPT_WRITEFUNCTION, NULL);
        curl_easy_setopt(image, CURLOPT_WRITEDATA, fp);
        curl_easy_setopt(image, CURLOPT_URL, url);
        // Grab image 
        imgresult = curl_easy_perform(image);
        if (imgresult)
            cout << "Cannot grab the image!\n";
    }
    // Clean up the resources 
    curl_easy_cleanup(image);
    // Close the file 
    fclose(fp);
    system("pause");
    return 0;
}
Most likely because you're obviously getting an HTTP redirect from Google. I'm not familiar with libcurl, but it's possible that there's an option that can be set to automatically follow HTTP redirects. If not, you will have to do more work to extract the real URL from Google's response and attempt to download from the real URL. – Sam Varshavchik

@SamVarshavchik But even if I use the address of the original image, pbs.twimg.com/media/DR-kkH4XcAAQ-vc.jpg, it still doesn't work. – user9143463

I had no problems whatsoever using the curl command-line client to download the image from the URL you included in your comment. Like I said, I am not familiar with libcurl, but a brief Google search found the documentation for libcurl, and after reading curl_easy_setopt()'s documentation and looking at your code, it seems obvious why you don't download anything. That's what you told the library to do: not download anything. You set CURLOPT_WRITEFUNCTION to NULL. It seems you told the library to ignore everything it downloads (no write function). So what did you expect to happen? – Sam Varshavchik

I don't know what you're doing, man, but I changed the CURLOPT_WRITEFUNCTION and it still won't download. – user9143463

Just switched it back to NULL. Now it works. I have no idea what was going on. – user9143463

1 Answer

0 votes

First, the second URL is not a link to an image; it is an HTML page. Your code doesn't download the image that the HTML page refers to, only the HTML page itself.

Second, you are not following redirects. Add one more option:

curl_easy_setopt(image, CURLOPT_FOLLOWLOCATION, 1L);

Third, you'd better pretend to be a browser:

curl_easy_setopt(image, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36");

Once I added both options, I managed to download your link.