I am trying to scrape this URL, https://www.myntra.com/laptop-bag/chumbak/chumbak-unisex-brown-geo-bird--printed-laptop-bag/6795882/buy, using Puppeteer. It works when I use { headless: false }, but fails in headless mode.

I then compared the responses in both cases like this:

const resp = await page.goto(url);
console.log(resp);

From that I figured out that a user agent needs to be set when running in headless mode, so I added this:

await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');
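For context on why the user agent matters here: in classic headless mode, Chrome's default user agent contains the token HeadlessChrome, which a server can match easily. Below is a minimal sketch of rewriting that default instead of hard-coding an old version string. toHeadfulUserAgent is a hypothetical helper and the sample string is only modelled on a headless user agent; newer headless modes may not include the token at all.

```javascript
// Hypothetical helper: make a classic headless UA look like headful Chrome.
// If the token is absent, the string is returned unchanged.
function toHeadfulUserAgent(ua) {
  return ua.replace('HeadlessChrome/', 'Chrome/');
}

// Sample input modelled on a classic headless Chrome UA (assumption, not captured).
const headlessUa =
  'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) ' +
  'HeadlessChrome/61.0.3163.100 Safari/537.36';

const headfulUa = toHeadfulUserAgent(headlessUa);
console.log(headfulUa);

// With Puppeteer this would be used roughly as:
//   await page.setUserAgent(toHeadfulUserAgent(await browser.userAgent()));
```

This keeps the reported Chrome version in line with the browser that is actually running, which a fixed string cannot do.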

Now it works in both cases locally. But when I deploy to a cloud function, it still fails.

This is the screenshot taken using Puppeteer: [image]

This is part of the response log:

_headers: 
   { status: '403',
     server: 'AkamaiGHost',
     'mime-version': '1.0',
     'content-type': 'text/html',
     'content-length': '395',
     expires: 'Thu, 09 Jul 2020 12:16:30 GMT',
     date: 'Thu, 09 Jul 2020 12:16:30 GMT',
     'set-cookie': 'AKA_A2=A; expires=Thu, 09-Jul-2020 13:16:30 GMT........
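A check like the one below could flag this kind of response in code rather than by reading the log. It is only a sketch: looksLikeAkamaiBlock is a hypothetical helper, and it assumes the status comes from resp.status() and the headers from resp.headers(), which Puppeteer returns with lower-case names.

```javascript
// Hypothetical helper: does this response look like an Akamai block page?
// `status` is a number, `headers` a plain object with lower-case keys.
function looksLikeAkamaiBlock(status, headers) {
  const server = (headers['server'] || '').toLowerCase();
  return status === 403 && server.includes('akamai');
}

// Values copied from the response log above.
const blocked = looksLikeAkamaiBlock(403, {
  server: 'AkamaiGHost',
  'content-type': 'text/html',
  'content-length': '395',
});
console.log(blocked); // true

// With Puppeteer this would be used roughly as:
//   const resp = await page.goto(url);
//   if (looksLikeAkamaiBlock(resp.status(), resp.headers())) { /* retry, log, ... */ }
```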

Am I missing anything?

Thanks.

Update:

I have now tried the puppeteer stealth plugin along with IP rotation. Here is the code:

const puppeteer = require('puppeteer-extra');

const StealthPlugin = require('puppeteer-extra-plugin-stealth')
puppeteer.use(StealthPlugin())

const AdblockerPlugin = require('puppeteer-extra-plugin-adblocker')
puppeteer.use(AdblockerPlugin({ blockTrackers: true }))

And for IP rotation:

var browser = await puppeteer.launch({
  headless: true,
  args: ['--proxy-server=abcd-efg.proxymesh.com:12345']
});

var page = await browser.newPage();

await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36');

await page.authenticate({
  username: 'myusername',
  password: 'mypassword'
});

IP rotation works locally, but requests are still blocked on the cloud function.

The IP and headless footprint are probably banned by Akamai. You have to use stealth and other bypassing techniques. - Md. Abu Taher
@Md.AbuTaher I just tried using puppeteer-extra, but it is still not working on the cloud function. - vjnan369
Did you try the puppeteer stealth plugin with some dedicated proxies? - Md. Abu Taher
@Md.AbuTaher Yes, I have now tried it with ProxyMesh. IP rotation works locally, but I still get the same error in the cloud function. - vjnan369
If the stealth plugin and proxies are not working, then you need to compare the request headers for the two instances, local and cloud function. It could be that the cloud function is not sending the same headers. - Tom
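The header comparison suggested in the last comment could be sketched as follows. diffHeaders is a hypothetical helper, and the two sample objects are illustrative values, not headers captured from the real runs; in Puppeteer the inputs could come from request.headers() inside a page.on('request', ...) handler in each environment.

```javascript
// Hypothetical helper: list headers whose values differ between two runs.
// Both arguments are plain objects with lower-case header names.
function diffHeaders(local, cloud) {
  const names = new Set([...Object.keys(local), ...Object.keys(cloud)]);
  const diff = {};
  for (const name of names) {
    if (local[name] !== cloud[name]) {
      diff[name] = { local: local[name], cloud: cloud[name] };
    }
  }
  return diff;
}

// Illustrative values only (assumption): the cloud run lacks accept-language.
const headerDiff = diffHeaders(
  { 'user-agent': 'UA-A', 'accept-language': 'en-US' },
  { 'user-agent': 'UA-A' }
);
console.log(headerDiff);
```

An empty result would suggest the headers are not the cause, which points back at the IP address.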

1 Answer


Using residential proxies fixed the issue.

Initially I deployed to a cloud function and to AWS Lambda with IP rotation through the ProxyMesh service, but it provides data center proxies only, and the requests were still blocked. Then I tried residential proxies from another service, and it worked.
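Since the fix was a change of proxy provider, it can help to keep the proxy in a single URL so that swapping providers needs no code change. The sketch below assumes the proxy is given as a URL of the form http://user:pass@host:port; parseProxy and the example.com endpoint are hypothetical placeholders, not the service that was actually used.

```javascript
// Hypothetical helper: split a proxy URL into a Chrome flag and credentials.
function parseProxy(proxyUrl) {
  const u = new URL(proxyUrl);
  return {
    launchArg: `--proxy-server=${u.host}`,
    credentials: {
      username: decodeURIComponent(u.username),
      password: decodeURIComponent(u.password),
    },
  };
}

// Placeholder endpoint; a real one would come from the proxy provider.
const proxy = parseProxy('http://myusername:mypassword@proxy.example.com:12345');
console.log(proxy.launchArg); // --proxy-server=proxy.example.com:12345

// With Puppeteer this would be used roughly as:
//   const browser = await puppeteer.launch({ headless: true, args: [proxy.launchArg] });
//   const page = await browser.newPage();
//   await page.authenticate(proxy.credentials);
```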