
After migrating an ASP.NET application from .NET Framework 4.0 to 4.5, some crawlers no longer seem to be detected.

I have a .browser file in the App_Browsers directory that contains the following definition:

<browser id="Baiduspider" parentID="Default">
  <sampleHeaders>
    <header name="Connection" value="close"/>
    <header name="Accept" value="*/*"/>
    <header name="Accept-Encoding" value="gzip"/>
    <header name="Accept-Language" value="zh-cn,zh-tw"/>
    <header name="Host" value="www.example.com"/>
    <header name="User-Agent" value="Baiduspider+(+http://www.baidu.com/search/spider.html)"/>
  </sampleHeaders>
  <identification>
    <userAgent match="Baiduspider"/>
  </identification>
  <capabilities>
    <capability name="crawler" value="true"/>
    <capability name="browser" value="Baidu.com"/>
    <capability name="majorversion" value="0"/>
    <capability name="minorversion" value=".0"/>
    <capability name="version" value="0.0"/>
  </capabilities>
</browser>

But Request.Browser.Crawler returns false for this Baidu.com User-Agent:

Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)

I suspect that Baidu's User-Agent string is malformed because of the parentheses, but the same definition worked under .NET Framework 4.0.

Can anyone help me?

Thanks in advance!


1 Answer


Try changing

<browser id="Baiduspider" parentID="Default">

to

<browser id="Baiduspider" parentID="Mozilla">

The file v4.0.30319\Config\Browsers\generic.browser probably has something like this:

<browser id="Mozilla" parentID="Default">
  <identification>
    <userAgent match="Mozilla" />
  </identification>
  <!-- ... -->
</browser>

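So maybe in 4.5 the request matches that Mozilla definition "first" and never reaches your custom definition. Both definitions are children of Default, and only one child is followed at each level of the browser tree. Because the Baidu User-Agent starts with "Mozilla/5.0", the Mozilla node can claim the request if it is tested before your Baiduspider sibling. Making Baiduspider a child of Mozilla means it is evaluated after the Mozilla node has matched. The parentheses are probably not the problem: the match attribute is a regular expression, and "Baiduspider" still occurs in the string.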
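A minimal sketch of the complete App_Browsers file with that change applied, assuming the machine-level browser files define a node with id="Mozilla" (if no such id exists, the parentID reference cannot be resolved and ASP.NET should report an error). The sampleHeaders section from the question can be kept as it is; it is left out here only for brevity.

```xml
<browsers>
  <!-- Assumption: a browser node with id="Mozilla" exists in the framework's
       Config\Browsers files. Parenting under it means this node is only
       tested after the Mozilla node has matched the User-Agent. -->
  <browser id="Baiduspider" parentID="Mozilla">
    <identification>
      <!-- Regular expression; matches "Baiduspider/2.0" anywhere in the string -->
      <userAgent match="Baiduspider" />
    </identification>
    <capabilities>
      <capability name="crawler" value="true" />
      <capability name="browser" value="Baidu.com" />
      <capability name="majorversion" value="0" />
      <capability name="minorversion" value=".0" />
      <capability name="version" value="0.0" />
    </capabilities>
  </browser>
</browsers>
```

With this in place, Request.Browser.Crawler should return true for "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)". The trade-off is that older Baidu User-Agents that do not start with "Mozilla" (such as the one in the question's sampleHeaders) would no longer reach this node.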