0
votes

I'm trying to extract links from a page's source in a Windows Store app. I'm using HtmlAgilityPack, and here is my code:

HttpClient client = new HttpClient();
client.MaxResponseContentBufferSize = 256000;
client.DefaultRequestHeaders.Add("user-agent",
    "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)");

string source = await client.GetStringAsync(url);
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(source);
List<String> links = doc
    .DocumentNode
    .SelectNodes("//a[@href]")
    .Select(node => node.Attributes["href"].Value)
    .ToList();

I'm getting this error on the line where the variable doc is created:

The type 'System.Xml.XPath.IXPathNavigable' is defined in an assembly that is not referenced. You must add a reference to assembly 'System.Xml.XPath, Version=2.0.5.0, Culture=neutral, PublicKeyToken=31bf3856ad364e35'.

But when I add a reference to System.Xml.XPath from the MicrosoftSDKs folder, I get:

Cannot find type System.SystemException in module mscorlib.dll

How can I fix this?

2 Answers

0
votes

Look at this simple article: C# Scraping

For instance, this is pretty easy:

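using System.Diagnostics;
using System.Net;

class Program
{
    static void Main()
    {
        // Scrape links from wikipedia.org.
        // Note: LinkItem and LinkFinder are not framework types; they are
        // helper types that must be defined separately (presumably in the
        // linked article).

        // 1. Download the page source.
        // URL: http://en.wikipedia.org/wiki/Main_Page
        using (WebClient w = new WebClient())
        {
            string s = w.DownloadString("http://en.wikipedia.org/wiki/Main_Page");

            // 2. Print every link found in the page.
            foreach (LinkItem i in LinkFinder.Find(s))
            {
                Debug.WriteLine(i);
            }
        }
    }
}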
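
The snippet above relies on LinkItem and LinkFinder, which it does not define. As a rough sketch only, they could look something like the following. The field names (Href, Text) and the regular expression are assumptions for illustration, not the article's actual code, and a regex is only a heuristic for HTML; a real parser copes better with malformed markup.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the LinkItem type used above.
public struct LinkItem
{
    public string Href;
    public string Text;

    public override string ToString()
    {
        return Href + " (" + Text + ")";
    }
}

// Hypothetical stand-in for the LinkFinder helper used above.
public static class LinkFinder
{
    // Matches <a ... href="..." ...>inner</a> with single or double quotes.
    private static readonly Regex Anchor = new Regex(
        @"<a\b[^>]*?\bhref\s*=\s*[""']([^""']*)[""'][^>]*>(.*?)</a>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<LinkItem> Find(string html)
    {
        var items = new List<LinkItem>();
        foreach (Match m in Anchor.Matches(html))
        {
            var item = new LinkItem();
            item.Href = m.Groups[1].Value;
            // Strip any nested tags from the link text.
            item.Text = Regex.Replace(m.Groups[2].Value, "<[^>]+>", "").Trim();
            items.Add(item);
        }
        return items;
    }
}
```

With these two types in the project, the foreach loop in Main should compile and write one line per anchor that has an href attribute.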
0
votes

I think the suggestion to add a reference to System.Xml.XPath is off the mark. When I compile your code, HttpClient cannot be resolved. Once I add a reference to System.Net.Http and put using System.Net.Http; at the top of the file containing your code, it compiles as soon as url is defined (e.g., var url = "http://apps.microsoft.com/windows/en-us/app/appstudio-contoso-sample-app/748084e6-e1da-40d5-9571-35c750b26d5e";).
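
To make that concrete, here is a minimal sketch of what the complete file might look like with the using directives in place. The LinkExtractor class and its method names are hypothetical wrappers added for illustration. It assumes the project references System.Net.Http and an HtmlAgilityPack build that exposes SelectNodes. It also guards against SelectNodes returning null, which it can do when no node matches.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;          // HttpClient
using System.Threading.Tasks;
using HtmlAgilityPack;          // HtmlDocument

// Hypothetical wrapper around the code from the question.
public static class LinkExtractor
{
    // Parses HTML that is already in memory, so it can be tried offline.
    public static List<string> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes can return null when nothing matches the XPath.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return new List<string>();
        }

        return anchors
            .Select(node => node.Attributes["href"].Value)
            .ToList();
    }

    // Downloads the page, then reuses the offline parser above.
    public static async Task<List<string>> GetLinksAsync(string url)
    {
        using (var client = new HttpClient())
        {
            client.MaxResponseContentBufferSize = 256000;
            string source = await client.GetStringAsync(url);
            return ExtractLinks(source);
        }
    }
}
```

Separating the parsing from the download keeps the XPath part easy to check against a literal HTML string before any network call is involved.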