I am taking a stab at Html Agility Pack and having trouble finding the right way to go about this. For example, I want to get the content of the second span tag, and this is the XPath I tried:

htmlDoc.DocumentNode.SelectSingleNode("//div[@style='color:#000000; padding: 10px;']/table/tr[1]/td[1]/span[2]").InnerText;

Here is the HTML file that I want to parse using Html Agility Pack:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
</head>
<body onload="oload()" onunload="Unload()">

<div id="content">
<table width="100%">
<tr>
    <td width="48%" valign="top">
<fieldset style="border:1px solid #ccc;color:#ccc;margin:0;padding:0;">
<legend style="color:#ccc;margin:0 0 0 10px;padding:0 3px;">Profile Information</legend>
<div style="color:#000000; padding: 10px;">
<br />
Name Surname:<br />
<span style="font-size:18px;">John Doe</span>
<br /><br /><br />
Address:<br />
<span style="font-size:18px;">706 test<br>NY 14013</span>
<br /><br /><br />
</div>
</fieldset>
<br />
</td>
    <td width="52%" align="right" valign="top">
</td>
</tr>
</table>
    </div>
</body>
</html>

2 Answers

0
votes

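According to the HTML snippet posted, all the span elements, including the target span[2], are direct children of the div; there is no table, tr, or td between them. The /table/tr[1]/td[1] steps in your XPath therefore match nothing, and the correct XPath is simply: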

//div[@style='color:#000000; padding: 10px;']/span[2]

Online demo: https://dotnetfiddle.net/mRfLEQ

Output:

706 testNY 14013
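
For reference, here is a minimal sketch of how that XPath could be used from C#. It assumes the HtmlAgilityPack package is referenced; the class name SecondSpanDemo and the trimmed-down HTML fragment are only for illustration and are not from the question. SelectSingleNode returns null when nothing matches, so the sketch guards before reading InnerText. It should print the same text as the output above.

```csharp
using System;
using HtmlAgilityPack;

class SecondSpanDemo // hypothetical name, for illustration only
{
    static void Main()
    {
        // Trimmed-down copy of the relevant fragment from the question (an assumption:
        // the surrounding table/fieldset markup is left out for brevity).
        string html = @"<div style=""color:#000000; padding: 10px;"">
Name Surname:<br />
<span style=""font-size:18px;"">John Doe</span>
<br />
Address:<br />
<span style=""font-size:18px;"">706 test<br>NY 14013</span>
</div>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // span[2] selects the second span that is a direct child of the matched div.
        var span = doc.DocumentNode.SelectSingleNode(
            "//div[@style='color:#000000; padding: 10px;']/span[2]");

        // SelectSingleNode returns null when the XPath matches nothing,
        // so check before touching InnerText.
        Console.WriteLine(span != null ? span.InnerText : "(no match)");
    }
}
```

Note that the style attribute is compared as an exact string, so a change in spacing or property order in the page's markup would stop the XPath from matching.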
0
votes

Try this:

using System;
using HtmlAgilityPack;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            String html = @"<!DOCTYPE html PUBLIC ""-//W3C//DTD XHTML 1.0 Transitional//EN"" ""http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"">
<html xmlns=""http://www.w3.org/1999/xhtml"">
<head>
</head>
<body onload=""oload()"" onunload=""Unload()"">

<div id=""content"">
<table width=""100%"">
<tr>
    <td width=""48%"" valign=""top"">
<fieldset style=""border:1px solid #ccc;color:#ccc;margin:0;padding:0;"">
<legend style=""color:#ccc;margin:0 0 0 10px;padding:0 3px;"">Profile Information</legend>
<div style=""color:#000000; padding: 10px;"">
<br />
Name Surname:<br />
<span style=""font-size:18px;"">John Doe</span>
<br /><br /><br />
Address:<br />
<span style=""font-size:18px;"">706 test<br>NY 14013</span>
<br /><br /><br />
</div>
</fieldset>
<br />
</td>
    <td width=""52%"" align=""right"" valign=""top"">
</td>
</tr>
</table>
    </div>
</body>
</html>";

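            var doc = new HtmlAgilityPack.HtmlDocument();
            doc.LoadHtml(html);
            // SelectNodes returns every <span> in document order, or null if none match.
            var spans = doc.DocumentNode.SelectNodes("//span");
            // The collection is zero-based, so index 1 is the second span.
            if (spans != null && spans.Count > 1)
            {
                Console.WriteLine(spans[1].InnerText);
            }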
        }
    }
}

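Basically, doc.DocumentNode.SelectNodes("//span") returns all the span nodes in document order, and you index into the result to read the InnerText of the one you want. The collection is zero-based, so spans[1] is the second span. Keep in mind that this depends on the span's position in the whole document, so it will pick a different node if another span is added earlier in the page.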