I am having trouble getting some specific tables with HTML Agility Pack. I cannot change the actual HTML, so I can't add IDs, classes, or anything else to select on.
Can someone show me how I would access each individual table in the following HTML?
<table class="newTable">
//table 1 contents
<table border="0" cellpadding="3" cellspacing="2" width="100%">
//table 1 - A contents
</table>
</table>
<table border="0" cellpadding="0" cellspacing="0" class="newTable">
//table 2 contents
<table width="100%" border="0" cellspacing="2" cellpadding="0">
//table 2 - A contents
</table>
<table width="100%" border="0" cellspacing="2" cellpadding="0">
//table 2 - B contents
</table>
<table width="100%" cellspacing="2" cellpadding="0">
//table 2 - C contents
</table>
</table>
<table>
//table 3 contents
</table>
Right now, if I were to call the following,
HtmlNode table = doc.DocumentNode.SelectSingleNode("//table");
foreach (var cell in table.SelectNodes("//tr/td"))
{
    string someVariable = cell.InnerText;
}
I would go through the cells of every table, not just the first one. I want to access each table separately so that I can match its data to where I am storing it.
I have tried something like
doc.DocumentNode.SelectNodes("//table[1]");
but using an index does not seem to work. When I try to specify a table with it, it still reads in either all of the tables or none.
The same applies to the following: it either does not work at all or gets everything.
foreach (var cell in table.SelectNodes("//table").Skip(some_number))
{
    string someVariable = cell.InnerText;
}
I am using HTML Agility Pack 1.4.9, installed from NuGet.
EDIT:
My attempts to get ONLY Table 1 - A's contents. Both give null or EncodingFound exceptions.
HtmlNode table = doc.DocumentNode.SelectSingleNode("//table/tr/td/table[1]");
HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[1]/tr/td/table[1]");