0
votes

I'm using Html Agility Pack to extract text from a node.

            var sb = new StringBuilder();
            foreach (HtmlNode innernode in node.SelectNodes("//*[not(self::script or self::style)]/text()[not(normalize-space(.)='')]"))
            {
                sb.Append(innernode.InnerText);
            }
            Console.WriteLine(sb.ToString());

I'm using this code. I want to extract text from "node" and its child nodes, but this XPath query returns results from the whole HTML document (I guess it starts searching from the root node). I know this is stupid, but how can I update the XPath so that it searches only in "node" and its child nodes? :)

Thanks

1
// means "start at the top of the document and go to any depth" so it is doing exactly what you said - matt
See w3schools.com/xpath/xpath_syntax.asp for XPath syntax. To say "only direct children of this node", start with ./ - matt

1 Answer

2
votes

To include the text nodes of node itself as well as those of its children (and, I assume, all other descendants), you probably want:

./descendant-or-self::*[not(self::script or self::style)]/text()[not(normalize-space(.)='')]

.//*[not(self::script or self::style)]/text()[not(normalize-space(.)='')] will not include the text nodes that are direct children of node, since it is short for ./descendant-or-self::node()/*[not(self::script or self::style)]..., which only ever selects elements below node, never node itself.
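
For reference, here is a rough sketch of how the relative query could be plugged into the loop from the question. The sample HTML, the id "target", and the NodeTextDemo / ExtractText names are made up for illustration; only the XPath comes from the answer above. SelectNodes returns null rather than an empty collection when nothing matches, so the sketch checks for that before looping.

```csharp
using System;
using System.Text;
using HtmlAgilityPack;

public static class NodeTextDemo
{
    // Relative XPath from the answer: the leading "." anchors the search at
    // the context node instead of the document root.
    private const string TextNodesXPath =
        "./descendant-or-self::*[not(self::script or self::style)]" +
        "/text()[not(normalize-space(.)='')]";

    // Hypothetical helper: concatenates the non-blank text found in
    // `node` and its descendants.
    public static string ExtractText(HtmlNode node)
    {
        var sb = new StringBuilder();
        HtmlNodeCollection textNodes = node.SelectNodes(TextNodesXPath);

        // SelectNodes yields null when there are no matches.
        if (textNodes != null)
        {
            foreach (HtmlNode textNode in textNodes)
            {
                sb.Append(textNode.InnerText);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Made-up sample document for illustration only.
        var doc = new HtmlDocument();
        doc.LoadHtml(
            "<html><body><p>outside</p>" +
            "<div id='target'>direct <b>nested</b></div>" +
            "</body></html>");

        HtmlNode node = doc.DocumentNode.SelectSingleNode("//div[@id='target']");
        Console.WriteLine(ExtractText(node));
    }
}
```

With the leading // from the question, the text of the p element outside the div would be picked up as well; with the relative form it should not be.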