4
votes

I have an XmlNode whose body looks like this (via OpenCalais):

    <SocialTag importance="2">Signal processing
    <originalValue>Signal processing</originalValue>
    </SocialTag>

When I call XmlNode.InnerText on it, I get back:

SignalprocessingSignalprocessing

However, I only want the text from the tag itself, not the InnerText of the child originalValue node.

When I call XmlNode.Value, it returns null.

How can I get just the InnerText of this node, without concatenating all of the InnerTexts of other child nodes?


4 Answers

8
votes

The text inside the XmlNode is actually another XmlNode of type text. This should work:

    socialTagNode.ChildNodes[0].Value
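
To make the one-liner above concrete, here is a minimal sketch that loads the XML from the question and reads the first child's Value. The class name FirstTextChildDemo and the helper FirstTextValue are hypothetical names used only for illustration, and the Trim() call is an assumption on my part: the text node in the question ends with a line break before the originalValue element, so the raw Value carries trailing whitespace.

```csharp
using System;
using System.Xml;

public static class FirstTextChildDemo
{
    // Hypothetical helper: returns the trimmed value of the first child
    // when that child is a Text or CDATA node, otherwise null.
    public static string FirstTextValue(XmlNode node)
    {
        XmlNode first = node?.FirstChild;
        if (first == null)
            return null;
        if (first.NodeType != XmlNodeType.Text && first.NodeType != XmlNodeType.CDATA)
            return null;
        // Assumption: surrounding whitespace is not wanted, so it is trimmed.
        return first.Value.Trim();
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml(
            "<SocialTag importance=\"2\">Signal processing\n" +
            "<originalValue>Signal processing</originalValue>\n" +
            "</SocialTag>");

        XmlNode socialTagNode = doc.DocumentElement;

        // An element node has no Value of its own, which is why the question saw null.
        Console.WriteLine(socialTagNode.Value == null);   // True
        Console.WriteLine(FirstTextValue(socialTagNode)); // Signal processing
    }
}
```

The node type check guards against the case where the first child is an element or a comment rather than text, in which case ChildNodes[0].Value would not be what you want.
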
1
votes

From the docs for XmlElement.InnerText:

Gets or sets the concatenated values of the node and all its children.

While this statement is not entirely clear, it implies that the property descends the DOM hierarchy under the element and concatenates all text values into the returned value, which is the behavior you are seeing.

Extending the accepted answer, here are extension methods adapted from the reference source that collect and return all immediate text children of a given node:

using System.Collections.Generic;
using System.Text;
using System.Xml;

public static partial class XmlNodeExtensions
{
    /// <summary>
    /// Returns all immediate text values of the given node, concatenated into a string
    /// </summary>
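    /// <param name="node">The node whose immediate text children are read.</param>
    /// <returns>The concatenated text, an empty string if there are no children, or null if the node is null.</returns>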
    public static string SelfInnerText(this XmlNode node)
    {
        // Adapted from http://referencesource.microsoft.com/#System.Xml/System/Xml/Dom/XmlNode.cs,66df5d2e6b0bf5ae,references
        if (node == null)
            return null;
        else if (node is XmlProcessingInstruction || node is XmlDeclaration || node is XmlCharacterData)
        {
            // These are overridden in the reference source.
            return node.InnerText;
        }
        else
        {
            var firstChild = node.FirstChild;
            if (firstChild == null)
                return string.Empty;
            else if (firstChild.IsNonCommentText() && firstChild.NextSibling == null)
                return firstChild.InnerText; // Optimization.
            var builder = new StringBuilder();
            for (var child = firstChild; child != null; child = child.NextSibling)
            {
                if (child.IsNonCommentText())
                    builder.Append(child.InnerText);
            }
            return builder.ToString();
        }
    }

    /// <summary>
    /// Enumerates all immediate text values of the given node.
    /// </summary>
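    /// <param name="node">The node whose immediate text children are read.</param>
    /// <returns>One string per immediate text child, in document order.</returns>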
    public static IEnumerable<string> SelfInnerTexts(this XmlNode node)
    {
        // Adapted from http://referencesource.microsoft.com/#System.Xml/System/Xml/Dom/XmlNode.cs,66df5d2e6b0bf5ae,references
        if (node == null)
            yield break;
        else if (node is XmlProcessingInstruction || node is XmlDeclaration || node is XmlCharacterData)
        {
            // These are overridden in the reference source.
            yield return node.InnerText;
        }
        else
        {
            var firstChild = node.FirstChild;
            for (var child = firstChild; child != null; child = child.NextSibling)
            {
                if (child.IsNonCommentText())
                    yield return child.InnerText;
            }
        }
    }

    public static bool IsNonCommentText(this XmlNode node)
    {
        return node != null &&
            (node.NodeType == XmlNodeType.Text || node.NodeType == XmlNodeType.CDATA
            || node.NodeType == XmlNodeType.Whitespace || node.NodeType == XmlNodeType.SignificantWhitespace);
    }
}

Then use it like:

var value = xmlNode.SelfInnerText();

Sample fiddle.

0
votes

You could try the following, where node is your tag:

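var result = "";
var nodes = node.childNodes;
for (var i = 0, len = nodes.length; i < len; i++)
{
   // Use a separate name so the loop does not overwrite the outer node.
   var child = nodes[i];
   // Keep only direct text nodes; child elements are skipped.
   if (child.nodeType == child.TEXT_NODE)
   {
       result += child.nodeValue;
   }
}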

It should concatenate all the text nodes inside your main node and ignore child elements.

0
votes

So there are a few things:

  1. InnerText, by definition, gives you the text of all child nodes. Asking for "the InnerText of [just] this node" doesn't make sense in terms of the tools the API gives you.
  2. What you're looking for is a child node of Text type (or possibly CDATA, depending on your circumstances). Most of the time (always?) this will be the first ChildNode.
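
As a rough illustration of point 2, the sketch below filters the immediate children by node type instead of relying on the text being the first child. The names DirectTextDemo and DirectText are hypothetical, the sample XML with a CDATA section is made up for the demonstration, and trimming the result is an assumption about what the caller wants.

```csharp
using System;
using System.Linq;
using System.Xml;

public static class DirectTextDemo
{
    // Hypothetical helper: concatenates the values of the immediate
    // Text and CDATA children and ignores child elements entirely.
    public static string DirectText(XmlNode node)
    {
        if (node == null)
            return null;

        var parts = node.ChildNodes
            .Cast<XmlNode>() // XmlNodeList is a non-generic IEnumerable
            .Where(c => c.NodeType == XmlNodeType.Text || c.NodeType == XmlNodeType.CDATA)
            .Select(c => c.Value);

        // Assumption: leading and trailing whitespace is not wanted.
        return string.Concat(parts).Trim();
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml(
            "<SocialTag importance=\"2\">Signal <![CDATA[processing]]>" +
            "<originalValue>ignored</originalValue></SocialTag>");

        Console.WriteLine(DirectText(doc.DocumentElement)); // Signal processing
    }
}
```

Unlike ChildNodes[0].Value, this still works when the text is split across several nodes or when an element or comment comes first.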