1
votes

I have the following HTML source loaded in a UIWebView, and I want to extract only the user-facing text:
text1
text2 text2
text3 text3 text3

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
    <title>1322170516271</title>
    <meta name="viewport" content="initial-scale=1.0, user-scalable=1, minimum-scale=1.0, maximum-scale=4.0">                   

    <style type="text/css">
    body
    {
        padding: 5px;
        margin: 0px;
        font-family: Helvetica, Arial;
        font-size: 12pt;
        background-color: #efefef;
        background-image: url(ArticleBackground.jpg);
        background-position: cover;
        color: #000000;
    }
    h1
    {
        text-align: center;
        border-bottom: 1px dotted #805050;
        font-size: 28px;
        line-height: 38px;
        margin-bottom: 30px;
        text-shadow: 0 2px 1px white;
        color: #803030;
    }
    </style>

</head>

<body>

    <script type="text/javascript">
    function printMe()
    {
        print();
    }
    </script>

    <div style='align:center; padding: 20px;'>

        <div>

    <b>text1</b><br><br>

    <h2>
      text2 text2
    </h2>
    <br>
    text3 text3 text3

        </div>

    </div>

</body>
</html>

However, this is what I get when I use:

[webView stringByEvaluatingJavaScriptFromString:@"document.documentElement.textContent"]

I don't need the CSS rules for body and h1, or the script source. I only want the actual user-facing text. The result is:

234534546



    body
{
    padding: 5px;
    margin: 0px;
    font-family: Helvetica, Arial;
    font-size: 12pt;
    background-color: #efefef;
    background-image: url(ArticleBackground.jpg);
    background-position: cover;
    color: #000000;
}
h1
{
    text-align: center;
    border-bottom: 1px dotted #805050;
    font-size: 28px;
    line-height: 38px;
    margin-bottom: 30px;
    text-shadow: 0 2px 1px white;
    color: #803030;
}







    function printMe()
    {
        print();
    }






text1


  text2 text2


text3 text3 text3

Thanks for any insight.

UPDATE

[webView stringByEvaluatingJavaScriptFromString:@"document.body.innerHTML"] does not work for my goal either; it returns the markup, including the script:

<script type="text/javascript">
    function printMe()
    {
        print();
    }
    </script>

    <div style="align:center; padding: 20px;">

        <div>

    <b>text1</b><br><br>

    <h2>
       text2 text2
    </h2>
    <br>
    text3 text3 text3

        </div>

    </div>

Update: this is needed for an existing project. If I had the chance to redesign it, a solution would be easy to find, but I have to work with this HTML source as it is, which makes it more difficult.

2
Any ideas based on the updates? - Zsolt

2 Answers

1
votes

Try using:

document.body.innerHTML

Or take a look at parsing the HTML yourself; see the question "parsing HTML on the iPhone". There are many other related questions on SO.
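Since innerHTML still contains the script block and the tags, another option is to walk the text nodes under the body and skip script and style elements. The following is only a sketch under that assumption, not a tested drop-in: the function name visibleText is made up for illustration, and it relies only on the standard DOM properties nodeType, nodeName, nodeValue and childNodes.

```javascript
// Hypothetical helper: collect the text of every text node under `root`,
// skipping the contents of <script> and <style> elements.
function visibleText(root) {
  var skip = { SCRIPT: true, STYLE: true };
  var parts = [];

  (function walk(node) {
    if (node.nodeType === 3) {                 // 3 is a text node
      // Collapse whitespace runs and trim, so indentation in the
      // source does not end up in the result.
      var text = node.nodeValue.replace(/\s+/g, " ").replace(/^ | $/g, "");
      if (text) parts.push(text);
    } else if (node.nodeType === 1 &&          // 1 is an element node
               !skip[node.nodeName.toUpperCase()]) {
      for (var i = 0; i < node.childNodes.length; i++) {
        walk(node.childNodes[i]);
      }
    }
    // Comments and other node types are ignored.
  })(root);

  return parts.join("\n");                     // one line per text node
}
```

In the web view, the function source followed by the expression visibleText(document.body) could be passed as the string to stringByEvaluatingJavaScriptFromString:, which returns the value of the last expression evaluated. For the page in the question this should yield the three text lines without the CSS or the printMe() source.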

1
votes

Why don't you put all your text into separate tags such as div, p, etc., give each of them an id, and then get the text within them with this syntax:

var text1 = document.getElementById("your ID").innerHTML
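As a slightly fuller sketch of the same idea, assuming you are able to add ids to the markup: the helper name textById and whatever ids you pass to it are made up for illustration. It reads textContent rather than innerHTML, so nested tags such as <b> are not included in the result.

```javascript
// Hypothetical helper: return the trimmed text of the element with the
// given id, or null if no such element exists.
// `doc` is the page's `document` object.
function textById(doc, id) {
  var el = doc.getElementById(id);
  if (!el) return null;
  // textContent drops nested tags; innerHTML would keep them.
  return el.textContent.replace(/\s+/g, " ").replace(/^ | $/g, "");
}
```

In the page this would be called as textById(document, "your ID").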

Hope this helps with your problem.