How to program at client side to get Html Snapshot (or to capture all texts) of entire GWT page?

Question

To explain what I want, consider the following:

Suppose you have a GWT page (mydomain.com#!article). That page contains many widgets and data downloaded from the DB. The DB data and the widgets are mixed together; for example, a label can hold a customer name that came from the DB.

So everything on that page is JavaScript, i.e. when you view source you see only JavaScript. However, if you open that GWT page in Chrome, save it as "myGwtArticlePage.htm" on your local PC, and then reopen "myGwtArticlePage.htm", all the text and widgets in it are exactly the same as those in "mydomain.com#!article".

Now, if you right-click and view the source of "myGwtArticlePage.htm", you will see not just JavaScript but also all the text, DB data, and widgets.

So "myGwtArticlePage.htm" is what I call the HTML snapshot of "mydomain.com#!article".

Is that clear?

Now I want code on the client side that can capture all the text of "myGwtArticlePage.htm".

So MyArticlePresenter.java (in the client package) should work like this:

private AsyncCallback<GetArticleResult> getArticleCallback = new AsyncCallback<GetArticleResult>() {
    @Override
    public void onFailure(Throwable caught) {
        // AsyncCallback requires this method; error handling is not shown in the question.
    }

    @Override
    public void onSuccess(GetArticleResult result) {
        String articleData = result.getArticleData();
        //... many other data from DB .....

        myLabel.setText(articleData);
        //... many other widgets that setText of the DB data ....

        // Now what should I do here to get the HTML snapshot of "mydomain.com#!article"?
    }
};

Note: people say that I can use HtmlUnit, but HtmlUnit works on the server, not in the client package. Besides, HtmlUnit couldn't parse my GWTP page properly. (A GWTP page is a GWT app built with the GWTP framework.)

I hope someone can help me answer this question.

Try RootPanel.getBodyElement().getInnerHTML() or RootPanel.getBodyElement().getInnerText(). You can also call these on a particular Element. - Braj
What are you trying to achieve? All the client-side code is JavaScript. - Braj
I want to make my GWTP app crawlable, since HtmlUnit does not work for GWTP. - Tum

Colin Alworth · Accepted Answer · 2014-05-19T02:12:06

Client code, by definition, has to run on the client - in the case of GWT or any HTML/JS app, this means in a web browser.

HtmlUnit is a web browser, but one that never renders to the screen. You can still ask it for the HTML contents of the current page. It is written entirely in Java, so it runs easily in any JVM, including on a server. Also consider PhantomJS, a headless WebKit browser: you can script it to take screenshots, export HTML, etc. It is a native app, so you'd need to get the right build for your server and do the wiring to call it.
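
As a rough idea of what "the wiring to call PhantomJS" could look like from a Java server, here is a minimal sketch using only the JDK. It assumes a phantomjs binary is installed and that you have written a PhantomJS script (the path snapshot.js below is hypothetical) which loads the URL passed as its first argument and prints the rendered HTML to standard output. The class and method names are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Illustrative helper (hypothetical name) that shells out to PhantomJS. */
final class PhantomSnapshot {

    /** Builds the command line: phantomjs <script> <url>. */
    static List<String> command(String phantomBinary, String scriptPath, String url) {
        return Arrays.asList(phantomBinary, scriptPath, url);
    }

    /**
     * Runs the script and returns whatever it printed to stdout.
     * Assumption: the script prints the rendered page HTML and exits with 0.
     */
    static String render(String phantomBinary, String scriptPath, String url)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(phantomBinary, scriptPath, url))
                // Pass stderr through to this JVM's stderr so it cannot block the child.
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        byte[] output = readAll(process.getInputStream());
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("PhantomJS exited with code " + exitCode);
        }
        return new String(output, StandardCharsets.UTF_8);
    }

    /** Reads a stream to the end (works on Java 8, which lacks readAllBytes). */
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }
}
```

A call such as PhantomSnapshot.render("phantomjs", "snapshot.js", "http://mydomain.com/#!article") would then return the snapshot as a string, provided the script behaves as assumed.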

HtmlUnit should work perfectly with GWT, when properly configured. GWTTestCases by default use HtmlUnit to run GWT tests in a browser without launching a 'real' browser instance. To do this of course, HtmlUnit must run in a normal JVM, like what your server would be running on.

That said, you can't call into client code directly, but instead you just launch the HTML/JS (compiled from GWT/Java) page in the HtmlUnit browser.

Another detail to consider is speed. Especially if you are interested in being indexed by a search engine, speed is important for two reasons. First, the search engine will be hitting these pages quickly, and you don't want your server to be killed when Google/Yahoo/Bing drives by and downloads all the URLs it can find. Second, among other things, the search engine may take into account how quickly a URL finishes loading when ranking the page.

Dev mode: Dev mode is slow because of the calls back and forth between the JS in the browser (even HtmlUnit) and the real Java. Since you are compiling your app to JS anyway, there is no need to use dev mode on the server.
Pre-loading pages: Consider running HtmlUnit periodically and generating plain HTML pages that can then be served right away when requested. If your data doesn't change often, this could work well for you and would be fastest of all, since the page is already pre-rendered and cached.
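
The pre-loading idea can be sketched as a small in-memory cache. This is only an illustration under assumptions: the class name SnapshotCache is made up, and the renderer function stands in for whatever actually produces the HTML (for example an HtmlUnit or PhantomJS call), which is not shown here.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Illustrative snapshot cache (hypothetical name), keyed by the #! token. */
final class SnapshotCache {
    private final Map<String, String> htmlByToken = new ConcurrentHashMap<>();

    // Assumption: maps a token such as "article" to fully rendered HTML.
    private final Function<String, String> renderer;

    SnapshotCache(Function<String, String> renderer) {
        this.renderer = renderer;
    }

    /** Re-renders the given tokens; meant to be called periodically, e.g. by a scheduler. */
    void refresh(Iterable<String> tokens) {
        for (String token : tokens) {
            htmlByToken.put(token, renderer.apply(token));
        }
    }

    /** Serves the cached snapshot, rendering on demand only on a cache miss. */
    String get(String token) {
        return htmlByToken.computeIfAbsent(token, renderer);
    }
}
```

With this shape, a crawler request normally costs one map lookup; the expensive rendering happens in refresh, away from the request path.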

Finally, if you are already using GWTP, look into their Crawler Service, which appears to be designed to filter out URLs with escaped fragments (the _escaped_fragment_ query parameter), translate them to #! tokens, and then pass those to your GWTP application. See https://github.com/ArcBees/GWTP/wiki/Crawler-Support for more details.
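
To make the URL translation concrete: under the AJAX crawling scheme, a crawler requests mydomain.com/?_escaped_fragment_=article in place of mydomain.com/#!article. The sketch below shows that mapping in plain Java. It is not GWTP's implementation; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

/** Illustrative helper (hypothetical name), not part of GWTP. */
final class EscapedFragmentUrls {
    private static final String PARAM = "_escaped_fragment_=";

    /**
     * Turns a crawler URL such as http://mydomain.com/?_escaped_fragment_=article
     * into its hash-bang form http://mydomain.com/#!article.
     * Returns the input unchanged when the parameter is absent.
     */
    static String toHashBangUrl(String crawlerUrl) {
        int queryStart = crawlerUrl.indexOf('?');
        if (queryStart < 0) {
            return crawlerUrl;
        }
        String base = crawlerUrl.substring(0, queryStart);
        String query = crawlerUrl.substring(queryStart + 1);

        StringBuilder otherParams = new StringBuilder();
        String token = null;
        for (String pair : query.split("&")) {
            if (pair.startsWith(PARAM)) {
                // The fragment is percent-encoded in the crawler URL.
                token = decode(pair.substring(PARAM.length()));
            } else if (!pair.isEmpty()) {
                if (otherParams.length() > 0) {
                    otherParams.append('&');
                }
                otherParams.append(pair);
            }
        }
        if (token == null) {
            return crawlerUrl;
        }
        String rebuilt = otherParams.length() > 0 ? base + "?" + otherParams : base;
        return rebuilt + "#!" + token;
    }

    private static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

The resulting #! URL is what you would hand to the headless browser to render the snapshot.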
