3
votes

I need to capture an image of generated HTML. I'm using Alex Filipovici's excellent solution from "Convert HTML string to image". It works great, except when I try to load a page containing an iframe whose content is loaded by JavaScript.

        static int width = 1024;
        static int height = 768;

        public static void Capture()
        {
            var html = @"
<!DOCTYPE html>
<meta http-equiv='X-UA-Compatible' content='IE=Edge'>
<html>
<iframe id='forecast_embed' type='text/html' frameborder='0' height='245' width='100%' src='http://forecast.io/embed/#lat=42.3583&lon=-71.0603&name=Downtown Boston'> </iframe>
</html>
";
            StartBrowser(html);
        }

        private static void StartBrowser(string source)
        {
            var th = new Thread(() =>
            {
                var webBrowser = new WebBrowser();
                webBrowser.Width = width;
                webBrowser.Height = height;
                webBrowser.ScrollBarsEnabled = false;
                webBrowser.DocumentCompleted += webBrowser_DocumentCompleted;
                webBrowser.DocumentText = source;
                Application.Run();
            });
            th.SetApartmentState(ApartmentState.STA);
            th.Start();
        }

        static void webBrowser_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
        {
            var webBrowser = (WebBrowser)sender;
            using (Bitmap bitmap = new Bitmap(width, height))
            {
                webBrowser.DrawToBitmap(bitmap, new System.Drawing.Rectangle(0, 0, width, height));
                bitmap.Save(@"image.jpg", System.Drawing.Imaging.ImageFormat.Jpeg);
            }
            Application.Exit();
        }

I understand that there's probably no definitive way to know when all of the JavaScript has finished, given the vagaries of iframe loading and the fact that DocumentCompleted gets called as many times as there are frames/iframes, plus one. I can deal with the iframe loads with a counter or something, but all I want is a reasonable delay, so that the JavaScript has loaded and I don't get an image with "Loading" in it, like this: http://imgur.com/FiFMTmm

2 Answers

3
votes

If you're dealing with dynamic web pages that use frames and AJAX heavily, there is no perfect way to tell when a particular page has finished loading its resources. You can get close by doing the following two things:

  • handle the page's window.onload event;
  • then asynchronously poll the WebBrowser's Busy property, with a reasonably short predefined time-out.

For example (see https://stackoverflow.com/a/19283143/1768303 for a complete version):

const int AJAX_DELAY = 2000; // non-deterministic wait for AJAX dynamic code
const int AJAX_DELAY_STEP = 500;

// wait until webBrowser.Busy == false or timed out
async Task<bool> AjaxDelay(CancellationToken ct, int timeout)
{
    using (var cts = CancellationTokenSource.CreateLinkedTokenSource(ct))
    {
        cts.CancelAfter(timeout);
        while (true)
        {
            try
            {
                await Task.Delay(AJAX_DELAY_STEP, cts.Token);
                var busy = (bool)this.webBrowser.ActiveXInstance.GetType().InvokeMember("Busy", System.Reflection.BindingFlags.GetProperty, null, this.webBrowser.ActiveXInstance, new object[] { });
                if (!busy)
                    return true;
            }
            catch (OperationCanceledException)
            {
                if (cts.IsCancellationRequested && !ct.IsCancellationRequested)
                    return false;
                throw;
            }
        }
    }
}
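
The snippet above covers only the polling step. The first step, waiting for window.onload, could be sketched roughly as follows. This is a minimal sketch modeled on the idea in the linked answer, not a copy of it: the class and method names are hypothetical, and it assumes the code runs on the WebBrowser's STA thread with a message loop, and that onload for the top-level window fires after the first DocumentCompleted.

```csharp
using System;
using System.Threading.Tasks;
using System.Windows.Forms;

// Hypothetical helper, not part of the original answer.
static class OnloadWaiter
{
    // Navigates and completes once the top-level window's DOM onload event fires.
    // Assumes it is called on the browser's STA thread with a running message loop.
    public static async Task NavigateAndWaitForOnloadAsync(WebBrowser webBrowser, string url)
    {
        var onloadTcs = new TaskCompletionSource<bool>();
        EventHandler onloadHandler = null;

        WebBrowserDocumentCompletedEventHandler completedHandler = delegate
        {
            // DocumentCompleted fires once per frame; attach the onload handler only once.
            if (onloadHandler != null)
                return;
            onloadHandler = (s, e) => onloadTcs.TrySetResult(true);
            webBrowser.Document.Window.AttachEventHandler("onload", onloadHandler);
        };

        webBrowser.DocumentCompleted += completedHandler;
        try
        {
            webBrowser.Navigate(url);
            await onloadTcs.Task;
        }
        finally
        {
            // Always unhook, even if navigation throws.
            webBrowser.DocumentCompleted -= completedHandler;
            if (onloadHandler != null)
                webBrowser.Document.Window.DetachEventHandler("onload", onloadHandler);
        }
    }
}
```

After this task completes, AjaxDelay can be awaited to give scripts inside the frames a chance to settle.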

If you don't want to use async/await, you can implement the same logic using a timer.
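
A timer-based variant might look something like the sketch below. It is only an illustration under stated assumptions: AjaxDelayTimer, Start and onDone are hypothetical names, the constants mirror AJAX_DELAY and AJAX_DELAY_STEP above, and the timing differs slightly from the async version because the time-out is checked on each tick rather than by cancellation.

```csharp
using System;
using System.Reflection;
using System.Windows.Forms;

// Hypothetical helper: polls the ActiveX Busy property from a WinForms timer,
// so everything runs on the UI thread without async/await.
static class AjaxDelayTimer
{
    const int AjaxDelay = 2000;     // overall time-out in ms (same value as AJAX_DELAY)
    const int AjaxDelayStep = 500;  // polling interval in ms (same value as AJAX_DELAY_STEP)

    // Invokes onDone(true) once Busy is false, or onDone(false) if the time-out expires first.
    public static void Start(WebBrowser webBrowser, Action<bool> onDone)
    {
        int elapsed = 0;
        var timer = new System.Windows.Forms.Timer { Interval = AjaxDelayStep };
        timer.Tick += (s, e) =>
        {
            elapsed += AjaxDelayStep;
            object ax = webBrowser.ActiveXInstance;
            bool busy = (bool)ax.GetType().InvokeMember(
                "Busy", BindingFlags.GetProperty, null, ax, new object[] { });

            if (busy && elapsed < AjaxDelay)
                return; // still busy and not timed out: keep polling

            timer.Stop();
            timer.Dispose();
            onDone(!busy);
        };
        timer.Start();
    }
}
```

A caller would typically start it from the onload or DocumentCompleted handler, for example `AjaxDelayTimer.Start(webBrowser, idle => { /* capture the bitmap here */ });`.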

0
votes

Here's what I've been using after a lot of messing around with various other ideas that ended up complicated, had race conditions, or required .NET 4.5 (such as the other answer to this question).

The trick is to restart a Stopwatch on every DocumentCompleted and wait until no documents have been completed within a certain threshold.

To make it easier to use, I put it into an extension method:

browser.NavigateAndWaitUntilComplete(uri);

I should have called it NavigateUntilProbablyComplete(). The downside of this approach is a guaranteed 250 ms penalty on every navigation. Many of the solutions I've seen rely on the final page's URL being the same as the requested URL, which isn't guaranteed in my scenario.

using System;
using System.Diagnostics;
using System.Threading;
using System.Windows.Forms;

namespace MyProject.Extensions
{
    public static class WebBrowserExtensions
    {
        const int CompletionDelay = 250;

        private class WebBrowserCompletionHelper
        {
            public Stopwatch LastCompletion;

            public WebBrowserCompletionHelper()
            {
                // create but don't start.
                LastCompletion = new Stopwatch();
            }

            public void DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
            {
                WebBrowser browser = sender as WebBrowser;
                if (browser != null)
                {
                    LastCompletion.Restart();
                }
            }
        }

        public static void NavigateAndWaitUntilComplete(this WebBrowser browser, Uri uri)
        {
            WebBrowserCompletionHelper helper = new WebBrowserCompletionHelper();
            try
            {
                browser.DocumentCompleted += helper.DocumentCompleted;
                browser.Navigate(uri);

                Thread.Sleep(CompletionDelay);
                Application.DoEvents();

                while (browser.ReadyState != WebBrowserReadyState.Complete && helper.LastCompletion.ElapsedMilliseconds < CompletionDelay)
                {
                    Thread.Sleep(CompletionDelay);
                    Application.DoEvents();
                }
            }
            finally
            {
                browser.DocumentCompleted -= helper.DocumentCompleted;
            }
        }
    }
}
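
For the screenshot scenario in the question, the extension method could be combined with DrawToBitmap roughly as follows. This is a usage sketch under assumptions: CaptureExample and its parameters are hypothetical, it relies on the WebBrowserExtensions class above, and it navigates to a URI rather than setting DocumentText, so generated HTML would first have to be saved somewhere reachable by URI.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Threading;
using System.Windows.Forms;
using MyProject.Extensions;

// Hypothetical usage example, not part of the original answer.
static class CaptureExample
{
    public static void Capture(Uri uri, string path, int width, int height)
    {
        // WebBrowser is an ActiveX wrapper and must live on an STA thread.
        var th = new Thread(() =>
        {
            using (var browser = new WebBrowser())
            {
                browser.Width = width;
                browser.Height = height;
                browser.ScrollBarsEnabled = false;

                // Blocks, pumping messages via DoEvents, until the page is probably complete.
                browser.NavigateAndWaitUntilComplete(uri);

                using (var bitmap = new Bitmap(width, height))
                {
                    browser.DrawToBitmap(bitmap, new Rectangle(0, 0, width, height));
                    bitmap.Save(path, ImageFormat.Jpeg);
                }
            }
        });
        th.SetApartmentState(ApartmentState.STA);
        th.Start();
        th.Join(); // wait for the capture to finish
    }
}
```

Because the wait is driven by Application.DoEvents inside the extension method, no separate Application.Run call is needed on the worker thread in this sketch.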