45
votes

Edit Sept 26

See below for the full background. tl;dr: A data grid control is causing odd exceptions, and I am looking for help isolating the cause and finding a solution.

I've narrowed this down a bit further. I have been able to reproduce the behavior in a smaller test app with more reliable triggering of the erratic behavior.

I can definitely rule out both threading and (I think) memory issues. The new app uses no Tasks or other threading/asynchronous features, and I can trigger the unhandled exception simply by adding properties that return a constant to the class of objects shown in the DataGrid. This indicates to me that the problem is either in unmanaged resource exhaustion or something I haven't thought of yet.

The revised program is structured like this. I have created a user control called EntityCollectionGridView which has a label and a data grid. In the control's Loaded event handler, I assign a List<TestClass> to the data grid with 1000 or 10000 rows, letting the grid generate the columns. This user control is instantiated 2-4 times in MainPage.xaml in the page's OnNavigatedTo event (or Loaded, it doesn't seem to matter). If an exception occurs, it occurs immediately after MainPage is shown.

The interesting thing is, the behavior doesn't seem to vary with the number of rows shown (it will work reliably with 10000 rows or fail reliably with only 1000 rows in each grid) but rather with the total number of columns in all the grids loaded at a given time. With 20 properties to show, 4 grids works fine. With 35 properties and 4 grids, the exception is thrown. But if I eliminate two grids, the same class with 35 properties will work fine.

Note that all of the properties I add to TestClass to jump from 20 to 35 columns are of the form:

public string StringXYZ { get { return "asdfasdfasdfasdfasf"; } }

So, there's no additional memory in the backing data (and again, I don't think memory pressure is the problem anyway).

What do you all think? Again, the handles/user objects/etc in Task Manager look good, but is there something else I might be missing?

Original post

I have been working on a port of the Silverlight Toolkit DataGrid to WinRT, and it has done well enough in simple tests (a variety of configurations and up to 10000 rows). However, as I have tried to embed it into another WinRT app I have run into a sporadic exception (of type System.Exception, raised in the App.UnhandledException handler) that is proving very difficult to debug.

Not enough quota is available to process this command. (Exception from HRESULT: 0x80070718)

The error is consistently reproducible, but not deterministically. That is, I can make it happen every time I run the App, but it doesn't always happen by performing the same exact set of steps the same number of times. The error seems to occur on page transitions (whether navigating to a new page forward or back to a previous page), and not (for instance) when changing the ItemsSource of the datagrid.

The application structure is basically recursive access through a hierarchy, with a page shown at each hierarchy level. On the page for the current node in the hierarchy, each child node and some grandchild nodes are shown, and for any subnode a datagrid may be shown. In practice, I consistently reproduce this with the following navigation structure:

Root page: shows no datagrid
  Child page: shows one datagrid and a few listviews
    Grandchild page: shows two datagrids, one bound to the
                     same source as Child page, the other one empty

A typical test scenario is, start at Root, move to Child, move to Grandchild, move back to Child, and then when I try to navigate to Grandchild again, it fails with the exception I mentioned above. But it might fail the first time I hit Grandchild, or it might let me move back and forth a few times before failing.

The call stack has only one managed frame on it, which is the unhandled exception event handler. This is very unhelpful. Switching to mixed mode debugging, I get the following:

WinRTClient.exe!WinRTClient.App.InitializeComponent.AnonymousMethod__14(object sender, Windows.UI.Xaml.UnhandledExceptionEventArgs e) Line 50 + 0x20 bytes  C#
[Native to Managed Transition]  
Windows.UI.Xaml.dll!DirectUI::CFTMEventSource<Windows::UI::Xaml::IUnhandledExceptionEventHandler,Windows::UI::Xaml::IApplication,Windows::UI::Xaml::IUnhandledExceptionEventArgs>::Raise(Windows::UI::Xaml::IApplication * pSource, Windows::UI::Xaml::IUnhandledExceptionEventArgs * pArgs)  Line 327  C++
Windows.UI.Xaml.dll!DirectUI::Application::RaiseUnhandledExceptionEventHelper(long hrEncountered, unsigned short * pszErrorMessage, unsigned int * pfIsHandled)  Line 920 + 0xa bytes   C++
Windows.UI.Xaml.dll!DirectUI::ErrorHelper::CallAUHandler(unsigned int errorCode, unsigned int * pfIsHandled, wchar_t * * pbstrErrorMessage)  Line 39 + 0x14 bytes   C++
Windows.UI.Xaml.dll!DirectUI::ErrorHelper::ProcessUnhandledErrorForUserCode(long error)  Line 82 + 0x10 bytes   C++
Windows.UI.Xaml.dll!AgCoreCallbacks::CallAUHandler(unsigned int errorCode)  Line 1104 + 0x8 bytes   C++
Windows.UI.Xaml.dll!CCoreServices::ReportUnhandledError(long errorXR)  Line 6582    C++
Windows.UI.Xaml.dll!CXcpDispatcher::Tick()  Line 1126 + 0xb bytes   C++
Windows.UI.Xaml.dll!CXcpDispatcher::OnReentrancyProtectedWindowMessage(HWND__ * hwnd, unsigned int msg, unsigned int wParam, long lParam)  Line 653 C++
Windows.UI.Xaml.dll!CXcpDispatcher::WindowProc(HWND__ * hwnd, unsigned int msg, unsigned int wParam, long lParam)  Line 401 + 0x24 bytes    C++
user32.dll!_InternalCallWinProc@20()  + 0x23 bytes  
user32.dll!_UserCallWinProcCheckWow@36()  + 0xbd bytes  
user32.dll!_DispatchMessageWorker@8()  + 0xf8 bytes 
user32.dll!_DispatchMessageW@4()  + 0x10 bytes  
Windows.UI.dll!Windows::UI::Core::CDispatcher::ProcessMessage(int bDrainQueue, int * pbAnyMessages)  Line 121   C++
Windows.UI.dll!Windows::UI::Core::CDispatcher::ProcessEvents(Windows::UI::Core::CoreProcessEventsOption options)  Line 184 + 0x10 bytes C++
Windows.UI.Xaml.dll!CJupiterWindow::RunCoreWindowMessageLoop()  Line 416 + 0xb bytes    C++
Windows.UI.Xaml.dll!CJupiterControl::RunMessageLoop()  Line 714 + 0x5 bytes C++
Windows.UI.Xaml.dll!DirectUI::DXamlCore::RunMessageLoop()  Line 2539 + 0x5 bytes    C++
Windows.UI.Xaml.dll!DirectUI::FrameworkView::Run()  Line 91 C++
twinapi.dll!`Windows::ApplicationModel::Core::CoreApplicationViewAgileContainer::RuntimeClassInitialize'::`55'::<lambda_A2234BA2CCD64E2C>::operator()(void * pv)  Line 560  C++
twinapi.dll!`Windows::ApplicationModel::Core::CoreApplicationViewAgileContainer::RuntimeClassInitialize'::`55'::<lambda_A2234BA2CCD64E2C>::<helper_func>(void * pv)  Line 613 + 0xe bytes   C++
SHCore.dll!_SHWaitForThreadWithWakeMask@12()  + 0xceab bytes    
kernel32.dll!@BaseThreadInitThunk@12()  + 0xe bytes 
ntdll.dll!___RtlUserThreadStart@8()  + 0x27 bytes   
ntdll.dll!__RtlUserThreadStart@8()  + 0x1b bytes    

This indicates to me that whatever I'm doing wrong doesn't register until after at least one cycle in the app's message loop (and I also tried breaking on all thrown exceptions using "Debug | Exceptions..." -- as far as I can tell, nothing is thrown and swallowed). The interesting stack frames I see are WindowProc, OnReentrancyProtectedWindowMessage, and Tick. The msg is 0x402 (1026), which doesn't mean anything to me. This page lists that message as used in the following contexts:

CBEM_SETIMAGELIST 
DDM_CLOSE 
DM_REPOSITION 
HKM_GETHOTKEY 
PBM_SETPOS 
RB_DELETEBAND 
SB_GETTEXTA 
TB_CHECKBUTTON 
TBM_GETRANGEMAX 
WM_PSD_MINMARGINRECT

...but that doesn't mean anything much to me, either (it might not even be relevant).

The three theories I can come up with are these:

  1. Memory pressure. But I have run into this with 24% of my physical memory free and the app consuming less than 100MB of memory. Other times, the app won't have hit any problems navigating around a while and racking up 400MB of memory
  2. Threading problems, such as access to the UI thread from a worker thread. And, in fact, I do have data access happening on a background thread. But this is using a (ported) framework that has been very reliable in a WinForms environment and in an Outlook plugin, and I think the thread use is safe. Additionally, I can use the same data in this app without any problems binding just to ListViews and so forth. Finally, the Grandchild node is configured such that selecting a row in the first datagrid kicks off a request for the row's detail items, which are displayed in the second datagrid (which is initially empty, and can remain so without preventing the exception). This happens without a page transition and works flawlessly for as long as I choose to fiddle with the selection. But navigating back to Child might kill me right away, even though there should be no data access at that point and therefore not threading operations that I know of.
  3. Resource exhaustion of some kind, maybe GUI handles. But I don't think I'm putting that much pressure on this system. In one execution, breaking in the exception handler, Task Manager reports the process using 662 handles, 21 User objects, and 12 GDI objects, as compared to Tweetro which is using 734, 37, and 19 respectively without problems. What else might I be missing in this category?

I have plenty of disk space free, and am not using the disk for anything other than configuration files anyway (and all that worked fine before adding the datagrids).

My next thought was to try to step through some of the potential 'interesting' parts of the datagrid code and jump over any that were questionable. I did try that with the datagrid's ArrangeOverride, but the exception didn't seem to care whether I did that or not. Also, I am not sure this is a useful strategy. Since the exception isn't being thrown until after a cycle on the message loop, and since I can't know for sure when it's about to happen, I would need to cover a huge number of permutations, running each permutation a whole lot of times, to isolate the problem code.

The error is thrown in both Debug and Release modes. And, as a final background note, the amount of data we're dealing with here is small, much smaller than my 10000-row runs of the datagrid in isolation. It's probably on the order of 50-100 rows, with perhaps 30-40 columns. And before the exception is thrown, the data and the grids seem to work and respond fine.

So, that's why I come to you. My two questions are:

  1. Does the error information give you any hints as to what might be the problem?
  2. What debugging strategy would you use to isolate the problem code?

Many thanks in advance for any help you can provide!

2

2 Answers

56
votes

OK, with some critical input from Tim Heuer [MSFT], I figured out what was going on and how to get around this problem.

Surprisingly, none of my three initial guesses were correct. This wasn't about memory, threading, or system resources. Instead, it was about limitations in the Windows messaging system. Apparently it is a little like a stack overflow exception, in that when you make too many changes to the visual tree all at once, the asynchronous update queue gets so long that it trips a wire and the exception gets thrown.

In this case, the problem is that there are enough UIElements going into the data grid I am working with that allowing the grid to generate all its own columns all at once can in some cases exceed the limit. I was using a number of grids all at once, and all loading in response to page navigation events, which made it all the trickier to nail down.

Thankfully, the limitations I was running into were NOT limitations in the visual tree or the XAML UI subsystem itself, just in the messaging used to update it. This means that if I could spread out the same operations over multiple ticks of the dispatcher's clock, I could accomplish the same end result.

What I ended up doing was to instruct my data grid not to autogenerate its own columns. Instead, I embedded the grid into a user control that, when the data was loaded, would parse out the columns needed and load them into a list. Then, I called the following method:

void LoadNextColumns(List<ColumnDisplaySetup> colDef, int startIdx, int numToLoad)
{
    for (int idx = startIdx; idx < startIdx + numToLoad && idx < colDef.Count; idx++)
    {
        DataGridTextColumn newCol = new DataGridTextColumn();
        newCol.Header = colDef[idx].Header;
        newCol.Binding = new Binding() { Path = new PropertyPath(colDef[idx].Property) };
        dgMainGrid.Columns.Add(newCol);
    }

    if (startIdx + numToLoad < colDef.Count)
    {
        Dispatcher.RunAsync(Windows.UI.Core.CoreDispatcherPriority.Normal, () =>
            {
                    LoadNextColumns(colDef, startIdx + numToLoad, numToLoad);
            });
    }
}

(ColumnDisplaySetup is a trivial type used to house the parsed-out configuration or a configuration loaded from a file.)

This method is called with the following arguments, respectively: list of columns, 0, and my arbitrary guess of 5 as a fairly safe number of columns to load at a time; but this number is based on testing and the expectation that a good number of grids could be loading simultaneously. I asked Tim for more information that might inform this part of the process, and will report back here if I learn more about how to determine how much is safe.

In practice, this seems to work adequately, although it results in the sort of progressive rendering you'd expect, with the columns visibly popping in. I expect this could be improved both by using the maximum possible value for numToLoad and by other UI sleight-of-hand. I may investigate hiding the grid while the columns are generated and only showing the result when everything is ready. Ultimately the decision will come down to which feels more 'fast and fluid'.

Again, I will update this answer with more information if I get it, but I hope this helps someone facing similar problems in the future. After pouring more time than I'd care to admit into the bug hunt, I don't want anyone else to have to kill themselves over this.

0
votes

It appears that this issue is fixed in Windows 8.1 Preview, after retargeting my application for Windows 8.1. I can no longer recreate this issue by dumping thousands of visuals onto the screen.