Edit Sept 26
See below for the full background. tl;dr: A data grid control is causing odd exceptions, and I am looking for help isolating the cause and finding a solution.
I've narrowed this down a bit further. I have been able to reproduce the behavior in a smaller test app with more reliable triggering of the erratic behavior.
I can definitely rule out both threading and (I think) memory issues. The new app uses no Tasks or other threading/asynchronous features, and I can trigger the unhandled exception simply by adding properties that return a constant to the class of objects shown in the DataGrid. This indicates to me that the problem is either in unmanaged resource exhaustion or something I haven't thought of yet.
The revised program is structured like this. I have created a user control called EntityCollectionGridView
which has a label and a data grid. In the control's Loaded event handler, I assign a List<TestClass>
to the data grid with 1000 or 10000 rows, letting the grid generate the columns. This user control is instantiated 2-4 times in MainPage.xaml in the page's OnNavigatedTo
event (or Loaded
, it doesn't seem to matter). If an exception occurs, it occurs immediately after MainPage is shown.
The interesting thing is, the behavior doesn't seem to vary with the number of rows shown (it will work reliably with 10000 rows or fail reliably with only 1000 rows in each grid) but rather with the total number of columns in all the grids loaded at a given time. With 20 properties to show, 4 grids works fine. With 35 properties and 4 grids, the exception is thrown. But if I eliminate two grids, the same class with 35 properties will work fine.
Note that all of the properties I add to TestClass
to jump from 20 to 35 columns are of the form:
public string StringXYZ { get { return "asdfasdfasdfasdfasf"; } }
So, there's no additional memory in the backing data (and again, I don't think memory pressure is the problem anyway).
What do you all think? Again, the handles/user objects/etc in Task Manager look good, but is there something else I might be missing?
Original post
I have been working on a port of the Silverlight Toolkit DataGrid to WinRT, and it has done well enough in simple tests (a variety of configurations and up to 10000 rows). However, as I have tried to embed it into another WinRT app I have run into a sporadic exception (of type System.Exception, raised in the App.UnhandledException handler) that is proving very difficult to debug.
Not enough quota is available to process this command. (Exception from HRESULT: 0x80070718)
The error is consistently reproducible, but not deterministically. That is, I can make it happen every time I run the App, but it doesn't always happen by performing the same exact set of steps the same number of times. The error seems to occur on page transitions (whether navigating to a new page forward or back to a previous page), and not (for instance) when changing the ItemsSource of the datagrid.
The application structure is basically recursive access through a hierarchy, with a page shown at each hierarchy level. On the page for the current node in the hierarchy, each child node and some grandchild nodes are shown, and for any subnode a datagrid may be shown. In practice, I consistently reproduce this with the following navigation structure:
Root page: shows no datagrid
Child page: shows one datagrid and a few listviews
Grandchild page: shows two datagrids, one bound to the
same source as Child page, the other one empty
A typical test scenario is, start at Root, move to Child, move to Grandchild, move back to Child, and then when I try to navigate to Grandchild again, it fails with the exception I mentioned above. But it might fail the first time I hit Grandchild, or it might let me move back and forth a few times before failing.
The call stack has only one managed frame on it, which is the unhandled exception event handler. This is very unhelpful. Switching to mixed mode debugging, I get the following:
WinRTClient.exe!WinRTClient.App.InitializeComponent.AnonymousMethod__14(object sender, Windows.UI.Xaml.UnhandledExceptionEventArgs e) Line 50 + 0x20 bytes C#
[Native to Managed Transition]
Windows.UI.Xaml.dll!DirectUI::CFTMEventSource<Windows::UI::Xaml::IUnhandledExceptionEventHandler,Windows::UI::Xaml::IApplication,Windows::UI::Xaml::IUnhandledExceptionEventArgs>::Raise(Windows::UI::Xaml::IApplication * pSource, Windows::UI::Xaml::IUnhandledExceptionEventArgs * pArgs) Line 327 C++
Windows.UI.Xaml.dll!DirectUI::Application::RaiseUnhandledExceptionEventHelper(long hrEncountered, unsigned short * pszErrorMessage, unsigned int * pfIsHandled) Line 920 + 0xa bytes C++
Windows.UI.Xaml.dll!DirectUI::ErrorHelper::CallAUHandler(unsigned int errorCode, unsigned int * pfIsHandled, wchar_t * * pbstrErrorMessage) Line 39 + 0x14 bytes C++
Windows.UI.Xaml.dll!DirectUI::ErrorHelper::ProcessUnhandledErrorForUserCode(long error) Line 82 + 0x10 bytes C++
Windows.UI.Xaml.dll!AgCoreCallbacks::CallAUHandler(unsigned int errorCode) Line 1104 + 0x8 bytes C++
Windows.UI.Xaml.dll!CCoreServices::ReportUnhandledError(long errorXR) Line 6582 C++
Windows.UI.Xaml.dll!CXcpDispatcher::Tick() Line 1126 + 0xb bytes C++
Windows.UI.Xaml.dll!CXcpDispatcher::OnReentrancyProtectedWindowMessage(HWND__ * hwnd, unsigned int msg, unsigned int wParam, long lParam) Line 653 C++
Windows.UI.Xaml.dll!CXcpDispatcher::WindowProc(HWND__ * hwnd, unsigned int msg, unsigned int wParam, long lParam) Line 401 + 0x24 bytes C++
user32.dll!_InternalCallWinProc@20() + 0x23 bytes
user32.dll!_UserCallWinProcCheckWow@36() + 0xbd bytes
user32.dll!_DispatchMessageWorker@8() + 0xf8 bytes
user32.dll!_DispatchMessageW@4() + 0x10 bytes
Windows.UI.dll!Windows::UI::Core::CDispatcher::ProcessMessage(int bDrainQueue, int * pbAnyMessages) Line 121 C++
Windows.UI.dll!Windows::UI::Core::CDispatcher::ProcessEvents(Windows::UI::Core::CoreProcessEventsOption options) Line 184 + 0x10 bytes C++
Windows.UI.Xaml.dll!CJupiterWindow::RunCoreWindowMessageLoop() Line 416 + 0xb bytes C++
Windows.UI.Xaml.dll!CJupiterControl::RunMessageLoop() Line 714 + 0x5 bytes C++
Windows.UI.Xaml.dll!DirectUI::DXamlCore::RunMessageLoop() Line 2539 + 0x5 bytes C++
Windows.UI.Xaml.dll!DirectUI::FrameworkView::Run() Line 91 C++
twinapi.dll!`Windows::ApplicationModel::Core::CoreApplicationViewAgileContainer::RuntimeClassInitialize'::`55'::<lambda_A2234BA2CCD64E2C>::operator()(void * pv) Line 560 C++
twinapi.dll!`Windows::ApplicationModel::Core::CoreApplicationViewAgileContainer::RuntimeClassInitialize'::`55'::<lambda_A2234BA2CCD64E2C>::<helper_func>(void * pv) Line 613 + 0xe bytes C++
SHCore.dll!_SHWaitForThreadWithWakeMask@12() + 0xceab bytes
kernel32.dll!@BaseThreadInitThunk@12() + 0xe bytes
ntdll.dll!___RtlUserThreadStart@8() + 0x27 bytes
ntdll.dll!__RtlUserThreadStart@8() + 0x1b bytes
This indicates to me that whatever I'm doing wrong doesn't register until after at least one cycle in the app's message loop (and I also tried breaking on all thrown exceptions using "Debug | Exceptions..." -- as far as I can tell, nothing is thrown and swallowed). The interesting stack frames I see are WindowProc
, OnReentrancyProtectedWindowMessage
, and Tick
. The msg
is 0x402 (1026), which doesn't mean anything to me. This page lists that message as used in the following contexts:
CBEM_SETIMAGELIST
DDM_CLOSE
DM_REPOSITION
HKM_GETHOTKEY
PBM_SETPOS
RB_DELETEBAND
SB_GETTEXTA
TB_CHECKBUTTON
TBM_GETRANGEMAX
WM_PSD_MINMARGINRECT
...but that doesn't mean anything much to me, either (it might not even be relevant).
The three theories I can come up with are these:
- Memory pressure. But I have run into this with 24% of my physical memory free and the app consuming less than 100MB of memory. Other times, the app won't have hit any problems navigating around a while and racking up 400MB of memory
- Threading problems, such as access to the UI thread from a worker thread. And, in fact, I do have data access happening on a background thread. But this is using a (ported) framework that has been very reliable in a WinForms environment and in an Outlook plugin, and I think the thread use is safe. Additionally, I can use the same data in this app without any problems binding just to ListViews and so forth. Finally, the Grandchild node is configured such that selecting a row in the first datagrid kicks off a request for the row's detail items, which are displayed in the second datagrid (which is initially empty, and can remain so without preventing the exception). This happens without a page transition and works flawlessly for as long as I choose to fiddle with the selection. But navigating back to Child might kill me right away, even though there should be no data access at that point and therefore not threading operations that I know of.
- Resource exhaustion of some kind, maybe GUI handles. But I don't think I'm putting that much pressure on this system. In one execution, breaking in the exception handler, Task Manager reports the process using 662 handles, 21 User objects, and 12 GDI objects, as compared to Tweetro which is using 734, 37, and 19 respectively without problems. What else might I be missing in this category?
I have plenty of disk space free, and am not using the disk for anything other than configuration files anyway (and all that worked fine before adding the datagrids).
My next thought was to try to step through some of the potential 'interesting' parts of the datagrid code and jump over any that were questionable. I did try that with the datagrid's ArrangeOverride, but the exception didn't seem to care whether I did that or not. Also, I am not sure this is a useful strategy. Since the exception isn't being thrown until after a cycle on the message loop, and since I can't know for sure when it's about to happen, I would need to cover a huge number of permutations, running each permutation a whole lot of times, to isolate the problem code.
The error is thrown in both Debug and Release modes. And, as a final background note, the amount of data we're dealing with here is small, much smaller than my 10000-row runs of the datagrid in isolation. It's probably on the order of 50-100 rows, with perhaps 30-40 columns. And before the exception is thrown, the data and the grids seem to work and respond fine.
So, that's why I come to you. My two questions are:
- Does the error information give you any hints as to what might be the problem?
- What debugging strategy would you use to isolate the problem code?
Many thanks in advance for any help you can provide!