How To Improve Delete Timeout Issues In CRM 2011 On Prem Dev Environment?

Question

Background

I have a unit test framework that creates entities for my unit tests, performs the test, then automagically deletes the entities. It had been working fine, except that some entities take 15 - 30 seconds to delete in our dev environment.

I recently received a VM set up in the Amazon cloud to perform some long-term changes requiring a couple of release cycles to complete. When I run a unit test on the VM, I continually get SQL timeout errors when attempting to delete the entity.

Steps

I've gone down this set of discovery / action steps:

Turned on tracing, saw that timeout was occurring on fn_CollectForCascadeWrapper which is used to handle cascading deletes. My unit test only has 6 entities in it, and they are deleted in such a way that no cascading deletes are needed. Ran Estimated Execution Plan on it and added some of the indexes it requested. This still didn't fix the timeout issue.
Turned on the Resource Manager on the VM to look at Disk Access / Memory / CPU. When I attempt a delete, the CPU hits 20% for 2 seconds, then drops down to near 0. Memory is unchanged, but Disk Read Access in the Resource Manager goes crazy high, and stays that way for 7-10 minutes.
Hard Coded the fn_CollectForCascadeWrapper to return a result meaning nothing is required to be cascaded for the 6 entities in my unit test. Ran the unit test and again got the SQL Timeout Error. According to the Tracing, the actual delete statement was timing out:

delete from [New_inquiryExtensionBase] where ([New_inquiryId] = '7e250a5f-890e-40ae-9d2d-c55bbd7250cd');
delete from [New_inquiryBase]
OUTPUT DELETED.[New_inquiryId], 10012
into SubscriptionTrackingDeletedObject (ObjectId, ObjectTypeCode)
where ([New_inquiryId] = '7e250a5f-890e-40ae-9d2d-c55bbd7250cd')

Ran the query manually in SQL Management Studio. Took around 3 minutes to complete. No Triggers on the tables, so I thought the time must be due to the insert. Looked at the SubscriptionTrackingDeletedObject table, and noticed it had 2100 records in it. Deleted all records in the table, and reran my unit test. It actually worked in the normal 15-30 second time frame for deletes.
Researched and discovered what the SubscriptionTrackingDeletedObject is used for, and that the Async Service cleans it up. Noticed that the Async Service was not running on the server. Turned the service on, waited 10 minutes and queried the table again. My 6 entities were still listed there. Looked in trace log and saw timeout errors: Error cleaning up Principal Object Access Table
Researched POA and performed a SELECT COUNT(*) on the table, and 7 minutes later it returned 261 million records! Researched how to clean up the table, and the only thing I found was for Rollup 6 (we're currently on 11).
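A full SELECT COUNT(*) has to scan the table, which is why it took 7 minutes here. As a rough sketch, approximate row counts can be read from the catalog views instead. This assumes the POA table is named dbo.PrincipalObjectAccess and that both tables live in the dbo schema of the organization database; the post does not confirm either, so adjust the names to match your environment.

```sql
-- Approximate row counts from catalog metadata, without scanning the tables.
-- Assumption: table names and the dbo schema below match your organization database.
SELECT
    OBJECT_NAME(p.object_id) AS table_name,
    SUM(p.rows)              AS approx_row_count
FROM sys.partitions AS p
WHERE p.object_id IN (OBJECT_ID(N'dbo.PrincipalObjectAccess'),
                      OBJECT_ID(N'dbo.SubscriptionTrackingDeletedObject'))
  AND p.index_id IN (0, 1)   -- 0 = heap, 1 = clustered index; avoids counting each index
GROUP BY p.object_id;
```

The counts are approximate, but they come back almost instantly and are good enough to tell a 2,100-row table from a 261-million-row one.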

What Next?

Could the POA be affecting the Delete? Or is it just the POA that is affecting the Async Service that is affecting the delete? Could inserting into the SubscriptionTrackingDeletedObject really be causing my problem?

Have you executed script you have found in KB (to clean up POA table)? — MarioZG
Also have you tried to look at execution plan for the long running queries to see if there is any step that takes most of the time? — MarioZG
@MarioZG I am currently running the script. It was executing it as part of a transaction, and taking around 5 min. for 50000 records. It'll take 16 days to clean it up at this rate... The Query that shows up in the SQL log as timing out, doesn't offer any additional indexing.... — Daryl

Daryl · Accepted Answer · 2013-10-04T14:24:57

I ended up turning on SQL Server Profiling, and running the delete statement listed in my question. It took 3.5 minutes to execute. I was expecting it to be kicking something else off that hit the POA table, but nope, it was just deleting those records.

382604 Reads
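For anyone wanting to reproduce this kind of measurement without Profiler, a minimal sketch is to run the traced statements with I/O statistics turned on, inside a transaction that is rolled back so the record survives. The statements are the ones from the question; wrapping them in a transaction and rolling back is my own addition, and it will hold locks for as long as the delete runs, so this is only something to try on a dev box.

```sql
-- Report logical/physical reads per table and elapsed time in the Messages tab.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

BEGIN TRANSACTION;

DELETE FROM [New_inquiryExtensionBase]
WHERE [New_inquiryId] = '7e250a5f-890e-40ae-9d2d-c55bbd7250cd';

DELETE FROM [New_inquiryBase]
OUTPUT DELETED.[New_inquiryId], 10012
INTO SubscriptionTrackingDeletedObject (ObjectId, ObjectTypeCode)
WHERE [New_inquiryId] = '7e250a5f-890e-40ae-9d2d-c55bbd7250cd';

-- Undo the delete so the same id can be measured again.
ROLLBACK TRANSACTION;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The per-table read counts show which tables account for the bulk of the reads, and the physical read figures show how much of that came from disk rather than memory.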

I took a second look at the Query Execution Plan and noticed there were lots of Nested loops:

[Image: query execution plan for the delete]

that were looking at the child tables that contain a reference to it (see the 13 tiny branches in the tree structure inset in the bottom right?). So all the reads were being performed on the indexes themselves, and they were taking forever to load on my uber-slow VM.

I ended up running the same query for a different id, and it ran in 2 seconds. I then attempted my unit test, and finally it completed successfully.

I'm guessing that each time I attempted a delete, a transaction was started, and then the timeout in CRM rolled back the transaction, never allowing the child entity indexes to finish loading. So my current fix is to ensure the child indexes are loaded in memory before actually performing the delete. How I'm going to do that, I'm not sure (perform a query by id for each of the child entities?).
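One way the "query by id for each of the child entities" idea could be sketched: list the tables that hold a foreign key to New_inquiryBase and generate a lookup statement for each. This assumes the child references are real SQL foreign keys on dbo.New_inquiryBase (which the nested loops in the plan suggest, but I haven't confirmed), and whether running the generated lookups warms the cache enough to beat the timeout is untested.

```sql
-- Id from the question; substitute the record you are about to delete.
DECLARE @id uniqueidentifier = '7e250a5f-890e-40ae-9d2d-c55bbd7250cd';

-- One row per foreign key column that references New_inquiryBase,
-- plus a generated lookup that touches the child table by that column.
SELECT
    OBJECT_SCHEMA_NAME(fk.parent_object_id)              AS child_schema,
    OBJECT_NAME(fk.parent_object_id)                     AS child_table,
    COL_NAME(fkc.parent_object_id, fkc.parent_column_id) AS child_column,
    N'SELECT COUNT(*) FROM '
        + QUOTENAME(OBJECT_SCHEMA_NAME(fk.parent_object_id)) + N'.'
        + QUOTENAME(OBJECT_NAME(fk.parent_object_id))
        + N' WHERE '
        + QUOTENAME(COL_NAME(fkc.parent_object_id, fkc.parent_column_id))
        + N' = ''' + CONVERT(nvarchar(36), @id) + N''';'  AS warmup_query
FROM sys.foreign_keys AS fk
JOIN sys.foreign_key_columns AS fkc
    ON fkc.constraint_object_id = fk.object_id
WHERE fk.referenced_object_id = OBJECT_ID(N'dbo.New_inquiryBase')  -- assumed schema
ORDER BY child_table, child_column;
```

The warmup_query column can be copied out and run before the delete. Each lookup is read-only, so a timeout there just means running it again rather than rolling back a transaction.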

Edit

We had a performance analyst from Microsoft come out, and they wrote up a report that was over 200 pages long. 98% of it said the POA table was too large. Over Christmas we ended up turning off CRM and running some scripts to clean up the POA table. This has been extremely helpful.
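I can't share the actual cleanup scripts, but the general shape of deleting a huge number of rows in small batches looks something like the sketch below. It runs against a temp table, #PoaDemo, which is a made-up stand-in with a made-up Orphaned flag, not the real POA table or its schema; the 50,000 batch size just mirrors the figure from my comment above.

```sql
-- Self-contained demo: #PoaDemo and its Orphaned column are hypothetical.
CREATE TABLE #PoaDemo (
    Id       int IDENTITY(1, 1) PRIMARY KEY,
    Orphaned bit NOT NULL
);

-- Fill the demo table with 200,000 rows flagged for removal.
INSERT INTO #PoaDemo (Orphaned)
SELECT TOP (200000) 1
FROM sys.all_columns AS a
CROSS JOIN sys.all_columns AS b;

DECLARE @batch   int = 50000;
DECLARE @deleted int = 1;

WHILE @deleted > 0
BEGIN
    -- Outside an explicit transaction each DELETE commits on its own,
    -- so locks and log growth stay bounded by the batch size.
    DELETE TOP (@batch) FROM #PoaDemo WHERE Orphaned = 1;
    SET @deleted = @@ROWCOUNT;
END;

SELECT COUNT(*) AS remaining_rows FROM #PoaDemo;

DROP TABLE #PoaDemo;
```

Committing per batch means a failure or timeout only loses the current batch instead of rolling back hours of work, which is the main difference from running the whole cleanup inside one transaction.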