17
votes

I am currently faced with the task of importing around 200K items from a custom CMS implementation into Sitecore. I have created a simple import page which connects to an external SQL database using Entity Framework and I have created all the required data templates.

During a test import of about 5K items I realized that I needed to find a way to make the import run a lot faster, so I set about finding some information on optimizing Sitecore for this purpose. I have concluded that there is not much specific information out there, so I'd like to share what I've found and open the floor for others to contribute further optimizations. My aim is to create some kind of maintenance mode for Sitecore that can be used when importing large volumes of data.

The most useful information I found was in Mark Cassidy's blog post http://intothecore.cassidy.dk/2009/04/migrating-data-into-sitecore.html. At the bottom of this post he provides a few tips for when you are running an import.

  • If migrating large quantities of data, try and disable as many Sitecore event handlers and whatever else you can get away with.
  • Use BulkUpdateContext()
  • Don't forget your target language
  • If you can, make the fields shared and unversioned. This should help migration execution speed.

The first thing I noticed on this list was the BulkUpdateContext class, as I had never heard of it. I quickly understood why: a search on the SDN forum and in the PDF documentation returned no hits. So imagine my surprise when I actually tested it out and found that it improves item creation/deletion speed at least tenfold!
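
For anyone else who hasn't come across it, this is roughly how I'm wrapping my import loop in it. Treat it as a sketch rather than a drop-in solution: the path, template ID, field name and LegacyItem class are just placeholders standing in for my own setup.

    using System.Collections.Generic;
    using Sitecore.Configuration;
    using Sitecore.Data;
    using Sitecore.Data.Items;

    // Hypothetical shape of a row coming out of the external SQL database.
    public class LegacyItem
    {
        public string Name { get; set; }
        public string Title { get; set; }
    }

    public static class Importer
    {
        public static void Run(IEnumerable<LegacyItem> sourceItems)
        {
            Database master = Factory.GetDatabase("master");
            Item importRoot = master.GetItem("/sitecore/content/Imported"); // placeholder path
            var templateId = new TemplateID(ID.Parse("{11111111-1111-1111-1111-111111111111}")); // placeholder template ID

            // BulkUpdateContext makes Sitecore skip most of the per-item housekeeping
            // (index updates, various events) it would otherwise run on every save.
            using (new BulkUpdateContext())
            {
                foreach (LegacyItem source in sourceItems)
                {
                    Item newItem = importRoot.Add(source.Name, templateId);
                    newItem.Editing.BeginEdit();
                    newItem["Title"] = source.Title; // placeholder field name
                    newItem.Editing.EndEdit();
                }
            }
        }
    }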

The next thing I looked at was his first point, where he basically suggests creating a version of web.config that contains only the bare essentials needed to perform the import. So far I have removed all events related to creating, saving and deleting items and versions. I have also removed the history engine and system index declarations from the master database element in web.config, as well as any custom events, schedules and search configurations. I expect there are a lot of other things I could look to remove/disable to increase performance. Pipelines? Schedules?
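
On the events point, I'm also wondering whether the EventDisabler class (which I understand newer Sitecore versions ship with, though I haven't confirmed it exists in mine) could achieve the same thing at runtime without touching web.config at all. Something like:

    using Sitecore.Data.Events;

    // Assumption: EventDisabler suppresses item events (item:saving, item:saved,
    // item:deleted, ...) for the enclosed scope, avoiding a stripped-down web.config.
    using (new EventDisabler())
    {
        // ... run the import loop here ...
    }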

What optimization tips do you have?


2 Answers

16
votes

Incidentally, BulkUpdateContext() is a very misleading name, as it really improves item creation speed, not item update speed. But as you also point out, it improves your import speed massively :-)

Since I wrote that post, I've added a few new things to my normal routines when doing imports.

  • Regularly shrink your databases. They tend to grow large and bulky. To do this, first go to the Sitecore Control Panel -> Database and select "Clean Up Database". After this, do a regular ShrinkDB on your SQL server
  • Disable indexes, especially if importing into the "master" database. For reference, see http://intothecore.cassidy.dk/2010/09/disabling-lucene-indexes.html
  • Try not to import into "master", however; you will usually find that imports into "web" are a lot faster, mostly because this database isn't (by default) connected to the HistoryManager or other gadgets (see the sketch below)
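
As a rough illustration of that last point (the path and language are placeholders, so adapt to your own setup):

    using Sitecore.Configuration;
    using Sitecore.Data;
    using Sitecore.Data.Items;
    using Sitecore.Globalization;

    // Target the "web" database directly and be explicit about the language,
    // so the imported versions end up where you expect them.
    Database web = Factory.GetDatabase("web");
    Language language = Language.Parse("en"); // your target language
    Item importRoot = web.GetItem("/sitecore/content/Imported", language); // placeholder path

    using (new BulkUpdateContext())
    {
        // ... create items under importRoot exactly as you would against "master" ...
    }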

And if you're really adventurous, there's one more thing you could try that I'd been considering trying out myself, but never got around to. It might work, but I can't guarantee that it will :-)

  • Try removing all your field types from App_Config/FieldTypes.config. The theory here is that this should essentially disable all of Sitecore's special handling of the content of these fields (like updating the LinkDatabase and so on). You would need to manually trigger a rebuild of the LinkDatabase when done with the import (sketched below), but that's a relatively small price to pay
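
If you do try it, the rebuild afterwards should be a one-liner, something along these lines (assuming Globals.LinkDatabase.Rebuild is available in your version; double-check before relying on it):

    using Sitecore.Configuration;
    using Sitecore.Data;

    // Once the import is done and FieldTypes.config is restored, rebuild the
    // link database for the database you imported into.
    Database master = Factory.GetDatabase("master");
    Sitecore.Globals.LinkDatabase.Rebuild(master);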

Hope this helps a bit :-)

1
votes

I'm guessing you've already hit this, but putting the code inside a SecurityDisabler() block may also speed things up.
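
Something along these lines (just a sketch):

    using Sitecore.SecurityModel;

    // Skip security/access checks for the whole import, so Sitecore doesn't
    // evaluate permissions for every item it creates.
    using (new SecurityDisabler())
    {
        // ... run the import ...
    }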

I'd be a lot more worried about how Sitecore performs with this much data... assuming you only do the import once, who cares how long that process takes? Is this going to be a regular occurrence?