12
votes

I'm currently doing some research on using db4o as the storage for my web application. I'm quite happy with how easily db4o works. So when I read about the Code First approach I kinda liked it, because working with EF4 Code First is quite similar to working with db4o: create your domain objects (POCOs), throw them at db4o, and never look back.

But when I did a performance comparison, EF 4 was horribly slow. And I couldn't figure out why.

I use the following entities :



public class Recipe
{
    private List<RecipePreparation> _RecipePreparations;

    public int ID { get; set; }
    public String Name { get; set; }
    public String Description { get; set; }
    public List<String> Tags { get; set; }

    public ICollection<RecipePreparation> Preparations
    {
        get { return _RecipePreparations.AsReadOnly(); }
    }

    public void AddPreparation(RecipePreparation preparation)
    {
        this._RecipePreparations.Add(preparation);
    }
}

public class RecipePreparation
{
    public String Name { get; set; }
    public String Description { get; set; }
    public int Rating { get; set; }
    public List<String> Steps { get; set; }
    public List<String> Tags { get; set; }
    public int ID { get; set; }
}

To test the performance I new up a recipe and add 50,000 RecipePreparations. Then I store the object in db4o like so:

IObjectContainer db = Db4oEmbedded.OpenFile(Db4oEmbedded.NewConfiguration(), @"RecipeDB.db4o");
db.Store(recipe1);
db.Close();

This takes around 13,000 ms.

I store the stuff with EF4 in SQL Server 2008 (Express, locally) like this:

cookRecipes.Recipes.Add(recipe1);
cookRecipes.SaveChanges();

And that takes 200,000 ms.

Now how on earth is db4o 15(!!!) times faster than EF4/SQL? Am I missing a secret turbo button for EF4? I even think db4o could be made faster, since I don't initialize the database file; I just let it grow dynamically.

4
My guess is that the overhead of many single insert-statements being executed is the largest portion of the difference. Is there a way to instruct EF4 to combine insert-statements to reduce that overhead? - Lasse V. Karlsen
@Lasse: Yes, there is. EF implements the unit of work pattern out of the box - see my answer. - Tomas Aschan
I have done some profiling with Visual Studio, and cookRecipes.Recipes.Add(recipe1) takes approx 65% of the total time to store, and SaveChanges approx 35% (duh... ;) ). - Saab
Not sure how much it matters but what CTP version of code-only did you use? - KallDrexx
CTP 4 downloaded from here : microsoft.com/downloads/… - Saab

4 Answers

3
votes

Did you call SaveChanges() inside the loop? No wonder it's slow! Try doing this:

foreach(var recipe in The500000Recipes)
{
    cookRecipes.Recipes.Add(recipe);
}
cookRecipes.SaveChanges();

EF expects you to make all the changes you want and then call SaveChanges once. That way, it can optimize the database communication and SQL to perform the changes between the opening state and the saving state, ignoring all changes that you have undone. (For example, adding 50,000 records, then removing half of them, then calling SaveChanges will only ever add 25,000 records to the database.)

2
votes

Perhaps you can disable change tracking while adding new objects; this really increases performance.

context.Configuration.AutoDetectChangesEnabled = false;

see also for more info: http://coding.abel.nu/2012/03/ef-code-first-change-tracking/
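Applied to the scenario in the question, the pattern might look like this (a sketch; CookRecipesContext and the preparations collection are assumed stand-ins for the asker's context and data, and the Configuration property requires the EF 4.1 DbContext API):

```csharp
using (var cookRecipes = new CookRecipesContext())
{
    // Turn off the change scan that DbSet.Add() otherwise triggers
    // for every entity already tracked by the context.
    cookRecipes.Configuration.AutoDetectChangesEnabled = false;
    try
    {
        foreach (var preparation in preparations)
        {
            recipe1.AddPreparation(preparation);
        }

        // Add() marks the object graph as Added explicitly, so the
        // inserts still happen even without automatic change detection.
        cookRecipes.Recipes.Add(recipe1);
        cookRecipes.SaveChanges();
    }
    finally
    {
        cookRecipes.Configuration.AutoDetectChangesEnabled = true;
    }
}
```

Re-enabling the flag in a finally block keeps the context safe to reuse for normal, tracked work afterwards.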

1
votes

The EF excels at many things, but bulk loading is not one of them. If you want high-performance bulk loading, doing it directly through the DB server will be faster than any ORM. If your app's sole performance constraint is bulk loading, then you probably shouldn't use the EF.
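To illustrate the direct route, loading the 50,000 preparations with ADO.NET's SqlBulkCopy might look like this (a sketch; the destination table name, column names, connectionString, and recipeId are assumptions, since the question doesn't show the schema EF generates):

```csharp
using System.Data;
using System.Data.SqlClient;

// Stage the rows in a DataTable, then stream them to SQL Server in one shot.
var table = new DataTable();
table.Columns.Add("Name", typeof(string));
table.Columns.Add("Description", typeof(string));
table.Columns.Add("Rating", typeof(int));
table.Columns.Add("Recipe_ID", typeof(int)); // assumed FK column name

foreach (var prep in preparations)
{
    table.Rows.Add(prep.Name, prep.Description, prep.Rating, recipeId);
}

using (var bulk = new SqlBulkCopy(connectionString))
{
    bulk.DestinationTableName = "RecipePreparations";
    bulk.BatchSize = 5000; // commit in chunks instead of one huge batch
    bulk.WriteToServer(table);
}
```

SqlBulkCopy bypasses per-row INSERT statements entirely, which is why it tends to beat any ORM for this kind of workload.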

1
votes

Just to add on to the other answers: db4o typically runs in-process, while EF abstracts an out-of-process (SQL) database. However, db4o is essentially single-threaded. So while it might be faster for this one example with one request, SQL will handle concurrency (multiple queries, multiple users) much better than a default db4o database setup.