11
votes

Let's look at example from django docs with Pizza and Topping models. One pizza may have multiple toppings.

If we make a query:

pizzas = Pizza.objects.prefetch_related('toppings')

We'll get all the pizzas and their toppings in 2 queries. Now let's suppose that I want to prefetch only vegetarian toppings (assume we have such property):

pizzas = Pizza.objects.prefetch_related(
    Prefetch('toppings', queryset=Topping.objects.filter(is_vegetarian=True))
)

It works pretty well and Django doesn't perform yet another query for each pizza, when making something like this:

for pizza in pizzas:
    print(pizza.toppings.filter(is_vegetarian=True))

Now let's suppose We have a custom manager for Topping model and we decided to put there a method that allows us to filter only vegetarian toppings like in code example above:

class ToppingManager(models.Manager):
    def filter_vegetarian(self):
        return self.filter(is_vegetarian=True)

Now I make a new query and prefetch custom queryset with my method from manager:

    pizzas = Pizza.objects.prefetch_related(
        Prefetch('toppings', queryset=Topping.objects.filter_vegetarian()))

And the try to execute my code:

    for pizza in pizzas:
        print(pizza.toppings.filter_vegeterian())

I get a new one query for each iteration of the loop. That is my question. Why? Both these constructions return the same type object which is queryset:

   Topping.objects.filter_vegetarian()
   Topping.objects.filter(is_vegetarian=True)
2
If you do the prefetch using the manager method but then in your for loop do print(pizza.toppings.filter(is_vegetarian=True)), does it make the extra queries? I have a feeling I understand why this is happening, just want to make sure it operates as I imagine it does. - Titus P
I started to debug it and it seems like even in ther first example we'll have a loope of queries too. That is why docs recommend to use us to_attr - Igor Karbachinsky
But it's still interesting. Why not to implement this feature in django? If prefetched queryset is the same as filtered after we can still use cached results - Igor Karbachinsky
Honestly, I think the reason is that it would be even harder to follow the logic and would make development a nightmare for anyone who didn't work directly on it when it was implemented. You have to remember, Django is open source and as such, a certain number of people need to be able to understand it (it doesn't have to be a large number). Since you query a model, and the queryset returns models, having the models it returns then contain their own querysets with already-executed lookups could be dizzying to think about and troubleshoot, especially if you have a more complicated prefetch. - Titus P
You could definitely look at adding it yourself though! It seems like a niche but potentially useful feature, perhaps if you can code it simply enough to follow along with, the core team would consider adding it in. - Titus P

2 Answers

8
votes

I haven't tested this directly, but you should not invoke a method or filter again in the loop, as prefetch_related has already attached the data. So either of these should work:

pizzas = Pizza.objects.prefetch_related(
    Prefetch('toppings', queryset=Topping.objects.filter(is_vegetarian=True))
)
for pizza in pizzas:
    print(pizza.toppings.all()) # uses prefetched queryset

or

pizzas = Pizza.objects.prefetch_related(
    Prefetch('toppings', queryset=Topping.objects.filter_vegetarian(),
             to_attr="veg_toppings"))
for pizza in pizzas:
    print(pizza.toppings.veg_toppings)

Your examples do not work because they invoke another queryset, and this cannot be compared to the prefetched one to determine if it would be same.

It also says so in the docs:

The prefetch_related('toppings') implied pizza.toppings.all(), but pizza.toppings.filter() is a new and different query. The prefetched cache can’t help here; in fact it hurts performance, since you have done a database query that you haven’t used.

and

Using to_attr is recommended when filtering down the prefetch result as it is less ambiguous than storing a filtered result in the related manager’s cache.

1
votes

This implementation:

class ToppingManager(models.Manager):
    def filter_vegetarian(self):
        return self.filter(is_vegetarian=True)

Looks non-standard. docs look like they do a safer method of modifying the super-class method for this sort of lazy-eval stuff. If I rewrite your method in that style, it would look like:

class ToppingManager(models.Manager):
    def filter_vegetarian(self):
        return super(ToppingManager, self).get_queryset().filter(is_vegetarian=True)

You wouldn't strictly need the super() here, but safer to use it because you should know that you want to start with the models.Manager get_queryset method.

Doing a brief test of this in my own environment, I find that it works feeding into Prefetch without triggering queries on each item. I do not have any reason to believe this would not work for the problem here.

However, I'm also inclined to believe that specifying to_attr in webjunkie's answer may also be necessary.