Our app manages books owned by a user; each book contains multiple documents (PDFs, Word docs, etc.). The home page lists all the books for a user, with a paging button that loads the next 10 books. When a user clicks on a book, it opens in a new screen and lists all the documents for that book.

Up until now we were using WCF / Entity Framework to retrieve all books shown on the home page, then Azure Search (connected to a SQL view) to get the documents for one book when it was opened, which worked well with paging and sorting.

Now, though, we also want to get the list of all books for a user from Azure Search, so we created a new table to hold the book and document data. It has one row per document, meaning the parent book name and book id are repeated on each row.

AzureSearchTable

Our Azure Search index now points to this table, and I have to figure out how to retrieve the books for a user with paging and possibly sorting as well. The problem is that I need a distinct select for the books, but Azure Search doesn't do distinct, and I don't know how many documents a book might have, so I can't set the Top parameter to 10. A book could have 30 or 40 documents, which means the first 40 rows, for example, could all belong to one book.

I tried to use a facet on the book id, which kind of works and gives me the id and count of documents for each book, but I can't seem to specify a sort order for the facet: the order is different from the order I set for the query (BookId). I also don't know how to get all the books using a facet. I can set a count property on the facet, but I don't know how many books a user will have.

Our architect says I should get all rows (which could be thousands) and filter them in the C# code to get 10 books. This seems pretty inefficient to me, though, and doesn't feel right.

So I'm not sure if this is the right approach:

  • Should I have separate Azure Search indices for book and document data (that use separate tables)?
  • How do I return the top n books from this table without knowing how many documents each book has?
  • Can I specify a sort order for facets using the C# SDK? (I think it's possible via the REST API.)
  • How do I get a facet to return all books for a user?

1 Answer

Here are a few thoughts:

Bullet #1 answer:

If your intent is to be able to return a list of books based on a search of DocumentName, then you probably want to keep them in the same index. The idea your architect had about handling the results in C# may not be as bad as you think. You could do a GroupBy in LINQ. The Azure Search query is fast and so are LINQ queries, especially if the machine issuing the Azure Search query is an Azure web/app server in the same region (intra-datacenter communication). I've used this approach even with the Suggestions API for an auto-complete feature, which needs to return results quickly (within a few hundred milliseconds) as the user is typing. I'd say it's at least worth a try to see what kind of performance you get with your maximum and typical datasets.
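
As a rough illustration of the GroupBy idea, here is a minimal sketch that collapses one-row-per-document results into distinct books and then pages over them. The SearchRow, BookSummary and BookPager names are hypothetical and not part of any SDK; the sketch assumes the rows have already been fetched from the index and mapped to plain objects, and that books are ordered by name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one index row (one row per document).
public class SearchRow
{
    public int BookId { get; set; }
    public string BookName { get; set; }
    public string DocumentName { get; set; }
}

// Hypothetical per-book summary built from the grouped rows.
public class BookSummary
{
    public int BookId { get; set; }
    public string BookName { get; set; }
    public int DocumentCount { get; set; }
}

public static class BookPager
{
    // Groups rows by book, orders the books by name (then id), and returns one page.
    public static List<BookSummary> GetPage(IEnumerable<SearchRow> rows, int pageIndex, int pageSize)
    {
        return rows
            .GroupBy(r => new { r.BookId, r.BookName })
            .Select(g => new BookSummary
            {
                BookId = g.Key.BookId,
                BookName = g.Key.BookName,
                DocumentCount = g.Count()
            })
            .OrderBy(b => b.BookName, StringComparer.OrdinalIgnoreCase)
            .ThenBy(b => b.BookId)
            .Skip(pageIndex * pageSize)
            .Take(pageSize)
            .ToList();
    }
}
```

Calling BookPager.GetPage(rows, 0, 10) would give the first 10 books regardless of how many documents each one has; whether fetching all rows first is acceptable depends on the measurements suggested above.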

But if that doesn't work for you, consider restructuring your index schema so that DocumentName is of type Collection(Edm.String). Your index document would look something like this:

{
    id: 20663,
    userId: 1,
    bookId: 2144,
    bookName: "ber",
    documentName: ["asdasd", "_318-1991.jpg", "wallhaven-13081.png", etc...],
    documentCount: 7
}
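
To show how the one-row-per-document table could be folded into that shape, here is a minimal sketch. DocumentRow, BookIndexDocument and BookIndexBuilder are hypothetical names, and using the book id as the string key is an assumption made for illustration (the sample above uses a separate id value).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row from the existing table (one row per document).
public class DocumentRow
{
    public int UserId { get; set; }
    public int BookId { get; set; }
    public string BookName { get; set; }
    public string DocumentName { get; set; }
}

// Hypothetical index document: one per book, with document names as a collection.
public class BookIndexDocument
{
    public string Id { get; set; }            // assumption: key derived from the book id
    public int UserId { get; set; }
    public int BookId { get; set; }
    public string BookName { get; set; }
    public List<string> DocumentName { get; set; }
    public int DocumentCount { get; set; }
}

public static class BookIndexBuilder
{
    // Collapses document rows into one index document per book.
    public static List<BookIndexDocument> Build(IEnumerable<DocumentRow> rows)
    {
        return rows
            .GroupBy(r => r.BookId)
            .Select(g =>
            {
                var first = g.First();
                var names = g.Select(r => r.DocumentName).ToList();
                return new BookIndexDocument
                {
                    Id = g.Key.ToString(),
                    UserId = first.UserId,
                    BookId = g.Key,
                    BookName = first.BookName,
                    DocumentName = names,
                    DocumentCount = names.Count
                };
            })
            .ToList();
    }
}
```

With one index document per book, a page of 10 books maps directly to 10 search results, so the usual top/skip paging applies again.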

Now, if you need to let the user see detailed information about the documents of a particular book they select, you can just do that with a database call to get the book details. Alternatively, this is where you could create another Azure Search index for documents that holds more detailed document information. But at this point in the user workflow, unless you're going to provide another full-text search across the documents of that particular book, you'd probably just want to stick with a get-by-id kind of DB call.

Bullet #2 answer:

For the document count you can just create another field (as shown above) and sort/filter/facet on that.

Bullet #3 answer:

Neither the SDK nor the Azure Search REST API provides a way to order the facets themselves (the sort option of a facet expression, for example bookId,sort:value, only orders the values within a single facet), but keep in mind you ultimately have complete control over how you display facet information in the UI. If the SDK doesn't provide what you need, you can create a simple lookup class in your app to order your facets as you like. Something like this:

public class FacetDefinition
{
    public string FacetName { get; set; }
    public int FacetOrder { get; set; }
}

...

var myFacetDefinitions = new List<FacetDefinition>();
myFacetDefinitions.Add(new FacetDefinition() { FacetName = "SomeNameThatMatchesTheFacetThatAzureSearchSendsBack", FacetOrder = 1});
myFacetDefinitions.Add(new FacetDefinition() { FacetName = "SomeOtherNameThatMatchesTheFacetThatAzureSearchSendsBack", FacetOrder = 2});
...
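
One way such a lookup could then be applied is sketched below. FacetOrdering is a hypothetical helper, FacetDefinition is repeated only so the sketch compiles on its own, and the facet names are assumed to arrive as plain strings; names missing from the lookup are placed last.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same lookup class as above, repeated so this sketch is self-contained.
public class FacetDefinition
{
    public string FacetName { get; set; }
    public int FacetOrder { get; set; }
}

public static class FacetOrdering
{
    // Orders the returned facet names by the configured FacetOrder;
    // names without a definition go last, ordered alphabetically.
    public static List<string> OrderFacetNames(
        IEnumerable<string> returnedFacetNames,
        IEnumerable<FacetDefinition> definitions)
    {
        var orderByName = definitions.ToDictionary(d => d.FacetName, d => d.FacetOrder);

        return returnedFacetNames
            .OrderBy(name => orderByName.TryGetValue(name, out var order) ? order : int.MaxValue)
            .ThenBy(name => name, StringComparer.Ordinal)
            .ToList();
    }
}
```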

Bullet #4 answer:

To return all books for a particular user you can just add a filter expression like this:

userId eq <put_authenticated_userid_here>

That assumes the currently authenticated user should only be able to see their own books. However, if you want a list of users in a facet so you can filter across one or more of them, that would need another restructuring of the index schema: a new field on the book document, called something like "users", that is a Collection(Edm.String) of user names. Like this:

{
    ...
    users: ["Luke Skywalker", "Han Solo", "Chewbacca", etc...]
    ...
}
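
For illustration, both kinds of filter could be built as plain OData filter strings, as in the minimal sketch below. BookFilters is a hypothetical helper, the field names userId and users are taken from the examples above, and the resulting string is assumed to be passed as the filter of the search request.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BookFilters
{
    // Filter for the books of a single (authenticated) user.
    public static string ForUser(int userId)
    {
        return "userId eq " + userId;
    }

    // Filter matching books shared with any of the given user names,
    // assuming "users" is a Collection(Edm.String) field.
    // Returns an empty string when no names are given.
    public static string ForAnyUser(IEnumerable<string> userNames)
    {
        var clauses = userNames.Select(name =>
        {
            // Single quotes inside an OData string literal are escaped by doubling them.
            var escaped = name.Replace("'", "''");
            return "users/any(u: u eq '" + escaped + "')";
        });
        return string.Join(" or ", clauses);
    }
}
```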