0
votes

[Book] isbn(PK),title,category_id,subcategory_id,price

[Author] isbn(FK),author_id(PK),name

[Category] category_id(PK),name

[SubCategory] sub_category_id(PK),category_id(FK),name

I have a database (not designed by me) that contains the above four tables.

I want to have a book list having the following format:

isbn, title, author name(s), category name, subcategory name(may not have), price

But there's some complexity, as you can see, each book can have more than one author, the author name column should have author names separated by commas.

And for the category which is the more difficult part, there are some categories that have no subcategories and thus, some book records have subcategory_id set to 0 because its category_id refers to an category that has no subcategories, in this case, the subcategory name column in the book list does not need to show anything.

I really have no idea how such a complex a complex SQL statement can be built quickly to get the book list. Would somebody kindly think of a solution?

Many thanks to you all.

5
using the solution provided by Hamish Smith/J.Hendrix plus the group_cat function (only available in mysql 4.1 or above ), the author names can now be easily separated by commas, and the book list can then be obtained.bobo

5 Answers

0
votes

See @Pax's answer for a nicer way to handle the null / zero values for sub_category_id

select isbn, a.name as author_name, c.name as category_name, sc.name as subcategory_name, price
from Book 
join Author a on isbn = a.isbn
join Category c on category_id = c.category_id
join SubCategory sc on category_id = sc.category_id and subcategory_id = sc.subcategory_id
where subcategory_id != 0
union
select isbn, a.name as author_name, c.name as category_name, '' as subcategory_name, price
from Book
join Author a on isbn = a.isbn
join Category c on category_id = c.category_id
join SubCategory sc on category_id = sc.category_id and subcategory_id = sc.subcategory_id
where subcategory_id = 0
2
votes

When you find yourself building an "extremely complex SQL statement", it's usually best to step back and rethink.

Remember this - the vast majority of operations performed on database table are selects, not inserts or updates (though there are exceptions to every rule, of course).

The right time to be "spending" CPU cycles calculating things like author lists is when the list changes, not when you just want to extract the information.

Add another column to the book table called author_list and then create an insert/update trigger on authors so that this column is rebuilt whenever an author is changed for the specific ISBN.

That puts the cost where it should be and will make your query a lot simpler. The trigger ensures the data stays consistent, and it's okay to break 3NF if you know what you're doing.

As to the subcategory, the case statement can be your friend, but per-row functions on select never scale well.

I would just create a set of rows in subcategories with the id of 0 (one for each category) and make its name blank. Then it can be done with a simple join without having to concern yourself with performance. This could also be don with a trigger on category so every category will always have a subcategory of 0.

With those two changes, the query becomes a lot less complex, something along the lines of:

select b.isbn, b.title, b.author_list, c.name, sc.name, b.price
from Book b, Category c, SubCategory sc
where b.category_id = c.category_id
and   b.category_id = sc.category_id
and   b.subcategory_id = sc.subcategory_id
order by ...

This query should scream along since it's using just the basic levels of relational algebra (i.e., no per-row functions (including case statements), no subqueries). And that's an "old-school" query, you may get even more performance by using explicit rather than implicit JOINs.

One final point: a properly 3NF schema would not have the ISBN in the authors table - a better option would be to have a separate BookAuthor table holding the ISBN and author_id to properly model the many-to-many relationship. But you may have aleready changed that for performance (I don't know).

1
votes

That's an odd schema, not how I would have designed it. Being denormalized, it's probably going to have a lot of duplication in the author table.

Anyways, because you may have one or more authors, joins aren't really going to cut it for that information. Some things, to be honest, are better done outside of SQL and this is one of them. You can just build a loop that constructs the information and emits the data when the ISBN changes, assuming you do your ordering well.

As for the categories and subcategories, use a left join and it will return NULL on the subcategory information which you can test for. If there's more than one subcategory possible for the book (or categories for that matter), then you're really DOA with SQL here.

0
votes

Well, the subcategory business is poor database design. Even if you assume that a book can only be in one category, it's a poor design because (in that case), a category can always bee derived from the subcategory, so you've introduced redundancy by having book have attributes for both.

As far as the query you want, that's just a matter of doing the joins and projecting teh select statement. In you don't know enough SQL to do that, you probably shouldn't be trying to write queries (or you should be asking about basic joins and projections).

As to how you turn multiple rows into one (which is what you want to do with the authors), that depends on your RDBMS (which you don't specify) and/or your front-end.

0
votes

Something like this should be close.

select
   Book.ISBN,
   Book.Title,
   Author.Name,
   Category.Name as Category_Name,
   SubCategory.Name as SubCategory_Name,
   Book.Price
from
   Book join Author
      on Book.ISBN = Author.ISBN
   join Category
      on Book.Category_ID = Category.Category_ID
   join SubCategory
      on Book.Category_ID = SubCategory.Category_ID
         and Book.SubCategory_ID = SubCategory.Sub_Category_ID