115
votes

I store various user details in my MySQL database. Originally it was set up in various tables meaning data is linked with UserIds and outputting via sometimes complicated calls to display and manipulate the data as required. Setting up a new system, it almost makes sense to combine all of these tables into one big table of related content.

  • Is this going to be a help or hindrance?
  • Speed considerations in calling, updating or searching/manipulating?

Here's an example of some of my table structure(s):

  • users - UserId, username, email, encrypted password, registration date, ip
  • user_details - cookie data, name, address, contact details, affiliation, demographic data
  • user_activity - contributions, last online, last viewing
  • user_settings - profile display settings
  • user_interests - advertising targetable variables
  • user_levels - access rights
  • user_stats - hits, tallies

Edit: I've upvoted all answers so far, they all have elements that essentially answer my question.

Most of the tables have a 1:1 relationship which was the main reason for denormalising them.

Are there going to be issues if the table spans across 100+ columns when a large portion of these cells are likely to remain empty?

8
This other question might be helpful tooMosty Mostacho

8 Answers

72
votes

Multiple tables help in the following ways / cases:

(a) if different people are going to be developing applications involving different tables, it makes sense to split them.

(b) If you want to give different kind of authorities to different people for different part of the data collection, it may be more convenient to split them. (Of course, you can look at defining views and giving authorization on them appropriately).

(c) For moving data to different places, especially during development, it may make sense to use tables resulting in smaller file sizes.

(d) Smaller foot print may give comfort while you develop applications on specific data collection of a single entity.

(e) It is a possibility: what you thought as a single value data may turn out to be really multiple values in future. e.g. credit limit is a single value field as of now. But tomorrow, you may decide to change the values as (date from, date to, credit value). Split tables might come handy now.

My vote would be for multiple tables - with data appropriately split.

Good luck.

37
votes

Combining the tables is called denormalizing.

It may (or may not) help to make some queries (which make lots of JOINs) to run faster at the expense of creating a maintenance hell.

MySQL is capable of using only JOIN method, namely NESTED LOOPS.

This means that for each record in the driving table, MySQL locates a matching record in the driven table in a loop.

Locating a record is quite a costly operation which may take dozens times as long as the pure record scanning.

Moving all your records into one table will help you to get rid of this operation, but the table itself grows larger, and the table scan takes longer.

If you have lots of records in other tables, then increase in the table scan can overweight benefits of the records being scanned sequentially.

Maintenance hell, on the other hand, is guaranteed.

17
votes

Are all of them 1:1 relationships? I mean, if a user could belong to, say, different user levels, or if the users interests are represented as several records in the user interests table, then merging those tables would be out of the question immediately.

Regarding previous answers about normalization, it must be said that the database normalization rules have completely disregarded performance, and is only looking at what is a neat database design. That is often what you want to achieve, but there are times when it makes sense to actively denormalize in pursuit of performance.

All in all, I'd say the question comes down to how many fields there are in the tables, and how often they are accessed. If user activity is often not very interesting, then it might just be a nuisance to always have it on the same record, for performance and maintenance reasons. If some data, like settings, say, are accessed very often, but simply contains too many fields, it might also not be convenient to merge the tables. If you're only interested in the performance gain, you might consider other approaches, such as keeping the settings separate, but saving them in a session variable of their own so that you don't have to query the database for them very often.

15
votes

Do all of those tables have a 1-to-1 relationship? For example, will each user row only have one corresponding row in user_stats or user_levels? If so, it might make sense to combine them into one table. If the relationship is not 1 to 1 though, it probably wouldn't make sense to combine (denormalize) them.

Having them in separate tables vs. one table is probably going to have little effect on performance though unless you have hundreds of thousands or millions of user records. The only real gain you'll get is from simplifying your queries by combining them.

ETA:

If your concern is about having too many columns, then think about what stuff you typically use together and combine those, leaving the rest in a separate table (or several separate tables if needed).

If you look at the way you use the data, my guess is that you'll find that something like 80% of your queries use 20% of that data with the remaining 80% of the data being used only occasionally. Combine that frequently used 20% into one table, and leave the 80% that you don't often use in separate tables and you'll probably have a good compromise.

9
votes

Creating one massive table goes against relational database principals. I wouldn't combine all them into one table. Your going to get multiple instances of repeated data. If your user has three interests for example, you will have 3 rows, with the same user data in just to store the three different interests. Definatly go for the multiple 'normalized' table approach. See this Wiki page for database normalization.

Edit: I have updated my answer, as you have updated your question... I agree with my initial answer even more now since...

a large portion of these cells are likely to remain empty

If for example, a user didn't have any interests, if you normalize then you simple wont have a row in the interest table for that user. If you have everything in one massive table, then you will have columns (and apparently a lot of them) that contain just NULL's.

I have worked for a telephony company where there has been tons of tables, getting data could require many joins. When the performance of reading from these tables was critical then procedures where created that could generate a flat table (i.e. a denormalized table) that would require no joins, calculations etc that reports could point to. These where then used in conjunction with a SQL server agent to run the job at certain intervals (i.e. a weekly view of some stats would run once a week and so on).

7
votes

Why not use the same approach Wordpress does by having a users table with basic user information that everyone has and then adding a "user_meta" table that can basically be any key, value pair associated with the user id. So if you need to find all the meta information for the user you could just add that to your query. You would also not always have to add the extra query if not needed for things like logging in. The benefit to this approach also leaves your table open to adding new features to your users such as storing their twitter handle or each individual interest. You also won't have to deal with a maze of associated ID's because you have one table that rules all metadata and you will limit it to only one association instead of 50.

Wordpress specifically does this to allow for features to be added via plugins, therefore allowing for your project to be more scalable and will not require a complete database overhaul if you need to add a new feature.

5
votes

I think this is one of those "it depends" situation. Having multiple tables is cleaner and probably theoretically better. But when you have to join 6-7 tables to get information about a single user, you might start to rethink that approach.

1
votes

I would say it depends on what the other tables really mean. Does a user_details contain more then 1 more / users and so on. What level on normalization is best suited for your needs depends on your demands.

If you have one table with good index that would probably be faster. But on the other hand probably more difficult to maintain.

To me it look like you could skip User_Details as it probably is 1 to 1 relation with Users. But the rest are probably alot of rows per user?