Are there disadvantages to using a generic varchar(255) for all text-based fields?

Question

I have a contacts table which contains fields such as postcode, first name, last name, town, country, phone number etc, all of which are defined as VARCHAR(255) even though none of these fields will ever come close to having 255 characters. (If you're wondering, it's this way because Ruby on Rails migrations map String fields to VARCHAR(255) by default and I never bothered to override it).

Since VARCHAR will only store the number of actual characters of the field (along with the field length), is there any distinct advantage (performance or otherwise) to using, say, VARCHAR(16) over VARCHAR(255)?

Additionally, most of these fields have indexes on them. Does a larger VARCHAR size on the field affect the size or performance of the index at all?

FYI I'm using MySQL 5.

@ceejayoz, stating that the accepted answer is incorrect without explaining why doesn't really help. What makes it even worse is that the accepted answer can change over time and your comment will confuse people into thinking the new accepted answer is incorrect. — Gili
@Gili Deleted my comment as the OP apparently changed their acceptance. Good points, in the future I'll indicate which answer I'm talking about and why. — ceejayoz
Some other answers at this duplicate question, stackoverflow.com/questions/1262174/… — James McMahon

Bill Karwin Bill Karwin · Accepted Answer · 2008-11-04T16:56:03

In storage, VARCHAR(255) is smart enough to store only the length you need on a given row, unlike CHAR(255) which would always store 255 characters.

But since you tagged this question with MySQL, I'll mention a MySQL-specific tip: as rows are copied from the storage engine layer to the SQL layer, VARCHAR fields are converted to CHAR to gain the advantage of working with fixed-width rows. So the strings in memory become padded out to the maximum length of your declared VARCHAR column.

When your query implicitly generates a temporary table, for instance while sorting or GROUP BY, this can use a lot of memory. If you use a lot of VARCHAR(255) fields for data that doesn't need to be that long, this can make the temporary table very large.

You may also like to know that this "padding out" behavior means that a string declared with the utf8 character set pads out to three bytes per character even for strings you store with single-byte content (e.g. ascii or latin1 characters). And likewise utf8mb4 character set causes the string to pad out to four bytes per character in memory.

So a VARCHAR(255) in utf8 storing a short string like "No opinion" takes 11 bytes on disk (ten lower-charset characters, plus one byte for length) but it takes 765 bytes in memory, and thus in temp tables or sorted results.

I have helped MySQL users who unknowingly created 1.5GB temp tables frequently and filled up their disk space. They had lots of VARCHAR(255) columns that in practice stored very short strings.

It's best to define the column based on the type of data that you intend to store. It has benefits to enforce application-related constraints, as other folks have mentioned. But it has the physical benefits to avoid the memory waste I described above.

It's hard to know what the longest postal address is, of course, which is why many people choose a long VARCHAR that is certainly longer than any address. And 255 is customary because it is the maximum length of a VARCHAR for which the length can be encoded with one byte. It was also the maximum VARCHAR length in MySQL older than 5.0.

Are there disadvantages to using a generic varchar(255) for all text-based fields?

7 Answers