Is it just that nvarchar
supports multibyte characters? If that is the case, is there really any point, other than storage concerns, to using varchars
?
20 Answers
An nvarchar
column can store any Unicode data. A varchar
column is restricted to an 8-bit codepage. Some people think that varchar
should be used because it takes up less space. I believe this is not the correct answer. Codepage incompatabilities are a pain, and Unicode is the cure for codepage problems. With cheap disk and memory nowadays, there is really no reason to waste time mucking around with code pages anymore.
All modern operating systems and development platforms use Unicode internally. By using nvarchar
rather than varchar
, you can avoid doing encoding conversions every time you read from or write to the database. Conversions take time, and are prone to errors. And recovery from conversion errors is a non-trivial problem.
If you are interfacing with an application that uses only ASCII, I would still recommend using Unicode in the database. The OS and database collation algorithms will work better with Unicode. Unicode avoids conversion problems when interfacing with other systems. And you will be preparing for the future. And you can always validate that your data is restricted to 7-bit ASCII for whatever legacy system you're having to maintain, even while enjoying some of the benefits of full Unicode storage.
varchar: Variable-length, non-Unicode character data. The database collation determines which code page the data is stored using.
nvarchar: Variable-length Unicode character data. Dependent on the database collation for comparisons.
Armed with this knowledge, use whichever one matches your input data (ASCII v. Unicode).
It depends on how Oracle was installed. During the installation process, the NLS_CHARACTERSET option is set. You may be able to find it with the query SELECT value$ FROM sys.props$ WHERE name = 'NLS_CHARACTERSET'
.
If your NLS_CHARACTERSET is a Unicode encoding like UTF8, great. Using VARCHAR and NVARCHAR are pretty much identical. Stop reading now, just go for it. Otherwise, or if you have no control over the Oracle character set, read on.
VARCHAR — Data is stored in the NLS_CHARACTERSET encoding. If there are other database instances on the same server, you may be restricted by them; and vice versa, since you have to share the setting. Such a field can store any data that can be encoded using that character set, and nothing else. So for example if the character set is MS-1252, you can only store characters like English letters, a handful of accented letters, and a few others (like € and —). Your application would be useful only to a few locales, unable to operate anywhere else in the world. For this reason, it is considered A Bad Idea.
NVARCHAR — Data is stored in a Unicode encoding. Every language is supported. A Good Idea.
What about storage space? VARCHAR is generally efficient, since the character set / encoding was custom-designed for a specific locale. NVARCHAR fields store either in UTF-8 or UTF-16 encoding, base on the NLS setting ironically enough. UTF-8 is very efficient for "Western" languages, while still supporting Asian languages. UTF-16 is very efficient for Asian languages, while still supporting "Western" languages. If concerned about storage space, pick an NLS setting to cause Oracle to use UTF-8 or UTF-16 as appropriate.
What about processing speed? Most new coding platforms use Unicode natively (Java, .NET, even C++ std::wstring from years ago!) so if the database field is VARCHAR it forces Oracle to convert between character sets on every read or write, not so good. Using NVARCHAR avoids the conversion.
Bottom line: Use NVARCHAR! It avoids limitations and dependencies, is fine for storage space, and usually best for performance too.
My two cents
Indexes can fail when not using the correct datatypes:
In SQL Server: When you have an index over a VARCHAR column and present it a Unicode String, SQL Server does not make use of the index. The same thing happens when you present a BigInt to a indexed-column containing SmallInt. Even if the BigInt is small enough to be a SmallInt, SQL Server is not able to use the index. The other way around you do not have this problem (when providing SmallInt or Ansi-Code to an indexed BigInt ot NVARCHAR column).Datatypes can vary between different DBMS's (DataBase Management System):
Know that every database has slightly different datatypes and VARCHAR does not means the same everywhere. While SQL Server has VARCHAR and NVARCHAR, an Apache/Derby database has only VARCHAR and there VARCHAR is in Unicode.
Mainly nvarchar stores Unicode characters and varchar stores non-Unicode characters.
"Unicodes" means 16-bit character encoding scheme allowing characters from lots of other languages like Arabic, Hebrew, Chinese, Japanese, to be encoded in a single character set.
That means unicodes is using 2 bytes per character to store and nonunicodes uses only one byte per character to store. Which means unicodes need double capacity to store compared to non-unicodes.
You're right. nvarchar
stores Unicode data while varchar
stores single-byte character data. Other than storage differences (nvarchar
requires twice the storage space as varchar
), which you already mentioned, the main reason for preferring nvarchar
over varchar
would be internationalization (i.e. storing strings in other languages).
I would say, it depends.
If you develop a desktop application, where the OS works in Unicode (like all current Windows systems) and language does natively support Unicode (default strings are Unicode, like in Java or C#), then go nvarchar.
If you develop a web application, where strings come in as UTF-8, and language is PHP, which still does not support Unicode natively (in versions 5.x), then varchar will probably be a better choice.
Although NVARCHAR
stores Unicode, you should consider by the help of collation also you can use VARCHAR
and save your data of your local languages.
Just imagine the following scenario.
The collation of your DB is Persian and you save a value like 'علی' (Persian writing of Ali) in the VARCHAR(10)
datatype. There is no problem and the DBMS only uses three bytes to store it.
However, if you want to transfer your data to another database and see the correct result your destination database must have the same collation as the target which is Persian in this example.
If your target collation is different, you see some question marks(?) in the target database.
Finally, remember if you are using a huge database which is for usage of your local language, I would recommend to use location instead of using too many spaces.
I believe the design can be different. It depends on the environment you work on.
I had a look at the answers and many seem to recommend to use nvarchar
over varchar
, because space is not a problem anymore, so there is no harm in enabling Unicode for little extra storage. Well, this is not always true when you want to apply an index over your column. SQL Server has a limit of 900 bytes on the size of the field you can index. So if you have a varchar(900)
you can still index it, but not varchar(901)
. With nvarchar
, the number of characters is halved, so you can index up to nvarchar(450)
. So if you are confident you don't need nvarchar
, I don't recommend using it.
In general, in databases, I recommend sticking to the size you need, because you can always expand. For example, a colleague at work once thought that there is no harm in using nvarchar(max)
for a column, as we have no problem with storage at all. Later on, when we tried to apply an index over this column, SQL Server rejected this. If, however, he started with even varchar(5)
, we could have simply expanded it later to what we need without such a problem that will require us to do a field migration plan to fix this problem.
The main difference between Varchar(n)
and nvarchar(n)
is:
Varchar
( Variable-length, non-Unicode character data) size is upto 8000.
1.It is a variable length data type
Used to store non-Unicode characters
Occupies 1 byte of space for each character
Nvarchar
:Variable-length Unicode character data.
1.It is a variable-length data type
2.Used to store Unicode characters.
- Data is stored in a Unicode encoding. Every language is supported. (for example the languages Arabic, German,Hindi,etc and so on)
If a single byte is used to store a character, there are 256 possible combinations, and thereby you can save 256 different characters. Collation is the pattern which defines the characters and the rules by which they are compared and sorted.
1252, which is the Latin1 (ANSI), is the most common. Single-byte character sets are also inadequate to store all the characters used by many languages. For example, some Asian languages have thousands of characters, so they must use two bytes per character.
Unicode standard
When systems using multiple code pages are used in a network, it becomes difficult to manage communication. To standardize things, the ISO and Unicode consortium introduced the Unicode. Unicode uses two bytes to store each character. That is 65,536 different characters can be defined, so almost all the characters can be covered with Unicode. If two computers use Unicode, every symbol will be represented in the same way and no conversion is needed - this is the idea behind Unicode.
SQL Server has two categories of character datatypes:
- non-Unicode (char, varchar, and text)
- Unicode (nchar, nvarchar, and ntext)
If we need to save character data from multiple countries, always use Unicode.
I have to say here (I realise that I'm probably going to open myself up to a slating!), but surely the only time when NVARCHAR
is actually more useful (notice the more there!) than VARCHAR
is when all of the collations on all of the dependant systems and within the database itself are the same...? If not then collation conversion has to happen anyway and so makes VARCHAR
just as viable as NVARCHAR
.
To add to this, some database systems, such as SQL Server (before 2012) have a page size of approx. 8K. So, if you're looking at storing searchable data not held in something like a TEXT
or NTEXT
field then VARCHAR
provides the full 8k's worth of space whereas NVARCHAR
only provides 4k (double the bytes, double the space).
I suppose, to summarise, the use of either is dependent on:
- Project or context
- Infrastructure
- Database system
Follow Difference Between Sql Server VARCHAR and NVARCHAR Data Type. Here you could see in a very descriptive way.
In generalnvarchar stores data as Unicode, so, if you're going to store multilingual data (more than one language) in a data column you need the N variant.
Jeffrey L Whitledge with ~47000 reputation score recommends usage of nvarchar
Solomon Rutzky with with ~33200 reputation score recommends: Do NOT always use NVARCHAR. That is a very dangerous, and often costly, attitude / approach.
What are the main performance differences between varchar and nvarchar SQL Server data types?
https://www.sqlservercentral.com/articles/disk-is-cheap-orly-4
Both persons of such a high reputation, what does a learning sql server database developer choose?
There are many warnings in answers and comments about performance issues if you are not consistent in choices.
There are comments pro/con nvarchar for performance.
There are comments pro/con varchar for performance.
I have a particular requirement for a table with many hundreds of columns, which in itself is probably unusual ?
I'm choosing varchar to avoid going close to the 8060 byte table record size limit of SQL*server 2012.
Use of nvarchar, for me, goes over this 8060 byte limit.
I'm also thinking that I should match the data types of the related code tables to the data types of the primary central table.
I have seen use of varchar column at this place of work, South Australian Government, by previous experienced database developers, where the table row count is going to be several millions or more (and very few nvarchar columns, if any, in these very large tables), so perhaps the expected data row volumes becomes part of this decision.
varchar
is used for non-Unicode characters
only on the other hand nvarchar
is used for both unicode
and non-unicode
characters. Some other difference between them is given bellow.
VARCHAR vs. NVARCHAR
VARCHAR | NVARCHAR | |
---|---|---|
Character Data Type | Variable-length, non-Unicode characters | Variable-length, both Unicode and non-Unicode characters such as Japanese, Korean, and Chinese. |
Maximum Length | Up to 8,000 characters |
Up to 4,000 characters |
Character Size | Takes up 1 byte per character |
Takes up 2 bytes per Unicode/Non-Unicode character |
Storage Size | Actual Length (in bytes) | 2 times Actual Length (in bytes) |
Usage | Used when data length is variable or variable length columns and if actual data is always way less than capacity | Due to storage only, used only if you need Unicode support such as the Japanese Kanji or Korean Hangul characters. |
Since SQL Server 2019 varchar columns support UTF-8 encoding.
Thus, from now on, the difference is size.
In a database system that translates to difference in speed.
Less size = Less IO + Less Memory = More speed in general. Read the article above for the numbers.
Go for varchar in UTF8 from now on!
Only if you have big percentage of data with characters in ranges 2048 - 16383 and 16384 – 65535 - you will have to measure
nvarchar
is safe to use compared to varchar
in order to make our code error free (type mismatching) because nvarchar
allows unicode characters also.
When we use where
condition in SQL Server query and if we are using =
operator, it will throw error some times. Probable reason for this is our mapping column will be difined in varchar
. If we defined it in nvarchar
this problem my not happen. Still we stick to varchar
and avoid this issue we better use LIKE
key word rather than =
.