2
votes

In the Snowflake database data definition language, when I specify a VARCHAR(10) column, what is the unit of measure of the length "10"? Is it the number of bytes or the number of characters?

That is, if I want to store a single, 4-byte character, do I need to specify VARCHAR(1) or VARCHAR(4)?

I'm looking at the documentation here, https://docs.snowflake.net/manuals/sql-reference/data-types-text.html, and I see

The maximum number of Unicode characters that can be stored in a VARCHAR column depends on whether the characters are singlebyte or multibyte:

and

If a length is not specified, the default is the maximum length.

and

Snowflake currently deviates from common CHAR semantics in that strings shorter than the maximum length are not space-padded at the end.

But, I'm still not sure what the unit of measure of the length specifier is.

1

1 Answers

3
votes

It's characters, as opposed to bytes or code points. Here's a simple example:

create table t (x varchar(1)) as select '123' AS x;

String '123' is too long and would be truncated

Versus:

create or replace table t (x varchar(1)) as select '😊' AS x;

Table T successfully created.

The smiling face Unicode character is U+1F60A, which is more than a single byte.